Package API Documentation for pylexique
API Reference for the classes in pylexique.pylexique.py
Main module of pylexique.
- class pylexique.pylexique.Lexique383(lexique_path: Optional[str] = None, parser_type: str = 'csv')
Bases: object
This class handles the Lexique database. It provides methods for interacting with the Lexique DB and retrieving lexical items. All the lexical items are stored in an ordered dictionary.
- Parameters
lexique_path – string. Path to the lexique file.
parser_type – string. ‘pandas_csv’ and ‘csv’ are valid values. ‘csv’ is the default value.
- Variables
lexique – Dictionary containing all the LexicalItem objects indexed by orthography.
lemmes – Dictionary containing all the LexicalItem objects indexed by lemma.
anagrams – Dictionary containing all the LexicalItem objects indexed by anagram form.
- static _parse_csv(lexique_path: str) → Generator[list, Any, None]
- Parameters
lexique_path – string. Path to the lexique file.
- Returns
generator of rows: Content of the Lexique38x database.
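For illustration, the reading step can be sketched with the standard-library csv module. This is not pylexique's own code: the function name parse_csv is hypothetical, and the sketch assumes the Lexique file is tab-separated, UTF-8 encoded and starts with a header row.

```python
import csv
from typing import Any, Generator, List


def parse_csv(lexique_path: str) -> Generator[List[str], Any, None]:
    """Yield the data rows of a tab-separated Lexique file (hypothetical sketch)."""
    # newline="" lets the csv module handle line endings itself
    with open(lexique_path, encoding="utf-8", newline="") as fh:
        reader = csv.reader(fh, delimiter="\t")
        next(reader, None)  # skip the header row (column names), if any
        for row in reader:
            yield row  # each row is a list of raw strings
```

Every field comes back as a string at this stage; typing the values is the job of _convert_entries.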
- _parse_lexique(lexique_path: str, parser_type: str) → None
- Parses the given lexique file and creates two hash tables to store the data.
- Parameters
lexique_path – string. Path to the lexique file.
parser_type – string. Either ‘csv’ or ‘pandas_csv’.
- Returns
None.
- _create_db(lexicon: Generator[list, Any, None]) → None
- Creates two hash tables populated with the entries in lexique if they do not already exist. One hash table holds the LexItems; the other holds the same data grouped by lemma, giving access to all lexical forms of a word.
- Parameters
lexicon – Iterable containing the Lexique383 entries.
- Returns
None.
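The return annotation of get_lex, Union[LexItem, List[LexItem]], suggests that homographs (several entries sharing one spelling) are stored as a list. The sketch below builds both tables in one pass under that assumption; Entry is a hypothetical stand-in for LexItem that keeps only the ortho and lemme fields, and create_db is a hypothetical name.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, List, NamedTuple, Tuple, Union


class Entry(NamedTuple):
    """Hypothetical stand-in for LexItem, keeping only two fields."""
    ortho: str
    lemme: str


Lexicon = Dict[str, Union[Entry, List[Entry]]]


def create_db(entries: Iterable[Entry]) -> Tuple[Lexicon, DefaultDict[str, List[Entry]]]:
    """Build a spelling index and a lemma index in a single pass."""
    lexique: Lexicon = {}
    lemmes: DefaultDict[str, List[Entry]] = defaultdict(list)
    for entry in entries:
        existing = lexique.get(entry.ortho)
        if existing is None:
            lexique[entry.ortho] = entry  # first entry with this spelling
        elif isinstance(existing, list):
            existing.append(entry)  # third or later homograph
        else:
            lexique[entry.ortho] = [existing, entry]  # second homograph: switch to a list
        lemmes[entry.lemme].append(entry)  # every form is grouped under its lemma
    return lexique, lemmes
```

Python dicts preserve insertion order, so both tables keep the order of the source file.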
- _convert_entries(row_fields: Union[List[str], List[Union[str, float, int, bool]]]) → Tuple[str, str, str, str, str, str, float, float, float, float, str, int, int, bool, int, int, str, str, int, int, int, int, str, int, str, str, str, str, str, float, int, float, float, str, int]
- Converts entries from strings to int, bool or float and generates a new list with typed entries.
- Parameters
row_fields – List of column entries representing a row.
- Returns
ConvertedRow: Tuple of typed column entries representing a typed row.
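A minimal sketch of the coercion step, assuming each column's target type is known in advance (as in the Tuple return annotation above) and that boolean columns hold 0/1 flags. convert_field and convert_row are hypothetical names. Values that fail to convert are kept as raw strings and collected, which is one plausible way to feed an error report like the one _save_errors writes; bool is parsed explicitly because bool("0") is True in Python.

```python
from typing import Any, List, Sequence, Tuple, Type


def convert_field(value: str, target: Type[Any]) -> Any:
    """Coerce one raw string to str, int, float or bool."""
    if target is bool:
        # bool("0") is True in Python, so 0/1 flags are parsed explicitly
        if value == "1":
            return True
        if value == "0":
            return False
        raise ValueError(f"not a 0/1 flag: {value!r}")
    if target is int or target is float:
        return target(value)  # raises ValueError on malformed numbers
    return value


def convert_row(row: Sequence[str], types: Sequence[Type[Any]]) -> Tuple[List[Any], List[str]]:
    """Convert a row column by column, collecting the values that fail."""
    converted: List[Any] = []
    errors: List[str] = []
    for value, target in zip(row, types):
        try:
            converted.append(convert_field(value, target))
        except ValueError:
            converted.append(value)  # keep the raw string so columns stay aligned
            errors.append(value)
    return converted, errors
```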
- get_lex(words: Union[Tuple[str, ...], str]) → Dict[str, Union[pylexique.pylexique.LexItem, List[pylexique.pylexique.LexItem]]]
Retrieves the lexical entries for a single word or a tuple of words.
- Parameters
words – A string or a tuple of multiple strings for getting the LexItems for multiple words.
- Returns
Dictionary of LexItems.
- Raises
TypeError.
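A sketch of the argument handling, written as a hypothetical standalone function over a plain dict rather than the real method. It accepts a single string or a tuple of strings and raises TypeError otherwise; silently skipping words absent from the lexicon is an assumption of this sketch, not documented behavior.

```python
from typing import Dict, Tuple, TypeVar, Union

T = TypeVar("T")


def get_lex(lexique: Dict[str, T], words: Union[Tuple[str, ...], str]) -> Dict[str, T]:
    """Return the entries for one word or a tuple of words, keyed by spelling."""
    if isinstance(words, str):
        words = (words,)  # normalize a single word to a 1-tuple
    elif not (isinstance(words, tuple) and all(isinstance(w, str) for w in words)):
        raise TypeError("words must be a str or a tuple of str")
    # Unknown words are skipped here (an assumption of this sketch)
    return {word: lexique[word] for word in words if word in lexique}
```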
- get_all_forms(word: str) → List[pylexique.pylexique.LexItem]
Gets all lexical forms of a given word.
- Parameters
word – String.
- Returns
List of LexItem objects sharing the same root lemma.
- Raises
ValueError.
TypeError.
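One way to implement the lemma lookup, sketched as a hypothetical standalone function over the two tables built by _create_db. Entry is a hypothetical stand-in for LexItem. Which error is raised in which case is an assumption here: TypeError for a non-string argument, ValueError for a word missing from the lexicon. For a homograph, the sketch follows the lemma of the first entry only.

```python
from typing import Dict, List, NamedTuple, Union


class Entry(NamedTuple):
    """Hypothetical stand-in for LexItem, keeping only two fields."""
    ortho: str
    lemme: str


def get_all_forms(
    word: str,
    lexique: Dict[str, Union[Entry, List[Entry]]],
    lemmes: Dict[str, List[Entry]],
) -> List[Entry]:
    """Return every entry that shares the lemma of `word`."""
    if not isinstance(word, str):
        raise TypeError(f"word must be a str, got {type(word).__name__}")
    found = lexique.get(word)
    if found is None:
        raise ValueError(f"{word!r} is not in the lexicon")
    # For homographs, this sketch follows the lemma of the first entry only
    entry = found[0] if isinstance(found, list) else found
    return lemmes[entry.lemme]
```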
- get_anagrams(word: str) → List[pylexique.pylexique.LexItem]
Gets all the anagrams of a given word.
- Parameters
word – String.
- Returns
List of LexItem objects which are anagrams of the given word.
- Raises
ValueError.
TypeError.
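Anagrams are commonly grouped under a canonical key, the word's letters in sorted order, so that all anagrams share one dictionary slot. The sketch below applies that technique to plain strings; the key definition, the exclusion of the query word from its own results, the error conditions and the function names are assumptions, not pylexique's documented behavior.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Mapping


def anagram_key(word: str) -> str:
    """Canonical key: the lower-cased letters of the word in sorted order."""
    return "".join(sorted(word.lower()))


def build_anagram_index(words: Iterable[str]) -> DefaultDict[str, List[str]]:
    """Group words that share the same multiset of letters."""
    index: DefaultDict[str, List[str]] = defaultdict(list)
    for word in words:
        index[anagram_key(word)].append(word)
    return index


def get_anagrams(word: str, index: Mapping[str, List[str]]) -> List[str]:
    """Return the other words built from exactly the same letters."""
    if not isinstance(word, str):
        raise TypeError(f"word must be a str, got {type(word).__name__}")
    group = index.get(anagram_key(word), [])
    if word not in group:
        raise ValueError(f"{word!r} is not in the lexicon")
    return [other for other in group if other != word]  # exclude the query word itself
```

Lookup then costs one sort of the query word plus a dictionary access, instead of a scan over the whole lexicon.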
- static _save_errors(errors: Union[List[Tuple[List[Union[str, float, int, bool]], List[str]]], List[DefaultDict[str, List[Dict[str, str]]]]], errors_path: str) → None
Saves the key/value mismatches found in Lexique383 during type coercion.
- Parameters
errors – List of errors encountered while parsing Lexique38x.
errors_path – Path to save the errors.
- Returns
None.
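A minimal sketch, assuming the error report is written as JSON; the on-disk format pylexique actually uses is not stated here, and save_errors is a hypothetical name. Tuples in the error list become JSON arrays.

```python
import json
from typing import Any, Sequence


def save_errors(errors: Sequence[Any], errors_path: str) -> None:
    """Write the collected coercion errors to errors_path as UTF-8 JSON."""
    with open(errors_path, "w", encoding="utf-8") as fh:
        # ensure_ascii=False keeps accented French forms readable in the file
        json.dump(list(errors), fh, ensure_ascii=False, indent=2)
```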
- class pylexique.pylexique.LexItem(ortho: str, phon: str, lemme: str, cgram: str, genre: str, nombre: str, freqlemfilms2: float, freqlemlivres: float, freqfilms2: float, freqlivres: float, infover: str, nbhomogr: int, nbhomoph: int, islem: bool, nblettres: int, nbphons: int, cvcv: str, p_cvcv: str, voisorth: int, voisphon: int, puorth: int, puphon: int, syll: str, nbsyll: int, cv_cv: str, orthrenv: str, phonrenv: str, orthosyll: str, cgramortho: str, deflem: float, defobs: int, old20: float, pld20: float, morphoder: str, nbmorph: int)
Bases: pylexique.pylexique.LexEntryTypes
This class defines the lexical items in Lexique383. It uses slots for memory efficiency.
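Slots replace the per-instance __dict__ with fixed attribute storage. One caveat worth knowing: the saving only materializes if every class in the hierarchy declares __slots__, including a typed base class such as LexEntryTypes; otherwise instances inherit a __dict__ from the base. The classes below are hypothetical miniatures holding a handful of the Lexique fields, not pylexique's definitions.

```python
class EntryTypesSketch:
    """Hypothetical typed base: annotations document the field types."""
    __slots__ = ()  # introduce no per-instance __dict__ at this level
    ortho: str
    lemme: str
    freqfilms2: float
    islem: bool


class ItemSketch(EntryTypesSketch):
    """Hypothetical slotted lexical item holding four Lexique fields."""
    __slots__ = ("ortho", "lemme", "freqfilms2", "islem")

    def __init__(self, ortho: str, lemme: str, freqfilms2: float, islem: bool) -> None:
        self.ortho = ortho
        self.lemme = lemme
        self.freqfilms2 = freqfilms2
        self.islem = islem
```

With slots on both classes, assigning an undeclared attribute raises AttributeError, which also catches typos in field names.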
- class pylexique.pylexique.LexEntryTypes(ortho: str, phon: str, lemme: str, cgram: str, genre: str, nombre: str, freqlemfilms2: float, freqlemlivres: float, freqfilms2: float, freqlivres: float, infover: str, nbhomogr: int, nbhomoph: int, islem: bool, nblettres: int, nbphons: int, cvcv: str, p_cvcv: str, voisorth: int, voisphon: int, puorth: int, puphon: int, syll: str, nbsyll: int, cv_cv: str, orthrenv: str, phonrenv: str, orthosyll: str, cgramortho: str, deflem: float, defobs: int, old20: float, pld20: float, morphoder: str, nbmorph: int)
Bases: object
Type information about all the lexical attributes in a LexItem object.
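When types live in class annotations, they can be read back at runtime with typing.get_type_hints, which returns them in declaration order. The sketch below uses that to pair a raw row with its column types; the class EntryTypesDemo and the function coerce_row are hypothetical, and treating "1" as the only true flag value is an assumption about the 0/1 encoding.

```python
from typing import Any, Dict, Sequence, get_type_hints


class EntryTypesDemo:
    """Hypothetical miniature of a typed entry class (four fields only)."""
    ortho: str
    nbhomogr: int
    freqfilms2: float
    islem: bool


def coerce_row(types_cls: type, row: Sequence[str]) -> Dict[str, Any]:
    """Pair each raw value with its annotated type, in declaration order."""
    converted: Dict[str, Any] = {}
    for (name, target), value in zip(get_type_hints(types_cls).items(), row):
        if target is bool:
            converted[name] = value == "1"  # assumes 0/1 flags
        else:
            converted[name] = target(value)  # str, int or float constructor
    return converted
```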