Package API Documentation for pylexique
API Reference for the classes in pylexique.pylexique.py
Main module of pylexique.
- class pylexique.pylexique.Lexique383(lexique_path: Optional[str] = None, parser_type: str = 'csv')
Bases:
object
This class handles the Lexique database. It provides methods for interacting with the Lexique DB and retrieving lexical items. All lexical items are stored in an OrderedDict.
- Parameters:
lexique_path – string. Path to the lexique file.
parser_type – string. “pandas_csv” and “csv” are valid values. “csv” is the default value.
- Variables:
lexique – Dictionary containing all the LexicalItem objects indexed by orthography.
lemmes – Dictionary containing all the LexicalItem objects indexed by lemma.
anagrams – Dictionary containing all the LexicalItem objects indexed by anagram form.
- static _parse_csv(lexique_path: str) → Generator[list, Any, None]
- Parameters:
lexique_path – string. Path to the lexique file.
- Returns:
Generator of rows: the content of the Lexique38x database.
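To illustrate what a generator-based row parser of this kind can look like, here is a minimal sketch built on Python's csv module. The helper name iter_rows, the tab delimiter, and the skipping of a header line are assumptions made for the example; they are not taken from pylexique's source.

```python
import csv
from typing import Any, Generator


def iter_rows(path: str, delimiter: str = "\t") -> Generator[list, Any, None]:
    """Yield each data row of a delimited text file as a list of strings."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        next(reader, None)  # assumption: the first line is a header row
        for row in reader:
            yield row
```

Because rows are yielded one at a time, the caller can build its indexes without first loading the whole file into a list.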
- _parse_lexique(lexique_path: str, parser_type: str) → None
Parses the given lexique file and creates 2 hash tables to store the data.
- Parameters:
lexique_path – string. Path to the lexique file.
parser_type – string. Can be either “csv” or “pandas_csv”.
- Returns:
None
- _create_db(lexicon: Generator[list, Any, None]) → None
Creates 2 hash tables populated with the entries in lexique if they do not exist yet. One hash table holds the LexItems; the other holds the same data grouped by lemma, giving access to all lexical forms of a word.
- Parameters:
lexicon – Iterable. Iterable containing the lexique383 entries.
- Returns:
None
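The two-table layout described for _create_db can be sketched as follows. This is a simplified illustration, not the library's code: the function name build_tables, the three-column rows, and the choice to store homographs as a list under one key are assumptions (the last one is suggested by the return type of get_lex, but not confirmed here).

```python
from collections import OrderedDict, defaultdict
from typing import Dict, Iterable, List, Tuple, Union

# Hypothetical, heavily reduced rows: (ortho, lemme, cgram).
Row = Tuple[str, str, str]


def build_tables(rows: Iterable[Row]):
    """Index rows by orthography and group them by lemma."""
    by_ortho: Dict[str, Union[Row, List[Row]]] = OrderedDict()
    by_lemma: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        ortho, lemme, _ = row
        if ortho not in by_ortho:
            by_ortho[ortho] = row                     # first entry for this spelling
        elif isinstance(by_ortho[ortho], list):
            by_ortho[ortho].append(row)               # third or later homograph
        else:
            by_ortho[ortho] = [by_ortho[ortho], row]  # second homograph
        by_lemma[lemme].append(row)
    return by_ortho, by_lemma
```

With tables like these, looking up a lemma returns every inflected form that shares it, which is the kind of query get_all_forms answers.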
- _convert_entries(row_fields: Union[List[str], List[Union[str, float, int, bool]]]) → Tuple[str, str, str, str, str, str, float, float, float, float, str, int, int, bool, int, int, str, str, int, int, int, int, str, int, str, str, str, str, str, float, int, float, float, str, int]
Converts entries from strings to int, bool or float and generates a new list with typed entries.
- Parameters:
row_fields – List of column entries representing a row.
- Returns:
ConvertedRow: List of typed column entries representing a typed row.
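The conversion step can be pictured as applying one converter per column. The sketch below uses a reduced four-column row (ortho, freqfilms2, nblettres, islem, with the types listed in the LexItem signature); the names CONVERTERS and convert_row and the accepted boolean spellings are assumptions for illustration only.

```python
from typing import Callable, List, Sequence, Union

Typed = Union[str, int, float, bool]

# Hypothetical converters for a reduced row:
# ortho (str), freqfilms2 (float), nblettres (int), islem (bool).
CONVERTERS: Sequence[Callable[[str], Typed]] = (
    str,
    float,
    int,
    lambda s: s.strip() in {"1", "True", "true"},  # assumed boolean spellings
)


def convert_row(fields: Sequence[str]) -> List[Typed]:
    """Return a new list in which each string field has its column's type."""
    return [convert(value) for convert, value in zip(CONVERTERS, fields)]
```

A field that cannot be converted raises ValueError here; a real parser would likely collect such rows for later inspection rather than stop.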
- get_lex(words: Union[Tuple[str, ...], str]) → Dict[str, Union[LexItem, List[LexItem]]]
Retrieves the lexical entries for the given word or sequence of words.
- Parameters:
words – A string, or a tuple of strings to get the LexItems for multiple words.
- Returns:
Dictionary of LexItems.
- Raises:
TypeError.
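The input handling implied by the get_lex signature (a single string or a tuple of strings, TypeError otherwise) can be sketched like this. The function name lookup and the plain dict standing in for the lexicon are hypothetical, and silently skipping unknown words is an assumption of this sketch, not documented behavior.

```python
from typing import Dict, Tuple, Union


def lookup(table: Dict[str, object], words: Union[Tuple[str, ...], str]) -> Dict[str, object]:
    """Return a dict mapping each requested word to its entry in `table`."""
    if isinstance(words, str):
        words = (words,)  # normalise a single word to a one-element tuple
    elif not isinstance(words, tuple) or not all(isinstance(w, str) for w in words):
        raise TypeError("words must be a string or a tuple of strings")
    # Assumption: words missing from the table are skipped.
    return {word: table[word] for word in words if word in table}
```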
- get_all_forms(word: str) → List[LexItem]
Gets all lexical forms of a given word.
- Parameters:
word – String.
- Returns:
List of LexItem objects sharing the same root lemma.
- Raises:
ValueError.
TypeError.
- get_anagrams(word: str) → List[LexItem]
Gets all anagrams of a given word.
- Parameters:
word – String.
- Returns:
List of LexItem objects which are anagrams of the given word.
- Raises:
ValueError.
TypeError.
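One common way to index words "by anagram form" is to key each word on its sorted letters, so that all anagrams share a key. The sketch below shows that technique on plain strings; the function names are hypothetical, and whether pylexique computes its keys this way or excludes the queried word from the result is not stated in this reference.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def anagram_key(word: str) -> str:
    """Words that are anagrams of each other share the same sorted-letter key."""
    return "".join(sorted(word))


def build_anagram_index(words: Iterable[str]) -> Dict[str, List[str]]:
    index: Dict[str, List[str]] = defaultdict(list)
    for word in words:
        index[anagram_key(word)].append(word)
    return index


def anagrams_of(index: Dict[str, List[str]], word: str) -> List[str]:
    """Return the indexed words sharing `word`'s letters, excluding `word` itself."""
    return [w for w in index.get(anagram_key(word), []) if w != word]
```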
- static _save_errors(errors: Union[List[Tuple[List[Union[str, float, int, bool]], List[str]]], List[DefaultDict[str, List[Dict[str, str]]]]], errors_path: str) → None
Saves the mismatched key/value pairs found in Lexique383 during type coercion.
- Parameters:
errors – List of errors encountered while parsing Lexique38x.
errors_path – Path to save the errors.
- Returns:
None
- class pylexique.pylexique.LexItem(ortho: str, phon: str, lemme: str, cgram: str, genre: str, nombre: str, freqlemfilms2: float, freqlemlivres: float, freqfilms2: float, freqlivres: float, infover: str, nbhomogr: int, nbhomoph: int, islem: bool, nblettres: int, nbphons: int, cvcv: str, p_cvcv: str, voisorth: int, voisphon: int, puorth: int, puphon: int, syll: str, nbsyll: int, cv_cv: str, orthrenv: str, phonrenv: str, orthosyll: str, cgramortho: str, deflem: float, defobs: int, old20: float, pld20: float, morphoder: str, nbmorph: int)
Bases:
LexEntryTypes
This class defines the lexical items in Lexique383. It uses slots for memory efficiency.
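For readers unfamiliar with slots, the following sketch shows the general Python mechanism on a reduced, hypothetical class (MiniEntry with three of the LexItem fields). It is not the definition of LexItem itself.

```python
class MiniEntry:
    # Declaring __slots__ stops Python from creating a per-instance __dict__,
    # which saves memory when many small objects are held at once.
    __slots__ = ("ortho", "lemme", "nblettres")

    def __init__(self, ortho: str, lemme: str, nblettres: int) -> None:
        self.ortho = ortho
        self.lemme = lemme
        self.nblettres = nblettres


entry = MiniEntry("chats", "chat", 5)
```

The trade-off is that attributes not listed in __slots__ cannot be added to an instance afterwards; assigning one raises AttributeError.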
- class pylexique.pylexique.LexEntryTypes(ortho: str, phon: str, lemme: str, cgram: str, genre: str, nombre: str, freqlemfilms2: float, freqlemlivres: float, freqfilms2: float, freqlivres: float, infover: str, nbhomogr: int, nbhomoph: int, islem: bool, nblettres: int, nbphons: int, cvcv: str, p_cvcv: str, voisorth: int, voisphon: int, puorth: int, puphon: int, syll: str, nbsyll: int, cv_cv: str, orthrenv: str, phonrenv: str, orthosyll: str, cgramortho: str, deflem: float, defobs: int, old20: float, pld20: float, morphoder: str, nbmorph: int)
Bases:
object
Type information about all the lexical attributes in a LexItem object.