Package API Documentation for pylexique

API Reference for the classes in pylexique.pylexique.py

Main module of pylexique.

class pylexique.pylexique.Lexique383(lexique_path: Optional[str] = None, parser_type: str = 'csv')

Bases: object

This class handles the Lexique database. It provides methods for interacting with the Lexique DB and retrieving lexical items. All the lexical items are stored in an OrderedDict.

Parameters
  • lexique_path – string. Path to the lexique file.

  • parser_type – string. ‘pandas_csv’, ‘csv’ and ‘xlsb’ are valid values. ‘csv’ is the default value.

static _parse_csv(lexique_path: str) → Generator[list, Any, None]
Parameters

lexique_path – string. Path to the lexique file.

Returns

generator of rows: Content of the Lexique38x database.
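
The generator-of-rows pattern described above can be sketched with the standard library alone. This is a minimal illustration, not pylexique's implementation: `iter_rows` is a hypothetical name, and the sketch assumes a tab-separated file whose first line is a header.

```python
import csv
from typing import Any, Generator, List


def iter_rows(lexique_path: str) -> Generator[List[str], Any, None]:
    """Lazily yield each data row of a delimited file as a list of strings.

    Assumption: the file is tab-separated and starts with a header line.
    """
    with open(lexique_path, encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        next(reader, None)  # skip the header row (no error on an empty file)
        for row in reader:
            yield row
```

Because the function is a generator, rows are read one at a time, so the whole file never has to be held in memory before the database is built.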

_parse_lexique(lexique_path: str, parser_type: str) → None
Parses the given lexique file and creates 2 hash tables to store the data.
Parameters
  • lexique_path – string. Path to the lexique file.

  • parser_type – string. Can be either ‘csv’, ‘pandas_csv’, ‘std_csv’ or ‘xlsb’.

Returns

None.

_create_db(lexicon: Generator[list, Any, None]) → None
Creates 2 hash tables populated with the entries in the lexique, if they do not exist yet.
One hash table holds the LexItems; the other holds the same data grouped by lemma, to give access to all lexical forms of a word.
Parameters

lexicon – Iterable. Iterable containing the lexique383 entries.

Returns

None.
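
The two-table layout described for _create_db can be sketched as follows. This is an illustration under stated assumptions: `build_tables` and the three-field `Row` are hypothetical simplifications (real Lexique383 rows carry 35 fields), and the library's actual storage may differ.

```python
from collections import OrderedDict, defaultdict
from typing import Iterable, List, Tuple

# Hypothetical simplified row: (ortho, lemme, cgram).
Row = Tuple[str, str, str]


def build_tables(rows: Iterable[Row]):
    """Index rows twice: by written form and by lemma."""
    by_ortho: "OrderedDict[str, List[Row]]" = OrderedDict()
    by_lemma = defaultdict(list)
    for row in rows:
        ortho, lemme, _cgram = row
        # Homographs share one key, so each key maps to a list of rows.
        by_ortho.setdefault(ortho, []).append(row)
        # Every inflected form is filed under its lemma.
        by_lemma[lemme].append(row)
    return by_ortho, by_lemma


sample = [
    ("chat", "chat", "NOM"),
    ("chats", "chat", "NOM"),
    ("mange", "manger", "VER"),
]
by_ortho, by_lemma = build_tables(sample)
```

Keeping a second table keyed by lemma trades some memory for constant-time access to all the forms of a word.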

_convert_entries(row_fields: Union[List[str], List[Union[str, float, int, bool]]]) → Tuple[str, str, str, str, str, str, float, float, float, float, str, int, int, bool, int, int, str, str, int, int, int, int, str, int, str, str, str, str, str, float, int, float, float, str, int]
Converts entries from strings to int, bool or float and generates
a new list with typed entries.
Parameters

row_fields – List of column entries representing a row.

Returns

ConvertedRow: List of typed column entries representing a typed row.
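
The string-to-typed-row conversion can be illustrated with a small sketch. The names `convert_row` and `COLUMN_TYPES` are hypothetical, the four-column layout is a stand-in for the real 35 fields, and the encoding of booleans as "1"/"0" or "True"/"False" is an assumption.

```python
from typing import List, Sequence, Tuple, Union

Value = Union[str, float, int, bool]

# Hypothetical 4-column layout; the real row has 35 typed fields.
COLUMN_TYPES: Tuple[type, ...] = (str, float, int, bool)


def convert_row(
    row_fields: Sequence[str],
    types: Sequence[type] = COLUMN_TYPES,
) -> List[Value]:
    """Return a new list in which each string is coerced to its column type."""
    if len(row_fields) != len(types):
        raise ValueError("row length does not match the number of column types")
    converted: List[Value] = []
    for raw, target in zip(row_fields, types):
        if target is bool:
            # Assumption: booleans are stored as "1"/"0" or "True"/"False".
            converted.append(raw.strip().lower() in ("1", "true"))
        elif target is str:
            converted.append(raw)
        else:
            # int(...) or float(...); raises ValueError on malformed input.
            converted.append(target(raw))
    return converted


typed = convert_row(["chat", "12.5", "4", "1"])
```

A caller could catch the ValueError raised on malformed fields and collect the offending rows for later inspection.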

get_lex(words: Union[Tuple[str, ...], str]) → Dict[str, Union[pylexique.pylexique.LexItem, List[pylexique.pylexique.LexItem]]]

Retrieves the lexical entries for the given word or sequence of words.

Parameters

words – A string or a tuple of multiple strings for getting the LexItems for multiple words.

Returns

Dictionary of LexItems.
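
The return type above suggests that a word maps either to a single LexItem or to a list of them. The following sketch shows one way such a lookup could behave over a plain dictionary. It is not the library's code: `lookup`, `Entry` and the sample table are hypothetical, and both the "single entry is unwrapped" rule and the skipping of unknown words are assumptions.

```python
from typing import Dict, List, Tuple, Union

Entry = Dict[str, str]  # hypothetical stand-in for a LexItem


def lookup(
    table: Dict[str, List[Entry]],
    words: Union[Tuple[str, ...], str],
) -> Dict[str, Union[Entry, List[Entry]]]:
    """Look up one word or a tuple of words in a form-indexed table."""
    if isinstance(words, str):
        words = (words,)  # normalize a single word to a 1-tuple
    results: Dict[str, Union[Entry, List[Entry]]] = {}
    for word in words:
        entries = table.get(word)
        if not entries:
            continue  # assumption: unknown words are silently skipped
        # Assumption: a unique entry is returned bare, homographs as a list.
        results[word] = entries[0] if len(entries) == 1 else entries
    return results


table = {
    "chat": [{"ortho": "chat", "cgram": "NOM"}],
    "est": [{"ortho": "est", "cgram": "VER"}, {"ortho": "est", "cgram": "NOM"}],
}
single = lookup(table, "chat")
several = lookup(table, ("chat", "est", "xyzzy"))
```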

get_all_forms(word: str) → List[pylexique.pylexique.LexItem]

Gets all lexical forms of a given word.

Parameters

word – String.

Returns

List of LexItem objects sharing the same root lemma.
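
Retrieving every form that shares a lemma amounts to resolving the word to its lemma and then reading the lemma-indexed table. The sketch below uses plain strings instead of LexItem objects. `all_forms` and both dictionaries are hypothetical, and it is an assumption that an inflected form is accepted as input.

```python
from typing import Dict, List


def all_forms(
    word: str,
    lemma_of: Dict[str, str],
    forms_by_lemma: Dict[str, List[str]],
) -> List[str]:
    """Return every known form sharing the lemma of `word`."""
    # Assumption: if the word is not a known form, treat it as a lemma.
    lemma = lemma_of.get(word, word)
    # Copy the list so callers cannot mutate the underlying table.
    return list(forms_by_lemma.get(lemma, []))


lemma_of = {"mange": "manger", "mangeons": "manger", "manger": "manger"}
forms_by_lemma = {"manger": ["manger", "mange", "mangeons"]}
forms = all_forms("mangeons", lemma_of, forms_by_lemma)
```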

static _save_errors(errors: Union[List[Tuple[List[Union[str, float, int, bool]], List[str]]], List[DefaultDict[str, List[Dict[str, str]]]]], errors_path: str) → None

Saves the mismatched key/values in Lexique383 based on type coercion.

Parameters
  • errors – List of errors encountered while parsing Lexique38x

  • errors_path – Path to save the errors.

Returns

None.

class pylexique.pylexique.LexItem(ortho: str, phon: str, lemme: str, cgram: str, genre: str, nombre: str, freqlemfilms2: float, freqlemlivres: float, freqfilms2: float, freqlivres: float, infover: str, nbhomogr: int, nbhomoph: int, islem: bool, nblettres: int, nbphons: int, cvcv: str, p_cvcv: str, voisorth: int, voisphon: int, puorth: int, puphon: int, syll: str, nbsyll: int, cv_cv: str, orthrenv: str, phonrenv: str, orthosyll: str, cgramortho: str, deflem: float, defobs: int, old20: float, pld20: float, morphoder: str, nbmorph: int)

Bases: pylexique.pylexique.LexEntryTypes

This class defines the lexical items in Lexique383.
It uses slots for memory efficiency.
to_dict() → Dict[str, Union[str, float, int, bool]]
Converts the LexItem to a dict containing its attributes and their values.
Returns

OrderedDict. Dictionary mapping the attribute names of the LexItem to their values.
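
The combination of slots and a to_dict() method can be illustrated with a reduced class. `MiniLexItem` is a hypothetical three-field stand-in for LexItem, and deriving the dictionary from `__slots__` is one possible approach, not necessarily the one pylexique uses.

```python
from collections import OrderedDict
from typing import Union


class MiniLexItem:
    """Hypothetical three-field stand-in for LexItem."""

    # __slots__ replaces the per-instance __dict__, which saves memory
    # when many thousands of items are kept alive at once.
    __slots__ = ("ortho", "lemme", "nblettres")

    def __init__(self, ortho: str, lemme: str, nblettres: int) -> None:
        self.ortho = ortho
        self.lemme = lemme
        self.nblettres = nblettres

    def to_dict(self) -> "OrderedDict[str, Union[str, int]]":
        """Map each slot name to its current value, in declaration order."""
        return OrderedDict((name, getattr(self, name)) for name in self.__slots__)


item = MiniLexItem("chats", "chat", 5)
as_dict = item.to_dict()
```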

class pylexique.pylexique.LexEntryTypes(ortho: str, phon: str, lemme: str, cgram: str, genre: str, nombre: str, freqlemfilms2: float, freqlemlivres: float, freqfilms2: float, freqlivres: float, infover: str, nbhomogr: int, nbhomoph: int, islem: bool, nblettres: int, nbphons: int, cvcv: str, p_cvcv: str, voisorth: int, voisphon: int, puorth: int, puphon: int, syll: str, nbsyll: int, cv_cv: str, orthrenv: str, phonrenv: str, orthosyll: str, cgramortho: str, deflem: float, defobs: int, old20: float, pld20: float, morphoder: str, nbmorph: int)

Bases: object

Type information about all the lexical attributes in a LexItem object.

API Reference for the classes in pylexique.utils.py