Morphological analyzer for the Russian language.
For a given word it can find all possible inflectional paradigms and thus compute all possible tags and normal forms.
The analyzer uses morphological word features and a lexicon (a dictionary compiled from the XML available at OpenCorpora.org); for unknown words, a heuristic algorithm is used.
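A lexicon combined with a fallback for unknown words can be illustrated with a toy sketch. This is not pymorphy2's prediction algorithm. The tiny LEXICON, the analyze() and _endings() helpers, and the rule "copy the ending change of the known word with the longest shared suffix" are all hypothetical and only show the general idea.

```python
# Hypothetical toy lexicon: word form -> list of (normal_form, tag) pairs.
# The entries are illustrative and are not taken from OpenCorpora.
LEXICON = {
    "стали": [("стать", "VERB,perf,intr plur,past,indc"),
              ("сталь", "NOUN,inan,femn sing,gent")],
    "кошки": [("кошка", "NOUN,anim,femn sing,gent")],
}


def _endings(form, normal_form):
    """Return the parts of `form` and `normal_form` after their common prefix."""
    i = 0
    while i < min(len(form), len(normal_form)) and form[i] == normal_form[i]:
        i += 1
    return form[i:], normal_form[i:]


def analyze(word, max_suffix=5):
    """Dictionary lookup first; for unknown words, guess by the longest shared suffix."""
    word = word.lower()
    if word in LEXICON:
        return list(LEXICON[word])
    for n in range(min(len(word) - 1, max_suffix), 0, -1):
        guesses = []
        for known, parses in LEXICON.items():
            if not known.endswith(word[-n:]):
                continue
            for normal_form, tag in parses:
                form_end, normal_end = _endings(known, normal_form)
                # Only trust the guess if the shared suffix covers the changing ending.
                if len(form_end) <= n:
                    guesses.append((word[:len(word) - len(form_end)] + normal_end, tag))
        if guesses:
            return guesses
    return []
```

For example, the unknown word "мошки" is guessed to have the normal form "мошка" by analogy with "кошки".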
Create a MorphAnalyzer object:
>>> import pymorphy2
>>> morph = pymorphy2.MorphAnalyzer()
MorphAnalyzer uses dictionaries from the pymorphy2-dicts package (which can be installed via pip install pymorphy2-dicts).
Alternatively (e.g. if you have your own precompiled dictionaries), either set the PYMORPHY2_DICT_PATH environment variable to the path of the dictionaries, or pass the path as an argument to the pymorphy2.MorphAnalyzer constructor:
>>> morph = pymorphy2.MorphAnalyzer('/path/to/dictionaries')
By default, methods of this class return parsing results as Parse namedtuples. Creating namedtuples has a performance cost under CPython, so if you need maximum speed, pass result_type=None to make the analyzer return plain, unwrapped tuples:
>>> morph = pymorphy2.MorphAnalyzer(result_type=None)
Return a list of parsed words that are closest to word and have all required_grammemes.
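One way to read "closest" is sketched below, assuming a lexeme is a list of (form, grammemes) pairs: keep the forms that contain every required grammeme and rank them by how many grammemes they share with the current form. The inflect() function, the LEXEME data and the overlap-based ranking are assumptions for illustration, not pymorphy2's actual implementation.

```python
# Hypothetical lexeme: (form, grammemes) pairs for the noun "кошка".
LEXEME = [
    ("кошка", frozenset({"NOUN", "sing", "nomn"})),
    ("кошки", frozenset({"NOUN", "sing", "gent"})),
    ("кошки", frozenset({"NOUN", "plur", "nomn"})),
    ("кошек", frozenset({"NOUN", "plur", "gent"})),
]


def inflect(current_grammemes, lexeme, required_grammemes):
    """Return (form, grammemes) pairs that have all required grammemes,
    most similar to the current form first (ties keep lexeme order)."""
    required = frozenset(required_grammemes)
    candidates = [(form, gr) for form, gr in lexeme if required <= gr]
    # sorted() stays stable with reverse=True, so equal scores keep their order.
    return sorted(candidates, key=lambda item: len(item[1] & current_grammemes),
                  reverse=True)
```

Starting from "кошка" (sing, nomn) and requiring "plur", the plural nominative "кошки" ranks first because it keeps the nominative case.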
Return an iterator over parses of dictionary words that start with a given prefix (default empty prefix means “all words”).
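pymorphy2 keeps dictionary words in a DAWG (the words_dawg field of CompiledDictionary), which supports prefix queries directly. As a rough stand-in, the sketch below gets the same "words starting with a prefix" behaviour from an alphabetically sorted Python list and the bisect module. iter_prefix() is a hypothetical helper and yields bare words rather than parses.

```python
import bisect


def iter_prefix(sorted_words, prefix=""):
    """Yield words from an alphabetically sorted list that start with `prefix`.

    The empty default prefix yields every word."""
    i = bisect.bisect_left(sorted_words, prefix)
    # All words sharing the prefix form one contiguous run starting at i.
    while i < len(sorted_words) and sorted_words[i].startswith(prefix):
        yield sorted_words[i]
        i += 1


WORDS = sorted(["кот", "котёнок", "кошка", "дом", "котик"])
```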
Analyze the word and return a list of Parse namedtuples:
Parse(word, tag, normal_form, para_id, idx, _estimate)
(or plain tuples if result_type=None was passed to the constructor).
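Because a namedtuple is a tuple subclass, code that only unpacks results positionally works the same whether or not result_type=None was used. The sketch below defines a stand-in Parse with the same field order; the values are made up and the describe() helper is hypothetical.

```python
from collections import namedtuple

# Mirrors the field order of Parse above. namedtuple() rejects field names that
# start with an underscore, so `_estimate` is spelled `estimate` in this sketch.
Parse = namedtuple("Parse", ["word", "tag", "normal_form", "para_id", "idx", "estimate"])


def describe(parse):
    """Works for both result types: positional unpacking is valid for
    namedtuples and plain tuples alike."""
    word, tag, normal_form, para_id, idx, estimate = parse
    return f"{word} -> {normal_form} ({tag})"


# Hypothetical values, for illustration only.
values = ("стали", "NOUN,inan,femn sing,gent", "сталь", 0, 1, 1.0)
as_namedtuple = Parse(*values)
as_tuple = tuple(values)
```

Attribute access such as parse.normal_form, on the other hand, only works with namedtuples.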
Check if a word is in the dictionary. Pass strict_ee=True if the word is guaranteed to have the correct е/ё letters.
Note
Dictionary words are not always correct words: the dictionary also contains commonly used incorrect forms. So for spellchecking tasks this method should be used with extra care.
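A possible way to implement the non-strict е/ё comparison is to fold ё to е before looking a word up. In the sketch below the Lexicon class and fold_ee() are hypothetical, and folding on both sides is a simplification: it also lets an input ё match a dictionary е, which pymorphy2 may not do.

```python
def fold_ee(word):
    """Replace ё with е (lowercase only; callers lowercase the word first)."""
    return word.replace("ё", "е")


class Lexicon:
    """Hypothetical word list with strict and е/ё-insensitive lookup."""

    def __init__(self, words):
        self.words = {w.lower() for w in words}
        self.folded = {fold_ee(w) for w in self.words}  # precomputed once

    def word_is_known(self, word, strict_ee=False):
        word = word.lower()
        if strict_ee:
            return word in self.words
        return fold_ee(word) in self.folded
```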
Utils for working with grammatical tags.
Wrapper class for OpenCorpora.org tags.
Warning
In order to work properly, the class has to be globally initialized with actual grammemes (using the _init_grammemes method).
pymorphy2 initializes it when loading a dictionary, so it may not be a good idea to use this class directly. If possible, use morph_analyzer.TagClass instead.
Example:
>>> from pymorphy2 import MorphAnalyzer
>>> morph = MorphAnalyzer()
>>> Tag = morph.TagClass  # get an initialized Tag class
>>> tag = Tag('VERB,perf,tran plur,impr,excl')
>>> tag
OpencorporaTag('VERB,perf,tran plur,impr,excl')
Tag instances have attributes for accessing grammemes:
>>> print(tag.POS)
VERB
>>> print(tag.number)
plur
>>> print(tag.case)
None
Available attributes are: POS, animacy, aspect, case, gender, involvement, mood, number, person, tense, transitivity and voice.
You may check whether a grammeme is in the tag, or whether all grammemes from a given set are in the tag:
>>> 'perf' in tag
True
>>> 'nomn' in tag
False
>>> 'Geox' in tag
False
>>> set(['VERB', 'perf']) in tag
True
>>> set(['VERB', 'perf', 'sing']) in tag
False
To catch typos, an exception is raised for unknown grammemes:
>>> 'foobar' in tag
Traceback (most recent call last):
...
ValueError: Grammeme is unknown: foobar
>>> set(['NOUN', 'foo', 'bar']) in tag
Traceback (most recent call last):
...
ValueError: Grammemes are unknown: {'bar', 'foo'}
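The containment checks and the typo protection above can be emulated with a __contains__ method that validates its argument against the known grammeme inventory. In this sketch TagSketch is a hypothetical class and KNOWN_GRAMMEMES is a tiny hardcoded subset; in pymorphy2 the real inventory comes from the dictionary via _init_grammemes.

```python
# Hypothetical, tiny grammeme inventory; the real one is loaded from the dictionary.
KNOWN_GRAMMEMES = frozenset(
    {"NOUN", "VERB", "perf", "tran", "sing", "plur", "impr", "excl", "nomn", "Geox"})


class TagSketch:
    def __init__(self, tag_string):
        # "VERB,perf,tran plur,impr,excl" -> {"VERB", "perf", "tran", "plur", ...}
        self.grammemes = frozenset(tag_string.replace(" ", ",").split(","))

    def __contains__(self, grammeme):
        if isinstance(grammeme, str):
            if grammeme not in KNOWN_GRAMMEMES:
                raise ValueError(f"Grammeme is unknown: {grammeme}")
            return grammeme in self.grammemes
        # Otherwise treat the argument as a collection: all items must be present.
        unknown = set(grammeme) - KNOWN_GRAMMEMES
        if unknown:
            raise ValueError(f"Grammemes are unknown: {unknown}")
        return set(grammeme) <= self.grammemes
```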
This also works for attributes:
>>> tag.POS == 'plur'
Traceback (most recent call last):
...
ValueError: 'plur' is not a valid grammeme for this attribute.
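Attribute access such as tag.POS, together with the validation on comparison, could be sketched by mapping attribute names to grammeme categories and returning a str subclass that checks what it is compared with. CATEGORIES, CheckedGrammeme and AttrTagSketch are hypothetical, and this is not necessarily how pymorphy2 implements these attributes.

```python
# Hypothetical, heavily truncated category table: attribute name -> grammemes.
CATEGORIES = {
    "POS": frozenset({"NOUN", "VERB", "ADJF"}),
    "number": frozenset({"sing", "plur"}),
    "case": frozenset({"nomn", "gent"}),
}


class CheckedGrammeme(str):
    """A str that refuses comparisons with grammemes outside its category."""

    def __new__(cls, value, allowed):
        obj = super().__new__(cls, value)
        obj.allowed = allowed
        return obj

    def __eq__(self, other):
        if other not in self.allowed:
            raise ValueError(f"{other!r} is not a valid grammeme for this attribute.")
        return str.__eq__(self, other)

    def __ne__(self, other):
        return not self.__eq__(other)

    __hash__ = str.__hash__  # defining __eq__ would otherwise disable hashing


class AttrTagSketch:
    def __init__(self, tag_string):
        self.grammemes = frozenset(tag_string.replace(" ", ",").split(","))

    def __getattr__(self, name):
        # Called only for names not found normally, i.e. the category attributes.
        allowed = CATEGORIES.get(name)
        if allowed is None:
            raise AttributeError(name)
        found = self.grammemes & allowed
        return CheckedGrammeme(next(iter(found)), allowed) if found else None
```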
A frozenset with grammemes for this tag.
Return a new set of grammemes with required grammemes added and incompatible grammemes removed.
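Assuming that two grammemes are incompatible exactly when they belong to the same category (a simplification; real agreement rules can be more involved), the update can be sketched as: drop every grammeme whose category is being replaced, then add the required ones. The CATEGORY_OF table and updated_grammemes() are hypothetical.

```python
# Hypothetical category table: grammemes in the same category are mutually exclusive.
CATEGORY_OF = {
    "sing": "number", "plur": "number",
    "nomn": "case", "gent": "case", "datv": "case",
    "masc": "gender", "femn": "gender", "neut": "gender",
}


def updated_grammemes(grammemes, required):
    """Return a new frozenset: `required` added, same-category grammemes dropped."""
    required = set(required)
    replaced = {CATEGORY_OF[g] for g in required if g in CATEGORY_OF}
    # Grammemes without a known category (CATEGORY_OF.get -> None) are always kept.
    kept = {g for g in grammemes if CATEGORY_OF.get(g) not in replaced}
    return frozenset(kept | required)
```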
Usage:
pymorphy dict compile <XML_FILE> [--out <PATH>] [--force] [--verbose] [--min_ending_freq <NUM>] [--min_paradigm_popularity <NUM>] [--max_suffix_length <NUM>]
pymorphy dict download_xml <OUT_FILE> [--verbose]
pymorphy dict mem_usage [--dict <PATH>] [--verbose]
pymorphy dict make_test_suite <XML_FILE> <OUT_FILE> [--limit <NUM>] [--verbose]
pymorphy dict meta [--dict <PATH>]
pymorphy _parse <IN_FILE> <OUT_FILE> [--dict <PATH>] [--verbose]
pymorphy -h | --help
pymorphy --version
Options:
-v --verbose Be more verbose
-f --force Overwrite target folder
-o --out <PATH> Output folder name [default: dict]
--limit <NUM> Min. number of words per gram. tag [default: 100]
--min_ending_freq <NUM> Prediction: min. number of suffix occurrences [default: 2]
--min_paradigm_popularity <NUM> Prediction: min. number of lexemes for the paradigm [default: 3]
--max_suffix_length <NUM> Prediction: max. length of prediction suffixes [default: 5]
--dict <PATH> Dictionary folder path
pymorphy2.opencorpora_dict.parse is a module for parsing OpenCorpora XML dictionaries.
ParsedDictionary(lexemes, links, grammemes, version, revision)
lexemes: alias for field number 0
links: alias for field number 1
grammemes: alias for field number 2
version: alias for field number 3
revision: alias for field number 4
Parse OpenCorpora dict XML and return a ParsedDictionary namedtuple.
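The sketch below parses a tiny hand-written document into a ParsedDictionary-like namedtuple with xml.etree.ElementTree. The element and attribute names (lemma, l, f, g, link and so on) approximate the OpenCorpora dictionary format and should be treated as assumptions. parse_opencorpora_xml() is hypothetical, and for the full dictionary file a streaming parser such as ElementTree.iterparse would be more appropriate than fromstring.

```python
import xml.etree.ElementTree as ET
from collections import namedtuple

# Same field order as the ParsedDictionary namedtuple described above.
ParsedDictionary = namedtuple(
    "ParsedDictionary", ["lexemes", "links", "grammemes", "version", "revision"])

# Hand-written, simplified sample; element names approximate the OpenCorpora format.
SAMPLE = """<dictionary version="0.8" revision="1">
  <grammemes>
    <grammeme parent=""><name>POST</name></grammeme>
    <grammeme parent="POST"><name>NOUN</name></grammeme>
  </grammemes>
  <lemmata>
    <lemma id="1" rev="1">
      <l t="ёж"><g v="NOUN"/><g v="anim"/><g v="masc"/></l>
      <f t="ёж"><g v="sing"/><g v="nomn"/></f>
      <f t="ежа"><g v="sing"/><g v="gent"/></f>
    </lemma>
  </lemmata>
  <links><link id="1" from="1" to="2" type="1"/></links>
</dictionary>"""


def parse_opencorpora_xml(text):
    root = ET.fromstring(text)
    grammemes = [(g.get("parent"), g.findtext("name")) for g in root.iter("grammeme")]
    lexemes = {}
    for lemma in root.iter("lemma"):
        # Lexeme-level grammemes from <l> are shared by every form in <f>.
        lexeme_tags = ",".join(g.get("v") for g in lemma.find("l").iter("g"))
        lexemes[lemma.get("id")] = [
            (f.get("t"), lexeme_tags + " " + ",".join(g.get("v") for g in f.iter("g")))
            for f in lemma.iter("f")
        ]
    links = [(ln.get("from"), ln.get("to"), ln.get("type")) for ln in root.iter("link")]
    return ParsedDictionary(lexemes, links, grammemes,
                            root.get("version"), root.get("revision"))
```

The resulting tag strings use the same "lexeme grammemes, space, form grammemes" layout as the OpencorporaTag examples above.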
pymorphy2.opencorpora_dict.compile is a module for converting OpenCorpora dictionaries to the pymorphy2 representation.
CompiledDictionary(gramtab, suffixes, paradigms, words_dawg, prediction_suffixes_dawgs, parsed_dict, prediction_options)
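gramtab: alias for field number 0
suffixes: alias for field number 1
paradigms: alias for field number 2
words_dawg: alias for field number 3
prediction_suffixes_dawgs: alias for field number 4
parsed_dict: alias for field number 5
prediction_options: alias for field number 6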
Return compacted dictionary data.
Convert a dictionary from the OpenCorpora XML format to the compacted pymorphy2 format.
out_path should be the name of the folder in which to put the dictionaries.
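The idea behind a compacted format can be sketched as follows: each lexeme is split into a common stem plus a paradigm (a tuple of (suffix id, tag id) pairs), and identical paradigms, suffixes and tags are stored once in shared tables. Table and compact() are hypothetical. The words dict stands in for words_dawg, its (paradigm id, form index) pairs loosely mirror the para_id and idx fields of Parse, and the real format also tracks paradigm prefixes and prediction data.

```python
from os.path import commonprefix


class Table:
    """Append-only list of unique values with O(1) value -> id lookup."""

    def __init__(self):
        self.values = []
        self._ids = {}

    def id_of(self, value):
        if value not in self._ids:
            self._ids[value] = len(self.values)
            self.values.append(value)
        return self._ids[value]


def compact(lexemes):
    """lexemes: list of [(word, tag), ...]. Returns (gramtab, suffixes, paradigms, words)."""
    gramtab, suffixes, paradigms = Table(), Table(), Table()
    words = {}  # word -> list of (paradigm id, form index); stands in for words_dawg
    for forms in lexemes:
        stem = commonprefix([word for word, _ in forms])
        paradigm = tuple((suffixes.id_of(word[len(stem):]), gramtab.id_of(tag))
                         for word, tag in forms)
        para_id = paradigms.id_of(paradigm)
        for idx, (word, _) in enumerate(forms):
            words.setdefault(word, []).append((para_id, idx))
    return gramtab.values, suffixes.values, paradigms.values, words
```

Two lexemes that inflect alike (for instance "кошка" and "мышка") end up sharing a single paradigm entry, which is where the space savings come from.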
pymorphy2.opencorpora_dict.storage is a module for saving and loading pymorphy2 dictionaries.
LoadedDictionary(meta, gramtab, suffixes, paradigms, words, prediction_prefixes, prediction_suffixes_dawgs, Tag, paradigm_prefixes)
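meta: alias for field number 0
gramtab: alias for field number 1
suffixes: alias for field number 2
paradigms: alias for field number 3
words: alias for field number 4
prediction_prefixes: alias for field number 5
prediction_suffixes_dawgs: alias for field number 6
Tag: alias for field number 7
paradigm_prefixes: alias for field number 8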
Return an iterable with all possible combinations of items from it:
>>> for comb in combinations_of_all_lengths('ABC'):
... print("".join(comb))
A
B
C
AB
AC
BC
ABC
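One way to get exactly this order is to chain itertools.combinations for every length from 1 up to the number of items. This is a sketch, not necessarily pymorphy2's implementation.

```python
import itertools


def combinations_of_all_lengths(it):
    """Yield combinations of the items of `it`, shortest first."""
    items = list(it)  # materialize once so any iterable (even a generator) works
    return itertools.chain.from_iterable(
        itertools.combinations(items, n) for n in range(1, len(items) + 1))
```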
Download a bz2-encoded file from url and write it to the out_fp file.
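A streaming sketch, assuming the intended behaviour is to decompress the data on the fly rather than store the compressed bytes: read the response in chunks and feed each chunk to a bz2.BZ2Decompressor. The chunk_size default is an arbitrary choice, and out_fp must be opened in binary mode.

```python
import bz2
import urllib.request


def download_bz2(url, out_fp, chunk_size=256 * 1024):
    """Download `url`, decompress the bz2 stream chunk by chunk and write the
    decompressed bytes to `out_fp` (a file object opened in binary mode)."""
    decompressor = bz2.BZ2Decompressor()
    with urllib.request.urlopen(url) as response:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            out_fp.write(decompressor.decompress(chunk))
```

Processing chunk by chunk keeps memory use bounded regardless of the size of the downloaded file.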
Read an object from the JSON file filename.
Create the file filename with obj serialized to JSON.
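Minimal versions of these two helpers might look like this; the names json_read and json_write and the **json_options pass-through are assumptions. Using UTF-8 with ensure_ascii=False keeps Cyrillic text readable in the written files.

```python
import json


def json_read(filename, **json_options):
    """Read an object from the UTF-8 encoded JSON file `filename`."""
    with open(filename, "r", encoding="utf-8") as fp:
        return json.load(fp, **json_options)


def json_write(filename, obj, **json_options):
    """Write `obj` to `filename` as UTF-8 JSON, keeping non-ASCII text readable."""
    json_options.setdefault("ensure_ascii", False)  # caller may still override it
    with open(filename, "w", encoding="utf-8") as fp:
        json.dump(obj, fp, **json_options)
```

Note that JSON has no tuple type, so tuples come back as lists after a round trip.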
Find a group of largest elements (according to key).
>>> s = [-4, 3, 5, 7, 4, -7]
>>> largest_group(s, abs)
[7, -7]
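A sketch that computes each key once, finds the maximum, and keeps every element that reaches it, in input order (an illustrative implementation, not necessarily pymorphy2's):

```python
def largest_group(iterable, key):
    """Return all elements whose key equals the maximum key, in input order."""
    keyed = [(key(item), item) for item in iterable]  # call key() once per element
    if not keyed:
        return []
    best = max(k for k, _ in keyed)
    return [item for k, item in keyed if k == best]
```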