aymara package
Submodules
aymara.lima module
The LIMA Python bindings.
This Python API gives access to the major features of the LIMA linguistic analyzer. To make it easier to handle, it largely reproduces the API of spaCy, including parts of its documentation. See the GitHub project for spaCy’s copyright notice.
Example:
import aymara.lima
nlp = aymara.lima.Lima()
doc = nlp("Mr. Best flew to New York on Saturday morning.")
print(doc)
Classes:
Doc Lima Span Token
- class aymara.lima.Doc(doc)
Bases: object
A document.
This is mainly an iterable of tokens.
Example:
import aymara.lima
nlp = aymara.lima.Lima()
doc = nlp("Give it back! He pleaded.")
TODO Some parts of the API are still not implemented:
- compounds The compounds found in the document text by the CompoundsBuilderFromSyntacticData LIMA pipeline unit.
List[Compound]
- property ents
Iterate over the entities in the document. Returns an iterator yielding named entity Span objects. Example:
import aymara.lima
nlp = aymara.lima.Lima()
doc = nlp("John Doe lives in New York")
ents = list(doc.ents)
assert ents[0].label == "Person.PERSON"
assert ents[0].text == "John Doe"
- Yields:
Entities in the document.
- Type:
Span
- property lang
Language of the document.
- property sents
Iterate over the sentences in the document.
This property is only available when sentence boundaries have been set on the document by the pipeline. It will raise an error otherwise. Example:
import aymara.lima
nlp = aymara.lima.Lima()
doc = nlp("This is a sentence. Here's another…")
sents = list(doc.sents)
assert len(sents) == 2
assert [s.root.text for s in sents] == ["is", "'s"]
- Yields:
Sentences in the document.
- Type:
Span
- property text
The original text.
- Type:
str
- class aymara.lima.Lima(langs: str = 'fre,eng', pipes: str = 'main,deepud,tfud', user_config_path: str = '', user_resources_path: str = '', meta: Dict[str, str] = {})
Bases: object
A text-processing pipeline.
Usually you’ll load this once per process as nlp and pass the instance around your application. The Lima class is a wrapper around the LimaAnalyzer class which is itself a binding around the C++ classes necessary to analyze text.
Example:
import aymara.lima
nlp = aymara.lima.Lima()
doc = nlp("Give it back! He pleaded.")
print(doc)
- analyzeText(text: str, lang: Optional[str] = None, pipeline: Optional[str] = None, meta: Dict[str, str] = {}) → str
Analyze the given text in the given language. The lang language must have been initialized when instantiating this object.
Example:
import aymara.lima
nlp = aymara.lima.Lima()
result = nlp.analyzeText("Give it back! He pleaded.")
print(result)
- Parameters:
text (str) – the text to analyze
lang (str) – the language of the text. If None, falls back to the first element of the langs member, or to eng if langs is empty (Default value = None).
pipeline (str) – the LIMA pipeline to use for the analysis. If None, falls back to the first element of the pipelines member, or to main if pipelines is empty (Default value = None).
meta (Dict[str, str]) – a dict of named metadata values (Default value = an empty dictionary).
- Returns:
The content of the text written by the text dumper of LIMA, if any; an empty string otherwise.
- Return type:
str
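The fallback rule described for lang and pipeline can be sketched in plain Python. This is only an illustration of the documented behaviour, not LIMA's own code: resolve_default is a hypothetical helper, and the sketch assumes that the langs and pipelines members are lists of strings obtained by splitting the constructor arguments on commas.

```python
from typing import List, Optional


def resolve_default(requested: Optional[str],
                    configured: List[str],
                    fallback: str) -> str:
    """Hypothetical helper mirroring the documented fallback rule."""
    if requested is not None:
        # An explicit argument always wins.
        return requested
    if configured:
        # Otherwise use the first configured value.
        return configured[0]
    # Nothing configured: use the hard-coded default.
    return fallback


# Values taken from the Lima constructor defaults.
langs = "fre,eng".split(",")
pipes = "main,deepud,tfud".split(",")

print(resolve_default(None, langs, "eng"))
print(resolve_default("eng", langs, "eng"))
print(resolve_default(None, [], "main"))
```

With the default constructor arguments, a call without lang would therefore be analyzed as French under this reading, so passing lang explicitly is the safer choice for English text.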
- static export_system_conf(dir: Optional[Path] = None, lang: Optional[str] = None) → bool
Export LIMA configuration files from the module system path to the given dir in order to be able to easily change configuration files.
If lang is given, only the configuration files concerning this language are exported (NOT IMPLEMENTED).
Use this function to initiate a user configuration. For LIMA to take into account the configuration in the new path, you will have to add it in front of the LIMA_CONF environment variable (or define it if it does not exist).
Please refer to the LIMA documentation for how to configure the analysis.
Example:
import aymara.lima
aymara.lima.Lima.export_system_conf("~/MyLima")
- Parameters:
dir (pathlib.Path) – the directory where the configuration is exported (Default value = None)
lang (str) – the language whose configuration must be exported. If None, the whole configuration is exported (Default value = None)
- Returns:
True if the configuration is correctly exported and False otherwise.
- Return type:
bool
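Putting the exported directory in front of LIMA_CONF can be done from Python before the analyzer is created. The following is a minimal sketch, not part of the aymara API: prepend_conf_dir is a hypothetical helper, and it assumes that LIMA_CONF uses the platform path-list separator and is read when LIMA is initialized.

```python
import os
from typing import Mapping


def prepend_conf_dir(new_dir: str, env: Mapping[str, str]) -> str:
    """Return a LIMA_CONF value with new_dir placed in front (hypothetical helper)."""
    # "~" is not expanded automatically by the OS, so do it explicitly.
    expanded = os.path.expanduser(new_dir)
    current = env.get("LIMA_CONF", "")
    if not current:
        # The variable is not defined yet: define it.
        return expanded
    # Assumption: LIMA_CONF uses the platform path-list separator.
    return os.pathsep.join([expanded, current])


# Apply to the current process before creating a Lima instance.
os.environ["LIMA_CONF"] = prepend_conf_dir("~/MyLima", os.environ)
print(os.environ["LIMA_CONF"])
```

Setting the variable in the shell that launches Python has the same effect and avoids any question about when the bindings read it.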
- static get_system_paths() → Tuple[str, str]
Get the system configuration and resources paths.
Example:
import aymara.lima
aymara.lima.Lima.get_system_paths()
- Returns:
The lists of paths that LIMA searches to load its configuration files and linguistic resources, separated by colons (semicolons under Windows). This function is useful to understand from which directories data are loaded when debugging configuration errors. It can also be used to find where to put or edit files.
- Return type:
Tuple[str, str]
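To inspect the search directories one by one, each returned string can be split on the platform separator. The sketch below works on a stand-in tuple so that it runs without LIMA installed; the directories are made up, and it assumes that the first element holds the configuration paths and the second the resources paths, following the order in the description.

```python
import os
from typing import List, Tuple


def split_search_path(value: str) -> List[str]:
    """Split a path list on the platform separator, dropping empty entries."""
    return [part for part in value.split(os.pathsep) if part]


# Stand-in for the tuple returned by Lima.get_system_paths(); the
# directories are invented for the illustration.
system_paths: Tuple[str, str] = (
    os.pathsep.join(["/home/me/MyLima", "/usr/share/config/lima"]),
    os.pathsep.join(["/usr/share/apps/lima/resources"]),
)

config_dirs = split_search_path(system_paths[0])
resource_dirs = split_search_path(system_paths[1])
for directory in config_dirs + resource_dirs:
    print(directory)
```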
- class aymara.lima.Span(doc, start: int, end: int, label: str = '')
Bases: object
Represents a continuous span of tokens in a Doc.
TODO Some parts of the API are still not implemented
- ents The named entities that fall completely within the span. Returns a tuple of Span objects. Example:
import aymara.lima
nlp = aymara.lima.Lima()
doc = nlp("Mr. Best flew to New York on Saturday morning.")
span = doc[0:6]
ents = list(span.ents)
assert ents[0].label == 346
assert ents[0].label_ == "PERSON"
assert ents[0].text == "Mr. Best"
Returns: Entities in the span, one Span per entity.
Tuple[Span, …]
- sent The sentence span that this span is a part of.
This property is only available when sentence boundaries have been set on the document by the pipeline. It will raise an error otherwise.
If the span happens to cross sentence boundaries, only the first sentence will be returned. If it is required that the sentence always includes the full span, the result can be adjusted as such:
sent = span.sent
sent = doc[sent.start : max(sent.end, span.end)]
Example:
import aymara.lima
nlp = aymara.lima.Lima()
doc = nlp("Give it back! He pleaded.")
span = doc[1:3]
assert span.sent.text == "Give it back!"
Span
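The adjustment shown for spans that cross a sentence boundary is plain arithmetic on token offsets. Here is a small sketch of that arithmetic using a hypothetical TokenRange tuple in place of real Span objects; the offsets are invented for the illustration.

```python
from typing import NamedTuple


class TokenRange(NamedTuple):
    """Half-open range of token offsets, standing in for a Span."""
    start: int
    end: int


def widen_to_cover(sent: TokenRange, span: TokenRange) -> TokenRange:
    """Extend the sentence range so that it contains the whole span."""
    return TokenRange(sent.start, max(sent.end, span.end))


first_sentence = TokenRange(0, 4)   # hypothetical sentence of four tokens
crossing_span = TokenRange(2, 6)    # span that runs into the next sentence
inner_span = TokenRange(1, 3)       # span fully inside the sentence

print(widen_to_cover(first_sentence, crossing_span))
print(widen_to_cover(first_sentence, inner_span))
```

A span that already lies inside the sentence leaves the range unchanged, which is why the max() call is safe to apply unconditionally.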
- sents Returns a generator over the sentences the span belongs to.
This property is only available when sentence boundaries have been set on the document by the pipeline. It will raise an error otherwise.
If the span happens to cross sentence boundaries, all sentences the span overlaps with will be returned. Example:
import aymara.lima
nlp = aymara.lima.Lima()
doc = nlp("Give it back! He pleaded.")
span = doc[2:4]
assert len(list(span.sents)) == 2
Iterable[Span]
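Since this property is not implemented yet, the intended overlap rule can only be illustrated on plain token offsets. The sketch below is one possible reading of "all sentences the span overlaps with", using half-open (start, end) ranges; overlapping_sentences and the boundaries are hypothetical.

```python
from typing import Iterator, List, Tuple

Range = Tuple[int, int]  # half-open (start, end) token offsets


def overlapping_sentences(sentences: List[Range], span: Range) -> Iterator[Range]:
    """Yield every sentence range that shares at least one token with span."""
    span_start, span_end = span
    for sent_start, sent_end in sentences:
        if sent_start < span_end and span_start < sent_end:
            yield (sent_start, sent_end)


sentences = [(0, 5), (5, 9)]   # hypothetical sentence boundaries

# A generator has no len(); materialize it first.
print(len(list(overlapping_sentences(sentences, (1, 3)))))
print(len(list(overlapping_sentences(sentences, (3, 7)))))
```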
- property doc
The parent document.
- property end
The token offset for the end of the span.
- property end_char
The character offset for the end of the span.
- property label
A label to attach to the span, e.g. for named entities.
- property start
The token offset for the start of the span.
- property start_char
The character offset for the start of the span.
- property text
A string representation of the span text.
- class aymara.lima.Token(token)
Bases: object
A token.
TODO Some parts of the API are still not implemented
- sent The sentence span that this token is a part of.
Span
- lang Language of the parent document’s vocabulary.
str
- property dep
Syntactic dependency relation.
- property ent_iob
IOB code of named entity tag. “B” means the token begins an entity, “I” means it is inside an entity, “O” means it is outside an entity, and “” means no entity tag is set.
- property ent_type
Named entity type.
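The IOB codes and entity types of consecutive tokens are enough to rebuild entity spans. The following sketch groups them using plain tuples in place of Token objects, so it runs without LIMA; group_entities is a hypothetical helper, and the "Location.LOCATION" label is an assumed value used only for the illustration.

```python
from typing import List, Tuple

# (text, ent_iob, ent_type) triples standing in for Token objects.
TokenInfo = Tuple[str, str, str]


def group_entities(tokens: List[TokenInfo]) -> List[Tuple[int, int, str]]:
    """Turn per-token IOB codes into (start, end, label) token ranges."""
    entities = []
    start = None
    label = ""
    for i, (_text, iob, ent_type) in enumerate(tokens):
        if iob == "I" and start is not None:
            continue  # still inside the current entity
        if start is not None:
            entities.append((start, i, label))  # close the open entity
            start = None
        if iob == "B":
            start, label = i, ent_type  # open a new entity
    if start is not None:
        entities.append((start, len(tokens), label))
    return entities


tokens = [
    ("John", "B", "Person.PERSON"),
    ("Doe", "I", "Person.PERSON"),
    ("lives", "O", ""),
    ("in", "O", ""),
    ("New", "B", "Location.LOCATION"),   # hypothetical label
    ("York", "I", "Location.LOCATION"),  # hypothetical label
]
print(group_entities(tokens))
```

The returned ranges are half-open, so each one could be passed to doc[start:end] to obtain the corresponding span.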
- property features
Morphological features of this token.
- property head
The syntactic parent, or “governor”, of this token.
- property i
The index of this token in its parent document.
- property idx
Position of this token in its document text.
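The difference between i and idx is easiest to see side by side. The sketch below assumes, as in spaCy, that idx is a character offset into the document text, and it uses a naive whitespace split as a stand-in for LIMA's tokenizer, so real token boundaries will differ (punctuation, for instance, is not separated here).

```python
import re
from typing import List, Tuple


def index_and_offset(text: str) -> List[Tuple[int, int, str]]:
    """Return (i, idx, token_text) for a naive whitespace tokenization."""
    return [(i, match.start(), match.group())
            for i, match in enumerate(re.finditer(r"\S+", text))]


for i, idx, token_text in index_and_offset("Give it back"):
    print(i, idx, token_text)
```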
- property is_alpha
Does the token consist of alphabetic characters? Equivalent to token.text.isalpha().
- property is_bracket
Is the token a bracket?
- property is_digit
Does the token consist of digits? Equivalent to token.text.isdigit().
- property is_lower
Is the token in lowercase? Equivalent to token.text.islower().
- property is_punct
Is the token punctuation?
- property is_quote
Is the token a quotation mark?
- property is_sent_end
Does the token end a sentence? bool or None if unknown.
- property is_sent_start
Does the token start a sentence? bool or None if unknown. Default value = True for the first token in the Doc. TODO: implement for sentences other than the first one.
- property is_space
Does the token consist of whitespace characters? Equivalent to token.text.isspace(). Should always be False in LIMA, as there are no space tokens.
- property is_upper
Is the token in uppercase? Equivalent to token.text.isupper().
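Several of the is_* properties are documented as equivalent to a str method applied to the token text. The sketch below applies those methods to plain strings to show what each flag would report; string_flags is a hypothetical helper and no LIMA object is involved.

```python
def string_flags(text: str) -> dict:
    """Compute the flags that are documented as plain str method calls."""
    return {
        "is_alpha": text.isalpha(),
        "is_digit": text.isdigit(),
        "is_lower": text.islower(),
        "is_upper": text.isupper(),
        "is_space": text.isspace(),
    }


for sample in ["York", "flew", "NASA", "42"]:
    print(sample, string_flags(sample))
```

A capitalized word such as "York" is neither lower nor upper under these definitions, which is one reason the finer-grained t_status values exist.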
- property lemma
The token lemma.
- property pos
Coarse-grained part-of-speech from the Universal POS tag set.
- property t_status
The tokenization status of this token. Can also be explored with the is_* properties. The possible values are:
t_alphanumeric t_abbrev t_acronym t_capital t_capital_1st t_capital_small t_cardinal_roman t_comma_number t_dot_number t_fraction t_integer t_ordinal_integer t_ordinal_roman t_sentence_brk t_small t_word_brk
- property text
The original text of the token.
aymara.lima_models module
- aymara.lima_models.install_language(language: str, dest: Optional[str] = None, select: Optional[List[str]] = None, force: bool = False) → bool
Install models for the given language.
- Parameters:
language – str: the language to install
dest – str: the directory where to save the language data. Use a system default if None (Default value = None)
select – List[str]: the language submodels to install, a list of strings from “tokenizer”, “morphosyntax” and “lemmatizer”. If None, all will be installed (Default value = None)
force – bool: if False, only models not already present are installed. Otherwise, they are replaced by new ones. (Default value = False)
- Returns:
True if installation is successful and False otherwise.
- Return type:
bool
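Because select only accepts three submodel names, a caller may want to check the list before starting an installation. The sketch below is a hypothetical pre-check written for this page, not part of aymara.lima_models; whether install_language performs a similar validation itself is not stated here.

```python
from typing import List, Optional

# Submodel names listed in the description of the select parameter.
KNOWN_SUBMODELS = ("tokenizer", "morphosyntax", "lemmatizer")


def normalize_select(select: Optional[List[str]]) -> List[str]:
    """Hypothetical pre-check for the select argument of install_language."""
    if select is None:
        # None means every submodel is installed.
        return list(KNOWN_SUBMODELS)
    unknown = [name for name in select if name not in KNOWN_SUBMODELS]
    if unknown:
        raise ValueError(f"unknown submodels: {unknown}")
    return list(select)


print(normalize_select(None))
print(normalize_select(["tokenizer", "lemmatizer"]))
```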
- aymara.lima_models.list_installed_models(dest: Optional[str] = None) → None
Print the list of the models currently available in dest or in a default directory if dest is None.
- Parameters:
dest – str: the directory in which to search for installed models. A default directory is used if dest is None (Default value = None)
- aymara.lima_models.remove_language(language: str, dest: Optional[str] = None, force: bool = False) → bool
Remove all the resources for a language from the system. Confirmation is asked by default before removing anything.
- Parameters:
language – str: the language to remove
dest – str: if given, remove from this directory. Otherwise, search in default directories (Default value = None)
force – bool: if False, confirmation is asked before removing the language (Default value = False)
- Returns:
True if removal is successful and False otherwise.
- Return type:
bool