aymara package

Submodules

aymara.lima module

The LIMA Python bindings.

This Python API gives access to the major features of the LIMA linguistic analyzer. To make it easier to use, it largely mirrors the spaCy API, including parts of its documentation. See the GitHub project for spaCy’s copyright notice.

Example:

import aymara.lima
nlp = aymara.lima.Lima()
doc = nlp("Mr. Best flew to New York on Saturday morning.")
print(doc)

Classes:

Doc
Lima
Span
Token

class aymara.lima.Doc(doc)

Bases: object

A document.

This is mainly an iterable of tokens.

Example:

import aymara.lima
nlp = aymara.lima.Lima()
doc = nlp("Give it back! He pleaded.")

TODO Some parts of the API are still not implemented:

compounds: The compounds found in the document text by the CompoundsBuilderFromSyntacticData LIMA pipeline unit. Type: List[Compound]

property ents

Iterate over the entities in the document. Returns an iterator yielding named entity Span objects. Example:

import aymara.lima
nlp = aymara.lima.Lima()
doc = nlp("John Doe lives in New York")
ents = list(doc.ents)
assert ents[0].label == "Person.PERSON"
assert ents[0].text == "John Doe"
Yields:

Entities in the document.

Type:

Span

property lang

Language of the document.

property sents
Iterate over the sentences in the document.

This property is only available when sentence boundaries have been set on the document by the pipeline. It will raise an error otherwise. Example:

import aymara.lima
nlp = aymara.lima.Lima()
doc = nlp("This is a sentence. Here's another…")
sents = list(doc.sents)
assert len(sents) == 2
assert [s.root.text for s in sents] == ["is", "'s"]

Yields:

Sentences in the document.

Type:

Span
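To illustrate how sentence spans relate to token-level boundary information (compare Token.is_sent_start), the following self-contained sketch groups token indices into half-open (start, end) sentence ranges. It does not call LIMA: the list of booleans stands in for per-token is_sent_start values, the function name sentence_ranges is hypothetical, and the seven-token split of the example sentence is an assumption about the tokenizer.

```python
from typing import List, Tuple


def sentence_ranges(sent_starts: List[bool]) -> List[Tuple[int, int]]:
    """Group token indices into half-open (start, end) sentence ranges.

    sent_starts[i] is True when token i begins a sentence; the first
    token is always treated as the start of the first sentence.
    """
    ranges: List[Tuple[int, int]] = []
    start = 0
    for i in range(1, len(sent_starts)):
        if sent_starts[i]:
            ranges.append((start, i))  # close the previous sentence
            start = i
    if sent_starts:
        ranges.append((start, len(sent_starts)))  # close the last sentence
    return ranges


# Assumed tokens: Give | it | back | ! | He | pleaded | .
print(sentence_ranges([True, False, False, False, True, False, False]))
```

With these flags the sketch returns [(0, 4), (4, 7)], i.e. "Give it back!" and "He pleaded.".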

property text

The original text.

Type:

str

class aymara.lima.Lima(langs: str = 'fre,eng', pipes: str = 'main,deepud,tfud', user_config_path: str = '', user_resources_path: str = '', meta: Dict[str, str] = {})

Bases: object

A text-processing pipeline.

Usually you’ll load this once per process as nlp and pass the instance around your application. The Lima class is a wrapper around the LimaAnalyzer class, which is itself a binding to the C++ classes needed to analyze text.

Example:

import aymara.lima
nlp = aymara.lima.Lima()
doc = nlp("Give it back! He pleaded.")
print(doc)
analyzeText(text: str, lang: Optional[str] = None, pipeline: Optional[str] = None, meta: Dict[str, str] = {}) → str

Analyze the given text in the given language. The lang language must have been initialized when instantiating this object.

Example:

import aymara.lima
nlp = aymara.lima.Lima()
result = nlp.analyzeText("Give it back! He pleaded.")
print(result)
Parameters:
  • text (str) – the text to analyze

  • lang (str) – the language of the text. If None, falls back to the first element of the langs member, or to eng if that is empty (Default value = None).

  • pipeline (str) – the Lima pipeline to use for analysis. If None, falls back to the first element of the pipelines member, or to main if that is empty (Default value = None).

  • meta (Dict[str, str]) – a dict of named metadata values (Default value = an empty dictionary).

Returns:

The text written by the LIMA text dumper, if any; an empty string otherwise.

Return type:

str
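The fallback rule for lang and pipeline can be summarized by the small sketch below. resolve_default is a hypothetical helper, not part of aymara.lima, and it assumes that the constructor's comma-separated langs and pipes strings are split on commas to obtain the "first element", which the documentation does not state explicitly.

```python
from typing import Optional


def resolve_default(value: Optional[str], configured: str, fallback: str) -> str:
    """Return value if given, else the first configured entry, else fallback.

    `configured` is a comma-separated string such as the Lima constructor's
    `langs` ("fre,eng") or `pipes` ("main,deepud,tfud") arguments.
    """
    if value is not None:
        return value
    entries = [e.strip() for e in configured.split(",") if e.strip()]
    return entries[0] if entries else fallback


print(resolve_default(None, "fre,eng", "eng"))   # no lang given: first of langs
print(resolve_default("eng", "fre,eng", "eng"))  # explicit lang wins
print(resolve_default(None, "", "main"))         # nothing configured: fallback
```

Note that with the default langs value, this rule selects fre when no lang is passed, so passing lang explicitly is the safer choice for non-French text.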

static export_system_conf(dir: Optional[Path] = None, lang: Optional[str] = None) → bool

Export LIMA configuration files from the module system path to the given dir so that they can easily be modified.

If lang is given, only the configuration files concerning this language are exported (NOT IMPLEMENTED).

Use this function to initialize a user configuration. For LIMA to take the configuration in the new path into account, you will have to add that path in front of the LIMA_CONF environment variable (or define the variable if it does not exist).
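A minimal sketch of the environment-variable step follows. It assumes that LIMA_CONF is a list of directories joined with the platform path separator (":" on Unix, ";" on Windows, as for the paths returned by get_system_paths), that the exported files land directly in the chosen directory, and that the variable must be set before Lima is instantiated. prepend_lima_conf is a hypothetical helper and the code does not call LIMA itself.

```python
import os
from pathlib import Path


def prepend_lima_conf(user_conf: Path) -> str:
    """Put user_conf in front of LIMA_CONF, creating the variable if missing.

    Assumes LIMA_CONF is an os.pathsep-separated list of directories.
    Returns the new value of the variable.
    """
    entries = [str(user_conf)]
    current = os.environ.get("LIMA_CONF", "")
    if current:
        entries.append(current)  # keep the previous search path after ours
    os.environ["LIMA_CONF"] = os.pathsep.join(entries)
    return os.environ["LIMA_CONF"]


# Example: after exporting the system configuration to ~/MyLima
print(prepend_lima_conf(Path("~/MyLima").expanduser()))
```

Setting the variable from Python only affects the current process and its children; to make the change permanent, set LIMA_CONF in the shell profile instead.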

Please refer to the LIMA documentation for how to configure the analysis:

Example:

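import aymara.lima
from pathlib import Path
# dir is typed as pathlib.Path; "~" must be expanded explicitly
aymara.lima.Lima.export_system_conf(Path("~/MyLima").expanduser())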
Parameters:
  • dir (pathlib.Path) – the directory where the configuration is exported (Default value = None)

  • lang (str) – the language whose configuration must be exported. If None, the whole configuration is exported (Default value = None)

Returns:

True if the configuration is correctly exported and False otherwise.

Return type:

bool

static get_system_paths() → Tuple[str, str]

Get the system configuration and resources paths.

Example:

import aymara.lima
aymara.lima.Lima.get_system_paths()
Returns:

A pair of colon-separated (semicolon-separated under Windows) lists of the paths that LIMA searches to load its configuration files and its linguistic resources, respectively. This function is useful for finding out which directories data are loaded from when debugging configuration errors. It can also be used to find out where to put or edit files.

Return type:

Tuple[str, str]
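Since each element of the returned tuple is a single string, splitting it on os.pathsep (":" on Unix, ";" on Windows) yields the individual directories. The sketch below uses a literal tuple in place of the real return value so that it runs without LIMA installed; the paths are made up, and the order (configuration first, resources second) is assumed from the summary line. split_search_paths is a hypothetical helper.

```python
import os
from typing import List, Tuple


def split_search_paths(paths: str) -> List[str]:
    """Split an os.pathsep-separated search path, dropping empty entries."""
    return [p for p in paths.split(os.pathsep) if p]


# Stand-in for aymara.lima.Lima.get_system_paths(); values are illustrative.
system_paths: Tuple[str, str] = (
    os.pathsep.join(["/home/me/MyLima", "/usr/share/config/lima"]),
    os.pathsep.join(["/usr/share/apps/lima/resources"]),
)
conf_dirs = split_search_paths(system_paths[0])
resource_dirs = split_search_paths(system_paths[1])
for d in conf_dirs:
    print("conf:", d)
for d in resource_dirs:
    print("resources:", d)
```

Directories earlier in each list are presumably searched first, which is why a user configuration is added in front of LIMA_CONF.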

exception aymara.lima.LimaInternalError

Bases: Exception

class aymara.lima.Span(doc, start: int, end: int, label: str = '')

Bases: object

Represents a contiguous span of tokens in a Doc.

TODO Some parts of the API are still not implemented:

ents: The named entities that fall completely within the span. Returns a tuple of Span objects. Example:

import aymara.lima
nlp = aymara.lima.Lima()
doc = nlp("Mr. Best flew to New York on Saturday morning.")
span = doc[0:6]
ents = list(span.ents)
assert ents[0].label == "Person.PERSON"
assert ents[0].text == "Mr. Best"

Type: Tuple[Span, …] (one Span per entity)

sent: The sentence span that this span is a part of.

This property is only available when sentence boundaries have been set on the document by the pipeline. It will raise an error otherwise.

If the span happens to cross sentence boundaries, only the first sentence will be returned. If the sentence must always include the full span, the result can be adjusted as follows:

sent = span.sent
sent = doc[sent.start : max(sent.end, span.end)]

Example:

import aymara.lima
nlp = aymara.lima.Lima()
doc = nlp("Give it back! He pleaded.")
span = doc[1:3]
assert span.sent.text == "Give it back!"

Type: Span
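The adjustment for spans that cross sentence boundaries only ever extends the end of the sentence. Expressed on plain half-open token ranges, it amounts to the sketch below; widen_sentence is a hypothetical helper, the (start, end) tuples stand in for Span.start and Span.end, and the token indices assume a seven-token split of "Give it back! He pleaded.".

```python
from typing import Tuple

Range = Tuple[int, int]  # half-open (start, end) token range


def widen_sentence(sent: Range, span: Range) -> Range:
    """Extend a sentence range so that it fully contains span.

    sent is assumed to be the first sentence the span overlaps,
    so sent[0] <= span[0] and only the end may need to move.
    """
    sent_start, sent_end = sent
    return sent_start, max(sent_end, span[1])


# First sentence "Give it back !" is tokens 0-3 (end exclusive: 4)
print(widen_sentence((0, 4), (2, 6)))  # span crosses into "He pleaded ."
print(widen_sentence((0, 4), (1, 3)))  # span already inside the sentence
```

The first call returns (0, 6) and the second leaves the sentence unchanged as (0, 4).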

sents: Returns a generator over the sentences the span belongs to.

This property is only available when sentence boundaries have been set on the document by the pipeline. It will raise an error otherwise.

If the span happens to cross sentence boundaries, all sentences the span overlaps with will be returned. Example:

import aymara.lima
nlp = aymara.lima.Lima()
doc = nlp("Give it back! He pleaded.")
span = doc[2:5]
assert len(list(span.sents)) == 2

Type: Iterable[Span]
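To make the overlap rule concrete, this sketch selects every sentence that shares at least one token with a span. It works on half-open (start, end) token ranges rather than real Span objects; overlapping_sentences is a hypothetical name, and the sentence ranges assume a seven-token split of "Give it back! He pleaded.".

```python
from typing import Iterator, List, Tuple

Range = Tuple[int, int]  # half-open (start, end) token range


def overlapping_sentences(sentences: List[Range], span: Range) -> Iterator[Range]:
    """Yield each sentence range that shares at least one token with span."""
    span_start, span_end = span
    for start, end in sentences:
        # Two half-open ranges overlap iff each starts before the other ends.
        if start < span_end and span_start < end:
            yield (start, end)


sentences = [(0, 4), (4, 7)]  # "Give it back !" / "He pleaded ."
# Like Span.sents, the result is a generator, so wrap it in list() before len().
print(len(list(overlapping_sentences(sentences, (2, 5)))))  # back ! He
print(len(list(overlapping_sentences(sentences, (2, 4)))))  # back !
```

The span (2, 5) covers "back ! He" and touches both sentences, whereas (2, 4) stays inside the first one.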

property doc

The parent document.

property end

The token offset for the end of the span.

property end_char

The character offset for the end of the span.

property label

A label to attach to the span, e.g. for named entities.

property start

The token offset for the start of the span.

property start_char

The character offset for the start of the span.

property text

A string representation of the span text.
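The token offsets (start, end) and the character offsets (start_char, end_char) of a span are related through the tokens' idx values. The sketch below assumes spaCy-like semantics, namely a half-open token range and idx being the character offset of a token in the document text; SimpleToken and char_offsets are hypothetical stand-ins, not LIMA classes, and the token split is assumed.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SimpleToken:
    text: str
    idx: int  # character offset of the token in the document text


def char_offsets(tokens: List[SimpleToken], start: int, end: int) -> Tuple[int, int]:
    """Map a half-open token range [start, end) to (start_char, end_char)."""
    last = tokens[end - 1]
    return tokens[start].idx, last.idx + len(last.text)


text = "Give it back! He pleaded."
tokens = [SimpleToken("Give", 0), SimpleToken("it", 5), SimpleToken("back", 8),
          SimpleToken("!", 12), SimpleToken("He", 14), SimpleToken("pleaded", 17),
          SimpleToken(".", 24)]
start_char, end_char = char_offsets(tokens, 0, 4)
print(text[start_char:end_char])  # Give it back!
```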

class aymara.lima.Token(token)

Bases: object

A token.

TODO Some parts of the API are still not implemented:

sent: The sentence span that this token is a part of. Type: Span

lang: Language of the parent document’s vocabulary. Type: str

property dep

Syntactic dependency relation.

property ent_iob

IOB code of named entity tag. “B” means the token begins an entity, “I” means it is inside an entity, “O” means it is outside an entity, and “” means no entity tag is set.
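The IOB codes can be turned back into entity spans with a single scan over the tokens. This sketch works on plain (text, ent_iob, ent_type) tuples standing in for Token objects; iob_to_spans is a hypothetical helper, "Person.PERSON" reuses the label form shown in the Doc.ents example, and "Location.LOCATION" is a made-up type string.

```python
from typing import List, Optional, Tuple

TokenInfo = Tuple[str, str, str]  # (text, ent_iob, ent_type)


def iob_to_spans(tokens: List[TokenInfo]) -> List[Tuple[int, int, str]]:
    """Return half-open (start, end, type) entity ranges from IOB codes."""
    spans: List[Tuple[int, int, str]] = []
    start: Optional[int] = None
    label = ""
    for i, (_, iob, ent_type) in enumerate(tokens):
        # Close the open entity on anything other than an "I" continuation.
        if start is not None and iob != "I":
            spans.append((start, i, label))
            start = None
        if iob == "B":
            start, label = i, ent_type
    if start is not None:  # entity running until the end of the document
        spans.append((start, len(tokens), label))
    return spans


tokens = [("John", "B", "Person.PERSON"), ("Doe", "I", "Person.PERSON"),
          ("lives", "O", ""), ("in", "O", ""),
          ("New", "B", "Location.LOCATION"),  # made-up type string
          ("York", "I", "Location.LOCATION")]
print(iob_to_spans(tokens))
```

Tokens with an empty code (no entity tag set) are treated like "O" and close any open entity.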

property ent_type

Named entity type.

property features

Morphological features of this token.

property head

The syntactic parent, or “governor”, of this token.

property i

The index of this token in its parent document.

property idx

Position of this token in its document text.

property is_alpha

Does the token consist of alphabetic characters? Equivalent to token.text.isalpha().

property is_bracket

Is the token a bracket?

property is_digit

Does the token consist of digits? Equivalent to token.text.isdigit().

property is_lower

Is the token in lowercase? Equivalent to token.text.islower().

property is_punct

Is the token punctuation?

property is_quote

Is the token a quotation mark?

property is_sent_end

Does the token end a sentence? bool or None if unknown.

property is_sent_start

Does the token start a sentence? bool or None if unknown. Default value = True for the first token in the Doc. TODO: implement for sentences other than the first one.

property is_space

Does the token consist of whitespace characters? Equivalent to token.text.isspace(). Should always be False in LIMA, as LIMA does not produce space tokens.

property is_upper

Is the token in uppercase? Equivalent to token.text.isupper().
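Because is_alpha, is_digit, is_lower and is_upper are documented as plain str method calls on token.text, their behaviour can be checked without LIMA. In this sketch a bare string stands in for a Token and token_flags is a hypothetical helper.

```python
from typing import Dict


def token_flags(text: str) -> Dict[str, bool]:
    """Compute the str-based Token flags as documented for aymara.lima."""
    return {
        "is_alpha": text.isalpha(),
        "is_digit": text.isdigit(),
        "is_lower": text.islower(),
        "is_upper": text.isupper(),
    }


for word in ["York", "2024", "pleaded", "NASA", "Mr."]:
    print(word, token_flags(word))
```

Note that str.islower() and str.isupper() require at least one cased character, so a token such as "2024" is neither lowercase nor uppercase, and a mixed-case token such as "Mr." is neither either.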

property lemma

The token lemma.

property pos

Coarse-grained part-of-speech from the Universal POS tag set.

property t_status

The tokenization status of this token. Can also be explored with the is_* properties. The possible values are:

t_alphanumeric
t_abbrev
t_acronym
t_capital
t_capital_1st
t_capital_small
t_cardinal_roman
t_comma_number
t_dot_number
t_fraction
t_integer
t_ordinal_integer
t_ordinal_roman
t_sentence_brk
t_small
t_word_brk

property text

The original text of the token.

aymara.lima_models module

aymara.lima_models.info() → None

Print the mapping between language codes and language names.

aymara.lima_models.install_language(language: str, dest: Optional[str] = None, select: Optional[List[str]] = None, force: bool = False) → bool

Install models for the given language.

Parameters:
  • language – str: the language to install

  • dest – str: the directory where to save the language data. Use a system default if None (Default value = None)

  • select – List[str]: the language submodels to install, a list of strings from “tokenizer”, “morphosyntax” and “lemmatizer”. If None, all will be installed (Default value = None)

  • force – bool: if False, only models not already present are installed. Otherwise, they are replaced by new ones. (Default value = False)
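The select argument restricts installation to some of the three submodels. A caller could validate it before calling install_language with a check like the one below; check_select and SUBMODELS are hypothetical names, and install_language may well perform its own validation.

```python
from typing import List, Optional

SUBMODELS = ("tokenizer", "morphosyntax", "lemmatizer")


def check_select(select: Optional[List[str]]) -> List[str]:
    """Return the submodels to install, mirroring the documented default.

    None means "all submodels"; unknown names raise ValueError.
    """
    if select is None:
        return list(SUBMODELS)
    unknown = [s for s in select if s not in SUBMODELS]
    if unknown:
        raise ValueError(f"unknown submodel(s): {', '.join(unknown)}")
    return list(select)


print(check_select(None))
print(check_select(["tokenizer", "lemmatizer"]))
```

The validated list could then be passed as the select argument of install_language, together with one of the language codes printed by info().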

Returns:

True if installation is successful and False otherwise.

Return type:

bool

aymara.lima_models.list_installed_models(dest: Optional[str] = None) → None

Print the list of the models currently available in dest or in a default directory if dest is None.

Parameters:
  • dest (str) – the directory in which to search for installed models. A default directory is used if dest is None (Default value = None).

aymara.lima_models.load_lang_list_()

aymara.lima_models.remove_language(language: str, dest: Optional[str] = None, force: bool = False) → bool

Remove all the resources for a language from the system. Confirmation is asked by default before removing anything.

Parameters:
  • language – str: the language to remove

  • dest – str: if given, remove from this directory. Otherwise, search in default directories (Default value = None)

  • force – bool: if False, confirmation will be asked before removing the language (Default value = False)

Returns:

True if removing is successful and False otherwise.

Return type:

bool

aymara.version module

Module contents