API Reference

Abstract Base Class: Script

class potnia.script.Script(config: str)

Bases: object

The abstract base class for handling text transliteration and unicode conversion.

config

Path to the configuration file or configuration data in YAML format.

Type: str

config: str

regularize(string: str) → str

Applies regularization rules to a given string.

Parameters: string (str) – Text string to be regularized.
Returns: Regularized text string.
Return type: str

to_transliteration(text: str) → str

Converts unicode text to transliteration format.

NB. This function may not work as expected for all scripts/languages because there may not be a one-to-one mapping between unicode and transliteration.

Parameters: text (str) – Input text in unicode format.
Returns: Transliterated text.
Return type: str
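The caveat about one-to-one mappings can be illustrated with a toy table. This is a minimal sketch, not potnia's implementation: the `TO_UNICODE` table and both helper functions are hypothetical, and the Greek letters are placeholder signs chosen only for illustration.

```python
# Hypothetical table: two transliteration values share one sign.
TO_UNICODE = {
    "ka": "\u03ba",  # placeholder sign (Greek kappa), illustration only
    "qa": "\u03ba",  # a second value mapped to the same sign
    "ta": "\u03c4",  # placeholder sign (Greek tau)
}

# Inverting a many-to-one table can keep only one value per sign.
TO_TRANSLITERATION = {}
for value, sign in TO_UNICODE.items():
    TO_TRANSLITERATION.setdefault(sign, value)  # the first value wins


def to_unicode_sketch(text: str) -> str:
    """Convert hyphen-separated values to signs using the toy table."""
    return "".join(TO_UNICODE[token] for token in text.split("-"))


def to_transliteration_sketch(text: str) -> str:
    """Convert signs back to hyphen-separated values using the inverted table."""
    return "-".join(TO_TRANSLITERATION[sign] for sign in text)


round_trip = to_transliteration_sketch(to_unicode_sketch("qa-ta"))
print(round_trip)  # ka-ta (the original "qa" cannot be recovered)
```

Because "ka" and "qa" collapse to the same sign in this toy table, the reverse conversion has to pick one of them, so a round trip is not guaranteed to return the input.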

to_unicode(text: str, regularize: bool = False) → str

Converts transliterated text to unicode format.

Parameters:

text (str) – Input text in transliterated format.
regularize (bool, optional) – Whether to apply regularization. Defaults to False.

Returns:

Text converted to unicode format, optionally regularized.

Return type:

str
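The effect of the `regularize` flag can be sketched with a stand-in function. Everything here is hypothetical: the `SIGNS` table, the `RULES` list and the helper names are invented for illustration, and applying regularization after conversion is an assumption based on the wording of the return description, not on potnia's source.

```python
import re

# Hypothetical sign table and regularization rules, for illustration only.
SIGNS = {"ka": "\u03ba", "ta": "\u03c4"}
RULES = [
    (re.compile(r"[\[\]]"), ""),  # drop editorial square brackets
    (re.compile(r"\s+"), " "),    # collapse runs of whitespace
]


def regularize_sketch(string: str) -> str:
    """Apply each (pattern, replacement) rule in order, then trim the ends."""
    for pattern, replacement in RULES:
        string = pattern.sub(replacement, string)
    return string.strip()


def to_unicode_sketch(text: str, regularize: bool = False) -> str:
    """Convert known values to signs; optionally regularize the result."""
    # Split on hyphens, whitespace and brackets, keeping the separators.
    tokens = re.split(r"(-|\s+|\[|\])", text)
    # Hyphens are dropped; unknown tokens pass through unchanged.
    converted = "".join(SIGNS.get(token, token) for token in tokens if token != "-")
    return regularize_sketch(converted) if regularize else converted


print(to_unicode_sketch("[ka-ta]  ka"))                   # [κτ]  κ
print(to_unicode_sketch("[ka-ta]  ka", regularize=True))  # κτ κ
```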

tokenize_transliteration(text: str) → list[str]

Tokenizes transliterated text according to specific patterns.

Parameters: text (str) – Input text in transliterated format.
Returns: List of tokens.
Return type: list[str]
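Pattern-based tokenization of transliterated text might look like the following. This is a sketch under stated assumptions: `TOKEN_PATTERN` and the function name are hypothetical, and the real patterns used by each script are not given in this reference.

```python
import re

# Hypothetical pattern: a token is either a run of characters that are
# neither hyphens nor whitespace, or a run of whitespace between words.
TOKEN_PATTERN = re.compile(r"[^\s-]+|\s+")


def tokenize_transliteration_sketch(text: str) -> list[str]:
    """Split on hyphens and keep one space token between words."""
    tokens = TOKEN_PATTERN.findall(text)
    # Normalize any run of whitespace to a single space token.
    return [" " if token.isspace() else token for token in tokens]


print(tokenize_transliteration_sketch("ka-ta  qa"))  # ['ka', 'ta', ' ', 'qa']
```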

tokenize_unicode(text: str) → list[str]

Tokenizes unicode text according to specific patterns.

By default, it tokenizes each character as a separate token. This method can be overridden in subclasses to provide more complex tokenization.

Parameters: text (str) – Input text in unicode format.
Returns: List of tokens.
Return type: list[str]
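The default per-character behaviour and a subclass override can be sketched as follows. Both classes are hypothetical stand-ins written for this example; they do not inherit from `Script`, and the Greek letters are placeholder characters.

```python
class CharTokenizer:
    """Hypothetical stand-in for the default: one token per character."""

    def tokenize_unicode(self, text: str) -> list[str]:
        return list(text)


class WordTokenizer(CharTokenizer):
    """Hypothetical subclass that overrides the default to split on whitespace."""

    def tokenize_unicode(self, text: str) -> list[str]:
        return text.split()


sample = "\u03ba\u03c4 \u03ba"
print(CharTokenizer().tokenize_unicode(sample))  # ['κ', 'τ', ' ', 'κ']
print(WordTokenizer().tokenize_unicode(sample))  # ['κτ', 'κ']
```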

Scripts Available

Linear A

class potnia.scripts.linear_a.LinearA(config: str = 'linear_a.yaml')

Class for handling text transliteration and unicode conversion for Linear A.

To use the singleton instance, import like so: from potnia import linear_a

config

Path to the configuration file or configuration data in string format. By default, it uses the ‘linear_a.yaml’ file in the ‘data’ directory.

Type: str

config: str = 'linear_a.yaml'

regularize(string: str) → str

Applies regularization rules to a given string.

Parameters: string (str) – Text string to be regularized.
Returns: Regularized text string.
Return type: str

to_transliteration(text: str) → str

Converts unicode text to transliteration format.

NB. This function may not work as expected for all scripts/languages because there may not be a one-to-one mapping between unicode and transliteration.

Parameters: text (str) – Input text in unicode format.
Returns: Transliterated text.
Return type: str

to_unicode(text: str, regularize: bool = False) → str

Converts transliterated text to unicode format.

Parameters:

text (str) – Input text in transliterated format.
regularize (bool, optional) – Whether to apply regularization. Defaults to False.

Returns:

Text converted to unicode format, optionally regularized.

Return type:

str

tokenize_transliteration(input_string: str) → list[str]

Tokenizes transliterated text according to specific patterns.

Parameters: input_string (str) – Input text in transliterated format.
Returns: List of tokens.
Return type: list[str]

tokenize_unicode(text: str) → list[str]

Tokenizes a unicode string by splitting and joining words with dashes.

Parameters: text (str) – Input text in unicode format.
Returns: List of tokenized strings.
Return type: list[str]
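The description of this method is brief, so the following shows only one possible reading of it. It is an assumption, not potnia's code: the function name is hypothetical, the Greek letters are placeholder signs, and the choice to emit one token per character after joining is a guess.

```python
def tokenize_with_dashes_sketch(text: str) -> list[str]:
    """Join the signs of each word with dashes, then emit one token per character."""
    # Split into words on whitespace and put a dash between the signs of each word.
    words = ["-".join(word) for word in text.split()]
    # Re-join the words with single spaces and tokenize character by character.
    return list(" ".join(words))


print(tokenize_with_dashes_sketch("\u03ba\u03c4 \u03ba"))  # ['κ', '-', 'τ', ' ', 'κ']
```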

Linear B

class potnia.scripts.linear_b.LinearB(config: str = 'linear_b')

Class for handling text transliteration and unicode conversion for Linear B.

To use the singleton instance, import like so: from potnia import linear_b

Designed especially for texts from DĀMOS (Database of Mycenaean at Oslo): https://damos.hf.uio.no/ and LiBER (Linear B Electronic Resources): https://liber.cnr.it/

config

Path to the configuration file or configuration data in string format. By default, it uses the ‘linear_b.yaml’ file in the ‘data’ directory.

Type: str

config: str = 'linear_b'

regularize(text: str) → str

Applies regularization rules to a given string.

Parameters: text (str) – Text string to be regularized.
Returns: Regularized text string.
Return type: str

to_transliteration(text: str) → str

Converts unicode text to transliteration format.

NB. This function may not work as expected for all scripts/languages because there may not be a one-to-one mapping between unicode and transliteration.

Parameters: text (str) – Input text in unicode format.
Returns: Transliterated text.
Return type: str

to_unicode(text: str, regularize: bool = False) → str

Converts transliterated text to unicode format.

Parameters:

text (str) – Input text in transliterated format.
regularize (bool, optional) – Whether to apply regularization. Defaults to False.

Returns:

Text converted to unicode format, optionally regularized.

Return type:

str

tokenize_transliteration(text: str) → list[str]

Tokenizes transliterated text according to specific patterns.

Parameters: text (str) – Input text in transliterated format.
Returns: List of tokens.
Return type: list[str]

tokenize_unicode(text: str) → list[str]

Tokenizes a unicode string by splitting and joining words with dashes.

Parameters: text (str) – Input text in unicode format.
Returns: List of tokenized strings.
Return type: list[str]

Arabic

class potnia.scripts.arabic.Arabic(config: str = 'arabic')

Class for handling text transliteration and unicode conversion for Arabic.

To use the singleton instance, import like so: from potnia import arabic

Uses the DIN 31635 standard for Arabic transliteration.

If you need the Tim Buckwalter transliteration system, then use the PyArabic library.
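To give a sense of what DIN 31635 looks like, here is a minimal sketch that maps a handful of its consonant correspondences to Arabic letters. The table is a small subset typed out for this example, not potnia's configuration, and `din_to_arabic_sketch` is a hypothetical helper. It ignores vowels, hamza and every context-dependent rule.

```python
# A small subset of DIN 31635 consonant correspondences
# (not the full table and not potnia's configuration).
DIN_31635_SUBSET = {
    "b": "\u0628",       # ARABIC LETTER BEH
    "t": "\u062a",       # ARABIC LETTER TEH
    "k": "\u0643",       # ARABIC LETTER KAF
    "m": "\u0645",       # ARABIC LETTER MEEM
    "\u0161": "\u0634",  # s with caron -> ARABIC LETTER SHEEN
    "\u1e25": "\u062d",  # h with dot below -> ARABIC LETTER HAH
}


def din_to_arabic_sketch(text: str) -> str:
    """Map each known Latin character to its Arabic letter; pass others through."""
    return "".join(DIN_31635_SUBSET.get(char, char) for char in text)


word = din_to_arabic_sketch("ktb")
print(word)  # the three letters kaf, teh, beh
```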

config

Path to the configuration file or configuration data in string format. By default, it uses the ‘arabic.yaml’ file in the ‘data’ directory.

Type: str

config: str = 'arabic'

regularize(string: str) → str

Applies regularization rules to a given string.

Parameters: string (str) – Text string to be regularized.
Returns: Regularized text string.
Return type: str

to_transliteration(text: str) → str

Converts unicode text to transliteration format.

NB. This function may not work as expected for all scripts/languages because there may not be a one-to-one mapping between unicode and transliteration.

Parameters: text (str) – Input text in unicode format.
Returns: Transliterated text.
Return type: str

to_unicode(text: str, regularize: bool = False) → str

Converts transliterated text to unicode format.

Parameters:

text (str) – Input text in transliterated format.
regularize (bool, optional) – Whether to apply regularization. Defaults to False.

Returns:

Text converted to unicode format, optionally regularized.

Return type:

str

tokenize_transliteration(text: str) → list[str]

Tokenizes transliterated text according to specific patterns.

Parameters: text (str) – Input text in transliterated format.
Returns: List of tokens.
Return type: list[str]

tokenize_unicode(text: str) → list[str]

Tokenizes unicode text according to specific patterns.

By default, it tokenizes each character as a separate token. This method can be overridden in subclasses to provide more complex tokenization.

Parameters: text (str) – Input text in unicode format.
Returns: List of tokens.
Return type: list[str]

Hittite

class potnia.scripts.hittite.Hittite(config: str = 'hittite')

Class for handling text transliteration and unicode conversion for Hittite.

To use the singleton instance, import like so: from potnia import hittite

Designed especially for texts from the Catalog der Texte der Hethiter (CTH): https://www.hethport.uni-wuerzburg.de/CTH/index.php

config

Path to the configuration file or configuration data in string format. By default, it uses the ‘hittite.yaml’ file in the ‘data’ directory.

Type: str

config: str = 'hittite'

regularize(string: str) → str

Applies regularization rules to a given string.

Parameters: string (str) – Text string to be regularized.
Returns: Regularized text string.
Return type: str

to_transliteration(text: str) → str

Converts unicode text to transliteration format.

NB. This function may not work as expected for all scripts/languages because there may not be a one-to-one mapping between unicode and transliteration.

Parameters: text (str) – Input text in unicode format.
Returns: Transliterated text.
Return type: str

to_unicode(text: str, regularize: bool = False) → str

Converts transliterated text to unicode format.

Parameters:

text (str) – Input text in transliterated format.
regularize (bool, optional) – Whether to apply regularization. Defaults to False.

Returns:

Text converted to unicode format, optionally regularized.

Return type:

str

tokenize_transliteration(input_string: str) → list[str]

Tokenizes transliterated text according to specific patterns.

Parameters: input_string (str) – Input text in transliterated format.
Returns: List of tokens.
Return type: list[str]

tokenize_unicode(text: str) → list[str]

Tokenizes unicode text according to specific patterns.

By default, it tokenizes each character as a separate token. This method can be overridden in subclasses to provide more complex tokenization.

Parameters: text (str) – Input text in unicode format.
Returns: List of tokens.
Return type: list[str]