HFST - Helsinki Finite-State Transducer Technology - C++ API
version 3.9.1
A synchronous finite-state transducer.
#include <HfstTransducer.h>
Public Member Functions
HFSTDLL bool compare (const HfstTransducer &another, bool harmonize=true) const
    Whether this transducer and another are equivalent.
HFSTDLL HfstTransducer & compose (const HfstTransducer &another, bool harmonize=true)
    Compose this transducer with another.
HFSTDLL HfstTransducer & compose_intersect (const HfstTransducerVector &v, bool invert=false, bool harmonize=true)
    Compose this transducer with the intersection of transducers in v. If invert is true, then compose the intersection of the transducers in v with this transducer.
HFSTDLL HfstTransducer & concatenate (const HfstTransducer &another, bool harmonize=true)
    Concatenate this transducer with another.
HFSTDLL HfstTransducer & convert (ImplementationType type, std::string options="")
    Convert the transducer into an equivalent transducer in format type.
HFSTDLL HfstTransducer & cross_product (const HfstTransducer &another, bool harmonize=true)
    Make cross product of this transducer with another. It pairs every string of this with every string of another.
HFSTDLL HfstTransducer & determinize ()
    Determinize the transducer.
HFSTDLL HfstTransducer & disjunct (const HfstTransducer &another, bool harmonize=true)
    Disjunct this transducer with another.
HFSTDLL void extract_paths (HfstTwoLevelPaths &results, int max_num=-1, int cycles=-1) const
    Extract a maximum of max_num paths that are recognized by the transducer following a maximum of cycles cycles and store the paths into results.
HFSTDLL void extract_paths_fd (HfstTwoLevelPaths &results, int max_num=-1, int cycles=-1, bool filter_fd=true) const
    Extract a maximum of max_num paths that are recognized by the transducer and are not invalidated by flag diacritic rules following a maximum of cycles cycles and store the paths into results. filter_fd defines whether the flag diacritics themselves are filtered out of the result strings.
HFSTDLL StringSet get_alphabet () const
    Get the alphabet of the transducer.
HFSTDLL StringSet get_first_input_symbols () const
    Get first input level symbols of strings recognized (or rejected, if they end in a non-final state) by the transducer.
HFSTDLL std::string get_name () const
    Get the name of the transducer.
HFSTDLL const std::map < std::string, std::string > & get_properties () const
    Get all properties from the transducer.
HFSTDLL std::string get_property (const std::string &property) const
    Get arbitrary string property property. get_property("name") works like get_name.
HFSTDLL ImplementationType get_type (void) const
    The implementation type of the transducer.
HFSTDLL void harmonize (HfstTransducer &another)
    Harmonize transducers this and another.
HFSTDLL HfstTransducer ()
    Create an uninitialized transducer (use with care).
HFSTDLL HfstTransducer (ImplementationType type)
    Create an empty transducer, i.e. a transducer that does not recognize any string. The type of the transducer is defined by type.
HFSTDLL HfstTransducer (const std::string &utf8_str, const HfstTokenizer &multichar_symbol_tokenizer, ImplementationType type)
    Create a transducer by tokenizing the utf8 string utf8_str with tokenizer multichar_symbol_tokenizer. The type of the transducer is defined by type.
HFSTDLL HfstTransducer (const std::string &input_utf8_str, const std::string &output_utf8_str, const HfstTokenizer &multichar_symbol_tokenizer, ImplementationType type)
    Create a transducer by tokenizing the utf8 input string input_utf8_str and output string output_utf8_str with tokenizer multichar_symbol_tokenizer. The type of the transducer is defined by type.
HFSTDLL HfstTransducer (HfstInputStream &in)
    Read a binary transducer from transducer stream in.
HFSTDLL HfstTransducer (const HfstTransducer &another)
    Create a deep copy of transducer another.
HFSTDLL HfstTransducer (const hfst::implementations::HfstBasicTransducer &t, ImplementationType type)
    Create an HFST transducer equivalent to HFST basic transducer t. The type of the created transducer is defined by type.
HFSTDLL HfstTransducer (const std::string &symbol, ImplementationType type)
    Create a transducer that recognizes the string pair <"symbol","symbol">, i.e. [symbol:symbol]. The type of the transducer is defined by type.
HFSTDLL HfstTransducer (const std::string &isymbol, const std::string &osymbol, ImplementationType type)
    Create a transducer that recognizes the string pair <"isymbol","osymbol">, i.e. [isymbol:osymbol]. The type of the transducer is defined by type.
HFSTDLL HfstTransducer (FILE *ifile, ImplementationType type, const std::string &epsilon_symbol, unsigned int &linecount)
    Create a transducer of type type as defined in AT&T format in FILE ifile. epsilon_symbol defines how epsilons are represented.
HFSTDLL HfstTransducer & input_project ()
    Extract the input language of the transducer.
HFSTDLL HfstTransducer & insert_freely (const StringPair &symbol_pair, bool harmonize=true)
    Freely insert symbol pair symbol_pair into the transducer.
HFSTDLL HfstTransducer & insert_freely (const HfstTransducer &tr, bool harmonize=true)
    Freely insert a copy of tr into the transducer.
HFSTDLL void insert_to_alphabet (const std::string &symbol)
    Explicitly insert symbol to the alphabet of the transducer.
HFSTDLL HfstTransducer & intersect (const HfstTransducer &another, bool harmonize=true)
    Intersect this transducer with another.
HFSTDLL HfstTransducer & invert ()
    Swap the input and output symbols of each transition in the transducer.
HFSTDLL bool is_automaton (void) const
    Whether the transducer is an automaton.
HFSTDLL bool is_cyclic (void) const
    Whether the transducer is cyclic.
HFSTDLL bool is_lookdown_infinitely_ambiguous (const StringVector &s) const
    (Not implemented) Whether lookdown of path s will have infinite results.
HFSTDLL bool is_lookup_infinitely_ambiguous (const StringVector &s) const
    Whether lookup of path s will have infinite results.
HFSTDLL HfstTransducer & lenient_composition (const HfstTransducer &another, bool harmonize=true)
    Make lenient composition of this transducer with another. A .O. B = [ A .o. B ] .P. A.
HFSTDLL HfstOneLevelPaths * lookdown (const StringVector &s, ssize_t limit=-1) const
    (Not implemented) Lookdown a single string s and return a maximum of limit results.
HFSTDLL HfstOneLevelPaths * lookdown_fd (StringVector &s, ssize_t limit=-1) const
    (Not implemented) Lookdown a single string minding flag diacritics properly.
HFSTDLL HfstOneLevelPaths * lookup (const StringVector &s, ssize_t limit=-1, double time_cutoff=0.0) const
    Lookup or apply a single tokenized string s and return a maximum of limit results.
HFSTDLL HfstOneLevelPaths * lookup (const std::string &s, ssize_t limit=-1, double time_cutoff=0.0) const
    Lookup or apply a single string s and return a maximum of limit results.
HFSTDLL HfstOneLevelPaths * lookup (const HfstTokenizer &tok, const std::string &s, ssize_t limit=-1, double time_cutoff=0.0) const
    Lookup or apply a single string s and return a maximum of limit results. tok defines how s is tokenized.
HFSTDLL HfstOneLevelPaths * lookup_fd (const StringVector &s, ssize_t limit=-1, double time_cutoff=0.0) const
    Lookup or apply a single string s minding flag diacritics properly and return a maximum of limit results.
HFSTDLL HfstOneLevelPaths * lookup_fd (const std::string &s, ssize_t limit=-1, double time_cutoff=0.0) const
    Lookup or apply a single string s minding flag diacritics properly and return a maximum of limit results.
HFSTDLL HfstOneLevelPaths * lookup_fd (const HfstTokenizer &tok, const std::string &s, ssize_t limit=-1, double time_cutoff=0.0) const
    Lookup or apply a single string s minding flag diacritics properly and return a maximum of limit results. tok defines how s is tokenized.
HFSTDLL HfstTransducer & minimize ()
    Minimize the transducer.
HFSTDLL HfstTransducer & n_best (unsigned int n)
    Extract n best paths of the transducer.
HFSTDLL HfstTransducer & operator= (const HfstTransducer &another)
    Assign this transducer a new value equivalent to transducer another.
HFSTDLL HfstTransducer & optionalize ()
    Disjunct the transducer with an epsilon transducer.
HFSTDLL HfstTransducer & output_project ()
    Extract the output language of the transducer.
HFSTDLL HfstTransducer & priority_union (const HfstTransducer &another)
    Make priority union of this transducer with another.
HFSTDLL HfstTransducer & prune ()
    Make transducer coaccessible.
HFSTDLL HfstTransducer & prune_alphabet (bool force=true)
    Remove all symbols that do not occur in transitions of the transducer from its alphabet.
HFSTDLL HfstTransducer & push_weights (PushType type)
    Push weights towards initial or final state(s) as defined by type.
HFSTDLL HfstTransducer & remove_epsilons ()
    Remove all epsilon:epsilon transitions from the transducer so that the transducer remains equivalent.
HFSTDLL void remove_from_alphabet (const std::string &symbol)
    Remove symbol from the alphabet of the transducer. CURRENTLY NOT IMPLEMENTED.
HFSTDLL HfstTransducer & repeat_n (unsigned int n)
    A concatenation of n transducers.
HFSTDLL HfstTransducer & repeat_n_minus (unsigned int n)
    A concatenation of N transducers where N is any number from zero to n, inclusive.
HFSTDLL HfstTransducer & repeat_n_plus (unsigned int n)
    A concatenation of N transducers where N is any number from n to infinity, inclusive.
HFSTDLL HfstTransducer & repeat_n_to_k (unsigned int n, unsigned int k)
    A concatenation of N transducers where N is any number from n to k, inclusive.
HFSTDLL HfstTransducer & repeat_plus ()
    A concatenation of N transducers where N is any number from one to infinity.
HFSTDLL HfstTransducer & repeat_star ()
    A concatenation of N transducers where N is any number from zero to infinity.
HFSTDLL HfstTransducer & reverse ()
    Reverse the transducer.
HFSTDLL HfstTransducer & set_final_weights (float weight, bool increment=false)
    Set the weights of all final states to weight. increment defines whether the old weight is incremented by weight or overwritten.
HFSTDLL void set_name (const std::string &name)
    Rename the transducer to name.
HFSTDLL void set_property (const std::string &property, const std::string &value)
    Set arbitrary string property property to value. set_property("name") equals set_name(string&).
HFSTDLL HfstTransducer & substitute (bool(*func)(const StringPair &sp, StringPairSet &sps))
    Substitute each transition sp with transitions sps as defined by function func.
HFSTDLL HfstTransducer & substitute (const std::string &old_symbol, const std::string &new_symbol, bool input_side=true, bool output_side=true)
    Substitute all transition symbols equal to old_symbol with symbol new_symbol. input_side and output_side define whether the substitution is made on input and output sides.
HFSTDLL HfstTransducer & substitute (const StringPair &old_symbol_pair, const StringPair &new_symbol_pair)
    Substitute all transition symbol pairs equal to old_symbol_pair with new_symbol_pair.
HFSTDLL HfstTransducer & substitute (const StringPair &old_symbol_pair, const StringPairSet &new_symbol_pair_set)
    Substitute all transitions equal to old_symbol_pair with a set of transitions equal to new_symbol_pair_set.
HFSTDLL HfstTransducer & substitute (const HfstSymbolSubstitutions &substitutions)
    Substitute all transition symbols as defined in substitutions.
HFSTDLL HfstTransducer & substitute (const HfstSymbolPairSubstitutions &substitutions)
    Substitute all transition symbol pairs as defined in substitutions.
HFSTDLL HfstTransducer & substitute (const StringPair &symbol_pair, HfstTransducer &transducer, bool harmonize=true)
    Substitute all transitions equal to symbol_pair with a copy of transducer transducer.
HFSTDLL HfstTransducer & subtract (const HfstTransducer &another, bool harmonize=true)
    Subtract transducer another from this transducer.
HFSTDLL HfstTransducer & transform_weights (float(*func)(float))
    Transform all transition and state weights as defined in func.
HFSTDLL void write_in_att_format (FILE *ofile, bool write_weights=true) const
    Write the transducer in AT&T format to FILE ofile. write_weights defines whether weights are written.
HFSTDLL void write_in_att_format (const std::string &filename, bool write_weights=true) const
    Write the transducer in AT&T format to the file named filename. write_weights defines whether weights are written.
virtual HFSTDLL ~HfstTransducer (void)
    Destructor.
Static Public Member Functions
static HFSTDLL HfstTransducer identity_pair (ImplementationType type)
    Create identity pair transducer of type.
static HFSTDLL HfstTransducer * read_lexc_ptr (const std::string &filename, ImplementationType type, bool verbose)
    Compile a lexc file in file filename into an HfstTransducer of type type and return the transducer.
static HFSTDLL HfstTransducer universal_pair (ImplementationType type)
    Create universal pair transducer of type.
Protected Member Functions
HfstTransducer & apply (SFST::Transducer *(*sfst_funct)(SFST::Transducer *), fst::StdVectorFst *(*tropical_ofst_funct)(fst::StdVectorFst *), fsm *(*foma_funct)(fsm *), bool dummy)
    Declarations for HFST functions that take two or more parameters.
Friends
HFSTDLL friend std::ostream & operator<< (std::ostream &out, const HfstTransducer &t)
    Write transducer t in AT&T format to ostream out.
A synchronous finite-state transducer.
Transducer functions modify their calling object and return a reference to the calling object after modification, unless otherwise mentioned. Transducer arguments are usually not modified.
    // transducer is reversed
    transducer.reverse();
    // transducer2 is not modified, but a copy of it is disjuncted with
    // transducer1
    transducer1.disjunct(transducer2);
    // a chain of functions is possible
    transducer.reverse().determinize().reverse().determinize();
Currently, an HfstTransducer has four implementation types as defined by the enumeration ImplementationType. When an HfstTransducer is created, its type is defined with an ImplementationType argument. For functions that take a transducer as an argument, the type of the calling transducer must be the same as the type of the argument transducer:
    // this will cause an error
    log_transducer.disjunct(sfst_transducer);
    // this works, but weights are lost in the conversion
    log_transducer.convert(SFST_TYPE).disjunct(sfst_transducer);
    // this works, information is not lost
    log_transducer.disjunct(sfst_transducer.convert(LOG_OPENFST_TYPE));
With HfstTransducer constructors it is possible to create empty, epsilon, one-transition and single-path transducers. Transducers can also be created from scratch with HfstBasicTransducer and converted to an HfstTransducer. More complex transducers can be combined from simple ones with various functions.
The HFST transducers support transitions with epsilon, unknown and identity symbols. The special symbols are explained in the documentation of the datatype String.
An example:
    // In the xerox formalism used here, "?" means the unknown symbol
    // and "?:?" the identity pair
    HfstBasicTransducer tr1;
    tr1.add_state(1);
    tr1.set_final_weight(1, 0);
    tr1.add_transition
      (0, HfstBasicTransition(1, "@_UNKNOWN_SYMBOL_@", "foo", 0) );
    // tr1 is now [ ?:foo ]
    HfstBasicTransducer tr2;
    tr2.add_state(1);
    tr2.add_state(2);
    tr2.set_final_weight(2, 0);
    tr2.add_transition
      (0, HfstBasicTransition(1, "@_IDENTITY_SYMBOL_@", "@_IDENTITY_SYMBOL_@", 0) );
    tr2.add_transition
      (1, HfstBasicTransition(2, "bar", "bar", 0) );
    // tr2 is now [ [ ?:? ] [ bar:bar ] ]
    ImplementationType type = SFST_TYPE;
    HfstTransducer Tr1(tr1, type);
    HfstTransducer Tr2(tr2, type);
    Tr1.disjunct(Tr2);
    // Tr1 is now [ [ ?:foo | bar:foo ] | [[ ?:? | foo:foo ] [ bar:bar ]] ]
HfstTransducer ()
Create an uninitialized transducer (use with care).
HfstTransducer (ImplementationType type)
Create an empty transducer, i.e. a transducer that does not recognize any string. The type of the transducer is defined by type.
HfstTransducer (const std::string &utf8_str, const HfstTokenizer &multichar_symbol_tokenizer, ImplementationType type)
Create a transducer by tokenizing the utf8 string utf8_str with tokenizer multichar_symbol_tokenizer. The type of the transducer is defined by type.
utf8_str is read one token at a time and for each token a new transition is created in the resulting transducer. The input and output symbols of that transition are the same as the token read.
An example:
    std::string ustring = "foobar";
    HfstTokenizer TOK;
    HfstTransducer tr(ustring, TOK, LOG_OPENFST_TYPE);
    // tr now contains one path [f o o b a r]
See also: HfstTokenizer
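To make the token-by-token reading concrete, here is a minimal standalone sketch that does not use HFST. The names tokenize_sketch and utf8_length are hypothetical, and the longest-match preference for multicharacter symbols is an assumption of this sketch rather than a documented property of HfstTokenizer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Number of bytes in the UTF-8 sequence that starts with lead byte c.
// Invalid lead bytes are treated as one-byte symbols.
static std::size_t utf8_length(unsigned char c) {
    if (c < 0x80) return 1;
    if ((c >> 5) == 0x6) return 2;
    if ((c >> 4) == 0xE) return 3;
    if ((c >> 3) == 0x1E) return 4;
    return 1;
}

// Split text into symbols. At each position the longest matching
// multicharacter symbol wins (assumption); otherwise one UTF-8
// character is taken.
std::vector<std::string> tokenize_sketch(
    const std::string &text,
    const std::vector<std::string> &multichar_symbols) {
    std::vector<std::string> tokens;
    std::size_t pos = 0;
    while (pos < text.size()) {
        std::size_t best = 0;
        for (const std::string &symbol : multichar_symbols) {
            if (symbol.size() > best &&
                text.compare(pos, symbol.size(), symbol) == 0) {
                best = symbol.size();
            }
        }
        if (best == 0) {
            best = utf8_length(static_cast<unsigned char>(text[pos]));
            // Do not read past the end of a truncated sequence.
            if (best > text.size() - pos) best = text.size() - pos;
        }
        assert(best > 0);  // guarantees progress on every iteration
        tokens.push_back(text.substr(pos, best));
        pos += best;
    }
    return tokens;
}
```

Calling tokenize_sketch("foobar", {}) gives the six one-character symbols of the example above, while tokenize_sketch("foobar", {"foo"}) gives the four symbols foo, b, a, r. Each symbol would become one transition with identical input and output.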
HfstTransducer (const std::string &input_utf8_str, const std::string &output_utf8_str, const HfstTokenizer &multichar_symbol_tokenizer, ImplementationType type)
Create a transducer by tokenizing the utf8 input string input_utf8_str and output string output_utf8_str with tokenizer multichar_symbol_tokenizer. The type of the transducer is defined by type.
input_utf8_str and output_utf8_str are read one token at a time and for each token a new transition is created in the resulting transducer. The input and output symbols of that transition are the same as the input and output tokens read. If either string contains fewer tokens than the other, epsilons are used as transition symbols for the shorter string.
An example:
    std::string input = "foo";
    std::string output = "barr";
    HfstTokenizer TOK;
    HfstTransducer tr(input, output, TOK, SFST_TYPE);
    // tr now contains one path [f:b o:a o:r 0:r]
See also: HfstTokenizer
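The epsilon padding described above can be sketched on already tokenized strings. This is a standalone illustration, not HFST code: align_with_epsilon_padding, SymbolPair and kEpsilon are hypothetical names, and only the epsilon spelling "@_EPSILON_SYMBOL_@" is taken from this documentation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using SymbolPair = std::pair<std::string, std::string>;

// Internal epsilon spelling as given elsewhere in this documentation.
const std::string kEpsilon = "@_EPSILON_SYMBOL_@";

// Pair the i-th input token with the i-th output token. When one side
// runs out of tokens, epsilon is used for that side.
std::vector<SymbolPair> align_with_epsilon_padding(
    const std::vector<std::string> &input,
    const std::vector<std::string> &output) {
    const std::size_t length = std::max(input.size(), output.size());
    std::vector<SymbolPair> path;
    path.reserve(length);
    for (std::size_t i = 0; i < length; ++i) {
        path.emplace_back(i < input.size() ? input[i] : kEpsilon,
                          i < output.size() ? output[i] : kEpsilon);
    }
    assert(path.size() == length);  // one transition per aligned position
    return path;
}
```

For the tokens f o o and b a r r this returns four pairs, the last one being epsilon on the input side and r on the output side, which corresponds to the path [f:b o:a o:r 0:r] of the example.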
HfstTransducer (HfstInputStream &in)
Read a binary transducer from transducer stream in.
The stream can contain HFST transducers or OpenFst, foma or SFST transducers without an HFST header. If the backend implementations are used as such, they are converted into HFST transducers.
For more information on transducer conversions and the HFST header structure, see here.
Exceptions:
    NotTransducerStreamException
    StreamNotReadableException
    StreamIsClosedException
    TransducerTypeMismatchException
    MissingOpenFstInputSymbolTableException
HfstTransducer (const HfstTransducer &another)
Create a deep copy of transducer another.
HfstTransducer (const hfst::implementations::HfstBasicTransducer &t, ImplementationType type)
Create an HFST transducer equivalent to HFST basic transducer t. The type of the created transducer is defined by type.
HfstTransducer (const std::string &symbol, ImplementationType type)
Create a transducer that recognizes the string pair <"symbol","symbol">, i.e. [symbol:symbol]. The type of the transducer is defined by type.
HfstTransducer (const std::string &isymbol, const std::string &osymbol, ImplementationType type)
Create a transducer that recognizes the string pair <"isymbol","osymbol">, i.e. [isymbol:osymbol]. The type of the transducer is defined by type.
HfstTransducer (FILE *ifile, ImplementationType type, const std::string &epsilon_symbol, unsigned int &linecount)
Create a transducer of type type as defined in AT&T format in FILE ifile. epsilon_symbol defines how epsilons are represented.
In AT&T format, the transition lines consist of a source state, a target state, an input symbol, an output symbol and an optional weight, separated by whitespace:
    [0-9]+\s+[0-9]+\s+\S+\s+\S+(\s+-?[0-9]+(\.[0-9]+)?)?
and final state lines consist of a state and an optional weight:
    [0-9]+(\s+-?[0-9]+(\.[0-9]+)?)?
If several transducers are listed in the same file, they are separated by lines of two consecutive hyphens "--". If the weight (\s+-?[0-9]+(\.[0-9]+)?) is missing, the transition or final state is given a zero weight. NOTE: If transition symbols contain spaces, they must be escaped as "@_SPACE_@" because spaces are used as field separators. Both "@0@" and "@_EPSILON_SYMBOL_@" are always interpreted as epsilons.
An example:
    0 1 foo bar 0.3
    1 0.5
    --
    0 0.0
    --
    --
    0 0.0
    0 0 a <eps> 0.2
The example lists four transducers in AT&T format: one transducer accepting the string pair <"foo","bar">, one epsilon transducer, one empty transducer and one transducer that accepts any number of 'a's and produces an empty string in all cases. The transducers can be read with the following commands (from a file named "testfile.att"):
    std::vector<HfstTransducer> transducers;
    FILE * ifile = fopen("testfile.att", "rb");
    // The constructor updates linecount as it reads lines from ifile.
    unsigned int linecount = 0;
    try {
      while (!feof(ifile))
        {
          HfstTransducer t(ifile, TROPICAL_OPENFST_TYPE, "<eps>", linecount);
          transducers.push_back(t);
          printf("read one transducer\n");
        }
    } catch (const NotValidAttFormatException &e) {
      printf("Error reading transducer: not valid AT&T format.\n");
    }
    fclose(ifile);
    fprintf(stderr, "Read %i transducers in total.\n", (int)transducers.size());
Epsilon will be represented as "@_EPSILON_SYMBOL_@" in the resulting transducer. The argument epsilon_symbol only denotes how epsilons are represented in ifile.
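The line format and the epsilon handling described above can be sketched as a small standalone classifier for single lines. This is not the HFST reader: AttLine, parse_att_line and the helper functions are hypothetical names, and the sketch accepts any weight that std::strtof can parse, which is more permissive than the format description.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

struct AttLine {
    enum Kind { Separator, FinalState, Transition, Invalid };
    Kind kind = Invalid;
    unsigned int source = 0;   // the state itself on a final state line
    unsigned int target = 0;
    std::string input;
    std::string output;
    float weight = 0.0f;       // a missing weight means zero weight
};

static bool parse_state(const std::string &field, unsigned int &state) {
    if (field.empty() ||
        field.find_first_not_of("0123456789") != std::string::npos) return false;
    state = static_cast<unsigned int>(std::strtoul(field.c_str(), nullptr, 10));
    return true;
}

static bool parse_weight(const std::string &field, float &weight) {
    char *end = nullptr;
    weight = std::strtof(field.c_str(), &end);
    return end != field.c_str() && *end == '\0';
}

// Map every accepted epsilon spelling to the internal one.
static std::string normalize_epsilon(const std::string &symbol,
                                     const std::string &epsilon_symbol) {
    if (symbol == epsilon_symbol || symbol == "@0@" ||
        symbol == "@_EPSILON_SYMBOL_@") return "@_EPSILON_SYMBOL_@";
    return symbol;
}

AttLine parse_att_line(const std::string &line,
                       const std::string &epsilon_symbol) {
    AttLine result;
    if (line == "--") { result.kind = AttLine::Separator; return result; }
    std::istringstream stream(line);
    std::vector<std::string> fields;
    for (std::string field; stream >> field;) fields.push_back(field);
    const std::size_t n = fields.size();
    const bool is_final = (n == 1 || n == 2);
    const bool is_transition = (n == 4 || n == 5);
    if (!is_final && !is_transition) return result;
    assert(!fields.empty());
    if (!parse_state(fields[0], result.source)) return result;
    if (is_transition) {
        if (!parse_state(fields[1], result.target)) return result;
        result.input = normalize_epsilon(fields[2], epsilon_symbol);
        result.output = normalize_epsilon(fields[3], epsilon_symbol);
    }
    // The weight, when present, is always the last field.
    if ((n == 2 || n == 5) && !parse_weight(fields.back(), result.weight))
        return result;
    result.kind = is_transition ? AttLine::Transition : AttLine::FinalState;
    return result;
}
```

With epsilon_symbol set to "<eps>", the line "0 0 a <eps> 0.2" from the example is classified as a transition whose output symbol is stored as "@_EPSILON_SYMBOL_@". Because fields are split on whitespace, a symbol escaped as "@_SPACE_@" stays a single field.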
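virtual ~HfstTransducer (void)
Destructor.
HfstTransducer & apply (SFST::Transducer *(*sfst_funct)(SFST::Transducer *), fst::StdVectorFst *(*tropical_ofst_funct)(fst::StdVectorFst *), fsm *(*foma_funct)(fsm *), bool dummy) [protected]
Declarations for HFST functions that take two or more parameters.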
bool compare (const HfstTransducer &another, bool harmonize=true) const
Whether this transducer and another are equivalent.
Two transducers are equivalent iff they accept the same input/output string pairs with the same weights and the same alignments.
HfstTransducer & compose (const HfstTransducer &another, bool harmonize=true)
Compose this transducer with another.
HfstTransducer & compose_intersect (const HfstTransducerVector &v, bool invert=false, bool harmonize=true)
Compose this transducer with the intersection of transducers in v. If invert is true, then compose the intersection of the transducers in v with this transducer.
The algorithm used by this function is faster than intersecting all transducers one by one and then composing this transducer with the intersection.
HfstTransducer & concatenate (const HfstTransducer &another, bool harmonize=true)
Concatenate this transducer with another.
HfstTransducer & convert (ImplementationType type, std::string options="")
Convert the transducer into an equivalent transducer in format type.
If a weighted transducer is converted into an unweighted one, all weights are lost. In the reverse case, all weights are initialized to the semiring's one.
A transducer of type SFST_TYPE, TROPICAL_OPENFST_TYPE, LOG_OPENFST_TYPE or FOMA_TYPE can be converted into an HFST_OL_TYPE or HFST_OLW_TYPE transducer, but an HFST_OL_TYPE or HFST_OLW_TYPE transducer cannot be converted to any other type.
HfstTransducer & cross_product (const HfstTransducer &another, bool harmonize=true)
Make cross product of this transducer with another. It pairs every string of this with every string of another.
Both transducers must be automata, i.e. map strings onto themselves.
If the strings are not of the same length, epsilon padding is added at the end of the shorter string.
HfstTransducer & determinize ()
Determinize the transducer.
Determinizing a transducer yields an equivalent transducer that has no state with two or more transitions whose input:output symbol pairs are the same.
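The property stated above can be checked mechanically on a plain list of transitions. The following standalone sketch only tests the stated property; it does not determinize anything, and Transition and has_unique_symbol_pairs are hypothetical names that are not part of HFST.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <tuple>
#include <vector>

struct Transition {
    unsigned int source;
    unsigned int target;
    std::string input;
    std::string output;
};

// True if no state has two or more outgoing transitions that carry the
// same input:output symbol pair.
bool has_unique_symbol_pairs(const std::vector<Transition> &transitions) {
    std::set<std::tuple<unsigned int, std::string, std::string>> seen;
    for (const Transition &t : transitions) {
        // insert() reports whether the (state, input, output) key is new.
        if (!seen.insert(std::make_tuple(t.source, t.input, t.output)).second)
            return false;
    }
    return true;
}
```

Two transitions a:b leaving state 0 towards different targets violate the property, whereas a:b and a:c leaving the same state do not, because the pairs differ on the output side.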
HfstTransducer & disjunct (const HfstTransducer &another, bool harmonize=true)
Disjunct this transducer with another.
void extract_paths (HfstTwoLevelPaths &results, int max_num=-1, int cycles=-1) const
Extract a maximum of max_num paths that are recognized by the transducer following a maximum of cycles cycles and store the paths into results.
Parameters:
    results: The extracted paths are inserted here.
    max_num: The total number of resulting strings is capped at max_num, with 0 or negative indicating unlimited.
    cycles: Indicates how many times a cycle will be followed, with negative numbers indicating unlimited.
This is a version of extract_paths that handles flag diacritics as ordinary symbols and does not validate the sequences prior to outputting as opposed to extract_paths_fd(HfstTwoLevelPaths &, int, int, bool) const.
If this function is called on a cyclic transducer with unlimited values for both max_num and cycles, an exception will be thrown.
This example
    ImplementationType type = SFST_TYPE;
    HfstTransducer tr1("a", "b", type);
    tr1.repeat_star();
    HfstTransducer tr2("c", "d", type);
    tr2.repeat_star();
    tr1.concatenate(tr2).minimize();
    HfstTwoLevelPaths results;
    tr1.extract_paths(results, MAX_NUM, CYCLES);
    // Go through all paths.
    for (HfstTwoLevelPaths::const_iterator it = results.begin();
         it != results.end(); it++)
    {
      std::string istring;
      std::string ostring;
      for (StringPairVector::const_iterator IT = it->second.begin();
           IT != it->second.end(); IT++)
      {
        istring.append(IT->first);
        ostring.append(IT->second);
      }
      // Print input and output strings of each path
      std::cerr << istring << ":" << ostring;
      // and optionally the weight of the path.
      //std::cerr << "\t" << it->first;
      std::cerr << std::endl;
    }
prints with values MAX_NUM == -1 and CYCLES == 1 all paths that have no consecutive cycles:
    a : b
    ac : bd
    acc : bdd
    c : d
    cc : dd
and with values MAX_NUM == 7 and CYCLES == 2 a maximum of 7 paths that follow a cycle a maximum of 2 times (there are 11 such paths, but MAX_NUM limits their number to 7):
    a : b
    aa : bb
    aac : bbd
    aacc : bbdd
    c : d
    cc : dd
    ccc : ddd
Exceptions:
    TransducerIsCyclicException
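The interplay of max_num and cycles can be illustrated without HFST. The following is a minimal standalone sketch, not the library's algorithm: TinyFst, Arc, extract_paths_sketch and example_fst are hypothetical names, and the reading of cycles used here (a state may be re-entered at most cycles times on one path) is an assumption inferred from the path counts in the example above. Unlike the real function, the sketch requires a non-negative cycles bound and returns paths in depth-first order.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

struct Arc {
    unsigned int target;
    std::string input;
    std::string output;
};

struct TinyFst {
    std::map<unsigned int, std::vector<Arc>> arcs;  // outgoing arcs per state
    std::set<unsigned int> finals;                  // final states; 0 is initial
};

using Path = std::pair<std::string, std::string>;   // input string, output string

static void collect(const TinyFst &fst, unsigned int state, int max_num,
                    int cycles, std::map<unsigned int, int> &visits,
                    Path &current, std::vector<Path> &results) {
    // max_num <= 0 means unlimited, as in the parameter description.
    if (max_num > 0 && static_cast<int>(results.size()) >= max_num) return;
    if (fst.finals.count(state)) results.push_back(current);
    const auto it = fst.arcs.find(state);
    if (it == fst.arcs.end()) return;
    for (const Arc &arc : it->second) {
        // The target already occurs visits[target] times on this path.
        if (visits[arc.target] > cycles) continue;  // cycle budget used up
        ++visits[arc.target];
        current.first += arc.input;
        current.second += arc.output;
        collect(fst, arc.target, max_num, cycles, visits, current, results);
        current.first.resize(current.first.size() - arc.input.size());
        current.second.resize(current.second.size() - arc.output.size());
        --visits[arc.target];
    }
}

std::vector<Path> extract_paths_sketch(const TinyFst &fst, int max_num,
                                       int cycles) {
    assert(cycles >= 0 && "this sketch needs a cycle bound to terminate");
    std::vector<Path> results;
    std::map<unsigned int, int> visits;
    visits[0] = 1;  // every path starts in state 0
    Path current;
    collect(fst, 0, max_num, cycles, visits, current, results);
    return results;
}

// Hand-built equivalent of [a:b]* [c:d]* with two final states.
TinyFst example_fst() {
    TinyFst fst;
    fst.arcs[0] = {{0, "a", "b"}, {1, "c", "d"}};
    fst.arcs[1] = {{1, "c", "d"}};
    fst.finals = {0, 1};
    return fst;
}
```

With example_fst() and cycles set to 2, the sketch finds 12 paths: the 11 non-empty ones counted above plus the empty path, which this sketch reports because the start state is final. Passing a max_num of 7 truncates the result to 7 paths, although not necessarily the same 7 as in the listing above.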
void extract_paths_fd (HfstTwoLevelPaths &results, int max_num=-1, int cycles=-1, bool filter_fd=true) const
Extract a maximum of max_num paths that are recognized by the transducer and are not invalidated by flag diacritic rules following a maximum of cycles cycles and store the paths into results. filter_fd defines whether the flag diacritics themselves are filtered out of the result strings.
Parameters:
    results: The extracted paths are inserted here.
    max_num: The total number of resulting strings is capped at max_num, with 0 or negative indicating unlimited.
    cycles: Indicates how many times a cycle will be followed, with negative numbers indicating unlimited.
    filter_fd: Whether the flag diacritics are filtered out of the result strings.
If this function is called on a cyclic transducer with unlimited values for both max_num and cycles, an exception will be thrown.
Flag diacritics are of the form @[PNDRCU][.][A-Z]+([.][A-Z]+)?@
For example the transducer
    [[@P.FEATURE.FOO@ foo] | [@P.FEATURE.BAR@ bar]] [[foo @U.FEATURE.FOO@] | [bar @U.FEATURE.BAR@]]
will yield the paths [foo foo] and [bar bar]. [foo bar] and [bar foo] are invalidated by the flag diacritics, so they will not be included in results.
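How a path is invalidated can be sketched as a walk over its symbols that keeps track of feature values. This standalone sketch is not HFST code: FlagOp, parse_flag, path_is_valid and strip_flags are hypothetical names. It models the P, U, R, D and C operators as they are commonly described for Xerox-style flag diacritics, does not enforce the [A-Z]+ character class, and throws on the N operator, which it does not model.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

struct FlagOp {
    char op = 0;
    std::string feature;
    std::string value;  // empty when the flag has no value part
};

// Parse "@X.FEATURE@" or "@X.FEATURE.VALUE@"; false for ordinary symbols.
bool parse_flag(const std::string &symbol, FlagOp &flag) {
    if (symbol.size() < 5 || symbol.front() != '@' || symbol.back() != '@' ||
        symbol[2] != '.') return false;
    if (std::string("PNDRCU").find(symbol[1]) == std::string::npos) return false;
    const std::string body = symbol.substr(3, symbol.size() - 4);
    const std::size_t dot = body.find('.');
    flag.op = symbol[1];
    flag.feature = body.substr(0, dot);
    flag.value = (dot == std::string::npos) ? "" : body.substr(dot + 1);
    return !flag.feature.empty();
}

// True if no flag diacritic on the path fails.
bool path_is_valid(const std::vector<std::string> &symbols) {
    std::map<std::string, std::string> features;
    for (const std::string &symbol : symbols) {
        FlagOp flag;
        if (!parse_flag(symbol, flag)) continue;  // ordinary symbol
        const auto current = features.find(flag.feature);
        const bool is_set = current != features.end();
        switch (flag.op) {
            case 'P':  // positive set
                features[flag.feature] = flag.value;
                break;
            case 'C':  // clear
                features.erase(flag.feature);
                break;
            case 'U':  // unify: fails on a conflicting value
                if (is_set && current->second != flag.value) return false;
                features[flag.feature] = flag.value;
                break;
            case 'R':  // require: feature set, and equal to value if given
                if (!is_set || (!flag.value.empty() &&
                                current->second != flag.value)) return false;
                break;
            case 'D':  // disallow: feature unset, or different from value
                if (is_set && (flag.value.empty() ||
                               current->second == flag.value)) return false;
                break;
            default:
                assert(flag.op == 'N');
                throw std::invalid_argument("N flags are not modelled here");
        }
    }
    return true;
}

// Counterpart of filter_fd: drop the flag diacritics from a path.
std::vector<std::string> strip_flags(const std::vector<std::string> &symbols) {
    std::vector<std::string> kept;
    FlagOp ignored;
    for (const std::string &symbol : symbols)
        if (!parse_flag(symbol, ignored)) kept.push_back(symbol);
    return kept;
}
```

For the example transducer, the symbol sequence @P.FEATURE.FOO@ foo foo @U.FEATURE.FOO@ passes the check, while @P.FEATURE.FOO@ foo bar @U.FEATURE.BAR@ fails at the U flag because FEATURE already holds FOO.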
Exceptions:
    TransducerIsCyclicException
StringSet get_alphabet () const
Get the alphabet of the transducer.
The alphabet is defined as the set of symbols known to the transducer.
StringSet get_first_input_symbols () const
Get first input level symbols of strings recognized (or rejected, if they end in a non-final state) by the transducer.
std::string get_name () const
Get the name of the transducer.
const std::map< std::string, std::string > & get_properties () const
Get all properties from the transducer.
std::string get_property (const std::string &property) const
Get arbitrary string property property. get_property("name") works like get_name.
ImplementationType get_type (void) const
The implementation type of the transducer.
void harmonize (HfstTransducer &another)
Harmonize transducers this and another.
static HfstTransducer identity_pair (ImplementationType type)
Create identity pair transducer of type.
The transducer has only one state, and it accepts: Identity:Identity
Transducer weight is 0.
HfstTransducer & input_project ()
Extract the input language of the transducer.
All transition symbol pairs isymbol:osymbol are changed to isymbol:isymbol.
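The relabelling described above amounts to one assignment per transition. A minimal standalone sketch on a plain transition list follows; Transition and input_project_sketch are hypothetical names that are not part of HFST.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Transition {
    unsigned int source;
    unsigned int target;
    std::string input;
    std::string output;
};

// Change every isymbol:osymbol into isymbol:isymbol. States and the
// shape of the graph are left untouched.
void input_project_sketch(std::vector<Transition> &transitions) {
    for (Transition &t : transitions) {
        t.output = t.input;
    }
}
```

After the call, a transition foo:bar reads foo:foo, so every path maps its input string onto itself.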
HfstTransducer & insert_freely (const StringPair &symbol_pair, bool harmonize=true)
Freely insert symbol pair symbol_pair into the transducer.
To each state in this transducer is added a transition that leads from that state to itself with input and output symbols defined by symbol_pair.
If harmonize is true, then identity and unknown symbols in the transducer will be expanded by the symbols in symbol_pair. Otherwise they are not expanded.
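The self-loop construction can be sketched on a plain transition list. This standalone illustration is not HFST code: Transition and insert_freely_sketch are hypothetical names, states are assumed to be numbered from 0 to number_of_states - 1, and the expansion of identity and unknown symbols that harmonize controls is left out.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

struct Transition {
    unsigned int source;
    unsigned int target;
    std::string input;
    std::string output;
};

// Add one self-loop labelled with symbol_pair to every state.
void insert_freely_sketch(std::vector<Transition> &transitions,
                          unsigned int number_of_states,
                          const std::pair<std::string, std::string> &symbol_pair) {
    for (const Transition &t : transitions) {
        // Precondition of this sketch: all states lie in the given range.
        assert(t.source < number_of_states && t.target < number_of_states);
    }
    for (unsigned int state = 0; state < number_of_states; ++state) {
        transitions.push_back(
            {state, state, symbol_pair.first, symbol_pair.second});
    }
}
```

A two-state transducer with a single transition a:a from state 0 to state 1 ends up with three transitions after inserting x:y: the original one plus one x:y loop on each state.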
HfstTransducer & insert_freely (const HfstTransducer &tr, bool harmonize=true)
Freely insert a copy of tr into the transducer.
A copy of tr is attached with epsilon transitions to each state of this transducer. After the operation, for each state S in this transducer, there is an epsilon transition that leads from state S to the initial state of tr, and for each final state of tr, there is an epsilon transition that leads from that final state to state S in this transducer. The weights of the final states in tr are copied to the epsilon transitions leading to state S.
Implemented only for implementations::HfstBasicTransducer. Conversion is carried out for an HfstTransducer, if this function is called.
void insert_to_alphabet (const std::string &symbol)
Explicitly insert symbol to the alphabet of the transducer.
HfstTransducer & intersect (const HfstTransducer &another, bool harmonize=true)
Intersect this transducer with another.
HfstTransducer & invert ()
Swap the input and output symbols of each transition in the transducer.
bool is_automaton (void) const
Whether the transducer is an automaton.
bool is_cyclic (void) const
Whether the transducer is cyclic.
bool is_lookdown_infinitely_ambiguous (const StringVector &s) const
(Not implemented) Whether lookdown of path s will have infinite results.
bool is_lookup_infinitely_ambiguous (const StringVector &s) const
Whether lookup of path s will have infinite results.
Currently, this function will return whether the transducer is infinitely ambiguous on any lookup path found in the transducer, i.e. the argument s is ignored.
HfstTransducer & lenient_composition (const HfstTransducer &another, bool harmonize=true)
Make lenient composition of this transducer with another. A .O. B = [ A .o. B ] .P. A.
HfstOneLevelPaths * lookdown (const StringVector &s, ssize_t limit=-1) const
(Not implemented) Lookdown a single string s and return a maximum of limit results.
Traverse all paths on logical second level of the transducer to produce all possible inputs on the first. This is in effect a fast composition of single path from left hand side.
Parameters:
    s: String to look down.
    limit: Number of strings to extract. -1 tries to extract all and may get stuck if infinitely ambiguous.
HfstOneLevelPaths * lookdown_fd (StringVector &s, ssize_t limit=-1) const
(Not implemented) Lookdown a single string minding flag diacritics properly.
HfstOneLevelPaths * lookup | ( | const StringVector & | s, |
ssize_t | limit = -1 , |
||
double | time_cutoff = 0.0 |
||
) | const |
Lookup or apply a single tokenized string s and return a maximum of limit results.
This is a version of lookup that handles flag diacritics as ordinary symbols and does not validate the sequences prior to outputting. Currently, this function calls lookup_fd.
HfstOneLevelPaths * lookup | ( | const std::string & | s, |
ssize_t | limit = -1 , |
||
double | time_cutoff = 0.0 |
||
) | const |
Lookup or apply a single string s and return a maximum of limit results.
This is an overloaded lookup function that leaves tokenizing to the transducer.
HfstOneLevelPaths * lookup | ( | const HfstTokenizer & | tok, |
const std::string & | s, | ||
ssize_t | limit = -1 , |
||
double | time_cutoff = 0.0 |
||
) | const |
Lookup or apply a single string s and return a maximum of limit results. tok defines how s is tokenized.
This function is the same as lookup(const StringVector&, ssize_t, double) const, but lookup is done using a string and a tokenizer instead of a StringVector.
HfstOneLevelPaths * lookup_fd | ( | const StringVector & | s, |
ssize_t | limit = -1 , |
||
double | time_cutoff = 0.0 |
||
) | const |
Lookup or apply a single string s minding flag diacritics properly and return a maximum of limit results.
Traverse all paths on logical first level of the transducer to produce all possible outputs on the second. This is in effect a fast composition of single path from left hand side.
This is a version of lookup that handles flag diacritics as epsilons and validates the sequences prior to outputting. Epsilons on the second level are represented by empty strings in results. For an example of flag diacritics, see hfst::HfstTransducer::extract_paths_fd(hfst::HfstTwoLevelPaths&, int, int, bool) const
s | String to look up. The weight is ignored. |
limit | (Currently ignored.) Number of strings to look up. -1 tries to look up all and may get stuck if infinitely ambiguous. |
time_cutoff | Number of seconds that can pass before lookup is stopped. |
HfstOneLevelPaths * lookup_fd | ( | const std::string & | s, |
ssize_t | limit = -1 , |
||
double | time_cutoff = 0.0 |
||
) | const |
Lookup or apply a single string s minding flag diacritics properly and return a maximum of limit results.
This is an overloaded lookup_fd that leaves tokenizing to the transducer.
s | String to look up. The weight is ignored. |
limit | (Currently ignored.) Number of strings to look up. -1 tries to look up all and may get stuck if infinitely ambiguous. |
time_cutoff | Number of seconds that can pass before lookup is stopped. |
Returns a pointer to a HfstOneLevelPaths container allocated by callee.
See also: lookup_fd
HfstOneLevelPaths * lookup_fd | ( | const HfstTokenizer & | tok, |
const std::string & | s, | ||
ssize_t | limit = -1 , |
||
double | time_cutoff = 0.0 |
||
) | const |
Lookup or apply a single string s minding flag diacritics properly and return a maximum of limit results. tok defines how s is tokenized.
The same as lookup_fd(const StringVector&, ssize_t, double) const but uses a tokenizer and a string instead of a StringVector.
HfstTransducer & minimize | ( | ) |
Minimize the transducer.
Minimizing a transducer yields an equivalent transducer with the smallest number of states.
HfstTransducer & n_best | ( | unsigned int | n | ) |
Extract n best paths of the transducer.
In the case of a weighted transducer (TROPICAL_OPENFST_TYPE or LOG_OPENFST_TYPE), best paths are defined as paths with the lowest weight. In the case of an unweighted transducer (SFST_TYPE or FOMA_TYPE), the function returns random paths.
This function is not implemented for FOMA_TYPE or SFST_TYPE. If this function is called by an HfstTransducer of type FOMA_TYPE or SFST_TYPE, it is converted to TROPICAL_OPENFST_TYPE, paths are extracted and it is converted back to FOMA_TYPE or SFST_TYPE. If HFST is not linked to OpenFst library, an ImplementationTypeNotAvailableException is thrown.
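The "lowest weight is best" ordering used for weighted types can be sketched without HFST. The code below assumes the paths have already been extracted into (string, weight) pairs. WeightedPath and n_best_paths are hypothetical names for this illustration, not part of the API.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy model (assumption): a path is its string plus its total weight.
using WeightedPath = std::pair<std::string, float>;

// Keep the n paths with the lowest weights, best path first.
std::vector<WeightedPath> n_best_paths(std::vector<WeightedPath> paths,
                                       unsigned int n) {
    const std::size_t keep = std::min<std::size_t>(n, paths.size());
    std::partial_sort(paths.begin(), paths.begin() + keep, paths.end(),
                      [](const WeightedPath& a, const WeightedPath& b) {
                          return a.second < b.second;
                      });
    paths.resize(keep);
    return paths;
}
```

If n exceeds the number of available paths, the sketch simply returns all of them in order of increasing weight.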
HfstTransducer & operator= | ( | const HfstTransducer & | another | ) |
Assign this transducer a new value equivalent to transducer another.
HfstTransducer & optionalize | ( | ) |
Disjunct the transducer with an epsilon transducer.
HfstTransducer & output_project | ( | ) |
Extract the output language of the transducer.
All transition symbol pairs isymbol:osymbol are changed to osymbol:osymbol.
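The relabelling can be pictured on a plain list of transitions. This is a sketch that does not use HFST types: Transition is a hypothetical struct, and the two free functions only mimic what output_project() and, for contrast, invert() are described as doing to each symbol pair.

```cpp
#include <string>
#include <utility>
#include <vector>

// Toy model (assumption): a transition is just its symbol pair.
struct Transition {
    std::string input;
    std::string output;
};

// isymbol:osymbol becomes osymbol:osymbol.
std::vector<Transition> output_project(std::vector<Transition> transitions) {
    for (auto& t : transitions)
        t.input = t.output;
    return transitions;
}

// For contrast: isymbol:osymbol becomes osymbol:isymbol.
std::vector<Transition> invert(std::vector<Transition> transitions) {
    for (auto& t : transitions)
        std::swap(t.input, t.output);
    return transitions;
}
```

Applied to a:b, output projection yields b:b, whereas inversion yields b:a.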
HfstTransducer & priority_union | ( | const HfstTransducer & | another | ) |
Make priority union of this transducer with another.
For the operation t1.priority_union(t2), the result is a union of t1 and t2, except that whenever t1 and t2 have the same string on the upper side, the path in t1 overrides the path in t2.
Example
Transducer 1 (t1): a : a b : b
Transducer 2 (t2): b : B c : C
Result ( t1.priority_union(t2) ): a : a b : b c : C
For more information, read: www.fsmbook.com
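The example above can be reproduced with a small model. The sketch assumes each transducer maps every upper-side string to exactly one lower-side string, so a std::map suffices. Mapping and priority_union are hypothetical names and no HFST call is involved.

```cpp
#include <map>
#include <string>

// Toy model (assumption): one output string per input string.
using Mapping = std::map<std::string, std::string>;

// Entries of t1 win; std::map::insert leaves keys that already exist untouched,
// so t2 only contributes the inputs that t1 does not cover.
Mapping priority_union(const Mapping& t1, const Mapping& t2) {
    Mapping result = t1;
    result.insert(t2.begin(), t2.end());
    return result;
}
```

With t1 = {a:a, b:b} and t2 = {b:B, c:C}, the result is {a:a, b:b, c:C}, matching the example.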
HfstTransducer & prune | ( | ) |
Make transducer coaccessible.
HfstTransducer & prune_alphabet | ( | bool | force = true | ) |
Remove all symbols that do not occur in transitions of the transducer from its alphabet.
If unknown or identity symbols occur in transitions of the transducer, pruning is not carried out by default.
force | Whether unused symbols are removed even if unknown or identity symbols occur in transitions. |
Epsilon, unknown and identity symbols are always included in the alphabet.
HfstTransducer & push_weights | ( | PushType | type | ) |
Push weights towards initial or final state(s) as defined by type.
If the HfstTransducer is of unweighted type (SFST_TYPE or FOMA_TYPE), nothing is done.
|
static |
Compile a lexc file in file filename into an HfstTransducer of type type and return the transducer.
HfstTransducer & remove_epsilons | ( | ) |
Remove all epsilon:epsilon transitions from the transducer so that the transducer remains equivalent.
void remove_from_alphabet | ( | const std::string & | symbol | ) |
Remove symbol from the alphabet of the transducer. CURRENTLY NOT IMPLEMENTED.
HfstTransducer & repeat_n | ( | unsigned int | n | ) |
A concatenation of n transducers.
HfstTransducer & repeat_n_minus | ( | unsigned int | n | ) |
A concatenation of N transducers where N is any number from zero to n, inclusive.
HfstTransducer & repeat_n_plus | ( | unsigned int | n | ) |
A concatenation of N transducers where N is any number from n to infinity, inclusive.
HfstTransducer & repeat_n_to_k | ( | unsigned int | n, |
unsigned int | k | ||
) |
A concatenation of N transducers where N is any number from n to k, inclusive.
HfstTransducer & repeat_plus | ( | ) |
A concatenation of N transducers where N is any number from one to infinity.
HfstTransducer & repeat_star | ( | ) |
A concatenation of N transducers where N is any number from zero to infinity.
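For a transducer that accepts a single string, the bounded members of the repeat family can be illustrated with ordinary string concatenation. The helpers below are hypothetical free functions over std::string, not HFST calls. The unbounded variants (repeat_n_plus, repeat_plus, repeat_star) describe infinite sets and are therefore not enumerated here.

```cpp
#include <string>
#include <vector>

// The single string accepted by n concatenated copies.
std::string repeat_n(const std::string& s, unsigned int n) {
    std::string result;
    for (unsigned int i = 0; i < n; ++i)
        result += s;
    return result;
}

// All strings accepted for N copies, N from n to k inclusive.
// Returns an empty list when n > k.
std::vector<std::string> repeat_n_to_k(const std::string& s,
                                       unsigned int n, unsigned int k) {
    std::vector<std::string> result;
    for (unsigned int count = n; count <= k; ++count)
        result.push_back(repeat_n(s, count));
    return result;
}

// N from zero to n inclusive, i.e. the same as repeat_n_to_k(s, 0, n).
std::vector<std::string> repeat_n_minus(const std::string& s, unsigned int n) {
    return repeat_n_to_k(s, 0, n);
}
```

Note that zero copies yield the empty string, which is why repeat_n_minus and repeat_star always accept it.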
HfstTransducer & reverse | ( | ) |
Reverse the transducer.
A reversed transducer accepts the string "n(0) n(1) ... n(N)" iff the original transducer accepts the string "n(N) n(N-1) ... n(0)".
HfstTransducer & set_final_weights | ( | float | weight, |
bool | increment = false |
||
) |
Set the weights of all final states to weight. increment defines whether the old weight is incremented by weight or overwritten.
If the HfstTransducer is of unweighted type (SFST_TYPE or FOMA_TYPE), nothing is done.
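The effect of increment can be shown on a bare list of final weights. This sketch assumes that "incremented" means ordinary addition of weight to the old value. The free function below is a hypothetical stand-in, not the member function itself.

```cpp
#include <vector>

// Toy model (assumption): only the final-state weights are tracked.
std::vector<float> set_final_weights(std::vector<float> final_weights,
                                     float weight, bool increment = false) {
    for (auto& w : final_weights)
        w = increment ? w + weight : weight;  // add to, or overwrite, the old weight
    return final_weights;
}
```

Starting from final weights 0.5 and 2.0, a weight of 1.0 gives 1.0 and 1.0 without increment, and 1.5 and 3.0 with it.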
void set_name | ( | const std::string & | name | ) |
Set the name of the transducer to name.
void set_property | ( | const std::string & | property, |
const std::string & | value | ||
) |
Set arbitrary string property property to value. set_property("name", value) is equivalent to set_name(value).
HfstTransducer & substitute | ( | bool(*)(const StringPair &sp, StringPairSet &sps) | func | ) |
Substitute each transition sp with the transitions sps, as defined by function func.
func | A pointer to a function that takes as its argument a StringPair sp and inserts to StringPairSet sps all StringPairs with which sp is to be substituted. Returns whether any substituting string pairs were inserted in sps, i.e. whether there is a need to perform substitution on transition sp. |
An example:
bool function(const StringPair &sp, StringPairSet &sps)
{
  // Only transitions whose input and output symbols are equal are substituted.
  if (sp.second.compare(sp.first) != 0)
    return false;

  std::string isymbol = sp.first;
  std::string osymbol;

  if (sp.second.compare("a") == 0 ||
      sp.second.compare("o") == 0 ||
      sp.second.compare("u") == 0)
    osymbol = std::string("<back_vowel>");
  if (sp.second.compare("e") == 0 ||
      sp.second.compare("i") == 0)
    osymbol = std::string("<front_vowel>");

  // Symbols that are not vowels are left untouched.
  if (osymbol.empty())
    return false;

  sps.insert(StringPair(isymbol, osymbol));
  return true;
}

...

// For all transitions in transducer t whose input and output vowels
// are equivalent, substitute the output vowel with a symbol that defines
// whether the vowel in question is a front or back vowel.
t.substitute(&function);
HfstTransducer & substitute | ( | const std::string & | old_symbol, |
const std::string & | new_symbol, | ||
bool | input_side = true , |
||
bool | output_side = true |
||
) |
Substitute all transition symbols equal to old_symbol with symbol new_symbol. input_side and output_side define whether the substitution is made on input and output sides.
old_symbol | Symbol to be substituted. |
new_symbol | The substituting symbol. |
input_side | Whether the substitution is made on the input side of a transition. |
output_side | Whether the substitution is made on the output side of a transition. |
The transition weights remain the same.
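How input_side and output_side interact can be seen on a toy transition list. The sketch does not use HFST: Transition is a hypothetical struct and substitute below is a free function that only mirrors the documented behaviour, including the unchanged weights.

```cpp
#include <string>
#include <vector>

// Toy model (assumption): a transition is a symbol pair plus a weight.
struct Transition {
    std::string input;
    std::string output;
    float weight;
};

std::vector<Transition> substitute(std::vector<Transition> transitions,
                                   const std::string& old_symbol,
                                   const std::string& new_symbol,
                                   bool input_side = true,
                                   bool output_side = true) {
    for (auto& t : transitions) {
        if (input_side && t.input == old_symbol)
            t.input = new_symbol;
        if (output_side && t.output == old_symbol)
            t.output = new_symbol;
        // t.weight is deliberately left as it was.
    }
    return transitions;
}
```

With input_side = true and output_side = false, a transition a:a becomes x:a when "a" is substituted with "x", while b:a is unaffected.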
HfstTransducer & substitute | ( | const StringPair & | old_symbol_pair, |
const StringPair & | new_symbol_pair | ||
) |
Substitute all transition symbol pairs equal to old_symbol_pair with new_symbol_pair.
The transition weights remain the same.
Implemented only for TROPICAL_OPENFST_TYPE and LOG_OPENFST_TYPE. If this function is called by an unweighted HfstTransducer, it is converted to a weighted one, substitution is made and the transducer is converted back to the original format.
HfstTransducer & substitute | ( | const StringPair & | old_symbol_pair, |
const StringPairSet & | new_symbol_pair_set | ||
) |
Substitute all transitions equal to old_symbol_pair with a set of transitions equal to new_symbol_pair_set.
The weight of the original transition is copied to all new transitions.
Implemented only for TROPICAL_OPENFST_TYPE and LOG_OPENFST_TYPE. If this function is called by an unweighted HfstTransducer (SFST_TYPE or FOMA_TYPE), it is converted to TROPICAL_OPENFST_TYPE, substitution is done and it is converted back to the original format.
HfstTransducer & substitute | ( | const HfstSymbolSubstitutions & | substitutions | ) |
Substitute all transition symbols as defined in substitutions.
Each symbol old_symbol is substituted with symbol new_symbol iff substitutions maps old_symbol to new_symbol, i.e. substitutions.find(old_symbol) != substitutions.end() and the value found is new_symbol. Otherwise, old_symbol remains the same.
This function performs all substitutions at the same time, so it is more efficient than calling substitute separately for each substitution.
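Performing the substitutions at the same time also differs from a sequence of single substitutions whenever the mappings overlap. The sketch below models the substitutions as a std::map<std::string, std::string> (an assumption about the shape of HfstSymbolSubstitutions) and applies it to a plain list of symbols. substitute_all is a hypothetical helper.

```cpp
#include <map>
#include <string>
#include <vector>

// Assumption: substitutions behave like a map from old symbol to new symbol.
using Substitutions = std::map<std::string, std::string>;

// Each symbol is looked up once in the original mapping, so a replacement
// is never itself replaced again.
std::vector<std::string> substitute_all(std::vector<std::string> symbols,
                                        const Substitutions& substitutions) {
    for (auto& symbol : symbols) {
        auto it = substitutions.find(symbol);
        if (it != substitutions.end())
            symbol = it->second;
    }
    return symbols;
}
```

With the mapping {a -> b, b -> a}, the symbols a b c become b a c. Substituting a with b first and then b with a in two separate passes would instead produce a a c.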
HfstTransducer & substitute | ( | const HfstSymbolPairSubstitutions & | substitutions | ) |
Substitute all transition symbol pairs as defined in substitutions.
Each symbol pair old_isymbol:old_osymbol is substituted with symbol pair new_isymbol:new_osymbol iff substitutions maps old_isymbol:old_osymbol to new_isymbol:new_osymbol, i.e. substitutions.find(old_isymbol:old_osymbol) != substitutions.end() and the value found is new_isymbol:new_osymbol. Otherwise, old_isymbol:old_osymbol remains the same.
This function performs all substitutions at the same time, so it is more efficient than calling substitute separately for each substitution.
HfstTransducer & substitute | ( | const StringPair & | symbol_pair, |
HfstTransducer & | transducer, | ||
bool | harmonize = true |
||
) |
Substitute all transitions equal to symbol_pair with a copy of transducer transducer.
A copy of transducer is attached (using epsilon transitions) between the source and target states of the transition to be substituted. The weight of the original transition is copied to the epsilon transition leaving from the source state.
Implemented only for TROPICAL_OPENFST_TYPE and LOG_OPENFST_TYPE. If this function is called by an unweighted HfstTransducer (SFST_TYPE or FOMA_TYPE), it is converted to TROPICAL_OPENFST_TYPE, substitution is done and it is converted back to the original format.
HfstTransducer & subtract | ( | const HfstTransducer & | another, |
bool | harmonize = true |
||
) |
Subtract transducer another from this transducer.
HfstTransducer & transform_weights | ( | float(*)(float) | func | ) |
Transform all transition and state weights as defined in func.
func | A pointer to a function that takes a weight as its argument and returns a weight that will be the new value of the weight given as the argument. |
An example:
float func(float f) {
  return 2*f + 0.5;
}

...

// All transition and final weights are multiplied by two and summed with 0.5.
transducer.transform_weights(&func);
If the HfstTransducer is of unweighted type (SFST_TYPE or FOMA_TYPE), nothing is done.
|
static |
Create universal pair transducer of type.
The transducer has only one state, and it accepts: Identity:Identity, Unknown:Unknown, Unknown:Epsilon and Epsilon:Unknown
Transducer weight is 0.
void write_in_att_format | ( | FILE * | ofile, |
bool | write_weights = true |
||
) | const |
Write the transducer in AT&T format to FILE ofile. write_weights defines whether weights are written.
The fields in the resulting AT&T format are separated by tabulator characters.
NOTE: If the transition symbols contain space characters, the spaces are printed as "@_SPACE_@" because whitespace characters are used as field separators in AT&T format. Epsilon symbols are printed as "@0@".
If several transducers are written in the same file, they must be separated by a line of two consecutive hyphens "--", so that they will be read correctly by HfstTransducer(FILE*, ImplementationType, const std::string&).
An example:
ImplementationType type = FOMA_TYPE;
HfstTransducer foobar("foo","bar",type);
HfstTransducer epsilon("@_EPSILON_SYMBOL_@",type);
HfstTransducer empty(type);
HfstTransducer a_star("a",type);
a_star.repeat_star();

FILE * ofile = fopen("testfile.att", "wb");
foobar.write_in_att_format(ofile);
fprintf(ofile, "--\n");
epsilon.write_in_att_format(ofile);
fprintf(ofile, "--\n");
empty.write_in_att_format(ofile);
fprintf(ofile, "--\n");
a_star.write_in_att_format(ofile);
fclose(ofile);
This will yield a file "testfile.att" that looks as follows:
0 1 foo bar 0.0
1 0.0
--
0 0.0
--
--
0 0.0
0 0 a a 0.0
Exceptions: StreamCannotBeWrittenException, StreamIsClosedException
See also: operator<<(std::ostream &out, const HfstTransducer &t), HfstTransducer(FILE*, ImplementationType, const std::string&)
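The escaping rules in the note above can be sketched as a small formatter for one transition line. This is not the HFST writer: att_symbol and att_transition_line are hypothetical helpers, and printing the weight with one decimal is an assumption chosen only to match the sample file.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Escape a symbol for AT&T output: epsilon becomes "@0@",
// and each space becomes "@_SPACE_@".
std::string att_symbol(const std::string& symbol) {
    if (symbol == "@_EPSILON_SYMBOL_@")
        return "@0@";
    std::string escaped;
    for (char c : symbol) {
        if (c == ' ')
            escaped += "@_SPACE_@";
        else
            escaped += c;
    }
    return escaped;
}

// One tab-separated transition line: source, target, input, output[, weight].
std::string att_transition_line(unsigned int source, unsigned int target,
                                const std::string& input,
                                const std::string& output,
                                float weight, bool write_weights = true) {
    std::ostringstream line;
    line << source << '\t' << target << '\t'
         << att_symbol(input) << '\t' << att_symbol(output);
    if (write_weights)
        line << '\t' << std::fixed << std::setprecision(1) << weight;  // precision is an assumption
    return line.str();
}
```

Because fields are separated by tabs and symbols never contain a raw space after escaping, a reader can split each line on whitespace without ambiguity.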
void write_in_att_format | ( | const std::string & | filename, |
bool | write_weights = true |
||
) | const |
Write the transducer in AT&T format to the file named filename. write_weights defines whether weights are written.
If the file exists, it is overwritten. If the file does not exist, it is created.
|
friend |
Write transducer t in AT&T format to ostream out.
The same as hfst::HfstTransducer::write_in_att_format(FILE*, bool) const with ostreams. Weights are written if the type of t is weighted.