Using HFST in Python

HFST (Helsinki Finite-State Transducer Technology) is a neat tool for modelling the morphology of languages computationally. The problem is that its Python API is currently under-documented. But fear not: in this post you will learn how to load optimised lookup files in Python and use them to analyse and generate word forms. 😃

Update: You might be interested in an easier way of using FSTs including Omorfi

Install HFST

Installing HFST is as simple as running sudo pip install hfst in a terminal. For macOS and Windows, there are prebuilt versions that pip installs automatically. If, however, you are using Linux, you might need to install some dependencies first: sudo apt-get install build-essential libreadline6-dev python-dev
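
Once pip has finished, it can be handy to confirm that Python actually sees the package before going any further. The snippet below is only a small sketch: the helper name is_installed is made up for this post, and it relies on importlib.util.find_spec from the standard library, so it assumes Python 3.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module with this name can be located."""
    # find_spec returns None when the module cannot be found on sys.path
    return importlib.util.find_spec(module_name) is not None


# Reports whether the hfst package is visible to this interpreter
print("hfst available:", is_installed("hfst"))
```

If this reports False on Linux, the missing build dependencies mentioned above are a likely place to look.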

Using transducers

You can use the following code to load and use an HFST transducer.

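import hfst

# Path to an optimised lookup (.hfstol) transducer file
filename = "/path/to/your_lookup.hfstol"
input_stream = hfst.HfstInputStream(filename)
analyser = input_stream.read()  # read the first transducer in the file
input_stream.close()
# print() with a single argument works on both Python 2 and Python 3
print(analyser.lookup("word_you_want_to_lookup"))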
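
The raw return value of lookup is not very pleasant to read, so here is a hedged sketch of how the results could be tidied up. It assumes that lookup returns a sequence of (output, weight) pairs, which is what I have seen with the default settings; check this against your own transducer. The helper names load_transducer and format_results are invented for this example and are not part of the hfst package.

```python
def load_transducer(path):
    """Read the first transducer from an HFST file (hypothetical helper)."""
    import hfst  # imported here so format_results works without hfst installed

    stream = hfst.HfstInputStream(path)
    try:
        return stream.read()
    finally:
        stream.close()


def format_results(results):
    """Sort (output, weight) pairs by weight and render each as a tab-separated line.

    Assumption: results is a sequence of (output, weight) pairs.
    """
    ranked = sorted(results, key=lambda pair: pair[1])
    return ["{}\t{}".format(output, weight) for output, weight in ranked]
```

You could then write something like: for line in format_results(load_transducer(filename).lookup("word_you_want_to_lookup")): print(line). Lower weights come first, which is usually what you want when the transducer is weighted.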


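What is more, the same script can be used both to analyse and to generate: an analyser file maps word forms to analyses, while a generator file maps analyses back to word forms. You just need to change the HFST lookup file and you are good to go. 🙂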

If you need ready-made transducer files for testing, you can, for example, download a prebuilt version of Omorfi for Finnish. Try omorfi-omor.analyse.hfst and omorfi-omor.generate.hfst, located under /usr/local/share/hfst/fi/ or C:\omorfi\hfst\fi depending on your operating system.
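
To illustrate the analyse-then-generate idea with two such files, here is a minimal sketch. The function round_trip is a hypothetical helper written for this post; it only assumes that both objects have a lookup method returning (output, weight) pairs. Whether the analyser's output strings are accepted as-is by the generator depends on the transducer pair, so treat that as an assumption to verify with your own files.

```python
def round_trip(analyser, generator, word):
    """Analyse a word, then feed every analysis to the generator.

    Returns a dict mapping each analysis string to the list of word
    forms the generator produced for it. Assumes lookup() yields
    (output, weight) pairs on both transducers.
    """
    forms = {}
    for analysis, _weight in analyser.lookup(word):
        forms[analysis] = [form for form, _w in generator.lookup(analysis)]
    return forms
```

With the Omorfi files, you would load omorfi-omor.analyse.hfst as the analyser and omorfi-omor.generate.hfst as the generator, using hfst.HfstInputStream exactly as in the script above, and then call round_trip(analyser, generator, some_word).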

Conclusion

It's not too difficult to use HFST nowadays, especially now that it is finally available on PyPI, so pip can install it, and it is compatible with both Python 2 and Python 3. For further information about the tool, consult the project's GitHub repository. Thanks for reading. 😊