Turku dependency parsers, both the statistical and the neural one, are without doubt among the most important NLP tools developed for Finnish in recent years. Without them, doing NLP for Finnish would be extremely difficult. This post explains how to use them to parse Finnish easily from your Python code. 🐍
If you are interested in Finnish morphology rather than full parsing, you should probably read my post about Omorfi. If you want to generate syntactically correct Finnish automatically, you should learn about Syntax Maker.
Setting up the parser
Before diving into the Python code, we first have to set up the actual dependency parser. This step requires installing Docker and setting up either the statistical or the neural parser. At this point in time (early 2019), I would strongly recommend the older statistical parser. Why? Simply because it is much faster than the neural one. Maybe the neural parser will catch up in speed in the near future and the situation will change. 😁
1 Install Docker
If you are on Windows or Mac, download and install Docker Desktop. On Mac, you will need to start Docker after installation so that it can install its command line tools.
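If you want to confirm from Python that the Docker command line tools are available (for example after starting Docker Desktop on a Mac), a minimal sketch like the one below can help. It only looks for a `docker` executable on your PATH and asks it for its version; it does not check that the Docker daemon itself is running. The helper names are hypothetical.

```python
import shutil
import subprocess


def docker_cli_path():
    """Return the path of the docker executable, or None if it is not on PATH."""
    return shutil.which("docker")


def docker_version():
    """Return the output of `docker --version`, or None if Docker is missing."""
    path = docker_cli_path()
    if path is None:
        return None
    # `docker --version` only queries the CLI, so it works without the daemon
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    return result.stdout.strip() or None


if __name__ == "__main__":
    print(docker_version() or "Docker command line tools not found")
```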
2 Statistical parser
Once you have Docker installed on your system, you only need to run the following command to get the Turku Finnish Dependency Parser running.
docker run -d -p 0.0.0.0:9876:9876 kazhar/finnish-dep-parser
The command downloads and sets up the statistical parser and starts running it in the background.
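Because the container runs in the background, it is not immediately obvious whether it is up. The sketch below simply tries to open a TCP connection to port 9876 on your machine, the host port published by the command above. An open port does not by itself guarantee that the parser inside the container is ready to answer requests; `parser_is_listening` is a hypothetical helper, not part of any library.

```python
import socket


def parser_is_listening(host="localhost", port=9876, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("Parser port open:", parser_is_listening())
```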
3 Neural parser
To use the Turku Neural Parser Pipeline, run the following command in a terminal.
docker run -d -p 9876:7689 turkunlp/turku-neural-parser:finnish-cpu-plaintext-server
The command takes care of setting up the neural parser for you and starts running it in the background.
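Both commands use Docker's `-p` option, written as `[ip:]hostPort:containerPort`. The statistical parser is published on port 9876 on all interfaces, while the neural server, which judging by the mapping listens on port 7689 inside its container, is mapped to port 9876 on the host. Either way the parser ends up on the same host port, presumably so that UralicNLP can reach whichever one you run at the same address. The simplified helper below (hypothetical, handling only the two forms used in this post) makes the mapping explicit.

```python
def parse_publish_spec(spec):
    """Split a simplified `docker run -p` spec into (ip, host_port, container_port).

    Only "hostPort:containerPort" and "ip:hostPort:containerPort" are handled;
    IPv6 addresses, port ranges and "/udp" suffixes are not supported.
    """
    parts = spec.split(":")
    if len(parts) == 2:
        ip, host_port, container_port = None, parts[0], parts[1]
    elif len(parts) == 3:
        ip, host_port, container_port = parts
    else:
        raise ValueError(f"unsupported publish spec: {spec!r}")
    return ip, int(host_port), int(container_port)


print(parse_publish_spec("0.0.0.0:9876:9876"))  # ('0.0.0.0', 9876, 9876)
print(parse_publish_spec("9876:7689"))  # (None, 9876, 7689)
```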
Parse Finnish in Python
Now that the dependency parser is installed and running in the background, we can use UralicNLP to parse Finnish text. First, install the UralicNLP Python library.
pip install uralicNLP
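If you have several Python installations, it is easy to install the library into a different one than the one you run your code with. A quick check like the following, run with the same interpreter as your parsing code, tells you whether `uralicNLP` is importable there; `is_installed` is a hypothetical helper built on the standard library's `importlib.util.find_spec`.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be imported in this environment."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_installed("uralicNLP"):
        print("uralicNLP is installed")
    else:
        print("uralicNLP not found; run: pip install uralicNLP")
```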
Once it is installed, you can use it to parse Finnish text. This covers tokenization, POS tagging, morphological tagging, lemmatization and dependency parsing. Just try out the following code.
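from uralicNLP import dependency

# Assumes one of the Docker parsers set up above is running in the background
ud = dependency.parse_text("kissa nauroi kovaa\nLehmä lauloi ainiaan", "fin")
for sentence in ud:
    for word in sentence:
        # Part-of-speech tag, lemma and dependency relation of each word
        print(word.pos, word.lemma, word.get_attribute("deprel"))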
>>NOUN kissa nsubj
>>VERB nauraa root
>>ADJ kova obj
>>NOUN lehmä nsubj
>>VERB laulaa root
>>ADV ainiaan advmod
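If you want to work with the results in your own code rather than just print them, you can collect them into plain Python tuples. The sketch below relies only on what the example above already uses (iterating the result yields sentences, iterating a sentence yields words, and each word has `pos`, `lemma` and `get_attribute("deprel")`). The helper names `parse_to_rows` and `lemmas_with_relation` are hypothetical and not part of UralicNLP.

```python
def parse_to_rows(ud):
    """Flatten a parsed result into one list of (pos, lemma, deprel) per sentence."""
    return [
        [(word.pos, word.lemma, word.get_attribute("deprel")) for word in sentence]
        for sentence in ud
    ]


def lemmas_with_relation(rows, deprel):
    """Return, in order, every lemma that carries the given relation, e.g. "nsubj"."""
    return [
        lemma
        for sentence in rows
        for (_pos, lemma, rel) in sentence
        if rel == deprel
    ]
```

With the two example sentences, `lemmas_with_relation(parse_to_rows(ud), "nsubj")` should, judging by the printed output above, give `['kissa', 'lehmä']`.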
The result is returned as a UD_collection object. To learn more, read the documentation on using UD-formatted data with UralicNLP.
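UD data is commonly exchanged in the tab-separated CoNLL-U format, with ten columns per word: ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS and MISC. To illustrate what the HEAD and DEPREL columns encode, here is a minimal reader written from scratch. It is not how UralicNLP processes the data internally, and `read_conllu`, `root_subjects` and the hand-written sample sentence (modelled on the output above, with features left out) are assumptions for illustration only.

```python
from collections import namedtuple

# The ten CoNLL-U columns defined by the Universal Dependencies project
Token = namedtuple("Token", "id form lemma upos xpos feats head deprel deps misc")


def read_conllu(text):
    """Parse CoNLL-U text into a list of sentences, each a list of Tokens.

    Comment lines, multiword-token ranges ("1-2") and empty nodes ("1.1")
    are skipped to keep the sketch short.
    """
    sentences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:  # a blank line ends a sentence
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue
        cols[0], cols[6] = int(cols[0]), int(cols[6])  # ID and HEAD as integers
        current.append(Token(*cols))
    if current:
        sentences.append(current)
    return sentences


def root_subjects(sentence):
    """Return (subject lemma, root lemma) pairs for nsubj words attached to the root."""
    by_id = {tok.id: tok for tok in sentence}
    pairs = []
    for tok in sentence:
        head = by_id.get(tok.head)  # HEAD 0 means "no head", i.e. the root itself
        if tok.deprel == "nsubj" and head is not None and head.deprel == "root":
            pairs.append((tok.lemma, head.lemma))
    return pairs


# Hand-written sample, not actual parser output
sample = "\n".join([
    "# text = kissa nauroi kovaa",
    "1\tkissa\tkissa\tNOUN\t_\t_\t2\tnsubj\t_\t_",
    "2\tnauroi\tnauraa\tVERB\t_\t_\t0\troot\t_\t_",
    "3\tkovaa\tkova\tADJ\t_\t_\t2\tobj\t_\t_",
])
print(root_subjects(read_conllu(sample)[0]))  # [('kissa', 'nauraa')]
```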
If you use the tools described in this post in an academic publication, please remember to cite them accordingly:
Hämäläinen, M. (2019). UralicNLP: An NLP Library for Uralic Languages. Journal of Open Source Software, 4(37). https://doi.org/10.21105/joss.01345
Kanerva, J., Ginter, F., Miekka, N., Leino, A., & Salakoski, T. (2018, October). Turku Neural Parser Pipeline: An end-to-end system for the CoNLL 2018 Shared Task. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies (pp. 133-142).
Haverinen, K., Nyblom, J., Viljanen, T., Laippala, V., Kohonen, S., Missilä, A., Ojala, S., Salakoski, T., & Ginter, F. (2014). Building the essential resources for Finnish: the Turku Dependency Treebank. Language Resources and Evaluation, 48(3), 493-531.