Parsing Finnish Syntax in Python Easily

The Turku dependency parsers, both the statistical and the neural one, are no doubt among the most important recent NLP tools developed for Finnish. Without them, doing NLP for Finnish would be extremely difficult. This post explains how to use them to parse Finnish easily from your Python code. 🐍

If you are interested in Finnish morphology rather than full parsing, you should probably read my post about Omorfi. For automatically generating syntactically well-formed Finnish, you should learn about Syntax Maker.

Setting up the parser

Before diving into the Python code, we first have to set up the actual dependency parser. This step requires installing Docker and setting up either the statistical or the neural parser. At this point in time (early 2019), I would strongly recommend the old statistical parser. Why? Simply because it is so much faster than the neural one. Maybe in the near future the neural parser will catch up in speed and the situation will be different. 😁

1 Install Docker

For Windows and Mac users, download and install Docker Desktop. On Mac, you will need to start Docker after installation so that it can install command line tools.

Linux users will have to configure a new repository and install Docker through the system's package manager. Consult the instructions for Ubuntu, Debian or Fedora.

2 Statistical parser

Once you have Docker installed on your system, you will only need to run the following command to get the Turku Finnish Dependency Parser running.

The command will download and set up the statistical parser and start running it in the background.

3 Neural parser

To use the Turku Neural Parser Pipeline, you will need to run the following command in a terminal.

The command will do everything needed to set the neural parser up. It will also start running it in the background.
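Whichever parser you chose, it can be handy to confirm from Python that the container is actually accepting connections before you try to parse anything. The following is only a minimal sketch using the standard library: the function name `parser_is_reachable` is my own, and the host and port values are hypothetical placeholders, so replace them with whatever port you published in your `docker run` command. Note that it only checks that a TCP connection can be opened, not that the parser has finished loading its models.

```python
import socket


def parser_is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections at host:port."""
    try:
        # create_connection raises OSError if the connection is refused
        # or does not succeed within the timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # HYPOTHETICAL values: use the host port you mapped when starting Docker.
    HOST, PORT = "localhost", 15000
    print("parser reachable:", parser_is_reachable(HOST, PORT))
```

If this prints `False` right after starting the container, give it a moment and try again before assuming something is wrong.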

Parse Finnish in Python

Now that the dependency parser is installed and running in the background, we can use UralicNLP to parse Finnish text. First, install the UralicNLP Python library.

Once it has been installed, you can use it to parse Finnish text. This includes tokenization, POS tagging, morphological tagging, lemmatization and dependency parsing. Just try out the following code.

The result will be returned as a UD_collection object. To go further, read the documentation on using UD formatted data with UralicNLP.
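To get a feel for what UD formatted data contains, here is a small stand-alone sketch that reads the standard ten-column CoNLL-U layout with plain Python. It does not use UralicNLP at all and says nothing about how UD_collection works internally. The function name `read_conllu` is my own, and the sample sentence is written by hand for illustration, so it is not real parser output.

```python
from typing import Dict, List

# The ten CoNLL-U columns, in order, as defined by Universal Dependencies.
CONLLU_FIELDS = ["id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc"]

# Hand-written sample for illustration only; this is NOT real parser output.
SAMPLE = (
    "# text = Kissa nauroi.\n"
    "1\tKissa\tkissa\tNOUN\t_\tCase=Nom|Number=Sing\t2\tnsubj\t_\t_\n"
    "2\tnauroi\tnauraa\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
    "\n"
)


def read_conllu(text: str) -> List[List[Dict[str, str]]]:
    """Split CoNLL-U text into sentences, each a list of token dicts."""
    sentences: List[List[Dict[str, str]]] = []
    current: List[Dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
        elif line.startswith("#"):
            continue  # sentence-level comment such as "# text = ..."
        else:
            columns = line.split("\t")
            if len(columns) == len(CONLLU_FIELDS):
                current.append(dict(zip(CONLLU_FIELDS, columns)))
    if current:
        sentences.append(current)
    return sentences


if __name__ == "__main__":
    for sentence in read_conllu(SAMPLE):
        for token in sentence:
            print(token["form"], token["lemma"], token["upos"], token["deprel"])
        print("---")
```

Each token carries its surface form, lemma, part of speech and dependency relation, and the `head` column points to the `id` of the governing word (0 for the root).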

Cite

If you use the tools described in this post in an academic publication, please remember to cite them accordingly:

Hämäläinen, M. (2019). UralicNLP: An NLP Library for Uralic Languages. Journal of Open Source Software, 4(37), 1345. https://doi.org/10.21105/joss.01345

Kanerva, J., Ginter, F., Miekka, N., Leino, A., & Salakoski, T. (2018, October). Turku neural parser pipeline: An end-to-end system for the CoNLL 2018 shared task. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies (pp. 133-142).

Haverinen, K., Nyblom, J., Viljanen, T., Laippala, V., Kohonen, S., Missilä, A., Ojala, S., Salakoski, T., & Ginter, F. (2014). Building the essential resources for Finnish: the Turku Dependency Treebank. Language Resources and Evaluation, 48(3), 493-531.
