NLP

Poetry and a cup of coffee

Poetry is a complex genre of literature, and it is often something that we wouldn't expect computers to be able to understand. Poems play with language by intentionally breaking its structure and by expressing meaning through figurative devices such as metaphors. Much of the expressive power of poetry lies in its ability to evoke certain imagery and emotions in the reader. 😌 As it turns out, it is possible to analyze Finnish poetry automatically in Python. 😃

The amazing murre Python library

When we Finns write online or message friends and family, we hardly ever resort to standard written Finnish (kirjakieli). Instead, we write as we would speak (in puhekieli), as the standardized spelling would make our communication sound a bit too official. Now, for computers, this is quite a challenge, as most of the datasets used for NLP tools consist of well-curated, normative text. 🤯 Luckily, as always, there's a solution for processing spoken Finnish text in Python. ☺️

text saying you got this

The Turku dependency parsers, both the statistical and the neural one, are no doubt among the most important recent NLP tools developed for Finnish. Without them, doing NLP for Finnish would be extremely difficult. This post explains how to use them easily to parse Finnish from your Python code. 🐍

How to install VISL CG3 on Mac, Windows and Linux

VISL CG3 is a neat tool for running constraint grammars (CGs) for tasks such as morphological disambiguation or syntactic parsing. Grammars in this formalism have been developed for a great many endangered Uralic languages, boosting NLP for them. And these CGs are easily available to Python programmers through UralicNLP.

Even with UralicNLP, a tool called vislcg3 needs to be installed on your machine, and that can be tricky if you cannot find the correct binaries for your operating system. That is why I wrote this guide.

How to use Omorfi for Finnish morphology

Omorfi is inarguably an amazing tool for processing Finnish morphology, both in analysis and in generation. However, using it can be quite a challenge for users who are not too (H)FST savvy. 😅 That is one of the motivations for my UralicNLP library, whose purpose is to provide an easy Python interface to a multitude of NLP tools for Uralic languages.
a pen and a syntactic tree

If you are interested in generating Finnish with a computer (NLG), you have probably already run into the problem of the complex morphology and syntax of Finnish. In addition to knowing how to inflect words, you have to take agreement into account. For example, the verb agrees with the subject in number and person: minä syön, sinä syöt and so on. And there's more: case government has to be handled too. A verb takes its direct object in a certain case; for example, you would say uneksin autosta but näen auton. Such is the problem of natural language generation. 🤷🏼‍♂️
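To make the two constraints concrete, here is a minimal toy sketch in plain Python. It is not how Syntax Maker works and does not use its API; every table and function name below (`SYODA_PRESENT`, `GOVERNMENT`, `clause_with_subject`, `clause_with_object`) is made up for illustration, and all the word forms are simply written out by hand.

```python
# Toy illustration only: hand-written forms, not the Syntax Maker API.

# Present-tense forms of "syödä" (to eat), keyed by (person, number).
SYODA_PRESENT = {
    (1, "sg"): "syön",
    (2, "sg"): "syöt",
    (3, "sg"): "syö",
    (1, "pl"): "syömme",
    (2, "pl"): "syötte",
    (3, "pl"): "syövät",
}

# Person and number of each subject pronoun.
PRONOUNS = {
    "minä": (1, "sg"),
    "sinä": (2, "sg"),
    "hän": (3, "sg"),
    "me": (1, "pl"),
    "te": (2, "pl"),
    "he": (3, "pl"),
}

# Case government: the case each verb requires of its object.
# "genitive" here means the total object, which looks like the
# genitive singular (auton).
GOVERNMENT = {"uneksia": "elative", "nähdä": "genitive"}

# First person singular forms of the two verbs, and the needed forms of "auto".
VERB_1SG = {"uneksia": "uneksin", "nähdä": "näen"}
AUTO_FORMS = {"elative": "autosta", "genitive": "auton"}


def clause_with_subject(pronoun):
    """Agreement: pick the verb form that matches the subject."""
    person_number = PRONOUNS[pronoun]
    return f"{pronoun} {SYODA_PRESENT[person_number]}"


def clause_with_object(verb):
    """Government: the verb decides the case of its object."""
    case = GOVERNMENT[verb]
    return f"{VERB_1SG[verb]} {AUTO_FORMS[case]}"


print(clause_with_subject("sinä"))    # sinä syöt
print(clause_with_object("uneksia"))  # uneksin autosta
print(clause_with_object("nähdä"))    # näen auton
```

A real generator cannot list forms by hand like this, of course; it needs a morphological generator for the inflection and a lexicon of verb valencies for the government.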

You are in luck: I have solved this problem and created a kick-ass Python library called Syntax Maker. Just for you, my friend, free to use. Are you ready to unleash the power of NLG? 😊😊

A green python ready to use HFST :-D

HFST (Helsinki Finite-State Transducer Technology) is a neat tool for modelling the morphology of languages computationally. The problem is that, currently, its Python API is under-documented. But fear not, in this post you will learn how to load optimised lookup files in Python and use them to analyse and generate word forms. 😃
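If analysis and generation sound abstract, the idea can be sketched without HFST at all. The toy below is only a stand-in for a transducer: a hand-written list of (surface form, analysis) pairs that can be looked up in either direction. The tag strings such as `+N+Sg+Gen` and the function names `analyse` and `generate` are assumptions made for this illustration, not the actual tagset or API of HFST or Omorfi.

```python
from collections import defaultdict

# Toy stand-in for a transducer: a hand-written relation between
# surface forms and analyses. Tags are illustrative only.
PAIRS = [
    ("kissa", "kissa+N+Sg+Nom"),
    ("kissan", "kissa+N+Sg+Gen"),
    ("kissassa", "kissa+N+Sg+Ine"),
    ("voi", "voi+N+Sg+Nom"),      # "butter"
    ("voi", "voida+V+Prs+Sg3"),   # "(he/she) can"
]

# Index the same relation in both directions.
SURFACE_TO_ANALYSES = defaultdict(list)
ANALYSIS_TO_SURFACES = defaultdict(list)
for surface, analysis in PAIRS:
    SURFACE_TO_ANALYSES[surface].append(analysis)
    ANALYSIS_TO_SURFACES[analysis].append(surface)


def analyse(word_form):
    """Return every analysis known for a surface form (may be ambiguous)."""
    return list(SURFACE_TO_ANALYSES.get(word_form, []))


def generate(analysis):
    """Return every surface form known for a lemma plus tags."""
    return list(ANALYSIS_TO_SURFACES.get(analysis, []))


print(analyse("voi"))              # two readings: noun and verb
print(generate("kissa+N+Sg+Ine"))  # ['kissassa']
print(analyse("koira"))            # [] because the toy lexicon lacks it
```

A real transducer encodes this kind of relation compactly for an entire language instead of listing pairs, which is why the same file can serve for both analysis and generation.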