Poetry is a complex genre of literature, and not something we would expect computers to be able to understand. Poems play with language by intentionally breaking its structure and by expressing meaning through figurative language such as metaphors. A lot of the expressive power of poetry lies in its ability to evoke certain imagery and certain emotions in the reader. 😌 As it turns out, it is possible to analyze Finnish poetry automatically in Python. 😃
When we Finns write online or message friends and family, we hardly ever resort to standard written Finnish (kirjakieli). Instead, we write as we would speak (in puhekieli), because the standardized spelling would make our communication sound a bit too official. For computers this is quite a challenge, as most of the datasets used for NLP tools consist of well-curated normative text. 🤯 Luckily, as always, there's a solution for processing spoken Finnish text in Python. ☺️
The Turku dependency parsers, both the statistical and the neural one, are no doubt among the most important recent NLP tools developed for Finnish. Without them, doing NLP for Finnish would be extremely difficult. This post explains how to use them easily to parse Finnish from your Python code. 🐍
VISL CG3 is a neat tool for running constraint grammars (CGs) for tasks such as morphological disambiguation or syntactic parsing. Grammars in this formalism have been developed for a great many endangered Uralic languages, giving their NLP a real boost. And these CGs are easily available to Python programmers through UralicNLP.
Even with UralicNLP, a tool called vislcg3 needs to be installed on your machine, and that can be tricky if you cannot find the correct binaries for your operating system. That is why I put together this guide.
If you are interested in generating Finnish with a computer (NLG), you have probably already run into the problem of the complex morphology and syntax of Finnish. In addition to knowing how to inflect words, you have to take agreement into account. For example, the verb agrees with the subject in number and person: minä syön, sinä syöt and so on. And there's more: case government has to be solved too. A verb takes its direct object in a certain case; for example, you would say uneksin autosta but näen auton. Such is the problem of natural language generation. 🤷🏼♂️
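To make the two problems concrete, here is a minimal toy sketch in plain Python. It is not how a real generator works: the tables, the `realize` function and the case labels are all made up for this illustration, the stems are typed in by hand, and only first and second person are covered (third person would need consonant gradation, e.g. näen but näkee).

```python
# Toy illustration only: hand-written tables, not a real morphological generator.
# All names here (PERSON_ENDINGS, VERBS, realize, ...) are hypothetical.

# Present-tense endings for 1st and 2nd person, keyed by (number, person).
PERSON_ENDINGS = {
    ("sg", 1): "n",
    ("sg", 2): "t",
    ("pl", 1): "mme",
    ("pl", 2): "tte",
}

# Subject pronouns with the number and person the verb must agree with.
PRONOUNS = {
    "minä": ("sg", 1),
    "sinä": ("sg", 2),
    "me": ("pl", 1),
    "te": ("pl", 2),
}

# Mini-lexicon: a hand-picked inflectional stem and the case the verb
# governs for its object ("gen" = genitive-looking total object, "ela" = elative).
VERBS = {
    "syödä": {"stem": "syö", "object_case": "gen"},
    "nähdä": {"stem": "näe", "object_case": "gen"},
    "uneksia": {"stem": "uneksi", "object_case": "ela"},
}

# Pre-inflected noun forms, looked up by case.
NOUN_FORMS = {
    "auto": {"gen": "auton", "ela": "autosta"},
}


def realize(subject, verb, obj=None):
    """Build a tiny clause with subject-verb agreement and case government."""
    number, person = PRONOUNS[subject]
    entry = VERBS[verb]
    # Agreement: the ending is chosen by the subject's number and person.
    words = [subject, entry["stem"] + PERSON_ENDINGS[(number, person)]]
    if obj is not None:
        # Government: the verb, not the noun, decides the object's case.
        words.append(NOUN_FORMS[obj][entry["object_case"]])
    return " ".join(words)


print(realize("minä", "syödä"))
print(realize("sinä", "syödä"))
print(realize("minä", "uneksia", "auto"))
print(realize("minä", "nähdä", "auto"))
```

Even this tiny version shows why lookup tables don't scale: every new verb needs its stem and governed case, and every new noun needs all of its case forms.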
HFST (Helsinki Finite-State Transducer Technology) is a neat tool for modelling the morphology of languages computationally. The problem is that the Python API is currently under-documented. But fear not, in this post you will learn how to load optimised lookup files in Python and use them to analyse and generate word forms. 😃