VISL CG3 is a neat tool for running constraint grammars (CGs) for tasks such as morphological disambiguation or syntactic parsing. Grammars in this formalism have been developed for a great many endangered Uralic languages, boosting NLP for them. And these CGs are easily available to Python programmers through UralicNLP.
Even with UralicNLP, a tool called vislcg3 needs to be installed on your machine. This can be tricky if you cannot find the correct binaries for your operating system, which is why I put together this guide.
Omorfi is inarguably an amazing tool for processing Finnish morphology, both for analysis and generation. However, using it can be quite a challenge for users who are not too (H)FST savvy. 😅 That is one of the motivations behind my UralicNLP library, whose purpose is to provide an easy Python interface to a multitude of NLP tools for Uralic languages.
Morphology is the study of morphemes, the smallest meaning-bearing units of human language. Inflected words can be divided into morphemes. For example, -ed in talked is a morpheme that adds past tense to the verb talk, -s in dogs pluralizes the noun, and so on. Morphemes that are attached to words in this way are known as affixes. There are different kinds of affixes, and in this post we are going to look at them more closely. 🤓
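To make the idea of splitting a word into a stem and an affix concrete, here is a deliberately naive Python sketch. It only strips suffixes (one kind of affix). The suffix inventory `SUFFIXES` and the helper `segment` are hypothetical and hand-picked for the examples above. Real analyzers, such as the HFST-based Omorfi, rely on full morphological lexicons rather than string matching.

```python
from typing import Optional, Tuple

# Hypothetical, hand-picked suffix inventory for illustration only.
SUFFIXES = {
    "ing": "PROGRESSIVE",
    "ed": "PAST",
    "s": "PLURAL",
}


def segment(word: str, min_stem: int = 3) -> Tuple[str, Optional[str]]:
    """Split off at most one known suffix, trying the longest suffixes first.

    A suffix is only stripped if at least `min_stem` characters remain,
    so that short words like "is" are not split into "i" + "-s".
    """
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)], suffix
    return word, None


if __name__ == "__main__":
    for word in ["talked", "dogs", "talk"]:
        stem, suffix = segment(word)
        if suffix is None:
            print(f"{word}: no known suffix")
        else:
            print(f"{word}: {stem} + -{suffix} ({SUFFIXES[suffix]})")
```

The string matching breaks down quickly. A word like "bus" would wrongly lose its final -s, which is exactly why morphological analysis needs a lexicon and not just a list of endings.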
If you have done language technology in a Nordic country, you have probably heard of Korp. And by now, you have probably developed some sort of love-hate relationship with it. My initial thought was: Korp is nice, but so what 🤷🏼♂️, I need to access it programmatically for it to be of any use. It doesn't help at all that the API description is somewhat hidden online and that not all Korp services disclose the URL of their API. 😩
Luckily, once again, yours truly has been typing in some code to make your life easier. 🤓 Behold, my very own Python library for querying Korp. 😊
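For a rough idea of what querying Korp programmatically involves at the HTTP level, here is a minimal standard-library sketch that only builds a request URL. It is not the library announced above. It assumes a Korp backend exposing a `query` endpoint that accepts `corpus`, `cqp`, `start` and `end` parameters. Paths and parameter names may differ between Korp versions and installations, so check the documentation of the instance you use. The base URL `https://korp.example.org/api`, the corpus ID `EXAMPLE_CORPUS` and the helper name `build_korp_query_url` are hypothetical placeholders.

```python
from urllib.parse import urlencode


def build_korp_query_url(base_url: str, corpora: list[str], cqp: str,
                         start: int = 0, end: int = 9) -> str:
    """Build (but do not send) a Korp concordance query URL.

    Assumption: the backend has a `query` endpoint taking `corpus`,
    `cqp`, `start` and `end`; verify this against your Korp instance.
    """
    params = {
        "corpus": ",".join(corpora),  # several corpus IDs, comma-separated
        "cqp": cqp,                   # the search expression in CQP syntax
        "start": start,               # start/end: the range of hits to return
        "end": end,
    }
    return base_url.rstrip("/") + "/query?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical base URL and corpus ID, for illustration only.
    url = build_korp_query_url("https://korp.example.org/api",
                               ["EXAMPLE_CORPUS"], '[word = "kissa"]')
    print(url)
```

Against a real instance, fetching the JSON result would then be a matter of `json.load(urllib.request.urlopen(url))`. The hard part, as noted above, is finding the right base URL in the first place.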
Oh, sarcasm, sarcasm. The thing that puzzles us so much. It takes some knowledge of a person to tell whether they are being sarcastic or not, regardless of how sarcastic we are ourselves. But is there any science behind it? As it turns out, there is, and I wrote my MA thesis about it in Spanish. But if you don't have time to read it, just read this post instead. 😅
If you are interested in generating Finnish with a computer (NLG), you have probably already run into the complex morphology and syntax of Finnish. In addition to knowing how to inflect words, you have to take agreement into account. For example, the verb agrees with the subject in number and person: minä syön, sinä syöt and so on. And there's more: case government has to be solved too. A verb requires its object or complement in a particular case. For example, you would say uneksin autosta (elative) but näen auton. Such is the problem of natural language generation. 🤷🏼♂️
You are in luck: I have solved this issue and created a kick-ass Python library called syntax maker. Just for you, my friend, free to use. Are you ready to unleash the power of NLG? 😊😊
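The two problems described in this excerpt, subject-verb agreement and case government, can be illustrated with a tiny hand-written Python sketch. This is not how syntax maker works internally. The tables `PRONOUNS`, `CONJUGATION`, `GOVERNMENT` and `NOUN_FORMS` and the helper `clause` are hypothetical. They list only the handful of forms needed for the examples above, whereas a real generator would produce inflected forms with a morphological generator instead of a lookup table.

```python
from typing import Optional

# Subject pronoun -> (person, number) that the verb must agree with.
PRONOUNS = {
    "minä": (1, "sg"), "sinä": (2, "sg"), "hän": (3, "sg"),
    "me": (1, "pl"), "te": (2, "pl"), "he": (3, "pl"),
}

# Hand-written present-tense forms; a real system would generate these.
CONJUGATION = {
    "syödä": {(1, "sg"): "syön", (2, "sg"): "syöt", (3, "sg"): "syö",
              (1, "pl"): "syömme", (2, "pl"): "syötte", (3, "pl"): "syövät"},
    "nähdä": {(1, "sg"): "näen", (2, "sg"): "näet", (3, "sg"): "näkee",
              (1, "pl"): "näemme", (2, "pl"): "näette", (3, "pl"): "näkevät"},
    "uneksia": {(1, "sg"): "uneksin", (2, "sg"): "uneksit", (3, "sg"): "uneksii",
                (1, "pl"): "uneksimme", (2, "pl"): "uneksitte", (3, "pl"): "uneksivat"},
}

# Case government: the case each verb requires of its object or complement.
GOVERNMENT = {"syödä": "accusative", "nähdä": "accusative", "uneksia": "elative"}

# Hand-written noun forms (the accusative here is the -n form).
NOUN_FORMS = {"auto": {"accusative": "auton", "elative": "autosta"}}


def clause(subject: str, verb: str, obj: Optional[str] = None) -> str:
    """Build a present-tense clause with agreement and case government."""
    verb_form = CONJUGATION[verb][PRONOUNS[subject]]  # agreement
    words = [subject, verb_form]
    if obj is not None:
        words.append(NOUN_FORMS[obj][GOVERNMENT[verb]])  # government
    return " ".join(words)


if __name__ == "__main__":
    print(clause("minä", "uneksia", "auto"))
    print(clause("minä", "nähdä", "auto"))
    print(clause("sinä", "syödä"))
```

Finnish usually drops first- and second-person subject pronouns (uneksin autosta), but the sketch always keeps them for simplicity. Scaling this approach beyond a few words is exactly the problem a dedicated library has to solve.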
Languages can be grouped together in different ways. One can group languages by their family relation (e.g. Uralic languages, Indo-European languages) or by the area where they are spoken. But maybe the most interesting and eye-opening way of grouping them is by their morphology. As it turns out, languages are traditionally divided into only four morphological groups, and all spoken languages fall into one of them.
When you are targeting an international audience and you have enough money to back your project up, the thing you have to do is localize your application. Thinking that everyone knows English is just naive. This is a general guide that shows how the process of localization works.
I have compiled a list of places where one can look for corpora. The list is not limited to a single language; instead, it covers multilingual resources.