Create Finnish sentences computationally in Python (NLG)

If you are interested in generating Finnish with a computer (NLG), you have probably already run into the problem of the complex morphology and syntax of Finnish. In addition to knowing how to inflect words, you have to take agreement into account. For example, the verb agrees with the subject in number and person: minä syön, sinä syöt and so on. And there's more: case government has to be solved too. A verb requires its complement to be in a certain case, so you would say uneksin autosta (I dream of a car) but näen auton (I see the car). Such is the problem of natural language generation. 🤷🏼‍♂️
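To make the two problems concrete, here is a toy sketch in plain Python. It is not how syntax maker works internally. Agreement is modelled as a lookup on the subject's person and number, and case government as a lookup on the verb. The tables and the helpers `agree` and `govern` are hypothetical and hand-written for only the words in the examples. A real system would get these forms from a morphological generator such as Omorfi.

```python
# Toy illustration only: hand-written tables, not syntax maker's API.

# Present-tense forms of "syödä" (to eat), keyed by (person, number).
SYODA_PRESENT = {
    (1, "SG"): "syön",
    (2, "SG"): "syöt",
    (3, "SG"): "syö",
    (1, "PL"): "syömme",
    (2, "PL"): "syötte",
    (3, "PL"): "syövät",
}

# Case that each verb governs on its complement (minimal hypothetical table).
GOVERNMENT = {
    "uneksia": "elative",   # uneksin autosta, "I dream of a car"
    "nähdä": "accusative",  # näen auton, "I see the car"
}

# Hand-written forms of "auto" (car) in the two cases needed here.
AUTO_FORMS = {"elative": "autosta", "accusative": "auton"}


def agree(person: int, number: str) -> str:
    """Pick the form of syödä that agrees with the subject."""
    return SYODA_PRESENT[(person, number)]


def govern(verb: str) -> str:
    """Inflect 'auto' in whatever case the given verb requires."""
    return AUTO_FORMS[GOVERNMENT[verb]]


print("minä", agree(1, "SG"))         # minä syön
print("sinä", agree(2, "SG"))         # sinä syöt
print("uneksin", govern("uneksia"))   # uneksin autosta
print("näen", govern("nähdä"))        # näen auton
```

Even this tiny case needs two separate tables per verb. That is why hand-coding Finnish generation does not scale.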

You're in luck: I have solved this problem and created a kick-ass Python library called syntax maker. Just for you, my friend, free to use. Are you ready to unleash the power of NLG? 😊😊

Go ahead and run pip install syntaxmaker to start using it. You will have to install Omorfi as well, for example by using these Omorfi binaries. For installation troubleshooting, see my HFST post and the syntaxmaker wiki. If you are only interested in morphological inflection, take a look at how to use Omorfi.

How to use it?

To understand how the library is supposed to be used, you have to understand the basics of phrases and heads. Take, for example, a syntactic analysis of a cat eats food: VP [ DP [ a NP [ cat ] ] eats NP [ food ] ]. Every word (or head) sits inside a phrase of its own part of speech: eats is the head of the VP (verb phrase), cat the head of the NP (noun phrase) and so on. The phrases are nested inside each other, which leaves the verb phrase at the root of the tree. Syntax maker works the same way: everything has to go under a verb phrase.
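The idea of heads inside nested phrases can be modelled in a few lines. The sketch below is a toy: `ToyPhrase` is a hypothetical class, not syntax maker's `Phrase`. It stores a head word, named sub-phrases, and an order list, and it rebuilds both the bracketed analysis above and the plain sentence.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ToyPhrase:
    """Hypothetical minimal phrase: a head word plus named, nested sub-phrases."""

    label: str  # phrase type, e.g. "VP", "NP", "DP"
    head: str   # the head word of the phrase
    components: Dict[str, "ToyPhrase"] = field(default_factory=dict)
    # Linear order of the parts; the special name "head" marks the head word.
    order: List[str] = field(default_factory=lambda: ["head"])

    def _parts(self, render: Callable[["ToyPhrase"], str]) -> List[str]:
        # Walk the order list, rendering sub-phrases recursively.
        return [self.head if name == "head" else render(self.components[name])
                for name in self.order]

    def words(self) -> str:
        """Flatten the tree into the surface string."""
        return " ".join(self._parts(lambda p: p.words()))

    def bracket(self) -> str:
        """Render the tree in labelled-bracket notation."""
        return f"{self.label} [ {' '.join(self._parts(lambda p: p.bracket()))} ]"


cat = ToyPhrase("NP", "cat")
a_cat = ToyPhrase("DP", "a", {"complement": cat}, ["head", "complement"])
food = ToyPhrase("NP", "food")
vp = ToyPhrase("VP", "eats", {"subject": a_cat, "object": food},
               ["subject", "head", "object"])

print(vp.words())    # a cat eats food
print(vp.bracket())  # VP [ DP [ a NP [ cat ] ] eats NP [ food ] ]
```

The VP is the only object the caller keeps. Everything else hangs off it, which mirrors the rule that everything goes under a verb phrase.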

To create the sentence kissat syövät ruokaa (cats eat food), you just need to build a structure like the one in the figure below.

[Figure: a syntactic tree for kissa syödä ruokaa]

This can be done with the following piece of code:

from syntaxmaker.syntax_maker import *
vp = create_verb_pharse("syödä") #creates a Phrase object that happens to be a VP

subject = create_phrase("NP", "kissa", {u"NUM": "PL"}) #creates an NP with the optional argument number
vp.components["subject"] = subject #assigns the subject NP to the VP

dir_object = create_phrase("NP", "ruoka")
vp.components["dir_object"] = dir_object

print(vp)
#output: kissat syövät ruokaa

The library can also build more complex sentences, such as ones with relative clauses. There are, however, a couple of things you should know about. First, when you create a phrase, you can pass the function a dictionary of morphological information, like {u"NUM": "PL"} above. The possible values are:

Second, every Phrase object has a structure made up of components, order, head and agreement. You can learn all about the possible phrase types and their structures in grammar.json. If you want to add phrases beyond the required ones, add a new component under any name you like and append that name to the order list as well.

Let's continue the example above by adding an adposition phrase to it:

np = create_phrase("NP", "käsi", {u"NUM": "PL"})
pp = create_adposition_phrase("ilman", np) #this phrase will be ilman käsiä

vp.components["adposition"] = pp #let's add the pp to the vp by a new name
vp.order.append("adposition") #also adds the new name to the order list, which sets the order in which the phrases appear
print(vp)
#output: kissat syövät ruokaa ilman käsiä
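Why is the append step needed? My reading is that the components dictionary says what is in a phrase, while the order list says what is printed and where. The toy sketch below models only that idea with plain strings. `realize` is a hypothetical helper, not part of syntax maker. It assumes that components missing from the order list are simply not output. Real Phrase objects also carry head and agreement information.

```python
# Hypothetical toy model: NOT syntax maker's data format, just the idea behind it.
def realize(head: str, components: dict, order: list) -> str:
    """Join the head and the named components in the sequence given by order."""
    return " ".join(head if name == "head" else components[name] for name in order)


components = {"subject": "kissat", "dir_object": "ruokaa"}
order = ["subject", "head", "dir_object"]

# Adding a component alone does not change the output...
components["adposition"] = "ilman käsiä"
before = realize("syövät", components, order)  # kissat syövät ruokaa

# ...it appears only once its name is also in the order list.
order.append("adposition")
after = realize("syövät", components, order)   # kissat syövät ruokaa ilman käsiä

print(before)
print(after)
```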

You can even shuffle the order of the phrases and exploit the free word order of Finnish:

from random import shuffle
shuffle(vp.order)
print(vp)
#output: ilman käsiä ruokaa syövät kissat, for instance

Note that this only shuffles the order of the phrases, not the words inside them. That's why ilman käsiä always appears together and in the correct order.
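The same effect can be reproduced without the library. The toy sketch below shuffles a list in which each item is already a whole realized phrase, so multi-word phrases never get split. The helper `shuffled_sentence` is hypothetical. It uses `random.Random` with an optional seed so runs can be reproduced; that choice belongs to this sketch and is not something syntax maker requires.

```python
import random
from typing import List, Optional

# Toy data: each item is one whole phrase, already realized as a string.
PHRASES = ["kissat", "syövät", "ruokaa", "ilman käsiä"]


def shuffled_sentence(units: List[str], seed: Optional[int] = None) -> str:
    """Permute whole phrases; the words inside each phrase keep their order."""
    rng = random.Random(seed)  # seeded for reproducible output in this sketch
    order = list(units)        # copy so the caller's list is not modified
    rng.shuffle(order)
    return " ".join(order)


print(shuffled_sentence(PHRASES, seed=1))
```

Unlike `shuffle(vp.order)` above, this helper works on a copy, so the original order is still available afterwards.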

More help with the NLG library?

You can find more instructions in the syntax maker wiki, or you can use the contact form on this site to ask me.

Conclusion

This tutorial on the NLG library has been a long time coming. I have put a decent amount of work into the library, so I wouldn't mind seeing people actually use it too. 😊

Cite

Hämäläinen, M. & Rueter, J., 2018. Development of an Open Source Natural Language Generation Tool for Finnish. In Proceedings of the Fourth International Workshop on Computational Linguistics of Uralic Languages. Stroudsburg: The Association for Computational Linguistics, pp. 51-58.