CL-WSD

A simple implementation that uses the Weka machine learning framework and a couple of other tools to solve Task 3 of SemEval 2010, Cross-lingual Word Sense Disambiguation.

I did not participate in the competition, since Prof. Roberto Navigli assigned me this project after the deadline had passed, but I used exactly the same information and data the participants had. Here is the code, without the corpora of course (they amount to a few GB of data).

The competition worked this way: a team could either choose a single language to be tested against English, or join the multilingual subtask (English versus all five other languages). A team could also choose to be evaluated as “best” or “out of five”. Click here for details.
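
To give a rough idea of how the two evaluation modes differ, here is a minimal Python sketch of a per-item score. It is only my reading of the scheme, not the official scorer: the function name `item_score`, the toy French gold counts, and the exact normalization are assumptions made for illustration. The idea is that in “best” mode the credit is divided by the number of guesses, while in “out of five” mode up to five guesses are accepted without that penalty.

```python
from collections import Counter


def item_score(guesses, gold, mode="best"):
    """Score the guesses for a single test sentence.

    guesses: list of translations proposed by the system.
    gold:    Counter mapping each gold translation to the number of
             annotators who chose it (toy data in this sketch).
    mode:    "best" divides the credit by the number of guesses;
             "oof" (out of five) keeps at most five guesses, no division.
    """
    if not guesses or not gold:
        return 0.0
    if mode == "oof":
        guesses = guesses[:5]
    unique = set(guesses)
    total = sum(gold.values())
    # A Counter returns 0 for translations that are not in the gold set.
    credit = sum(gold[g] for g in unique)
    if mode == "best":
        credit /= len(unique)
    return credit / total


# Hypothetical gold answers for one occurrence of "bank" (made up).
gold = Counter({"rive": 3, "bord": 2, "berge": 1})
best_single = item_score(["rive"], gold, mode="best")
best_double = item_score(["rive", "banque"], gold, mode="best")
oof_three = item_score(["rive", "bord", "banque"], gold, mode="oof")
```

With this toy data a second, wrong guess halves the “best” score, while in “out of five” mode extra guesses cost nothing.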

My program covers only the bilingual task (but could easily be extended to the multilingual one), and I reached good results on both the “best” and “out of five” evaluations.

Here I describe how to perform a GIZA/MGIZA word alignment using the Europarl parallel corpora for this task.
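
As an illustration of what a word alignment is useful for in this kind of task, the sketch below counts which target-language words are aligned to a given English word. It is not the code of this project. It assumes the alignments have already been converted to the common one-line "i-j" format (source index, target index), which is not the raw GIZA output, and the helper names and the two toy sentence pairs are made up.

```python
from collections import Counter


def parse_alignment(line):
    """Turn a line such as '0-0 1-2' into a list of (src, tgt) index pairs."""
    pairs = []
    for token in line.split():
        src, tgt = token.split("-")
        pairs.append((int(src), int(tgt)))
    return pairs


def count_translations(target, sentence_pairs):
    """Count the target-language words aligned to `target`.

    sentence_pairs: iterable of (source sentence, target sentence,
    alignment line) triples, with whitespace-tokenized sentences.
    """
    counts = Counter()
    for src_sent, tgt_sent, align_line in sentence_pairs:
        src_tokens = src_sent.lower().split()
        tgt_tokens = tgt_sent.lower().split()
        for s, t in parse_alignment(align_line):
            if src_tokens[s] == target:
                counts[tgt_tokens[t]] += 1
    return counts


# Toy English-French data, invented for this example.
toy_corpus = [
    ("the bank of the river", "la rive du fleuve", "0-0 1-1 2-2 3-2 4-3"),
    ("the bank closed", "la banque a fermé", "0-0 1-1 2-2 2-3"),
]
bank_translations = count_translations("bank", toy_corpus)
```

On a real corpus the resulting counts give, for each ambiguous English word, the candidate translations that can serve as sense labels.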
