CL-WSD

A simple implementation, using the Weka machine learning framework and a couple of other tools, that addresses Task 3 of SemEval 2010: Cross-lingual Word Sense Disambiguation.

I did not participate in the competition, since Prof. Roberto Navigli assigned this project to me after the deadline had passed, but I used exactly the same information and data that the participants had. Here is the code, without the corpora of course (they amount to a few GB of data).

The competition worked this way: a team could either choose a single target language to be tested against English (the bilingual subtask), or join the multilingual subtask (English versus all five other languages). A team could also choose to be evaluated as “best” or “out of five”. Click here for details.

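My program covers only the bilingual task (but could easily be extended to the multilingual one). On both the “best” and “out of five” evaluations its scores are close to the best SemEval results, and in six of the ten language/evaluation pairs listed below they are higher.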

Here I describe how to perform a GIZA/MGIZA word alignment on the Europarl parallel corpora for this task.

Best Evaluation

Italian:

  • max semeval 16.28
  • my result 17.1625

Spanish:

  • max semeval 23.42
  • my result 21.531

German:

  • max semeval 13.79
  • my result 13.571

French:

  • max semeval 21.96
  • my result 22.1715

Dutch:

  • max semeval 17.7
  • my result 17.07

Out of five Evaluation

Italian:

  • max semeval 40.7
  • my result 42.51

Spanish:

  • max semeval 43.12
  • my result 49.7

German:

  • max semeval 33.21
  • my result 33.9765

French:

  • max semeval 49.44
  • my result 48.984

Dutch:

  • max semeval 34.95
  • my result 36.2365
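
To make the comparison easier to read, here is a small sketch that computes the difference between my score and the best SemEval score for each language. The numbers are copied from the lists above; the names `BEST`, `OUT_OF_FIVE`, `deltas` and `wins` are only illustrative and are not part of the project code.

```python
# Scores copied from the lists above: (best SemEval score, my score).
BEST = {
    "Italian": (16.28, 17.1625),
    "Spanish": (23.42, 21.531),
    "German": (13.79, 13.571),
    "French": (21.96, 22.1715),
    "Dutch": (17.7, 17.07),
}
OUT_OF_FIVE = {
    "Italian": (40.7, 42.51),
    "Spanish": (43.12, 49.7),
    "German": (33.21, 33.9765),
    "French": (49.44, 48.984),
    "Dutch": (34.95, 36.2365),
}


def deltas(scores):
    """Return my score minus the best SemEval score, per language."""
    return {lang: round(mine - top, 4) for lang, (top, mine) in scores.items()}


def wins(scores):
    """Return the languages where my score is above the best SemEval score."""
    return sorted(lang for lang, d in deltas(scores).items() if d > 0)


if __name__ == "__main__":
    for name, scores in (("best", BEST), ("out of five", OUT_OF_FIVE)):
        for lang, d in deltas(scores).items():
            print(f"{name:12s} {lang:8s} {d:+.4f}")
        print(f"{name}: above the SemEval maximum for {', '.join(wins(scores))}")
```

A positive difference means my result is above the SemEval maximum for that language, a negative one means it is below.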