Named Entity Recognition in Tweets

Named-Entity Recognition (NER) concerns the classification of text segments into a predefined set of categories, such as persons, organizations and locations. State-of-the-art NER systems achieve very high performance on a narrow set of entity types in noise-free, grammatically well-structured documents. On Twitter, however, texts are short, written in an informal style and capitalized unreliably, which makes recognizing entities a challenging task.

The competition consists of identifying 13 types of entities (person, musicartist, organisation, geoloc, product, media, sportsteam, event, tvshow, movie, facility, transport line, other) in tweets. For example, the following phrase contains two types of entities. Note that entities may span several words.
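Because an entity may span several words, a common way to label entities token by token is BIO tagging. The first token of an entity gets `B-<type>`, each following token of the same entity gets `I-<type>`, and every other token gets `O`. The sketch below assumes that scheme, although this section does not say which tagging scheme the data uses. The function name `spans_to_bio` and the tweet are hypothetical and serve only as illustration.

```python
from typing import List, Tuple

# The 13 entity types of the challenge, spelled as in the task description.
ENTITY_TYPES = {
    "person", "musicartist", "organisation", "geoloc", "product", "media",
    "sportsteam", "event", "tvshow", "movie", "facility", "transport line",
    "other",
}

# An entity is (start, end, type): token indices, start inclusive, end exclusive.
Span = Tuple[int, int, str]


def spans_to_bio(tokens: List[str], spans: List[Span]) -> List[str]:
    """Encode (possibly multi-word) entity spans as one BIO tag per token."""
    tags = ["O"] * len(tokens)
    for start, end, etype in spans:
        if etype not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {etype}")
        if not 0 <= start < end <= len(tokens):
            raise ValueError(f"invalid span: {(start, end)}")
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError(f"overlapping span: {(start, end)}")
        tags[start] = f"B-{etype}"          # first word of the entity
        for i in range(start + 1, end):     # remaining words of the entity
            tags[i] = f"I-{etype}"
    return tags


# Hypothetical tweet: a two-word person and a one-word location.
tokens = ["Jean", "Dupont", "arrive", "à", "Marseille"]
print(spans_to_bio(tokens, [(0, 2, "person"), (4, 5, "geoloc")]))
# ['B-person', 'I-person', 'O', 'O', 'B-geoloc']
```

The `I-` prefix is what lets a tagger that predicts one label per token still express a multi-word entity such as "Jean Dupont".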


Data and Evaluation

The training data will consist of 3,000 French tweets annotated in CoNLL format with the 13 entity types listed above. The test data will also comprise 3,000 French tweets. Participants are free to use any external data to improve their systems. Systems will be judged on F1-score.
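The section names the CoNLL format and the F1-score but does not specify the column layout, the tagging scheme, or how F1 is aggregated. The following is a minimal sketch for local evaluation under these assumptions: one whitespace-separated `token tag` pair per line, a blank line between tweets, BIO tags without spaces, and exact-match, micro-averaged entity F1 as in the CoNLL shared-task evaluation. The names `read_conll`, `bio_to_spans` and `entity_f1` are hypothetical, and the organizers' official scorer may differ.

```python
from typing import List, Set, Tuple

Sentence = List[Tuple[str, str]]  # (token, tag) pairs for one tweet


def read_conll(text: str) -> List[Sentence]:
    """Parse CoNLL-style text: one 'token tag' line per token, blank line between tweets."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for line in text.splitlines():
        line = line.strip()
        if not line:                      # blank line ends the current tweet
            if current:
                sentences.append(current)
                current = []
            continue
        parts = line.split()
        # Assumption: token in the first column, tag in the last column.
        current.append((parts[0], parts[-1]))
    if current:
        sentences.append(current)
    return sentences


def bio_to_spans(tags: List[str]) -> Set[Tuple[int, int, str]]:
    """Decode BIO tags into (start, end, type) spans; a stray I- tag opens a new span."""
    spans = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        prefix, _, label = tag.partition("-")
        if start is not None and (prefix != "I" or label != etype):
            spans.add((start, i, etype))
            start, etype = None, None
        if prefix == "B" or (prefix == "I" and start is None):
            start, etype = i, label
    return spans


def entity_f1(gold: List[Sentence], pred: List[Sentence]) -> float:
    """Micro-averaged F1 where an entity counts only if boundaries and type match exactly."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted data must contain the same tweets")
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        g_spans = bio_to_spans([tag for _, tag in g])
        p_spans = bio_to_spans([tag for _, tag in p])
        tp += len(g_spans & p_spans)
        n_gold += len(g_spans)
        n_pred += len(p_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold_text = "Jean B-person\nDupont I-person\narrive O\nà O\nMarseille B-geoloc\n"
pred_text = "Jean B-person\nDupont O\narrive O\nà O\nMarseille B-geoloc\n"
print(entity_f1(read_conll(gold_text), read_conll(pred_text)))  # 0.5
```

In this made-up example the prediction truncates "Jean Dupont" to "Jean". Under exact matching that person counts as both a false positive and a false negative, so precision and recall are both 1/2. Token-level scoring would be more lenient.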


To register, fill in the form: here.
Note that you must fill in this form before you can obtain the data.

Important Dates

The challenge will be as follows: