Combining semi-supervised and active learning to recognize minority senses in a new corpus
View/ Open
Date
2015Author
Cardellino, Cristian Adrián
Teruel, Milagro
Alonso i Alemany, Laura
Metadata
Show full item recordAbstract
In this paper we study the impact of combining active learning with bootstrapping to grow a small annotated corpus from a different, unannotated corpus. The intuition underlying our approach is that bootstrapping includes instances that are closer to the generative centers of the data, while discriminative approaches to active learning include instances that are closer to the decision boundaries of classifiers. We build an initial model from the original annotated corpus, which is then iteratively enlarged by including both manually annotated examples and automatically labelled examples as training examples for the following iteration. Examples to be annotated are selected in each iteration by applying active learning techniques. We show that intertwining an active learning component in a bootstrapping approach helps to overcome an initial bias towards a majority class, thus facilitating adaptation of a starting dataset towards the real distribution of a different, unannotated corpus.