Interpreting natural language instructions using language, vision and behavior
Date
2010
Author
Benotti, Luciana
Lau, Tessa
Villalba, Martín Federico
Abstract
We define the problem of automatic instruction interpretation as follows. Given a natural language instruction,
can we automatically predict what an instruction follower, such as a robot, should do in the environment
to follow that instruction? Previous approaches to automatic instruction interpretation have required either
extensive domain-dependent rule writing or extensive manually annotated corpora. This article presents
a novel approach that leverages a large amount of unannotated, easy-to-collect data from humans interacting
in a game-like environment. Our approach uses an automatic annotation phase based on artificial
intelligence planning, for which two different annotation strategies are compared: one based on behavioral
information and the other based on visibility information. The resulting annotations are used as training
data for different automatic classifiers. This algorithm is based on the intuition that the problem of interpreting
a situated instruction can be cast as a classification problem of choosing among the actions that are
possible in the situation. Classification is done by combining language, vision, and behavior information.
Our empirical analysis shows that machine learning classifiers achieve 77% accuracy on this task on available
English corpora and 74% on similar German corpora. Finally, the inclusion of human feedback in the
interpretation process is shown to boost performance to 92% for the English corpus and 90% for the German corpus.
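
The classification framing described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the Candidate fields, the word-overlap language score, and the weights are all hypothetical choices made for illustration, and a hand-weighted sum stands in for the trained classifiers the article evaluates. It only shows the general shape of the idea, namely scoring each action that is possible in the current situation with language, visibility, and behavior features and picking the best one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One action that is possible in the current situation (hypothetical schema)."""
    action: str          # illustrative action label
    description: str     # words associated with the action
    visible: bool        # whether the action's target is visible to the follower
    times_followed: int  # how often past followers reacted with this action


def score(instruction: str, cand: Candidate, weights=(1.0, 0.5, 0.1)) -> float:
    """Combine language, vision, and behavior features into a single score.

    The weights are arbitrary placeholders, not values from the article.
    """
    words = set(instruction.lower().split())
    overlap = len(words & set(cand.description.lower().split()))
    w_lang, w_vis, w_beh = weights
    return w_lang * overlap + w_vis * float(cand.visible) + w_beh * cand.times_followed


def interpret(instruction: str, candidates: list[Candidate]) -> str:
    """Return the label of the highest-scoring candidate action."""
    return max(candidates, key=lambda c: score(instruction, c)).action


# Made-up situation with two possible actions.
CANDIDATES = [
    Candidate("press_red_button", "press the red button", visible=True, times_followed=2),
    Candidate("open_door", "open the door", visible=False, times_followed=1),
]

if __name__ == "__main__":
    print(interpret("push the red button", CANDIDATES))
```

In a learned setting, the hand-set weights would be replaced by a classifier trained on the automatically annotated data, with the candidate set restricted to the actions possible in the situation.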