Projects

CoSy

CoSy is an FP6 Integrated Project. My research on this project focused on reference resolution and generation for a robotic dialog system, in particular on constructing, updating and integrating multimodal context models and on handling spatial references in visually situated discourse.
Papers describing the work include: kelleher/kruijff:06, kelleher/etal:06, kruijff/etal:06, costello/kelleher:06. Here are some videos illustrating the functionality of the system: video1 and video2.

CommonSense T9

The Common Sense T9 project used statistical computational linguistic techniques to improve text entry with ambiguous keyboards, i.e. keyboards where each key represents several letters, such as the keypad on a mobile phone. The standard text entry method on these devices is called T9. The T9 algorithm uses a dictionary to guess which word is intended by a particular sequence of taps, based on the frequency of words in a context-free corpus of natural language. For example, because the word "act" in English is more common than the words "cat" or "bat", the T9 algorithm guesses that the key sequence "228" probably means "act".
When T9 guesses wrong, the user is forced to scroll through the alternative guesses or spell their preference using another method. T9 is a clever algorithm which takes advantage of the regularity of language, but it completely ignores the context and content of the message.
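The frequency-based guessing described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not the commercial T9 implementation: the word counts in TOY_COUNTS are invented, and the helper names (key_sequence, build_index, candidates) are hypothetical.

```python
from collections import defaultdict

# Standard phone keypad letter groups.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_KEY = {ch: key for key, letters in KEYPAD.items() for ch in letters}

# Hypothetical word counts, invented for illustration only.
TOY_COUNTS = {"act": 50, "cat": 30, "bat": 10, "the": 500, "tie": 20}


def key_sequence(word):
    """Return the digit sequence a user would tap to enter `word`."""
    return "".join(LETTER_TO_KEY[ch] for ch in word.lower())


def build_index(counts):
    """Group dictionary words by key sequence, most frequent first."""
    index = defaultdict(list)
    for word in counts:
        index[key_sequence(word)].append(word)
    for words in index.values():
        words.sort(key=lambda w: counts[w], reverse=True)
    return dict(index)


def candidates(index, keys):
    """Return the ranked guesses for a key sequence (empty if unknown)."""
    return index.get(keys, [])


INDEX = build_index(TOY_COUNTS)
print(candidates(INDEX, "228"))
```

The first entry in the returned list is the default guess; the remaining entries are the alternatives the user would scroll through. Note that the ranking depends only on the key sequence, never on the rest of the message.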
The text prediction algorithm developed in the Common Sense T9 project uses common sense knowledge to make better guesses based on the context and content of the text message being composed. It combines two different knowledge bases (background) with the words of the message (context) to decide which word is meant by a particular key sequence. When its first guess is wrong, it orders the alternative words using the same context and knowledge.
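One way to picture context-sensitive word selection is as a re-ranking of the words that share a key sequence. The sketch below is an assumption-laden toy, not the algorithm from the paper: BASE_COUNTS and the RELATED table are invented stand-ins for corpus frequencies and the knowledge bases, and the scoring formula and the weight parameter are hypothetical.

```python
# Hypothetical base frequencies and relatedness scores, invented for
# illustration; they stand in for the corpus counts and knowledge bases.
BASE_COUNTS = {"act": 50, "cat": 30, "bat": 10}
RELATED = {
    ("cat", "dog"): 1.0,
    ("cat", "vet"): 0.8,
    ("bat", "cave"): 1.0,
    ("act", "play"): 0.9,
}


def relatedness(word, context_word):
    """Look up a symmetric relatedness score, defaulting to 0."""
    return max(RELATED.get((word, context_word), 0.0),
               RELATED.get((context_word, word), 0.0))


def rerank(candidates, context, weight=2.0):
    """Order candidates by normalised frequency plus weighted context support."""
    total = sum(BASE_COUNTS[w] for w in candidates)

    def score(word):
        prior = BASE_COUNTS[word] / total
        support = sum(relatedness(word, c) for c in context)
        return prior + weight * support

    return sorted(candidates, key=score, reverse=True)


# All three words share the key sequence "228".
print(rerank(["act", "cat", "bat"], context=[]))
print(rerank(["act", "cat", "bat"], context=["dog", "vet"]))
```

With an empty context the ordering falls back to plain frequency, which is the behaviour of the baseline. Once the message contains words related to one candidate, that candidate moves to the front, and the same scores order the remaining alternatives.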
A paper describing the project was presented at the Stairs Track at the European Conference on Artificial Intelligence (ECAI'04): Context-Sensitive Word Selection for Single-Tap Text Entry.

CommonSight IQ

The CommonSight project investigated different approaches to visual analogy. The CommonSight IQ test demo solves visual analogy problems similar to the one shown in the picture on the left. The question the system tries to solve is: A is to B as C is to ?.
The core concept used to solve these visual analogy problems is relationship-based structure alignment. The shapes in image A are analogically mapped to corresponding shapes in image C based on their relationships with surrounding shapes (e.g. whether they enclose or are beside other shapes). This technique allows shapes to be aligned even though they do not have similar features (e.g. colour or shape). Once the shapes have been aligned, the program compares the way the shapes in B are arranged to each of the possible solution images, looking for the closest match containing the correct shape.
The program demonstrates how relationship-based structure alignment facilitates reasoning when solving problems involving components with disparate features. Such a technique is useful when reasoning both across and within domains.
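The idea of aligning shapes by their relationships rather than their features can be illustrated with a small Python sketch. It is a deliberately simplified stand-in, not the CommonSight program: the images, the shape names and the functions signatures and align are hypothetical, relations are treated as ordered pairs (so "beside" here means first-then-second), and only shapes whose relational signature is unique in both images are paired.

```python
from collections import defaultdict


def signatures(relations):
    """Map each shape to a sorted tuple of the (relation, role) pairs it fills."""
    roles = defaultdict(list)
    for name, first, second in relations:
        roles[first].append((name, 0))   # role 0: first argument
        roles[second].append((name, 1))  # role 1: second argument
    return {shape: tuple(sorted(r)) for shape, r in roles.items()}


def align(relations_a, relations_c):
    """Pair shapes whose relational signatures match uniquely in both images."""
    by_sig_a = defaultdict(list)
    by_sig_c = defaultdict(list)
    for shape, sig in signatures(relations_a).items():
        by_sig_a[sig].append(shape)
    for shape, sig in signatures(relations_c).items():
        by_sig_c[sig].append(shape)
    mapping = {}
    for sig, shapes_a in by_sig_a.items():
        shapes_c = by_sig_c.get(sig, [])
        if len(shapes_a) == 1 and len(shapes_c) == 1:
            mapping[shapes_a[0]] = shapes_c[0]
    return mapping


# Hypothetical images: no shape in A shares a colour or outline with C.
IMAGE_A = [("encloses", "big_circle", "small_square"),
           ("beside", "big_circle", "triangle")]
IMAGE_C = [("encloses", "big_hexagon", "small_star"),
           ("beside", "big_hexagon", "cross")]

print(align(IMAGE_A, IMAGE_C))
```

Here the circle is mapped to the hexagon because both enclose one shape and sit beside another, even though the two look nothing alike. A fuller treatment would also need to handle ambiguous signatures and partial matches, which this sketch leaves unpaired.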
A paper describing the project was presented at the Stairs Track at the European Conference on Artificial Intelligence (ECAI'04): Analogy by Alignment: On Structure Mapping and Similarity.

Situated Language Interpreter (SLI)

The focus of my Ph.D was designing, implementing and testing a semantic framework to underpin natural language virtual reality (NLVR) systems. An NLVR system allows a user to interact with a simulated 3D environment using natural language. The Situated Language Interpreter system was the result of this research.
The SLI system used the situated dialog framework developed in my thesis; the central tenet of the framework is the grounding of spatial semantics in visual perception, modelled as visual salience. The system could handle anaphoric and deictic referring expressions (including definite and indefinite descriptions, demonstratives accompanied by deictic gestures, pronouns, other anaphoric expressions, one-anaphora, locative expressions, and coordinating expressions). There are three main components within the framework:

• a model of visual salience,
• a semantic model for locative expressions,
• a discourse model.

A chapter from my thesis describing the SLI system is accessible from here. There is also a short video of the system available from here. The video is a self-extracting exe; you may have to turn up the sound on your computer to hear the audio.
More recently I have extended the system to enable it to generate references and scene descriptions. With this extension I have also renamed the system Live, for Linguistic Interaction with Modelled Environments, to indicate that it can both generate and interpret language. A paper describing the algorithm used by the system to generate references was presented at the Florida Artificial Intelligence conference (Flairs'04): Exploiting Visual Salience for the Generation of Referring Expressions.

SONAS

The SONAS system focused on the development of a multi-modal interface. The system's interface offered a combination of gesture (using a 3D data glove) and natural language as input modalities. There were two visual displays in the user interface: a 2D overview map of the world and a 3D viewer which displayed the location in the world where the viewer was currently situated. The user could position themselves in the world using natural language commands and/or data glove gestures, and by mouse input to the 2D overview map. The result of these inputs was displayed as a scene change in the 3D environment. If the user wished to interact directly with any object that they came across in the world, they could do so using the data glove or natural language. The images below are screenshots of the system's 3D simulation. You can see the graphical representation of the data glove in the foreground.