Automatic Identification of Off-Topic Regions of Conversation
A collection of recorded and transcribed telephone conversations clearly demonstrates the universality of small talk and other socially motivated utterances. Building on theories about the linguistics of conversational speech, I consider various ways of describing each utterance, including which words were used, their parts of speech, and the utterance's proximity to the beginning of the conversation. To better understand which of these features are most useful, I create a system for automatically distinguishing between on- and off-topic utterances and compare its performance when using different combinations of these features. The central hypothesis is that conversational speech contains sufficient low-level clues to separate on- and off-topic utterances with an automatic classifier. I find that the overall structure of conversations is predictable, and that automatic classification can indeed be done with better-than-chance accuracy. However, distinguishing more reliably between on- and off-topic utterances will probably require deeper knowledge of the context and overall topic.
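As a rough illustration of the three feature types named above (words, parts of speech, and position in the conversation), the sketch below shows one way an utterance might be turned into features, and how the combinations of feature groups could be enumerated for comparison. It is not the thesis's implementation: the function names, the feature naming scheme, and the tiny `TOY_POS` lookup table are all assumptions made for this example, and a real system would use a trained part-of-speech tagger and an actual classifier.

```python
from itertools import combinations

# Hypothetical toy part-of-speech lookup, for illustration only.
# A real system would use a trained tagger instead.
TOY_POS = {"hi": "UH", "how": "WRB", "are": "VBP", "you": "PRP",
           "the": "DT", "budget": "NN", "is": "VBZ", "late": "JJ"}


def extract_features(words, index, total, groups):
    """Describe one utterance using only the selected feature groups.

    words  -- list of tokens in the utterance
    index  -- 0-based position of the utterance in the conversation
    total  -- number of utterances in the conversation
    groups -- subset of {"words", "pos", "position"}
    """
    feats = {}
    if "words" in groups:
        for w in words:
            feats["word=" + w.lower()] = 1.0
    if "pos" in groups:
        for w in words:
            feats["pos=" + TOY_POS.get(w.lower(), "UNK")] = 1.0
    if "position" in groups:
        # Relative position: 0.0 for the first utterance of the conversation.
        feats["position"] = index / max(total - 1, 1)
    return feats


def feature_group_combinations(all_groups=("words", "pos", "position")):
    """Enumerate every non-empty combination of feature groups."""
    return [set(c)
            for r in range(1, len(all_groups) + 1)
            for c in combinations(all_groups, r)]
```

Each combination returned by `feature_group_combinations` could be passed as `groups` to `extract_features`, with a classifier trained and evaluated once per combination to see which feature groups contribute most.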