
Automatic linguistic segmentation of conversational speech

Andreas Stolcke and Elizabeth Shriberg

As speech recognition moves toward more unconstrained domains such as conversational speech, we encounter a need to be able to segment (or resegment) waveforms and recognizer output into linguistically meaningful units, such as sentences. Toward this end, we present a simple automatic segmenter of transcripts based on N-gram language modeling. We also study the relevance of several word-level features for segmentation performance. Using only word-level information, we achieve 85% recall and 70% precision on linguistic boundary detection.
Stolcke, A. & E. Shriberg 1996 Automatic linguistic segmentation of conversational speech. In Proceedings of the International Conference on Spoken Language Processing 2: 1005-1008.
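
The recall and precision figures quoted in the abstract can be illustrated with a small sketch. It assumes that a hypothesized boundary counts as correct only when it falls at exactly the same word position as a reference boundary; the paper's own scoring procedure may differ. The function name `boundary_scores` and the toy boundary positions are hypothetical, chosen for illustration only, and are not taken from the paper.

```python
def boundary_scores(reference, hypothesis):
    """Return (recall, precision) for two collections of boundary positions.

    Assumption: a hypothesized boundary is correct only if the same
    position also appears in the reference (exact match).
    """
    ref, hyp = set(reference), set(hypothesis)
    correct = len(ref & hyp)
    # Guard against empty inputs so the function never divides by zero.
    recall = correct / len(ref) if ref else 0.0
    precision = correct / len(hyp) if hyp else 0.0
    return recall, precision


# Toy data (invented): word indices after which a boundary falls.
reference = [3, 7, 12, 18]
hypothesis = [3, 7, 10, 12, 15]

recall, precision = boundary_scores(reference, hypothesis)
print(f"recall={recall:.2f} precision={precision:.2f}")
# prints: recall=0.75 precision=0.60
```

In this toy case three of the four reference boundaries are found (recall 0.75), and three of the five hypothesized boundaries are correct (precision 0.60). A segmenter with higher recall than precision, as reported in the abstract, proposes more boundaries than it should but misses relatively few.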

Key points relevant to the study of filled pauses




This site is maintained by Ralph L. Rose
Last Revised: 99/08/26

Note: This is the original FPRC, ca. 1998. It is made available for archival purposes only.