Proceedings: GI 2010

Multi-modal text entry and selection on a mobile device

David Dearman, Amy Karlson, Brian Meyers, Ben Bederson

Proceedings of Graphics Interface 2010: Ottawa, Ontario, Canada, 31 May - 2 June 2010, 19-26

  • BibTeX

    author = {Dearman, David and Karlson, Amy and Meyers, Brian and Bederson, Ben},
    title = {Multi-modal text entry and selection on a mobile device},
    booktitle = {Proceedings of Graphics Interface 2010},
    series = {GI 2010},
    year = {2010},
    issn = {0713-5424},
    isbn = {978-1-56881-712-5},
    location = {Ottawa, Ontario, Canada},
    pages = {19--26},
    numpages = {8},
    publisher = {Canadian Human-Computer Communications Society},
    address = {Toronto, Ontario, Canada},


Rich text tasks are increasingly common on mobile devices, requiring the user to interleave typing and selection to produce the text and formatting she desires. However, mobile devices are a rich input space where input does not need to be limited to a keyboard and touch. In this paper, we present two complementary studies evaluating four different input modalities to perform selection in support of text entry on a mobile device. The modalities are: screen touch (Touch), device tilt (Tilt), voice recognition (Speech), and foot tap (Foot). The results show that Tilt is the fastest method for making a selection, but that Touch allows for the highest overall text throughput. The Tilt and Foot methods, although fast, resulted in users making and subsequently correcting a high number of text entry errors, whereas the number of errors for Touch was significantly lower. Users experienced significant difficulty coordinating format selections in parallel with text entry when using Tilt and Foot. This difficulty resulted in more errors and therefore lower text throughput. Touching the screen to perform a selection is slower than tilting the device or tapping the foot, but the action of moving the fingers off the keyboard to make a selection ensured high precision when interleaving selection and text entry. Additionally, mobile devices offer a breadth of promising rich input methods that need to be carefully studied in situ when deciding whether each is appropriate to support a given task; it is not sufficient to study the modalities independently of a natural task.