Sound plays an important role in both communication and awareness of events and information in one's surroundings. Yet this information may not be easily accessible to people who are deaf or hard of hearing. The home environment can be modified to improve awareness, with alerting systems for phones, doorbells, alarms or a baby's crying [1]. However, other places encountered in daily life (public spaces, stores, restaurants, streets, airports and so on) do not always provide adequate support for relaying sound-based information to people who are deaf. Mobile support would therefore be valuable in these settings. Technology has thus far focused on mobile and stationary communication, such as video relay on a personal digital assistant (PDA) or PC, and on nonmobile sound-awareness systems like phone-ring flashers. However, past work lacked a nuanced understanding of which speech and nonspeech sounds people value being aware of in daily life.
To address this gap, we designed and evaluated Scribe4Me, a mobile sound-transcription tool for people who are deaf or hard of hearing [2]. It provides descriptions and transcriptions of recent speech and sounds at the press of a button. As illustrated in Figure 1, Scribe4Me runs on a PDA and records audio continuously, retaining the most recent 30s. If the user wants more information about what happened in the last 30s, she can push a button on the screen labelled ‘what happened?’. For each request, the system sends those 30s of audio via general-packet radio service (GPRS) to a human transcriber, who returns a text message describing the audio content. Scribe4Me is unique in handling both speech and nonspeech audio in mobile environments. It is also the only speech-to-text system for the deaf that is robust enough to be deployed in an unconstrained setting.
Figure 1. System diagram showing the request process.
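The rolling 30s recording and the 'what happened?' request can be illustrated with a minimal sketch. This is not the Scribe4Me implementation: the class and function names, the 8kHz sample rate and the `send` callback (standing in for the GPRS upload to the transcriber) are all assumptions made for illustration.

```python
from collections import deque


class RollingAudioBuffer:
    """Keeps only the most recent `window_s` seconds of audio samples."""

    def __init__(self, sample_rate_hz=8000, window_s=30):
        # Sample rate is an assumed value; the article does not state one.
        self.sample_rate_hz = sample_rate_hz
        self.window_s = window_s
        # A deque with maxlen discards the oldest samples once it is full.
        self._samples = deque(maxlen=sample_rate_hz * window_s)

    def write(self, chunk):
        """Append newly recorded samples, dropping the oldest if needed."""
        self._samples.extend(chunk)

    def snapshot(self):
        """Return a copy of the audio currently held (at most window_s seconds)."""
        return list(self._samples)


def request_transcription(buffer, send):
    """Handle a 'what happened?' press: hand the last window of audio to `send`.

    `send` is a hypothetical callable representing the upload to a human
    transcriber; the reply would arrive later as a text message.
    """
    send(buffer.snapshot())
```

Because the buffer has a fixed maximum length, memory use stays constant no matter how long the device records, and a request always captures the audio immediately preceding the button press.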
We conducted a two-week field study of Scribe4Me to explore the potential uses of, and issues surrounding, a sound-transcription tool. We recruited six users living in two geographically distant urban regions: two profoundly deaf (perceiving no sounds), one almost profoundly deaf (detecting only very loud sounds with little comprehension) and three hard of hearing with the help of hearing aids or cochlear implants (with highly variable hearing levels). Four participants were female, and ages ranged from 25 to 51 years. They used Scribe4Me seven days a week during daylight hours. During the study, we collected data from multiple sources: four interviews with each participant, the Scribe4Me requests themselves, and daily emails with a series of questions about each request (such as why it was made and how useful it was).
Overall, our study showed that participants were enthusiastic about Scribe4Me (see Figure 2). All but one found at least one valuable use for the tool. All said that they would continue to use Scribe4Me if they could install it on their mobile device. The major issue raised by all participants was the 3–5min delay between requesting and receiving a transcription. For three people, this severely limited the scenarios in which the system was useful, since they wanted a real-time communication aid. The other three found it valuable in a number of situations, even with the delay. For instance, one participant said, “It is a great idea. There are things that I'm curious about on a day-to-day basis. I don't mind waiting for the translation”.
Figure 2. Field-study users found the tool valuable in many situations, including at grocery stores, in group conversations, in airports and moving about the city.
Participants used the tool at home, at work, in restaurants, at a drive-through, in their own cars, in airports, at business conferences, at the grocery store, while walking outside, while riding public transportation, at an animal shelter, at church, in group meetings, in coffee shops, and at the post office and other service-oriented businesses. They found it useful for transcribing many types of sounds. For instance, participants made 13 requests to transcribe public speech (like others talking), 33 for semipublic speech (like class lectures, workplace chatter and church sermons) and 20 for conversations the user was engaged in. Other common uses were for announcements in public places like an airport or deli (14 instances), machines with voices, such as TV, radio or grocery self-checkout machines (16 instances), and environmental noise like traffic or home appliances (15 instances).
Our study of Scribe4Me identifies several areas for future research. More work is needed to reduce the response delay and improve the recorded audio quality. While it is possible to use humans for transcription in a full system, automated translation of nonspeech audio may be an option. Subsequent versions need to encourage users to obtain informed consent when recording audio of other people. Lastly, we are interested in applying what we learned to new remote-transcription applications. For example, travelers could use a similar tool to translate different languages in settings where existing automated translation fails.