Sitting in the dimly lit restaurant, I struggled to read the menu. Luckily, I keep readers in my purse; I quickly pulled them out, and the problem was solved. But when the waiter came to tell us about the specials, I had trouble hearing him, even though I was wearing my hearing aids and lipreading furiously.
Wouldn’t it be wonderful if I could solve this problem just as easily, by reaching into my purse for captioning glasses that would turn speech into text in real time and with exceptional accuracy? Interestingly, a graduate student team at Cornell Tech is working on something just like this.
The product is in its early stages, but it is exciting to see hearing loss taking center stage in this innovative and important technology project. Part of the reason is the project’s leader, Christopher Caulfield, a second-year graduate student at Cornell Tech who is deaf.
Diagnosed with profound hearing loss at 13 months, Christopher received a cochlear implant at 18 months and has attended mainstream schools ever since. His academic and professional accomplishments are numerous and inspiring, and he has decided to focus his career on technology, with an emphasis on how it can be used to enhance accessibility. I imagine this is only the first of many projects he will lead throughout his career.
Captioning Glasses Prototype in Development
The student team describes the product as “an application that could improve deaf and hard of hearing people’s experiences in one-on-one conversations. The app will use augmented reality and automatic speech recognition to display captions of what your conversation partner is saying.” The captions would be viewed through special glasses that would superimpose them near the speaker’s face, so you could maintain eye contact, as in a typical conversation, while benefiting from the captions. Wow!
The team recently conducted interviews with potential end users to gather feedback on their current prototype, specifically the look and feel of the captions themselves. I attended one a few weeks ago, where I was asked to evaluate the placement of the captions relative to the person’s face, the color and sizing of the captions, and a number of special effects, like shading or bolding, used to convey emphasis or high emotion.
In the exercise, the captions were displayed on a screen while a video played, so it was different from what I imagine the end product could look like, and there was no issue with caption accuracy since the captions were pre-programmed. I enjoyed having the captions because the video was purposely set at a very low volume; I don’t think I would have caught much of the content without them.
I use captions frequently when watching TV, at the movies, and wherever else they are available, but these captions were different. The size of the words varied to indicate the emphasis the speaker placed on each word, and sometimes the words were shaded red or green to indicate the speaker’s mood or affect. The captions carried much more information than is typical. It was an interesting concept, but it took some practice to fully appreciate.
The work is ongoing, as much remains to be determined about the end product’s configuration and the speech-to-text technology itself. I applaud the team for involving potential end users in the early stages of the design and throughout the process. This will go a long way toward making the final product a success.
Readers, would you use captioning glasses?