Have you ever wondered if you could “feel” what it’s like to revisit your favorite vacation spot while sitting on a couch in your living room? How about walking through a restaurant overlooking the water’s edge as you enjoy a savory dish, while still sitting on that couch?
If you follow trends in consumer electronics, you probably imagined a virtual reality (VR) headset that uses a visual interface to simulate the ambiance of the restaurant as you use a voice interface to scroll through the restaurant’s menu. While the tech world has made great progress in evolving the visual and voice interfaces of VR, immersive virtual food-tasting also requires a digital interface that supports a sense of smell and taste. In fact, the National University of Singapore is conducting research on the topic (see the video), and Project Nourished claims to enable “eating and drinking in a whole new way – by hacking vision, gustation, olfaction, audition and touch – with or without caloric intake” – through VR.
We’ve already come a long way in our quest to replicate human senses such as touch, vision (for example, via biometric authentication) and voice in the user interfaces we use to interact with the digital devices around us. In fact, every invention that permanently changed the consumer electronics landscape in the last few decades has in turn brought to life one of these user interfaces (UI). For example, smartphones popularized touch; video game consoles such as the Nintendo Wii and Microsoft Xbox brought gesture; and most recently, smart speakers and VR headsets have increased the adoption of voice and vision.
Complexities of UI Design
UI design is a complicated task that builds upon years of research in neuroscience, cognitive science and engineering. It must also account for individuality because users interact differently with their digital devices. Some, like me, use their left hand predominantly when interacting with a gaming console. Some have a heavy accent, which can make speech recognition difficult, while those with a hearing disability may prefer touch over voice as a user interface. Application, context and the proximity of the device to the user also affect UI design. For example, a user interacting with a smartphone at home has the option to touch or speak to the device, whereas voice is the safest means of communicating with a car’s infotainment system while driving.
Consumers often bring their digital devices wherever they go, but still expect a consistent user experience. Therefore, a natural user experience is the key to UI adoption. A multi-sensory approach combining voice, vision and/or touch could prove the most practical solution. For example, if I were to access my account at a bank ATM, I would prefer visual- or touch-interface authentication for security reasons, but I would still want to use a hands-free voice interaction to switch between the different menus on the machine. In this case, a combination of UIs could provide a more natural multi-sensory experience, albeit one that needs a careful design.
UI technology development and adoption are largely influenced by the top four players in the consumer electronics industry – Apple, Amazon, Google and Samsung. Apple pioneered the touch interface with the invention of keyboard-less smartphones, and the rest of the industry followed suit. The introduction of Google Glass kickstarted the VR/augmented reality (AR) segment and opened new applications in the gaming and multimedia entertainment segments. While VR headsets work for gaming – and more recently for selling products and experiences – they are large, cumbersome devices that are uncomfortable to wear for extended periods. These are major hurdles for designers to solve. Voice, on the other hand, offers a hands-free user interface that is more natural and frictionless than the alternatives.
A voice UI needs nothing but a voice command to interact with digital devices. However, it brings its own complexities: user speech characteristics such as accent and volume vary widely, and, more importantly, background noise must be suppressed for the interface to work reliably. While edge computing and cloud-based artificial intelligence (AI) are critical to the battery life and performance of smart home devices, truly conversational AI remains far from reality.
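To make the noise-suppression challenge concrete, here is a minimal, hypothetical sketch of energy-based voice activity detection: it tracks a running noise floor and flags frames whose energy rises well above it. Real voice UIs use far more sophisticated noise suppression and ML-based keyword spotting; the `sensitivity` parameter and frame handling below are illustrative assumptions, not any product’s actual design.

```python
def rms(frame):
    """Root-mean-square energy of one audio frame (a list of samples)."""
    return (sum(s * s for s in frame) / len(frame)) ** 0.5

def detect_speech(frames, sensitivity=3.0):
    """Flag frames whose energy rises well above a running noise floor."""
    noise_floor = None
    flags = []
    for frame in frames:
        energy = rms(frame)
        if noise_floor is None:
            noise_floor = energy  # seed the noise estimate with the first frame
        is_speech = energy > sensitivity * noise_floor
        flags.append(is_speech)
        if not is_speech:
            # Slowly adapt to changing background noise (exponential average),
            # but only during non-speech frames so speech doesn't inflate it.
            noise_floor = 0.95 * noise_floor + 0.05 * energy
    return flags
```

Even this toy version shows why accent and volume matter: a soft-spoken user in a noisy room may never clear the detection threshold, which is exactly the problem production systems attack with beamforming microphone arrays and learned models.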
From a business standpoint, the winner of the race for voice UIs must improve AI capabilities while supporting a strong ecosystem of partners. Amazon, for example, is king of this strategy. The e-commerce giant is building an Alexa Voice Service (AVS) ecosystem by way of its Alexa Fund companies and third-party integration partners to realize its goal of proliferating voice everywhere. These partnerships enable the ecosystem to build end-to-end speech systems that can take voice-interface products everywhere and promote, among other things, hardware startups that are disrupting the MEMS market with products such as environmentally robust piezoelectric microphones.
Energy-harvesting, near-zero-power, always-listening microphones, used in partnership with the AVS ecosystem, are enabling voice UI products to expand into battery-operated applications such as hands-free TV remotes, smart garbage cans, Bluetooth speakers, headsets, home appliances and automobiles. A good example of a unique voice UI launched at CES 2019 is housewares designer simplehuman’s voice-activated smart garbage can, which uses Vesper microphones and AVS.
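The power savings of always-listening microphones come from a two-stage idea: a near-zero-power acoustic threshold stage stays on, and the power-hungry speech processor wakes only when sound crosses the threshold. The sketch below is a hypothetical simplification of that duty cycle; the threshold value and wake window are invented for illustration, not drawn from any actual device.

```python
def simulate_wake_on_sound(frame_energies, wake_threshold=0.1, awake_frames=3):
    """Return the indices of frames the main processor actually processes.

    The device sleeps until a frame's energy crosses wake_threshold, then
    stays awake for `awake_frames` frames before going back to sleep.
    """
    processed = []
    awake_remaining = 0
    for i, energy in enumerate(frame_energies):
        if awake_remaining == 0 and energy >= wake_threshold:
            awake_remaining = awake_frames  # sound event: power up the processor
        if awake_remaining > 0:
            processed.append(i)
            awake_remaining -= 1
    return processed
```

In this model, long stretches of silence cost essentially nothing, which is why the approach suits battery-operated products like remotes and smart garbage cans.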
While the future might bring additional digital interfaces, along with multisensory experiences using vision, gesture and touch, voice UI is at the forefront of current technological innovation. Soon, Alexa might help cook dinner without intervention, even turning off the stove when food starts to burn, using a scent-detection sensor integrated with a microphone array. Voice UI continues to astound us with its possibilities, and we’re excited for the journey ahead.
With more than 12 years of experience working in speech and voice applications for wireless devices, Udaynag Pisipati is a senior field applications engineer at Vesper. He holds a master’s degree in electrical engineering from the University of Missouri and an MBA from Santa Clara University. A firm believer in speech as a natural user interface for human-machine interaction, Pisipati is interested in everything related to speech processing, including microphones/speakers, signal processing and machine learning.
Vesper is a member of MEMS & Sensors Industry Group (MSIG), SEMI technology community.