How MEMS Microphones Aid Sound Detection and Keyword Recognition in Voice-Activated Designs

By Majeed Ahmad

Contributed By DigiKey's North American Editors

2020-04-23

As users become more reliant upon voice as a user interface, designers are being challenged to implement the most accurate and reliable voice user interfaces (VUIs) at the lowest possible power consumption and response time, while also meeting tighter space and cost budgets and ever-shortening design schedules. To help designers meet these objectives, several vendors have introduced advanced microelectromechanical systems (MEMS) microphones with performance characteristics that are conducive to robust wake word detection and processing of voice commands for VUIs.

MEMS microphones—also known as silicon microphones—are already popular in smartphones, smartwatches, wireless earbuds, cars, and smart TVs, as well as remote controls. This is in large part due to the success of voice-based personal assistants such as Amazon’s Alexa, Google Assistant, and Apple’s Siri. These assistants listen for specific voice commands and extract them from the surroundings using wake word detection algorithms. The trick for designers is to implement this extraction function quickly and cost-effectively, while also improving reliability, accuracy, and far-field voice capture despite ambient noise.

This article discusses key MEMS microphone characteristics that affect a VUI design, including signal-to-noise ratio (SNR), dynamic range, sensitivity, and startup time. It then introduces hardware and software solutions from TDK InvenSense, CUI Devices, STMicroelectronics, and Vesper Technologies, and shows how to apply them in voice-activated designs.

How MEMS microphones work

MEMS microphones typically comprise two components in a single package: a MEMS membrane that converts sound waves into an electrical signal, and an amplifier that functions as an impedance converter to provide a usable analog output to the audio signal chain. A third component, an analog-to-digital converter (ADC), can also be integrated on the same die if a digital output is required.

Figure 1: The basic construction of a MEMS microphone showing its two key building blocks: the MEMS transducer and the signal processing chain (in the ASIC). (Image source: CUI Devices)

Along with enabling miniature microphones with either analog or digital outputs, MEMS technology also offers good performance in terms of phase matching and drift.

Key MEMS microphone characteristics

For designers of voice-controlled devices, key parameters to look for in a MEMS microphone include:

Signal-to-noise ratio (SNR): This is the ratio of a reference signal level to the noise level of the microphone output signal. SNR measurements include noise contributed by both the microphone element and any other devices, such as ICs, incorporated into the MEMS microphone package.
Sensitivity: The analog or digital output value in response to a 1 kilohertz (kHz) sinewave with a sound pressure level (SPL) of 94 decibels (dB) or 1 Pascal (Pa), a measurement of pressure.
Sensitivity tolerance: The range of sensitivity for any given individual microphone. A tight sensitivity tolerance ensures consistency when multiple microphones are used.
Dynamic range: A measure of the difference between the loudest and quietest SPLs over which the microphone responds linearly.
Frequency response: The audio range over which a microphone can operate.
Startup time: How quickly a microphone wakes and outputs a valid signal in response to a trigger event.
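
The 94 dB SPL reference in the sensitivity definition corresponds to roughly 1 Pa because SPL is a logarithmic ratio against 20 micropascals, the standard reference pressure in air. A minimal Python sketch of that conversion follows; the function names are illustrative and are not taken from any vendor library.

```python
import math

REFERENCE_PRESSURE_PA = 20e-6  # standard 0 dB SPL reference in air (20 micropascals)


def spl_to_pascal(spl_db):
    """Convert a sound pressure level in dB SPL to RMS pressure in pascals."""
    return REFERENCE_PRESSURE_PA * 10 ** (spl_db / 20.0)


def pascal_to_spl(pressure_pa):
    """Convert an RMS pressure in pascals to dB SPL."""
    return 20.0 * math.log10(pressure_pa / REFERENCE_PRESSURE_PA)


if __name__ == "__main__":
    print(f"94 dB SPL = {spl_to_pascal(94.0):.3f} Pa")
    print(f"1 Pa      = {pascal_to_spl(1.0):.2f} dB SPL")
```

The two figures are not exactly identical, which is why datasheets treat 94 dB SPL and 1 Pa as interchangeable only as a rounded convention.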

Voice-controlled devices such as remote controls, TVs, and smart speakers often operate with high ambient noise. Also, the user may be close to the device or, in far-field operation, 1 to 10 meters (m) away. These circumstances are what make a microphone’s dynamic range, sensitivity, and SNR so important. In applications where multiple microphones are to be used in an array, the sensitivity tolerance becomes critical.

While each microphone may be specified to have a certain sensitivity level, minute structural changes can cause variations. However, as MEMS microphones are developed using tightly controlled semiconductor manufacturing processes, they offer the tightly matched sensitivity tolerances needed for effective signal processing of an array of microphones (Figure 2).

Figure 2: Microphones used in an array must be tightly matched to accomplish the desired signal processing performance. (Image source: CUI Devices)

This tight tolerance is critical amid increasing adoption of microphone arrays in VUI-enabled designs. In a microphone array, two or more microphones are used to collect signals, and then the signal from each microphone is processed individually—amplified, delayed, or filtered—before the signals are combined to form the resultant signal. In microphone arrays, the multiple inputs can be employed to create a directional response, also known as beamforming, to filter out unwanted noise while focusing on sound from a more desired direction.
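
The delay-then-combine step described above can be sketched as a delay-and-sum beamformer, one of the simplest beamforming schemes. The Python sketch below is illustrative only: it assumes a straight-line array, a far-field plane wave, whole-sample delays, and a speed of sound of 343 m/s, and all function names are hypothetical.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at roughly 20 degrees C


def steering_delays(mic_positions_m, angle_deg, sample_rate_hz):
    """Whole-sample delays that time-align a plane wave from angle_deg.

    mic_positions_m: microphone positions along a straight line, in meters.
    angle_deg: arrival angle from broadside; positive angles point toward +x.
    """
    sin_theta = math.sin(math.radians(angle_deg))
    # A microphone farther toward the source hears the wavefront earlier.
    arrivals = [-x * sin_theta / SPEED_OF_SOUND_M_S * sample_rate_hz
                for x in mic_positions_m]
    latest = max(arrivals)
    # Delay the early channels so every channel lines up with the latest one.
    return [round(latest - a) for a in arrivals]


def delay_and_sum(channels, delays):
    """Delay each channel by its sample count, then average the channels."""
    length = len(channels[0])
    out = [0.0] * length
    for samples, delay in zip(channels, delays):
        for i in range(delay, length):
            out[i] += samples[i - delay]
    return [value / len(channels) for value in out]


if __name__ == "__main__":
    tone = [0, 1, 0, -1] * 4
    early = tone                  # microphone that hears the wavefront first
    late = [0, 0] + tone[:-2]     # same wavefront, two samples later
    aligned = delay_and_sum([early, late], delays=[2, 0])
    print(aligned)
```

Sound from the steered direction adds coherently, while sound from other directions arrives misaligned and partially cancels. The sketch also shows why sensitivity matching matters: the average only reinforces the target cleanly if each channel reports the same amplitude for the same pressure.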

A MEMS microphone’s startup time is also critical with respect to capturing entire keywords and ensuring keyword accuracy. To conserve power, VUI-enabled devices are kept in a low-power state. If the microphone is slow to start up in response to a wake-up trigger, the VUI wakes late and can miss the beginning of the wake word, which in turn affects wake word detection performance as well as power consumption.

Once a microphone is chosen with these characteristics in mind, subsequent voice processing algorithms can better perform user voice extraction in the face of high ambient noise, or users speaking at a distance, or both.

Analog versus digital MEMS microphone interfaces

As alluded to in the section on how MEMS microphones work, the output from a MEMS microphone can be either analog or digital. Analog MEMS microphones use an internal amplifier to drive the output signal to a reasonably high level with low output impedance, which provides a straightforward interface to the audio processor. For VUIs, the designer needs to make sure the associated processor has an on-board ADC, or else select an external ADC to meet the design’s specific requirements. An external ADC can add complexity and cost.

With a digital MEMS microphone, the microphone output can be applied directly to a digital circuit, typically a microcontroller or a digital signal processor (DSP). VUI designs for electrically noisy environments tend to favor digital microphones because digital output signals have greater noise immunity compared to analog output signals.

In addition, digital MEMS microphones commonly employ pulse density modulation (PDM) to convert the analog signal voltage into a single-bit digital stream that contains a corresponding density of logic high signals. This provides further immunity to radio frequency interference (RFI) and electromagnetic interference (EMI). This is particularly important in large microphone arrays and physically large systems such as voice-enabled vehicle infotainment systems.
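
To make the idea of a "density of logic high signals" concrete, the following Python sketch encodes samples with a first-order sigma-delta modulator and recovers them by averaging. It is a simplified illustration under stated assumptions: production microphones typically use higher-order modulators, host processors typically decimate with CIC and FIR filters rather than a plain average, and the function names are hypothetical.

```python
def pdm_encode(samples):
    """First-order sigma-delta modulator: samples in [-1.0, 1.0] -> 0/1 bits."""
    bits = []
    integrator = 0.0
    feedback = 0.0
    for x in samples:
        # Accumulate the error between the input and the last output level.
        integrator += x - feedback
        bit = 1 if integrator >= 0.0 else 0
        feedback = 1.0 if bit else -1.0
        bits.append(bit)
    return bits


def pdm_decode(bits, decimation):
    """Crude decimator: average each block of bits, mapped to -1.0/+1.0."""
    out = []
    for start in range(0, len(bits) - decimation + 1, decimation):
        block = bits[start:start + decimation]
        out.append(sum(1.0 if b else -1.0 for b in block) / decimation)
    return out


if __name__ == "__main__":
    stream = pdm_encode([0.5] * 256)
    print("density of ones:", sum(stream) / len(stream))
    print("decoded:", pdm_decode(stream, 64))
```

A constant input of 0.5 produces ones three quarters of the time, and a zero input produces an alternating stream, so the amplitude is carried by the proportion of ones rather than by any analog voltage level.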

With respect to sensitivity, analog microphones are specified in decibels relative to 1 volt (dBV), while digital microphones are typically specified in decibels relative to full scale (dB FS). In both cases the figure is the output level for the 94 dB SPL reference tone.
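
The two sensitivity units can be related to more intuitive quantities with a little arithmetic. The Python sketch below assumes the usual convention that sensitivity is quoted for a 94 dB SPL (about 1 Pa) input and that the microphone stays linear up to full scale; the function names are illustrative.

```python
def dbv_to_mv_per_pa(sensitivity_dbv):
    """Analog sensitivity in dBV (at about 1 Pa) -> output in mV RMS per pascal."""
    return 1000.0 * 10 ** (sensitivity_dbv / 20.0)


def full_scale_spl(sensitivity_dbfs, reference_spl_db=94.0):
    """SPL at which a digital output would reach 0 dB FS, assuming linearity."""
    return reference_spl_db - sensitivity_dbfs


if __name__ == "__main__":
    for dbv in (-38.0, -42.0):
        print(f"{dbv} dBV -> {dbv_to_mv_per_pa(dbv):.2f} mV/Pa")
    print(f"-26 dB FS -> full scale at {full_scale_spl(-26.0)} dB SPL")
```

The example values match the sensitivities of the CUI Devices microphones discussed later in this article.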

MEMS microphone solutions for VUIs

The ICS-40740 analog MEMS microphone from TDK InvenSense addresses many critical microphone performance requirements for VUI applications. It comprises a MEMS microphone element, an impedance converter, and a differential output amplifier in a small 4.00 x 3.00 x 1.20 millimeter (mm) surface mount package. It draws only 165 microamperes (µA) from a 1.5 volt supply while in operation (Figure 3).

Figure 3: The ICS-40740 analog MEMS microphone fits both the size and power budget of smart speakers and wearable devices such as noise-canceling headsets. (Image source: TDK InvenSense)

It has an SNR of 70 dBA (A-weighted decibels) and couples this with a wide dynamic range of 108.5 dB, allowing voices to be detected despite high ambient noise, and under far-field conditions. It also has a wide operating frequency response ranging from 80 hertz (Hz) to 20 kHz, a linear response up to 132.5 dB SPL, and a sensitivity tolerance of ±1 dB. The latter makes it very useful for microphone arrays.
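
These figures are related. Because SNR is measured against the 94 dB SPL reference, the microphone's self-noise expressed as an equivalent input level is 94 dB minus the SNR, and the dynamic range is the span from that noise floor to the top of the linear range. The Python sketch below works through the arithmetic, assuming the 132.5 dB figure marks the upper limit of the linear range; the function names are illustrative.

```python
REFERENCE_SPL_DB = 94.0  # SPL of the reference tone used for SNR measurements


def equivalent_input_noise(snr_dba):
    """Self-noise expressed as an equivalent acoustic level, in dBA SPL."""
    return REFERENCE_SPL_DB - snr_dba


def dynamic_range(upper_limit_db_spl, snr_dba):
    """Span in dB between the noise floor and the top of the linear range."""
    return upper_limit_db_spl - equivalent_input_noise(snr_dba)


if __name__ == "__main__":
    print("noise floor:", equivalent_input_noise(70.0), "dBA SPL")
    print("dynamic range:", dynamic_range(132.5, 70.0), "dB")
```

With the ICS-40740 values, the noise floor works out to 24 dBA SPL and the span to 108.5 dB, which agrees with the dynamic range quoted above.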

The ICS-40740’s small footprint and low power usage make it suitable for Internet of Things (IoT) applications built around smart speakers and wearable devices such as noise-canceling headsets.

Vesper Technologies’ VM3000 is an omnidirectional, bottom port digital piezoelectric MEMS microphone that features an ultra-fast startup time of less than 200 microseconds (µs), allowing it to wake up fast enough to capture complete wake words (Figure 4).

Figure 4: The VM3000 piezoelectric digital MEMS microphone features an ultra-fast startup time of less than 200 µs, allowing it to wake up fast enough to capture complete wake words. (Image source: Vesper Technologies)

In a piezoelectric MEMS microphone, when a sound wave hits the piezoelectric cantilever, it moves the cantilever and creates a voltage. That voltage is sensed by a very-low-power comparator circuit, which sends a wake signal to the audio system.

Given that piezoelectric MEMS microphones don’t require a bias voltage, the VM3000 consumes virtually no power until turned on via a wake-word command. Also, it can remain in sleep mode while drawing only 0.35 µA and can switch to performance mode in less than 200 µs. The ultra-low-power sleep mode, combined with fast mode switching, also ensures that no information is lost when the audio device wakes up.
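
A rough way to see why startup time matters is to count how much audio goes by before the microphone produces valid output. The Python sketch below assumes a 16 kHz audio sample rate, a common choice for speech processing but not a figure from the VM3000 documentation, and compares the sub-200 µs startup time with a purely hypothetical 50 millisecond startup for contrast.

```python
def samples_missed(startup_time_s, sample_rate_hz):
    """Audio samples that elapse while the microphone is still starting up."""
    return startup_time_s * sample_rate_hz


if __name__ == "__main__":
    SAMPLE_RATE_HZ = 16_000          # assumed speech sample rate
    FAST_STARTUP_S = 200e-6          # upper bound quoted for the VM3000
    SLOW_STARTUP_S = 50e-3           # hypothetical slow microphone, for contrast

    print("fast:", samples_missed(FAST_STARTUP_S, SAMPLE_RATE_HZ), "samples")
    print("slow:", samples_missed(SLOW_STARTUP_S, SAMPLE_RATE_HZ), "samples")
```

Under these assumptions the fast microphone misses only a handful of samples, while the hypothetical slow one misses hundreds, enough to clip the leading sound of a wake word.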

The VM3000 digital microphone can be paired with virtually any audio chip, and its output features multiplexing of two microphones on a single data line. It achieves a typical SNR of 63 dB at a 1 kHz signal and offers an acoustic overload point (AOP) of 122 dB SPL.

The VM3000 comes in a package measuring 3.5 x 2.65 x 1.3 mm and saves on the bill of materials (BOM) by integrating an ADC. Additionally, the VM3000 uses a single-layer piezoelectric crystal, making it immune to sensitivity drifts and protecting it against dust, water, moisture, and other environmental particles.

Piezoelectric MEMS microphones like the VM3000 also simplify the audio design for arrays by avoiding the need for a protective mesh or membrane to cover multiple microphones. Such a mesh or membrane, which is typically attached to the acoustic port as a protective element against environmental contaminations, can lead to a drop in sensitivity of the MEMS microphone.

The VM3000 is also relatively easy to implement in that it can connect directly to a CODEC or other processor (Figure 5). The master system (CODEC, etc.) provides the master clock, CLK, which defines the rate at which the bits are transmitted on the DATA line.

Figure 5: The VM3000 can connect directly to an external processor and can connect two microphones to a single DATA line. (Image source: Vesper Technologies)

Interestingly, two microphones can be connected over a single DATA line. This is because the data is set on the rising or falling edge of the clock (CLK), defined by the L/R Select pin, with L/R Select = GND (top) setting data on the falling edge, and L/R Select = VDD (bottom) setting data on the rising edge. The CODEC or processor can then separate the bitstreams based on their alignment with the CLK edges.
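
The separation step can be modeled in a few lines of Python. The sketch below represents the shared DATA line as a sequence of (edge, bit) pairs, one per clock edge, and sorts the bits back into two streams. It is a conceptual model only: the names are hypothetical, and real hardware must respect the setup and hold timing given in the microphone's datasheet.

```python
def interleave_on_edges(mic_gnd_bits, mic_vdd_bits):
    """Model the shared DATA line as (edge, bit) pairs, one pair per clock edge."""
    line = []
    for gnd_bit, vdd_bit in zip(mic_gnd_bits, mic_vdd_bits):
        line.append(("falling", gnd_bit))  # L/R Select = GND sets data on the falling edge
        line.append(("rising", vdd_bit))   # L/R Select = VDD sets data on the rising edge
    return line


def split_by_edge(shared_line):
    """Recover the two bitstreams from the edge each bit is aligned with."""
    streams = {"falling": [], "rising": []}
    for edge, bit in shared_line:
        streams[edge].append(bit)  # raises KeyError on an unknown edge label
    return streams["falling"], streams["rising"]


if __name__ == "__main__":
    mic_a = [1, 0, 1, 1]   # microphone strapped with L/R Select = GND
    mic_b = [0, 0, 1, 0]   # microphone strapped with L/R Select = VDD
    shared = interleave_on_edges(mic_a, mic_b)
    print(split_by_edge(shared))
```

Because each microphone owns one clock edge, the two streams never contend for the line and no additional addressing is needed.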

Getting started: MEMS microphone evaluation kits

To evaluate key parameters and simplify the design of audio systems using MEMS microphones, suppliers provide reference boards and software development kits. For example, Vesper offers the S-VM3000-C evaluation board that comprises a VM3000 digital MEMS microphone and a 0.1 microfarad (µF) power supply bypass capacitor, along with an edge connector.

Likewise, for its ICS-40740 analog MEMS microphones, TDK InvenSense provides the EV_ICS-40740-FX evaluation board that allows designers to analyze the performance of differential analog output microphones quickly and efficiently. Apart from the MEMS microphone, the only other component on this evaluation board is a 0.1 µF supply bypass capacitor.

CUI Devices, which offers both analog and digital MEMS microphones, provides the DEVKIT-MEMS-001 development kit for design prototyping and testing (Figure 6). This evaluation board features four independent microphone evaluation circuits.

Figure 6: The DEVKIT-MEMS-001 features four detachable microphone evaluation circuits: two for analog outputs and two for digital outputs. (Image source: CUI Devices)

The board has two analog MEMS microphones: the bottom port CMM-2718AB-38308-TR and the top port CMM-2718AT-42308-TR; and two digital MEMS microphones, the bottom port CMM-4030DB-26354-TR and the top port CMM-4030DT-26354-TR. The top and bottom sound port options are available for both analog and digital output microphones for design flexibility.

Comparing the two analog devices, the CMM-2718AB-38308-TR has a sensitivity of -38 dBV and an SNR of 65 dBA. The CMM-2718AT-42308-TR has a sensitivity of -42 dBV and an SNR of 60 dBA. Both have a frequency range of 100 Hz to 10 kHz and draw 80 µA from a 2 volt supply rail.

With respect to the two digital microphones, the CMM-4030DB-26354-TR has a sensitivity of -26 dB FS and an SNR of 64 dBA. The CMM-4030DT-26354-TR has a sensitivity of -26 dB FS and an SNR of 65 dBA. Both use a 1-bit PDM data format, operate over the frequency range of 100 Hz to 10 kHz, and draw 0.54 milliamperes (mA) from a 2 volt supply.
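
For a power budget, the supply current and voltage quoted for the analog and digital parts translate directly into power. The Python sketch below does the multiplication, assuming the quoted currents are steady-state operating values at the stated 2 volt supply; the function name is illustrative.

```python
def power_microwatts(current_a, supply_v):
    """Supply power in microwatts from current (amperes) and voltage (volts)."""
    return current_a * supply_v * 1e6


if __name__ == "__main__":
    analog_uw = power_microwatts(80e-6, 2.0)     # analog CMM-2718 parts
    digital_uw = power_microwatts(0.54e-3, 2.0)  # digital CMM-4030 parts
    print(f"analog:  {analog_uw:.0f} µW")
    print(f"digital: {digital_uw:.0f} µW")
    print(f"ratio:   {digital_uw / analog_uw:.2f}x")
```

The digital parts draw several times more power per microphone because the ADC and PDM modulator are inside the package, so a fair comparison for an analog design should also count the external or on-chip ADC it relies on.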

Conclusion

A closer look at MEMS microphones—both analog and digital—shows their system-level performance advantages and how they complement always-on voice interface designs. The latest MEMS microphones employ novel technologies to extend battery life, improve far-field audio quality, and withstand environmental contaminants. Improving keyword accuracy is another major design consideration, which is closely tied to parameters such as SNR, sensitivity tolerance, and startup time—all of which are being addressed in the latest devices to better accommodate VUI designs.
