At its October event, Google previewed how the Pixel’s Recorder app would soon be able to identify multiple speakers. Speaker Labels are now rolling out.
Google’s excellent Recorder app received Speaker Labels, which can identify multiple speakers, as part of the December Pixel Feature Drop. As with previous releases, the team behind it has published an explanation of how the feature came to be.
Turn-to-Diarize, Google’s brand-new speaker diarization system, powers Speaker Labels. It has three main parts that “run completely on the device”:
- A speaker turn detection model that detects a change of speaker in the input speech
- A speaker encoder model that extracts voice characteristics from each speaker turn
- A multi-stage clustering algorithm that annotates speaker labels to each speaker turn in a highly efficient way
According to Google, the speaker diarization system leverages several highly optimized machine learning models and algorithms to diarize hours of audio in a real-time streaming fashion with limited computational resources on mobile devices.
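To make the three stages more concrete, here is a minimal toy sketch of how such a pipeline fits together. It is not Google’s implementation: the function names, the thresholds, the mean-of-frames "encoder", and the greedy clustering are all assumptions chosen for illustration, and the input is synthetic 2-D feature vectors rather than real audio.

```python
import math


def normalize(v):
    """Scale a vector to unit length (zero vectors are left as-is)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def detect_turns(frames, change_threshold=0.5):
    """Stage 1 stand-in (hypothetical): start a new turn whenever two
    adjacent frames stop resembling each other."""
    turns, start = [], 0
    for i in range(1, len(frames)):
        if cosine(frames[i - 1], frames[i]) < change_threshold:
            turns.append((start, i))
            start = i
    if frames:
        turns.append((start, len(frames)))
    return turns


def encode_turn(frames, turn):
    """Stage 2 stand-in (hypothetical): one embedding per turn,
    computed as the normalized mean of the turn's frames."""
    start, end = turn
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames[start:end]) / (end - start) for d in range(dim)]
    return normalize(mean)


def cluster_turns(embeddings, same_speaker_threshold=0.7):
    """Stage 3 stand-in (hypothetical): greedy clustering. Each turn joins
    the most similar centroid, or starts a new speaker if none is close."""
    centroids, counts, labels = [], [], []
    for emb in embeddings:
        scores = [cosine(emb, c) for c in centroids]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is not None and scores[best] >= same_speaker_threshold:
            n = counts[best]
            # Update the running mean of the matched speaker's centroid.
            centroids[best] = [(c * n + e) / (n + 1) for c, e in zip(centroids[best], emb)]
            counts[best] += 1
            labels.append(best)
        else:
            centroids.append(list(emb))
            counts.append(1)
            labels.append(len(centroids) - 1)
    return labels


def diarize(frames):
    """Run the three stages and return (start, end, label) segments."""
    turns = detect_turns(frames)
    embeddings = [encode_turn(frames, t) for t in turns]
    labels = cluster_turns(embeddings)
    return [(s, e, f"Speaker {label + 1}") for (s, e), label in zip(turns, labels)]


# Synthetic features: one voice near (1, 0), another near (0, 1).
frames = [[1.0, 0.1], [0.9, 0.0], [0.1, 1.0], [0.0, 0.9], [1.0, 0.05]]
segments = diarize(frames)
print(segments)
```

On this toy input, the first and last segments end up with the same label because their embeddings are nearly identical, while the middle segment is assigned to a second speaker. Clustering per turn rather than per frame keeps the number of items to cluster small, which is one plausible reason a turn-based design suits long recordings on a phone.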
Audio recordings in the Recorder app can be up to 18 hours long. According to Google, more audio results in higher confidence in the predicted speaker labels. As such, Recorder will occasionally correct previously predicted low-confidence speaker labels. Users can also make edits manually and split the transcript.
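The idea of revisiting earlier low-confidence labels as more audio arrives can be illustrated with a small sketch. The article does not say how Recorder measures confidence, so everything here is an assumption: confidence is taken to be the cosine similarity to the best-matching speaker centroid, and `revise_labels`, `best_match`, and the 0.8 threshold are hypothetical names and values.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def best_match(embedding, centroids):
    """Return the most similar speaker and the similarity score,
    which serves as the (assumed) confidence measure."""
    scores = {name: cosine(embedding, c) for name, c in centroids.items()}
    label = max(scores, key=scores.get)
    return label, scores[label]


def revise_labels(turns, centroids, low_confidence=0.8):
    """Re-score only the turns labeled with low confidence, using the
    current centroids. Returns (index, old_label, new_label) corrections."""
    corrections = []
    for index, turn in enumerate(turns):
        if turn["confidence"] >= low_confidence:
            continue  # confident labels are left alone
        label, confidence = best_match(turn["embedding"], centroids)
        if label != turn["label"]:
            corrections.append((index, turn["label"], label))
        turn["label"], turn["confidence"] = label, confidence
    return corrections


# Early in the recording only one speaker has been observed.
centroids = {"Speaker 1": [1.0, 0.0]}
turns = []
for embedding in ([1.0, 0.1], [0.6, 0.8]):
    label, confidence = best_match(embedding, centroids)
    turns.append({"embedding": embedding, "label": label, "confidence": confidence})

# Later audio reveals a second speaker, so weak labels are re-scored.
centroids["Speaker 2"] = [0.0, 1.0]
corrections = revise_labels(turns, centroids)
print(corrections)
```

In this example the second turn was initially forced onto the only known speaker with a weak score, and it is reassigned once a better-matching speaker appears. The first turn, labeled with high confidence, is never re-scored.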
The current system runs mostly on the CPU block of Google Tensor, both the first generation and G2, and Speaker Labels are supported on the Pixel 6, 6 Pro, 6a, 7, and 7 Pro. Google says it is “working on delegating more computations to the TPU block.”
Google says another future direction is to leverage the multilingual capabilities of its speaker encoder and speech recognition models to expand the feature to more languages.