Audio Transcription-Machine Learning and AI

Overview

Objective

Build a custom Automated Voice Recognition (AVR) model that uses machine learning to transcribe audio recordings for applications such as medical or legal transcription.

Process

  • Audio Recording
  • Conversion to Waveform
  • Input into the AVR model
  • Text Output from the AVR model
  • Quality Check and Editing by human transcriptionist
  • Final Transcription

A 3x to 4x increase in transcription productivity, enabling much lower operational costs and a competitive edge over rivals.

Approach: Automated Voice Recognition (AVR) Modeling

Labelled Raw Voice Data is split into three sets:

  • Training Data (e.g. 80% split): the AVR model is built using the training data
  • Cross Validation Data (e.g. 10% split): the AVR model is optimized using the cross validation data
  • Test Data (e.g. 10% split): the AVR model's accuracy is evaluated on the test data
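
The 80/10/10 split described above could be sketched as follows. This is a minimal illustration, not the project's actual data pipeline: the `split_dataset` helper, the file names, and the fixed random seed are all assumptions made for the example.

```python
import random

def split_dataset(items, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle items and split them into train / cross-validation / test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains (about 10%)
    return train, val, test

# Hypothetical labelled recordings: (audio file name, reference transcript)
recordings = [(f"clip_{i:03d}.wav", f"transcript {i}") for i in range(100)]
train, val, test = split_dataset(recordings)
print(len(train), len(val), len(test))
```

In practice it may be preferable to split by speaker rather than by individual recording, so that the test set measures how well the model handles voices it has not heard during training.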

AVR Model framework

  • Raw Voice Data (Model Input): random noise may be added to the training data to make the model more robust
  • Feature Extraction: from voice frames (e.g. Mel-Frequency Cepstral Coefficients)
  • Acoustic Model: e.g. Deep Bidirectional LSTM RNN trained using Connectionist Temporal Classification
  • Linguistic Model
  • Transcribed Text (Model Output)
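
Two of the framework ideas above can be illustrated with short, self-contained sketches: adding random noise to a training waveform, and collapsing the per-frame labels that a CTC-trained acoustic model emits into text. Both functions are illustrative assumptions rather than the actual implementation: the noise is white Gaussian at a chosen signal-to-noise ratio, and the decoder is the simplest greedy CTC rule (merge repeated labels, then drop blanks) with `_` chosen here as the blank symbol.

```python
import math
import random

def add_noise(samples, snr_db, seed=0):
    """Return a copy of samples with white Gaussian noise at the given SNR (dB)."""
    rng = random.Random(seed)
    signal_power = sum(s * s for s in samples) / len(samples)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in samples]

def ctc_greedy_collapse(frame_labels, blank="_"):
    """Collapse per-frame labels: merge consecutive repeats, then drop blanks."""
    out = []
    prev = None
    for label in frame_labels:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return "".join(out)

# Synthetic stand-in for a recording: a 440 Hz tone, 0.1 s at 16 kHz
sample_rate = 16000
clean = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(1600)]
noisy = add_noise(clean, snr_db=20)

# Hypothetical per-frame output of an acoustic model, one label per frame
print(ctc_greedy_collapse("hh_e_ll_ll_oo"))
```

The blank between the two `ll` runs is what lets CTC represent a genuinely doubled letter; without it, the repeats would merge into a single `l`.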

Automated Voice Recognition (AVR) – Acoustic Modeling candidates

  • Bidirectional LSTMs
  • Sequence Classification Problem
  • LSTM For Sequence Classification
  • Bidirectional LSTM For Sequence Classification
  • Compare LSTM to Bidirectional LSTM
  • Comparing Bidirectional LSTM Merge Modes
  • Candidate model has 3 layers of BLSTM with 256 nodes in each direction
  • Decoder has 2 LSTM layers with 512 nodes
  • Trained using asynchronous stochastic gradient descent
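
To get a rough sense of the size of the candidate model listed above, its LSTM weights can be counted with the standard formula for an LSTM layer (four gates, each with input weights, recurrent weights, and a bias). This is only a back-of-the-envelope sketch: the 13-dimensional input feature size, the concatenation of the two directions, and the 512-dimensional input to each decoder layer are assumptions, and any attention, embedding, or output layers are ignored.

```python
def lstm_params(input_dim, units):
    """Trainable weights in one standard LSTM layer (4 gates, one bias vector each)."""
    return 4 * (units * (input_dim + units) + units)

def blstm_stack_params(input_dim, units, layers):
    """Stack of bidirectional LSTM layers whose two directions are concatenated."""
    total = 0
    dim = input_dim
    for _ in range(layers):
        total += 2 * lstm_params(dim, units)  # forward + backward direction
        dim = 2 * units                       # concatenated output feeds the next layer
    return total

N_FEATURES = 13  # assumption: e.g. 13 MFCCs per voice frame

# 3 BLSTM layers with 256 nodes in each direction
encoder = blstm_stack_params(N_FEATURES, 256, 3)

# 2 LSTM decoder layers with 512 nodes (assumed 512-dimensional input to each)
decoder = 2 * lstm_params(512, 512)

print(f"encoder LSTM weights: {encoder:,}")
print(f"decoder LSTM weights: {decoder:,}")
```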