Speech-to-text

cxcli provides commands that let you interact with the Google Cloud Speech-to-text service through the Cloud Speech-to-text API.

Is this your first time using this feature?

Before you start using this functionality, please read the authentication page.

Usage

You can find the usage of the speech-to-text commands under the cxcli stt command. You can read the documentation about this command here.

The cxcli stt root command has the recognize command. You can find the usage of this command here.

Parameters

These are the relevant parameters that you can use to interact with Google Cloud Speech-to-text:

  1. locale: accepts all the locales supported by the Google Cloud Speech-to-text API. You can find all the available locales here.

Audio input file

The input audio file must have the following format:

  1. A sample rate of 16000 Hz.
  2. Linear16 audio encoding. Linear16 is a 16-bit linear pulse-code modulation (PCM) encoding.
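
As a quick sanity check before sending a file to cxcli, the two requirements above can be verified locally. The following is a minimal sketch, not part of cxcli, and it assumes the audio is stored in a WAV container holding PCM data (Python's standard wave module cannot read other containers such as MP3). The function name check_wav_format is hypothetical.

```python
import wave

# Requirements taken from the list above.
REQUIRED_SAMPLE_RATE_HZ = 16000
REQUIRED_SAMPLE_WIDTH_BYTES = 2  # Linear16 means 16-bit (2-byte) PCM samples


def check_wav_format(path):
    """Return a list of format problems; an empty list means the file looks fine.

    Assumption: the file is a PCM WAV file. wave.open raises wave.Error
    for anything it cannot parse.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        sample_rate = wav.getframerate()
        sample_width = wav.getsampwidth()
    if sample_rate != REQUIRED_SAMPLE_RATE_HZ:
        problems.append(
            f"sample rate is {sample_rate} Hz, expected {REQUIRED_SAMPLE_RATE_HZ} Hz"
        )
    if sample_width != REQUIRED_SAMPLE_WIDTH_BYTES:
        problems.append(
            f"sample width is {sample_width * 8} bits, expected "
            f"{REQUIRED_SAMPLE_WIDTH_BYTES * 8} bits"
        )
    return problems
```

Calling check_wav_format on a file path returns an empty list when both requirements are met, and one human-readable message per mismatch otherwise.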

If you don't have a file in this format, you can create one yourself using the cxcli tts command. All the information is located here.

Example

This is a simple example of the cxcli stt recognize command:

cxcli stt recognize hi.mp3 --locale en-US

Running the command above with the --verbose flag produces output like this:

$ cxcli stt recognize hi.mp3 --locale en-US --verbose
INFO Duration time: 570 miliseconds               
INFO Detections: 1                                
INFO 1. Text detected: hi                         
INFO 1. Confidence: 79.276474%                     

Are you running this command in a CI/CD pipeline?

If so, we recommend running it with the --output-format parameter set to json.
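
For pipelines that post-process the result, the recommendation above could be wrapped in a small script. This is only a sketch under stated assumptions: it assumes cxcli is on the PATH, that with --output-format json the command prints a single JSON document to standard output, and it makes no assumption about the fields inside that document. The helper names build_recognize_command and recognize are hypothetical.

```python
import json
import subprocess


def build_recognize_command(audio_path, locale):
    """Build the argument list for cxcli stt recognize with JSON output."""
    return [
        "cxcli", "stt", "recognize", audio_path,
        "--locale", locale,
        "--output-format", "json",
    ]


def recognize(audio_path, locale):
    """Run cxcli and return its parsed JSON output.

    Assumption: the JSON document is the only content written to stdout.
    A non-zero exit code raises subprocess.CalledProcessError, which
    makes the pipeline step fail.
    """
    completed = subprocess.run(
        build_recognize_command(audio_path, locale),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(completed.stdout)
```

Passing the arguments as a list, rather than a single shell string, avoids quoting problems when file names contain spaces.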