Speech-to-Text (Streaming) (Beta)
Extract a transcript from an audio stream. This endpoint must be accessed over a WebSocket connection.
Endpoint
wss://translate.rozetta-api.io/api/v1/translate/stt-streaming

Request details
Header

accessKey, nonce, signature: Please refer to the authentication section.
Command

SET_LANGUAGE: Sets the speech language. Acceptable language codes are "en" (English), "ja" (Japanese), "zh-CN" (Mandarin Chinese), "zh-HK" (Cantonese), and "zh-TW" (Chinese).
SET_SAMPLING_RATE: Sets the audio stream sampling rate. The recommended value is 16000.
END_STREAM: Signifies the end of the audio stream. No value is required.
END_SESSION: Signifies the end of the speech recognition session. No value is required. Once the session ends, the WebSocket connection is closed.
Command example (set speech language)
{
  "command": "SET_LANGUAGE",
  "value": "ja"
}
Command example (signify end of stream)
{
  "command": "END_STREAM"
}
Audio stream
The audio stream should be sent in binary form over the WebSocket connection. The audio must be in WAV format with exactly 1 channel and 16 bits per sample.
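Large WAV files can be sent as a series of smaller binary messages rather than a single large frame. As a hedged sketch (the 32 KiB chunk size below is an assumption for illustration, not a documented API requirement), the buffer can be split before sending each piece with `connection.send()`:

```javascript
// Split a WAV buffer into fixed-size binary chunks for streaming over the
// WebSocket connection. The 32 KiB chunk size is an assumption for
// illustration, not a documented API requirement.
function* audioChunks(buffer, chunkSize = 32 * 1024) {
  for (let offset = 0; offset < buffer.length; offset += chunkSize) {
    // subarray clamps the end index, so the last chunk may be shorter.
    yield buffer.subarray(offset, offset + chunkSize);
  }
}

// Usage, given an open `connection` and a WAV `audioBuffer`:
// for (const chunk of audioChunks(audioBuffer)) connection.send(chunk);
// connection.send(JSON.stringify({ command: 'END_STREAM' }));
```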
Response

LANGUAGE_READY: The speech language has been set successfully.
SAMPLING_RATE_READY: The audio stream sampling rate has been set successfully.
RECOGNITION_RESULT: The recognized transcript from the audio stream. This type of response may be sent multiple times during the transfer of the audio stream.
RECOGNITION_ERROR: An error message from the speech recognizer. This type of response may be sent multiple times.
Response example (language is set)
{
  "type": "LANGUAGE_READY"
}
Response example (recognition result)
{
  "type": "RECOGNITION_RESULT",
  "value": "そこに着いたらもう一度誰かに尋ねてください"
}

api/v1/translate/stt-streaming
Send an audio stream for on-the-fly speech recognition.
const fs = require('fs');
const WebSocket = require('ws');

const authUtils = require('./utils/auth-utils');

const fsPromise = fs.promises;

const apiPath = '/api/v1/translate/stt-streaming';
const apiEndpoint = `wss://translate.rozetta-api.io${apiPath}`;
const authConfig = {
  accessKey: 'ACCESS_KEY',
  secretKey: 'SECRET_KEY',
  nonce: Date.now().toString(),
  contractId: 'CONTRACT_ID',
};
const speechData = {
  language: 'ja',
  samplingRate: 16000,
  audioFile: 'speech.wav',
  audioBuffer: null,
};

/**
* Command type sent from the client.
*/
const commandType = {
  setLanguage: 'SET_LANGUAGE',
  setSamplingRate: 'SET_SAMPLING_RATE',
  endStream: 'END_STREAM',
  endSession: 'END_SESSION',
};

/**
* Response types received from API endpoint.
*/
const responseType = {
  languageReady: 'LANGUAGE_READY',
  samplingRateReady: 'SAMPLING_RATE_READY',
  recognitionResult: 'RECOGNITION_RESULT',
  recognitionError: 'RECOGNITION_ERROR',
};

const getAuth = (url) => {
  const nonce = Date.now().toString();
  return {
    accessKey: authConfig.accessKey,
    nonce: nonce,
    signature: authUtils.generateSignature(url, authConfig.secretKey, nonce),
    remoteurl: url,
    contractId: authConfig.contractId,
  };
};

const handleSessionMessage = (connection, message) => {
  const messageJSON = JSON.parse(message);
  switch (messageJSON.type) {
    case responseType.languageReady:
      // The language is set. Set the sampling rate.
      console.log('Language is set. Set sampling rate.');
      connection.send(JSON.stringify({
        command: commandType.setSamplingRate,
        value: speechData.samplingRate,
      }));
      break;
    case responseType.samplingRateReady:
      // The sampling rate is set. Send the audio data stream.
      console.log('Sampling rate is set. Send audio data stream.');
      connection.send(speechData.audioBuffer);
      connection.send(JSON.stringify({
        command: commandType.endStream,
      }));
      break;
    case responseType.recognitionResult:
      console.log('Recognized transcript:');
      console.log(messageJSON.value);
      break;
    case responseType.recognitionError:
      console.error('Recognition error:');
      console.error(messageJSON.value);
      // In case of error, we close the connection immediately.
      connection.send(JSON.stringify({
        command: commandType.endSession,
      }));
      break;
    default:
      console.log('Unexpected response type:');
      console.log(messageJSON.type);
  }
};

const main = async () => {
  speechData.audioBuffer = await fsPromise.readFile(speechData.audioFile);
  const auth = getAuth(apiPath);
  console.log(apiPath);
  console.log(auth);
  const auth64 = Buffer.from(JSON.stringify(auth)).toString('base64');
  const url = `${apiEndpoint}?auth=${auth64}`;
  console.log(url);
  const connection = new WebSocket(url);
  connection.on('open', () => {
    console.log('Connected to streaming STT API.');
    // Once connected, set the speech language.
    connection.send(JSON.stringify({
      command: commandType.setLanguage,
      value: speechData.language,
    }));
  });
  connection.on('message', (message) => {
    handleSessionMessage(connection, message);
  });
  connection.on('error', (error) => {
    console.error(error.message);
    connection.close();
  });
  connection.on('close', () => {
    console.log('Connection closed.');
  });
};

main();
For authentication details, please refer to the authentication section.
For the full sample code in each language, please refer here.
©️ 2019 Rozetta API  ・  Powered by Rozetta
