Training Watson Speech to Text

Posted by steveatkin in Speech to Text, Watson

In this blog post, I am going to quickly explain how you can train Watson Speech to Text to handle misunderstood words. As we will see, this is very easy to do. The first thing we need is a training file. My typical workflow is to first let Watson Speech to Text process my audio stream using one of the default models, e.g., broadband or narrowband, and then see which words or phrases it had trouble recognizing.

Once you have figured out which words Watson had trouble with, you can create the training words to handle them. The training words are a simple JSON payload that contains an entry for each misunderstood word or phrase. In the example below, my training words contain entries for two custom words: evolve and explore.

[
    {
        "word": "evolve",
        "sounds_like": ["appaled", "evolve"],
        "display_as": "evolve"
    },
    {
        "word": "explore",
        "sounds_like": ["floor", "explore"],
        "display_as": "explore"
    }
]

You will notice that for each entry, sounds_like lists the ways in which the word sounds like other words, including the words Watson mistook it for. The display_as value is the word that should be displayed when Watson uses these custom word definitions during speech recognition.
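If you have more than a handful of words, it may be easier to generate the training words than to type the JSON by hand. Here is a minimal Python sketch of that idea. The helper names (make_entry, validate_entries) and the checks they perform are my own illustration, not something the service requires.

```python
import json

# Field names taken from the JSON example in this post.
REQUIRED_KEYS = ("word", "sounds_like", "display_as")


def make_entry(word, sounds_like, display_as=None):
    """Build one training-word entry; display_as defaults to the word itself."""
    return {
        "word": word,
        "sounds_like": list(sounds_like),
        "display_as": display_as if display_as is not None else word,
    }


def validate_entries(entries):
    """Raise ValueError if an entry lacks a key or has an empty sounds_like list.

    These checks are a local convention for this sketch, not service rules.
    """
    for entry in entries:
        missing = [key for key in REQUIRED_KEYS if key not in entry]
        if missing:
            raise ValueError(f"entry {entry!r} is missing {missing}")
        if not entry["sounds_like"]:
            raise ValueError(f"entry for {entry['word']!r} has no sounds_like values")
    return entries


training_words = validate_entries([
    make_entry("evolve", ["appaled", "evolve"]),
    make_entry("explore", ["floor", "explore"]),
])

# Serialize with four-space indentation, matching the example above.
training_json = json.dumps(training_words, indent=4)
print(training_json)
```

The printed JSON has the same structure as the hand-written example, so it can be saved to a file and reused in the later steps.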

The next thing that you will need to do is to create a custom model to upload the training words to. In the example below I am creating a custom model based on the US English broadband model through a simple CURL command. Be sure to fill in your username and password.

curl -X POST -u "{username}":"{password}" \
--header "Content-Type: application/json" \
--data "{\"name\": \"My model\",
\"base_model_name\": \"en-US_BroadbandModel\",
\"description\": \"My custom language model\"}" \
"https://stream.watsonplatform.net/speech-to-text/api/v1/customizations"
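If you would rather script this step, the same request can be assembled in Python. The sketch below only builds the request and prints it; nothing is sent. The function name build_create_model_request is hypothetical, and I am assuming that curl's -u option corresponds to a standard HTTP Basic Authorization header.

```python
import base64
import json
import urllib.request

# Base URL copied from the curl command in this post.
API_BASE = "https://stream.watsonplatform.net/speech-to-text/api/v1"


def build_create_model_request(username, password, name, base_model_name, description):
    """Return an unsent urllib Request that mirrors the curl command above."""
    body = json.dumps({
        "name": name,
        "base_model_name": base_model_name,
        "description": description,
    }).encode("utf-8")
    # Assumption: curl's -u flag is equivalent to this HTTP Basic header.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{API_BASE}/customizations",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


request = build_create_model_request(
    "{username}", "{password}",
    "My model", "en-US_BroadbandModel", "My custom language model",
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To actually send it, pass the request to urllib.request.urlopen(request).
```

Letting json.dumps produce the body avoids escaping every quote by hand, which is the fiddly part of the curl version.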

Once you have created the custom model you will then add the training words to the custom model. You can do that through a CURL command as well. Be sure to once again put in your username, password, and the customization id from the last CURL command.

curl -X POST -u "{username}":"{password}" \
--header "Content-Type: application/json" \
--data "{\"words\":
[{\"word\": \"evolve\", \"sounds_like\": [\"appaled\", \"evolve\"], \"display_as\": \"evolve\"},
{\"word\": \"explore\", \"sounds_like\": [\"floor\", \"explore\"], \"display_as\": \"explore\"}]}" \
"https://stream.watsonplatform.net/speech-to-text/api/v1/customizations/{customization_id}/words"
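Escaping every quote inside the --data argument by hand is easy to get wrong. One possible way around that, sketched here in Python, is to let json.dumps produce the body and shlex.quote produce a shell-safe argument. The variable names are my own; the only thing taken from the command above is that the word list is wrapped in a top-level "words" key.

```python
import json
import shlex

training_words = [
    {"word": "evolve", "sounds_like": ["appaled", "evolve"], "display_as": "evolve"},
    {"word": "explore", "sounds_like": ["floor", "explore"], "display_as": "explore"},
]

# Wrap the list in a top-level "words" key, as in the curl command above.
body = json.dumps({"words": training_words})

# shlex.quote single-quotes the body so the shell passes it through unchanged.
data_argument = shlex.quote(body)
print("--data", data_argument)
```

The printed argument can be pasted into the curl command in place of the hand-escaped string.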

After you have uploaded your training words you will need to check the status of your model to verify that it is ready to be trained. You do that by issuing the following CURL command. Be sure to supply the username and password.

curl -X GET -u "{username}":"{password}" \
"https://stream.watsonplatform.net/speech-to-text/api/v1/customizations"

After you make this call you need to check the status and verify that it is in the ready state. Once it is in the ready state, then you can proceed to training.
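Picking the status out of the response by eye gets tedious if you have several custom models. The sketch below shows one way to do it in Python. It assumes the response is a JSON object with a "customizations" list whose items carry "customization_id" and "status" fields; check the API reference for the exact shape. The sample response is hand-written for illustration and is not real service output.

```python
import json


def find_status(list_response_text, customization_id):
    """Return the status of one custom model, or None if the id is not listed."""
    payload = json.loads(list_response_text)
    # Assumed response shape: {"customizations": [{"customization_id": ..., "status": ...}]}
    for model in payload.get("customizations", []):
        if model.get("customization_id") == customization_id:
            return model.get("status")
    return None


# Hand-written sample for illustration only; the ids are made up.
sample_response = """
{
    "customizations": [
        {"customization_id": "example-id-1", "name": "My model", "status": "ready"},
        {"customization_id": "example-id-2", "name": "Other model", "status": "pending"}
    ]
}
"""

print(find_status(sample_response, "example-id-1"))  # ready
```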

The next step is to train Watson Speech to Text with this new customized model. You can easily do this through a CURL command as well. Just like before, make sure you provide your username, password, and customization id.

curl -X POST -u "{username}":"{password}" "https://stream.watsonplatform.net/speech-to-text/api/v1/customizations/{customization_id}/train"

After you have trained Watson Speech to Text you need to once again check the status of your model and verify that it is in the available state before attempting to use the customized model for recognizing speech.
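Rather than re-running the status check by hand, you could poll until the model reaches the state you are waiting for. Here is a minimal Python sketch of such a loop. The fetch_status callable is hypothetical: it stands in for whatever code issues the GET request and extracts the status string. The retry count and delay are arbitrary defaults, and the fake status values exist only to exercise the loop.

```python
import time


def wait_for_status(fetch_status, wanted, attempts=10, delay_seconds=5.0, sleep=time.sleep):
    """Poll fetch_status() until it returns `wanted`; return True on success.

    fetch_status is a hypothetical zero-argument callable that returns the
    model's current status string (for example by issuing the GET request).
    """
    for attempt in range(attempts):
        if fetch_status() == wanted:
            return True
        # Sleep between attempts, but not after the final one.
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False


# Stand-in for the real status call: two non-final answers, then "available".
fake_statuses = iter(["training", "training", "available"])
ready = wait_for_status(lambda: next(fake_statuses), "available", sleep=lambda s: None)
print(ready)  # True
```

The same helper works for the earlier check by passing "ready" as the wanted state.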

For more detailed information on the Watson Speech to Text APIs, see the API reference.

Steven Atkin