Creating and Translating Caption Files using Watson Speech to Text and Globalization Pipeline on IBM Bluemix
In this post I will show how to use a set of utilities I have put together to automatically create a SubRip .srt file from an .mp4 video using Watson Speech to Text, and then translate that SubRip file into other languages using the Globalization Pipeline service on Bluemix. The utilities are available on GitHub.
The first thing you will need to do is create a Bluemix account and create instances of both the Watson Speech to Text and Globalization Pipeline services. Once your service instances are created, copy the credentials for each service and place them in the credentials files in your clone of the GitHub repo.
To create a SubRip file, call the subtitler utility and it will generate the file for you automatically. I have also included a utility called segmenter that attempts to take the raw text captured by subtitler and add proper English punctuation. When you are ready to translate your SubRip file into other languages, use the translator utility.
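If you have not worked with SubRip files before, it may help to see what the format looks like: each caption is a numbered cue with a start and end timestamp followed by the text. The sketch below is not the code from the utilities; it is a minimal illustration in Python, and the names format_timestamp, to_srt, and translate_cues are hypothetical. The translate argument is a stand-in for whatever translation call you use, since only the caption text changes while the numbering and timing stay the same.

```python
from typing import Callable, List, Tuple

# A cue is (start seconds, end seconds, caption text). Illustrative type only.
Cue = Tuple[float, float, str]


def format_timestamp(seconds: float) -> str:
    """Render seconds as a SubRip timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues: List[Cue]) -> str:
    """Build SubRip text: numbered cues separated by blank lines."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    return "\n".join(blocks)


def translate_cues(cues: List[Cue], translate: Callable[[str], str]) -> List[Cue]:
    """Return new cues with translated text and unchanged timing.

    `translate` is a placeholder for a real translation call.
    """
    return [(start, end, translate(text)) for start, end, text in cues]


if __name__ == "__main__":
    cues = [(0.0, 2.5, "Hello"), (2.5, 4.0, "World")]
    print(to_srt(cues))
```

Keeping the timing separate from the text is what makes translation straightforward: the same cues can be rendered once per target language without touching the audio again.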
There are a few things to consider when using these utilities:
I have tested the utilities with videos up to about 20 minutes in length. I will be uploading a YouTube video later that will walk you through all the steps, but this post should be enough to get you started.