When I convert videos from one format to another, I often need to convert the subtitles as well. I wrote a script that extracts the subtitle tracks from an MKV file and converts any PGS subtitles to VobSub. It is a bash script that I run on my Mac. If you want to run it on Linux, just update the paths to the commands as necessary. The tools you need that aren’t installed by default are ffprobe (part of FFmpeg), mkvextract (part of the MKVToolNix suite), java, and BDSup2Sub.
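Before running anything, it can help to confirm that the tools are installed. The following is a minimal sketch, assuming the commands are on your PATH (which is typical on Linux, whereas the script below calls mkvextract by its full path on the Mac). The function name `missing_tools` is made up for this example, and BDSup2Sub is a jar file rather than a command, so it is not covered by this check.

```shell
#!/bin/bash
# Print the name of every command in the argument list that is not on PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Check the three command-line dependencies and report any that are absent.
missing=$(missing_tools ffprobe mkvextract java)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Missing tools:" $missing >&2
fi
```

If the check prints nothing, all three commands were found.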
The script takes one parameter: the filename of the MKV.
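If the filename has spaces, the extracted subtitle files will have underscores in place of the spaces. The track list is passed to mkvextract through an unquoted command substitution, which splits on whitespace, so avoiding spaces makes the script a lot easier to write.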
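The naming scheme can be sketched as a small function. The function name `subtitle_name` and the sample path are made up for illustration; the script itself builds the name inline.

```shell
#!/bin/bash
# Build an output name in the form the script uses:
#   <basename with spaces replaced by underscores>.<track>.<lang>.<ext>
subtitle_name() {
    local file="$1" tracknum="$2" lang="$3" ext="$4"
    printf '%s.%s.%s.%s\n' "$(basename "$file" | tr ' ' '_')" "$tracknum" "$lang" "$ext"
}

# Hypothetical input: track 2, English, PGS.
subtitle_name "/Volumes/Media/My Movie.mkv" 2 eng sup
# prints: My_Movie.mkv.2.eng.sup
```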
If PGS (.sup) subtitles are extracted, the script automatically converts them to VobSub (SUB/IDX) files for embedding in MP4 containers.
#!/bin/bash
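# Remove only this script's temp file on exit; on a signal, exit so the cleanup runs.
trap 'rm -f "/tmp/extractsubs.$$"' EXIT
trap 'exit 1' HUP INT QUIT PIPE TERM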
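# List the subtitle streams; ffprobe prints its stream summary on stderr.
ffprobe "$1" 2>&1 | grep Stream | grep "Subtitle:" | while read -r line
do
    # A line looks like "Stream #0:2(eng): Subtitle: hdmv_pgs_subtitle, ..."
    tracknum=$( echo "${line}" | cut -f 2 -d ':' | grep -Eo "^[0-9]+" )
    lang=$( echo "${line}" | cut -f 2 -d ':' | cut -f 2- -d '(' | tr -d ')' )
    type=$( echo "${line}" | awk '{ print $4;}' )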
    case "${type}" in
        "hdmv_pgs_subtitle" | "hdmv_pgs_subtitle,")
            ext="sup"
            ;;
        "subrip")
            ext="srt"
            ;;
        "dvd_subtitle" | "dvd_subtitle,")
            ext="sub"
            ;;
        "ass")
            ext="ass"
            ;;
        "ssa")
            ext="ssa"
            ;;
        *)
            ext="unknown_type"
            ;;
    esac
    echo "${tracknum}:$( basename "$1" | tr ' ' '_' ).${tracknum}.${lang}.${ext}"
    # echo "${tracknum}:$( basename "$1" ).${tracknum}.${lang}.${ext}"
done > /tmp/extractsubs.$$
# For the life of me I can't get it to work with spaces in the names, so I just convert
# spaces to underscores, and it works. Maybe I'll use another character, then convert it
# back when it's done.
/Applications/MKVToolNix-*.0.0.app/Contents/MacOS/mkvextract "$1" tracks $( cat /tmp/extractsubs.$$ )
# Convert each extracted PGS (.sup) file to VobSub, then delete the original.
grep -E '\.sup$' /tmp/extractsubs.$$ | cut -f 2- -d ':' | while read -r line
do
    java -jar /Applications/BDSup2Sub512.jar -o "$( basename "${line}" .sup ).sub" "${line}"
    rm "${line}"
done
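On the spaces problem mentioned in the script's comments: one possible way around it is to collect the track specs in a bash array instead of a temp file, so that each spec stays a single argument even when it contains spaces. This is only a sketch, not a tested replacement for the script. The function name `read_specs` and the sample filenames are made up, and I have not run it against mkvextract itself.

```shell
#!/bin/bash
# Read "tracknum:filename" specs from stdin, one per line, into the global
# array "specs". A filename with spaces stays inside a single array element.
read_specs() {
    specs=()
    local spec
    while IFS= read -r spec; do
        if [ -n "$spec" ]; then
            specs+=("$spec")
        fi
    done
}

# Sample input in the same "ID:name" form the script writes to its temp file.
read_specs <<'EOF'
2:My Movie.mkv.2.eng.sup
3:My Movie.mkv.3.fre.srt
EOF

# Quoted "${specs[@]}" expands to exactly one argument per element.
printf '<%s>\n' "${specs[@]}"
# prints:
#   <2:My Movie.mkv.2.eng.sup>
#   <3:My Movie.mkv.3.fre.srt>

# The extraction call would then look like this (assuming mkvextract is on PATH):
#   mkvextract "$1" tracks "${specs[@]}"
```

The array has to be filled in the main shell rather than on the right-hand side of a pipe, because a piped `while` loop runs in a subshell and its variables are lost when it ends. Feeding the loop from a here-document, a file, or process substitution avoids that. The sketch uses `+=` rather than `mapfile` so that it also works with the older bash that ships with macOS.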