Extracting Subtitles from MKV Files

When I convert videos from one format to another, I often need to convert the subtitles as well, so I wrote a script that extracts them (and converts PGS to VobSub). It is a bash script that I run on my Mac. If you want to run it on Linux, just update the paths to the commands as necessary. The tools you need to install that aren't available by default are ffprobe (part of FFmpeg), mkvextract (part of the MKVToolNix suite), Java, and BDSup2Sub.
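Since the script depends on several external tools, a quick check before running it can save a confusing failure halfway through. This is only a minimal sketch: `missing_tools` is a hypothetical helper, not part of the script, and it assumes the tools are reachable through the PATH (the script itself calls mkvextract by its full path on the Mac, and the BDSup2Sub jar is a file rather than a command, so it is not covered here).

```bash
#!/bin/bash
# Hypothetical helper: prints the name of each given command that is not on the PATH.
missing_tools() {
	local tool
	for tool in "$@"
	do
		command -v "${tool}" > /dev/null 2>&1 || echo "${tool}"
	done
}

# Usage (assumes all three tools are expected on the PATH):
# missing=$( missing_tools ffprobe mkvextract java )
# [ -z "${missing}" ] || { echo "Missing: ${missing}" >&2; exit 1; }
```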

The script takes one parameter: the filename of the MKV.

If the filename has spaces, the subtitle files that are extracted will have underscores in place of the spaces. It makes the script a lot easier to code.
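To make the naming concrete, here is a small sketch of how each output filename is put together. The function name `subtitle_name` and the example path are made up for illustration; the `basename` and `tr` steps are the same ones the script uses.

```bash
#!/bin/bash
# Hypothetical helper: builds the output name for one subtitle track.
# Arguments: MKV path, track number, language, extension.
subtitle_name() {
	local base
	# Drop the directory part and replace spaces with underscores.
	base=$( basename "$1" | tr ' ' '_' )
	printf '%s.%s.%s.%s\n' "${base}" "$2" "$3" "$4"
}

subtitle_name "/Volumes/Media/My Movie.mkv" 2 eng sup
# prints: My_Movie.mkv.2.eng.sup
```

Note that the original `.mkv` extension stays in the name, and the files land in the current directory rather than next to the source file.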

If PGS (.sup) subtitles are extracted, the script automatically converts them to VobSub (SUB/IDX) files for embedding in MP4 containers.

#!/bin/bash

trap 'rm -f /tmp/*.$$ ; exit 0' 0 1 2 3 13 15

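# List the subtitle streams and build one "tracknum:filename" spec per stream.
ffprobe "$1" 2>&1 | grep Stream | grep "Subtitle:" | while read -r line
do
	tracknum=$( echo "${line}" | cut -f 2 -d ':' | grep -Eo "^[0-9]+" )
	lang=$( echo "${line}" | cut -f 2 -d ':' | cut -f 2- -d '(' | tr -d ')' )
	# The codec name is the fourth field; drop any trailing comma.
	type=$( echo "${line}" | awk '{ print $4;}' | tr -d ',' )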

	case "${type}"
	in
		"hdmv_pgs_subtitle" | "hdmv_pgs_subtitle,")
			ext="sup"
		;;
		"subrip")
			ext="srt"
		;;
		"dvd_subtitle" | "dvd_subtitle,")
			ext="sub"
		;;
		"ass")
			ext="ass"
		;;
		"ssa")
			ext="ssa"
		;;
		*)
			ext="unknown_type"
		;;
	esac

	echo "${tracknum}:$( basename "$1" | tr ' ' '_' ).${tracknum}.${lang}.${ext}"
	# echo "${tracknum}:$( basename "$1" ).${tracknum}.${lang}.${ext}"
done > /tmp/extractsubs.$$

# For the life of me I can't get it to work with spaces in the names (the unquoted
# $( cat ... ) below is split on whitespace), so I just convert spaces to underscores,
# and it works.  Maybe I'll use another character, then convert it back when it's done.

/Applications/MKVToolNix-*.0.0.app/Contents/MacOS/mkvextract "$1" tracks $( cat /tmp/extractsubs.$$ )

# Convert each extracted PGS (.sup) file to VobSub, then delete the .sup.
grep '\.sup$' /tmp/extractsubs.$$ | cut -f 2- -d ':' | while read -r line
do
	java -jar /Applications/BDSup2Sub512.jar -o "$( basename "${line}" .sup ).sub" "${line}"
	rm "${line}"
done 
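
On the spaces problem mentioned in the script's comment: one possible approach, which I haven't tested against the full script, is to read the track specs into a bash array and pass the array quoted, so that each `tracknum:filename` stays a single argument even when it contains spaces. The function name `build_specs` and the array name `specs` are made up for this sketch. It uses a plain `while read` loop rather than `mapfile` because the bash that ships with macOS is too old to have `mapfile`.

```bash
#!/bin/bash
# Hypothetical helper: reads one "tracknum:filename" spec per line from stdin
# into the global array `specs`, keeping any spaces intact.
build_specs() {
	specs=()
	local line
	while IFS= read -r line
	do
		if [ -n "${line}" ]
		then
			specs+=( "${line}" )
		fi
	done
}

# Usage, assuming /tmp/extractsubs.$$ holds the specs and mkvextract is on the PATH:
# build_specs < /tmp/extractsubs.$$
# mkvextract "$1" tracks "${specs[@]}"
```

The input is redirected into the function rather than piped, because a loop on the right-hand side of a pipe runs in a subshell and the array would be empty afterwards. With this in place, the `tr ' ' '_'` step could be dropped, and the conversion loop would also need `IFS= read -r` and quoted variables throughout.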
