Create An Optimized PDF of CBR/CBZ Files

For some strange reason, in the land of the internet, when people create books with flowable text (i.e. novels), they use the PDF format, which is simply awful for the medium. Yet when they have content that lends itself perfectly to the PDF format, like comics, they instead use an unnecessary made-up format that is just a renamed ZIP or RAR file.

Also, for some reason, JPEG-2000 seems to have never gained much traction as a format, and I do not understand why; it is so much better than classic JPEG. And because PDF readers do support JPEG-2000, it seems even more ideal as a format for comics: the file becomes more portable and smaller, with no perceptual loss of quality.

I have a Remarkable 2, and it’s a perfect reader for manga and comics (at least black-and-white ones), and it handles PDFs better than the comic book formats, so I wrote this script to convert a CBZ to a PDF using JPEG-2000 for compression.

This script requires the imagemagick and img2pdf packages.

Note: This does not work on Ubuntu Linux, as they inexplicably removed all support for JPEG-2000. It does, however, work perfectly on a Raspberry Pi (and probably plain Debian).
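
On Debian and its derivatives (including Raspberry Pi OS), something like this should pull both packages in; the names are the Debian ones and may differ on other distros:

sudo apt install imagemagick img2pdf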

#!/bin/bash

CURDIR=$( pwd )

# Clean up the scratch directory (comicpdf.$$) on exit, hangup,
# interrupt, quit, broken pipe, or termination.

trap 'rm -rf "${CURDIR}"/*.$$ ; exit 0' 0 1 2 3 13 15

# This is the JPEG2000 compress rate.  In my own testing, 24 provides nearly no
# perceptual loss with a dramatic file size reduction.
#
# 0 is lossless: it gives a larger file, but perfect reproduction.
# Values > 0 increase compression, with progressively more loss.

COMPRESS_RATE=24
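
# A quick way to tune this for your own eyes is to convert a single page
# at a few different rates and compare size and quality, e.g. (the page
# file name here is hypothetical):
#
#	convert page001.jpg -define jp2:rate=24 page001-test.jp2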

COMICFILE=$1

mkdir -p comicpdf.$$

# Extract the page images into the scratch directory, junking (-j) any
# internal directory structure in the archive.

unzip -j -d "comicpdf.$$" "${COMICFILE}"

cd "comicpdf.$$" || exit 1

if [ "$( ls -1 *.[Jj][Pp][Gg] 2> /dev/null )" ]
then
	for file in *.[Jj][Pp][Gg]
	{
		# Because of the Microsoft world not handling file name case, make
		# sure that the extension is lower case so basename works correctly.

		mv -n "$file" "$( echo "$file" | sed -e "s/[Jj][Pp][Gg]$/jpg/" )"
		echo -ne "Converting ${file}..."
		convert "$file" -define jp2:rate=${COMPRESS_RATE} "$( basename "$file" .jpg ).jp2"
		echo "Done."
	}
fi

if [ "$( ls -1 *.[Pp][Nn][Gg] 2> /dev/null )" ]
then
	for file in *.png
	{
		# Because of the Microsoft world not handling file name case, make
		# sure that the extension is lower case so basename works correctly.

		mv -n "$file" "$( echo "$file" | sed -e "s/[Pp][Nn][Gg]$/png/" )"
		echo -ne "Converting ${file}..."
		convert "$file" -define jp2:rate=${COMPRESS_RATE} "$( basename "$file" .png ).jp2"
		echo "Done."
	}
fi

if [ "$( ls -1 *.[Jj][Pp]2 2> /dev/null )" ]
then
	echo -ne "Creating PDF..."
	# img2pdf adds pages in shell glob order, which should match the
	# usual zero-padded page numbering in comic archives.
	img2pdf --output "${CURDIR}/$( basename -s .cbz "${COMICFILE}" ).pdf" *.jp2
	echo "Done."
fi

cd "${CURDIR}"

2 thoughts on “Create An Optimized PDF of CBR/CBZ Files”

  1. I think that the best format for an art book that does not require any index is CBR. It is similar to CBZ or CB7, but it has the advantage of RAR's recovery record, which lets you protect the archive against some damage. You can vary the size of the record depending on the importance of the data or the reliability of the storage you use. Even the most modern media and file systems, while quite reliable, have no protection against the simple single-bit errors that do happen. For a book made of JPEG images, that can mean an unreadable page. In that case, even the default 3% recovery record in a CBR will let you recover the book back to the original (an example of creating one is below).

    Also keep in mind that re-compressing the JPEG images into another, slightly better, format will not make them better. Every re-compression decreases the quality of the images and increases the JPEG noise.
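
    For reference, the recovery record is added when the archive is created; with the rar command line tool it looks something like this (the file names are just illustrative):

        rar a -rr3% book.cbr *.jpg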

  2. I am very familiar with CBR; it’s just a renamed RAR file, which is technically patent encumbered, and does not have standard tools built into the major OSes the way ZIP and PDF do. Also, PDFs are ideal for fixed-size pages (i.e. scanned images), and more devices support PDFs than the comic book formats. The idea of re-compressing to JPEG-2000 over standard JPEG is to keep the same perceptual quality at a significantly reduced size. Obviously recompressing with a lossy format involves loss of data each time. The key is to pick a value where you can’t see the difference. It’s the same with video and H.264 vs. H.265: I can keep the same visible quality at half the size with H.265.
    As for recoverability, I prefer to rely on proper backups rather than recovery blocks. And if I do want to keep recovery blocks around, I’ll just keep a par2 chunk that I can use (see the example below).
    At the end of the day, you do what works best for you. I’m just putting this out there as one possible way to store things.
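
    For example, with par2cmdline (the file names are just an illustration):

        par2 create -r3 MyComic.pdf.par2 MyComic.pdf   # ~3% recovery data
        par2 repair MyComic.pdf.par2                   # verify and repair later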

