Tuesday, May 18, 2010

Kindle media converter Windows script

Check this post out for a Windows batch script to convert images into PDFs optimized for the Amazon Kindle e-reader.




So a while back I got an Amazon Kindle as a birthday gift from my brother-in-law (many thanks Junior!).



Coincidentally, I also got from my lovely wife a book with 20 years of Dilbert. Funny thing: even though I had read it for years, I had no idea there were 20 years of Dilbert! The book, which is pretty huge by the way, thankfully came with a CD with images of all its pages.

Needless to say what happened, right? Since the Kindle's light weight makes it much more comfortable to read than the book, I wanted to find a good way to put all the images there.

Amazon did include support in the Kindle for photo albums, or folders with image files (see detailed step by step here). This mechanism works well for photos but not so well when there is a proper sequence of images - for some reason it frequently presented some random page instead of the next page. Also, since the images weren't sized specifically for the Kindle, it had to resize them on the fly (the Kindle isn't exactly a speed demon when handling anything other than text books), and the images were using more space, with resolution and colors its grayscale screen just isn't able to reproduce.

Then, a few months back, I put together a Windows batch script to improve on that. I decided to share it, since it might be useful for other e-reader users, or for someone who wants to learn from a working script how to use the amazing open source image processing tool ImageMagick. The script uses Windows and ImageMagick commands to process these 3 steps:

1. Iterate through all sub-folders for images (GIF, PNG, JPG and BMP) and sort them alphabetically by path and file names

This script can correctly handle directory hierarchies like:
.\volume1\image1.png
.\volume1\image2.gif
.\vol2\img1.gif
.\v3\img1l.gif
.\v3\img1r.gif
etc

2. Convert all images from their original format to GIF, resizing them to 600x800 with 16 shades of gray for optimal Kindle memory use.

If you have a different e-reader you can still use this mechanism with some minor modifications to resolution and image depth - keep reading for more info

3. Generate a PDF file with all processed images - this step ensures Kindle will follow the proper page sequence.
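To put a rough number on the memory argument in step 2, here is a small Python sketch (Python rather than batch, simply because it is easier to run and check in isolation). It only computes uncompressed sizes; real GIF and PDF files are compressed, so actual sizes will differ. The function name is made up for this illustration.

```python
def raw_image_bytes(width, height, bits_per_pixel):
    """Uncompressed size in bytes of a width x height image."""
    return width * height * bits_per_pixel // 8

# 16 shades of gray need 4 bits per pixel; a full-color image typically uses 24.
kindle_page = raw_image_bytes(600, 800, 4)
color_page = raw_image_bytes(600, 800, 24)

print(kindle_page)                # 240000
print(color_page)                 # 1440000
print(color_page // kindle_page)  # 6
```

So, before compression, a Kindle-sized 16-gray page needs about one sixth of the space of the same page in 24-bit color.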

Please note that this isn't a super efficient solution or script, and I am sure some Windows batch file genius can point out how to better implement some of these functions. I actually learned quite a bit about Windows batch scripts while I was implementing it (and especially how bad Windows scripting is compared to Unix shell scripting).

Installation

1. ImageMagick
ImageMagick is a set of command-line image manipulation tools. You can download the latest stable build here: http://www.imagemagick.org/script/download.php

Following their recommendation I downloaded ImageMagick-6.6.1-10-Q16-windows-dll.exe. You should use the latest stable version available on their website. During the installation there is a checkbox labeled "Add application directory to your system path" - make sure this is checked (default value). This is how the converter batch script will be able to use ImageMagick. I installed without changing any options, just Next / Next / Finish.

2. Ghostscript
Ghostscript is an interpreter for PostScript and PDF files, which ImageMagick relies on when it has to read a PDF. You can download the latest stable build here: http://pages.cs.wisc.edu/~ghost/

Since we are using Windows we don't need to download the sources (the first set of files in the download page) - just scroll down and download one of the 2 Windows binary builds (32 or 64bit).

I downloaded file gs871w32.exe from this page just for compatibility reasons. If you have a 64bit Windows you can use instead the latest 64bit build for improved performance. Again, I installed it with all default settings.

3. Testing
Open a command prompt window and type this command:

convert.exe -version

If ImageMagick's installation worked properly the result should look like this:


About the script


The complete script can be downloaded here. Follow the text below for detailed instructions on how it works and how you can use it.

You should NOT download and execute batch scripts on your computer from someone you don't know without taking a good look at the code to ensure it is not going to cause your computer any harm. I am just sharing it here because it could be useful for others and I do not provide any guarantees that it will work for you or that it will not somehow break your computer. However, at least I didn't knowingly design it to break anything in your computer. If you download and execute this script you are doing so at your own risk.

The 2nd warning is, you should NOT use this script on content you do not own. I am sharing this just as a way to convert what you already have into your e-reader. Support good authors.

Okay, so now that the "no-guarantee liability babble" is out of the way let me explain how it works:

1. Image consolidation


mkdir work

REM Copy all images to current directory with unique names
    ECHO Pass 1/9 - Consolidate images
pushd %*
for /F "delims=" %%j in ('dir /B /O /S *.gif') do (
  rem echo file=%%j
  set fileName=%%j
  set uniqueName=!fileName:\=_!
  set uniqueName=!uniqueName::=!
  copy "!fileName!" "work\!uniqueName!"
)
popd

This block creates a temporary "work" sub-directory to consolidate images from all other sub-directories. It copies every image there under a new name built from the image's path + original name, so the path and name sorting is kept and there are no naming collisions. For example, ".\vol1\1.png" would be copied as "vol1_1.png" and ".\vol2\1.png" as "vol2_1.png" (in practice "dir /S" prints full paths, so the drive letter and parent folders also end up in the name as a common prefix, which does not affect the sorting).

Let me re-state this - the script makes copies of all images from all sub-directories into "work". It will NOT remove / rename / process the original images in any way, only these temporary copies in "work".
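The renaming in step 1 boils down to two string substitutions: every backslash becomes an underscore and the drive colon is dropped. A minimal Python sketch of that same logic (Python only because it is easy to try out; the function name and sample paths are made up for illustration):

```python
def unique_name(path):
    """Mimic the two batch substitutions: '\\' becomes '_' and ':' is removed."""
    return path.replace("\\", "_").replace(":", "")

print(unique_name(r"C:\Temp\vol1\1.gif"))  # C_Temp_vol1_1.gif
print(unique_name(r"C:\Temp\vol2\1.gif"))  # C_Temp_vol2_1.gif
```

Two files that shared the name "1.gif" in different folders end up with different names, and sorting the new names alphabetically keeps the folder order.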


2. Remove unwanted images

REM Remove unwanted images
del "work\*Credi*"

This block removes all unwanted images from the "work" directory based on the image name. If your source has images you do not want to transfer to the Kindle you can use this step to filter them out and squeeze the final PDF size a bit.

3. Convert all temporary images to 4bit color GIFs

set MAGICK_TMPDIR=work

REM Convert all GIFs
ECHO Pass 2/9 - GIF to 4bit color GIF
FOR %%a in (work\*.gif) DO (
    ECHO Processing file: "%%~nxa"

    REM Convert GIF to 4bit color GIF
    convert.exe -type GrayScaleMatte +level 0,15 -level 0,15 -quality 100 "%%a" "t1-%%~na.gif"
)

I know PNG is a newer and better format than GIF for various reasons, but I found that if I switch from GIF to PNG above then for some reason I got weird image size issues in the final PDF file.
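For anyone wondering what "16 shades of gray" means for an individual pixel, here is a tiny Python sketch of the general idea: each 8-bit gray value (0-255) is snapped to the nearest of 16 evenly spaced levels. This is only an illustration of the concept under that assumption, not a reproduction of the exact arithmetic ImageMagick performs for the options used above.

```python
def quantize_gray(value, levels=16):
    """Snap an 8-bit gray value (0-255) to the nearest of `levels` evenly spaced shades."""
    step = 255 / (levels - 1)          # 17.0 when levels == 16
    return round(round(value / step) * step)

print(quantize_gray(0))    # 0
print(quantize_gray(100))  # 102
print(quantize_gray(255))  # 255

# Only 16 distinct values survive, so each pixel fits in 4 bits.
print(len({quantize_gray(v) for v in range(256)}))  # 16
```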

4. Divide pages in two images


I didn't need this one for my book but it may be useful for other ones. If every original image has two pages side by side (e.g. a full image scan of the two pages) you can uncomment the block below to divide each image into two:

REM Divide pages in two images
rem FOR %%a in (t1*.gif) DO (
rem     convert.exe "%%a" -crop 50%%x100%% +repage "t2-%%~na%%d.gif"
rem )
REM Remove temporary images
rem del t1*.gif
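To make the split concrete, the Python sketch below computes the two crop rectangles a "50% wide, 100% tall" split corresponds to for an even-width scan. It is a simplified model with made-up names; ImageMagick may handle odd widths differently, so treat the numbers as illustrative.

```python
def half_page_boxes(width, height):
    """Return (left, top, right, bottom) boxes for the left and right halves of a scan."""
    mid = width // 2
    left_page = (0, 0, mid, height)
    right_page = (mid, 0, width, height)
    return left_page, right_page

# A hypothetical 1200x800 scan holding two pages side by side.
left, right = half_page_boxes(1200, 800)
print(left)   # (0, 0, 600, 800)
print(right)  # (600, 0, 1200, 800)
```

The left half comes first, which matches the reading order of a left-to-right book; for right-to-left material the two outputs would need to be swapped.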


5. Rename all temporary GIFs based on their proper page number

REM Rename all GIFs to format 00000.gif
    ECHO Pass 5/9 - Rename
pushd %*
set /A cnt=0
for /F "delims=" %%j in ('dir /B /O t*.gif') do (
  rem echo cnt1=!cnt!
  set /a cnt+=1
  set cnt2=!cnt!
  if "!cnt2:~1,1!"=="" set cnt2=0!cnt2!
  if "!cnt2:~2,1!"=="" set cnt2=0!cnt2!
  if "!cnt2:~3,1!"=="" set cnt2=0!cnt2!
  if "!cnt2:~4,1!"=="" set cnt2=0!cnt2!
  rem echo cnt2=!cnt2!
  ren "%%j" "!cnt2!.gif"
)
popd
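The reason for padding every page number to five digits is that file names are sorted as text, not as numbers. A short Python sketch of the difference (the helper name is made up for this example):

```python
def page_name(n, width=5):
    """Build a zero-padded file name such as 00001.gif."""
    return f"{n:0{width}d}.gif"

pages = [1, 2, 10, 100]

# Without padding, text sorting puts page 10 and page 100 before page 2.
print(sorted(f"{n}.gif" for n in pages))
# ['1.gif', '10.gif', '100.gif', '2.gif']

# With padding, text order and page order agree.
print(sorted(page_name(n) for n in pages))
# ['00001.gif', '00002.gif', '00010.gif', '00100.gif']
```

Five digits leave room for up to 99999 pages before the ordering would break again.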

6. Trim, resize, rotate and compress

REM Crop, Trim, Resize
    ECHO Pass 6/9 - Border
    mogrify.exe -bordercolor #FFFFFF -border 30 "*.gif"

    ECHO Pass 7/9 - Trim
    mogrify.exe -bordercolor #FFFFFF -border 30 -fuzz 10%% -trim +repage "*.gif"

    ECHO Pass 8/9 - Resize
rem    mogrify.exe -resize ^^!600x800^^! -colors 16 -compress LZW -quality 30030 "*.gif"
    mogrify.exe -rotate -90 -colors 16 -compress LZW -quality 30030 "*.gif"

For the specific set of images I was converting, resizing to 600x800 didn't give me the best results, so I commented it out. It may be useful for other sources, however. Also, I found I had to rotate the images to better use the Kindle's screen real estate, which may also not be ideal for other sources.

Depending on what you are converting you might want to disable rotation and/or enable resize.
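A quick way to decide whether rotation is worth it for a given source is to compare how large the image ends up on a 600x800 screen with and without rotating. The Python sketch below does that arithmetic while keeping the aspect ratio; this is one possible approach chosen for illustration, not what the script itself does, and the function names are hypothetical.

```python
def fit_within(width, height, max_w=600, max_h=800):
    """Largest size that fits in max_w x max_h while keeping the aspect ratio."""
    scale = min(max_w / width, max_h / height)
    return max(1, round(width * scale)), max(1, round(height * scale))

def should_rotate(width, height):
    """Landscape images use a portrait screen better when turned by 90 degrees."""
    return width > height

# A hypothetical 1600x1200 landscape scan.
print(fit_within(1600, 1200))       # (600, 450)  as is: almost half the screen is unused
print(should_rotate(1600, 1200))    # True
print(fit_within(1200, 1600))       # (600, 800)  after rotating: full screen
```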

7. Generate the final PDF including all GIFs in the proper order

REM Convert *.GIF into PDF
    ECHO Pass 9/9 - GIF to PDF
    pushd %*
    set gifs=
    set /A cnt=0
    for /F "delims=" %%j in ('dir /B /O *.gif') do (
        echo Appending %%j
        set gifs=!gifs! "%%j"
        set /A cnt+=1
        if !cnt!==600 (
            convert.exe -limit memory 128mb -limit map 256mb -compress zip -quality 100 -density 200 "out.pdf" !gifs! "out.pdf"
            set /A cnt=0
            set gifs=
        )
    )
    if !cnt! GTR 0 (
        convert.exe -limit memory 128mb -limit map 256mb -compress zip -quality 100 -density 200 "out.pdf" !gifs! "out.pdf"
    )
    popd

The step above is usually where this script takes the most time - on my Core2Duo desktop it took a few hours to convert all the images into one PDF file. Since I left it running throughout the night it didn't bother me much. I also found that it required a good amount of free disk space to store temporary files.
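The loop in step 7 appends images to out.pdf in groups of up to 600 rather than all at once. The Python sketch below shows the same grouping logic. One plausible reason for the number (an assumption, the post does not say) is that cmd.exe limits the length of a command line to roughly 8191 characters, and 600 quoted five-digit file names stay under that.

```python
def batches(items, size=600):
    """Yield consecutive slices of `items` holding at most `size` entries each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

pages = [f"{n:05d}.gif" for n in range(1, 1451)]   # 1450 hypothetical pages
print([len(b) for b in batches(pages)])            # [600, 600, 250]

# Each entry costs 12 characters in the batch variable: space + quotes + "00001.gif"
print(600 * len(' "00001.gif"'))                   # 7200
```

The last group is simply whatever is left over, which is what the final "if !cnt! GTR 0" check in the batch code handles.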

8. Clean-up of all temporary files

REM Remove GIF files leaving only the PDF
    del "*.gif"
    rd /q /s work

The commands above remove all GIFs from the current directory only, not from any sub-directory. They then remove the temporary "work" directory for a complete clean-up.

Some weird error messages may pop up while the script is running - this wasn't happening with the version of ImageMagick I used to write it, but I found it does happen with the current stable version. Thankfully it does not affect functionality, as far as I could see.

If everything worked properly, at the end of the process you will have in the current directory all your original sub-directories and the final PDF file.

So, for a quick test I took 5 Dilbert images and divided them into two sub-directories, simulating multiple volumes we want to consolidate into a single Kindle-optimized PDF.

The path structure looked like this:
C:\Temp\convertToKindle - convertToKindle.bat
C:\Temp\convertToKindle\v1 - v1-1.gif, v1-2.gif
C:\Temp\convertToKindle\v2 - v2-1.gif, v2-2.gif, v2-3.gif

Then after I executed convertToKindle.bat I had a new file, out.pdf with all 5 images, looking like this:

Like I said, these settings work well for this particular source - depending on what you're converting and your e-reader you might need to play with it to find optimal settings for rotation and resize.


Tip - process multiple documents in one shot

If you have multiple sources and want to convert each into a separate PDF file, you can follow these instructions to process all of them one at a time:

1. set up your directories like this:
- root
  - source1
  - source2
  - source3

2. copy convertToKindle.bat to source1, source2 and source3

3. create a new Windows batch file named processAll.bat in the root folder and paste the commands below:

cd source1
call convertToKindle.bat
cd ..

cd source2
call convertToKindle.bat
cd ..

cd source3
call convertToKindle.bat
cd ..

Now if you execute processAll.bat it will execute convertToKindle.bat on every source directory, generating one PDF for each directory.
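If the list of sources keeps changing, editing processAll.bat by hand gets tedious. As an alternative, here is a hedged Python sketch that finds every sub-directory containing convertToKindle.bat and runs it there. It assumes Python is installed and, for the run_all part, a Windows machine with cmd.exe; the function names are made up for this example.

```python
import subprocess
from pathlib import Path

SCRIPT = "convertToKindle.bat"  # script name taken from the post

def source_dirs(root, script=SCRIPT):
    """Return the sub-directories of `root` that contain the converter script, sorted by name."""
    return sorted(d for d in Path(root).iterdir() if d.is_dir() and (d / script).is_file())

def run_all(root, script=SCRIPT):
    """Run the converter once per source directory (Windows only: relies on cmd.exe)."""
    for folder in source_dirs(root, script):
        print(f"Converting {folder.name} ...")
        subprocess.run(["cmd", "/c", script], cwd=folder, check=True)

# Usage, from the root folder:
#     run_all(".")
```

Directories without a copy of the script are skipped, so the root can hold other folders too.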

That's it, I hope this can be helpful for others.
Pedro


Update - 05/27/2010

I found a few similar tools available elsewhere that may be easier / better to use. So, just in case this batch script seems too complicated, you might also want to try one of these instead:

1. Mangle: http://foosoft.net/mangle/

2. KindleManga: http://code.google.com/p/kindlemanga/

3. Manga2PDF (Linux): http://www.mobileread.com/forums/showthread.php?t=12790&highlight=manga2pdf

4. ePubizator (Multiplatform): http://www.mobileread.com/forums/showthread.php?t=36677


Note: I did not try any of these tools, so your mileage may vary.
