Tuesday, May 18, 2010

Kindle media converter Windows script

So a while back I got an Amazon Kindle as a birthday gift from my brother-in-law (many thanks Junior!).



Coincidentally, my lovely wife also gave me a book with 20 years of Dilbert - that's funny, even though I have read it for years I had no idea there were 20 years of Dilbert! The book, which is pretty huge by the way, thankfully came with a CD with images of all its pages.

Needless to say what happened, right? Since the Kindle's light weight makes it much more comfortable to read than the book, I wanted to find a good way to put all the images there.

Amazon did include support in the Kindle for photo albums, or folders with image files (see detailed step by step here). This mechanism works well for photos but not so well when there is a proper sequence of images - for some reason it frequently presented some random page instead of the next page. Also, since the images weren't sized specifically for the Kindle it had to resize them on the fly (the Kindle isn't exactly a speed demon at handling anything other than text books), and the images took up more space, with resolution and colors its grayscale screen just isn't able to reproduce.

Then a few months back I put together a Windows batch script to improve on that. I decided to share it, since it might be useful for other e-reader users, or for someone who wants to learn from a working script how to use the amazing open source image processing tool ImageMagick. The script uses Windows and ImageMagick commands to perform these 3 steps:

1. Iterate through all sub-folders for images (GIF, PNG, JPG and BMP) and sort them alphabetically by path and file names

This script can correctly handle directory hierarchies like:
.\volume1\image1.png
.\volume1\image2.gif
.\vol2\img1.gif
.\v3\img1l.gif
.\v3\img1r.gif
etc

2. Convert all images from their original format to GIF, resizing them to 600x800 with 16 shades of gray for optimal Kindle memory use.

If you have a different e-reader you can still use this mechanism with some minor modifications to resolution and image depth - keep reading for more info

3. Generate a PDF file with all processed images - this step ensures Kindle will follow the proper page sequence.

Please note that this isn't a super efficient solution or script, and I am sure some Windows batch file genius can point out how to better implement some of these functions. I actually learned quite a bit about Windows batch scripts while I was implementing it (and especially how bad Windows scripting is compared to Unix shell scripting).

Installation

1. ImageMagick
ImageMagick is a set of command-line image manipulation tools. You can download the latest stable build here: http://www.imagemagick.org/script/download.php

Following their recommendation I downloaded ImageMagick-6.6.1-10-Q16-windows-dll.exe. You should use the latest stable version available on their website. During the installation there is a checkbox labeled "Add application directory to your system path" - make sure this is checked (default value). This is how the converter batch script will be able to use ImageMagick. I installed without changing any options, just Next / Next / Finish.

2. Ghostscript
Ghostscript is an interpreter for PostScript and PDF files, which ImageMagick uses to read PDF files. You can download the latest stable build here: http://pages.cs.wisc.edu/~ghost/

Since we are using Windows we don't need to download the sources (the first set of files in the download page) - just scroll down and download one of the 2 Windows binary builds (32 or 64bit).

I downloaded the file gs871w32.exe from this page just for compatibility reasons. If you have a 64bit Windows you can use the latest 64bit build instead for improved performance. Again, I installed it with all default settings.

3. Testing
Open a command prompt window and type this command:

convert.exe -version

If ImageMagick's installation worked properly the result should look like this:


About the script


The complete script can be downloaded here. Follow the text below for detailed instructions on how it works and how you can use it.

You should NOT download and execute batch scripts on your computer from someone you don't know without taking a good look at the code to ensure it is not going to cause your computer any harm. I am just sharing it here because it could be useful for others and I do not provide any guarantees that it will work for you or that it will not somehow break your computer. However, at least I didn't knowingly design it to break anything in your computer. If you download and execute this script you are doing so at your own risk.

The 2nd warning is, you should NOT use this script on content you do not own. I am sharing this just as a way to convert what you already have into your e-reader. Support good authors.

Okay, so now that the "no-guarantee liability babble" is out of the way let me explain how it works:

1. Image consolidation


REM Delayed expansion must be on for the !var! references below
setlocal EnableDelayedExpansion
mkdir work

REM Copy all images to current directory with unique names
    ECHO Pass 1/9 - Consolidate images
pushd %*
for /F "delims=" %%j in ('dir /B /O /S *.gif') do (
  rem echo file=%%j
  set fileName=%%j
  set uniqueName=!fileName:\=_!
  set uniqueName=!uniqueName::=!
  copy "!fileName!" "work\!uniqueName!"
)
popd

This block creates a temporary "work" sub-directory to consolidate images from all other sub-directories. It copies all images from other sub-directories with new names based on the image's path + original name, so the path and name sorting is kept and there are no naming collisions. For example, ".\vol1\1.png" would be copied as something like "vol1_1.png" and ".\vol2\1.png" as "vol2_1.png". In practice "dir /S" returns full paths, so the real names also carry the drive letter and parent folders.
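The renaming rule is easier to see outside of batch syntax. Here is a minimal Python sketch of the same two substitutions the loop applies; the function name and the sample paths are made up for illustration and are not part of the script:

```python
def unique_name(path: str) -> str:
    """Flatten a Windows path into one file name, like the batch loop does."""
    # Same two substitutions as the script: "\" becomes "_", ":" is dropped.
    return path.replace("\\", "_").replace(":", "")


# Hypothetical input paths, similar to what "dir /B /S" would list.
paths = [r"C:\books\vol1\1.gif", r"C:\books\vol2\1.gif"]
for p in paths:
    print(unique_name(p))
```

Both files are called 1.gif, but the flattened names differ because the folder names end up inside them, and sorting the flattened names gives the same order as sorting the paths.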

Let me re-state this - the script makes copies of all images from all sub-directories into "work". It will NOT remove / rename / process the original images in any way, but only these temporary copies in "work".


2. Remove unwanted images

REM Remove unwanted images
del "work\*Credi*"

This block removes all unwanted images from the "work" directory based on the image name. If your source has images you do not want to transfer to the Kindle you can use this step to filter them out and squeeze the final PDF size a bit.
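If you want to check which files a wildcard like the one above would catch before deleting anything, the matching can be tried out separately. This is only a sketch in Python, assuming the usual Windows behavior of ignoring case in wildcards; the function name and file names are hypothetical:

```python
from fnmatch import fnmatchcase


def is_unwanted(file_name: str, patterns=("*Credi*",)) -> bool:
    """Return True if the name matches any of the wildcard patterns."""
    # Windows wildcards ignore case, so compare lower-cased strings.
    name = file_name.lower()
    return any(fnmatchcase(name, p.lower()) for p in patterns)


# Hypothetical contents of the "work" directory.
names = ["vol1_001.gif", "vol1_Credits.gif", "vol2_credits2.gif"]
kept = [n for n in names if not is_unwanted(n)]
print(kept)
```

More patterns can be added to the tuple, which would correspond to adding more "del" lines in the batch script.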

3. Convert all temporary images to 4bit color GIFs

set MAGICK_TMPDIR=work

REM Convert all GIFs
ECHO Pass 2/9 - GIF to 4bit color GIF
FOR %%a in (work\*.gif) DO (
    ECHO Processing file: "%%~nxa"

    REM Convert GIF to 4bit color GIF
    convert.exe -type GrayScaleMatte +level 0,15 -level 0,15 -quality 100 "%%a" "t1-%%~na.gif"
)

I know PNG is a newer and better format than GIF for various reasons, but I found that if I switch from GIF to PNG above then for some reason I got weird image size issues in the final PDF file.
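To give an idea of what "16 shades of gray" means for a single pixel, here is a small Python sketch. It assumes a plain uniform quantization, which is my simplification and not necessarily what ImageMagick does internally with the options above:

```python
def to_16_levels(value: int) -> int:
    """Map an 8-bit gray value (0-255) to the nearest of 16 evenly spaced grays."""
    level = round(value * 15 / 255)  # 0..15, which fits in 4 bits
    return level * 17                # back to the 0..255 range (15 * 17 == 255)


# Black and white stay as they are, everything else snaps to one of 16 steps.
for v in (0, 100, 128, 255):
    print(v, "->", to_16_levels(v))
```

With only 16 possible values each pixel needs 4 bits instead of 8, which is where the space savings come from.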

4. Divide pages in two images


I didn't need this one for my book but it may be useful for other ones. If every original image has two pages side by side (e.g. a full image scan of the two pages) you can uncomment the block below to divide each image into two:

REM Divide pages in two images
rem FOR %%a in (t1*.gif) DO (
rem     convert.exe "%%a" -crop 50%%x100%% +repage "t2-%%~na%%d.gif"
rem )
REM Remove temporary images
rem del t1*.gif


5. Rename all temporary GIFs based on their proper page number

REM Rename all GIFs to format 00000.gif
    ECHO Pass 5/9 - Rename
pushd %*
set /A cnt=0
for /F "delims=" %%j in ('dir /B /O t*.gif') do (
  rem echo cnt1=!cnt!
  set /a cnt+=1
  set cnt2=!cnt!
  if "!cnt2:~1,1!"=="" set cnt2=0!cnt2!
  if "!cnt2:~2,1!"=="" set cnt2=0!cnt2!
  if "!cnt2:~3,1!"=="" set cnt2=0!cnt2!
  if "!cnt2:~4,1!"=="" set cnt2=0!cnt2!
  rem echo cnt2=!cnt2!
  ren "%%j" "!cnt2!.gif"
)
popd
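The chain of "if" lines above pads the counter with zeros until it is 5 digits long. The same numbering, written as a Python sketch for comparison (the function name and the case-insensitive alphabetical sort are my assumptions about what "dir /B /O" does here):

```python
def page_names(file_names, width=5):
    """Map each temporary GIF to a zero-padded page name such as 00001.gif."""
    # Assumption: plain case-insensitive alphabetical order.
    ordered = sorted(file_names, key=str.lower)
    return {old: f"{i:0{width}d}.gif" for i, old in enumerate(ordered, start=1)}


# Hypothetical temporary files produced by the previous pass.
for old, new in page_names(["t1-vol2_1.gif", "t1-vol1_2.gif", "t1-vol1_1.gif"]).items():
    print(old, "->", new)
```

In this sketch, as in any plain alphabetical sort, a name like t1-10.gif comes before t1-2.gif, so source images with zero-padded numbers in their names are the safest input.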

6. Trim, resize, rotate and compress

REM Crop, Trim, Resize
    ECHO Pass 6/9 - Border
    mogrify.exe -bordercolor #FFFFFF -border 30 "*.gif"

    ECHO Pass 7/9 - Trim
    mogrify.exe -bordercolor #FFFFFF -border 30 -fuzz 10%% -trim +repage "*.gif"

    ECHO Pass 8/9 - Resize
rem    mogrify.exe -resize ^^!600x800^^! -colors 16 -compress LZW -quality 30030 "*.gif"
    mogrify.exe -rotate -90 -colors 16 -compress LZW -quality 30030 "*.gif"

For the specific set of images I was converting, resizing to 600x800 didn't give me the best results, so I commented it out. It may be useful for other sources, however. Also, I found I had to rotate the images to better use the Kindle's screen real estate, which may also not be ideal for other sources.

Depending on what you are converting you might want to disable rotation and/or enable resize.

7. Generate the final PDF including all GIFs in the proper order

REM Convert *.GIF into PDF
    ECHO Pass 9/9 - GIF to PDF
    pushd %*
    set gifs=
    set /A cnt=0
    for /F "delims=" %%j in ('dir /B /O *.gif') do (
        echo Appending %%j
        set gifs=!gifs! "%%j"
        set /A cnt+=1
        if !cnt!==600 (
            convert.exe -limit memory 128mb -limit map 256mb -compress zip -quality 100 -density 200 "out.pdf" !gifs! "out.pdf"
            set /A cnt=0
            set gifs=
        )
    )
    if !cnt! GTR 0 (
        convert.exe -limit memory 128mb -limit map 256mb -compress zip -quality 100 -density 200 "out.pdf" !gifs! "out.pdf"
    )
    popd

The step above is usually where this script takes the most time - on my Core2Duo desktop it took a few hours to convert all the images into one PDF file. Since I left it running throughout the night it didn't bother me much. I also found that it required a good amount of free disk space to store temporary files.
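The loop in this step does not hand every GIF to convert.exe at once: it collects up to 600 file names, appends them to out.pdf, and starts collecting again. The grouping on its own looks like this in Python (a sketch only; the function name and the 1500-page example are made up):

```python
def batches(items, size=600):
    """Yield consecutive groups of at most `size` items, keeping their order."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical book with 1500 pages named like the renamed GIFs.
pages = [f"{n:05d}.gif" for n in range(1, 1501)]
for batch in batches(pages):
    # Each group would become one convert.exe call that appends to out.pdf.
    print(len(batch), batch[0], batch[-1])
```

For 1500 pages this gives two groups of 600 and a final group of 300, which matches the "if !cnt! GTR 0" call after the loop that handles the leftover pages.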

8. Clean-up of all temporary files

REM Remove GIF files leaving only the PDF
    del "*.gif"
    rd /q /s work

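The commands above remove all GIFs from the current directory only, not from any sub-directory, and then remove the temporary "work" directory for a complete clean-up. Note that this also deletes any GIFs of your own that sit directly in the current directory, so keep your originals inside sub-directories.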

Some weird error messages may pop up while the script is running - this wasn't happening with the version of ImageMagick I used to write it, but I found it does happen with the current stable version. Thankfully it does not affect functionality, as far as I could see.

If everything worked properly, at the end of the process you will have in the current directory all your original sub-directories and the final PDF file.

So, for a quick test I took 5 Dilbert images and divided them into two sub-directories, simulating multiple volumes we want to consolidate into a single Kindle-optimized PDF.

The path structure looked like this:
C:\Temp\convertToKindle - convertToKindle.bat
C:\Temp\convertToKindle\v1 - v1-1.gif, v1-2.gif
C:\Temp\convertToKindle\v2 - v2-1.gif, v2-2.gif, v2-3.gif

Then after I executed convertToKindle.bat I had a new file, out.pdf with all 5 images, looking like this:

Like I said, these settings work well for this particular source - depending on what you're converting and your e-reader you might need to play with it to find optimal settings for rotation and resize.


Tip - process multiple documents in one shot

If you have multiple sources and want to convert each into a separate PDF file you can follow these instructions to process all of them one at a time:

1. set up your directories like this:
- root
  - source1
  - source2
  - source3

2. copy convertToKindle.bat to source1, source2 and source3

3. create a new Windows batch file named processAll.bat in the root folder and paste the commands below:

cd source1
call convertToKindle.bat
cd ..

cd source2
call convertToKindle.bat
cd ..

cd source3
call convertToKindle.bat
cd ..

Now if you execute processAll.bat it will execute convertToKindle.bat on every source directory, generating one PDF for each directory.
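With many sources, typing one cd / call / cd block per directory gets tedious. As an alternative, here is a rough Python sketch that finds every sub-directory holding a copy of the script and runs it there. It assumes Python is installed on the Windows machine, and the function names are my own, not part of the batch script:

```python
import subprocess
from pathlib import Path


def source_dirs(root, script="convertToKindle.bat"):
    """Return the sub-directories of root that contain a copy of the script."""
    return sorted(
        d for d in Path(root).iterdir()
        if d.is_dir() and (d / script).is_file()
    )


def run_all(root, script="convertToKindle.bat"):
    """Run the script inside each source directory, one at a time (Windows only)."""
    for d in source_dirs(root, script):
        print("Processing", d.name)
        # Same effect as: cd <dir>, call convertToKindle.bat, cd ..
        subprocess.run(["cmd", "/c", script], cwd=str(d), check=True)

# Usage, from the root folder: run_all(".")
```

Directories without a copy of convertToKindle.bat are skipped, so other folders can live next to the sources.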

That's it, I hope this can be helpful for others.
Pedro


Update - 05/27/2010

I found a few similar tools elsewhere that may be easier / better to use. So, just in case this batch script seems too complicated you might also want to try one of these instead:

1. Mangle: http://foosoft.net/mangle/

2. KindleManga: http://code.google.com/p/kindlemanga/

3. Manga2PDF (Linux): http://www.mobileread.com/forums/showthread.php?t=12790&highlight=manga2pdf

4. ePubizator (Multiplatform): http://www.mobileread.com/forums/showthread.php?t=36677


Note: I did not try any of these tools, so your mileage may vary.

Monday, May 10, 2010

Quest to find the best Linux for slow computers, Fedora 12 LXDE upgrade to 13

I replaced Kubuntu Netbook Remix (KNR) on my netbook with the LXDE spin of Fedora 12. KNR looks nice and has many nice features for devices with small screens but it was just painfully slow on my netbook. LXDE is (in theory) a more crappy-hardware-friendly Linux flavor, so I decided to give it a try.

LXDE's UI looks uglier and less flexible than KNR's, but performance is indeed better. With that in mind I put together a development environment based on a Fedora LXDE virtual machine I can use to test my own projects in Linux. Nothing terribly resource-intensive but things like Grails, Gradle, AppFuse, etc. I wanted to know if this VM would be fast enough so I could use it as my development environment.

I know this approach adds some overhead when compared to a native development environment, but if the performance hit is not noticeable enough it could be interesting. For example, the next time I upgrade my laptop I can simply copy my VMs and keep my fully configured environment. Also, since all servers I work with are Unix-based, there would be more script reuse and fewer environment-specific issues (not that I get many of these, but some of my colleagues insist on hardcoding Windows-style paths sometimes). And lastly, I can keep project-specific VMs to organize my development environments. So far this approach has been working well for the past few days.

But I wanted to see what Fedora 13 LXDE looks like (we still have a few days before the official launch). Since I couldn't find where to download anything but the main version of Fedora 13, I upgraded a copy of my VM following these steps:

Update current libraries

su
yum update

Download and install Fedora 13 release RPM

a. Download this file: http://mirror.aarnet.edu.au/pub/fedora/linux/releases/test/13-Beta/Fedora/i386/os/Packages/fedora-release-13-0.7.noarch.rpm
b. Install it with the command below:
rpm -Uvh fedora-release-13-0.7.noarch.rpm

Verify the release has changed

cat /etc/fedora-release
Fedora release 13 (Goddard)

Upgrade all libraries to Fedora 13

yum upgrade -y

And after an hour or two I had a fully functional Fedora 13 LXDE VM. I just don't see lots of changes other than a few icons and the wallpaper. I guess I will have to wait until the official launch... :(

Wednesday, May 5, 2010

NFJS 2010 - Northern Virginia Software Symposium II

This is my last post about the No Fluff Just Stuff Software Symposium here in Reston, VA. Sad to say I didn't get any of the 3 iPads! I did see my name scrolling down for the last iPad (the most expensive model!) but the system had to pick someone else's name just 2 or 3 positions from mine! Oh well... guess I am not destined to get one, since I am not gonna buy one and my luck isn't gonna get me a free one either. There is something else I might consider buying, if it ever sees the light of day: the Notion Ink Adam.

Anyways, the symposium was amazing - Jay Zimmerman put together a great set of speakers and enough varied content for everybody. Thanks Jay! These are the sessions I attended, along with my comments:


Friday


Implementing Evolutionary Architecture - Neal Ford

Testing the Entire Stack - Neal Ford

Smithing in the 21st Century - Neal Ford

Like I said in my last post, I really like Neal's presentations. He described REST in detail, and how we can use it to keep systems loosely coupled. Also, he gave some great recommendations on tools and strategies for testing different aspects of common web and desktop applications.

What Stops You From Delivering? - David Hussman

Discussed ways to find your process bottlenecks and how to deal with them.


Saturday


Pragmatic Architecture - Ted Neward

Discussed architecture and how oftentimes we either under- or over-architect systems.

It is quite common in the J2EE world to find teams building systems using many different design patterns just because they want to use them, not because it is the easiest / best way to solve the problem. At the other extreme we have teams that just go coding stuff with minimal or no architecture discussion.

In the end architecture should be defined at a high-level before we start coding things, otherwise it will be defined with much less control and may not end up being quite what you expected.

Architect for Scale - Michael Nygard

Software Architecture for the Cloud - Michael Nygard

I am a big fan of Michael since I read his book "Release It!: Design and Deploy Production-Ready Software (Pragmatic Programmers)". His presentations focused on architecture for systems designed to handle volumes orders of magnitude higher than what I am used to. It was really interesting to get a glimpse of what kind of problems we may see soon, with more and more people using computer systems, and what kind of solutions we are using today to support it with high availability and performance. He described techniques such as process parallelization, architecture partitioning (sharding) and content delivery networks (e.g. Akamai) to support scalability.

Building RESTful Apps with SpringMVC - Craig Walls

He demonstrated a single SpringMVC-based web application with support for both browsers and REST web service clients. It was a good demo, I will try that later.

300 Sessions

This was cool, each presenter got only 5 minutes to present a topic. I wish they had shared these PPTs too.


Sunday


Cloud Computing - Rohit Bhardwaj

This one was really interesting - Rohit described and demonstrated how we can use Amazon EC2 and S3 as a low-cost approach for on-demand infrastructure services. He also demonstrated a simple Google AppEngine project, and compared the costs and specifics of each major player in the cloud computing world. It might be just me but I didn't know Google gave us free resources for our projects in Google App Engine! If you just want to test out an idea you can set up a virtual environment in no time, and as long as you don't use lots of resources (up to 500MB, up to 5 million page hits per month, limited CPU use etc) you can set it up for FREE!

Google's offer is free but limited: no database (you have to use BigTable instead), not all parts of Java are supported (e.g. no Thread) etc. They basically hide the servers from you, so you can focus only on the application side.

Amazon's offer is not free but cheap, and they give you control over the infrastructure. You can try out different server architectures, create server templates etc without buying any physical infrastructure. This is great for temporary needs, like a performance test environment that can be discarded once the test has proved a set of hypotheses.

One approach is to build and test your app in Google, and then migrate it to Amazon or to your in-house infrastructure. As long as you don't build close ties with Google's infrastructure that shouldn't be too complicated.

Now that I have cheap infrastructure I just need to have the killer idea on how to use it to make tons of money! Ideas anyone? Please? :D

Enter the Gradle - Ken Sipe

Gradle is like a new Maven, a project build system. From this presentation I got that it is not yet mature enough for my purposes (though big projects are already using it) but it made me more interested in Groovy, the language used to build Gradle. I will definitely spend more time learning Groovy and Grails, in good part because of this presentation.

Maintaining Source Code Quality - David Bock

David described how we can easily include project analyzers to determine code quality and use their results as metrics towards better quality. Great stuff, but I used most of it on previous projects (PMD, FindBugs, CheckStyle etc). I did learn about some new source code analyzers though. He also pointed out the importance of enforcing a project-specific code style - it is not about cosmetic preferences, but about simplifying merges in the source code versioning system.

Programming Clojure - Aaron Bedra

What else can I say about this one? I think I twisted my brain hahaha... really, this was the last session I attended in the conference, and the most complex one. Aaron had to consolidate a 4-hour presentation into just 1.5 hours, and while I think he did a good job describing Clojure's syntax, my opinion is that it just made people want to avoid it.