Sunday, February 21, 2010

Solid State Drives - Part 1

I have been reading a lot about Solid State Drives (SSDs) for quite some time. SSDs are storage devices we can use to replace standard Hard Disk Drives (HDDs), but they offer less space and cost much, much more.

HDDs are based on mature (i.e. old) technology, basically consisting of multiple spinning magnetic disks read and written by moving heads. Suppose you want to read a part of a certain file: the reading head must move to a specific track / cylinder and wait until the data on the spinning disk passes just under it. The physical move is called seek time, the wait for the disk to rotate into place is called rotational latency, and together they make up the access time. As complex as it sounds, it is actually amazingly fast in modern HDDs - both the heads and the disks are freaking fast. It still, however, takes some time, and because it relies on moving parts the whole mechanism is quite sensitive to impacts. When my external Maxtor 500GB USB hard drive fell on the table (yes, it didn't drop on the floor or anything, it just fell from a vertical to a horizontal position on the table!) I could only imagine its disks spinning at 7200 RPM with the heads scratching everything just like an old vinyl LP! Poor HD, it is now in HD heaven, you will be remembered!
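To put a rough number on the "wait for the data" part: on average the disk has to turn half a revolution before the right data arrives under the head. The sketch below only does that arithmetic; the class and method names are made up for illustration, and it ignores the head movement itself, which comes on top.

```java
import java.util.Locale;

class RotationalLatency {

    /** Average wait for the data to rotate under the head: the time of half a revolution, in ms. */
    static double averageLatencyMs(int rpm) {
        double msPerRevolution = 60_000.0 / rpm; // one minute = 60,000 ms
        return msPerRevolution / 2.0;
    }

    public static void main(String[] args) {
        // Common spindle speeds; 7200 RPM is the drive from the story above.
        for (int rpm : new int[] {5400, 7200, 10000}) {
            System.out.println(String.format(Locale.ROOT,
                    "%d RPM -> %.2f ms average wait", rpm, averageLatencyMs(rpm)));
        }
    }
}
```

A few milliseconds per access sounds tiny, but it is paid again for every scattered read, and an SSD has no such mechanical wait at all.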

SSDs currently offer less space than HDDs, but since they have no moving parts (they are based on the same kind of flash memory we use in digital cameras, small USB drives etc) one would still be "alive" after a fall like that - one point to SSDs here. Also, without heads and spinning disks, SSDs run cooler, use less energy (great for laptops) and can get to the data with almost no delay - the seek times are ridiculously low! Or so the manufacturers say...

It took me some time to find the courage (i.e. $$$) to try it myself, but I finally got one on last year's Black Friday. Then I had to read even more to set it up properly. After 4 months of using it every day in one of my work laptops I am certain that SSDs will become mainstream in the next couple of years, with falling prices and improved performance. Though they may not replace HDDs anytime soon, it makes a lot of sense to set up a mixed system, with SSDs to store the operating system and programs, and HDDs to store music, photos, movies, backups etc.

Now, if you are interested in more comprehensive reviews and benchmark comparisons, I am not dreaming of competing with sites like AnandTech (http://anandtech.com/storage/showdoc.aspx?i=3747&p=1), Ars Technica (http://arstechnica.com/hardware/news/2009/03/anand-investigates-ssd-performance-shenanigans.ars) or Tom's Hardware - but, given how complex and expensive SSDs currently are, I wanted to share my own perspective as a Java developer.

# SSD benefits:
- read and write speeds are much better than HDDs
- more resistant to shocks due to no moving parts
- when an SSD's lifetime is over you should still be able to read your data and transfer it to another storage device - something that is not easily accomplished with a dead HDD in most cases

# SSD drawbacks:
- still costly, but prices are going down as new and better products are being offered
- less storage space - unless you have lots of money to burn you may have to get used to working with less space in your laptop than what you have now
- many operating systems, especially older ones (XP, this is you!), were not designed to use SSDs efficiently, causing sub-optimal performance and increased wear
- limited lifetime - due to technical limitations in flash memory, over time an SSD will stop being able to write new data. No one knows exactly how long each of these SSDs will last. All manufacturers provide some bogus, unrealistic metrics, but if I had to guess it would not be much longer than 3 to 5 years

I said I would not do benchmarks, but just to give you a glimpse of how much faster SSDs are I posted these two images below. On the left we have a 160GB Western Digital HDD, and on the right we have an 80GB Intel X25-M - both tests were executed on a Dell Latitude D630 laptop.






The Seq row represents scenarios similar to when we are reading / writing big files (e.g. a DVD movie) sequentially. The 4K row is, however, far more significant for the overall disk performance - it represents reading / writing small 4KB chunks of data, an activity that is far more common on a workstation throughout the day.

Without getting into further details, I think it is pretty evident that SSDs are significantly faster than common HDDs. Should you buy one? Well, it depends... but you can be sure it made sense for me to buy one! To explain why, here are the two biggest concerns people have with SSDs, and why neither of them stopped me:

1. "SSDs are too costly" - I work with computers 8 to 10 hours per day. Every day. And when I am not working for my employer I am likely to be testing something for myself to learn new tricks. I always need to keep a few resource-hungry applications up all the time (e.g. Oracle, Eclipse, virtual machines etc), which causes a lot of disk I/O. When I build a complex project with Ant or Maven it can take around 5 minutes - yeah, it seems fine, but sometimes I need to do that many, many times during the day, and you can guess it adds up fast.

Now, let us assume the SSD costs $200 and that my employer pays me $40 / hour. If the SSD's improved performance saves me 15 minutes per day in project builds, application startups etc, that is $10 worth of my time per day, so it would take me only 20 working days, i.e. about 1 month, to "recover" my investment. And after that, it is just profit - boom baby! Also, because I can now provide consistently faster results, especially on tasks associated with high disk I/O (e.g. building virtual machines, installing Oracle etc), perhaps my boss will have good news on my next performance review and give me a raise! Yeah, I know I am dreaming now...
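If you want to plug in your own numbers, the back-of-the-envelope math above is easy to script. This is just a minimal sketch: the class and method names are made up, and it assumes every minute saved is worth your full hourly rate.

```java
class SsdPayback {

    /** Working days needed until the value of the time saved equals the purchase price. */
    static double breakEvenDays(double ssdCost, double hourlyRate, double minutesSavedPerDay) {
        double valueSavedPerDay = hourlyRate * (minutesSavedPerDay / 60.0);
        return ssdCost / valueSavedPerDay;
    }

    public static void main(String[] args) {
        // The figures from the paragraph above: $200 drive, $40 / hour, 15 minutes saved per day.
        double days = breakEvenDays(200.0, 40.0, 15.0);
        System.out.println("Break-even after " + days + " working days"); // prints 20.0
    }
}
```

With roughly 20 working days in a month, that is where the "1 month" figure comes from. Save only 5 minutes per day and it stretches to about 3 months, which is still not bad.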

2. "SSDs will die after 5 or so years" - Ok, I wish you could see my "not caring" face. Seriously, in 5 years I want to be able to buy a few petabytes of quantum-based storage for the same $200. By then I will just not care if an 80GB or 160GB SSD is not able to write anymore. Even its speed, which is great by today's standards, is going to be dog slow!

Ok, now that we agree it makes sense in some cases (e.g. mine) to replace HDDs with SSDs, which one should we buy? Man, I wish things were simple... even though we have a gazillion companies selling these things, it is still something I would never ask my father to buy and install by himself, and he has been playing with computers since I was 6 years old.

First, you need to pick the right SSD - and like I said, there are too many companies selling them. And I can tell you, you REALLY do not want to pick one of the bad SSDs - many of them are worse than regular HDDs, and they get even worse over time! Since this is not a super consumer-friendly (i.e. plug-and-play-and-forget) kind of device, your best bet is to stick with the top 2 vendors - the ones with the most modern technology and the biggest number of fellow "SSD-ers" to help you on forums if you ever need help. Currently I consider Intel and OCZ the two top contenders in the SSD market, with Intel having the upper hand in value. OCZ's top model (Vertex LE) is somewhat faster than Intel's X25-M G2 but it is way more expensive - the 100GB model costs around twice as much as what I paid for Intel's 80GB SSD! Sure, there are others (e.g. G.Skill, Super Talent etc), but at this time I think it is just safer to stick with one of these two. Kingston entered the market re-branding Intel drives, so apart from the niceties in the Kingston bundle they count as Intel hardware.

  • Intel has two interesting models, the X25-M with 80GB or 160GB. Both use Intel's own controller.






  • OCZ has just too many different models and capacities, but the most popular ones seem to be the Vertex (Indilinx controller) and Summit (Samsung controller) series.






If you really want to jump on the SSD bandwagon I recommend you keep an eye out for deals on sites like Amazon, Newegg, SlickDeals, DealDump etc (see my previous post about Black Friday for more deal sites) for any one of these 3 models (Intel X25-M, OCZ Vertex and OCZ Summit) with the storage size your wallet can afford. These short-lived deals can make a good difference for your wallet, or even allow you to get a bigger drive for the same $$$.

Ok, this post ended up quite a bit longer than I expected. I will leave it for my next post to describe how I installed, set up, cloned and tweaked the SSD on my laptop, and why I picked Intel over the alternatives.

Thursday, February 4, 2010

Java magic with AspectJ

Sometimes in our programming projects we can start things from scratch and implement them exactly how we want them. In these happy and creative times we have so much control over everything that if we don't like some of our own designs, APIs or mechanisms we can just go and change them, simple as that. The bad thing is that oftentimes we aren't so lucky, and we have to work in more constrained settings, using existing crappy code and macaroni architectures. In times like these we gotta be happy if we at least have the source code we are supposed to extend / use!

Recently I had to work on a project in which a big chunk of the code was implemented by another team. And we were not supposed to change that code. But we were still somehow supposed to change the exception handling and logging for certain kinds of Java classes in both that codebase and our own custom code.

Since we are in a pretty bad economy, instead of spending time updating my resume I thought of using this as an opportunity to test and learn AspectJ. It was a long shot: I had read about it in the past but never had a strong need to use it until now, and I thought it was worth trying. The thing is, I was interested in it a while back but assumed it would be just like other technologies, like Spring, Hibernate, JSF etc. Think about it: if you need to do real work and have only 2 or 3 days, can you do something with Spring + Hibernate + JSF from scratch if you have never worked with them before? The learning curve for most of these things is just too steep if you need quick results!

Man, I am glad I tried AspectJ. I know its inner workings use some arcane technology (bytecode manipulation isn't for the faint of heart), but it was SO much easier than I expected to hook my custom exception handling logic into the existing code that I felt it was almost too easy to be true! :P

Since I think other people may benefit from this mechanism, let me share some basic but still useful information about it. In a nutshell, AspectJ is an aspect-oriented extension to Java that lets you alter the behavior of existing Java code, including already compiled binaries. You can transparently inject logic at the points selected by "pointcuts", like when a method with a certain name is executed, or when classes from a certain package are initialized etc etc. A few cool examples of what you can do with that are:
- a custom global logging mechanism to existing classes from a certain package (e.g. my.company.*)
- custom exception handling mechanism for certain exception types
- method profiling, for performance monitoring and tuning
- reduce LOC and complexity on your new code using aspects to handle multiple functional concerns transparently, outside your business classes
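To make the "inject logic around a method without touching it" idea concrete before getting into AspectJ itself, here is a plain-Java analogy. Note that it is NOT AspectJ: it uses the JDK's java.lang.reflect.Proxy, a different and much more limited technique that only intercepts calls made through an interface, while AspectJ weaves bytecode and does not need one. The Service interface, ServiceImpl class and withProfiling method are hypothetical names invented for this sketch.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

class ProxyInterceptionSketch {

    /** Hypothetical business interface; JDK proxies can only intercept interface methods. */
    interface Service {
        String method3(String input);
    }

    /** Hypothetical implementation that knows nothing about logging or timing. */
    static class ServiceImpl implements Service {
        public String method3(String input) {
            return "ReturnValue";
        }
    }

    /** Lines written by the interceptor, kept in a list so they are easy to inspect. */
    static final List<String> LOG = new ArrayList<>();

    /** Wraps the target so that every call through the interface is logged and timed. */
    static Service withProfiling(Service target) {
        InvocationHandler handler = (proxy, method, args) -> {
            LOG.add("entering " + method.getName());
            long start = System.nanoTime();
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the original exception, not the reflection wrapper
            } finally {
                long elapsedMs = (System.nanoTime() - start) / 1_000_000;
                LOG.add("exiting " + method.getName() + " in " + elapsedMs + "ms");
            }
        };
        return (Service) Proxy.newProxyInstance(
                Service.class.getClassLoader(), new Class<?>[] {Service.class}, handler);
    }

    public static void main(String[] args) {
        Service service = withProfiling(new ServiceImpl());
        System.out.println(service.method3("InputValue"));
        LOG.forEach(System.out::println);
    }
}
```

The catch with this approach is that the caller has to be handed the wrapped object, which is exactly what you cannot arrange when the calling code belongs to someone else. That is the gap AspectJ fills.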

There are 2 main ways of using AspectJ:

1. Static weaving: You can use AspectJ's ajc compiler to weave your custom aspects into existing classes (either .java sources or .class files) and pack all changed files into a new JAR file. Then you put the new JAR in your classpath instead of the old classes (in most cases it may also work to just put the JAR first in the JVM's CLASSPATH). The Eclipse team splits this method into Compile-Time Weaving, when ajc compiles *.java into *.class injecting the aspects, and Post-Compile Weaving, when ajc injects aspects into pre-compiled class files. The weaving itself adds no penalty at run time: the result is the same as if you had recompiled the original classes with your aspect's logic in them.

2. Load-time weaving: This is where AspectJ's magic is more evident - your original classes and JARs are modified when they are loaded by the JVM! You do not have to replace any JARs, it is all done dynamically. In most cases the overhead imposed by AspectJ to "weave" your aspects into the existing classes at load time is negligible, and note that it is paid only once per class - once AspectJ injects your aspects it will not mess with that class again.

If I had to choose among the 3 methods described above (compile-time, post-compile and load-time weaving) I would try Load-Time Weaving (LTW) first, because I think in most cases the performance overhead during class loading (note it is "load time" and not "run time") should not be enough to overshadow the benefit of not having to rebuild JAR files every time the original classes are updated. If there is a significant performance hit due to a broad scope of woven classes then it is easy to move to one of the two static weaving methods described in item 1 above.

Ok, enough talk, how can we actually use it? Let me describe it with a very simple example that can serve as the basis for whatever else you want to do with AspectJ: (yeah, you guessed) a custom logging and performance monitoring mechanism!

# Dev env setup

1. Download and install AspectJ Development Tools (AJDT) for Eclipse. You can download it here.

2. Download my AspectJ examples project here.

3. Extract the AspectJ examples project somewhere in your development environment and import it in Eclipse.





# Project notes

1. "test.Test.java" is a simple test class. Its methods will be intercepted by AspectJ to transparently inject additional behavior without us having to change any code in Test.java.

2. Our two aspects LogAspect.aj and MethodProfilerAspect.aj are located in package "aspects".

3. Eclipse with AJDT shows us which aspects affect each method of class Test.java in the Cross References view (we can see it in the screenshot above, in the bottom right). Just select a method in the Java editor and it will list the aspects in that view. Or select an aspect method in LogAspect.aj and it will list the intercepted methods in that view. This is pretty useful to control scoping, just keep in mind that AJDT will only list methods in its own classpath, and the result may be different (i.e. the aspects may catch more classes) when you deploy the aspect JAR in the app server.

Without any aspects, this is the output of test.Test.java:

Test.method1
Test.method2
Test.method3

Enabling just LogAspect.aj we then get this output:
[LogAspect] before public static void test.Test.main(java.lang.String[])
[LogAspect] param[0]='[Ljava.lang.String;@12ac982'
[LogAspect] before public void test.Test.method1()
Test.method1
[LogAspect] after public void test.Test.method1()
[LogAspect] before public java.lang.Object test.Test.method2()
Test.method2
[LogAspect] after public java.lang.Object test.Test.method2()
[LogAspect] returned 'null'
[LogAspect] before public java.lang.Object test.Test.method3(java.lang.String)
[LogAspect] param[0]='InputValue'
Test.method3
[LogAspect] after public java.lang.Object test.Test.method3(java.lang.String)
[LogAspect] returned 'ReturnValue'
[LogAspect] after public static void test.Test.main(java.lang.String[])

And finally, enabling just MethodProfilerAspect.aj we get this output:
[MethodProfilerAspect] entering public static void test.Test.main(java.lang.String[])
[MethodProfilerAspect] entering public void test.Test.method1()
Test.method1
[MethodProfilerAspect] exiting public void test.Test.method1() in 31ms
[MethodProfilerAspect] entering public java.lang.Object test.Test.method2()
Test.method2
[MethodProfilerAspect] exiting public java.lang.Object test.Test.method2() in 62ms
[MethodProfilerAspect] entering public java.lang.Object test.Test.method3(java.lang.String)
Test.method3
[MethodProfilerAspect] exiting public java.lang.Object test.Test.method3(java.lang.String) in 16ms
[MethodProfilerAspect] exiting public static void test.Test.main(java.lang.String[]) in 109ms

If you look at the aspects' source code you will see how little coding was required for that. But this is just the tip of the iceberg, there are many other cool things we can do with AspectJ, like dynamically changing the return value of certain methods, or catching, modifying and re-throwing exceptions etc.

So this is pretty powerful... but "with power comes responsibility" - you must be careful and not depend on too many aspects, or your project could become a maintenance nightmare!


# Additional interesting articles about AspectJ