0.19 is considered unstable by us at Cloudera and by the Y! folks, who never deployed it to their clusters. We recommend 0.18.3 as the most stable version of Hadoop right now. Y! has deployed, or will soon deploy, 0.20, which implies that it's at least stable enough for them to give it a go. Cloudera plans to support 0.20 as soon as a few more bugs get flushed out, which will probably happen in the next 0.20 minor release.
That said, it might make sense for you to start with 0.20.0, as long as you understand that the first release of a new major version is usually pretty buggy and is basically considered a beta. If you're not willing to take the stability risk, I'd recommend going with 0.18.3, though the upgrade from 0.18.3 to 0.20.X is going to be a headache (APIs changed, configuration files changed, etc.).

Hope this is insightful.

Alex

On Thu, May 28, 2009 at 2:59 PM, David Rosenstrauch <dar...@darose.net> wrote:

> Hadoop noob here, just starting to learn it, as we're planning to start
> using it heavily in our processing. Just wondering, though, which version
> of the code I should start learning/working with.
>
> It looks like the Hadoop API changed pretty significantly from 0.19 to
> 0.20 (e.g., org.apache.hadoop.mapred -> org.apache.hadoop.mapreduce),
> which leads me to think I should start with the new API. OTOH, since the
> new release is a ".0" release after some of these major API overhauls, I'm
> wondering if it's stable enough for us to start using in production.
>
> Where'd be best for me to start learning?
>
> TIA,
>
> DR