On Wed, Jan 16, 2013 at 7:31 AM, Glen Mazza <gma...@talend.com> wrote:
> On 01/15/2013 06:50 PM, Erik Paulson wrote:
>
>> Hello -
>>
>> I'm curious what Hadoop developers use for their day-to-day hacking on
>> Hadoop. I'm talking about changes to the Hadoop libraries and daemons,
>> not developing Map-Reduce jobs or using the HDFS client libraries to
>> talk to a filesystem from an application.
>>
>> I've checked out Hadoop, made minor changes, and built it with Maven, and
>> tracked down the resulting artifacts in a target/ directory that I could
>> deploy. Is this typically how a Cloudera/Hortonworks/MapR/etc. dev works,
>> or are IDEs more common?
>
> I haven't built Hadoop yet myself. Your use of "a" in "a target/
> directory" suggests you're also fairly new to Maven itself, as that's
> the standard output folder for any Maven project. One of the many nice
> things about Maven is that once you learn how to build one project with
> it, you pretty much know how to build any project with it, since
> everything is standardized.
>
> It's probably best to stick with the command line for building and use
> Eclipse for editing, to keep things simple, but don't forget the
> mvn eclipse:eclipse command to set up Eclipse projects that you can
> subsequently import into your Eclipse IDE:
> http://www.jroller.com/gmazza/entry/web_service_tutorial#EclipseSetup
>
>> I realize this sort of sounds like a dumb question, but I'm mostly
>> curious what I might be missing out on if I stay away from anything
>> other than vim, and I'm not entirely sure where Maven might be caching
>> the jars it uses to build,
>
> That will be your local Maven repository, in an .m2 hidden folder in your
> user home directory.
>
>> and how careful I have to be to ensure that my changes wind up in
>> the right places without having to do a clean build every time.
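For anyone following along, the workflow Glen describes looks roughly like this (a sketch; exact output depends on your Maven and Eclipse setup):

```shell
# From the root of the Hadoop checkout: generate Eclipse project files
# for each module, then import them in Eclipse via
# File > Import > Existing Projects into Workspace.
mvn eclipse:eclipse

# Maven caches every jar it downloads in the local repository,
# a hidden folder under your home directory:
ls ~/.m2/repository
```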
> Maven can detect changes (using mvn install instead of mvn clean install),
> but I prefer doing clean builds. You can use the -Dmaven.test.skip setting
> to speed up your "mvn clean install" runs if you don't wish to run the
> tests each time.

Thanks to everyone for their advice last week; it's been helpful. You're spot-on that I'm new to Maven, but I'm a little confused about which targets/goals are best to use. Here's my scenario.

What I'd like to get working is the DataNodeCluster, which lives in the tests. Running it from hadoop-hdfs-project/hadoop-hdfs/target as

  hadoop jar ./hadoop-hdfs-3.0.0-SNAPSHOT-tests.jar org.apache.hadoop.hdfs.DataNodeCluster -n 2

blows up with an NPE inside MiniDFSCluster - the offending line is

  dfsdir = conf.get(HDFS_MINIDFS_BASEDIR, null);

(line 2078 of MiniDFSCluster.java)

I'm not worried about being able to figure out what's wrong (I'm pretty sure it's that conf is still null when this gets called) - what I'm trying to use this as is a way to understand what gets built when.

Just to check, I added a System.out.println one line before line 2078 of MiniDFSCluster and recompiled from hadoop-common/hadoop-hdfs-project with

  mvn package -DskipTests

because I don't want to run all the tests. This certainly compiles the code - if I leave the semicolon off of my change, the compile fails, even with -DskipTests. However, it doesn't appear to rebuild target/hadoop-hdfs-3.0.0-SNAPSHOT/share/hadoop/hdfs/hadoop-hdfs-3.0.0-SNAPSHOT-tests.jar - the timestamp is still the old version.
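For reference, the NPE comes from calling conf.get(...) while conf is still null; a guard along these lines would avoid the crash. This is a hedged sketch, not the actual MiniDFSCluster code: the Configuration class below is a hypothetical stub standing in for org.apache.hadoop.conf.Configuration, and the key string is assumed.

```java
import java.util.HashMap;
import java.util.Map;

public class MiniDfsBaseDirSketch {
    // Assumed key name, standing in for MiniDFSCluster.HDFS_MINIDFS_BASEDIR.
    static final String HDFS_MINIDFS_BASEDIR = "hdfs.minidfs.basedir";

    // Hypothetical stub for org.apache.hadoop.conf.Configuration.
    static class Configuration {
        private final Map<String, String> props = new HashMap<>();
        String get(String key, String defaultValue) {
            return props.getOrDefault(key, defaultValue);
        }
    }

    // Sketch of the failing read at MiniDFSCluster.java line 2078:
    // if conf is still null here, conf.get(...) throws an NPE.
    // Falling back to a fresh Configuration avoids the crash.
    static String baseDir(Configuration conf) {
        if (conf == null) {
            conf = new Configuration();
        }
        return conf.get(HDFS_MINIDFS_BASEDIR, null);
    }

    public static void main(String[] args) {
        // With a null conf, the guarded version returns the default
        // instead of throwing a NullPointerException.
        System.out.println(baseDir(null));
    }
}
```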
It _does_ copy target/hadoop-hdfs-3.0.0-SNAPSHOT/share/hadoop/hdfs/hadoop-hdfs-3.0.0-SNAPSHOT-tests.jar to target/hadoop-hdfs-3.0.0-SNAPSHOT-tests.jar, or at least otherwise updates the timestamp on target/hadoop-hdfs-3.0.0-SNAPSHOT-tests.jar (unless it's copying or building it from somewhere else - but if it is, it's picking up old versions of my code).

I only get an updated version if I ask for

  mvn package -Pdist -DskipTests

which is a 3-minute rebuild cycle, even for something as simple as changing the text in my System.out.println. (Even a mvn package -DskipTests with no changes to any source code is a 40-second operation.) I haven't sat around and waited for 'mvn package' to run and fire off the test suite, so I don't know if that would result in an updated hadoop-hdfs-3.0.0-SNAPSHOT-tests.jar being built.

So, my questions are:

- Is there a better Maven target to use if I just want to update code in MiniDFSCluster.java and run DataNodeCluster, all of which winds up in the -tests.jar? ('Better' here means a shorter build cycle. I'm a terrible programmer, so finding errors quickly is a priority for me :)

- Is it worth being concerned that 'mvn package' on what should be a no-op takes as long as it does?

I'll sort out the NPE in DataNodeCluster and file the appropriate JIRAs. (This is all on trunk - git show-ref gives 2fc22342f44055ae4a2b526408de7524bf1f9215 HEAD, so trunk as of last Wednesday.)

Thanks!

-Erik
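One thing that may shorten the edit-compile cycle (a sketch, not verified against this exact build): use Maven's standard -pl/-am flags to scope the reactor to the hadoop-hdfs module instead of rebuilding every module, and skip the dist assembly, which is what -Pdist adds.

```shell
# From the top of the source tree: build only the hadoop-hdfs module
# (-pl), plus any of its in-tree dependencies (-am), skipping tests.
# -o (offline) avoids remote repository update checks.
mvn package -DskipTests -pl hadoop-hdfs-project/hadoop-hdfs -am -o
```

The module-level package run refreshes target/hadoop-hdfs-3.0.0-SNAPSHOT-tests.jar (the test jar attached by the maven-jar-plugin), while the copy under target/hadoop-hdfs-3.0.0-SNAPSHOT/share/... is laid out by the dist assembly, which would explain why that one only updates with -Pdist.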