Hi Erik, Eclipse can run JUnit tests very rapidly. If you want a shorter test cycle, that's one way to get it.
There is also Maven shell, which reduces some of the overhead of starting
Maven, but I haven't used it so I can't really comment.

cheers,
Colin

On Mon, Jan 21, 2013 at 8:36 AM, Erik Paulson <epaul...@unit1127.com> wrote:
> On Wed, Jan 16, 2013 at 7:31 AM, Glen Mazza <gma...@talend.com> wrote:
>
> > On 01/15/2013 06:50 PM, Erik Paulson wrote:
> >
> >> Hello -
> >>
> >> I'm curious what Hadoop developers use for their day-to-day hacking on
> >> Hadoop. I'm talking about changes to the Hadoop libraries and daemons,
> >> not developing Map-Reduce jobs or using the HDFS client libraries to
> >> talk to a filesystem from an application.
> >>
> >> I've checked out Hadoop, made minor changes, and built it with Maven,
> >> then tracked down the resulting artifacts in a target/ directory that
> >> I could deploy. Is this typically how a Cloudera/Hortonworks/MapR/etc.
> >> dev works, or are IDEs more common?
> >
> > I haven't built Hadoop yet myself. Your use of "a" in "a target/
> > directory" suggests you're also fairly new to Maven itself, as that's
> > the standard output folder for any Maven project. One of the many nice
> > things about Maven is that once you learn how to build one project with
> > it, you pretty much know how to build any project, since everything is
> > standardized.
> > Probably best to stick with the command line for building and use
> > Eclipse for editing, to keep things simple, but don't forget the
> > mvn eclipse:eclipse command to set up Eclipse projects that you can
> > subsequently import into your Eclipse IDE:
> > http://www.jroller.com/gmazza/entry/web_service_tutorial#EclipseSetup
> >
> >> I realize this sort of sounds like a dumb question, but I'm mostly
> >> curious what I might be missing out on if I stay away from anything
> >> other than vim, and I'm not entirely sure where Maven might be caching
> >> the jars it uses to build,
> >
> > That will be your local Maven repository, in an .m2 hidden folder in
> > your user home directory.
> >
> >> and how careful I have to be to ensure that my changes wind up in
> >> the right places without having to do a clean build every time.
> >
> > Maven can detect changes (using mvn install instead of mvn clean
> > install), but I prefer doing clean builds. You can use the
> > -Dmaven.test.skip setting to speed up your "mvn clean install" runs
> > if you don't wish to run the tests each time.
>
> Thanks to everyone for their advice last week; it's been helpful.
>
> You're spot-on that I'm new to Maven, but I'm a little confused as to
> which targets/goals are best to use. Here's my scenario.
>
> What I'd like to get working is the DataNodeCluster, which lives in the
> tests.
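Glen's point about the local repository can be made concrete: Maven caches
artifacts under ~/.m2/repository in a fixed groupId/artifactId/version
layout, with the dots in the groupId turned into path separators. A minimal
sketch, using the hadoop-hdfs snapshot coordinates from this thread:

```shell
# Sketch: where Maven caches a given artifact in the local repository
# (~/.m2/repository): groupId with dots replaced by slashes, then
# artifactId/version/artifactId-version.jar.
group=org.apache.hadoop
artifact=hadoop-hdfs
version=3.0.0-SNAPSHOT
group_path=$(echo "$group" | tr . /)
echo "$HOME/.m2/repository/$group_path/$artifact/$version/$artifact-$version.jar"
```

Deleting a stale entry from that directory (or building with mvn -U for
snapshots) is the usual fix when a build keeps picking up an old jar.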
> Running it from hadoop-hdfs-project/hadoop-hdfs/target as
>
> hadoop jar ./hadoop-hdfs-3.0.0-SNAPSHOT-tests.jar
> org.apache.hadoop.hdfs.DataNodeCluster -n 2
>
> blows up with an NPE inside of MiniDFSCluster - the offending line is
> 'dfsdir = conf.get(HDFS_MINIDFS_BASEDIR, null);' (line 2078 of
> MiniDFSCluster.java).
>
> I'm not worried about being able to figure out what's wrong (I'm pretty
> sure it's that conf is still null when this gets called) - what I'm
> trying to use this for is a way to understand what gets built when.
>
> Just to check, I added a System.out.println one line before line 2078 of
> MiniDFSCluster and recompiled from hadoop-common/hadoop-hdfs-project with
>
> mvn package -DskipTests
>
> because I don't want to run all the tests.
>
> This certainly compiles the code - if I leave the semicolon off of my
> change, the compile fails, even with -DskipTests. However, it doesn't
> appear to rebuild
> target/hadoop-hdfs-3.0.0-SNAPSHOT/share/hadoop/hdfs/hadoop-hdfs-3.0.0-SNAPSHOT-tests.jar
> - the timestamp is still the old version.
>
> It _does_ copy
> target/hadoop-hdfs-3.0.0-SNAPSHOT/share/hadoop/hdfs/hadoop-hdfs-3.0.0-SNAPSHOT-tests.jar
> to target/hadoop-hdfs-3.0.0-SNAPSHOT-tests.jar, or at least otherwise
> update the timestamp on target/hadoop-hdfs-3.0.0-SNAPSHOT-tests.jar
> (unless it's copying or building it from somewhere else - but if it is,
> it's picking up old versions of my code).
>
> I only get an updated version if I ask for
>
> mvn package -Pdist -DskipTests
>
> which is a 3-minute rebuild cycle, even for something as simple as
> changing the text in my System.out.println. (Even a mvn package
> -DskipTests with no changes to any source code is a 40-second operation.)
>
> I haven't sat around and waited for 'mvn package' to run and fire off
> the test suite, so I don't know if that would result in an updated
> hadoop-hdfs-3.0.0-SNAPSHOT-tests.jar being built.
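One low-tech way to answer "did that build actually refresh this jar?" is a
marker file plus find -newer, rather than eyeballing timestamps by hand. A
sketch on a scratch directory (the jar name here just stands in for a freshly
written artifact; against the real tree you would touch the marker, run
mvn package -DskipTests, then run the same find over target/):

```shell
# Sketch: detect which artifacts a build refreshed by touching a marker
# file before the build and listing anything newer than it afterwards.
# A scratch directory stands in for target/ in this self-contained demo.
workdir=$(mktemp -d)
touch "$workdir/marker"                                  # taken just before the build
sleep 1
touch "$workdir/hadoop-hdfs-3.0.0-SNAPSHOT-tests.jar"    # stands in for a rebuilt jar
find "$workdir" -newer "$workdir/marker" -name '*.jar'   # lists only refreshed jars
rm -rf "$workdir"
```

Run over target/ after each build variant, this shows exactly which goals and
profiles rewrite which jars, instead of guessing from one file's mtime.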
> So, my question is:
>
> - Is there a better Maven target to use if I just want to update code in
> MiniDFSCluster.java and run DataNodeCluster, all of which winds up in
> -tests.jar? ('Better' here means a shorter build cycle. I'm a terrible
> programmer, so finding errors quickly is a priority for me :)
> - Is it worth being concerned that 'mvn package' on what should be a
> no-op takes as long as it does?
>
> I'll sort out the NPE in DataNodeCluster and file appropriate JIRAs.
> (This is all on trunk - git show-ref is
> 2fc22342f44055ae4a2b526408de7524bf1f9215 HEAD, so trunk as of last
> Wednesday.)
>
> Thanks!
>
> -Erik
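On the shorter-build-cycle question, one option worth trying (a sketch, not
something verified against this exact tree) is Maven's -pl flag, which limits
the reactor to the named module so a one-line change in hadoop-hdfs does not
walk the whole multi-module build. Guarded here so the sketch is a no-op when
run outside a Hadoop checkout:

```shell
# Sketch: rebuild only the hadoop-hdfs module instead of the whole tree.
# -pl restricts the reactor to the listed module; adding -am would also
# rebuild any in-tree dependencies of that module. Guarded so this does
# nothing harmful where Maven or the Hadoop source tree is absent.
if command -v mvn >/dev/null 2>&1 && [ -d hadoop-hdfs-project/hadoop-hdfs ]; then
  mvn package -DskipTests -pl hadoop-hdfs-project/hadoop-hdfs
else
  echo "skipped: run from the root of a Hadoop checkout with Maven on PATH"
fi
```

Whether the module-scoped package goal refreshes the -tests.jar without the
-Pdist profile would still need to be checked with the timestamp trick above;
the dist assembly step may remain the only thing that rewrites that copy.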