On Wed, Jan 16, 2013 at 7:31 AM, Glen Mazza <gma...@talend.com> wrote:
> On 01/15/2013 06:50 PM, Erik Paulson wrote:
>
>> Hello -
>>
>> I'm curious what Hadoop developers use for their day-to-day hacking on
>> Hadoop. I'm talking about changes to the Hadoop libraries and daemons,
>> not developing Map-Reduce jobs or using the HDFS client libraries to
>> talk to a filesystem from an application.
>>
>> I've checked out Hadoop, made minor changes, and built it with Maven, and
>> tracked down the resulting artifacts in a target/ directory that I could
>> deploy. Is this typically how a Cloudera/Hortonworks/MapR/etc. dev works,
>> or are IDEs more common?
>
> I haven't built Hadoop yet myself. Your use of "a" in "a target/
> directory" suggests you're also fairly new to Maven itself, as that's
> the standard output folder for any Maven project. One of the many nice
> things about Maven is that once you learn how to build one project with
> it, you pretty much know how to build any project with it, since
> everything is standardized.
>
> It's probably best to stick with the command line for building and use
> Eclipse for editing, to keep things simple, but don't forget the
> mvn eclipse:eclipse command to set up Eclipse projects that you can
> subsequently import into your Eclipse IDE:
> http://www.jroller.com/gmazza/entry/web_service_tutorial#EclipseSetup
>
>> I realize this sort of sounds like a dumb question, but I'm mostly
>> curious what I might be missing out on if I stay away from anything
>> other than vim, and I'm not entirely sure where Maven might be caching
>> the jars it uses to build,
>
> That will be your local Maven repository, in an .m2 hidden folder in your
> user home directory.
>
>> and how careful I have to be to ensure that my changes wind up in
>> the right places without having to do a clean build every time.
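For anyone following along, the workflow Glen describes looks roughly like this (a sketch; exact output depends on your Maven and Eclipse setup):

```shell
# From the root of the Hadoop checkout: generate Eclipse project files
# for each module, then import them in Eclipse via
# File > Import > Existing Projects into Workspace.
mvn eclipse:eclipse

# Maven caches every jar it downloads in the local repository,
# a hidden folder under your home directory:
ls ~/.m2/repository
```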
> Maven can detect changes (using mvn install instead of mvn clean install),
> but I prefer doing clean builds. You can use the -Dmaven.test.skip setting
> to speed up your "mvn clean install" runs if you don't wish to run the
> tests each time.

Thanks to everyone for their advice last week; it's been helpful. You're spot-on that I'm new to Maven, but I'm a little confused about which targets/goals are best to use. Here's my scenario.

What I'd like to get working is the DataNodeCluster, which lives in the tests. Running it from hadoop-hdfs-project/hadoop-hdfs/target as

  hadoop jar ./hadoop-hdfs-3.0.0-SNAPSHOT-tests.jar org.apache.hadoop.hdfs.DataNodeCluster -n 2

blows up with an NPE inside MiniDFSCluster - the offending line is

  dfsdir = conf.get(HDFS_MINIDFS_BASEDIR, null);

(line 2078 of MiniDFSCluster.java)

I'm not worried about being able to figure out what's wrong (I'm pretty sure it's that conf is still null when this gets called) - what I'm trying to use this as is a way to understand what gets built when.

Just to check, I added a System.out.println one line before line 2078 of MiniDFSCluster and recompiled from hadoop-common/hadoop-hdfs-project with

  mvn package -DskipTests

because I don't want to run all the tests. This certainly compiles the code - if I leave the semicolon off of my change, the compile fails, even with -DskipTests. However, it doesn't appear to rebuild target/hadoop-hdfs-3.0.0-SNAPSHOT/share/hadoop/hdfs/hadoop-hdfs-3.0.0-SNAPSHOT-tests.jar - the timestamp is still the old version.
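For reference, the NPE comes from calling conf.get(...) while conf is still null; a guard along these lines would avoid the crash. This is a hedged sketch, not the actual MiniDFSCluster code: the Configuration class below is a hypothetical stub standing in for org.apache.hadoop.conf.Configuration, and the key string is assumed.

```java
import java.util.HashMap;
import java.util.Map;

public class MiniDfsBaseDirSketch {
    // Assumed key name, standing in for MiniDFSCluster.HDFS_MINIDFS_BASEDIR.
    static final String HDFS_MINIDFS_BASEDIR = "hdfs.minidfs.basedir";

    // Hypothetical stub for org.apache.hadoop.conf.Configuration.
    static class Configuration {
        private final Map<String, String> props = new HashMap<>();
        String get(String key, String defaultValue) {
            return props.getOrDefault(key, defaultValue);
        }
    }

    // Sketch of the failing read at MiniDFSCluster.java line 2078:
    // if conf is still null here, conf.get(...) throws an NPE.
    // Falling back to a fresh Configuration avoids the crash.
    static String baseDir(Configuration conf) {
        if (conf == null) {
            conf = new Configuration();
        }
        return conf.get(HDFS_MINIDFS_BASEDIR, null);
    }

    public static void main(String[] args) {
        // With a null conf, the guarded version returns the default
        // instead of throwing a NullPointerException.
        System.out.println(baseDir(null));
    }
}
```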
It _does_ copy target/hadoop-hdfs-3.0.0-SNAPSHOT/share/hadoop/hdfs/hadoop-hdfs-3.0.0-SNAPSHOT-tests.jar to target/hadoop-hdfs-3.0.0-SNAPSHOT-tests.jar, or at least otherwise updates the timestamp on target/hadoop-hdfs-3.0.0-SNAPSHOT-tests.jar (unless it's copying or building it from somewhere else - but if it is, it's picking up old versions of my code).

I only get an updated version if I ask for

  mvn package -Pdist -DskipTests

which is a 3-minute rebuild cycle, even for something as simple as changing the text in my System.out.println. (Even a mvn package -DskipTests with no changes to any source code is a 40-second operation.) I haven't sat around and waited for 'mvn package' to run and fire off the test suite, so I don't know if that would result in an updated hadoop-hdfs-3.0.0-SNAPSHOT-tests.jar being built.

So, my questions are:

- Is there a better Maven target to use if I just want to update code in MiniDFSCluster.java and run DataNodeCluster, all of which winds up in the -tests.jar? ('Better' here means a shorter build cycle. I'm a terrible programmer, so finding errors quickly is a priority for me :)

- Is it worth being concerned that 'mvn package' on what should be a no-op takes as long as it does?

I'll sort out the NPE in DataNodeCluster and file the appropriate JIRAs. (This is all on trunk - git show-ref gives 2fc22342f44055ae4a2b526408de7524bf1f9215 HEAD, so trunk as of last Wednesday.)

Thanks!

-Erik
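One thing that may shorten the edit-compile cycle (a sketch, not verified against this exact build): use Maven's standard -pl/-am flags to scope the reactor to the hadoop-hdfs module instead of rebuilding every module, and skip the dist assembly, which is what -Pdist adds.

```shell
# From the top of the source tree: build only the hadoop-hdfs module
# (-pl), plus any of its in-tree dependencies (-am), skipping tests.
# -o (offline) avoids remote repository update checks.
mvn package -DskipTests -pl hadoop-hdfs-project/hadoop-hdfs -am -o
```

The module-level package run refreshes target/hadoop-hdfs-3.0.0-SNAPSHOT-tests.jar (the test jar attached by the maven-jar-plugin), while the copy under target/hadoop-hdfs-3.0.0-SNAPSHOT/share/... is laid out by the dist assembly, which would explain why that one only updates with -Pdist.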