Hey Steve,

We would like to be able to contribute back where appropriate. We think that 
our BloomFilter improvements and some of our MapFile improvements are generally 
useful, and those should be pretty natural contributions back to Hadoop. Other 
modifications may not be so obviously generally useful, such as hard-coded 
optimizations for Accumulo. However, it is certainly our goal to reduce 
unnecessary code forks.

The classloader project was a challenge, and it took us several attempts to get 
it right. It sure is cool now that it works. We still have a number of tickets 
on our to-do list in this area, such as more convenient distribution mechanisms 
for user-defined functions (i.e. Iterators or Coprocessors) across a Hadoop cluster.

Thanks for the pointers to BigTop and MR-279. Those certainly look promising 
for better integration with the Apache brand. I'm looking forward to lots of 
great contributions from the community to the roadmap as Accumulo moves into 
incubation.

Cheers,
Adam


----- Original Message -----
From: Steve Loughran <ste...@apache.org>
To: general@incubator.apache.org
Sent: Tue, 06 Sep 2011 15:09:44 -0000
Subject: Re: [PROPOSAL] Accumulo for the Apache Incubator

On 04/09/11 17:39, Billie J Rinaldi wrote:
> Bernd,
>
> We would divide the derived code into two categories: that which we modified 
> only slightly (for example to allow us to extend it) and that which we 
> modified heavily.  Now that we are able to interact openly, we hope to supply 
> much of that back to the original projects.  There is a detailed overview 
> below.  We identified these by searching for "copyright" in our code.  The 
> total count came to just over 14,000 lines.  We use "heavily" as a 
> qualitative assessment of how much we modified, but we could certainly come 
> up with quantitative assessments.
>
> 5400 lines: slightly modified versions of Hadoop BCFile and related classes
>              (our current file format extends BCFile)
> 4300 lines: heavily modified versions of MapFile and SequenceFile
>              (no longer our default file format, but still included for
>              backward compatibility)

Internal compatibility or external? If internal only I'd keep that out 
of the public codebase.

> 2000 lines: heavily modified versions of HBase BlockCache and related files
>              (Adam didn't count the tests when he said 1500 lines)

+1 for more tests.

> 1300 lines: heavily modified versions of Hadoop BloomFilters

-any plan to contribute back to hadoop-core, or are they too 
incompatible now?


> 419 lines: modified Hadoop TeraSortIngest to sort data using Accumulo
> 325 lines: our Value is an immutable version of Hadoop BytesWritable

-any plan to contribute back to hadoop-core?

> 142 lines: modified ClassLoader based on commons-jci ReloadingClassLoader

classloaders scare me. If we had an ASF-certified-classloader-hacker 
proposal where only approved people could write CLs for ASF code I'd be 
+1 for it, even though I'd fail the test myself.

I understand why you've forked off your own versions of some of the 
Hadoop and HBase core -it is not only your right, it gets the changes in 
on your schedule. I have been known to do this myself.


Ideally those things have to get back to a (future) version of Hadoop, 
which people like Doug and Owen can help with. Having forked code in the 
ASF codebase is something to avoid. Again, I speak from experience.

I think the proposal ought to consider how they fit in with BigTop too, 
so it can be part of the full apache hadoop stack deploy/test process.

I also think that the roadmap for the system may want to think about 
MR-279 integration; would that architecture be a better way to run 
Accumulo code within a Hadoop cluster?

-Steve

(BTW: I'm not going to volunteer as a mentor/committer; my focus is on 
getting back into Hadoop core coding without distractions.)

---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org


