On 04/09/11 17:39, Billie J Rinaldi wrote:
> Bernd,
>
> We would divide the derived code into two categories: that which we modified only slightly (for
> example to allow us to extend it) and that which we modified heavily.  Now that we are able to
> interact openly, we hope to supply much of that back to the original projects.  There is a detailed
> overview below.  We identified these by searching for "copyright" in our code.  The total
> count came to just over 14,000 lines.  We use "heavily" as a qualitative assessment of
> how much we modified, but we could certainly come up with quantitative assessments.
>
> 5400 lines: slightly modified versions of Hadoop BCFile and related classes
>              (our current file format extends BCFile)
> 4300 lines: heavily modified versions of MapFile and SequenceFile
>              (no longer our default file format, but still included for
>              backward compatibility)

Internal compatibility or external? If internal only, I'd keep that out of the public codebase.
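
(For anyone who hasn't used the classes being discussed: stock Hadoop SequenceFile is a flat file of key/value Writables. A minimal write/read sketch against the classic FileSystem-based createWriter overload looks roughly like the following; the path and values are made up, and this is the plain Hadoop class, not the Accumulo-modified version.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class SequenceFileDemo {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            Path path = new Path("/tmp/demo.seq");   // hypothetical path

            // Write a few key/value pairs.
            SequenceFile.Writer writer = SequenceFile.createWriter(
                    fs, conf, path, Text.class, Text.class);
            try {
                writer.append(new Text("row1"), new Text("value1"));
                writer.append(new Text("row2"), new Text("value2"));
            } finally {
                IOUtils.closeStream(writer);
            }

            // Read them back in order.
            SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
            try {
                Text key = new Text();
                Text value = new Text();
                while (reader.next(key, value)) {
                    System.out.println(key + " -> " + value);
                }
            } finally {
                IOUtils.closeStream(reader);
            }
        }
    }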

> 2000 lines: heavily modified versions of HBase BlockCache and related files
>              (Adam didn't count the tests when he said 1500 lines)

+1 for more tests.
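
(For context, the core idea behind a block cache is LRU eviction over data blocks keyed by name. A toy sketch of that idea, with made-up class and method names and none of the real byte-size accounting or eviction threads in the HBase/Accumulo code, is:)

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Minimal illustration of the LRU idea behind a block cache.
    public class SimpleLruBlockCache {
        private final Map<String, byte[]> cache;

        public SimpleLruBlockCache(final int maxBlocks) {
            // accessOrder=true makes iteration order least-recently-used first.
            this.cache = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                    return size() > maxBlocks;  // evict when over capacity
                }
            };
        }

        public synchronized void cacheBlock(String blockName, byte[] block) {
            cache.put(blockName, block);
        }

        public synchronized byte[] getBlock(String blockName) {
            return cache.get(blockName);  // null on a cache miss
        }
    }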

> 1300 lines: heavily modified versions of Hadoop BloomFilters

-any plan to contribute back to hadoop-core, or are they too incompatible now?
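
(For reference, the stock Hadoop bloom filter classes the above derives from are used roughly like this; the vector size and hash count below are made-up tuning values for the demo:)

    import org.apache.hadoop.util.bloom.BloomFilter;
    import org.apache.hadoop.util.bloom.Key;
    import org.apache.hadoop.util.hash.Hash;

    public class BloomFilterDemo {
        public static void main(String[] args) {
            // 2^20-bit vector, 5 hash functions, murmur hashing.
            BloomFilter filter = new BloomFilter(1 << 20, 5, Hash.MURMUR_HASH);

            filter.add(new Key("row1".getBytes()));
            filter.add(new Key("row2".getBytes()));

            // May return a false positive, but never a false negative.
            System.out.println(filter.membershipTest(new Key("row1".getBytes())));  // true
            System.out.println(filter.membershipTest(new Key("rowX".getBytes())));  // probably false
        }
    }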


> 419 lines: modified Hadoop TeraSortIngest to sort data using Accumulo
> 325 lines: our Value is an immutable version of Hadoop BytesWritable

-any plan to contribute back to hadoop-core?
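
(The "immutable BytesWritable" idea amounts to putting defensive copies around a Writable byte holder. A rough sketch, with a made-up class name and not the actual Accumulo Value code:)

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import java.util.Arrays;
    import org.apache.hadoop.io.Writable;

    // Bytes are copied on the way in and on the way out, so callers cannot
    // mutate the stored value. readFields is the one mutator required by the
    // Writable contract for deserialization.
    public class ImmutableBytes implements Writable {
        private byte[] data = new byte[0];

        public ImmutableBytes() {
        }

        public ImmutableBytes(byte[] bytes) {
            this.data = Arrays.copyOf(bytes, bytes.length);  // defensive copy in
        }

        public byte[] get() {
            return Arrays.copyOf(data, data.length);         // defensive copy out
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeInt(data.length);
            out.write(data);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            data = new byte[in.readInt()];
            in.readFully(data);
        }
    }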

> 142 lines: modified ClassLoader based on commons-jci ReloadingClassLoader

Classloaders scare me. If we had an ASF-certified-classloader-hacker proposal where only approved people could write CLs for ASF code, I'd be +1 for it, even though I'd fail the test myself.
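
(For anyone curious what the reloading-classloader trick amounts to: classes are loaded from a directory by a throwaway loader, and "reloading" means discarding that loader and creating a fresh one when the class files change. A minimal sketch, with made-up paths and class names, and not the commons-jci or Accumulo implementation:)

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class DirectoryClassLoader extends ClassLoader {
        private final Path classDir;

        public DirectoryClassLoader(Path classDir, ClassLoader parent) {
            super(parent);
            this.classDir = classDir;
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            // Map com.example.Foo -> <classDir>/com/example/Foo.class
            Path classFile = classDir.resolve(name.replace('.', '/') + ".class");
            try {
                byte[] bytes = Files.readAllBytes(classFile);
                return defineClass(name, bytes, 0, bytes.length);
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }

        public static void main(String[] args) throws Exception {
            // Each "reload" is simply a new loader instance over the same directory.
            ClassLoader loader = new DirectoryClassLoader(
                    Paths.get("/tmp/plugins"), DirectoryClassLoader.class.getClassLoader());
            Class<?> c = loader.loadClass("com.example.SomePlugin");  // hypothetical class
            System.out.println("Loaded " + c.getName() + " with " + c.getClassLoader());
        }
    }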

I understand why you've forked off your own versions of some of the Hadoop and HBase core code; it is not only your right, it gets the changes in on your schedule. I have been known to do this myself.


Ideally those things should get back into a (future) version of Hadoop, which people like Doug and Owen can help with. Having forked code in the ASF codebase is something to avoid. Again, I speak from experience.

I think the proposal ought to consider how Accumulo fits in with Bigtop too, so it can be part of the full Apache Hadoop stack deploy/test process.

I also think that the roadmap for the system may want to consider MR-279 (YARN) integration: would that architecture be a better way to run Accumulo code within a Hadoop cluster?

-Steve

(BTW: I'm not going to volunteer as a mentor/committer; my focus is on getting back into Hadoop core coding without distractions.)
