Thanks to everyone for providing feedback and participating in the discussions. @Owen, do you still have any concerns about going forward in the direction of Apache Commons (or one of the other options, such as a TLP)?
Thanks,
Haifeng

-----Original Message-----
From: Chen, Haifeng [mailto:haifeng.c...@intel.com]
Sent: Saturday, January 30, 2016 10:52 AM
To: hdfs-dev@hadoop.apache.org
Subject: RE: Hadoop encryption module as Apache Chimera incubator project

>> I believe encryption is becoming a core part of Hadoop. I think that moving core components out of Hadoop is bad from a project management perspective.

> Although it's certainly true that encryption capabilities (in HDFS, YARN, etc.) are becoming core to Hadoop, I don't think that should really influence whether or not the non-Hadoop-specific encryption routines should be part of the Hadoop code base, or part of the code base of another project that Hadoop depends on. If Chimera had existed as a library hosted at ASF when HDFS encryption was first developed, HDFS probably would have just added that as a dependency and been done with it. I don't think we would've copy/pasted the code for Chimera into the Hadoop code base.

Agree with ATM. I also want to make an additional clarification. I agree that the encryption capabilities are becoming core to Hadoop, but this effort is about putting the common, shared encryption routines, such as the crypto stream implementations, into a scope that can be shared widely across the Apache ecosystem (a rough sketch of what such a routine looks like appears at the end of this thread). It does not move Hadoop encryption out of Hadoop (that is not possible).

I agree that making it a separately and independently released project within Hadoop would go a step further than the existing approach and solve some issues (such as the libhadoop.so problem). Frankly speaking, I don't think it is the best option we can try. I also expect that an independent release project within Hadoop core would complicate Hadoop's existing release model.

Thanks,
Haifeng

-----Original Message-----
From: Aaron T. Myers [mailto:a...@cloudera.com]
Sent: Friday, January 29, 2016 9:51 AM
To: hdfs-dev@hadoop.apache.org
Subject: Re: Hadoop encryption module as Apache Chimera incubator project

On Wed, Jan 27, 2016 at 11:31 AM, Owen O'Malley <omal...@apache.org> wrote:

> I believe encryption is becoming a core part of Hadoop. I think that moving core components out of Hadoop is bad from a project management perspective.

Although it's certainly true that encryption capabilities (in HDFS, YARN, etc.) are becoming core to Hadoop, I don't think that should really influence whether or not the non-Hadoop-specific encryption routines should be part of the Hadoop code base, or part of the code base of another project that Hadoop depends on. If Chimera had existed as a library hosted at ASF when HDFS encryption was first developed, HDFS probably would have just added that as a dependency and been done with it. I don't think we would've copy/pasted the code for Chimera into the Hadoop code base.

> To put it another way, a bug in the encryption routines will likely become a security problem that security@hadoop needs to hear about. I don't think adding a separate project in the middle of that communication chain is a good idea. The same applies to data corruption problems, and so on...

Isn't the same true of all the libraries that Hadoop currently depends upon? If the commons-httpclient library (or commons-codec, or commons-io, or guava, or...) has a security vulnerability, we need to know about it so that we can update our dependency to a fixed version. This case doesn't seem materially different from that.
>> It may be good to keep it at a generalized place (as in the discussion, we thought that place could be Apache Commons).

> Apache Commons is a collection of *Java* projects, so Chimera as a JNI-based library isn't a natural fit.

Could very well be that Apache Commons's charter would preclude Chimera. You probably know better than I do about that.

> Furthermore, Apache Commons doesn't have its own security list so problems will go to the generic secur...@apache.org.

That seems easy enough to remedy, if they wanted to, and besides I'm not sure why that would influence this discussion. In my experience, projects that don't have a separate security@project.a.o mailing list tend to just handle security issues on their private@project.a.o mailing list, which seems fine to me.

> Why do you think that Apache Commons is a better home than Hadoop?

I'm certainly not at all wedded to Apache Commons; it just seemed like a natural place to put it to me. Could be that a brand new TLP might make more sense. I *do* think that if other non-Hadoop projects want to make use of Chimera, which as I understand it is the goal that started this thread, then Chimera should exist outside of Hadoop so that:

a) Projects that have nothing to do with Hadoop can just depend directly on Chimera, which has nothing Hadoop-specific in it.

b) The Hadoop project doesn't have to export/maintain/concern itself with yet another publicly-consumed interface.

c) Chimera can have its own (presumably much faster) release cadence, completely separate from Hadoop.

--
Aaron T. Myers
Software Engineer, Cloudera
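For context on the "crypto stream implementations" discussed above: these are stream wrappers that encrypt or decrypt data transparently as it is read or written, and they have no dependency on Hadoop itself. A minimal sketch of the idea, written against the standard javax.crypto API rather than Chimera's actual classes (which are not shown here), might look like the following:

    import javax.crypto.Cipher;
    import javax.crypto.CipherInputStream;
    import javax.crypto.spec.IvParameterSpec;
    import javax.crypto.spec.SecretKeySpec;
    import java.io.InputStream;

    public class CryptoStreamSketch {
        // Illustrative only (not Chimera's real API): wraps any InputStream so
        // that AES/CTR decryption happens transparently as the caller reads.
        // Note that nothing in this routine is Hadoop-specific.
        public static InputStream decryptingStream(InputStream in, byte[] key, byte[] iv)
                throws Exception {
            Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE,
                    new SecretKeySpec(key, "AES"),
                    new IvParameterSpec(iv));
            return new CipherInputStream(in, cipher);
        }
    }

A consuming project, Hadoop or otherwise, would call such a routine only through the library's public interface, which is why hosting these routines outside any single project is attractive.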