>> Let's do one step at a time. There is a clear need for common encryption, 
>> and let's focus on making that happen.
Strongly agree.

-----Original Message-----
From: Reynold Xin [mailto:r...@databricks.com] 
Sent: Thursday, February 4, 2016 8:50 AM
To: hdfs-dev@hadoop.apache.org
Subject: Re: Hadoop encryption module as Apache Chimera incubator project

Let's do one step at a time. There is a clear need for common encryption, and 
let's focus on making that happen.

On Wed, Feb 3, 2016 at 4:48 PM, Zheng, Kai <kai.zh...@intel.com> wrote:

> I thought this discussion would switch to common-dev@ now?
>
> >> Would it make sense to also package some of the compression 
> >> libraries,
> and maybe some of the text processing from MapReduce? Evolving some of 
> this code to a common library with few/no dependencies would be 
> generally useful. As a subproject, it could have a broader scope that 
> could evolve into a viable TLP.
>
> Sounds like a great idea, and it would make the potential TLP make more 
> sense! I thought it could be organized like Apache Commons: security, 
> compression, and other common text-related things could live in 
> independent modules. Perhaps Hadoop conf could also be considered. These 
> modules could rely on a common utility module. It could still be 
> Hadoop-backed, and eventually we would have a good place to move some of 
> Hadoop's common code into, benefiting an even broader scope than Hadoop 
> itself.
>
> Regards,
> Kai
>
> -----Original Message-----
> From: Chris Douglas [mailto:cdoug...@apache.org]
> Sent: Thursday, February 04, 2016 7:26 AM
> To: hdfs-dev@hadoop.apache.org
> Subject: Re: Hadoop encryption module as Apache Chimera incubator 
> project
>
> I went through the repository, and now understand the reasoning that 
> would locate this code in Apache Commons. This isn't proposing to 
> extract much of the implementation and it takes none of the 
> integration. It's limited to interfaces to crypto libraries and 
> streams/configuration. It might be a reasonable fit for commons-codec, 
> but that's a pretty sparse library and driving the release cadence 
> might be more complicated. It'd be worth discussing on their lists (please 
> also CC common-dev@).
>
> Chimera would be a boutique TLP, unless we wanted to draw out more of 
> the integration and tooling. Is that a goal you're interested in pursuing?
> There's a tension between keeping this focused and including enough 
> functionality to make it viable as an independent component. By way of 
> example, Hadoop's common project requires too many dependencies and 
> carries too much historical baggage for other projects to rely on.
> I agree with Colin/Steve: we don't want this to grow into another 
> guava-like dependency that creates more work in conflicts than it 
> saves in implementation...
>
> Would it make sense to also package some of the compression libraries, 
> and maybe some of the text processing from MapReduce? Evolving some of 
> this code to a common library with few/no dependencies would be 
> generally useful. As a subproject, it could have a broader scope that 
> could evolve into a viable TLP. If the encryption libraries are the 
> only ones you're interested in pulling out, then Apache Commons does 
> seem like a better target than a separate project. -C
>
>
> On Wed, Feb 3, 2016 at 1:49 AM, Chris Douglas <cdoug...@apache.org> wrote:
> > On Wed, Feb 3, 2016 at 12:48 AM, Gangumalla, Uma 
> > <uma.ganguma...@intel.com> wrote:
> >>>From the standpoint of a shared, fundamental piece of code like 
> >>>this, I do think Apache Commons might be the best direction we can 
> >>>try as a first effort. In this direction, we still need to work with 
> >>>the Apache Commons community to buy in and accept the proposal.
> >> Make sense.
> >
> > Makes sense how?
> >
> >> For this we should define independent release cycles for the 
> >> project, and it would just be placed under the Hadoop tree if we 
> >> all conclude with this option in the end.
> >
> > Yes.
> >
> >> [Chris]
> >>>If Chimera is not successful as an independent project or stalls, 
> >>>Hadoop and/or Spark and/or $project will have to reabsorb it as 
> >>>maintainers.
> >>>
> >> I am not so strong on this point. If we assume the project would be 
> >> unsuccessful, it could be just as unsuccessful (less maintained) even 
> >> under Hadoop, and other projects depending on this piece would then 
> >> get less support. Of course, right now we feel this piece of code is 
> >> very important, and we expect it can be successful as an independent 
> >> project, irrespective of whether it lives outside Hadoop or inside. 
> >> So I feel this point should not really influence the discussion.
> >
> > Sure; code can idle anywhere, but that wasn't the point I was after.
> > You propose to extract code from Hadoop, but if Chimera fails then 
> > what recourse do we have among the other projects taking a 
> > dependency on it? Splitting off another project is feasible, but 
> > Chimera should be sustainable before this PMC can divest itself of 
> > responsibility for security libraries. That's a pretty low bar.
> >
> > Bundling the library with the jar is helpful; I've used that before.
> > It should prefer (updated) libraries from the environment, if 
> > configured. Otherwise it's a pain (or impossible) for ops to patch 
> > security bugs. -C
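The load order Chris describes can be sketched as follows. This is a minimal illustration, not Chimera's actual API: the strategy strings, the override parameter, and the library name "chimera" are all hypothetical.

```java
import java.io.File;

// Sketch of "prefer the environment, fall back to the bundled copy":
// an ops-patched system library should win over whatever shipped in the jar.
public class NativeLibResolver {

    /** Decide where to load the native library from.
     *  Order: explicit path override, then a system-installed copy
     *  (via System.loadLibrary), then the copy bundled in the jar. */
    public static String resolve(String overridePath, boolean systemCopyPresent) {
        if (overridePath != null && new File(overridePath).isFile()) {
            return "load:" + overridePath;       // explicit, ops-controlled path
        }
        if (systemCopyPresent) {
            return "loadLibrary:chimera";        // system copy, patchable by ops
        }
        return "extract-and-load:bundled";       // last resort: jar resource
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, true));
    }
}
```

The point of this ordering is that a security fix to the system library takes effect without re-releasing the jar.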
> >
> >>>-----Original Message-----
> >>>From: Colin P. McCabe [mailto:cmcc...@apache.org]
> >>>Sent: Wednesday, February 3, 2016 4:56 AM
> >>>To: hdfs-dev@hadoop.apache.org
> >>>Subject: Re: Hadoop encryption module as Apache Chimera incubator 
> >>>project
> >>>
> >>>It's great to see interest in improving this functionality.  I 
> >>>think Chimera could be successful as an Apache project.  I don't 
> >>>have a strong opinion one way or the other as to whether it belongs 
> >>>as part of Hadoop or separate.
> >>>
> >>>I do think there will be some challenges splitting this 
> >>>functionality out into a separate jar, because of the way our 
> >>>CLASSPATH works right now.
> >>>For example, let's say that Hadoop depends on Chimera 1.2 and Spark 
> >>>depends on Chimera 1.1.  Now Spark jobs have two different versions 
> >>>fighting it out on the classpath, similar to the situation with 
> >>>Guava and other libraries.  Perhaps if Chimera adopts a policy of 
> >>>strong backwards compatibility, we can just always use the latest 
> >>>jar, but it still seems likely that there will be problems.  There 
> >>>are various classpath isolation ideas that could help here, but 
> >>>they are big projects in their own right and we don't have a clear 
> >>>timeline for them.  If this does end up being a separate jar, we 
> >>>may need to shade it to avoid all these issues.
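One quick way to diagnose the "two versions fighting it out" situation Colin mentions is to ask the classloader which jar a class was actually loaded from. A sketch (the class and method names here are illustrative, not from any project):

```java
// Ask the classloader where a class came from; when two versions of a
// library are on the classpath, this shows which jar "won".
public class WhichJar {

    public static String locationOf(Class<?> c) {
        java.security.CodeSource src = c.getProtectionDomain().getCodeSource();
        // Bootstrap classes (e.g. java.lang.String) have no code source.
        return src == null ? "(bootstrap)" : src.getLocation().toString();
    }

    public static void main(String[] args) {
        System.out.println(locationOf(String.class));
        System.out.println(locationOf(WhichJar.class));
    }
}
```

Shading avoids the conflict entirely by relocating the dependency's packages into the consumer's own namespace, at the cost of duplicated bytes in every consuming jar.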
> >>>
> >>>Bundling the JNI glue code in the jar itself is an interesting 
> >>>idea, which we have talked about before for libhadoop.so.  It 
> >>>doesn't really have anything to do with the question of TLP vs. 
> >>>non-TLP, of course.
> >>>We could do that refactoring in Hadoop itself.  The really 
> >>>complicated part of bundling JNI code in a jar is that you need to 
> >>>create jars for every cross product of (JVM version, openssl 
> >>>version, operating system).
> >>>For example, you have the RHEL6 build for openJDK7 using openssl 1.0.1e.
> >>>If you change any one thing-- say, change openJDK7 to Oracle JDK8, 
> >>>then you might need to rebuild.  And certainly using Ubuntu would 
> >>>be a rebuild.  And so forth.  This kind of clashes with Maven's 
> >>>philosophy of pulling prebuilt jars from the internet.
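The (JVM, openssl, OS) cross product Colin describes implies some runtime platform detection to pick the right bundled binary. A hedged sketch: the `/native/<os>/<arch>/` resource layout and library name are hypothetical, and suffix handling (`.so` vs `.dylib`) is elided for brevity.

```java
// Map the JVM's os.name/os.arch values onto a bundled-resource path,
// so the right native binary can be extracted from the jar at runtime.
public class NativeResourcePath {

    public static String forPlatform(String osName, String osArch) {
        String os = osName.toLowerCase().startsWith("linux") ? "linux"
                  : osName.toLowerCase().startsWith("mac")   ? "darwin"
                  : "unknown";
        // Normalize the JVM's "amd64" to the more common "x86_64" label.
        String arch = osArch.equals("amd64") ? "x86_64" : osArch;
        return "/native/" + os + "/" + arch + "/libchimera.so";
    }

    public static void main(String[] args) {
        System.out.println(forPlatform(System.getProperty("os.name"),
                                       System.getProperty("os.arch")));
    }
}
```

Note this only narrows the OS/arch axis; the openssl-version and JVM-version axes Colin raises still require separate builds, which is what clashes with Maven's prebuilt-jar model.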
> >>>
> >>>Kai Zheng's question about whether we would bundle openSSL's 
> >>>libraries is a good one.  Given the high rate of new 
> >>>vulnerabilities discovered in that library, it seems like bundling 
> >>>would require Hadoop users and vendors to update very frequently, 
> >>>much more frequently than Hadoop is traditionally updated.  So 
> >>>probably we would not choose to bundle openssl.
> >>>
> >>>best,
> >>>Colin
> >>>
> >>>On Tue, Feb 2, 2016 at 12:29 AM, Chris Douglas 
> >>><cdoug...@apache.org>
> >>>wrote:
> >>>> As a subproject of Hadoop, Chimera could maintain its own cadence.
> >>>> There's also no reason why it should maintain dependencies on 
> >>>> other parts of Hadoop, if those are separable. How is this 
> >>>> solution inadequate?
> >>>>
> >>>> If Chimera is not successful as an independent project or stalls, 
> >>>> Hadoop and/or Spark and/or $project will have to reabsorb it as 
> >>>> maintainers. Projects have high mortality in early life, and a 
> >>>> fight over inheritance/maintenance is something we'd like to avoid.
> >>>> If, on the other hand, it develops enough of a community where it 
> >>>> is obviously viable, then we can (and should) break it out as a 
> >>>> TLP (as we have before). If other Apache projects take a 
> >>>> dependency on Chimera, we're open to adding them to security@hadoop.
> >>>>
> >>>> Unlike Yetus, which was largely rewritten right before it was 
> >>>> made into a TLP, security in Hadoop has a complicated pedigree. 
> >>>> If Chimera eventually becomes a TLP, it seems fair to include 
> >>>> those who work on it while it is a subproject. Declared upfront, 
> >>>> that criterion is fairer than any post hoc justification, and 
> >>>> will lead to a more accurate account of its community than a 
> >>>> subset of the Hadoop PMC/committers that volunteer. -C
> >>>>
> >>>>
> >>>> On Mon, Feb 1, 2016 at 9:29 PM, Chen, Haifeng 
> >>>><haifeng.c...@intel.com>
> >>>>wrote:
> >>>>> Thanks to all the folks providing feedback and participating in 
> >>>>>the discussions.
> >>>>>
> >>>>> @Owen, do you still have any concerns on going forward in the 
> >>>>>direction of Apache Commons (or other options, TLP)?
> >>>>>
> >>>>> Thanks,
> >>>>> Haifeng
> >>>>>
> >>>>> -----Original Message-----
> >>>>> From: Chen, Haifeng [mailto:haifeng.c...@intel.com]
> >>>>> Sent: Saturday, January 30, 2016 10:52 AM
> >>>>> To: hdfs-dev@hadoop.apache.org
> >>>>> Subject: RE: Hadoop encryption module as Apache Chimera 
> >>>>> incubator project
> >>>>>
> >>>>>>> I believe encryption is becoming a core part of Hadoop. I 
> >>>>>>>think that moving core components out of Hadoop is bad from a 
> >>>>>>>project management perspective.
> >>>>>
> >>>>>> Although it's certainly true that encryption capabilities (in 
> >>>>>>HDFS, YARN, etc.) are becoming core to Hadoop, I don't think 
> >>>>>>that should really influence whether or not the 
> >>>>>>non-Hadoop-specific encryption routines should be part of the 
> >>>>>>Hadoop code base, or part of the code base of another project that 
> >>>>>>Hadoop depends on.
> >>>>>>If Chimera had existed as a library hosted at ASF when HDFS 
> >>>>>>encryption was first developed, HDFS probably would have just 
> >>>>>>added that as a dependency and been done with it. I don't think 
> >>>>>>we would've copy/pasted the code for Chimera into the Hadoop code base.
> >>>>>
> >>>>> Agree with ATM. I also want to make an additional clarification. 
> >>>>>I agree that the encryption capabilities are becoming core to Hadoop.
> >>>>>This effort, however, is to put common and shared encryption routines 
> >>>>>such as crypto stream implementations into a scope that can be 
> >>>>>widely shared across the Apache ecosystem. It doesn't move 
> >>>>>Hadoop encryption out of Hadoop (that is not possible).
> >>>>>
> >>>>> I agree that making this a separately and independently released 
> >>>>>project within Hadoop takes a step beyond the existing approach and 
> >>>>>solves some issues (such as the libhadoop.so problem). Frankly 
> >>>>>speaking, though, I think it is not the best option we can try. I 
> >>>>>also expect that an independently released project within Hadoop 
> >>>>>core would further complicate Hadoop's existing release model.
> >>>>>
> >>>>> Thanks,
> >>>>> Haifeng
> >>>>>
> >>>>> -----Original Message-----
> >>>>> From: Aaron T. Myers [mailto:a...@cloudera.com]
> >>>>> Sent: Friday, January 29, 2016 9:51 AM
> >>>>> To: hdfs-dev@hadoop.apache.org
> >>>>> Subject: Re: Hadoop encryption module as Apache Chimera 
> >>>>> incubator project
> >>>>>
> >>>>> On Wed, Jan 27, 2016 at 11:31 AM, Owen O'Malley 
> >>>>><omal...@apache.org>
> >>>>>wrote:
> >>>>>
> >>>>>> I believe encryption is becoming a core part of Hadoop. I think 
> >>>>>>that  moving core components out of Hadoop is bad from a project 
> >>>>>>management perspective.
> >>>>>>
> >>>>>
> >>>>> Although it's certainly true that encryption capabilities (in 
> >>>>>HDFS,  YARN,
> >>>>> etc.) are becoming core to Hadoop, I don't think that should 
> >>>>>really influence whether or not the non-Hadoop-specific 
> >>>>>encryption routines should be part of the Hadoop code base, or 
> >>>>>part of the code base of another project that Hadoop depends on. 
> >>>>>If Chimera had existed as a library hosted at ASF when HDFS 
> >>>>>encryption was first developed, HDFS probably would have just 
> >>>>>added that as a dependency and been done with it. I don't think 
> >>>>>we would've copy/pasted the code for Chimera into the Hadoop code base.
> >>>>>
> >>>>>
> >>>>>> To put it another way, a bug in the encryption routines will 
> >>>>>> likely become a security problem that security@hadoop needs to 
> >>>>>> hear about.
> >>>>>>
> >>>>>> I don't think adding a separate project in the middle of that 
> >>>>>> communication chain is a good idea. The same applies to data 
> >>>>>> corruption problems, and so on...
> >>>>>>
> >>>>>
> >>>>> Isn't the same true of all the libraries that Hadoop currently 
> >>>>>depends upon? If the commons-httpclient library (or 
> >>>>>commons-codec, or commons-io, or guava, or...) has a security 
> >>>>>vulnerability, we need to know about it so that we can update our 
> >>>>>dependency to a fixed version.
> >>>>>This case doesn't seem materially different than that.
> >>>>>
> >>>>>
> >>>>>>
> >>>>>>
> >>>>>> > It may be good to keep it in a generalized place (as in the 
> >>>>>> > discussion, we thought that place could be Apache Commons).
> >>>>>>
> >>>>>>
> >>>>>> Apache Commons is a collection of *Java* projects, so Chimera 
> >>>>>> as a JNI-based library isn't a natural fit.
> >>>>>>
> >>>>>
> >>>>> Could very well be that Apache Commons's charter would preclude 
> >>>>>Chimera.
> >>>>> You probably know better than I do about that.
> >>>>>
> >>>>>
> >>>>>> Furthermore, Apache Commons doesn't have its own security list 
> >>>>>> so problems will go to the generic secur...@apache.org.
> >>>>>>
> >>>>>
> >>>>> That seems easy enough to remedy, if they wanted to, and besides 
> >>>>>I'm not sure why that would influence this discussion. In my 
> >>>>>experience projects that don't have a separate 
> >>>>>security@project.a.o mailing list tend to just handle security 
> >>>>>issues on their private@project.a.o mailing list, which seems fine to me.
> >>>>>
> >>>>>
> >>>>>>
> >>>>>> Why do you think that Apache Commons is a better home than Hadoop?
> >>>>>>
> >>>>>
> >>>>> I'm certainly not at all wedded to Apache Commons, that just 
> >>>>>seemed like a natural place to put it to me. Could be that a 
> >>>>>brand new TLP might make more sense.
> >>>>>
> >>>>> I *do* think that if other non-Hadoop projects want to make use 
> >>>>>of Chimera, which as I understand it is the goal which started 
> >>>>>this thread, then Chimera should exist outside of Hadoop so that:
> >>>>>
> >>>>> a) Projects that have nothing to do with Hadoop can just depend 
> >>>>>directly on Chimera, which has nothing Hadoop-specific in there.
> >>>>>
> >>>>> b) The Hadoop project doesn't have to export/maintain/concern 
> >>>>>itself with yet another publicly-consumed interface.
> >>>>>
> >>>>> c) Chimera can have its own (presumably much faster) release 
> >>>>>cadence completely separate from Hadoop.
> >>>>>
> >>>>> --
> >>>>> Aaron T. Myers
> >>>>> Software Engineer, Cloudera
> >>
>
