That’s a good point, Kai. If what we are looking for is some level of autonomy, then it would need to be a module with its own release train, or at least be able to have one.
On Jan 20, 2016, at 9:18 PM, Zheng, Kai <kai.zh...@intel.com> wrote:

> Just a question: does becoming a separate jar/module in Apache Commons mean that Chimera (or the module) can be released separately and in a timely manner, without being coupled to the releases of other modules in the project? Thanks.
>
> Regards,
> Kai
>
> -----Original Message-----
> From: Aaron T. Myers [mailto:a...@cloudera.com]
> Sent: Thursday, January 21, 2016 9:44 AM
> To: hdfs-dev@hadoop.apache.org
> Subject: Re: Hadoop encryption module as Apache Chimera incubator project
>
> +1 for Hadoop depending upon Chimera, assuming Chimera can get hosted/released under some Apache project umbrella. If that's Apache Commons (which makes a lot of sense to me), then I'm also a big +1 on Andrew's suggestion that we make it a separate module.
>
> Uma, would you be up for approaching the Apache Commons folks saying that you'd like to contribute Chimera? I'd recommend saying that Hadoop and Spark are both on board to depend on this.
>
> --
> Aaron T. Myers
> Software Engineer, Cloudera
>
> On Wed, Jan 20, 2016 at 4:31 PM, Andrew Wang <andrew.w...@cloudera.com> wrote:
>
>> Thanks Uma for putting together this proposal. Overall it sounds good to me; +1 for these improvements. A few comments/questions:
>>
>> * If it becomes part of Apache Commons, could we make Chimera a separate JAR? We have real difficulties bumping dependency versions right now, so ideally we wouldn't need to bump our existing Commons dependencies to use Chimera.
>> * With this refactoring, are we confident that we can get our desired changes merged and released in a timely fashion? For example, if we find another bug like HADOOP-11343, we'll first need to get the fix into Chimera, have a new Chimera release, and then bump Hadoop's Chimera dependency. This also relates to the previous point: it's easier to do this dependency bump if Chimera is a separate JAR.
>>
>> Best,
>> Andrew
>>
>> On Mon, Jan 18, 2016 at 11:46 PM, Gangumalla, Uma <uma.ganguma...@intel.com> wrote:
>>
>>> Hi Devs,
>>>
>>> Some of our Hadoop developers are working with the Spark community to implement shuffle encryption. While implementing it, they realized that some (or most) of the Hadoop encryption code would have to be duplicated in their implementation. This led to the idea of creating a separate library, named Chimera (https://github.com/intel-hadoop/chimera). It is an optimized cryptographic library that provides a Java API at both the cipher level and the Java stream level, to help developers implement high-performance AES encryption/decryption with minimal code and effort. Chimera was originally based on the Hadoop crypto code, but it has been improved and generalized considerably to support a wider range of data encryption needs across more components in the community.
>>>
>>> So the team is now thinking of making this library an open source project via the Apache Incubator. The proposal is for Chimera to join Apache either as an incubating project or as part of Apache Commons, to facilitate its adoption.
>>>
>>> In general, this would bring the following advantages:
>>>
>>> 1. Because Chimera embeds the native library in the jar (similar to snappy-java), it solves a current issue in Hadoop: an HDFS client has to depend on libhadoop.so if it needs to read an encryption zone in HDFS. This means an HDFS client may have to depend on a Hadoop installation on the local machine. For example, HBase depends on the HDFS client jar rather than on a Hadoop installation, and so has no access to libhadoop.so. As a result, HBase cannot use an encryption zone, or doing so causes an error.
>>>
>>> 2. Apache Spark shuffle and spill encryption is another example where we can use Chimera.
>>> We see that the stream encryption for Spark shuffle and spill doesn't require a stream cipher like AES/CTR, although the code shares the common characteristics of a stream-style API. We also see the need for an optimized cipher for non-stream-style use cases, such as network encryption for RPC. These improvements can be shared by other projects that need them.
>>>
>>> 3. Using a dedicated library simplifies the code in Hadoop and drives further improvements. For example, the current Hadoop crypto code API is based entirely on AES/CTR, even though it has cipher suite configurations.
>>>
>>> AES/CTR is used for HDFS data encryption at rest, but it doesn't need to be AES/CTR in every case, such as data transfer encryption and intermediate file encryption.
>>>
>>> So we wanted to check with the Hadoop community about this proposal. Please provide your feedback on it.
>>>
>>> Regards,
>>> Uma
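Uma's points 2 and 3 distinguish a stream-level API (wrapping an I/O stream, as Hadoop's AES/CTR crypto streams do) from a cipher-level API for discrete, non-stream payloads such as RPC messages, where a mode other than AES/CTR may fit better. The following is a minimal sketch of that distinction, assuming only the JDK's standard JCE classes (`javax.crypto`). It is not Chimera's or Hadoop's API: the class `CryptoSketch` and its methods are hypothetical, and AES/GCM is used purely as an example of a non-CTR mode; the thread does not say which mode either project would use.

```java
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;

/** Hypothetical sketch using plain JCE; NOT Chimera's or Hadoop's actual API. */
public class CryptoSketch {

    // Stream level: everything written to the returned stream is AES/CTR-encrypted.
    // CTR needs no padding, so ciphertext length equals plaintext length.
    static OutputStream encryptingStream(OutputStream out, byte[] key, byte[] iv)
            throws GeneralSecurityException {
        Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        return new CipherOutputStream(out, c);
    }

    // Stream level: decrypts AES/CTR data as it is read.
    static InputStream decryptingStream(InputStream in, byte[] key, byte[] iv)
            throws GeneralSecurityException {
        Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
        c.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        return new CipherInputStream(in, c);
    }

    // Cipher level: one-shot AES/GCM over a discrete message (e.g. an RPC payload).
    // A fresh Cipher is created per call; encryption appends a 128-bit auth tag.
    static byte[] gcm(int mode, byte[] key, byte[] nonce, byte[] data)
            throws GeneralSecurityException {
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(mode, new SecretKeySpec(key, "AES"), new GCMParameterSpec(128, nonce));
        return c.doFinal(data);
    }

    // Drains an InputStream into a byte array.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        SecureRandom rnd = new SecureRandom();
        byte[] key = new byte[16];   // AES-128 key
        byte[] iv = new byte[16];    // CTR initial counter block
        byte[] nonce = new byte[12]; // GCM nonce (never reuse with the same key)
        rnd.nextBytes(key);
        rnd.nextBytes(iv);
        rnd.nextBytes(nonce);
        byte[] plain = "shuffle block".getBytes(StandardCharsets.UTF_8);

        // Stream-level round trip.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream enc = encryptingStream(sink, key, iv)) {
            enc.write(plain);
        }
        byte[] ct = sink.toByteArray();
        try (InputStream dec = decryptingStream(new ByteArrayInputStream(ct), key, iv)) {
            System.out.println(ct.length + " " + new String(readAll(dec), StandardCharsets.UTF_8));
        }

        // Cipher-level round trip.
        byte[] sealed = gcm(Cipher.ENCRYPT_MODE, key, nonce, plain);
        byte[] opened = gcm(Cipher.DECRYPT_MODE, key, nonce, sealed);
        System.out.println(sealed.length + " " + new String(opened, StandardCharsets.UTF_8));
    }
}
```

Running `main` prints `13 shuffle block` and then `29 shuffle block`: CTR adds no padding, so the 13-byte plaintext stays 13 bytes, while GCM appends a 16-byte authentication tag. The stream form suits data that is produced incrementally (shuffle/spill files, HDFS streams); the one-shot form suits self-contained messages.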