Re: Hadoop encryption module as Apache Chimera incubator project

Larry McCay Wed, 20 Jan 2016 18:44:31 -0800

That’s a good point, Kai.

If what we are looking for is some level of autonomy, then it would need to be a 
module with its own release train - or at least be able to have one.


On Jan 20, 2016, at 9:18 PM, Zheng, Kai <[email protected]> wrote:

> Just a question. Does becoming a separate jar/module in Apache Commons mean 
> that Chimera (the module) can be released separately and in a timely manner, 
> without being coupled to the release of other modules in the project? Thanks.
> 
> Regards,
> Kai
> 
> -----Original Message-----
> From: Aaron T. Myers [mailto:[email protected]] 
> Sent: Thursday, January 21, 2016 9:44 AM
> To: [email protected]
> Subject: Re: Hadoop encryption module as Apache Chimera incubator project
> 
> +1 for Hadoop depending upon Chimera, assuming Chimera can get
> hosted/released under some Apache project umbrella. If that's Apache Commons 
> (which makes a lot of sense to me) then I'm also a big +1 on Andrew's 
> suggestion that we make it a separate module.
> 
> Uma, would you be up for approaching the Apache Commons folks saying that 
> you'd like to contribute Chimera? I'd recommend saying that Hadoop and Spark 
> are both on board to depend on this.
> 
> --
> Aaron T. Myers
> Software Engineer, Cloudera
> 
> On Wed, Jan 20, 2016 at 4:31 PM, Andrew Wang <[email protected]>
> wrote:
> 
>> Thanks Uma for putting together this proposal. Overall sounds good to me,
>> +1 for these improvements. A few comments/questions:
>> 
>> * If it becomes part of Apache Commons, could we make Chimera a 
>> separate JAR? We have real difficulties bumping dependency versions 
>> right now, so ideally we don't need to bump our existing Commons 
>> dependencies to use Chimera.
>> * With this refactoring, do we have confidence that we can get our 
>> desired changes merged and released in a timely fashion? e.g. if we 
>> find another bug like HADOOP-11343, we'll first need to get the fix 
>> into Chimera, have a new Chimera release, then bump Hadoop's Chimera 
>> dependency. This also relates to the previous point, it's easier to do 
>> this dependency bump if Chimera is a separate JAR.
>> 
>> Best,
>> Andrew
>> 
>> On Mon, Jan 18, 2016 at 11:46 PM, Gangumalla, Uma < 
>> [email protected]>
>> wrote:
>> 
>>> Hi Devs,
>>> 
>>> Some of our Hadoop developers are working with the Spark community to
>>> implement shuffle encryption. While implementing that, they realized that
>>> some/most of the Hadoop encryption code and their implementation code
>>> would have to be duplicated. This led to the idea of creating a separate
>>> library, named Chimera (https://github.com/intel-hadoop/chimera). It is an
>>> optimized cryptographic library. It provides Java APIs at both the cipher
>>> level and the Java stream level to help developers implement
>>> high-performance AES encryption/decryption with minimal code and effort.
>>> Chimera was originally based on the Hadoop crypto code but has been
>>> improved and generalized a lot to support a wider scope of data encryption
>>> needs for more components in the community.
>>> 
>>> So, the team is now thinking of making this library an open source
>>> project via Apache Incubation. The proposal is for Chimera to join Apache
>>> as an incubating project, or to join Apache Commons, to facilitate its
>>> adoption.
>>> 
>>> In general, this will bring the following advantages:
>>> 1. As Chimera embeds the native library in the jar (similar to Snappy
>>> java), it solves the current issue in Hadoop that an HDFS client has to
>>> depend on libhadoop.so if the client needs to read an encryption zone in
>>> HDFS. This means an HDFS client may have to depend on a Hadoop
>>> installation on the local machine. For example, HBase depends on the HDFS
>>> client jar rather than a Hadoop installation, and so has no access to
>>> libhadoop.so. So HBase cannot use an encryption zone, or it causes an
>>> error.
>>> 2. Apache Spark shuffle and spill encryption could be another example
>>> where we can use Chimera. We see that stream encryption for Spark shuffle
>>> and spill doesn’t require a stream cipher like AES/CTR, although the code
>>> shares the common characteristics of a stream-style API. We also see the
>>> need for an optimized Cipher for non-stream-style use cases, such as
>>> network encryption (e.g. RPC). These improvements can actually be shared
>>> by more projects that need them.
>>> 
>>> 3. Simplified code in Hadoop by using a dedicated library, which also
>>> drives more improvements. For example, the Hadoop crypto code API is
>>> currently based entirely on AES/CTR, although it has cipher suite
>>> configurations.
>>> 
>>> AES/CTR is used for HDFS data encryption at rest, but it doesn’t
>>> necessarily have to be AES/CTR for all cases, such as data transfer
>>> encryption and intermediate file encryption.
>>> 
>>> 
>>> 
>>> So, we wanted to check with the Hadoop community about this proposal. 
>>> Please provide your feedback on it.
>>> 
>>> Regards,
>>> Uma
>>> 
>>
