Re: Hadoop encryption module as Apache Chimera incubator project

Gangumalla, Uma Wed, 20 Jan 2016 19:20:45 -0800

Hi All,
Thanks Andrew, ATM, Yi, Kai, and Larry. Thanks Haifeng for clarifying the
release details.

Please find my responses below.

Andrew wrote:
If it becomes part of Apache Commons, could we make Chimera a separate
JAR? We have real difficulties bumping dependency versions right now, so
ideally we don't need to bump our existing Commons dependencies to use
Chimera.
[UMA] Yes, we plan to make it a separate JAR.

Andrew wrote:
With this refactoring, do we have confidence that we can get our desired
changes merged and released in a timely fashion? e.g. if we find another
bug like HADOOP-11343, we'll first need to get the fix into Chimera, have a
new Chimera release, then bump Hadoop's Chimera dependency. This also
relates to the previous point, it's easier to do this dependency bump if
Chimera is a separate JAR.
[UMA] Yes. The main target users for this project are Hadoop and Spark
right now, so Hadoop requirements would be its priority tasks.

ATM wrote:
Uma, would you be up for approaching the Apache Commons folks saying that
you'd like to contribute Chimera? I'd recommend saying that Hadoop and
Spark are both on board to depend on this.
[UMA] Yes, will do that.

Kai wrote:
Just a question. Does becoming a separate JAR/module in Apache Commons mean
that Chimera can be released separately and in a timely manner, without
being coupled to the release of other modules in the project? Thanks.

[Haifeng] The Apache Commons project website (https://commons.apache.org/)
already shows a long list of components under Apache Commons Proper, and
each component has its own release version and date. Our target is to join
that list as one of those components.

Larry wrote:
If what we are looking for is some level of autonomy then it would need to
be a module with its own release train - or at least be able to.

[UMA] Yes, agreed.

Kai wrote:
So far I see it's mainly about AES-256. I suggest expanding the scope a
little, perhaps to a dedicated high performance encryption library. Then
we would have quite a lot to contribute to it, like other ciphers, MACs,
PRNGs and so on, and both Hadoop and Spark could benefit from it.

[UMA] Yes, once development starts as a separate project, it is free to
evolve and add improvements that support more encryption use cases, based
on user demand.
Haifeng, would you add some points here?

Regards,
Uma

On 1/20/16, 4:31 PM, "Andrew Wang" <[email protected]> wrote:

>Thanks Uma for putting together this proposal. Overall sounds good to me,
>+1 for these improvements. A few comments/questions:
>
>* If it becomes part of Apache Commons, could we make Chimera a separate
>JAR? We have real difficulties bumping dependency versions right now, so
>ideally we don't need to bump our existing Commons dependencies to use
>Chimera.
>* With this refactoring, do we have confidence that we can get our desired
>changes merged and released in a timely fashion? e.g. if we find another
>bug like HADOOP-11343, we'll first need to get the fix into Chimera, have a
>new Chimera release, then bump Hadoop's Chimera dependency. This also
>relates to the previous point, it's easier to do this dependency bump if
>Chimera is a separate JAR.
>
>Best,
>Andrew
>
>On Mon, Jan 18, 2016 at 11:46 PM, Gangumalla, Uma <[email protected]> wrote:
>
>> Hi Devs,
>>
>>   Some of our Hadoop developers are working with the Spark community to
>> implement shuffle encryption. While implementing it, they realized that
>> some or most of the Hadoop encryption code would have to be duplicated
>> in their implementation. This led to the idea of creating a separate
>> library, named Chimera (https://github.com/intel-hadoop/chimera). It is
>> an optimized cryptographic library. It provides Java APIs at both the
>> cipher level and the Java stream level to help developers implement high
>> performance AES encryption/decryption with minimal code and effort.
>> Chimera was originally based on the Hadoop crypto code but has been
>> improved and generalized a lot to support a wider scope of data
>> encryption needs for more components in the community.
>>
>> So the team is now thinking of making this library an open source
>> project via Apache Incubation. The proposal is for Chimera to join
>> Apache as an incubating project, or to join Apache Commons, to
>> facilitate its adoption.
>>
>> In general this will bring the following advantages:
>> 1. As Chimera embeds the native library in the JAR (similar to Snappy
>> Java), it solves the current issue in Hadoop that an HDFS client has to
>> depend on libhadoop.so if the client needs to read an encryption zone in
>> HDFS. This means an HDFS client may have to depend on a Hadoop
>> installation on the local machine. For example, HBase depends on the
>> HDFS client JAR rather than a Hadoop installation and so has no access
>> to libhadoop.so. So HBase cannot use an encryption zone, or it causes an
>> error.
>> 2. Apache Spark shuffle and spill encryption could be another example
>> where we can use Chimera. We see that stream encryption for Spark
>> shuffle and spill doesn't require a stream cipher like AES/CTR, although
>> the code shares the common characteristics of a stream-style API. We
>> also see the need for an optimized cipher for non-stream use cases such
>> as network encryption (for example, RPC). These improvements can be
>> shared by more projects that need them.
>>
>> 3. It simplifies the Hadoop code, which can use a dedicated library, and
>> drives more improvements. For example, the current Hadoop crypto code
>> API is entirely based on AES/CTR although it has cipher suite
>> configurations.
>>
>> AES/CTR is used for HDFS data encryption at rest, but not every case
>> needs to use AES/CTR, for example data transfer encryption and
>> intermediate file encryption.
>>
>>
>>
>> So, we wanted to check with the Hadoop community about this proposal.
>> Please provide your feedback on it.
>>
>> Regards,
>> Uma
>>
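
For readers unfamiliar with the "stream-level AES/CTR" API the proposal refers to, here is a minimal sketch of the general technique. It uses only the JDK's standard javax.crypto classes, not Chimera's own API (which is not shown in this thread), so it has no native acceleration. The class name CtrStreamSketch and its helper methods are hypothetical names chosen for illustration. A 128-bit key is assumed so the sketch also runs on older JDKs with restricted key sizes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical illustration class; not part of Hadoop or Chimera.
public class CtrStreamSketch {
    private static final String TRANSFORMATION = "AES/CTR/NoPadding";

    // Builds a JDK cipher in the given mode (encrypt or decrypt).
    public static Cipher newCipher(int mode, byte[] key, byte[] iv)
            throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        return cipher;
    }

    // Encrypts by writing the plaintext through a cipher-wrapped stream.
    public static byte[] encrypt(byte[] key, byte[] iv, byte[] plain)
            throws GeneralSecurityException, IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out =
                new CipherOutputStream(sink, newCipher(Cipher.ENCRYPT_MODE, key, iv))) {
            out.write(plain);
        }
        return sink.toByteArray();
    }

    // Decrypts by reading the ciphertext through a cipher-wrapped stream.
    public static byte[] decrypt(byte[] key, byte[] iv, byte[] encrypted)
            throws GeneralSecurityException, IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (InputStream in = new CipherInputStream(
                new ByteArrayInputStream(encrypted),
                newCipher(Cipher.DECRYPT_MODE, key, iv))) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                sink.write(buffer, 0, n);
            }
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        SecureRandom random = new SecureRandom();
        byte[] key = new byte[16]; // 128-bit AES key (assumption, see above)
        byte[] iv = new byte[16];  // CTR initial counter block, one AES block
        random.nextBytes(key);
        random.nextBytes(iv);

        byte[] plain = "intermediate shuffle data".getBytes(StandardCharsets.UTF_8);
        byte[] encrypted = encrypt(key, iv, plain);
        byte[] decrypted = decrypt(key, iv, encrypted);

        System.out.println("round trip ok: " + Arrays.equals(plain, decrypted));
        System.out.println("same length:   " + (encrypted.length == plain.length));
    }
}
```

Because CTR mode with NoPadding turns AES into a stream cipher, the ciphertext has the same length as the plaintext. A key/IV pair must never be reused for different data.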
