Hi Devs,

Some of our Hadoop developers have been working with the Spark community to implement shuffle encryption. While implementing it, they realized that some (or most) of the existing Hadoop encryption code and their new implementation would have to be duplicated. This led to the idea of creating a separate library, named Chimera (https://github.com/intel-hadoop/chimera). It is an optimized cryptographic library that provides Java APIs at both the cipher level and the Java stream level, helping developers implement high-performance AES encryption/decryption with minimal code and effort. Chimera was originally based on the Hadoop crypto code, but it has been improved and generalized considerably to support a wider scope of data encryption needs for more components in the community.
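To give a concrete picture of the stream-level API pattern mentioned above, here is a minimal sketch using only the standard JDK javax.crypto classes (this is not Chimera's actual API, which may differ; the hard-coded key and zero IV are for illustration only): a plain output stream is wrapped for AES/CTR encryption, and the ciphertext is read back through a matching decrypting stream.

```java
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class StreamCipherDemo {
    public static void main(String[] args) throws Exception {
        // Illustrative 16-byte key and all-zero IV; real code would use a
        // key provider and a securely generated IV.
        byte[] key = "0123456789abcdef".getBytes(StandardCharsets.UTF_8);
        byte[] iv  = new byte[16];

        // Encrypt: wrap a plain byte stream with an AES/CTR cipher stream.
        Cipher enc = Cipher.getInstance("AES/CTR/NoPadding");
        enc.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                 new IvParameterSpec(iv));
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (CipherOutputStream out = new CipherOutputStream(buf, enc)) {
            out.write("hello shuffle data".getBytes(StandardCharsets.UTF_8));
        }

        // Decrypt: wrap the ciphertext with a matching cipher stream.
        Cipher dec = Cipher.getInstance("AES/CTR/NoPadding");
        dec.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
                 new IvParameterSpec(iv));
        try (CipherInputStream in = new CipherInputStream(
                new ByteArrayInputStream(buf.toByteArray()), dec)) {
            byte[] plain = in.readAllBytes();
            System.out.println(new String(plain, StandardCharsets.UTF_8));
        }
    }
}
```

A library like Chimera layers an optimized (e.g. AES-NI-accelerated, via a bundled native implementation) cipher underneath this same stream-wrapping pattern, so callers keep the familiar InputStream/OutputStream programming model.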
So, the team is now thinking of making this library an open source project via Apache incubation. The proposal is for Chimera to join Apache as an incubating project, or to join Apache Commons, to facilitate its adoption. In general this would bring the following advantages:

1. Since Chimera embeds the native code in its jar (similar to snappy-java), it solves the current issue in Hadoop that an HDFS client has to depend on libhadoop.so if it needs to read an encryption zone in HDFS, which means the client may have to depend on a local Hadoop installation. For example, HBase depends on the HDFS client jar rather than a full Hadoop installation and therefore has no access to libhadoop.so, so HBase cannot use encryption zones without errors.

2. Apache Spark shuffle and spill encryption could be another place to use Chimera. Stream encryption for Spark shuffle and spill does not strictly require a stream cipher like AES/CTR, although the code shares the common characteristics of a stream-style API. We also see the need for an optimized cipher for non-stream use cases such as network encryption (e.g., RPC). These improvements could then be shared by other projects with similar needs.

3. Simplified code in Hadoop by using a dedicated library, which also drives further improvements. For example, the current Hadoop crypto code API is entirely based on AES/CTR, even though it has cipher suite configuration. AES/CTR suits HDFS data encryption at rest, but it is not necessarily the right choice for all cases, such as data transfer encryption and intermediate file encryption.

So, we wanted to check with the Hadoop community about this proposal. Please provide your feedback on it.

Regards,
Uma