Thanks Uma for putting together this proposal. Overall sounds good to me,
+1 for these improvements. A few comments/questions:

* If it becomes part of Apache Commons, could we make Chimera a separate
JAR? We have real difficulties bumping dependency versions right now, so
ideally we wouldn't need to bump our existing Commons dependencies to use
Chimera.
* With this refactoring, do we have confidence that we can get our desired
changes merged and released in a timely fashion? e.g. if we find another
bug like HADOOP-11343, we'll first need to get the fix into Chimera, wait
for a new Chimera release, then bump Hadoop's Chimera dependency. This also
relates to the previous point: this dependency bump is easier if
Chimera is a separate JAR.

Best,
Andrew

On Mon, Jan 18, 2016 at 11:46 PM, Gangumalla, Uma <uma.ganguma...@intel.com>
wrote:

> Hi Devs,
>
>   Some of our Hadoop developers are working with the Spark community to
> implement shuffle encryption. While implementing that, they realized that
> much of Hadoop's encryption code would have to be duplicated in their
> implementation. This led to the idea of creating a separate library, named
> Chimera (https://github.com/intel-hadoop/chimera). It is an optimized
> cryptographic library. It provides Java APIs at both the cipher level and
> the stream level to help developers implement high-performance AES
> encryption/decryption with minimal code and effort. Chimera was
> originally based on Hadoop's crypto code but has been improved and
> generalized considerably to support a wider scope of data encryption needs
> for more components in the community.
>
> So, the team is now thinking of making this library an open source project
> via Apache incubation. The proposal is for Chimera to join Apache as an
> incubating project, or to join Apache Commons, to facilitate its adoption.
>
> In general this would bring the following advantages:
> 1. As Chimera embeds the native library in its JAR (similar to
> snappy-java), it solves the current issue in Hadoop that an HDFS client
> has to depend on libhadoop.so if it needs to read an encryption zone in
> HDFS. This means an HDFS client may have to depend on a Hadoop
> installation on the local machine. For example, HBase depends on the HDFS
> client JAR rather than on a Hadoop installation, and so has no access to
> libhadoop.so; as a result, HBase cannot use an encryption zone without
> errors.
> 2. Apache Spark shuffle and spill encryption could be another example
> where we can use Chimera. We have found that stream encryption for
> Spark shuffle and spill doesn't strictly require a stream cipher like
> AES/CTR, although the code shares the common characteristics of a
> stream-style API. We also see the need for an optimized Cipher for
> non-stream use cases such as network encryption (e.g. RPC). These
> improvements can be shared by other projects with similar needs.
>
> 3. Simplified code in Hadoop by using a dedicated library, which also
> drives further improvements. For example, the current Hadoop crypto API is
> entirely based on AES/CTR even though it has cipher suite configurations.
>
> AES/CTR suits HDFS data encryption at rest, but it doesn't need to
> be AES/CTR in all cases, such as data transfer encryption and
> intermediate file encryption.
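
[For readers unfamiliar with the mode being discussed: the following is not Chimera's actual API, just a sketch using the standard JDK JCE provider to show what an AES/CTR stream-cipher round trip looks like; the class name and the all-zero demo key/IV are illustrative only.]

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;

public class CtrRoundTrip {
    public static void main(String[] args) throws Exception {
        byte[] key = new byte[16]; // demo-only all-zero AES-128 key; never use in production
        byte[] iv = new byte[16];  // demo-only initial counter block
        SecretKeySpec keySpec = new SecretKeySpec(key, "AES");

        // Encrypt: CTR turns AES into a stream cipher, so no padding is needed
        Cipher enc = Cipher.getInstance("AES/CTR/NoPadding");
        enc.init(Cipher.ENCRYPT_MODE, keySpec, new IvParameterSpec(iv));
        byte[] plaintext = "shuffle block".getBytes(StandardCharsets.UTF_8);
        byte[] ciphertext = enc.doFinal(plaintext);

        // Decrypt: the same key and counter block regenerate the keystream
        Cipher dec = Cipher.getInstance("AES/CTR/NoPadding");
        dec.init(Cipher.DECRYPT_MODE, keySpec, new IvParameterSpec(iv));
        byte[] decrypted = dec.doFinal(ciphertext);

        System.out.println(new String(decrypted, StandardCharsets.UTF_8)); // prints "shuffle block"
    }
}
```

Because CTR keeps plaintext and ciphertext the same length and supports random access via the counter, it fits HDFS at-rest encryption well; other cases (RPC, intermediate files) may prefer different suites, which is the configurability point above.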
>
>
>
>  So, we wanted to check with the Hadoop community about this proposal.
> Please provide your feedback on it.
>
> Regards,
> Uma
>