We did not look at compression and did not use it.  You'll probably get
the best compression alongside encryption by building a batch of
messages, compressing that, then encrypting the compressed batch.

Compressing across the batch will almost certainly be better space-wise
than compressing each message separately, because there are likely to be
similarities between the messages and a good compression algorithm will
pick up on them and make the batch smaller.  Even small similarities,
such as the messages consisting mostly of ASCII, can be picked up.
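
A minimal Python sketch of the batch-versus-per-message effect.  The
sample messages are made up, and zlib only stands in for whatever codec
you actually use, so treat the numbers as illustrative:

```python
import json
import zlib

# Hypothetical sample messages: same structure, different values.
messages = [
    json.dumps(
        {"user_id": i, "event": "page_view", "path": f"/products/{i % 7}"}
    ).encode("ascii")
    for i in range(200)
]


def size_compressed_per_message(msgs):
    """Total size when every message is compressed on its own."""
    return sum(len(zlib.compress(m)) for m in msgs)


def size_compressed_as_batch(msgs):
    """Size when the messages are joined and compressed once."""
    return len(zlib.compress(b"\n".join(msgs)))


raw_size = sum(len(m) for m in messages)
per_message_size = size_compressed_per_message(messages)
batch_size = size_compressed_as_batch(messages)

print(f"raw:                    {raw_size} bytes")
print(f"compressed per message: {per_message_size} bytes")
print(f"compressed as a batch:  {batch_size} bytes")
```

Short messages carry a fixed per-stream overhead and give the
compressor little history to work with, which is why the batch should
come out well ahead here.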

To defy cryptanalysis, a good encryption algorithm will make the encrypted
message appear random.  Random data does not compress to any meaningful
degree.  If your data reliably compresses after encryption, then your
encryption is not as secure as it should be.  This is also discussed here:
http://security.stackexchange.com/a/19970
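
A rough Python sketch of why the order matters.  The cipher below is a
toy SHA-256 keystream XOR that I am assuming purely so the example runs
with the standard library; it is NOT secure and only mimics the
"output looks random" property of a real cipher.  The key and the
sample batch are made up as well:

```python
import hashlib
import zlib


def toy_stream_xor(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256 counter keystream.

    Illustration only: a stand-in for a real cipher, not secure.
    Applying it twice with the same key restores the input.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


DEMO_KEY = bytes(range(32))  # fixed demo key, hypothetical

# Hypothetical batch of similar-looking messages.
batch = b"\n".join(
    b'{"sensor": "s%d", "status": "ok", "reading": %d}' % (i % 5, i)
    for i in range(500)
)

# Order 1: compress the batch, then encrypt the compressed bytes.
compress_then_encrypt = toy_stream_xor(DEMO_KEY, zlib.compress(batch))

# Order 2: encrypt first, then try to compress the random-looking output.
encrypt_then_compress = zlib.compress(toy_stream_xor(DEMO_KEY, batch))

# Reversing order 1: decrypt, then decompress.
recovered = zlib.decompress(toy_stream_xor(DEMO_KEY, compress_then_encrypt))

print(f"plain batch:           {len(batch)} bytes")
print(f"compress then encrypt: {len(compress_then_encrypt)} bytes")
print(f"encrypt then compress: {len(encrypt_then_compress)} bytes")
print(f"round trip ok:         {recovered == batch}")
```

Compressing first keeps the savings; compressing the encrypted output
should leave it at roughly the original size.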

-- Jim

On 1/15/16, 6:39 AM, "Bruno Rassaerts" <bruno.rassae...@novazone.be> wrote:

>Thanks for the input Jim.
>
>We managed to reduce the encryption impact to about 25% by disabling the
>Kafka batch compression and compressing the messages ourselves before
>encrypting them one by one. However, we still believe we could improve by
>batch compressing + batch encrypting.
>
>Can you confirm that in your tests batch compression was disabled?
>
>Thanks,
>Bruno
>
>
>> On 14 Jan 2016, at 23:47, Jim Hoagland <jim_hoagl...@symantec.com> wrote:
>> 
>> We did a proof of concept on end-to-end encryption using an approach
>> which sounds similar to what you describe.  We blogged about it here:
>> 
>> http://www.symantec.com/connect/blogs/end-end-encryption-though-kafka-our-proof-concept
>> 
>> You might want to review what is there to see how it differs from what
>> you did.  In our tests, the encryption didn't add as much overhead as
>> we thought it would.
>> 
>> -- Jim
>> 
>> -- 
>> Jim Hoagland, Ph.D.
>> Sr. Principal Software Engineer
>> Big Data Analytics Team
>> Cloud Platform Engineering
>> 
>> 
>> 
>> On 1/14/16, 2:23 PM, "Bruno Rassaerts" <bruno.rassae...@novazone.be> wrote:
>> 
>>> Hello,
>>> 
>>> In our project we have a very strong requirement to protect all data,
>>> all the time. Even when the data is “at rest” on disk, it needs to be
>>> protected.
>>> We’ve been trying to figure out how to do this with Kafka, and hit
>>> some obstacles.
>>> 
>>> One thing we’ve tried is to encrypt every message we hand over to
>>> Kafka. This results in the encrypted messages being written to disk
>>> on the brokers.
>>> However, encryption has serious performance implications, because it
>>> is a CPU-intensive operation and because the batch compression
>>> offered by Kafka is not nearly as efficient once the data is
>>> encrypted. Doing this message-by-message encryption gives us a
>>> performance penalty of about 75%, even if we compress the messages
>>> before encryption.
>>> 
>>> What we are looking for is a way to plug in our encryption in two
>>> possible locations:
>>> 
>>> 1. As a custom compression algorithm, which would batch compress and
>>> batch encrypt, and get the files stored as such.
>>> 2. As an encryption plugin specifically designed for storing the Kafka
>>> broker files.
>>> 
>>> Is there any way that this can be done using Kafka (0.9), or can
>>> somebody point us to the place where we could add this in the Kafka
>>> codebase?
>>> 
>>> Thanks,
>>> Bruno Rassaerts
>> 
>
