We did not look at compression in our tests and did not use it. You'll probably get the best compression while still having encryption by building a batch of messages, compressing that, then encrypting the compressed batch.
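As a rough illustration of why the ordering matters, here is a minimal sketch using only the Python standard library. The sample messages, the `frame`/`unframe` helpers, and the use of `os.urandom` as a stand-in for ciphertext are all assumptions made for the example; nothing here is Kafka's API or what we ran in our proof of concept, and the actual encryption step is left out.

```python
import json
import os
import zlib

# Hypothetical sample messages; real Kafka payloads will look different.
messages = [
    json.dumps({"user_id": i, "event": "page_view", "path": f"/products/{i % 7}"}).encode("utf-8")
    for i in range(200)
]


def frame(msgs):
    """Join messages into one batch, each prefixed with its 4-byte length."""
    return b"".join(len(m).to_bytes(4, "big") + m for m in msgs)


def unframe(blob):
    """Split a batch produced by frame() back into the original messages."""
    out, pos = [], 0
    while pos < len(blob):
        size = int.from_bytes(blob[pos:pos + 4], "big")
        pos += 4
        out.append(blob[pos:pos + size])
        pos += size
    return out


batch = frame(messages)

# Option A: compress every message on its own.
per_message_size = sum(len(zlib.compress(m)) for m in messages)

# Option B: compress the whole batch once; this is what you would then encrypt.
compressed_batch = zlib.compress(batch)
batch_size = len(compressed_batch)

# Stand-in for ciphertext (an assumption): good encryption output should look
# like random bytes, so random bytes of the same length are used here.
fake_ciphertext = os.urandom(len(batch))
ciphertext_compressed_size = len(zlib.compress(fake_ciphertext))

print("raw batch bytes:            ", len(batch))
print("compressed per message:     ", per_message_size)
print("compressed as one batch:    ", batch_size)
print("random bytes after compress:", ciphertext_compressed_size)
```

On the receiving side the steps run in reverse: decrypt, decompress, then `unframe` to get the individual messages back.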
Compressing across the batch will almost certainly be better space-wise than compressing each message separately, because there are likely to be similarities between the messages, and a good compression algorithm will pick up on that and make the batch smaller. Even small similarities, such as the messages containing a lot of ASCII, can be picked up.

To defy cryptanalysis, a good encryption algorithm will make the encrypted message appear random. Random data does not really compress. If your data reliably compresses after encryption, then your encryption is not as secure as it should be. Also discussed here: http://security.stackexchange.com/a/19970.

-- Jim

On 1/15/16, 6:39 AM, "Bruno Rassaerts" <bruno.rassae...@novazone.be> wrote:

> Thanks for the input Jim.
>
> We managed to reduce the encryption impact to about 25% by disabling the
> Kafka batch compression and compressing the messages ourselves before
> encrypting them one by one. However, we still believe we could improve by
> batch compressing + batch encrypting.
>
> Can you confirm that in your tests batch compression was disabled?
>
> Thanks,
> Bruno
>
>> On 14 Jan 2016, at 23:47, Jim Hoagland <jim_hoagl...@symantec.com> wrote:
>>
>> We did a proof of concept on end-to-end encryption using an approach
>> which sounds similar to what you describe. We blogged about it here:
>>
>> http://www.symantec.com/connect/blogs/end-end-encryption-though-kafka-our-proof-concept
>>
>> You might want to review what is there to see how it differs from what
>> you did. In our tests, the encryption didn't add as much overhead as we
>> thought it would.
>>
>> -- Jim
>>
>> --
>> Jim Hoagland, Ph.D.
>> Sr. Principal Software Engineer
>> Big Data Analytics Team
>> Cloud Platform Engineering
>>
>> On 1/14/16, 2:23 PM, "Bruno Rassaerts" <bruno.rassae...@novazone.be> wrote:
>>
>>> Hello,
>>>
>>> In our project we have a very strong requirement to protect all data,
>>> all the time. Even when the data is “at rest” on disk, it needs to be
>>> protected.
>>> We’ve been trying to figure out how to do this with Kafka, and hit some
>>> obstacles.
>>>
>>> One thing we’ve tried is to encrypt every message we hand over to
>>> Kafka. This results in the encrypted messages being written to disk on
>>> the brokers.
>>> However, encryption has serious performance implications, because it is
>>> a CPU-intensive operation and because the batch compression offered by
>>> Kafka is not nearly as efficient once the data is encrypted. Doing this
>>> message-by-message encryption gives us a performance penalty of about
>>> 75%, even if we compress the messages before encryption.
>>>
>>> What we are looking for is a way to plug in our encryption in one of
>>> two possible locations:
>>>
>>> 1. As a custom compression algorithm, which would batch compress and
>>>    batch encrypt, and get the files stored as such.
>>> 2. As an encryption plugin specifically designed for storing the Kafka
>>>    broker files.
>>>
>>> Is there any way that this can be done using Kafka (0.9), or can
>>> somebody point us to the place where we could add this in the Kafka
>>> codebase?
>>>
>>> Thanks,
>>> Bruno Rassaerts