GitHub user ijuma opened a pull request: https://github.com/apache/kafka/pull/3205
KAFKA-5236; Increase the block/buffer size when compressing with Snappy and Gzip

We had originally increased Snappy's block size as part of KAFKA-3704. However, we had some issues with excessive memory usage in the producer and we reverted it in 7c6ee8d5e. After more investigation, we fixed the underlying reason why memory usage grew much more than expected in KAFKA-3747 (included in 0.10.0.1).

In 0.10.2, we changed the broker to use the same classes as the producer, and the broker's block size for Snappy was changed from 32 KB to 1 KB. As reported in KAFKA-5236, the on-disk size is, in some cases, 50% larger when the data is compressed with a 1 KB block size instead of 32 KB.

As discussed in KAFKA-3704, it may be worth making this configurable and/or allocating the compression buffers from the producer pool. However, for 0.11.0.0, I think the simplest thing to do is to default to 32 KB for Snappy (the default when no block size is provided).

I also increased the Gzip buffer size. 1 KB is too small, and the default is smaller still (512 bytes). 8 KB, which is the default buffer size for BufferedOutputStream, seemed like a reasonable default.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ijuma/kafka kafka-5236-snappy-block-size

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/3205.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #3205

----
commit ef4af6757575e694c109074b67e59704ff437b56
Author: Ismael Juma <ism...@juma.me.uk>
Date: 2017-06-02T10:17:23Z

    KAFKA-5236; Increase the block/buffer size when compressing with Snappy and Gzip
----
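
For illustration, here is a minimal sketch of constructing compressor streams with the sizes described above, using the xerial snappy-java and java.util.zip APIs. This is not Kafka's actual code; the class and helper names are hypothetical, and the constants simply mirror the defaults this PR proposes.

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.OutputStream;
    import java.util.zip.GZIPOutputStream;

    import org.xerial.snappy.SnappyOutputStream;

    // Hypothetical sketch: wrapping an output stream with the block/buffer
    // sizes this PR defaults to.
    public class CompressionBufferSizes {

        // 32 KB: snappy-java's own default block size, restored by this PR.
        private static final int SNAPPY_BLOCK_SIZE = 32 * 1024;

        // 8 KB: matches BufferedOutputStream's default, used here for Gzip.
        private static final int GZIP_BUFFER_SIZE = 8 * 1024;

        static OutputStream wrapSnappy(OutputStream out) {
            // Larger blocks give Snappy more data to find repetition in,
            // which is why 1 KB blocks can inflate on-disk size by ~50%.
            return new SnappyOutputStream(out, SNAPPY_BLOCK_SIZE);
        }

        static OutputStream wrapGzip(OutputStream out) throws IOException {
            // The two-argument constructor sets the internal buffer size
            // (the no-argument default is only 512 bytes).
            return new GZIPOutputStream(out, GZIP_BUFFER_SIZE);
        }

        public static void main(String[] args) throws IOException {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            try (OutputStream snappy = wrapSnappy(sink)) {
                snappy.write("hello, kafka".getBytes("UTF-8"));
            }
            System.out.println("snappy-compressed bytes: " + sink.size());
        }
    }

Note that both constructors shown here are part of the public APIs of snappy-java and the JDK; passing no size to SnappyOutputStream yields the 32 KB default mentioned in the description, which is why simply dropping the explicit 1 KB argument achieves the same effect.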