[ https://issues.apache.org/jira/browse/KAFKA-7879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16754710#comment-16754710 ]
huxihx commented on KAFKA-7879:
-------------------------------

Are you using `du -s` instead of `du -b` to check the size? The latter reports the actual number of bytes. Disks are block-oriented devices, and the former reports a figure based on the blocks allocated. After the old segment is flushed, these two commands should report the same value (the actual file size).

> Data directory size decreases every few minutes when producer is sending
> large amount of data
> ---------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-7879
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7879
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Pradeep Bansal
>            Priority: Major
>
> I am running kafka broker with 6 nodes and have set retention hours to 24 hours
> and retention bytes to 5 GB.
>
> I have set retention bytes to 250GB on topic configuration.
>
> Now when producing message in async mode with 1000 bytes message with very
> high frequency. I am seeing that kafka data directory size increases but
> every 5 minutes it decreases by some percentage (in my observation it
> increases by 40G and then reduces to 20G, so in every 5 minutes we are seeing
> increase by 20G instead of 40G).
>
> Is there any extra configuration we need to set to avoid this data loss or is
> there some sort of compression that is going on here.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
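
To illustrate the distinction huxihx draws between `du -s` and `du -b`, here is a minimal Python sketch that computes both figures for a directory. It is not part of Kafka or of the original comment. The function names are hypothetical. It assumes a Unix-like system where `os.stat` exposes `st_blocks`, and it counts only regular files, so its totals may differ slightly from what `du` prints.

```python
import os


def apparent_and_allocated(path):
    """Return (apparent_bytes, allocated_bytes) for a single file."""
    st = os.stat(path)
    # st_size is the logical file length, the figure `du -b` adds up.
    # st_blocks counts 512-byte units actually allocated on disk,
    # which is what plain `du` / `du -s` is based on.
    return st.st_size, st.st_blocks * 512


def directory_sizes(root):
    """Sum apparent and allocated sizes of all regular files under root."""
    apparent = 0
    allocated = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            a, b = apparent_and_allocated(os.path.join(dirpath, name))
            apparent += a
            allocated += b
    return apparent, allocated
```

Running `directory_sizes` against one of the broker's `log.dirs` locations at two points in time would show whether the logical size, the allocated size, or both are shrinking. The two numbers can legitimately differ for files that are preallocated or sparse.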