Haifeng Chen created KAFKA-20552:
------------------------------------

             Summary: Support log segments larger than 2 GB
                 Key: KAFKA-20552
                 URL: https://issues.apache.org/jira/browse/KAFKA-20552
             Project: Kafka
          Issue Type: Improvement
          Components: core
            Reporter: Haifeng Chen


The {{log.segment.bytes}} broker config (and its topic-level synonym 
{{segment.bytes}}) is currently defined as {{ConfigDef.Type.INT}}, 
capping the maximum segment size at {{Integer.MAX_VALUE}} (2,147,483,647 bytes, 
~2 GB). Additionally, the {{.index}} file format stores physical file positions 
as 4-byte signed integers, which also cannot address beyond ~2 GB.
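Both caps reduce to plain Java {{int}} arithmetic. A minimal sketch (the class name and printed values are illustrative, not Kafka code):

```java
public class SegmentSizeCap {
    public static void main(String[] args) {
        // log.segment.bytes is ConfigDef.Type.INT, so the largest value the
        // config system can accept is Integer.MAX_VALUE bytes (~2 GiB).
        long maxSegmentBytes = Integer.MAX_VALUE;
        System.out.printf("max segment size: %d bytes (~%.1f GiB)%n",
                maxSegmentBytes, maxSegmentBytes / (1024.0 * 1024 * 1024));

        // The .index format stores physical file positions as 4-byte signed
        // ints, so a position past 2 GiB overflows to a negative value and
        // cannot be represented.
        long positionAt3GiB = 3L * 1024 * 1024 * 1024;
        int storedPosition = (int) positionAt3GiB; // narrowing overflow
        System.out.println("3 GiB position stored as int: " + storedPosition);
    }
}
```

Widening either limit means moving the config to a 64-bit type and changing the on-disk index position field, which is why this is a format change rather than a one-line fix.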

With modern storage hardware (multi-TB NVMe drives) and high-throughput 
workloads, the 2 GB cap is increasingly a problem:
 * *Excessive file handle usage*: Each segment needs 4 files ({{.log}}, 
{{.index}}, {{.timeindex}}, {{.txnindex}}). A 10 TB partition with 
2 GB segments means ~5,000 segments and ~20,000 open files.
 * *Frequent segment rolls*: A topic ingesting 500 MB/s rolls a new segment 
every ~4 seconds, amplifying index build, flush, and cleaner overhead.
 * *More log cleaning / compaction work*: More segments means more 
compaction passes, each over smaller groups of segments.
 * *Remote storage overhead*: Each segment is an individual unit for tiered 
storage copy/delete operations, so more segments means more remote calls.
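The file-handle and roll-frequency figures above are back-of-envelope arithmetic; a quick illustrative sketch (binary units assumed, numbers are estimates rather than measurements):

```java
public class SegmentOverhead {
    public static void main(String[] args) {
        long segmentBytes = Integer.MAX_VALUE;                  // current ~2 GiB cap
        long partitionBytes = 10L * 1024 * 1024 * 1024 * 1024;  // 10 TiB retained

        // ~5,000 segments x 4 files each => ~20,000 open files.
        long segments = partitionBytes / segmentBytes;
        long openFiles = segments * 4; // .log, .index, .timeindex, .txnindex
        System.out.println("segments: " + segments + ", open files: " + openFiles);

        // At 500 MB/s ingest, a ~2 GiB segment fills (and rolls) every ~4 s.
        double rollIntervalSec = (double) segmentBytes / (500.0 * 1024 * 1024);
        System.out.printf("roll interval: %.1f s%n", rollIntervalSec);

        // With 8 GiB segments the same partition needs a quarter of the
        // files and rolls a quarter as often.
        long bigSegmentBytes = 8L * 1024 * 1024 * 1024;
        System.out.println("files with 8 GiB segments: "
                + partitionBytes / bigSegmentBytes * 4);
    }
}
```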

Allowing segments of 4 GB, 8 GB, or larger would significantly reduce these 
overheads for high-throughput, large-retention workloads.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)