Haifeng Chen created KAFKA-20552:
------------------------------------
Summary: Support log segments larger than 2 GB
Key: KAFKA-20552
URL: https://issues.apache.org/jira/browse/KAFKA-20552
Project: Kafka
Issue Type: Improvement
Components: core
Reporter: Haifeng Chen
The {{log.segment.bytes}} broker config (and its topic-level synonym
{{segment.bytes}}) is currently defined as {{ConfigDef.Type.INT}},
capping the maximum segment size at {{Integer.MAX_VALUE}} (2,147,483,647 bytes,
~2 GB). Additionally, the {{.index}} file format stores physical file positions
as 4-byte signed integers, which likewise cannot address beyond ~2 GB.
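To illustrate the second limit, here is a minimal sketch of an 8-byte offset-index entry, assuming the simplified layout of a 4-byte relative offset followed by a 4-byte physical position (class and method names here are illustrative, not Kafka's actual code):

```java
import java.nio.ByteBuffer;

// Simplified sketch of an offset-index entry: 4-byte relative offset
// plus 4-byte physical file position, 8 bytes per entry. Because the
// position field is a signed 32-bit int, no byte past Integer.MAX_VALUE
// (~2 GB) in the .log file can be addressed.
public class IndexEntrySketch {
    static final int ENTRY_SIZE = 8; // two signed 32-bit ints

    static ByteBuffer writeEntry(int relativeOffset, int position) {
        ByteBuffer buf = ByteBuffer.allocate(ENTRY_SIZE);
        buf.putInt(relativeOffset); // offset relative to the segment base offset
        buf.putInt(position);       // physical byte position in the .log file
        buf.flip();
        return buf;
    }

    public static void main(String[] args) {
        // The largest position the format can represent:
        ByteBuffer entry = writeEntry(1000, Integer.MAX_VALUE);
        System.out.println(entry.getInt()); // 1000
        System.out.println(entry.getInt()); // 2147483647
    }
}
```

Widening this field (e.g. to a 64-bit position) would change the on-disk index format, which is part of why the cap is not a pure config change.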
With modern storage hardware (multi-TB NVMe drives) and high-throughput
workloads, the 2 GB cap is increasingly a problem:
* *Excessive file handle usage*: Each segment carries up to 4 files ({{.log}},
{{.index}}, {{.timeindex}}, and, for transactional topics, {{.txnindex}}).
A 10 TB partition with 2 GB segments is ~5,000 segments, i.e. ~20,000 open files.
* *Frequent segment rolls*: A topic ingesting 500 MB/s fills a 2 GB segment
every ~4 seconds, amplifying index build, flush, and cleaner overhead.
* *More log cleaning / compaction work*: More segments mean more compaction
cycles over smaller groups of segments.
* *Remote storage overhead*: Each segment is an individual unit for tiered
storage copy/delete operations, so segment count directly drives the number
of remote calls.
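The numbers in the bullets above are back-of-the-envelope arithmetic; a small sketch (the 10 TB partition and 500 MB/s ingest rate are the hypothetical workload from this description, and 4 files per segment assumes a transactional topic):

```java
// Back-of-the-envelope arithmetic behind the overheads listed above.
public class SegmentOverheadSketch {
    // Number of segments needed to hold a partition of the given size
    // (ceiling division).
    static long segmentCount(long partitionBytes, long segmentBytes) {
        return (partitionBytes + segmentBytes - 1) / segmentBytes;
    }

    public static void main(String[] args) {
        long partitionBytes = 10L * 1024 * 1024 * 1024 * 1024; // 10 TB
        long segmentBytes = Integer.MAX_VALUE;                 // current ~2 GB cap
        long ingestPerSec = 500L * 1024 * 1024;                // 500 MB/s

        long segments = segmentCount(partitionBytes, segmentBytes);
        System.out.println(segments);     // ~5,000 segments (5121)
        System.out.println(segments * 4); // ~20,000 open files (20484)

        // Time to fill one 2 GB segment at 500 MB/s: ~4.1 s between rolls.
        System.out.printf("%.1f%n", (double) segmentBytes / ingestPerSec);

        // With a hypothetical 8 GB cap the same partition needs a quarter
        // of the segments and rolls roughly every 16 s instead.
        long bigSegment = 8L * 1024 * 1024 * 1024;
        System.out.println(segmentCount(partitionBytes, bigSegment)); // 1280
    }
}
```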
Allowing segments of 4 GB, 8 GB, or larger would significantly reduce these
overheads for high-throughput, large-retention workloads.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)