[ https://issues.apache.org/jira/browse/KAFKA-19603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18013645#comment-18013645 ]
ally heev commented on KAFKA-19603:
-----------------------------------

[~proggga] you might want to create a KIP and add the details there; you can then start a discussion thread on the mailing list. Guidelines here: https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals

> Change log.segment.bytes configuration type from int to long to support segments larger than 2GB
> -------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-19603
>                 URL: https://issues.apache.org/jira/browse/KAFKA-19603
>             Project: Kafka
>          Issue Type: Improvement
>          Components: core, log
>            Reporter: Mikhail Fesenko
>            Priority: Major
>
> h2. Description
>
> h3. Summary
> Change the data type of the *{{log.segment.bytes}}* configuration from *{{int}}* to *{{long}}* to allow segment sizes beyond the current 2GB limit imposed by the maximum value of a signed 32-bit integer.
>
> h3. Current Limitation
> The *{{log.segment.bytes}}* configuration currently uses an *{{int}}* data type, which caps the maximum segment size at ~2GB (2,147,483,647 bytes). This constraint becomes problematic for modern high-capacity storage deployments.
>
> h3. Background: Kafka Log Segment Structure
> Each Kafka topic partition consists of multiple log segments stored as separate files on disk. For each segment, Kafka maintains three core files:
> * *{{.log}} files*: contain the actual message data
> * *{{.index}} files*: map message offsets to their physical positions within the log file, allowing Kafka to locate a message by offset without scanning the entire log file
> * *{{.timeindex}} files*: map message timestamps to their corresponding offsets, enabling efficient time-based retrieval of messages
>
> h3. Motivation
> # *Modern Hardware Capabilities*: current deployments often use high-capacity storage (e.g., EPYC servers with 4×15TB drives) where 2GB segments are inefficiently small
> # *File Handle Optimization*: large Kafka deployments with many topics can have 50-100k open files across all segment types ({{.log}}, {{.index}}, {{.timeindex}}). Each segment requires open file handles, so larger segments would reduce the total number of files and improve caching efficiency
> # *Performance Benefits*: fewer segment rotations in high-traffic scenarios would reduce I/O overhead, and sequential disk operations are much faster than random access patterns
> # *Storage Efficiency*: reducing segment file proliferation improves filesystem metadata performance and reduces inode usage on high-volume deployments
> # *Community Interest*: similar requests have been raised in community forums (see [Confluent forum discussion|https://forum.confluent.io/t/what-happens-if-i-increase-log-segment-bytes/5845])
>
> h3. Proposed Solution
> Change *{{log.segment.bytes}}* from *{{int}}* to *{{long}}*, allowing segment sizes of 3-4GB or larger to better align with modern storage capabilities.
>
> h3. Technical Considerations (Raised by Community)
> Based on the dev mailing list discussion:
> # *Index File Format Limitation*: current index files use 4 bytes to represent file positions within segments, assuming a 2GB cap (Jun Rao). This means:
> ** {{.index}} files store offset-to-position mappings using 4-byte integers for file positions
> ** if segments exceed 2GB, position values would overflow the 4-byte limit
> ** the index format may need to be updated to support 8-byte positions
> # *RemoteLogSegmentMetadata Interface*: this public interface currently uses {{int}} for {{segmentSizeInBytes}} and may need updates (Jun Rao)
> # *Segment File Ecosystem Impact*: the impact on all three file types ({{.log}}, {{.index}}, {{.timeindex}}) and their interdependencies needs evaluation
> # *Impact Assessment*: all components that assume the 2GB segment limit need to be identified and evaluated
>
> h3. Questions for Discussion
> # What would be a reasonable maximum segment size limit?
> # Should this change be backward compatible, or require a protocol/format version bump?
> # Are there any other components beyond index files and RemoteLogSegmentMetadata that need updates?
>
> h3. Expected Benefits
> * Reduced number of segment files for high-volume topics
> * Improved file handle utilization and caching efficiency
> * Better alignment with modern storage hardware capabilities
> * Reduced segment rotation overhead in high-traffic scenarios
>
> h3. Acceptance Criteria
> * {{log.segment.bytes}} accepts long values > 2GB
> * Index file format supports larger segments (if needed)
> * RemoteLogSegmentMetadata interface updated (if needed)
> * Backward compatibility maintained
> * Documentation updated
> * Unit and integration tests added
>
> *Disclaimer*
> I'm relatively new to Kafka internals and the JIRA contribution process. The original idea and motivation came from my experience with large-scale deployments, but I used Claude AI to help make this ticket more detailed and technically structured. There may be technical inaccuracies or missing implementation details that I haven't considered.
> This ticket is open for community discussion and feedback before implementation.
> *Expert review and guidance would be greatly appreciated.*

--
This message was sent by Atlassian Jira
(v8.20.10#820010)