[ https://issues.apache.org/jira/browse/KAFKA-19603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18013645#comment-18013645 ]

ally heev commented on KAFKA-19603:
-----------------------------------

[~proggga] you might want to create a KIP and add the details there. Then you can 
start a discussion on the mailing list. Guidelines here: 
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals

>  Change log.segment.bytes configuration type from int to long to support 
> segments larger than 2GB
> -------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-19603
>                 URL: https://issues.apache.org/jira/browse/KAFKA-19603
>             Project: Kafka
>          Issue Type: Improvement
>          Components: core, log
>            Reporter: Mikhail Fesenko
>            Priority: Major
>
> h2. Description
> h3. Summary
> Change the data type of *{{log.segment.bytes}}* configuration from *{{int}}* 
> to *{{long}}* to allow segment sizes beyond the current 2GB limit imposed by 
> the integer maximum value.
> h3. Current Limitation
> The {{*log.segment.bytes*}} configuration currently uses an *{{int}}* data 
> type, which limits the maximum segment size to ~2GB (2,147,483,647 bytes). 
> This constraint becomes problematic for modern high-capacity storage 
> deployments.
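The ~2GB ceiling follows directly from the range of a signed 32-bit int. A minimal illustration (not Kafka code) of why a 4GB segment size cannot be expressed in the current config type:

```java
// Illustrative only: why an int-typed log.segment.bytes caps out at ~2GB.
public class SegmentSizeLimit {
    public static void main(String[] args) {
        int maxIntBytes = Integer.MAX_VALUE;      // 2,147,483,647 bytes (~2 GiB)
        long desired = 4L * 1024 * 1024 * 1024;   // 4 GiB, a plausible target size
        System.out.println("int max bytes: " + maxIntBytes);
        System.out.println("desired bytes: " + desired);
        System.out.println("fits in int:   " + (desired <= maxIntBytes)); // false
    }
}
```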
> h3. Background: Kafka Log Segment Structure
> Each Kafka topic partition consists of multiple log segments stored as 
> separate files on disk. For each segment, Kafka maintains three core files:
>  * {*}{{.log}} files{*}: Contain the actual message data
>  * {*}{{.index}} files{*}: Store mappings between message offsets and their 
> physical positions within the log file, allowing Kafka to quickly locate 
> messages by their offset without scanning the entire log file
>  * {*}{{.timeindex}} files{*}: Store mappings between message timestamps and 
> their corresponding offsets, enabling efficient time-based retrieval of 
> messages
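To make the index structure concrete, here is a sketch of a fixed-size offset-index entry, assuming the layout discussed later in this ticket (a 4-byte offset relative to the segment's base offset, followed by a 4-byte physical position in the .log file). This is illustrative code, not Kafka's actual implementation:

```java
import java.nio.ByteBuffer;

// Sketch of an 8-byte offset-index entry: 4-byte relative offset + 4-byte
// file position. The 4-byte position field is exactly what ties segment
// size to the 2GB int limit.
public class OffsetIndexEntrySketch {
    static final int ENTRY_SIZE = 8;

    static ByteBuffer encode(int relativeOffset, int position) {
        ByteBuffer buf = ByteBuffer.allocate(ENTRY_SIZE);
        buf.putInt(relativeOffset); // offset relative to the segment base offset
        buf.putInt(position);       // byte position in the .log file (must fit in int)
        buf.flip();
        return buf;
    }

    public static void main(String[] args) {
        ByteBuffer entry = encode(100, 1_500_000_000);
        System.out.println("entry size in bytes: " + entry.remaining()); // 8
    }
}
```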
> h3. Motivation
>  # {*}Modern Hardware Capabilities{*}: Current deployments often use 
> high-capacity storage (e.g., EPYC servers with 4×15TB drives) where 2GB 
> segments are inefficiently small
>  # {*}File Handle Optimization{*}: Large Kafka deployments with many topics 
> can have 50-100k open files across all segment types (.log, .index, 
> .timeindex files). Each segment requires its own open file handles, so larger 
> segments would reduce the total number of files and improve caching efficiency
>  # {*}Performance Benefits{*}: Fewer segment rotations in high-traffic 
> scenarios would reduce I/O overhead and improve overall performance. 
> Sequential disk operations are much faster than random access patterns
>  # {*}Storage Efficiency{*}: Reducing segment file proliferation improves 
> filesystem metadata performance and reduces inode usage on high-volume 
> deployments
>  # {*}Community Interest{*}: Similar requests have been raised in community 
> forums (see [Confluent forum 
> discussion|https://forum.confluent.io/t/what-happens-if-i-increase-log-segment-bytes/5845])
> h3. Proposed Solution
> Change *{{log.segment.bytes}}* from *{{int}}* to *{{long}}* data type, 
> allowing segment sizes of 3-4GB or larger to better align with modern storage 
> capabilities.
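If the change is adopted, a broker could then be configured with a segment size above the current int maximum. The value below is hypothetical (4 GiB) and only illustrates the intended usage:

```properties
# server.properties (hypothetical, post-change): 4 GiB segments
log.segment.bytes=4294967296
```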
> h3. Technical Considerations (Raised by Community)
> Based on dev mailing list discussion:
>  # {*}Index File Format Limitation{*}: Current index files use 4 bytes to 
> represent file positions within segments, assuming a 2GB cap (Jun Rao). This 
> means:
>  ** {{.index}} files store offset-to-position mappings using 4-byte integers 
> for file positions
>  ** If segments exceed 2GB, position values would overflow the 4-byte limit
>  ** Index format may need to be updated to support 8-byte positions
>  # {*}RemoteLogSegmentMetadata Interface{*}: Public interface currently uses 
> {{int}} for {{segmentSizeInBytes}} and may need updates (Jun Rao)
>  # {*}Segment File Ecosystem Impact{*}: Need to evaluate impact on all three 
> file types (.log, .index, .timeindex) and their interdependencies
>  # {*}Impact Assessment{*}: Need to evaluate all components that assume 2GB 
> segment limit
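The 4-byte position concern in point 1 can be demonstrated directly: narrowing a file position beyond Integer.MAX_VALUE to an int wraps to a negative value, which would corrupt any index entry pointing past the 2GB mark. An illustrative snippet (not Kafka code):

```java
public class PositionOverflow {
    public static void main(String[] args) {
        long positionInSegment = 3L * 1024 * 1024 * 1024; // 3 GiB into a >2GB segment
        int stored = (int) positionInSegment;             // what a 4-byte index field would hold
        System.out.println("actual position: " + positionInSegment);
        System.out.println("stored as int:   " + stored);  // negative: the value wrapped
    }
}
```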
> h3. Questions for Discussion
>  # What would be a reasonable maximum segment size limit?
>  # Should this change be backward compatible or require a protocol/format 
> version bump?
>  # Are there any other components beyond index files and 
> RemoteLogSegmentMetadata that need updates?
> h3. Expected Benefits
>  * Reduced number of segment files for high-volume topics
>  * Improved file handle utilization and caching efficiency
>  * Better alignment with modern storage hardware capabilities
>  * Reduced segment rotation overhead in high-traffic scenarios
> h3. Acceptance Criteria
>  * {{log.segment.bytes}} accepts long values > 2GB
>  * Index file format supports larger segments (if needed)
>  * RemoteLogSegmentMetadata interface updated (if needed)
>  * Backward compatibility maintained
>  * Documentation updated
>  * Unit and integration tests added
> *Disclaimer*
> I'm relatively new to Kafka internals and the JIRA contribution process. The 
> original idea and motivation came from my experience with large-scale 
> deployments, but I used Claude AI to help make this ticket more detailed and 
> technically structured. There may be technical inaccuracies or missing 
> implementation details that I haven't considered.
> This ticket is open for community discussion and feedback before 
> implementation. 
> *Expert review and guidance would be greatly appreciated.*



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
