[ 
https://issues.apache.org/jira/browse/FLINK-38349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leonard Xu updated FLINK-38349:
-------------------------------
    Priority: Major  (was: Blocker)

> Incorrect calculation of scale in SpillingThread#mergeChannelList may cause 
> divide-by-zero exception
> ----------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-38349
>                 URL: https://issues.apache.org/jira/browse/FLINK-38349
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Task
>    Affects Versions: 1.13.0, 1.19.3, 1.20.2
>         Environment: All environments
>            Reporter: Huny
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: Screenshot 2025-11-12 at 4.00.42 PM.png, 
> image-2025-09-12-09-05-54-870.png, image-2025-09-12-09-06-10-159.png
>
>
> *Description*
> In the method 
> {{{}org.apache.flink.runtime.operators.sort.SpillingThread#mergeChannelList{}}},
>  the following code may lead to incorrect results:
> {{final double scale = Math.ceil(Math.log(channelIDs.size()) / 
> Math.log(this.maxFanIn)) - 1;}}
> *Steps to Reproduce*
>  # Configure {{{}maxFanIn = 18{}}}.
>  # Run with {{channelIDs.size() = 7055}} (which is greater than or equal to 
> {{{}18^3{}}}).
>  # The calculation results in:
>  ** {{numStart = 7055}}
>  ** {{numEnd = 5832}} ({{{}18^3{}}})
>  ** {{numToMerge = 0}}
>  ** {{channelsToMergePerStep = 0}}
>  # {{SpillingThread#getSegmentsForReaders}} then divides by zero and throws an 
> {{ArithmeticException}} (see the stack trace below).
> *Expected Behavior*
> The calculation of {{scale}} should never result in 
> {{{}channelsToMergePerStep = 0{}}}.
> No divide-by-zero exception should occur.
> *Actual Behavior*
> When {{channelIDs.size()}} is greater than or equal to some power of 
> {{{}maxFanIn{}}}, the computed value of {{scale}} becomes inaccurate because 
> of floating-point precision in the logarithm ratio.
> This results in {{{}numToMerge = 0{}}}, which propagates to 
> {{channelsToMergePerStep = 0}} and eventually causes a {*}divide-by-zero 
> exception{*}.
> *Problem Pattern*
>  * If {{{}channelIDs.size() >= maxFanIn^3{}}}, problematic values of 
> {{maxFanIn}} include:
> {{5, 6, 18, 25, 36, 47, 66, 75, 80, 86, 131, 143, 148 ...}}
>  * If {{{}channelIDs.size() >= maxFanIn^5{}}}, problematic values of 
> {{maxFanIn}} include:
> {{7, 19, 20, 45, 49, 50, 58, 65, 67 ...}}
> *Environment*
>  * Flink version: (all)
> *Logs / Stacktrace*
> Caused by: java.io.IOException: Thread 'SortMerger spilling thread' 
> terminated due to an exception: / by zero
> at org.apache.flink.runtime.operators.sort.ThreadBase.run(ThreadBase.java:80)
> Caused by: java.lang.ArithmeticException: / by zero
> at 
> org.apache.flink.runtime.operators.sort.SpillingThread.getSegmentsForReaders(SpillingThread.java:574)
> at 
> org.apache.flink.runtime.operators.sort.SpillingThread.mergeChannelList(SpillingThread.java:495)
> at 
> org.apache.flink.runtime.operators.sort.SpillingThread.mergeOnDisk(SpillingThread.java:260)
> at 
> org.apache.flink.runtime.operators.sort.SpillingThread.go(SpillingThread.java:187)
> at org.apache.flink.runtime.operators.sort.ThreadBase.run(ThreadBase.java:73)
>  
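
A minimal sketch of one way to avoid the floating-point step, offered as an illustration and not as the fix adopted for this issue. It places the formula quoted in the description next to an integer loop that finds the smallest power of maxFanIn that is at least channelIDs.size(). The class name ScaleSketch and the methods floatingScale and integerScale are hypothetical and do not exist in Flink.

```java
final class ScaleSketch {

    // The formula quoted in the description. Its result depends on how the
    // ratio of the two logarithms is rounded.
    static double floatingScale(int numChannels, int maxFanIn) {
        return Math.ceil(Math.log(numChannels) / Math.log(maxFanIn)) - 1;
    }

    // Hypothetical integer-only alternative: count how many multiplications by
    // maxFanIn are needed until the capacity reaches numChannels, then
    // subtract one. No logarithms, so no rounding is involved.
    static int integerScale(int numChannels, int maxFanIn) {
        if (numChannels < 1 || maxFanIn < 2) {
            throw new IllegalArgumentException("numChannels >= 1 and maxFanIn >= 2 required");
        }
        int levels = 0;
        long capacity = 1; // long, so the product cannot overflow for int inputs
        while (capacity < numChannels) {
            capacity *= maxFanIn;
            levels++;
        }
        return levels - 1;
    }

    public static void main(String[] args) {
        int maxFanIn = 18;
        for (int numChannels : new int[] {5832, 7055}) {
            System.out.println(
                    "channels=" + numChannels
                            + " floatingScale=" + floatingScale(numChannels, maxFanIn)
                            + " integerScale=" + integerScale(numChannels, maxFanIn));
        }
    }
}
```

With maxFanIn = 18 the integer loop returns 2 for 5832 channels and 3 for 7055 channels. If numEnd is maxFanIn raised to scale, as the values in the report suggest, an input of exactly 18^3 channels would then be reduced towards 18^2 = 324 instead of being left at 5832.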



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
