[ https://issues.apache.org/jira/browse/SPARK-44240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17925552#comment-17925552 ]
Terry Wang edited comment on SPARK-44240 at 2/10/25 11:46 AM:
--------------------------------------------------------------

We have also encountered the same problem :(. To fix it, we may need to add a SortExec before GlobalLimitExec!

was (Author: terry1897):
We have also encountered the same problem :(. To fix it, we may need to add a sort before GlobalLimitExec!

> Setting the topKSortFallbackThreshold value may lead to inaccurate results
> --------------------------------------------------------------------------
>
>                 Key: SPARK-44240
>                 URL: https://issues.apache.org/jira/browse/SPARK-44240
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.0, 3.0.0, 3.1.0, 3.2.0, 3.3.0, 3.4.0
>            Reporter: dzcxzl
>            Priority: Minor
>         Attachments: topKSortFallbackThreshold.png, topKSortFallbackThresholdDesc.png
>
>
> {code:java}
> set spark.sql.execution.topKSortFallbackThreshold=10000;
> SELECT min(id) FROM ( SELECT id FROM range(999999999) ORDER BY id LIMIT 10000) a; {code}
>
> If GlobalLimitExec is not the final operator and the plan contains a sort
> operator, the shuffle read does not guarantee row order, so the rows that
> the limit reads may be arbitrary.
> TakeOrderedAndProjectExec preserves the ordering, so it does not have this problem.
>
> !topKSortFallbackThreshold.png!
> {code:java}
> set spark.sql.execution.topKSortFallbackThreshold=10000;
> select min(id) from (select id from range(999999999) order by id desc limit 10000) a; {code}
> !topKSortFallbackThresholdDesc.png!
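
For readers who want to see the reported failure mode in isolation, here is a minimal plain-Python model of it. It is only a sketch under stated assumptions, not Spark code: the function names (range_partition, limit_without_sort, limit_with_sort) and the arrival_order parameter are hypothetical, and the model assumes that each partition is sorted, that a local limit runs per partition, and that the blocks then reach the global limit in an arbitrary order. It contrasts taking the first N rows as they arrive with re-sorting before the limit, which is the idea behind the fix suggested in the comment.

```python
def range_partition(rows, num_partitions):
    """Split sorted rows into contiguous chunks (assumed model of a global sort's output)."""
    rows = sorted(rows)
    size = -(-len(rows) // num_partitions)  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def limit_without_sort(partitions, n, arrival_order):
    """Take the first n rows in whatever order the blocks arrive (hypothetical model)."""
    blocks = [partitions[i][:n] for i in arrival_order]  # local limit per partition
    merged = [row for block in blocks for row in block]
    return merged[:n]


def limit_with_sort(partitions, n, arrival_order):
    """Same as above, but re-sort the collected rows before applying the limit."""
    blocks = [partitions[i][:n] for i in arrival_order]  # local limit per partition
    merged = sorted(row for block in blocks for row in block)
    return merged[:n]


partitions = range_partition(range(100), 4)
in_order = [0, 1, 2, 3]
out_of_order = [2, 0, 3, 1]  # assumption: blocks may arrive in any order

print(min(limit_without_sort(partitions, 10, in_order)))      # 0
print(min(limit_without_sort(partitions, 10, out_of_order)))  # 50
print(min(limit_with_sort(partitions, 10, out_of_order)))     # 0
```

In this model the unsorted limit returns the correct minimum only when the blocks happen to arrive in partition order, while the variant that sorts first is independent of arrival order.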