viirya commented on code in PR #613:
URL: https://github.com/apache/datafusion-comet/pull/613#discussion_r1664503124
##########
spark/src/test/scala/org/apache/spark/sql/CometTPCDSQuerySuite.scala:
##########
@@ -158,6 +158,11 @@ class CometTPCDSQuerySuite
conf.set(CometConf.COMET_EXEC_ALL_OPERATOR_ENABLED.key, "true")
conf.set(CometConf.COMET_EXEC_SHUFFLE_ENABLED.key, "true")
conf.set(CometConf.COMET_MEMORY_OVERHEAD.key, "20g")
+ conf.set(CometConf.COMET_SHUFFLE_ENFORCE_MODE_ENABLED.key, "true")
+ conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")
+ // Disable `CometTakeOrderedAndProjectExec` because it doesn't produce same output order
+ // as Spark.
+ conf.set("spark.comet.exec.takeOrderedAndProjectExec.disabled", "true")
Review Comment:
I think these tests should be deterministic (that's why we can compare them
with golden files). I'm not sure why `CometTakeOrderedAndProjectExec` returns
out-of-order results.
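As described in this comment, the golden-file comparison fails even when the rows themselves match and only their order differs. The minimal Scala sketch below contrasts an order-sensitive comparison with an order-insensitive one. `OrderCheck`, `sameOrder` and `sameRows` are hypothetical helpers for illustration, not part of Comet's test suite.

```scala
object OrderCheck {
  // Order-sensitive: rows must match position by position,
  // which is what a failing golden-file comparison effectively demands.
  def sameOrder[A](expected: Seq[A], actual: Seq[A]): Boolean =
    expected == actual

  // Order-insensitive: the same rows with the same multiplicities, in any order.
  def sameRows[A](expected: Seq[A], actual: Seq[A]): Boolean = {
    def counts(rows: Seq[A]): Map[A, Int] =
      rows.groupBy(identity).map { case (row, group) => row -> group.size }
    counts(expected) == counts(actual)
  }
}
```

For example, with made-up rows `spark = Seq(("CA", 10), ("NY", 10))` and `comet = Seq(("NY", 10), ("CA", 10))`, `sameRows(spark, comet)` is `true` while `sameOrder(spark, comet)` is `false`.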
The results are the same, but the row order differs from Spark's. I suspect
that it is related to the sorting part of `CometTakeOrderedAndProjectExec`. As
the sorting is delegated to DataFusion's sort/top-k operators, I need to
investigate the failed queries (e.g., q6) specifically.
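One possible (unconfirmed) explanation is tie handling. SQL does not define the relative order of rows with equal sort keys, so a sort or top-k operator can emit tied rows in any order and still be correct. The hypothetical Scala sketch below contrasts a stable full sort followed by `take(k)` with a bounded-heap top-k. When all tied rows fit within `k`, both return the same rows with correctly ordered keys, but the heap version makes no promise about the order among ties. This only illustrates the idea. It is not a description of how DataFusion's operators, Spark's `TakeOrderedAndProjectExec`, or `CometTakeOrderedAndProjectExec` are implemented, and whether q6 involves ties is not established here.

```scala
import scala.collection.mutable

object TopKSketch {
  type Row = (String, Int) // hypothetical (state, cnt) rows

  // Full sort by key, then take k. Scala's sortBy is stable,
  // so tied rows keep their input order.
  def sortThenTake(rows: Seq[Row], k: Int): Seq[Row] =
    rows.sortBy(_._2).take(k)

  // Bounded-heap top-k: keep the k smallest keys in a max-heap.
  // PriorityQueue is not stable, so tied rows may come out in any order.
  def heapTopK(rows: Seq[Row], k: Int): Seq[Row] = {
    val heap = mutable.PriorityQueue.empty[Row](Ordering.by[Row, Int](_._2))
    rows.foreach { r =>
      heap.enqueue(r)
      if (heap.size > k) heap.dequeue() // drop the current largest key
    }
    val out = mutable.ArrayBuffer.empty[Row]
    while (heap.nonEmpty) out += heap.dequeue() // largest key first
    out.reverse.toSeq // ascending by key
  }
}
```

With made-up rows `Seq(("CA", 10), ("NY", 10), ("TX", 12), ("WA", 9))` and `k = 3`, `sortThenTake` returns `WA, CA, NY` in that order, while `heapTopK` returns the same three rows with keys `9, 10, 10` but may place `NY` before `CA`.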
It is not related to the change here, though, so I will investigate it
separately in follow-up PRs.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.