viirya opened a new issue, #925:
URL: https://github.com/apache/datafusion-comet/issues/925
### Describe the bug
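In #924, we found that Spark sometimes produces an exchange whose
partitioning expressions cannot be resolved against the output of its child.
For example, in the plan below the second `Exchange` partitions by `value#667`,
but its child `LocalTableScan` only outputs `value#672`: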
```
+- TransformWithState value#667.toString, newInstance(class org.apache.spark.sql.streaming.InputMapRow), [value#667], [key#659, action#660, value#661], org.apache.spark.sql.streaming.TestMapStateProcessor@58fc42f6, NoTime, Append, class[value[0]: string], obj#671: scala.Tuple3, state info [ checkpoint = , runId = 9af20b3e-feb8-4ccd-a9f0-b3ed1517330a, opId = 0, ver = 0, numPartitions = 5], 1725862230745, false, false, [value#667], [key#659, action#660, value#661], value#667.toString
   :- Sort [value#667 ASC NULLS FIRST], false, 0
   :  +- Exchange hashpartitioning(value#667, 5), ENSURE_REQUIREMENTS, [plan_id=1124]
   :     +- AppendColumns org.apache.spark.sql.streaming.TransformWithMapStateSuite$$Lambda$2590/0x000000f801e1c3d0@488fe08d, newInstance(class org.apache.spark.sql.streaming.InputMapRow), [staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, input[0, java.lang.String, true], true, false, true) AS value#667]
   :        +- LocalTableScan [key#659, action#660, value#661]
   +- !Sort [value#667 ASC NULLS FIRST], false, 0
      +- !Exchange hashpartitioning(value#667, 5), ENSURE_REQUIREMENTS, [plan_id=1125]
         +- LocalTableScan <empty>, [value#672]
```
This causes a resolution error in Comet when it tries to translate the
partitioning expressions, because `value#667` cannot be bound against the
child output `[value#672]`:
```
[info] - transformWithMapState - batch should succeed (without changelog checkpointing) *** FAILED *** (23 milliseconds)
[info] org.apache.spark.SparkException: [INTERNAL_ERROR] Couldn't find value#667 in [value#672] SQLSTATE: XX000
[info] at org.apache.spark.SparkException$.internalError(SparkException.scala:92)
[info] at org.apache.spark.SparkException$.internalError(SparkException.scala:96)
[info] at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:81)
[info] at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:74)
[info] at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:458)
[info] at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:84)
[info] at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:458)
[info] at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:434)
[info] at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:402)
[info] at org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:74)
[info] at org.apache.comet.serde.QueryPlanSerde$.exprToProtoInternal$1(QueryPlanSerde.scala:1714)
[info] at org.apache.comet.serde.QueryPlanSerde$.exprToProto(QueryPlanSerde.scala:2565)
[info] at org.apache.comet.serde.QueryPlanSerde$.$anonfun$supportPartitioning$1(QueryPlanSerde.scala:3184)
```
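To make the failure mode concrete, here is a minimal, self-contained sketch of what binding a partitioning attribute against a child's output amounts to. It does not use Spark or Comet classes: `Attr`, `Bound`, `BindSketch`, `bind`, and `canBindAll` are hypothetical stand-ins invented for illustration, and the "good" child output is an assumption about what `AppendColumns` produces. The guard at the end is only one possible way to avoid the exception, not necessarily how Comet handles it.

```scala
// Hypothetical stand-ins for Catalyst's attribute and bound references;
// these are NOT Spark's real classes.
final case class Attr(name: String, exprId: Long) {
  override def toString: String = s"$name#$exprId"
}
final case class Bound(ordinal: Int)

object BindSketch {
  // Look up the attribute's position in the child's output by expression id.
  // Returns Left with a message shaped like the one in the stack trace when
  // the attribute is not produced by the child.
  def bind(attr: Attr, input: Seq[Attr]): Either[String, Bound] = {
    val ordinal = input.indexWhere(_.exprId == attr.exprId)
    if (ordinal < 0) Left(s"Couldn't find $attr in ${input.mkString("[", ", ", "]")}")
    else Right(Bound(ordinal))
  }

  // Hypothetical guard: true only if every partitioning attribute is
  // available in the child's output, so binding cannot fail.
  def canBindAll(exprs: Seq[Attr], input: Seq[Attr]): Boolean = {
    val available = input.map(_.exprId).toSet
    exprs.forall(a => available.contains(a.exprId))
  }

  def main(args: Array[String]): Unit = {
    val partitioning = Seq(Attr("value", 667))
    // Assumed output of the AppendColumns child in the first branch.
    val goodChild =
      Seq(Attr("key", 659), Attr("action", 660), Attr("value", 661), Attr("value", 667))
    // Output of the empty LocalTableScan in the second branch.
    val badChild = Seq(Attr("value", 672))

    println(bind(partitioning.head, goodChild))
    println(bind(partitioning.head, badChild))
    println(canBindAll(partitioning, badChild))
  }
}
```

In this model, the first branch binds successfully while the second branch reproduces the "Couldn't find value#667 in [value#672]" message; a check like `canBindAll` would let a translator decline such a partitioning up front instead of throwing.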
### Steps to reproduce
_No response_
### Expected behavior
_No response_
### Additional context
_No response_