Re: [PR] fix: union all by name [datafusion]

via GitHub Mon, 07 Apr 2025 04:29:45 -0700


chenkovsky commented on code in PR #15603:
URL: https://github.com/apache/datafusion/pull/15603#discussion_r2030547375



##########
datafusion/physical-plan/src/stream.rs:
##########
@@ -362,6 +362,8 @@ pin_project! {
 
         #[pin]
         stream: S,
+
+        transform_schema: bool,

Review Comment:
   I'd like to learn from how Spark handles this.
   
   For the logical plan, I haven't found any logic in Spark that handles this problem:
   
   
https://github.com/apache/spark/blob/75d80c7795ca71d24229010ab04ae740473126aa/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala#L475
   
   For the physical plan, Spark has it much easier: its InternalRow is schemaless, so 
it uses the physical plan's schema by default. A RecordBatch, however, carries its 
own schema.
   
   
https://github.com/apache/spark/blob/75d80c7795ca71d24229010ab04ae740473126aa/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala#L688
   
   I'm not 100% sure, but I think the current logical plan and physical plan schemas 
are correct. The root cause is that the RecordBatch's schema doesn't match the 
physical plan's, so adding an adapter is a reasonable way to fix it.
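
To make the adapter idea concrete, here is a minimal std-only sketch. It is not the code in this PR and does not use the Arrow or DataFusion APIs: `Schema`, `Batch`, `adapt_batch`, and `SchemaAdapter` are hypothetical toy stand-ins for a schema, a RecordBatch, and a wrapping stream, and the sketch assumes columns line up by position so that only the attached schema needs replacing.

```rust
use std::sync::Arc;

/// Toy stand-in for a schema: an ordered list of field names (assumption).
#[derive(Debug, Clone, PartialEq)]
struct Schema {
    fields: Vec<String>,
}

/// Toy stand-in for a RecordBatch: column data plus the schema it carries.
#[derive(Debug, Clone, PartialEq)]
struct Batch {
    schema: Arc<Schema>,
    columns: Vec<Vec<i64>>,
}

/// Re-attach the plan's schema to a batch. Columns are matched by position
/// and the data itself is left untouched (assumption of this sketch).
fn adapt_batch(batch: Batch, plan_schema: &Arc<Schema>) -> Result<Batch, String> {
    if batch.columns.len() != plan_schema.fields.len() {
        return Err(format!(
            "column count mismatch: batch has {}, plan schema has {}",
            batch.columns.len(),
            plan_schema.fields.len()
        ));
    }
    Ok(Batch {
        schema: Arc::clone(plan_schema),
        columns: batch.columns,
    })
}

/// Hypothetical adapter over a stream of batches. The `transform_schema`
/// flag mirrors the field added in the diff: when false, batches pass
/// through unchanged.
struct SchemaAdapter<I> {
    input: I,
    plan_schema: Arc<Schema>,
    transform_schema: bool,
}

impl<I: Iterator<Item = Batch>> Iterator for SchemaAdapter<I> {
    type Item = Result<Batch, String>;

    fn next(&mut self) -> Option<Self::Item> {
        let batch = self.input.next()?;
        if self.transform_schema {
            Some(adapt_batch(batch, &self.plan_schema))
        } else {
            Some(Ok(batch))
        }
    }
}
```

Wrapping a child stream in `SchemaAdapter` with `transform_schema: true` yields batches that all carry the plan's schema. With real Arrow types, the equivalent step would presumably rebuild each batch from its existing columns and the plan's schema, for example with `RecordBatch::try_new`.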



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


