pan3793 commented on PR #52474:
URL: https://github.com/apache/spark/pull/52474#issuecomment-3344695043

   @peter-toth Let me use the UT as an example to explain what I'm trying to do:
   ```
   CREATE TABLE t (i INT, j INT, k STRING) USING PARQUET PARTITIONED BY (k);
   
   INSERT OVERWRITE t SELECT j AS i, i AS j, '0' as k FROM t0 SORT BY k, i;
   ```
   
   In `V1Writes.prepareQuery`, the `query` looks like
   
   ```
   Sort [0 ASC NULLS FIRST, i#280 ASC NULLS FIRST], false
   +- Project [j#287 AS i#280, i#286 AS j#281, 0 AS k#282]
      +- Relation spark_catalog.default.t0[i#286,j#287,k#288] parquet
   ```
   
   Here `query.outputOrdering` is `[0 ASC NULLS FIRST, i#280 ASC NULLS FIRST]`, 
while `requiredOrdering` is `[k#282 ASC NULLS FIRST]`. The literal `0` does not 
match `k#282`, so `orderingMatched` is false and an extra 
`Sort(requiredOrdering, global = false, empty2NullPlan)` is added on top.
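
To make the mismatch concrete, here is a toy Python model of the prefix check. It is only a sketch, not Spark's code: `SortOrder` and `is_ordering_matched` are simplified stand-ins, and expressions are compared as plain strings.

```python
from typing import NamedTuple, Sequence


class SortOrder(NamedTuple):
    """Simplified stand-in for a sort key: an expression plus its direction."""
    expr: str
    direction: str = "ASC NULLS FIRST"


def is_ordering_matched(required: Sequence[SortOrder],
                        output: Sequence[SortOrder]) -> bool:
    """True when `required` is a positional prefix of `output`."""
    if len(required) > len(output):
        return False
    return all(r == o for r, o in zip(required, output))


# Orderings taken from the plan above.
output_ordering = [SortOrder("0"), SortOrder("i#280")]
required_ordering = [SortOrder("k#282")]

# The literal "0" is not the attribute "k#282", so the check fails
# and an extra local Sort would be planned.
ordering_matched = is_ordering_matched(required_ordering, output_ordering)
print(ordering_matched)  # False
```

In this model, the first sort key is compared purely by expression, so nothing tells the check that `0` and `k#282` carry the same value.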
   
   The idea is to leverage the alias information in `Sort` so that 
`outputOrdering` knows `k` is an alias of `0`, and can therefore satisfy 
`requiredOrdering`.
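
A rough Python sketch of that idea, under simplifying assumptions: expressions are plain strings, sort direction is ignored, and each aliased expression has exactly one alias. `resolve_aliases` and `satisfies` are hypothetical helpers for illustration, not Spark APIs.

```python
from typing import Dict, List, Sequence


def resolve_aliases(ordering: Sequence[str], aliases: Dict[str, str]) -> List[str]:
    """Rewrite each sort expression to the attribute that aliases it, if any."""
    return [aliases.get(expr, expr) for expr in ordering]


def satisfies(required: Sequence[str], output: Sequence[str]) -> bool:
    """True when `required` is a positional prefix of `output`."""
    return len(required) <= len(output) and list(output[:len(required)]) == list(required)


# From `Project [j#287 AS i#280, i#286 AS j#281, 0 AS k#282]`:
# child expression -> output attribute.
aliases = {"j#287": "i#280", "i#286": "j#281", "0": "k#282"}

output_ordering = ["0", "i#280"]
required_ordering = ["k#282"]

# Without alias information the literal does not match the attribute.
plain_match = satisfies(required_ordering, output_ordering)

# With alias information, "0" is rewritten to "k#282" before comparing.
alias_aware_ordering = resolve_aliases(output_ordering, aliases)
alias_aware_match = satisfies(required_ordering, alias_aware_ordering)

print(plain_match, alias_aware_match)  # False True
```

A real implementation would also have to handle an expression that is aliased more than once, which this single-valued map cannot represent.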
   
   ---
   
   
   BUT, when I debugged it last night, the issue was gone: in 
`V1Writes.prepareQuery`, the `query` looked like:
   
   ```
   Project [i#284, j#285, 0 AS k#290]
   +- Sort [0 ASC NULLS FIRST, i#284 ASC NULLS FIRST], false
      +- Project [i#284, j#285]
         +- Relation spark_catalog.default.t0[i#284,j#285,k#286] parquet
   ```
   
   After some investigation, I found this was accidentally fixed by SPARK-53707 
(https://github.com/apache/spark/pull/52449), which was merged just a few days 
ago (I happened to start constructing the UT before it got in ...). It fixes 
the issue by adding a `Project` on top of the `Sort` in the 
`PreprocessTableInsertion` rule.
   
   Now, I'm not sure if this is still an issue ...
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

