[ 
https://issues.apache.org/jira/browse/SPARK-57858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nikolina Vraneš updated SPARK-57858:
------------------------------------
    Description: 
The BIN BY relation operator proportionally rescales its DISTRIBUTE UNIFORM 
columns. The logical BinBy node currently passes those columns through 
child.output with the child's own ExprId, even though execution rewrites their 
values. This violates Catalyst's invariant that equal ExprIds imply equal 
values; no other operator changes a value under a retained child attribute.

 

This sub-task turns the rescaled DISTRIBUTE columns into produced attributes 
with fresh ExprIds. They keep the inputs' names, types, nullability, and 
positions and shadow the inputs, mirroring Generate.generatorOutput. The input 
columns remain the operator's read inputs but are no longer part of its 
output. ResolveBinBy mints the produced attributes, and DeduplicateRelations 
renews them across self-joins. Qualifier and metadata are dropped, matching 
the computed-value semantics of expr AS value.
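
The intended output shape can be illustrated with a toy model. This is only a 
sketch under stated assumptions: ExprId, Attr, BinBySketch, and asProduced 
below are hypothetical stand-ins written for this example, not Spark's 
Catalyst classes, and the real BinBy node may be structured differently.

```scala
import java.util.concurrent.atomic.AtomicLong

// Toy identity type (hypothetical; NOT Catalyst's ExprId).
final case class ExprId(id: Long)

object ExprId {
  private val counter = new AtomicLong(0L)
  // Every call returns an id that has not been handed out before.
  def fresh(): ExprId = ExprId(counter.getAndIncrement())
}

// Toy attribute (hypothetical; NOT Catalyst's AttributeReference).
final case class Attr(
    name: String,
    dataType: String,
    nullable: Boolean,
    exprId: ExprId,
    qualifier: Seq[String] = Nil,
    metadata: Map[String, String] = Map.empty) {
  // Same name, type, and nullability; new identity; qualifier and metadata dropped.
  def asProduced: Attr = Attr(name, dataType, nullable, ExprId.fresh())
}

// Toy operator: distributeInputs are the child columns whose values get rescaled.
final case class BinBySketch(childOutput: Seq[Attr], distributeInputs: Seq[Attr]) {
  // Minted once per node instance, so repeated calls to output agree.
  val producedOutput: Seq[Attr] = distributeInputs.map(_.asProduced)

  private val replacement: Map[ExprId, Attr] =
    distributeInputs.map(_.exprId).zip(producedOutput).toMap

  // Each rescaled column is swapped in at its original position;
  // untouched child columns keep their own ExprId.
  def output: Seq[Attr] = childOutput.map(a => replacement.getOrElse(a.exprId, a))

  // The inputs are still read by the operator even though they leave output.
  def references: Seq[Attr] = distributeInputs
}

object BinBySketchDemo {
  def main(args: Array[String]): Unit = {
    val k = Attr("k", "int", nullable = false, ExprId.fresh(), Seq("t"))
    val v = Attr("v", "double", nullable = true, ExprId.fresh(), Seq("t"))
    val node = BinBySketch(childOutput = Seq(k, v), distributeInputs = Seq(v))
    node.output.foreach(println)
  }
}
```

In this model the pass-through column k keeps its identity and qualifier, 
while v appears in output under a new ExprId with an empty qualifier, so a 
parent can no longer mistake the rescaled value for the child's original.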

> Emit BIN BY scaled DISTRIBUTE columns as produced attributes
> ------------------------------------------------------------
>
>                 Key: SPARK-57858
>                 URL: https://issues.apache.org/jira/browse/SPARK-57858
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 5.0.0
>            Reporter: Nikolina Vraneš
>            Priority: Major


