Michael-J-Ward commented on PR #867:
URL: 
https://github.com/apache/datafusion-python/pull/867#issuecomment-2342254957

   Alright - I've narrowed down the bug. I'm fairly certain it is upstream in 
DataFusion, but I haven't reproduced it in Rust yet.
   
   I have a commit that re-implements upstream `with_column` and adds a 
bunch of debug prints:
   
https://github.com/Michael-J-Ward/datafusion-python/commit/6014e8c33d4a51a395f2ac2e149daf7156695b61
   
   The thing to notice in this log is that `adding window function with alias` 
gets printed twice.
   
   I **think** a simplified reproduction would be any chain of this shape:
   
   ```rust
   df
   .with_column("foo", <normal expr>)
   .with_column("bar", <window expr>)
   ```
   
   
   ```console
   adding column: "total_value" with expr: PyExpr { expr: 
WindowFunction(WindowFunction { fun: AggregateUDF(AggregateUDF { inner: Sum { 
signature: Signature { type_signature: UserDefined, volatility: Immutable } } 
}), args: [Column(Column { relation: None, name: "value" })], partition_by: [], 
order_by: [], window_frame: WindowFrame { units: Rows, start_bound: 
Preceding(UInt64(NULL)), end_bound: Following(UInt64(NULL)), is_causal: false 
}, null_treatment: None }) }
   window_func_exprs: [WindowFunction(WindowFunction { fun: 
AggregateUDF(AggregateUDF { inner: Sum { signature: Signature { type_signature: 
UserDefined, volatility: Immutable } } }), args: [Column(Column { relation: 
None, name: "value" })], partition_by: [], order_by: [], window_frame: 
WindowFrame { units: Rows, start_bound: Preceding(UInt64(NULL)), end_bound: 
Following(UInt64(NULL)), is_causal: false }, null_treatment: None })]
   col_exists: true, window_func: true
   plan: WindowAggr: windowExpr=[[sum(value) ROWS BETWEEN UNBOUNDED PRECEDING 
AND UNBOUNDED FOLLOWING]]
     Aggregate: groupBy=[[?table?.ps_partkey]], aggr=[[sum(value) AS value]]
       Projection: ?table?.n_nationkey, ?table?.n_name, ?table?.s_suppkey, 
?table?.s_nationkey, ?table?.ps_supplycost, ?table?.ps_availqty, 
?table?.ps_suppkey, ?table?.ps_partkey, ?table?.ps_supplycost * 
?table?.ps_availqty AS value
         Inner Join: ?table?.s_suppkey = ?table?.ps_suppkey
           Inner Join: ?table?.n_nationkey = ?table?.s_nationkey
             Filter: ?table?.n_name = Utf8("GERMANY")
               Projection: ?table?.n_nationkey, ?table?.n_name
                 TableScan: ?table?
             Projection: ?table?.s_suppkey, ?table?.s_nationkey
               TableScan: ?table?
           Projection: ?table?.ps_supplycost, ?table?.ps_availqty, 
?table?.ps_suppkey, ?table?.ps_partkey
             TableScan: ?table?
   qualifier: Some(Bare { table: "?table?" }), field: Field { name: 
"ps_partkey", data_type: Int64, nullable: false, dict_id: 0, dict_is_ordered: 
false, metadata: {} }
   adding column
   qualifier: None, field: Field { name: "value", data_type: Decimal128(36, 2), 
nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }
   adding window function with alias
   qualifier: None, field: Field { name: "sum(value) ROWS BETWEEN UNBOUNDED 
PRECEDING AND UNBOUNDED FOLLOWING", data_type: Decimal128(38, 2), nullable: 
true, dict_id: 0, dict_is_ordered: false, metadata: {} }
   adding window function with alias
   col exists - not pushing Alias(Alias { expr: WindowFunction(WindowFunction { 
fun: AggregateUDF(AggregateUDF { inner: Sum { signature: Signature { 
type_signature: UserDefined, volatility: Immutable } } }), args: [Column(Column 
{ relation: None, name: "value" })], partition_by: [], order_by: [], 
window_frame: WindowFrame { units: Rows, start_bound: Preceding(UInt64(NULL)), 
end_bound: Following(UInt64(NULL)), is_causal: false }, null_treatment: None 
}), relation: None, name: "total_value" })
   Traceback (most recent call last):
     File 
"/home/mike/workspace/datafusion-python/dev/examples/tpch/q11_important_stock_identification.py",
 line 70, in <module>
       df = df.with_column(
            ^^^^^^^^^^^^^^^
     File 
"/home/mike/workspace/datafusion-python/dev/python/datafusion/dataframe.py", 
line 164, in with_column
       return DataFrame(self.df.with_column(name, expr.expr))
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   Exception: Error during planning: Projections require unique expression 
names but the expression "value AS total_value" at position 1 and "sum(value) 
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING AS total_value" at 
position 2 have the same name. Consider aliasing ("AS") one of them.
   ```
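
One possible reading of the log above, as a minimal plain-Python sketch with no DataFusion dependency. It assumes that, when the new expression is a window function, the re-implemented `with_column` aliases every unqualified schema field to the new column name. That assumption is inferred from the two `adding window function with alias` lines, not confirmed against the upstream Rust code. `Field`, `projected_names`, and `duplicate_positions` are hypothetical names used only for illustration.

```python
from typing import NamedTuple, Optional


class Field(NamedTuple):
    qualifier: Optional[str]  # None mirrors `qualifier: None` in the log
    name: str


def projected_names(fields, new_name, is_window):
    """Hypothetical model: output name chosen for each existing schema field."""
    names = []
    for field in fields:
        if is_window and field.qualifier is None:
            # Assumed branch behind "adding window function with alias":
            # every unqualified field is aliased to the new column name.
            names.append(new_name)
        else:
            # Assumed branch behind "adding column": the field keeps its name.
            names.append(field.name)
    return names


def duplicate_positions(names):
    """Map each repeated output name to the positions where it appears."""
    positions = {}
    for index, name in enumerate(names):
        positions.setdefault(name, []).append(index)
    return {name: idxs for name, idxs in positions.items() if len(idxs) > 1}


# Schema of the WindowAggr plan, copied from the three `qualifier: ...` log lines.
window_plan_fields = [
    Field("?table?", "ps_partkey"),
    Field(None, "value"),
    Field(None, "sum(value) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING"),
]

names = projected_names(window_plan_fields, "total_value", is_window=True)
duplicates = duplicate_positions(names)
print(names)
print(duplicates)
```

Under this model, `total_value` appears at positions 1 and 2, which are the positions named in the planning error. If the reading is right, the unqualified `value` column produced by the aggregate is what gets caught by the alias branch alongside the window output.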


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

