adriangb commented on PR #20071:
URL: https://github.com/apache/datafusion/pull/20071#issuecomment-3825107178

   > > I think it should be much simpler than this: create a UDF and pass it 
through `ProjectionExprs::transform_exprs(|expr| expr.transform(|expr| // if 
expr is our ScalarUDF, replace with literal filename))`
   > 
   > Thanks @adriangb - really appreciate the feedback!
   > 
   > I've been working on implementing your approach where `input_file_name()` 
stays as a 0-arg scalar UDF, then `ProjectionOpener::open(partitioned_file)` 
does `expr.transform(...)` to replace the `ScalarFunctionExpr` with 
`Literal(Utf8(partitioned_file.location))`.
   > 
   > Does this direction look right to you?
   
   Yes that's what I'm thinking. Both `ProjectionOpener` and `FileSource` 
implementations that don't use it (`ParquetOpener`) should do the rewrite.
   
   > One question on plumbing: in many plans (e.g. `SELECT input_file_name() 
FROM t WHERE a > 1`), a top `ProjectionExec` can't push through `FilterExec` 
without dropping columns needed by the filter. This causes the UDF to stay 
above the scan and hit the runtime error guard.
   > 
   > For this first PR, which is better?
   > 
   >     1. Minimal opener rewrite only: works only when projection naturally 
pushes into the scan
   > 
   >     2. Include analyzer/optimizer/planner glue: marks scan as needing the 
reserved column, injects UDF into file source projection so it works with 
filters/sorts/limits
   
   This is precisely what I was referring to in 
https://github.com/apache/datafusion/issues/6051#issuecomment-3792652539. 
analyzer/optimizer/planner to sort this out is very complex and error prone. I 
think let's not do that for now. You should make the default behavior null 
instead of an error.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to