findepi commented on issue #12604:
URL: https://github.com/apache/datafusion/issues/12604#issuecomment-2371819367

   > why explicitly carrying the type in `Expr` would improve performance. If 
`get_type` is slow, wouldn't we face the same cost of determining the type when 
creating an `Expr`?
   
   Correct, unless `get_type` is called more than once. Optimizer rules may 
check types, so the type of the same expression can end up being computed 
repeatedly. 
   DF doesn't have that many optimizer rules yet, but I am basing this on 
observation and profiling of the Trino optimizer.
   
   > If computing the type is expensive, we could also consider caching 
the result. 
   
   The type will be needed sooner or later. There is no need for an implicit 
cache for something that is needed anyway and can be "cached" explicitly. 
Explicitness helps in understanding the contract: if something is a field, 
that implies it won't change.
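
   To illustrate the difference, here is a minimal sketch of both designs. 
All names (`UntypedExpr`, `TypedExpr`, `TypedKind`, `widen`, the two-variant 
`DataType`) are hypothetical, simplified stand-ins and not DataFusion's 
actual API. In the first variant every `get_type` call walks the tree again; 
in the second the type is resolved once at construction and then read as a 
plain field.

```rust
use std::collections::HashMap;

// Hypothetical, simplified types for illustration; not DataFusion's real ones.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int64,
    Float64,
}

type Schema = HashMap<String, DataType>;

// Toy widening rule (an assumption): anything combined with Float64 is Float64.
fn widen(l: DataType, r: DataType) -> DataType {
    if l == DataType::Float64 || r == DataType::Float64 {
        DataType::Float64
    } else {
        DataType::Int64
    }
}

// Variant A: the type is implicit and re-derived on every call.
enum UntypedExpr {
    Column(String),
    Add(Box<UntypedExpr>, Box<UntypedExpr>),
}

impl UntypedExpr {
    // Walks the whole subtree each time it is called.
    fn get_type(&self, schema: &Schema) -> Option<DataType> {
        match self {
            UntypedExpr::Column(name) => schema.get(name).copied(),
            UntypedExpr::Add(l, r) => Some(widen(l.get_type(schema)?, r.get_type(schema)?)),
        }
    }
}

// Variant B: the type is resolved once, at construction, and stored as a field.
struct TypedExpr {
    kind: TypedKind,
    data_type: DataType,
}

enum TypedKind {
    Column(String),
    Add(Box<TypedExpr>, Box<TypedExpr>),
}

impl TypedExpr {
    // The schema is only consulted here, when the column is bound.
    fn column(name: &str, schema: &Schema) -> Option<TypedExpr> {
        let data_type = schema.get(name).copied()?;
        Some(TypedExpr { kind: TypedKind::Column(name.to_string()), data_type })
    }

    // No schema needed: the children already carry their types.
    fn add(l: TypedExpr, r: TypedExpr) -> TypedExpr {
        let data_type = widen(l.data_type, r.data_type);
        TypedExpr { kind: TypedKind::Add(Box::new(l), Box::new(r)), data_type }
    }
}
```

   In variant B an optimizer rule reads `expr.data_type` directly, and the 
struct definition itself documents that the type cannot change after 
construction.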
   
   Also, can an `Expr` be used against two different `DFSchema`s? Currently 
it can (and may produce different types), but I assume this capability is 
incidental rather than intentional.
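
   A small sketch of what "used against two different schemas" means in 
practice. The `Schema`, `Expr`, `DataType` and `schema_of` definitions below 
are hypothetical toy versions written for this example, not DataFusion's 
real `DFSchema` or `Expr`. The same column reference resolves to a different 
type depending on which schema it is evaluated against.

```rust
use std::collections::HashMap;

// Hypothetical toy types for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int32,
    Utf8,
}

struct Schema {
    fields: HashMap<&'static str, DataType>,
}

enum Expr {
    Column(&'static str),
}

impl Expr {
    // The type is not part of the expression; it depends on the schema passed in.
    fn get_type(&self, schema: &Schema) -> Option<DataType> {
        match self {
            Expr::Column(name) => schema.fields.get(name).copied(),
        }
    }
}

// Helper (hypothetical) to build a schema from (name, type) pairs.
fn schema_of(fields: &[(&'static str, DataType)]) -> Schema {
    Schema { fields: fields.iter().copied().collect() }
}
```

   With one schema declaring `id` as `Int32` and another declaring it as 
`Utf8`, `Expr::Column("id").get_type(..)` returns a different answer for each. 
If the type were a field of the expression, the expression would be bound to 
one schema at construction time.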
   
   > We also have a nullable property for Expr. If it makes sense not to 
explicitly carry nullable for Expr, wouldn't the same logic apply to type as 
well?
   
   That's a good question.
   There are also other type- or value-related properties (aka traits), like 
the range of possible values.
   
   I'm not sure yet how these should be handled.
   
   
   > to make this large change it might be a very good reason for doing it. I 
think we need to list the exact benefits of having this refactoring
   
   
   @comphead that's a very good call.
   My main motivation isn't performance, it's design and reliability. With DF 
being the "LLVM for data processing", explicit is superior to implicit, 
especially since a DF user could have different type derivation rules, e.g. 
different coercions between types. To make the example more concrete: Trino 
and SQL Server have different rules for coercing decimals of different 
precision and scale.
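
   As a rough sketch of what pluggable type derivation could look like, the 
trait below lets the embedding system decide how two decimals are unified. 
Everything here (`CoercionRules`, `StrictRules`, `LossyRules`, 
`required_digits`, the 38-digit limit) is an assumption made for 
illustration. The two policies are invented and are not the actual rules of 
Trino, SQL Server, or DataFusion.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct Decimal {
    precision: u8, // total number of digits
    scale: u8,     // digits after the decimal point; assumed <= precision
}

// Assumed upper bound on precision for this sketch.
const MAX_PRECISION: u8 = 38;

// Hypothetical extension point: each engine supplies its own rules.
trait CoercionRules {
    fn common_decimal(&self, a: Decimal, b: Decimal) -> Option<Decimal>;
}

// Integer digits and scale needed to represent both inputs without loss.
fn required_digits(a: Decimal, b: Decimal) -> (u8, u8) {
    let int_digits = (a.precision - a.scale).max(b.precision - b.scale);
    let scale = a.scale.max(b.scale);
    (int_digits, scale)
}

// Invented policy 1: refuse to coerce if the lossless result does not fit.
struct StrictRules;

impl CoercionRules for StrictRules {
    fn common_decimal(&self, a: Decimal, b: Decimal) -> Option<Decimal> {
        let (int_digits, scale) = required_digits(a, b);
        let precision = int_digits + scale;
        if precision > MAX_PRECISION {
            None
        } else {
            Some(Decimal { precision, scale })
        }
    }
}

// Invented policy 2: always succeed, giving up fractional digits if needed.
struct LossyRules;

impl CoercionRules for LossyRules {
    fn common_decimal(&self, a: Decimal, b: Decimal) -> Option<Decimal> {
        let (int_digits, scale) = required_digits(a, b);
        let precision = (int_digits + scale).min(MAX_PRECISION);
        Some(Decimal { precision, scale: precision - int_digits })
    }
}

// A planner parameterized over the rules gives different types per engine.
fn unify(rules: &dyn CoercionRules, a: Decimal, b: Decimal) -> Option<Decimal> {
    rules.common_decimal(a, b)
}
```

   Under these invented policies, unifying `decimal(38, 0)` with 
`decimal(10, 5)` fails with `StrictRules` and yields `decimal(38, 0)` with 
`LossyRules`. If the resulting type is stored in the expression, later stages 
do not need to know which rules produced it.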


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

