findepi commented on issue #12604: URL: https://github.com/apache/datafusion/issues/12604#issuecomment-2371819367
> why explicitly carrying the type in `Expr` would improve performance. If `get_type` is slow, wouldn't we face the same cost of determining the type when creating an `Expr`?

Correct, unless `get_type` is called more than once. Optimizer rules may check types. DF doesn't have as many optimizer rules yet, but I am basing this on observation and profiling of the Trino optimizer.

> If the computing the type is expensive, we could also consider caching the result.

The type will be needed sooner or later. There is no need for an implicit cache for something that is needed and can be "cached" explicitly. Explicitness helps in understanding the contract: if something is a field, that implies it won't change.

Also, can an `Expr` be used against two different `DFSchema`s? Currently it can (and may produce different types), but I assume this capability is more incidental than intentional.

> We also have a nullable property for Expr, if it makes sense not to explicit carrying nullable for Expr, wouldn't the same logic apply to type as well.

That's a good question. There are also other type- or value-related properties (aka traits), like the range of possible values. I'm not sure yet how these should be handled.

> to make this large change it might be a very good reason for doing it. I think we need to list the exact benefits of having this refactoring

@comphead that's a very good call. My main motivation isn't performance; it's design and reliability. With DF being the "LLVM for data processing", explicit is superior to implicit, especially since a DF user could have different type derivation rules, e.g. different coercions between types. To make the example more concrete: Trino and SQL Server have different rules for coercing decimals of different precision and scale.
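To illustrate the point about one `Expr` resolving to different types against two schemas, here is a minimal, self-contained sketch. It does not use DataFusion's actual API: `DataType`, `Schema`, `Expr`, and `TypedExpr` below are simplified, hypothetical stand-ins, and `TypedExpr::resolve` is an assumed name for "compute the type once and keep it as a field".

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-ins; not DataFusion's real types.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Utf8,
}

// Toy replacement for a schema: column name -> type.
struct Schema {
    columns: HashMap<String, DataType>,
}

impl Schema {
    fn new(cols: &[(&str, DataType)]) -> Self {
        Schema {
            columns: cols
                .iter()
                .map(|(name, ty)| (name.to_string(), ty.clone()))
                .collect(),
        }
    }
}

// Implicit typing: the expression has no type of its own.
#[derive(Debug)]
enum Expr {
    Column(String),
}

impl Expr {
    // The type is derived on every call and depends on the schema passed in.
    fn get_type(&self, schema: &Schema) -> Option<DataType> {
        match self {
            Expr::Column(name) => schema.columns.get(name).cloned(),
        }
    }
}

// Explicit typing: the type is resolved once and carried as a field.
struct TypedExpr {
    expr: Expr,
    data_type: DataType,
}

impl TypedExpr {
    // Hypothetical constructor: binds the expression to one schema.
    fn resolve(expr: Expr, schema: &Schema) -> Option<Self> {
        let data_type = expr.get_type(schema)?;
        Some(TypedExpr { expr, data_type })
    }
}

fn main() {
    let s1 = Schema::new(&[("a", DataType::Int32)]);
    let s2 = Schema::new(&[("a", DataType::Utf8)]);
    let e = Expr::Column("a".to_string());

    // Same expression, two schemas, two different answers.
    assert_eq!(e.get_type(&s1), Some(DataType::Int32));
    assert_eq!(e.get_type(&s2), Some(DataType::Utf8));

    // Once resolved, the type is part of the value and is simply read back.
    let typed = TypedExpr::resolve(e, &s1).expect("column exists in s1");
    assert_eq!(typed.data_type, DataType::Int32);
    println!("{:?} has type {:?}", typed.expr, typed.data_type);
}
```

In this toy model, a rule that checks the type several times pays for `get_type` each time in the implicit variant, while the explicit variant reads a field. How nullability and other traits would fit into such a design is left open here, as in the discussion above.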
