Dandandan commented on issue #18411: URL: https://github.com/apache/datafusion/issues/18411#issuecomment-3665233281
You are totally right, of course: we shouldn't make optimizations that are only useful for some TPC-H query and not in the wild. Doing the optimization in general for short strings is super useful (and works for all short string / byte views).

> Is there a mechanism in DataFusion already to carry the necessary meta information from the table all the way to the aggregation? The Arrow physical types alone aren't sufficiently rich to model that.

That said, I think this might be useful as well: there is information in the table schema (i.e. certain fields are `char(1)`) that helps make the Parquet read or aggregations on those fields faster and consume less memory than processing them as variable-width utf8 fields.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
