alamb commented on issue #11442: URL: https://github.com/apache/datafusion/issues/11442#issuecomment-2228166564
## Improve Aggregate performance for multi-column grouping when at least one column is variable length

* https://github.com/apache/datafusion/issues/9403

**What**: Queries like `GROUP BY url, code` where `url` is a string are significantly slower than `GROUP BY url`. We already handle the single string column case via https://github.com/apache/datafusion/issues/7064

**Why**: Several ClickBench queries have this shape, and copying string data to form group keys consumes significant time.

**What is left**: @jayzhan211 has already shown that the basic idea of #9430 works in https://github.com/apache/datafusion/pull/10976. What remains is to figure out how to get the types to work out in the plans and to ensure the change doesn't cause performance regressions.
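To make the cost of copying string data into group keys concrete, here is a minimal std-only sketch. It is not DataFusion's implementation: the function names `count_by_row_key` and `count_by_interned_key` are hypothetical, and the interning approach is only one illustrative way to avoid the per-row copy, assumed here for comparison.

```rust
use std::collections::HashMap;

/// Hypothetical illustration: build one owned key per input row by copying
/// the string bytes next to the integer column. Every row pays for a copy of
/// `url`, even when that (url, code) group already exists.
fn count_by_row_key(urls: &[&str], codes: &[i32]) -> HashMap<Vec<u8>, u64> {
    let mut counts = HashMap::new();
    for (url, code) in urls.iter().zip(codes) {
        let mut key = Vec::with_capacity(4 + url.len() + 4);
        // Length prefix keeps keys unambiguous when strings vary in length.
        key.extend_from_slice(&(url.len() as u32).to_le_bytes());
        key.extend_from_slice(url.as_bytes()); // string bytes copied per row
        key.extend_from_slice(&code.to_le_bytes());
        *counts.entry(key).or_insert(0) += 1;
    }
    counts
}

/// Hypothetical alternative: assign each distinct string a small integer id
/// once, then group on a fixed-width `(id, code)` key with no string copies.
fn count_by_interned_key<'a>(urls: &[&'a str], codes: &[i32]) -> HashMap<(u32, i32), u64> {
    let mut ids: HashMap<&'a str, u32> = HashMap::new();
    let mut counts = HashMap::new();
    for (url, code) in urls.iter().zip(codes) {
        // Ids are handed out in first-seen order: 0, 1, 2, ...
        let next = ids.len() as u32;
        let id = *ids.entry(*url).or_insert(next);
        *counts.entry((id, *code)).or_insert(0u64) += 1;
    }
    counts
}
```

Both functions produce the same number of groups for the same input. The difference is that the first allocates and copies the string for every row, while the second only hashes a borrowed `&str` and then works with fixed-width keys.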
