alamb commented on issue #11442:
URL: https://github.com/apache/datafusion/issues/11442#issuecomment-2228166564

   ## Improve Aggregate performance for multi-column grouping when at least one column is variable length
   * https://github.com/apache/datafusion/issues/9403
   
   **What**: Queries like `GROUP BY url, code`, where `url` is a string, are 
significantly slower than `GROUP BY url`. The single string column case is 
already handled by https://github.com/apache/datafusion/issues/7064.
   **Why**: Several ClickBench queries have this shape, and copying string 
data to form group keys consumes significant time in them.
   **What is left**: @jayzhan211 has already shown that the basic idea of 
#9430 works in https://github.com/apache/datafusion/pull/10976. What remains 
is to figure out how to get the types to work out in the plans and to ensure 
the change doesn't cause performance regressions.
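
To make the cost described above concrete, here is a minimal, self-contained sketch of grouping on `(url, code)` without copying the string into every group key. It is only an illustration and not DataFusion's implementation or the approach taken in the linked PR: the `Interner` type and the `count_groups` function are hypothetical names invented for this example. Each distinct `url` is copied once, when it is first seen, and the per-row key is a small fixed-width `(u32, i64)` pair.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not a DataFusion type): maps each distinct string
/// to a small integer id, copying the string only on first sight.
#[derive(Default)]
struct Interner {
    ids: HashMap<String, u32>,
    values: Vec<String>,
}

impl Interner {
    fn intern(&mut self, s: &str) -> u32 {
        if let Some(&id) = self.ids.get(s) {
            return id; // already seen: no string copy for this row
        }
        let id = self.values.len() as u32;
        self.ids.insert(s.to_owned(), id);
        self.values.push(s.to_owned());
        id
    }
}

/// Counts rows per (url, code) group, in order of first appearance.
/// The hash table key is (interned url id, code), so it is fixed width.
fn count_groups(urls: &[&str], codes: &[i64]) -> Vec<(String, i64, u64)> {
    assert_eq!(urls.len(), codes.len());
    let mut interner = Interner::default();
    let mut group_index: HashMap<(u32, i64), usize> = HashMap::new();
    let mut groups: Vec<((u32, i64), u64)> = Vec::new();

    for (url, &code) in urls.iter().zip(codes) {
        let key = (interner.intern(url), code);
        // Look up the group for this key, creating it with count 0 if new.
        let idx = *group_index.entry(key).or_insert_with(|| {
            groups.push((key, 0));
            groups.len() - 1
        });
        groups[idx].1 += 1;
    }

    // Materialize the string values once per group at output time.
    groups
        .into_iter()
        .map(|((id, code), n)| (interner.values[id as usize].clone(), code, n))
        .collect()
}
```

Calling `count_groups` with parallel slices of URLs and codes returns one `(url, code, count)` tuple per distinct pair. A naive alternative would build a fresh `(String, i64)` key for every input row, which is the per-row string copy the comment identifies as the expensive part.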
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

