alamb commented on PR #12092:
URL: https://github.com/apache/datafusion/pull/12092#issuecomment-2354023193
I figured out what is going on (it is different from what I thought). I believe
`StringView::slice()` is quite a bit slower than `StringArray::slice` because
it has a `buffers` field that has to be cloned on every slice.
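A minimal sketch of why an extra `buffers` field can make slicing more expensive. It assumes simplified stand-in types: `SimpleStringArray` and `SimpleStringViewArray` are hypothetical and are not the arrow-rs types. The sketch also assumes the view array keeps its data buffers in a growable `Vec` that is cloned on each slice. Under those assumptions, slicing the string array always clones a fixed number of reference-counted pointers. Slicing the view array also allocates a new `Vec` and bumps one refcount per data buffer, so its cost grows with the number of buffers.

```rust
use std::sync::Arc;

/// Hypothetical, simplified model of a `StringArray`: one offsets buffer and
/// one values buffer. Not the real arrow-rs type.
struct SimpleStringArray {
    offsets: Arc<Vec<i32>>,
    values: Arc<Vec<u8>>,
    offset: usize,
    len: usize,
}

impl SimpleStringArray {
    /// Zero-copy slice: clones exactly two `Arc`s, however large the array is.
    fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        Self {
            offsets: Arc::clone(&self.offsets),
            values: Arc::clone(&self.values),
            offset: self.offset + offset,
            len,
        }
    }
}

/// Hypothetical, simplified model of a `StringViewArray`: a views buffer plus
/// a variable-length list of data buffers (the assumed `buffers` field).
struct SimpleStringViewArray {
    views: Arc<Vec<u128>>,
    buffers: Vec<Arc<Vec<u8>>>,
    offset: usize,
    len: usize,
}

impl SimpleStringViewArray {
    /// The views are shared, but the `buffers` Vec is cloned: one heap
    /// allocation plus one refcount increment per data buffer.
    fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        Self {
            views: Arc::clone(&self.views),
            buffers: self.buffers.clone(),
            offset: self.offset + offset,
            len,
        }
    }
}

fn main() {
    let strings = SimpleStringArray {
        offsets: Arc::new(vec![0, 5, 10]),
        values: Arc::new(b"helloworld".to_vec()),
        offset: 0,
        len: 2,
    };
    let s = strings.slice(1, 1);
    // Only the single values buffer gained a reference.
    println!("StringArray slice: values refcount = {}", Arc::strong_count(&strings.values));

    let buffers: Vec<Arc<Vec<u8>>> = (0..100).map(|_| Arc::new(vec![0u8; 16])).collect();
    let views = SimpleStringViewArray {
        views: Arc::new(vec![0u128; 1000]),
        buffers,
        offset: 0,
        len: 1000,
    };
    let v = views.slice(10, 5);
    // Every data buffer gained a reference from the single slice call.
    let touched = views.buffers.iter().filter(|b| Arc::strong_count(*b) == 2).count();
    println!("StringViewArray slice: {} buffers touched", touched);
    let _ = (s.len, v.len);
}
```

In this model, one slice of the view array touches all 100 data buffers, while one slice of the string array touches a single values buffer. That per-slice overhead adds up when a query slices many small ranges.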
The query is:
```sql
SELECT REGEXP_REPLACE("Referer", '^https?://(?:www\\.)?([^/]+)/.*$', '\\1')
AS k, AVG(length("Referer")) AS l, COUNT(*) AS c, MIN("Referer")
FROM hits_partitioned
WHERE "Referer" <> '' GROUP BY k HAVING COUNT(*) > 100000 ORDER BY l DESC
LIMIT 25;
```
Some flamegraphs (the images were not preserved in this message).

I will think about the best way to proceed here.