andygrove commented on PR #789:
URL: https://github.com/apache/datafusion-comet/pull/789#issuecomment-2276484812
> > Is it possible to post any kind of benchmarks to show the improvements?
>
> I think the expected improvements from this are based on a heuristic that
> it is beneficial to continue to use dictionary encoding whenever possible. So the
> improvement will depend on what other expressions are used in combination with
> it. Therefore I think it would be possible to construct a benchmark to show almost
> any benefit or none. Do you have a more specific benchmark in mind, or
> what worry do we want to resolve with it?

I'd like to have a go at proving the benefit of this PR (to help with my own
understanding of how dictionary types can affect performance). I am thinking of
running something like this:
```sql
SELECT s FROM (SELECT struct(foo, bar) AS s FROM tbl) t
WHERE s.foo RLIKE '^[A-Z]{1}'
```
My hypothesis is that this will be faster if we preserve the dictionary type
because 1) the regexp only needs to be evaluated once per distinct dictionary
value rather than once per row, and 2) we avoid the cost of unpacking the
dictionary in the first place.
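
To make the two parts of that hypothesis concrete, here is a minimal sketch in plain Python. It is not how Comet or Arrow implements this; the `values`/`keys` layout, the function names, and the sample data are assumptions made for illustration, and nulls are ignored.

```python
import re

# Same pattern as in the query; RLIKE matches anywhere in the string, so use search().
PATTERN = re.compile(r"^[A-Z]{1}")

def rlike_plain(rows):
    """Evaluate the regexp once per row (dictionary already unpacked)."""
    return [PATTERN.search(v) is not None for v in rows]

def rlike_dictionary(values, keys):
    """Evaluate the regexp once per distinct value, then look up each key."""
    matches = [PATTERN.search(v) is not None for v in values]
    return [matches[k] for k in keys]

# Hypothetical dictionary-encoded column: 3 distinct values, 8 rows.
values = ["Apple", "banana", "Cherry"]
keys = [0, 1, 1, 2, 0, 0, 1, 2]

# Unpacking materializes one string per row; this is the cost in point 2.
rows = [values[k] for k in keys]

# Both paths give the same answer, but the dictionary path runs the
# regexp len(values) times instead of len(keys) times.
assert rlike_dictionary(values, keys) == rlike_plain(rows)
```

The gap between the two paths should grow with the ratio of rows to distinct values, so a benchmark would presumably need a low-cardinality column to show a clear difference.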
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]