tlm365 commented on PR #14020: URL: https://github.com/apache/datafusion/pull/14020#issuecomment-2581930161
> Oh with literal support I think the code becomes much more tricky.
> Wondering if the performance benefit is still worth such complications. @tlm365 can we add a criterion benchmark to compare `find_in_set` with a literal vs the original optimization?

@comphead I have added a benchmark for the case `find_in_set(str, str_list)` where `str_list` is a literal. The results below compare the **original PR** against the **literal support commit**; the latter is faster:

```rust
find_in_set_scalar/string_len_8
                        time:   [57.247 µs 57.355 µs 57.477 µs]
                        change: [-77.622% -77.503% -77.415%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  7 (7.00%) high mild
  6 (6.00%) high severe

find_in_set_scalar/string_len_32
                        time:   [57.872 µs 58.447 µs 59.215 µs]
                        change: [-77.737% -77.460% -77.182%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 16 outliers among 100 measurements (16.00%)
  16 (16.00%) high severe

find_in_set_scalar/string_len_1024
                        time:   [58.024 µs 58.462 µs 58.993 µs]
                        change: [-77.049% -76.867% -76.660%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 18 outliers among 100 measurements (18.00%)
  3 (3.00%) high mild
  15 (15.00%) high severe
```

---

NOTE: One notable thing I want to point out here. `find_in_set(str, str_list)` doesn't work if `str` or `str_list` is a `ScalarValue::Utf8View` (string view literal). This is true for both the `main` branch and the **original version** of this PR. Unit tests for these cases have been added in this PR. Seems like a bug?! 🤔
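For context, here is a minimal sketch of the semantics being benchmarked and of why a literal `str_list` can be cheaper to evaluate. It is not the code from this PR: it works on plain `&str` values rather than Arrow arrays, ignores NULL handling, and the names `find_in_set` and `find_in_set_literal_list` are hypothetical helpers chosen for illustration. The assumption is that `find_in_set` returns the 1-based position of `str` in a comma-separated `str_list`, or 0 when it is absent.

```rust
/// Illustrative only: returns the 1-based position of `needle` in the
/// comma-separated `list`, or 0 when `needle` is not one of the items.
fn find_in_set(needle: &str, list: &str) -> i32 {
    list.split(',')
        .position(|item| item == needle)
        .map_or(0, |idx| idx as i32 + 1)
}

/// Illustrative only: when the list is a literal it is the same for every
/// row, so it can be split once and reused for each `needle`.
fn find_in_set_literal_list(needles: &[&str], list: &str) -> Vec<i32> {
    // Split the literal list a single time, outside the per-row loop.
    let items: Vec<&str> = list.split(',').collect();
    needles
        .iter()
        .map(|needle| {
            items
                .iter()
                .position(|item| item == needle)
                .map_or(0, |idx| idx as i32 + 1)
        })
        .collect()
}

fn main() {
    // Per-row evaluation: the list is split again on every call.
    println!("{}", find_in_set("b", "a,b,c")); // prints 2

    // Literal list: one split shared by all rows.
    println!("{:?}", find_in_set_literal_list(&["c", "x", "a"], "a,b,c")); // prints [3, 0, 1]
}
```

The second helper only shows the general idea of hoisting work that depends solely on the literal out of the per-row loop; how the PR actually handles the literal case may differ.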