tlm365 commented on PR #14020:
URL: https://github.com/apache/datafusion/pull/14020#issuecomment-2581930161

   > Oh, with literal support I think the code becomes much trickier.
   > I'm wondering whether the performance benefit is still worth such complications.
   > @tlm365 can we add a criterion benchmark to check `find_in_set` with a literal vs. the original optimization?
   
   @comphead I have added a benchmark for the case `find_in_set(str, str_list)` where `str_list` is a literal. Here are the benchmark results comparing the **original PR** against the **literal support commit** (the faster one):
   ```text
   find_in_set_scalar/string_len_8
                           time:   [57.247 µs 57.355 µs 57.477 µs]
                           change: [-77.622% -77.503% -77.415%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 13 outliers among 100 measurements (13.00%)
     7 (7.00%) high mild
     6 (6.00%) high severe

   find_in_set_scalar/string_len_32
                           time:   [57.872 µs 58.447 µs 59.215 µs]
                           change: [-77.737% -77.460% -77.182%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 16 outliers among 100 measurements (16.00%)
     16 (16.00%) high severe

   find_in_set_scalar/string_len_1024
                           time:   [58.024 µs 58.462 µs 58.993 µs]
                           change: [-77.049% -76.867% -76.660%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 18 outliers among 100 measurements (18.00%)
     3 (3.00%) high mild
     15 (15.00%) high severe
   ```
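   A likely reason the literal path is roughly 4x faster: when `str_list` is a literal, it only needs to be split once per batch rather than once per row. The following is a minimal sketch of that idea under stated assumptions, not the code in this PR. `build_lookup` and `find_in_set_literal_list` are hypothetical names, plain slices stand in for Arrow arrays, `None` stands in for SQL NULL, and returning 0 for an empty list follows MySQL's documented `FIND_IN_SET` behavior. A hash map is just one choice here; a pre-split `Vec` scanned linearly would also avoid re-splitting per row. For example, `find_in_set_literal_list(&[Some("b"), Some("x"), None], "a,b,c")` yields `[Some(2), Some(0), None]`.

   ```rust
   use std::collections::HashMap;

   /// Hypothetical helper: split the literal list once and map each
   /// element to its first 1-based position.
   fn build_lookup(str_list: &str) -> HashMap<&str, i32> {
       let mut lookup = HashMap::new();
       // Assumption (MySQL semantics): an empty list matches nothing.
       if str_list.is_empty() {
           return lookup;
       }
       for (i, item) in str_list.split(',').enumerate() {
           // Keep only the first occurrence so duplicates report the earliest position.
           lookup.entry(item).or_insert(i as i32 + 1);
       }
       lookup
   }

   /// Hypothetical kernel for `find_in_set(str, <literal str_list>)` over a column.
   /// `None` models NULL input; an element that is not found yields 0.
   fn find_in_set_literal_list(strs: &[Option<&str>], str_list: &str) -> Vec<Option<i32>> {
       let lookup = build_lookup(str_list); // one split for the whole column
       strs.iter()
           .map(|&s| s.map(|v| lookup.get(v).copied().unwrap_or(0)))
           .collect()
   }
   ```
   
   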
   ---
   NOTE:
   One notable thing I want to point out: `find_in_set(str, str_list)` doesn't work when `str` or `str_list` is a `Scalar::Utf8View` (string view literal). This is the case on both the `main` branch and the **original version** of this PR. I have added unit tests for these cases in this PR. Seems like a bug?! 🤔
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

