Re: [PR] Improve performance of `find_in_set` function [datafusion]

2025-01-12 Thread via GitHub
alamb commented on PR #14020: URL: https://github.com/apache/datafusion/pull/14020#issuecomment-2585712897 Looks like this PR is ready to go, so I'll merge it in. Let's handle any follow-on work in subsequent PRs. Thanks @tlm365 @comphead and @jayzhan211 -- This is an automated m

Re: [PR] Improve performance of `find_in_set` function [datafusion]

2025-01-12 Thread via GitHub
alamb merged PR #14020: URL: https://github.com/apache/datafusion/pull/14020 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] Improve performance of `find_in_set` function [datafusion]

2025-01-11 Thread via GitHub
tlm365 commented on code in PR #14020: URL: https://github.com/apache/datafusion/pull/14020#discussion_r1911936164 ## datafusion/functions/src/unicode/find_in_set.rs: ## @@ -138,31 +263,279 @@ fn find_in_set(args: &[ArrayRef]) -> Result { } } -pub fn find_in_set_general<

Re: [PR] Improve performance of `find_in_set` function [datafusion]

2025-01-10 Thread via GitHub
comphead commented on PR #14020: URL: https://github.com/apache/datafusion/pull/14020#issuecomment-2583243383 > NOTE: One notable thing I want to point out here. find_in_set(str, str_list) doesn't work if str or str_list is Scalar::Utf8View (string view literal), which is true for both t

Re: [PR] Improve performance of `find_in_set` function [datafusion]

2025-01-09 Thread via GitHub
tlm365 commented on PR #14020: URL: https://github.com/apache/datafusion/pull/14020#issuecomment-2581930161 > Oh, with literal support I think the code becomes much trickier. > I wonder whether the performance benefit is still worth such complications. @tlm365 can we add a criterion to check `fin

Re: [PR] Improve performance of `find_in_set` function [datafusion]

2025-01-08 Thread via GitHub
comphead commented on PR #14020: URL: https://github.com/apache/datafusion/pull/14020#issuecomment-2578524584 Oh, with literal support I think the code becomes much trickier. I wonder whether the performance benefit is still worth such complications. @tlm365 can we add a criterion to check `find_

Re: [PR] Improve performance of `find_in_set` function [datafusion]

2025-01-08 Thread via GitHub
tlm365 commented on PR #14020: URL: https://github.com/apache/datafusion/pull/14020#issuecomment-2578291774 I just pushed some updates to support scalar args. Could you please take a look? @jayzhan211 @comphead

Re: [PR] Improve performance of `find_in_set` function [datafusion]

2025-01-08 Thread via GitHub
tlm365 commented on PR #14020: URL: https://github.com/apache/datafusion/pull/14020#issuecomment-2577664206 @comphead Thanks for reviewing, > I think it is a good PR the way it is. One thing comes to mind which is probably relevant for other string functions as well. > > So we

Re: [PR] Improve performance of `find_in_set` function [datafusion]

2025-01-06 Thread via GitHub
tlm365 commented on code in PR #14020: URL: https://github.com/apache/datafusion/pull/14020#discussion_r1904127737 ## datafusion/functions/src/unicode/find_in_set.rs: ## @@ -138,31 +138,144 @@ fn find_in_set(args: &[ArrayRef]) -> Result { } } -pub fn find_in_set_general<

Re: [PR] Improve performance of `find_in_set` function [datafusion]

2025-01-06 Thread via GitHub
jayzhan211 commented on code in PR #14020: URL: https://github.com/apache/datafusion/pull/14020#discussion_r1904073770 ## datafusion/functions/src/unicode/find_in_set.rs: ## @@ -138,31 +138,144 @@ fn find_in_set(args: &[ArrayRef]) -> Result { } } -pub fn find_in_set_gene

Re: [PR] Improve performance of `find_in_set` function [datafusion]

2025-01-06 Thread via GitHub
jayzhan-synnada commented on code in PR #14020: URL: https://github.com/apache/datafusion/pull/14020#discussion_r1904073374 ## datafusion/functions/src/unicode/find_in_set.rs: ## @@ -138,31 +138,144 @@ fn find_in_set(args: &[ArrayRef]) -> Result { } } -pub fn find_in_set

[PR] Improve performance of `find_in_set` function [datafusion]

2025-01-06 Thread via GitHub
tlm365 opened a new pull request, #14020: URL: https://github.com/apache/datafusion/pull/14020
## Which issue does this PR close?
Closes #.
## Rationale for this change
Improve performance of `find_in_set` function
## What changes are included in this PR?