timsaucer opened a new pull request, #17289: URL: https://github.com/apache/datafusion/pull/17289
## Which issue does this PR close?

- Closes https://github.com/apache/datafusion/issues/15882.

## Rationale for this change

There are many use cases where a column contains an array and you want to transform every element in that array. The current workaround is to unnest the array and then aggregate the results back together, which is poor for both ergonomics and performance.

This PR adds a function, `array_transform`, that takes a scalar function and applies it to every element in an array.

This PR is narrowly scoped as a first proof of concept. It does not address aggregation as #15882 requests, and it is limited to cases where all other arguments passed to the inner function are scalar values.

## What changes are included in this PR?

Adds `array_transform` and unit tests.

## Are these changes tested?

Yes. A unit test is provided that covers both low-level testing of the invocation and a full test demonstrating the function in operation with a dataframe.

## Are there any user-facing changes?

No

## Still to do before ready to merge

- [ ] Add additional documentation describing how all the pieces of this work fit together
- [ ] Create a plan for how to expand beyond requiring the other arguments to be scalar values
- [ ] Create a plan for addressing the aggregation case, or open an issue for something like `array_aggregate`
- [ ] Address how it can be used with SQL commands instead of only dataframe operations
- [ ] Potentially move the integrated test to a different location; dataframe may not be the right place to test a function
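
To illustrate the rationale, here is a minimal sketch of the two approaches using plain Rust vectors as a stand-in for an array column. It is not the implementation in this PR and does not use any DataFusion API. The names `ListColumn`, `unnest_then_aggregate`, and `transform_elements` are hypothetical and exist only for this example.

```rust
// Hypothetical stand-in for an array column: each row holds a list of values.
type ListColumn = Vec<Vec<i64>>;

/// The workaround: unnest into (row index, value) pairs, apply the
/// function, then aggregate the results back into one list per row.
fn unnest_then_aggregate(col: &ListColumn, f: impl Fn(i64) -> i64) -> ListColumn {
    // Unnest: flatten every row into (row, value) pairs.
    let flat: Vec<(usize, i64)> = col
        .iter()
        .enumerate()
        .flat_map(|(row, values)| values.iter().map(move |v| (row, *v)))
        .collect();

    // Aggregate: regroup the transformed values by their row index.
    let mut out: ListColumn = vec![Vec::new(); col.len()];
    for (row, v) in flat {
        out[row].push(f(v));
    }
    out
}

/// The element-wise approach: apply the function to each element
/// without leaving the row it belongs to.
fn transform_elements(col: &ListColumn, f: impl Fn(i64) -> i64) -> ListColumn {
    col.iter()
        .map(|values| values.iter().map(|v| f(*v)).collect())
        .collect()
}

fn main() {
    let col: ListColumn = vec![vec![1, -2, 3], vec![], vec![-4]];

    // The extra argument to the inner function is a scalar, matching
    // the scope limitation described in the rationale.
    let exponent: u32 = 2;

    let via_workaround = unnest_then_aggregate(&col, |v| v.pow(exponent));
    let via_transform = transform_elements(&col, |v| v.pow(exponent));

    // Both approaches produce the same per-row lists.
    assert_eq!(via_workaround, via_transform);
    println!("{:?}", via_transform);
}
```

The workaround materializes an intermediate flattened collection and has to carry row indices through it, which is the ergonomic and performance cost a per-element transform avoids.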