timsaucer opened a new pull request, #17289:
URL: https://github.com/apache/datafusion/pull/17289

   ## Which issue does this PR close?
   
   - Closes https://github.com/apache/datafusion/issues/15882.
   
   ## Rationale for this change
   
   Many use cases involve a column whose values are arrays, where you want to 
transform every element of each array. The current workaround is to unnest and 
then aggregate, which is poor for both ergonomics and performance. This PR adds 
a function, `array_transform`, that takes a scalar function and applies it to 
every element in an array.
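
   To make the contrast concrete, here is a minimal conceptual sketch in plain 
Rust. It models a list column as a `Vec` of rows and compares the 
unnest-then-aggregate shape with a direct element-wise transform. The helper 
names `unnest_then_aggregate` and `transform_elements` are hypothetical and 
are not DataFusion APIs or the code in this PR.

```rust
// Conceptual model only: a "list column" is a slice of rows, each row a Vec<i64>.
// Both helpers are hypothetical illustrations, not DataFusion functions.

/// Workaround shape: flatten to (row index, value) pairs, transform each value,
/// then regroup the values back into one array per original row.
fn unnest_then_aggregate(column: &[Vec<i64>], f: impl Fn(i64) -> i64) -> Vec<Vec<i64>> {
    // "unnest": one (row index, element) pair per array element
    let flat: Vec<(usize, i64)> = column
        .iter()
        .enumerate()
        .flat_map(|(row, values)| values.iter().map(move |v| (row, *v)))
        .collect();

    // "aggregate": rebuild one array per original row
    let mut out = vec![Vec::new(); column.len()];
    for (row, v) in flat {
        out[row].push(f(v));
    }
    out
}

/// Direct shape: apply the function to each element while keeping the row
/// structure, with no intermediate flattened representation.
fn transform_elements(column: &[Vec<i64>], f: impl Fn(i64) -> i64) -> Vec<Vec<i64>> {
    column
        .iter()
        .map(|values| values.iter().map(|v| f(*v)).collect())
        .collect()
}
```

   Both helpers return the same result in this toy model. The direct version 
skips the intermediate flattened table and the regrouping step, which is the 
ergonomic and performance argument made above.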
   
   This PR is narrowly scoped as a first proof of concept. It does not address 
aggregation, which #15882 requests, and it only supports cases where every 
other argument passed to the inner function is a scalar value.
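
   As a rough illustration of the scalar-only restriction, the sketch below 
contrasts an inner function whose extra argument is one scalar shared by every 
row with a variant whose extra argument differs per row. This is a plain-Rust 
model under assumed names (`transform_with_scalar`, 
`transform_with_per_row_arg`), not the PR's implementation.

```rust
// Conceptual model only; these helpers are hypothetical, not DataFusion APIs.

/// In-scope case: the extra argument is a single scalar, so the same value is
/// passed alongside every element of every row.
fn transform_with_scalar(
    column: &[Vec<i64>],
    scalar: i64,
    f: impl Fn(i64, i64) -> i64,
) -> Vec<Vec<i64>> {
    column
        .iter()
        .map(|values| values.iter().map(|v| f(*v, scalar)).collect())
        .collect()
}

/// Out-of-scope case: the extra argument is itself a column, so each row's
/// elements need that row's own value. `zip` stops at the shorter input.
fn transform_with_per_row_arg(
    column: &[Vec<i64>],
    args: &[i64],
    f: impl Fn(i64, i64) -> i64,
) -> Vec<Vec<i64>> {
    column
        .iter()
        .zip(args.iter())
        .map(|(values, arg)| values.iter().map(|v| f(*v, *arg)).collect())
        .collect()
}
```

   The second helper has to pair each array with the matching row of another 
column, which is the kind of extra alignment the scalar-only scope avoids.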
   
   ## What changes are included in this PR?
   
   Adds `array_transform` and unit tests.
   
   ## Are these changes tested?
   
   Unit tests are provided: low-level tests of the invocation and a full test 
that exercises the function through a dataframe.
   
   ## Are there any user-facing changes?
   
   No
   
   ## Still to do before ready to merge
   
   - [ ] Add additional documentation describing how all the pieces of this work fit together
   - [ ] Create a plan for lifting the requirement that all other arguments be 
scalar values
   - [ ] Create a plan for addressing the aggregation case or open an issue for 
something like `array_aggregate`
   - [ ] Address how it can be used with SQL commands instead of only dataframe 
operations
   - [ ] Potentially move the integration test to a different location, since 
the dataframe tests may not be the right place to test a function


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

