[ https://issues.apache.org/jira/browse/ARROW-16703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated ARROW-16703:
-----------------------------------
    Labels: pull-request-available  (was: )

> [R] Refactor map_batches() so it can stream results
> ---------------------------------------------------
>
>                 Key: ARROW-16703
>                 URL: https://issues.apache.org/jira/browse/ARROW-16703
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>    Affects Versions: 8.0.0
>            Reporter: Will Jones
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 10.0.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> As part of ARROW-15271, {{map_batches()}} was modified to return a
> {{RecordBatchReader}}, but the implementation collects all results as a list
> of record batches and then converts that to a reader. In theory, if we push
> the implementation down to C++, we should be able to make a proper streaming
> RBR.
> We won't know the schema ahead of time. We could optionally accept it, which
> would allow the function to be lazy. Or we could eagerly evaluate just the
> first batch to determine the schema.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
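The two schema options in the issue description (accept a schema up front and stay lazy, or eagerly evaluate only the first batch) can be illustrated without Arrow itself. The Python sketch below is a toy model, not the R package or Arrow C++ API: it assumes batches are plain dicts of column name to list of values and a schema is a tuple of (column name, type name) pairs, and `StreamingReader`, `map_batches_streaming`, and `infer_schema` are hypothetical names introduced only for illustration.

```python
# Toy model of the two schema strategies floated in ARROW-16703.
# NOT Arrow's API: batches are plain dicts (column name -> list of values),
# a schema is a tuple of (column name, type name) pairs, and every name
# below (StreamingReader, map_batches_streaming, infer_schema) is hypothetical.
from itertools import chain
from typing import Callable, Dict, Iterable, Iterator, Optional, Tuple

Batch = Dict[str, list]
Schema = Tuple[Tuple[str, str], ...]


def infer_schema(batch: Batch) -> Schema:
    """Derive a schema from one batch: column names plus the first value's type."""
    return tuple(
        (name, type(values[0]).__name__ if values else "null")
        for name, values in batch.items()
    )


class StreamingReader:
    """Holds a schema up front; produces batches only as the consumer iterates."""

    def __init__(self, schema: Schema, batches: Iterator[Batch]) -> None:
        self.schema = schema
        self._batches = batches

    def __iter__(self) -> Iterator[Batch]:
        # One-pass, like a reader. A real reader would also validate each
        # batch against self.schema; this toy simply trusts it.
        return self._batches


def map_batches_streaming(
    source: Iterable[Batch],
    fn: Callable[[Batch], Batch],
    schema: Optional[Schema] = None,
) -> StreamingReader:
    # Generator expression: fn is not called until someone pulls a batch.
    results = (fn(batch) for batch in source)
    if schema is not None:
        # Option 1: the caller supplies the schema, so nothing runs yet (lazy).
        return StreamingReader(schema, results)
    # Option 2: eagerly evaluate only the first batch to learn the schema,
    # then put it back in front of the remaining (still lazy) results.
    first = next(results, None)
    if first is None:
        raise ValueError("cannot infer a schema from an empty input")
    return StreamingReader(infer_schema(first), chain([first], results))


if __name__ == "__main__":
    calls = []

    def double(batch: Batch) -> Batch:
        calls.append(1)  # count invocations of the user function
        return {"x": [v * 2 for v in batch["x"]]}

    source = [{"x": [1, 2]}, {"x": [3]}, {"x": [4, 5]}]
    lazy = map_batches_streaming(source, double, schema=(("x", "int"),))
    print(len(calls))                 # 0: schema given, nothing evaluated yet
    peeked = map_batches_streaming(source, double)
    print(len(calls), peeked.schema)  # 1 (('x', 'int'),): first batch was run
    print(list(peeked))               # remaining batches computed on demand
```

The trade-off mirrors the description: passing a schema keeps construction free of side effects, while inferring it costs one call to the user function at construction time and cannot work on an empty input. In Arrow C++ the reader interface exposes a `schema()` accessor and pulls batches one at a time with `ReadNext()`, which is why some schema has to be known before the first batch is consumed.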