[ https://issues.apache.org/jira/browse/ARROW-15820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-15820:
-----------------------------------
    Labels: pull-request-available  (was: )

> [C++][Doc] Add table_source to streaming_execution.rst & clarify parameter
> name
> -------------------------------------------------------------------------------
>
>                 Key: ARROW-15820
>                 URL: https://issues.apache.org/jira/browse/ARROW-15820
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Weston Pace
>            Assignee: Vibhatha Lakmal Abeykoon
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently the table_source node does not appear in our documentation.
> Also, in {{TableSourceNodeOptions}} we have:
> {noformat}
>   // Size of batches to emit from this node
>   // If the table is larger the node will emit multiple batches from the
>   // the table to be processed in parallel.
>   int64_t batch_size;
> {noformat}
> However, when looking into a performance issue today, I realized this
> description is incomplete. In reality we should probably call this parameter
> {{max_batch_size}}.
> Furthermore, we should make it clear that a table with smaller batches will
> emit smaller batches directly (this is a good thing in my case) and will not
> concatenate small batches together into a larger batch.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
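
For illustration, the batching behavior described in the issue can be sketched as a small standalone model. This is not Arrow code and does not use the Arrow API: the function name EmittedBatchSizes and the representation of a table as a list of per-chunk row counts are hypothetical, chosen only to show why {{max_batch_size}} would be the more accurate name. Treating a non-positive limit as invalid is also an assumption.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical helper, NOT part of Arrow. Given the row counts of a table's
// existing chunks and an upper bound on batch size, return the row counts of
// the batches that would be emitted under the behavior the issue describes.
std::vector<int64_t> EmittedBatchSizes(const std::vector<int64_t>& chunk_rows,
                                       int64_t max_batch_size) {
  std::vector<int64_t> emitted;
  if (max_batch_size <= 0) {
    return emitted;  // assumption: a non-positive limit yields no batches
  }
  for (int64_t rows : chunk_rows) {
    // A chunk larger than the limit is split into several batches.
    // A chunk at or below the limit is emitted as-is; it is never
    // concatenated with neighboring chunks to fill up a larger batch.
    while (rows > 0) {
      const int64_t n = std::min(rows, max_batch_size);
      emitted.push_back(n);
      rows -= n;
    }
  }
  return emitted;
}
```

Under this model, a table whose chunks hold 10, 3, and 25 rows, read with a limit of 10, yields batches of 10, 3, 10, 10, and 5 rows: the 3-row chunk stays a 3-row batch rather than being merged with its neighbor.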