Re 4: you create a ChunkedArray from Arrays. BR
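
A minimal sketch of what "create a ChunkedArray from Arrays" can look like with the Arrow C++ API; the function and array names are illustrative:

    // Wrap existing Arrays in a ChunkedArray instead of concatenating them.
    #include <arrow/api.h>

    arrow::Result<std::shared_ptr<arrow::ChunkedArray>> WrapAsChunkedArray(
        const std::shared_ptr<arrow::Array>& first,
        const std::shared_ptr<arrow::Array>& second) {
      // ChunkedArray::Make checks that all chunks share the same data type.
      return arrow::ChunkedArray::Make({first, second});
    }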
On Wed, 22 Nov 2023 at 20:48, Aldrin <octalene....@pm.me.invalid> wrote:

> Assuming the C++ implementation, Jacek's suggestion (#3 below) is probably
> best. Here is some extra context:
>
> 1. You can slice larger RecordBatches [1].
> 2. You can make a larger RecordBatch [2] from columns of smaller
>    RecordBatches [3], probably using the correct type of Builder [4] and
>    with a bit of resistance from the various types.
> 3. As Jacek said, you can wrap smaller RecordBatches together as a Table
>    [5], combine the chunks [6], and then convert back to RecordBatches
>    using a TableBatchReader [7] if necessary.
> 4. I didn't see anything useful in the Compute API for concatenating
>    arbitrary Arrays or RecordBatches, but you can use Selection functions
>    [8] instead of slicing for anything that's too big.
>
> [1]: https://arrow.apache.org/docs/cpp/api/table.html#_CPPv4NK5arrow11RecordBatch5SliceE7int64_t7int64_t
> [2]: https://arrow.apache.org/docs/cpp/api/table.html#_CPPv4N5arrow11RecordBatch4MakeENSt10shared_ptrI6SchemaEE7int64_tNSt6vectorINSt10shared_ptrI5ArrayEEEE
> [3]: https://arrow.apache.org/docs/cpp/api/table.html#_CPPv4NK5arrow11RecordBatch6columnEi
> [4]: https://arrow.apache.org/docs/cpp/arrays.html#building-an-array
> [5]: https://arrow.apache.org/docs/cpp/api/table.html#_CPPv4N5arrow5Table17FromRecordBatchesENSt10shared_ptrI6SchemaEERKNSt6vectorINSt10shared_ptrI11RecordBatchEEEE
> [6]: https://arrow.apache.org/docs/cpp/api/table.html#_CPPv4NK5arrow5Table13CombineChunksEP10MemoryPool
> [7]: https://arrow.apache.org/docs/cpp/api/table.html#_CPPv4N5arrow16TableBatchReaderE
> [8]: https://arrow.apache.org/docs/cpp/compute.html#selections
>
> # ------------------------------
> # Aldrin
>
> https://github.com/drin/
> https://gitlab.com/octalene
> https://keybase.io/octalene
>
>
> On Wednesday, November 22nd, 2023 at 10:58, Jacek Pliszka <jacek.plis...@gmail.com> wrote:
>
> > Hi!
> >
> > I think some code is needed for clarity. You can concatenate tables (and
> > combine_chunks afterwards) or arrays, then pass the concatenated one.
> >
> > Regards,
> >
> > Jacek
> >
> > On Wed, 22 Nov 2023 at 19:54, Lee, David (PAG) <david....@blackrock.com.invalid> wrote:
> >
> > > I've got 36 million rows of data which end up as 3,000 record batches
> > > ranging from 12k to 300k rows each. I'm assuming these batches are
> > > created by the multithreaded CSV file reader.
> > >
> > > Is there any way to reorganize the data into something like 36 batches
> > > of 1 million rows each?
> > >
> > > What I'm seeing when we try to load this data using the ADBC Snowflake
> > > driver is that each individual batch is executed as a bind array insert
> > > in the Snowflake Go driver. 3,000 bind array inserts is taking 3 hours.
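
For completeness, a minimal sketch of option 3 above with the Arrow C++ API: wrap the small batches as a Table [5], combine the chunks [6], and read them back out through a TableBatchReader [7]. The function name and the 1-million-row target are illustrative, and error handling is abbreviated:

    // Sketch of option 3: coalesce many small RecordBatches into larger ones.
    // "small_batches" stands in for the ~3,000 batches from the CSV reader.
    #include <arrow/api.h>

    #include <memory>
    #include <vector>

    arrow::Result<std::vector<std::shared_ptr<arrow::RecordBatch>>> Rebatch(
        const std::vector<std::shared_ptr<arrow::RecordBatch>>& small_batches,
        int64_t rows_per_batch) {
      // Wrap the batches as a Table, then merge each column's chunks into
      // contiguous arrays (this step copies the data).
      ARROW_ASSIGN_OR_RAISE(auto table,
                            arrow::Table::FromRecordBatches(small_batches));
      ARROW_ASSIGN_OR_RAISE(table, table->CombineChunks());

      // Read the combined table back out as batches of the desired size.
      arrow::TableBatchReader reader(*table);
      reader.set_chunksize(rows_per_batch);

      std::vector<std::shared_ptr<arrow::RecordBatch>> out;
      std::shared_ptr<arrow::RecordBatch> batch;
      while (true) {
        ARROW_RETURN_NOT_OK(reader.ReadNext(&batch));
        if (batch == nullptr) break;  // end of table
        out.push_back(std::move(batch));
      }
      return out;
    }

    // e.g. Rebatch(batches, 1000000) should yield roughly 36 batches of
    // about 1 million rows for the 36-million-row input described above.

Note that CombineChunks allocates new contiguous buffers for each column, so expect roughly an extra copy of the data in memory while it runs; as far as I can tell, set_chunksize on its own only caps the batch size at existing chunk boundaries, which is why the CombineChunks step is needed to actually get larger batches.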