RE: Is there anyway to resize record batches

2023-11-27 Thread Lee, David (PAG)
ber 27, 2023 8:20 AM To: dev@arrow.apache.org Subject: Re: Is there anyway to resize record batches Following up here, Dewey pointed out that the "right" way to do this would be to use Snowflake's own bulk ingestion support.

Re: Is there anyway to resize record batches

2023-11-27 Thread David Li
> new_table = pa.Table.from_batches(new_batches)
>
> cursor = adbc_conn.cursor()
> cursor.adbc_ingest(table_name="xyz", data=new_table, mode="append")
> cursor.execute("commit")
>
> -Original Message-
> From: Aldrin
> Sent: Wednesday, November 22, 20

RE: Is there anyway to resize record batches

2023-11-22 Thread Lee, David (PAG)
append")
cursor.execute("commit")
-Original Message-
From: Aldrin
Sent: Wednesday, November 22, 2023 12:36 PM
To: dev@arrow.apache.org
Subject: Re: Is there anyway to resize record batches
As far as I understand, that bundles the Arrays into a ChunkedArray which only Table inte

Re: Is there anyway to resize record batches

2023-11-22 Thread Aldrin
As far as I understand, that bundles the Arrays into a ChunkedArray which only Table interacts with. It doesn't make a longer Array, and depending on what the ADBC Snowflake driver is doing, that may or may not help with the number of invocations that are happening. Also, it's not portable across

Re: Is there anyway to resize record batches

2023-11-22 Thread Jacek Pliszka
Re 4. you create ChunkedArray from Array.
BR J
On Wed, 22 Nov 2023 at 20:48, Aldrin wrote:
> Assuming the C++ implementation, Jacek's suggestion (#3 below) is probably
> best. Here is some extra context:
>
> 1. You can slice larger RecordBatches [1]
> 2. You can make a larger RecordBatch [2] f

Re: Is there anyway to resize record batches

2023-11-22 Thread Aldrin
Assuming the C++ implementation, Jacek's suggestion (#3 below) is probably best. Here is some extra context:
1. You can slice larger RecordBatches [1]
2. You can make a larger RecordBatch [2] from columns of smaller RecordBatches [3] probably using the correct type of Builder [4] and with a bit

Re: Is there anyway to resize record batches

2023-11-22 Thread Jacek Pliszka
Hi! I think some code is needed for clarity. You can concatenate tables (and combine_chunks afterwards) or arrays. Then pass the concatenated one.
Regards, Jacek
On Wed, 22 Nov 2023 at 19:54, Lee, David (PAG) wrote:
> I've got 36 million rows of data which ends up as a record batch with 300

Is there anyway to resize record batches

2023-11-22 Thread Lee, David (PAG)
I've got 36 million rows of data which ends up as a record batch with 3000 batches ranging from 12k to 300k rows each. I'm assuming these batches are created using the multithreaded CSV file reader. Is there any way to reorganize the data into something like 36 batches consisting of 1 million rows e