Sent: Monday, November 27, 2023 8:20 AM
To: dev@arrow.apache.org
Subject: Re: Is there anyway to resize record batches
Following up here, Dewey pointed out that the "right" way to do this would be
to use Snowflake's own bulk ingestion support.
> new_table = pa.Table.from_batches(new_batches)
>
> cursor = adbc_conn.cursor()
> cursor.adbc_ingest(table_name="xyz", data=new_table, mode="append")
> cursor.execute("commit")
>
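(A minimal end-to-end sketch of the ingestion call above, assuming the
adbc_driver_snowflake Python bindings; the connection DSN, the table name
"xyz", and new_batches are placeholders.)

import pyarrow as pa
import adbc_driver_snowflake.dbapi

# Placeholder DSN: substitute your own Snowflake account, credentials, and schema.
adbc_conn = adbc_driver_snowflake.dbapi.connect("user:password@account/database/schema")

# new_batches: a list of pyarrow.RecordBatch objects sharing one schema.
new_table = pa.Table.from_batches(new_batches)

cursor = adbc_conn.cursor()
# Bulk-load the whole table through the driver's ingestion path, then commit.
cursor.adbc_ingest(table_name="xyz", data=new_table, mode="append")
cursor.execute("commit")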
-----Original Message-----
From: Aldrin
Sent: Wednesday, November 22, 2023 12:36 PM
To: dev@arrow.apache.org
Subject: Re: Is there anyway to resize record batches
As far as I understand, that bundles the Arrays into a ChunkedArray which only
Table interacts with. It doesn't make a longer Array and depending on what the
ADBC Snowflake driver is doing that may or may not help with the number of
invocations that are happening.
Also, it's not portable across
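(A small pyarrow sketch of the distinction being made here; the array contents
are made up.)

import pyarrow as pa

a = pa.array([1, 2, 3])
b = pa.array([4, 5, 6])

# A ChunkedArray only bundles the two Arrays; the underlying buffers stay separate.
chunked = pa.chunked_array([a, b])
print(chunked.num_chunks)       # 2

# Concatenating copies the data into a single, longer Array.
combined = pa.concat_arrays([a, b])
print(len(combined))            # 6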
Re 4. you create a ChunkedArray from an Array.
BR
J
On Wed, Nov 22, 2023 at 20:48 Aldrin wrote:
Assuming the C++ implementation, Jacek's suggestion (#3 below) is probably
best. Here is some extra context:
1. You can slice larger RecordBatches [1]
2. You can make a larger RecordBatch [2] from columns of smaller RecordBatches
[3] probably using the correct type of Builder [4] and with a bit
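(A rough pyarrow counterpart of points 1 and 2 above; Aldrin is pointing at the
C++ APIs, and the batch list and target size below are placeholders.)

import pyarrow as pa

# small_batches: a list of pyarrow.RecordBatch objects sharing one schema (placeholder).
schema = small_batches[0].schema

# Point 2: build one large RecordBatch by concatenating each column across the batches.
columns = [
    pa.concat_arrays([batch.column(i) for batch in small_batches])
    for i in range(len(schema))
]
big_batch = pa.RecordBatch.from_arrays(columns, schema=schema)

# Point 1: slice the large batch into evenly sized pieces (slice clamps at the end).
target = 1_000_000
resized = [big_batch.slice(offset, target) for offset in range(0, big_batch.num_rows, target)]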
Hi!
I think some code is needed for clarity. You can concatenate tables (and
combine_chunks afterwards) or arrays, then pass the concatenated result.
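(For example, a sketch of the above; concat_tables, combine_chunks, and
concat_arrays are standard pyarrow calls, and the variables are placeholders.)

import pyarrow as pa

# t1, t2: pyarrow.Table objects with the same schema (placeholders).
combined = pa.concat_tables([t1, t2])   # cheap: existing chunks are kept as-is
combined = combined.combine_chunks()    # copies data so each column has one chunk

# Or at the Array level:
one_array = pa.concat_arrays([arr1, arr2])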
Regards,
Jacek
On Wed, Nov 22, 2023 at 19:54 Lee, David (PAG) wrote:
I've got 36 million rows of data which ends up as 3000 record batches ranging
from 12k to 300k rows each. I'm assuming these batches are created using the
multithreaded CSV file reader.
Is there any way to reorg the data into something like 36 batches consisting of 1
million rows each?
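(For reference, a way to see the layout described above, assuming the data was
loaded with pyarrow's CSV reader; the file path is a placeholder.)

import pyarrow.csv as pv

table = pv.read_csv("data.csv")              # multithreaded CSV reader, placeholder path
batches = table.to_batches()                 # the existing chunks, as sized by the reader
print(len(batches))                          # e.g. ~3000
print([b.num_rows for b in batches[:5]])     # row counts vary per batch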