Re: Merge multiple record batches

Wes McKinney Mon, 19 Feb 2018 14:42:59 -0800

The function is just pyarrow.concat_tables. It's missing from the API
reference and ought to have a small section in the documentation.
Patches welcomed


https://issues.apache.org/jira/browse/ARROW-2181

On Mon, Feb 19, 2018 at 5:04 PM, Bryan Cutler <cutl...@gmail.com> wrote:
> Hi Rares,
>
> I'm not sure what version of Arrow you are using, but pyarrow.Table has a
> function to concat multiple tables together so the usage would be something
> like this:
>
> table_all = pa.Table.concat_tables([table1, table2])
>
> On Wed, Feb 14, 2018 at 4:01 AM, ALBERTO Bocchinfuso <
> alberto_boc...@hotmail.it> wrote:
>
>> Hi,
>> I don’t think I fully understood your point, but I will try to give you the
>> answer that looks simplest to me.
>> Your code does not perform any operation on table 1 and table 2 separately;
>> it just looks like you want to merge all those RecordBatches.
>> Now I think that:
>>
>>   1.  You can use the to_batches() method listed in the API reference for
>> Table, though I have never tried it myself. That way you create two tables,
>> get the batches from those tables, and put the batches together.
>>   2.  I would rather store ALL the BATCHES from the two streams in the SAME
>> Python LIST, and then create a single table using from_batches(), as you
>> suggested. That’s because your code creates two tables even though
>> you don’t seem to care about them.
>>
>> I didn’t try it, but I think you can go both ways and then tell us whether
>> the result is the same and whether one of the two is faster than the other.
>>
>> Alberto
>>
>> From: Rares Vernica<mailto:rvern...@gmail.com>
>> Sent: Wednesday, 14 February 2018 05:13
>> To: dev@arrow.apache.org<mailto:dev@arrow.apache.org>
>> Subject: Merge multiple record batches
>>
>> Hi,
>>
>> If I have multiple RecordBatchStreamReader inputs, what is the recommended
>> way to get all the RecordBatches from all the inputs together, maybe in a
>> Table? They all have the same schema. The sources for the readers are
>> different files.
>>
>> So, I do something like:
>>
>> reader1 = pa.open_stream('foo')
>> table1 = reader1.read_all()
>>
>> reader2 = pa.open_stream('bar')
>> table2 = reader2.read_all()
>>
>> # table_all = ???
>> # OR maybe I don't need to create table1 and table2
>> # table_all = pa.Table.from_batches( ??? )
>>
>> Thanks!
>> Rares
>>
>>
