The function is just pyarrow.concat_tables. It's missing from the API reference and ought to have a small section in the documentation. Patches welcome.
https://issues.apache.org/jira/browse/ARROW-2181

On Mon, Feb 19, 2018 at 5:04 PM, Bryan Cutler <cutl...@gmail.com> wrote:
> Hi Rares,
>
> I'm not sure what version of Arrow you are using, but pyarrow.Table has a
> function to concat multiple tables together so the usage would be something
> like this:
>
> table_all = pa.Table.concat_tables([table1, table2])
>
> On Wed, Feb 14, 2018 at 4:01 AM, ALBERTO Bocchinfuso <
> alberto_boc...@hotmail.it> wrote:
>
>> Hi,
>> I'm not sure I fully understood your point, but I'll try to give you the
>> answer that looks simplest to me.
>> Your code doesn't do anything with table 1 and table 2 separately; it
>> just looks like you want to merge all those RecordBatches.
>> So I think that:
>>
>> 1. You can use the to_batches() method listed in the API for
>> Table, but I've never tried it myself. This way you create 2 tables, create
>> batches from these tables, and put the batches together.
>> 2. I would rather store ALL the BATCHES from the two streams in the SAME
>> Python LIST, and then create a single table using from_batches() as you
>> suggested. That's because your code creates two tables even though
>> you don't seem to care about them.
>>
>> I haven't tried it, but I think you can go either way and then tell us if
>> the result is the same and if one of the two is faster than the other.
>>
>> Alberto
>>
>> From: Rares Vernica <rvern...@gmail.com>
>> Sent: Wednesday, February 14, 2018 05:13
>> To: dev@arrow.apache.org
>> Subject: Merge multiple record batches
>>
>> Hi,
>>
>> If I have multiple RecordBatchStreamReader inputs, what is the recommended
>> way to get all the RecordBatches from all the inputs together, maybe in a
>> Table? They all have the same schema. The sources for the readers are
>> different files.
>>
>> So, I do something like:
>>
>> reader1 = pa.open_stream('foo')
>> table1 = reader1.read_all()
>>
>> reader2 = pa.open_stream('bar')
>> table2 = reader2.read_all()
>>
>> # table_all = ???
>> # OR maybe I don't need to create table1 and table2
>> # table_all = pa.Table.from_batches( ??? )
>>
>> Thanks!
>> Rares