hi Masayuki,

I don't have direct experience using Arrow with Parquet in Java, but a
common approach is to choose a batch size (a number of logical rows)
and produce a sequence of Arrow record batches of that size from the
Parquet file.

We currently support only monolithic file and row group reads in C++
(https://github.com/apache/parquet-cpp/blob/master/src/parquet/arrow/reader.h)
but I suspect we will eventually need a "scanner" that yields a
sequence of evenly sized record batches (so that no individual chunk is
too large in memory). Such an interface could also be used in an
asynchronous data flow setting.
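
To make the batch-size idea concrete, here is a rough sketch in plain
Java of how the rows of a file could be planned into evenly sized
batches when the row groups have arbitrary sizes. It does not call any
Arrow or Parquet API: BatchPlanner, Slice and plan are hypothetical
names made up for illustration, and the row counts per row group are
assumed to have been read from the file metadata beforehand.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Arrow or Parquet: plans evenly sized
// batches over the row groups of a single file.
final class BatchPlanner {

    /** One contiguous run of rows taken from a single row group. */
    static final class Slice {
        final int rowGroup;  // index of the row group in the file
        final long offset;   // first row of the run within that row group
        final long length;   // number of rows in the run

        Slice(int rowGroup, long offset, long length) {
            this.rowGroup = rowGroup;
            this.offset = offset;
            this.length = length;
        }

        @Override
        public String toString() {
            return "Slice(rowGroup=" + rowGroup + ", offset=" + offset
                    + ", length=" + length + ")";
        }
    }

    /**
     * Splits all rows into batches of exactly batchSize rows, except
     * possibly the last one. A batch may span several row groups, so
     * each batch is returned as a list of slices.
     */
    static List<List<Slice>> plan(long[] rowGroupRowCounts, long batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<Slice>> batches = new ArrayList<>();
        List<Slice> current = new ArrayList<>();
        long room = batchSize; // rows still free in the current batch
        for (int g = 0; g < rowGroupRowCounts.length; g++) {
            long offset = 0;
            long remaining = rowGroupRowCounts[g];
            while (remaining > 0) {
                long take = Math.min(remaining, room);
                current.add(new Slice(g, offset, take));
                offset += take;
                remaining -= take;
                room -= take;
                if (room == 0) { // batch is full, start a new one
                    batches.add(current);
                    current = new ArrayList<>();
                    room = batchSize;
                }
            }
        }
        if (!current.isEmpty()) {
            batches.add(current); // trailing, possibly smaller batch
        }
        return batches;
    }

    public static void main(String[] args) {
        // Two row groups of 5 and 3 rows, planned into batches of 4 rows.
        for (List<Slice> batch : plan(new long[] {5, 3}, 4)) {
            System.out.println(batch);
        }
    }
}
```

A reader built on such a plan would fill one record batch per inner
list, so memory use is bounded by the batch size rather than by the
size of the largest row group.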

- Wes

On Sun, Jul 23, 2017 at 9:19 AM, Masayuki Takahashi
<masayuki...@gmail.com> wrote:
> Hi,
>
> I am trying to convert Parquet files to Arrow.
> https://gist.github.com/masayuki038/4be6c8538dfd4563a8d5ff743cf375ae
>
> And I have a question.
>
> When converting Parquet to Arrow, is it a good idea to create one
> Arrow VectorSchemaRoot for each Parquet RowGroup?
>
> thanks.
>
> 2017-07-21 5:19 GMT+09:00 Wes McKinney <wesmck...@gmail.com>:
>> hi Sven,
>>
>> There is a placeholder project in apache/parquet-mr
>> https://github.com/apache/parquet-mr/tree/master/parquet-arrow.
>>
>> It appears in the meantime that Dremio has created a vectorized
>> Parquet <-> Arrow reader/writer which has just been open sourced under
>> ASL 2.0: 
>> https://github.com/dremio/dremio-oss/tree/master/sabot/kernel/src/main/java/com/dremio/exec/store/parquet
>>
>> I am sure they are very busy right now, but it may be worth discussing
>> factoring out this Parquet <-> Arrow interface into a library
>> component that can be donated to Apache Parquet.
>>
>> - Wes
>>
>> On Wed, Jul 19, 2017 at 4:28 PM, Sven Wagner-Boysen
>> <sven.wagner-boy...@signavio.com> wrote:
>>> Hi,
>>>
>>> I started looking into the projects Parquet and Arrow. Looks very promising
>>> to me.
>>>
>>> I also came across PyArrow and the Parquet-Arrow integration in Python. Is
>>> there something similar available for Java?
>>>
>>> https://arrow.apache.org/docs/python/parquet.html
>>>
>>> Thanks
>>> Sven
>
>
>
> --
> 高橋 真之
