Re: Arrow Support in Parquet Writers

2020-07-08 Thread Ryan Blue
Yes, I think there is a reasonable path to an implementation that doesn't require the Iceberg API. While the first step is just getting it working, I think we could refactor and remove the dependency on iceberg-api and iceberg-parquet. Then we would have a module that can be used independently. Th

Re: Arrow Support in Parquet Writers

2020-07-07 Thread Chen Song
I agree that we should leverage the existing Java Parquet implementation as much as possible and, ideally, have Iceberg depend on that implementation/library through a very thin adapter/wrapper layer. Chen On Mon, Jul 6, 2020 at 5:54 PM Wes McKinney wrote: > Is there is a path to having an Arrow<->Parquet im

Re: Arrow Support in Parquet Writers

2020-07-06 Thread Wes McKinney
Is there a path to having an Arrow<->Parquet implementation in Java that does not have a hard dependency on Iceberg? This is a common ask, and it seems like it would be a clear community win that would net more contributors than something Iceberg-specific. On Mon, Jul 6, 2020 at 2:54 PM Ryan Blu

Re: Arrow Support in Parquet Writers

2020-07-06 Thread Ryan Blue
Sure, if you need an Arrow writer and want to work on it, we would be happy to include it in Iceberg. What is your use case? The main reason why we don't have one is that neither Presto nor Spark uses Arrow for writing. On Mon, Jul 6, 2020 at 9:04 AM Chen Song wrote: > I looked at the Iceberg D

Arrow Support in Parquet Writers

2020-07-06 Thread Chen Song
I looked at the Iceberg Data API and found that writes are row-based. Suppose I want to use a columnar data file format like Parquet and efficiently sink in-memory columnar data (like Arrow) into it. I assume it is not
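A minimal sketch of the mismatch described above, assuming a toy two-column batch: when the write path only accepts rows, a thin adapter has to transpose in-memory columnar data back into per-row objects before handing them to the writer. ColumnarBatch, Row, RowWriter and writeAll are hypothetical names chosen for illustration; they are not the Iceberg, Arrow, or parquet-mr APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: feeding a columnar batch to a row-based writer.
// None of these types are real Iceberg, Arrow, or Parquet classes.
public class ColumnarToRowAdapter {

    /** One logical record, materialized only because the writer consumes rows. */
    static final class Row {
        final int id;
        final String name;

        Row(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    /** Hypothetical row-oriented writer interface. */
    interface RowWriter {
        void write(Row row);
    }

    /** Toy columnar batch: each field is stored contiguously, as in Arrow. */
    static final class ColumnarBatch {
        final int[] ids;
        final String[] names;

        ColumnarBatch(int[] ids, String[] names) {
            if (ids.length != names.length) {
                throw new IllegalArgumentException("columns must have equal length");
            }
            this.ids = ids;
            this.names = names;
        }

        int rowCount() {
            return ids.length;
        }
    }

    /** Thin adapter: walks the batch row by row and hands each record to the writer. */
    static int writeAll(ColumnarBatch batch, RowWriter writer) {
        for (int i = 0; i < batch.rowCount(); i++) {
            // Gather one value from every column and allocate a Row per record;
            // this per-row transposition is the overhead a columnar writer would avoid.
            writer.write(new Row(batch.ids[i], batch.names[i]));
        }
        return batch.rowCount();
    }

    public static void main(String[] args) {
        ColumnarBatch batch = new ColumnarBatch(new int[] {1, 2, 3}, new String[] {"a", "b", "c"});
        List<String> written = new ArrayList<>();
        int n = writeAll(batch, row -> written.add(row.id + ":" + row.name));
        System.out.println(n + " rows: " + written);
    }
}
```

Running main prints `3 rows: [1:a, 2:b, 3:c]`. The loop is the cost in question: every record is rebuilt from the columns only to be split back into columns by a Parquet writer.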