Re: Question on partitioning using Java API

2020-07-06 Thread Ryan Blue
Yes, that sounds correct. The problem that you're talking about with small files is a major reason to use a processing framework to write (or later rewrite) the data.
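
For illustration, one way to express the "later rewrite" step with the core Java API is Table.newRewrite(), which swaps a set of small data files for their compacted replacements in a single commit. This is only a sketch: producing the compacted file (re-reading the small files and re-writing their rows) is assumed to happen elsewhere, and a processing engine would normally drive the whole operation.

    import java.util.Set;

    import org.apache.iceberg.DataFile;
    import org.apache.iceberg.Table;

    public class CompactionSketch {
      // Replace a set of small data files with one compacted file in a single
      // atomic commit. Building `compacted` from the small files is assumed to
      // be done elsewhere (e.g. with a FileAppender, as in the sketch further down).
      static void replaceSmallFiles(Table table, Set<DataFile> smallFiles, DataFile compacted) {
        table.newRewrite()
            .rewriteFiles(smallFiles, Set.of(compacted)) // files to delete, files to add
            .commit();                                   // atomic swap in table metadata
      }
    }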

Re: Question on partitioning using Java API

2020-07-06 Thread Chen Song
Thanks for the clarification, Ryan. In its simplest form, if using Parquet, it is possible to add a layer on top of the existing GenericParquetWriter to distribute writing the list of records into data files based on the table's partition spec. One key feature that the processing engines like Spark …
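
To make the idea concrete, below is a minimal sketch of such a layer, assuming identity partitioning and generic Records. The PartitionKey/GenericParquetWriter/DataFiles calls follow the standard generic-data write path, but the in-memory grouping and file naming are only illustrative; a real writer would also roll files by target size and wrap records (e.g. with InternalRecordWrapper) to handle partition transforms.

    import java.io.IOException;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.UUID;

    import org.apache.iceberg.AppendFiles;
    import org.apache.iceberg.DataFile;
    import org.apache.iceberg.DataFiles;
    import org.apache.iceberg.FileFormat;
    import org.apache.iceberg.PartitionKey;
    import org.apache.iceberg.Table;
    import org.apache.iceberg.data.Record;
    import org.apache.iceberg.data.parquet.GenericParquetWriter;
    import org.apache.iceberg.io.FileAppender;
    import org.apache.iceberg.io.OutputFile;
    import org.apache.iceberg.parquet.Parquet;

    public class PartitionedWriteSketch {

      // Group records by partition, write one Parquet file per partition with the
      // generic writer, and commit all of the files in a single append.
      static void write(Table table, List<Record> records) throws IOException {
        Map<PartitionKey, FileAppender<Record>> appenders = new HashMap<>();
        Map<PartitionKey, String> paths = new HashMap<>();
        PartitionKey reusableKey = new PartitionKey(table.spec(), table.schema());

        for (Record record : records) {
          reusableKey.partition(record);                // identity partitioning assumed
          PartitionKey partition = reusableKey.copy();  // copy because the key is reused
          FileAppender<Record> appender = appenders.get(partition);
          if (appender == null) {
            // Hypothetical file layout; a real writer would use the table's
            // location provider and roll to new files by size.
            String path = table.location() + "/data/" + UUID.randomUUID() + ".parquet";
            OutputFile out = table.io().newOutputFile(path);
            appender = Parquet.write(out)
                .schema(table.schema())
                .createWriterFunc(GenericParquetWriter::buildWriter)
                .build();
            appenders.put(partition, appender);
            paths.put(partition, path);
          }
          appender.add(record);
        }

        // Close each file, describe it as a DataFile with its partition tuple and
        // metrics, and append everything in one commit.
        AppendFiles append = table.newAppend();
        for (Map.Entry<PartitionKey, FileAppender<Record>> entry : appenders.entrySet()) {
          FileAppender<Record> appender = entry.getValue();
          appender.close();
          DataFile file = DataFiles.builder(table.spec())
              .withPath(paths.get(entry.getKey()))
              .withFormat(FileFormat.PARQUET)
              .withPartition(entry.getKey())
              .withFileSizeInBytes(appender.length())
              .withMetrics(appender.metrics())
              .build();
          append.appendFile(file);
        }
        append.commit();
      }
    }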

Re: Question on partitioning using Java API

2020-07-02 Thread Ryan Blue
Hi Chen, Iceberg's API requires that the caller divides data correctly into files according to the partition spec. Most of the time, users interact with Iceberg using a processing engine like Spark or Presto that will do it for you. If you're using the API directly, then you'll need to ensure you …
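
For reference, the engine path described here looks roughly like the following from Java when Spark is used with the Iceberg runtime on the classpath; the input path and the table identifier db.table are placeholders, and a catalog that can resolve the table is assumed. Spark takes care of splitting the rows into data files that match the table's partition spec.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class EngineWriteSketch {
      public static void main(String[] args) {
        // Assumes the Iceberg runtime jar is on the classpath and the catalog is configured.
        SparkSession spark = SparkSession.builder()
            .appName("iceberg-append")
            .getOrCreate();

        Dataset<Row> df = spark.read().parquet("/path/to/input");  // placeholder input

        df.write()
            .format("iceberg")   // Iceberg's Spark source divides rows per the partition spec
            .mode("append")
            .save("db.table");   // placeholder table identifier
      }
    }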