Yes, that sounds correct. The problem that you're talking about with small
files is a major reason to use a processing framework to write (or later
rewrite) the data.
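For example, with the Spark actions module, compacting small files after the fact looks roughly like this. This is only a sketch: the exact entry point and options vary across Iceberg versions, and it needs an active SparkSession on the cluster doing the rewrite.

import org.apache.iceberg.Table;
import org.apache.iceberg.actions.Actions;

public class CompactSmallFiles {
  // Rewrites small data files into larger ones using Spark.
  public static void rewriteSmallFiles(Table table) {
    Actions.forTable(table)                    // requires an active SparkSession
        .rewriteDataFiles()
        .targetSizeInBytes(512L * 1024 * 1024) // aim for roughly 512 MB output files
        .execute();
  }
}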
On Mon, Jul 6, 2020 at 8:34 AM Chen Song wrote:
Thanks for the clarification, Ryan.
In its simplest form, if using Parquet, it is possible to add a layer on
top of the existing GenericParquetWriter that splits a list of records into
data files based on the table's partition spec.
One key feature that processing engines like Spark provide, though, is
distributing and sizing those writes; going through the API directly makes
it easy to end up with lots of small files.
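Roughly the kind of layer I have in mind, as a sketch only. The PartitionKey, InternalRecordWrapper, and LocationProvider calls are my reading of the core library; the one-file-per-partition approach and the UUID file naming are simplifications, not how a real writer would roll files.

import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

import org.apache.iceberg.DataFile;
import org.apache.iceberg.DataFiles;
import org.apache.iceberg.FileFormat;
import org.apache.iceberg.PartitionKey;
import org.apache.iceberg.Table;
import org.apache.iceberg.data.InternalRecordWrapper;
import org.apache.iceberg.data.Record;
import org.apache.iceberg.data.parquet.GenericParquetWriter;
import org.apache.iceberg.io.FileAppender;
import org.apache.iceberg.io.OutputFile;
import org.apache.iceberg.parquet.Parquet;

public class PartitionedRecordWriter {

  // Groups records by partition key, writes one Parquet file per partition,
  // and returns the DataFile metadata to append to the table in one commit.
  public static List<DataFile> write(Table table, Iterable<Record> records) throws IOException {
    PartitionKey key = new PartitionKey(table.spec(), table.schema());
    InternalRecordWrapper wrapper = new InternalRecordWrapper(table.schema().asStruct());

    // Bucket the incoming records by their partition tuple.
    Map<PartitionKey, List<Record>> grouped = new HashMap<>();
    for (Record record : records) {
      key.partition(wrapper.wrap(record));
      grouped.computeIfAbsent(key.copy(), k -> new ArrayList<>()).add(record);
    }

    List<DataFile> dataFiles = new ArrayList<>();
    for (Map.Entry<PartitionKey, List<Record>> entry : grouped.entrySet()) {
      // Simplified file naming; a real writer would also roll files at a target size.
      String path = table.locationProvider().newDataLocation(
          table.spec(), entry.getKey(), UUID.randomUUID() + ".parquet");
      OutputFile outputFile = table.io().newOutputFile(path);

      FileAppender<Record> appender = Parquet.write(outputFile)
          .schema(table.schema())
          .createWriterFunc(GenericParquetWriter::buildWriter)
          .build();
      try {
        appender.addAll(entry.getValue());
      } finally {
        appender.close();
      }

      dataFiles.add(DataFiles.builder(table.spec())
          .withPath(path)
          .withFormat(FileFormat.PARQUET)
          .withPartition(entry.getKey())
          .withFileSizeInBytes(appender.length())
          .withMetrics(appender.metrics())
          .build());
    }
    return dataFiles;
  }
}

The returned DataFiles would then be appended to the table in a single commit.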
Hi Chen,
Iceberg's API requires that the caller divides data correctly into files
according to the partition spec. Most of the time, users interact with
Iceberg using a processing engine like Spark or Presto that will do it for
you. If you're using the API directly, then you'll need to ensure you
divide the records into files by partition yourself before appending them
to the table.
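For example, when adding a file through the API, its partition is declared when the DataFile metadata is built, and it has to match what was actually written into the file. This is a minimal sketch with a placeholder path, partition value, size, and record count:

import org.apache.iceberg.DataFile;
import org.apache.iceberg.DataFiles;
import org.apache.iceberg.FileFormat;
import org.apache.iceberg.Table;

public class AppendExample {
  // Registers a file written for a single partition and commits it to the table.
  public static void appendOneFile(Table table) {
    DataFile dataFile = DataFiles.builder(table.spec())
        .withPath("s3://bucket/warehouse/db/tbl/data/event_date=2020-07-06/part-00000.parquet")
        .withFormat(FileFormat.PARQUET)
        .withPartitionPath("event_date=2020-07-06") // must match the file's actual contents
        .withFileSizeInBytes(64L * 1024 * 1024)
        .withRecordCount(1_000_000L)
        .build();

    table.newAppend()
        .appendFile(dataFile)
        .commit();
  }
}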