Hi Chen,

Iceberg's API requires that the caller divide data correctly into files according to the partition spec. Most of the time, users interact with Iceberg through a processing engine like Spark or Presto that does this for you. If you're using the API directly, then you'll need to ensure you partition the rows into data files yourself and pass the correct partition tuples when appending those files to the table.
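A minimal sketch of the grouping step described above, for an `hour(time)` spec: rows must be bucketed by hour before writing, with one data file per hour. The `Row` type and `groupByHour` helper here are hypothetical stand-ins; real code would use Iceberg `Record` objects and open one `FileAppender` per group.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class PartitionByHour {
    // Hypothetical stand-in for an Iceberg Record with a "time" column.
    record Row(Instant time, String payload) {}

    // Group rows by their hour, mirroring an hour(time) partition spec.
    // Each group would be written to its own data file, and the hour value
    // becomes the partition tuple passed when appending that file.
    static Map<Instant, List<Row>> groupByHour(List<Row> rows) {
        return rows.stream().collect(
            Collectors.groupingBy(r -> r.time().truncatedTo(ChronoUnit.HOURS)));
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
            new Row(Instant.parse("2020-07-02T09:15:00Z"), "a"),
            new Row(Instant.parse("2020-07-02T09:45:00Z"), "b"),
            new Row(Instant.parse("2020-07-02T10:05:00Z"), "c"));
        // Two distinct hours -> two groups -> two data files to append.
        System.out.println(groupByHour(rows).size()); // prints 2
    }
}
```

Each resulting group maps to one data file, whose partition tuple carries that group's hour.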
The core API is mainly intended for use by the processing engines, but we're expanding support in the `iceberg-data` module for people who want to interact directly. There are probably some things we could do to make this easier, especially when partitioning data. If you have suggestions, please feel free to open an issue or pull request.

rb

On Thu, Jul 2, 2020 at 9:19 AM Chen Song <chen.song...@gmail.com> wrote:

> I have a question on how hidden partitioning works in Iceberg using the
> Java API. The code is something like the following.
>
> ```
> // records is the list of records with a time column
> // table is created using partition spec hour(time)
> // records have rows with different hours
>
> Table table = loadTable();
>
> Path path = new Path(...);
> FileAppender<Record> appender = Avro.write(fromPath(path, conf)).build();
> appender.addAll(records);
> appender.close();
>
> DataFile dataFile = DataFiles.builder(table.spec())
>     .withInputFile(HadoopInputFile.fromPath(path, conf))
>     .build();
>
> table.newAppend().appendFile(dataFile).commit();
> ```
>
> However, once committed, I still see only one partition count updated and
> one data file persisted, even though the underlying records spread across
> different hours.
>
> I think I am using the API the wrong way, but I would appreciate it if
> someone can show me the right way to write partitioned data.
>
> Thanks,
> --
> Chen Song

--
Ryan Blue
Software Engineer
Netflix