Thanks Szehon. I’ll give this a try.

From: Szehon Ho <szehon.apa...@gmail.com>
Sent: Wednesday, February 23, 2022 1:38 PM
To: Iceberg Dev List <dev@iceberg.apache.org>
Subject: Re: Getting last modified timestamp/other stats per partition
Hi,

The metadata tables can probably help with this.

For the size and number of rows per partition, you can query the files table: https://iceberg.apache.org/docs/latest/spark-queries/#files (Iceberg keeps stats for files, not necessarily for partitions.)

    SELECT f.partition, sum(f.file_size_in_bytes), sum(f.record_count)
    FROM $my_table.files f
    GROUP BY f.partition

This will be the compressed size (again, Iceberg keeps file-level stats, so I am not sure there are any stats for uncompressed sizes).

For the last modified time, it will be slightly harder. The file's physical modified time is not good enough, because it is not exactly when the file was 'committed' into Iceberg. You may have to try a more advanced query on the snapshots table and the manifest entries table: https://iceberg.apache.org/docs/latest/spark-queries/#snapshots

    SELECT MAX(s.committed_at), e.data_file.partition
    FROM $my_table.snapshots s
    JOIN $my_table.entries e ON s.snapshot_id = e.snapshot_id
    GROUP BY e.data_file.partition

Hope that helps,
Szehon

On Wed, Feb 23, 2022 at 8:50 AM Mayur Srivastava <mayur.srivast...@twosigma.com> wrote:

Hi,

In Iceberg, is there a way to get the last modified timestamp and other stats (e.g. num rows, uncompressed size, compressed size) of the data per partition?

Thanks,
Mayur
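
For anyone who wants to check the shape of the two aggregations suggested above without a Spark session, here is a minimal sketch using Python's built-in sqlite3 module. It is only an illustration: the tables are hypothetical mock-ups of the Iceberg files, snapshots and entries metadata tables, the partition struct is flattened into a plain text column named part, and the sample rows are made up. The real queries would run through Spark against $my_table.files, $my_table.snapshots and $my_table.entries.

```python
import sqlite3


def build_mock_metadata():
    """Create in-memory stand-ins for the Iceberg metadata tables.

    Assumption: these schemas are simplified mock-ups, not the real
    Iceberg metadata table schemas. `part` replaces the partition struct.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE files (part TEXT, file_size_in_bytes INTEGER, record_count INTEGER);
        CREATE TABLE snapshots (snapshot_id INTEGER, committed_at TEXT);
        CREATE TABLE entries (snapshot_id INTEGER, part TEXT);

        INSERT INTO files VALUES
            ('2022-02-22', 100, 10),
            ('2022-02-22', 50, 5),
            ('2022-02-23', 200, 20);
        INSERT INTO snapshots VALUES
            (1, '2022-02-22 10:00:00'),
            (2, '2022-02-23 09:00:00'),
            (3, '2022-02-23 12:00:00');
        INSERT INTO entries VALUES
            (1, '2022-02-22'),
            (2, '2022-02-22'),
            (3, '2022-02-23');
        """
    )
    return conn


def partition_stats(conn):
    """Sum file sizes and record counts per partition (file-level stats rolled up)."""
    return conn.execute(
        """
        SELECT part, SUM(file_size_in_bytes), SUM(record_count)
        FROM files
        GROUP BY part
        ORDER BY part
        """
    ).fetchall()


def partition_last_modified(conn):
    """Latest commit time of any snapshot that wrote an entry for each partition."""
    return conn.execute(
        """
        SELECT e.part, MAX(s.committed_at)
        FROM snapshots s
        JOIN entries e ON s.snapshot_id = e.snapshot_id
        GROUP BY e.part
        ORDER BY e.part
        """
    ).fetchall()


if __name__ == "__main__":
    conn = build_mock_metadata()
    print(partition_stats(conn))
    print(partition_last_modified(conn))
```

The timestamps are stored as ISO-style strings here, so MAX compares them lexicographically; in Spark, committed_at is a proper timestamp column and no such trick is needed.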