RE: Getting last modified timestamp/other stats per partition

2022-03-08 Thread Mayur Srivastava
has a different commit time). Will we be able to store additional stats, e.g. commit times, per data file or partition in the tagged snapshot? From: Szehon Ho Sent: Monday, March 7, 2022 1:40 PM To: Iceberg Dev List Subject: Re: Getting last modified timestamp/other stats per partition 2

Re: Getting last modified timestamp/other stats per partition

2022-03-07 Thread Szehon Ho
ome recommendation on the amount of history for >> snapshots. >> >> 2. How can we distinguish between snapshots where new data was >> added vs snapshots where compaction was done? >> >> >> >> Thanks, >> >> Mayur >> >>

Re: Getting last modified timestamp/other stats per partition

2022-03-07 Thread Ryan Blue
> > *From:* Mayur Srivastava > *Sent:* Thursday, February 24, 2022 7:27 AM > *To:* dev@iceberg.apache.org > *Subject:* RE: Getting last modified timestamp/other stats per partition > > > > Thanks Szehon. I’ll give this a try. > > > > *From:* Szehon Ho > *Se

RE: Getting last modified timestamp/other stats per partition

2022-03-07 Thread Mayur Srivastava
data was added vs snapshots where compaction was done? Thanks, Mayur From: Mayur Srivastava Sent: Thursday, February 24, 2022 7:27 AM To: dev@iceberg.apache.org Subject: RE: Getting last modified timestamp/other stats per partition Thanks Szehon. I’ll give this a try. From: Szehon Ho

RE: Getting last modified timestamp/other stats per partition

2022-02-24 Thread Mayur Srivastava
Thanks Szehon. I’ll give this a try. From: Szehon Ho Sent: Wednesday, February 23, 2022 1:38 PM To: Iceberg Dev List Subject: Re: Getting last modified timestamp/other stats per partition Hi Probably the metadata tables can help with this. For the size/num_rows of partitions, you can query

Re: Getting last modified timestamp/other stats per partition

2022-02-23 Thread Szehon Ho
Hi Probably the metadata tables can help with this. For the size/num_rows of partitions, you can query the files table, https://iceberg.apache.org/docs/latest/spark-queries/#files (because Iceberg keeps stats for files, and not necessarily for partitions). SELECT partition, sum(file_size_in_bytes),
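
The query in the message above is cut off in this archive, so the following is only a rough sketch of the idea it describes: summing per-file stats to get per-partition totals. It does not reconstruct the original query. It uses an in-memory SQLite table as a hypothetical stand-in for the Iceberg files metadata table, with made-up rows; the column names partition, record_count and file_size_in_bytes are assumed to match the files table, and in Spark the FROM clause would name the table's files metadata table rather than a local table.

```python
import sqlite3

# Hypothetical stand-in for an Iceberg "files" metadata table.
# The rows below are invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE files ('
    '"partition" TEXT, record_count INTEGER, file_size_in_bytes INTEGER)'
)
conn.executemany(
    "INSERT INTO files VALUES (?, ?, ?)",
    [
        ("2022-02-23", 100, 1024),
        ("2022-02-23", 50, 512),
        ("2022-02-24", 200, 2048),
    ],
)

# Stats are kept per data file, so per-partition numbers come from
# grouping the file rows by their partition value.
QUERY = """
SELECT "partition",
       SUM(file_size_in_bytes) AS total_bytes,
       SUM(record_count)       AS total_rows,
       COUNT(*)                AS file_count
FROM files
GROUP BY "partition"
ORDER BY "partition"
"""

stats = conn.execute(QUERY).fetchall()
for partition, total_bytes, total_rows, file_count in stats:
    print(partition, total_bytes, total_rows, file_count)
```

The grouping step is the point here: because stats live at the file level, anything reported per partition has to be derived by aggregation.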