Hi Ryan,

Generally we add createTime and modifiedTime columns to the table schema. However, for historical reasons, some Hive tables do not have createTime and modifiedTime. When these Hive tables are converted to Iceberg tables, we hope createTime and transient_lastDdlTime can be retained, so that we can still expire data and track table activity. Once snapshots expire, we cannot get this time information from Iceberg. It seems the only way to solve the problem I mentioned above is to modify the schema of these Hive tables and rewrite them. Do you agree?

Thanks


On 01/29/2021 02:40, Ryan Blue <rb...@netflix.com.INVALID> wrote:
Chong,

Once snapshots expire, I don't think that there is a way to recover the time that a given partition was created.

Can you explain more about what you're trying to do? When we age off data, we use the age of the records themselves, not the age from metadata. In other words, we use the logical timestamp from a row to expire it, not the timestamp when it was added to the table. You might consider doing that as well. I think it is probably a better way to ensure compliance.
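
As a minimal sketch of the record-age approach described above, the cutoff is computed from a retention period and compared against a timestamp column in each row. The column name `event_time`, the table name `db.events`, and the 90-day retention are hypothetical, and the generated statement assumes an engine that supports row-level `DELETE FROM` on the table (for example Spark SQL):

```python
from datetime import datetime, timedelta, timezone


def expiration_cutoff(now, retention):
    """Rows whose logical timestamp is older than this instant are expired."""
    return now - retention


def split_expired(rows, cutoff, ts_field="event_time"):
    """Partition rows by their own timestamp, not by when they were written."""
    expired = [r for r in rows if r[ts_field] < cutoff]
    kept = [r for r in rows if r[ts_field] >= cutoff]
    return expired, kept


def delete_statement(table, cutoff, ts_field="event_time"):
    """Build the equivalent row-level delete (table and column are hypothetical)."""
    return (
        f"DELETE FROM {table} "
        f"WHERE {ts_field} < TIMESTAMP '{cutoff:%Y-%m-%d %H:%M:%S}'"
    )


# Hypothetical sample data: one old record and one recent record.
now = datetime(2021, 1, 29, tzinfo=timezone.utc)
cutoff = expiration_cutoff(now, timedelta(days=90))
rows = [
    {"id": 1, "event_time": datetime(2020, 6, 1, tzinfo=timezone.utc)},
    {"id": 2, "event_time": datetime(2021, 1, 15, tzinfo=timezone.utc)},
]
expired, kept = split_expired(rows, cutoff)
sql = delete_statement("db.events", cutoff)
```

Because the decision depends only on the data in each row, it keeps working after snapshots expire or after a table is migrated from Hive.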

rb

On Thu, Jan 28, 2021 at 9:42 AM chong luo <luochong....@gmail.com> wrote:
Hi Iceberg Devs


I'm currently working on deleting expired tables and partitions in Iceberg. However, I cannot find the table or partition creation time; it seems Iceberg only stores the snapshot creation time. In Hive, transient_lastDdlTime, createTime, and lastAccessTime are stored in the metastore. With this time metadata, we can tell when a table changed and track the related jobs.
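
To illustrate what the snapshot timestamps can and cannot provide, here is a minimal sketch that reads time information from a table metadata document already loaded into a Python dict. The field names `last-updated-ms`, `snapshots`, and `timestamp-ms` follow the Iceberg table metadata format as I understand it; the sample values are made up, and the helper names are hypothetical:

```python
from datetime import datetime, timezone


def ms_to_utc(ms):
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


def time_info(metadata):
    """Collect the time information that the retained metadata still carries."""
    stamps = [s["timestamp-ms"] for s in metadata.get("snapshots", [])]
    return {
        # Closest analogue of a "last modified" time for the table.
        "last_updated": ms_to_utc(metadata["last-updated-ms"]),
        # Only a lower bound on retained history, NOT the table creation time:
        # older snapshots may already have been expired.
        "oldest_retained_snapshot": ms_to_utc(min(stamps)) if stamps else None,
        "newest_snapshot": ms_to_utc(max(stamps)) if stamps else None,
    }


# Made-up metadata fragment for illustration only.
sample = {
    "last-updated-ms": 1611800000000,
    "snapshots": [
        {"snapshot-id": 1, "timestamp-ms": 1611700000000},
        {"snapshot-id": 2, "timestamp-ms": 1611800000000},
    ],
}
info = time_info(sample)
```

The oldest retained snapshot moves forward each time snapshots are expired, so it cannot stand in for Hive's createTime.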


Is there any way to get the time metadata mentioned above in the current implementation of Iceberg?



Thanks.


--
Ryan Blue
Software Engineer
Netflix
