[ https://issues.apache.org/jira/browse/HIVE-27903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17790127#comment-17790127 ]
JK Pasimuthu commented on HIVE-27903: ------------------------------------- The RETAIN LAST is count based. The use case involving max-sanpshot-age is time based. So the use cases are different. The documentation about older_than and retain_last is specific for Spark procedure expire_snapshots. I don't think we have a similar procedure for hive. Is there a way one can specify ALTER TABLE .... expire_snapshot without any arguments against hive? > TBLPROPERTIES('history.expire.max-snapshot-age-ms') doesn't work > ---------------------------------------------------------------- > > Key: HIVE-27903 > URL: https://issues.apache.org/jira/browse/HIVE-27903 > Project: Hive > Issue Type: Improvement > Components: Hive > Affects Versions: 4.0.0-alpha-2 > Reporter: JK Pasimuthu > Priority: Major > > [https://github.com/apache/iceberg/issues/9123] > The 'history.expire.max-snapshot-age-ms' option doesn't have any effect while > expiring snapshots. > # > CREATE TABLE IF NOT EXISTS test5d78b6 ( > id INT, random1 STRING > ) > PARTITIONED BY (random2 STRING) > STORED BY ICEBERG > TBLPROPERTIES ( > 'write.format.default'='orc', > 'format-version'='2', > 'write.orc.compression-codec'='none' > ) > # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, > uuid(), uuid() FROM test5d78b6 > # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, > uuid(), uuid() FROM test5d78b6 > # SLEEP for 30 seconds > # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, > uuid(), uuid() FROM test5d78b6 > # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, > uuid(), uuid() FROM test5d78b6 > # SELECT (UNIX_TIMESTAMP(CURRENT_TIMESTAMP) - UNIX_TIMESTAMP('2023-10-09 > 13:23:54.455')) * 1000; > # ALTER TABLE test5d78b6 SET > tblproperties('history.expire.max-snapshot-age-ms'='54000'); - the elapsed > time in ms from the second insert and current time > # ALTER TABLE test5d78b6 EXECUTE expire_snapshots('2200-10-10'); > # SELECT COUNT FROM default.test5d78b6.snapshots; > output: 1. it should be 2 rows. The default 1 is retained an all snapshots > are expired as usual, so setting the property has no effect. > Additional Info: the default value for 'history.expire.max-snapshot-age-ms' > is 5 days per this link: > [https://iceberg.apache.org/docs/1.3.1/configuration/] > Now while writing the tests and running them, the expiring snapshots just > worked fine within few seconds of the snapshots being created. > So, I'm assuming that this option doesn't have any effect right now. Having > said that, I'm thinking the implications on end user will have if we fix this. > The end user may not know about this option at all and will have tough time > figuring out why the snapshots are not getting expired. One option could be > to set the default to 0ms. -- This message was sent by Atlassian Jira (v8.20.10#820010)