[ 
https://issues.apache.org/jira/browse/HIVE-27903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17790127#comment-17790127
 ] 

JK Pasimuthu commented on HIVE-27903:
-------------------------------------

The RETAIN LAST is count based. The use case involving max-sanpshot-age is time 
based. So the use cases are different. 

The documentation about older_than and retain_last is specific for Spark 
procedure expire_snapshots. I don't think we have a similar procedure for hive. 

Is there a way one can specify ALTER TABLE .... expire_snapshot without any 
arguments against hive? 

> TBLPROPERTIES('history.expire.max-snapshot-age-ms') doesn't work
> ----------------------------------------------------------------
>
>                 Key: HIVE-27903
>                 URL: https://issues.apache.org/jira/browse/HIVE-27903
>             Project: Hive
>          Issue Type: Improvement
>          Components: Hive
>    Affects Versions: 4.0.0-alpha-2
>            Reporter: JK Pasimuthu
>            Priority: Major
>
> [https://github.com/apache/iceberg/issues/9123]
> The 'history.expire.max-snapshot-age-ms' option doesn't have any effect while 
> expiring snapshots.
>  #  
> CREATE TABLE IF NOT EXISTS test5d78b6 (
> id INT, random1 STRING
> )
> PARTITIONED BY (random2 STRING)
> STORED BY ICEBERG
> TBLPROPERTIES (
> 'write.format.default'='orc',
> 'format-version'='2',
> 'write.orc.compression-codec'='none'
> )
>  # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, 
> uuid(), uuid() FROM test5d78b6
>  # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, 
> uuid(), uuid() FROM test5d78b6
>  # SLEEP for 30 seconds
>  # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, 
> uuid(), uuid() FROM test5d78b6
>  # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, 
> uuid(), uuid() FROM test5d78b6
>  # SELECT (UNIX_TIMESTAMP(CURRENT_TIMESTAMP) - UNIX_TIMESTAMP('2023-10-09 
> 13:23:54.455')) * 1000;
>  # ALTER TABLE test5d78b6 SET 
> tblproperties('history.expire.max-snapshot-age-ms'='54000'); - the elapsed 
> time in ms from the second insert and current time
>  # ALTER TABLE test5d78b6 EXECUTE expire_snapshots('2200-10-10');
>  # SELECT COUNT FROM default.test5d78b6.snapshots;
> output: 1. it should be 2 rows. The default 1 is retained an all snapshots 
> are expired as usual, so setting the property has no effect.
> Additional Info: the default value for 'history.expire.max-snapshot-age-ms' 
> is 5 days per this link: 
> [https://iceberg.apache.org/docs/1.3.1/configuration/]
> Now while writing the tests and running them, the expiring snapshots just 
> worked fine within few seconds of the snapshots being created.
> So, I'm assuming that this option doesn't have any effect right now. Having 
> said that, I'm thinking the implications on end user will have if we fix this.
> The end user may not know about this option at all and will have tough time 
> figuring out why the snapshots are not getting expired. One option could be 
> to set the default to 0ms.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to