JK Pasimuthu created HIVE-27903:
-----------------------------------

             Summary: TBLPROPERTIES('history.expire.max-snapshot-age-ms') 
doesn't work
                 Key: HIVE-27903
                 URL: https://issues.apache.org/jira/browse/HIVE-27903
             Project: Hive
          Issue Type: Improvement
          Components: Hive
    Affects Versions: 4.0.0-alpha-2
            Reporter: JK Pasimuthu


[https://github.com/apache/iceberg/issues/9123]

The 'history.expire.max-snapshot-age-ms' option doesn't have any effect while 
expiring snapshots.
 # CREATE TABLE IF NOT EXISTS test5d78b6 (
id INT, random1 STRING
)
PARTITIONED BY (random2 STRING)
STORED BY ICEBERG
TBLPROPERTIES (
'write.format.default'='orc',
'format-version'='2',
'write.orc.compression-codec'='none'
);

 # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, uuid(), 
uuid() FROM test5d78b6

 # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, uuid(), 
uuid() FROM test5d78b6

 # SLEEP for 30 seconds

 # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, uuid(), 
uuid() FROM test5d78b6

 # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, uuid(), 
uuid() FROM test5d78b6

 # SELECT (UNIX_TIMESTAMP(CURRENT_TIMESTAMP) - UNIX_TIMESTAMP('2023-10-09 13:23:54.455')) * 1000;

 # ALTER TABLE test5d78b6 SET tblproperties('history.expire.max-snapshot-age-ms'='54000');
(54000 is the elapsed time in ms between the second insert and the current 
time, as computed in the previous step)

 # ALTER TABLE test5d78b6 EXECUTE expire_snapshots('2200-10-10');

 # SELECT COUNT(*) FROM default.test5d78b6.snapshots;

Output: 1. It should be 2. Only the default minimum of 1 snapshot is retained 
and all other snapshots are expired as usual, so setting the property has no 
effect.
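
To make the expected count concrete, here is a minimal Python sketch of the 
retention rule I would expect when the age limit is honored. It is only a model 
of the expectation, not Hive or Iceberg code: the Snapshot class, the 
retained_snapshots function, and all timestamps are hypothetical, and the 
explicit '2200-10-10' argument is not modeled because it is later than every 
snapshot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    # Hypothetical stand-in for a row of the .snapshots metadata table.
    snapshot_id: int
    committed_at_ms: int


def retained_snapshots(snapshots, now_ms, max_snapshot_age_ms,
                       min_snapshots_to_keep=1):
    """Return the snapshots expected to survive expiration (assumed rule).

    A snapshot survives if it is younger than max_snapshot_age_ms, or if it
    is among the newest min_snapshots_to_keep snapshots.
    """
    cutoff_ms = now_ms - max_snapshot_age_ms
    ordered = sorted(snapshots, key=lambda s: s.committed_at_ms)
    young = [s for s in ordered if s.committed_at_ms >= cutoff_ms]
    # Guard against ordered[-0:], which would select the whole list.
    newest = ordered[-min_snapshots_to_keep:] if min_snapshots_to_keep > 0 else []
    keep = {s.snapshot_id for s in young} | {s.snapshot_id for s in newest}
    return [s for s in ordered if s.snapshot_id in keep]


# Hypothetical timeline in ms: two inserts, a 30 s sleep, two more inserts.
snaps = [Snapshot(1, 0), Snapshot(2, 2_000),
         Snapshot(3, 34_000), Snapshot(4, 36_000)]

# Age limit honored: the two inserts after the sleep should survive.
expected = retained_snapshots(snaps, now_ms=58_000, max_snapshot_age_ms=54_000)
print(len(expected))  # 2

# Age limit ignored (modeled here as 0 ms): only the newest snapshot survives.
observed = retained_snapshots(snaps, now_ms=58_000, max_snapshot_age_ms=0)
print(len(observed))  # 1
```

The second call matches what the repro actually returns, which is why the 
property appears to be ignored.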

Additional Info: the default value for 'history.expire.max-snapshot-age-ms' is 
5 days per this link: [https://iceberg.apache.org/docs/1.3.1/configuration/]

However, while writing and running the tests, expiring snapshots worked within 
a few seconds of the snapshots being created, well inside that 5-day default.

So I'm assuming that this option has no effect right now. That said, I'm 
thinking about the implications for end users if we fix this.

End users may not know about this option at all and will have a tough time 
figuring out why snapshots are not getting expired. One option could be to 
set the default to 0 ms.



