[ 
https://issues.apache.org/jira/browse/HIVE-27903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17790142#comment-17790142
 ] 

Ayush Saxena commented on HIVE-27903:
-------------------------------------

The Spark procedure also invokes the same Iceberg API & Hive does the same, The 
iceberg layer is same for hive, spark or any other engine.

When you use the count based expiration, it uses the table property for time, 
when you use the time based expiration it uses the count based table property.

That is what Iceberg is designed. 
{quote}Is there a way one can specify ALTER TABLE .... expire_snapshot without 
any arguments against hive? 
{quote}
Nopes,

> TBLPROPERTIES('history.expire.max-snapshot-age-ms') doesn't work
> ----------------------------------------------------------------
>
>                 Key: HIVE-27903
>                 URL: https://issues.apache.org/jira/browse/HIVE-27903
>             Project: Hive
>          Issue Type: Improvement
>          Components: Hive
>    Affects Versions: 4.0.0-alpha-2
>            Reporter: JK Pasimuthu
>            Priority: Major
>
> [https://github.com/apache/iceberg/issues/9123]
> The 'history.expire.max-snapshot-age-ms' option doesn't have any effect while 
> expiring snapshots.
>  #  
> CREATE TABLE IF NOT EXISTS test5d78b6 (
> id INT, random1 STRING
> )
> PARTITIONED BY (random2 STRING)
> STORED BY ICEBERG
> TBLPROPERTIES (
> 'write.format.default'='orc',
> 'format-version'='2',
> 'write.orc.compression-codec'='none'
> )
>  # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, 
> uuid(), uuid() FROM test5d78b6
>  # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, 
> uuid(), uuid() FROM test5d78b6
>  # SLEEP for 30 seconds
>  # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, 
> uuid(), uuid() FROM test5d78b6
>  # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, 
> uuid(), uuid() FROM test5d78b6
>  # SELECT (UNIX_TIMESTAMP(CURRENT_TIMESTAMP) - UNIX_TIMESTAMP('2023-10-09 
> 13:23:54.455')) * 1000;
>  # ALTER TABLE test5d78b6 SET 
> tblproperties('history.expire.max-snapshot-age-ms'='54000'); - the elapsed 
> time in ms from the second insert and current time
>  # ALTER TABLE test5d78b6 EXECUTE expire_snapshots('2200-10-10');
>  # SELECT COUNT FROM default.test5d78b6.snapshots;
> output: 1. it should be 2 rows. The default 1 is retained an all snapshots 
> are expired as usual, so setting the property has no effect.
> Additional Info: the default value for 'history.expire.max-snapshot-age-ms' 
> is 5 days per this link: 
> [https://iceberg.apache.org/docs/1.3.1/configuration/]
> Now while writing the tests and running them, the expiring snapshots just 
> worked fine within few seconds of the snapshots being created.
> So, I'm assuming that this option doesn't have any effect right now. Having 
> said that, I'm thinking the implications on end user will have if we fix this.
> The end user may not know about this option at all and will have tough time 
> figuring out why the snapshots are not getting expired. One option could be 
> to set the default to 0ms.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to