Re: [DISCUSS] Hive Support

2025-01-06 Thread Wing Yew Poon
FYI -- It looks like the built-in Hive version in the master branch of Apache Spark is 2.3.10 (https://issues.apache.org/jira/browse/SPARK-47018), and https://issues.apache.org/jira/browse/SPARK-44114 (upgrade built-in Hive to 3+) is an open issue. On Mon, Jan 6, 2025 at 1:07 PM Wing Yew Poon wr

Re: [DISCUSS] Hive Support

2025-01-06 Thread Wing Yew Poon
Hi Peter, In Spark, you can specify the Hive version of the metastore that you want to use. There is a configuration, spark.sql.hive.metastore.version, which currently (as of Spark 3.5) defaults to 2.3.9, and the jars supporting this default version are shipped with Spark as built-in. You can speci
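
As a rough illustration of the setting described in this message, the sketch below builds `spark-submit` `--conf` arguments that select a metastore client version. The helper name `hive_metastore_conf` is hypothetical, and `3.1.3` with `maven` is only an illustrative non-default combination. The companion key `spark.sql.hive.metastore.jars` is not mentioned in the preview above and is included as an assumption about how non-built-in client jars are usually located.

```python
def hive_metastore_conf(version="2.3.9", jars="builtin"):
    """Build spark-submit --conf arguments for the Hive metastore client.

    Hypothetical helper, for illustration only.
    version: value for spark.sql.hive.metastore.version (2.3.9 is the
             default as of Spark 3.5, per the message above).
    jars:    value for spark.sql.hive.metastore.jars; "builtin" uses the
             jars shipped with Spark (assumption: other values such as
             "maven" are needed for a non-default version).
    """
    conf = {
        "spark.sql.hive.metastore.version": version,
        "spark.sql.hive.metastore.jars": jars,
    }
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


# Default: built-in client jars.
print(" ".join(hive_metastore_conf()))
# Illustrative non-default metastore version.
print(" ".join(hive_metastore_conf("3.1.3", "maven")))
```

Whether a given version is accepted depends on the Spark release in use, so the values here should be checked against that release's documentation.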

[DISCUSS] Support keeping at most N snapshots

2025-01-06 Thread Manu Zhang
Hi all, While maintaining Iceberg tables for our customers, I find it's difficult to set a default snapshot expiration time (`history.expire.max-snapshot-age-ms`) for different workloads. The default value of 5 days looks good for daily batch jobs but is too long for frequently-updated jobs. I'm

Re: [DISCUSS] Support keeping at most N snapshots

2025-01-06 Thread Ajantha Bhat
Hi Manu, We already have `retain_last` and `history.expire.min-snapshots-to-keep` to retain the snapshots based on count. Can you please elaborate on why we can't use the same? - Ajantha On Tue, Jan 7, 2025 at 11:33 AM Walaa Eldin Moustafa wrote: > Thanks Manu for starting this discussion. Tha

Re: [DISCUSS] Support keeping at most N snapshots

2025-01-06 Thread Manu Zhang
Hi Ajantha, `history.expire.min-snapshots-to-keep` is the *minimum number of snapshots* we can keep. I'm proposing to decide the *maximum number of snapshots* to keep by count rather than by age. Thanks, Manu On Tue, Jan 7, 2025 at 2:18 PM Ajantha Bhat wrote: > Hi Manu, > > We already have `re
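
The difference between a minimum and a maximum snapshot count can be sketched with a small simulation. This is a simplified model and not Iceberg's implementation: it assumes a linear history with no branches or tags, the function `snapshots_to_expire` is hypothetical, and `max_to_keep` stands in for the proposed (not yet existing) cap.

```python
def snapshots_to_expire(timestamps_ms, now_ms, max_age_ms,
                        min_to_keep=1, max_to_keep=None):
    """Return the timestamps of snapshots a simplified policy would expire.

    max_age_ms:  models history.expire.max-snapshot-age-ms.
    min_to_keep: models history.expire.min-snapshots-to-keep; the newest
                 min_to_keep snapshots are never expired.
    max_to_keep: hypothetical cap from the proposal; snapshots beyond the
                 newest max_to_keep are expired regardless of age.
    """
    ordered = sorted(timestamps_ms, reverse=True)  # newest first
    expired = []
    for index, ts in enumerate(ordered):
        too_old = now_ms - ts > max_age_ms
        over_cap = max_to_keep is not None and index >= max_to_keep
        protected = index < min_to_keep
        if (too_old or over_cap) and not protected:
            expired.append(ts)
    return expired


HOUR_MS = 60 * 60 * 1000
FIVE_DAYS_MS = 5 * 24 * HOUR_MS

# Ten hourly snapshots, all far younger than the 5-day default age.
snapshots = [i * HOUR_MS for i in range(10)]
now = 10 * HOUR_MS

# Age-based expiration with a minimum count removes nothing here.
print(len(snapshots_to_expire(snapshots, now, FIVE_DAYS_MS, min_to_keep=1)))

# With a hypothetical cap of 3, everything but the newest 3 is expired.
print(len(snapshots_to_expire(snapshots, now, FIVE_DAYS_MS, max_to_keep=3)))
```

In this model the minimum count only protects snapshots from age-based expiration, so a frequently updated table keeps accumulating snapshots until they age out. The cap bounds the count directly.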

Re: [DISCUSS] Support keeping at most N snapshots

2025-01-06 Thread Walaa Eldin Moustafa
Thanks Manu for starting this discussion. That is definitely a valid feature. I have always found maintaining snapshots by day makes it harder to provide different types of guarantees/contracts, especially when table change rates are diverse or irregular. Maintaining by snapshot count makes a lot o

Re: [DISCUSS] REST: Way to query if metadata pointer is the latest

2025-01-06 Thread Jean-Baptiste Onofré
Hi Gabor, I did a new pass on the proposal and it looks good to me. Great work! I volunteer to work with you on the spec PR according to the doc. Thoughts? Regards JB On Thu, Dec 19, 2024 at 11:09 AM Gabor Kaszab wrote: > > Hi All, > > Just an update that the proposal went through some ite

Re: [DISCUSS] Hive Support

2025-01-06 Thread Péter Váry
Hi Manu, > Spark has only added hive 4.0 metastore support recently for Spark 4.0[1] and there will be conflicts Does this mean that Spark 4.0 will always use Hive 4 code? Or will it use Hive 2 when it is present on the classpath, but if older Hive versions are not on the classpath then it will u