FYI --
It looks like the built-in Hive version in the master branch of Apache
Spark is 2.3.10 (https://issues.apache.org/jira/browse/SPARK-47018), and
https://issues.apache.org/jira/browse/SPARK-44114 (upgrade built-in Hive to
3+) is an open issue.
On Mon, Jan 6, 2025 at 1:07 PM Wing Yew Poon wrote:
Hi Peter,
In Spark, you can specify the Hive version of the metastore that you want
to use. There is a configuration, spark.sql.hive.metastore.version, which
currently (as of Spark 3.5) defaults to 2.3.9, and the jars supporting this
default version are shipped with Spark as built-in. You can speci
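For example, a minimal sketch of setting that configuration when building a
session (the 3.1.3 version, app name, and jar resolution mode are only
illustrative; check the Spark documentation for the values your release
supports):

```scala
import org.apache.spark.sql.SparkSession

// Illustrative values only: use a Hive 3.1.3 metastore client instead of
// the built-in 2.3.x jars that ship with Spark.
val spark = SparkSession.builder()
  .appName("hive-metastore-version-example")
  .enableHiveSupport()
  // Version of the Hive metastore client Spark should use.
  .config("spark.sql.hive.metastore.version", "3.1.3")
  // Where to get the matching client jars: "builtin", "maven", or "path"
  // (with "path", also set spark.sql.hive.metastore.jars.path).
  .config("spark.sql.hive.metastore.jars", "maven")
  .getOrCreate()
```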
Hi all,
While maintaining Iceberg tables for our customers, I find it difficult
to set a default snapshot expiration time
(`history.expire.max-snapshot-age-ms`) for different workloads. The default
value of 5 days looks good for daily batch jobs but is too long for
frequently-updated jobs.
I'm
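Today the only way to tighten it per table is something like the following
(catalog and table names are hypothetical; 3600000 ms is just an example of
a one-hour bound for a frequently-updated table):

```scala
// Hypothetical catalog/table names; overrides the 5-day default for one table.
spark.sql(
  """ALTER TABLE my_catalog.db.events SET TBLPROPERTIES (
    |  'history.expire.max-snapshot-age-ms' = '3600000'
    |)""".stripMargin)
```

The property only takes effect when snapshot expiration actually runs, e.g.
via the expire_snapshots procedure or a maintenance job.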
Hi Manu,
We already have `retain_last` and `history.expire.min-snapshots-to-keep` to
retain snapshots based on count. Can you please elaborate on why we can't
use the same?
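For example, something like this already covers count-based retention
(catalog and table names are hypothetical; the CALL syntax needs Iceberg's
Spark SQL extensions enabled):

```scala
// Keep at least the 20 most recent snapshots when expiring.
spark.sql(
  """CALL my_catalog.system.expire_snapshots(
    |  table => 'db.events',
    |  retain_last => 20
    |)""".stripMargin)

// Or set the equivalent as a table property for maintenance jobs.
spark.sql(
  """ALTER TABLE my_catalog.db.events SET TBLPROPERTIES (
    |  'history.expire.min-snapshots-to-keep' = '20'
    |)""".stripMargin)
```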
- Ajantha
On Tue, Jan 7, 2025 at 11:33 AM Walaa Eldin Moustafa wrote:
> Thanks Manu for starting this discussion. Tha
Hi Ajantha,
`history.expire.min-snapshots-to-keep` is the *minimum number of snapshots*
to keep. I'm proposing to decide the *maximum number of snapshots* to
keep by count rather than by age.
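Concretely, what I have in mind is something along these lines (the property
name below is just a placeholder for the proposal, nothing like it exists in
Iceberg today):

```scala
// Placeholder property name for the proposed max-count retention;
// NOT an existing Iceberg table property.
spark.sql(
  """ALTER TABLE my_catalog.db.events SET TBLPROPERTIES (
    |  'history.expire.max-snapshots-to-keep' = '100'
    |)""".stripMargin)
```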
Thanks,
Manu
On Tue, Jan 7, 2025 at 2:18 PM Ajantha Bhat wrote:
> Hi Manu,
>
> We already have `re
Thanks Manu for starting this discussion. That is definitely a valid
feature. I have always found that maintaining snapshots by day makes it
harder to provide different types of guarantees/contracts, especially when
tables' change rates are diverse or irregular. Maintaining by snapshot count
makes a lot o
Hi Gabor,
I did a new pass on the proposal and it looks good to me. Great work!
I volunteer to work with you on the spec PR, as described in the doc.
Thoughts ?
Regards
JB
On Thu, Dec 19, 2024 at 11:09 AM Gabor Kaszab wrote:
>
> Hi All,
>
> Just an update that the proposal went through some ite
Hi Manu,
> Spark has only added hive 4.0 metastore support recently for Spark 4.0[1]
and there will be conflicts
Does this mean that Spark 4.0 will always use Hive 4 code? Or will it use
Hive 2 when it is present on the classpath, but if older Hive versions are
not on the classpath then it will u