This is what my WIP PR targets. It will help to identify any compatibility or breaking issues with the new dependency.
Thank you,
Vlad

On Mar 26, 2025, at 3:14 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

Because of dependencies, we need to ensure that the underlying artifacts (Hive 4.0.1) are also stable enough. We should aim to establish that first, then look at release timelines and where it fits.

cheers

Dr Mich Talebzadeh,
Architect | Data Science | Financial Crime | Forensic Analysis | GDPR

On Wed, 26 Mar 2025 at 06:02, Rozov, Vlad <vro...@amazon.com.invalid> wrote:

I started working on it. See https://github.com/apache/spark/pull/50213. Review and comments on the PR will help a lot.

+1 for 4.1. It won’t be ready for 4.0 and will require extensive testing. I have a few more local changes that fix some tests in sql/hive and should publish another revision soon.

Thank you,
Vlad

On Mar 25, 2025, at 10:12 PM, Wenchen Fan <cloud0...@gmail.com> wrote:

I agree, 4.0 is already in the RC stage and I think it's too late to do such a big version bump for the Hive dependency. We definitely need to do this upgrade and thanks for working on it!

On Mon, Mar 24, 2025 at 1:31 PM Ángel Álvarez Pascua <angel.alvarez.pas...@gmail.com> wrote:

That's great news! If you need anything from me, just ask...

We should also check and update other non-Hive third-party libraries with high/critical vulnerabilities, as someone mentioned in another email thread.

Since this is a major change, I think we should leave it for Spark 4.1. What do you think?

On Mon, 24 Mar 2025 at 0:59, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

For now, I am testing apache-hive-4.0.1-bin, which is the latest release version, from https://dlcdn.apache.org/hive/hive-4.0.1/ (apache-hive-4.0.1-bin.tar).

My metastore is Oracle and upgrade scripts are provided. My previous version is Hive 3.1.1 and the metastore upgrade went OK without any major headache. Now I just need to customise various files under $HIVE_HOME/conf and then I will have some testing underway.

HTH

Dr Mich Talebzadeh

On Sun, 23 Mar 2025 at 17:13, Ángel Álvarez Pascua <angel.alvarez.pas...@gmail.com> wrote:

Well ... and then? When are we going to tackle this? I could help.

On Wed, 12 Mar 2025, 15:50, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

Agreed. The Hive upgrade is more time consuming, as it involves backing up the Hive schema on your metastore and then running the Hive-provided schema upgrade scripts against that schema. That could be problematic, but it needs to be done one way or another.

HTH

Dr Mich Talebzadeh

On Wed, 12 Mar 2025 at 12:21, Ángel <angel.alvarez.pas...@gmail.com> wrote:

Not an easy task, I guess, but I'm totally for it too. The issue SPARK-49910 <https://issues.apache.org/jira/browse/SPARK-49910> is related to this.

On Tue, 11 Mar 2025 at 23:06, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

Yes, I am all for it, as I use Hive with Oracle as its metastore extensively. Case in point: on 6th March a Hive user <https://lists.apache.org/thread/vhgxt1cj2ppc862j0lwxl63j6nfc7khh> alluded to it, and I quote:

"I just wanted to highlight that Hive 3.x line is EOL. It has various known security vulnerabilities, many serious bugs (including wrong results and data corruption), and lacks lots of improvements and major features that are available in Hive 4. Upgrading is the right path forward."

In summary, Hive 4.x likely includes performance improvements, new features, and bug fixes. Compiling against it would allow Spark to take advantage of these. Plus, using the latest versions of both Spark and Hive is important for maintaining a secure data platform.

HTH

Dr Mich Talebzadeh

On Tue, 11 Mar 2025 at 19:08, Rozov, Vlad <vro...@amazon.com.invalid> wrote:

Hi All,

As Apache Hive announced EOL for Hive 2.x [1] and 3.x [2], should Spark be compiled against Hive 4.x and use it as default?

Thank you,
Vlad

[1] https://lists.apache.org/thread/4ctrzfw60jkhc0hq2xoh1jpqxgt2zd93
[2] https://lists.apache.org/thread/99h6wr7nk4684r6tkcbm8ydfytgqy6f3
[3] https://github.com/apache/spark/pull/50213