Thanks for the clarification, Nicholas. Now, the point is that when using Spark with Hive, we often create the spark_session as below:

    from pyspark.sql import SparkSession
    spark_session = SparkSession.builder.enableHiveSupport().appName(appName).getOrCreate()

If I go to the jars directory and list the Hive jars:

    /opt/spark/jars> ls -l hi*
    -rw-r--r--. 1 hduser hadoop   258346 Oct 21 03:29 hive-storage-api-2.8.1.jar
    -rw-r--r--. 1 hduser hadoop    12923 Oct 21 03:29 hive-shims-scheduler-2.3.9.jar
    -rw-r--r--. 1 hduser hadoop   120293 Oct 21 03:29 hive-shims-common-2.3.9.jar
    -rw-r--r--. 1 hduser hadoop     8786 Oct 21 03:29 hive-shims-2.3.9.jar
    -rw-r--r--. 1 hduser hadoop    53902 Oct 21 03:29 hive-shims-0.23-2.3.9.jar
    -rw-r--r--. 1 hduser hadoop  1679366 Oct 21 03:29 hive-service-rpc-3.1.3.jar
    -rw-r--r--. 1 hduser hadoop   916630 Oct 21 03:29 hive-serde-2.3.9.jar
    -rw-r--r--. 1 hduser hadoop  8195966 Oct 21 03:29 hive-metastore-2.3.9.jar
    -rw-r--r--. 1 hduser hadoop   326585 Oct 21 03:29 hive-llap-common-2.3.9.jar
    -rw-r--r--. 1 hduser hadoop   116364 Oct 21 03:29 hive-jdbc-2.3.9.jar
    -rw-r--r--. 1 hduser hadoop 10840949 Oct 21 03:29 hive-exec-2.3.9-core.jar
    -rw-r--r--. 1 hduser hadoop   436169 Oct 21 03:29 hive-common-2.3.9.jar
    -rw-r--r--. 1 hduser hadoop    44704 Oct 21 03:29 hive-cli-2.3.9.jar
    -rw-r--r--. 1 hduser hadoop   183633 Oct 21 03:29 hive-beeline-2.3.9.jar

I have all these jars there, but are you implying that the potential vulnerability comes from hive-metastore-2.3.9.jar alone, or from all of the Hive jars?

Cheers

Mich Talebzadeh,
Architect | Data Science | Financial Crime | Forensic Analysis | GDPR
view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>

On Mon, 27 Jan 2025 at 20:33, NICHOLAS MARION <nmar...@us.ibm.com> wrote:

> I am not entirely sure how Apache Hive is structured, but this CVE
> refers to HIVE-25468 <https://issues.apache.org/jira/browse/HIVE-25468>
> and Backport of HIVE-25468 into Hive 3.x
> <https://github.com/apache/hive/pull/4309/files>, which indicates that
> the Hive Metastore-server and Hive Standalone-metastore were updated for
> the fix.
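
To see which Hive artifacts, and at which versions, a given Spark install actually ships, one option is to group the jars under $SPARK_HOME/jars by the version embedded in their file names. Below is a minimal sketch, assuming the jar names follow the artifact-x.y.z[-classifier].jar pattern seen in the listing above; inventory_jars and JAR_PATTERN are hypothetical names, not a Spark or Hive API. It only reads file names, so it says nothing about whether a particular jar is actually exploitable.

```python
import re
from pathlib import Path

# Assumed naming convention: <artifact>-<x.y.z>[-<classifier>].jar,
# e.g. hive-metastore-2.3.9.jar or hive-exec-2.3.9-core.jar.
JAR_PATTERN = re.compile(
    r"^(?P<artifact>.+?)-(?P<version>\d+\.\d+\.\d+)"
    r"(?:-(?P<classifier>[A-Za-z][\w.]*))?\.jar$"
)


def inventory_jars(jars_dir, prefix="hive-"):
    """Group jars whose names start with `prefix` by the version in the file name.

    Returns a dict mapping version string -> list of artifact names,
    in file-name order. Jar contents are never opened.
    """
    by_version = {}
    for path in sorted(Path(jars_dir).glob(f"{prefix}*.jar")):
        match = JAR_PATTERN.match(path.name)
        if match is None:
            continue  # name does not follow the assumed convention; skip it
        by_version.setdefault(match["version"], []).append(match["artifact"])
    return by_version
```

Run as inventory_jars("/opt/spark/jars") against the listing above, it should report twelve artifacts at 2.3.9 (hive-metastore among them), with hive-service-rpc at 3.1.3 and hive-storage-api at 2.8.1. The last two are versioned separately from the main Hive line, which is why the sketch groups by version rather than comparing everything against a single minimum.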
>
> When looking at vulnerabilities, many security teams, including ours, have
> begun to classify them as Vulnerable or Affected. Vulnerable means directly
> impacted by the vulnerability and exploitable, while Affected indicates
> that a vulnerable dependency/package/jar is being delivered with a product.
> In this case, Spark is delivering these Hive jars within the distribution:
>
>       183633 Dec 16 23:33 hive-beeline-2.3.9.jar
>        44704 Dec 16 23:33 hive-cli-2.3.9.jar
>       436169 Dec 16 23:33 hive-common-2.3.9.jar
>     10840949 Dec 16 23:33 hive-exec-2.3.9-core.jar
>       116364 Dec 16 23:33 hive-jdbc-2.3.9.jar
>       326585 Dec 16 23:33 hive-llap-common-2.3.9.jar
>      8195966 Dec 16 23:33 hive-metastore-2.3.9.jar
>       916630 Dec 16 23:33 hive-serde-2.3.9.jar
>      1679366 Dec 16 23:33 hive-service-rpc-3.1.3.jar
>        53902 Dec 16 23:33 hive-shims-0.23-2.3.9.jar
>         8786 Dec 16 23:33 hive-shims-2.3.9.jar
>       120293 Dec 16 23:33 hive-shims-common-2.3.9.jar
>        12923 Dec 16 23:33 hive-shims-scheduler-2.3.9.jar
>       258346 Dec 16 23:33 hive-storage-api-2.8.1.jar
>       577200 Dec 16 23:33 spark-hive-thriftserver_2.13-3.5.4.jar
>       735193 Dec 16 23:33 spark-hive_2.13-3.5.4.jar
>
> To extend that further, these outdated Apache Hive dependencies pull
> in other older dependencies:
>
>        75567 Dec 16 23:33 jackson-annotations-2.15.2.jar
>       549207 Dec 16 23:33 jackson-core-2.15.2.jar
>       232248 Dec 16 23:33 jackson-core-asl-1.9.13.jar
>      1620088 Dec 16 23:33 jackson-databind-2.15.2.jar
>        54630 Dec 16 23:33 jackson-dataformat-yaml-2.15.2.jar
>       122937 Dec 16 23:33 jackson-datatype-jsr310-2.15.2.jar
>       780664 Dec 16 23:33 jackson-mapper-asl-1.9.13.jar
>       518681 Dec 16 23:33 jackson-module-scala_2.13-2.15.2.jar
>        37085 Dec 16 23:33 json4s-jackson_2.13-3.7.0-M11.jar
>      2017388 Dec 16 23:33 parquet-jackson-1.13.1.jar
>
> Looking at a dependency like jackson-mapper-asl-1.9.13.jar, it has
> a Critical and a High CVE against it. With that said, if a user
> accidentally uses one of these dependencies in their Spark application,
> will the Java CLASSPATH give $SPARK_HOME/jars precedence and, in turn,
> expose the unknowing end user to a vulnerability that way?
>
> With all of that said, there is a Jira item, SPARK-30466
> <https://issues.apache.org/jira/browse/SPARK-30466>, to remove a
> dependency like jackson-mapper-asl-1.9.13, but it is stuck behind
> SPARK-44114 <https://issues.apache.org/jira/browse/SPARK-44114>, which in
> turn is blocked by HIVE-27508
> <https://issues.apache.org/jira/browse/HIVE-27508>. Does the Apache Spark
> community have enough push to encourage the Apache Hive team to release
> a Hive 3.1.4, which would resolve both of these old CVEs along with the
> one Balaji brought up in this thread? This would be especially valuable,
> as the next likely opportunity to upgrade to Hive 3.x would not come
> until Spark 5.x.
>
> Sincerely,
>
> Nicholas T. Marion
> Senior AI and Analytics Development Lead | IBM zDNN Product Owner
> Mobile: 1 845 649 3592
> E-mail: nmar...@us.ibm.com
>
> IBM
>
> From: Mich Talebzadeh <mich.talebza...@gmail.com>
> Date: Monday, January 27, 2025 at 10:11 AM
> To: Sean Owen <sro...@gmail.com>
> Cc: Balaji Sudharsanam V <balaji.sudharsa...@ibm.com>,
> dev@spark.apache.org <dev@spark.apache.org>
> Subject: [EXTERNAL] Re: Spark 4.0 vulnerable with
> hive-metastore-2.3.x.jar versions
>
> To answer your question, I did not read this CVE; I am responding
> solely from my previous experience with vulnerabilities and the
> implications raised by the thread owner, having used Spark in
> conjunction with Hive for many years.
>
> On Mon, 27 Jan 2025 at 15:03, Sean Owen <sro...@gmail.com> wrote:
>
> Mich: did you read the CVE? I'm not clear, as this contains no reference
> to the Hive functionality that is affected, or how it might relate to a
> metastore. Please explain. Otherwise this looks like a generic AI-generated
> response with no particularly relevant content. "In summary"...
>
> On Mon, Jan 27, 2025 at 8:57 AM Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
> I think the thread owner's point is valid. The default use of the Hive
> Metastore by Spark further gives credence to the importance of addressing
> this Hive vulnerability to ensure the security and reliability of Spark
> applications. I use Hive as the default metastore for Spark as well. Spark
> relies heavily on the Hive Metastore for managing critical metadata, such
> as table schemas, data locations, and access control, unless you are using
> a platform like Databricks with a unified catalog. In summary, this
> dependency makes it essential to address any vulnerabilities within the
> Hive Metastore, as they can indirectly impact the security and stability of
> Spark applications, among other things.
>
> HTH
>
> On Mon, 27 Jan 2025 at 13:37, Sean Owen <sro...@gmail.com> wrote:
>
> It looks like that affects Hive, and not the metastore. I do not see that
> it is relevant to Spark at first glance.
>
> On Mon, Jan 27, 2025 at 1:21 AM Balaji Sudharsanam V
> <balaji.sudharsa...@ibm.com.invalid> wrote:
>
> Hi All,
>
> There is a vulnerability with 'High' severity found in the Apache Spark
> 3.x and 4.0.0 preview (2) releases, in the hive-metastore-2.3.x.jar.
> It is described here: Apache Hive security bypass CVE-2021-34538
> Vulnerability Report
> <https://exchange.xforce.ibmcloud.com/vulnerabilities/231404>
>
> The recommendation is to upgrade to the latest version of Apache Hive
> (3.1.3, 4.0 or later), available from the Apache Web site.
>
> Can we expect this to be fixed in the Apache Spark 4.0 GA?
>
> Thanks,
>
> Balaji
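
On Nicholas's classpath question earlier in the thread: per the Spark configuration docs, spark.driver.userClassPathFirst and spark.executor.userClassPathFirst both default to false, meaning Spark's own jars take precedence over user-added jars; setting them to true is an experimental option to prefer the user's jars instead, and the driver setting is documented as applying in cluster mode only. The sketch below merely assembles a spark-submit argument list with those two settings. The helper name build_submit_command and the example paths are hypothetical, and whether flipping precedence is safe for a given application (it can introduce its own class conflicts) should be tested rather than assumed.

```python
def build_submit_command(app, user_jars=(), user_class_path_first=False):
    """Assemble a spark-submit argument list (hypothetical helper).

    `app` is the application file; it goes last so that spark-submit
    does not treat any following tokens as application arguments.
    """
    cmd = ["spark-submit"]
    if user_jars:
        # --jars takes a comma-separated list of jars to ship with the app
        cmd += ["--jars", ",".join(user_jars)]
    if user_class_path_first:
        # Experimental: give user-added jars precedence over Spark's own
        # jars. The driver setting only takes effect in cluster mode.
        for role in ("driver", "executor"):
            cmd += ["--conf", f"spark.{role}.userClassPathFirst=true"]
    cmd.append(app)
    return cmd
```

For example, subprocess.run(build_submit_command("my_app.py", ["libs/deps.jar"], True), check=True) would launch the job with the user's jars preferred; leaving the flag at its default keeps the Spark-first behaviour described above.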