Hi, All.

I'm sending this email because I think it's important to discuss this topic
directly and reach a clear conclusion.

`The forked Hive 1.2.1 is stable`? That sounds like a myth we created
by ignoring the existing bugs. If you want to claim that the forked Hive 1.2.1
is more stable than XXX, please give us the evidence; then we can fix it.
Otherwise, let's stop treating `the forked Hive 1.2.1` as invincible.

Historically, the following forked Hive 1.2.1 has never been stable.
It's just frozen. Since the forked Hive is out of our control, we ignored
its bugs. That's all. The reality is a long way from "stable".

    https://mvnrepository.com/artifact/org.spark-project.hive/

    https://mvnrepository.com/artifact/org.spark-project.hive/hive-exec/1.2.1.spark
    (August 2015)

    https://mvnrepository.com/artifact/org.spark-project.hive/hive-exec/1.2.1.spark2
    (April 2016)

First, let's begin with Hive itself by comparing against Apache Hive 1.2.2
and 1.2.3.

    Apache Hive 1.2.2 has 50 bug fixes.
    Apache Hive 1.2.3 has 9 bug fixes.

I will not cover all of them, but note that the Apache Hive community also
backports important patches, just as the Apache Spark community does.

Second, let's move on to SPARK issues, because we aren't exposed to all Hive
issues.

    SPARK-19109 ORC metadata section can sometimes exceed protobuf message size limit
    SPARK-22267 Spark SQL incorrectly reads ORC file when column order is different

These have been reported since Apache Spark 1.6.x because the forked Hive
lacks a proper upstream patch like HIVE-11592 (fixed in Apache Hive 1.3.0).

Since we couldn't update the frozen forked Hive, we added an Apache ORC
dependency in SPARK-20682 (2.3.0), added a switching configuration in
SPARK-20728 (2.3.0), and turned on `spark.sql.hive.convertMetastoreOrc`
by default in SPARK-22279 (2.4.0).
However, if you turn off that switch and start to use the forked Hive,
you will be exposed to the buggy forked Hive 1.2.1 again.
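For reference, here is a sketch of how that switch is toggled when launching a
Spark shell (the default shown assumes Spark 2.4.0 or later, per SPARK-22279):

    # Default since SPARK-22279 (Spark 2.4.0):
    # ORC tables are read with Spark's native ORC reader.
    spark-shell --conf spark.sql.hive.convertMetastoreOrc=true

    # Turning the switch off routes ORC table reads back through
    # the forked Hive 1.2.1 code path, with all of its known bugs.
    spark-shell --conf spark.sql.hive.convertMetastoreOrc=false

The same key can also be set in spark-defaults.conf or at runtime via
`spark.conf.set`.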

Third, let's talk about new features like Hadoop 3 and JDK 11 support.
No one believes that the ancient forked Hive 1.2.1 will work with these.
I saw the following issue mentioned as evidence of a Hive 2.3.6 bug.

    SPARK-29245 ClassCastException during creating HiveMetaStoreClient

Yes, I know that issue, because I reported it and verified HIVE-21508.
It's already fixed and will be released as Apache Hive 2.3.7.

Can we imagine something like this happening in the forked Hive 1.2.1?
No. There is no future for it. It's frozen.

From now on, I claim that the forked Hive 1.2.1 is the unstable one.
I welcome all your positive and negative opinions.
Please share your concerns and problems, and let's fix them together.
Apache Spark is an open source project we share.

Bests,
Dongjoon.
