Great, thanks for the info and pointer to the repo!

From: Patrick Wendell <pwend...@gmail.com>
Sent: Friday, June 6, 2014 5:11 PM
To: user@spark.apache.org

They are forked and slightly modified for two reasons:

(a) Hive embeds a bunch of other dependencies in its published jars,
which makes it really hard for other projects to depend on them. If
you look at the hive-exec jar, you'll see classes from a bunch of
other projects copied directly into it. We modified the Hive 0.12
build to produce jars that do not include other dependencies inside
them.
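
To make (a) concrete: listing the class entries in a hive-exec jar
shows packages from entirely different projects sitting next to the
Hive classes. Here's a quick Scala sketch (the jar path is just a
placeholder; point it at a local copy):

    import java.util.zip.ZipFile
    import scala.jdk.CollectionConverters._ // Scala 2.13+; use JavaConverters on older versions

    object InspectJar {
      def main(args: Array[String]): Unit = {
        val jar = new ZipFile("hive-exec-0.12.0.jar") // placeholder path
        // Collect the top-level package prefix of every class file in the jar.
        val topLevel = jar.entries.asScala
          .map(_.getName)
          .filter(_.endsWith(".class"))
          .map(_.split('/').take(2).mkString("/"))
          .toSet
        // Anything outside org/apache/hadoop/hive is a bundled third-party dependency.
        topLevel.toSeq.sorted.foreach(println)
        jar.close()
      }
    }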

(b) Hive relies on a version of protobuf that is incompatible with
certain Hadoop versions. We used a shaded copy of the protobuf
dependency to avoid this conflict.
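
The fork applies the relocation inside the Hive build itself, but the
general technique is the same as this sbt-assembly sketch: rewrite the
protobuf classes into a private package so they can't collide with
whatever protobuf version Hadoop puts on the classpath (the target
package name below is illustrative, not necessarily the one the fork
uses):

    // build.sbt, with the sbt-assembly plugin enabled.
    // Relocating ("shading") protobuf rewrites the bundled classes and every
    // reference to them, so two protobuf versions can coexist on one classpath.
    assembly / assemblyShadeRules := Seq(
      ShadeRule.rename("com.google.protobuf.**" -> "org.sparkproject.protobuf.@1").inAll
    )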

The forked copy is here - feel free to take a look:
https://github.com/pwendell/hive/commits/branch-0.12-shaded-protobuf
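
If you just want to consume the fork rather than build it, the jars are
published under the relabeled group name Silvio mentions below. A
build.sbt sketch; the exact coordinates and version are an assumption
here, so verify them on Maven Central:

    // Depend on the dependency-free forked jars instead of stock Hive 0.12.
    // Coordinates are assumed from the "org.spark-project.hive" relabeling
    // discussed in this thread; confirm them on Maven Central.
    libraryDependencies ++= Seq(
      "org.spark-project.hive" % "hive-exec"      % "0.12.0",
      "org.spark-project.hive" % "hive-serde"     % "0.12.0",
      "org.spark-project.hive" % "hive-metastore" % "0.12.0"
    )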

I'm hoping the upstream Hive project will change their published
artifacts to make them usable as a library by other applications.
Unfortunately, as it stands, we had to maintain our own fork of these
jars to make it work. I believe it's being tracked by this JIRA:

https://issues.apache.org/jira/browse/HIVE-5733

- Patrick

On Fri, Jun 6, 2014 at 12:08 PM, Silvio Fiorito
<silvio.fior...@granturing.com> wrote:
> Is there a repo somewhere with the code for the Hive dependencies
> (hive-exec, hive-serde, & hive-metastore) used in Spark SQL? Are they
> forked with Spark-specific customizations, like Shark, or simply relabeled
> with a new package name ("org.spark-project.hive")? I couldn't find any
> repos on GitHub or the main Apache repositories.
>
> I want to use some Hive packages beyond the ones burned into the Spark
> JAR, but I'm running into all sorts of "jar hell" headaches because the
> Hive JARs in CDH (or even HDP) are mismatched with the Spark Hive JARs.
>
> Thanks,
> Silvio
