They are forked and slightly modified, for two reasons: (a) Hive embeds a bunch of other dependencies in its published jars, which makes it really hard for other projects to depend on them. If you look at the hive-exec jar, you'll see it copies classes from several other dependencies directly into the jar. We modified the Hive 0.12 build to produce jars that do not bundle other dependencies inside of them.
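If you just want to depend on the cleaned-up jars, the coordinates look something like this (an SBT sketch; I'm assuming the org.spark-project.hive group we publish under, so double-check the exact coordinates on Maven Central):

    // Forked Hive 0.12 jars with the embedded third-party classes stripped out
    libraryDependencies ++= Seq(
      "org.spark-project.hive" % "hive-exec"      % "0.12.0",
      "org.spark-project.hive" % "hive-serde"     % "0.12.0",
      "org.spark-project.hive" % "hive-metastore" % "0.12.0"
    )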
(b) Hive relies on a version of protobuf that is incompatible with the one used by certain Hadoop versions. We used a shaded version of the protobuf dependency to avoid this (a rough sketch of what that relocation looks like is at the bottom of this mail).

The forked copy is here - feel free to take a look:
https://github.com/pwendell/hive/commits/branch-0.12-shaded-protobuf

I'm hoping the upstream Hive project will change their published artifacts to make them usable as a library for other applications. Unfortunately, as it stands, we had to fork our own copy of these to make it work. I think it's being tracked by this JIRA:
https://issues.apache.org/jira/browse/HIVE-5733

- Patrick

On Fri, Jun 6, 2014 at 12:08 PM, Silvio Fiorito
<silvio.fior...@granturing.com> wrote:
> Is there a repo somewhere with the code for the Hive dependencies
> (hive-exec, hive-serde, & hive-metastore) used in SparkSQL? Are they forked
> with Spark-specific customizations, like Shark, or simply relabeled with a
> new package name ("org.spark-project.hive")? I couldn't find any repos on
> Github or Apache main.
>
> I'm wanting to use some Hive packages outside of the ones burned into the
> Spark JAR but I'm having all sorts of headaches due to "jar-hell" with the
> Hive JARs in CDH or even HDP mismatched with the Spark Hive JARs.
>
> Thanks,
> Silvio
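P.S. For the shading in (b), the idea is a package relocation along these lines (a sketch in sbt-assembly's shade-rule syntax, just for illustration - the fork's own build does the equivalent, and the relocated package name here is a placeholder, not the one the fork actually uses):

    // build.sbt - requires the sbt-assembly plugin in project/plugins.sbt
    assembly / assemblyShadeRules := Seq(
      // Rewrite protobuf class references into a private namespace so they
      // cannot clash with whatever protobuf a given Hadoop version ships
      ShadeRule.rename("com.google.protobuf.**" -> "org.spark_project.protobuf.@1").inAll
    )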