As far as I can tell, the method was removed after 0.12.0 in the fix
for HIVE-5223 
(https://github.com/apache/hive/commit/4059a32f34633dcef1550fdef07d9f9e044c722c#diff-948cc2a95809f584eb030e2b57be3993),
and that fix was back-ported in its entirety to CDH 5.0.0+:
http://archive.cloudera.com/cdh5/cdh/5/hive-0.12.0-cdh5.0.0.releasenotes.html

The fix was evidently important too, but it's not clear a build can
both have the fix and keep this method, short of forking Hive with a
custom patch. Even though CDH5 has always shipped this version of the
code, it still creates this sort of surprising problem.
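When two jars both claim to be Hive 0.12.0, one quick way to see which version of a class your classpath actually resolves is a small reflection probe. This is a generic sketch, not anything from Spark or Hive itself: on a real cluster you'd probe org.apache.hadoop.hive.serde2.SerDeUtils for lookupDeserializer, but the demo below uses a JDK class so it's self-contained.

```java
import java.lang.reflect.Method;

public class MethodProbe {
    // Returns true if the named class is on the classpath and exposes
    // a public method with the given name.
    static boolean hasMethod(String className, String methodName) {
        try {
            for (Method m : Class.forName(className).getMethods()) {
                if (m.getName().equals(methodName)) {
                    return true;
                }
            }
        } catch (ClassNotFoundException e) {
            // Class not on the classpath at all.
        }
        return false;
    }

    public static void main(String[] args) {
        // On the cluster, you'd run something like:
        //   hasMethod("org.apache.hadoop.hive.serde2.SerDeUtils",
        //             "lookupDeserializer")
        // with hive-exec on the classpath. Demonstrated here on a JDK class:
        System.out.println(hasMethod("java.lang.String", "isEmpty"));      // true
        System.out.println(hasMethod("java.lang.String", "noSuchMethod")); // false
    }
}
```

Running this with each candidate hive-exec jar on the classpath tells you directly whether the method the NoSuchMethodError complains about is present.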

I imagine it won't be the only instance of this kind of problem people
encounter. Can you rebuild Spark against this particular release of
Hive?

That's what the Spark shipped with CDH would have been built against.
Are you replacing it, or not using it?

On Sun, Aug 10, 2014 at 5:36 PM, Eric Friedman
<eric.d.fried...@gmail.com> wrote:
> I have a CDH5.0.3 cluster with Hive tables written in Parquet.
>
> The tables have the "DeprecatedParquetInputFormat" on their metadata, and
> when I try to select from one using Spark SQL, it blows up with a stack
> trace like this:
>
> java.lang.RuntimeException: java.lang.ClassNotFoundException:
> parquet.hive.DeprecatedParquetInputFormat
>
> at
> org.apache.hadoop.hive.ql.metadata.Table.getInputFormatClass(Table.java:309)
>
>
> Fair enough, DeprecatedParquetInputFormat isn't in the Spark assembly built
> with Hive.
>
>
> If I try to add hive-exec-0.12.0-cdh5.0.3.jar to my SPARK_CLASSPATH, in
> order to get DeprecatedParquetInputFormat, I find out that there is an
> incompatibility in the SerDeUtils class.  Spark's Hive snapshot expects to
> find
>
>
> java.lang.NoSuchMethodError:
> org.apache.hadoop.hive.serde2.SerDeUtils.lookupDeserializer(Ljava/lang/String;)Lorg/apache/hadoop/hive/serde2/Deserializer;
>
> at
> org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:217)
>
>
> But that isn't in the Hive snapshot provided by CDH5.0.3
>
>
> Both Spark and CDH label their Hive versions as 0.12.0.
>
>
> According to the Apache SVN server, CDH is the one that's out of step, as
> this method is definitely on the 0.12.0 release.  I have raised a ticket
> with Cloudera about this.
>
>
> Has anyone found a workaround?
>
>
> I did try extracting a subset of jars from hive-exec.jar, but that quickly
> turned into a journey down the rabbit hole.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
