Hi Sean,

Thanks for the reply. I'm on CDH 5.0.3, and upgrading the whole cluster to 5.1.0 will eventually happen, but not immediately.
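For anyone else chasing this, a quick way to see whether a given hive-exec jar on the classpath actually declares the method Spark expects is a small reflection probe. This is just a sketch I put together: the class and method names come from the stack trace below, and the probe simply reports whether the named class is loadable and declares the named method.

```java
import java.lang.reflect.Method;

// Sketch: check whether a class on the current classpath declares a
// given method. With hive-exec-0.12.0-cdh5.0.3.jar on the classpath,
// probing SerDeUtils for lookupDeserializer shows which side of the
// incompatibility your jar is on.
public class MethodProbe {

    // Returns true if className is loadable and declares methodName
    // (any signature); false if the method or the class is absent.
    public static boolean hasMethod(String className, String methodName) {
        try {
            for (Method m : Class.forName(className).getDeclaredMethods()) {
                if (m.getName().equals(methodName)) {
                    return true;
                }
            }
            return false;
        } catch (ClassNotFoundException e) {
            return false; // class not on the classpath at all
        }
    }

    public static void main(String[] args) {
        // Prints false unless a Hive build exposing this method is on
        // the classpath.
        System.out.println(hasMethod(
            "org.apache.hadoop.hive.serde2.SerDeUtils",
            "lookupDeserializer"));
    }
}
```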
I've tried running the CDH spark-1.0 release and also building it from source. That, unfortunately, goes down a whole other rathole of dependencies. :-(

Eric

On Sun, Aug 10, 2014 at 10:16 AM, Sean Owen <so...@cloudera.com> wrote:
> As far as I can tell, the method was removed after 0.12.0 in the fix
> for HIVE-5223 (
> https://github.com/apache/hive/commit/4059a32f34633dcef1550fdef07d9f9e044c722c#diff-948cc2a95809f584eb030e2b57be3993
> ),
> and that fix was back-ported in its entirety to 5.0.0+:
>
> http://archive.cloudera.com/cdh5/cdh/5/hive-0.12.0-cdh5.0.0.releasenotes.html
>
> The fix was evidently also important, but it's not clear the build can
> have the fix and keep this method, not without forking via a custom
> patch. Even though CDH5 never *didn't* have this version of the code,
> it creates this sort of surprising problem.
>
> I imagine it's not the only instance of this kind of problem people
> will ever encounter. Can you rebuild Spark with this particular
> release of Hive?
>
> Because that's what the Spark that was shipped with CDH would have
> done. Are you replacing / not using that?
>
> On Sun, Aug 10, 2014 at 5:36 PM, Eric Friedman <eric.d.fried...@gmail.com> wrote:
> > I have a CDH 5.0.3 cluster with Hive tables written in Parquet.
> >
> > The tables have "DeprecatedParquetInputFormat" in their metadata, and
> > when I try to select from one using Spark SQL, it blows up with a
> > stack trace like this:
> >
> > java.lang.RuntimeException: java.lang.ClassNotFoundException:
> > parquet.hive.DeprecatedParquetInputFormat
> >     at org.apache.hadoop.hive.ql.metadata.Table.getInputFormatClass(Table.java:309)
> >
> > Fair enough: DeprecatedParquetInputFormat isn't in the Spark assembly
> > built with Hive.
> >
> > If I try to add hive-exec-0.12.0-cdh5.0.3.jar to my SPARK_CLASSPATH in
> > order to get DeprecatedParquetInputFormat, I find out that there is an
> > incompatibility in the SerDeUtils class. Spark's Hive snapshot expects
> > to find a method that isn't there:
> >
> > java.lang.NoSuchMethodError:
> > org.apache.hadoop.hive.serde2.SerDeUtils.lookupDeserializer(Ljava/lang/String;)Lorg/apache/hadoop/hive/serde2/Deserializer;
> >     at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:217)
> >
> > But that method isn't in the Hive snapshot provided by CDH 5.0.3.
> >
> > Both Spark and CDH label their Hive versions as 0.12.0.
> >
> > According to the Apache SVN server, CDH is the one that's out of step,
> > as this method is definitely on the 0.12.0 release. I have raised a
> > ticket with Cloudera about this.
> >
> > Has anyone found a workaround?
> >
> > I did try extracting a subset of jars from hive-exec.jar, but that
> > quickly turned into a journey down the rabbit hole.