Hi, we are using Cloudera 5.7.0.
We have a use case to process XML data, for which we use the XML SerDe from https://github.com/dvasilen/Hive-XML-SerDe. It works fine with Hive's execution engine set to MapReduce, but after enabling Hive on Spark to test performance we are facing the following issue:

16/06/23 12:47:45 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 3
16/06/23 12:47:45 INFO executor.Executor: Running task 0.3 in stage 0.0 (TID 3)
16/06/23 12:47:45 INFO rdd.HadoopRDD: Input split: Paths:/tmp/STYN/data/1040_274316329.xml:0+7406,/tmp/STYN/data/1040__274316331.xml:0+7496 InputFormatClass: com.ibm.spss.hive.serde2.xml.XmlInputFormat
16/06/23 12:47:45 INFO exec.Utilities: PLAN PATH = hdfs://devcdh/tmp/hive/yesh/c9554491-f58c-4472-b3c5-f47eb5722dd4/hive_2016-06-23_12-47-29_259_4396208623700590328-9/-mr-10003/c79302c5-6f16-4887-85b4-67e781a9ed97/map.xml
16/06/23 12:47:45 ERROR executor.Executor: Exception in task 0.3 in stage 0.0 (TID 3)
java.io.IOException: java.lang.reflect.InvocationTargetException
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:265)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.<init>(HadoopShimsSecure.java:212)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getRecordReader(HadoopShimsSecure.java:332)
    at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:721)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:237)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:208)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:251)
    ... 18 more
Caused by: java.io.IOException: CombineHiveRecordReader: class not found com.ibm.spss.hive.serde2.xml.XmlInputFormat
    at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:55)
    ... 23 more

I did the following steps to ensure that the XML SerDe is on the Hive classpath:

- configured the Hive aux JARs path in Cloudera Manager
- manually copied the jar to all the nodes

I am unable to figure out the issue here; any pointers would be a great help.

Thanks,
-Yeshwanth
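For context, the table is declared along these lines. This is a minimal sketch: the table name, column, XPath mapping, location, and start/end tags are hypothetical placeholders; only the SerDe and input-format class names come from the stack trace above and the Hive-XML-SerDe project:

```sql
-- Hypothetical DDL sketch; adjust columns, XPaths, location, and tags to your schema.
CREATE EXTERNAL TABLE xml_1040 (
  return_id STRING                                  -- placeholder column
)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
  "column.xpath.return_id" = "/Return/@id"          -- placeholder XPath mapping
)
STORED AS
  INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION '/tmp/STYN/data'
TBLPROPERTIES (
  "xmlinput.start" = "<Return",                     -- placeholder record delimiters
  "xmlinput.end"   = "</Return>"
);
```

Queries against this table work as expected when hive.execution.engine=mr; the failure above only appears with hive.execution.engine=spark.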
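Concretely, the classpath steps above looked roughly like this. This is a sketch under assumptions: the jar name, directory, and node names are placeholders, not the exact values from our cluster:

```shell
# Sketch of the steps tried; jar name, paths, and hosts are placeholders.
# 1. In Cloudera Manager -> Hive -> Configuration, set the auxiliary JARs
#    directory (hive.aux.jars.path) to a directory containing the SerDe jar,
#    e.g. /opt/hive/auxjars, and redeploy client configuration.
# 2. Copy the SerDe jar to that directory on every node:
for node in node1 node2 node3; do
  scp hivexmlserde.jar "$node":/opt/hive/auxjars/
done
# 3. The jar can also be registered per-session, which ships it with the job:
#      hive> ADD JAR /opt/hive/auxjars/hivexmlserde.jar;
```

Since the "class not found" is raised in the Spark executor, it looks like the jar reaches the Hive client but not the executors, which is the part I cannot explain.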