Hi,

We are using Cloudera CDH 5.7.0.

We have a use case that requires processing XML data, for which we are using the Hive XML SerDe: https://github.com/dvasilen/Hive-XML-SerDe

The XML SerDe works fine when the Hive execution engine is MapReduce.

When we enabled Hive on Spark to test performance, we ran into the following issue:
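For reference, the table definition follows the pattern from the SerDe's README; the table and column names below are illustrative placeholders, not our actual schema:

```sql
-- Illustrative DDL following the Hive-XML-SerDe README pattern;
-- table name, columns, and XPaths are placeholders.
CREATE TABLE xml_test (
    id STRING,
    name STRING
)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
    "column.xpath.id" = "/record/@id",
    "column.xpath.name" = "/record/name/text()"
)
STORED AS
    INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
TBLPROPERTIES (
    "xmlinput.start" = "<record",
    "xmlinput.end" = "</record>"
);
```

The `INPUTFORMAT` clause is what names `com.ibm.spss.hive.serde2.xml.XmlInputFormat`, the class the Spark executors fail to find below.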

16/06/23 12:47:45 INFO executor.CoarseGrainedExecutorBackend: Got
assigned task 3
16/06/23 12:47:45 INFO executor.Executor: Running task 0.3 in stage 0.0 (TID 3)
16/06/23 12:47:45 INFO rdd.HadoopRDD: Input split:
Paths:/tmp/STYN/data/1040_274316329.xml:0+7406,/tmp/STYN/data/1040__274316331.xml:0+7496InputFormatClass:
com.ibm.spss.hive.serde2.xml.XmlInputFormat

16/06/23 12:47:45 INFO exec.Utilities: PLAN PATH =
hdfs://devcdh/tmp/hive/yesh/c9554491-f58c-4472-b3c5-f47eb5722dd4/hive_2016-06-23_12-47-29_259_4396208623700590328-9/-mr-10003/c79302c5-6f16-4887-85b4-67e781a9ed97/map.xml
16/06/23 12:47:45 ERROR executor.Executor: Exception in task 0.3 in
stage 0.0 (TID 3)
java.io.IOException: java.lang.reflect.InvocationTargetException
        at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
        at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:265)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.<init>(HadoopShimsSecure.java:212)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getRecordReader(HadoopShimsSecure.java:332)
        at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:721)
        at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:237)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:208)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:251)
        ... 18 more
Caused by: java.io.IOException: CombineHiveRecordReader: class not found com.ibm.spss.hive.serde2.xml.XmlInputFormat
        at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:55)
        ... 23 more


I did the following to ensure that the XML SerDe is on the Hive classpath:

   - configured the Hive aux JARs path in Cloudera Manager
   - manually copied the JAR to all the nodes
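To make the attempted setup concrete, this is roughly the shape of the configuration involved; the JAR path and filename below are placeholders, not our actual location, and `hive.aux.jars.path` / Spark's executor `extraClassPath` are the standard properties for the MR and Spark sides respectively:

```shell
# Sketch only; /opt/serde/hivexmlserde.jar is a placeholder path.

# 1. Aux JARs path, as set through Cloudera Manager (applied to Hive config):
#      hive.aux.jars.path=/opt/serde/hivexmlserde.jar
#    This is what makes the SerDe visible to MapReduce tasks.

# 2. For Hive on Spark, the JAR presumably also needs to reach the Spark
#    executors, e.g. via the executor classpath:
#      spark.executor.extraClassPath=/opt/serde/hivexmlserde.jar
#    or by adding it inside the Hive session:
#      hive> ADD JAR /opt/serde/hivexmlserde.jar;
```

My understanding is that the driver resolves the class (the query plans fine) but the executors do not, which is why I suspect the JAR is not reaching the Spark executor classpath.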

I am unable to figure out the issue here; any pointers would be a great help.


Thanks,
-Yeshwanth
