[ https://issues.apache.org/jira/browse/HIVE-12828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101125#comment-15101125 ]
Rui Li commented on HIVE-12828:
-------------------------------

Looked at the log and the error is
{noformat}
2016-01-14T14:38:11,889 - 16/01/14 14:38:11 WARN TaskSetManager: Lost task 0.0 in stage 136.0 (TID 238, ip-10-233-128-9.us-west-1.compute.internal): java.io.IOException: java.lang.reflect.InvocationTargetException
2016-01-14T14:38:11,889 -     at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
2016-01-14T14:38:11,889 -     at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
2016-01-14T14:38:11,890 -     at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:269)
2016-01-14T14:38:11,890 -     at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.<init>(HadoopShimsSecure.java:216)
2016-01-14T14:38:11,890 -     at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getRecordReader(HadoopShimsSecure.java:343)
2016-01-14T14:38:11,890 -     at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:680)
2016-01-14T14:38:11,890 -     at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:237)
2016-01-14T14:38:11,890 -     at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:208)
2016-01-14T14:38:11,890 -     at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
2016-01-14T14:38:11,890 -     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
2016-01-14T14:38:11,890 -     at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
2016-01-14T14:38:11,890 -     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
2016-01-14T14:38:11,890 -     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
2016-01-14T14:38:11,890 -     at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
2016-01-14T14:38:11,890 -     at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:87)
2016-01-14T14:38:11,890 -     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
2016-01-14T14:38:11,890 -     at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
2016-01-14T14:38:11,890 -     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
2016-01-14T14:38:11,890 -     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
2016-01-14T14:38:11,890 -     at org.apache.spark.scheduler.Task.run(Task.scala:89)
2016-01-14T14:38:11,890 -     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
2016-01-14T14:38:11,890 -     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
2016-01-14T14:38:11,890 -     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
2016-01-14T14:38:11,890 -     at java.lang.Thread.run(Thread.java:744)
2016-01-14T14:38:11,890 - Caused by: java.lang.reflect.InvocationTargetException
2016-01-14T14:38:11,890 -     at sun.reflect.GeneratedConstructorAccessor29.newInstance(Unknown Source)
2016-01-14T14:38:11,890 -     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
2016-01-14T14:38:11,890 -     at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
2016-01-14T14:38:11,890 -     at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:255)
2016-01-14T14:38:11,890 -     ... 21 more
2016-01-14T14:38:11,891 - Caused by: java.lang.NoSuchMethodError: org.apache.parquet.schema.Types$MessageTypeBuilder.addFields([Lorg/apache/parquet/schema/Type;)Lorg/apache/parquet/schema/Types$BaseGroupBuilder;
2016-01-14T14:38:11,891 -     at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.getSchemaByName(DataWritableReadSupport.java:160)
2016-01-14T14:38:11,891 -     at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.init(DataWritableReadSupport.java:223)
2016-01-14T14:38:11,891 -     at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.getSplit(ParquetRecordReaderWrapper.java:248)
2016-01-14T14:38:11,891 -     at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:94)
2016-01-14T14:38:11,891 -     at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:80)
2016-01-14T14:38:11,891 -     at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:72)
2016-01-14T14:38:11,891 -     at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:67)
2016-01-14T14:38:11,891 -     ... 25 more
{noformat}
The missing method exists in parquet 1.8.1 (which Hive depends on) but not in 1.7.0 (which Spark bundles), so I think it's still running against the old Spark tarball. A quick way to verify which jar the executors actually pick up is sketched at the end of this message.

> Update Spark version to 1.6
> ---------------------------
>
>                 Key: HIVE-12828
>                 URL: https://issues.apache.org/jira/browse/HIVE-12828
>             Project: Hive
>          Issue Type: Task
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Rui Li
>         Attachments: HIVE-12828.1-spark.patch, HIVE-12828.2-spark.patch, HIVE-12828.2-spark.patch, HIVE-12828.2-spark.patch, HIVE-12828.2-spark.patch, mem.patch
>
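
For reference, one way to confirm which parquet jar is actually on the executor classpath is to print the code source of {{org.apache.parquet.schema.Types}} (the class whose 1.8.x-only {{addFields}} method is missing above). This is only a minimal diagnostic sketch, not part of any patch; the class name {{ParquetVersionCheck}} is made up for illustration, while the reflection calls are standard JDK APIs:

{code:java}
import org.apache.parquet.schema.Types;

public class ParquetVersionCheck {
    public static void main(String[] args) {
        // Prints the jar that the Parquet Types class was loaded from.
        // If this points at a parquet 1.7.0 jar inside the old Spark
        // assembly rather than Hive's parquet 1.8.1 dependency, the stale
        // tarball is winning on the classpath and the NoSuchMethodError
        // on addFields(Type[]) is expected.
        System.out.println(Types.class.getProtectionDomain()
                .getCodeSource().getLocation());
    }
}
{code}

Running this with the same classpath the executors get (or logging the same expression from inside a task) should show immediately whether the 1.7.0 jar from the old assembly is the one being loaded.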