We tried to cache a table with

    hiveCtx = HiveContext(sc)
    hiveCtx.cacheTable("table name")

as described in the Spark 1.3.1 documentation. We are on CDH 5.3.0 with Spark 1.3.1 built against Hadoop 2.6. The error below occurs whenever we try to cache this table, which is stored as Parquet with GZIP compression. We are not sure the table format is actually the culprit, since we can run SQL queries against the exact same table without problems; we only hoped cacheTable might speed things up a bit, because we query this table several times. Any advice is welcome. Thanks!
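For completeness, here is a minimal sketch of what our script does (the app name and table name below are stand-ins for our real ones):

    from pyspark import SparkContext
    from pyspark.sql import HiveContext

    sc = SparkContext(appName="CacheTableTest")  # placeholder app name
    hiveCtx = HiveContext(sc)

    # cacheTable is lazy in Spark 1.3: the table is only materialized in
    # the in-memory columnar store on the first scan, which is when the
    # exception below gets thrown.
    hiveCtx.cacheTable("tchart_0501_final")  # placeholder table name
    print(hiveCtx.sql("SELECT COUNT(*) FROM tchart_0501_final").collect())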
15/05/26 15:21:32 WARN scheduler.TaskSetManager: Lost task 227.0 in stage 0.0 (TID 278, f14ecats037): parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file hdfs://f14ecat/tmp/tchart_0501_final/part-r-1198.parquet
    at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:213)
    at parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:204)
    at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:143)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1.hasNext(InMemoryColumnarTableScan.scala:153)
    at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:248)
    at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:172)
    at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:79)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:242)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:64)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: parquet.io.ParquetDecodingException: The requested schema is not compatible with the file schema. incompatible types: optional binary dcqv_val (UTF8) != optional double dcqv_val
    at parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.incompatibleSchema(ColumnIOFactory.java:105)
    at parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.visit(ColumnIOFactory.java:97)
    at parquet.schema.PrimitiveType.accept(PrimitiveType.java:386)
    at parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.visitChildren(ColumnIOFactory.java:87)
    at parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.visit(ColumnIOFactory.java:61)
    at parquet.schema.MessageType.accept(MessageType.java:55)
    at parquet.io.ColumnIOFactory.getColumnIO(ColumnIOFactory.java:148)
    at parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:125)
    at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:193)
    ... 31 more
(snip: the identical ParquetDecodingException and stack trace were reported for many more tasks and files, e.g. part-r-1047.parquet, part-r-1123.parquet, part-r-1076.parquet, part-r-1043.parquet, part-r-1154.parquet, part-r-1083.parquet, part-r-1113.parquet, part-r-1208.parquet and others; only the distinct scheduler messages are kept below)
15/05/26 15:21:32 ERROR scheduler.TaskSetManager: Task 90 in stage 0.0 failed 4 times; aborting job
15/05/26 15:21:32 INFO cluster.YarnScheduler: Cancelling stage 0
15/05/26 15:21:32 INFO cluster.YarnScheduler: Stage 0 was cancelled
15/05/26 15:21:32 INFO scheduler.DAGScheduler: Stage 0 (mapPartitions at Exchange.scala:65) failed in 3.189 s
15/05/26 15:21:32 INFO scheduler.DAGScheduler: Job 0 failed: collect at /home/bdadm/SparkSQLTchart-1.3.py:19, took 4.255423 s
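The "Caused by" line suggests that the schema Spark gets from the Hive metastore (dcqv_val as a UTF8/string binary) disagrees with the schema written in the Parquet file footers (dcqv_val as a double). A quick way to compare the two would be something like the following (a sketch only, using the hiveCtx from above; the table name is a stand-in for our real one, and the path is the one from the error message):

    # Schema as Spark 1.3 sees the table through the Hive metastore.
    hiveCtx.table("tchart_0501_final").printSchema()

    # Schema embedded in the Parquet files themselves, read directly
    # and bypassing the metastore.
    hiveCtx.parquetFile("hdfs://f14ecat/tmp/tchart_0501_final").printSchema()

If the two printed schemas disagree on dcqv_val, that would line up with the exception above.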