What's the size of this table? Is the data skewed (so that speculation
is probably triggered)?
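If it helps, a quick way to eyeball skew from the shell is to count rows per key and look at the biggest groups. A rough sketch (the column name "basket_id" is just a guess, substitute one from your schema):

    // Rough skew check: rows per key value, biggest groups first.
    // "basket_id" is an assumed column name, not taken from your schema.
    import org.apache.spark.sql.functions.desc
    val df = qc.read.parquet("/user/sparkuser/data/staged/item_sales_basket_id.parquet")
    df.groupBy("basket_id").count().orderBy(desc("count")).show(20)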
Cheng
On 6/15/15 10:37 PM, Night Wolf wrote:
Hey Yin,
Thanks for the link to the JIRA; I'll add details to it. I am able to reproduce it, at least within the same shell session: every time I do a write, a random number of tasks fails on the first run with the NPE.
We're using dynamic allocation of executors in YARN mode. Speculative execution is not enabled.
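For reference, the relevant settings we run with look roughly like this in spark-defaults.conf (a sketch; exact values vary, and the external shuffle service is required for dynamic allocation):

    spark.dynamicAllocation.enabled  true
    spark.shuffle.service.enabled    true
    spark.speculation                false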
On Tue, Jun 16, 2015 at 3:11 PM, Yin Huai <yh...@databricks.com> wrote:
I saw it once, but it was not clear to me how to reproduce it. The JIRA I created is https://issues.apache.org/jira/browse/SPARK-7837. More information would be very helpful. Were those errors from speculative tasks or regular tasks (the first attempt of the task)? Is the error deterministic (can you reproduce it every time you run this command)?
Thanks,
Yin
On Mon, Jun 15, 2015 at 8:59 PM, Night Wolf <nightwolf...@gmail.com> wrote:
Looking at the executor logs, it looks like it fails to find the file; e.g. for task 10323.0:
15/06/16 13:43:13 ERROR output.FileOutputCommitter: Hit IOException trying to rename maprfs:///user/hive/warehouse/is_20150617_test2/_temporary/_attempt_201506161340_0000_m_010181_0/part-r-353626.gz.parquet to maprfs:/user/hive/warehouse/is_20150617_test2/part-r-353626.gz.parquet
java.io.IOException: Invalid source or target
    at com.mapr.fs.MapRFileSystem.rename(MapRFileSystem.java:952)
    at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:201)
    at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:225)
    at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:167)
    at org.apache.spark.mapred.SparkHadoopMapRedUtil$.performCommit$1(SparkHadoopMapRedUtil.scala:100)
    at org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:137)
    at org.apache.spark.sql.sources.BaseWriterContainer.commitTask(commands.scala:357)
    at org.apache.spark.sql.sources.DefaultWriterContainer.commitTask(commands.scala:394)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org$apache$spark$sql$sources$InsertIntoHadoopFsRelation$$writeRows$1(commands.scala:157)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
15/06/16 13:43:13 ERROR mapred.SparkHadoopMapRedUtil: Error committing the output of task: attempt_201506161340_0000_m_010181_0
java.io.IOException: Invalid source or target
    at com.mapr.fs.MapRFileSystem.rename(MapRFileSystem.java:952)
    at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:201)
    at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:225)
    at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:167)
    at org.apache.spark.mapred.SparkHadoopMapRedUtil$.performCommit$1(SparkHadoopMapRedUtil.scala:100)
    at org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:137)
    at org.apache.spark.sql.sources.BaseWriterContainer.commitTask(commands.scala:357)
    at org.apache.spark.sql.sources.DefaultWriterContainer.commitTask(commands.scala:394)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org$apache$spark$sql$sources$InsertIntoHadoopFsRelation$$writeRows$1(commands.scala:157)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
15/06/16 13:43:16 ERROR output.FileOutputCommitter: Hit IOException trying to rename maprfs:///user/hive/warehouse/is_20150617_test2/_temporary/_attempt_201506161341_0000_m_010323_0/part-r-353768.gz.parquet to maprfs:/user/hive/warehouse/is_20150617_test2/part-r-353768.gz.parquet
java.io.IOException: Invalid source or target
    at com.mapr.fs.MapRFileSystem.rename(MapRFileSystem.java:952)
    at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:201)
    at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:225)
    at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:167)
    at org.apache.spark.mapred.SparkHadoopMapRedUtil$.performCommit$1(SparkHadoopMapRedUtil.scala:100)
    at org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:137)
    at org.apache.spark.sql.sources.BaseWriterContainer.commitTask(commands.scala:357)
    at org.apache.spark.sql.sources.DefaultWriterContainer.commitTask(commands.scala:394)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org$apache$spark$sql$sources$InsertIntoHadoopFsRelation$$writeRows$1(commands.scala:157)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
15/06/16 13:43:16 ERROR mapred.SparkHadoopMapRedUtil: Error committing the output of task: attempt_201506161341_0000_m_010323_0
java.io.IOException: Invalid source or target
    at com.mapr.fs.MapRFileSystem.rename(MapRFileSystem.java:952)
    at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:201)
    at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:225)
    at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:167)
    at org.apache.spark.mapred.SparkHadoopMapRedUtil$.performCommit$1(SparkHadoopMapRedUtil.scala:100)
    at org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:137)
    at org.apache.spark.sql.sources.BaseWriterContainer.commitTask(commands.scala:357)
    at org.apache.spark.sql.sources.DefaultWriterContainer.commitTask(commands.scala:394)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org$apache$spark$sql$sources$InsertIntoHadoopFsRelation$$writeRows$1(commands.scala:157)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
15/06/16 13:43:20 INFO codec.CodecConfig: Compression: GZIP
15/06/16 13:43:20 INFO hadoop.ParquetOutputFormat: Parquet block size to 134217728
15/06/16 13:43:20 INFO hadoop.ParquetOutputFormat: Parquet page size to 1048576
15/06/16 13:43:20 INFO hadoop.ParquetOutputFormat: Parquet dictionary page size to 1048576
15/06/16 13:43:20 INFO hadoop.ParquetOutputFormat: Dictionary is on
15/06/16 13:43:20 INFO hadoop.ParquetOutputFormat: Validation is off
15/06/16 13:43:20 INFO hadoop.ParquetOutputFormat: Writer version is: PARQUET_1_0
15/06/16 13:43:20 INFO codec.CodecConfig: Compression: GZIP
15/06/16 13:43:20 INFO codec.CodecConfig: Compression: GZIP
15/06/16 13:43:20 INFO hadoop.ParquetOutputFormat: Parquet block size to 134217728
15/06/16 13:43:20 INFO hadoop.ParquetOutputFormat: Parquet block size to 134217728
15/06/16 13:43:20 INFO hadoop.ParquetOutputFormat: Parquet page size to 1048576
15/06/16 13:43:20 INFO hadoop.ParquetOutputFormat: Parquet page size to 1048576
15/06/16 13:43:20 INFO hadoop.ParquetOutputFormat: Parquet dictionary page size to 1048576
15/06/16 13:43:20 INFO hadoop.ParquetOutputFormat: Parquet dictionary page size to 1048576
15/06/16 13:43:20 INFO hadoop.ParquetOutputFormat: Dictionary is on
15/06/16 13:43:20 INFO hadoop.ParquetOutputFormat: Dictionary is on
15/06/16 13:43:20 INFO hadoop.ParquetOutputFormat: Validation is off
15/06/16 13:43:20 INFO hadoop.ParquetOutputFormat: Validation is off
15/06/16 13:43:20 INFO hadoop.ParquetOutputFormat: Writer version is: PARQUET_1_0
15/06/16 13:43:20 INFO hadoop.ParquetOutputFormat: Writer version is: PARQUET_1_0
15/06/16 13:43:20 ERROR fs.MapRFileSystem: Failed to delete path maprfs:/user/hive/warehouse/is_20150617_test2/_temporary/_attempt_201506161340_0000_m_010181_0, error: No such file or directory (2)
15/06/16 13:43:21 ERROR sources.DefaultWriterContainer: Task attempt attempt_201506161340_0000_m_010181_0 aborted.
15/06/16 13:43:21 ERROR sources.InsertIntoHadoopFsRelation: Aborting task.
java.lang.RuntimeException: Failed to commit task
    at org.apache.spark.sql.sources.DefaultWriterContainer.commitTask(commands.scala:398)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org$apache$spark$sql$sources$InsertIntoHadoopFsRelation$$writeRows$1(commands.scala:157)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Invalid source or target
    at com.mapr.fs.MapRFileSystem.rename(MapRFileSystem.java:952)
    at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:201)
    at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:225)
    at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:167)
    at org.apache.spark.mapred.SparkHadoopMapRedUtil$.performCommit$1(SparkHadoopMapRedUtil.scala:100)
    at org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:137)
    at org.apache.spark.sql.sources.BaseWriterContainer.commitTask(commands.scala:357)
    at org.apache.spark.sql.sources.DefaultWriterContainer.commitTask(commands.scala:394)
    ... 9 more
15/06/16 13:43:21 ERROR fs.MapRFileSystem: Failed to delete path maprfs:/user/hive/warehouse/is_20150617_test2/_temporary/_attempt_201506161341_0000_m_010323_0, error: No such file or directory (2)
15/06/16 13:43:21 INFO compress.CodecPool: Got brand-new compressor [.gz]
15/06/16 13:43:21 INFO compress.CodecPool: Got brand-new compressor [.gz]
15/06/16 13:43:21 INFO compress.CodecPool: Got brand-new compressor [.gz]
15/06/16 13:43:21 INFO hadoop.InternalParquetRecordReader: at row 0. reading next block
15/06/16 13:43:21 INFO hadoop.InternalParquetRecordReader: at row 0. reading next block
15/06/16 13:43:21 INFO hadoop.InternalParquetRecordReader: at row 0. reading next block
15/06/16 13:43:21 INFO hadoop.InternalParquetRecordReader: block read in memory in 124 ms. row count = 998525
15/06/16 13:43:21 INFO hadoop.InternalParquetRecordReader: block read in memory in 201 ms. row count = 983534
15/06/16 13:43:21 INFO hadoop.InternalParquetRecordReader: block read in memory in 217 ms. row count = 970355
15/06/16 13:43:22 ERROR sources.DefaultWriterContainer: Task attempt attempt_201506161341_0000_m_010323_0 aborted.
15/06/16 13:43:22 ERROR sources.InsertIntoHadoopFsRelation: Aborting task.
java.lang.RuntimeException: Failed to commit task
    at org.apache.spark.sql.sources.DefaultWriterContainer.commitTask(commands.scala:398)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org$apache$spark$sql$sources$InsertIntoHadoopFsRelation$$writeRows$1(commands.scala:157)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Invalid source or target
    at com.mapr.fs.MapRFileSystem.rename(MapRFileSystem.java:952)
    at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:201)
    at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:225)
    at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:167)
    at org.apache.spark.mapred.SparkHadoopMapRedUtil$.performCommit$1(SparkHadoopMapRedUtil.scala:100)
    at org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:137)
    at org.apache.spark.sql.sources.BaseWriterContainer.commitTask(commands.scala:357)
    at org.apache.spark.sql.sources.DefaultWriterContainer.commitTask(commands.scala:394)
    ... 9 more
15/06/16 13:43:22 ERROR fs.MapRFileSystem: Failed to delete path maprfs:/user/hive/warehouse/is_20150617_test2/_temporary/_attempt_201506161341_0000_m_010323_0, error: No such file or directory (2)
15/06/16 13:43:22 ERROR fs.MapRFileSystem: Failed to delete path maprfs:/user/hive/warehouse/is_20150617_test2/_temporary/_attempt_201506161340_0000_m_010181_0, error: No such file or directory (2)
15/06/16 13:43:22 ERROR sources.DefaultWriterContainer: Task attempt attempt_201506161341_0000_m_010323_0 aborted.
15/06/16 13:43:22 ERROR sources.DefaultWriterContainer: Task attempt attempt_201506161340_0000_m_010181_0 aborted.
15/06/16 13:43:22 ERROR executor.Executor: Exception in task 10323.0 in stage 0.0 (TID 8896)
java.lang.NullPointerException
    at parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:146)
    at parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:112)
    at parquet.hadoop.ParquetRecordWriter.close(ParquetRecordWriter.java:73)
    at org.apache.spark.sql.parquet.ParquetOutputWriter.close(newParquet.scala:116)
    at org.apache.spark.sql.sources.DefaultWriterContainer.abortTask(commands.scala:404)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org$apache$spark$sql$sources$InsertIntoHadoopFsRelation$$writeRows$1(commands.scala:160)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
15/06/16 13:43:22 ERROR executor.Executor: Exception in task 10181.0 in stage 0.0 (TID 8835)
java.lang.NullPointerException
    at parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:146)
    at parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:112)
    at parquet.hadoop.ParquetRecordWriter.close(ParquetRecordWriter.java:73)
    at org.apache.spark.sql.parquet.ParquetOutputWriter.close(newParquet.scala:116)
    at org.apache.spark.sql.sources.DefaultWriterContainer.abortTask(commands.scala:404)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org$apache$spark$sql$sources$InsertIntoHadoopFsRelation$$writeRows$1(commands.scala:160)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
15/06/16 13:43:22 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 9552
15/06/16 13:43:22 INFO executor.Executor: Running task 11093.0 in stage 0.0 (TID 9552)
15/06/16 13:43:22 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 9553
15/06/16 13:43:22 INFO executor.Executor: Running task 10323.1 in stage 0.0 (TID 9553)
On Tue, Jun 16, 2015 at 1:47 PM, Night Wolf <nightwolf...@gmail.com> wrote:
Hi guys,
Using Spark 1.4, I'm trying to save a DataFrame as a table (a really simple test), but I'm getting a bunch of NPEs. The code I'm running is very simple:
qc.read.parquet("/user/sparkuser/data/staged/item_sales_basket_id.parquet").write.format("parquet").saveAsTable("is_20150617_test2")
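(qc is our SQL context in the shell, presumably a HiveContext given the warehouse path. Spelled out over a couple of lines, the same thing is:)

    // Same one-liner split up; qc is presumably a HiveContext, since
    // saveAsTable writes into the Hive warehouse.
    val df = qc.read.parquet("/user/sparkuser/data/staged/item_sales_basket_id.parquet")
    df.write.format("parquet").saveAsTable("is_20150617_test2")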
Logs of the lost tasks:
[Stage 0:=================================> (8771 + 450) / 13000]15/06/16 03:42:30 WARN TaskSetManager: Lost task 10681.0 in stage 0.0 (TID 8757, qtausc-pphd0146): java.lang.NullPointerException
    at parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:146)
    at parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:112)
    at parquet.hadoop.ParquetRecordWriter.close(ParquetRecordWriter.java:73)
    at org.apache.spark.sql.parquet.ParquetOutputWriter.close(newParquet.scala:116)
    at org.apache.spark.sql.sources.DefaultWriterContainer.abortTask(commands.scala:404)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org$apache$spark$sql$sources$InsertIntoHadoopFsRelation$$writeRows$1(commands.scala:160)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
[Stage 0:==================================> (9006 + 490) / 13000]15/06/16 03:43:22 WARN TaskSetManager: Lost task 10323.0 in stage 0.0 (TID 8896, qtausc-pphd0167): java.lang.NullPointerException
    at parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:146)
    at parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:112)
    at parquet.hadoop.ParquetRecordWriter.close(ParquetRecordWriter.java:73)
    at org.apache.spark.sql.parquet.ParquetOutputWriter.close(newParquet.scala:116)
    at org.apache.spark.sql.sources.DefaultWriterContainer.abortTask(commands.scala:404)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org$apache$spark$sql$sources$InsertIntoHadoopFsRelation$$writeRows$1(commands.scala:160)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:132)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)