You can try setting the Hadoop conf "parquet.enable.summary-metadata" to
false to disable writing the Parquet summary files (_metadata and
_common_metadata).
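For example, something along these lines should work in 1.5.x (an
untested sketch; the DataFrame, partition columns, and output path below
are made up):

    // Disable the Parquet summary files before writing; the key is read
    // by parquet-mr's output committer when the job is committed.
    sc.hadoopConfiguration.set("parquet.enable.summary-metadata", "false")

    df.write
      .partitionBy("year", "month")       // hypothetical partition columns
      .parquet("hdfs:///path/to/output")  // hypothetical output path

With the flag off, the commit phase should skip building
_metadata/_common_metadata, so the driver no longer has to collect every
part-file footer in memory.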
By default, Parquet writes the summary files by collecting the footers of
all part-files in the dataset while committing the job, and Spark follows
this convention. However, it turns out that the summary files aren't very
useful in practice nowadays, unless you have other downstream tools that
strictly depend on them. For example, if you don't need schema merging,
Spark simply picks a random part-file to discover the dataset schema. If
you do need schema merging, Spark has to read the footers of all
part-files anyway (but in a distributed, parallel way).
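If you do want merged schemas when reading the data back, as of 1.5 you
have to ask for it explicitly (spark.sql.parquet.mergeSchema now defaults
to false), e.g. something like:

    // Read back with schema merging; footers are read in a distributed job.
    val merged = sqlContext.read
      .option("mergeSchema", "true")
      .parquet("hdfs:///path/to/output")  // hypothetical path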
Cheng
On 12/3/15 6:11 AM, Don Drake wrote:
Does anyone have any suggestions on creating a large number of Parquet
files? Especially with regard to the last phase, where it creates the
_metadata.
Thanks.
-Don
On Sat, Nov 28, 2015 at 9:02 AM, Don Drake <dondr...@gmail.com> wrote:
I have a 2TB dataset in a DataFrame that I am attempting to partition
by two fields, and my YARN job appears to write the partitioned
dataset successfully. I can see the output in HDFS once all Spark
tasks are done.
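The write is essentially the following (paths and column names are
placeholders):

    // Spark 1.5.2 (Scala): partitioned write that triggers the
    // metadata commit at the end of the job.
    df.write
      .partitionBy("fieldA", "fieldB")    // the two partition fields
      .parquet("hdfs:///path/to/output")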
After the Spark tasks are done, the job keeps running for over an
hour, until I get the following (full stack trace below):
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at org.apache.parquet.format.converter.ParquetMetadataConverter.toParquetStatistics(ParquetMetadataConverter.java:238)
I had set the driver memory to 20GB.
I attempted to read the partitioned dataset back in and got another
error saying the /_metadata directory was not a Parquet file. I
removed the _metadata directory and was able to query the data, but
partition pruning did not appear to happen when I filtered the data
(it read every directory).
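The read back looked roughly like this (again with placeholder names):

    // Reading the partitioned dataset; the filter on the partition
    // column was expected to prune directories, but every one was scanned.
    val df = sqlContext.read.parquet("hdfs:///path/to/output")
    df.filter(df("fieldA") === "someValue").count()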
This is Spark 1.5.2, and I saw the same problem when running the code
in both Scala and Python.
Any suggestions are appreciated.
-Don
15/11/25 00:00:19 ERROR datasources.InsertIntoHadoopFsRelation: Aborting job.
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at org.apache.parquet.format.converter.ParquetMetadataConverter.toParquetStatistics(ParquetMetadataConverter.java:238)
    at org.apache.parquet.format.converter.ParquetMetadataConverter.addRowGroup(ParquetMetadataConverter.java:167)
    at org.apache.parquet.format.converter.ParquetMetadataConverter.toParquetMetadata(ParquetMetadataConverter.java:79)
    at org.apache.parquet.hadoop.ParquetFileWriter.serializeFooter(ParquetFileWriter.java:405)
    at org.apache.parquet.hadoop.ParquetFileWriter.writeMetadataFile(ParquetFileWriter.java:433)
    at org.apache.parquet.hadoop.ParquetFileWriter.writeMetadataFile(ParquetFileWriter.java:423)
    at org.apache.parquet.hadoop.ParquetOutputCommitter.writeMetaDataFile(ParquetOutputCommitter.java:58)
    at org.apache.parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:48)
    at org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitJob(WriterContainer.scala:208)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelation.scala:151)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply(InsertIntoHadoopFsRelation.scala:108)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply(InsertIntoHadoopFsRelation.scala:108)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:56)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation.run(InsertIntoHadoopFsRelation.scala:108)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:57)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:57)
    at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:69)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:140)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:138)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:138)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:933)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:933)
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:197)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:146)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:137)
    at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:304)
    at com.dondrake.qra.ScalaApp$.main(ScalaApp.scala:53)
    at com.dondrake.qra.ScalaApp.main(ScalaApp.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
15/11/25 00:00:20 ERROR actor.ActorSystemImpl: exception on LARS' timer thread
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:409)
    at akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375)
    at java.lang.Thread.run(Thread.java:745)
15/11/25 00:00:20 ERROR akka.ErrorMonitor: exception on LARS' timer thread
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:409)
    at akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375)
    at java.lang.Thread.run(Thread.java:745)
15/11/25 00:00:20 INFO actor.ActorSystemImpl: starting new LARS thread
15/11/25 00:00:20 ERROR akka.ErrorMonitor: Uncaught fatal error from thread [sparkDriver-scheduler-1] shutting down ActorSystem [sparkDriver]
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:409)
    at akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375)
    at java.lang.Thread.run(Thread.java:745)
15/11/25 00:00:20 ERROR actor.ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-scheduler-1] shutting down ActorSystem [sparkDriver]
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:409)
    at akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375)
    at java.lang.Thread.run(Thread.java:745)
15/11/25 00:00:20 WARN akka.AkkaRpcEndpointRef: Error sending message [message = BlockManagerHeartbeat(BlockManagerId(1453, dd1067.dondrake.com, 42479))] in 1 attempts
org.apache.spark.rpc.RpcTimeoutException: Recipient[Actor[akka://sparkDriver/user/BlockManagerMaster#1347881120]] had already been terminated.. This timeout is controlled by BlockManagerHeartbeat
    at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcEnv.scala:214)
    at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcEnv.scala:229)
    at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcEnv.scala:225)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
    at scala.util.Failure$$anonfun$recover$1.apply(Try.scala:185)
    at scala.util.Try$.apply(Try.scala:161)
    at scala.util.Failure.recover(Try.scala:185)
    at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:324)
    at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:324)
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
    at org.spark-project.guava.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293)
    at scala.concurrent.impl.ExecutionContextImpl$$anon$1.execute(ExecutionContextImpl.scala:133)
    at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
    at scala.concurrent.impl.Promise$DefaultPromise.scala$concurrent$impl$Promise$DefaultPromise$$dispatchOrAddCallback(Promise.scala:280)
    at scala.concurrent.impl.Promise$DefaultPromise.onComplete(Promise.scala:270)
    at scala.concurrent.Future$class.recover(Future.scala:324)
    at scala.concurrent.impl.Promise$DefaultPromise.recover(Promise.scala:153)
    at org.apache.spark.rpc.akka.AkkaRpcEndpointRef.ask(AkkaRpcEnv.scala:319)
    at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:100)
    at org.apache.spark.scheduler.DAGScheduler.executorHeartbeatReceived(DAGScheduler.scala:194)
    at org.apache.spark.scheduler.TaskSchedulerImpl.executorHeartbeatReceived(TaskSchedulerImpl.scala:386)
    at org.apache.spark.HeartbeatReceiver$$anonfun$receiveAndReply$1$$anon$2$$anonfun$run$2.apply$mcV$sp(HeartbeatReceiver.scala:128)
    at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1185)
    at org.apache.spark.HeartbeatReceiver$$anonfun$receiveAndReply$1$$anon$2.run(HeartbeatReceiver.scala:127)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: akka.pattern.AskTimeoutException: Recipient[Actor[akka://sparkDriver/user/BlockManagerMaster#1347881120]] had already been terminated.
    at akka.pattern.AskableActorRef$.ask$extension(AskSupport.scala:132)
    at org.apache.spark.rpc.akka.AkkaRpcEndpointRef.ask(AkkaRpcEnv.scala:307)
    ... 13 more
15/11/25 00:00:20 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
15/11/25 00:00:20 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
15/11/25 00:00:20 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
15/11/25 00:00:20 ERROR datasources.DynamicPartitionWriterContainer: Job job_201511242138_0000 aborted.
Exception in thread "main" org.apache.spark.SparkException: Job aborted.
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelation.scala:156)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply(InsertIntoHadoopFsRelation.scala:108)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply(InsertIntoHadoopFsRelation.scala:108)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:56)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation.run(InsertIntoHadoopFsRelation.scala:108)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:57)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:57)
    at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:69)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:140)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:138)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:138)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:933)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:933)
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:197)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:146)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:137)
    at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:304)
    at com.dondrake.qra.ScalaApp$.main(ScalaApp.scala:53)
    at com.dondrake.qra.ScalaApp.main(ScalaApp.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:674)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
    at org.apache.parquet.format.converter.ParquetMetadataConverter.toParquetStatistics(ParquetMetadataConverter.java:238)
    at org.apache.parquet.format.converter.ParquetMetadataConverter.addRowGroup(ParquetMetadataConverter.java:167)
    at org.apache.parquet.format.converter.ParquetMetadataConverter.toParquetMetadata(ParquetMetadataConverter.java:79)
    at org.apache.parquet.hadoop.ParquetFileWriter.serializeFooter(ParquetFileWriter.java:405)
    at org.apache.parquet.hadoop.ParquetFileWriter.writeMetadataFile(ParquetFileWriter.java:433)
    at org.apache.parquet.hadoop.ParquetFileWriter.writeMetadataFile(ParquetFileWriter.java:423)
    at org.apache.parquet.hadoop.ParquetOutputCommitter.writeMetaDataFile(ParquetOutputCommitter.java:58)
    at org.apache.parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:48)
    at org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitJob(WriterContainer.scala:208)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelation.scala:151)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply(InsertIntoHadoopFsRelation.scala:108)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply(InsertIntoHadoopFsRelation.scala:108)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:56)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation.run(InsertIntoHadoopFsRelation.scala:108)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:57)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:57)
    at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:69)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:140)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:138)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:138)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:933)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:933)
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:197)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:146)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:137)
    at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:304)
    at com.dondrake.qra.ScalaApp$.main(ScalaApp.scala:53)
    at com.dondrake.qra.ScalaApp.main(ScalaApp.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
15/11/25 00:00:20 INFO spark.SparkContext: Invoking stop() from shutdown hook
15/11/25 00:00:20 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/static/sql,null}
15/11/25 00:00:20 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/SQL/execution/json,null}
15/11/25 00:00:20 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/SQL/execution,null}
15/11/25 00:00:20 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/SQL/json,null}
15/11/25 00:00:20 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/SQL,null}
15/11/25 00:00:20 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/metrics/json,null}
15/11/25 00:00:20 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
15/11/25 00:00:20 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/api,null}
15/11/25 00:00:20 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/,null}
15/11/25 00:00:20 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/static,null}
15/11/25 00:00:20 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
15/11/25 00:00:20 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump,null}
15/11/25 00:00:20 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/json,null}
15/11/25 00:00:20 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors,null}
15/11/25 00:00:20 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment/json,null}
15/11/25 00:00:20 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment,null}
15/11/25 00:00:20 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd/json,null}
15/11/25 00:00:20 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd,null}
15/11/25 00:00:20 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/json,null}
15/11/25 00:00:20 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage,null}
15/11/25 00:00:20 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool/json,null}
15/11/25 00:00:20 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool,null}
15/11/25 00:00:20 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/json,null}
15/11/25 00:00:20 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage,null}
15/11/25 00:00:20 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/json,null}
15/11/25 00:00:20 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages,null}
15/11/25 00:00:20 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job/json,null}
15/11/25 00:00:20 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job,null}
15/11/25 00:00:20 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/json,null}
15/11/25 00:00:20 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs,null}
15/11/25 00:00:20 INFO ui.SparkUI: Stopped Spark web UI at http://10.195.208.41:4040
15/11/25 00:00:20 INFO scheduler.DAGScheduler: Stopping DAGScheduler
15/11/25 00:00:20 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
15/11/25 00:00:20 INFO cluster.YarnClientSchedulerBackend: Interrupting monitor thread
15/11/25 00:00:20 WARN akka.AkkaRpcEndpointRef: Error sending message [message = StopExecutors] in 1 attempts
org.apache.spark.rpc.RpcTimeoutException: Recipient[Actor[akka://sparkDriver/user/CoarseGrainedScheduler#1432624242]] had already been terminated.. This timeout is controlled by spark.network.timeout
    at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcEnv.scala:214)
    at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcEnv.scala:229)
    at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcEnv.scala:225)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
    at scala.util.Failure$$anonfun$recover$1.apply(Try.scala:185)
    at scala.util.Try$.apply(Try.scala:161)
    at scala.util.Failure.recover(Try.scala:185)
    at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:324)
    at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:324)
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
    at org.spark-project.guava.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293)
    at scala.concurrent.impl.ExecutionContextImpl$$anon$1.execute(ExecutionContextImpl.scala:133)
    at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
    at scala.concurrent.impl.Promise$DefaultPromise.scala$concurrent$impl$Promise$DefaultPromise$$dispatchOrAddCallback(Promise.scala:280)
    at scala.concurrent.impl.Promise$DefaultPromise.onComplete(Promise.scala:270)
    at scala.concurrent.Future$class.recover(Future.scala:324)
    at scala.concurrent.impl.Promise$DefaultPromise.recover(Promise.scala:153)
    at org.apache.spark.rpc.akka.AkkaRpcEndpointRef.ask(AkkaRpcEnv.scala:319)
    at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:100)
    at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:77)
    at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.stopExecutors(CoarseGrainedSchedulerBackend.scala:274)
    at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.stop(CoarseGrainedSchedulerBackend.scala:283)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.stop(YarnClientSchedulerBackend.scala:180)
    at org.apache.spark.scheduler.TaskSchedulerImpl.stop(TaskSchedulerImpl.scala:439)
    at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1439)
    at org.apache.spark.SparkContext$$anonfun$stop$7.apply$mcV$sp(SparkContext.scala:1724)
    at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1185)
    at org.apache.spark.SparkContext.stop(SparkContext.scala:1723)
    at org.apache.spark.SparkContext$$anonfun$3.apply$mcV$sp(SparkContext.scala:587)
    at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:264)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:234)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:234)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:234)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:234)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:234)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:234)
    at scala.util.Try$.apply(Try.scala:161)
    at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:234)
    at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:216)
    at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
Caused by: akka.pattern.AskTimeoutException: Recipient[Actor[akka://sparkDriver/user/CoarseGrainedScheduler#1432624242]] had already been terminated.
    at akka.pattern.AskableActorRef$.ask$extension(AskSupport.scala:132)
    at org.apache.spark.rpc.akka.AkkaRpcEndpointRef.ask(AkkaRpcEnv.scala:307)
    ... 23 more
--
Donald Drake
Drake Consulting
http://www.drakeconsulting.com/
https://twitter.com/dondrake
800-733-2143
--
Donald Drake
Drake Consulting
http://www.drakeconsulting.com/
https://twitter.com/dondrake
800-733-2143