OK, solved. Looks like breathing the spark-summit SFO air for 3 days helped
a lot!
Piping the 7 million records to local disk still runs out of memory, so I piped
the results into another Hive table instead. I can live with that :-)
/opt/cloudera/parcels/CDH/lib/spark/bin/spark-sql -e "use aers; create table
unique_aers_demo as select distinct isr,event_dt,age,age_cod,sex,year,quarter
from aers.aers_demo_view " --driver-memory 4G --total-executor-cores 12
--executor-memory 4G
thanks
From: Sanjay Subramanian <[email protected]>
To: "[email protected]" <[email protected]>
Sent: Thursday, June 11, 2015 8:43 AM
Subject: spark-sql from CLI --->EXCEPTION: java.lang.OutOfMemoryError: Java
heap space
hey guys
I use Hive and Impala intensively every day and want to transition to spark-sql
in CLI mode.
Currently in my sandbox I am using Spark (standalone mode) from the CDH
distribution (starving developer version 5.3.3):
- 3-datanode Hadoop cluster
- 32 GB RAM per node
- 8 cores per node
- Spark 1.2.0+cdh5.3.3+371
I am testing some stuff on one view and getting memory errors. A possible reason
is that the default memory per executor shown on 18080 is 512M.
These options, when used to start the spark-sql CLI, do not seem to have any
effect: --total-executor-cores 12 --executor-memory 4G
/opt/cloudera/parcels/CDH/lib/spark/bin/spark-sql -e "select distinct
isr,event_dt,age,age_cod,sex,year,quarter from aers.aers_demo_view"
aers.aers_demo_view (7 million+ records)
===================
isr       bigint   case id
event_dt  bigint   Event date
age       double   age of patient
age_cod   string   days,months years
sex       string   M or F
year      int
quarter   int
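A rough back-of-envelope check of why pulling every distinct row back into the driver is tight on a small heap. This is only a sketch: the per-value byte sizes are assumptions (ballpark figures for boxed objects on a 64-bit JVM, not measured on this cluster), and the row count is just the 7 million from the view description, since the real number of distinct rows is not known here.

```python
# Rough estimate of the heap needed to hold the query result as boxed JVM objects.
# Every byte size below is an ASSUMPTION for illustration, not a measurement.

ROWS = 7_000_000  # upper bound: the view holds 7 million+ records

# Assumed cost in bytes of one boxed value per column type.
ASSUMED_BYTES = {
    "bigint": 24,  # java.lang.Long
    "double": 24,  # java.lang.Double
    "int": 16,     # java.lang.Integer
    "string": 56,  # short java.lang.String plus its backing array
}

# Column types as listed for aers.aers_demo_view.
COLUMNS = ["bigint", "bigint", "double", "string", "string", "int", "int"]

# Assumed per-row overhead: row object + array header + one reference per column.
ASSUMED_ROW_OVERHEAD = 16 + 16 + 4 * len(COLUMNS)


def estimate_bytes(rows):
    """Return the estimated bytes needed to hold `rows` result rows."""
    per_row = ASSUMED_ROW_OVERHEAD + sum(ASSUMED_BYTES[c] for c in COLUMNS)
    return rows * per_row


total = estimate_bytes(ROWS)
print(f"~{total / 2**30:.2f} GiB for {ROWS:,} rows")
print("fits in 512M:", total < 512 * 2**20)
```

Under these assumed sizes the result alone is several times larger than a 512M heap, before counting serialization buffers or the strings printed by the CLI. Writing the result into another table keeps the rows on the cluster instead of returning them to the driver.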
VIEW DEFINITION
================
CREATE VIEW `aers.aers_demo_view` AS SELECT
`isr` AS `isr`, `event_dt` AS `event_dt`, `age` AS `age`, `age_cod` AS `age_cod`,
`gndr_cod` AS `sex`, `year` AS `year`, `quarter` AS `quarter`
FROM (
  SELECT `aers_demo_v1`.`isr`, `aers_demo_v1`.`event_dt`, `aers_demo_v1`.`age`, `aers_demo_v1`.`age_cod`, `aers_demo_v1`.`gndr_cod`, `aers_demo_v1`.`year`, `aers_demo_v1`.`quarter`
  FROM `aers`.`aers_demo_v1`
  UNION ALL
  SELECT `aers_demo_v2`.`isr`, `aers_demo_v2`.`event_dt`, `aers_demo_v2`.`age`, `aers_demo_v2`.`age_cod`, `aers_demo_v2`.`gndr_cod`, `aers_demo_v2`.`year`, `aers_demo_v2`.`quarter`
  FROM `aers`.`aers_demo_v2`
  UNION ALL
  SELECT `aers_demo_v3`.`isr`, `aers_demo_v3`.`event_dt`, `aers_demo_v3`.`age`, `aers_demo_v3`.`age_cod`, `aers_demo_v3`.`gndr_cod`, `aers_demo_v3`.`year`, `aers_demo_v3`.`quarter`
  FROM `aers`.`aers_demo_v3`
  UNION ALL
  SELECT `aers_demo_v4`.`isr`, `aers_demo_v4`.`event_dt`, `aers_demo_v4`.`age`, `aers_demo_v4`.`age_cod`, `aers_demo_v4`.`gndr_cod`, `aers_demo_v4`.`year`, `aers_demo_v4`.`quarter`
  FROM `aers`.`aers_demo_v4`
  UNION ALL
  SELECT `aers_demo_v5`.`primaryid` AS `ISR`, `aers_demo_v5`.`event_dt`, `aers_demo_v5`.`age`, `aers_demo_v5`.`age_cod`, `aers_demo_v5`.`gndr_cod`, `aers_demo_v5`.`year`, `aers_demo_v5`.`quarter`
  FROM `aers`.`aers_demo_v5`
  UNION ALL
  SELECT `aers_demo_v6`.`primaryid` AS `ISR`, `aers_demo_v6`.`event_dt`, `aers_demo_v6`.`age`, `aers_demo_v6`.`age_cod`, `aers_demo_v6`.`sex` AS `GNDR_COD`, `aers_demo_v6`.`year`, `aers_demo_v6`.`quarter`
  FROM `aers`.`aers_demo_v6`
) `aers_demo_view`
15/06/11 08:36:36 WARN DefaultChannelPipeline: An exception was thrown by a user handler while handling an exception event ([id: 0x01b99855, /10.0.0.19:58117 => /10.0.0.19:52016] EXCEPTION: java.lang.OutOfMemoryError: Java heap space)
java.lang.OutOfMemoryError: Java heap space
    at org.jboss.netty.buffer.HeapChannelBuffer.<init>(HeapChannelBuffer.java:42)
    at org.jboss.netty.buffer.BigEndianHeapChannelBuffer.<init>(BigEndianHeapChannelBuffer.java:34)
    at org.jboss.netty.buffer.ChannelBuffers.buffer(ChannelBuffers.java:134)
    at org.jboss.netty.buffer.HeapChannelBufferFactory.getBuffer(HeapChannelBufferFactory.java:68)
    at org.jboss.netty.buffer.AbstractChannelBufferFactory.getBuffer(AbstractChannelBufferFactory.java:48)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.newCumulationBuffer(FrameDecoder.java:507)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.updateCumulation(FrameDecoder.java:345)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:312)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
    at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
15/06/11 08:36:40 ERROR Utils: Uncaught exception in thread task-result-getter-0
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.lang.Long.valueOf(Long.java:577)
    at com.esotericsoftware.kryo.serializers.DefaultSerializers$LongSerializer.read(DefaultSerializers.java:113)
    at com.esotericsoftware.kryo.serializers.DefaultSerializers$LongSerializer.read(DefaultSerializers.java:103)
    at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
    at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:338)
    at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:293)
    at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:651)
    at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
    at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
    at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
    at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:338)
    at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:293)
    at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
    at org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:171)
    at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79)
    at org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:558)
    at org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:352)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:80)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:49)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:49)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1468)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:48)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
15/06/11 08:36:38 ERROR ActorSystemImpl: exception on LARS’ timer thread
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at akka.dispatch.AbstractNodeQueue.<init>(AbstractNodeQueue.java:19)
    at akka.actor.LightArrayRevolverScheduler$TaskQueue.<init>(Scheduler.scala:431)
    at akka.actor.LightArrayRevolverScheduler$$anon$12.nextTick(Scheduler.scala:397)
    at akka.actor.LightArrayRevolverScheduler$$anon$12.run(Scheduler.scala:363)
    at java.lang.Thread.run(Thread.java:745)
Exception in thread "task-result-getter-0" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.lang.Long.valueOf(Long.java:577)
    at com.esotericsoftware.kryo.serializers.DefaultSerializers$LongSerializer.read(DefaultSerializers.java:113)
    at com.esotericsoftware.kryo.serializers.DefaultSerializers$LongSerializer.read(DefaultSerializers.java:103)
    at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
    at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:338)
    at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:293)
    at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:651)
    at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
    at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
    at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
    at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:338)
    at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:293)
    at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
    at org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:171)
    at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79)
    at org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:558)
    at org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:352)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:80)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:49)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:49)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1468)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:48)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
15/06/11 08:36:41 ERROR ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-scheduler-1] shutting down ActorSystem [sparkDriver]
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at akka.dispatch.AbstractNodeQueue.<init>(AbstractNodeQueue.java:19)
    at akka.actor.LightArrayRevolverScheduler$TaskQueue.<init>(Scheduler.scala:431)
    at akka.actor.LightArrayRevolverScheduler$$anon$12.nextTick(Scheduler.scala:397)
    at akka.actor.LightArrayRevolverScheduler$$anon$12.run(Scheduler.scala:363)
    at java.lang.Thread.run(Thread.java:745)
15/06/11 08:36:46 ERROR ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-4] shutting down ActorSystem [sparkDriver]
java.lang.OutOfMemoryError: GC overhead limit exceeded
15/06/11 08:36:46 ERROR SparkSQLDriver: Failed in [select distinct isr,event_dt,age,age_cod,sex,year,quarter from aers.aers_demo_view]
org.apache.spark.SparkException: Job cancelled because SparkContext was shut down
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:702)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:701)
    at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
    at org.apache.spark.scheduler.DAGScheduler.cleanUpAfterSchedulerStop(DAGScheduler.scala:701)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessActor.postStop(DAGScheduler.scala:1428)
    at akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:201)
    at akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:163)
    at akka.actor.ActorCell.terminate(ActorCell.scala:338)
    at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:431)
    at akka.actor.ActorCell.systemInvoke(ActorCell.scala:447)
    at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:262)
    at akka.dispatch.Mailbox.run(Mailbox.scala:218)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
15/06/11 08:36:51 WARN DefaultChannelPipeline: An exception was thrown by a user handler while handling an exception event ([id: 0x79935a9b, /10.0.0.35:54028 => /10.0.0.19:52016] EXCEPTION: java.lang.OutOfMemoryError: Java heap space)
java.lang.OutOfMemoryError: Java heap space
15/06/11 08:36:52 ERROR ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-5] shutting down ActorSystem [sparkDriver]
java.lang.OutOfMemoryError: Java heap space
15/06/11 08:36:53 WARN DefaultChannelPipeline: An exception was thrown by a user handler while handling an exception event ([id: 0xcb8c4b5d, /10.0.0.18:46744 => /10.0.0.19:52016] EXCEPTION: java.lang.OutOfMemoryError: Java heap space)
java.lang.OutOfMemoryError: Java heap space
15/06/11 08:36:56 WARN NioEventLoop: Unexpected exception in the selector loop.
java.lang.OutOfMemoryError: GC overhead limit exceeded
15/06/11 08:36:57 ERROR ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-18] shutting down ActorSystem [sparkDriver]
java.lang.OutOfMemoryError: GC overhead limit exceeded
15/06/11 08:36:58 ERROR Utils: Uncaught exception in thread task-result-getter-3
java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "task-result-getter-3" java.lang.OutOfMemoryError: GC overhead limit exceeded
15/06/11 08:37:01 ERROR ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-4] shutting down ActorSystem [sparkDriver]
java.lang.OutOfMemoryError: Java heap space
Time taken: 70.982 seconds
15/06/11 08:37:06 WARN QueuedThreadPool: 4 threads could not be stopped
15/06/11 08:37:11 ERROR MapOutputTrackerMaster: Error communicating with MapOutputTracker
akka.pattern.AskTimeoutException: Recipient[Actor[akka://sparkDriver/user/MapOutputTracker#-2109395547]] had already been terminated.
    at akka.pattern.AskableActorRef$.ask$extension(AskSupport.scala:134)
    at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:111)
    at org.apache.spark.MapOutputTracker.sendTracker(MapOutputTracker.scala:122)
    at org.apache.spark.MapOutputTrackerMaster.stop(MapOutputTracker.scala:330)
    at org.apache.spark.SparkEnv.stop(SparkEnv.scala:83)
    at org.apache.spark.SparkContext.stop(SparkContext.scala:1210)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.stop(SparkSQLEnv.scala:66)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$$anon$1.run(SparkSQLCLIDriver.scala:107)
Exception in thread "Thread-3" org.apache.spark.SparkException: Error communicating with MapOutputTracker
    at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:116)
    at org.apache.spark.MapOutputTracker.sendTracker(MapOutputTracker.scala:122)
    at org.apache.spark.MapOutputTrackerMaster.stop(MapOutputTracker.scala:330)
    at org.apache.spark.SparkEnv.stop(SparkEnv.scala:83)
    at org.apache.spark.SparkContext.stop(SparkContext.scala:1210)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.stop(SparkSQLEnv.scala:66)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$$anon$1.run(SparkSQLCLIDriver.scala:107)
Caused by: akka.pattern.AskTimeoutException: Recipient[Actor[akka://sparkDriver/user/MapOutputTracker#-2109395547]] had already been terminated.
    at akka.pattern.AskableActorRef$.ask$extension(AskSupport.scala:134)
    at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:111)