Could you have a very large key, perhaps a token value? I love the RDD API, but for joins I have found that DataFrame/Dataset performs better. Maybe you can do the joins with that instead?
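Just as a rough sketch (assuming your two filtered inputs are DataFrames named left and right with a common join column "key" -- those names are mine, not from your job), something like this would let you check for a hot key and do the join without dropping to RDDs:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

// Sketch only: `left` and `right` stand in for the two already-filtered
// DataFrames in your job, and "key" for the common join column.
def diagnoseAndJoin(left: DataFrame, right: DataFrame): DataFrame = {
  // 1) Look for a skewed / very large join key: one hot key can blow past
  //    a single task's memory even when the overall data is modest.
  left.groupBy("key")
    .count()
    .orderBy(desc("count"))
    .show(20, false)

  // 2) Join the DataFrames directly instead of converting to RDDs, so the
  //    shuffle and join run through the SQL engine's managed memory.
  left.join(right, Seq("key"))
}

If the counts show one or two dominant keys, that is usually the task that blows past the container limit.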
On Thu, Aug 11, 2016 at 7:41 PM, Muttineni, Vinay <vmuttin...@ebay.com> wrote:
> Hello,
>
> I have a Spark job that basically reads data from two tables into two DataFrames, which are subsequently converted to RDDs. I then join them based on a common key.
>
> Each table is about 10 TB in size, but after filtering the two RDDs are about 500 GB each.
>
> I have 800 executors with 8 GB memory per executor.
>
> Everything works fine until the join stage, but the join stage is throwing the below error.
>
> I tried increasing the partitions before the join stage but it doesn't change anything.
>
> Any ideas how I can fix this and what I might be doing wrong?
>
> Thanks,
> Vinay
>
> ExecutorLostFailure (executor 208 exited caused by one of the running tasks) Reason: Container marked as failed: container_1469773002212_96618_01_000246 on host:. Exit status: 143. Diagnostics: Container [pid=31872,containerID=container_1469773002212_96618_01_000246] is running beyond physical memory limits. Current usage: 15.2 GB of 15.1 GB physical memory used; 15.9 GB of 31.8 GB virtual memory used. Killing container.
>
> Dump of the process-tree for container_1469773002212_96618_01_000246 :
> |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
> |- 31883 31872 31872 31872 (java) 519517 41888 17040175104 3987193 /usr/java/latest/bin/java -server -XX:OnOutOfMemoryError=kill %p -Xms14336m -Xmx14336m -Djava.io.tmpdir=/hadoop/11/scratch/local/usercache/appcache/application_1469773002212_96618/container_1469773002212_96618_01_000246/tmp -Dspark.driver.port=32988 -Dspark.ui.port=0 -Dspark.akka.frameSize=256 -Dspark.yarn.app.container.log.dir=/hadoop/12/scratch/logs/application_1469773002212_96618/container_1469773002212_96618_01_000246 -XX:MaxPermSize=256m org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@10.12.7.4:32988 --executor-id 208 --hostname x.com --cores 11 --app-id application_1469773002212_96618 --user-class-path file:/hadoop/11/scratch/local/usercache/appcache/application_1469773002212_96618/container_1469773002212_96618_01_000246/__app__.jar --user-class-path file:/hadoop/11/scratch/local/usercache/appcache/application_1469773002212_96618/container_1469773002212_96618_01_000246/mysql-connector-java-5.0.8-bin.jar --user-class-path file:/hadoop/11/scratch/local/usercache/appcache/application_1469773002212_96618/container_1469773002212_96618_01_000246/datanucleus-core-3.2.10.jar --user-class-path file:/hadoop/11/scratch/local/usercache/appcache/application_1469773002212_96618/container_1469773002212_96618_01_000246/datanucleus-api-jdo-3.2.6.jar --user-class-path file:/hadoop/11/scratch/local/usercache/appcache/application_1469773002212_96618/container_1469773002212_96618_01_000246/datanucleus-rdbms-3.2.9.jar
> |- 31872 16580 31872 31872 (bash) 0 0 9146368 267 /bin/bash -c LD_LIBRARY_PATH=/apache/hadoop/lib/native:/apache/hadoop/lib/native/Linux-amd64-64: /usr/java/latest/bin/java -server -XX:OnOutOfMemoryError='kill %p' -Xms14336m -Xmx14336m -Djava.io.tmpdir=/hadoop/11/scratch/local/usercache/appcache/application_1469773002212_96618/container_1469773002212_96618_01_000246/tmp '-Dspark.driver.port=32988' '-Dspark.ui.port=0' '-Dspark.akka.frameSize=256' -Dspark.yarn.app.container.log.dir=/hadoop/12/scratch/logs/application_1469773002212_96618/container_1469773002212_96618_01_000246 -XX:MaxPermSize=256m org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@1.4.1.6:32988 --executor-id 208 --hostname x.com --cores 11 --app-id application_1469773002212_96618 --user-class-path file:/hadoop/11/scratch/local/usercache/appcache/application_1469773002212_96618/container_1469773002212_96618_01_000246/__app__.jar --user-class-path file:/hadoop/11/scratch/local/usercache/appcache/application_1469773002212_96618/container_1469773002212_96618_01_000246/mysql-connector-java-5.0.8-bin.jar --user-class-path file:/hadoop/11/scratch/local/usercache/appcache/application_1469773002212_96618/container_1469773002212_96618_01_000246/datanucleus-core-3.2.10.jar --user-class-path file:/hadoop/11/scratch/local/usercache/appcache/application_1469773002212_96618/container_1469773002212_96618_01_000246/datanucleus-api-jdo-3.2.6.jar --user-class-path file:/hadoop/11/scratch/local/usercache/appcache/application_1469773002212_96618/container_1469773002212_96618_01_000246/datanucleus-rdbms-3.2.9.jar 1> /hadoop/12/scratch/logs/application_1469773002212_96618/container_1469773002212_96618_01_000246/stdout 2> /hadoop/12/scratch/logs/application_1469773002212_96618/container_1469773002212_96618_01_000246/stderr
>
> Container killed on request. Exit code is 143
> Container exited with a non-zero exit code 143