Could you have a very large key, perhaps a token value? I love the RDD API, but for joins I have found that DataFrame/Dataset performs better. Maybe you can do the joins with that instead?
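Just as a rough sketch (assuming your two filtered inputs are DataFrames named left and right with a common join column "key" -- those names are mine, not from your job), something like this would let you check for a hot key and do the join without dropping to RDDs:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

// Sketch only: `left` and `right` stand in for the two already-filtered
// DataFrames in your job, and "key" for the common join column.
def diagnoseAndJoin(left: DataFrame, right: DataFrame): DataFrame = {
  // 1) Look for a skewed / very large join key: one hot key can blow past
  //    a single task's memory even when the overall data is modest.
  left.groupBy("key")
    .count()
    .orderBy(desc("count"))
    .show(20, false)

  // 2) Join the DataFrames directly instead of converting to RDDs, so the
  //    shuffle and join run through the SQL engine's managed memory.
  left.join(right, Seq("key"))
}

If the counts show one or two dominant keys, that is usually the task that blows past the container limit.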
On Thu, Aug 11, 2016 at 7:41 PM, Muttineni, Vinay <vmuttin...@ebay.com> wrote:
> Hello,
>
> I have a Spark job that basically reads data from two tables into two DataFrames, which are subsequently converted to RDDs. I then join them based on a common key.
>
> Each table is about 10 TB in size, but after filtering the two RDDs are about 500 GB each.
>
> I have 800 executors with 8 GB memory per executor.
>
> Everything works fine until the join stage, but the join stage is throwing the below error.
>
> I tried increasing the partitions before the join stage but it doesn't change anything.
>
> Any ideas how I can fix this and what I might be doing wrong?
>
> Thanks,
> Vinay
>
> ExecutorLostFailure (executor 208 exited caused by one of the running tasks) Reason: Container marked as failed: container_1469773002212_96618_01_000246 on host:. Exit status: 143. Diagnostics: Container [pid=31872,containerID=container_1469773002212_96618_01_000246] is running beyond physical memory limits. Current usage: 15.2 GB of 15.1 GB physical memory used; 15.9 GB of 31.8 GB virtual memory used. Killing container.
>
> Dump of the process-tree for container_1469773002212_96618_01_000246 :
> |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
> |- 31883 31872 31872 31872 (java) 519517 41888 17040175104 3987193 /usr/java/latest/bin/java -server -XX:OnOutOfMemoryError=kill %p -Xms14336m -Xmx14336m -Djava.io.tmpdir=/hadoop/11/scratch/local/usercache/appcache/application_1469773002212_96618/container_1469773002212_96618_01_000246/tmp -Dspark.driver.port=32988 -Dspark.ui.port=0 -Dspark.akka.frameSize=256 -Dspark.yarn.app.container.log.dir=/hadoop/12/scratch/logs/application_1469773002212_96618/container_1469773002212_96618_01_000246 -XX:MaxPermSize=256m org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@10.12.7.4:32988 --executor-id 208 --hostname x.com --cores 11 --app-id application_1469773002212_96618 --user-class-path file:/hadoop/11/scratch/local/usercache/appcache/application_1469773002212_96618/container_1469773002212_96618_01_000246/__app__.jar --user-class-path file:/hadoop/11/scratch/local/usercache/appcache/application_1469773002212_96618/container_1469773002212_96618_01_000246/mysql-connector-java-5.0.8-bin.jar --user-class-path file:/hadoop/11/scratch/local/usercache/appcache/application_1469773002212_96618/container_1469773002212_96618_01_000246/datanucleus-core-3.2.10.jar --user-class-path file:/hadoop/11/scratch/local/usercache/appcache/application_1469773002212_96618/container_1469773002212_96618_01_000246/datanucleus-api-jdo-3.2.6.jar --user-class-path file:/hadoop/11/scratch/local/usercache/appcache/application_1469773002212_96618/container_1469773002212_96618_01_000246/datanucleus-rdbms-3.2.9.jar
> |- 31872 16580 31872 31872 (bash) 0 0 9146368 267 /bin/bash -c LD_LIBRARY_PATH=/apache/hadoop/lib/native:/apache/hadoop/lib/native/Linux-amd64-64: /usr/java/latest/bin/java -server -XX:OnOutOfMemoryError='kill %p' -Xms14336m -Xmx14336m -Djava.io.tmpdir=/hadoop/11/scratch/local/usercache/appcache/application_1469773002212_96618/container_1469773002212_96618_01_000246/tmp '-Dspark.driver.port=32988' '-Dspark.ui.port=0' '-Dspark.akka.frameSize=256' -Dspark.yarn.app.container.log.dir=/hadoop/12/scratch/logs/application_1469773002212_96618/container_1469773002212_96618_01_000246 -XX:MaxPermSize=256m org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@1.4.1.6:32988 --executor-id 208 --hostname x.com --cores 11 --app-id application_1469773002212_96618 --user-class-path file:/hadoop/11/scratch/local/usercache/appcache/application_1469773002212_96618/container_1469773002212_96618_01_000246/__app__.jar --user-class-path file:/hadoop/11/scratch/local/usercache/appcache/application_1469773002212_96618/container_1469773002212_96618_01_000246/mysql-connector-java-5.0.8-bin.jar --user-class-path file:/hadoop/11/scratch/local/usercache/appcache/application_1469773002212_96618/container_1469773002212_96618_01_000246/datanucleus-core-3.2.10.jar --user-class-path file:/hadoop/11/scratch/local/usercache/appcache/application_1469773002212_96618/container_1469773002212_96618_01_000246/datanucleus-api-jdo-3.2.6.jar --user-class-path file:/hadoop/11/scratch/local/usercache/appcache/application_1469773002212_96618/container_1469773002212_96618_01_000246/datanucleus-rdbms-3.2.9.jar 1> /hadoop/12/scratch/logs/application_1469773002212_96618/container_1469773002212_96618_01_000246/stdout 2> /hadoop/12/scratch/logs/application_1469773002212_96618/container_1469773002212_96618_01_000246/stderr
>
> Container killed on request. Exit code is 143
> Container exited with a non-zero exit code 143