Priya: Have you checked the executor logs on hostname1 and hostname2?

Cheers
On Thu, May 26, 2016 at 8:00 PM, Takeshi Yamamuro <linguin....@gmail.com> wrote:

> Hi,
>
> If your job gets stuck or fails, one of the best practices is to increase
> the number of partitions.
> Also, you'd be better off using DataFrame instead of RDD in terms of join
> optimization.
>
> // maropu
>
>
> On Thu, May 26, 2016 at 11:40 PM, Priya Ch <learnings.chitt...@gmail.com>
> wrote:
>
>> Hello Team,
>>
>> I am trying to join 2 RDDs, where one is 800 MB and the
>> other is 190 MB. During the join step, my job halts and I don't see
>> progress in the execution.
>>
>> This is the message I see on the console:
>>
>> INFO spark.MapOutputTrackerMasterEndPoint: Asked to send map output
>> locations for shuffle 0 to <hostname1>:40000
>> INFO spark.MapOutputTrackerMasterEndPoint: Asked to send map output
>> locations for shuffle 1 to <hostname2>:40000
>>
>> After these messages, I don't see any progress. I am using Spark 1.6.0
>> and the YARN scheduler (running in YARN client mode). My cluster
>> configuration is a 3-node cluster (1 master and 2 slaves). Each slave has
>> 1 TB hard disk space, 300 GB memory and 32 cores.
>>
>> HDFS block size is 128 MB.
>>
>> Thanks,
>> Padma Ch
>>
>
>
> --
> ---
> Takeshi Yamamuro
>
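
A rough way to size the "increase the number of partitions" suggestion, using only the figures quoted in the thread. This is a back-of-the-envelope sketch, not a diagnosis: it assumes both RDDs were read from HDFS with roughly one partition per 128 MB block (the thread does not say how they were created), that all 32 cores on each of the 2 slaves are available to executors, and that the common rule of thumb of 2 to 3 tasks per core applies. The helper names `input_splits` and `target_partitions` are made up for illustration.

```python
import math


def input_splits(size_mb: float, block_mb: float = 128) -> int:
    """Approximate partition count for a file read from HDFS,
    assuming one partition per block (an assumption, see above)."""
    return math.ceil(size_mb / block_mb)


def target_partitions(total_cores: int, tasks_per_core: int = 3) -> int:
    """Rule-of-thumb partition count: a few tasks per available core."""
    return total_cores * tasks_per_core


# Figures taken from the original post.
large = input_splits(800)          # partitions for the 800 MB RDD
small = input_splits(190)          # partitions for the 190 MB RDD
cores = 2 * 32                     # 2 slaves, 32 cores each (assumed all usable)
target = target_partitions(cores)  # suggested partition count for the join

print(f"default splits: {large} and {small}; cores: {cores}; target: {target}")
```

Under these assumptions the default partitioning would leave most of the 64 cores idle during the join and make each shuffle task comparatively large, which is one plausible reason more partitions could help. If that turns out to be the case, the count can be raised with `repartition(n)` on the RDDs or by passing a partition count to `join`; the executor logs remain the place to confirm what is actually stalling.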