RE: Lost TID: Loss was due to fetch failure from BlockManagerId

2014-07-03 Thread Mohammed Guller

Re: Lost TID: Loss was due to fetch failure from BlockManagerId

2014-07-01 Thread Mayur Rustagi
It could be because you are out of memory on the worker nodes & blocks are not getting registered.. An older issue with 0.6.0 was that dead nodes caused loss of tasks & then resubmission of data in an infinite loop... It was fixed in 0.7.0 though. Are you seeing a crash log in this log.. or in the wor
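If worker memory is the suspect, the standalone-mode memory settings are the first thing to check; a minimal sketch of conf/spark-env.sh on each worker, with illustrative values (the actual sizes must fit the machines in the cluster):

```shell
# conf/spark-env.sh on each worker node -- values here are illustrative
export SPARK_WORKER_MEMORY=8g     # total memory this worker may grant to executors
export SPARK_EXECUTOR_MEMORY=4g   # heap given to each executor for the app
```

The executor heap has to fit under the worker's allowance, or the application will not be able to launch executors on that node.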

Re: Lost TID: Loss was due to fetch failure from BlockManagerId

2014-07-01 Thread Yana Kadiyska
A lot of things can get funny when you run distributed as opposed to local -- e.g. some jar not making it over. Do you see anything of interest in the log on the executor machines -- I'm guessing 192.168.222.152/192.168.222.164. From here https://github.com/apache/spark/blob/master/core/src/main/sc
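One common cause of the "some jar not making it over" problem is not passing dependency jars at submit time; a minimal sketch, assuming submission via spark-submit (host, paths, and the main class name below are hypothetical):

```shell
# Sketch only: ship application dependency jars to the executors so
# classes resolve on the worker nodes. All paths are illustrative.
spark-submit \
  --master spark://master-host:7077 \
  --class com.example.CassandraCount \
  --jars /path/to/cassandra-driver.jar,/path/to/other-deps.jar \
  /path/to/app.jar
```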

Lost TID: Loss was due to fetch failure from BlockManagerId

2014-07-01 Thread Mohammed Guller
I am running Spark 1.0 on a 4-node standalone Spark cluster (1 master + 3 workers). Our app fetches data from Cassandra and does a basic filter, map, and countByKey on that data. I have run into a strange problem: even if the number of rows in Cassandra is just 1M, the Spark job seems t
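The pipeline described above (filter, then map, then countByKey) can be sketched on plain Scala collections to show the shape of the computation; the data, keys, and filter predicate below are stand-ins, since the original Cassandra schema is not shown, and a local Seq replaces the RDD:

```scala
// Stand-in for rows fetched from Cassandra: (key, value) pairs.
val rows = Seq(("a", 1), ("b", 2), ("a", 3), ("c", -1))

// filter -> map -> countByKey, as on an RDD, but on a local Seq.
// countByKey is emulated by grouping on the key and taking group sizes.
val counts: Map[String, Long] =
  rows
    .filter { case (_, v) => v >= 0 }            // hypothetical predicate: drop negative values
    .map { case (k, v) => (k, v) }               // identity here; the real job would transform rows
    .groupBy(_._1)
    .map { case (k, vs) => (k, vs.size.toLong) }
```

On an actual RDD the same chain would end in `countByKey()`, which returns the per-key counts to the driver.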