Re: Heartbeat exceeds

2014-04-05 Thread Andrew Or
> rix so doing similar to ALS - lots of work on the worker nodes and keep timing out. Tried a few updates to akka frame sizes, timeouts and blockmanager but unable to complete. Will try the blockmanagerslaves property now and let you know the effect. That property doesn't appear to be documented on the site though. Cheers! Richard -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Heartbeat-exceeds-tp3798p3809.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Heartbeat exceeds

2014-04-05 Thread Debasish Das
> and keep timing out. Tried a few updates to akka frame sizes, timeouts and blockmanager but unable to complete. Will try the blockmanagerslaves property now and let you know the effect. That property doesn't appear to be documented on t

Re: Heartbeat exceeds

2014-04-05 Thread Debasish Das
> now the effect. That property doesn't appear to be documented on the site though. Cheers! Richard

Re: Heartbeat exceeds

2014-04-05 Thread azurecoder
w and let you know the effect. That property doesn't appear to be documented on the site though. Cheers! Richard

Re: Heartbeat exceeds

2014-04-05 Thread Debasish Das
@patrick I think there is a bug... when this timeout happens, I suddenly see some negative ms numbers in the Spark UI. I tried to send a pic showing the negative ms numbers but it was rejected by the mailing list... I will send it to your gmail... From the archive I saw some more suggestions: >> It se

Re: Heartbeat exceeds

2014-04-04 Thread Patrick Wendell
If you look in the Spark UI, do you see any garbage collection happening? My best guess is that some of the executors are going into GC and they are timing out. You can manually increase the timeout by setting the Spark conf: spark.storage.blockManagerSlaveTimeoutMs to a higher value. In your cas

Heartbeat exceeds

2014-04-04 Thread Debasish Das
Hi, In my ALS runs I am noticing messages that complain about heart beats:
14/04/04 20:43:09 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(17, machine1, 53419, 0) with no recent heart beats: 48476ms exceeds 45000ms
14/04/04 20:43:09 WARN BlockManagerMasterActor: Removing Bloc
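For anyone comparing these warnings against the timeout setting mentioned in the reply above, here is a rough sketch that pulls the observed heartbeat gap and the configured limit out of the log text. It assumes the warning format is exactly as quoted in this message; the function names and the "double the largest gap" heuristic are hypothetical illustrations, not anything Spark provides or recommends.

```python
import re

# Matches the BlockManagerMasterActor warning as quoted in this thread.
# Assumption: other Spark versions may word the message differently.
HEARTBEAT_RE = re.compile(
    r"Removing BlockManager BlockManagerId\((?P<executor>[^,]+), "
    r"(?P<host>[^,]+), (?P<port>\d+), \d+\) "
    r"with no recent heart beats: (?P<gap_ms>\d+)ms exceeds (?P<limit_ms>\d+)ms"
)


def parse_heartbeat_warnings(log_text):
    """Return one dict per heartbeat-timeout warning found in log_text."""
    return [
        {
            "executor": m.group("executor"),
            "host": m.group("host"),
            "gap_ms": int(m.group("gap_ms")),
            "limit_ms": int(m.group("limit_ms")),
        }
        for m in HEARTBEAT_RE.finditer(log_text)
    ]


def suggest_timeout_ms(warnings, headroom=2.0):
    """Hypothetical heuristic: scale the largest observed gap by `headroom`."""
    if not warnings:
        return None
    return int(max(w["gap_ms"] for w in warnings) * headroom)


# The complete warning line from the message above.
sample = (
    "14/04/04 20:43:09 WARN BlockManagerMasterActor: Removing BlockManager "
    "BlockManagerId(17, machine1, 53419, 0) with no recent heart beats: "
    "48476ms exceeds 45000ms"
)

warnings = parse_heartbeat_warnings(sample)
suggested = suggest_timeout_ms(warnings)

# Print a candidate value for the property named in the reply above.
print(f"spark.storage.blockManagerSlaveTimeoutMs={suggested}")
```

This only helps pick a candidate value. If long GC pauses are the underlying cause, as suggested in the reply, a larger timeout hides the symptom rather than fixing it.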