You are quite right. I am getting this error while profiling my module to find the minimum resources I can use and still meet my SLA. My point is that if a resource constraint can trigger this problem, then the issue is just waiting to happen at larger scale as well (though with a lower probability).

I am hoping for some guidance on which parameter I can tune to avoid this issue entirely. I am guessing spark.shuffle.io.preferDirectBufs = false, but I am not sure.
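To make the question concrete, this is roughly what I am planning to try next. The keys are from the Spark configuration documentation, but the values are my own guesses rather than a verified fix:

import org.apache.spark.SparkConf

// Candidate settings to relax the timeout shown in the stack trace below.
// The values are guesses, not a verified fix.
val conf = new SparkConf()
  .setAppName("mapwithstate-app")                    // placeholder app name
  .set("spark.rpc.askTimeout", "300s")               // the timeout named in the error (currently 120s)
  .set("spark.network.timeout", "300s")              // general network timeout the RPC timeouts fall back to
  .set("spark.shuffle.io.preferDirectBufs", "false") // the setting I was guessing at above

The same keys could also be passed with --conf on spark-submit instead of being set in code.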
...Manas

On Tue, Mar 15, 2016 at 2:30 PM, Iain Cundy <iain.cu...@amdocs.com> wrote:

> Hi Manas
>
> I saw a very similar problem while using mapWithState: a timeout on a
> BlockManager remove leading to a stall.
>
> In my case it only occurred when there was a big backlog of micro-batches,
> combined with a shortage of memory. The adding and removing of blocks
> between new and old tasks was interleaved. I don't really know what caused
> it. Once I fixed the problems that were causing the backlog – in my case
> state compaction not working with Kryo in 1.6.0 (with the Kryo workaround
> rather than the patch) – I've never seen it again.
>
> So if you've got a backlog or another issue to fix, maybe you'll get lucky
> too :)
>
> Cheers
> Iain
>
> *From:* manas kar [mailto:poorinsp...@gmail.com]
> *Sent:* 15 March 2016 14:49
> *To:* Ted Yu
> *Cc:* user
> *Subject:* [MARKETING] Re: mapwithstate Hangs with Error cleaning broadcast
>
> I am using Spark 1.6.
> I am not using any broadcast variable.
> This broadcast variable is probably used by the state management of
> mapWithState.
>
> ...Manas
>
> On Tue, Mar 15, 2016 at 10:40 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>
> Which version of Spark are you using?
>
> Can you show the code snippet w.r.t. the broadcast variable?
>
> Thanks
>
> On Tue, Mar 15, 2016 at 6:04 AM, manasdebashiskar <poorinsp...@gmail.com> wrote:
>
> Hi,
> I have a streaming application that takes data from a Kafka topic and uses
> mapWithState.
> After a couple of hours of smooth running, I see a problem that seems to
> have stalled my application.
> The batch seems to be stuck after the following error popped up.
> Has anyone seen this error or does anyone know what causes it?
>
> 14/03/2016 21:41:13,295 ERROR org.apache.spark.ContextCleaner: 95 - Error cleaning broadcast 7456
> org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 seconds].
> This timeout is controlled by spark.rpc.askTimeout
>   at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
>   at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
>   at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
>   at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
>   at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76)
>   at org.apache.spark.storage.BlockManagerMaster.removeBroadcast(BlockManagerMaster.scala:136)
>   at org.apache.spark.broadcast.TorrentBroadcast$.unpersist(TorrentBroadcast.scala:228)
>   at org.apache.spark.broadcast.TorrentBroadcastFactory.unbroadcast(TorrentBroadcastFactory.scala:45)
>   at org.apache.spark.broadcast.BroadcastManager.unbroadcast(BroadcastManager.scala:67)
>   at org.apache.spark.ContextCleaner.doCleanupBroadcast(ContextCleaner.scala:233)
>   at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:189)
>   at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:180)
>   at scala.Option.foreach(Option.scala:236)
>   at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:180)
>   at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1180)
>   at org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:173)
>   at org.apache.spark.ContextCleaner$$anon$3.run(ContextCleaner.scala:68)
> Caused by: java.util.concurrent.TimeoutException: Futures timed out after [120 seconds]
>   at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
>   at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
>   at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
>   at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>   at scala.concurrent.Await$.result(package.scala:107)
>   at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
>   ... 12 more
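Since Ted asked earlier in the thread for the code snippet: the job is shaped roughly like the sketch below. The broker, topic, checkpoint path, and state function are simplified placeholders rather than the real code, and there is no explicit broadcast variable anywhere in it.

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, State, StateSpec, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object MapWithStateJob {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("mapwithstate-job")   // placeholder name
    val ssc = new StreamingContext(conf, Seconds(30))           // placeholder batch interval
    ssc.checkpoint("/tmp/mapwithstate-checkpoint")              // placeholder checkpoint dir

    // Direct Kafka stream of (key, value) strings; broker and topic are placeholders.
    val kafkaParams = Map("metadata.broker.list" -> "broker:9092")
    val messages = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("events"))

    // Running count per key; the real state function is more involved.
    def trackState(key: String, value: Option[String], state: State[Long]): (String, Long) = {
      val count = state.getOption.getOrElse(0L) + 1L
      state.update(count)
      (key, count)
    }

    val counts = messages.mapWithState(StateSpec.function(trackState _))
    counts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}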