Hi all, Thanks for the answers, yes, my problem was I was using just one worker with one core, so it was starving and then I never get the job to run, now it seems it's working properly.
One question, is this information in the docs? (because maybe I misread it) On Wed, Jul 1, 2015 at 10:30 AM, <[email protected]> wrote: > Spark streaming needs at least two threads on the worker/slave side. I > have seen this issue when(to test the behavior), I set the thread count for > spark streaming to 1. It should be atleast 2: one for the receiver > adapter(kafka, flume etc) and the second for processing the data. > > > > But I tested that in local mode: “--master local[2] “. The same issue > could happen in worker also. If you set “--master local[1] “ the streaming > worker/slave blocks due to starvation. > > > > Which conf parameter sets the worker thread count in cluster mode ? is it > spark.akka.threads ? > > > > *From:* Tathagata Das [mailto:[email protected]] > *Sent:* 01 July 2015 01:32 > *To:* Borja Garrido Bear > *Cc:* user > *Subject:* Re: Spark streaming on standalone cluster > > > > How many receivers do you have in the streaming program? You have to have > more numbers of core in reserver by your spar application than the number > of receivers. That would explain the receiving output after stopping. > > > > TD > > > > On Tue, Jun 30, 2015 at 7:59 AM, Borja Garrido Bear <[email protected]> > wrote: > > Hi all, > > > > I'm running a spark standalone cluster with one master and one slave > (different machines and both in version 1.4.0), the thing is I have a spark > streaming job that gets data from Kafka, and the just prints it. > > > > To configure the cluster I just started the master and then the slaves > pointing to it, as everything appears in the web interface I assumed > everything was fine, but maybe I missed some configuration. > > > > When I run it locally there is no problem, it works. > > When I run it in the cluster the worker state appears as "loading" > > - If the job is a Scala one, when I stop it I receive all the output > > - If the job is Python, when I stop it I receive a bunch of these > exceptions > > > > > \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ > > > > ERROR JobScheduler: Error running job streaming job 1435675420000 ms.0 > > py4j.Py4JException: An exception was raised by the Python Proxy. Return > Message: null > > at py4j.Protocol.getReturnValue(Protocol.java:417) > > at py4j.reflection.PythonProxyHandler.invoke(PythonProxyHandler.java:113) > > at com.sun.proxy.$Proxy14.call(Unknown Source) > > at > org.apache.spark.streaming.api.python.TransformFunction.apply(PythonDStream.scala:63) > > at > org.apache.spark.streaming.api.python.PythonDStream$$anonfun$callForeachRDD$1.apply(PythonDStream.scala:156) > > at > org.apache.spark.streaming.api.python.PythonDStream$$anonfun$callForeachRDD$1.apply(PythonDStream.scala:156) > > at > org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:42) > > at > org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:40) > > at > org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:40) > > at > org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:399) > > at > org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:40) > > at > org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40) > > at > org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40) > > at scala.util.Try$.apply(Try.scala:161) > > at org.apache.spark.streaming.scheduler.Job.run(Job.scala:34) > > at > org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:193) > > at > org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:193) > > at > org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:193) > > at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57) > > at > org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:192) > > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > > at java.lang.Thread.run(Thread.java:745) > > > > > \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ > > > > Is there any known issue with spark streaming and the standalone mode? or > with Python? > > > The information contained in this electronic message and any attachments > to this message are intended for the exclusive use of the addressee(s) and > may contain proprietary, confidential or privileged information. If you are > not the intended recipient, you should not disseminate, distribute or copy > this e-mail. Please notify the sender immediately and destroy all copies of > this message and any attachments. WARNING: Computer viruses can be > transmitted via email. The recipient should check this email and any > attachments for the presence of viruses. The company accepts no liability > for any damage caused by any virus transmitted by this email. > www.wipro.com >
