If you look at your streaming app's UI, you should see how many tasks are 
executed each batch and on how many executors. This depends on the batch 
duration and the block interval (spark.streaming.blockInterval), which 
defaults to 200ms: each receiver generates one block, and hence one 
partition, per block interval. You can control the parallelism by adjusting 
the block interval and the batch duration. As described in the docs, with 
the default block interval and a 2-second batch duration you'd get 
2000ms / 200ms = 10 partitions per batch.

http://spark.apache.org/docs/latest/streaming-programming-guide.html#reducing-the-batch-processing-times
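
For concreteness, here is a minimal sketch of where the two knobs live. The 
socket source on localhost:9999 and the app/object names are illustrative, 
not from your setup:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object BlockIntervalExample {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("BlockIntervalExample")
          // 200ms is already the default; set it explicitly to tune parallelism.
          .set("spark.streaming.blockInterval", "200ms")

        // 2-second batches => 2000ms / 200ms = 10 partitions per receiver per batch.
        val ssc = new StreamingContext(conf, Seconds(2))

        val lines = ssc.socketTextStream("localhost", 9999)
        lines.foreachRDD { rdd =>
          // Each batch RDD should have roughly batchDuration / blockInterval partitions.
          println("Partitions this batch: " + rdd.partitions.length)
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }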

From: anshu shukla <anshushuk...@gmail.com>
Sent: Saturday, June 20, 2015 10:27 AM
To: d...@spark.apache.org, Tathagata Das <t...@databricks.com>, 
user@spark.apache.org

Any suggestions please!
How can I know that, in stream processing over a cluster of 8 machines, all 
the machines/worker nodes are being used (my cluster has 8 slaves)?
I am submitting the job from the master itself over the EC2 cluster created 
by the EC2 scripts that ship with Spark, but I am not able to figure out 
whether my job is using all the workers or not.
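
One rough programmatic sanity check, as a sketch: SparkContext's 
getExecutorMemoryStatus returns one entry per block manager, which includes 
the driver, so subtracting one gives the executor count (the object name 
below is illustrative):

    import org.apache.spark.{SparkConf, SparkContext}

    object ExecutorCountCheck {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("ExecutorCountCheck"))

        // Keys are "host:port" block manager ids; the driver appears as one entry.
        val executorCount = sc.getExecutorMemoryStatus.size - 1
        println("Registered executors: " + executorCount + " (expecting 8 workers)")

        sc.stop()
      }
    }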

--
Thanks & Regards,
Anshu Shukla
SERC-IISC
