If you look at your streaming app's UI you can see how many tasks are executed in each batch and on how many executors they run. The number of tasks depends on the batch duration and the block interval, which defaults to 200 ms: a receiver generates one partition per block interval, so you can control the parallelism by adjusting either value. As described in the docs, with the default block interval and a 2 second batch duration you'd get 2000 ms / 200 ms = 10 partitions.
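If you want more parallelism, the two knobs are spark.streaming.blockInterval and an explicit repartition on the DStream. Here is a minimal sketch in Scala, assuming a receiver-based socket source; the host/port, the 100 ms interval, and the partition count of 16 are just illustrative values:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Lower the block interval so each batch is split into more partitions.
// With a 2 second batch and a 100 ms block interval, a receiver-based
// stream produces 2000 / 100 = 20 partitions per batch.
// (Older Spark versions may expect a plain millisecond number, e.g. "100".)
val conf = new SparkConf()
  .setAppName("BlockIntervalExample")
  .set("spark.streaming.blockInterval", "100ms")

val ssc = new StreamingContext(conf, Seconds(2))

// A receiver-based source; host and port are placeholders.
val lines = ssc.socketTextStream("localhost", 9999)

// Alternatively, repartition explicitly to spread work across executors.
val repartitioned = lines.repartition(16)
repartitioned.count().print()

ssc.start()
ssc.awaitTermination()

With either approach, the Streaming tab and the executors page in the UI should show the tasks of each batch spread across your workers.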
http://spark.apache.org/docs/latest/streaming-programming-guide.html#reducing-the-batch-processing-times

From: anshu shukla <anshushuk...@gmail.com>
Sent: Saturday, June 20, 2015 10:27 AM
To: d...@spark.apache.org; Tathagata Das <t...@databricks.com>; user@spark.apache.org

Any suggestions please!

How can I know that, in stream processing over a cluster of 8 machines, all the machines/worker nodes are being used (my cluster has 8 slaves)? I am submitting the job from the master itself, on the EC2 cluster created by the EC2 scripts that ship with Spark, but I am not able to figure out whether my job is using all the workers or not.

--
Thanks & Regards,
Anshu Shukla
SERC-IISC