Thanks alot Juan, That was a great post, One more thing if u can .Any there any demo/blog telling how to configure or create a topology of different types .. i mean how we can decide the pipelining model in spark as done in storm for https://storm.apache.org/documentation/images/topology.png .
On Wed, May 6, 2015 at 2:47 PM, Juan Rodríguez Hortalá < juan.rodriguez.hort...@gmail.com> wrote: > Hi, > > You can use the method repartition from DStream (for the Scala API) or > JavaDStream (for the Java API) > > defrepartition(numPartitions: Int): DStream > <https://spark.apache.org/docs/latest/api/scala/org/apache/spark/streaming/dstream/DStream.html> > [T] > > Return a new DStream with an increased or decreased level of parallelism. > Each RDD in the returned DStream has exactly numPartitions partitions. > > I think the post > http://www.michael-noll.com/blog/2014/10/01/kafka-spark-streaming-integration-example-tutorial/ > on integration of Spark Streaming gives very interesting review on the > subject, although the integration with Kafka it's not up to date with > https://databricks.com/blog/2015/03/30/improvements-to-kafka-integration-of-spark-streaming.html > > Hope that helps. > > Greetings, > > Juan > > 2015-05-06 10:32 GMT+02:00 anshu shukla <anshushuk...@gmail.com>: > >> But main problem is how to increase the level of parallelism for any >> particular bolt logic . >> >> suppose i want this type of topology . >> >> https://storm.apache.org/documentation/images/topology.png >> >> How we can manage it . >> >> On Wed, May 6, 2015 at 1:36 PM, ayan guha <guha.a...@gmail.com> wrote: >> >>> Every transformation on a dstream will create another dstream. You may >>> want to take a look at foreachrdd? Also, kindly share your code so people >>> can help better >>> On 6 May 2015 17:54, "anshu shukla" <anshushuk...@gmail.com> wrote: >>> >>>> Please help guys, Even After going through all the examples given i >>>> have not understood how to pass the D-streams from one bolt/logic to >>>> other (without writing it on HDFS etc.) just like emit function in storm . >>>> Suppose i have topology with 3 bolts(say) >>>> >>>> *BOLT1(parse the tweets nd emit tweet using given >>>> hashtags)=====>Bolt2(Complex logic for sentiment analysis over >>>> tweets)=======>BOLT3(submit tweets to the sql database using spark SQL)* >>>> >>>> >>>> Now since Sentiment analysis will take most of the time ,we have to >>>> increase its level of parallelism for tuning latency. Howe to increase the >>>> levele of parallelism since the logic of topology is not clear . >>>> >>>> -- >>>> Thanks & Regards, >>>> Anshu Shukla >>>> Indian Institute of Sciences >>>> >>> >> >> >> -- >> Thanks & Regards, >> Anshu Shukla >> > > -- Thanks & Regards, Anshu Shukla