Need clarification of joining streams

2014-04-21 Thread gaganbm
I wanted some clarification on the behavior of joined streams. As I understand it, the join works per batch. I am reading data from two Kafka streams and then joining them on some keys. But what happens if one stream hasn't produced any data in that batch duration while the other has some? Or lets s
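The per-batch semantics the question asks about can be illustrated without Spark at all. Below is a hypothetical sketch that models one batch from each stream as a plain Scala sequence of key/value pairs and applies inner-join semantics (as a pair-RDD `join` does within a single batch); all names are invented for illustration. The key point: if one side's batch is empty, the inner join for that batch is empty.

```scala
object JoinBatchSketch {
  // One "batch" from each stream, keyed by some id.
  type Batch = Seq[(String, Int)]

  // Inner join of two batches, mirroring pair-RDD join semantics
  // (only keys present on BOTH sides survive).
  def joinBatch(a: Batch, b: Batch): Seq[(String, (Int, Int))] =
    for {
      (ka, va) <- a
      (kb, vb) <- b
      if ka == kb
    } yield (ka, (va, vb))

  def main(args: Array[String]): Unit = {
    val left: Batch = Seq("user1" -> 1, "user2" -> 2)
    val emptyRight: Batch = Seq.empty // the other stream produced nothing

    // One side empty => the joined batch is empty:
    println(joinBatch(left, emptyRight))        // List()
    println(joinBatch(left, Seq("user1" -> 9))) // List((user1,(1,9)))
  }
}
```

Records that miss their partner in one batch are simply dropped by an inner join; keeping them around requires state (e.g. windowing or `leftOuterJoin` plus buffering), which is a separate design choice.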

Re: Strange behaviour of different SSCs with same Kafka topic

2014-04-21 Thread gaganbm
> On Thu, Apr 17, 2014 at 10:58 PM, gaganbm <[hidden email]> wrote: >> It happens with normal data rate, i.e., let's say 20 records per second. >> Apart from that, I am also getting some m

Re: Strange behaviour of different SSCs with same Kafka topic

2014-04-17 Thread gaganbm
ume rate? > TD > On Wed, Apr 9, 2014 at 11:24 PM, gaganbm <[hidden email]> wrote: >> I am really at my wits' end here. >> I have different Streaming contexts, let's s

Strange behaviour of different SSCs with same Kafka topic

2014-04-09 Thread gaganbm
I am really at my wits' end here. I have different streaming contexts, let's say 2, both listening to the same Kafka topic. I establish the KafkaStream by setting a different consumer group for each of them. Ideally, I should be seeing the Kafka events in both streams. But what I am getting is
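The expectation stated above rests on Kafka's consumer-group contract: each *group* receives its own full copy of a topic, while consumers *within* one group split the partitions between them. A minimal sketch of that contract with plain Scala collections (no real Kafka; the group names are invented):

```scala
object ConsumerGroupSketch {
  // Each distinct group id gets every record of the topic.
  def deliverToGroups(topic: Seq[String], groups: Seq[String]): Map[String, Seq[String]] =
    groups.map(g => g -> topic).toMap

  def main(args: Array[String]): Unit = {
    val events = Seq("e1", "e2", "e3")
    // Two streaming contexts with DISTINCT group ids: both see all events.
    val byGroup = deliverToGroups(events, Seq("ssc-group-1", "ssc-group-2"))
    println(byGroup("ssc-group-1")) // List(e1, e2, e3)
    println(byGroup("ssc-group-2")) // List(e1, e2, e3)
  }
}
```

If the two contexts accidentally share one group id (or the group property is not the one the client library actually reads), the events are split between them instead of duplicated, which matches the "strange behaviour" described.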

KafkaReceiver error when starting ssc (Actor name not unique)

2014-04-09 Thread gaganbm
Hi All, I am getting this exception when calling ssc.start() to start the streaming context: ERROR KafkaReceiver - Error receiving data akka.actor.InvalidActorNameException: actor name [NetworkReceiver-0] is not unique! at akka.actor.dungeon.ChildrenContainer$NormalChildrenContainer.reserve(
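The mechanism behind this error can be shown in isolation: Akka reserves each child actor's name in a per-parent registry, and reserving the same name twice throws. The sketch below is a hypothetical stand-in for that registry (not Akka itself); it reproduces the failure mode of two receivers both registering as `NetworkReceiver-0`.

```scala
object ActorNameSketch {
  import scala.collection.mutable

  // Simplified analogue of Akka's child-name container: a name may be
  // reserved at most once under a given parent.
  final class Registry {
    private val names = mutable.Set.empty[String]
    def reserve(name: String): Unit =
      if (!names.add(name))
        throw new IllegalArgumentException(s"actor name [$name] is not unique!")
  }

  def main(args: Array[String]): Unit = {
    val registry = new Registry
    registry.reserve("NetworkReceiver-0")     // first receiver registers fine
    try registry.reserve("NetworkReceiver-0") // second receiver, same name: fails
    catch { case e: IllegalArgumentException => println(e.getMessage) }
  }
}
```

In the Spark setting this typically means two receivers ended up with the same receiver id under one actor system, e.g. two streams or contexts started inside the same JVM.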

RDD Collect returns empty arrays

2014-03-25 Thread gaganbm
I am getting strange behavior with the RDDs. All I want is to persist the RDD contents in a single file. saveAsTextFile() saves them in multiple text files, one per partition. So I tried rdd.coalesce(1, true).saveAsTextFile(). This fails with the exception: org.apache.spark.SparkExcepti
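When `coalesce(1, true).saveAsTextFile()` misbehaves, a common workaround for RDDs small enough to fit in driver memory is to `collect()` to the driver and write one file with ordinary JDK file IO. A minimal sketch, with the Spark step replaced by an in-memory `Seq` standing in for `rdd.collect()`:

```scala
import java.nio.file.{Files, Paths}
import scala.jdk.CollectionConverters._

object SingleFileSketch {
  // Write all lines into exactly one file. In real code `lines` would be
  // rdd.collect() -- only safe when the RDD fits in driver memory.
  def writeSingleFile(lines: Seq[String], path: String): Unit =
    Files.write(Paths.get(path), lines.asJava)

  def main(args: Array[String]): Unit = {
    val lines = Seq("a,1", "b,2") // stand-in for rdd.collect()
    val out = Files.createTempFile("rdd-", ".txt").toString
    writeSingleFile(lines, out)
    println(Files.readAllLines(Paths.get(out)).asScala.mkString("|")) // a,1|b,2
  }
}
```

This trades Spark's parallel write for a single-machine one, so it is a pragmatic escape hatch rather than a scalable pattern; for large outputs, writing per-partition files and merging downstream is the safer design.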

Re: rdd.saveAsTextFile problem

2014-03-25 Thread gaganbm
Hi Folks, Is this issue resolved? If yes, could you please shed some light on how to fix it? I am facing the same problem when writing to text files. When I do stream.foreachRDD(rdd => { rdd.saveAsTextFile(<"Some path">) }) This wo
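One pitfall with the `foreachRDD` pattern above is that every batch targets the same output path, and Hadoop-style writers refuse to overwrite an existing directory. A usual fix is to derive a distinct path per batch from the batch time. A hedged sketch of just the path-derivation logic (the base directory and naming scheme are invented for illustration):

```scala
object BatchPathSketch {
  // One distinct output directory per batch, e.g. "<base>/batch-<timeMs>",
  // so successive batches never collide on the same path.
  def batchPath(baseDir: String, batchTimeMs: Long): String =
    s"$baseDir/batch-$batchTimeMs"

  def main(args: Array[String]): Unit = {
    println(batchPath("/data/out", 1395700000000L)) // /data/out/batch-1395700000000
    // In real Spark Streaming code this would be used roughly as:
    //   stream.foreachRDD((rdd, time) =>
    //     rdd.saveAsTextFile(batchPath("/data/out", time.milliseconds)))
  }
}
```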

Persist streams to text files

2014-03-21 Thread gaganbm
Hi, I am trying to persist the DStreams to text files. When I use the built-in API saveAsTextFiles as: stream.saveAsTextFiles(resultDirectory) this creates a number of subdirectories, one for each batch, and within each subdirectory it creates a bunch of text files for each RDD (I assume). I a
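Inside each per-batch directory, Spark writes one `part-NNNNN` file per partition. If a single file per batch is wanted, a post-processing merge of those part files is one option (this is plain file IO, not a Spark API). A sketch, with a temp directory standing in for a real batch directory:

```scala
import java.io.File
import java.nio.file.{Files, Paths, StandardOpenOption}

object MergePartsSketch {
  // Concatenate the part-* files inside one batch directory into a single
  // output file, in partition order.
  def mergeParts(batchDir: String, outFile: String): Unit = {
    val parts = new File(batchDir).listFiles()
      .filter(_.getName.startsWith("part-")).sortBy(_.getName)
    val out = Paths.get(outFile)
    Files.deleteIfExists(out)
    Files.createFile(out)
    for (p <- parts)
      Files.write(out, Files.readAllBytes(p.toPath), StandardOpenOption.APPEND)
  }

  def main(args: Array[String]): Unit = {
    // Fake batch directory with two partition files.
    val dir = Files.createTempDirectory("batch-")
    Files.write(dir.resolve("part-00000"), "a\n".getBytes)
    Files.write(dir.resolve("part-00001"), "b\n".getBytes)
    val merged = dir.resolve("merged.txt").toString
    mergeParts(dir.toString, merged)
    print(new String(Files.readAllBytes(Paths.get(merged)))) // a then b
  }
}
```

For data on HDFS the equivalent idea is usually done with a filesystem-level merge after the job, since reading every part file through the driver only works for modest volumes.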