Re: streaming questions

2014-07-02 Thread mcampbell
Tathagata Das wrote > *Answer 1:* Make sure you are using master as "local[n]" with n > 1 > (assuming you are running it in local mode). The way Spark Streaming > works is that it assigns a core to the data receiver, and so if you > run the program with only one core (i.e., with local or local[1]),

Re: streaming questions

2014-03-26 Thread Tathagata Das
foreachRDD is the underlying function used by print and saveAsTextFiles. If you are curious, here is the actual source: https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala#L759 Yes, since the RDD is used twice, it's proba
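
A minimal sketch of the point above, assuming wordCounts is an existing DStream of (word, count) pairs (the name and the second output operation are illustrative, not from the thread). Because the stream is consumed by two output operations, persist() keeps each batch's RDD from being recomputed:

    wordCounts.persist()   // the stream feeds two output operations below
    wordCounts.print()     // print() itself is implemented via foreachRDD
    wordCounts.foreachRDD { rdd =>
      // second consumer of the same batch RDD; without persist() the
      // lineage would be recomputed here
      println(s"this batch produced ${rdd.count()} distinct words")
    }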

Re: streaming questions

2014-03-26 Thread Diana Carroll
Thanks, Tathagata, very helpful. A follow-up question below... On Wed, Mar 26, 2014 at 2:27 PM, Tathagata Das wrote: > > > *Answer 3:* You can do something like > wordCounts.foreachRDD((rdd: RDD[...], time: Time) => { > if (rdd.take(1).size == 1) { > // There exists at least one element i
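
Filling in the quoted pattern from Answer 3 as a sketch; the element type and the output directory are assumptions for illustration, not taken from the thread:

    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.Time

    // wordCounts is assumed to be a DStream[(String, Long)]; the output
    // path is a made-up example.
    wordCounts.foreachRDD((rdd: RDD[(String, Long)], time: Time) => {
      if (rdd.take(1).size == 1) {
        // at least one element exists in this batch, so write it out
        rdd.saveAsTextFile(s"/tmp/wordcounts-${time.milliseconds}")
      }
    })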

RE: streaming questions

2014-03-26 Thread Adrian Mocanu
Hi Diana, I'll answer Q3: You can check if an RDD is empty in several ways. Someone here mentioned that using an iterator was safer: val isEmpty = rdd.mapPartitions(iter => Iterator(!iter.hasNext)).reduce(_ && _) You can also check with a fold or rdd.count. rdd.reduce(_ + _) // can't handle em
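
A short sketch putting the checks mentioned above side by side; rdd is assumed to be an RDD[Int] and the method name is illustrative:

    import org.apache.spark.rdd.RDD

    def reportEmptiness(rdd: RDD[Int]): Unit = {
      // iterator-based check: one Boolean per partition, true only if all are empty
      val isEmpty = rdd.mapPartitions(iter => Iterator(!iter.hasNext)).reduce(_ && _)

      // counting also works, but touches every element
      val isEmptyByCount = rdd.count() == 0

      // reduce(_ + _) throws on an empty RDD, whereas fold has a zero value
      val sum = rdd.fold(0)(_ + _)

      println(s"isEmpty=$isEmpty, isEmptyByCount=$isEmptyByCount, sum=$sum")
    }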

Re: streaming questions

2014-03-26 Thread Tathagata Das
*Answer 1:* Make sure you are using master as "local[n]" with n > 1 (assuming you are running it in local mode). The way Spark Streaming works is that it assigns a core to the data receiver, and so if you run the program with only one core (i.e., with local or local[1]), then it won't have resources
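
A minimal sketch of Answer 1, assuming a receiver-based socket stream (host, port, and app name are illustrative): "local[2]" leaves one core for the receiver and one for processing the received batches.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // two local cores: one for the socket receiver, one for processing
    val conf = new SparkConf().setMaster("local[2]").setAppName("StreamingCoresSketch")
    val ssc = new StreamingContext(conf, Seconds(1))

    val lines = ssc.socketTextStream("localhost", 9999)  // the receiver occupies a core
    lines.flatMap(_.split(" ")).count().print()          // processing runs on the other core

    ssc.start()
    ssc.awaitTermination()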