Yes, and Kafka topics are essentially queues. So perhaps all that's needed is 
a KafkaRDD with a starting offset of 0 and an ending offset of some very 
large number...
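
Something like this rough sketch, using the batch createRDD API from 
spark-streaming-kafka (the topic name, broker address, and single partition 
below are made-up placeholders; in practice you'd look up each partition's 
actual latest offset instead of passing Long.MAX_VALUE):

import java.util.HashMap;
import java.util.Map;

import kafka.serializer.StringDecoder;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.streaming.kafka.KafkaUtils;
import org.apache.spark.streaming.kafka.OffsetRange;

public class ReadTopicOnce {
  public static void main(String[] args) {
    JavaSparkContext jsc = new JavaSparkContext(
        new SparkConf().setAppName("read-topic-once"));

    Map<String, String> kafkaParams = new HashMap<String, String>();
    kafkaParams.put("metadata.broker.list", "localhost:9092"); // placeholder

    // One range per topic partition: [fromOffset, untilOffset).
    // "Start at 0, end at a very large number" -- but see the caveat above:
    // untilOffset should really be the partition's current latest offset.
    OffsetRange[] ranges = {
        OffsetRange.create("my-topic", 0, 0L, Long.MAX_VALUE)
    };

    // Reads the given offset ranges once, as an ordinary (non-streaming) RDD.
    JavaPairRDD<String, String> rdd = KafkaUtils.createRDD(
        jsc, String.class, String.class,
        StringDecoder.class, StringDecoder.class,
        kafkaParams, ranges);

    System.out.println("messages read: " + rdd.count());
    jsc.stop();
  }
}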

> On Apr 29, 2015, at 1:52 AM, ayan guha <guha.a...@gmail.com> wrote:
> 
> I guess what you mean is not streaming. If you create a streaming context 
> at time t, you will receive data that arrives after time t, not data 
> produced before time t.
> 
> Looks like you want a queue: let Kafka write to a queue, consume messages 
> from the queue, and stop when the queue is empty (see the sketch below).
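> 
> A rough sketch of that drain-and-stop pattern against a Kafka DStream, 
> with illustrative names: assume jssc is the JavaStreamingContext and 
> stream is the Kafka JavaPairDStream (and note it may be safer to trigger 
> the stop from a separate thread than from inside the batch callback):
> 
> stream.foreachRDD(new Function<JavaPairRDD<String, String>, Void>() {
>   @Override
>   public Void call(JavaPairRDD<String, String> rdd) {
>     if (rdd.count() == 0) {
>       // Empty batch: the "queue" has been drained, so shut down.
>       jssc.stop(true, true); // also stop the SparkContext, gracefully
>     } else {
>       // ... process this batch's messages ...
>     }
>     return null;
>   }
> });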
> 
>> On 29 Apr 2015 14:35, "dgoldenberg" <dgoldenberg...@gmail.com> wrote:
>> Hi,
>> 
>> I'm wondering about the use case where you're not doing continuous,
>> incremental streaming of data out of Kafka, but rather want to publish
>> data once with your Producer(s), consume it once in your Consumer, and
>> then terminate the consumer Spark job.
>> 
>> JavaStreamingContext jssc = new JavaStreamingContext(sparkConf,
>>     Durations.milliseconds(...));
>> 
>> The batchDuration parameter is "the time interval at which streaming data
>> will be divided into batches". Can this somehow be used to make Spark
>> Streaming fetch all of the currently available data, let all the RDDs in
>> the Kafka discretized stream get processed, and then simply terminate,
>> rather than wait out another interval and try to process more data from
>> Kafka?
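>> 
>> Or is there a way to just bound the run in time, something like this
>> sketch (the timeout value is arbitrary)?
>> 
>> jssc.start();
>> // Returns once the timeout elapses if the context wasn't stopped first.
>> jssc.awaitTerminationOrTimeout(60 * 1000);
>> jssc.stop(true, true); // stop gracefully, including the SparkContext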
>> 
