Sending multiple DStream outputs

contractor Thu, 18 Sep 2014 08:07:54 -0700

Hi all,

I am using Spark 1.0 streaming to ingest a high-volume stream of data 
(approx. 1 million lines every few seconds), transform it into two outputs, and 
send those outputs to two separate Apache Kafka topics. I have two blocks of 
output code like this:


Stream1 =
…

Stream2 =
…

Stream1.foreachRDD {
  // emit to Kafka topic 1
}
Stream2.foreachRDD {
  // emit to Kafka topic 2
}

…

My problem is that I never see the debug statements in the Stream2.foreachRDD 
code, nor does the topic 2 consumer see anything. I am wondering if this is 
because these output operations are executed one after another, so that Spark 
never gets out of the Stream1 output code before it is ready to process the 
next Stream1 batch and therefore never enters the Stream2 output code.

Do I have to parallelize these two output code blocks or am I missing something 
more fundamental?
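
For concreteness, the pattern described above could be sketched roughly as 
follows. This is only an illustration under stated assumptions, not the actual 
job: the socket source, the two map transformations, the topic names, the 
broker address, and the emitToKafka helper are all hypothetical, and the sketch 
assumes the Kafka Java producer client (org.apache.kafka.clients.producer) is 
on the classpath.

```scala
import java.util.Properties

import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.DStream

object TwoTopicOutput {

  // Hypothetical helper: registers one output operation that sends every
  // record of the given DStream to a single Kafka topic.
  def emitToKafka(stream: DStream[String], topic: String, brokers: String): Unit = {
    stream.foreachRDD { rdd =>
      // This statement runs on the driver, once per batch.
      println(s"Emitting batch to $topic")

      rdd.foreachPartition { records =>
        // This body runs on the executors, so any debug output here lands
        // in the executor logs. One producer is created per partition.
        val props = new Properties()
        props.setProperty("bootstrap.servers", brokers)
        props.setProperty("key.serializer",
          "org.apache.kafka.common.serialization.StringSerializer")
        props.setProperty("value.serializer",
          "org.apache.kafka.common.serialization.StringSerializer")

        val producer = new KafkaProducer[String, String](props)
        try {
          records.foreach { r =>
            producer.send(new ProducerRecord[String, String](topic, r))
          }
        } finally {
          producer.close()
        }
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("TwoTopicOutput")
    val ssc = new StreamingContext(conf, Seconds(5)) // assumed batch interval

    // Assumed source and transformations, standing in for the elided code.
    val lines = ssc.socketTextStream("localhost", 9999)
    val stream1 = lines.map(_.toUpperCase)
    val stream2 = lines.map(_.length.toString)

    // Two independent output operations, one per topic.
    emitToKafka(stream1, "topic1", "localhost:9092")
    emitToKafka(stream2, "topic2", "localhost:9092")

    ssc.start()
    ssc.awaitTermination()
  }
}
```

In this sketch the println inside foreachRDD is the only statement that would 
show up in the driver output; everything inside foreachPartition executes on 
the workers.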

Thanks,
Mahesh

