Hi all,

I am using Spark Streaming 1.0 to ingest a high-volume stream of data (approx. 1 million lines every few seconds), transform it into two outputs, and send those outputs to two separate Apache Kafka topics. I have two blocks of output code like this:
Stream1 = …
Stream2 = …

Stream1.foreachRDD { // emit to Kafka topic 1 }
Stream2.foreachRDD { // emit to Kafka topic 2 }
…

My problem is that I never see debug statements from the Stream2.foreachRDD code, nor does the consumer on topic 2 receive anything. I am wondering if this is because the two output operations run serially, and Spark never gets out of the Stream1 output code before it is ready to process the next Stream1 batch, thereby never entering the Stream2 output code. Do I have to parallelize these two output blocks, or am I missing something more fundamental?

Thanks,
Mahesh
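P.S. In case the structure matters, here is a minimal, self-contained sketch of the shape of the job, assuming Spark Streaming 1.0's Scala API and the Kafka 0.8 producer; the socket source, topic names, broker list, and transforms are placeholders, not my actual code:

import java.util.Properties
import kafka.producer.{KeyedMessage, Producer, ProducerConfig}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object TwoTopicEmitter {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("TwoTopicEmitter")
    val ssc  = new StreamingContext(conf, Seconds(2))

    // Placeholder source; the real job ingests a much higher-volume stream.
    val source  = ssc.socketTextStream("localhost", 9999)
    val stream1 = source.map(line => line)          // placeholder transform 1
    val stream2 = source.map(line => line.reverse)  // placeholder transform 2

    stream1.foreachRDD { rdd =>
      rdd.foreachPartition { partition =>
        // Build the producer on the worker, once per partition, so it is
        // never closed over and serialized from the driver.
        val producer = makeProducer()
        partition.foreach { line =>
          producer.send(new KeyedMessage[String, String]("topic1", line))
        }
        producer.close()
      }
    }

    stream2.foreachRDD { rdd =>
      rdd.foreachPartition { partition =>
        val producer = makeProducer()
        partition.foreach { line =>
          producer.send(new KeyedMessage[String, String]("topic2", line))
        }
        producer.close()
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }

  // Kafka 0.8-style producer; the broker list is a placeholder.
  def makeProducer(): Producer[String, String] = {
    val props = new Properties()
    props.put("metadata.broker.list", "localhost:9092")
    props.put("serializer.class", "kafka.serializer.StringEncoder")
    new Producer[String, String](new ProducerConfig(props))
  }
}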