Hi all,
I am using Spark Streaming 1.0 to ingest a high-volume stream of data
(roughly 1 million lines every few seconds), transform it into two outputs, and
send those outputs to two separate Apache Kafka topics. I have two blocks of
output code like this:

    Stream1 = ...
    Stream2 = ...

    Stream1.foreachRDD {
      // emit to Kafka topic 1
    }

    Stream2.foreachRDD {
      // emit to Kafka topic 2
    }
    ...

My problem is that I never see the debug statements from the Stream2.foreachRDD
block, and the topic 2 consumer never receives anything. I wonder whether this
is because the two output blocks are executed serially (one after the other),
and Spark never gets out of the Stream1 output code before it has to process
the next Stream1 batch, so it never enters the Stream2 output code.
Do I have to parallelize these two output code blocks, or am I missing
something more fundamental?
Thanks,
Mahesh