Yes... union would be one solution. I am not doing any aggregation, hence
reduceByKey would not be useful. If I use groupByKey, messages with the same
key would be obtained in a partition, but groupByKey is a very expensive
operation as it involves a shuffle. My ultimate goal is to write the
messages to Cassandra without duplicates.
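If the goal is only to co-locate messages with the same key (not to aggregate them), one alternative is to hash-partition the pair RDDs inside transform instead of calling groupByKey; this still shuffles once, but avoids building the per-key buffers that groupByKey materialises. A minimal sketch, assuming the stream is already a DStream of (key, message) pairs; the name colocateByKey and the partition count 3 are made up for illustration:

    import org.apache.spark.HashPartitioner
    import org.apache.spark.streaming.dstream.DStream

    def colocateByKey(messages: DStream[(String, String)]): DStream[(String, String)] =
      messages.transform { rdd =>
        // One shuffle: all records with the same key land in the same
        // partition, but no per-key grouping buffers are built.
        rdd.partitionBy(new HashPartitioner(3))
      }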
Hi All,
Can someone share some insights on this?
On Wed, Jul 29, 2015 at 8:29 AM, Priya Ch wrote:
Hi TD,
Thanks for the info. I have a scenario like this.
I am reading data from a Kafka topic. Let's say Kafka has 3 partitions
for the topic. In my streaming application, I would configure 3 receivers
with 1 thread each, such that they would receive 3 DStreams (one from each
of the 3 partitions of the Kafka topic) which I can then union.
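For reference, the receiver-based pattern described above usually looks something like the sketch below, using the spark-streaming-kafka 0.8 receiver API; "zkhost:2181", "my-group" and "my-topic" are placeholders:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val sparkConf = new SparkConf().setAppName("kafka-union")
    val ssc = new StreamingContext(sparkConf, Seconds(10))

    // One receiver per Kafka partition, each with a single consumer thread.
    val kafkaStreams = (1 to 3).map { _ =>
      KafkaUtils.createStream(ssc, "zkhost:2181", "my-group", Map("my-topic" -> 1))
    }

    // Union the three receiver DStreams into one DStream[(String, String)].
    val unified = ssc.union(kafkaStreams)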
You have to partition that data on the Spark Streaming side by the primary
key, and then make sure you insert data into Cassandra atomically per key, or
per set of keys in the partition. You can use the combination of the (batch
time, partition id) of the RDD inside foreachRDD as the unique id for the
data you are inserting.
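A minimal sketch of that idea: only the (batch time, partition id) construction follows the suggestion above; writeIdempotently is a hypothetical helper standing in for the actual Cassandra upsert:

    import org.apache.spark.TaskContext
    import org.apache.spark.streaming.dstream.DStream

    // Hypothetical idempotent writer: a real job would upsert into Cassandra
    // keyed by (uniqueId, key) so that retried tasks overwrite, not duplicate.
    def writeIdempotently(uniqueId: String, key: String, value: String): Unit =
      println(s"$uniqueId -> $key = $value")

    def writeWithUniqueIds(stream: DStream[(String, String)]): Unit =
      stream.foreachRDD { (rdd, batchTime) =>
        rdd.foreachPartition { records =>
          // (batch time, partition id) uniquely identifies this slice of the batch.
          val uniqueId = s"${batchTime.milliseconds}-${TaskContext.get.partitionId()}"
          records.foreach { case (key, value) => writeIdempotently(uniqueId, key, value) }
        }
      }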
Hi All,
I have a problem when writing streaming data to Cassandra. Our existing
product is on an Oracle DB, in which locks are maintained while writing data
so that duplicates in the DB are avoided.
But as Spark has a parallel processing architecture, if more than 1 thread is
trying to write the same data to Cassandra, duplicates can be created.
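One property worth noting here: Cassandra inserts are upserts by primary key, so if the primary key fully determines the row, concurrent or retried writes of the same record overwrite it rather than duplicate it. A sketch assuming the spark-cassandra-connector, with a hypothetical keyspace "my_keyspace" and table "messages" whose primary key is msg_key:

    import com.datastax.spark.connector._
    import org.apache.spark.streaming.dstream.DStream

    case class Message(msgKey: String, body: String)

    def writeStream(stream: DStream[(String, String)]): Unit =
      stream.foreachRDD { rdd =>
        // Cassandra upserts by primary key: re-inserting the same msg_key
        // overwrites the row instead of creating a duplicate.
        rdd.map { case (k, v) => Message(k, v) }
           .saveToCassandra("my_keyspace", "messages")
      }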