Hey Mike,
I took a quick look at the example and found a major performance issue.
You are collecting the RDDs to the driver and then sending them to Mongo in
a foreach, so every record is funneled through a single machine. Why not do
a distributed push to Mongo from the executors instead?
WHAT YOU HAVE
val mongoConnection = ...
WHAT YOU SHOULD DO
rdd.foreachPartition {
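To make the idea concrete, here is a minimal sketch of a distributed push, not a drop-in implementation. The only Spark calls are `foreachRDD` and `foreachPartition`. Everything else (`DocumentSink`, `openSink`, `batchSize`, the `DistributedPush` object) is a hypothetical name made up for illustration. In particular, `DocumentSink` stands in for whatever Mongo client you use and is not the ReactiveMongo API.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.dstream.DStream

// Hypothetical stand-in for a Mongo client (assumption, NOT the ReactiveMongo API).
trait DocumentSink extends AutoCloseable {
  def insertBatch(docs: Seq[String]): Unit
}

object DistributedPush {
  // Runs on an executor: opens one sink per partition and writes in batches.
  // `batchSize` is an illustrative default, not a tuned value.
  def writePartition(records: Iterator[String],
                     openSink: () => DocumentSink,
                     batchSize: Int = 500): Unit = {
    if (records.hasNext) {            // skip empty partitions, no connection needed
      val sink = openSink()
      try records.grouped(batchSize).foreach(batch => sink.insertBatch(batch))
      finally sink.close()            // always release the connection
    }
  }

  // Wires the per-partition writer into a stream. Nothing is collected to the driver.
  // `openSink` is shipped to the executors, so it must be serializable.
  def save(stream: DStream[String], openSink: () => DocumentSink): Unit =
    stream.foreachRDD { rdd: RDD[String] =>
      rdd.foreachPartition(part => writePartition(part, openSink))
    }
}
```

The connection is opened inside the partition function rather than on the driver because a client connection generally cannot be serialized and shipped to the executors. Opening it once per partition, instead of once per record, keeps the connection overhead small.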
Hi All,
I have Spark Streaming set up to write data to a replicated MongoDB database
and would like to understand whether there would be any issues with using the
Reactive Mongo library to write directly to MongoDB. My stack is Apache
Spark sitting on top of Cassandra for the datastore, so my thinking is
himpler/blog-spark-streaming-log-aggregation/blob/master/src/main/scala/com/chimpler/sparkstreaminglogaggregation/LogAggregator.scala>
Thanks, Mike.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Integrating-Spark-Streaming-with-Reactive-Mongo-tp21828.html