[Streaming] join events in last 10 minutes
We have a scenario in which events from three Kafka topics that share the same keys need to be merged. One topic carries the master events; most events in the other two topics arrive within 10 minutes of the corresponding master event. I wrote some pseudocode below. I'd love to hear your thoughts on whether I am on the right track.
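The problem described above is a keyed, time-bounded join: hold each master event by key and attach side events that arrive within 10 minutes. Below is a minimal plain-Python sketch of that buffering logic. It assumes every event carries an arrival timestamp and that side events normally follow their master event. It does not use any Spark or Kafka API. The topic names `a`/`b`, the `Event`/`Merged`/`WindowedMerger` classes, and the choice to flush incomplete records once the window closes are all hypothetical. In Spark this per-key state would more likely live in a stateful operator, for example DStream `mapWithState`/`updateStateByKey`, or a Structured Streaming stream-stream join with watermarks.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

WINDOW_SECONDS = 10 * 60   # side events accepted up to 10 minutes after the master event
SIDE_TOPICS = ("a", "b")   # hypothetical names for the two non-master topics


@dataclass
class Event:
    topic: str      # "master" or one of SIDE_TOPICS
    key: str        # join key shared by all three topics
    ts: float       # arrival time in seconds
    payload: str


@dataclass
class Merged:
    key: str
    master_ts: float
    master: str
    sides: Dict[str, Optional[str]]  # None until the side event arrives

    def complete(self) -> bool:
        return all(v is not None for v in self.sides.values())


class WindowedMerger:
    """Hypothetical helper: buffers master events by key and merges side events."""

    def __init__(self, window: float = WINDOW_SECONDS):
        self.window = window
        self.pending: Dict[str, Merged] = {}
        self.unmatched: List[Event] = []  # side events with no open master

    def _expire(self, now: float) -> List[Merged]:
        # Flush records whose window has closed, even if some side events never arrived.
        expired = [k for k, m in self.pending.items() if now - m.master_ts > self.window]
        return [self.pending.pop(k) for k in expired]

    def process(self, ev: Event) -> List[Merged]:
        out = self._expire(ev.ts)
        if ev.topic == "master":
            self.pending[ev.key] = Merged(ev.key, ev.ts, ev.payload,
                                          {t: None for t in SIDE_TOPICS})
        elif ev.key in self.pending:
            rec = self.pending[ev.key]
            rec.sides[ev.topic] = ev.payload
            if rec.complete():
                # Emit early once all side events are present.
                out.append(self.pending.pop(ev.key))
        else:
            # Either too late (window closed) or arrived before its master event.
            self.unmatched.append(ev)
        return out


# Usage: k1 completes inside its window; k2's side event arrives 700 s late.
merger = WindowedMerger()
events = [
    Event("master", "k1", 0, "m1"),
    Event("a", "k1", 60, "a1"),
    Event("master", "k2", 100, "m2"),
    Event("b", "k1", 120, "b1"),
    Event("a", "k2", 800, "a2"),
]
emitted: List[Merged] = []
for e in events:
    emitted.extend(merger.process(e))

for m in emitted:
    print(m.key, m.sides)
```

Records are emitted as soon as both side events arrive, or when a later event shows that the 10-minute window has closed. Because expiry is driven by incoming timestamps, a quiet stream will not flush anything until the next event. Side events that find no open master end up in `unmatched`, and late or out-of-order arrivals would need separate handling there.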
Why SparkR didn't reuse PythonRDD
On behalf of Renyi Xiong: when reading the Spark codebase, it looks to me like PythonRDD.scala is reusable. Why did SparkR choose to implement its own RRDD.scala instead? Thanks, Daniel