I tried to insert a "flag" into the RDD so that I could keep a counter in the last position; when the counter reaches X, I could do something. But each slide delivers the original RDD again, even though I modified it.
I wrote this code to check whether this is possible, but it doesn't work.

    val rdd1WithFlag = rdd1.map { register =>
      val splitRegister = register._2.split("\\|")
      val newArray = new Array[String](splitRegister.length + 1)
      if (splitRegister.length == 2) {
        // No flag yet: copy the fields and append "0" as the flag
        splitRegister.copyToArray(newArray)
        newArray(splitRegister.length) = "0"
      } else {
        // Flag already present: overwrite the last field
        // (index length - 1; index length would be out of bounds)
        splitRegister(splitRegister.length - 1) = "1"
        splitRegister.copyToArray(newArray)
      }
      (splitRegister(1), newArray)
    }

When I check the length of splitRegister, it is always 2 in every slide; it is never 3.

2015-05-18 15:36 GMT+02:00 Guillermo Ortiz <konstt2...@gmail.com>:

> Hi,
>
> I have two streaming RDDs, RDD1 and RDD2, and I want to cogroup them.
> The data don't arrive at the same time, and sometimes they arrive with
> some delay.
> When I have all the data, I want to insert it into MongoDB.
>
> For example, imagine that I get:
> RDD1 --> T 0
> RDD2 --> T 0.5
> I cogroup them, but I can't store the result in Mongo yet because more
> data could arrive in the next windows/slides.
> RDD2' --> T 1.5
> Another RDD2' arrives, and I only want to save to Mongo once. So I should
> only save when I have all the data. What I do know is the maximum time I
> should wait.
>
> Ideally, I would like to save to MongoDB in the last slide for each RDD,
> when I know that no more RDD2 data can arrive to join with RDD1.
> Is it possible? How?
>
> Maybe there is another way to solve this problem. Any ideas?
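
For reference, the counter idea can be sketched without Spark as a plain state-update function. A value written inside `map` only exists in the RDD that `map` returns, and the next slide starts again from fresh input, so the counter has to live in state that is handed from one slide to the next. This is only a sketch under assumptions: `Pending`, `updateCounter`, `isComplete`, and `maxSlides` (the X above) are hypothetical names, and the function merely has the `(Seq[V], Option[S]) => Option[S]` shape that Spark Streaming's `updateStateByKey` accepts. Whether that operator suits this cogroup case has not been verified here.

```scala
// Hypothetical state kept per key between slides.
final case class Pending(values: Vector[String], slidesSeen: Int)

object CounterSketch {
  // Assumed limit: number of slides to wait before treating a key as complete.
  val maxSlides = 3

  // Same shape as an updateStateByKey update function:
  // values seen for the key in this slide + previous state => next state.
  def updateCounter(newValues: Seq[String], state: Option[Pending]): Option[Pending] = {
    val previous = state.getOrElse(Pending(Vector.empty, 0))
    Some(Pending(previous.values ++ newValues, previous.slidesSeen + 1))
  }

  // A key is ready to be written out once it has been tracked for maxSlides slides.
  def isComplete(p: Pending): Boolean = p.slidesSeen >= maxSlides

  def main(args: Array[String]): Unit = {
    // Simulate three slides for one key; data arrives in the first two only.
    val slides = Seq(Seq("a|b"), Seq("c|d"), Seq.empty[String])
    val finalState = slides.foldLeft(Option.empty[Pending]) { (state, values) =>
      updateCounter(values, state)
    }
    println(finalState.map(isComplete))
  }
}
```

In a real job, the write to MongoDB would happen for keys where `isComplete` holds, and returning `None` from the update function afterwards would drop the key from the state.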