I tried to insert an "flag" in the RDD, so I could set in the last position
a counter, when the counter gets X, I could do something. But in each slide
comes the original RDD although I modificated it.
I did this code to check if this is possible but it doesn't work.
val rdd1WithFlag = rdd1.map { register =>
var splitRegister = register._2.split("\\|")
var newArray = new Array[String](splitRegister.length + 1)
if (splitRegister.length == 2) {
splitRegister.copyToArray(newArray)
newArray(splitRegister.length) = "0"
} else {
splitRegister(splitRegister.length) = "1"
splitRegister.copyToArray(newArray)
}
(splitRegister(1), newArray)
}
If I check the length of splitRegister is always 2 in each slide, it is
never three.
2015-05-18 15:36 GMT+02:00 Guillermo Ortiz <[email protected]>:
> Hi,
>
> I have two streaming RDD1 and RDD2 and want to cogroup them.
> Data don't come in the same time and sometimes they could come with some
> delay.
> When I get all data I want to insert in MongoDB.
>
> For example, imagine that I get:
> RDD1 --> T 0
> RDD2 -->T 0.5
> I do cogroup between them but I couldn't store in Mongo yet because it
> could come more data in the next windows/slide.
> RDD2' -->T 1.5
> Another RDD2' comes, I only want to save in Mongo once. So, I should only
> save it when I get all data. What I know it's how long I should wait as
> much.
>
> Ideally, I would like to save in MongoDB in the last slide for each RDD
> when I know that there is not possible to get more RDD2 to join with RDD1.
> Is it possible? how?
>
> Maybe there is other way to resolve this problem, any idea?
>
>
>