I tried to insert an "flag" in the RDD, so I could set in the last position
a counter, when the counter gets X, I could do something. But in each slide
comes the original RDD although I modificated it.

I did this code to check if this is possible but it doesn't work.

val rdd1WithFlag = rdd1.map { register =>
      var splitRegister = register._2.split("\\|")
      var newArray = new Array[String](splitRegister.length + 1)

      if (splitRegister.length == 2) {
        splitRegister.copyToArray(newArray)
        newArray(splitRegister.length) = "0"
      } else {
        splitRegister(splitRegister.length) = "1"
        splitRegister.copyToArray(newArray)
      }

      (splitRegister(1), newArray)
    }

If I check the length of splitRegister  is always 2 in each slide, it is
never three.



2015-05-18 15:36 GMT+02:00 Guillermo Ortiz <konstt2...@gmail.com>:

> Hi,
>
> I have two streaming RDD1 and RDD2 and want to cogroup them.
> Data don't come in the same time and sometimes they could come with some
> delay.
> When I get all data I want to insert in MongoDB.
>
> For example, imagine that I get:
> RDD1 --> T 0
> RDD2 -->T 0.5
> I do cogroup between them but I couldn't store in Mongo yet because it
> could come more data in the next windows/slide.
> RDD2' -->T 1.5
> Another RDD2' comes, I only want to save in Mongo once. So, I should only
> save it when I get all data. What I know it's how long I should wait as
> much.
>
> Ideally, I would like to save in MongoDB in the last slide for each RDD
> when I know that there is not possible to get more RDD2 to join with RDD1.
> Is it possible? how?
>
> Maybe there is other way to resolve this problem, any idea?
>
>
>

Reply via email to