Hi All, I'm new to Flink but am trying to write an application that processes data from internet-connected sensors.
My problem is as follows:

- Data arrives in the format: [sensor id] [timestamp in seconds] [sensor value]
- Data can arrive out of order (between sensor IDs) by up to 5 minutes.
- So a stream of data could be: [1] [100] [20] [2] [101] [23] [1] [105] [31] [1] [140] [17]
- Each sensor can sometimes split its measurements, and I'm hoping to 'put them back together' within Flink. For data to be 'put back together', it must have a timestamp within 90 seconds of the timestamp of the first piece of data. The data must also be put back together in order; in the example above, for sensor 1 you could only have combinations of (A) the first reading on its own (time 100), (B) the first and third items (times 100 and 105), or (C) the first, third and fourth items (times 100, 105 and 140). The second item is a different sensor, so it is not considered in this exercise.
- I would like to write something that tries the different 'sum' combinations within the 90-second limit and outputs the best 'match' to expected values. In the example above, let's say the expected sum values are 50 or 100. Of the three combinations I mentioned for sensor 1, the sums would be 20, 51, or 68. The 'best' combination is therefore 51, as it is closest to 50 or 100, so it would output two data items: [1] [100] [20] and [1] [105] [31], with the last item left in the stream to be matched with any other data points that arrive later.

I am thinking some sort of iterative function could do this, but I am not sure how to emit the values I want while keeping the other values that were considered (but not used) in the stream. Any ideas or help would be really appreciated!

Thanks,
Marc

--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
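To make the selection step concrete, here is a minimal sketch of the logic I have in mind, in plain Java and outside of Flink (the class and method names are my own, and the expected sums {50, 100} are hard-coded as an assumption). For one sensor's readings sorted by timestamp, it walks the ordered prefixes that fall within 90 seconds of the first reading and keeps the prefix whose sum is closest to any expected value:

```java
import java.util.ArrayList;
import java.util.List;

// A reading: [sensor id] [timestamp in seconds] [sensor value]
class Reading {
    final int sensorId;
    final long timestamp;
    final int value;

    Reading(int sensorId, long timestamp, int value) {
        this.sensorId = sensorId;
        this.timestamp = timestamp;
        this.value = value;
    }
}

public class BestPrefix {
    // Given one sensor's readings sorted by timestamp, consider each ordered
    // prefix whose timestamps fall within 90 seconds of the first reading,
    // and return the prefix whose sum is closest to any expected value.
    static List<Reading> bestPrefix(List<Reading> readings, int[] expected) {
        List<Reading> best = new ArrayList<>();
        long bestDistance = Long.MAX_VALUE;
        long firstTs = readings.get(0).timestamp;
        int sum = 0;
        for (int i = 0; i < readings.size(); i++) {
            Reading r = readings.get(i);
            if (r.timestamp - firstTs > 90) break; // outside the 90-second limit
            sum += r.value;
            for (int e : expected) {
                long d = Math.abs(sum - e);
                if (d < bestDistance) {
                    bestDistance = d;
                    best = new ArrayList<>(readings.subList(0, i + 1));
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Sensor 1's readings from the example: prefix sums 20, 51, 68.
        List<Reading> readings = List.of(
                new Reading(1, 100, 20),
                new Reading(1, 105, 31),
                new Reading(1, 140, 17));
        for (Reading r : bestPrefix(readings, new int[]{50, 100})) {
            System.out.println("[" + r.sensorId + "] [" + r.timestamp + "] [" + r.value + "]");
        }
        // Prints the first two readings (sum 51, closest to 50); the reading
        // at time 140 would be left behind for later matching.
    }
}
```

In Flink I imagine this would sit inside something like a KeyedProcessFunction keyed by sensor id, with the unused readings kept in state for the next round, but I am unsure how to structure that part.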