Hey guys!

I've been thinking about this one today:

Say you have a stream of data in the form of (id, value); this would be a
DataStream of Tuple2.
I need to cache this data in some sort of static stream (perhaps even a
DataSet).
Then, if I see an id in the input stream that was previously stored, I
should update its cached value with the most recent entry.

For example, given this input:

1, 3
2, 5
6, 7
1, 5

The value cached for the id 1 should be 5.
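
To make the intended semantics concrete, here is a minimal sketch in plain
Java. It does not use any Flink API; the class and method names
(LatestValueCache, latestById) are made up for illustration, and an
in-memory map stands in for whatever the cache would actually be.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LatestValueCache {

    // Hypothetical helper: folds (id, value) records into a map.
    // A later record with the same id overwrites the earlier value.
    static Map<Integer, Integer> latestById(int[][] records) {
        Map<Integer, Integer> cache = new LinkedHashMap<>();
        for (int[] record : records) {
            int id = record[0];
            int value = record[1];
            cache.put(id, value); // insert a new id or replace the old value
        }
        return cache;
    }

    public static void main(String[] args) {
        // The four records from the example above, in arrival order.
        int[][] input = { {1, 3}, {2, 5}, {6, 7}, {1, 5} };
        System.out.println(latestById(input)); // prints {1=5, 2=5, 6=7}
    }
}
```

What I am looking for is the streaming equivalent of that put() call.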

How would you recommend caching the data? And what would be used for the
update? A join function?

As far as I can tell, you cannot really combine DataSets with DataStreams,
although a DataSet is, in essence, just a finite stream.
If this can indeed be done, some pseudocode would be nice :)

Thanks!
Andra
