This sounds like you have some per-key state to keep track of, so the 'correct' way to do it would be to keyBy the guid. I believe that if you run your environment using the RocksDB state backend, you will not OOM regardless of the number of GUIDs that are eventually tracked, because that backend keeps keyed state on local disk rather than on the JVM heap. Whether Flink/stream processing is the most effective way to achieve your goal, I can't say, but I am fairly confident that this particular aspect is not a problem.
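To make the per-key state idea concrete, here is a minimal plain-Java sketch of the linking logic. It is not Flink code: the HashMap only stands in for keyed state, which in Flink would be something like a ValueState<String> inside a function applied after keyBy on the guid. The names VersionLinker, ObjectVersion and LinkedVersion are hypothetical, and the sketch assumes versions are linked in arrival order, so it does not address out-of-order arrival.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: remembers the last seen version id per guid and
// links each new version to it.
class VersionLinker {

    // Hypothetical input event: one version of the object identified by guid.
    static final class ObjectVersion {
        public final String guid;
        public final String versionId;

        ObjectVersion(String guid, String versionId) {
            this.guid = guid;
            this.versionId = versionId;
        }
    }

    // Hypothetical output: a version plus the id of the version before it
    // (null when this is the first version seen for the guid).
    static final class LinkedVersion {
        public final String guid;
        public final String versionId;
        public final String previousVersionId;

        LinkedVersion(String guid, String versionId, String previousVersionId) {
            this.guid = guid;
            this.versionId = versionId;
            this.previousVersionId = previousVersionId;
        }
    }

    // Stand-in for keyed state: one entry per guid. With a disk-backed state
    // backend this map would not have to fit on the heap.
    private final Map<String, String> lastVersionByGuid = new HashMap<>();

    // Links the incoming version to the previously seen one for the same guid,
    // then records the incoming version as the latest.
    LinkedVersion process(ObjectVersion v) {
        // Map.put returns the previous value for the key, or null if none.
        String previous = lastVersionByGuid.put(v.guid, v.versionId);
        return new LinkedVersion(v.guid, v.versionId, previous);
    }

    public static void main(String[] args) {
        VersionLinker linker = new VersionLinker();
        ObjectVersion[] events = {
            new ObjectVersion("guid-a", "v1"),
            new ObjectVersion("guid-b", "v1"),
            new ObjectVersion("guid-a", "v2"),
        };
        for (ObjectVersion e : events) {
            LinkedVersion out = linker.process(e);
            System.out.println(out.guid + ": " + out.versionId
                + " follows " + out.previousVersionId);
        }
    }
}
```

Because each guid only ever touches its own entry, there is no cross-key coordination, which is what keyBy gives you: all versions of one guid go to the same parallel instance and are processed one at a time.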
On Sat, Apr 23, 2016 at 1:13 AM, Chen Bekor <chen.be...@gmail.com> wrote:

> hi all,
>
> I have a stream of incoming object versions (objects change over time) and
> a requirement to fetch from a datastore the last known object version in
> order to link it with the id of the new version, so that I end up with a
> linked list of object versions.
>
> all object versions contain the same guid, so I was thinking about using
> flink streaming in order to assure ordering and avoid concurrency / race
> conditions in the linkage process (object version might arrive unordered or
> may arrive at spikes)
>
> if I use the object guid as a key for a keyed stream I am concerned I will
> end up with millions of windowed streams hence causing OOM.
>
> what do you think should be the right approach? do you think flink is the
> right technology for this task?
>