I have my main application updating with a blue-green deployment strategy: a new version (always called green) starts receiving an initial fraction of the web traffic and then, based on the error rates, we progress the percentage of traffic until 100% of traffic is being handled by the green version. At that point we decommission blue, and green becomes the new blue when the next version comes along.
Applied to Flink, my initial thought is that you would run the two topologies in parallel, but the first action of each topology would be a filter based on the key. You would use a consistent transformation f of the key into a number between 0 and 100, and the filter would be:

    (key) -> color == green ? f(key) < level : f(key) >= level

Then I can use a suitable metric to determine whether the new topology is working, and ramp the level up or down.

One issue I foresee is what happens if the level changes mid-window: I will get output from both topologies when the window ends. In my case the output is aggregatable, so I will get the same results from two rows as from one row *provided* that the switch from blue to green is synchronized between the two topologies. That sounds like a hard problem, though.

Another thought I had was to let the web front-end decide, based on the same key-vs-level approach. Rather than submitting the raw event, I would add the target topology to the event, and the filter would just select on whether it is the target topology. This has the advantage that I know each event will only ever be processed by one of green or blue. Heck, I could even use the main web application's blue-green deployment to drive the Flink blue-green deployment, because of the way I structure my results: I don't care if I get two rows of counts for a time window or one row of counts, since I'm adding up the total counts across multiple rows and sum is sum!

Has anyone else had to deal with this type of thing?

-stephenc
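P.S. Here is a minimal sketch of the key-bucketing filter I have in mind, as plain Java (the names bucketOf/acceptsKey are mine, not any Flink API; in a real job the predicate would sit inside a FilterFunction at the head of each topology). I'm using CRC32 here purely as an example of a consistent hash; any stable key-to-number mapping would do.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class BlueGreenFilter {

    // Map a key deterministically onto a bucket in [0, 100).
    static int bucketOf(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % 100);
    }

    // Green accepts buckets below the level, blue accepts the rest,
    // so for a given level every key goes to exactly one topology.
    static boolean acceptsKey(boolean isGreen, int level, String key) {
        int bucket = bucketOf(key);
        return isGreen ? bucket < level : bucket >= level;
    }

    public static void main(String[] args) {
        int level = 10; // green handles roughly 10% of keys
        String key = "user-42";
        boolean goesToGreen = acceptsKey(true, level, key);
        boolean goesToBlue = acceptsKey(false, level, key);
        // Exactly one of the two topologies accepts any given key.
        System.out.println(goesToGreen != goesToBlue); // prints "true"
    }
}
```

Note this only guarantees exclusivity at a single value of the level; the mid-window hazard I mention above is exactly that the two topologies may briefly disagree on the level while it is being ramped.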