On Mon, 11 Feb 2019 at 14:10, Stephen Connolly <stephen.alan.conno...@gmail.com> wrote:
> Another possibility would be injecting pseudo events into the source and having a stateful filter.
>
> The event would be something like "key X is now owned by green".
>
> I can do that because getting a list of keys seen in the past X minutes is cheap (we have it already).
>
> But it's unclear what impact adding such state to the filter would have.

Hmmm, it might not need to be quite so stateful: if the filter were implemented as a BroadcastProcessFunction or a KeyedBroadcastProcessFunction, I could run the key -> threshold function and compare the result to the level from the broadcast context... that way the broadcast events wouldn't need to be associated with any specific key and could just be {"level":56}. (Rough sketches of the plain filter, the broadcast-driven variant and the web-tagged variant are appended at the bottom of this mail.)

> On Mon 11 Feb 2019 at 13:33, Stephen Connolly <stephen.alan.conno...@gmail.com> wrote:
>
>> On Mon, 11 Feb 2019 at 13:26, Stephen Connolly <stephen.alan.conno...@gmail.com> wrote:
>>
>>> I have my main application updating with a blue-green deployment strategy, whereby a new version (always called green) starts receiving an initial fraction of the web traffic and then - based on the error rates - we increase the % of traffic until 100% of traffic is being handled by the green version. At that point we decommission blue, and green becomes the new blue when the next version comes along.
>>>
>>> Applied to Flink, my initial thought is that you would run the two topologies in parallel, but the first operation of each topology would be a filter based on the key.
>>>
>>> You would basically use a consistent transformation of the key into a number between 0 and 100, and the filter would be:
>>>
>>> (key) -> color == green ? f(key) < level : f(key) >= level
>>>
>>> Then I can use a suitable metric to determine whether the new topology is working, and ramp the level up or down accordingly.
>>>
>>> One issue I foresee is what happens if the level changes mid-window: I will have output from both topologies when the window ends.
>>>
>>> In the case of my output, which is aggregatable, I will get the same results from two rows as from one row *provided* that the switch from blue to green is synchronized between the two topologies. That sounds like a hard problem, though.
>>>
>>> Another thought I had was to let the web front-end decide, based on the same key-vs-level approach. Rather than submit the raw event, I would add the target topology to the event, and the filter would just select on whether it is the target topology. This has the advantage that I know each event will only ever be processed by one of green or blue. Heck, I could even use the main web application's blue-green deployment to drive the Flink blue-green deployment,
>>
>> In other words, if a blue web node receives an event upload it adds "blue", whereas if a green web node receives an event upload it adds "green" (not quite those strings, but rather the web deployment sequence number). This has the advantage that the web nodes do not need to parse the event payload. The % of web traffic will result in the matching % of events being sent to blue and green. It also means that all keys get processed at the target % during the deployment, which can help flush out bugs.
>>
>> I can therefore stop the old topology more than one window after the green web node started getting 100% of traffic, in order to allow any windows still in flight to flush all the way to the datastore...
>> Out-of-order events would be tagged as green once green has 100% of the traffic, and so can be processed correctly...
>>
>> And I can completely ignore topology migration serialization issues...
>>
>> Sounding very tempting... there must be something wrong...
>>
>> (or maybe my data storage plan just allows me to make this kind of optimization!)
>>
>>> as, due to the way I structure my results, I don't care whether I get two rows of counts for a time window or one, because I'm adding up the total counts across multiple rows and sum is sum!
>>>
>>> Anyone else had to try and deal with this type of thing?
>>>
>>> -stephenc
>
> --
> Sent from my phone
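
P.S. For anyone following along, here is a rough, untested sketch of the plain key-vs-level gate. The BlueGreenGate name, the Tuple2<key, payload> event shape and the hashCode-based bucketing are placeholders for illustration, not our real job code:

import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.api.java.tuple.Tuple2;

/** Keeps only the keys owned by this topology's colour. */
public class BlueGreenGate implements FilterFunction<Tuple2<String, String>> {

    private final boolean green; // true in the green topology, false in blue
    private final int level;     // % of keys currently owned by green, 0..100

    public BlueGreenGate(boolean green, int level) {
        this.green = green;
        this.level = level;
    }

    /** Consistent key -> [0, 100) mapping; must be identical in both jobs. */
    static int bucket(String key) {
        // Math.floorMod avoids negative buckets for negative hash codes.
        return Math.floorMod(key.hashCode(), 100);
    }

    @Override
    public boolean filter(Tuple2<String, String> event) {
        int b = bucket(event.f0); // f0 = key, f1 = payload in this sketch
        return green ? b < level : b >= level;
    }
}

The obvious drawback is that the level is fixed at job submission time, which is what pushes me towards the broadcast-driven variant.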
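
And a corresponding sketch of the broadcast-driven variant, where the {"level":56} pseudo events arrive on a second stream and land in broadcast state. The class and state names, the bare Integer level stream, and the wiring at the end are again assumptions rather than tested code:

import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction;
import org.apache.flink.util.Collector;

public class BroadcastBlueGreenGate
        extends BroadcastProcessFunction<Tuple2<String, String>, Integer, Tuple2<String, String>> {

    /** Single-entry broadcast state holding the current green level (0..100). */
    public static final MapStateDescriptor<String, Integer> LEVEL =
            new MapStateDescriptor<>("blue-green-level", String.class, Integer.class);

    private final boolean green;        // which colour this topology is
    private final int defaultLevel;     // used until the first level event arrives

    public BroadcastBlueGreenGate(boolean green, int defaultLevel) {
        this.green = green;
        this.defaultLevel = defaultLevel;
    }

    @Override
    public void processElement(Tuple2<String, String> event, ReadOnlyContext ctx,
                               Collector<Tuple2<String, String>> out) throws Exception {
        Integer level = ctx.getBroadcastState(LEVEL).get("level");
        int effective = level != null ? level : defaultLevel;
        int b = Math.floorMod(event.f0.hashCode(), 100); // same bucketing as the plain gate
        if (green ? b < effective : b >= effective) {
            out.collect(event);
        }
    }

    @Override
    public void processBroadcastElement(Integer newLevel, Context ctx,
                                        Collector<Tuple2<String, String>> out) throws Exception {
        // Every parallel instance sees the same level event and records it.
        ctx.getBroadcastState(LEVEL).put("level", newLevel);
    }
}

Wired up something like this (events and levelUpdates are whatever streams the job already has):

// events: DataStream<Tuple2<String, String>>, levelUpdates: DataStream<Integer>
DataStream<Tuple2<String, String>> gated = events
        .connect(levelUpdates.broadcast(BroadcastBlueGreenGate.LEVEL))
        .process(new BroadcastBlueGreenGate(true, 0));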
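
Finally, if the web front-end stamps each event with its target deployment, the Flink side collapses to a trivial stateless filter; something like the following sketch, with the event modelled as Tuple2<deployment tag, payload> purely for illustration:

import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.api.java.tuple.Tuple2;

/** Keeps only events stamped with this topology's deployment id by the web tier. */
public class TargetTopologyFilter implements FilterFunction<Tuple2<String, String>> {

    private final String myDeploymentId; // e.g. the web deployment sequence number

    public TargetTopologyFilter(String myDeploymentId) {
        this.myDeploymentId = myDeploymentId;
    }

    @Override
    public boolean filter(Tuple2<String, String> event) {
        return myDeploymentId.equals(event.f0); // f0 = target deployment, f1 = payload
    }
}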