Hi Michael, Yeah, I feel the updating of state in an external system is a job for a sink. However, I don't really like the idea of combining both a web service call and DB write in the one sink class because it's breaking single responsibility - I feel there should be a nicer way to compose these types of flows. Also means I need to write smarts/manage state in the sink because of several points of failure (web service call, db update). NB: the response from the web service is not just for control because there are some values I need to record, including a correlation key for when the external system sends notifications to us sometime in the following hours/days, and I then need to perform an update in our system.
Which is what led me to just thinking of the web service call as a map transform, and connecting a DB sink to that operator. However, as you mention, this will be a map/transform function with side effects, therefore I would need to manage state to ensure duplicate web service calls aren't sent on failure restarts etc. Maybe I am missing a better approach, but for now I am leaning towards doing the web service call in a (stateful) map function, with the result feeding into a DB sink. Cheers On Mon, Apr 30, 2018 at 2:34 AM, TechnoMage <mla...@technomage.com> wrote: > If the external web service call does not modify the state of that > external system all the approaches you list are probably ok. If there is > external state modification then you want to ensure on restart the Flink > job does not resend requests to that service or that it can handle > duplicate requests. In that sense a sink that sends the request is the > cleanest as it represents an export of data to an external system. The > response back is just to allow the sink to not repeat messages. If it is > sending data to the system that affects it’s state and then the response > has values that need to be recorded as results not just control values, > then that could be a separate flow or use the map process as from the > Flink’s point of view it was a transform. This later case however to me > smacks of an undesireable side effect as these make error recovery cases > harder. > > Michael > > > On Apr 28, 2018, at 8:21 PM, wazza <wazzawe...@gmail.com> wrote: > > > > Hi all, > > > > I need to send a request to an external web service and then store the > response in a DB table, and I am wondering how people have approached this > or similar problems in the past. > > > > The flow is: Kafka source (msgs only every few seconds) => filter/map > operators => result sent to web service (which updates state in that > system) => response stored in DB. > > > > Initially I was thinking of just creating a custom sink which basically: > Sends request to webservice => Get response containing external key => > Save key into DB > > This feels to me like basically smashing together 2 separate sinks into > 1, and I am not sure if that is a good design or not. > > > > Another option would be to create a RichMapFunction (possibly async > function) which does the web service call. My map function can then just > return the response which I can then feed into a standard DB sink. > > However, with this approach it feels strange to update an external > system in a map() function, but maybe that's ok? Also, I presume to make my > map function idempotent I would need to store some state (I can key the > messages and use a ValueState) so I don't do duplicate web service calls if > there is a failure? > > > > Thoughts? > > > > Thanks in advance. > > > >