Hi Michael,

Yeah, I feel that updating state in an external system is a job for a
sink. However, I don't really like the idea of combining both a web service
call and a DB write in one sink class because it breaks single
responsibility. I feel there should be a nicer way to compose these types
of flows. It also means I need to write extra logic and manage state in the
sink because there are several points of failure (web service call, DB
update). NB: the response from the web service is not just for control.
There are some values I need to record, including a correlation key that
the external system will use when it sends us notifications sometime in the
following hours/days, at which point I need to perform an update in our
system.

That is what led me to think of the web service call as a map transform,
with a DB sink connected to that operator. However, as you mention, this
would be a map/transform function with side effects, so I would need to
manage state to ensure duplicate web service calls aren't sent on failure
restarts etc.

Maybe I am missing a better approach, but for now I am leaning towards
doing the web service call in a (stateful) map function, with the result
feeding into a DB sink.
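
To make the idea concrete, here is a rough sketch of the dedup logic I have in mind, written as plain Java rather than against the Flink API. The class and method names (IdempotentCaller, map, callsMade) are made up for illustration, and the HashMap only stands in for keyed state such as a ValueState per message key. The web service is passed in as a plain function so the logic can be exercised without a real endpoint.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: none of these names come from Flink or the real system.
final class IdempotentCaller {
    // Stands in for Flink keyed state (e.g. one ValueState per message key).
    private final Map<String, String> responseByKey = new HashMap<>();
    // Stands in for the external web service: payload in, response out.
    private final Function<String, String> webService;
    private int callsMade = 0;

    IdempotentCaller(Function<String, String> webService) {
        this.webService = webService;
    }

    // Returns the recorded response if this key was already sent;
    // otherwise calls the service once and records its response.
    String map(String messageKey, String payload) {
        String recorded = responseByKey.get(messageKey);
        if (recorded != null) {
            return recorded; // replayed message: skip the external call
        }
        String response = webService.apply(payload);
        callsMade++;
        responseByKey.put(messageKey, response);
        return response;
    }

    // Number of real calls made to the web service so far.
    int callsMade() {
        return callsMade;
    }
}
```

One caveat with this approach: Flink restores state from the last completed checkpoint, so a call made after that checkpoint could still be repeated after a restart. The state check narrows the window for duplicates but probably does not close it unless the web service can also deduplicate on a key we send with the request.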

Cheers


On Mon, Apr 30, 2018 at 2:34 AM, TechnoMage <mla...@technomage.com> wrote:

> If the external web service call does not modify the state of that
> external system all the approaches you list are probably ok.  If there is
> external state modification then you want to ensure on restart the Flink
> job does not resend requests to that service or that it can handle
> duplicate requests.  In that sense a sink that sends the request is the
> cleanest as it represents an export of data to an external system.  The
> response back is just to allow the sink to not repeat messages.  If it is
> sending data to the system that affects its state and then the response
> has values that need to be recorded as results not just control values,
> then that could be a separate flow or use the map process, as from
> Flink’s point of view it is a transform.  This latter case, however, to me
> smacks of an undesirable side effect, as these make error recovery cases
> harder.
>
> Michael
>
> > On Apr 28, 2018, at 8:21 PM, wazza <wazzawe...@gmail.com> wrote:
> >
> > Hi all,
> >
> > I need to send a request to an external web service and then store the
> response in a DB table, and I am wondering how people have approached this
> or similar problems in the past.
> >
> > The flow is: Kafka source (msgs only every few seconds)  => filter/map
> operators => result sent to web service (which updates state in that
> system) => response stored in DB.
> >
> > Initially I was thinking of just creating a custom sink which basically:
> Sends request to webservice  => Get response containing external key =>
> Save key into DB
> > This feels to me like basically smashing together 2 separate sinks into
> 1, and I am not sure if that is a good design or not.
> >
> > Another option would be to create a RichMapFunction (possibly async
> function) which does the web service call. My map function can then just
> return the response which I can then feed into a standard DB sink.
> > However, with this approach it feels strange to update an external
> system in a map() function, but maybe that's ok? Also, I presume to make my
> map function idempotent I would need to store some state (I can key the
> messages and use a ValueState) so I don't do duplicate web service calls if
> there is a failure?
> >
> > Thoughts?
> >
> > Thanks in advance.
> >
>
>
