Hi Rami,

Thanks for the fast reply.
1. In your solution, would I need to create a new stream for 'item updates' and add it as a source of my Flink job? Then I would need to ensure item updates get broadcast to all nodes running my job, and use them to update the in-memory items database? This sounds like it might be a good solution, but I'm not sure how the broadcast would work - it sounds like I'd need Flink broadcast variables, but it looks like there's no support for changing datasets at the moment: https://issues.apache.org/jira/browse/FLINK-3514 (A rough sketch of the connected-streams pattern I have in mind is at the end of this message.)

2. I don't understand why an HTTP sink isn't possible. Say the output of my job is 'number of items ordered per customer'; then for each output I want to update a 'customer' in my database, incrementing their 'item_order_count'. What's wrong with doing that update in the Flink job via an HTTP REST call (updating the customer resource), rather than writing directly to a database? The reason I'd like to do it this way is to decouple the underlying database from Flink. (A sketch of such a sink is also at the end of this message.)

Josh

On Mon, May 23, 2016 at 2:35 PM, Al-Isawi Rami <rami.al-is...@comptel.com> wrote:

> Hi Josh,
>
> I am no expert in Flink yet, but here are my thoughts on this:
>
> 1. What if you stream an event to Flink every time the items DB has an
> update? Then, in some background thread, you fetch the new data from the
> DB (through REST, if there are only a few updates a day), load the
> results into memory, and there is your updated static data.
>
> 2. REST APIs are served over HTTP - how could one be a sink? Serving
> HTTP requests does not sound like a Flink job's role at all. Simply sink
> the results to some DB and have a separate component read from the DB
> and serve it as a REST API.
>
> -Rami
>
> On 23 May 2016, at 16:22, Josh <jof...@gmail.com> wrote:
>
> Hi all,
>
> I am new to Flink and have a couple of questions which I've had trouble
> finding answers to online. Any advice would be much appreciated!
>
> 1. What's a typical way of handling the scenario where you want to join
> streaming data with a (relatively) static data source? For example, if I
> have a stream 'orders' where each order has an 'item_id', and I want to
> join this stream with my database of 'items'. The database of items is
> mostly static (with perhaps a few new items added every day), and it can
> be retrieved either directly from a standard SQL database (Postgres) or
> via a REST call. I guess one way to handle this would be to distribute
> the database of items with the Flink tasks, and to redeploy the entire
> job if the items database changes. But I think there's probably a better
> way to do it?
>
> 2. I'd like my Flink job to output state to a REST API (i.e. use the
> REST API as a sink). Updates would be incremental, e.g. the job would
> output tumbling window counts which need to be added to some property on
> a REST resource, so I'd probably implement this as a PATCH. I haven't
> found much evidence that anyone else has used a REST API as a Flink sink
> - is there a reason why this might be a bad idea?
>
> Thanks for any advice on these,
>
> Josh
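Here's a minimal sketch of the connected-streams pattern from point 1, against the Flink 1.x DataStream API. The Order/ItemUpdate types, the fromElements() placeholder sources and the output strings are hypothetical stand-ins; a real job would read both streams from Kafka or similar. Note the items map lives only in operator memory here, so the update stream needs to be replayable (or the map rebuilt some other way) to survive a restart:

import java.util.HashMap;
import java.util.Map;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.CoFlatMapFunction;
import org.apache.flink.util.Collector;

public class BroadcastItemJoin {

    // Hypothetical records standing in for the real 'orders' and 'items' schemas.
    public static class Order {
        public String itemId;
        public Order() {}
        public Order(String itemId) { this.itemId = itemId; }
    }

    public static class ItemUpdate {
        public String itemId;
        public String name;
        public ItemUpdate() {}
        public ItemUpdate(String itemId, String name) {
            this.itemId = itemId;
            this.name = name;
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Placeholders: in the real job these would be Kafka (or similar) sources.
        DataStream<Order> orders = env.fromElements(new Order("item-1"));
        DataStream<ItemUpdate> itemUpdates =
                env.fromElements(new ItemUpdate("item-1", "widget"));

        // broadcast() replicates every ItemUpdate to ALL parallel subtasks of
        // the connected operator, so each subtask holds a full copy of the
        // items "database" in its local map.
        orders.connect(itemUpdates.broadcast())
                .flatMap(new CoFlatMapFunction<Order, ItemUpdate, String>() {

                    private final Map<String, ItemUpdate> items = new HashMap<>();

                    @Override
                    public void flatMap1(Order order, Collector<String> out) {
                        ItemUpdate item = items.get(order.itemId);
                        out.collect(item == null
                                ? "order for unknown item " + order.itemId
                                : "order for " + item.name);
                    }

                    @Override
                    public void flatMap2(ItemUpdate update, Collector<String> out) {
                        // Apply the update to this subtask's copy of the table.
                        items.put(update.itemId, update);
                    }
                })
                .print();

        env.execute("broadcast item-update join (sketch)");
    }
}

One caveat: the two streams aren't synchronised, so an order can arrive before the update for its item; production code would buffer those orders or fall back to a direct DB/REST lookup. Rami's polling idea could be sketched in much the same shape, except the map would be reloaded from REST on a timer inside a rich function's open() method instead of from a second stream.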
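And a sketch of the custom HTTP sink from point 2, using the invoke(IN) signature of the Flink 1.x SinkFunction. The endpoint layout and JSON body are hypothetical, and java.net.HttpURLConnection has no native PATCH support, so this tunnels the PATCH through POST with the (convention-dependent) X-HTTP-Method-Override header - a real implementation might instead use an HTTP client library that supports PATCH directly, plus retries and batching:

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

public class CustomerCountSink
        extends RichSinkFunction<CustomerCountSink.CustomerCount> {

    // Hypothetical window output: number of items ordered per customer.
    public static class CustomerCount {
        public String customerId;
        public long itemsOrdered;
        public CustomerCount() {}
        public CustomerCount(String customerId, long itemsOrdered) {
            this.customerId = customerId;
            this.itemsOrdered = itemsOrdered;
        }
    }

    // Hypothetical, e.g. "https://api.example.com/customers/"
    private final String baseUrl;

    public CustomerCountSink(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    @Override
    public void invoke(CustomerCount value) throws Exception {
        URL url = new URL(baseUrl + value.customerId);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();

        // HttpURLConnection rejects setRequestMethod("PATCH"), so tunnel the
        // PATCH through POST; this assumes the API honours the override header.
        conn.setRequestMethod("POST");
        conn.setRequestProperty("X-HTTP-Method-Override", "PATCH");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(5000);
        conn.setDoOutput(true);

        // Hypothetical body: the increment to apply to 'item_order_count'.
        String body = "{\"item_order_count_increment\": " + value.itemsOrdered + "}";
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }

        int status = conn.getResponseCode();
        conn.disconnect();
        if (status < 200 || status >= 300) {
            // Throwing fails the task and triggers Flink's restart/recovery;
            // a replayed window would then re-send its increment (see below).
            throw new RuntimeException("PATCH to " + url + " failed: HTTP " + status);
        }
    }
}

Usage would be something like counts.addSink(new CustomerCountSink("https://api.example.com/customers/")). The main caveat behind the "is this a bad idea" question is delivery semantics: a sink like this is at-least-once, so a window replayed after a failure would re-send its increment and double-count. PATCHing an absolute value, or making the API idempotent (e.g. keying each increment by a window identifier), avoids that.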