Hi Rami,

Thanks for the fast reply.
1. In your solution, would I need to create a new stream for 'item updates' and add it as a source of my Flink job? Then I would need to ensure item updates get broadcast to all nodes running my job, and use them to update the in-memory items database? This sounds like it might be a good solution, but I'm not sure how the broadcast would work - it sounds like I'd need Flink broadcast variables, but it looks like there's no support for changing datasets at the moment: https://issues.apache.org/jira/browse/FLINK-3514 (A rough sketch of the connected-streams pattern I have in mind is at the end of this message.)

2. I don't understand why an HTTP sink isn't possible. Say the output of my job is 'number of items ordered per customer'; then for each output I want to update a 'customer' in my database, incrementing their 'item_order_count'. What's wrong with doing that update in the Flink job via an HTTP REST call (updating the customer resource), rather than writing directly to a database? The reason I'd like to do it this way is to decouple the underlying database from Flink. (A sketch of such a sink is also at the end of this message.)

Josh

On Mon, May 23, 2016 at 2:35 PM, Al-Isawi Rami <rami.al-is...@comptel.com> wrote:

> Hi Josh,
>
> I am no expert in Flink yet, but here are my thoughts on this:
>
> 1. What if you stream an event to Flink every time the items DB has an
> update? Then, in some background thread, you fetch the new data from the
> DB (through REST, if there are only a few updates a day), load the
> results into memory, and there is your updated static data.
>
> 2. REST APIs are served over HTTP - how could one be a sink? Serving
> HTTP requests does not sound like a Flink job's role at all. Simply sink
> the results to some DB and have a separate component read from the DB
> and serve it as a REST API.
>
> -Rami
>
> On 23 May 2016, at 16:22, Josh <jof...@gmail.com> wrote:
>
> Hi all,
>
> I am new to Flink and have a couple of questions which I've had trouble
> finding answers to online. Any advice would be much appreciated!
>
> 1. What's a typical way of handling the scenario where you want to join
> streaming data with a (relatively) static data source? For example, if I
> have a stream 'orders' where each order has an 'item_id', and I want to
> join this stream with my database of 'items'. The database of items is
> mostly static (with perhaps a few new items added every day), and it can
> be retrieved either directly from a standard SQL database (Postgres) or
> via a REST call. I guess one way to handle this would be to distribute
> the database of items with the Flink tasks, and to redeploy the entire
> job if the items database changes. But I think there's probably a better
> way to do it?
>
> 2. I'd like my Flink job to output state to a REST API (i.e. use the
> REST API as a sink). Updates would be incremental, e.g. the job would
> output tumbling window counts which need to be added to some property on
> a REST resource, so I'd probably implement this as a PATCH. I haven't
> found much evidence that anyone else has used a REST API as a Flink sink
> - is there a reason why this might be a bad idea?
>
> Thanks for any advice on these,
>
> Josh
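Here's a minimal sketch of the connected-streams pattern from point 1, against the Flink 1.x DataStream API. The Order/ItemUpdate types, the fromElements() placeholder sources and the output strings are hypothetical stand-ins; a real job would read both streams from Kafka or similar. Note the items map lives only in operator memory here, so the update stream needs to be replayable (or the map rebuilt some other way) to survive a restart:

import java.util.HashMap;
import java.util.Map;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.CoFlatMapFunction;
import org.apache.flink.util.Collector;

public class BroadcastItemJoin {

    // Hypothetical records standing in for the real 'orders' and 'items' schemas.
    public static class Order {
        public String itemId;
        public Order() {}
        public Order(String itemId) { this.itemId = itemId; }
    }

    public static class ItemUpdate {
        public String itemId;
        public String name;
        public ItemUpdate() {}
        public ItemUpdate(String itemId, String name) {
            this.itemId = itemId;
            this.name = name;
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Placeholders: in the real job these would be Kafka (or similar) sources.
        DataStream<Order> orders = env.fromElements(new Order("item-1"));
        DataStream<ItemUpdate> itemUpdates =
                env.fromElements(new ItemUpdate("item-1", "widget"));

        // broadcast() replicates every ItemUpdate to ALL parallel subtasks of
        // the connected operator, so each subtask holds a full copy of the
        // items "database" in its local map.
        orders.connect(itemUpdates.broadcast())
                .flatMap(new CoFlatMapFunction<Order, ItemUpdate, String>() {

                    private final Map<String, ItemUpdate> items = new HashMap<>();

                    @Override
                    public void flatMap1(Order order, Collector<String> out) {
                        ItemUpdate item = items.get(order.itemId);
                        out.collect(item == null
                                ? "order for unknown item " + order.itemId
                                : "order for " + item.name);
                    }

                    @Override
                    public void flatMap2(ItemUpdate update, Collector<String> out) {
                        // Apply the update to this subtask's copy of the table.
                        items.put(update.itemId, update);
                    }
                })
                .print();

        env.execute("broadcast item-update join (sketch)");
    }
}

One caveat: the two streams aren't synchronised, so an order can arrive before the update for its item; production code would buffer those orders or fall back to a direct DB/REST lookup. Rami's polling idea could be sketched in much the same shape, except the map would be reloaded from REST on a timer inside a rich function's open() method instead of from a second stream.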
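And a sketch of the custom HTTP sink from point 2, using the invoke(IN) signature of the Flink 1.x SinkFunction. The endpoint layout and JSON body are hypothetical, and java.net.HttpURLConnection has no native PATCH support, so this tunnels the PATCH through POST with the (convention-dependent) X-HTTP-Method-Override header - a real implementation might instead use an HTTP client library that supports PATCH directly, plus retries and batching:

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

public class CustomerCountSink
        extends RichSinkFunction<CustomerCountSink.CustomerCount> {

    // Hypothetical window output: number of items ordered per customer.
    public static class CustomerCount {
        public String customerId;
        public long itemsOrdered;
        public CustomerCount() {}
        public CustomerCount(String customerId, long itemsOrdered) {
            this.customerId = customerId;
            this.itemsOrdered = itemsOrdered;
        }
    }

    // Hypothetical, e.g. "https://api.example.com/customers/"
    private final String baseUrl;

    public CustomerCountSink(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    @Override
    public void invoke(CustomerCount value) throws Exception {
        URL url = new URL(baseUrl + value.customerId);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();

        // HttpURLConnection rejects setRequestMethod("PATCH"), so tunnel the
        // PATCH through POST; this assumes the API honours the override header.
        conn.setRequestMethod("POST");
        conn.setRequestProperty("X-HTTP-Method-Override", "PATCH");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(5000);
        conn.setDoOutput(true);

        // Hypothetical body: the increment to apply to 'item_order_count'.
        String body = "{\"item_order_count_increment\": " + value.itemsOrdered + "}";
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }

        int status = conn.getResponseCode();
        conn.disconnect();
        if (status < 200 || status >= 300) {
            // Throwing fails the task and triggers Flink's restart/recovery;
            // a replayed window would then re-send its increment (see below).
            throw new RuntimeException("PATCH to " + url + " failed: HTTP " + status);
        }
    }
}

Usage would be something like counts.addSink(new CustomerCountSink("https://api.example.com/customers/")). The main caveat behind the "is this a bad idea" question is delivery semantics: a sink like this is at-least-once, so a window replayed after a failure would re-send its increment and double-count. PATCHing an absolute value, or making the API idempotent (e.g. keying each increment by a window identifier), avoids that.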