Aah we have it here https://docs.google.com/document/d/16GMH5VM6JJiWj_N0W8y3PtQ1aoJFxsKoOTSYOfqlsRE/edit#heading=h.bgl260hr56g6
On Thu, Mar 8, 2018 at 11:45 AM, Vishal Santoshi <vishal.santo...@gmail.com> wrote: > This is very interesting. I would imagine that there will be high back > pressure on the LEFT source effectively throttling it but as is the current > state that is likely effect other pipelines as the free o/p buffer on the > source side and and i/p buffers on the consumer side start blocking and get > exhausted for all other pipes. I am very interested in how holding back the > busy source does not create a pathological issue where that source is > forever held back. Is there a FLIP for it ? > > On Thu, Mar 8, 2018 at 11:29 AM, Fabian Hueske <fhue...@gmail.com> wrote: > >> Hi Gytis, >> >> Flink does currently not support holding back individual streams, for >> example it is not possible to align streams on (offset) event-time. >> >> However, the Flink community is working on a windowed join for the >> DataStream API, that only holds the relevant tail of the stream as state. >> If your join condition is +/- 5 minutes then, the join would store he >> last five minutes of both streams as state. Here's an implementation of the >> operator [1] that is close to be merged and will be available in Flink >> 1.6.0. >> Flink's SQL support (and Table API) support this join type since version >> 1.4.0 [2]. >> >> Best, Fabian >> >> [1] https://github.com/apache/flink/pull/5342 >> [2] https://ci.apache.org/projects/flink/flink-docs-release-1.4/ >> dev/table/sql.html#joins >> >> 2018-03-08 1:02 GMT-08:00 Gytis Žilinskas <gytis.zilins...@gmail.com>: >> >>> Hi, >>> >>> we're considering flink for a couple of our projects. I'm doing a >>> trial implementation for one of them. So far, I like a lot of things, >>> however there are a couple of issues that I can't figure out how to >>> resolve. Not sure if it's me misunderstanding the tool, or flink just >>> doesn't have a capability to do it. >>> >>> We want to do an event time join on two big kafka streams. Both of >>> them might experience some issues on the other end and be delayed. >>> Additionally, while both are big, one (let's call it stream A) is >>> significantly larger than stream B. >>> >>> We also know, that the join window is around 5min. That is, given some >>> key K in stream B, if there is a counterpart in stream A, it's going >>> to be +/5 5min in event time. >>> >>> Since stream A is especially heavy and it's unfeasable to keep hours >>> of it in memory, I would imagine an ideal solution where we read both >>> streams from Kafka. We always make sure that stream B is ahead by >>> 10min, that is, if stream A is currently ahead in watermarks, we stall >>> it and consume stream B until it catches up. Once the stream are >>> alligned in event time (with the 10min delay window) we run them both >>> through join. >>> >>> The problem is, that I find a mechanism to implement that in flink. If >>> I try to do a CoProcessFunction then it just consumes both streams at >>> the same time, ingests a lot of messages from stream A, runs out of >>> memory and dies. >>> >>> Any ideas on how this could be solved? >>> >>> (here's a thread with a very similar problem from some time ago >>> http://apache-flink-user-mailing-list-archive.2336050.n4.nab >>> ble.com/synchronizing-two-streams-td6830.html) >>> >>> Regards, >>> Gytis >>> >> >> >