I understand your scenario, but I disagree with its assumptions: "However, the partition of A is empty and thus A is temporarily idle." - you're assuming that the source marks itself idle when no data is available, but that is clearly source-specific and not behavior we expect from the Pulsar source. A partition may be empty indefinitely while still being active. Imagine that the producer is defending a lease - "I'm here, there's no data, please don't advance the clock".
"we bind idleness to wall clock time" - you're characterizing a specific strategy (WatermarkStrategy.withIdleness()), not the inherent behavior of the pipeline. I wouldn't recommend using withIdleness() with source-based watermarks. I do agree that dynamism in partition assignment can wreak havoc on watermark correctness. We have some ideas on the Pulsar side about that too. I would ask that we focus on the Flink framework and pipeline behavior. By offering a more powerful framework, we encourage stream storage systems to "rise to the occasion" - treat event time in a first-class way, optimize for correctness, etc. In this case, FLIP-167 is setting the stage for evolution in Pulsar. Thanks again, Arvid, for the great discussion. On Fri, Jun 4, 2021 at 2:06 PM Arvid Heise <ar...@apache.org> wrote: > At least one big motivation is having (temporary) empty partitions. Let me > give you an example of why, imho, idleness is only approximate in this case: > Assume you have source subtasks A, B, and C that correspond to 3 source > partitions and a downstream keyed window operator W. > > W would usually trigger on min_watermark(A, B, C). However, the partition > of A is empty and thus A is temporarily idle. So W triggers on > min_watermark(B, C). When A is now active again, the watermark implicitly > is min_watermark(B, C) for A! > > Let's further assume that the source is filled by another pipeline before. > This pipeline experiences technical difficulties for X minutes and could > not produce into the partition of A, hence the idleness. When the upstream > pipeline resumes, it fills A with some records that are before > min_watermark(B, C). Any watermark generated from these records is > discarded as the watermark is monotonic. Therefore, these records will be > considered late by W and discarded. > > Without idleness, we would have simply blocked W until the upstream pipeline > fully recovers and we would not have had any late records. The same holds > for any reprocessing where the data of partition A is continuous. > > If you look deeper, the issue is that we bind idleness to wall clock time > (e.g. advance watermark after X seconds without data). Then we assume the > watermark of the idle partition to be in sync with the slowest partition. > However, in the case of hiccups, this assumption does not hold at all. > I don't see any fix for that (easy or not easy) and imho it's inherent to > the design of idleness. > We lack information (why is no data coming) and have a heuristic to fix it. > > In the case of partition assignment where one subtask has no partition, we > are probably somewhat safe. We know why no data is coming (no partition) > and as long as we do not have dynamic partition assignment, there will > never be a switch to active without restart (for the foreseeable future). > > On Fri, Jun 4, 2021 at 10:34 PM Eron Wright <ewri...@streamnative.io > .invalid> > wrote: > > > Yes, I'm talking about an implementation of idleness that is unrelated to > > processing time. The clear example is partition assignment to subtasks, > > which probably motivated Flink's idleness functionality in the first > place. > > > > On Fri, Jun 4, 2021 at 12:53 PM Arvid Heise <ar...@apache.org> wrote: > > > > > Hi Eron, > > > > > > Are you referring to an implementation of idleness that does not rely > on > > a > > > wall clock but on some clock baked into the partition information of > the > > > source system? > > > If so, you are right that it invalidates my points. > > > Do you have an example of where this is used? 
> > > > > > With a wall clock, you always run into the issues that I describe since > > you > > > are effectively mixing event time and processing time... > > > > > > > > > On Fri, Jun 4, 2021 at 6:28 PM Eron Wright <ewri...@streamnative.io > > > .invalid> > > > wrote: > > > > > > > Dawid, I think you're mischaracterizing the idleness signal as > > > inherently a > > > > heuristic, but Flink does not impose that. A source-based watermark > > (and > > > > corresponding idleness signal) may well be entirely data-driven, > > entirely > > > > deterministic. Basically you're underselling what the pipeline is > > > capable > > > > of, based on painful experiences with using the generic, > > heuristics-based > > > > watermark assigner. Please don't let those experiences overshadow > > what's > > > > possible with source-based watermarking. > > > > > > > > The idleness signal does have a strict definition, it indicates > whether > > > the > > > > stream is actively participating in advancing the event time clock. > > The > > > > status of all participants is considered when aggregating watermarks. > > A > > > > source subtask generally makes the determination based on data, e.g. > > > > whether a topic is assigned to that subtask. > > > > > > > > We have here a modest proposal to add callbacks to the sink function > > for > > > > information that the sink operator already receives. The practical > > > result > > > > is improved correctness when used with streaming systems that have > > > > first-class support for event time. The specific changes may be > > > previewed > > > > here: > > > > https://github.com/apache/flink/pull/15950 > > > > https://github.com/streamnative/flink/pull/2 > > > > > > > > Thank you all for the robust discussion. Do I have your support to > > > proceed > > > > to enhance FLIP-167 with idleness callbacks and to proceed to a vote? > > > > > > > > Eron > > > > > > > > > > > > On Fri, Jun 4, 2021 at 9:08 AM Arvid Heise <ar...@apache.org> wrote: > > > > > > > > > While everything I wrote before is still valid, upon further > > > rethinking, > > > > I > > > > > think that the conclusion is not necessarily correct: > > > > > - If the user wants to have pipeline A and B behaving as if A+B was > > > > jointly > > > > > executed in the same pipeline without the intermediate Pulsar > topic, > > > > having > > > > > the idleness in that topic is to only way to guarantee consistency. > > > > > - We could support the following in the respective sources: If the > > user > > > > > that wants to use a different definition of idleness in B, they can > > > just > > > > > provide a new idleness definition. At that point, we should discard > > the > > > > > idleness in the intermediate topic while reading. > > > > > > > > > > If we would agree on the latter way, I think having the idleness in > > the > > > > > topic is of great use because it's a piece of information that > cannot > > > be > > > > > inferred as stated by others. Consequently, we would be able to > > support > > > > all > > > > > use cases and can give the user the freedom to express his intent. > > > > > > > > > > > > > > > On Fri, Jun 4, 2021 at 3:43 PM Arvid Heise <ar...@apache.org> > wrote: > > > > > > > > > > > I think the core issue in this discussion is that we kind of > assume > > > > that > > > > > > idleness is something universally well-defined. But it's not. > It's > > a > > > > > > heuristic to advance data processing in event time where we would > > > lack > > > > > data > > > > > > to do so otherwise. 
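[Editor's note: the wall-clock heuristic discussed in this exchange is the one configured through WatermarkStrategy.withIdleness(). A minimal sketch follows; MyEvent, getEventTime(), and the timeout values are illustrative placeholders, not anything taken from the thread.]

    // Assumes org.apache.flink.api.common.eventtime.WatermarkStrategy and
    // java.time.Duration are imported; MyEvent/getEventTime() are placeholders.
    WatermarkStrategy<MyEvent> strategy =
        WatermarkStrategy
            .<MyEvent>forBoundedOutOfOrderness(Duration.ofSeconds(20))
            .withTimestampAssigner((event, recordTimestamp) -> event.getEventTime())
            // after 1 minute without records the partition is flagged idle and
            // stops holding back the combined watermark of the other partitions
            .withIdleness(Duration.ofMinutes(1));

Once the partition becomes active again it rejoins the minimum, which is exactly the point at which the late-record problem described in Arvid's A/B/C example can appear.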
> > > > > > Keep in mind that idleness has no real definition in terms of > event > > > > time > > > > > > and leads to severe unexpected results: If you reprocess a data > > > stream > > > > > with > > > > > > temporarily idle partitions, these partitions would not be deemed > > > idle > > > > on > > > > > > reprocessing and there is a realistic chance that records that > were > > > > > deemed > > > > > > late in the live processing case are now perfectly fine records > in > > > the > > > > > > reprocessing case. (I can expand on that if that was too short) > > > > > > > > > > > > With that in mind, why would a downstream process even try to > > > calculate > > > > > > the same idleness state as the upstream process? I don't see a > > point; > > > > we > > > > > > would just further any imprecision in the calculation. > > > > > > > > > > > > Let's have a concrete example. Assume that we have upstream > > pipeline > > > A > > > > > and > > > > > > downstream pipeline B. A has plenty of resources and is live > > > processing > > > > > > data. Some partitions are idle and that is propagated to the > sinks. > > > > Now B > > > > > > is heavily backpressured and consumes very slowly. B doesn't see > > any > > > > > > idleness directly. B can calculate exact watermarks and use all > > > records > > > > > for > > > > > > it's calculation. Reprocessing would yield the same result for B. > > If > > > we > > > > > now > > > > > > forward idleness, we can easily find cases where we would advance > > the > > > > > > watermark prematurely while there is data directly available to > > > > calculate > > > > > > the exact watermark. > > > > > > > > > > > > For me, idleness is just a pipeline-specific heuristic and should > > be > > > > > > viewed as such. > > > > > > > > > > > > Best, > > > > > > > > > > > > Arvid > > > > > > > > > > > > On Fri, Jun 4, 2021 at 2:01 PM Piotr Nowojski < > > pnowoj...@apache.org> > > > > > > wrote: > > > > > > > > > > > >> Hi, > > > > > >> > > > > > >> > Imagine you're starting consuming from the result channel in a > > > > > situation > > > > > >> were you have: > > > > > >> > record4, record3, StreamStatus.ACTIVE, StreamStatus.IDLE > > record2, > > > > > >> record1, record0 > > > > > >> > Switching to the encoded StreamStatus.IDLE is unnecessary, and > > > might > > > > > >> cause the record3 and record4 to be late depending on how the > > > > watermark > > > > > >> progressed in other partitions. > > > > > >> > > > > > >> Yes, I understand this point. But it can also be the other way > > > around. > > > > > >> There might be a large gap between record2 and record3, and > users > > > > might > > > > > >> prefer or might be not able to duplicate idleness detection > logic. > > > The > > > > > >> downstream system might be lacking some kind of information > (that > > is > > > > > only > > > > > >> available in the top level/ingesting system) to correctly set > the > > > idle > > > > > >> status. > > > > > >> > > > > > >> Piotrek > > > > > >> > > > > > >> pt., 4 cze 2021 o 12:30 Dawid Wysakowicz < > dwysakow...@apache.org> > > > > > >> napisał(a): > > > > > >> > > > > > >> > > > > > > >> > Same as Eron I don't follow this point. Any streaming sink can > > be > > > > used > > > > > >> as > > > > > >> > this kind of transient channel. Streaming sinks, like Kafka, > are > > > > also > > > > > >> used > > > > > >> > to connect one streaming system with another one, also for an > > > > > immediate > > > > > >> > consumption. 
> > > > > >> > > > > > > >> > Sure it can, but imo it is rarely the primary use case why you > > > want > > > > to > > > > > >> > offload the channels to an external persistent system. Again > in > > my > > > > > >> > understanding StreamStatus is something transient, e.g. part > of > > > our > > > > > >> > external system went offline. I think those kind of events > > should > > > > not > > > > > be > > > > > >> > persisted. > > > > > >> > > > > > > >> > Both watermarks and idleness status can be some > > > > > >> > inherent property of the underlying data stream. if an > > > > > >> upstream/ingesting > > > > > >> > system knows that this particular stream/partition of a stream > > is > > > > > going > > > > > >> > idle (for example for a couple of hours), why does this > > > information > > > > > >> have to > > > > > >> > be re-created in the downstream system using some heuristic? > It > > > > could > > > > > be > > > > > >> > explicitly encoded. > > > > > >> > > > > > > >> > Because it's most certainly not true in the downstream. The > > > idleness > > > > > >> works > > > > > >> > usually according to a heuristic: "We have not seen records > for > > 5 > > > > > >> minutes, > > > > > >> > so there is a fair chance we won't see records for the next 5 > > > > minutes, > > > > > >> so > > > > > >> > let's not wait for watermarks for now." That heuristic most > > > > certainly > > > > > >> won't > > > > > >> > hold for a downstream persistent storage. > > > > > >> > > > > > > >> > Imagine you're starting consuming from the result channel in a > > > > > situation > > > > > >> > were you have: > > > > > >> > > > > > > >> > record4, record3, StreamStatus.ACTIVE, StreamStatus.IDLE > > record2, > > > > > >> record1, > > > > > >> > record0 > > > > > >> > > > > > > >> > Switching to the encoded StreamStatus.IDLE is unnecessary, and > > > might > > > > > >> cause > > > > > >> > the record3 and record4 to be late depending on how the > > watermark > > > > > >> > progressed in other partitions. > > > > > >> > > > > > > >> > I understand Eron's use case, which is not about storing the > > > > > >> StreamStatus, > > > > > >> > but performing an immediate aggregation or said differently > > > changing > > > > > the > > > > > >> > partitioning/granularity of records and watermarks externally > to > > > > > Flink. > > > > > >> The > > > > > >> > produced by Flink partitioning is actually never persisted in > > that > > > > > >> case. In > > > > > >> > this case I agree exposing the StreamStatus makes sense. I am > > > still > > > > > >> > concerned it will lead to storing the StreamStatus which can > > lead > > > to > > > > > >> many > > > > > >> > subtle problems. > > > > > >> > On 04/06/2021 11:53, Piotr Nowojski wrote: > > > > > >> > > > > > > >> > Hi, > > > > > >> > > > > > > >> > Thanks for picking up this discussion. For the record, I also > > > think > > > > we > > > > > >> > shouldn't expose latency markers. > > > > > >> > > > > > > >> > About the stream status > > > > > >> > > > > > > >> > > > > > > >> > Persisting the StreamStatus > > > > > >> > > > > > > >> > I don't agree with the view that sinks are "storing" the > > > > data/idleness > > > > > >> > status. This nomenclature makes only sense if we are talking > > about > > > > > >> > streaming jobs producing batch data. 
> > > > > >> > > > > > > >> > > > > > > >> > In my understanding a StreamStatus makes sense only when > talking > > > > about > > > > > >> > immediately consumed transient channels such as between > > operators > > > > > within > > > > > >> > a single job. > > > > > >> > > > > > > >> > Same as Eron I don't follow this point. Any streaming sink can > > be > > > > used > > > > > >> as > > > > > >> > this kind of transient channel. Streaming sinks, like Kafka, > are > > > > also > > > > > >> used > > > > > >> > to connect one streaming system with another one, also for an > > > > > immediate > > > > > >> > consumption. > > > > > >> > > > > > > >> > You could say the same thing about watermarks (note they are > > > usually > > > > > >> > generated in Flink based on the incoming events) and I would > not > > > > agree > > > > > >> with > > > > > >> > it in the same way. Both watermarks and idleness status can be > > > some > > > > > >> > inherent property of the underlying data stream. if an > > > > > >> upstream/ingesting > > > > > >> > system knows that this particular stream/partition of a stream > > is > > > > > going > > > > > >> > idle (for example for a couple of hours), why does this > > > information > > > > > >> have to > > > > > >> > be re-created in the downstream system using some heuristic? > It > > > > could > > > > > be > > > > > >> > explicitly encoded. If you want to pass watermarks explicitly > > to > > > a > > > > > next > > > > > >> > downstream streaming system, because you do not want to > recreate > > > > them > > > > > >> from > > > > > >> > the events using a duplicated logic, why wouldn't you like to > do > > > the > > > > > >> same > > > > > >> > thing with the idleness? > > > > > >> > > > > > > >> > Also keep in mind that I would expect that a user can decide > > > whether > > > > > he > > > > > >> > wants to persist the watermarks/stream status on his own. This > > > > > >> shouldn't be > > > > > >> > obligatory. > > > > > >> > > > > > > >> > For me there is one good reason to not expose stream status > YET. > > > > That > > > > > >> is, > > > > > >> > if we are sure that we do not need this just yet, while at the > > > same > > > > > >> time we > > > > > >> > don't want to expand the Public/PublicEvolving API, as this > > always > > > > > >> > increases the maintenance cost. > > > > > >> > > > > > > >> > Best, > > > > > >> > Piotrek > > > > > >> > > > > > > >> > > > > > > >> > pt., 4 cze 2021 o 10:57 Eron Wright <ewri...@streamnative.io > > > > .invalid> > > > > > < > > > > > >> ewri...@streamnative.io.invalid> > > > > > >> > napisał(a): > > > > > >> > > > > > > >> > > > > > > >> > I believe that the correctness of watermarks and stream status > > > > markers > > > > > >> is > > > > > >> > determined entirely by the source (ignoring the generic > > assigner). > > > > > Such > > > > > >> > stream elements are known not to overtake records, and aren't > > > > > transient > > > > > >> > from a pipeline perspective. I do agree that recoveries may > be > > > > lossy > > > > > if > > > > > >> > some operator state is transient (e.g. valve state). > > > > > >> > > > > > > >> > Consider that status markers already affect the flow of > > watermarks > > > > > (e.g. > > > > > >> > suppression), and thus affect operator behavior. Seems to me > > that > > > > > >> exposing > > > > > >> > the idleness state is no different than exposing a watermark. 
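[Editor's note: the suppression mentioned here can be summarized in a few lines. The following is a simplified sketch in the spirit of Flink's internal StatusWatermarkValve, not its actual code: idle inputs are excluded from the minimum, so they neither hold back nor advance the combined watermark.]

    // Simplified sketch of combining per-input watermarks while honoring idleness.
    final class CombinedWatermark {
        private final long[] watermarks;   // highest watermark seen per input
        private final boolean[] idle;      // idleness flag per input

        CombinedWatermark(int numInputs) {
            watermarks = new long[numInputs];
            idle = new boolean[numInputs];
            java.util.Arrays.fill(watermarks, Long.MIN_VALUE);
        }

        void onWatermark(int input, long watermark) {
            watermarks[input] = Math.max(watermarks[input], watermark);
            idle[input] = false;           // a watermark re-activates the input
        }

        void markIdle(int input) {
            idle[input] = true;            // stop waiting for this input
        }

        long current() {
            long min = Long.MAX_VALUE;
            boolean anyActive = false;
            for (int i = 0; i < watermarks.length; i++) {
                if (!idle[i]) {
                    anyActive = true;
                    min = Math.min(min, watermarks[i]);
                }
            }
            // with no active inputs there is nothing to wait for; a caller would
            // typically hold on to the previously emitted combined watermark
            return anyActive ? min : Long.MIN_VALUE;
        }
    }

In Arvid's A/B/C example above, marking A idle removes it from this minimum, which is how W ends up triggering on min_watermark(B, C) and why A's late-arriving records then fall behind the clock.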
> > > > > >> > > > > > > >> > The high-level story is, there is a need for the Flink job to > be > > > > > >> > transparent or neutral with respect to the event time clock. > I > > > > > believe > > > > > >> > this is possible if time flows with high fidelity from source > to > > > > sink. > > > > > >> Of > > > > > >> > course, one always has the choice as to whether to use > > > source-based > > > > > >> > watermarks; as you mentioned, requirements vary. > > > > > >> > > > > > > >> > Regarding the Pulsar specifics, we're working on a community > > > > proposal > > > > > >> that > > > > > >> > I'm anxious to share. To answer your question, the broker > > > > aggregates > > > > > >> > watermarks from multiple producers who are writing to a single > > > > topic. > > > > > >> > Each sink > > > > > >> > subtask is a producer. The broker considers each producer's > > > > > assertions > > > > > >> > (watermarks, idleness) to be independent inputs, much like the > > > case > > > > > with > > > > > >> > the watermark valve. > > > > > >> > > > > > > >> > On your concern about idleness causing false late events, I > > > > understand > > > > > >> your > > > > > >> > point but don't think it applies if the keyspace assignments > are > > > > > stable. > > > > > >> > > > > > > >> > I hope this explains to your satisfaction. > > > > > >> > > > > > > >> > - Eron > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > On Fri, Jun 4, 2021, 12:07 AM Dawid Wysakowicz < > > > > > dwysakow...@apache.org> > > > > > >> <dwysakow...@apache.org> > > > > > >> > wrote: > > > > > >> > > > > > > >> > > > > > > >> > Hi Eron, > > > > > >> > > > > > > >> > I might be missing some background on Pulsar partitioning but > > > > > something > > > > > >> > seems off to me. What is the chunk/batch/partition that Pulsar > > > > brokers > > > > > >> > will additionally combine watermarks for? Isn't it the case > that > > > > only > > > > > a > > > > > >> > single Flink sub-task would write to such a chunk and thus > will > > > > > produce > > > > > >> > an aggregated watermark already via the writeWatermark method? > > > > > >> > > > > > > >> > Personally I am really skeptical about exposing the > StreamStatus > > > in > > > > > any > > > > > >> > Producer API. In my understanding the StreamStatus is a > > transient > > > > > >> > setting of a consumer of data. StreamStatus is a mechanism for > > > > making > > > > > a > > > > > >> > tradeoff between correctness (how many late elements that are > > > behind > > > > > >> > watermark we have) vs making progress. IMO one has to be extra > > > > > cautious > > > > > >> > when it comes to persistent systems. Again I might be missing > > the > > > > > exact > > > > > >> > use case you are trying to solve here, but I can imagine > > multiple > > > > jobs > > > > > >> > reading from such a stream which might have different > > correctness > > > > > >> > requirements. Just quickly throwing an idea out of my head you > > > might > > > > > >> > want to have an entirely correct results which can be delayed > > for > > > > > >> > minutes, and a separate task that produces quick insights > within > > > > > >> > seconds. Another thing to consider is that by the time the > > > > downstream > > > > > >> > job starts consuming the upstream one might have produced > > records > > > to > > > > > the > > > > > >> > previously idle chunk. Persisting the StreamStatus in such a > > > > scenario > > > > > >> > would add unnecessary false late events. 
> > > > > >> > > > > > > >> > In my understanding a StreamStatus makes sense only when > talking > > > > about > > > > > >> > immediately consumed transient channels such as between > > operators > > > > > within > > > > > >> > a single job. > > > > > >> > > > > > > >> > Best, > > > > > >> > > > > > > >> > Dawid > > > > > >> > > > > > > >> > On 03/06/2021 23:31, Eron Wright wrote: > > > > > >> > > > > > > >> > I think the rationale for end-to-end idleness (i.e. between > > > > pipelines) > > > > > >> > > > > > > >> > is > > > > > >> > > > > > > >> > the same as the rationale for idleness between operators > within > > a > > > > > >> > pipeline. On the 'main issue' you mentioned, we entrust the > > > source > > > > > >> > > > > > > >> > with > > > > > >> > > > > > > >> > adapting to Flink's notion of idleness (e.g. in Pulsar source, > > it > > > > > means > > > > > >> > that no topics/partitions are assigned to a given sub-task); a > > > > similar > > > > > >> > adaption would occur in the sink. In other words, I think it > > > > > >> > > > > > > >> > reasonable > > > > > >> > > > > > > >> > that a sink for a watermark-aware storage system has need for > > the > > > > > >> > > > > > > >> > idleness > > > > > >> > > > > > > >> > signal. > > > > > >> > > > > > > >> > Let me explain how I would use it in Pulsar's sink. Each > > sub-task > > > > is > > > > > a > > > > > >> > Pulsar producer, and is writing watermarks to a configured > topic > > > via > > > > > >> > > > > > > >> > the > > > > > >> > > > > > > >> > Producer API. The Pulsar broker aggregates the watermarks > that > > > are > > > > > >> > > > > > > >> > written > > > > > >> > > > > > > >> > by each producer into a global minimum (similar to > > > > > >> > > > > > > >> > StatusWatermarkValve). > > > > > >> > > > > > > >> > The broker keeps track of which producers are actively > producing > > > > > >> > watermarks, and a producer may mark itself as idle to tell the > > > > broker > > > > > >> > > > > > > >> > not > > > > > >> > > > > > > >> > to wait for watermarks from it, e.g. when a producer is going > > > > > >> > > > > > > >> > offline. I > > > > > >> > > > > > > >> > had intended to mark the producer as idle when the sub-task is > > > > > closing, > > > > > >> > > > > > > >> > but > > > > > >> > > > > > > >> > now I see that it would be insufficient; the producer should > > also > > > be > > > > > >> > > > > > > >> > idled > > > > > >> > > > > > > >> > if the sub-task is idled. Otherwise, the broker would wait > > > > > >> > > > > > > >> > indefinitely > > > > > >> > > > > > > >> > for the idled sub-task to produce a watermark. > > > > > >> > > > > > > >> > Arvid, I think your original instincts were correct about > > idleness > > > > > >> > propagation, and I hope I've demonstrated a practical use > case. > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > On Thu, Jun 3, 2021 at 12:49 PM Arvid Heise <ar...@apache.org > > > > < > > > > > >> ar...@apache.org> wrote: > > > > > >> > > > > > > >> > > > > > > >> > When I was rethinking the idleness issue, I came to the > > conclusion > > > > > >> > > > > > > >> > that > > > > > >> > > > > > > >> > it > > > > > >> > > > > > > >> > should be inferred at the source of the respective downstream > > > > pipeline > > > > > >> > again. 
> > > > > >> > > > > > > >> > The main issue on propagating idleness is that you would force > > the > > > > > >> > > > > > > >> > same > > > > > >> > > > > > > >> > definition across all downstream pipelines, which may not be > > what > > > > the > > > > > >> > > > > > > >> > user > > > > > >> > > > > > > >> > intended. > > > > > >> > On the other hand, I don't immediately see a technical reason > > why > > > > the > > > > > >> > downstream source wouldn't be able to infer that. > > > > > >> > > > > > > >> > > > > > > >> > On Thu, Jun 3, 2021 at 9:14 PM Eron Wright < > > > ewri...@streamnative.io > > > > > >> > .invalid> <ewri...@streamnative.io.invalid> > > > > > >> > wrote: > > > > > >> > > > > > > >> > > > > > > >> > Thanks Piotr for bringing this up. I reflected on this and I > > > agree > > > > > >> > > > > > > >> > we > > > > > >> > > > > > > >> > should expose idleness, otherwise a multi-stage flow could > > stall. > > > > > >> > > > > > > >> > Regarding the latency markers, I don't see an immediate need > for > > > > > >> > propagating them, because they serve to estimate latency > within > > a > > > > > >> > > > > > > >> > pipeline, > > > > > >> > > > > > > >> > not across pipelines. One would probably need to enhance the > > > source > > > > > >> > interface also to do e2e latency. Seems we agree this aspect > is > > > out > > > > > >> > > > > > > >> > of > > > > > >> > > > > > > >> > scope. > > > > > >> > > > > > > >> > I took a look at the code to get a sense of how to accomplish > > > this. > > > > > >> > > > > > > >> > The > > > > > >> > > > > > > >> > gist is a new `markIdle` method on the `StreamOperator` > > interface, > > > > > >> > > > > > > >> > that > > > > > >> > > > > > > >> > is > > > > > >> > > > > > > >> > called when the stream status maintainer (the `OperatorChain`) > > > > > >> > > > > > > >> > transitions > > > > > >> > > > > > > >> > to idle state. Then, a new `markIdle` method on the > > > `SinkFunction` > > > > > >> > > > > > > >> > and > > > > > >> > > > > > > >> > `SinkWriter` that is called by the respective operators. > Note > > > that > > > > > >> > StreamStatus is an internal class. > > > > > >> > > > > > > >> > Here's a draft PR (based on the existing PR of FLINK-22700) to > > > > > >> > > > > > > >> > highlight > > > > > >> > > > > > > >> > this new aspect: > > > https://github.com/streamnative/flink/pull/2/files > > > > > >> > > > > > > >> > Please let me know if you'd like me to proceed to update the > > FLIP > > > > > >> > > > > > > >> > with > > > > > >> > > > > > > >> > these details. > > > > > >> > > > > > > >> > Thanks again, > > > > > >> > Eron > > > > > >> > > > > > > >> > On Thu, Jun 3, 2021 at 7:56 AM Piotr Nowojski < > > > pnowoj...@apache.org > > > > > > > > > > < > > > > > >> pnowoj...@apache.org> > > > > > >> > wrote: > > > > > >> > > > > > > >> > > > > > > >> > Hi, > > > > > >> > > > > > > >> > Sorry for chipping in late in the discussion, but I would > second > > > > > >> > > > > > > >> > this > > > > > >> > > > > > > >> > point > > > > > >> > > > > > > >> > from Arvid: > > > > > >> > > > > > > >> > > > > > > >> > 4. Potentially, StreamStatus and LatencyMarker would also need > > to > > > > > >> > > > > > > >> > be > > > > > >> > > > > > > >> > encoded. > > > > > >> > > > > > > >> > It seems like this point was asked, but not followed? Or did I > > > miss > > > > > >> > > > > > > >> > it? > > > > > >> > > > > > > >> > Especially the StreamStatus part. 
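[Editor's note: gathering the proposal described a few messages up into one place. This is a simplified excerpt of the additions as described in this thread and its draft PRs, with no-op defaults; the linked PRs are authoritative and the final FLIP-167 API may differ. Imports and the existing methods are elided.]

    public interface SinkFunction<IN> extends Function, Serializable {
        // ... existing invoke(...) methods elided ...

        /** Called for every watermark the sink operator receives. */
        default void writeWatermark(Watermark watermark) throws Exception {}

        /** Called when the surrounding operator transitions to idle. */
        default void markIdle() {}
    }

    public interface SinkWriter<InputT, CommT, WriterStateT> extends AutoCloseable {
        // ... existing write(...) / prepareCommit(...) methods elided ...

        default void writeWatermark(Watermark watermark)
                throws IOException, InterruptedException {}

        default void markIdle() {}
    }

The japicmp concern raised later in the thread is why the defaults matter: with no-op bodies, existing SinkFunction and SinkWriter implementations keep compiling and behaving exactly as before.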
For me it sounds like > exposing > > > > > >> > > > > > > >> > watermarks > > > > > >> > > > > > > >> > without letting the sink know that the stream can be idle is > an > > > > > >> > > > > > > >> > incomplete > > > > > >> > > > > > > >> > feature and can be very problematic/confusing for potential > > users. > > > > > >> > > > > > > >> > Best, > > > > > >> > Piotrek > > > > > >> > > > > > > >> > pon., 31 maj 2021 o 08:34 Arvid Heise <ar...@apache.org> < > > > > > >> ar...@apache.org> > > > > > >> > > > > > > >> > napisał(a): > > > > > >> > > > > > > >> > Afaik everyone can start a [VOTE] thread [1]. For example, > here > > a > > > > > >> > non-committer started a successful thread [2]. > > > > > >> > If you start it, I can already cast a binding vote and we just > > > > > >> > > > > > > >> > need 2 > > > > > >> > > > > > > >> > more > > > > > >> > > > > > > >> > for the FLIP to be accepted. > > > > > >> > > > > > > >> > [1] > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=120731026#FlinkBylaws-Voting > > > > > >> > > > > > > >> > [2] > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > > > > > > > > > > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-Deprecating-Mesos-support-td50142.html > > > > > >> > > > > > > >> > On Fri, May 28, 2021 at 8:17 PM Eron Wright < > > > > > >> > > > > > > >> > ewri...@streamnative.io > > > > > >> > > > > > > >> > .invalid> > > > > > >> > wrote: > > > > > >> > > > > > > >> > > > > > > >> > Arvid, > > > > > >> > Thanks for the feedback. I investigated the japicmp > > > > > >> > > > > > > >> > configuration, > > > > > >> > > > > > > >> > and I > > > > > >> > > > > > > >> > see that SinkWriter is marked Experimental (not Public or > > > > > >> > > > > > > >> > PublicEvolving). > > > > > >> > > > > > > >> > I think this means that SinkWriter need not be excluded. As > you > > > > > >> > > > > > > >> > mentioned, > > > > > >> > > > > > > >> > SinkFunction is already excluded. I've updated the FLIP with > an > > > > > >> > explanation. > > > > > >> > > > > > > >> > I believe all issues are resolved. May we proceed to a vote > > now? > > > > > >> > > > > > > >> > And > > > > > >> > > > > > > >> > are > > > > > >> > > > > > > >> > you able to drive the vote process? > > > > > >> > > > > > > >> > Thanks, > > > > > >> > Eron > > > > > >> > > > > > > >> > > > > > > >> > On Fri, May 28, 2021 at 4:40 AM Arvid Heise <ar...@apache.org > > > > < > > > > > >> ar...@apache.org> > > > > > >> > > > > > > >> > wrote: > > > > > >> > > > > > > >> > Hi Eron, > > > > > >> > > > > > > >> > 1. fair point. It still feels odd to have writeWatermark in > the > > > > > >> > SinkFunction (it's supposed to be functional as you > mentioned), > > > > > >> > > > > > > >> > but I > > > > > >> > > > > > > >> > agree > > > > > >> > > > > > > >> > that invokeWatermark is not better. So unless someone has a > > > > > >> > > > > > > >> > better > > > > > >> > > > > > > >> > idea, > > > > > >> > > > > > > >> > I'm fine with it. > > > > > >> > 2.+3. I tried to come up with scenarios for a longer time. 
In > > > > > >> > > > > > > >> > general, > > > > > >> > > > > > > >> > it > > > > > >> > > > > > > >> > seems as if the new SinkWriter interface encourages more > > > > > >> > > > > > > >> > injection > > > > > >> > > > > > > >> > (see > > > > > >> > > > > > > >> > processing time service in InitContext), such that the need > for > > > > > >> > > > > > > >> > the > > > > > >> > > > > > > >> > context > > > > > >> > > > > > > >> > is really just context information of that particular record > and > > > > > >> > > > > > > >> > I > > > > > >> > > > > > > >> > don't > > > > > >> > > > > > > >> > see any use beyond timestamp and watermark. For SinkFunction, > > I'd > > > > > >> > > > > > > >> > not > > > > > >> > > > > > > >> > over-engineer as it's going to be deprecated soonish. So +1 to > > > > > >> > > > > > > >> > leave > > > > > >> > > > > > > >> > it > > > > > >> > > > > > > >> > out. > > > > > >> > 4. Okay so I double-checked: from an execution perspective, it > > > > > >> > > > > > > >> > works. > > > > > >> > > > > > > >> > However, japicmp would definitely complain. I propose to add > it > > > > > >> > > > > > > >> > to > > > > > >> > > > > > > >> > the > > > > > >> > > > > > > >> > compatibility section like this. We need to add an exception > to > > > > > >> > > > > > > >> > SinkWriter > > > > > >> > > > > > > >> > then. (SinkFunction is already on the exception list) > > > > > >> > 5.+6. Awesome, I was also sure but wanted to double check. > > > > > >> > > > > > > >> > Best, > > > > > >> > > > > > > >> > Arvid > > > > > >> > > > > > > >> > > > > > > >> > On Wed, May 26, 2021 at 7:29 PM Eron Wright < > > > > > >> > > > > > > >> > ewri...@streamnative.io > > > > > >> > > > > > > >> > .invalid> > > > > > >> > wrote: > > > > > >> > > > > > > >> > > > > > > >> > Arvid, > > > > > >> > > > > > > >> > 1. I assume that the method name `invoke` stems from > > > > > >> > > > > > > >> > considering > > > > > >> > > > > > > >> > the > > > > > >> > > > > > > >> > SinkFunction to be a functional interface, but is otherwise > > > > > >> > > > > > > >> > meaningless. > > > > > >> > > > > > > >> > Keeping it as `writeWatermark` does keep it symmetric with > > > > > >> > > > > > > >> > SinkWriter. > > > > > >> > > > > > > >> > My > > > > > >> > > > > > > >> > vote is to leave it. You decide. > > > > > >> > > > > > > >> > 2+3. I too considered adding a `WatermarkContext`, but it > would > > > > > >> > > > > > > >> > merely > > > > > >> > > > > > > >> > be a > > > > > >> > > > > > > >> > placeholder. I don't anticipate any context info in future. > > > > > >> > > > > > > >> > As > > > > > >> > > > > > > >> > we > > > > > >> > > > > > > >> > see > > > > > >> > > > > > > >> > with invoke, it is possible to add a context later in a > > > > > >> > backwards-compatible way. My vote is to not introduce a > > > > > >> > > > > > > >> > context. > > > > > >> > > > > > > >> > You > > > > > >> > > > > > > >> > decide. > > > > > >> > > > > > > >> > 4. No anticipated compatibility issues. > > > > > >> > > > > > > >> > 5. Short answer, it works as expected. The new methods are > > > > > >> > > > > > > >> > invoked > > > > > >> > > > > > > >> > whenever the underlying operator receives a watermark. I do > > > > > >> > > > > > > >> > believe > > > > > >> > > > > > > >> > that > > > > > >> > > > > > > >> > batch and ingestion time applications receive watermarks. 
> Seems > > > > > >> > > > > > > >> > the > > > > > >> > > > > > > >> > programming model is more unified in that respect since 1.12 > > > > > >> > > > > > > >> > (FLIP-134). > > > > > >> > > > > > > >> > 6. The failure behavior is the same as for elements. > > > > > >> > > > > > > >> > Thanks, > > > > > >> > Eron > > > > > >> > > > > > > >> > On Tue, May 25, 2021 at 12:42 PM Arvid Heise < > ar...@apache.org > > > > > >> > > > > > > >> > wrote: > > > > > >> > > > > > > >> > Hi Eron, > > > > > >> > > > > > > >> > I think the FLIP is crisp and mostly good to go. Some smaller > > > > > >> > things/questions: > > > > > >> > > > > > > >> > 1. SinkFunction#writeWatermark could be named > > > > > >> > SinkFunction#invokeWatermark or invokeOnWatermark to keep > > > > > >> > > > > > > >> > it > > > > > >> > > > > > > >> > symmetric. > > > > > >> > > > > > > >> > 2. We could add the context parameter to both. For > > > > > >> > > > > > > >> > SinkWriter#Context, > > > > > >> > > > > > > >> > we currently do not gain much. SinkFunction#Context also > > > > > >> > > > > > > >> > exposes > > > > > >> > > > > > > >> > processing > > > > > >> > time, which may or may not be handy and is currently > > > > > >> > > > > > > >> > mostly > > > > > >> > > > > > > >> > used > > > > > >> > > > > > > >> > for > > > > > >> > > > > > > >> > StreamingFileSink bucket policies. We may add that > > > > > >> > > > > > > >> > processing > > > > > >> > > > > > > >> > time > > > > > >> > > > > > > >> > flag > > > > > >> > > > > > > >> > also to SinkWriter#Context in the future. > > > > > >> > 3. Alternatively, we could also add a different context > > > > > >> > > > > > > >> > parameter > > > > > >> > > > > > > >> > just > > > > > >> > > > > > > >> > to keep the API stable while allowing additional > > > > > >> > > > > > > >> > information > > > > > >> > > > > > > >> > to > > > > > >> > > > > > > >> > be > > > > > >> > > > > > > >> > passed > > > > > >> > in the future. > > > > > >> > 4. Would we run into any compatibility issue if we use > > > > > >> > > > > > > >> > Flink > > > > > >> > > > > > > >> > 1.13 > > > > > >> > > > > > > >> > source > > > > > >> > > > > > > >> > in Flink 1.14 (with this FLIP) or vice versa? > > > > > >> > 5. What happens with sinks that use the new methods in > > > > > >> > > > > > > >> > applications > > > > > >> > > > > > > >> > that > > > > > >> > > > > > > >> > do not have watermarks (batch mode, processing time)? Does > > > > > >> > > > > > > >> > this > > > > > >> > > > > > > >> > also > > > > > >> > > > > > > >> > work > > > > > >> > with ingestion time sufficiently? > > > > > >> > 6. How do exactly once sinks deal with written watermarks > > > > > >> > > > > > > >> > in > > > > > >> > > > > > > >> > case > > > > > >> > > > > > > >> > of > > > > > >> > > > > > > >> > failure? I guess it's the same as normal records. (Either > > > > > >> > > > > > > >> > rollback > > > > > >> > > > > > > >> > of > > > > > >> > > > > > > >> > transaction or deduplication on resumption) > > > > > >> > > > > > > >> > Best, > > > > > >> > > > > > > >> > Arvid > > > > > >> > > > > > > >> > On Tue, May 25, 2021 at 6:44 PM Eron Wright < > > > > > >> > > > > > > >> > ewri...@streamnative.io > > > > > >> > > > > > > >> > .invalid> > > > > > >> > wrote: > > > > > >> > > > > > > >> > > > > > > >> > Does anyone have further comment on FLIP-167? 
> > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-167%3A+Watermarks+for+Sink+API > > > > > >> > > > > > > >> > Thanks, > > > > > >> > Eron > > > > > >> > > > > > > >> > > > > > > >> > On Thu, May 20, 2021 at 5:02 PM Eron Wright < > > > > > >> > > > > > > >> > ewri...@streamnative.io > > > > > >> > > > > > > >> > wrote: > > > > > >> > > > > > > >> > > > > > > >> > Filed FLIP-167: Watermarks for Sink API: > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-167%3A+Watermarks+for+Sink+API > > > > > >> > > > > > > >> > I'd like to call a vote next week, is that reasonable? > > > > > >> > > > > > > >> > > > > > > >> > On Wed, May 19, 2021 at 6:28 PM Zhou, Brian < > > > > > >> > > > > > > >> > b.z...@dell.com > > > > > >> > > > > > > >> > wrote: > > > > > >> > > > > > > >> > Hi Arvid and Eron, > > > > > >> > > > > > > >> > Thanks for the discussion and I read through Eron's pull > > > > > >> > > > > > > >> > request > > > > > >> > > > > > > >> > and I > > > > > >> > > > > > > >> > think this can benefit Pravega Flink connector as well. > > > > > >> > > > > > > >> > Here is some background. Pravega had the watermark > > > > > >> > > > > > > >> > concept > > > > > >> > > > > > > >> > through > > > > > >> > > > > > > >> > the > > > > > >> > > > > > > >> > event stream since two years ago, and here is a blog > > > > > >> > > > > > > >> > introduction[1] > > > > > >> > > > > > > >> > for > > > > > >> > > > > > > >> > Pravega watermark. > > > > > >> > Pravega Flink connector also had this watermark > > > > > >> > > > > > > >> > integration > > > > > >> > > > > > > >> > last > > > > > >> > > > > > > >> > year > > > > > >> > > > > > > >> > that we wanted to propagate the Flink watermark to > > > > > >> > > > > > > >> > Pravega > > > > > >> > > > > > > >> > in > > > > > >> > > > > > > >> > the > > > > > >> > > > > > > >> > SinkFunction, and at that time we just used the existing > > > > > >> > > > > > > >> > Flink > > > > > >> > > > > > > >> > API > > > > > >> > > > > > > >> > that > > > > > >> > > > > > > >> > we > > > > > >> > > > > > > >> > keep the last watermark in memory and check if watermark > > > > > >> > > > > > > >> > changes > > > > > >> > > > > > > >> > for > > > > > >> > > > > > > >> > each > > > > > >> > > > > > > >> > event[2] which is not efficient. With such new > > > > > >> > > > > > > >> > interface, > > > > > >> > > > > > > >> > we > > > > > >> > > > > > > >> > can > > > > > >> > > > > > > >> > also > > > > > >> > > > > > > >> > manage the watermark propagation much more easily. 
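[Editor's note: the per-event workaround Brian describes (reference [2] below) boils down to comparing the context watermark against a cached value on every invoke(). A minimal sketch, where emitWatermarkToPravega(...) and writeRecord(...) are placeholders for the connector's real calls, not its actual code.]

    import org.apache.flink.streaming.api.functions.sink.SinkFunction;

    // Sketch of the "check the watermark on every record" workaround.
    public class WatermarkForwardingSink<T> implements SinkFunction<T> {
        private long lastWatermark = Long.MIN_VALUE;

        @Override
        public void invoke(T value, Context context) throws Exception {
            long current = context.currentWatermark();
            if (current != lastWatermark) {       // only forward on change
                emitWatermarkToPravega(current);
                lastWatermark = current;
            }
            writeRecord(value);                   // normal record path
        }

        private void emitWatermarkToPravega(long watermark) { /* placeholder */ }

        private void writeRecord(T value) { /* placeholder */ }
    }

Since this only runs when a record arrives, a watermark advance with no accompanying data is never forwarded, which is exactly the gap the proposed writeWatermark callback closes.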
> > > > > >> > > > > > > >> > [1] > > > > > >> > > > > > > >> > > > https://pravega.io/blog/2019/11/08/pravega-watermarking-support/ > > > > > >> > > > > > > >> > [2] > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > > > > > > > > > > https://github.com/pravega/flink-connectors/blob/master/src/main/java/io/pravega/connectors/flink/FlinkPravegaWriter.java#L465 > > > > > >> > > > > > > >> > -----Original Message----- > > > > > >> > From: Arvid Heise <ar...@apache.org> <ar...@apache.org> > > > > > >> > Sent: Wednesday, May 19, 2021 16:06 > > > > > >> > To: dev > > > > > >> > Subject: Re: [DISCUSS] Watermark propagation with Sink > > > > > >> > > > > > > >> > API > > > > > >> > > > > > > >> > [EXTERNAL EMAIL] > > > > > >> > > > > > > >> > Hi Eron, > > > > > >> > > > > > > >> > Thanks for pushing that topic. I can now see that the > > > > > >> > > > > > > >> > benefit > > > > > >> > > > > > > >> > is > > > > > >> > > > > > > >> > even > > > > > >> > > > > > > >> > bigger than I initially thought. So it's worthwhile > > > > > >> > > > > > > >> > anyways > > > > > >> > > > > > > >> > to > > > > > >> > > > > > > >> > include > > > > > >> > > > > > > >> > that. > > > > > >> > > > > > > >> > I also briefly thought about exposing watermarks to all > > > > > >> > > > > > > >> > UDFs, > > > > > >> > > > > > > >> > but > > > > > >> > > > > > > >> > here I > > > > > >> > > > > > > >> > really have an issue to see specific use cases. Could > > > > > >> > > > > > > >> > you > > > > > >> > > > > > > >> > maybe > > > > > >> > > > > > > >> > take a > > > > > >> > > > > > > >> > few > > > > > >> > > > > > > >> > minutes to think about it as well? I could only see > > > > > >> > > > > > > >> > someone > > > > > >> > > > > > > >> > misusing > > > > > >> > > > > > > >> > Async > > > > > >> > > > > > > >> > IO as a sink where a real sink would be more > > > > > >> > > > > > > >> > appropriate. > > > > > >> > > > > > > >> > In > > > > > >> > > > > > > >> > general, > > > > > >> > > > > > > >> > if > > > > > >> > > > > > > >> > there is not a clear use case, we shouldn't add the > > > > > >> > > > > > > >> > functionality > > > > > >> > > > > > > >> > as > > > > > >> > > > > > > >> > it's > > > > > >> > > > > > > >> > just increased maintenance for no value. > > > > > >> > > > > > > >> > If we stick to the plan, I think your PR is already in a > > > > > >> > > > > > > >> > good > > > > > >> > > > > > > >> > shape. > > > > > >> > > > > > > >> > We > > > > > >> > > > > > > >> > need to create a FLIP for it though, since it changes > > > > > >> > > > > > > >> > Public > > > > > >> > > > > > > >> > interfaces > > > > > >> > > > > > > >> > [1]. I was initially not convinced that we should also > > > > > >> > > > > > > >> > change > > > > > >> > > > > > > >> > the > > > > > >> > > > > > > >> > old > > > > > >> > > > > > > >> > SinkFunction interface, but seeing how little the change > > > > > >> > > > > > > >> > is, I > > > > > >> > > > > > > >> > wouldn't > > > > > >> > > > > > > >> > mind at all to increase consistency. Only when we wrote > > > > > >> > > > > > > >> > the > > > > > >> > > > > > > >> > FLIP > > > > > >> > > > > > > >> > and > > > > > >> > > > > > > >> > approved it (which should be minimal and fast), we > > > > > >> > > > > > > >> > should > > > > > >> > > > > > > >> > actually > > > > > >> > > > > > > >> > look > > > > > >> > > > > > > >> > at > > > > > >> > > > > > > >> > the PR ;). 
> > > > > >> > > > > > > >> > The only thing which I would improve is the name of the > > > > > >> > > > > > > >> > function. > > > > > >> > > > > > > >> > processWatermark sounds as if the sink implementer > > > > > >> > > > > > > >> > really > > > > > >> > > > > > > >> > needs > > > > > >> > > > > > > >> > to > > > > > >> > > > > > > >> > implement it (as you would need to do it on a custom > > > > > >> > > > > > > >> > operator). > > > > > >> > > > > > > >> > I > > > > > >> > > > > > > >> > would > > > > > >> > > > > > > >> > make them symmetric to the record writing/invoking > > > > > >> > > > > > > >> > method > > > > > >> > > > > > > >> > (e.g. > > > > > >> > > > > > > >> > writeWatermark and invokeWatermark). > > > > > >> > > > > > > >> > As a follow-up PR, we should then migrate KafkaShuffle > > > > > >> > > > > > > >> > to > > > > > >> > > > > > > >> > the > > > > > >> > > > > > > >> > new > > > > > >> > > > > > > >> > API. > > > > > >> > > > > > > >> > But that's something I can do. > > > > > >> > > > > > > >> > [1] > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > > > > > > > > > > https://urldefense.com/v3/__https://cwiki.apache.org/confluence/display/FLINK/Flink*Improvement*Proposals__;Kys!!LpKI!2IQYKfnjRuBgkNRxnPbJeFvTdhWjpwN0urN3m0yz_6W11H74kY5dMnp6nc7o$ > > > > > >> > > > > > > >> > [cwiki[.]apache[.]org] > > > > > >> > > > > > > >> > On Wed, May 19, 2021 at 3:34 AM Eron Wright < > > > > > >> > > > > > > >> > ewri...@streamnative.io > > > > > >> > > > > > > >> > .invalid> > > > > > >> > wrote: > > > > > >> > > > > > > >> > > > > > > >> > Update: opened an issue and a PR. > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > > > https://urldefense.com/v3/__https://issues.apache.org/jira/browse/FLIN > > > > > >> > > > > > > >> > > > > > > K-22700__;!!LpKI!2IQYKfnjRuBgkNRxnPbJeFvTdhWjpwN0urN3m0yz_6W11H74kY5dM > > > > > >> > > > > > > >> > plbgRO4$ [issues[.]apache[.]org] > > > > > >> > > > > > > >> > > > > > > >> > > > > > > > > https://urldefense.com/v3/__https://github.com/apache/flink/pull/15950 > > > > > >> > > > > > > >> > > > > > > __;!!LpKI!2IQYKfnjRuBgkNRxnPbJeFvTdhWjpwN0urN3m0yz_6W11H74kY5dMtScmG7a > > > > > >> > > > > > > >> > $ [github[.]com] > > > > > >> > > > > > > >> > > > > > > >> > On Tue, May 18, 2021 at 10:03 AM Eron Wright < > > > > > >> > > > > > > >> > ewri...@streamnative.io > > > > > >> > > > > > > >> > wrote: > > > > > >> > > > > > > >> > > > > > > >> > Thanks Arvid and David for sharing your ideas on > > > > > >> > > > > > > >> > this > > > > > >> > > > > > > >> > subject. > > > > > >> > > > > > > >> > I'm > > > > > >> > > > > > > >> > glad to hear that you're seeing use cases for > > > > > >> > > > > > > >> > watermark > > > > > >> > > > > > > >> > propagation > > > > > >> > > > > > > >> > via an enhanced sink interface. > > > > > >> > > > > > > >> > As you've guessed, my interest is in Pulsar and am > > > > > >> > > > > > > >> > exploring > > > > > >> > > > > > > >> > some > > > > > >> > > > > > > >> > options for brokering watermarks across stream > > > > > >> > > > > > > >> > processing > > > > > >> > > > > > > >> > pipelines. > > > > > >> > > > > > > >> > I think > > > > > >> > > > > > > >> > Arvid > > > > > >> > > > > > > >> > is speaking to a high-fidelity solution where the > > > > > >> > > > > > > >> > difference > > > > > >> > > > > > > >> > between > > > > > >> > > > > > > >> > intra- > > > > > >> > > > > > > >> > and inter-pipeline flow is eliminated. 
My goal is > > > > > >> > > > > > > >> > more > > > > > >> > > > > > > >> > limited; I > > > > > >> > > > > > > >> > want > > > > > >> > > > > > > >> > to > > > > > >> > > > > > > >> > write the watermark that arrives at the sink to > > > > > >> > > > > > > >> > Pulsar. > > > > > >> > > > > > > >> > Simply > > > > > >> > > > > > > >> > imagine that Pulsar has native support for > > > > > >> > > > > > > >> > watermarking > > > > > >> > > > > > > >> > in > > > > > >> > > > > > > >> > its > > > > > >> > > > > > > >> > producer/consumer API, and we'll leave the details > > > > > >> > > > > > > >> > to > > > > > >> > > > > > > >> > another > > > > > >> > > > > > > >> > forum. > > > > > >> > > > > > > >> > David, I like your invariant. I see lateness as > > > > > >> > > > > > > >> > stemming > > > > > >> > > > > > > >> > from > > > > > >> > > > > > > >> > the > > > > > >> > > > > > > >> > problem > > > > > >> > > > > > > >> > domain and from system dynamics (e.g. scheduling, > > > > > >> > > > > > > >> > batching, > > > > > >> > > > > > > >> > lag). > > > > > >> > > > > > > >> > When > > > > > >> > > > > > > >> > one > > > > > >> > > > > > > >> > depends on order-of-observation to generate > > > > > >> > > > > > > >> > watermarks, > > > > > >> > > > > > > >> > the > > > > > >> > > > > > > >> > app > > > > > >> > > > > > > >> > may > > > > > >> > > > > > > >> > become > > > > > >> > > > > > > >> > unduly sensitive to dynamics which bear on > > > > > >> > > > > > > >> > order-of-observation. > > > > > >> > > > > > > >> > My > > > > > >> > > > > > > >> > goal is to factor out the system dynamics from > > > > > >> > > > > > > >> > lateness > > > > > >> > > > > > > >> > determination. > > > > > >> > > > > > > >> > Arvid, to be most valuable (at least for my > > > > > >> > > > > > > >> > purposes) > > > > > >> > > > > > > >> > the > > > > > >> > > > > > > >> > enhancement is needed on SinkFunction. This will > > > > > >> > > > > > > >> > allow > > > > > >> > > > > > > >> > us > > > > > >> > > > > > > >> > to > > > > > >> > > > > > > >> > easily > > > > > >> > > > > > > >> > evolve the existing Pulsar connector. > > > > > >> > > > > > > >> > Next step, I will open a PR to advance the > > > > > >> > > > > > > >> > conversation. > > > > > >> > > > > > > >> > Eron > > > > > >> > > > > > > >> > On Tue, May 18, 2021 at 5:06 AM David Morávek< > > > > david.mora...@gmail.com > > > > > > > > > > > >> <david.mora...@gmail.com> > > > > > >> > wrote: > > > > > >> > > > > > > >> > > > > > > >> > Hi Eron, > > > > > >> > > > > > > >> > Thanks for starting this discussion. I've been > > > > > >> > > > > > > >> > thinking > > > > > >> > > > > > > >> > about > > > > > >> > > > > > > >> > this > > > > > >> > > > > > > >> > recently as we've run into "watermark related" > > > > > >> > > > > > > >> > issues, > > > > > >> > > > > > > >> > when > > > > > >> > > > > > > >> > chaining multiple pipelines together. 
My to cents > > > > > >> > > > > > > >> > to > > > > > >> > > > > > > >> > the > > > > > >> > > > > > > >> > discussion: > > > > > >> > > > > > > >> > How I like to think about the problem, is that > > > > > >> > > > > > > >> > there > > > > > >> > > > > > > >> > should > > > > > >> > > > > > > >> > an > > > > > >> > > > > > > >> > invariant that holds for any stream processing > > > > > >> > > > > > > >> > pipeline: > > > > > >> > > > > > > >> > "NON_LATE > > > > > >> > > > > > > >> > element > > > > > >> > > > > > > >> > entering > > > > > >> > > > > > > >> > the system, should never become LATE" > > > > > >> > > > > > > >> > Unfortunately this is exactly what happens in > > > > > >> > > > > > > >> > downstream > > > > > >> > > > > > > >> > pipelines, > > > > > >> > > > > > > >> > because the upstream one can: > > > > > >> > - break ordering (especially with higher > > > > > >> > > > > > > >> > parallelism) > > > > > >> > > > > > > >> > - emit elements that are ahead of output watermark > > > > > >> > > > > > > >> > There is not enough information to re-construct > > > > > >> > > > > > > >> > upstream > > > > > >> > > > > > > >> > watermark > > > > > >> > > > > > > >> > in latter stages (it's always just an estimate > > > > > >> > > > > > > >> > based > > > > > >> > > > > > > >> > on > > > > > >> > > > > > > >> > previous > > > > > >> > > > > > > >> > pipeline's output). > > > > > >> > > > > > > >> > It would be great, if we could have a general > > > > > >> > > > > > > >> > abstraction, > > > > > >> > > > > > > >> > that > > > > > >> > > > > > > >> > is > > > > > >> > > > > > > >> > reusable for various sources / sinks (not just > > > > > >> > > > > > > >> > Kafka > > > > > >> > > > > > > >> > / > > > > > >> > > > > > > >> > Pulsar, > > > > > >> > > > > > > >> > thought this would probably cover most of the > > > > > >> > > > > > > >> > use-cases) > > > > > >> > > > > > > >> > and > > > > > >> > > > > > > >> > systems. > > > > > >> > > > > > > >> > Is there any other use-case then sharing watermark > > > > > >> > > > > > > >> > between > > > > > >> > > > > > > >> > pipelines, > > > > > >> > > > > > > >> > that > > > > > >> > > > > > > >> > you're trying to solve? > > > > > >> > > > > > > >> > Arvid: > > > > > >> > > > > > > >> > 1. Watermarks are closely coupled to the used > > > > > >> > > > > > > >> > system > > > > > >> > > > > > > >> > (=Flink). > > > > > >> > > > > > > >> > I > > > > > >> > > > > > > >> > have a > > > > > >> > > > > > > >> > hard time imagining that it's useful to use a > > > > > >> > > > > > > >> > different > > > > > >> > > > > > > >> > stream > > > > > >> > > > > > > >> > processor > > > > > >> > > > > > > >> > downstream. So for now, I'm assuming that both > > > > > >> > > > > > > >> > upstream > > > > > >> > > > > > > >> > and > > > > > >> > > > > > > >> > downstream > > > > > >> > > > > > > >> > are > > > > > >> > > > > > > >> > Flink applications. In that case, we probably > > > > > >> > > > > > > >> > define > > > > > >> > > > > > > >> > both > > > > > >> > > > > > > >> > parts > > > > > >> > > > > > > >> > of the pipeline in the same Flink job similar to > > > > > >> > > > > > > >> > KafkaStream's > > > > > >> > > > > > > >> > #through. > > > > > >> > > > > > > >> > I'd slightly disagree here. 
For example we're > > > > > >> > > > > > > >> > "materializing" > > > > > >> > > > > > > >> > change-logs > > > > > >> > > > > > > >> > produced by Flink pipeline into serving layer > > > > > >> > > > > > > >> > (random > > > > > >> > > > > > > >> > access > > > > > >> > > > > > > >> > db / > > > > > >> > > > > > > >> > in memory view / ..) and we need to know, whether > > > > > >> > > > > > > >> > responses > > > > > >> > > > > > > >> > we > > > > > >> > > > > > > >> > serve meet the "freshness" requirements (eg. you > > > > > >> > > > > > > >> > may > > > > > >> > > > > > > >> > want > > > > > >> > > > > > > >> > to > > > > > >> > > > > > > >> > respond differently, when watermark is lagging way > > > > > >> > > > > > > >> > too > > > > > >> > > > > > > >> > much > > > > > >> > > > > > > >> > behind > > > > > >> > > > > > > >> > processing time). Also not > > > > > >> > > > > > > >> > every > > > > > >> > > > > > > >> > stream processor in the pipeline needs to be Flink. > > > > > >> > > > > > > >> > It > > > > > >> > > > > > > >> > can > > > > > >> > > > > > > >> > as > > > > > >> > > > > > > >> > well > > > > > >> > > > > > > >> > be a simple element-wise transformation that reads > > > > > >> > > > > > > >> > from > > > > > >> > > > > > > >> > Kafka > > > > > >> > > > > > > >> > and > > > > > >> > > > > > > >> > writes back into separate topic (that's what we do > > > > > >> > > > > > > >> > for > > > > > >> > > > > > > >> > example > > > > > >> > > > > > > >> > with > > > > > >> > > > > > > >> > ML models, that have special hardware > > > > > >> > > > > > > >> > requirements). > > > > > >> > > > > > > >> > Best, > > > > > >> > D. > > > > > >> > > > > > > >> > > > > > > >> > On Tue, May 18, 2021 at 8:30 AM Arvid Heise < > > > > > >> > > > > > > >> > ar...@apache.org> > > > > > >> > > > > > > >> > wrote: > > > > > >> > > > > > > >> > Hi Eron, > > > > > >> > > > > > > >> > I think this is a useful addition for storage > > > > > >> > > > > > > >> > systems > > > > > >> > > > > > > >> > that > > > > > >> > > > > > > >> > act > > > > > >> > > > > > > >> > as > > > > > >> > > > > > > >> > pass-through for Flink to reduce recovery time. > > > > > >> > > > > > > >> > It > > > > > >> > > > > > > >> > is > > > > > >> > > > > > > >> > only > > > > > >> > > > > > > >> > useful > > > > > >> > > > > > > >> > if > > > > > >> > > > > > > >> > you > > > > > >> > > > > > > >> > combine it with regional fail-over as only a > > > > > >> > > > > > > >> > small > > > > > >> > > > > > > >> > part > > > > > >> > > > > > > >> > of > > > > > >> > > > > > > >> > the > > > > > >> > > > > > > >> > pipeline > > > > > >> > > > > > > >> > is > > > > > >> > > > > > > >> > restarted. > > > > > >> > > > > > > >> > A couple of thoughts on the implications: > > > > > >> > 1. Watermarks are closely coupled to the used > > > > > >> > > > > > > >> > system > > > > > >> > > > > > > >> > (=Flink). > > > > > >> > > > > > > >> > I > > > > > >> > > > > > > >> > have > > > > > >> > > > > > > >> > a > > > > > >> > > > > > > >> > hard time imagining that it's useful to use a > > > > > >> > > > > > > >> > different > > > > > >> > > > > > > >> > stream > > > > > >> > > > > > > >> > processor > > > > > >> > > > > > > >> > downstream. So for now, I'm assuming that both > > > > > >> > > > > > > >> > upstream > > > > > >> > > > > > > >> > and > > > > > >> > > > > > > >> > downstream > > > > > >> > > > > > > >> > are > > > > > >> > > > > > > >> > Flink applications. 
In that case, we probably define both parts of the pipeline in the same Flink job, similar to KafkaStream's #through.
2. The schema of the respective intermediate stream/topic would need to be managed by Flink to encode both records and watermarks (see the envelope sketch below). This reduces the usability quite a bit and needs to be carefully crafted.
3. It's not clear to me if constructs like SchemaRegistry can be properly supported (and also if they should be supported) in terms of schema evolution.
4. Potentially, StreamStatus and LatencyMarker would also need to be encoded.
5. It's important to have some way to transport backpressure from the downstream to the upstream. Otherwise you would have the same issue as KafkaStreams, where two separate pipelines can drift so far apart that you experience data loss if the data retention period is smaller than the drift.
6. It's clear that you trade a huge chunk of throughput for lower overall latency in case of failure. So it's an interesting feature for use cases with SLAs.

Since we are phasing out SinkFunction, I'd prefer to only support SinkWriter. Having a no-op default sounds good to me.

We have some experimental feature for Kafka [1], which pretty much reflects your idea. Here we have an ugly workaround to be able to process the watermark by using a custom StreamSink task. We could also try to create a FLIP that abstracts the actual system away, and then we could use the approach for both Pulsar and Kafka.

[1] https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kafka/src/main/java/org/apache/flink/streaming/connectors/kafka/shuffle/FlinkKafkaShuffle.java#L103
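Point 2 above is easier to picture with a rough sketch of such an envelope. This is not an existing Flink type; it only illustrates what a Flink-managed schema for the intermediate topic would have to express, under the assumption that a record and a watermark share one serialized stream.

    // Hypothetical envelope for an intermediate topic that carries both
    // user records and watermarks. Names are illustrative, not Flink API.
    public final class StreamEnvelope<T> {

        public enum Kind { RECORD, WATERMARK }

        private final Kind kind;
        private final T record;        // set when kind == RECORD
        private final long watermark;  // event-time millis, set when kind == WATERMARK

        private StreamEnvelope(Kind kind, T record, long watermark) {
            this.kind = kind;
            this.record = record;
            this.watermark = watermark;
        }

        public static <T> StreamEnvelope<T> ofRecord(T record) {
            return new StreamEnvelope<>(Kind.RECORD, record, Long.MIN_VALUE);
        }

        public static <T> StreamEnvelope<T> ofWatermark(long watermarkMillis) {
            return new StreamEnvelope<>(Kind.WATERMARK, null, watermarkMillis);
        }

        public Kind kind() { return kind; }
        public T record() { return record; }
        public long watermark() { return watermark; }
    }

A downstream source would unwrap RECORD entries as data and re-emit WATERMARK entries as source watermarks; it is this wrapping that makes the schema-evolution (point 3) and StreamStatus/LatencyMarker (point 4) questions non-trivial.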
On Mon, May 17, 2021 at 10:44 PM Eron Wright <ewri...@streamnative.io.invalid> wrote:

I would like to propose an enhancement to the Sink API, the ability to receive upstream watermarks. I'm aware that the sink context provides the current watermark for a given record. I'd like to be able to write a sink function that is invoked whenever the watermark changes. Out of scope would be event-time timers (since sinks aren't keyed).

For context, imagine that a stream storage system had the ability to persist watermarks in addition to ordinary elements, e.g. to serve as source watermarks in a downstream processor.
Ideally one could compose a multi-stage, event-driven application, with watermarks flowing end-to-end without the need for a heuristics-based watermark at each stage.

The specific proposal would be a new method on `SinkFunction` and/or on `SinkWriter`, called 'processWatermark' or 'writeWatermark', with a default implementation that does nothing.

Thoughts?

Thanks!
Eron Wright
StreamNative

--
Eron Wright, Cloud Engineering Lead
streamnative.io
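For reference, a minimal sketch of what the proposed hook could look like, using a simplified stand-in for the new-style SinkWriter interface. The method name, parameter type, and exact placement are assumptions to be settled by the FLIP, not the final API.

    // Simplified stand-in for SinkWriter, illustrating the proposed no-op default hook.
    public interface WatermarkAwareSinkWriter<InputT> {

        // Existing-style write path (simplified).
        void write(InputT element) throws java.io.IOException;

        // Proposed hook: invoked whenever the current watermark advances.
        // The no-op default keeps existing sinks source-compatible; a sink that
        // persists watermarks (e.g. into a Pulsar or Kafka topic) overrides it.
        default void writeWatermark(long watermarkMillis) throws java.io.IOException {
            // no-op by default
        }
    }

A storage-backed sink overriding writeWatermark could append the watermark alongside ordinary elements, so a downstream source can restore it instead of re-deriving a heuristic watermark, which is the end-to-end flow described above.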