Is there anything strictly special about sink functions?

2020-01-24 Thread Andrew Roberts
Hello, I’m trying to push some behavior that we’ve currently got in a large, stateful SinkFunction implementation into Flink’s windowing system. The task at hand is similar to what StreamingFileSink provides, but more flexible. I don’t want to re-implement that sink, because it uses the Stream

Re: Is there anything strictly special about sink functions?

2020-01-29 Thread Andrew Roberts
0 at 6:43 AM Konstantin Knauf wrote:
>>> Hi Andrew,
>>>
>>> as far as I know there is nothing particularly special about the sink in terms of how it handles state or time. You can not leave the pipeline "unfinished", on

ProcessingTimeSessionWindows and many other windowing pieces are built around Object

2018-08-17 Thread Andrew Roberts
I’m exploring moving some “manual” state management into Flink-managed state via Flink’s windowing paradigms, and I’m running into the surprise that many pieces of the windowing architecture require that the stream be upcast to Object (AnyRef in Scala). Is there a technical reason for this? I’m curre

Metrics for number of "open windows"?

2019-02-19 Thread Andrew Roberts
Hello, I’m trying to track the number of currently-in-state windows in a keyed, windowed stream (stream.keyBy(…).window(…).trigger(…).process(…)) using Flink metrics. Are there any built in? Or any good approaches for collecting this data? Thanks, Andrew

Flink window triggering and timing on connected streams

2019-02-25 Thread Andrew Roberts
Hello, I’m trying to implement session windows over a set of connected streams (event time), with some custom triggering behavior. Essentially, I allow very long session gaps, but I have an “end session” event that I want to cause the window to fire and purge. I’m assigning timestamps and water

Re: Flink window triggering and timing on connected streams

2019-02-25 Thread Andrew Roberts
watermark is for, and
> > that in connected streams, the lowest (earliest) watermark of the input streams is what is seen as the watermark downstream.
> Yes, and we can make use of this to make window fires only on 'end session' event using the solution above.
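The watermark rule quoted in this reply can be illustrated with a small simulation. This is a minimal sketch in plain Python, not Flink code: the class name `TwoInputWatermarkTracker` and its method `on_watermark` are hypothetical names introduced here for illustration, and the numeric timestamps are made up.

```python
from dataclasses import dataclass, field


@dataclass
class TwoInputWatermarkTracker:
    """Illustrative model of the watermark seen downstream of two connected inputs."""

    # Latest watermark received on each input; -inf means "none seen yet".
    input_watermarks: list = field(
        default_factory=lambda: [float("-inf"), float("-inf")]
    )
    combined: float = float("-inf")

    def on_watermark(self, input_index: int, watermark: float) -> float:
        # A watermark on a single input never moves backwards.
        self.input_watermarks[input_index] = max(
            self.input_watermarks[input_index], watermark
        )
        # Downstream sees the lowest (earliest) watermark across the inputs,
        # and that combined value never moves backwards either.
        self.combined = max(self.combined, min(self.input_watermarks))
        return self.combined


tracker = TwoInputWatermarkTracker()
tracker.on_watermark(0, 100)   # input 1 has sent nothing yet, so no progress
tracker.on_watermark(1, 40)    # combined watermark becomes 40
tracker.on_watermark(1, 250)   # combined watermark becomes 100, held back by input 0
```

Under this model, a slow or idle input holds back the combined watermark, and with it any event-time window or timer downstream that is waiting for the watermark to pass its end timestamp.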

Flink parallel subtask affinity of taskmanager

2019-03-04 Thread Andrew Roberts
Hello, We run Flink as a standalone cluster. When moving from Flink 1.3 to 1.6, we noticed a change in the scheduling behavior. Where previously parallel subtasks of a job seemed to be round-robin allocated around our cluster, Flink 1.6 appears to want to deploy as many subtasks to the same hos

Understanding timestamp and watermark assignment errors

2019-03-07 Thread Andrew Roberts
Hello, I’m trying to convert some of our larger stateful computations into something that aligns more with the Flink windowing framework, and particularly, start using “event time” instead of “ingest time” as the time characteristic. My data is coming in from Kafka (0.8.2.2, using the out-of-the

Re: Understanding timestamp and watermark assignment errors

2019-03-08 Thread Andrew Roberts
-8836.
>
> Cheers,
>
> Konstantin
>
>> On Thu, Mar 7, 2019 at 5:52 PM Andrew Roberts wrote:
>> Hello,
>>
>> I’m trying to convert some of our larger stateful computations into something that aligns more with the Flink windowing framework, and

Parallelism and stateful mapping with Flink

2016-12-07 Thread Andrew Roberts
Hello, I’m trying to perform a stateful mapping of some objects coming in from Kafka in a parallelized Flink job (set on the job using env.setParallelism(3)). The data source is a Kafka topic, but the partitions aren’t meaningfully keyed for this operation (each Kafka message is flatMapped to b

Re: Parallelism and stateful mapping with Flink

2016-12-07 Thread Andrew Roberts
:28 AM, Stefan Richter wrote:
>
> Hi,
>
> could you maybe provide the (minimal) code for the problematic job? Also, are you sure that the keyBy is working on the correct key attribute?
>
> Best,
> Stefan
>
>> On 07.12.2016 at 15:57, Andrew Ro

Fault tolerance guarantees of Elasticsearch sink in flink-elasticsearch2?

2017-01-11 Thread Andrew Roberts
Hello, I’m trying to understand the guarantees made by Flink’s Elasticsearch sink in terms of message delivery. According to (1), the ES sink offers at-least-once guarantees. This page doesn’t differentiate between flink-elasticsearch and flink-elasticsearch2, so I have to assume for the moment

Re: Fault tolerance guarantees of Elasticsearch sink in flink-elasticsearch2?

2017-01-16 Thread Andrew Roberts
ry soon too.
>
> Cheers,
> Gordon
>
> On January 11, 2017 at 3:49:06 PM, Andrew Roberts (arobe...@fuze.com) wrote:
>
>> I’m trying to understand the guarantees made by Flink’s Elasticsearch sink in terms of mes

Job-level close()?

2017-12-15 Thread Andrew Roberts
Hello, I’m writing a Flink operator that connects to a database, and running it in parallel results in issues due to the singleton nature of the connection pool in the library I’m working with. The operator needs to close the connection pool when it’s done, but only when ALL parallel instances
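One common way to close a shared singleton only after its last user is done is reference counting. The following is a minimal sketch in plain Python, not Flink API: `SharedPoolHandle`, `acquire`, and `release` are hypothetical names, and the sketch assumes all parallel instances run in the same process. Instances spread across several processes or hosts would each see their own counter, so this does not by itself give a job-level close.

```python
import threading


class SharedPoolHandle:
    """Reference-counted guard around a process-wide shared resource (hypothetical)."""

    def __init__(self, close_fn):
        self._close_fn = close_fn      # called once, when the last user releases
        self._lock = threading.Lock()
        self._open_count = 0

    def acquire(self):
        # Each parallel instance registers itself when it starts using the pool.
        with self._lock:
            self._open_count += 1

    def release(self):
        # Returns True only for the call that actually closed the resource.
        with self._lock:
            if self._open_count == 0:
                raise RuntimeError("release() called without a matching acquire()")
            self._open_count -= 1
            if self._open_count == 0:
                self._close_fn()
                return True
            return False


closed = []
handle = SharedPoolHandle(lambda: closed.append("pool closed"))
for _ in range(3):      # three parallel instances open the shared pool
    handle.acquire()
handle.release()        # first instance finishes; pool stays open
handle.release()        # second instance finishes; pool stays open
handle.release()        # last instance finishes; close_fn runs
```

In an operator, `acquire` and `release` would be called from the instance's own open and close hooks, so the pool is shut down by whichever instance in the process finishes last.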

Capturing the exception that leads to a job entering the FAILED state

2017-09-07 Thread Andrew Roberts
Hello, I’m trying to connect our Flink deployment to our error monitor tool, and I’m struggling to find an entry point for capturing that exception. I’ve been poking around a little bit of the source, but I can’t seem to connect anything I’ve found to the job submission API we’re using (`env.ex