Re: Memory usage increases on every job restart resulting in eventual OOMKill

2021-02-03 Thread Randal Pitt
Thanks everyone for the responses. I tried out the JeMalloc suggestion from FLINK-19125 using a patched 1.11.3 image and so far it appears to be working well. I see it's included in 1.12.1 and Docker images are available, so I'll look at upgrading too. Best regards, Randal. -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Memory usage increases on every job restart resulting in eventual OOMKill

2021-02-02 Thread Randal Pitt
Hi Xintong Song, Correct, we are using standalone k8s. Task managers are deployed as a StatefulSet so they have consistent pod names. We tried using native k8s (in fact I'd prefer to) but got persistent "io.fabric8.kubernetes.client.KubernetesClientException: too old resource version: 242214695 (242413…

Memory usage increases on every job restart resulting in eventual OOMKill

2021-02-02 Thread Randal Pitt
Hi, We're running Flink 1.11.3 on Kubernetes. We have a job with a parallelism of 10 running on 10 task managers, each with 1 task slot. The job has 4 time windows over 2 different keys; 2 windows have reducers and 2 are processed by window functions. State is stored in RocksDB. We've noticed when a…
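
For context, a minimal sketch of the job shape described above, assuming a hypothetical Event POJO, window sizes, and checkpoint path: one keyed event-time window aggregated incrementally with a ReduceFunction and one evaluated by a ProcessWindowFunction, with state kept in RocksDB. The reduce variant keeps a single accumulated element per key and window in RocksDB, whereas the ProcessWindowFunction variant buffers every element until the window fires, which matters when watching state size.

    import java.time.Duration;
    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
    import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
    import org.apache.flink.streaming.api.windowing.time.Time;
    import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
    import org.apache.flink.util.Collector;

    public class WindowedJobSketch {

        // Simple POJO standing in for the real event type (fields are hypothetical).
        public static class Event {
            public String key;
            public long timestamp;
            public long value;
            public Event() {}
            public Event(String key, long timestamp, long value) {
                this.key = key; this.timestamp = timestamp; this.value = value;
            }
        }

        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            // State lives in RocksDB; the checkpoint URI here is a placeholder.
            env.setStateBackend(new RocksDBStateBackend("file:///tmp/checkpoints"));

            DataStream<Event> events = env
                    .fromElements(new Event("a", 1_000L, 1), new Event("a", 2_000L, 2), new Event("b", 3_000L, 3))
                    .assignTimestampsAndWatermarks(
                            WatermarkStrategy.<Event>forBoundedOutOfOrderness(Duration.ofSeconds(10))
                                    .withTimestampAssigner((e, ts) -> e.timestamp));

            // Window aggregated incrementally with a reducer: only one element per key/window in state.
            events.keyBy(e -> e.key)
                  .window(TumblingEventTimeWindows.of(Time.minutes(5)))
                  .reduce((a, b) -> new Event(a.key, Math.max(a.timestamp, b.timestamp), a.value + b.value))
                  .print();

            // Window evaluated by a ProcessWindowFunction: all elements buffered until the window fires.
            events.keyBy(e -> e.key)
                  .window(TumblingEventTimeWindows.of(Time.minutes(5)))
                  .process(new ProcessWindowFunction<Event, String, String, TimeWindow>() {
                      @Override
                      public void process(String key, Context ctx, Iterable<Event> in, Collector<String> out) {
                          long count = 0;
                          for (Event e : in) { count++; }
                          out.collect(key + ": " + count + " events in window " + ctx.window());
                      }
                  })
                  .print();

            env.execute("windowed job sketch");
        }
    }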

Re: Event time issues when Kinesis consumer receives batched messages

2020-12-09 Thread Randal Pitt
Thanks Roman, I'll look into how I go about doing that. -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Event time issues when Kinesis consumer receives batched messages

2020-12-08 Thread Randal Pitt
Hi Roman, We're using a custom watermarker that uses a histogram to calculate a "best fit" event time, as the data we receive can be heavily out of order. As you can see, we're using the timestamp from the first event in the batch, so we're essentially sampling the timestamps rather than using them all.
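
For reference, a rough sketch of one way a histogram-based watermark generator could look on the Flink 1.11+ WatermarkGenerator API. This is an illustration only, not the poster's implementation: the bucket width, percentile, and eviction policy below are assumptions.

    import java.util.Map;
    import java.util.TreeMap;
    import org.apache.flink.api.common.eventtime.Watermark;
    import org.apache.flink.api.common.eventtime.WatermarkGenerator;
    import org.apache.flink.api.common.eventtime.WatermarkOutput;

    public class HistogramWatermarkGenerator<T> implements WatermarkGenerator<T> {

        private static final long BUCKET_MILLIS = 1_000L;  // histogram bucket width (assumed)
        private static final double PERCENTILE = 0.05;      // emit the watermark at the 5th percentile (assumed)

        // bucket start timestamp -> number of events seen in that bucket
        private final TreeMap<Long, Long> histogram = new TreeMap<>();
        private long totalEvents = 0;
        private long lastEmitted = Long.MIN_VALUE;

        @Override
        public void onEvent(T event, long eventTimestamp, WatermarkOutput output) {
            long bucket = (eventTimestamp / BUCKET_MILLIS) * BUCKET_MILLIS;
            histogram.merge(bucket, 1L, Long::sum);
            totalEvents++;
        }

        @Override
        public void onPeriodicEmit(WatermarkOutput output) {
            if (totalEvents == 0) {
                return;
            }
            // Walk the buckets in timestamp order until the chosen percentile of events is covered.
            long threshold = (long) Math.ceil(totalEvents * PERCENTILE);
            long seen = 0;
            long watermarkTs = histogram.firstKey();
            for (Map.Entry<Long, Long> entry : histogram.entrySet()) {
                seen += entry.getValue();
                watermarkTs = entry.getKey();
                if (seen >= threshold) {
                    break;
                }
            }
            long candidate = watermarkTs - 1;
            if (candidate > lastEmitted) {
                lastEmitted = candidate;
                output.emitWatermark(new Watermark(candidate));
            }
            // Discard buckets at or below the chosen timestamp to bound memory.
            histogram.headMap(watermarkTs, true).clear();
            totalEvents -= seen;
        }
    }

Such a generator would typically be plugged in via WatermarkStrategy.forGenerator(ctx -> new HistogramWatermarkGenerator<>()), combined with a timestamp assigner that, as described above, samples the timestamp of the first event in each batch.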

Event time issues when Kinesis consumer receives batched messages

2020-12-07 Thread Randal Pitt
Hi there, We're using Flink to read from a Kinesis stream. The stream contains messages that themselves contain lists of events, and we want our Flink jobs (using the event time characteristic) to process those events individually. We have this working using flatMap in the DataStream but we're having…
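
For context, a minimal sketch of the flatMap approach described above, with hypothetical Batch and Event types: each Kinesis record carries a list of events, and the flatMap unpacks them into individual elements for downstream event-time operators.

    import java.util.List;
    import org.apache.flink.api.common.functions.FlatMapFunction;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.util.Collector;

    public class UnpackBatchesSketch {

        // Outer message as read from Kinesis: a container holding many events (hypothetical shape).
        public static class Batch {
            public List<Event> events;
        }

        // Inner event with its own timestamp, which event time should follow.
        public static class Event {
            public String key;
            public long timestamp;
            public String payload;
        }

        // Unpack each batch so downstream operators see one element per inner event.
        public static DataStream<Event> unpack(DataStream<Batch> batches) {
            return batches
                    .flatMap((FlatMapFunction<Batch, Event>) (batch, out) -> {
                        for (Event e : batch.events) {
                            out.collect(e);
                        }
                    })
                    .returns(Event.class); // type hint needed because the lambda's output type is erased
        }
    }

Note that the Kinesis consumer only sees the outer record, so event timestamps typically have to be assigned from each inner event's own field downstream of the unpacking, e.g. with a WatermarkStrategy applied after the flatMap.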