Re: Trouble with large state

2020-06-22 Thread Jeff Henrikson
ends up resolving both my symptom of slow/failing checkpoints, and also my symptom of crashes during long runs. Many thanks! Jeff On 6/20/20 11:46 AM, Jeff Henrikson wrote: Bhaskar, > Glad to know some progress. Yeah, some progress.  Yet overnight run didn't look as good as I hop

Re: Trouble with large state

2020-06-20 Thread Jeff Henrikson
s. For example, you can keep a stateless scan as a separate Flink job and keep its output in a Kafka-like store. From there you start your stateful joins. This would help you focus on your stateful job much better. Regards, Bhaskar On Sat, Jun 20, 2020 at 4:49 AM Jeff Henrik

Re: Trouble with large state

2020-06-19 Thread Jeff Henrikson
processElement(x: T, ctx: ProcessFunction[T, T]#Context, out: Collector[T]): Unit = {
    increment()
    if (eventsPerSecMax > 0 && eps() > eventsPerSecMax) {
      Thread.sleep(1000L)
    }
    out.collect(x)
  }
}

On 6/19/20 9:16 AM, Jeff Henrikson wrote: Bhaskar, Thank you for y
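The excerpt above calls `increment()` and `eps()` without showing how they are defined. The following is only a sketch of one way such a per-second counter could work, not the code from the original message. The names `EventRateWindow`, `Throttle`, `shouldPause`, and the injected `nowMillis` clock are hypothetical; only `increment`, `eps`, and `eventsPerSecMax` come from the excerpt.

```scala
// Hypothetical helper: counts events seen in the current one-second window.
// The clock is injected so the logic can be exercised without real time passing.
final class EventRateWindow(nowMillis: () => Long = () => System.currentTimeMillis()) {
  private var windowStart: Long = nowMillis()
  private var count: Long = 0L

  /** Record one event, starting a fresh window once a second has elapsed. */
  def increment(): Unit = {
    val now = nowMillis()
    if (now - windowStart >= 1000L) {
      windowStart = now
      count = 0L
    }
    count += 1
  }

  /** Number of events recorded so far in the current window. */
  def eps(): Long = count
}

object Throttle {
  /** Same condition as in the excerpt: a non-positive limit disables throttling. */
  def shouldPause(window: EventRateWindow, eventsPerSecMax: Long): Boolean =
    eventsPerSecMax > 0 && window.eps() > eventsPerSecMax
}
```

Under these assumptions, a process function would call `increment()` for each element and sleep when `shouldPause` returns true, as the excerpt does with `Thread.sleep(1000L)`.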

Re: Trouble with large state

2020-06-19 Thread Jeff Henrikson
I use all the same parameters except for triggering on every event. So it looks worse not better. Thanks again, Jeff Henrikson On 6/18/20 11:21 PM, Vijay Bhaskar wrote: Thanks for the reply. I want to discuss more on points (1) and (2). If we take care of them, the rest will be good. Coming t

Re: Trouble with large state

2020-06-18 Thread Jeff Henrikson
e initial launch of my application without needing to modify the framework. Regards, Jeff Henrikson On 6/18/20 7:28 AM, Vijay Bhaskar wrote: For me this seems to be an IO bottleneck at your task manager. I have a couple of queries: 1. What's your checkpoint interval? 2. How frequently are you upda

Re: Trouble with large state

2020-06-18 Thread Jeff Henrikson
laining about another taskmanager B. 2) Taskmanager B shows no obvious error other than complaining that taskmanager A has disconnected from it. Regards, Jeff Henrikson On 6/17/20 9:52 PM, Yun Tang wrote: Hi Jeff 1. "after around 50GB of state, I stop being able to reliably take che

Trouble with large state

2020-06-17 Thread Jeff Henrikson
ving it out to avoid the possibility of masking deterministic faults. Below are my configurations. Thanks in advance for any advice. Regards, Jeff Henrikson Flink version: 1.10 Configuration set via code: parallelism=8 maxParallelism=64 setStreamTimeCharacteristic(Tim

Re: flink-s3-fs-hadoop retry configuration

2020-06-17 Thread Jeff Henrikson
lesystem via Flink's config yaml? Afaik all config parameters prefixed with "s3." are mirrored into the Hadoop file system connector. On Mon, May 4, 2020 at 8:45 PM Jeff Henrikson wrote: > 2) How can I tell if flink-s3-fs-h
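As an illustration of the "s3." mirroring described in the reply above, a Hadoop key such as `fs.s3a.connection.maximum` would be written in flink-conf.yaml with the `s3.` prefix. The value below is an arbitrary placeholder, not a recommendation from the thread.

```yaml
# flink-conf.yaml (sketch): forwarded to the Hadoop S3A connector
# as fs.s3a.connection.maximum. The value 50 is a placeholder.
s3.connection.maximum: 50
```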

Overriding hadoop core-site.xml keys using the flink-fs-hadoop-shaded assemblies

2020-05-05 Thread Jeff Henrikson
overridden. Thanks, Jeff Henrikson

Re: flink-s3-fs-hadoop retry configuration

2020-05-04 Thread Jeff Henrikson
Thanks in advance, Jeff Henrikson https://github.com/apache/flink/tree/master/flink-filesystems/flink-fs-hadoop-shaded https://github.com/apache/flink/blob/master/flink-filesystems/flink-fs-hadoop-shaded/src/main/resources/core-default-shaded.xml fs.s3a.connection.maximum

flink-s3-fs-hadoop retry configuration

2020-05-01 Thread Jeff Henrikson
a custom docker image. For now, I have not enabled the zookeeper-based HA. See below for a frequent stacktrace that I interpret as likely to be caused by s3 throttling. Thanks in advance for any help. Regards, Jeff Henrikson 2020-04-30 19:35:24 org.apache.flink.runtime.JobException