Re: [DISCUSS] FLIP-185: Shorter heartbeat timeout and interval default values

2021-07-21 Thread Gen Luo
Hi, thanks for driving this, @Till Rohrmann. I would give +1 on reducing the heartbeat timeout and interval, though I'm not sure whether 15s and 3s would be enough either. IMO, except for the standalone cluster, where the heartbeat mechanism in Flink is relied on entirely, reducing the heartbeat can also

Flink TaskManager container got restarted by K8S very frequently

2021-07-21 Thread Fan Xie
Hi Flink Community, Recently I deployed a Flink cluster (1 JM, 1 TM) in k8s standalone mode. Later on I noticed that the pod the TM is running on got restarted by k8s very frequently (3 times within 10 minutes). And I didn't see any error log for this pod. I tried to increase the containe

Need help of deploying Flink HA on kubernetes cluster

2021-07-21 Thread Dhiru
Hi, I am very new to Flink. I am planning to install a Flink HA setup on an EKS cluster with 5 worker nodes. Could someone please point me to the right materials or direction on how to install it, as well as any sample job that I can run just for testing to confirm everything is working as expected? --

Re: confirm subscribe to user@flink.apache.org

2021-07-21 Thread Dhiru
Need to be part of the Flink mailing list. On Wednesday, July 21, 2021, 11:22:14 PM AST, user-h...@flink.apache.org wrote: Hi! This is the ezmlm program. I'm managing the user@flink.apache.org mailing list. To confirm that you would like userdh...@yahoo.com added to the user mailing li

Questions about keyed streams

2021-07-21 Thread Dan Hill
Hi. 1) If I use the same key in downstream operators (my key is a user id), will the rows stay on the same TaskManager machine? I join in more info based on the user id as the key. I'd like for these to stay on the same machine rather than shuffle a bunch of user-specific info to multiple task m

Re: Recover from savepoints with Kubernetes HA

2021-07-21 Thread Austin Cawley-Edwards
Hi Thomas, I've got a few questions that will hopefully help to find an answer: What job properties are you trying to change? Something like parallelism? What mode is your job running in? i.e., Session, Per-Job, or Application? Can you also describe how you're redeploying the job? Are you u

Re: Kafka data sources, multiple interval joins and backfilling

2021-07-21 Thread David Morávek
Hi Dan, unfortunately Flink currently provides no source-level synchronization, except for Kinesis [1], so it's easy to run into large state when processing historical data. There is an ongoing effort to provide a generic watermark-based alignment of FLIP-27 sources [2], that will most likely

Re: Stateful Functions Status

2021-07-21 Thread Igal Shilman
Not yet, unfortunately, but I'd be very happy to work with the community on a JS SDK. On Tue, Jul 20, 2021 at 4:32 PM Omid Bakhshandeh wrote: > Igal, > > Thanks for the answers. Is there any JS SDK available? > > Best, > --Omid > > On Tue, Jul 20, 2021 at 10:23 AM Igal Shilman wrote: > >>

Re: [DISCUSS] FLIP-185: Shorter heartbeat timeout and interval default values

2021-07-21 Thread Till Rohrmann
Thanks for sharing these insights. I think it is no longer true that the ResourceManager notifies the JobMaster about lost TaskExecutors. See FLINK-23216 [1] for more details. Given the GC pauses, would you then be ok with decreasing the heartbeat timeout to 20 seconds? This should give enough ti

Recover from savepoints with Kubernetes HA

2021-07-21 Thread Thms Hmm
Hey, we have some application clusters running on Kubernetes and are exploring the HA mode, which is working as expected. When we try to upgrade a job, e.g. trigger a savepoint, cancel the job and redeploy, Flink is not restarting from the savepoint we provide using the -s parameter. So all state is lost