Flink SQL update-mode set to retract in env file.

2019-09-25 Thread srikanth flink
How could I configure environment file for Flink SQL, update-mode: retract? I have this for append: properties: - key: zookeeper.connect value: localhost:2181 - key: bootstrap.servers value: localhost:9092 - key: group.id value: reconMultiAttem

Re: Anomaly in handling late arriving data

2019-09-25 Thread Zhu Zhu
Hi Indraneel, In your case, ("u1", "e12", 8L) is not considered late and will go into the session window {e7,e8,e9,e11} (range=11~19). This is because 8+3(session gap) >= 11, the lower bound of the existing session window Regarding your 3 questions: *>> 1) Event("u1", "e12", 8L) change to Event("

Re: Challenges Deploying Flink With Savepoints On Kubernetes

2019-09-25 Thread Aleksandar Mastilovic
Would you guys (Flink devs) be interested in our solution for zookeeper-less HA? I could ask the managers how they feel about open-sourcing the improvement. > On Sep 25, 2019, at 11:49 AM, Yun Tang wrote: > > As Aleksandar said, k8s with HA configuration could solve your problem. There > alrea

Re: Challenges Deploying Flink With Savepoints On Kubernetes

2019-09-25 Thread Yun Tang
As Aleksandar said, k8s with HA configuration could solve your problem. There already have some discussion about how to implement such HA in k8s if we don't have a zookeeper service: FLINK-11105 [1] and FLINK-12884 [2]. Currently, you might only have to choose zookeeper as high-availability serv

Anomaly in handling late arriving data

2019-09-25 Thread Indraneel R
Hi Everyone, I am trying to execute this simple sessionization pipeline, with the allowed lateness shown below: def main(args: Array[String]): Unit = { val env = StreamExecutionEnvironment.getExecutionEnvironment env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime) env.setPa

Re: Challenges Deploying Flink With Savepoints On Kubernetes

2019-09-25 Thread Aleksandar Mastilovic
Can’t you simply use JobManager in HA mode? It would pick up where it left off if you don’t provide a Savepoint. > On Sep 25, 2019, at 6:07 AM, Sean Hester wrote: > > thanks for all replies! i'll definitely take a look at the Flink k8s Operator > project. > > i'll try to restate the issue to

Re: [SURVEY] How many people are using customized RestartStrategy(s)

2019-09-25 Thread Zhu Zhu
We will then keep the decision that we do not support customized restart strategy in Flink 1.10. Thanks Steven for the inputs! Thanks, Zhu Zhu Steven Wu 于2019年9月26日周四 上午12:13写道: > Zhu Zhu, that is correct. > > On Tue, Sep 24, 2019 at 8:04 PM Zhu Zhu wrote: > >> Hi Steven, >> >> As a conclusio

Re: [SURVEY] How many people are using customized RestartStrategy(s)

2019-09-25 Thread Steven Wu
Zhu Zhu, that is correct. On Tue, Sep 24, 2019 at 8:04 PM Zhu Zhu wrote: > Hi Steven, > > As a conclusion, since we will have a meter metric[1] for restarts, > customized restart strategy is not needed in your case. > Is that right? > > [1] https://issues.apache.org/jira/browse/FLINK-14164 > > T

Re: Setting environment variables of the taskmanagers (yarn)

2019-09-25 Thread Peter Huang
Hi Richard, Good suggestion. I just created a Jira ticket. I will find a time this week to update docs. Best Regards Peter Huang On Wed, Sep 25, 2019 at 8:05 AM Richard Deurwaarder wrote: > Hi Peter and Jiayi, > > Thanks for the answers this worked perfectly, I just added > > containerized.m

Re: Flink job manager doesn't remove stale checkmarks

2019-09-25 Thread Fabian Hueske
Hi, You enabled incremental checkpoints. This means that parts of older checkpoints that did not change since the last checkpoint are not removed because they are still referenced by the incremental checkpoints. Flink will automatically remove them once they are not needed anymore. Are you sure t

Re: Running flink on AWS ECS

2019-09-25 Thread Navneeth Krishnan
Thanks Terry, the reason why I asked this is because somewhere I saw running one slot per container is beneficial. I couldn’t find the where I saw that. Also I think running it with multiple slots will reduce IPC since some of the data will be processed writhing the same JVM. Thanks On Wed, Sep 2

Re: Setting environment variables of the taskmanagers (yarn)

2019-09-25 Thread Richard Deurwaarder
Hi Peter and Jiayi, Thanks for the answers this worked perfectly, I just added containerized.master.env.GOOGLE_APPLICATION_CREDENTIALS=xyz and containerized.taskmanager.env.GOOGLE_APPLICATION_CREDENTIALS=xyz to my flink config and they got picked up. Do you know why this is missing from the doc

Re: Challenges Deploying Flink With Savepoints On Kubernetes

2019-09-25 Thread Vijay Bhaskar
One of the way you should do is, have a separate cluster job manager program in kubernetes, which is actually managing jobs. So that you can decouple the job control. While restarting the job, make sure to follow the below steps: a) First job manager takes save point by killing the job and notes d

Re: Multiple Job Managers in Flink HA Setup

2019-09-25 Thread Gary Yao
Hi Steve, > I also tried attaching a shared NFS folder between the two machines and > tried to set their web.tmpdir property to the shared folder, however it > appears that each job manager creates a seperate job inside that directory. You can create a fixed upload directory via the config option

Re: Challenges Deploying Flink With Savepoints On Kubernetes

2019-09-25 Thread Sean Hester
thanks for all replies! i'll definitely take a look at the Flink k8s Operator project. i'll try to restate the issue to clarify. this issue is specific to starting a job from a savepoint in job-cluster mode. in these cases the Job Manager container is configured to run a single Flink job at start-

Re: Joins Usage Clarification

2019-09-25 Thread Fabian Hueske
Hi Nishant, To answer your questions: 1) yes, the SQL time-windowed join and the DataStream API Interval Join are the same (with different implementations though) 2) DataStream Session-window joins are not directly supported in SQL. You can play some tricks to make it work, but it wouldn't be eleg

Joins Usage Clarification

2019-09-25 Thread Nishant Gupta
Hi Team, There are 3 types of window join (Tumbling, Session, and Sliding) and 1 interval Join as mentioned in (For Table API) [1] https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/stream/operators/joining.html

Re: NoClassDefFoundError in failing-restarting job that uses url classloader

2019-09-25 Thread Zhu Zhu
Yes. 1.8.2 contains all commits in 1.8.1. Subramanyam Ramanathan 于2019年9月25日周三 下午5:03写道: > Hi Zhu, > > > > Thanks a lot ! > > Since 1.8.2 is also available, would it be right to assume 1.8.2 would > also contain the fix ? > > > > Thanks, > > Subbu > > > > > > *From:* Zhu Zhu [mailto:reed...@gmai

RE: NoClassDefFoundError in failing-restarting job that uses url classloader

2019-09-25 Thread Subramanyam Ramanathan
Hi Zhu, Thanks a lot ! Since 1.8.2 is also available, would it be right to assume 1.8.2 would also contain the fix ? Thanks, Subbu From: Zhu Zhu [mailto:reed...@gmail.com] Sent: Tuesday, September 24, 2019 9:39 PM To: Subramanyam Ramanathan Cc: Dian Fu ; user@flink.apache.org Subject: Re: NoC

Re: Question about reading ORC file in Flink

2019-09-25 Thread Fabian Hueske
Thank you very much for coming back and reporting the good news! :-) If you think that there is something that we can do to improve Flink's ORC input format, for example log a warning, please open a Jira. Thank you, Fabian Am Mi., 25. Sept. 2019 um 05:14 Uhr schrieb 163 : > Hi Fabian, > > After

Re: Running flink on AWS ECS

2019-09-25 Thread Terry Wang
Hi, Navneeth, I think both is ok. IMO, run one container with number of slots same as virtual cores may be better for slots can share the Flink Framework and thus reduce memory cost. Best, Terry Wang > 在 2019年9月25日,下午3:26,Navneeth Krishnan 写道: > > Hi All, > > I’m currently running flink o

Running flink on AWS ECS

2019-09-25 Thread Navneeth Krishnan
Hi All, I’m currently running flink on amazon ecs and I have assigned task slots based on vcpus per instance. Is it beneficial to run a separate container with one slot each or one container with number of slots same as virtual cores? Thanks