Re: Setting source vs sink vs window parallelism with data increase

2019-03-01 Thread Padarn Wilson
Hi all again - following up on this I think I've identified my problem as being something else, but would appreciate if anyone can offer advice. After running my stream from sometime, I see that my garbage collector for old generation starts to take a very long time: [image: Screen Shot 2019-03-02

Flink 1.7.1 Inaccessible

2019-03-01 Thread Seye Jin
I am getting "service temporarily unavailable due to an ongoing leader election" when I try to access Flink UI. The jobmanager has HA configured, I have tried to restart jobmanager multiple times but no luck. I also tried submitting my job from console but I also get the same message. When I view l

[1.7.1] job stuck in suspended state

2019-03-01 Thread Steven Wu
We have observe that sometimes job stuck in suspended state, and no job restart/recover were attempted once job is suspended. * it is a high-parallelism job (like close to 2,000) * there were a few job restarts before this * there were high GC pause during the period * zookeeper timeout. probably c

Re: Checkpoints and catch-up burst (heavy back pressure)

2019-03-01 Thread Ken Krugler
Hi Arnaud, 1. What’s your checkpoint configuration? Wondering if you’re writing to HDFS, and thus the load you’re putting on it while catching up & checkpointing is too high. If so, then you could monitor the TotalLoad metric (FSNamesystem) in your source, and throttle back the emitting of fil

RE: Checkpoints and catch-up burst (heavy back pressure)

2019-03-01 Thread LINZ, Arnaud
Hi, I think I should go into more details to explain my use case. I have one non parallel source (parallelism = 1) that list binary files in a HDFS directory. DataSet emitted by the source is a data set of file names, not file content. These filenames are rebalanced, and sent to workers (paralle

Kafka consumer do not commit offset at checkpoint

2019-03-01 Thread Andy Hoang
Hi all, I posted a bug here but its seem is my configuration problem: https://issues.apache.org/jira/browse/FLINK-11335 so I resend this to mailing list My env: AWS EMR 5.20: hadoop, flink plugin flink: 1.62/1.70 run under yarn-cluster K

Re: How do I compute the average and keep track of a state over a window in DataStream?

2019-03-01 Thread Felipe Gutierrez
thanks Congxian. I will check Process Function over windows. *--* *-- Felipe Gutierrez* *-- skype: felipe.o.gutierrez* *--* *https://felipeogutierrez.blogspot.com * On Fri, Mar 1, 2019 at 8:16 AM Congxian Qiu wrote: > Hi Felipe > > Maybe you could use pro

Flink Custom SourceFunction and SinkFunction

2019-03-01 Thread Siew Wai Yow
Hi guys, I have question regarding to the title that need your expertise, 1. I need to build a SFTP SourceFunction, may I know if hadoop SFTPFileSystem suitable? 2. I need to build a SFTP SinkFunction as well, may I know if per-defined HDFS rolling file sink accept SFTP connection since