Re: Dynamically creating new Task Managers in YARN

2019-11-20 Thread vino yang
Hi Jingsong, Thanks for the explanation about the mechanism of the new Flink session cluster mode. Because I mostly use job cluster mode, so did not have a good knowledge of the new Flink session cluster mode. Best, Vino Jingsong Li 于2019年11月21日周四 下午2:46写道: > Hi Piper and Vino: > > Current Fl

Re: Dynamically creating new Task Managers in YARN

2019-11-20 Thread Jingsong Li
Hi Piper and Vino: Current Flink version, the resources of Flink Session cluster are unrestricted, which means if the requested resources exceed the resources owned by the current session, it will apply to the RM of yarn for new resources. And if TaskManager is idle for too long, JM will release i

Re: Cron style for checkpoint

2019-11-20 Thread Congxian Qiu
Hi Currently, Flink does not support such feature, from what you describe, does set an appropriate timeout for checkpoint can solve your problem? Best, Congxian shuwen zhou 于2019年11月21日周四 下午12:06写道: > Hi Jiayi, > It would be great if Flink could have a user defined interface for user to > imp

Re: Dynamically creating new Task Managers in YARN

2019-11-20 Thread vino yang
Hi Piper, The understanding of two deploy modes For Flink on Yarn is right. AFAIK, The single job (job cluster) mode is more popular than Session mode. Because job cluster mode, Flink let YARN manage resources as far as possible. And this mode can keep isolation from other jobs. IMO, we do not

Re: Streaming data to Segment

2019-11-20 Thread Yuval Itzchakov
Hi Li, You're in the right direction. One additional step would be to use RickSinkFunction[Data] instead of SinkFunction[Data] which exposes open and close functions which allow you to initialize and dispose resources properly. On Thu, 21 Nov 2019, 5:23 Li Peng, wrote: > Hey folks, I'm interest

Re: CoGroup SortMerger performance degradation from 1.6.4 - 1.9.1?

2019-11-20 Thread Zhijiang
Hi Andreas, You are running a batch job, so there should be no native memory used by rocked state backend. Then I guess it is either heap memory or direct memory over used. The heap managed memory is mainly used by batch operators and direct memory is used by network shuffle. Can you further ch

Re: Cron style for checkpoint

2019-11-20 Thread shuwen zhou
Hi Jiayi, It would be great if Flink could have a user defined interface for user to implement to control checkpoint behavior, at least for time related behavior. I brought up a wish on JIRA [1], perhaps it described clearly enough. [1] https://issues.apache.org/jira/browse/FLINK-14884 On Thu, 2

Re: Dynamically creating new Task Managers in YARN

2019-11-20 Thread Piper Piper
Hi Vino, I want to implement Resource Elasticity. In doing so, I have read that Flink with YARN has two modes: Job and Session. In Job mode, Flink’s Resource Manager requests YARN for containers with TMs, and then gives the containers back to YARN upon job completion. In Session mode, Flink alre

Re:Cron style for checkpoint

2019-11-20 Thread Jiayi Liao
Hi Shuwen, As far as I know, Flink can only support checkpoint with a fixed interval. However I think the flexible mechanism of triggering checkpoint is worth working on, at least from my perspective. And it may not only be a cron style. In our business scenario, the data traffic usually

Streaming data to Segment

2019-11-20 Thread Li Peng
Hey folks, I'm interested in streaming some data to Segment , using their existing java library. This is a pretty high throughput stream, so I wanted for each parallel operator to have its own instance of the segment client. From what I could tell, def

Cron style for checkpoint

2019-11-20 Thread shuwen zhou
Hi Community, I would like to know if there is a existing function to support cron style checkpoint? The case is, our data traffic is huge on HH:30 for each hour. We don't wont checkpoint to fall in that range of time. A cron like 15,45 * * * * to set for checkpoint would be nice. If a checkpoint i

Re: Completed job wasn't saved to archive

2019-11-20 Thread vino yang
If everything is OK(your config options about archive dir and history server is correct), Flink should archive the completed job. You said you did not find any exceptions in the log about failing to archive. But any other exceptions? Can you share the logs about your scene? Best, Vino Pavel Pots

Re: Dynamically creating new Task Managers in YARN

2019-11-20 Thread vino yang
Hi Piper, Can you share more reason and details of your requirements. Best, Vino Piper Piper 于2019年11月21日周四 上午5:48写道: > Hi, > > How can I make Flink's Resource Manager request YARN to spin up new (or > destroy/reclaim existing) TaskManagers in YARN containers? > > Preferably at runtime (i.e. d

Re: Re:Apache Flink - Operator name and uuid best practices

2019-11-20 Thread Congxian Qiu
Hi Currently, the last part (uuid a4d87cda-2afd-47d4-8d3f-b0658466fb2d) is generated by UUID.randomUUID(), so there is not a easy way to map this to the assigned in the application. In another word, the last part (uuid a4d87cda-2afd-47d4-8d3f-b0658466fb2d) belongs to one checkpoint, and the assign

StreamingFileSink duplicate data

2019-11-20 Thread Lei Nie
Hello, I would like clarification on the StreamingFileSink, thank you. >From my testing, it seems that resuming job from checkpoint does *not* also restore the rolling part counter. E.g, job may have stopped with last file: *part-6-71* But when resuming from most recent checkpoint: *part-6-89* (

Re: Re:Apache Flink - Operator name and uuid best practices

2019-11-20 Thread M Singh
Hi Arvid: Thanks for your clarification. I am giving supplying uid for the stateful operators and find the following directory structure on in the chkpoint directory: f4e78cb47f9dc12859558be7d15f39d0/chk-6/a4d87cda-2afd-47d4-8d3f-b0658466fb2d The first part f4e78cb47f9dc12859558be7d15f39d0 is th

Dynamically creating new Task Managers in YARN

2019-11-20 Thread Piper Piper
Hi, How can I make Flink's Resource Manager request YARN to spin up new (or destroy/reclaim existing) TaskManagers in YARN containers? Preferably at runtime (i.e. dynamically). Thank you Piper

Completed job wasn't saved to archive

2019-11-20 Thread Pavel Potseluev
Hi all, We see occasionally that flink doesn't save information about canceled job to archive directory (configured by jobmanager.archive.fs.dir property). And there are no exceptions in the log about failing archiving. It's a problem in our use case because our script for deploying jobs relies on

RE: CoGroup SortMerger performance degradation from 1.6.4 - 1.9.1?

2019-11-20 Thread Hailu, Andreas
Going through the release notes today - we tried fiddling with the taskmanager.memory.fraction option, going as low as 0.1 with unfortunately no success. It still leads to the container running beyond physical memory limits. // ah From: Hailu, Andreas [Engineering] Sent: Tuesday, November 19, 2

Flink Forward North America 2020 - Call for Presentations open until January 12th, 2020

2019-11-20 Thread Fabian Hueske
Hi all, Flink Forward North America returns to San Francisco on March 23-25, 2020. For the first time in North America, the conference will feature two days of talks and one day of training. We are happy to announce that the Call for Presentations is open! If you'd like to give a talk and share

RE: YARN : Different cutoff for job and task managers

2019-11-20 Thread Gwenhael Pasquiers
This works; I had some issues modifying my scripts, but it’s OK and I could confirm by JMX that env.java.opts.jobmanager had priority over the “normal” heap size (calculated from cutoff). Thanks ! 😊 From: Yang Wang Sent: mercredi 20 novembre 2019 03:52 To: Gwenhael Pasquiers Cc: user@flink.ap

Re: Does apache flink support stream input from Postgresql ?

2019-11-20 Thread Theo Diefenthal
Hi, You could check out Debezium [1] and get your database CDC into a kafka stream which can then be read nicely by flink. Best regards Theo [1] https://debezium.io/documentation/reference/1.0/tutorial.html Von: "Yu Watanabe" An: "user" Gesendet: Mittwoch, 20. November 2019 12:56:18

Re: Does apache flink support stream input from Postgresql ?

2019-11-20 Thread Zhenghua Gao
The jdbc connector can read data from PostgreSQL for Table/SQL users. For pyflink, cc @Hequn *Best Regards,* *Zhenghua Gao* On Wed, Nov 20, 2019 at 7:56 PM Yu Watanabe wrote: > Hello . > > I would like to ask question about possibility of stream read table rows > from PostgresQL using pyflink.

Does apache flink support stream input from Postgresql ?

2019-11-20 Thread Yu Watanabe
Hello . I would like to ask question about possibility of stream read table rows from PostgresQL using pyflink. In use cases in online document, there is a example which is reading data from transactional database. https://flink.apache.org/img/usecases-datapipelines.png

Re: Re: IOException when using Prometheus Monitor

2019-11-20 Thread wangl...@geekplus.com.cn
After downgrade the pushgateway to pushgateway-0.8.0.linux-amd64.tar.gz, no this Exception again. Thanks very much. wangl...@geekplus.com.cn From: wangl...@geekplus.com.cn Date: 2019-11-20 18:19 To: Chesnay Schepler; user Subject: Re: Re: IOException when using Prometheus Monitor Hi Chesnay

Re: Re: IOException when using Prometheus Monitor

2019-11-20 Thread wangl...@geekplus.com.cn
Hi Chesnay, Although there's Exception, actually the metrics has been put to the pushgateway successfully. Promethues version i used: prometheus-2.8.1.linux-amd64.tar.gz pushgateway-1.0.0.linux-amd64.tar.gz flink-metrics-prometheus_2.12-1.8.2.jar I just download the tar.gz file to CentOS node

Re: IOException when using Prometheus Monitor

2019-11-20 Thread Chesnay Schepler
From what I found so far this appears to be an incompatibility between the pushgateway and client version. So you can either a) use an older version of the pushgateway b) bump the version of the prometheus reporter. Unfortunately I cannot tell you which version you would need. On 20/11/2019 10

IOException when using Prometheus Monitor

2019-11-20 Thread wangl...@geekplus.com.cn
Standalone session flink, 1.8.2 version. Install prometheus and pushgateway on the same host. After start-cluster.sh, every 10 seconds there's following error log: 2019-11-20 17:23:12,849 WARN org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter - Failed to push metrics to PushG

RE: YARN : Different cutoff for job and task managers

2019-11-20 Thread Gwenhael Pasquiers
I see, good idea, I’ll try that and tell you the result. Thanks, From: Yang Wang Sent: mercredi 20 novembre 2019 03:52 To: Gwenhael Pasquiers Cc: user@flink.apache.org Subject: Re: YARN : Different cutoff for job and task managers Hi Gwenhael, I'm afraid that we could not set different cut-o

Re: How to estimate the memory size of flink state

2019-11-20 Thread 刘建刚
Thank you. Your suggestion is good and I benefit a lot. For my case, I want to know the state memory size for other reasons. When the the gc pressure is bigger, I need to limit the source or discard some data from the source to ensure job’s running. If the state size is bigger, I ne