Re: PyFlink cluster runtime issue

2020-08-28 Thread Manas Kale
Hi Xingbo, Thanks, that worked. Just to make sure, the only way currently to submit a pyFlink job is through the command line right? Can I do that through the GUI? On Fri, Aug 28, 2020 at 8:17 PM Xingbo Huang wrote: > Hi Manas, > > I think you forgot to add kafka jar[1] dependency. You can use t

Exception on s3 committer

2020-08-28 Thread Ivan Yang
Hi all, We got this exception after a job restart. Does anyone know what may lead to this situation? and how to get pass this Checkpoint issue? Prior to this, the job failed due to “Checkpoint expired before completing.” We are s3 heavy, writing out 10K files to s3 every 10 minutes using Stream

Re: Debezium Flink EMR

2020-08-28 Thread Rex Fenley
Awesome, so that took me a step further. When running i'm receiving an error however. FYI, my docker-compose file is based on the Debezium mysql tutorial which can be found here https://debezium.io/documentation/reference/1.2/tutorial.html Part of the stack trace: flink-jobmanager_1 | Caused

Re: Batch version of StreamingFileSink.forRowFormat(...)

2020-08-28 Thread Dan Hill
Thanks! I'll take a look. On Tue, Aug 11, 2020 at 1:33 AM Timo Walther wrote: > Hi Dan, > > InputFormats are the connectors of the DataSet API. Yes, you can use > either readFile, readCsvFile, readFileOfPrimitives etc. However, I would > recommend to also give Table API a try. The unified Table

Re: Flink Task Slots

2020-08-28 Thread Vijayendra Yadav
Please ignore, last Email, Its working now by adding more parallelism. On Fri, Aug 28, 2020 at 12:39 PM Vijayendra Yadav wrote: > Hi Team, > > > https://ci.apache.org/projects/flink/flink-docs-release-1.9/ops/config.html#configuring-taskmanager-processing-slots > > > In my Flinkkafkaconsumer st

Re: Flink OnCheckpointRollingPolicy streamingfilesink

2020-08-28 Thread Vijayendra Yadav
Hi Andrey, Thanks, what is recommendation for : env.getCheckpointConfig. *setMaxConcurrentCheckpoints*(concurrentchckpt) ? 1 or higher based on what factor. Regards, Vijay On Tue, Aug 25, 2020 at 8:55 AM Andrey Zagrebin wrote: > Hi Vijay, > > I think it depends on your job requirements, in

Flink Task Slots

2020-08-28 Thread Vijayendra Yadav
Hi Team, https://ci.apache.org/projects/flink/flink-docs-release-1.9/ops/config.html#configuring-taskmanager-processing-slots In my Flinkkafkaconsumer streaming process. I have kafka with 3 partitions. I am getting parallelism = 3 (-p 3). Here 3 taskmangers are launched but each taskmanager g

Issues with Flink Batch and Hadoop dependency

2020-08-28 Thread Dan Hill
I'm assuming I have a simple, common setup problem. I've spent 6 hours debugging and haven't been able to figure it out. Any help would be greatly appreciated. *Problem* I have a Flink Streaming job setup that writes SequenceFiles in S3. When I try to create a Flink Batch job to read these Seq

flink watermark strategy

2020-08-28 Thread Vijayendra Yadav
Hi Team, For regular unbounded streaming application streaming through kafka, which does not use any event time or window operations, does setting watermark strategy for kafkaconsumer (connector) help us in any way like performance ? Regards, Vijay

Re: FileSystemHaServices and BlobStore

2020-08-28 Thread Alexey Trenikhun
Motivation is to have k8s HA setup without extra component - Zookeeper, see [1] Purpose of BlobStore is vague to me, what kind of BLOBs are stored? Looks like if we start job from savepoint, then persistence of BlobStore is not necessary, but is it needed if we recover from checkpoint? Thanks,

user@flink.apache.org

2020-08-28 Thread Sofya T. Irwin
Hi Danny, Thank you for your response. I'm trying to join two streams that are both fairly high volume. My join looks like this: SELECT A.rowtime as rowtime, A.foo, B.bar FROM A LEFT JOIN B ON A.foo = B.foo AND A.rowtime BETWEEN B.rowtime - INTERVAL '1' HOUR AND B.rowti

Re: FileSystemHaServices and BlobStore

2020-08-28 Thread Khachatryan Roman
Hello Alexey, I think you need FileSystemBlobStore as you are implementing HA Services, and BLOBs should be highly available too. However, I'm a bit concerned about the direction in general: it essentially means re-implementing ZK functionality on top of FS. What are the motivation and the use cas

FileSystemHaServices and BlobStore

2020-08-28 Thread Alexey Trenikhun
Hello, I'm thinking about implementing FileSystemHaServices - single leader, but persistent RunningJobRegistry, CheckpointIDCounter, CompletedCheckpointStore and JobGraphStore. I'm not sure do you need FileSystemBlobStore or VoidBlobStore is enough. Can't figure out, should BlobStore survive Job

Re: PyFlink cluster runtime issue

2020-08-28 Thread Xingbo Huang
Hi Manas, I think you forgot to add kafka jar[1] dependency. You can use the argument -j of the command line[2] or the Python Table API to specify the jar. For details about the APIs of adding Java dependency, you can refer to the relevant documentation[3] [1] https://ci.apache.org/projects/flink

PyFlink cluster runtime issue

2020-08-28 Thread Manas Kale
Hi, I am trying to deploy a pyFlink application on a local cluster. I am able to run my application without any problems if I execute it as a normal python program using the command : python myApplication.py My pyFlink version is __version__ = "1.11.0". I had installed this pyFlink through conda/pi

Re: [DISCUSS] FLIP-134: DataStream Semantics for Bounded Input

2020-08-28 Thread Dawid Wysakowicz
@Aljoscha Let me bring back to the ML some of the points we discussed offline. Ad. 1 Yes I agree it's not just about scheduling. It includes more changes to the runtime. We might need to make it more prominent in the write up. Ad. 2 You have a good point here that switching the default value for

Re:Re: Re: [ANNOUNCE] New PMC member: Dian Fu

2020-08-28 Thread Haibo Sun
Congratulations Dian ! Best, Haibo At 2020-08-27 18:03:38, "Zhijiang" wrote: >Congrats, Dian! > > >-- >From:Yun Gao >Send Time:2020年8月27日(星期四) 17:44 >To:dev ; Dian Fu ; user >; user-zh >Subject:Re: Re: [ANNOUNCE] New PMC member

Re: Flink Migration

2020-08-28 Thread Yun Tang
Hi Navneeth First of all, I suggest to upgrade Flink version to latest version. And you could refer here [1] for the savepoint compatibility when upgrading Flink. For the problem that cannot connect address, you could login your pod and run 'nslookup jobmanager' to see whether the host could be

Re: [ANNOUNCE] New PMC member: Dian Fu

2020-08-28 Thread Dian Fu
Thanks you all! It's my honor and pleasure to work with you in such a great community. Regards, Dian > 在 2020年8月28日,下午2:40,Till Rohrmann 写道: > > Congrats, Dian! > > Cheers, > Till > > On Fri, Aug 28, 2020 at 8:33 AM Wei Zhong wrote: > >> Congratulations Dian! >> >>> 在 2020年8月28日,14:29,Jin

Flink Migration

2020-08-28 Thread Navneeth Krishnan
Hi All, We are currently on a very old version of flink 1.4.0 and it has worked pretty well. But lately we have been facing checkpoint timeout issues. We would like to minimize any changes to the current pipelines and go ahead with the migration. With that said our first pick was to migrate to 1.5

Re: The frequency flink push metrics to pushgateway?

2020-08-28 Thread Chesnay Schepler
You can configure the reporter interval; please see this example . On 28/08/2020 08:49, wangl...@geekplus.com wrote: Using pr

The frequency flink push metrics to pushgateway?

2020-08-28 Thread wangl...@geekplus.com
Using prometheus and pushgateway to monitor my flink cluster. For self defined metrics, counter.inc() will be called for every invoke method I want to know when the actual counter number is pushed to pushgateway? Is is a fixed frequency? If i can set the frequency? Thanks, Lei wangl...@g