Re: Streaming data to parquet

2020-09-14 Thread Senthil Kumar
On Fri, Sep 11, 2020 at 5:16 PM Senthil Kumar ma

Re: Streaming data to parquet

2020-09-11 Thread Senthil Kumar
Hello Ayush, I am interested in knowing about your “really simple” implementation. So assuming the streaming parquet output goes to S3 bucket: Initial (partitioned by event time) Do you write another Flink batch application (step 2) which partitions the data from Initial in larger “event time

Printing effective config for Flink 1.11 CLI

2020-07-24 Thread Senthil Kumar
Hello, My understanding is that flink consumes the config from the config file as well as those specified via the -D option. I assume that the -D will override the values from the config file? Is there a way to somehow see what the effective config is? i.e. print all of the config values that f

Re: Age old stop vs cancel debate

2020-06-09 Thread Senthil Kumar
sth similar with synchronous savepoint in it and any other message afterwards with -ID in it to see if the savepoint is completed successfully. 2) could you see if this behavior persists in the FLINK-1.10? Thanks, Kostas On Tue, Jun 2, 2020 at 4:20 PM

Re: Stopping a job

2020-06-08 Thread Senthil Kumar
I am just stating this for completeness. When a job is cancelled, Flink sends an Interrupt signal to the Thread running the Source.run method. For some reason (unknown to me), this does not happen when a Stop command is issued. We ran into some minor issues because of said behavior. From: Kost

Re: Age old stop vs cancel debate

2020-06-02 Thread Senthil Kumar
Robert, Thank you once again! We are currently doing the “short” Thread.sleep() approach. Seems to be working fine. Cheers Kumar From: Robert Metzger Date: Tuesday, June 2, 2020 at 2:40 AM To: Senthil Kumar Cc: "user@flink.apache.org" Subject: Re: Age old stop vs cancel debate

Re: Age old stop vs cancel debate

2020-05-29 Thread Senthil Kumar
Kumar From: Robert Metzger Date: Friday, May 29, 2020 at 4:38 AM To: Senthil Kumar Cc: "user@flink.apache.org" Subject: Re: Age old stop vs cancel debate Hi Kumar, The way you've implemented your custom source sounds like the right way: Having a "running" flag ch

Age old stop vs cancel debate

2020-05-27 Thread Senthil Kumar
We are on Flink 1.9.0. I have a custom SourceFunction, where I rely on isRunning set to false via the cancel() function to exit out of the run loop. My run loop essentially gets the data from S3, and then simply sleeps (Thread.sleep) for a specified time interval. When a job gets cancelled, the
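
The loop described in this post can be sketched in plain Java. This is only an illustration under stated assumptions: it does not implement Flink's SourceFunction, and the names PollingLoop, fetchOnce and pollIntervalMillis are hypothetical. It shows a flag cleared by cancel(), a sleep between polls, and an exit when the sleeping thread is interrupted.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Plain-JDK sketch of a cancellable polling loop. It does NOT implement
 * Flink's SourceFunction; PollingLoop, fetchOnce and pollIntervalMillis are
 * hypothetical names used only for illustration.
 */
public class PollingLoop {
    // volatile so that a cancel() call from another thread is seen by run()
    private volatile boolean isRunning = true;
    private final long pollIntervalMillis;
    private final AtomicInteger polls = new AtomicInteger();

    public PollingLoop(long pollIntervalMillis) {
        this.pollIntervalMillis = pollIntervalMillis;
    }

    /** Stand-in for "get the data from S3"; here it only counts invocations. */
    private void fetchOnce() {
        polls.incrementAndGet();
    }

    /** Polls until cancel() clears the flag or the thread is interrupted. */
    public void run() {
        while (isRunning) {
            fetchOnce();
            try {
                Thread.sleep(pollIntervalMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt status and leave the loop.
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    /** Asks the loop to exit; it returns after the current sleep at the latest. */
    public void cancel() {
        isRunning = false;
    }

    public int pollCount() {
        return polls.get();
    }

    public static void main(String[] args) throws InterruptedException {
        PollingLoop loop = new PollingLoop(50);
        Thread worker = new Thread(loop::run);
        worker.start();
        Thread.sleep(200);
        loop.cancel();
        worker.join(1000);
        System.out.println("worker alive after cancel: " + worker.isAlive());
    }
}
```

Without an interrupt, the loop only notices cancel() once the current sleep ends, so a shorter sleep interval bounds how long shutdown can take.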

Re: Flink Streaming Job Tuning help

2020-05-13 Thread Senthil Kumar
Zhijiang, Thanks for your suggestions. We will keep it in mind! Kumar From: Zhijiang Reply-To: Zhijiang Date: Tuesday, May 12, 2020 at 10:10 PM To: Senthil Kumar , "user@flink.apache.org" Subject: Re: Flink Streaming Job Tuning help Hi Kumar, I can give some general ideas f

Re: Flink Streaming Job Tuning help

2020-05-12 Thread Senthil Kumar
I forgot to mention, we are consuming said records from AWS kinesis and writing out to S3. From: Senthil Kumar Date: Tuesday, May 12, 2020 at 10:47 AM To: "user@flink.apache.org" Subject: Flink Streaming Job Tuning help Hello Flink Community! We have a fairly intensive flink

Flink Streaming Job Tuning help

2020-05-12 Thread Senthil Kumar
Hello Flink Community! We have a fairly intensive flink streaming application, processing 8-9 million records a minute, with each record being 10k. One of our steps is a keyBy operation. We are finding that flink lags seriously behind when we introduce the keyBy (presumably because of shuffle ac

Re: Correctly implementing of SourceFunction.run()

2020-05-08 Thread Senthil Kumar
OK, thank you. Much appreciated. Yes, I don’t want the job to fail. The source has very little data that is being pumped into a Broadcast stream. From: Robert Metzger Date: Friday, May 8, 2020 at 9:51 AM To: Jingsong Li Cc: Senthil Kumar , "user@flink.apache.org" Subject: Re:

Correctly implementing of SourceFunction.run()

2020-05-07 Thread Senthil Kumar
I am implementing a source function which periodically wakes up and consumes data from S3. My current implementation is like so. Following: org.apache.flink.streaming.api.functions.source.ContinuousFileMonitoringFunction Is it safe to simply swallow any and all exceptions in the run method

Updating Closure Variables

2020-04-27 Thread Senthil Kumar
Hello Flink Community! We have a flink streaming application with a particular use case where a closure variable Set is used in a filter function. Currently, the variable is set at startup time. It’s populated from an S3 location, where several files exist (we consume the one with the last upd

Re: Location of flink-s3-fs-hadoop plugin (Flink 1.9.0 on EMR)

2020-01-24 Thread Senthil Kumar
Flink config. 2020-01-23 22:07:44,045 DEBUG org.apache.flink.runtime.util.HadoopUtils - Cannot find hdfs-site configuration-file path in Flink config. From: Aaron Langford Date: Thursday, January 23, 2020 at 12:22 PM To: Senthil Kumar Cc: Yang Wang , "user@flink.apach

Re: Location of flink-s3-fs-hadoop plugin (Flink 1.9.0 on EMR)

2020-01-23 Thread Senthil Kumar
still don’t see any debug level logs Any further info is much appreciated! From: Aaron Langford Date: Tuesday, January 21, 2020 at 10:54 AM To: Senthil Kumar Cc: Yang Wang , "user@flink.apache.org" Subject: Re: Location of flink-s3-fs-hadoop plugin (Flink 1.9.0 on EMR) Senthil,

Re: Location of flink-s3-fs-hadoop plugin (Flink 1.9.0 on EMR)

2020-01-21 Thread Senthil Kumar
.java:530) at java.lang.Thread.run(Thread.java:748) From: Yang Wang Date: Saturday, January 18, 2020 at 7:58 PM To: Senthil Kumar Cc: "user@flink.apache.org" Subject: Re: Location of flink-s3-fs-hadoop plugin (Flink 1.9.0 on EMR) I think this exception is not because the hadoop

Location of flink-s3-fs-hadoop plugin (Flink 1.9.0 on EMR)

2020-01-17 Thread Senthil Kumar
Hello all, Newbie here! We are running in Amazon EMR with the following installed in the EMR Software Configuration Hadoop 2.8.5 JupyterHub 1.0.0 Ganglia 3.7.2 Hive 2.3.6 Flink 1.9.0 I am trying to get a Streaming job from one S3 bucket into another S3 bucket using the StreamingF