PyFlink - detect if environment is a StreamExecutionEnvironment

2020-09-01 Thread Manas Kale
Hi, I am trying to submit a pyFlink job in detached mode using the command: ../../flink-1.11.0/bin/flink run -d -py basic_streaming_job.py -j flink-sql-connector-kafka_2.11-1.11.0.jar The jobs are submitted successfully but the command does not return. I realized that was because I had the follow

Re: Flink not outputting windows before all data is seen

2020-09-01 Thread David Anderson
Teodor, I've concluded this is a bug, and have reported it: https://issues.apache.org/jira/browse/FLINK-19109 Best regards, David On Sun, Aug 30, 2020 at 3:01 PM Teodor Spæren wrote: > Hey again David! > > I tried your proposed change of setting the paralilism higher. This > worked, but why do

Re: FileSystemHaServices and BlobStore

2020-09-01 Thread Yang Wang
Hi Alexey, Glad to hear that your are interested the K8s HA support. Roman's answer is just on point. "FileSystemBlobStore" is trying to store the user jars, job graph, etc. on the distributed storage(e.g. HDFS, S3, GFS). So when the JobManager failover, it could fetch the blob data from remote

Re: Re: Exception on s3 committer

2020-09-01 Thread Yun Gao
Hi Ivan, For flink sink it might commit a single file multiple times. This happens if there are failover after committing one file and then met with a failover, the the job will restarted from the latest checkpoint, and the file's state will be get back to pending and committed again. In the

Re: Packaging multiple Flink jobs from a single IntelliJ project

2020-09-01 Thread Flavio Pompermaier
Yes, the recommended way to proceed in your use case is to put all classes in a single JAR and to specify the main class to run to the flink client. Best, Flavio

Re: Idle stream does not advance watermark in connected stream

2020-09-01 Thread Aljoscha Krettek
I can only agree with Dawid, who explained it better than me... 😅 Aljoscha On 31.08.20 12:10, Dawid Wysakowicz wrote: Hey Arvid, The problem is that the StreamStatus.IDLE is set on the Task level. It is not propagated to the operator. Combining of the Watermark for a TwoInputStreamOperator hap

Re: Re: Questions of "State Processing API in Scala"

2020-09-01 Thread Matthias Pohl
Hi Izual, thanks for contributing and improving the documentation. The PR will be picked up as part of our regular maintenance work. The communication will happen through PR conversations as soon as someone picks it up. Best, Matthias On Tue, Sep 1, 2020 at 8:44 AM izual wrote: > I tried to fix

Using S3 as a streaming File source

2020-09-01 Thread orionemail
Hi, I have a S3 bucket that is continuously written to by millions of devices. These upload small compressed archives. What I want to do is treat the tar gzipped (.tgz) files as a streaming source and process each archive. The archive contains three files that each might need to be processed.

Re: Using S3 as a streaming File source

2020-09-01 Thread Jörn Franke
Why don’t you get an S3 notification on SQS and do the actions from there? You will probably need to write the content of the files to a no sql database . Alternatively send the s3 notification to Kafka and read flink from there. https://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo

Re: Using S3 as a streaming File source

2020-09-01 Thread Ayush Verma
Word of caution. Streaming from S3 is really cost prohibitive as the only way to detect new files is to continuously spam the S3 List API. On Tue, Sep 1, 2020 at 4:50 PM Jörn Franke wrote: > Why don’t you get an S3 notification on SQS and do the actions from there? > > You will probably need to

Re: Editing Rowtime for SQL Table

2020-09-01 Thread Satyam Shekhar
Thanks for your replies Matthias and Timo. Converting the Table to DataStream, assigning a new Watermark & Rowtime attribute, and converting back to Table makes sense. One challenge with that approach is that Table to DataStream conversion could emit retractable data stream, however, I think, that

Re: FileSystemHaServices and BlobStore

2020-09-01 Thread Alexey Trenikhun
Judging from FLIP-19 (thank you Roman for the link), of 3 use cases (jars, RPC messages and log files) only jar files need HA guarantee, and in my particular case, job cluster with jar as part of image, it seems doesn't matter, I guess it explains why in my test I was able to recover from failur

Task Chaining slots performance

2020-09-01 Thread Vijayendra Yadav
Hi Team, https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/#task-chaining-and-resource-groups *Flink chaining my Tasks which is like: stream.map().filter().map() * *I think here the entire chain runs in the same slot.* *Documentation says flink does chahining for bet

Re: Default Flink Metrics Graphite

2020-09-01 Thread Vijayendra Yadav
Thanks all, I could see the metrics. On Thu, Aug 27, 2020 at 7:51 AM Robert Metzger wrote: > I don't think these error messages give us a hint why you can't see the > metrics (because they are about registering metrics, not reporting them) > > Are you sure you are using the right configuration p

Re: PyFlink - detect if environment is a StreamExecutionEnvironment

2020-09-01 Thread Xingbo Huang
Hi Manas, When running locally, you need `ten_sec_summaries.get_job_client().get_job_execution_result().result()` to wait job finished. However, when you submit to the cluster, you need to delete this code. In my opinion, the current feasible solution is that you prepare two sets of codes for this

Re: PyFlink - detect if environment is a StreamExecutionEnvironment

2020-09-01 Thread Manas Kale
Hi Xingbo, Thank you for clarifying that. I am indeed maintaining a different version of the code by commenting those lines, but I was just wondering if it was possible to detect the environment programmatically. Regards, Manas On Wed, Sep 2, 2020 at 7:32 AM Xingbo Huang wrote: > Hi Manas, > >

A couple of question for Stateful Functions

2020-09-01 Thread danp
Hi, Nice to see the progress of Stateful functions. I have a few questions that I hope you can reply to. My first question is regarding the newly implemented StatefulFunctionDataStreamBuilder. Is there anything to pay attention to if one first union a couple of streams and performs a sort via a

Re: Debezium Flink EMR

2020-09-01 Thread Rex Fenley
This worked, thanks! Looking forward to the future releases :) On Mon, Aug 31, 2020 at 5:06 PM Marta Paes Moreira wrote: > Hey, Rex! > > This is likely due to the tombstone records that Debezium produces for > DELETE operations (i.e. a record with the same key as the deleted row and a > value of