Optimizing Flink joins

2021-02-10 Thread Dan Hill
Hi! I was curious if there are docs on how to optimize Flink joins. I looked around the Flink docs and didn't see much. I see a little on the Configuration page. E.g. one of my jobs has an interval join. Does left vs. right matter for an interval join?
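
For readers unfamiliar with the semantics behind the left-vs-right question, here is a minimal pure-Python sketch of what an interval join computes. It assumes the usual definition: a right element matches a left element with the same key when its timestamp lies in [left_ts + lower, left_ts + upper]. The function name `interval_join` and the sample records are hypothetical, and the sketch says nothing about performance or state size in Flink itself.

```python
def interval_join(left, right, lower, upper):
    """Join (key, timestamp, value) records from two inputs.

    A pair is emitted when the keys are equal and
    left_ts + lower <= right_ts <= left_ts + upper.
    This is a naive nested loop, for illustration only.
    """
    out = []
    for lkey, lts, lval in left:
        for rkey, rts, rval in right:
            if lkey == rkey and lts + lower <= rts <= lts + upper:
                out.append((lval, rval))
    return out


# Hypothetical sample data: (key, timestamp, value)
left = [("a", 10, "L10"), ("a", 20, "L20")]
right = [("a", 12, "R12"), ("a", 18, "R18"), ("b", 11, "Rb")]

pairs = interval_join(left, right, lower=-2, upper=5)

# Swapping the inputs and mirroring the bounds yields the same matches,
# with each pair reversed.
swapped = interval_join(right, left, lower=-5, upper=2)
```

In this model the two sides are symmetric as far as results go: swapping the inputs and negating and exchanging the bounds gives the same set of matches. Whether one ordering is cheaper in a real job is a separate question that this sketch does not address.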

clarification on backpressure metrics in Apache Flink Dashboard

2021-02-10 Thread Marco Villalobos
Given: [source] -> [operator 1] -> [operator 2] -> [sink]. If within the dashboard, operator 1 shows that it has backpressure, does that mean I need to improve the performance of operator 2 in order to alleviate backpressure upon operator 1?
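
The question can be explored with a toy model. The sketch below is a discrete-time simulation with bounded buffers, not Flink's actual credit-based flow control; the function `simulate`, the capacities, and the buffer size are all assumptions made for illustration. A stage counts as blocked in a tick when it had records ready to emit but the next stage's input buffer was full.

```python
def simulate(capacities, buffer_size, ticks, source_rate):
    """Simulate a linear pipeline with bounded input buffers.

    capacities[i] is how many records stage i can process per tick;
    the last stage is the sink. Returns (blocked_ticks, source_blocked),
    where blocked_ticks[i] counts ticks in which stage i could not emit
    everything it had ready because the downstream buffer was full.
    """
    n = len(capacities)
    buffers = [0] * n
    blocked_ticks = [0] * n
    source_blocked = 0
    for _ in range(ticks):
        # Walk from the sink backwards so space freed downstream
        # is visible to upstream stages within the same tick.
        for i in reversed(range(n)):
            want = min(capacities[i], buffers[i])
            if i == n - 1:
                moved = want  # the sink simply consumes records
            else:
                free = buffer_size - buffers[i + 1]
                moved = min(want, free)
                if moved < want:
                    blocked_ticks[i] += 1
                buffers[i + 1] += moved
            buffers[i] -= moved
        # The source emits into the first stage's input buffer.
        free = buffer_size - buffers[0]
        emitted = min(source_rate, free)
        if emitted < source_rate:
            source_blocked += 1
        buffers[0] += emitted
    return blocked_ticks, source_blocked


# Stages: [operator 1, operator 2, sink]; operator 2 is the slow one.
slow_op2 = simulate([5, 1, 5], buffer_size=4, ticks=20, source_rate=3)
# Same pipeline with operator 2 sped up.
fast_op2 = simulate([5, 5, 5], buffer_size=4, ticks=20, source_rate=3)
```

In this model the slow stage itself is never blocked; the stage directly upstream of it (operator 1) and eventually the source are. That matches the common reading of a backpressured task as one that is blocked on output, which points at something downstream as the bottleneck.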

Re: Should flink job manager crash during zookeeper upgrade?

2021-02-10 Thread Barisa Obradovic
Great, thank you for the help, Matthias.

Re: How does Flink handle short-lived keyed streams

2021-02-10 Thread narasimha
Thanks Matthias, I think it will help to find out what all live keys are present. Let me check and revert back on the thread. On Wed, Feb 10, 2021 at 10:46 PM Matthias Pohl wrote: > Hi narashima, > not sure whether this fits your use case, but have you considered creating > a savepoint and anal

Re: Should flink job manager crash during zookeeper upgrade?

2021-02-10 Thread Matthias Pohl
Hi Barisa, thanks for sharing this. I'm gonna add Till to this thread. He might have some insights. Best, Matthias On Wed, Feb 10, 2021 at 4:19 PM Barisa Obradovic wrote: > I'm trying to understand if behaviour of the flink jobmanager during > zookeeper upgrade is expected or not. > > I'm runni

Re: ClassLoader leak when using s3a upload through DataSet.output

2021-02-10 Thread Vishal Santoshi
As in https://github.com/aws/aws-sdk-java/blob/41a577e3f667bf5efb3d29a46aaf210bf70483a1/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/transfer/TransferManager.java#L2378 never gets called as it is never GCed... On Wed, Feb 10, 2021 at 10:47 AM Vishal Santoshi wrote: > Thank you, > > Th

Re: How does Flink handle short-lived keyed streams

2021-02-10 Thread Matthias Pohl
Hi narashima, not sure whether this fits your use case, but have you considered creating a savepoint and analyzing it using the State Processor API [1]? Best, Matthias [1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/libs/state_processor_api.html#state-processor-api On Wed, Feb 10,

Re: Flink’s Kubernetes HA services - NOT working

2021-02-10 Thread Matthias Pohl
Hi Daniel, what's the exact configuration you used? Did you use the resource definitions provided in the Standalone Flink on Kubernetes docs [1]? Did you do certain things differently in comparison to the documentation? Best, Matthias [1] https://ci.apache.org/projects/flink/flink-docs-release-1.

Re: How does Flink handle short-lived keyed streams

2021-02-10 Thread narasimha
It is not solving the problem. I can see the memory keep increasing, resulting in a lot of long GCs. There could be a memory leak; I just want to know how to find out whether older keys are still alive, even after the pattern has been satisfied or the within range of the pattern has expired. Can someone sugg

Re: S3 parquet files as Sink in the Table SQL API

2021-02-10 Thread Matthias Pohl
Hi, have you tried using the bundled Hadoop uber jar [1]? It looks like some Hadoop dependencies are missing. Best, Matthias [1] https://flink.apache.org/downloads.html#additional-components On Wed, Feb 10, 2021 at 1:24 PM meneldor wrote: > Hello, > I am using PyFlink and I want to write records f

statefun: Unable to find a source translation for ingress

2021-02-10 Thread Filip Karnicki
Hi, I modified the Stateful Functions 2.2.0 async example to include a real binding to Kafka. I included statefun-flink-distribution and stateful-kafka-io in the pom and created a fat jar using the maven-assembly-plugin, and my Flink cluster complains about: java.lang.IllegalStateException: Unab

Should flink job manager crash during zookeeper upgrade?

2021-02-10 Thread Barisa Obradovic
I'm trying to understand whether the behaviour of the Flink jobmanager during a ZooKeeper upgrade is expected or not. I'm running Flink 1.11.2 in Kubernetes, with ZooKeeper server 3.5.4-beta. While I'm doing the ZooKeeper upgrade, there are 20 seconds of ZooKeeper downtime. I'd expect to either flink job to restar

Re: ClassLoader leak when using s3a upload through DataSet.output

2021-02-10 Thread Chesnay Schepler
FileSystems must not be bundled in the user jar. You must place them in lib/ or plugins/, because by bundling it you break our assumption that they exist for the lifetime of the cluster (which in turn means we don't really have to worry about cleaning up). On 2/10/2021 4:01 PM, Vishal Santosh

Re: ClassLoader leak when using s3a upload through DataSet.output

2021-02-10 Thread Vishal Santoshi
com/amazonaws/services/s3/transfer/TransferManager.class is in flink-s3-fs-hadoop-1.11.2.jar which is in the plugins and that AFAIK should have a dedicated ClassLoader per plugin. So does it make sense that these classes remain beyond the job and so does the executor service for multipart upload

Re: Re: flink kryo exception

2021-02-10 Thread Piotr Nowojski
Hi, As Kezhu Wang pointed out, this MIGHT BE caused by the https://issues.apache.org/jira/browse/FLINK-21028 issue. During stop with savepoint procedure, source thread might be interrupted, leaving the whole application in an invalid and inconsistent state. In FLINK-1.12.x one potential symptom i

Re: ClassLoader leak when using s3a upload through DataSet.output

2021-02-10 Thread Vishal Santoshi
We do put the flink-hadoop-uber*.jar in the Flink lib (and thus available to the root classloader). That still does not explain the executor service outliving the job. On Tue, Feb 9, 2021 at 6:49 PM Vishal Santoshi wrote: > Hello folks, > We see threads from > https://github.c

S3 parquet files as Sink in the Table SQL API

2021-02-10 Thread meneldor
Hello, I am using PyFlink and I want to write records from the Table SQL API as Parquet files on AWS S3. I followed the documentation, but it seems that I'm missing some dependencies and/or configuration. Here is the SQL: > CREATE TABLE sink_table( > `id` VARCHAR, > `type` VARCHAR, >

Re: State Access Beyond RichCoFlatMapFunction

2021-02-10 Thread Kezhu Wang
> Actually, my use case is that I want to share the state of one stream in two other streams. Right now, I can think of connecting this stream independently with each of the two other streams and manage the state twice, effectively duplicating it. > Only the matching keys (with the two other strea

Re: State Access Beyond RichCoFlatMapFunction

2021-02-10 Thread Sandeep khanzode
Hi, Yes, but the stream, whose state I want to share, will be indefinite and have a large volume. Also, not all keys from that stream have to go to every Task Node. Only the matching keys (with the two other streams) will do. Please let me know if there is another cleaner way to achieve this.

Re: Enabling allowNonRestoredState when resuming from a savepoint causes ClassNotFoundException

2021-02-10 Thread Roman Khachatryan
Hi Dongwon, With State Processor API you should be able to create a new snapshot that doesn't reference the unused classes. Regards, Roman On Tue, Feb 9, 2021 at 3:39 AM Dongwon Kim wrote: > Hi Khachatryan, > > Thanks for the explanation and the input! > > 1. Use the State Processor API to cr

Re: question on ValueState

2021-02-10 Thread Roman Khachatryan
Right, in this case FileSystemStateBackend is the right choice. The state size is limited by TM memory as you said. Regards, Roman On Tue, Feb 9, 2021 at 8:54 AM yidan zhao wrote: > What I am interested in is whether I should use rocksDB to replace > fileBackend. > RocksDB's performance is not