Re: Waiting on streaming job with StatementSet.execute() when using Web Submission

2021-12-21 Thread Yuval Itzchakov
The job construction itself is a bit complex, but it can either be a StatementSet that's being filled, or there is some kind of conversion Table -> DataStream and then we put the transformations on the DataStream itself. Invocation looks like this: executionEffect = if (...)

Re: Waiting on streaming job with StatementSet.execute() when using Web Submission

2021-12-21 Thread Caizhi Weng
Hi! You talked about status code so I guess you're speaking about the client that submits the job, not the job itself. Flink jobs does not have "exit codes", they only have status such as RUNNING and FINISHED. When you run your user code locally, it is running in a testing mini-cluster in JVM. So

Re: Waiting on streaming job with StatementSet.execute() when using Web Submission

2021-12-21 Thread Yuval Itzchakov
I mean it finishes successful and exists with status code 0. Both when running locally and submitting to the cluster. On Wed, Dec 22, 2021, 08:36 Caizhi Weng wrote: > Hi! > > By "the streaming job stops" do you mean the job ends with CANCELED state > instead of FINISHED state? Which kind of job

Re: Waiting on streaming job with StatementSet.execute() when using Web Submission

2021-12-21 Thread Caizhi Weng
Hi! By "the streaming job stops" do you mean the job ends with CANCELED state instead of FINISHED state? Which kind of job are you running? Is it a select job or an insert job? Insert jobs should run continuously once they're submitted. Could you share your user code if possible? Yuval Itzchakov

Re: Waiting on streaming job with StatementSet.execute() when using Web Submission

2021-12-21 Thread Yuval Itzchakov
Hi Caizhi, If I don't block on statementset.execute, the job finishes immediately with exit code 0 and the streaming job stops, and that's not what I want. I somehow need to block. On Wed, Dec 22, 2021, 03:43 Caizhi Weng wrote: > Hi! > > You can poll the status of that job with REST API [1].

Re: CVE-2021-44228 - Log4j2 vulnerability

2021-12-21 Thread Debraj Manna
Any idea when can we expect https://issues.apache.org/jira/browse/FLINK-25375 to be released? On Mon, Dec 20, 2021 at 8:18 PM Martijn Visser wrote: > Hi, > > The status and Flink ticket for upgrading to Log4j 2.17.0 can be tracked > at https://issues.apache.org/jira/browse/FLINK-25375. > > Best

Re: PyFlink Perfomance

2021-12-21 Thread Francis Conroy
Hi Dian, I'll build up something similar and post it, my current test code contains proprietary information. On Wed, 22 Dec 2021 at 14:49, Dian Fu wrote: > Hi Francis, > > Could you share the benchmark code you use? > > Regards, > Dian > > On Wed, Dec 22, 2021 at 11:31 AM Francis Conroy < > fran

Re: PyFlink Perfomance

2021-12-21 Thread Dian Fu
Hi Francis, Could you share the benchmark code you use? Regards, Dian On Wed, Dec 22, 2021 at 11:31 AM Francis Conroy < francis.con...@switchdin.com> wrote: > I've just run an analysis using a similar example which involves a single > python flatmap operator and we're getting 100x less through

Re: FlinkKafkaProducer deprecated in 1.14 but pyflink binding still present?

2021-12-21 Thread Francis Conroy
Thanks for the response Dian, I made the changes to a fork of flink and have been using them. The changes aren't ready to be merged back though as a lot is missing, documentation updates, testing, etc. Thanks, Francis On Wed, 27 Oct 2021 at 13:40, Dian Fu wrote: > Hi Francis, > > Yes, you are r

Re: PyFlink Perfomance

2021-12-21 Thread Francis Conroy
I've just run an analysis using a similar example which involves a single python flatmap operator and we're getting 100x less through by using python over java. I'm interested to know if you can do such a comparison. I'm using Flink 14.0. Thanks, Francis On Thu, 18 Nov 2021 at 02:20, Thomas Portu

Re: [DISCUSS] Changing the minimal supported version of Hadoop

2021-12-21 Thread Xintong Song
Sorry to join the discussion late. +1 for dropping support for hadoop versions < 2.8 from my side. TBH, warping the reflection based logic with safeguards sounds a bit neither fish nor fowl to me. It weakens the major benefits that we look for by dropping support for early versions. - The codebas

Re: Overwriting Flink Core InputStreamFactory

2021-12-21 Thread Caizhi Weng
Hi! I see that you're submitting your job to Flink in a k8s environment. Could you explain in detail how do you submit your job? For example did you put your user jar under the lib directory and build a docker image from it? Flink regards user classes as higher priority ones, so adding the class

Re: Flink Checkpoint Duration/Job Throughput

2021-12-21 Thread Caizhi Weng
Hi! >From your description this is due to data skew. The common solution to data skew is to add a random value to your partition keys so that data can be distributed evenly into downstream operators. Could you provide more information about your job (preferably user code or SQL code), especially

Re: Waiting on streaming job with StatementSet.execute() when using Web Submission

2021-12-21 Thread Caizhi Weng
Hi! You can poll the status of that job with REST API [1]. You can tell that the job successfully finishes by the FINISHED state and that the job fails by the FAILED state. [1] https://nightlies.apache.org/flink/flink-docs-master/docs/ops/rest_api/#jobs-jobid Yuval Itzchakov 于2021年12月22日周三 02:3

Re: How to know if Job nodes are registered in cluster?

2021-12-21 Thread John Smith
Ok so only the leader will indicate it's the leader. The other just say they are waiting for a lock... On Tue., Dec. 21, 2021, 9:42 a.m. David Morávek, wrote: > Hi John, > > there is usually no need to run multiple JM, if you're able to start a new > one quickly after failure (eg. when you're ru

Re: org.apache.flink.connector.jdbc.internal.JdbcBatchingOutputFormat class missing from Flink 1.14 ?

2021-12-21 Thread Tuomas Pystynen
Timo, Thanks, that was the problem. I was using flink-connector-jdbc_2.11-1.13.2.jar. Upgrading to flink-connector-jdbc_2.11-1.14.2.jar solved my problem. Regards, Tuomas > On 21. Dec 2021, at 11.14, Timo Walther wrote: > > Hi Tuomas, > > are you sure that all dependencies have been

Overwriting Flink Core InputStreamFactory

2021-12-21 Thread AG
I included the package org.apache.flink.api.common.io.compression in my intellij project and added the class GzipInflaterInputStreamFactory. The class just redefined the method `getCommonFileExtensions` to not recognize "gzip". I need this because of google cloud's transcoding

Flink Checkpoint Duration/Job Throughput

2021-12-21 Thread Terry Heathcote
Hi We are having trouble with record throughput that we believe to be a result of slow checkpoint durations. The job uses Kafka as both a source and sink as well as a Redis-backed service within the same cluster, used to enrich the data in a transformation, before writing records back to Kafka. Be

Waiting on streaming job with StatementSet.execute() when using Web Submission

2021-12-21 Thread Yuval Itzchakov
Hi, Flink 1.14.2 Scala 2.12 I have a streaming job that executes and I want to infinitely wait for it's completion, or if an exception is thrown during initialization. When using *statementSet.execute().await()*, I get an error: Caused by: org.apache.flink.util.FlinkRuntimeException:* The Job Re

Re: Kryo EOFException: No more bytes left

2021-12-21 Thread Dan Hill
I was not able to reproduce it by re-running the same job with an updated kryo library. The join doesn't do anything special. On Sun, Dec 19, 2021 at 4:58 PM Dan Hill wrote: > I'll retry the job to see if it's reproducible. The serialized state is > bad so that run keeps failing. > > On Sun, De

Re: Avoiding Dynamic Classloading for User Code

2021-12-21 Thread David Morávek
"+ *setting class-loading strategy to parent-first *could" ... otherwise the classes will be always loaded from the jar provided via the REST API On Tue, Dec 21, 2021 at 5:46 PM Lior Liviev wrote: > So again, after putting the jar in that folder I don’t need to configure > anything else? > > Get

Re: [DISCUSS] Changing the minimal supported version of Hadoop

2021-12-21 Thread David Morávek
CC user@f.a.o Is anyone aware of something that blocks us from doing the upgrade? D. On Tue, Dec 21, 2021 at 5:50 PM David Morávek wrote: > Hi Martijn, > > from person experience, most Hadoop users are lagging behind the release > lines by a lot, because upgrading a Hadoop cluster is not reall

Re: Avoiding Dynamic Classloading for User Code

2021-12-21 Thread Lior Liviev
So again, after putting the jar in that folder I don’t need to configure anything else? Get Outlook for iOS From: David Morávek Sent: Tuesday, December 21, 2021 6:39:10 PM To: Lior Liviev Cc: user Subject: Re: Avoiding Dynamic Classloadin

Re: Avoiding Dynamic Classloading for User Code

2021-12-21 Thread David Morávek
hmm, with this approach I can only think about not really nice solutions... I guess putting a jar into `/lib` folder + setting class-loading strategy to parent-first could do the trick (load everything in the "main" class loader), but then using this endpoint / deployment path for submission kind o

Re: Avoiding Dynamic Classloading for User Code

2021-12-21 Thread Lior Liviev
Yes, I’m using "/jars/:jarid/run" Get Outlook for iOS From: David Morávek Sent: Tuesday, December 21, 2021 6:08:51 PM To: Lior Liviev ; user Subject: Re: Avoiding Dynamic Classloading for User Code CAUTION: external source Please always

Re: Class loader

2021-12-21 Thread David Morávek
to answer that, I need a better understanding of how exactly you deploy the jobs. Please keep the conversation focused in the second thread, that discusses the problem D. On Tue, Dec 21, 2021 at 5:09 PM Lior Liviev wrote: > I don’t have the Jar in Flinks folder yet it’s just that I don’t know i

Re: Avoiding Dynamic Classloading for User Code

2021-12-21 Thread David Morávek
Please always include the ML in the reply-list, so other can participate in the discussion / learn from the findings we are aware of multiple issues when web-submission can result in classloader / thread local leaks, which could potentially result in the behavior you're describing. We're working o

Re: Class loader

2021-12-21 Thread David Morávek
> > if I have my user code Jar in Flink > I assume this means that the user code is already on Flink's class path (eg. /lib folder). Then it depends on how the class-loader is set up [1] and on how you submit the job. For session cluster, each job is isolated in a separate classloader and by defa

Re: Class loader

2021-12-21 Thread David Morávek
I assume this is a duplicate of the previous thread [1] [1] https://lists.apache.org/thread/16kxytrqycolfwfmr5tv0g6bq9m2wvof Best, D. On Tue, Dec 21, 2021 at 3:53 PM Lior Liviev wrote: > Hello, I wanted to know if I have my user code Jar in Flink, and I'm > running it 3 times, will the class l

Re: Avoiding Dynamic Classloading for User Code

2021-12-21 Thread David Morávek
Hi Lior, can you please provide details about the steps (I'm not sure what load jar / execute with the API means)? are you submitting the job using the REST API or Flink CLI? I assume you're using a session cluster. also what is the concern here? do you run into any class-loading related issues?

Class loader

2021-12-21 Thread Lior Liviev
Hello, I wanted to know if I have my user code Jar in Flink, and I'm running it 3 times, will the class loader take the same classes at every execution?

Avoiding Dynamic Classloading for User Code

2021-12-21 Thread Lior Liviev
Hello, I have existing fixed cluster (not a new one with every job execution) and a single Jar +multiple executions with different params. Currently my procedure is: 1. Download Jar 2. Load Jar with API 3. Execute with API. I plan to avoid dynamic class loading by applying method described in:

Re: How to know if Job nodes are registered in cluster?

2021-12-21 Thread David Morávek
Hi John, there is usually no need to run multiple JM, if you're able to start a new one quickly after failure (eg. when you're running on kubernetes). There is always only single active leader and other JMs effectively do nothing besides competing for the leadership. Zookeeper based HA uses the De

Re: Read parquet data from S3 with Flink 1.12

2021-12-21 Thread Seth Wiesman
Hi Alexandre, You are correct, BatchTableEnvironment does not exist in 1.14 anymore. In 1.15 we will have the state processor API ported to DataStream for exactly this reason, it is the last piece to begin officially marking DataSet as deprecated. As you can understand, this has been a multi year

Re: flink-playground docker/mvn clean install Unknown host repo.maven.apache.org: Name or service not known

2021-12-21 Thread David Morávek
I'd say that the playground was never meant to be run behind a corporate proxy. If you feel like the documentation can be improved improved in this regard, please open a new jira issue describing what's missing. Ideally if you'd like to fix this for others with the similar setup that may run into t

Re: flink-playground docker/mvn clean install Unknown host repo.maven.apache.org: Name or service not known

2021-12-21 Thread HG
Hello David, Yes I understand the issues. But the problem is that the documentation on https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/try-flink/flink-operations-playground/ says: Starting the Playground #

Re: flink-playground docker/mvn clean install Unknown host repo.maven.apache.org: Name or service not known

2021-12-21 Thread David Morávek
oh, I've misread your email... the proxies are generally tricky to reason about, as each software needs to implement the proxy support on its own in general, most tools try to use the commonly "agreed upon" environment variables (HTTP_PROXY & HTTPS_PROXY), which is also case for wget JVM uses jav

Re: flink-playground docker/mvn clean install Unknown host repo.maven.apache.org: Name or service not known

2021-12-21 Thread HG
Hi David When I start a docker container based on the image that is created shortly before the mvn command fails, it is resolvable. I can do : apt update This basically means that the outside world is reachable When I install wget the following succeeds without any issue: wget https://repo.maven.

Re: org.apache.flink.connector.jdbc.internal.JdbcBatchingOutputFormat class missing from Flink 1.14 ?

2021-12-21 Thread Timo Walther
Hi Tuomas, are you sure that all dependencies have been upgraded to Flink 1.14. Connector dependencies that still reference Flink 1.13 might cause issues. JdbcBatchingOutputFormat has been refactored in this PR: https://github.com/apache/flink/pull/16528 I hope this helps. Regards, Timo On

Re: flink-playground docker/mvn clean install Unknown host repo.maven.apache.org: Name or service not known

2021-12-21 Thread David Morávek
Hello Hans, it's DNS ;) You need to make sure, that "repo.maven.apache.org" can be resolved from your docker container (you can use tools such as host, dig, nslookup to verify that). This is may be tricky to debug, unless you're familiar with networking. A good place to start might be checking the