PyFlink Performance

2024-04-24 Thread David Jost
Hi, I am currently evaluating PyFlink in comparison to Java and ran various tests, mainly comparing identical pipelines with a focus on throughput. To me it seems that PyFlink generally performs worse and reaches its throughput limit at a point where Java still has resources l

Re: Pyflink Performance and Benchmark

2024-04-16 Thread Chase Zhang
On Mon, Apr 15, 2024 at 16:17 Niklas Wilcke wrote: > Hi Flink Community, > > I wanted to reach out to you to get some input about Pyflink performance. > Are there any resources available about Pyflink benchmarks and maybe a > comparison with the Java API? I wasn't ab

Re: [EXTERNAL]Re: Pyflink Performance and Benchmark

2024-04-15 Thread Niklas Wilcke
> Best, > Zhanghao Chen > > From: Niklas Wilcke > Sent: Monday, April 15, 2024 15:17 > To: user > Subject: Pyflink Performance and Benchmark > > Hi Flink Community, > > I wanted to reach out to you to get some input about Pyflink performance. Are > there any resou

Re: Pyflink Performance and Benchmark

2024-04-15 Thread Zhanghao Chen
the JSON processing scenario with UDFs in Java/Python under thread mode/Python under process mode. Best, Zhanghao Chen From: Niklas Wilcke Sent: Monday, April 15, 2024 15:17 To: user Subject: Pyflink Performance and Benchmark Hi Flink Community, I wanted to reach

Pyflink Performance and Benchmark

2024-04-15 Thread Niklas Wilcke
Hi Flink Community, I wanted to reach out to you to get some input about Pyflink performance. Are there any resources available about Pyflink benchmarks and maybe a comparison with the Java API? I wasn't able to find anything valuable, but maybe I missed something? I am aware

Re: PyFlink performance and deployment issues

2021-08-15 Thread Dian Fu
f job locally, Beam uses a loopback approach to connect back to > the Python process used by the compilation job, so the job > starts up faster than in PyFlink, which creates a new Python process > to execute UDF code. > > 2. >>> However, this com

Fwd: PyFlink performance and deployment issues

2021-08-14 Thread Wouter Zorgdrager
lly submit a Python job to the standalone cluster to > run through the following command > > ./bin/start-cluster.sh > ./bin/flink run --target remote \ > -m localhost:8086 \ > -pyarch /Users/duanchen/venv/venv.zip \ > -pyexec venv.zip/venv/bin/python \ > --parallelism 1 \ >

Re: PyFlink performance and deployment issues

2021-07-08 Thread Xingbo Huang
Hi Wouter, The JIRA is https://issues.apache.org/jira/browse/FLINK-23309. `bundle time` is from the perspective of your e2e latency. Regarding the `bundle size`, a larger value generally provides better throughput, but it should not be set too large, which may cause no output to be seen downstre
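
To make the trade-off between `bundle size` and `bundle time` concrete, here is a rough back-of-the-envelope sketch. It assumes that a bundle is flushed as soon as either limit is reached and it ignores processing and transfer time entirely, so it estimates only how long the first record of a bundle may sit in the buffer. The function name and the example numbers are illustrative assumptions, not values taken from the thread or from Flink itself.

```python
def worst_case_buffer_delay_ms(events_per_second: float,
                               bundle_size: int,
                               bundle_time_ms: int) -> float:
    """Estimate how long the first record of a bundle may wait before a flush.

    Assumption: the bundle is flushed when either `bundle_size` records have
    arrived or `bundle_time_ms` has elapsed, whichever happens first.
    """
    if events_per_second <= 0:
        # With no incoming traffic only the time limit can trigger a flush.
        return float(bundle_time_ms)
    # Time needed for the incoming stream to fill one bundle.
    fill_time_ms = bundle_size * 1000.0 / events_per_second
    return min(fill_time_ms, float(bundle_time_ms))


if __name__ == "__main__":
    # Hypothetical settings: a large bundle at a modest rate is time-bound,
    # a small bundle at the same rate is size-bound.
    print(worst_case_buffer_delay_ms(1000, 100_000, 1000))
    print(worst_case_buffer_delay_ms(1000, 100, 1000))
```

Under these assumptions, a large bundle size at a low event rate means the time limit is what actually triggers each flush, so lowering `bundle time` is the more direct lever for latency, while raising `bundle size` mainly helps at high event rates.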

Re: PyFlink performance and deployment issues

2021-07-08 Thread Wouter Zorgdrager
Hi Xingbo, all, That is good to know, thank you. Is there any Jira issue I can track? I'm curious to follow this progress! Do you have any recommendations with regard to these two configuration values, to get somewhat reasonable performance? Thanks a lot! Wouter On Thu, 8 Jul 2021 at 10:26, Xing

Re: PyFlink performance and deployment issues

2021-07-08 Thread Xingbo Huang
Hi Wouter, In fact, our users have encountered the same problem. Whenever the `bundle size` or `bundle time` is reached, the data in the buffer needs to be sent from the JVM to the PVM (Python VM), and then waits for the PVM to process it and send it back to the JVM, which sends all the results to the downstream op
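
The buffering behaviour described in this message can be sketched as a small stand-alone simulation. This is not PyFlink code: the class `BundleBuffer`, its methods, and the `process_bundle` callback (standing in for the round trip to the Python worker) are hypothetical names chosen for illustration, and time is passed in explicitly so the example stays deterministic.

```python
from typing import Callable, List


class BundleBuffer:
    """Toy model of size/time-based bundling (hypothetical, not a PyFlink API)."""

    def __init__(self, max_size: int, max_time_ms: int,
                 process_bundle: Callable[[List[str]], List[str]]) -> None:
        self.max_size = max_size
        self.max_time_ms = max_time_ms
        self.process_bundle = process_bundle  # stands in for the JVM -> PVM -> JVM trip
        self._buffer: List[str] = []
        self._opened_at_ms = 0

    def add(self, record: str, now_ms: int) -> List[str]:
        """Buffer a record; flush and return results if the size limit is hit."""
        if not self._buffer:
            self._opened_at_ms = now_ms  # the bundle starts with its first record
        self._buffer.append(record)
        if len(self._buffer) >= self.max_size:
            return self._flush()
        return []

    def on_timer(self, now_ms: int) -> List[str]:
        """Flush if the current bundle has been open for at least max_time_ms."""
        if self._buffer and now_ms - self._opened_at_ms >= self.max_time_ms:
            return self._flush()
        return []

    def _flush(self) -> List[str]:
        bundle, self._buffer = self._buffer, []
        return self.process_bundle(bundle)


if __name__ == "__main__":
    buf = BundleBuffer(3, 1000, lambda bundle: [r.upper() for r in bundle])
    print(buf.add("a", 0))      # buffered, nothing emitted yet
    print(buf.on_timer(1000))   # time limit reached, bundle is flushed
```

In this model no result is visible downstream until a flush happens, which illustrates why a single event at low throughput can wait for the full time limit before it is emitted.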

Re: PyFlink performance and deployment issues

2021-07-08 Thread Wouter Zorgdrager
Hi Dian, all, I will come back to the other points asap. However, I’m still confused about this performance. Is this what I can expect in PyFlink in terms of performance? ~ 1000ms latency for single events? I also had a very simple setup where I send 1000 events to Kafka per second and response t

Re: PyFlink performance and deployment issues

2021-07-07 Thread Dian Fu
Hi Wouter, 1) Regarding the performance difference between Beam and PyFlink, I guess it’s because you are using an in-memory runner when running it locally in Beam. In that case, the code path is totally different compared to running in a remote cluster. 2) Regarding `flink run`, I’m surpr

Re: PyFlink performance and deployment issues

2021-07-07 Thread Xingbo Huang
ommand ./bin/start-cluster.sh ./bin/flink run --target remote \ -m localhost:8086 \ -pyarch /Users/duanchen/venv/venv.zip \ -pyexec venv.zip/venv/bin/python \ --parallelism 1 \ --python /Users/duanchen/sourcecode/pyflink-performance-demo/python/flink/flink-perf-test.py \ --jarfile /Users/duanchen/

PyFlink performance and deployment issues

2021-07-06 Thread Wouter Zorgdrager
Dear community, I have been struggling a lot with the deployment of my PyFlink job. Moreover, the performance seems to be very disappointing, especially the latency at low throughput. I have been playing around with configuration values, but it has not been improving. In short, I have a DataStream job