Hi,
I am currently evaluating PyFlink in comparison to Java and ran various
tests, mainly comparing identical pipelines with a focus on throughput.
To me it seems that PyFlink generally performs worse and reaches
its limits in throughput at a point where Java still has resources l
On Mon, Apr 15, 2024 at 16:17 Niklas Wilcke wrote:
> Hi Flink Community,
>
> I wanted to reach out to you to get some input about Pyflink performance.
> Are there any resources available about Pyflink benchmarks and maybe a
> comparison with the Java API? I wasn't ab
> Best,
> Zhanghao Chen
>
> From: Niklas Wilcke
> Sent: Monday, April 15, 2024 15:17
> To: user
> Subject: Pyflink Performance and Benchmark
>
> Hi Flink Community,
>
> I wanted to reach out to you to get some input about Pyflink performance. Are
> there any resou
the JSON processing
scenario with UDFs in Java/Python under thread mode/Python under process mode.
Best,
Zhanghao Chen
From: Niklas Wilcke
Sent: Monday, April 15, 2024 15:17
To: user
Subject: Pyflink Performance and Benchmark
Hi Flink Community,
I wanted to reach
Hi Flink Community,
I wanted to reach out to you to get some input about PyFlink performance. Are
there any resources available about PyFlink benchmarks and maybe a comparison
with the Java API? I wasn't able to find anything valuable, but maybe I missed
something?
I am aware
f job locally, Beam will use loopback to connect back to
> the Python process used by the compilation job, so the job will start up
> faster than in PyFlink, which creates a new Python process
> to execute UDF code.
>
> 2. >>> However, this com
lly submit a Python job to the standalone cluster to
> run through the following command
>
> ./bin/start-cluster.sh
> ./bin/flink run --target remote \
> -m localhost:8086 \
> -pyarch /Users/duanchen/venv/venv.zip \
> -pyexec venv.zip/venv/bin/python \
> --parallelism 1 \
>
Hi Wouter,
The JIRA is https://issues.apache.org/jira/browse/FLINK-23309. `bundle
time` should be chosen from the perspective of your e2e latency. Regarding
the `bundle size`, generally a larger value will provide better throughput,
but it should
not be set too large, which may cause no output to be seen downstre
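For reference, the two values discussed here correspond to the configuration keys `python.fn-execution.bundle.size` and `python.fn-execution.bundle.time` (in milliseconds). A minimal sketch of writing them out as entries for the Flink configuration file could look like the following. The helper name `render_flink_conf` and the numbers used are only illustrative assumptions, not recommended settings:

```python
def render_flink_conf(bundle_size: int, bundle_time_ms: int) -> str:
    """Render the two PyFlink bundle options as 'key: value' config lines.

    Hypothetical helper for illustration; the values passed in are up to you.
    """
    if bundle_size <= 0 or bundle_time_ms <= 0:
        raise ValueError("bundle size and bundle time must be positive")
    settings = {
        # Maximum number of elements buffered before a bundle is sent.
        "python.fn-execution.bundle.size": bundle_size,
        # Maximum time (ms) to wait before a bundle is sent.
        "python.fn-execution.bundle.time": bundle_time_ms,
    }
    return "\n".join(f"{key}: {value}" for key, value in settings.items())


# Illustrative values only: a smaller bundle than the default, flushed sooner.
print(render_flink_conf(bundle_size=1000, bundle_time_ms=100))
```

Lower values should reduce the waiting time per bundle at the cost of more round trips, so they are worth measuring against your own throughput requirements.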
Hi Xingbo, all,
That is good to know, thank you. Is there any Jira issue I can track? I'm
curious to follow this progress! Do you have any recommendations with
regard to these two configuration values, to get somewhat reasonable
performance?
Thanks a lot!
Wouter
On Thu, 8 Jul 2021 at 10:26, Xing
Hi Wouter,
In fact, our users have encountered the same problem. Whenever the `bundle
size` or `bundle time` is reached, the data in the buffer needs to be sent
from the jvm to the pvm, and then waits for the pvm to be processed and
sent back to the jvm to send all the results to the downstream op
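To illustrate why this buffering matters at low event rates, here is a toy model of the behaviour described above. It is a simplification and not PyFlink's actual implementation: the function `simulate_bundling` is hypothetical, it ignores processing and transfer time, and it assumes a bundle is flushed either when it holds `bundle_size` elements or when `bundle_time_ms` has passed since its first element arrived.

```python
from typing import List


def simulate_bundling(arrival_times_ms: List[float],
                      bundle_size: int,
                      bundle_time_ms: float) -> List[float]:
    """Return how long each element waits (ms) before its bundle is flushed."""
    latencies: List[float] = []
    buffer: List[float] = []  # arrival times of the buffered elements

    def flush(at_ms: float) -> None:
        latencies.extend(at_ms - t for t in buffer)
        buffer.clear()

    for t in arrival_times_ms:
        # A time-based flush may already be due before this element arrives.
        if buffer and t - buffer[0] >= bundle_time_ms:
            flush(buffer[0] + bundle_time_ms)
        buffer.append(t)
        # A size-based flush happens as soon as the bundle is full.
        if len(buffer) >= bundle_size:
            flush(t)
    if buffer:
        # Remaining elements wait until the time limit is reached.
        flush(buffer[0] + bundle_time_ms)
    return latencies


# Ten events, one every 100 ms (a low-throughput stream).
arrivals = [i * 100 for i in range(10)]

slow = simulate_bundling(arrivals, bundle_size=100000, bundle_time_ms=1000)
fast = simulate_bundling(arrivals, bundle_size=100000, bundle_time_ms=100)
print(max(slow), sum(slow) / len(slow))  # 1000 550.0
print(max(fast))                         # 100
```

In this model, a stream that never fills the bundle waits up to the full `bundle time`, which would be consistent with latencies around one second if the bundle time is left at 1000 ms.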
Hi Dian, all,
I will come back to the other points asap. However, I’m still confused
about this performance. Is this what I can expect in PyFlink in terms of
performance? ~ 1000ms latency for single events? I also had a very simple
setup where I send 1000 events to Kafka per second and response
t
Hi Wouter,
1) Regarding the performance difference between Beam and PyFlink, I guess it’s
because you are using an in-memory runner when running it locally in Beam. In
that case, the code path is totally different compared to running in a remote
cluster.
2) Regarding `flink run`, I’m surpr
ommand
./bin/start-cluster.sh
./bin/flink run --target remote \
-m localhost:8086 \
-pyarch /Users/duanchen/venv/venv.zip \
-pyexec venv.zip/venv/bin/python \
--parallelism 1 \
--python /Users/duanchen/sourcecode/pyflink-performance-demo/python/flink/flink-perf-test.py \
--jarfile /Users/duanchen/
Dear community,
I have been struggling a lot with the deployment of my PyFlink job.
Moreover, the performance seems to be very disappointing, especially the
latency at low throughput. I have been playing around with configuration
values, but it has not been improving.
In short, I have a Datastream job