Zorgdrager <zorgdrag...@gmail.com>
Date: Thu, 8 Jul 2021 at 12:20
Subject: Re: PyFlink performance and deployment issues
To: Xingbo Huang <hxbks...@gmail.com>
Hi Xingbo, all,
Regarding point 2, I actually made a mistake there. I picked port 8081
(WebUI port) rather than the job submission port (--target remote -m
localhost:8081). For some reason, this does not give an error or warning
and just
Hi Wouter,
The JIRA is https://issues.apache.org/jira/browse/FLINK-23309. Choose `bundle
time` based on your e2e latency requirements. As for `bundle size`, a larger
value generally gives better throughput, but it should not be set too large,
which may cause no output to be seen downstre
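For reference, here is a small sketch of how the two settings could be written out as configuration entries. It assumes the option keys are `python.fn-execution.bundle.size` and `python.fn-execution.bundle.time` (in milliseconds), which is how I understand the Flink Python configuration; please check them against your Flink version. The helper names and the values 1000 and 100 are hypothetical and only for illustration.

```python
# Option keys as I understand them from the Flink Python configuration;
# verify them for your Flink version. Helper names are hypothetical.
BUNDLE_SIZE_KEY = "python.fn-execution.bundle.size"  # max elements per bundle
BUNDLE_TIME_KEY = "python.fn-execution.bundle.time"  # max wait per bundle (ms)


def bundle_options(size: int, time_ms: int) -> dict:
    """Build the two bundle options as string-valued config entries."""
    if size <= 0 or time_ms <= 0:
        raise ValueError("bundle size and bundle time must be positive")
    return {BUNDLE_SIZE_KEY: str(size), BUNDLE_TIME_KEY: str(time_ms)}


def to_flink_conf(options: dict) -> str:
    """Render options as `key: value` lines, e.g. for a flink-conf.yaml file."""
    return "\n".join(f"{key}: {value}" for key, value in options.items())


# Illustrative values only: smaller bundles and a shorter wait favour latency.
print(to_flink_conf(bundle_options(size=1000, time_ms=100)))
```

Lower values favour latency and higher values favour throughput, so the right numbers depend on the event rate of the job.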
Hi Xingbo, all,
That is good to know, thank you. Is there any Jira issue I can track? I'm
curious to follow this progress! Do you have any recommendations with
regard to these two configuration values, to get somewhat reasonable
performance?
Thanks a lot!
Wouter
On Thu, 8 Jul 2021 at 10:26, Xing
Hi Wouter,
In fact, our users have encountered the same problem. Whenever the `bundle
size` or `bundle time` is reached, the data in the buffer needs to be sent
from the JVM to the PVM, which processes it and sends the results back to
the JVM, which then sends all the results to the downstream op
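To make the latency effect of this buffering concrete, here is a toy model of a buffer that flushes when either a size limit or a time limit is reached. It is only a sketch of the general idea, not PyFlink's actual implementation: the function `simulate` and all numbers are hypothetical, and the cost of the JVM/PVM round trip itself is ignored.

```python
from typing import List


def simulate(arrivals_ms: List[int], max_size: int, max_wait_ms: int) -> List[int]:
    """Return, per event, how long it sat in the buffer before its bundle flushed.

    A bundle flushes when it holds `max_size` events or when `max_wait_ms`
    has passed since its first event arrived, whichever comes first.
    `arrivals_ms` must be sorted in ascending order.
    """
    delays: List[int] = []
    buffer: List[int] = []  # arrival times of events in the open bundle
    for t in arrivals_ms:
        # Time trigger: the open bundle expired before this event arrived.
        if buffer and t - buffer[0] >= max_wait_ms:
            flush_at = buffer[0] + max_wait_ms
            delays.extend(flush_at - a for a in buffer)
            buffer = []
        buffer.append(t)
        # Size trigger: the bundle is full, flush immediately.
        if len(buffer) >= max_size:
            delays.extend(t - a for a in buffer)
            buffer = []
    if buffer:  # the last bundle is flushed by the time trigger
        flush_at = buffer[0] + max_wait_ms
        delays.extend(flush_at - a for a in buffer)
    return delays


one_per_ms = list(range(1000))  # 1000 events, one per millisecond
print(max(simulate([0], 100_000, 1000)))        # single event, large bundle
print(max(simulate(one_per_ms, 100_000, 1000)))  # size limit never reached
print(max(simulate(one_per_ms, 10, 1000)))       # size limit reached often
```

In this model, a low event rate combined with a large size limit means every bundle is flushed by the timer, so the worst-case buffering delay equals the time limit. A smaller size limit shortens the delay at the cost of more, smaller bundles.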
Hi Dian, all,
I will come back to the other points asap. However, I’m still confused
about this performance. Is this what I can expect in PyFlink in terms of
performance? ~ 1000ms latency for single events? I also had a very simple
setup where I send 1000 events to Kafka per second and response
t
Hi Wouter,
1) Regarding the performance difference between Beam and PyFlink, I guess it’s
because you are using an in-memory runner when running it locally in Beam. In
that case, the code path is totally different from running in a remote
cluster.
2) Regarding `flink run`, I’m surpr
Hi Wouter,
Sorry for the late reply. I will try to answer your questions in detail.
1. >>> Performance problem.
When running a UDF job locally, Beam uses a loopback mode to connect back
to the Python process used to compile the job, so the job starts up
faster than pyflin
Dear community,
I have been struggling a lot with the deployment of my PyFlink job.
Moreover, the performance seems to be very disappointing, especially the
latency at low throughput. I have been playing around with configuration
values, but it has not been improving.
In short, I have a DataStream job