Re: PyFlink performance and deployment issues

2021-08-15 Thread Dian Fu
…Wouter Zorgdrager <zorgdrag...@gmail.com>
> Date: Thu, 8 Jul 2021 at 12:20
> Subject: Re: PyFlink performance and deployment issues
> To: Xingbo Huang <hxbks...@gmail.com>
>
> Hi Xingbo, all,
>
> Regarding point 2, I actually made a mistake there. I…

Fwd: PyFlink performance and deployment issues

2021-08-14 Thread Wouter Zorgdrager
…Subject: Re: PyFlink performance and deployment issues
To: Xingbo Huang

Hi Xingbo, all,

Regarding point 2, I actually made a mistake there. I picked port 8081 (the WebUI port) rather than the job submission port (`--target remote -m localhost:8081`). For some reason, this does not give an error or warning and just…

Re: PyFlink performance and deployment issues

2021-07-08 Thread Xingbo Huang
Hi Wouter, The JIRA is https://issues.apache.org/jira/browse/FLINK-23309. `bundle time` should be chosen based on your end-to-end (e2e) latency requirements. Regarding `bundle size`, a larger value generally provides better throughput, but it should not be set too large, as that may cause no output to be seen downstream…
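To make the trade-off concrete, here is a back-of-the-envelope sketch in Python. It assumes a simplified model in which a bundle is flushed as soon as either `bundle size` elements have arrived or `bundle time` has elapsed (configured via `python.fn-execution.bundle.size` and `python.fn-execution.bundle.time`), and it ignores the Python processing time itself. The function name `estimate_bundle_behaviour` is hypothetical and not part of PyFlink.

```python
from typing import Tuple


def estimate_bundle_behaviour(events_per_sec: float,
                              bundle_size: int,
                              bundle_time_ms: float) -> Tuple[float, float]:
    """Hypothetical helper for a simplified bundling model.

    Returns (max_buffer_wait_ms, flushes_per_sec) for a steady input rate.
    Python-side processing time and the return trip are ignored.
    """
    if events_per_sec <= 0 or bundle_size <= 0 or bundle_time_ms <= 0:
        raise ValueError("rate, bundle size and bundle time must be positive")
    # Time it takes to collect `bundle_size` elements at this input rate.
    fill_time_ms = bundle_size / events_per_sec * 1000.0
    # Whichever trigger (size or time) fires first closes the bundle.
    flush_interval_ms = min(fill_time_ms, bundle_time_ms)
    # The first element of a bundle waits the whole interval before it
    # is handed to the Python worker.
    max_buffer_wait_ms = flush_interval_ms
    # Each flush costs one JVM -> Python -> JVM round trip.
    flushes_per_sec = 1000.0 / flush_interval_ms
    return max_buffer_wait_ms, flushes_per_sec


# Roughly the setup discussed in this thread: 1000 events per second.
print(estimate_bundle_behaviour(1000, 100_000, 1000))  # large bundle: time trigger wins
print(estimate_bundle_behaviour(1000, 10, 1000))       # small bundle: size trigger wins
```

With a bundle size far above what arrives within `bundle time`, the time trigger always fires first, so buffering alone can account for up to a full `bundle time` of latency. Shrinking `bundle size` lowers that wait but multiplies the number of round trips per second, which is where the throughput cost comes from.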

Re: PyFlink performance and deployment issues

2021-07-08 Thread Wouter Zorgdrager
Hi Xingbo, all, That is good to know, thank you. Is there a Jira issue I can track? I'm curious to follow the progress! Do you have any recommendations for these two configuration values to get reasonable performance? Thanks a lot! Wouter On Thu, 8 Jul 2021 at 10:26, Xingbo…

Re: PyFlink performance and deployment issues

2021-07-08 Thread Xingbo Huang
Hi Wouter, In fact, our users have encountered the same problem. Whenever the `bundle size` or `bundle time` is reached, the data in the buffer has to be sent from the JVM to the Python VM (PVM); the JVM then waits for the PVM to process it and send the results back, before emitting all of the results to the downstream operator…
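The flush behaviour described here can be sketched as a small event-driven simulation. This is a simplified model under stated assumptions, not PyFlink's implementation: a bundle is flushed when it holds `bundle_size` elements or when `bundle_time_ms` has passed since its first element arrived, and the time spent in the Python VM and on the way back is left out. `simulate_bundles` is a hypothetical name.

```python
from typing import List, Optional


def simulate_bundles(arrivals_ms: List[float],
                     bundle_size: int,
                     bundle_time_ms: float) -> List[float]:
    """Simplified model of JVM-side bundling (hypothetical, not PyFlink code).

    For each event, in arrival order, return how long it waited in the JVM
    buffer before its bundle was flushed to the Python VM.
    """
    waits: List[float] = []
    buffer: List[float] = []
    bundle_start: Optional[float] = None

    def flush(at_ms: float) -> None:
        # Send the whole buffer at time `at_ms` and record each element's wait.
        waits.extend(at_ms - a for a in buffer)
        buffer.clear()

    for t in sorted(arrivals_ms):
        # Time trigger: the open bundle expired before this event arrived.
        if buffer and bundle_start is not None and t - bundle_start >= bundle_time_ms:
            flush(bundle_start + bundle_time_ms)
        if not buffer:
            bundle_start = t  # the first element opens a new bundle
        buffer.append(t)
        # Size trigger: flush as soon as the bundle is full.
        if len(buffer) >= bundle_size:
            flush(t)

    # A trailing partial bundle is eventually flushed by the time trigger.
    if buffer and bundle_start is not None:
        flush(bundle_start + bundle_time_ms)
    return waits


# One event per second, with a bundle size far larger than the input rate.
print(simulate_bundles([0, 1000, 2000], bundle_size=100, bundle_time_ms=1000))
```

In this model, a trickle of one event per second with a large bundle size makes every event wait the full bundle time before it even reaches the Python worker, which is the low-throughput latency problem discussed in this thread.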

Re: PyFlink performance and deployment issues

2021-07-08 Thread Wouter Zorgdrager
Hi Dian, all, I will come back to the other points asap. However, I’m still confused about the performance. Is this what I can expect from PyFlink: ~1000 ms latency for single events? I also had a very simple setup where I send 1000 events per second to Kafka and response…

Re: PyFlink performance and deployment issues

2021-07-07 Thread Dian Fu
Hi Wouter, 1) Regarding the performance difference between Beam and PyFlink, I suspect it’s because you are using an in-memory runner when running locally in Beam. In that case, the code path is completely different from running on a remote cluster. 2) Regarding `flink run`, I’m surprised…

Re: PyFlink performance and deployment issues

2021-07-07 Thread Xingbo Huang
Hi Wouter, Sorry for the late reply. I will try to answer your questions in detail. 1. >>> Performance problem. When running a UDF job locally, Beam uses loopback mode to connect back to the Python process used to compile the job, so the job starts up faster than with PyFlink…

PyFlink performance and deployment issues

2021-07-06 Thread Wouter Zorgdrager
Dear community, I have been struggling a lot with the deployment of my PyFlink job. Moreover, the performance is very disappointing, especially the latency at low throughput. I have been experimenting with configuration values, but nothing has improved it. In short, I have a DataStream job…