Hi Wouter,

In fact, our users have encountered the same problem. Whenever the `bundle size` or `bundle time` threshold is reached, the buffered data has to be sent from the JVM to the PVM (Python VM). The JVM then waits for the PVM to process the data and send the results back before it can emit them to the downstream operator. This round trip causes a large delay, especially for small events, because small messages are hard to process in a pipelined fashion.
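To illustrate the buffering behaviour described above, here is a minimal sketch in plain Python. It is only a toy model, not PyFlink's actual implementation: the class `BundleBuffer` and its parameters `max_size` and `max_wait_ms` are hypothetical names standing in for `bundle size` and `bundle time`, time is simulated with integer milliseconds, and the JVM/PVM round trip itself is not modelled.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class BundleBuffer:
    """Toy buffer that flushes when a size or a time threshold is reached."""

    max_size: int      # hypothetical stand-in for `bundle size`
    max_wait_ms: int   # hypothetical stand-in for `bundle time`
    _events: List[Tuple[int, str]] = field(default_factory=list, init=False)
    _opened_at_ms: int = field(default=0, init=False)

    def add(self, now_ms: int, event: str) -> List[Tuple[str, int]]:
        """Buffer an event; flush immediately if the size threshold is hit."""
        if not self._events:
            # The first event of a bundle starts the time window.
            self._opened_at_ms = now_ms
        self._events.append((now_ms, event))
        if len(self._events) >= self.max_size:
            return self.flush(now_ms)
        return []

    def on_timer(self, now_ms: int) -> List[Tuple[str, int]]:
        """Flush if the bundle has been open for at least max_wait_ms."""
        if self._events and now_ms - self._opened_at_ms >= self.max_wait_ms:
            return self.flush(now_ms)
        return []

    def flush(self, now_ms: int) -> List[Tuple[str, int]]:
        """Return (event, time spent waiting in the buffer in ms) pairs."""
        flushed = [(event, now_ms - arrived) for arrived, event in self._events]
        self._events.clear()
        return flushed


# A single event with a large size threshold waits for the time threshold.
slow = BundleBuffer(max_size=1000, max_wait_ms=1000)
slow.add(0, "a")
print(slow.on_timer(500))    # [] : still buffered
print(slow.on_timer(1000))   # [('a', 1000)] : flushed after 1000 ms

# With a size threshold of 1 the same event is flushed immediately.
fast = BundleBuffer(max_size=1, max_wait_ms=1000)
print(fast.add(0, "a"))      # [('a', 0)]
```

In this model a lone event sits in the buffer until the time threshold expires. That would be consistent with the roughly 1000 ms single-event latency reported in this thread if the time threshold were about one second, but that value is an assumption and is not stated here.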
I have been working on this problem recently, and I plan to include this optimization in release-1.14.

Best,
Xingbo

Wouter Zorgdrager <zorgdrag...@gmail.com> wrote on Thu, Jul 8, 2021 at 3:41 PM:

> Hi Dian, all,
>
> I will come back to the other points asap. However, I’m still confused
> about this performance. Is this what I can expect in PyFlink in terms of
> performance? ~ 1000ms latency for single events? I also had a very simple
> setup where I send 1000 events to Kafka per second and response
> times/latencies were around 15 seconds for single events. I understand there
> is some Python/JVM overhead but since Flink is so performant, I would
> expect much better numbers. In the current situation, PyFlink would just be
> unusable if you care about latency. Is this something that you expect to be
> improved in the future?
>
> I will verify how this works out for Beam in a remote environment.
>
> Thanks again!
> Wouter
>
>
> On Thu, 8 Jul 2021 at 08:28, Dian Fu <dian0511...@gmail.com> wrote:
>
>> Hi Wouter,
>>
>> 1) Regarding the performance difference between Beam and PyFlink, I guess
>> it’s because you are using an in-memory runner when running it locally in
>> Beam. In that case, the code path is totally different compared to
>> running in a remote cluster.
>> 2) Regarding `flink run`, I’m surprised that it’s running locally.
>> Could you submit a java job with similar commands to see how it runs?
>> 3) Regarding `flink run-application`, could you share the exception
>> stack?
>>
>> Regards,
>> Dian
>>
>> On Jul 6, 2021, at 4:58 PM, Wouter Zorgdrager <zorgdrag...@gmail.com> wrote:
>>
>> uses