Hey guys, I have been trying to benchmark Pulsar and gather end-to-end (E2E) latency data from it, specifically stressing it with respect to E2E latency. Namely, we have been trying to stress it in a way that produces a gradual E2E latency increase in the range of, say, 5ms to 50ms.
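
(For context, the E2E latency we are talking about is basically publish-time-to-receive-time measured on the consumer side. The snippet below is only a rough sketch of that idea using the plain Pulsar Java client, with a placeholder service URL and topic name; our actual load generation goes through the OpenMessaging benchmark workers, not this code.)

```java
import java.util.concurrent.TimeUnit;

import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;

public class E2ELatencyProbe {
    public static void main(String[] args) throws Exception {
        // Placeholder broker URL and topic; the real runs use the
        // OpenMessaging benchmark workers rather than this hand-rolled client.
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        String topic = "non-persistent://public/default/e2e-latency-probe";

        Consumer<byte[]> consumer = client.newConsumer()
                .topic(topic)
                .subscriptionName("e2e-probe-sub")
                .subscribe();

        Producer<byte[]> producer = client.newProducer()
                .topic(topic)
                .create();

        for (int i = 0; i < 1000; i++) {
            // Publish a payload, then measure the gap between the broker's
            // publish timestamp and the time the consumer receives it.
            // Assumes producer/consumer clocks are in sync (e.g. same host),
            // so treat this strictly as an illustration of the measurement.
            producer.send(new byte[1024]);
            Message<byte[]> msg = consumer.receive(5, TimeUnit.SECONDS);
            if (msg != null) {
                long e2eMillis = System.currentTimeMillis() - msg.getPublishTime();
                System.out.println("E2E latency: " + e2eMillis + " ms");
                consumer.acknowledge(msg);
            }
        }

        producer.close();
        consumer.close();
        client.close();
    }
}
```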
We have encountered some strange behavior doing this, and I was wondering if you had any insights on how to generate the information we are trying to gather. The weird behavior we are seeing:

1. The clients run out of resources before the broker does for a non-persistent workload. Using the OpenMessaging benchmark, we run workloads with the number of topics varying from 3 to 243 and rates up to 600,000 msg/s, on non-persistent topics with one broker. We are finding that with fewer than 6 local workers acting as producers and consumers, the clients usually run out of resources first, skewing our E2E latency statistics. Is this in line with what you see? Is it normal for clients to run out of resources before the broker starts showing a latency increase?

2. If you raise the rate on a single topic, that topic sees higher E2E latency. But if you add another topic at a lower rate, the lower-rate topic's E2E latency does not increase until resource utilization becomes constrained. We ran the following experiment (1 broker, non-persistent topics): 2 local workers run a workload on 50 topics at an aggregate rate of 200,000 msg/s, while 2 other local workers run a workload on 1 topic at 10,000 msg/s. The single topic at 10,000 msg/s sees latency around 2ms, while the "background" workload of 50 topics at 200,000 msg/s sees latency in the 20ms range. Do you have any idea why we would see this behavior?

Overall, we are looking to stress the brokers, without stressing the clients first, to see how topic count and message rate affect Pulsar as a whole. Rather than a linear or otherwise explainable increase in latency, we have been seeing a fairly flat latency curve (2-5ms) followed by a huge spike (to somewhere around 100ms) at some workload level. Do you know of a way to produce a more gradual, normalized latency increase that is not driven by resource exhaustion (our first guess as to why the spikes are so large)?

Would this be better suited for the users list? I wasn't really sure which one to send to, but the devs may have a better idea about some of the behavior we are seeing.

Thanks,
Tyler Landle