I'll do some further experimentation based on what you mention. Thanks
On Monday, February 5, 2018 at 5:01:15 PM UTC-8, Carl Mastrangelo wrote:
>
> Ah, I thought you were trying to measure the latency of a single RPC. We have 2 QPS benchmarks, an open loop and a closed loop benchmark. The closed loop runs the single-RPC latency benchmark in 200 parallel copies, which means there are only ever 200 active RPCs at a time. The latency is recorded, but not published anywhere.
>
> From your description, the open-loop benchmark sounds more like what you are doing. We have a client with a target QPS that uses an exponentially distributed delay between starting RPCs. This simulates real traffic better and has occasional bursts of RPCs. We use this to measure CPU while holding the QPS constant.
>
> Larger payloads making the system faster is odd, and may be explained by your benchmark machine. For example, if there is no work for gRPC to do, it will go to sleep. When the amount of work is too low, it spends a lot of time waking up and going back to sleep, lowering overall performance. Strangely, by adding more work (with bigger payloads), the system never goes to sleep and thus accomplishes more real work. We work around this by keeping the machine as close to 100% CPU as possible without going over. Additionally, we disable CPU frequency scaling to ensure stable results. (The CPU down-clocks while waiting for network traffic, and doesn't speed back up fast enough when there is data.)
>
> We benchmark almost exclusively on Linux.
>
> On Monday, February 5, 2018 at 4:32:55 PM UTC-8, [email protected] wrote:
>>
>> We actually have 8 threads sending bursts of requests simultaneously and measuring each request individually. We use bursts of requests followed by a wait to avoid hammering the server with a huge number of requests.
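The open-loop pacing described above (a target QPS with exponentially distributed delays between RPC starts, independent of when responses return) can be sketched as follows. This is a minimal illustration, not gRPC's benchmark code: `StartRpc` is a hypothetical placeholder where a real client would kick off an async gRPC call.

```cpp
#include <chrono>
#include <random>
#include <thread>

// Hypothetical stand-in for issuing one async RPC (not the real gRPC API).
static int g_started = 0;
void StartRpc() { ++g_started; }

// Open-loop load generator: fire RPCs with exponentially distributed
// inter-arrival gaps whose mean is 1/target_qps. Because the gaps are
// exponential, the generator naturally produces occasional bursts.
void RunOpenLoop(double target_qps, int total_rpcs) {
    std::mt19937 rng(42);  // fixed seed for reproducible pacing
    std::exponential_distribution<double> gap(target_qps);  // mean = 1/QPS
    for (int i = 0; i < total_rpcs; ++i) {
        StartRpc();  // do NOT wait for completion: that would close the loop
        std::this_thread::sleep_for(std::chrono::duration<double>(gap(rng)));
    }
}
```

The key property is that the send rate is fixed by the schedule, so slow responses queue up instead of throttling the client, which is what makes this shape of load closer to real traffic.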
>> It seems you are describing a single client that sends one request and waits for the response before sending another. We are not doing that; we are simulating a kind of QPS approximation and measuring the latency.
>>
>> The behavior I'm seeing is that smaller payloads are slower than bigger payloads. I was thinking it might have to do with some buffer taking longer to fill before being sent over the wire.
>>
>> The results you mention: are they running on the Windows stack?
>>
>> Thanks
>>
>> Eduardo
>>
>> On Monday, February 5, 2018 at 4:24:15 PM UTC-8, Carl Mastrangelo wrote:
>>>
>>> By closed loop I mean starting a new RPC upon completion of the previous one. I think that is the same as your option b). These should always be faster with small payloads than with larger payloads, which it seems like you are saying is happening?
>>>
>>> We have closed-loop latency tests that use a 1-byte payload and measure the 50th and 99th percentiles. We see about 100us per RPC at the 50th.
>>>
>>> On Monday, February 5, 2018 at 4:16:29 PM UTC-8, [email protected] wrote:
>>>>
>>>> With closed loop do you mean
>>>>
>>>> a) using loopback?
>>>> b) measuring from when the request is made until the response gets back?
>>>>
>>>> In our test we are not using loopback (two VMs over the network). We start measuring right before calling into ClientAsyncResponseReader and Finish, and we stop measuring when we get the response back and our callback gets called.
>>>>
>>>> If closed loop means something else please explain further.
>>>>
>>>> I may be able to share the code, but before I go through that process do you have any general suggestions that I can try or consider?
>>>>
>>>> Thanks
>>>>
>>>> Eduardo
>>>>
>>>> On Monday, February 5, 2018 at 3:43:34 PM UTC-8, Carl Mastrangelo wrote:
>>>>>
>>>>> Are you doing a closed loop latency test like gRPC benchmarking does?
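The closed-loop scheme discussed above (start the next RPC only after the previous one completes, then take percentiles over the recorded per-RPC latencies) can be sketched like this. `DoRpc` merely simulates a round trip; a real harness would issue a gRPC call there.

```cpp
#include <algorithm>
#include <chrono>
#include <thread>
#include <vector>

// Placeholder for one synchronous request/response round trip.
void DoRpc() { std::this_thread::sleep_for(std::chrono::microseconds(100)); }

// Closed loop: exactly one RPC is in flight at a time; each sample is the
// wall-clock duration of a single round trip in microseconds.
std::vector<double> RunClosedLoop(int n) {
    std::vector<double> latencies_us;
    latencies_us.reserve(n);
    for (int i = 0; i < n; ++i) {
        auto t0 = std::chrono::steady_clock::now();
        DoRpc();
        auto t1 = std::chrono::steady_clock::now();
        latencies_us.push_back(
            std::chrono::duration<double, std::micro>(t1 - t0).count());
    }
    return latencies_us;
}

// Nearest-rank style percentile (p in [0, 100]) over the samples.
double Percentile(std::vector<double> v, double p) {
    std::sort(v.begin(), v.end());
    size_t idx = static_cast<size_t>(p / 100.0 * (v.size() - 1));
    return v[idx];
}
```

Reporting the 50th and 99th percentiles, as Carl does, is more robust than a mean because a handful of slow RPCs would otherwise dominate the figure.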
>>>>> Also, can you show your code?
>>>>>
>>>>> On Monday, February 5, 2018 at 3:10:03 PM UTC-8, [email protected] wrote:
>>>>>>
>>>>>> Hi, I'm working on a custom latency test. I'm using payloads of sizes 1 byte, 200 bytes, 1 KB, and 10 KB. The tests with 1 byte show a very big difference (longer/worse latency) from the rest of the payloads.
>>>>>>
>>>>>> I'm working with gRPC for C++ on Windows. I'm guessing this has to do with some HTTP/2 packing or optimization logic, meaning packets aren't sent until a buffer is filled.
>>>>>>
>>>>>> What configuration should I look at modifying to see if I can improve this behavior?
>>>>>>
>>>>>> I've tried looking around in
>>>>>>
>>>>>> https://github.com/grpc/grpc/blob/master/include/grpc/grpc.h
>>>>>>
>>>>>> and in
>>>>>>
>>>>>> https://github.com/grpc/grpc/blob/master/include/grpc/impl/codegen/grpc_types.h
>>>>>>
>>>>>> with no luck. What do you suggest?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Eduardo

-- You received this message because you are subscribed to the Google Groups "grpc.io" group. To view this discussion on the web visit https://groups.google.com/d/msgid/grpc-io/2b8c6364-44d0-4516-bbe9-a2b1655e2682%40googlegroups.com.
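The measurement points described in the thread (timestamp just before issuing the async request, stop the clock when the completion callback fires) can be harnessed as below. `FakeAsyncRpc` is a hypothetical stand-in for driving a gRPC CompletionQueue and ClientAsyncResponseReader; only the timing pattern is the point here.

```cpp
#include <chrono>
#include <functional>
#include <future>
#include <thread>

// Stand-in for an async RPC: invokes the callback on another thread after a
// simulated network round trip. Real code would poll a CompletionQueue.
void FakeAsyncRpc(std::function<void()> on_done) {
    std::thread([cb = std::move(on_done)] {
        std::this_thread::sleep_for(std::chrono::microseconds(200));
        cb();
    }).detach();
}

// Measure from just before the request is issued until the completion
// callback runs, mirroring the measurement points in the thread above.
double MeasureOneRpcMicros() {
    std::promise<double> done;
    auto t0 = std::chrono::steady_clock::now();
    FakeAsyncRpc([&] {
        auto t1 = std::chrono::steady_clock::now();
        done.set_value(
            std::chrono::duration<double, std::micro>(t1 - t0).count());
    });
    return done.get_future().get();  // block until the callback fires
}
```

Using `steady_clock` (rather than `system_clock`) matters for latency measurement, since a wall clock can jump under NTP adjustments mid-measurement.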
