Correct me if I'm mistaken, but gettimeofday uses the realtime clock
rather than the monotonic clock.  The latter, I believe, is the correct
way to measure time differences.

Second, if you only issue one RPC, it won't be as fast.  Code won't be in
cache, threads won't be active, and TCP windows won't be large enough.

Last: FlatBuffers, I believe, are fixed size, which means that as you update
your proto with new fields, the old fields will still stick around.  This
puts you in a situation where you have to either reuse fields (which is
suspect from an API-compatibility POV) or accept the larger size (which
hurts performance).  You might be willing to make this trade if your API
will seldom change, but generally APIs do change.  Plain protobuf is slower
now, for increased flexibility later.  You're certainly allowed to swap out
the serialization based on need, but I don't think the gRPC team can
recommend it.

On Wed, Sep 5, 2018 at 11:35 AM Amogh Akshintala <[email protected]>
wrote:

> Hey Carl,
>
> Thanks for your time!
>
> I see the same performance numbers as reported in the protobuf performance
> dashboard (~0.2ms per rpc) if I set up the HelloWorld client to call
> SayHello 1000 times in a tight loop and average it over the 1000 calls.
>
> How I arrived at the numbers I reported (everything is measured on
> localhost):
> The links are to my client and server code on GitHub.
> Client RPC Stub
> <https://github.com/aakshintala/darknet/blob/cf1c4dfeb2a2f1c3d123bc89f90edbb37854b25d/server/client.cpp#L100>
> :
> perform any necessary data manipulation
> Pack Request message
> *start = getTimeOfDay()*
> stub->invokeRPC()
> checkStatus()
> *end = getTimeofDay()*
> *Client RTT = end - start*
>
> In Server RPC ServiceImpl
> <https://github.com/aakshintala/darknet/blob/cf1c4dfeb2a2f1c3d123bc89f90edbb37854b25d/server/server.cpp#L43>
> :
> *start = getTimeofDay()*
> serviceRequest() <- GPU time is calculated inside this function, but
> ignore that for now.
> *end=getTimeofDay()*
> return Status
> *Server time = end - start*
>
> *gRPC + protobuf overhead = Client RTT - Server Time*
>
> I replaced protobuf with flatbuffers yesterday, after noticing (using perf)
> that a significant chunk of processing time was spent in protobuf
> serialization
> and deserialization code.
> Latency really improved with FlatBuffers (no parsing, so…),
> but man, is that library hard to use/debug compared to protobuf...
>
> New numbers *with flatbuffers:*
> *client RTT = ~70ms*
> *Server Time = ~40ms*
> *gRPC + flatbuffers = ~30ms (for ~4MiB of data over localhost)*
>
> Thanks for the link to pprof. Will check it out, especially if
>
> Cheers,
> Amogh Akshintala
> http://aakshintala.com
>
>
> On Wed, Sep 05, 2018 at 1:59 PM "'Carl Mastrangelo' via grpc.io" wrote:
>
>> Our own benchmarks
>> <https://performance-dot-grpc-testing.appspot.com/explore?dashboard=5636470266134528>
>> get about 1000x better latency than that, so something is definitely up.
>> Can you describe how you arrived at that number, or the tools you used
>> to profile?  (We use perf and pprof)
>>
>> On Monday, September 3, 2018 at 10:39:45 AM UTC-7, [email protected]
>> wrote:
>>>
>>> Hail gRPC experts (;D),
>>>
>>> I'm trying to build an image/video object detection server (as one of the
>>> reusable pieces in a benchmark suite) with low RTT requirements
>>> (near-realtime, say ~60-90ms RTT)...
>>> I've used gRPC and protobuf (built from git master; hashes below in case
>>> that is relevant) for the serialization and transport.
>>> _________________________________
>>> grpc:
>>> commit dbc1e27e2e1a81b61eb064eb036ec6a267f88cb6
>>> Merge: 9bc6cd1 5d24ab9
>>> Author: Jiangtao Li <email redacted by me>
>>> Date:   Fri Jul 20 17:00:18 2018 -0700
>>>
>>> protobuf:
>>> commit b5fbb742af122b565925987e65c08957739976a7
>>> Author: Bo Yang <email redacted by me>
>>> Date:   Mon Mar 5 19:54:18 2018 -0800
>>> _________________________________
>>>
>>> gRPC seems to add insane amounts of overhead -- ~160ms (~2x the server's
>>> processing time)!
>>> For now I'm running on a single machine (a pretty beefy machine, so
>>> contention isn't an issue...) operating over localhost (loopback).
>>> The amount of data being transferred is considerable, but not unheard
>>> of (~4MiB per request).
>>>
>>> Server-side timing measurements:
>>> doDetection: new requeust 0x7ffc77f16920
>>> 0x7ffc77f16920: GPU processing took 24.045 milliseconds
>>> 0x7ffc77f16920: Server took *72.206 millisecond*
>>>
>>> Client-side measurements:
>>> 10 objects detected.
>>> This request took *234.825 milliseconds *
>>>
>>> *Client RTT - Server processing time = 234.825 - 72.206 = 162.619ms (!??!)*
>>> I've pinned the server and client to separate cores using taskset.
>>> There isn't anything else running on the server and it's a beefy 48 core
>>> (Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz) machine with ample RAM
>>> (128GiB), etc....
>>>
>>> As a start, I instrumented the implementation of the synchronous call
>>> in include/grpcpp/impl/codegen/client_unary_call.h:
>>> BlockingUnaryCallImpl(ChannelInterface* channel, const RpcMethod& method,
>>>                       ClientContext* context, const InputMessage& request,
>>>                       OutputMessage* result)
>>>
>>> and found that the vast majority of the time is spent spinning on a
>>> completion queue:
>>> line 107:   if (cq.Pluck(&ops)) {
>>>
>>> I wonder if I need to configure gRPC differently (perhaps the default
>>> configurations are more geared towards latency-insensitive batching?)...
>>>
>>> Any help understanding these numbers would be appreciated.
>>> Server code:
>>> https://github.com/aakshintala/darknet/blob/master/server/server.cpp
>>> Client code:
>>> https://github.com/aakshintala/darknet/blob/master/server/client.cpp
>>> Proto file:
>>> https://github.com/aakshintala/darknet/blob/master/server/darknetserver.proto
>>>
>>> Thanks in advance,
>>> Amogh Akshintala
>>> aakshintala.com
>>>
>>> --
> You received this message because you are subscribed to a topic in the
> Google Groups "grpc.io" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/grpc-io/USjGJDmu_Hw/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/grpc-io.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/grpc-io/c37ac6ed-9149-43dc-b9a3-5574e4eca439%40googlegroups.com
> <https://groups.google.com/d/msgid/grpc-io/c37ac6ed-9149-43dc-b9a3-5574e4eca439%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>
>
