It should be one ManagedChannel per host*. There are a lot of batching opportunities when using just one channel. For example, TLS encryption can work on larger block sizes at a time. Another example: Netty can poll on fewer threads, meaning fewer wakeups across all your threads. Just to make sure we are on the same page:

- Only use one ManagedChannel per "target" (a.k.a. hostname).
- Use a limited executor (like ForkJoinPool) for each channel. You can share this executor across channels if your RPCs don't block very long.
- Use a single Netty EventLoopGroup, and limit the number of loops. You can share the group across all your channels.
- Use Netty tcnative for SSL/TLS if you aren't already. It is enormously faster than the SSL implementation that ships with the JDK.

These apply to the server as well.

* There are rare cases where it makes sense to break this rule, but they don't sound like they apply to your usage.

On Friday, August 31, 2018 at 2:02:18 PM UTC-7, Kos wrote:
> Hi Carl,
>
> I did run a YourKit session against my service, and what I see is many threads being created for the event loop group - they're all named something like 'grpc-default-worker-ELG-...'. I did some reading on your other posts and saw you recommended using an ELG bounded to 1-2 threads. I tried this and I see our CPU utilization drop by about 10-15% with no loss in throughput!
>
> This got me thinking, and I'm wondering if my problem is actually having too many managed channels? Some background: this service creates a managed channel for each node that it talks to (about 40 or so). For the traffic it receives, it does some filtering and sends the data down N of those channels. This outbound throughput is quite low - on the order of 10k/sec across all the channels. All of these managed channels now share the same NIO ELG and they use the same ForkJoinPool.commonPool as you recommend. I'm wondering, though, if one managed channel per host is the correct approach here? Or should I make my own 'ManagedChannel' and create a subchannel per host?
>
> Thanks!
>
> On Wednesday, August 29, 2018 at 2:28:00 PM UTC-7, Carl Mastrangelo wrote:
>> More info is needed to figure out why this is slow. Have you used JProfiler or YourKit before?
>> There are a couple of profilers (even perf) that can tell you where the CPU time is going. Also, you should consider turning on GC logging to see if memory is being consumed too fast.
>>
>> Our tuned benchmarks get about 5000 qps per core, but it took profiling before we could get that fast. The general approach is to figure out what's slow, and then fix that. Without knowing what's slow for your test, it's hard to recommend a fix.
>>
>> On Tuesday, August 28, 2018 at 2:14:31 PM UTC-7, [email protected] wrote:
>>> Hi Carl,
>>>
>>> Thanks for responding! I've tried a couple of different executors and they don't seem to change the behavior. I've tried a FixedThreadPool with the number of threads = # of cores * 2, the ForkJoinPool.commonPool as you recommended, and the Scala global ExecutionContext, which is ultimately a ForkJoinPool as well. I've set this in the NettyServerBuilder as well as in the call to bind my service.
>>>
>>> For some more information, here are results from a Gatling test run that lasted 10 minutes using the commonPool.
>>> Server implementation now looks like this:
>>>
>>> val realtimeServiceWithMonitoring = ServerInterceptors.intercept(
>>>   RealtimePublishGrpc.bindService(realtimeService, ExecutionContext.global),
>>>   serverInterceptor)
>>> val rppServiceWithMonitoring = ServerInterceptors.intercept(
>>>   RealtimeProxyGrpc.bindService(realtimePublishProxyService, ExecutionContext.global),
>>>   serverInterceptor
>>> )
>>>
>>> NettyServerBuilder
>>>   .forPort(8086)
>>>   .sslContext(serverGrpcSslContexts)
>>>   .addService(realtimeServiceWithMonitoring)
>>>   .addService(batchPublishWithMonitoring)
>>>   .addService(rppServiceWithMonitoring)
>>>   .executor(ForkJoinPool.commonPool())
>>>   .build()
>>>
>>> My service implementation immediately returns Future.successful:
>>>
>>> override def publish(request: PublishRequest): Future[PublishResponse] = {
>>>   logger.debug("Received Publish request: " + request)
>>>   Future.successful(PublishResponse())
>>> }
>>>
>>> Test Results:
>>>
>>> ================================================================================
>>> ---- Global Information --------------------------------------------------------
>>> > request count                        208686 (OK=208686  KO=0 )
>>> > min response time                       165 (OK=165     KO=- )
>>> > max response time                      2997 (OK=2997    KO=- )
>>> > mean response time                      287 (OK=287     KO=- )
>>> > std deviation                           145 (OK=145     KO=- )
>>> > response time 50th percentile           232 (OK=232     KO=- )
>>> > response time 75th percentile           324 (OK=324     KO=- )
>>> > response time 95th percentile           501 (OK=501     KO=- )
>>> > response time 99th percentile           894 (OK=893     KO=- )
>>> > mean requests/sec                   347.231 (OK=347.231 KO=- )
>>> ---- Response Time Distribution ------------------------------------------------
>>> > t < 800 ms                           206014 ( 99%)
>>> > 800 ms < t < 1200 ms                   1511 (  1%)
>>> > t > 1200 ms                            1161 (  1%)
>>> > failed                                    0 (  0%)
>>> ================================================================================
>>> 347 requests/sec. CPU utilization hovers between 29% and 35%.
>>>
>>> Thanks for your help!
>>>
>>> On Tuesday, August 28, 2018 at 12:49:38 PM UTC-7, Carl Mastrangelo wrote:
>>>> Can you try setting the executor on both the channel and the server builder? I would recommend ForkJoinPool.commonPool().
>>>>
>>>> On Monday, August 27, 2018 at 11:54:19 PM UTC-7, Kos wrote:
>>>>> Hi,
>>>>>
>>>>> I'm using gRPC in a new Scala service and I'm seeing unexpectedly high CPU utilization. I see this high utilization in our production workload, but I am also able to reproduce it via performance tests, which I'll describe below.
>>>>>
>>>>> My setup uses grpc-netty-shaded 1.10 (but I've also reproduced this with 1.14). My performance test uses mTLS to talk to the service. The service is deployed in a container with 6 cores and 2 GB of RAM. I've reduced the footprint of my service to immediately return a response without doing any other work, to try to identify whether it's the application or something to do with my gRPC configuration.
>>>>>
>>>>> My performance test issues about 250 requests a second using one ManagedChannel to one instance of my service. The data in each request is about 10 bytes. With this workload, my service is running at about 35% CPU, which I feel is far too high for this small amount of rps.
>>>>> Here is how I've constructed my server:
>>>>>
>>>>> val serverInterceptor = MonitoringServerInterceptor.create(Configuration.allMetrics())
>>>>>
>>>>> val realtimeServiceWithMonitoring = ServerInterceptors.intercept(
>>>>>   RealtimePublishGrpc.bindService(realtimeService, ExecutionContext.global),
>>>>>   serverInterceptor)
>>>>> val rppServiceWithMonitoring = ServerInterceptors.intercept(
>>>>>   RealtimeProxyGrpc.bindService(realtimePublishProxyService, ExecutionContext.global),
>>>>>   serverInterceptor
>>>>> )
>>>>>
>>>>> val keyManagerFactory = GrpcSSLHelper.getKeyManagerFactory(sslConfig)
>>>>> val trustManagerFactory = GrpcSSLHelper.getTrustManagerFactory(sslConfig)
>>>>> val serverGrpcSslContexts = GrpcSSLHelper.getServerSslContext(keyManagerFactory, trustManagerFactory)
>>>>>
>>>>> NettyServerBuilder
>>>>>   .forPort(8086)
>>>>>   .sslContext(serverGrpcSslContexts)
>>>>>   .addService(realtimeServiceWithMonitoring)
>>>>>   .addService(rppServiceWithMonitoring)
>>>>>   .build()
>>>>>
>>>>> The server interceptor is modeled after: https://github.com/grpc-ecosystem/java-grpc-prometheus
>>>>>
>>>>> The managed channel is constructed as such:
>>>>>
>>>>> private val interceptor = MonitoringClientInterceptor.create(Configuration.allMetrics())
>>>>>
>>>>> val trustManagerFactory = GrpcSSLHelper.getTrustManagerFactory(sslConfig)
>>>>>
>>>>> NettyChannelBuilder
>>>>>   .forAddress(address, 8086)
>>>>>   .intercept(interceptor)
>>>>>   .negotiationType(NegotiationType.TLS)
>>>>>   .sslContext(GrpcSSLHelper.getClientSslContext(keyManagerFactory, trustManagerFactory))
>>>>>   .build()
>>>>>
>>>>> Finally, I use non-blocking stubs to issue the gRPC requests.
>>>>>
>>>>> Any help would be greatly appreciated. Thanks!
>>>>> -K

--
You received this message because you are subscribed to the Google Groups "grpc.io" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/grpc-io/775e9bd4-34eb-4975-a8b5-01094007d57c%40googlegroups.com.
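
The recommendations at the top of this thread (one channel per target, a shared bounded executor, one small EventLoopGroup, tcnative for TLS) can be sketched with the grpc-java Netty builder roughly as follows. This is a minimal wiring sketch, not code from the thread: the `Channels` class, the host/port arguments, and the loop/thread counts are illustrative, and the right sizes depend on your workload.

```java
import io.grpc.ManagedChannel;
import io.grpc.netty.GrpcSslContexts;
import io.grpc.netty.NegotiationType;
import io.grpc.netty.NettyChannelBuilder;
import io.netty.channel.EventLoopGroup;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.nio.NioSocketChannel;
import java.util.concurrent.Executor;
import java.util.concurrent.ForkJoinPool;
import javax.net.ssl.SSLException;

/** Hypothetical factory handing out one channel per target host, all sharing the same infrastructure. */
final class Channels {
  // Shared across ALL channels in the process: a small event loop group
  // (1-2 loops is often enough, per the advice above) and one bounded
  // executor for application callbacks.
  private static final EventLoopGroup SHARED_ELG = new NioEventLoopGroup(2);
  private static final Executor APP_EXECUTOR = ForkJoinPool.commonPool();

  static ManagedChannel forHost(String host, int port) throws SSLException {
    return NettyChannelBuilder.forAddress(host, port)
        .eventLoopGroup(SHARED_ELG)          // one group shared by every channel
        .channelType(NioSocketChannel.class) // required when supplying your own group
        .executor(APP_EXECUTOR)              // safe to share if callbacks don't block long
        .negotiationType(NegotiationType.TLS)
        // GrpcSslContexts prefers the OpenSSL provider automatically when a
        // netty-tcnative artifact (e.g. netty-tcnative-boringssl-static) is
        // on the classpath; otherwise it falls back to the JDK provider.
        .sslContext(GrpcSslContexts.forClient().build())
        .build();
  }

  private Channels() {}
}
```

The same sharing applies on the server side: NettyServerBuilder accepts the executor via `executor(...)` and the loops via `bossEventLoopGroup(...)` / `workerEventLoopGroup(...)`, so one process can run client and server off the same small group.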
