It should be one ManagedChannel per host*. There are a lot of batching opportunities when using just one channel. For example, TLS encryption can work on larger block sizes at a time. Another example: Netty can poll on fewer threads, meaning fewer wakeups across all your threads. Just to make sure we are on the same page:

- Only use one ManagedChannel per "target" (a.k.a. hostname).
- Use a limited executor (like ForkJoinPool) for each channel. You can share this executor across channels if your RPCs don't block very long.
- Use a single Netty EventLoopGroup, and limit the number of loops. You can share the group across all your channels.
- Use Netty tcnative for SSL/TLS if you aren't already. It is enormously faster than the SSL implementation that ships with the JDK.

These apply to the server as well.

* There are rare cases where it makes sense to break this rule, but they don't sound like they apply to your usage.

On Friday, August 31, 2018 at 2:02:18 PM UTC-7, Kos wrote:
> Hi Carl,
>
> I did run a YourKit session against my service, and what I see is many threads being created for the event loop group - they're all named something like 'grpc-default-worker-ELG-...'. I did some reading on your other posts and saw you recommended using an ELG bounded to 1-2 threads. I tried this and I see our CPU utilization drop by about 10-15% with no loss in throughput!
>
> This got me thinking, and I'm wondering if my problem is actually having too many managed channels? Some background: this service creates a managed channel for each node that it talks to (about 40 or so). For the traffic it receives, it does some filtering and sends the data down N of those channels. This outbound throughput is quite low - on the order of 10k/sec across all the channels. All of these managed channels now share the same NIO ELG and they use the same ForkJoinPool.commonPool as you recommend. I'm wondering, though, if one managed channel per host is the correct approach here? Or should I make my own 'ManagedChannel' and create a subchannel per host?
>
> Thanks!
>
> On Wednesday, August 29, 2018 at 2:28:00 PM UTC-7, Carl Mastrangelo wrote:
>> More info is needed to figure out why this is slow. Have you used JProfiler or YourKit before?
>> There are a couple of profilers (even perf) that can tell you where the CPU time is going. Also, you should consider turning on GC logging to see if memory is being consumed too fast.
>>
>> Our tuned benchmarks get about 5000 qps per core, but it took profiling before we could get that fast. The general approach is to figure out what's slow, and then fix that. Without knowing what's slow for your test, it's hard to recommend a fix.
>>
>> On Tuesday, August 28, 2018 at 2:14:31 PM UTC-7, [email protected] wrote:
>>> Hi Carl,
>>>
>>> Thanks for responding! I've tried a couple of different executors and they don't seem to change the behavior. I've tried a FixedThreadPool with the number of threads = # of cores * 2, the ForkJoinPool.commonPool as you recommended, and the Scala global ExecutionContext, which is ultimately a ForkJoinPool as well. I've set this in the NettyServerBuilder as well as in the call to bind my service.
>>>
>>> For some more information, here are results from a Gatling test run that lasted 10 minutes using the commonPool.
>>> Server implementation now looks like this:
>>>
>>> val realtimeServiceWithMonitoring = ServerInterceptors.intercept(
>>>   RealtimePublishGrpc.bindService(realtimeService, ExecutionContext.global),
>>>   serverInterceptor)
>>> val rppServiceWithMonitoring = ServerInterceptors.intercept(
>>>   RealtimeProxyGrpc.bindService(realtimePublishProxyService, ExecutionContext.global),
>>>   serverInterceptor
>>> )
>>>
>>> NettyServerBuilder
>>>   .forPort(8086)
>>>   .sslContext(serverGrpcSslContexts)
>>>   .addService(realtimeServiceWithMonitoring)
>>>   .addService(batchPublishWithMonitoring)
>>>   .addService(rppServiceWithMonitoring)
>>>   .executor(ForkJoinPool.commonPool())
>>>   .build()
>>>
>>> My service implementation immediately returns Future.successful:
>>>
>>> override def publish(request: PublishRequest): Future[PublishResponse] = {
>>>   logger.debug("Received Publish request: " + request)
>>>   Future.successful(PublishResponse())
>>> }
>>>
>>> Test Results:
>>>
>>> ================================================================================
>>> ---- Global Information --------------------------------------------------------
>>> > request count                        208686 (OK=208686  KO=0 )
>>> > min response time                       165 (OK=165     KO=- )
>>> > max response time                      2997 (OK=2997    KO=- )
>>> > mean response time                      287 (OK=287     KO=- )
>>> > std deviation                           145 (OK=145     KO=- )
>>> > response time 50th percentile           232 (OK=232     KO=- )
>>> > response time 75th percentile           324 (OK=324     KO=- )
>>> > response time 95th percentile           501 (OK=501     KO=- )
>>> > response time 99th percentile           894 (OK=893     KO=- )
>>> > mean requests/sec                   347.231 (OK=347.231 KO=- )
>>> ---- Response Time Distribution ------------------------------------------------
>>> > t < 800 ms                           206014 ( 99%)
>>> > 800 ms < t < 1200 ms                   1511 (  1%)
>>> > t > 1200 ms                            1161 (  1%)
>>> > failed                                    0 (  0%)
>>> ================================================================================
>>> 347 requests/sec. CPU utilization hovers between 29% and 35%.
>>>
>>> Thanks for your help!
>>>
>>> On Tuesday, August 28, 2018 at 12:49:38 PM UTC-7, Carl Mastrangelo wrote:
>>>> Can you try setting the executor on both the channel and the server builder? I would recommend ForkJoinPool.commonPool().
>>>>
>>>> On Monday, August 27, 2018 at 11:54:19 PM UTC-7, Kos wrote:
>>>>> Hi,
>>>>>
>>>>> I'm using gRPC in a new Scala service and I'm seeing unexpectedly high CPU utilization. I see this high utilization in our production workload, but I am also able to reproduce it via performance tests, which I'll describe below.
>>>>>
>>>>> My setup uses grpc-netty-shaded 1.10 (but I've also reproduced this with 1.14). My performance test uses mTLS to talk to the service. The service is deployed in a container with 6 cores and 2 GB of RAM. I've reduced the footprint of my service to immediately return a response without doing any other work, to try to identify whether it's the application or something to do with my gRPC configuration.
>>>>>
>>>>> My performance test issues about 250 requests a second using one ManagedChannel to one instance of my service. The data in each request is about 10 bytes. With this workload, my service is running at about 35% CPU, which I feel is far too high for this small amount of rps.
>>>>> Here is how I've constructed my server:
>>>>>
>>>>> val serverInterceptor = MonitoringServerInterceptor.create(Configuration.allMetrics())
>>>>>
>>>>> val realtimeServiceWithMonitoring = ServerInterceptors.intercept(
>>>>>   RealtimePublishGrpc.bindService(realtimeService, ExecutionContext.global),
>>>>>   serverInterceptor)
>>>>> val rppServiceWithMonitoring = ServerInterceptors.intercept(
>>>>>   RealtimeProxyGrpc.bindService(realtimePublishProxyService, ExecutionContext.global),
>>>>>   serverInterceptor
>>>>> )
>>>>>
>>>>> val keyManagerFactory = GrpcSSLHelper.getKeyManagerFactory(sslConfig)
>>>>> val trustManagerFactory = GrpcSSLHelper.getTrustManagerFactory(sslConfig)
>>>>> val serverGrpcSslContexts = GrpcSSLHelper.getServerSslContext(keyManagerFactory, trustManagerFactory)
>>>>>
>>>>> NettyServerBuilder
>>>>>   .forPort(8086)
>>>>>   .sslContext(serverGrpcSslContexts)
>>>>>   .addService(realtimeServiceWithMonitoring)
>>>>>   .addService(rppServiceWithMonitoring)
>>>>>   .build()
>>>>>
>>>>> The server interceptor is modeled after: https://github.com/grpc-ecosystem/java-grpc-prometheus
>>>>>
>>>>> The managed channel is constructed as such:
>>>>>
>>>>> private val interceptor = MonitoringClientInterceptor.create(Configuration.allMetrics())
>>>>>
>>>>> val trustManagerFactory = GrpcSSLHelper.getTrustManagerFactory(sslConfig)
>>>>>
>>>>> NettyChannelBuilder
>>>>>   .forAddress(address, 8086)
>>>>>   .intercept(interceptor)
>>>>>   .negotiationType(NegotiationType.TLS)
>>>>>   .sslContext(GrpcSSLHelper.getClientSslContext(keyManagerFactory, trustManagerFactory))
>>>>>   .build()
>>>>>
>>>>> Finally, I use non-blocking stubs to issue the gRPC requests.
>>>>>
>>>>> Any help would be greatly appreciated. Thanks!
>>>>> -K

--
You received this message because you are subscribed to the Google Groups "grpc.io" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/grpc-io/775e9bd4-34eb-4975-a8b5-01094007d57c%40googlegroups.com.
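
The recommendations at the top of this thread (one channel per target, a shared bounded executor, one small EventLoopGroup, tcnative for TLS) can be sketched with the grpc-java Netty builder roughly as follows. This is a minimal wiring sketch, not code from the thread: the `Channels` class, the host/port arguments, and the loop/thread counts are illustrative, and the right sizes depend on your workload.

```java
import io.grpc.ManagedChannel;
import io.grpc.netty.GrpcSslContexts;
import io.grpc.netty.NegotiationType;
import io.grpc.netty.NettyChannelBuilder;
import io.netty.channel.EventLoopGroup;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.nio.NioSocketChannel;
import java.util.concurrent.Executor;
import java.util.concurrent.ForkJoinPool;
import javax.net.ssl.SSLException;

/** Hypothetical factory handing out one channel per target host, all sharing the same infrastructure. */
final class Channels {
  // Shared across ALL channels in the process: a small event loop group
  // (1-2 loops is often enough, per the advice above) and one bounded
  // executor for application callbacks.
  private static final EventLoopGroup SHARED_ELG = new NioEventLoopGroup(2);
  private static final Executor APP_EXECUTOR = ForkJoinPool.commonPool();

  static ManagedChannel forHost(String host, int port) throws SSLException {
    return NettyChannelBuilder.forAddress(host, port)
        .eventLoopGroup(SHARED_ELG)          // one group shared by every channel
        .channelType(NioSocketChannel.class) // required when supplying your own group
        .executor(APP_EXECUTOR)              // safe to share if callbacks don't block long
        .negotiationType(NegotiationType.TLS)
        // GrpcSslContexts prefers the OpenSSL provider automatically when a
        // netty-tcnative artifact (e.g. netty-tcnative-boringssl-static) is
        // on the classpath; otherwise it falls back to the JDK provider.
        .sslContext(GrpcSslContexts.forClient().build())
        .build();
  }

  private Channels() {}
}
```

The same sharing applies on the server side: NettyServerBuilder accepts the executor via `executor(...)` and the loops via `bossEventLoopGroup(...)` / `workerEventLoopGroup(...)`, so one process can run client and server off the same small group.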
