On Sun, Mar 19, 2017 at 4:58 PM Alexander Petrovsky <askju...@gmail.com> wrote:
>> * The 99th percentile ignores the 40 slowest queries. What do the 99.9th, 99.99th, ... and max percentiles look like?
>
> I have no answer to this question, and I don't know how it could help me.

Usually the maximum latency is a better indicator of trouble than the 99th percentile, in my experience. If you improve the worst case, the other cases are likely to follow. However, there are situations where this will hurt the median (50th percentile) latency. Usually that trade-off is okay, but there are a few situations where it might not be.

> Yep, that is also the main question! I log and graph nginx's $request_time, and I log and graph the internal function time. What happens in between I cannot log; it is:
>
> - the local network (TCP);
> - work in kernel/user space;
> - the Go GC and other runtime work;
> - the fasthttp machinery that runs before my HTTP handler is called.

The kernel and the GC can be dynamically inspected, and I'd seriously consider profiling as well, in a laboratory environment. Your hypothesis is that none of these show a discrepancy, but they may.

>> * Caches have hit/miss rates that look about right.
>
> In my application these are not true caches; in reality they are dictionaries loaded from the database and used in the calculations.

Perhaps the code in https://godoc.org/golang.org/x/text is of use for this? It tends to be faster than maps because it uses compact string representations and tries. Of course, it requires that you first show the problem is in the caching sublayer.

>> * 15% CPU load means we are spending ample amounts of time waiting. What are we waiting on?
>
> Maybe, or maybe the 32 cores can simply handle the 4k rps. How can I find out what my app is waiting on?

The block profile is my guess at what I would grab first, and perhaps the tracing functionality as well; there is a sketch of wiring both up at the very end of this mail. You can also add metrics on each blocking point in order to get an idea of where the system is going off the rails. Functionality like dtrace would be nice, but I'm not sure Go has it, unfortunately.

>> * Are we measuring the right thing in the internal measurements? If the window between external/internal is narrow, then chances are we are doing the wrong thing on the internal side.
>
> Could you explain this?

There may be a bug in the measurement code, so you should probably go over it again. One common fault of mine is to place the measurement around the wrong functions, so I think they measure more than they actually do. A single regular expression that is only hit in corner cases can be enough to mess with a performance profile. Another common mistake is to not have an appropriate decay parameter on your latency measurements, so older requests are never removed from the latency graph [0].

In general, as the amount of work a system processes goes up, it becomes more sensitive to fluctuations in latency. So even at a fairly low CPU load you may still have spiky behaviour hidden by the smoothing of the CPU load measure, and this can contribute to added congestion.

[0] A decaying implementation of Vitter's algorithm R (reservoir sampling), or Gil Tene's HdrHistogram, is preferable. HdrHistogram is interesting in that it uses a floating-point-like representation for its counters: one array for exponents and one for mantissas. This allows very fast accounting (nanoseconds) and gives precise measurements around 0 at the expense of precision at, say, 1 hour. That is usually okay: if you waited 1 hour, you don't care whether it was really 1 hour and 3 seconds, but at 1 µs you really care about being precise.
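To make [0] a bit more concrete, here is a rough, untested sketch of recording request latencies in an HdrHistogram and reading the higher percentiles and the max back out. The import path (github.com/HdrHistogram/hdrhistogram-go), the microsecond unit and the 1µs-to-1h bounds are just one possible choice, and doRequest stands in for whatever you are actually measuring:

package main

import (
	"fmt"
	"time"

	hdrhistogram "github.com/HdrHistogram/hdrhistogram-go"
)

// doRequest is a placeholder for the handler being measured.
func doRequest() { time.Sleep(100 * time.Microsecond) }

func main() {
	// Track values from 1µs up to 1 hour, recorded in microseconds,
	// with 3 significant figures of precision.
	h := hdrhistogram.New(1, int64(time.Hour/time.Microsecond), 3)

	for i := 0; i < 10000; i++ {
		start := time.Now()
		doRequest()
		_ = h.RecordValue(int64(time.Since(start) / time.Microsecond))
	}

	fmt.Printf("p50=%dµs p99=%dµs p99.9=%dµs p99.99=%dµs max=%dµs\n",
		h.ValueAtQuantile(50), h.ValueAtQuantile(99),
		h.ValueAtQuantile(99.9), h.ValueAtQuantile(99.99), h.Max())
}

If I remember correctly, the same library also ships a windowed variant, which is one way of getting the decay behaviour I mentioned above.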
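And the block profile / trace sketch I promised above. This is only the wiring; port 6060 and a block profile rate of 1 are example values, and since the block profile costs something you would normally pick a coarser rate in production. In your setup you would start this listener in its own goroutine next to the fasthttp server, since it lives on its own port:

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
	"runtime"
)

func main() {
	// Sample every blocking event (channel waits, sync.Mutex contention, ...).
	// A rate of 1 records everything; raise it to reduce overhead.
	runtime.SetBlockProfileRate(1)

	// Then inspect with, for example:
	//   go tool pprof http://localhost:6060/debug/pprof/block
	//   curl -o trace.out 'http://localhost:6060/debug/pprof/trace?seconds=5'
	//   go tool trace trace.out
	log.Fatal(http.ListenAndServe("localhost:6060", nil))
}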