On Sun, Mar 19, 2017 at 4:58 PM Alexander Petrovsky <askju...@gmail.com>
wrote:



* The 99th percentile ignores the 40 slowest queries. What does the 99.9,
9.99, ... and max percentiles look like?


I have no answer to this question, and I don't know how it would help me.


Usually, the maximum latency is a better indicator of trouble than the 99th
percentile, in my experience. If you improve the worst case, the other cases
are likely to follow. However, there are situations where this will hurt the
median (50th percentile) latency. Usually this trade-off is okay, but there
are a few situations where it might not be.


Yep, that's also the main question! I log and graph nginx's $request_time,
and I log and graph the internal function time. What lies between, I can't
log:
 - the local network (TCP);
 - work in kernel/user space;
 - the Go GC and other runtime work;
 - the fasthttp machinery that runs before my HTTP handler is called.


The kernel and the GC can be inspected dynamically. I'd also seriously
consider profiling in a laboratory environment. Your hypothesis is that none
of these show a discrepancy, but they might.



* Caches have hit/miss rates that looks about right.


In my application these are not true caches; really they are dictionaries
loaded from the database and used in calculations.


Perhaps the code in https://godoc.org/golang.org/x/text is of use for this?
It tends to be faster than maps because it uses compact string
representations and tries. Of course, that first requires you to show that
the problem is in the caching sublayer.


* 15% CPU load means we are spending ample amounts of time waiting. What
are we waiting on?


Maybe, or maybe 32 cores can process the 4k rps. How can I find out what my
app is waiting on?


A block profile is my guess at what I would grab first, and perhaps the
execution tracer as well. You can also add metrics at each blocking point to
get an idea of where the system is going wrong. Functionality like dtrace
would be nice, but I'm not sure Go has it, unfortunately.




* Are we measuring the right thing in the internal measurements? If the
window between external/internal is narrow, then chances are we are doing
the wrong thing on the internal side.


Could you explain this?


There may be a bug in the measurement code, so you should probably go over
it again. One common fault of mine is to place the measurement around the
wrong functions, so I think they are measuring more than they are. A single
regular expression that is only hit in corner cases can be enough to mess up
a performance profile. Another common mistake is to not have an appropriate
decay parameter on your latency measurements, one that ensures older
requests eventually get removed from the latency graph[0].
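As an illustration of the "wrong functions" trap, with made-up sleeps
standing in for the real request stages:

```go
package main

import (
	"fmt"
	"time"
)

// Stand-ins for request-handling stages (durations are made up).
func parseRequest()  { time.Sleep(5 * time.Millisecond) }
func businessLogic() { time.Sleep(2 * time.Millisecond) }

// measureInner times only the business logic -- the common mistake:
// everything else in the request path is invisible to the metric.
func measureInner() time.Duration {
	start := time.Now()
	businessLogic()
	return time.Since(start)
}

// measureOuter times the whole handler, which is closer to what nginx's
// $request_time sees from the outside.
func measureOuter() time.Duration {
	start := time.Now()
	parseRequest()
	businessLogic()
	return time.Since(start)
}

func main() {
	inner := measureInner()
	outer := measureOuter()
	fmt.Printf("inner=%v outer=%v unaccounted=%v\n", inner, outer, outer-inner)
}
```

The "unaccounted" gap is exactly the window between the external and internal
numbers that the measurement code fails to see.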

In general, as the amount of work a system processes goes up, it becomes
more sensitive to fluctuations in latency. So even at a fairly low CPU load,
you may still have spiky behavior hidden by the smoothing of the CPU-load
measure, and this can contribute to added congestion.

[0] A decaying implementation of Vitter's algorithm R, or Tene's
HdrHistogram, is preferable. HdrHistogram is interesting in that it uses a
floating-point-like representation for its counters: one array for
exponents, one for mantissas. This allows very fast accounting (nanoseconds)
and provides precise measurements near 0 at the expense of precision at,
say, 1 hour. That is usually okay: if you waited 1 hour, you don't care
whether it was really 1 hour and 3 seconds, but at 1us you really care about
being precise.

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
