On Sunday, March 19, 2017 at 20:46:16 UTC+3, Jesper Louis Andersen wrote:
>
> On Sun, Mar 19, 2017 at 4:58 PM Alexander Petrovsky <askj...@gmail.com> wrote:
>
>>> * The 99th percentile ignores the 40 slowest queries. What do the 99.9th,
>>> 99.99th, ... and max percentiles look like?
>>
>> I have no answer to this question, and I don't know how it could help me.
>
> Usually, the maximum latency is a better indicator of trouble than the 99th
> percentile, in my experience. If you improve the worst case, then the other
> cases are likely to follow. However, there are situations where this will
> hurt the median (50th percentile) latency. Usually this trade-off is okay,
> but there are a few situations where it might not be.
Got it! I'll start graphing the higher percentiles and the max as well; a
small sketch of how I would pull them out of the raw samples is below, after
the quoted replies.

>> Yep, that's also the main question! I log and graph nginx $request_time,
>> and I log and graph the internal function time. What is in between I
>> can't log:
>> - the local network (TCP);
>> - work in kernel/user space;
>> - golang GC and other runtime work;
>> - the golang fasthttp machinery before my HTTP handler is called.
>
> The kernel and GC can be dynamically inspected. I'd seriously consider
> profiling as well in a laboratory environment. Your hypothesis is that none
> of these have a discrepancy, but they may have.

If I understood you correctly, I think the problem is somewhere there...

>>> * Caches have hit/miss rates that look about right.
>>
>> In my application these are not true caches; in reality it is a dictionary
>> loaded from the database and used in calculations.
>
> Perhaps the code in https://godoc.org/golang.org/x/text is of use for
> this? It tends to be faster than maps because it utilizes compact string
> representations and tries. Of course, it requires you show that the problem
> is with the caching sublayer first.

The dictionary is not a word dictionary; it is a dictionary in database
terms: some keys (ids) mapping to one or more values.

>>> * 15% CPU load means we are spending ample amounts of time waiting. What
>>> are we waiting on?
>>
>> Maybe, or maybe 32 cores can simply handle 4k rps without much effort.
>> How can I find out what my app is waiting on?
>
> blockprofile is my guess at what I would grab first. Perhaps the tracing
> functionality as well. You can also add metrics on each blocking point in
> order to get an idea of where the system is going off. Functionality like
> dtrace would be nice, but I'm not sure Go has it, unfortunately.

Thanks a lot, I will! Roughly what I plan to enable is also sketched below.

>>> * Are we measuring the right thing in the internal measurements? If the
>>> window between external/internal is narrow, then chances are we are doing
>>> the wrong thing on the internal side.
>>
>> Could you explain this?
>
> There may be a bug in the measurement code, so you should probably go over
> it again. One common fault of mine is to place the measurement around the
> wrong functions, so they detect more than I think they do. A single regular
> expression that is only hit in corner cases can be enough to mess with a
> performance profile. Another common mistake is to not have an appropriate
> decay parameter on your latency measurements, so older requests eventually
> get removed from the latency graph[0].
>
> In general, as the amount of work a system processes goes up, it gets more
> sensitive to fluctuations in latency. So even at a fairly low CPU load, you
> may still have some spiky behavior hidden by a smoothing of the CPU load
> measure, and this can contribute to added congestion.
>
> [0] A decaying Vitter's algorithm R implementation, or Tene's HdrHistogram
> is preferable. HdrHistogram is interesting in that it uses a floating-point
> representation for its counters: one array for exponents, one array for
> mantissas. It allows very fast accounting (nanoseconds) and provides precise
> measurements around 0 at the expense of precision at, say, 1 hour. It is
> usually okay, because if you waited 1 hour, you don't care if it was really
> 1 hour and 3 seconds. But at 1 us, you really care about being precise.

Could you please explain what you mean by "Another common mistake is to not
have an appropriate decay parameter on your latency measurements, so older
requests eventually get removed from the latency graph"? For reference, the
way I understand the internal measurement (wrapping the whole fasthttp
handler) is also sketched below.
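Here is the sketch for the higher percentiles: nothing clever, just reading
them straight off the sorted raw samples. It assumes the per-request
latencies are collected into a plain slice; in the real service they would
come from the same place where I log the internal function time.

    package main

    import (
        "fmt"
        "sort"
        "time"
    )

    // percentile reads the value at quantile q (0..1) from an already
    // sorted slice of request durations.
    func percentile(sorted []time.Duration, q float64) time.Duration {
        if len(sorted) == 0 {
            return 0
        }
        return sorted[int(q*float64(len(sorted)-1))]
    }

    func report(latencies []time.Duration) {
        sorted := append([]time.Duration(nil), latencies...)
        sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })

        fmt.Println("p99    :", percentile(sorted, 0.99))
        fmt.Println("p99.9  :", percentile(sorted, 0.999))
        fmt.Println("p99.99 :", percentile(sorted, 0.9999))
        fmt.Println("max    :", sorted[len(sorted)-1])
    }

    func main() {
        // Made-up sample; in reality these are the per-request durations.
        sample := []time.Duration{
            1 * time.Millisecond, 2 * time.Millisecond, 2 * time.Millisecond,
            3 * time.Millisecond, 250 * time.Millisecond, // one slow outlier
        }
        report(sample)
    }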
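And this is roughly what I plan to enable for the block profile and the
trace: a minimal sketch that just exposes the standard net/http/pprof
endpoints on a separate internal port (the address 127.0.0.1:6060 is only an
example).

    package main

    import (
        "log"
        "net/http"
        _ "net/http/pprof" // registers the /debug/pprof/* handlers
        "runtime"
    )

    func main() {
        // Record every blocking event so /debug/pprof/block has data.
        // A rate of 1 is expensive; a larger value can be used in production.
        runtime.SetBlockProfileRate(1)

        // Serve the pprof endpoints on an internal-only port.
        go func() {
            log.Println(http.ListenAndServe("127.0.0.1:6060", nil))
        }()

        // ... start the real fasthttp server here ...
        select {}
    }

Then I should be able to look at blocking with something like
"go tool pprof http://127.0.0.1:6060/debug/pprof/block", and grab an
execution trace from /debug/pprof/trace?seconds=5 to feed into
"go tool trace".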
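For completeness, this is how I understand the "measure around the right
function" point for my case: wrap the whole fasthttp handler, so the internal
number is as close as possible to what nginx sees. A simplified sketch, where
observe() is only a placeholder for the code that records into the histogram:

    package main

    import (
        "log"
        "time"

        "github.com/valyala/fasthttp"
    )

    // timed wraps a fasthttp handler so the measured window covers the whole
    // handler and nothing before or after it.
    func timed(observe func(time.Duration), h fasthttp.RequestHandler) fasthttp.RequestHandler {
        return func(ctx *fasthttp.RequestCtx) {
            start := time.Now()
            h(ctx)
            observe(time.Since(start))
        }
    }

    func handler(ctx *fasthttp.RequestCtx) {
        ctx.SetBodyString("ok")
    }

    func main() {
        observe := func(d time.Duration) {
            // Placeholder: record d into the latency histogram here.
        }
        log.Fatal(fasthttp.ListenAndServe(":8080", timed(observe, handler)))
    }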
Why should the older requests be removed from the latency graph? As far as I
know, HdrHistogram is a good fit for high-precision measurements and for
graphs spanning several orders of magnitude. How can it help me?
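If I understood the idea at all, the point is that the reported percentiles
should describe only a recent window of requests, not every request since
the process started, otherwise old samples keep dragging the graph around.
Is something like this what you mean? A rough sketch of my understanding; it
assumes the github.com/codahale/hdrhistogram Go port and its windowed API,
and the window of 5 one-minute buckets is just an example:

    package main

    import (
        "fmt"
        "sync"
        "time"

        "github.com/codahale/hdrhistogram"
    )

    func main() {
        // Track values in microseconds between 1us and 1h with 3 significant
        // figures, spread over 5 rotating sub-histograms.
        var mu sync.Mutex
        w := hdrhistogram.NewWindowed(5, 1, int64(time.Hour/time.Microsecond), 3)

        // Rotate once a minute, so requests older than ~5 minutes stop
        // influencing the reported percentiles. The histogram is not safe
        // for concurrent use, hence the mutex.
        go func() {
            for range time.Tick(time.Minute) {
                mu.Lock()
                w.Rotate()
                mu.Unlock()
            }
        }()

        // Per request (a made-up value here):
        mu.Lock()
        _ = w.Current().RecordValue(int64(3 * time.Millisecond / time.Microsecond))
        mu.Unlock()

        // When graphing, merge the window and read the quantiles.
        mu.Lock()
        h := w.Merge()
        mu.Unlock()
        fmt.Println("p99:", time.Duration(h.ValueAtQuantile(99))*time.Microsecond)
        fmt.Println("max:", time.Duration(h.Max())*time.Microsecond)
    }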