In message <516c71bc.4000...@freebsd.org>, Alexander Motin writes:
>On 15.04.2013 23:43, Poul-Henning Kamp wrote:
>> In message <516c515a.9090...@freebsd.org>, Alexander Motin writes:
>>
>> For tuning anything on a non-ridiculous SSD device or modern
>> harddisks, it will be useless, because the bias you introduce is
>> *not* one which averages out over many operations.
>
>Could you please explain why?
>
>> The fundamental problem is that on a busy system, getbinuptime()
>> does not get called at random times; it will be heavily affected
>> by the I/O traffic, because of the interrupts, the bus traffic
>> itself, the cache effects of I/O transfers, and the context switches
>> by the processes causing the I/O.
>
>I'm sorry, but I am not sure I understand above paragraphs.

That was the exact explanation you asked for, and I'm not sure I can
find a better way to explain it, but I'll try:

Your assumption that the error will cancel out implicitly assumes
that the timestamp returned from getbinuptime() is updated at times
which are totally independent of the I/O traffic whose latency you
are trying to measure.

That is not the case.  The interrupt which updates getbinuptime()'s
cached timestamp is heavily affected by the I/O traffic, for the
various reasons I mention above.

>Sure, getbinuptime() won't allow to answer how many requests completed
>within 0.5ms, but present API doesn't allow to calculate that any way,
>providing only total/average times. And why "_5-10_ timecounter
>interrupts"?

A: Yes, it actually does:  A userland application running on a
dedicated CPU core can poll the shared-memory devstat structure at
a very high rate and get very useful information about short
latencies.  Most people don't do that, because they don't care about
the difference between 0.5 and 0.45 milliseconds.

B: To get the systematic bias down to 10-20% of the measured interval.

>> Latency distribution:
>>
>>	<5msec:		92.12 %
>>	<10msec:	 0.17 %
>>	<20msec:	 1.34 %
>>	<50msec:	 6.37 %
>>	>50msec:	 0.00 %
>
>I agree that such functionality could be interesting. The only worry is
>which buckets should be there. For modern HDDs above buckets could be
>fine.
>For high-end SSD it may go about microseconds then milliseconds. I
>have doubt that 5 buckets will be universal enough, unless separated by
>factor of 5-10.

Remember what people use this for:  Answering the question "Does my
disk subsystem suck, and if so, how much?"

Buckets like the ones proposed will tell you that.

>> The %busy crap should be killed, all it does is confuse people.
>
>I agree that it heavily lies, especially for cached writes, but at least
>it allows to make some very basic estimates.

For rotating disks:  It always lies.

For SSD:  It almost always lies.

Kill it.

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
p...@freebsd.org         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.