On Tue, 20 Aug 2013, Vitja Makarov wrote:
Description:
Recently I was playing with small socket timeouts, using setsockopt(2)
SO_RCVTIMEO, and found a problem: if the timeout is small enough,
read(2) may return before the timeout has actually expired.
I was unable to reproduce this on a Linux box.
I found that the kernel uses a timer with 1/HZ precision, so it converts
the time in microseconds to ticks. That's OK; Linux does it as well. The
problem is in the details: FreeBSD uses a floor() approach while Linux
uses ceil():
from FreeBSD's sys/kern/uipc_socket.c:
val = (u_long)(tv.tv_sec * hz) + tv.tv_usec / tick;
This is actually an off-by-2 error in most cases. ceil() isn't high enough
either, since for example with hz = 100 and tv = 25 msec, the ceil() of 3
ticks is 2 full ticks plus a fractional tick which may be only 1 nsec
long. At least with the old timeout code.
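
To make the arithmetic concrete, here is a small userland sketch of the
three roundings discussed above. The function names are made up for
illustration and none of this is kernel code. It assumes that timeouts
expire on tick boundaries and that the tick in progress when the sleep
starts may be almost over, so n ticks only guarantee n - 1 full tick
periods.

```c
#include <assert.h>

/* Microseconds per tick, like the kernel's `tick' variable. */
static long usec_per_tick(int hz)
{
    assert(hz > 0 && hz <= 1000000);
    return 1000000L / hz;
}

/* Truncating conversion, as in the quoted uipc_socket.c line. */
static long ticks_floor(long sec, long usec, int hz)
{
    return sec * hz + usec / usec_per_tick(hz);
}

/* Round up to a whole number of ticks. */
static long ticks_ceil(long sec, long usec, int hz)
{
    long tick = usec_per_tick(hz);

    return sec * hz + (usec + tick - 1) / tick;
}

/* Round up, then add 1 to cover the partially elapsed current tick. */
static long ticks_ceil_plus1(long sec, long usec, int hz)
{
    return ticks_ceil(sec, usec, hz) + 1;
}

/*
 * Guaranteed lower bound on the sleep, in usec, under the assumption
 * stated above: the first tick may be arbitrarily short.
 */
static long min_wait_usec(long ticks, int hz)
{
    return ticks > 0 ? (ticks - 1) * usec_per_tick(hz) : 0;
}
```

With hz = 100 and a 25 msec timeout these give 2, 3 and 4 ticks, with
guaranteed lower bounds of 10, 20 and 30 msec respectively. Only the
last one is never shorter than the 25 msec that was asked for.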
if (val == 0 && tv.tv_usec != 0)
val = 1; /* at least one tick if tv > 0 */
This does the ceil() in the special case where tv < 1 tick. This is a
waste of timeout, at least with the old timeout code, since
callout_reset() used to add 1. This seems to have been lost, breaking
old callers that depended on it. Current timeout code tries to be more
accurate, but that means that it is less accurate if the caller is
broken and rounds down. Maybe your bug can only be seen with the
increased accuracy.
tvtohz() should always be used to convert timevals to ticks. It rounds
up and adds 1, and handles overflow. The conversion in uipc_socket.c
isn't even short. It takes 15 lines for its own overflow handling. It
seems to check the SHRT_MAX limit twice.
If uipc_socket.c called tvtohz(), then it would still have to check
that the result fits in a short. Its error handling when it doesn't
fit seems wrong. EDOM is documented as a domain error for math
software. setsockopt() isn't math software, and EDOM isn't a documented
errno for it. EINVAL and EOVERFLOW are more usual kernel errors for
unrepresentable values.
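
As a rough illustration of what a conversion with rounding up, the
extra tick and a range check could look like, here is a userland sketch.
It is not the tvtohz() source and not a proposed patch. The function
name, the choice of EINVAL for a malformed timeval and EOVERFLOW for a
result that does not fit in a short, and the treatment of a zero
timeval as "no timeout" are all assumptions made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

/*
 * Hypothetical helper: convert (sec, usec) to ticks, rounding up and
 * adding 1, and store the result in a short.  Returns 0 on success or
 * an errno value.
 */
static int timeout_to_short_ticks(long sec, long usec, int hz, short *out)
{
    long long tick, ticks;

    assert(hz > 0 && hz <= 1000000);
    assert(out != 0);

    /* Malformed timeval. */
    if (sec < 0 || usec < 0 || usec >= 1000000)
        return EINVAL;

    /* Assumed convention: a zero timeval means "no timeout". */
    if (sec == 0 && usec == 0) {
        *out = 0;
        return 0;
    }

    /* Cheap bound first, so that sec * hz below cannot overflow. */
    if (sec > SHRT_MAX)
        return EOVERFLOW;

    tick = 1000000LL / hz;
    ticks = (long long)sec * hz + (usec + tick - 1) / tick + 1;
    if (ticks > SHRT_MAX)
        return EOVERFLOW;

    *out = (short)ticks;
    return 0;
}
```

For example, with hz = 100 a 25 msec timeout yields 4 ticks, while a
400 second timeout is rejected because 40001 ticks do not fit in a
short.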
Grepping for ' / tick' in /sys shows no other home made tvtohz()'s.
from Linux's net/core/sock.c:
*timeo_p = tv.tv_sec*HZ + (tv.tv_usec+(1000000/HZ-1))/(1000000/HZ);
The conversion is much simpler when HZ is hard-coded. Linux has some
bounds checking before this, but the error handling in at least
Linux-2.6.10 is to ignore invalid tv's and return success without
changing the timeout.
So, for instance, we have a FreeBSD system running with kern.hz set to
100 and the receive timeout set to 25ms, which is converted to 2 ticks,
i.e. 20ms. In my test program read(2) returns with EAGAIN set in
0.019ms.
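
The test program itself is not included in the report. A minimal
sketch of such a measurement might look like the following. The use of
socketpair(2) with AF_UNIX, the helper name and the use of
CLOCK_MONOTONIC are assumptions for the example, and the original test
may well have used a different socket type.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: set SO_RCVTIMEO on one end of a socketpair,
 * read() from it with no data pending, and report how long the read
 * blocked.  Returns the errno from the failing call (0 if read()
 * unexpectedly succeeded) and stores the elapsed time in usec.
 */
static int timed_read_with_rcvtimeo(long timeout_usec, long *elapsed_usec)
{
    int sv[2], err;
    struct timeval tv;
    struct timespec t0, t1;
    char buf[1];
    ssize_t n;

    assert(timeout_usec > 0 && elapsed_usec != 0);
    *elapsed_usec = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return errno;

    tv.tv_sec = timeout_usec / 1000000;
    tv.tv_usec = timeout_usec % 1000000;
    if (setsockopt(sv[0], SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) == -1) {
        err = errno;
        close(sv[0]);
        close(sv[1]);
        return err;
    }

    /* Nothing is ever written to sv[1], so read() can only time out. */
    clock_gettime(CLOCK_MONOTONIC, &t0);
    n = read(sv[0], buf, sizeof(buf));
    err = (n == -1) ? errno : 0;
    clock_gettime(CLOCK_MONOTONIC, &t1);

    close(sv[0]);
    close(sv[1]);

    *elapsed_usec = (long)(t1.tv_sec - t0.tv_sec) * 1000000L +
        (t1.tv_nsec - t0.tv_nsec) / 1000;
    return err;
}
```

Calling timed_read_with_rcvtimeo(25000, &elapsed) should return EAGAIN
(or EWOULDBLOCK). Comparing elapsed against 25000 then shows whether
the read returned early on the system under test.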
Bruce
_______________________________________________
freebsd-bugs@freebsd.org mailing list