LingaoM opened a new pull request, #19014:
URL: https://github.com/apache/nuttx/pull/19014
Summary
uart_tcdrain() has two latent bugs in its timeout handling:
1. Cancellation point leak on timeout. uart_tcdrain() enters a
cancellation point on entry (when cancelable=true) and the normal exit
path calls leave_cancellation_point(). The early return -ETIMEDOUT
from inside the FIFO-empty polling loop bypasses this, leaving
tcb->cpcount permanently incremented. Repeated timeouts desync the
counter and break subsequent pthread_cancel() /
pthread_setcancelstate() semantics for the calling thread.
2. Caller-supplied timeout does not bound the xmit-buffer drain. The
timeout argument (e.g. 10 * TICK_PER_SEC from the TCDRN ioctl
path) was only honored by the final FIFO polling loop. The earlier
xmit-buffer drain loop called nxsem_wait(&dev->xmitsem) with no
timeout, so any condition that prevents the lower half from posting
xmitsem (a stuck DMA-completion path, a wedged hardware-flow-control
stall, etc.) would block tcdrain() indefinitely regardless of the
timeout the caller asked for. The pre-existing comment ("NOTE: There
is no timeout on the following loop. ... the caller should call
tcflush() first") openly acknowledged this hang.
This series fixes both:
- Patch 1 moves the missing leave_cancellation_point() onto the
timeout return path so cancellation-point nesting stays balanced.
- Patch 2 takes a single timestamp on entry, replaces the bare
nxsem_wait() with nxsem_tickwait() using the remaining time, and
short-circuits to -ETIMEDOUT when the caller's budget is already
exhausted. The caller's timeout now bounds the total time spent in
uart_tcdrain() regardless of which phase stalls.
Impact
- Users / applications: tcdrain(fd) (and the kernel-internal
tcsendbreak path that calls uart_tcdrain with a 4 s budget) are now
guaranteed to return within the caller-supplied timeout, even when the
lower half is stuck. Previously they could block forever and required
an external tcflush() from another thread to recover.
- Behavior change: Code that relied on tcdrain() blocking
indefinitely under hardware-flow-control stalls will now see
-ETIMEDOUT after the timeout elapses (10 s on the standard
TCDRN ioctl path). This matches the long-standing intent of the
timeout parameter; the existing kludge comment in the source already
treats the indefinite-block behavior as a bug.
- Lower-half drivers: No API changes; no changes required in any
arch/board UART driver.
- Build / size / performance: Negligible. Two extra lines on the
timeout exit, one extra clock_systime_ticks() read on entry, one
cheap subtraction per inner-loop iteration.
- Security / compatibility / documentation: No security impact.
Source comments updated to reflect that the inner loop is now bounded.
No documentation, header, or config changes required.
---
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]