[PR] Fix/uart tcdrain bugs [nuttx]

via GitHub Mon, 01 Jun 2026 21:13:33 -0700


LingaoM opened a new pull request, #19014:
URL: https://github.com/apache/nuttx/pull/19014


   Summary
     
     uart_tcdrain() has two latent bugs in its timeout handling:
   
     1. Cancellation point leak on timeout.  uart_tcdrain() enters a
     cancellation point on entry (when cancelable=true) and the normal exit
     path calls leave_cancellation_point().  The early return -ETIMEDOUT
     from inside the FIFO-empty polling loop bypasses this, leaving
     tcb->cpcount permanently incremented.  Repeated timeouts desync the
     counter and break subsequent pthread_cancel() /
     pthread_setcancelstate() semantics for the calling thread.
     2. Caller-supplied timeout does not bound the xmit-buffer drain.  The
     timeout argument (e.g. 10 * TICK_PER_SEC from the TCDRN ioctl
     path) was only honored by the final FIFO polling loop.  The earlier
     xmit-buffer drain loop called nxsem_wait(&dev->xmitsem) with no
     timeout, so any condition that prevents the lower half from posting
     xmitsem (a stuck DMA-completion path, a wedged hardware-flow-control
     stall, etc.) would block tcdrain() indefinitely regardless of the
     timeout the caller asked for.  The pre-existing comment ("NOTE: There
     is no timeout on the following loop. ... the caller should call
     tcflush() first") openly acknowledged this hang.
   
     This series fixes both:
   
     - Patch 1 moves the missing leave_cancellation_point() onto the
     timeout return path so cancellation-point nesting stays balanced.
     - Patch 2 takes a single timestamp on entry, replaces the bare
     nxsem_wait() with nxsem_tickwait() using the remaining time, and
     short-circuits to -ETIMEDOUT when the caller's budget is already
     exhausted.  The caller's timeout now bounds the total time spent in
     uart_tcdrain() regardless of which phase stalls.
   
     Impact
   
     - Users / applications:  tcdrain(fd) (and the kernel-internal
     tcsendbreak path that calls uart_tcdrain with a 4 s budget) are now
     guaranteed to return within the caller-supplied timeout, even when the
     lower half is stuck.  Previously they could block forever and required
     an external tcflush() from another thread to recover.
     - Behavior change:  Code that relied on tcdrain() blocking
     indefinitely under hardware-flow-control stalls will now see
     -ETIMEDOUT after the timeout elapses (10 s on the standard
     TCDRN ioctl path).  This matches the long-standing intent of the
     timeout parameter; the existing kludge comment in the source already
     treats the indefinite-block behavior as a bug.
     - Lower-half drivers:  No API changes; no changes required in any
     arch/board UART driver.
     - Build / size / performance:  Negligible.  Two extra lines on the
     timeout exit, one extra clock_systime_ticks() read on entry, one
     cheap subtraction per inner-loop iteration.
     - Security / compatibility / documentation:  No security impact.
     Source comments updated to reflect that the inner loop is now bounded.
     No documentation, header, or config changes required.
   
     ---


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[PR] Fix/uart tcdrain bugs [nuttx]

Reply via email to