Author: jimharris
Date: Fri Oct 30 16:06:34 2015
New Revision: 290198
URL: https://svnweb.freebsd.org/changeset/base/290198

Log:
  nvme: fix race condition in split bio completion path
  
  Fixes race condition observed under following circumstances:
  
  1) I/O split on 128KB boundary with Intel NVMe controller.
     Current Intel controllers produce better latency when
     I/Os do not span a 128KB boundary - even if the I/O size
     itself is less than 128KB.
  2) Per-CPU I/O queues are enabled.
  3) Child I/Os are submitted on different submission queues.
  4) Interrupts for child I/O completions occur almost
     simultaneously.
  5) ithread for child I/O A increments bio_inbed, then
     immediately is preempted (rendezvous IPI, higher priority
     interrupt).
  6) ithread for child I/O B increments bio_inbed, then completes
     parent bio since all children are now completed.
  7) parent bio is freed, and immediately reallocated for a VFS
     or gpart bio (including setting bio_children to 1 and
     clearing bio_driver1).
  8) ithread for child I/O A resumes processing.  bio_children
     for what it thinks is the parent bio is set to 1, so it
     thinks it needs to complete the parent bio.
  
  Result is either calling a NULL callback function, or double freeing
  the bio to its uma zone.
  
  PR:           203746
  Reported by:  Drew Gallatin <galla...@netflix.com>,
                Marc Goroff <mgor...@quorum.net>
  Tested by:    Drew Gallatin <galla...@netflix.com>
  MFC after:    3 days
  Sponsored by: Intel

Modified:
  head/sys/dev/nvme/nvme_ns.c

Modified: head/sys/dev/nvme/nvme_ns.c
==============================================================================
--- head/sys/dev/nvme/nvme_ns.c Fri Oct 30 15:52:10 2015        (r290197)
+++ head/sys/dev/nvme/nvme_ns.c Fri Oct 30 16:06:34 2015        (r290198)
@@ -239,7 +239,7 @@ static void
 nvme_bio_child_inbed(struct bio *parent, int bio_error)
 {
        struct nvme_completion  parent_cpl;
-       int                     inbed;
+       int                     children, inbed;
 
        if (bio_error != 0) {
                parent->bio_flags |= BIO_ERROR;
@@ -248,10 +248,13 @@ nvme_bio_child_inbed(struct bio *parent,
 
        /*
         * atomic_fetchadd will return value before adding 1, so we still
-        *  must add 1 to get the updated inbed number.
+        *  must add 1 to get the updated inbed number.  Save bio_children
+        *  before incrementing to guard against race conditions when
+        *  two children bios complete on different queues.
         */
+       children = atomic_load_acq_int(&parent->bio_children);
        inbed = atomic_fetchadd_int(&parent->bio_inbed, 1) + 1;
-       if (inbed == parent->bio_children) {
+       if (inbed == children) {
                bzero(&parent_cpl, sizeof(parent_cpl));
                if (parent->bio_flags & BIO_ERROR)
                        parent_cpl.status.sc = NVME_SC_DATA_TRANSFER_ERROR;
_______________________________________________
svn-src-all@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"

Reply via email to