On Fri, 22 Jan 2010, Mikolaj Golub wrote:


We have nonempty nm_bufq, nm_bufqiods = 1, but actually there is no nfsiod
thread run for this mount, which is wrong -- nm_bufq will not be emptied until
some other process starts writing to the nfsmount and starts nfsiod thread for
this mount.

Reviewing the code how it could happen I see the following path. Could someone
confirm or disprove me?

in nfs_bio.c:nfs_asyncio() we have:

  1363         mtx_lock(&nfs_iod_mtx);
...
  1374         /*
  1375          * Find a free iod to process this request.
  1376          */
  1377         for (iod = 0; iod < nfs_numasync; iod++)
  1378                 if (nfs_iodwant[iod]) {
  1379                         gotiod = TRUE;
  1380                         break;
  1381                 }
  1382
  1383         /*
  1384          * Try to create one if none are free.
  1385          */
  1386         if (!gotiod) {
  1387                 iod = nfs_nfsiodnew();
  1388                 if (iod != -1)
  1389                         gotiod = TRUE;
  1390         }

Let's consider situation when new nfsiod is created.

nfs_nfsiod.c:nfs_nfsiodnew() before creating nfssvc_iod thread unlocks 
nfs_iod_mtx:

   179         mtx_unlock(&nfs_iod_mtx);
   180         error = kthread_create(nfssvc_iod, nfs_asyncdaemon + i, NULL, 
RFHIGHPID,
   181             0, "nfsiod %d", newiod);
   182         mtx_lock(&nfs_iod_mtx);


And  nfs_nfsiod.c:nfssvc_iod() do the followin:

   226         mtx_lock(&nfs_iod_mtx);
...
   238                 nfs_iodwant[myiod] = curthread->td_proc;
   239                 nfs_iodmount[myiod] = NULL;
...
   244                 error = msleep(&nfs_iodwant[myiod], &nfs_iod_mtx, PWAIT 
| PCATCH,
   245                     "-", timo);

Let's at this moment another nfs_asyncio() request for another nfsmount has
happened and this thread has locked nfs_iod_mtx. Then this thread will found
nfs_iodwant[iod] in "for" loop and will use it. When the first thread actually
has returned from nfs_nfsiodnew() it will insert buffer to nmp->nm_bufq but
nfsiod will process other nmp.


Ok, good catch, I think you've found the problem (or at least a race
that might have caused it).

It looks like the fix for this situation would be to check nfs_iodwant[iod]
after nfs_nfsiodnew():

--- nfs_bio.c.orig      2010-01-22 15:38:02.000000000 +0000
+++ nfs_bio.c   2010-01-22 15:39:58.000000000 +0000
@@ -1385,7 +1385,7 @@ again:
        */
       if (!gotiod) {
               iod = nfs_nfsiodnew();
-               if (iod != -1)
+               if ((iod != -1) && (nfs_iodwant[iod] == NULL))
                       gotiod = TRUE;
       }


Unfortunately, I don't think the above fixes the problem.
If another thread that called nfs_asyncio() has "stolen" the this "iod",
it will have set nfs_iodwant[iod] == NULL (set non-NULL at #238)
and it will remain NULL until the other thread is done with it.

If you instead make it:
                if (iod != -1 && nfs_iodwant[iod] != NULL)
                        gotiod = TRUE;
then I think it fixes your scenario above, but will break for the
case where the mtx_lock(&nfs_iod_mtx) call in nfs_nfsnewiod() (#182) wins
out over the one near the beginning of nfssvc_iod() (#226), since in that
case, nfs_iodwant[iod] will still be NULL because it hasn't yet been
set by nfssvc_iod() (#238).

There should probably be some sort of 3 way handshake between
the code in nfs_asyncio() after calling nfs_nfsnewiod() and the
code near the beginning of nfssvc_iod(), but I think the following
somewhat cheesy fix might do the trick:

        if (!gotiod) {
                iod = nfs_nfsiodnew();
                if (iod != -1) {
                        if (nfs_iodwant[iod] == NULL) {
                                /*
                                 * Either another thread has acquired this
                                 * iod or I acquired the nfs_iod_mtx mutex
                                 * before the new iod thread did in
                                 * nfssvc_iod(). To be safe, go back and
                                 * try again after allowing another thread
                                 * to acquire the nfs_iod_mtx mutex.
                                 */
                                mtx_unlock(&nfs_iod_mtx);
                                /*
                                 * So long as mtx_lock() implements some
                                 * sort of fairness, nfssvc_iod() should
                                 * get nfs_iod_mtx here and set
                                 * nfs_iodwant[iod] != NULL for the case
                                 * where the iod has not been "stolen" by
                                 * another thread for a different mount
                                 * point.
                                 */
                                mtx_lock(&nfs_iod_mtx);
                                goto again;
                        }
                        gotiod = TRUE;
                }
        }

Does anyone else have a better solution?
(Mikolaj, could you by any chance test this? You can test yours, but I
think it breaks.)

rick

_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"

Reply via email to