On Sat, 8 May 2010, Ulrich Sp?rlein wrote:

On Sat, 08.05.2010 at 18:00:50 +0200, Attilio Rao wrote:
2010/5/8 Ulrich Sp?rlein <u...@spoerlein.net>:
On Sat, 08.05.2010 at 12:20:05 +0200, Ulrich Sp?rlein wrote:
This LOR also is not yet listed on the LOR page, so I guess it's rather
new. I do use SUJ.

lock order reversal:
 1st 0xc48388d8 ufs (ufs) @ /usr/src/sys/kern/vfs_lookup.c:502
 2nd 0xec0fe304 bufwait (bufwait) @ /usr/src/sys/ufs/ffs/ffs_softdep.c:11363
 3rd 0xc49e56b8 ufs (ufs) @ /usr/src/sys/kern/vfs_subr.c:2091
KDB: stack backtrace:
db_trace_self_wrapper(c09394fe,fb817308,c062e515,c061e8ab,c093c4f1,...) at 
db_trace_self_wrapper+0x26
kdb_backtrace(c061e8ab,c093c4f1,c418b168,c418ef28,fb817364,...) at 
kdb_backtrace+0x29
_witness_debugger(c093c4f1,c49e56b8,c092e785,c418ef28,c094369d,...) at 
_witness_debugger+0x25
witness_checkorder(c49e56b8,9,c094369d,82b,0,...) at witness_checkorder+0x839
__lockmgr_args(c49e56b8,80100,c49e56d8,0,0,...) at __lockmgr_args+0x7f9
ffs_lock(fb817488,c062e2bb,c0942b3f,80100,c49e5660,...) at ffs_lock+0x82
VOP_LOCK1_APV(c09bd600,fb817488,c4827cd4,c09d62a0,c49e5660,...) at 
VOP_LOCK1_APV+0xb5
_vn_lock(c49e5660,80100,c094369d,82b,4,...) at _vn_lock+0x5e
vget(c49e5660,80100,c4827c30,50,0,...) at vget+0xb9
vfs_hash_get(c47bea20,b803,80000,c4827c30,fb8175d8,...) at vfs_hash_get+0xe6
ffs_vgetf(c47bea20,b803,80000,fb8175d8,1,...) at ffs_vgetf+0x49
softdep_sync_metadata(c4838880,0,c0962957,144,0,...) at 
softdep_sync_metadata+0xc82
ffs_syncvnode(c4838880,1,c4827c30,fb817698,246,...) at ffs_syncvnode+0x3e2
ffs_truncate(c4838880,200,0,880,c41fb480,...) at ffs_truncate+0x862
ufs_direnter(c4838880,c49e5660,fb81794c,fb817bd4,0,...) at ufs_direnter+0x8d4
ufs_makeinode(fb817bd4,0,fb817b30,fb817a94,c08e4cf5,...) at ufs_makeinode+0x517
ufs_create(fb817b30,fb817b48,0,0,fb817ba8,...) at ufs_create+0x30
VOP_CREATE_APV(c09bd600,fb817b30,2,fb817ac0,0,...) at VOP_CREATE_APV+0xa5
vn_open_cred(fb817ba8,fb817c5c,1a4,0,c41fb480,...) at vn_open_cred+0x1de
vn_open(fb817ba8,fb817c5c,1a4,c47e2428,0,...) at vn_open+0x3b
kern_openat(c4827c30,ffffff9c,804c5e8,0,602,...) at kern_openat+0x125
kern_open(c4827c30,804c5e8,0,601,21b6,...) at kern_open+0x35
open(c4827c30,fb817cf8,c0972725,c091f062,c47ea2a8,...) at open+0x30
syscall(fb817d38) at syscall+0x220
Xint0x80_syscall() at Xint0x80_syscall+0x20
--- syscall (5, FreeBSD ELF32, open), eip = 0x2817bf33, esp = 0xbfbfec4c, ebp = 
0xbfbfecb8 ---

And now the system is hanging again. While I can still ping and receive
dmesg updates (eg. USB ports appearing), I/O is frozen solid. This is
during portupgrade, when the configure script runs and usually takes 1-2
minutes to provoke.

This part looks supsicious to me:

db> show alllocks
Process 28014 (mkdir) thread 0xc691ac30 (100152)
exclusive lockmgr bufwait (bufwait) r = 0 (0xec2bdaf0) locked @ 
/usr/src/sys/ufs/ffs/ffs_softdep.c:10684
exclusive lockmgr ufs (ufs) r = 0 (0xc6bcd5a8) locked @ 
/usr/src/sys/kern/vfs_subr.c:2091
exclusive lockmgr bufwait (bufwait) r = 0 (0xec2983f4) locked @ 
/usr/src/sys/ufs/ffs/ffs_softdep.c:11363
exclusive lockmgr ufs (ufs) r = 0 (0xc6d976b8) locked @ 
/usr/src/sys/kern/vfs_lookup.c:502
Process 1990 (sshd) thread 0xc5462750 (100117)
exclusive sx so_rcv_sx (so_rcv_sx) r = 0 (0xc546e08c) locked @ 
/usr/src/sys/kern/uipc_sockbuf.c:148
Process 12 (intr) thread 0xc41f4750 (100004)
exclusive sleep mutex ttymtx (ttymtx) r = 0 (0xc425ae04) locked @ 
/usr/src/sys/dev/dcons/dcons_os.c:232
db>

Along with show alllocks may you also get the following from DDB:
ps, show pcpu, alltrace, lockedvnods.

1. a kernel before SUJ went in is running fine with SU only
2. the following is on a recent -CURRENT that has SUJ, *but* i've
disabled it, so it is running with soft-updates only (I hope)

I ran a portupgrade and the first configure script triggered the I/O
hang

db> ps
 pid  ppid  pgrp   uid   state   wmesg     wchan    cmd
13467 13444 12937     0  R+                          mkdir
13444 13204 12937     0  S+      wait     0xc54352a8 sh
13204 13035 12937     0  S+      wait     0xc5436000 sh
13035 12937 12937     0  S+      wait     0xc4ffad48 sh
12937 12936 12937     0  Ss+     wait     0xc4ff9d48 make
12936  3722  3722     0  R+                          script
3722  2021  3722     0  S+      (threaded)          ruby18
100132                   S       wait     0xc4ffa7f8 ruby18
2404  2007  2404  1000  Ss+     ttyin    0xc4d74870 zsh
2325  2015  2325  1000  R+                          top
2021  2009  2021     0  S+      pause    0xc4ff9058 csh
2015  2007  2015  1000  Ss+     pause    0xc4ffa058 zsh
2009  2007  2009  1000  Ss+     pause    0xc4d4e850 zsh
2007  2006  2007  1000  Rs                          screen
2006  1991  2006  1000  R+                          screen
2005  2001  2005     0  R+                          systat
2001  1976  2001     0  S+      pause    0xc3d52058 csh
2000     1  2000     0  Ss      select   0xc3d5b1a4 ssh-agent
1991  1990  1991  1000  Ss+     pause    0xc3d52850 zsh
1990  1986  1986  1000  S       select   0xc3d5aca4 sshd
1989     1  1989  1000  Ss      select   0xc3d5ae24 ssh-agent
1986  1884  1986     0  Ss      sbwait   0xc4f2058c sshd
1985     1  1985     0  Ss+     ttyin    0xc3e62670 getty
1984     1  1984     0  Ss+     ttyin    0xc3e61c70 getty
1983     1  1983     0  Ss+     ttyin    0xc3e60070 getty
1982     1  1982     0  Ss+     ttyin    0xc3e60a70 getty
1981     1  1981     0  Ss+     ttyin    0xc3da7a70 getty
1980     1  1980     0  Ss+     ttyin    0xc3da7c70 getty
1979     1  1979     0  Ss+     ttyin    0xc3e60c70 getty
1976     1  1976     0  Ss+     wait     0xc4c392a8 login
1955     1  1955     0  Rs                          bsnmpd
1931     1  1931     0  Ss      select   0xc4d0f164 inetd
1890     1  1890     0  Rs                          cron
1884     1  1884     0  Ss      select   0xc4d0eda4 sshd
1808     1  1808     0  Rs                          sendmail
1764     1  1764     0  Rs                          moused
1754     1  1754     0  Ss      (threaded)          nslcd
100111                   S       accept   0xc4d076ae nslcd
100110                   S       accept   0xc4d076ae nslcd
100109                   S       accept   0xc4d076ae nslcd
100108                   S       accept   0xc4d076ae nslcd
100107                   S       accept   0xc4d076ae nslcd
100076                   S       uwait    0xc4493180 nslcd
1748     1  1748     0  Rs                          openvpn
1724     1  1724   201  Ss      accept   0xc4d07b82 privoxy
1672     1  1672     0  Rs                          powerd
1664     1  1664     0  Rs                          ntpd
1658     1  1658     0  Ss      (threaded)          nscd
100094                   S       kqread   0xc4cf1700 nscd
100093                   S       kqread   0xc4cf1700 nscd
100092                   S       kqread   0xc4cf1700 nscd
100091                   S       kqread   0xc4cf1700 nscd
100090                   S       kqread   0xc4cf1700 nscd
100089                   S       kqread   0xc4cf1700 nscd
100088                   S       kqread   0xc4cf1700 nscd
100087                   S       kqread   0xc4cf1700 nscd
100075                   S       uwait    0xc449f680 nscd
1576     1  1576   389  Ss      (threaded)          slapd
100115                   S       ucond    0xc4cf2700 slapd
100086                   S       ucond    0xc4cf2600 slapd
100085                   S       select   0xc4d0eea4 slapd
100073                   S       uwait    0xc42a9580 slapd
1563     1  1562     0  S       nanslp   0xc09f3864 smartd
1526     1  1526     0  Rs                          rpc.statd
1445     1  1445     0  Rs                          amd
1418     1  1418     0  Rs                          rpcbind
1403     1  1403    53  Ss      (threaded)          named
100084                   S       kqread   0xc4cdcb80 named
100083                   S       ucond    0xc4260b00 named
100082                   S       ucond    0xc449fb80 named
100074                   S       sigwait  0xfb4b8be0 named
1306     1  1306     0  Rs                          syslogd
 997     0     0     0  RL                          [pfpurge]
 973     1   973     0  Ss      select   0xc41eb0a4 devd
 964     1   964    65  Ss      select   0xc44ac524 dhclient
 942     1   942     0  Ss      select   0xc426f424 dhclient
 856     1   856     0  Ss      select   0xc41eb424 moused
 561     1   561     0  Ss      select   0xc44acde4 wpa_supplicant
 150     1   150     0  Ss      pause    0xc42ec850 adjkerntz
  95     0     0     0  DL      mdwait   0xc426a000 [md0]
  74     0     0     0  DL      geli:w   0xc4265400 [g_eli[0] ad0s4f]
  22     0     0     0  RL                          [softdepflush]
  21     0     0     0  RL                          [syncer]
  20     0     0     0  RL                          [vnlru]
  19     0     0     0  RL                          [bufdaemon]
  18     0     0     0  RL                          [pagezero]
  17     0     0     0  DL      psleep   0xc0b769d8 [vmdaemon]
  16     0     0     0  RL                          [pagedaemon]
   9     0     0     0  DL      ccb_scan 0xc09d78d4 [xpt_thrd]
  15     0     0     0  DL      tzpoll   0xc09dafcc [acpi_thermal]
   8     0     0     0  SL      -        0xc3f55000 [fw0_probe]
   7     0     0     0  DL      -        0xc3ebd3bc [cbb0 event thread]
  14     0     0     0  DL      (threaded)          [usb]
100040                   D       -        0xc3f39d0c [usbus3]
100039                   D       -        0xc3f39cdc [usbus3]
100038                   D       -        0xc3f39cac [usbus3]
100037                   D       -        0xc3f39c7c [usbus3]
100036                   D       -        0xc3f1fdac [usbus2]
100035                   D       -        0xc3f1fd7c [usbus2]
100034                   D       -        0xc3f1fd4c [usbus2]
100033                   D       -        0xc3f1fd1c [usbus2]
100032                   D       -        0xc3f0ddac [usbus1]
100031                   D       -        0xc3f0dd7c [usbus1]
100030                   D       -        0xc3f0dd4c [usbus1]
100029                   D       -        0xc3f0dd1c [usbus1]
100028                   D       -        0xc3ef7dac [usbus0]
100027                   D       -        0xc3ef7d7c [usbus0]
100026                   D       -        0xc3ef7d4c [usbus0]
100025                   D       -        0xc3ef7d1c [usbus0]
   6     0     0     0  DL      crypto_r 0xc0b7446c [crypto returns]
   5     0     0     0  DL      crypto_w 0xc0b74448 [crypto]
  13     0     0     0  DL      -        0xc09f36c4 [yarrow]
   4     0     0     0  DL      -        0xc09f13e4 [g_down]
   3     0     0     0  DL      -        0xc09f13e0 [g_up]
   2     0     0     0  RL                          [g_event]
  12     0     0     0  RL      (threaded)          [intr]
100049                   I                           [irq12: psm0]
100048                   I                           [irq1: atkbd0]
100046                   I                           [irq15: ata1]
100045                   I                           [irq14: ata0]
100024                   I                           [irq11: cbb0 bfe0++*]
100023                   I                           [irq9: pcm0 iwi0+]
100022                   I                           [swi6: Giant taskq]
100020                   I                           [swi5: +]
100019                   I                           [swi2: cambio]
100014                   I                           [swi6: task queue]
100006                   I                           [swi3: vm]
100005                   Run     CPU 0               [swi4: clock]
100004                   I                           [swi1: netisr 0]
  11     0     0     0  RL                          [idle: cpu0]
   1     0     1     0  SLs     wait     0xc3df2d48 [init]
  10     0     0     0  DL      audit_wo 0xc0b747e0 [audit]
   0     0     0     0  RLs     (threaded)          [kernel]
100050                   D       deadlkre 0xc09f36c4 [deadlkres]
100044                   D       -        0xc3f5db80 [iwi0 taskq]
100042                   D       -        0xc3f5e8c0 [fw0_taskq]
100021                   D       -        0xc3ea08c0 [thread taskq]
100018                   D       -        0xc3ea0a80 [kqueue taskq]
100017                   D       -        0xc3ea0c40 [acpi_task_2]
100016                   D       -        0xc3ea0c40 [acpi_task_1]
100015                   D       -        0xc3ea0c40 [acpi_task_0]
100010                   D       -        0xc3dda9c0 [firmware taskq]
100000                   RunQ                        [swapper]
1999  1976  1976     0  Z+                          ssh-agent
1988  1986  1986  1000  Z                           ssh-agent
db> show alllocks
Process 13467 (mkdir) thread 0xc543b270 (100150)
exclusive lockmgr bufwait (bufwait) r = 0 (0xebd1f8ec) locked @ 
/usr/src/sys/kern/vfs_bio.c:2581
exclusive lockmgr ufs (ufs) r = 0 (0xc542fc08) locked @ 
/usr/src/sys/kern/vfs_subr.c:2091
exclusive lockmgr bufwait (bufwait) r = 0 (0xebd15194) locked @ 
/usr/src/sys/ufs/ffs/ffs_softdep.c:11363
exclusive lockmgr ufs (ufs) r = 0 (0xc54279e8) locked @ 
/usr/src/sys/kern/vfs_lookup.c:502
Process 1986 (sshd) thread 0xc4d2ec30 (100113)
exclusive sx so_rcv_sx (so_rcv_sx) r = 0 (0xc4f20560) locked @ 
/usr/src/sys/kern/uipc_sockbuf.c:148
Process 22 (softdepflush) thread 0xc428f750 (100058)
exclusive sleep mutex struct mount mtx (struct mount mtx) r = 0 (0xc42a3a20) 
locked @ /usr/src/sys/kern/vfs_subr.c:341
Process 12 (intr) thread 0xc3df44e0 (100005)
exclusive sleep mutex ttymtx (ttymtx) r = 0 (0xc3e63604) locked @ 
/usr/src/sys/dev/dcons/dcons_os.c:232
db> show pcpu
cpuid        = 0
dynamic pcpu    = 0x64e400
curthread    = 0xc3df44e0: pid 12 "swi4: clock"
curpcb       = 0xc3bfbd90
fpcurthread  = none
idlethread   = 0xc3df49c0: pid 11 "idle: cpu0"
APIC ID      = 0
currentldt   = 0x50
spin locks held:
db> show lockedvnods
Locked vnodes

0xc5427990: tag ufs, type VDIR
   usecount 5, writecount 0, refcount 7 mountedhere 0
   flags ()
   v_object 0xc54224c8 ref 0 pages 0
   lock type ufs: EXCL by thread 0xc543b270 (pid 13467)
       ino 11375831, on dev ad0s4f.eli

0xc542fbb0: tag ufs, type VREG
   usecount 2, writecount 0, refcount 5 mountedhere 0
   flags ()
   v_object 0xc5422880 ref 0 pages 7
   lock type ufs: EXCL by thread 0xc543b270 (pid 13467)
       ino 11375884, on dev ad0s4f.eli
db> alltrace

Tracing command mkdir pid 13467 tid 100150 td 0xc543b270
sched_switch(c543b270,0,104,191,f614b28d,...) at sched_switch+0x36a
mi_switch(104,0,c093a853,1eb,4c,...) at mi_switch+0x200
sleepq_switch(c543b270,0,c093a853,260,0,...) at sleepq_switch+0x15f
sleepq_wait(ebd1f88c,4c,c0941795,0,0,...) at sleepq_wait+0x63
_sleep(ebd1f88c,c3da14ac,4c,c0941795,0,...) at _sleep+0x365
bwait(ebd1f88c,4c,c0941795,ebd1f88c,fb5d5584,...) at bwait+0x6f
bufwait(ebd1f88c,ebd1f88c,0,ebd1f88c,c54165e4,...) at bufwait+0x48
bufwrite(ebd1f88c,0,c09625e3,74d,0) at bufwrite+0x165
ffs_bufwrite(ebd1f88c,c542d500,100,4000,0,...) at ffs_bufwrite+0x290
ffs_update(c542fbb0,1,80000,fb5d567c,1,...) at ffs_update+0x27c
softdep_sync_metadata(c5427990,0,c0962b76,144,0,...) at 
softdep_sync_metadata+0xc9f
ffs_syncvnode(c5427990,1,c543b270,fb5d573c,246,...) at ffs_syncvnode+0x3e2
ffs_truncate(c5427990,400,0,880,c4c4f680,...) at ffs_truncate+0x862
ufs_direnter(c5427990,c542f660,fb5d5a1c,fb5d5c00,ebd2a930,...) at 
ufs_direnter+0x8d4
ufs_mkdir(fb5d5c28,fb5d5c3c,0,fb5d5bd4,fb5d5b6c,...) at ufs_mkdir+0x81f
VOP_MKDIR_APV(c09bd840,fb5d5c28,68,fb5d5bc0,0,...) at VOP_MKDIR_APV+0xa5
kern_mkdirat(c543b270,ffffff9c,bfbfeaa2,0,1ff,...) at kern_mkdirat+0x211
kern_mkdir(c543b270,bfbfeaa2,0,1ff,fb5d5d2c,...) at kern_mkdir+0x2e
mkdir(c543b270,fb5d5cf8,c,c,c5434aa0,...) at mkdir+0x29
syscall(fb5d5d38) at syscall+0x220
Xint0x80_syscall() at Xint0x80_syscall+0x20
--- syscall (136, FreeBSD ELF32, mkdir), eip = 0x28178ef3, esp = 0xbfbfe7ec, 
ebp = 0xbfbfe8b8 ---

Tracing command sh pid 13444 tid 100147 td 0xc4fff9c0
sched_switch(c4fff9c0,0,104,191,f5d80613,...) at sched_switch+0x36a
mi_switch(104,0,c093a853,1eb,5c,...) at mi_switch+0x200
sleepq_switch(c4fff9c0,0,c093a853,1a0,5c,...) at sleepq_switch+0x15f
sleepq_catch_signals(c093a853,160,0,100,100,...) at sleepq_catch_signals+0xb7
sleepq_wait_sig(c54352a8,5c,c093d112,100,0,...) at sleepq_wait_sig+0x17
_sleep(c54352a8,c5435330,15c,c093d112,0,...) at _sleep+0x34e
kern_wait(c4fff9c0,ffffffff,fb5ccc74,2,0,...) at kern_wait+0xb76
wait4(c4fff9c0,fb5cccf8,c0972944,c093d0f1,c54352a8,...) at wait4+0x3d
syscall(fb5ccd38) at syscall+0x220
Xint0x80_syscall() at Xint0x80_syscall+0x20
--- syscall (7, FreeBSD ELF32, wait4), eip = 0x281681eb, esp = 0xbfbfe5bc, ebp 
= 0xbfbfe5d8 ---


<snip>
I hope you don't really need all of the traces, I'll keep the system at
the DDB prompt for now, so I can do other stuff.

I really need any trace that is involved in vfs. Is this machine hooked to a serial console for ddb so you can pipe output to a file with script(1)? I understand the paths that hold the locks but I don't understand who is deadlocked against it. It looks like the thread that is holding the locks is just waiting on io to complete. So something that completes IO is hung.

The earlier deadlock you reported should have actually been there for years. It only happens when a directory is truncated though which is infrequent so it may not have been reported.

Thanks,
Jeff


Regards,
Uli
_______________________________________________
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
_______________________________________________
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"

Reply via email to