>Number:         178997
>Category:       kern
>Synopsis:       Heavy disk I/O may hang system
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sun May 26 20:00:00 UTC 2013
>Closed-Date:
>Last-Modified:
>Originator:     Klaus Weber
>Release:        Tested with 9/STABLE, verified in 10-CURRENT
>Organization:
-
>Environment:
FreeBSD filepile 9.1-STABLE FreeBSD 9.1-STABLE #5 r250475: Sun May 12 19:14:21 
CEST 2013     root@filepile:/usr/obj/usr/src/sys/FILEPILE  amd64
(Kernel config has some drivers removed and debugging options added. Tested 
with GENERIC as well)
>Description:
During heavy disk I/O (two bonnie++ processes working on the same disk 
simultaneously) causes an extreme degradation in disk throughput (combined 
throughput as observed in iostat drops to ~1-3 MB/sec). The problem shows when 
both bonnie++ processes are in the "Rewriting..." phase. 

Sometimes, the system "hangs" (throughput reaches 0 MB/sec); in that case, the 
system does not recover from this state. "Hang" means that the system responds 
to ping etc., and processes that are already running continue, but every 
command that requires disk access (even on disks other than the disk being 
tested) just hangs. If shutdown can be started without disk access (i.e. the 
binaries are still in cache), the system begins the shutdown, but fails to sync 
all buffers.

Most testing was done on 9/STABLE and an Areca ARC-1882 controller (arcmsr), 
but I have test-booted a 10.0-CURRENT kernel from early May 
("kernel-10.0-CURRENT-5.5.2013") and verified that the problem still exists 
there. I have also tested with an on-board SATA port ("Intel Cougar Point 
AHCI SATA controller"); with this setup, I can only observe extreme throughput 
degradation, but no hangs (see below for details).

Test-setup:
da0 is a 600 GB volume on the Areca controller:
da0 at arcmsr0 bus 0 scbus0 target 0 lun 0
da0: <Areca ARC-1882-VOL#000 R001> Fixed Direct Access SCSI-5 device 
da0: 600.000MB/s transfers
da0: Command Queueing enabled
da0: 572202MB (1171870720 512 byte sectors: 255H 63S/T 72945C)

gpart create -s gpt da0
gpart add -s 500G -t freebsd-ufs -l test0 -a 1M da0
newfs -b 64k -f 8k /dev/da0p1
mount /dev/da0p1 /mnt && mkdir /mnt/t{1,2} && chmod g+w /mnt/t{1,2}

To observe the problem, I start two bonnie++ instances (installed from ports) 
like this:
1) bonnie++ -s 64g -n 0 -d /mnt/t1
2) bonnie++ -s 64g -n 0 -d /mnt/t2

(In order to better see the various stages, I start the second bonnie++ about a 
minute after the first. There is nothing magic about the 64 GB; the size just 
needs to be large enough to keep the disk busy for a few minutes.)

I then observe disk throughput for da0 with 

iostat -dC -w 1 da0

             da0             cpu
  KB/t tps  MB/s  us ni sy in id
[... first bonnie++ starts, "writing 1 byte at a time" phase ...]
 64.00   6  0.37   0  0 12  0 88
 64.00   4  0.25   0  0 13  0 87
 64.00   6  0.37   0  0 12  0 88
[first bonnie++ enters "writing intelligently" phase]
 63.97 3847 240.33   0  0 12  0 87
 64.00 20824 1301.51   0  0 14  0 85
 64.00 20449 1278.04   0  0 14  1 85
 64.00 17840 1115.01   0  0 14  1 85
 64.00 16595 1037.21   0  0 13  1 85
[this is my baseline for upper limit on throughput: ~1 GB/sec]
[...second bonnie++ starts, no big changes]
 64.00 12869 804.30   0  0 15  1 84
[second bonnie++ gets to "Writing intelligently" phase]
 64.00 7773 485.78   0  0  9  0 90
 64.00 7310 456.86   0  0  7  1 92
 64.00 7430 464.40   0  0 10  0 89
 64.00 7151 446.93   0  0  6  1 93
 64.00 7410 463.10   0  0  8  0 91
[...]
 64.00 7349 459.29   0  0 10  0 90
 64.00 7478 467.35   0  0  7  0 93
 64.00 7153 447.06   0  0  9  0 90
 64.00 7335 458.41   0  0  7  0 93
 64.00 7918 494.88   0  0 10  0 90
[first bonnie++ gets to "Rewriting" phase]
 64.00 7426 464.10   0  0 12  1 87
 64.00 7251 453.17   0  0  9  0 90
 64.00 7036 439.75   0  0  6  1 93
 64.00 7296 455.98   0  0  8  1 90
 64.00 6808 425.51   0  0  6  1 93
 64.00 7531 470.66   0  0  8  1 91
 64.00 7323 457.67   0  0  8  0 91
 64.00 7666 479.14   0  0  6  0 94
 64.00 11921 745.07   0  0 12  1 87
 64.00 12043 752.69   0  0 12  1 87
 64.00 10814 675.89   0  0 11  1 89
 64.00 10925 682.82   0  0 12  0 88
[...]
[second bonnie++ starts "Rewriting" phase]
 64.00 10873 679.56   0  0 10  1 89
 64.00 12242 765.12   0  0 13  1 86
 64.00 10739 671.19   0  0 11  0 88
 64.00 3486 217.91   0  0  8  0 91
 64.00 6327 395.42   0  0 14  0 86
 64.00 6637 414.84   0  0  9  0 90
 64.00 4931 308.19   0  0 10  0 89
 64.00 4874 304.63   0  0 13  0 87
 64.00 4322 270.11   0  0 18  0 82
 64.00 4616 288.52   0  0 19  0 80
 64.00 3567 222.96   0  0 20  0 80
 64.00 3395 212.16   0  0 30  0 70
 64.00 1694 105.89   0  0 33  0 67
 64.00 1898 118.63   0  0 36  0 64
  0.00   0  0.00   0  0 12  0 88
  0.00   0  0.00   0  0  5  0 95
  0.00   0  0.00   0  0  5  0 95
  0.00   0  0.00   0  0  5  0 95
  0.00   0  0.00   0  0  5  0 95
  0.00   0  0.00   0  0  5  0 95
  0.00   0  0.00   0  0  5  0 95
  0.00   0  0.00   0  0  5  0 95
  0.00   0  0.00   0  0  4  0 96
  0.00   0  0.00   0  0  5  0 95
  0.00   0  0.00   0  0  5  0 95
[etc. The system does not recover from this state. All attempts to do disk I/O 
hang.]

When both bonnie++ processes are in their "Rewriting" phase, the system hangs 
within a few seconds. Both bonnie++ processes are in state "nbufkv". bufdaemon 
takes about 40% CPU time and is in state "qsleep" when not active.
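(For reference, the process states can be observed with top -SH or with ps; 
for example, something along these lines shows the wait channels, although the 
exact invocation is not important:

ps -axo pid,state,wchan,comm | egrep 'bonnie|bufdaemon'
)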

I have also briefly tested a single disk instead of a RAID volume. In that 
case, I could not get the system to hang, but performance drops from ~150 
MB/sec to ~1-3 MB/sec. I do not know if that indicates a problem with the 
arcmsr driver, or whether I cannot create "enough" I/O load with a single disk:

[same setup as above, but on single disk connected to onboard SATA port.]
Baseline: 150 MB/sec (first bonnie++ in "Writing intelligently" phase):
            ada2             cpu
  KB/t tps  MB/s  us ni sy in id
 128.00 1221 152.60   0  0  2  0 97
 128.00 1173 146.60   0  0  2  0 98
 128.00 1171 146.35   0  0  2  0 98
[... first bonnie++ in "Rewriting", second in "Writing intelligently" phase:]
 127.83 1140 142.30   0  0  2  0 97
 128.00 1121 140.11   0  0  2  0 97
 128.00 1114 139.24   0  0  1  0 98
 127.29 904 112.39   0  0  5  0 95
[second bonnie++ starts rewriting]
 120.24  66  7.74   0  0  0  0 100
 128.00  59  7.37   0  0  0  0 100
 122.43  69  8.24   0  0  1  0 99
 128.00 222 27.72   0  0  0  0 100
 128.00 1122 140.23   0  0  3  0 97
 128.00 1080 134.99   0  0  3  0 97
 128.00 1157 144.61   0  0  3  0 97
 127.75 504 62.94   0  0  2  0 98
 128.00  59  7.37   0  0  1  0 99
 128.00 708 88.54   0  0  2  0 98
 128.00 969 121.13   0  0 11  0 89
 127.89 1161 144.99   0  0  3  0 97
 127.60 479 59.63   0  0  1  0 99
 128.00 596 74.55   0  0  2  0 98
 128.00 1169 146.10   0  0  3  0 97
 127.88 1076 134.37   0  0  2  0 97
 127.89 1202 150.10   0  0  3  0 97
 128.00 1074 134.24   0  0  4  0 96
 128.00 1039 129.87   0  0  8  0 92
 128.00 913 114.14   0  0 10  0 90
 127.73 896 111.78   0  0 11  0 89
 128.00 930 116.26   0  0 10  0 90
 127.93 909 113.57   0  0 11  0 89
 128.00 887 110.89   0  0 13  0 87
 128.00 743 92.91   0  0 18  0 81
 128.00 548 68.56   0  0 23  0 76
 128.00 523 65.43   0  0 24  0 76
 127.89 587 73.36   0  0 21  0 79
 127.89 586 73.24   0  0 22  0 78
 128.00 465 58.07   0  0 28  0 72
 127.89 558 69.74   0  0 25  0 75
 127.79 595 74.30   0  0 21  0 79
 128.00 593 74.18   0  0 23  0 77
 128.00 546 68.31   0  0 22  0 78
 127.70 428 53.32   0  0 32  0 68
 128.00 561 70.18   0  0 25  0 75
 128.00 645 80.67   0  0 22  0 78
 127.78 582 72.68   0  0 23  0 77
 114.68 518 58.07   0  0 27  0 73
 103.33 497 50.20   0  0 29  0 71
 110.00 128 13.74   0  0 35  0 65
 65.94  33  2.12   0  0 37  0 63
 74.97 286 20.92   0  0 36  0 64
 105.83 306 31.59   0  0 34  0 66
 95.12 399 37.03   0  0 30  0 70
 108.57 336 35.59   0  0 29  0 71
 127.22 246 30.53   0  0 41  0 59
 104.16 255 25.91   0  0 32  0 68
 103.68 528 53.51   0  0 23  0 77
 127.14 373 46.27   0  0 29  0 71
 90.65 341 30.16   0  0 35  0 65
 127.30 368 45.70   0  0 30  0 70
 66.56  25  1.62   0  0 37  0 63
 66.00  32  2.06   0  0 38  0 62
 94.99  95  8.80   0  0 38  0 62
 94.00 433 39.71   0  0 35  0 65
 107.66 666 70.06   0  0 26  0 74
 119.43 478 55.69   0  0 27  0 73
 95.64 441 41.15   0  0 28  0 72
 116.45 449 51.01   0  0 29  0 71
 102.22 503 50.26   0  0 28  0 72
 127.62 509 63.50   0  0 29  0 71
 111.84 388 42.33   0  0 29  0 71

(i.e. disk throughput jumps wildly between full speed and ~1% of maximum, 
despite continuous I/O pressure)


Unfortunately, I cannot use kdb on this machine - "ALT-CTRL-ESC" does nothing 
and "sysctl debug.kdb.enter=1" just says "debug.kdb.enter: 0 -> 0". "sysctl 
debug.kdb.panic=1" does panic the machine, but the keyboard is not working.

I have, however, logged the sysctl vfs subtree every second while the problem 
occurs: 
As long as both bonnie++ processes are in the "Writing intelligently..." phase, 
vfs.numdirtybuffers stays mostly between 80-100. When the first bonnie++ goes 
to "Rewriting..." (at 3:40 below), vfs.numdirtybuffers starts to increase 
(by about 150-350 new dirty buffers per second). When the second bonnie++ goes 
to "Rewriting..." as well (at 4:24), this speeds up a lot (about 3000-5000 new 
dirty buffers/sec). A few seconds later, vfs.numdirtybuffers reaches its limit, 
and remains there until the machine is rebooted:

00-03-28.log:vfs.numdirtybuffers: 107
00-03-29.log:vfs.numdirtybuffers: 106
00-03-30.log:vfs.numdirtybuffers: 105
00-03-31.log:vfs.numdirtybuffers: 105
00-03-32.log:vfs.numdirtybuffers: 110
00-03-33.log:vfs.numdirtybuffers: 112
00-03-34.log:vfs.numdirtybuffers: 110
00-03-35.log:vfs.numdirtybuffers: 115
00-03-36.log:vfs.numdirtybuffers: 117
00-03-37.log:vfs.numdirtybuffers: 118
00-03-38.log:vfs.numdirtybuffers: 118
00-03-39.log:vfs.numdirtybuffers: 118
00-03-40.log:vfs.numdirtybuffers: 118
[first bonnie++ starts Rewriting]
00-03-41.log:vfs.numdirtybuffers: 294
00-03-42.log:vfs.numdirtybuffers: 404
00-03-43.log:vfs.numdirtybuffers: 959
00-03-44.log:vfs.numdirtybuffers: 1071
00-03-45.log:vfs.numdirtybuffers: 1155
00-03-46.log:vfs.numdirtybuffers: 1253
00-03-47.log:vfs.numdirtybuffers: 1366
00-03-48.log:vfs.numdirtybuffers: 1457
00-03-49.log:vfs.numdirtybuffers: 1549
00-03-50.log:vfs.numdirtybuffers: 1680
00-03-51.log:vfs.numdirtybuffers: 1904
00-03-52.log:vfs.numdirtybuffers: 2136
00-03-53.log:vfs.numdirtybuffers: 2363
00-03-54.log:vfs.numdirtybuffers: 2503
00-03-55.log:vfs.numdirtybuffers: 3023
00-03-56.log:vfs.numdirtybuffers: 3268
00-03-57.log:vfs.numdirtybuffers: 3601
00-03-58.log:vfs.numdirtybuffers: 3743
00-03-59.log:vfs.numdirtybuffers: 3904
00-04-00.log:vfs.numdirtybuffers: 4106
00-04-01.log:vfs.numdirtybuffers: 4325
00-04-02.log:vfs.numdirtybuffers: 4461
00-04-03.log:vfs.numdirtybuffers: 4643
00-04-04.log:vfs.numdirtybuffers: 5063
00-04-05.log:vfs.numdirtybuffers: 5202
00-04-06.log:vfs.numdirtybuffers: 5456
00-04-07.log:vfs.numdirtybuffers: 5593
00-04-08.log:vfs.numdirtybuffers: 5819
00-04-09.log:vfs.numdirtybuffers: 6092
00-04-10.log:vfs.numdirtybuffers: 6249
00-04-11.log:vfs.numdirtybuffers: 6412
00-04-12.log:vfs.numdirtybuffers: 6591
00-04-13.log:vfs.numdirtybuffers: 6698
00-04-14.log:vfs.numdirtybuffers: 6853
00-04-15.log:vfs.numdirtybuffers: 7031
00-04-16.log:vfs.numdirtybuffers: 7192
00-04-17.log:vfs.numdirtybuffers: 7387
00-04-18.log:vfs.numdirtybuffers: 7609
00-04-19.log:vfs.numdirtybuffers: 7757
00-04-20.log:vfs.numdirtybuffers: 7888
00-04-21.log:vfs.numdirtybuffers: 8156
00-04-22.log:vfs.numdirtybuffers: 8362
00-04-23.log:vfs.numdirtybuffers: 8558
[second bonnie goes Rewriting as well]
00-04-24.log:vfs.numdirtybuffers: 11586
00-04-25.log:vfs.numdirtybuffers: 16325
00-04-26.log:vfs.numdirtybuffers: 24333
00-04-27.log:vfs.numdirtybuffers: 29058
00-04-28.log:vfs.numdirtybuffers: 33752
00-04-29.log:vfs.numdirtybuffers: 38792
00-04-30.log:vfs.numdirtybuffers: 42663
00-04-31.log:vfs.numdirtybuffers: 47305
00-04-33.log:vfs.numdirtybuffers: 52070
00-04-34.log:vfs.numdirtybuffers: 52066
00-04-35.log:vfs.numdirtybuffers: 52063
00-04-37.log:vfs.numdirtybuffers: 52098
00-04-39.log:vfs.numdirtybuffers: 52098
00-04-41.log:vfs.numdirtybuffers: 52097
00-04-44.log:vfs.numdirtybuffers: 52097
00-04-46.log:vfs.numdirtybuffers: 52098
00-04-49.log:vfs.numdirtybuffers: 52096
00-04-52.log:vfs.numdirtybuffers: 52098
00-04-54.log:vfs.numdirtybuffers: 52096
00-04-57.log:vfs.numdirtybuffers: 52098
00-05-00.log:vfs.numdirtybuffers: 52096
[ etc. ]

vfs.hidirtybuffers: 52139
vfs.lodirtybuffers: 26069
(the machine has 32 GB RAM)

(I could upload the archive of the vfs logs somewhere, so you can check other 
values over time, if that would be helpful)
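(A simple per-second loop along these lines produces equivalent snapshots; the 
exact script and the file naming shown above are not important:

while true; do sysctl vfs > vfs-$(date +%H-%M-%S).log; sleep 1; done
)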


I have found 
http://freebsd.1045724.n5.nabble.com/7-1-RELEASE-I-O-hang-td3966945.html , 
which seems to describe the same issue (also in combination with a RAID 
controller, albeit a different one).

The symptoms described in r189878 (issues in bufdaemon, causing processes to 
hang in "nbufkv") sound similar.

Please let me know if there is any other info that I can provide.


>How-To-Repeat:
gpart create -s gpt da0
gpart add -s 500G -t freebsd-ufs -l test0 -a 1M da0
newfs -b 64k -f 8k /dev/da0p1
mount /dev/da0p1 /mnt && mkdir /mnt/t{1,2} && chmod g+w /mnt/t{1,2}

Start two bonnie++ instances (installed from ports) like this:
1) bonnie++ -s 64g -n 0 -d /mnt/t1
2) bonnie++ -s 64g -n 0 -d /mnt/t2

Observe disk throughput for da0 with 

iostat -dC -w 1 da0

in particular when both bonnie++ processes are in their "Rewriting..." phase.
>Fix:
No fix known.

>Release-Note:
>Audit-Trail:
>Unformatted: