Re: mfi(4) IO performance regression, post 8.1
On Thursday, July 12, 2012 11:47:28 pm Steve McCoy wrote:
> On 7/12/12 4:34 PM, Steve McCoy wrote:
> > On 7/12/12 4:14 PM, Charles Owens wrote:
> >> On Thursday, June 21, 2012 10:36:04 pm Charles Owens wrote:
> >>>
> >>> On 6/15/12 8:04 AM, John Baldwin wrote:
> >>> > On Friday, June 15, 2012 12:28:59 am Charles Owens wrote:
> >>> >> Hello FreeBSD folk,
> >>> >>
> >>> >> We're seeing what appears to be a storage performance regression as we
> >>> >> try to move from 8.1 (i386) to 8.3. We looked at 8.2 also and it
> >>> >> appears that the regression happened between 8.1 and 8.2.
> >>> >>
> >>> >> Our system is an Intel S5520UR Server with 12 GB RAM, dual 4-core CPUs.
> >>> >> Storage is a LSI MegaSAS 1078 controller (mfi) in a RAID-10
> >>> >> configuration, using UFS + geom_journal for filesystem.
> >>> >>
> >>> >> Postgresql performance, as seen via pgbench, dropped by approx 20%.
> >>> >> This testing was done with our usual PAE-enabled kernels. We then went
> >>> >> back to GENERIC kernels and did comparisons using "bonnie", results
> >>> >> below. Following that is a kernel boot log.
> >>> >>
> >>> >> Notably, we're seeing this regression only with our RAID mfi(4) based
> >>> >> systems. Notably, from looking at FreeBSD source changelogs it appears
> >>> >> that the mfi(4) code has seen some changes since 8.1.
> >>> > Between 8.1 and 8.2 mfi has not had any significant changes. The only changes
> >>> > made to sys/dev/mfi were to add a new constant:
> >>> >
> >>> >> svn diff svn+ssh://svn.freebsd.org/base/releng/8.1/sys/dev/mfi
> >>> > svn+ssh://svn.freebsd.org/base/releng/8.2/sys/dev/mfi
> >>> > Index: mfireg.h
> >>> > ===
> >>> > --- mfireg.h(.../8.1/sys/dev/mfi) (revision 237134)
> >>> > +++ mfireg.h(.../8.2/sys/dev/mfi) (revision 237134)
> >>> > @@ -975,7 +975,9 @@
> >>> > MFI_PD_STATE_OFFLINE = 0x10,
> >>> > MFI_PD_STATE_FAILED = 0x11,
> >>> > MFI_PD_STATE_REBUILD = 0x14,
> >>> > - MFI_PD_STATE_ONLINE = 0x18
> >>> > + MFI_PD_STATE_ONLINE = 0x18,
> >>> > + MFI_PD_STATE_COPYBACK = 0x20,
> >>> > + MFI_PD_STATE_SYSTEM = 0x40
> >>> > };
> >>> >
> >>> > union mfi_ld_ref {
> >>> >
> >>> > The difference in write performance must be due to something else. You
> >>> > mentioned you are using UFS + gjournal. I think gjournal uses BIO_FLUSH, so I
> >>> > wonder if this is related:
> >>> >
> >>> >
> >>>
> >>> > r212939 | gibbs | 2010-09-20 19:39:00 -0400 (Mon, 20 Sep 2010) | 61 lines
> >>> >
> >>> > MFC 212160:
> >>> >
> >>> > Correct bioq_disksort so that bioq_insert_tail() offers barrier semantic.
> >>> > Add the BIO_ORDERED flag for struct bio and update bio clients to use it.
> >>> >
> >>> > The barrier semantics of bioq_insert_tail() were broken in two ways:
> >>> >
> >>> > o In bioq_disksort(), an added bio could be inserted at the head of
> >>> >   the queue, even when a barrier was present, if the sort key for
> >>> >   the new entry was less than that of the last queued barrier bio.
> >>> >
> >>> > o The last_offset used to generate the sort key for newly queued bios
> >>> >   did not stay at the position of the barrier until either the
> >>> >   barrier was de-queued, or a new barrier (which updates last_offset)
> >>> >   was queued. When a barrier is in effect, we know that the disk
> >>> >   will pass through the barrier position just before the
> >>> >   "blocked bios" are released, so using the barrier's offset for
> >>> >   last_offset is the optimal choice.
> >>> >
> >>> > sys/geom/sched/subr_disk.c:
> >>> > sys/kern/subr_disk.c:
> >>> > o Update last_offset in bioq_insert_tail().
> >>> >
> >>> > o Only update last_offset in bioq_remove() if the removed bio is
> >>> >   at the head of the queue (typically due to a call via
> >>> >   bioq_takefirst()) and no barrier is active.
> >>> >
> >>> > o In bioq_disksort(), if we have a barrier (insert_point is non-NULL),
> >>> >   set prev to the barrier and cur to it's next element. Now that
> >>> >   last_offset is kept at the barrier position, this change isn't
> >>> >   strictly necessary, but since we have to take a decision branch
> >>> >   anyway, it does avoid one, no-op, loop iteration in the while
> >>> >   loop that immediately follows.
> >>> >
> >>> > o In bioq_disksort(), bypass the normal sort for bios with the
> >>> >   BIO_ORDERED attribute and instead insert them into the queue
> >>> >   with bioq_insert_tail(). bioq_insert_tail() not only gives
> >>> >   the desired command order during insertion, but also provides
> >>> >   barrier semantics so that
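
The barrier behaviour described in the quoted r212939 commit message can be illustrated with a small model. What follows is a minimal Python sketch under stated assumptions, not the kernel code: the Bio and BioQueue classes and the demo() helper are hypothetical names made up for this illustration, and the 64-bit wrap-around sort key is an assumption about how a one-way elevator orders offsets relative to last_offset. Only the rules taken from the commit message are modelled: ordered bios go to the tail and become the barrier, sorted inserts never land ahead of the barrier, and last_offset stays at the barrier while one is active.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # identity comparison, so list.index() finds the exact bio
class Bio:
    offset: int
    length: int = 0
    ordered: bool = False  # stands in for the BIO_ORDERED flag


class BioQueue:
    """Toy model of a disksort queue with barrier semantics (hypothetical)."""

    def __init__(self):
        self.queue = []
        self.insert_point = None  # last queued barrier bio, if any
        self.last_offset = 0

    def _key(self, bio):
        # Assumed one-way elevator key: distance ahead of last_offset,
        # with offsets behind it wrapping around to sort last.
        return (bio.offset - self.last_offset) % (1 << 64)

    def insert_tail(self, bio):
        # Tail insertion acts as a barrier and pins last_offset to it.
        self.queue.append(bio)
        self.insert_point = bio
        self.last_offset = bio.offset

    def disksort(self, bio):
        if bio.ordered:
            # Ordered bios bypass the sort entirely.
            self.insert_tail(bio)
            return
        # Never sort a new bio ahead of the active barrier.
        i = 0
        if self.insert_point is not None:
            i = self.queue.index(self.insert_point) + 1
        key = self._key(bio)
        while i < len(self.queue) and key >= self._key(self.queue[i]):
            i += 1
        self.queue.insert(i, bio)

    def takefirst(self):
        if not self.queue:
            return None
        bio = self.queue.pop(0)
        if self.insert_point is None:
            # No barrier active: the head position advances last_offset.
            self.last_offset = bio.offset + bio.length
        elif bio is self.insert_point:
            # The barrier itself was dequeued.
            self.insert_point = None
        return bio


def demo():
    q = BioQueue()
    q.disksort(Bio(300))
    q.disksort(Bio(100))
    q.disksort(Bio(500, ordered=True))  # barrier
    q.disksort(Bio(50))
    q.disksort(Bio(700))
    q.disksort(Bio(600))
    return [b.offset for b in q.queue]


if __name__ == "__main__":
    print(demo())
```

In this model demo() returns [100, 300, 500, 600, 700, 50]: the bio at offset 50 has the smallest offset but still queues behind the barrier at 500, which is the ordering guarantee the commit message says was previously broken.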
stable/9 panic Bad tailq NEXT(0xffffffff80e52660->tqh_last) != NULL
Well this is new. I haven't a clue what Dell has done on this R620, but this popped up today after I did a boatload of BIOS updates and tried to install stable/9 from our Yahoo tree. If anyone sees the obvious solution here, I'd love to figure it out.

found-> vendor=0x14e4, dev=0x165f, revid=0x00
    domain=0, bus=2, slot=0, func=1
    class=02-00-00, hdrtype=0x00, mfdev=1
    cmdreg=0x0006, statreg=0x0010, cachelnsz=16 (dwords)
    lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns)
    intpin=b, irq=6
    powerspec 3 supports D0 D3 current D0
    MSI supports 8 messages, 64 bit
    MSI-X supports 17 messages in map 0x20
    map[10]: type Prefetchable Memory, range 64, base 0xd50d, size 16, enabled
pcib1: allocated prefetch range (0xd50d-0xd50d) for rid 10 of pci0:2:0:1
    map[18]: type Prefetchable Memory, range 64, base 0xd50e, size 16, enabled
pcib1: allocated prefetch range (0xd50e-0xd50e) for rid 18 of pci0:2:0:1
    map[20]: type Prefetchable Memory, range 64, base 0xd50f, size 16, enabled
pcib1: allocated prefetch range (0xd50f-0xd50f) for rid 20 of pci0:2:0:1
pcib1: matched entry for 2.0.INTB
pcib1: slot 0 INTB hardwired to IRQ 36
bge0: mem 0xd50a-0xd50a,0xd50b-0xd50b,0xd50c-0xd50c irq 34 at device 0.0 on pci2
bge0: APE FW version: NCSI v1.0.80.0
bge0: attempting to allocate 1 MSI vectors (8 supported)
msi: routing MSI IRQ 264 to local APIC 0 vector 59
bge0: using IRQ 264 for MSI
bge0: CHIP ID 0x0572; ASIC REV 0x5720; CHIP REV 0x57200; PCI-E
bge0: Disabling fastboot
bge0: Disabling fastboot
miibus0: on bge0
brgphy0: PHY 1 on miibus0
brgphy0: OUI 0x001be9, model 0x0036, rev. 0
brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
bge0: bpf attached
bge0: Ethernet address: 18:03:73:fd:9e:36
bge1: mem 0xd50d-0xd50d,0xd50e-0xd50e,0xd50f-0xd50f irq 36 at device 0.1 on pci2
bge1: APE FW version: NCSI v1.0.80.0
bge1: attempting to allocate 1 MSI vectors (8 supported)
msi: routing MSI IRQ 265 to local APIC 0 vector 60
bge1: using IRQ 265 for MSI
bge1: CHIP ID 0x0572; ASIC REV 0x5720; CHIP REV 0x57200; PCI-E
bge1: Disabling fastboot
bge1: Disabling fastboot
miibus1: on bge1
brgphy1: PHY 2 on miibus1
brgphy1: OUI 0x001be9, model 0x0036, rev. 0
brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
bge1: bpf attached
bge1: Ethernet address: 18:03:73:fd:9e:37
pcib2: irq 53 at device 1.1 on pci0
pcib0: allocated type 3 (0xd880-0xd8ff) for rid 20 of pcib2
pcib0: allocated type 3 (0xd510-0xd51f) for rid 24 of pcib2
pcib2: domain0
pcib2: secondary bus 1
pcib2: subordinate bus 1
pcib2: memory decode 0xd880-0xd8ff
pcib2: prefetched decode 0xd510-0xd51f
pci1: on pcib2
pci1: domain=0, physical bus=1
found-> vendor=0x14e4, dev=0x165f, revid=0x00
    domain=0, bus=1, slot=0, func=0
    class=02-00-00, hdrtype=0x00, mfdev=1
    cmdreg=0x0006, statreg=0x0010, cachelnsz=16 (dwords)
    lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns)
    intpin=a, irq=15
    powerspec 3 supports D0 D3 current D0
    MSI supports 8 messages, 64 bit
    MSI-X supports 17 messages in map 0x20
    map[10]: type Prefetchable Memory, range 64, base 0xd51a, size 16, enabled
pcib2: allocated prefetch range (0xd51a-0xd51a) for rid 10 of pci0:1:0:0
    map[18]: type Prefetchable Memory, range 64, base 0xd51b, size 16, enabled
pcib2: allocated prefetch range (0xd51b-0xd51b) for rid 18 of pci0:1:0:0
    map[20]: type Prefetchable Memory, range 64, base 0xd51c, size 16, enabled
pcib2: allocated prefetch range (0xd51c-0xd51c) for rid 20 of pci0:1:0:0
pcib2: matched entry for 1.0.INTA
pcib2: slot 0 INTA hardwired to IRQ 35
found-> vendor=0x14e4, dev=0x165f, revid=0x00
    domain=0, bus=1, slot=0, func=1
    class=02-00-00, hdrtype=0x00, mfdev=1
    cmdreg=0x0006, statreg=0x0010, cachelnsz=16 (dwords)
    lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns)
    intpin=b, irq=6
    powerspec 3 supports D0 D3 current D0
    MSI supports 8 messages, 64 bit
    MSI-X supports 17 messages in map 0x20
    map[10]: type Prefetchable Memory, range 64, base 0xd51d, size 16, enabled
pcib2: allocated prefetch range (0xd51d-0xd51d) for rid 10 of pci0:1:0:1
    map[18]: type Prefetchable Memory, range 64, base 0xd51e, size 16, enabled
pcib2: allocated prefetch range (0xd51e-0xd51e) for rid 18 of pci0:1:0:1
    map[20]: type Prefetchable Memory, range 64, base 0xd51f, size 16, enabled
pcib2: allocated prefetch range (0xd51f-0xd51f) for rid 20 of pci0:1:0:1
pcib2: matched entry for 1.0.INTB
pcib2: slo
Re: bge problems in RELENG_9, bge0: watchdog timeout -- resetting
On Thu, 2012-07-12 at 12:06 -0700, Sean Bruno wrote:
> On Thu, 2012-07-12 at 14:59 -0700, YongHyeon PYUN wrote:
> > > I grabbed these updates and applied them cleanly to stable/9 on a Dell
> > > R620 with a quad port BCM5720, I still see watchdog timeouts and reset
> > > indications. I am able to ping out of the box for a short amount of
> > > time before the device hangs and times out.
> > >
> >
> > Sean, sorry for late reply.
> > Given that I have no problems on sample 5720 controller I still
> > have no clue yet.
> >
>
> No problems ... :-)
>
> > >
> > >
> > > -bash-4.2# ping XXX.XXX.XXX.1
> > > PING XXX.XXX.XXX.1 (XXX.XXX.XXX.XXX): 56 data bytes
> > > ping: sendto: Network is down
> > > ping: sendto: Network is down
> > > ping: sendto: Network is down
> > > ping: sendto: Network is down
> > > ping: sendto: Network is down
> > > Jul 9 17:31:41 x89 kernel: bge2: watchdog timeout -- resetting
> > > Jul 9 17:31:41 x89 kernel: bge2: link state changed to DOWN
> > > Jul 9 17:31:41 x89 kernel: bge2: link state changed to
> >
> > Two link state change message indicates there is an issue in state
> > tracking. I'm experimenting a different approach but it seems it
> > takes too long due to lack of time. Any way, I've uploaded updated
> > bge(4)(URL is the same as before).
>
> I see a bunch of firmware updates for this host along with an update to
> the BCM firmware package for this Dell box.
>
> I'll update my system's driver first, validate pass/fail, if fail, then
> I'll update the firmware bits and validate some more.
>
> sean
>

No real change. I suspect something else is going on here that I don't understand. I note that when the system malfunctions now, the system cannot boot and requires me to enter the bios to check my settings.

Sean

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: bge problems in RELENG_9, bge0: watchdog timeout -- resetting
----- Original Message -----
From: "Sean Bruno"

No real change. I suspect something else is going on here that I don't understand. I note that when the system malfunctions now, the system cannot boot and requires me to enter the bios to check my settings.

We've had a machine which was having watchdog timeouts on a bge which turned out to be a hardware failure. Not sure exactly what, but disabling cores of the second CPU solved the problem.

Regards
Steve

This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmas...@multiplay.co.uk.