Re: mfi(4) IO performance regression, post 8.1

2012-07-13 Thread John Baldwin
On Thursday, July 12, 2012 11:47:28 pm Steve McCoy wrote:
> On 7/12/12 4:34 PM, Steve McCoy wrote:
> > On 7/12/12 4:14 PM, Charles Owens wrote:
> >> On Thursday, June 21, 2012 10:36:04 pm Charles Owens wrote:
> >>>
> >>> On 6/15/12 8:04 AM, John Baldwin wrote:
> >>> > On Friday, June 15, 2012 12:28:59 am Charles Owens wrote:
> >>> >> Hello FreeBSD folk,
> >>> >>
> >>> >> We're seeing what appears to be a storage performance regression as we
> >>> >> try to move from 8.1 (i386) to 8.3.  We looked at 8.2 also and it
> >>> >> appears that the regression happened between 8.1 and 8.2.
> >>> >>
> >>> >> Our system is an Intel S5520UR Server with 12 GB RAM, dual 4-core CPUs.
> >>> >> Storage is an LSI MegaSAS 1078 controller (mfi) in a RAID-10
> >>> >> configuration, using UFS + geom_journal for filesystem.
> >>> >>
> >>> >> Postgresql performance, as seen via pgbench, dropped by approx 20%.
> >>> >> This testing was done with our usual PAE-enabled kernels.  We then went
> >>> >> back to GENERIC kernels and did comparisons using "bonnie", results
> >>> >> below.  Following that is a kernel boot log.
> >>> >>
> >>> >> Notably, we're seeing this regression only with our RAID mfi(4) based
> >>> >> systems.  Also, from looking at FreeBSD source changelogs it appears
> >>> >> that the mfi(4) code has seen some changes since 8.1.
> >>> > Between 8.1 and 8.2 mfi has not had any significant changes.  The only
> >>> > changes made to sys/dev/mfi were to add a new constant:
> >>> >
> >>> >> svn diff svn+ssh://svn.freebsd.org/base/releng/8.1/sys/dev/mfi
> >>> > svn+ssh://svn.freebsd.org/base/releng/8.2/sys/dev/mfi
> >>> > Index: mfireg.h
> >>> > ===
> >>> > --- mfireg.h(.../8.1/sys/dev/mfi)   (revision 237134)
> >>> > +++ mfireg.h(.../8.2/sys/dev/mfi)   (revision 237134)
> >>> > @@ -975,7 +975,9 @@
> >>> >  MFI_PD_STATE_OFFLINE = 0x10,
> >>> >  MFI_PD_STATE_FAILED = 0x11,
> >>> >  MFI_PD_STATE_REBUILD = 0x14,
> >>> > -   MFI_PD_STATE_ONLINE = 0x18
> >>> > +   MFI_PD_STATE_ONLINE = 0x18,
> >>> > +   MFI_PD_STATE_COPYBACK = 0x20,
> >>> > +   MFI_PD_STATE_SYSTEM = 0x40
> >>> >   };
> >>> >
> >>> >   union mfi_ld_ref {
> >>> >
> >>> > The difference in write performance must be due to something else.  You
> >>> > mentioned you are using UFS + gjournal.  I think gjournal uses BIO_FLUSH,
> >>> > so I wonder if this is related:
> >>> >
> >>> >
> >>> 
> >>> > r212939 | gibbs | 2010-09-20 19:39:00 -0400 (Mon, 20 Sep 2010) | 61 lines
> >>> >
> >>> > MFC 212160:
> >>> >
> >>> > Correct bioq_disksort so that bioq_insert_tail() offers barrier semantic.
> >>> > Add the BIO_ORDERED flag for struct bio and update bio clients to use it.
> >>> >
> >>> > The barrier semantics of bioq_insert_tail() were broken in two ways:
> >>> >
> >>> >   o In bioq_disksort(), an added bio could be inserted at the head of
> >>> >     the queue, even when a barrier was present, if the sort key for
> >>> >     the new entry was less than that of the last queued barrier bio.
> >>> >
> >>> >   o The last_offset used to generate the sort key for newly queued bios
> >>> >     did not stay at the position of the barrier until either the
> >>> >     barrier was de-queued, or a new barrier (which updates last_offset)
> >>> >     was queued.  When a barrier is in effect, we know that the disk
> >>> >     will pass through the barrier position just before the
> >>> >     "blocked bios" are released, so using the barrier's offset for
> >>> >     last_offset is the optimal choice.
> >>> >
> >>> > sys/geom/sched/subr_disk.c:
> >>> > sys/kern/subr_disk.c:
> >>> >  o Update last_offset in bioq_insert_tail().
> >>> >
> >>> >  o Only update last_offset in bioq_remove() if the removed bio is
> >>> >    at the head of the queue (typically due to a call via
> >>> >    bioq_takefirst()) and no barrier is active.
> >>> >
> >>> >  o In bioq_disksort(), if we have a barrier (insert_point is non-NULL),
> >>> >    set prev to the barrier and cur to its next element.  Now that
> >>> >    last_offset is kept at the barrier position, this change isn't
> >>> >    strictly necessary, but since we have to take a decision branch
> >>> >    anyway, it does avoid one, no-op, loop iteration in the while
> >>> >    loop that immediately follows.
> >>> >
> >>> >  o In bioq_disksort(), bypass the normal sort for bios with the
> >>> >    BIO_ORDERED attribute and instead insert them into the queue
> >>> >    with bioq_insert_tail().  bioq_insert_tail() not only gives
> >>> >    the desired command order during insertion, but also provides
> >>> >    barrier semantics so that
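
For readers following the quoted commit log, here is a minimal user-space
sketch of the barrier behaviour it describes.  This is not the kernel code
from sys/kern/subr_disk.c: the toy_* names, the singly linked list and the
simplified sort key are all assumptions made for illustration, and the real
implementation uses TAILQ macros and also accounts for the bio length.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for struct bio and its queue head. */
struct toy_bio {
    unsigned long long offset;      /* starting offset of the request */
    int ordered;                    /* nonzero models BIO_ORDERED */
    struct toy_bio *next;
};

struct toy_bioq {
    struct toy_bio *head;
    struct toy_bio *tail;
    struct toy_bio *insert_point;   /* last queued barrier, or NULL */
    unsigned long long last_offset; /* reference point for sort keys */
};

static void toy_bioq_init(struct toy_bioq *q)
{
    q->head = q->tail = q->insert_point = NULL;
    q->last_offset = 0;
}

/* Unsigned wraparound makes offsets behind last_offset sort last. */
static unsigned long long toy_key(const struct toy_bioq *q,
    const struct toy_bio *bp)
{
    return bp->offset - q->last_offset;
}

/* Append at the tail; the new entry becomes the active barrier. */
static void toy_insert_tail(struct toy_bioq *q, struct toy_bio *bp)
{
    assert(q != NULL && bp != NULL);
    bp->next = NULL;
    if (q->tail != NULL)
        q->tail->next = bp;
    else
        q->head = bp;
    q->tail = bp;
    q->insert_point = bp;
    q->last_offset = bp->offset;    /* sort keys now pivot on the barrier */
}

/* Sorted insert that never places an entry ahead of the barrier. */
static void toy_disksort(struct toy_bioq *q, struct toy_bio *bp)
{
    struct toy_bio *prev, *cur;
    unsigned long long key;

    assert(q != NULL && bp != NULL);
    if (bp->ordered) {              /* ordered requests bypass the sort */
        toy_insert_tail(q, bp);
        return;
    }
    prev = q->insert_point;         /* start the scan after the barrier */
    cur = (prev != NULL) ? prev->next : q->head;
    key = toy_key(q, bp);
    while (cur != NULL && key >= toy_key(q, cur)) {
        prev = cur;
        cur = cur->next;
    }
    bp->next = cur;
    if (prev != NULL)
        prev->next = bp;
    else
        q->head = bp;
    if (cur == NULL)
        q->tail = bp;
}

/* Remove the head; last_offset only moves when no barrier is active. */
static struct toy_bio *toy_takefirst(struct toy_bioq *q)
{
    struct toy_bio *bp = q->head;

    if (bp == NULL)
        return NULL;
    if (q->insert_point == NULL)
        q->last_offset = bp->offset;
    else if (bp == q->insert_point)
        q->insert_point = NULL;
    q->head = bp->next;
    if (q->head == NULL)
        q->tail = NULL;
    bp->next = NULL;
    return bp;
}
```

In this sketch, queueing offsets 300 and 100, then an ordered request at 200,
then 50 and 250, drains in the order 100, 300, 200, 250, 50.  The request at
offset 50 has the lowest offset but cannot jump ahead of the barrier at 200,
which is the first of the two breakages the commit log says it fixed.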

stable/9 panic Bad tailq NEXT(0xffffffff80e52660->tqh_last) != NULL

2012-07-13 Thread Sean Bruno
Well, this is new.  I haven't a clue what Dell has done on this R620, but
this popped up today after I did a boatload of BIOS updates and tried
to install stable/9 from our yahoo tree.  If anyone sees the obvious
solution here, I'd love to figure it out.

found-> vendor=0x14e4, dev=0x165f, revid=0x00
domain=0, bus=2, slot=0, func=1
class=02-00-00, hdrtype=0x00, mfdev=1
cmdreg=0x0006, statreg=0x0010, cachelnsz=16 (dwords)
lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns)
intpin=b, irq=6
powerspec 3  supports D0 D3  current D0
MSI supports 8 messages, 64 bit
MSI-X supports 17 messages in map 0x20
map[10]: type Prefetchable Memory, range 64, base 0xd50d,
size 16, enabled
pcib1: allocated prefetch range (0xd50d-0xd50d) for rid 10 of
pci0:2:0:1
map[18]: type Prefetchable Memory, range 64, base 0xd50e,
size 16, enabled
pcib1: allocated prefetch range (0xd50e-0xd50e) for rid 18 of
pci0:2:0:1
map[20]: type Prefetchable Memory, range 64, base 0xd50f,
size 16, enabled
pcib1: allocated prefetch range (0xd50f-0xd50f) for rid 20 of
pci0:2:0:1
pcib1: matched entry for 2.0.INTB
pcib1: slot 0 INTB hardwired to IRQ 36
bge0:  mem
0xd50a-0xd50a,0xd50b-0xd50b,0xd50c-0xd50c irq 34
at device 0.0 on pci2
bge0: APE FW version: NCSI v1.0.80.0
bge0: attempting to allocate 1 MSI vectors (8 supported)
msi: routing MSI IRQ 264 to local APIC 0 vector 59
bge0: using IRQ 264 for MSI
bge0: CHIP ID 0x0572; ASIC REV 0x5720; CHIP REV 0x57200; PCI-E
bge0: Disabling fastboot
bge0: Disabling fastboot
miibus0:  on bge0
brgphy0:  PHY 1 on miibus0
brgphy0: OUI 0x001be9, model 0x0036, rev. 0
brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT,
1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
bge0: bpf attached
bge0: Ethernet address: 18:03:73:fd:9e:36
bge1:  mem
0xd50d-0xd50d,0xd50e-0xd50e,0xd50f-0xd50f irq 36
at device 0.1 on pci2
bge1: APE FW version: NCSI v1.0.80.0
bge1: attempting to allocate 1 MSI vectors (8 supported)
msi: routing MSI IRQ 265 to local APIC 0 vector 60
bge1: using IRQ 265 for MSI
bge1: CHIP ID 0x0572; ASIC REV 0x5720; CHIP REV 0x57200; PCI-E
bge1: Disabling fastboot
bge1: Disabling fastboot
miibus1:  on bge1
brgphy1:  PHY 2 on miibus1
brgphy1: OUI 0x001be9, model 0x0036, rev. 0
brgphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT,
1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
bge1: bpf attached
bge1: Ethernet address: 18:03:73:fd:9e:37
pcib2:  irq 53 at device 1.1 on pci0
pcib0: allocated type 3 (0xd880-0xd8ff) for rid 20 of pcib2
pcib0: allocated type 3 (0xd510-0xd51f) for rid 24 of pcib2
pcib2:   domain0
pcib2:   secondary bus 1
pcib2:   subordinate bus   1
pcib2:   memory decode 0xd880-0xd8ff
pcib2:   prefetched decode 0xd510-0xd51f
pci1:  on pcib2
pci1: domain=0, physical bus=1
found-> vendor=0x14e4, dev=0x165f, revid=0x00
domain=0, bus=1, slot=0, func=0
class=02-00-00, hdrtype=0x00, mfdev=1
cmdreg=0x0006, statreg=0x0010, cachelnsz=16 (dwords)
lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns)
intpin=a, irq=15
powerspec 3  supports D0 D3  current D0
MSI supports 8 messages, 64 bit
MSI-X supports 17 messages in map 0x20
map[10]: type Prefetchable Memory, range 64, base 0xd51a,
size 16, enabled
pcib2: allocated prefetch range (0xd51a-0xd51a) for rid 10 of
pci0:1:0:0
map[18]: type Prefetchable Memory, range 64, base 0xd51b,
size 16, enabled
pcib2: allocated prefetch range (0xd51b-0xd51b) for rid 18 of
pci0:1:0:0
map[20]: type Prefetchable Memory, range 64, base 0xd51c,
size 16, enabled
pcib2: allocated prefetch range (0xd51c-0xd51c) for rid 20 of
pci0:1:0:0
pcib2: matched entry for 1.0.INTA
pcib2: slot 0 INTA hardwired to IRQ 35
found-> vendor=0x14e4, dev=0x165f, revid=0x00
domain=0, bus=1, slot=0, func=1
class=02-00-00, hdrtype=0x00, mfdev=1
cmdreg=0x0006, statreg=0x0010, cachelnsz=16 (dwords)
lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns)
intpin=b, irq=6
powerspec 3  supports D0 D3  current D0
MSI supports 8 messages, 64 bit
MSI-X supports 17 messages in map 0x20
map[10]: type Prefetchable Memory, range 64, base 0xd51d,
size 16, enabled
pcib2: allocated prefetch range (0xd51d-0xd51d) for rid 10 of
pci0:1:0:1
map[18]: type Prefetchable Memory, range 64, base 0xd51e,
size 16, enabled
pcib2: allocated prefetch range (0xd51e-0xd51e) for rid 18 of
pci0:1:0:1
map[20]: type Prefetchable Memory, range 64, base 0xd51f,
size 16, enabled
pcib2: allocated prefetch range (0xd51f-0xd51f) for rid 20 of
pci0:1:0:1
pcib2: matched entry for 1.0.INTB
pcib2: slo

Re: bge problems in RELENG_9, bge0: watchdog timeout -- resetting

2012-07-13 Thread Sean Bruno
On Thu, 2012-07-12 at 12:06 -0700, Sean Bruno wrote:
> On Thu, 2012-07-12 at 14:59 -0700, YongHyeon PYUN wrote:
> > > I grabbed these updates and applied them cleanly to stable/9 on a Dell
> > > R620 with a quad port BCM5720, I still see watchdog timeouts and reset
> > > indications.  I am able to ping out of the box for a short amount of
> > > time before the device hangs and times out.
> > > 
> > 
> > Sean, sorry for late reply.
> > Given that I have no problems on sample 5720 controller I still
> > have no clue yet.
> > 
> 
> No problems ... :-)
> 
> > > 
> > > 
> > > -bash-4.2# ping XXX.XXX.XXX.1
> > > PING XXX.XXX.XXX.1 (XXX.XXX.XXX.XXX): 56 data bytes
> > > ping: sendto: Network is down
> > > ping: sendto: Network is down
> > > ping: sendto: Network is down
> > > ping: sendto: Network is down
> > > ping: sendto: Network is down
> > > Jul  9 17:31:41  x89 kernel: bge2: watchdog timeout -- resetting
> > > Jul  9 17:31:41  x89 kernel: bge2: link state changed to DOWN
> > > Jul  9 17:31:41  x89 kernel: bge2: link state changed to
> > 
> > Two link state change messages indicate there is an issue in state
> > tracking. I'm experimenting with a different approach but it is
> > taking too long due to lack of time. Anyway, I've uploaded an updated
> > bge(4) (URL is the same as before).
> 
> I see a bunch of firmware updates for this host along with an update to
> the BCM firmware package for this Dell box.
> 
> I'll update my system's driver first, validate pass/fail, if fail, then
> I'll update the firmware bits and validate some more.
> 
> sean
> 

No real change.  I suspect something else is going on here that I don't
understand.  I note that when the system malfunctions now, the system
cannot boot and requires me to enter the bios to check my settings.

Sean


___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: bge problems in RELENG_9, bge0: watchdog timeout -- resetting

2012-07-13 Thread Steven Hartland
----- Original Message -----
From: "Sean Bruno"

> No real change.  I suspect something else is going on here that I don't
> understand.  I note that when the system malfunctions now, the system
> cannot boot and requires me to enter the bios to check my settings.


We've had a machine which was having watchdog timeouts on a bge
which turned out to be a hardware failure.

Not sure exactly what failed, but disabling the cores of the second CPU
solved the problem.

   Regards
   Steve


