Re: bge problems in RELENG_9, bge0: watchdog timeout -- resetting

2012-07-12 Thread Sean Bruno
On Thu, 2012-07-12 at 14:59 -0700, YongHyeon PYUN wrote:
> > I grabbed these updates and applied them cleanly to stable/9 on a
> Dell
> > R620 with a quad port BCM5720, I still see watchdog timeouts and
> reset
> > indications.  I am able to ping out of the box for a short amount of
> > time before the device hangs and times out.
> > 
> 
> Sean, sorry for late reply.
> Given that I have no problems on sample 5720 controller I still
> have no clue yet.
> 

No problems ... :-)

> > 
> > 
> > -bash-4.2# ping XXX.XXX.XXX.1
> > PING XXX.XXX.XXX.1 (XXX.XXX.XXX.XXX): 56 data bytes
> > ping: sendto: Network is down
> > ping: sendto: Network is down
> > ping: sendto: Network is down
> > ping: sendto: Network is down
> > ping: sendto: Network is down
> > Jul  9 17:31:41  x89 kernel: bge2: watchdog timeout --
> > resetting
> > Jul  9 17:31:41  x89 kernel: bge2: link state changed
> to
> > DOWN
> > Jul  9 17:31:41  x89 kernel: bge2: link state changed
> to
> 
> Two link state change message indicates there is an issue in state
> tracking. I'm experimenting a different approach but it seems it
> takes too long due to lack of time. Any way, I've uploaded updated
> bge(4)(URL is the same as before).

I see a bunch of firmware updates for this host along with an update to
the BCM firmware package for this Dell box.

I'll update my system's driver first, validate pass/fail, if fail, then
I'll update the firmware bits and validate some more.

sean

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: mfi(4) IO performance regression, post 8.1

2012-07-12 Thread Steve McCoy

On 7/12/12 4:14 PM, Charles Owens wrote:

On Thursday, June 21, 2012 10:36:04 pm Charles Owens wrote:


On 6/15/12 8:04 AM, John Baldwin wrote:
> On Friday, June 15, 2012 12:28:59 am Charles Owens wrote:
>> Hello FreeBSD folk,
>>
>> We're seeing what appears to be a storage performance regression as we
>> try to move from 8.1 (i386) to 8.3.   We looked at 8.2 also and it
>> appears that the regression happened between 8.1 and 8.2.
>>
>> Our system is an Intel S5520UR Server with 12 GB RAM, dual 4-core
CPUs.
>> Storage is a LSI MegaSAS 1078 controller (mfi) in a RAID-10
>> configuration, using UFS + geom_journal for filesystem.
>>
>> Postgresql performance, as seen via pgbench, dropped by approx 20%.
>> This testing was done with our usual PAE-enabled kernels.  We then
went
>> back to GENERIC kernels and did comparisons using "bonnie", results
>> below.  Following that is a kernel boot log.
>>
>> Notably, we're seeing this regression only with our RAID mfi(4) based
>> systems.  Notably, from looking at FreeBSD source changelogs it
appears
>> that the mfi(4) code has seen some changes since 8.1.
> Between 8.1 and 8.2 mfi has not had any significant changes.  The
only changes
> made to sys/dev/mfi were to add a new constant:
>
>> svn diff svn+ssh://svn.freebsd.org/base/releng/8.1/sys/dev/mfi
> svn+ssh://svn.freebsd.org/base/releng/8.2/sys/dev/mfi
> Index: mfireg.h
> ===
> --- mfireg.h(.../8.1/sys/dev/mfi)   (revision 237134)
> +++ mfireg.h(.../8.2/sys/dev/mfi)   (revision 237134)
> @@ -975,7 +975,9 @@
>  MFI_PD_STATE_OFFLINE = 0x10,
>  MFI_PD_STATE_FAILED = 0x11,
>  MFI_PD_STATE_REBUILD = 0x14,
> -   MFI_PD_STATE_ONLINE = 0x18
> +   MFI_PD_STATE_ONLINE = 0x18,
> +   MFI_PD_STATE_COPYBACK = 0x20,
> +   MFI_PD_STATE_SYSTEM = 0x40
>   };
>
>   union mfi_ld_ref {
>
> The difference in write performance must be due to something else.  You
> mentioned you are using UFS + gjournal.  I think gjournal uses
BIO_FLUSH, so I
> wonder if this is related:
>
>

> r212939 | gibbs | 2010-09-20 19:39:00 -0400 (Mon, 20 Sep 2010) | 61
lines
>
> MFC 212160:
>
> Correct bioq_disksort so that bioq_insert_tail() offers barrier
semantic.
> Add the BIO_ORDERED flag for struct bio and update bio clients to
use it.
>
> The barrier semantics of bioq_insert_tail() were broken in two ways:
>
>   o In bioq_disksort(), an added bio could be inserted at the head of
> the queue, even when a barrier was present, if the sort key for
> the new entry was less than that of the last queued barrier bio.
>
>   o The last_offset used to generate the sort key for newly queued bios
> did not stay at the position of the barrier until either the
> barrier was de-queued, or a new barrier (which updates last_offset)
> was queued.  When a barrier is in effect, we know that the disk
> will pass through the barrier position just before the
> "blocked bios" are released, so using the barrier's offset for
> last_offset is the optimal choice.
>
> sys/geom/sched/subr_disk.c:
> sys/kern/subr_disk.c:
>  o Update last_offset in bioq_insert_tail().
>
>  o Only update last_offset in bioq_remove() if the removed
bio is
>at the head of the queue (typically due to a call via
>bioq_takefirst()) and no barrier is active.
>
>  o In bioq_disksort(), if we have a barrier (insert_point is
non-NULL),
>set prev to the barrier and cur to it's next element.
Now that
>last_offset is kept at the barrier position, this change
isn't
>strictly necessary, but since we have to take a decision
branch
>anyway, it does avoid one, no-op, loop iteration in the
while
>loop that immediately follows.
>
>  o In bioq_disksort(), bypass the normal sort for bios with the
>BIO_ORDERED attribute and instead insert them into the queue
>with bioq_insert_tail().  bioq_insert_tail() not only gives
>the desired command order during insertion, but also
provides
>barrier semantics so that commands disksorted in the future
>cannot pass the just enqueued transaction.
>
> sys/sys/bio.h:
>  Add BIO_ORDERED as bit 4 of the bio_flags field in struct bio.
>
> sys/cam/ata/ata_da.c:
> sys/cam/scsi/scsi_da.c
>  Use an ordered command for SCSI/ATA-NCQ commands issued in
>  response to bios with the BIO_ORDERED flag set.
>
> sys/cam/scsi/scsi_da.c
>  Use an ordered tag when issuing a synchronize cache command.
>
>  Wrap some lines to 80 columns.
>
> sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
> sys/geom/geom_io.c
>  Mark bios with the BIO_FLUSH command as BIO_ORDERED.
>
> Sponsored by:   Spectra Logic Corporation
>

Re: mfi(4) IO performance regression, post 8.1

2012-07-12 Thread Steve McCoy

On 7/12/12 4:34 PM, Steve McCoy wrote:

On 7/12/12 4:14 PM, Charles Owens wrote:

On Thursday, June 21, 2012 10:36:04 pm Charles Owens wrote:


On 6/15/12 8:04 AM, John Baldwin wrote:
> On Friday, June 15, 2012 12:28:59 am Charles Owens wrote:
>> Hello FreeBSD folk,
>>
>> We're seeing what appears to be a storage performance regression
as we
>> try to move from 8.1 (i386) to 8.3.   We looked at 8.2 also and it
>> appears that the regression happened between 8.1 and 8.2.
>>
>> Our system is an Intel S5520UR Server with 12 GB RAM, dual 4-core
CPUs.
>> Storage is a LSI MegaSAS 1078 controller (mfi) in a RAID-10
>> configuration, using UFS + geom_journal for filesystem.
>>
>> Postgresql performance, as seen via pgbench, dropped by approx 20%.
>> This testing was done with our usual PAE-enabled kernels.  We then
went
>> back to GENERIC kernels and did comparisons using "bonnie", results
>> below.  Following that is a kernel boot log.
>>
>> Notably, we're seeing this regression only with our RAID mfi(4) based
>> systems.  Notably, from looking at FreeBSD source changelogs it
appears
>> that the mfi(4) code has seen some changes since 8.1.
> Between 8.1 and 8.2 mfi has not had any significant changes.  The
only changes
> made to sys/dev/mfi were to add a new constant:
>
>> svn diff svn+ssh://svn.freebsd.org/base/releng/8.1/sys/dev/mfi
> svn+ssh://svn.freebsd.org/base/releng/8.2/sys/dev/mfi
> Index: mfireg.h
> ===
> --- mfireg.h(.../8.1/sys/dev/mfi)   (revision 237134)
> +++ mfireg.h(.../8.2/sys/dev/mfi)   (revision 237134)
> @@ -975,7 +975,9 @@
>  MFI_PD_STATE_OFFLINE = 0x10,
>  MFI_PD_STATE_FAILED = 0x11,
>  MFI_PD_STATE_REBUILD = 0x14,
> -   MFI_PD_STATE_ONLINE = 0x18
> +   MFI_PD_STATE_ONLINE = 0x18,
> +   MFI_PD_STATE_COPYBACK = 0x20,
> +   MFI_PD_STATE_SYSTEM = 0x40
>   };
>
>   union mfi_ld_ref {
>
> The difference in write performance must be due to something else.
You
> mentioned you are using UFS + gjournal.  I think gjournal uses
BIO_FLUSH, so I
> wonder if this is related:
>
>

> r212939 | gibbs | 2010-09-20 19:39:00 -0400 (Mon, 20 Sep 2010) | 61
lines
>
> MFC 212160:
>
> Correct bioq_disksort so that bioq_insert_tail() offers barrier
semantic.
> Add the BIO_ORDERED flag for struct bio and update bio clients to
use it.
>
> The barrier semantics of bioq_insert_tail() were broken in two ways:
>
>   o In bioq_disksort(), an added bio could be inserted at the head of
> the queue, even when a barrier was present, if the sort key for
> the new entry was less than that of the last queued barrier bio.
>
>   o The last_offset used to generate the sort key for newly queued
bios
> did not stay at the position of the barrier until either the
> barrier was de-queued, or a new barrier (which updates
last_offset)
> was queued.  When a barrier is in effect, we know that the disk
> will pass through the barrier position just before the
> "blocked bios" are released, so using the barrier's offset for
> last_offset is the optimal choice.
>
> sys/geom/sched/subr_disk.c:
> sys/kern/subr_disk.c:
>  o Update last_offset in bioq_insert_tail().
>
>  o Only update last_offset in bioq_remove() if the removed
bio is
>at the head of the queue (typically due to a call via
>bioq_takefirst()) and no barrier is active.
>
>  o In bioq_disksort(), if we have a barrier (insert_point is
non-NULL),
>set prev to the barrier and cur to it's next element.
Now that
>last_offset is kept at the barrier position, this change
isn't
>strictly necessary, but since we have to take a decision
branch
>anyway, it does avoid one, no-op, loop iteration in the
while
>loop that immediately follows.
>
>  o In bioq_disksort(), bypass the normal sort for bios with
the
>BIO_ORDERED attribute and instead insert them into the
queue
>with bioq_insert_tail().  bioq_insert_tail() not only gives
>the desired command order during insertion, but also
provides
>barrier semantics so that commands disksorted in the future
>cannot pass the just enqueued transaction.
>
> sys/sys/bio.h:
>  Add BIO_ORDERED as bit 4 of the bio_flags field in struct
bio.
>
> sys/cam/ata/ata_da.c:
> sys/cam/scsi/scsi_da.c
>  Use an ordered command for SCSI/ATA-NCQ commands issued in
>  response to bios with the BIO_ORDERED flag set.
>
> sys/cam/scsi/scsi_da.c
>  Use an ordered tag when issuing a synchronize cache command.
>
>  Wrap some lines to 80 columns.
>
> sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
> sys/geom/geom_io.c
>  Mark bios with the BIO_FLUSH command as BIO_ORDERED.
>
> Sponsored by:   Spectra Logic Corporation
>
-