On 7/12/12 4:34 PM, Steve McCoy wrote:
On 7/12/12 4:14 PM, Charles Owens wrote:
On Thursday, June 21, 2012 10:36:04 pm Charles Owens wrote:

On 6/15/12 8:04 AM, John Baldwin wrote:
> On Friday, June 15, 2012 12:28:59 am Charles Owens wrote:
>> Hello FreeBSD folk,
>>
>> We're seeing what appears to be a storage performance regression
as we
>> try to move from 8.1 (i386) to 8.3.   We looked at 8.2 also and it
>> appears that the regression happened between 8.1 and 8.2.
>>
>> Our system is an Intel S5520UR Server with 12 GB RAM, dual 4-core
CPUs.
>> Storage is a LSI MegaSAS 1078 controller (mfi) in a RAID-10
>> configuration, using UFS + geom_journal for filesystem.
>>
>> Postgresql performance, as seen via pgbench, dropped by approx 20%.
>> This testing was done with our usual PAE-enabled kernels.  We then
went
>> back to GENERIC kernels and did comparisons using "bonnie", results
>> below.  Following that is a kernel boot log.
>>
>> Notably, we're seeing this regression only with our RAID mfi(4) based
>> systems.  Notably, from looking at FreeBSD source changelogs it
appears
>> that the mfi(4) code has seen some changes since 8.1.
> Between 8.1 and 8.2 mfi has not had any significant changes.  The
only changes
> made to sys/dev/mfi were to add a new constant:
>
>> svn diff svn+ssh://svn.freebsd.org/base/releng/8.1/sys/dev/mfi
> svn+ssh://svn.freebsd.org/base/releng/8.2/sys/dev/mfi
> Index: mfireg.h
> ===================================================================
> --- mfireg.h    (.../8.1/sys/dev/mfi)   (revision 237134)
> +++ mfireg.h    (.../8.2/sys/dev/mfi)   (revision 237134)
> @@ -975,7 +975,9 @@
>          MFI_PD_STATE_OFFLINE = 0x10,
>          MFI_PD_STATE_FAILED = 0x11,
>          MFI_PD_STATE_REBUILD = 0x14,
> -       MFI_PD_STATE_ONLINE = 0x18
> +       MFI_PD_STATE_ONLINE = 0x18,
> +       MFI_PD_STATE_COPYBACK = 0x20,
> +       MFI_PD_STATE_SYSTEM = 0x40
>   };
>
>   union mfi_ld_ref {
>
> The difference in write performance must be due to something else.
You
> mentioned you are using UFS + gjournal.  I think gjournal uses
BIO_FLUSH, so I
> wonder if this is related:
>
>
------------------------------------------------------------------------
> r212939 | gibbs | 2010-09-20 19:39:00 -0400 (Mon, 20 Sep 2010) | 61
lines
>
> MFC 212160:
>
> Correct bioq_disksort so that bioq_insert_tail() offers barrier
semantic.
> Add the BIO_ORDERED flag for struct bio and update bio clients to
use it.
>
> The barrier semantics of bioq_insert_tail() were broken in two ways:
>
>   o In bioq_disksort(), an added bio could be inserted at the head of
>     the queue, even when a barrier was present, if the sort key for
>     the new entry was less than that of the last queued barrier bio.
>
>   o The last_offset used to generate the sort key for newly queued
bios
>     did not stay at the position of the barrier until either the
>     barrier was de-queued, or a new barrier (which updates
last_offset)
>     was queued.  When a barrier is in effect, we know that the disk
>     will pass through the barrier position just before the
>     "blocked bios" are released, so using the barrier's offset for
>     last_offset is the optimal choice.
>
> sys/geom/sched/subr_disk.c:
> sys/kern/subr_disk.c:
>          o Update last_offset in bioq_insert_tail().
>
>          o Only update last_offset in bioq_remove() if the removed
bio is
>            at the head of the queue (typically due to a call via
>            bioq_takefirst()) and no barrier is active.
>
>          o In bioq_disksort(), if we have a barrier (insert_point is
non-NULL),
>            set prev to the barrier and cur to it's next element.
Now that
>            last_offset is kept at the barrier position, this change
isn't
>            strictly necessary, but since we have to take a decision
branch
>            anyway, it does avoid one, no-op, loop iteration in the
while
>            loop that immediately follows.
>
>          o In bioq_disksort(), bypass the normal sort for bios with
the
>            BIO_ORDERED attribute and instead insert them into the
queue
>            with bioq_insert_tail().  bioq_insert_tail() not only gives
>            the desired command order during insertion, but also
provides
>            barrier semantics so that commands disksorted in the future
>            cannot pass the just enqueued transaction.
>
> sys/sys/bio.h:
>          Add BIO_ORDERED as bit 4 of the bio_flags field in struct
bio.
>
> sys/cam/ata/ata_da.c:
> sys/cam/scsi/scsi_da.c
>          Use an ordered command for SCSI/ATA-NCQ commands issued in
>          response to bios with the BIO_ORDERED flag set.
>
> sys/cam/scsi/scsi_da.c
>          Use an ordered tag when issuing a synchronize cache command.
>
>          Wrap some lines to 80 columns.
>
> sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
> sys/geom/geom_io.c
>          Mark bios with the BIO_FLUSH command as BIO_ORDERED.
>
> Sponsored by:   Spectra Logic Corporation
>
------------------------------------------------------------------------
>
> Can you try perhaps commenting out the 'bp->bio_flags |=
BIO_ORDERED' line
> changed in geom_io.c in 8.2?  That would be effectively reverting this
> portion of the diff:
>
> Index: geom_io.c
> ===================================================================
> --- geom_io.c   (.../8.1/sys/geom)      (revision 237134)
> +++ geom_io.c   (.../8.2/sys/geom)      (revision 237134)
> @@ -265,6 +265,7 @@
>          g_trace(G_T_BIO, "bio_flush(%s)", cp->provider->name);
>          bp = g_alloc_bio();
>          bp->bio_cmd = BIO_FLUSH;
> +       bp->bio_flags |= BIO_ORDERED;
>          bp->bio_done = NULL;
>          bp->bio_attribute = NULL;
>          bp->bio_offset = cp->provider->mediasize;
>

John... thanks for the suggestion.  I've built and tested a kernel with
this change made.  Result:  no change (same performance as with
8.2-GENERIC). Any thoughts as to where to go next?

Hmm.  That seemed the most plausible candidate when I looked at this.

Do you use quotas (there is one change in UFS related to quotas)?

There are 5 changes that involve sys/kern/vfs_bio.c in 8.2:
209459, 212229, 212562, 212583, and 213890.

Can you possibly test out kernels from stable/8 at those revisions on an
8.1
world and see if you can narrow it down futher?

Barring that, can you do a binary search of kernels from stable/8
between 8.1
and 8.2 on an 8.1 world to see which commit caused the change in write
performance?


Hi John, I'm working with Charles to narrow this down.

Looks like revision 212229 is the culprit, or at least around the same
time to it, if this change isn't what slowed things down. The change to
sys/kern/vfs_bio.c modifies some synchronization in dev_strategy():


Actually, hold that thought. I had a hunch that I wasn't thorough enough, so I decided to try 212228 — the performance is the same as with 212229, so vfs_bio seems to be out of the picture. I'm going to binary search between 209459 and 212229, and see what I find.
_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"

Reply via email to