Charles Owens
Great Bay Software, Inc.
v: 603.617.4844   m: 603.866.0860

On 6/22/12 10:22 AM, John Baldwin wrote:
On Thursday, June 21, 2012 10:36:04 pm Charles Owens wrote:
On 6/15/12 8:04 AM, John Baldwin wrote:
On Friday, June 15, 2012 12:28:59 am Charles Owens wrote:
Hello FreeBSD folk,

We're seeing what appears to be a storage performance regression as we
try to move from 8.1 (i386) to 8.3.   We looked at 8.2 also and it
appears that the regression happened between 8.1 and 8.2.

Our system is an Intel S5520UR Server with 12 GB RAM, dual 4-core CPUs.
Storage is a LSI MegaSAS 1078 controller (mfi) in a RAID-10
configuration, using UFS + geom_journal for filesystem.

Postgresql performance, as seen via pgbench, dropped by approx 20%.
This testing was done with our usual PAE-enabled kernels.  We then went
back to GENERIC kernels and did comparisons using "bonnie", results
below.  Following that is a kernel boot log.

Notably, we're seeing this regression only with our RAID mfi(4) based
systems.  Notably, from looking at FreeBSD source changelogs it appears
that the mfi(4) code has seen some changes since 8.1.
Between 8.1 and 8.2 mfi has not had any significant changes.  The only changes
made to sys/dev/mfi were to add a new constant:

svn diff svn+ssh://svn.freebsd.org/base/releng/8.1/sys/dev/mfi
svn+ssh://svn.freebsd.org/base/releng/8.2/sys/dev/mfi
Index: mfireg.h
===================================================================
--- mfireg.h    (.../8.1/sys/dev/mfi)   (revision 237134)
+++ mfireg.h    (.../8.2/sys/dev/mfi)   (revision 237134)
@@ -975,7 +975,9 @@
          MFI_PD_STATE_OFFLINE = 0x10,
          MFI_PD_STATE_FAILED = 0x11,
          MFI_PD_STATE_REBUILD = 0x14,
-       MFI_PD_STATE_ONLINE = 0x18
+       MFI_PD_STATE_ONLINE = 0x18,
+       MFI_PD_STATE_COPYBACK = 0x20,
+       MFI_PD_STATE_SYSTEM = 0x40
   };
union mfi_ld_ref {

The difference in write performance must be due to something else.  You
mentioned you are using UFS + gjournal.  I think gjournal uses BIO_FLUSH, so I
wonder if this is related:

------------------------------------------------------------------------
r212939 | gibbs | 2010-09-20 19:39:00 -0400 (Mon, 20 Sep 2010) | 61 lines

MFC 212160:

Correct bioq_disksort so that bioq_insert_tail() offers barrier semantic.
Add the BIO_ORDERED flag for struct bio and update bio clients to use it.

The barrier semantics of bioq_insert_tail() were broken in two ways:

   o In bioq_disksort(), an added bio could be inserted at the head of
     the queue, even when a barrier was present, if the sort key for
     the new entry was less than that of the last queued barrier bio.

   o The last_offset used to generate the sort key for newly queued bios
     did not stay at the position of the barrier until either the
     barrier was de-queued, or a new barrier (which updates last_offset)
     was queued.  When a barrier is in effect, we know that the disk
     will pass through the barrier position just before the
     "blocked bios" are released, so using the barrier's offset for
     last_offset is the optimal choice.

sys/geom/sched/subr_disk.c:
sys/kern/subr_disk.c:
          o Update last_offset in bioq_insert_tail().

          o Only update last_offset in bioq_remove() if the removed bio is
            at the head of the queue (typically due to a call via
            bioq_takefirst()) and no barrier is active.

          o In bioq_disksort(), if we have a barrier (insert_point is non-NULL),
            set prev to the barrier and cur to it's next element.  Now that
            last_offset is kept at the barrier position, this change isn't
            strictly necessary, but since we have to take a decision branch
            anyway, it does avoid one, no-op, loop iteration in the while
            loop that immediately follows.

          o In bioq_disksort(), bypass the normal sort for bios with the
            BIO_ORDERED attribute and instead insert them into the queue
            with bioq_insert_tail().  bioq_insert_tail() not only gives
            the desired command order during insertion, but also provides
            barrier semantics so that commands disksorted in the future
            cannot pass the just enqueued transaction.

sys/sys/bio.h:
          Add BIO_ORDERED as bit 4 of the bio_flags field in struct bio.

sys/cam/ata/ata_da.c:
sys/cam/scsi/scsi_da.c
          Use an ordered command for SCSI/ATA-NCQ commands issued in
          response to bios with the BIO_ORDERED flag set.

sys/cam/scsi/scsi_da.c
          Use an ordered tag when issuing a synchronize cache command.

          Wrap some lines to 80 columns.

sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
sys/geom/geom_io.c
          Mark bios with the BIO_FLUSH command as BIO_ORDERED.

Sponsored by:   Spectra Logic Corporation
------------------------------------------------------------------------

Can you try perhaps commenting out the 'bp->bio_flags |= BIO_ORDERED' line
changed in geom_io.c in 8.2?  That would be effectively reverting this
portion of the diff:

Index: geom_io.c
===================================================================
--- geom_io.c   (.../8.1/sys/geom)      (revision 237134)
+++ geom_io.c   (.../8.2/sys/geom)      (revision 237134)
@@ -265,6 +265,7 @@
          g_trace(G_T_BIO, "bio_flush(%s)", cp->provider->name);
          bp = g_alloc_bio();
          bp->bio_cmd = BIO_FLUSH;
+       bp->bio_flags |= BIO_ORDERED;
          bp->bio_done = NULL;
          bp->bio_attribute = NULL;
          bp->bio_offset = cp->provider->mediasize;

John... thanks for the suggestion.  I've built and tested a kernel with
this change made.  Result:  no change (same performance as with
8.2-GENERIC). Any thoughts as to where to go next?
Hmm.  That seemed the most plausible candidate when I looked at this.

Do you use quotas (there is one change in UFS related to quotas)?

There are 5 changes that involve sys/kern/vfs_bio.c in 8.2:
209459, 212229, 212562, 212583, and 213890.

Can you possibly test out kernels from stable/8 at those revisions on an 8.1
world and see if you can narrow it down futher?

Barring that, can you do a binary search of kernels from stable/8 between 8.1
and 8.2 on an 8.1 world to see which commit caused the change in write
performance?


I've been sidetracked for a bit... and am now starting to work through the revisions you've suggested. Ahead of that, I've reinstalled the system with filesystems configured *without* geom_journal and repeated tests with both 8.1 and 8.2 kernels. Finding: no issue -- in fact 8.2 performs slightly better than 8.1, as we might hope for in general. (bonnie output below) Not sure if this really helps us narrow things down, but geom_journal is certainly part of the story.

I'll have more results in the next day or so.

8.1 GENERIC
              -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
          100 196050 99.9 344414 43.4 535679 43.2 152666 99.7 3233854 100.0 
251145.9 215.2

8.2 GENERIC
              -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
          100 190687 97.8 436428 43.0 537990 42.9 155766 98.3 4268089 100.0 
231347.6 240.6


_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"

Reply via email to