Re: [GENERAL] (Relatively) Oversized Checkpoint

Nathaniel Talbott Wed, 18 Jun 2014 07:36:13 -0700

> What is the sustained volume of disk output you experience, for
> example from vmstat snapshots?


Unfortunately our metrics are... less than comprehensive at this point, so
I don't have data for that at the moment. I'm going to work to rectify that,
but it will take some time.
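Jeff's suggestion of running `select * from pg_stat_bgwriter` periodically and saving the results can be sketched in shell. This is a minimal sketch, not a tested tool: `snapshot_bgwriter`, `bgwriter_shares` and the `bgwriter.csv` file name are hypothetical, and it assumes `psql` can connect using the default PG* environment settings. Comparing two snapshots taken around a slow checkpoint shows whether writes shifted from the background writer to the checkpointer, which is Jeff's second theory.

```shell
# Hypothetical helper: append one timestamped row of write counters from
# pg_stat_bgwriter to a CSV file (default: bgwriter.csv).
# Assumes psql connects using the default PG* environment settings.
snapshot_bgwriter() {
  psql -At -F ',' -c "SELECT now(), buffers_checkpoint, buffers_clean, buffers_backend FROM pg_stat_bgwriter" >> "${1:-bgwriter.csv}"
}

# Hypothetical helper: given two CSV rows from snapshot_bgwriter (earlier row
# first), print the integer percentage of buffers written between them by the
# checkpointer, the background writer, and backends.
bgwriter_shares() {
  printf '%s\n%s\n' "$1" "$2" | awk -F ',' '
    NR == 1 { c0 = $2; b0 = $3; k0 = $4 }
    NR == 2 {
      c = $2 - c0; b = $3 - b0; k = $4 - k0; t = c + b + k
      if (t == 0) { print "no buffers written"; exit }
      printf "checkpoint=%d%% bgwriter=%d%% backend=%d%%\n", 100 * c / t, 100 * b / t, 100 * k / t
    }'
}
```

For example, `while true; do snapshot_bgwriter; sleep 300; done &` running alongside `vmstat 60 >> vmstat.log &` (whose `bo` column is blocks sent to block devices per second) would capture the sustained write volume Jeff asked about; `bgwriter_shares "$(sed -n 1p bgwriter.csv)" "$(tail -n 1 bgwriter.csv)"` then summarizes the interval.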


> 10% of 8GB (which sounds like the amount of RAM you have) is
> nearly a gig, which is a lot of dirty data to keep around.  I'd use
> dirty_background_bytes to set it, as it gives finer control than
> dirty_background_ratio does, and set it to maybe 100MB (but that depends
> on your IO system).

After realizing that often (but not always) these events seem to happen a
few minutes after an autovacuum run, I'm inclined to agree. I'm trying
100MB for now; we'll see overnight whether that helps.
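The arithmetic behind Jeff's suggestion, and the matching setting, as a small shell sketch. The helper names are hypothetical, it assumes 8 GiB of RAM (as Jeff guessed), and it reads "100MB" as 100 MiB. Per the kernel's vm sysctl documentation, only one of the dirty_background pair is in effect at a time: writing `dirty_background_bytes` makes `dirty_background_ratio` read back as 0. Strictly, the ratio applies to available (free plus reclaimable) memory rather than total RAM, so the 10% figure below is an upper estimate.

```shell
# Hypothetical helpers for sizing vm.dirty_background_*.
# ratio_to_bytes PERCENT RAM_BYTES -> bytes that percentage of RAM represents
ratio_to_bytes() { echo $(( $2 * $1 / 100 )); }
# mib_to_bytes MIB -> bytes
mib_to_bytes() { echo $(( $1 * 1024 * 1024 )); }

ram=$(( 8 * 1024 * 1024 * 1024 ))   # assumption: 8 GiB of RAM
echo "default ratio (10%) of 8 GiB: $(ratio_to_bytes 10 "$ram") bytes"
echo "proposed limit of 100 MiB:    $(mib_to_bytes 100) bytes"

# To apply (as root); add the same key to /etc/sysctl.conf to persist it:
#   sysctl -w vm.dirty_background_bytes=104857600
```

With these numbers the default works out to 858,993,459 bytes (about 819 MiB) of dirty data before background writeback kicks in, versus 104,857,600 bytes for the proposed setting.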


> Series 3 kernels before 3.8 seem to have a poor (but vague) reputation
> for running database-type workloads.

Super helpful to know. Getting the OS upgraded on these boxes is on the
short list anyhow, and I'll make sure we end up on at least 3.8 by the time
we're done.
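A quick way to confirm the upgraded boxes meet the 3.8 threshold. This is a sketch with a hypothetical helper name, `kernel_at_least`, and it assumes GNU coreutils `sort -V` (version sort), which Ubuntu ships.

```shell
# Hypothetical helper: succeed (exit status 0) if kernel release $1 is at
# least version $2. The distro suffix (e.g. "-51-generic") is stripped first.
# Requires GNU sort for -V (version sort).
kernel_at_least() {
  release=${1%%-*}
  [ "$(printf '%s\n%s\n' "$2" "$release" | sort -V | head -n 1)" = "$2" ]
}

if kernel_at_least "$(uname -r)" 3.8; then
  echo "kernel $(uname -r) is 3.8 or newer"
else
  echo "kernel $(uname -r) is older than 3.8"
fi
```

On the current boxes, `3.2.0-51-generic` reduces to `3.2.0`, which version-sorts before `3.8`, so the check fails as expected.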


> How many drives of what RPM in what RAID configuration, and does the
> RAID controller have a BBU?  I personally don't know from drivers, but
> other people on the list do, so that info could be useful as well.

    /proc/driver/cciss$ cat cciss0
    cciss0: HP Smart Array P400i Controller
    Board ID: X
    Firmware Version: 7.22
    IRQ: 64
    Logical drives: 1
    Current Q depth: 0
    Current # commands on controller: 0
    Max Q depth since init: 191
    Max # commands on controller since init: 310
    Max SG entries since init: 128
    Sequential access devices: 0

    cciss/c0d0:  127.99GB RAID 1(1+0)

More:

      ~$ sudo hpacucli controller slot=0 show

      Smart Array P400i in Slot 0 (Embedded)
         Bus Interface: PCI
         Slot: 0
         Serial Number: X
         Cache Serial Number: X
         RAID 6 (ADG) Status: Disabled
         Controller Status: OK
         Hardware Revision: D
         Firmware Version: 7.22
         Rebuild Priority: Medium
         Expand Priority: Medium
         Surface Scan Delay: 15 secs
         Surface Scan Mode: Idle
         Wait for Cache Room: Disabled
         Surface Analysis Inconsistency Notification: Disabled
         Post Prompt Timeout: 0 secs
         Cache Board Present: True
         Cache Status: OK
         Cache Ratio: 100% Read / 0% Write
         Drive Write Cache: Disabled
         Total Cache Size: 256 MB
         Total Cache Memory Available: 208 MB
         No-Battery Write Cache: Disabled
         Battery/Capacitor Count: 0
         SATA NCQ Supported: True
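Whether the controller has a BBU can be read straight from the `hpacucli controller slot=0 show` output above. A small filter sketch: the function name is hypothetical, it only looks at the three fields shown, and it assumes the `Field: value` layout above.

```shell
# Hypothetical filter: read `hpacucli controller slot=0 show` output on stdin
# and summarize the fields that matter for write caching.
write_cache_summary() {
  awk -F ': ' '
    /Battery\/Capacitor Count/ { bbu = $2 }
    /Cache Ratio/              { ratio = $2 }
    /No-Battery Write Cache/   { nbwc = $2 }
    END {
      print "BBU: " ((bbu + 0 > 0) ? "yes" : "no")
      print "Cache ratio: " ratio
      print "No-Battery Write Cache: " nbwc
    }'
}

# Usage: sudo hpacucli controller slot=0 show | write_cache_summary
```

Fed the output above, it reports `BBU: no`, a `100% Read / 0% Write` cache ratio and `No-Battery Write Cache: Disabled`, which suggests the 256 MB controller cache is being used for reads only.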

Physical drives:

   ~$ sudo hpacucli controller slot=0 pd all show detail

   Smart Array P400i in Slot 0 (Embedded)

      array A

         physicaldrive 1I:1:1
            Port: 1I
            Box: 1
            Bay: 1
            Status: OK
            Drive Type: Data Drive
            Interface Type: SATA
            Size: 128.0 GB
            Firmware Revision: DXM04B0Q
            Serial Number: X
            Model: ATA     Samsung SSD 840
            SATA NCQ Capable: True
            SATA NCQ Enabled: True
            Current Temperature (C): 23
            Maximum Temperature (C): 70
            PHY Count: 1
            PHY Transfer Rate: 1.5Gbps

         physicaldrive 1I:1:2
            Port: 1I
            Box: 1
            Bay: 2
            Status: OK
            Drive Type: Data Drive
            Interface Type: SATA
            Size: 128.0 GB
            Firmware Revision: DXM04B0Q
            Serial Number: X
            Model: ATA     Samsung SSD 840
            SATA NCQ Capable: True
            SATA NCQ Enabled: True
            Current Temperature (C): 21
            Maximum Temperature (C): 70
            PHY Count: 1
            PHY Transfer Rate: 1.5Gbps

--
Nathaniel


On Mon, Jun 16, 2014 at 5:01 PM, Jeff Janes <jeff.ja...@gmail.com> wrote:

> On Mon, Jun 16, 2014 at 12:40 PM, Nathaniel Talbott
> <nathan...@spreedly.com> wrote:
> >> So the fsync (by the kernel) of a single file took 64 seconds.  This
> >> single event explains almost all of the overrun and the performance
> >> problems.
> >
> > So the 10x number of buffers being written is a red herring and/or
> > correlates with a slow fsync?
>
> Two theories there:
>
> Either the uniformly high level of activity is not actually uniform
> and it was extra high during that checkpoint.
>
> Or, much of the writing that happens is normally being done by the
> background writer rather than the checkpoint writer. Severe IO
> problems caused the background writer to freeze up or slow down,
> leaving the work to be done by the checkpoint writer instead.  Once
> the checkpoint writer hits the fsync phase of its work, it then gets
> blocked by the same IO problems that were affecting the background
> writer.  So in this theory, the shift of the work load from background
> writer to checkpoint writer is just a symptom of the IO problems.
>
> Unfortunately there is little good way to assess those theories,
> unless you have a lot of logging data for the period of interest, like
> "sar" or "vmstat" or have been running "select * from
> pg_stat_bgwriter" periodically and saving the results.
>
> What is the sustained volume of disk output you experience, for
> example from vmstat snapshots?
>
> >
> >
> >> Lowering /proc/sys/vm/dirty_background_ratio and
> >> /proc/sys/vm/dirty_background_bytes
> >
> > I've never tweaked dirty_background_* before; currently we have:
> >
> >     $ cat /proc/sys/vm/dirty_background_ratio
> >     10
> >     $ cat /proc/sys/vm/dirty_background_bytes
> >     0
> >
> > Where should I start with tweaking those?
>
> 10% of 8GB (which sounds like the amount of RAM you have) is
> nearly a gig, which is a lot of dirty data to keep around.  I'd use
> dirty_background_bytes to set it, as it gives finer control than
> dirty_background_ratio does, and set it to maybe 100MB (but that depends
> on your IO system).
>
> >
> >
> >> not using ext3 for the file system
> >
> > What are the recommended alternatives? We're using ext4 currently.
>
> ext3 and ext4 are the only things I've used personally for production
> databases.  I think ext4 mostly fixed the problems that ext3 had.
> Some people like XFS or JFS, but I have no experience with them.
>
> Since you aren't using the dreaded ext3, I wouldn't worry about that
> aspect.
>
> >> are the clearest I know of, but people have also reported that
> >> lowering shared_buffers can help.
> >
> > Currently we have:
> >
> >     => SHOW shared_buffers;
> >      shared_buffers
> >      ----------------
> >      2GB
> >     (1 row)
> >
> > That seems to match up with tuning recommendations I've read (25% of
> > available RAM). And of course generally it seems to work great.
>
> People who reported problems with large shared_buffers usually had
> huge amounts of RAM, so that 25% of it was far larger than 2GB.  So
> probably nothing for you to worry about.
>
> >
> >
> >> This is fundamentally a kernel/FS/IO issue.  What are the current
> >> settings for those, and what kernel and file system, and IO subsystem
> >> are you using?
> >
> > Kernel: Linux 3.2.0-51-generic #77-Ubuntu SMP x86_64
> > Filesystem: ext4
>
> Series 3 kernels before 3.8 seem to have a poor (but vague) reputation
> for running database-type workloads.
>
> >
> > Not sure what you're asking RE IO subsystem - do you mean
> > drivers/hardware?
>
> How many drives of what RPM in what RAID configuration, and does the
> RAID controller have a BBU?  I personally don't know from drivers, but
> other people on the list do, so that info could be useful as well.
>
> Cheers,
>
> Jeff
>