date:20080913

Re: [PERFORM] Effects of setting linux block device readahead size

2008-09-13 Thread david


On Fri, 12 Sep 2008, James Mansion wrote:


Scott Carey wrote:
Consumer drives will often read-ahead much more than server drives 
optimized for i/o per second.

...
The Linux readahead setting is _definitely_ in the kernel, definitely uses 
and fills the page cache, and from what I can gather, simply issues extra 
I/Os to the hardware beyond the last one requested by an app in certain 
situations.  It does not make your I/O request larger; it just queues an 
extra I/O following your request.
So ... fiddling with settings in Linux is going to force read-ahead, but the 
read-ahead data will hit the controller cache and the system buffers.
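
For anyone who wants to look at the setting being discussed, here is a 
minimal Python sketch. It assumes the kernel exposes the per-device value 
in /sys/block/<dev>/queue/read_ahead_kb (in KiB) and that blockdev 
--getra/--setra express the same value in 512-byte sectors. The device name 
"sda" and the helper names are only illustrative.

```python
from pathlib import Path

SECTOR_BYTES = 512  # blockdev --getra/--setra count in 512-byte sectors


def sectors_to_kb(sectors: int) -> int:
    """Convert a blockdev-style readahead value (sectors) to KiB."""
    return sectors * SECTOR_BYTES // 1024


def kb_to_sectors(kb: int) -> int:
    """Convert a sysfs-style readahead value (KiB) to 512-byte sectors."""
    return kb * 1024 // SECTOR_BYTES


def read_ahead_kb(device: str = "sda", sysfs_root: str = "/sys/block") -> int:
    """Read the current readahead size in KiB for one block device.

    The device name is an assumption; pass whatever your system uses.
    """
    path = Path(sysfs_root) / device / "queue" / "read_ahead_kb"
    return int(path.read_text().strip())
```

This only reads the kernel-side setting. It says nothing about what the 
drive or the controller does with its own cache.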


And the drives use their caches for cylinder caching implicitly (maybe the 
SATA drives appear to preread more because the storage density per cylinder 
is higher?).


But is there any way for an OS or application to (portably) ask SATA, SAS or 
SCSI drives to read ahead more (or less) than their default and NOT return 
the data to the controller?


I've never heard of such a thing, but I'm no expert in the command sets for 
any of this stuff.


I'm pretty sure that's not possible. The OS isn't even supposed to know 
the internals of the drive.


David Lang


James



On Thu, Sep 11, 2008 at 12:54 PM, James Mansion 
<[EMAIL PROTECTED] > wrote:


Greg Smith wrote:

The point I was trying to make there is that even under
impossibly optimal circumstances, you'd be hard pressed to
blow out the disk's read cache with seek-dominated data even
if you read a lot at each seek point.  That idea didn't make
it from my head into writing very well though.

Isn't there a bigger danger in blowing out the cache on the
controller and causing premature pageout of its dirty pages?

If you could get the readahead to work on the drive and not return
data to the controller, that might be dandy, but I'm sceptical.

James



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance









Re: [PERFORM] Choosing a filesystem

2008-09-13 Thread david


On Fri, 12 Sep 2008, Merlin Moncure wrote:


On Fri, Sep 12, 2008 at 5:11 AM, Greg Smith <[EMAIL PROTECTED]> wrote:

On Fri, 12 Sep 2008, Guillaume Cottenceau wrote:

That's the main thing, and nothing else you can do will accelerate that.
Without a useful write cache (which usually means RAM with a BBU), you'll at
best get about 100-200 write transactions per second for any one client, and
something like 500/second even with lots of clients (queued up transaction
fsyncs do get combined).  Those numbers increase to several thousand per
second the minute there's a good caching controller in the mix.
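
A rough way to see where a figure like 100-200 per second for a single 
client could come from: one common explanation, which is an assumption here 
and not something stated in this thread, is that without a write cache each 
commit has to wait for the platter to come around again, so a single client 
gets at most about one commit per rotation. The function name is 
illustrative.

```python
def max_commits_per_second(rpm: int) -> float:
    """Upper bound on single-client commits/sec, assuming one commit per rotation."""
    return rpm / 60.0


# Common spindle speeds; these are illustrative inputs, not measurements.
for rpm in (7200, 10000, 15000):
    print(rpm, "RPM ->", round(max_commits_per_second(rpm)), "commits/sec at most")
```

Under that assumption a 7200 RPM drive tops out at 120 per second and a 
15000 RPM drive at 250, which is the same ballpark as the numbers quoted 
above.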


While this is correct, if heavy writing is sustained, especially on
large databases, you will eventually outrun the write cache on the
controller and things will start to degrade towards the slow case.  So
it's fairer to say that caching raid controllers burst up to several
thousand per second, with a sustained write rate somewhat better than
write-through but much worse than the burst rate.

How fast things degrade from the burst rate depends on certain
factors...how big the database is relative to the O/S read cache and
the controller write cache, and how random the I/O is generally.  One
thing RAID controllers are great at is smoothing bursty I/O, during
checkpoints for example.
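
To put a number on "outrunning" the cache, here is a back-of-the-envelope 
Python sketch. It assumes a very simple model in which the cache fills at 
the incoming write rate minus the rate at which the controller can drain to 
disk. The function name and all the figures are hypothetical, not 
measurements from any particular card.

```python
import math


def seconds_until_cache_full(cache_mb: float,
                             incoming_mb_s: float,
                             drain_mb_s: float) -> float:
    """Time until the write cache is full under a constant-rate model.

    Returns math.inf when the disks drain at least as fast as writes arrive,
    i.e. the cache never fills and the burst rate can be sustained.
    """
    net_fill = incoming_mb_s - drain_mb_s
    if net_fill <= 0:
        return math.inf
    return cache_mb / net_fill


# Hypothetical: 256 MB cache, 40 MB/s of incoming random writes,
# disks able to absorb 10 MB/s of that random I/O.
print(seconds_until_cache_full(256, 40, 10))
```

The more random the I/O, the lower the drain rate, and the sooner the 
sustained rate falls back towards the write-through case.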

Unfortunately, when you outrun the cache on RAID controllers the behavior
is not always very pleasant...in at least one case I've experienced
(PERC 5/i), when the cache fills up the card decides to clear it before
continuing.  This means that if fsync is on, you get unpredictable
random freezing pauses while the cache is clearing.


Although for Postgres, the thing you are doing the fsync on is the WAL 
file. That is a single (usually) contiguous file, so it is very efficient 
to write large chunks of it. So while you will degrade from the speed you 
get when writes only have to reach the battery-backed cache, the fact that 
the controller can flush many requests' worth of writes out to the WAL at 
once, while you fill the cache with them one at a time, is still a 
significant win.
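
The batching effect can be sketched in Python. This is only an illustration 
of the idea, not how Postgres or any controller actually implements it: 
many small records are appended to one sequential file, and the number of 
flushes depends on how many records are combined per flush. The function 
name and the record sizes are made up for the example.

```python
import os


def append_records(path: str, records: list, batch_size: int) -> int:
    """Append records to a file, calling fsync once per batch.

    Returns the number of fsync calls issued.
    """
    fsyncs = 0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        for start in range(0, len(records), batch_size):
            # Combine a batch of small records into one sequential write.
            view = memoryview(b"".join(records[start:start + batch_size]))
            while view:
                written = os.write(fd, view)
                view = view[written:]
            os.fsync(fd)
            fsyncs += 1
    finally:
        os.close(fd)
    return fsyncs
```

With 1000 records, a batch_size of 1 issues 1000 flushes while a batch_size 
of 100 issues 10, and the data written to the file is the same either way.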


David Lang
