Re: [PERFORM] BBU Cache vs. spindles

2010-12-22 Thread Bruce Momjian
Bruce Momjian wrote: > Greg Smith wrote: > > Kevin Grittner wrote: > > > I assume that we send a full > > > 8K to the OS cache, and the file system writes disk sectors > > > according to its own algorithm. With either platters or BBU cache, > > > the data is persisted on fsync; why do you see a ri

Re: [PERFORM] BBU Cache vs. spindles

2010-12-01 Thread Bruce Momjian
Pierre C wrote: > > > Is that true? I have no idea. I thought everything was done at the > > 512-byte block level. > > Newer disks (2TB and up) can have 4k sectors, but this still means a page > spans several sectors. Yes, I had heard about that.

Re: [PERFORM] BBU Cache vs. spindles

2010-12-01 Thread Pierre C
Is that true? I have no idea. I thought everything was done at the 512-byte block level. Newer disks (2TB and up) can have 4k sectors, but this still means a page spans several sectors.

Re: [PERFORM] BBU Cache vs. spindles

2010-11-30 Thread Bruce Momjian
Greg Smith wrote: > Tom Lane wrote: > > You've got entirely too simplistic a view of what the "delta" might be, > > I fear. In particular there are various sorts of changes that involve > > inserting the data carried in the WAL record and shifting pre-existing > > data around to make room, or remo

Re: [PERFORM] BBU Cache vs. spindles

2010-11-30 Thread Bruce Momjian
Kevin Grittner wrote: > Greg Smith wrote: > > > I think Kevin's point here may be that if your fsync isn't > > reliable, you're always in trouble. But if your fsync is good, > > even torn pages should be repairable by the deltas written to the > > WAL > > I was actually just arguing that a BB

Re: [PERFORM] BBU Cache vs. spindles

2010-11-30 Thread Bruce Momjian
Greg Smith wrote: > Kevin Grittner wrote: > > I assume that we send a full > > 8K to the OS cache, and the file system writes disk sectors > > according to its own algorithm. With either platters or BBU cache, > > the data is persisted on fsync; why do you see a risk with one but > > not the other

Re: [PERFORM] BBU Cache vs. spindles

2010-10-29 Thread david
On Fri, 29 Oct 2010, James Mansion wrote: Tom Lane wrote: Uh, no, it is not. The difference is that we can update a byte in a shared buffer, and know that it *isn't* getting written out before we Well, I don't know where you got the idea I was referring to that sort of thing - it's the same as

Re: [PERFORM] BBU Cache vs. spindles

2010-10-29 Thread James Mansion
Tom Lane wrote: Uh, no, it is not. The difference is that we can update a byte in a shared buffer, and know that it *isn't* getting written out before we Well, I don't know where you got the idea I was referring to that sort of thing - it's the same as writing to a buffer before copying to the

Re: [PERFORM] BBU Cache vs. spindles

2010-10-29 Thread david
On Fri, 29 Oct 2010, Robert Haas wrote: On Thu, Oct 28, 2010 at 5:26 PM, Tom Lane wrote: James Mansion writes: Tom Lane wrote: The other and probably worse problem is that there's no application control over how soon changes to mmap'd pages get to disk.  An msync will flush them out, but th

Re: [PERFORM] BBU Cache vs. spindles

2010-10-29 Thread Alvaro Herrera
Excerpts from Greg Smith's message of jue oct 21 14:04:17 -0300 2010: > What I would like to do is beef up the documentation with some concrete > examples of how to figure out if your cache and associated write path > are working reliably or not. It should be possible to include "does > this h

Re: [PERFORM] BBU Cache vs. spindles

2010-10-29 Thread Robert Haas
On Fri, Oct 29, 2010 at 11:56 AM, Aidan Van Dyk wrote: > 1) The pages you write to must be in the page cache, or your memcpy is > going to fault them in.  With a plain write, you don't need the > over-written page in the cache. I seem to remember a time many years ago when I got bitten by this pr

Re: [PERFORM] BBU Cache vs. spindles

2010-10-29 Thread Tom Lane
Robert Haas writes: > On Thu, Oct 28, 2010 at 5:26 PM, Tom Lane wrote: >> It's true that we don't know whether write() causes an immediate or >> delayed disk write, but we generally don't care that much.  What we do >> care about is being able to ensure that a WAL write happens before the >> data
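The ordering constraint Tom Lane describes here (the WAL write must be durable before the corresponding data write is issued) can be sketched roughly as follows. This is only an illustration under stated assumptions, not PostgreSQL's actual code: the function name `wal_then_data`, the file paths, and the record format are all hypothetical, and `os.pwrite` assumes a Unix-like system.

```python
import os
import tempfile


def wal_then_data(wal_path, data_path, wal_record, page, offset=0):
    """Hypothetical helper: make the WAL record durable, then write the page."""
    # Step 1: append the WAL record and force it to stable storage.
    wal_fd = os.open(wal_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        os.write(wal_fd, wal_record)
        os.fsync(wal_fd)  # does not return until the OS reports the data flushed
    finally:
        os.close(wal_fd)

    # Step 2: only now hand the data page to the OS. With write() the kernel
    # cannot see this change before this call, which is the control that
    # dirtying an mmap'd page directly would not give.
    data_fd = os.open(data_path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        os.pwrite(data_fd, page, offset)
    finally:
        os.close(data_fd)


if __name__ == "__main__":
    tmp = tempfile.mkdtemp()
    wal_then_data(os.path.join(tmp, "wal"), os.path.join(tmp, "data"),
                  b"hypothetical-record", b"P" * 8192)
```

Whether the fsync in step 1 really reaches durable storage still depends on the drive and controller cache honouring it, which is the subject of the rest of the thread.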

Re: [PERFORM] BBU Cache vs. spindles

2010-10-29 Thread Aidan Van Dyk
On Fri, Oct 29, 2010 at 11:43 AM, Robert Haas wrote: > Well, we COULD keep the data in shared buffers, and then copy it into > an mmap()'d region rather than calling write(), but I'm not sure > there's any advantage to it.  Managing address space mappings is a > pain in the butt. I could see thi

Re: [PERFORM] BBU Cache vs. spindles

2010-10-29 Thread Robert Haas
On Thu, Oct 28, 2010 at 5:26 PM, Tom Lane wrote: > James Mansion writes: >> Tom Lane wrote: >>> The other and probably worse problem is that there's no application >>> control over how soon changes to mmap'd pages get to disk.  An msync >>> will flush them out, but the kernel is free to write dir

Re: [PERFORM] BBU Cache vs. spindles

2010-10-28 Thread Tom Lane
James Mansion writes: > Tom Lane wrote: >> The other and probably worse problem is that there's no application >> control over how soon changes to mmap'd pages get to disk. An msync >> will flush them out, but the kernel is free to write dirty pages sooner. >> So if they're depending for consiste

Re: [PERFORM] BBU Cache vs. spindles

2010-10-28 Thread James Mansion
Tom Lane wrote: The other and probably worse problem is that there's no application control over how soon changes to mmap'd pages get to disk. An msync will flush them out, but the kernel is free to write dirty pages sooner. So if they're depending for consistency on writes not happening until m

Re: [PERFORM] BBU Cache vs. spindles

2010-10-27 Thread Rob Wultsch
On Wed, Oct 27, 2010 at 6:55 PM, Robert Haas wrote: > On Wed, Oct 27, 2010 at 12:41 AM, Rob Wultsch wrote: >> On Tue, Oct 26, 2010 at 7:25 AM, Robert Haas wrote: >>> On Tue, Oct 26, 2010 at 10:13 AM, Rob Wultsch wrote: The double write buffer is one of the few areas where InnoDB does more

Re: [PERFORM] BBU Cache vs. spindles

2010-10-27 Thread Robert Haas
On Wed, Oct 27, 2010 at 12:41 AM, Rob Wultsch wrote: > On Tue, Oct 26, 2010 at 7:25 AM, Robert Haas wrote: >> On Tue, Oct 26, 2010 at 10:13 AM, Rob Wultsch wrote: >>> The double write buffer is one of the few areas where InnoDB does more >>> IO (in the form of fsyncs) than PG. InnoDB also has

Re: [PERFORM] BBU Cache vs. spindles

2010-10-26 Thread Rob Wultsch
On Tue, Oct 26, 2010 at 7:25 AM, Robert Haas wrote: > On Tue, Oct 26, 2010 at 10:13 AM, Rob Wultsch wrote: >> The double write buffer is one of the few areas where InnoDB does more >> IO (in the form of fsyncs) than PG. InnoDB also has fuzzy >> checkpoints (which help to keep dirty pages in mem

Re: [PERFORM] BBU Cache vs. spindles

2010-10-26 Thread Robert Haas
On Tue, Oct 26, 2010 at 10:13 AM, Rob Wultsch wrote: > The double write buffer is one of the few areas where InnoDB does more > IO (in the form of fsyncs) than PG. InnoDB also has fuzzy > checkpoints (which help to keep dirty pages in memory longer), > buffering of writing out changes to seconda

Re: [PERFORM] BBU Cache vs. spindles

2010-10-26 Thread Rob Wultsch
On Tue, Oct 26, 2010 at 5:41 AM, Robert Haas wrote: > On Fri, Oct 22, 2010 at 3:05 PM, Kevin Grittner > wrote: >> Rob Wultsch wrote: >> >>> I would think full_page_writes=off + double write buffer should be >>> far superior, particularly given that the WAL is shipped over the >>> network to slav

Re: [PERFORM] BBU Cache vs. spindles

2010-10-26 Thread Robert Haas
On Fri, Oct 22, 2010 at 3:05 PM, Kevin Grittner wrote: > Rob Wultsch wrote: > >> I would think full_page_writes=off + double write buffer should be >> far superior, particularly given that the WAL is shipped over the >> network to slaves. > > For a reasonably brief description of InnoDB double wr

Re: [PERFORM] BBU Cache vs. spindles

2010-10-26 Thread Scott Carey
On Oct 22, 2010, at 1:06 PM, Rob Wultsch wrote: > On Fri, Oct 22, 2010 at 12:05 PM, Kevin Grittner > wrote: >> Rob Wultsch wrote: >> >>> I would think full_page_writes=off + double write buffer should be >>> far superior, particularly given that the WAL is shipped over the >>> network to slave

Re: [PERFORM] BBU Cache vs. spindles

2010-10-24 Thread Tom Lane
Greg Smith writes: > James Mansion wrote: >> When I looked at the internals of TokyoCabinet for example, the design >> was flawed but >> would be 'fairly robust' so long as mmap'd pages that were dirtied did >> not get persisted >> until msync, and were then persisted atomically. > If TokyoCabi

Re: [PERFORM] BBU Cache vs. spindles

2010-10-24 Thread Greg Smith
Jesper Krogh wrote: Can you point to some ZFS docs that say this is the case? I'd be surprised if it doesn't copy away the old block and replace it with the new one in-place. The other behaviour would quite quickly lead to a hugely fragmented filesystem that performs next to useless an

Re: [PERFORM] BBU Cache vs. spindles

2010-10-24 Thread Greg Smith
James Mansion wrote: When I looked at the internals of TokyoCabinet for example, the design was flawed but would be 'fairly robust' so long as mmap'd pages that were dirtied did not get persisted until msync, and were then persisted atomically. If TokyoCabinet presumes that's true and overwri

Re: [PERFORM] BBU Cache vs. spindles

2010-10-24 Thread James Mansion
Kevin Grittner wrote: On what do you base that assumption? I assume that we send a full 8K to the OS cache, and the file system writes disk sectors according to its own algorithm. With either platters or BBU cache, the data is persisted on fsync; why do you see a risk with one but not the other

Re: [PERFORM] BBU Cache vs. spindles

2010-10-23 Thread Josh Berkus
Even if it's possible, it's far from clear to me that it would be an improvement. The author estimates (apparently somewhat loosely) that it's a 5% to 10% performance hit in InnoDB; I'm far from certain that full_page_writes cost us that much. Does anyone have benchmark numbers handy? It mos

Re: [PERFORM] BBU Cache vs. spindles

2010-10-23 Thread Kevin Grittner
Rob Wultsch wrote: > I really would like to work with PG more and this seems like > [full_page_writes] would be a significant hindrance for certain > usage patterns. Lots of replication does not take place over gig... Certainly most of the Wisconsin State Courts replication takes place over WA

Re: [PERFORM] BBU Cache vs. spindles

2010-10-22 Thread Rob Wultsch
On Fri, Oct 22, 2010 at 1:15 PM, Kevin Grittner wrote: > Rob Wultsch wrote: > >> not needing full_page_writes would make geographically dispersed >> replication possible for certain cases where it is not currently >> (or at least rather painful). > > Do you have any hard numbers on WAL file size

Re: [PERFORM] BBU Cache vs. spindles

2010-10-22 Thread Kevin Grittner
Rob Wultsch wrote: > not needing full_page_writes would make geographically dispersed > replication possible for certain cases where it is not currently > (or at least rather painful). Do you have any hard numbers on WAL file size impact? How much does pglesslog help in a file-based WAL trans

Re: [PERFORM] BBU Cache vs. spindles

2010-10-22 Thread Rob Wultsch
On Fri, Oct 22, 2010 at 12:05 PM, Kevin Grittner wrote: > Rob Wultsch wrote: > >> I would think full_page_writes=off + double write buffer should be >> far superior, particularly given that the WAL is shipped over the >> network to slaves. > > For a reasonably brief description of InnoDB double w

Re: [PERFORM] BBU Cache vs. spindles

2010-10-22 Thread Kevin Grittner
Rob Wultsch wrote: > I would think full_page_writes=off + double write buffer should be > far superior, particularly given that the WAL is shipped over the > network to slaves. For a reasonably brief description of InnoDB double write buffers, I found this: http://www.mysqlperformanceblog.co

Re: [PERFORM] BBU Cache vs. spindles

2010-10-22 Thread Rob Wultsch
On Fri, Oct 22, 2010 at 10:28 AM, Kevin Grittner wrote: > Rob Wultsch wrote: > >> has PG considered using a double write buffer similar to InnoDB? > > That seems inferior to the full_page_writes strategy, where you only > write a page twice the first time it is written after a checkpoint. > We're

Re: [PERFORM] BBU Cache vs. spindles

2010-10-22 Thread Kevin Grittner
Rob Wultsch wrote: > has PG considered using a double write buffer similar to InnoDB? That seems inferior to the full_page_writes strategy, where you only write a page twice the first time it is written after a checkpoint. We're talking about when we might be able to write *less*, not more.
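The full_page_writes strategy Kevin describes (a complete page image goes to the WAL only for the first change to a page after a checkpoint, and later changes log just a delta) can be modelled with a toy sketch. This is a deliberately simplified illustration, not PostgreSQL's implementation; the class `WalSketch`, its methods, and the record tuples are all hypothetical names invented for the example.

```python
class WalSketch:
    """Toy model of the full_page_writes decision (hypothetical, simplified)."""

    def __init__(self):
        self.records = []                    # (kind, page_no, payload) tuples
        self.logged_since_checkpoint = set() # pages with a full image already logged

    def checkpoint(self):
        # After a checkpoint, every page needs a fresh full image on its next change.
        self.logged_since_checkpoint.clear()

    def log_change(self, page_no, delta, full_page):
        if page_no not in self.logged_since_checkpoint:
            # First change since the checkpoint: log the whole page, so that
            # recovery can overwrite a torn on-disk copy outright.
            self.records.append(("full_page", page_no, full_page))
            self.logged_since_checkpoint.add(page_no)
        else:
            # Later changes: the small delta is enough.
            self.records.append(("delta", page_no, delta))


wal = WalSketch()
for _ in range(3):
    wal.log_change(7, b"delta", b"F" * 8192)
wal.checkpoint()
wal.log_change(7, b"delta", b"F" * 8192)
kinds = [kind for kind, _, _ in wal.records]
```

In this model the page is written in full once per checkpoint cycle, whereas a double write buffer of the kind Rob mentions would write every flushed page twice; that is the trade-off the two posters are weighing.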

Re: [PERFORM] BBU Cache vs. spindles

2010-10-22 Thread Rob Wultsch
On Fri, Oct 22, 2010 at 8:37 AM, Greg Smith wrote: > Tom Lane wrote: >> >> You've got entirely too simplistic a view of what the "delta" might be, >> I fear.  In particular there are various sorts of changes that involve >> inserting the data carried in the WAL record and shifting pre-existing >>

Re: [PERFORM] BBU Cache vs. spindles

2010-10-22 Thread Jesper Krogh
On 2010-10-22 17:37, Greg Smith wrote: I think that most people who have thought they were safe to turn off full_page_writes in the past did so because they believed they were in category (1) here. I've never advised anyone to do that, because it's so difficult to validate the truth of. Jus

Re: [PERFORM] BBU Cache vs. spindles

2010-10-22 Thread Greg Smith
Tom Lane wrote: You've got entirely too simplistic a view of what the "delta" might be, I fear. In particular there are various sorts of changes that involve inserting the data carried in the WAL record and shifting pre-existing data around to make room, or removing an item and moving remaining

Re: [PERFORM] BBU Cache vs. spindles

2010-10-22 Thread Kevin Grittner
Greg Smith wrote: > I think Kevin's point here may be that if your fsync isn't > reliable, you're always in trouble. But if your fsync is good, > even torn pages should be repairable by the deltas written to the > WAL I was actually just arguing that a BBU doesn't eliminate a risk here; if th

Re: [PERFORM] BBU Cache vs. spindles

2010-10-21 Thread Greg Smith
Kevin Grittner wrote: With either platters or BBU cache, the data is persisted on fsync; why do you see a risk with one but not the other Forgot to address this part. The troublesome sequence if you don't have a BBU is: 1) WAL data is written to the OS cache 2) PG calls fsync 3) Data is tra

Re: [PERFORM] BBU Cache vs. spindles

2010-10-21 Thread Tom Lane
Greg Smith writes: > At this point, you now have a torn 8K page, with 1/2 old and 1/2 new > data. Right. > Without a full page write in the WAL, is it always possible to > restore its original state now? In theory, I think you do. Since the > delta in the WAL should be overwriting all of th
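The torn page Greg and Tom are discussing can be made concrete with a small simulation. It assumes the 8K page and 512-byte sectors mentioned elsewhere in the thread and a sequential, sector-by-sector write that is interrupted halfway; real file systems and controllers may order sector writes differently. The function name `torn_write` is hypothetical.

```python
PAGE_SIZE = 8192    # 8K page, as discussed in the thread
SECTOR_SIZE = 512   # sector size assumed for this sketch


def torn_write(old_page: bytes, new_page: bytes, sectors_written: int) -> bytes:
    """Return what is on disk if power fails after `sectors_written` sectors."""
    assert len(old_page) == len(new_page) == PAGE_SIZE
    cut = sectors_written * SECTOR_SIZE
    # Sectors before the cut hold new data; the rest still hold old data.
    return new_page[:cut] + old_page[cut:]


old = b"O" * PAGE_SIZE
new = b"N" * PAGE_SIZE
sectors_per_page = PAGE_SIZE // SECTOR_SIZE   # number of sectors one page spans
torn = torn_write(old, new, sectors_per_page // 2)
```

The resulting `torn` page matches neither `old` nor `new`, which is why a WAL delta that assumes a consistent starting page may not be enough to repair it.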

Re: [PERFORM] BBU Cache vs. spindles

2010-10-21 Thread Greg Smith
Kevin Grittner wrote: I assume that we send a full 8K to the OS cache, and the file system writes disk sectors according to its own algorithm. With either platters or BBU cache, the data is persisted on fsync; why do you see a risk with one but not the other I'd like a 10 minute argument pleas

Re: [PERFORM] BBU Cache vs. spindles

2010-10-21 Thread Andres Freund
On Thursday 21 October 2010 21:42:06 Kevin Grittner wrote: > Bruce Momjian wrote: > > I assume we send a full 8k to the controller, and a failure during > > that write is not registered as a write. > > On what do you base that assumption? I assume that we send a full > 8K to the OS cache, and th

Re: [PERFORM] BBU Cache vs. spindles

2010-10-21 Thread Bruce Momjian
Kevin Grittner wrote: > Bruce Momjian wrote: > > > I assume we send a full 8k to the controller, and a failure during > > that write is not registered as a write. > > On what do you base that assumption? I assume that we send a full > 8K to the OS cache, and the file system writes disk sector

Re: [PERFORM] BBU Cache vs. spindles

2010-10-21 Thread Kevin Grittner
Bruce Momjian wrote: > I assume we send a full 8k to the controller, and a failure during > that write is not registered as a write. On what do you base that assumption? I assume that we send a full 8K to the OS cache, and the file system writes disk sectors according to its own algorithm. W

Re: [PERFORM] BBU Cache vs. spindles

2010-10-21 Thread Bruce Momjian
Kevin Grittner wrote: > Bruce Momjian wrote: > > > If the write fails to the controller, the page is not flushed and > > PG does not continue. If the write fails, the fsync never > > happens, and hence PG stops. > > PG stops? This case at issue is when the OS crashes or the plug is > pulled

Re: [PERFORM] BBU Cache vs. spindles

2010-10-21 Thread Kevin Grittner
Bruce Momjian wrote: > If the write fails to the controller, the page is not flushed and > PG does not continue. If the write fails, the fsync never > happens, and hence PG stops. PG stops? This case at issue is when the OS crashes or the plug is pulled in the middle of writing a page. I do

Re: [PERFORM] BBU Cache vs. spindles

2010-10-21 Thread Bruce Momjian
Kevin Grittner wrote: > Greg Smith wrote: > > Kevin Grittner wrote: > > >> So you're confident that an 8kB write to the controller will not > >> be done as a series of smaller atomic writes by the OS file > >> system? > > > > Sure, that happens. But if the BBU has gotten an fsync call after >

Re: [PERFORM] BBU Cache vs. spindles

2010-10-21 Thread Kevin Grittner
Greg Smith wrote: > Kevin Grittner wrote: >> So you're confident that an 8kB write to the controller will not >> be done as a series of smaller atomic writes by the OS file >> system? > > Sure, that happens. But if the BBU has gotten an fsync call after > the 8K write, it shouldn't return suc

Re: [PERFORM] BBU Cache vs. spindles

2010-10-21 Thread Greg Smith
Kevin Grittner wrote: Bruce Momjian wrote: full_page_writes is designed to guard against a partial write to a device. I don't think the raid cache can be partially written to So you're confident that an 8kB write to the controller will not be done as a series of smaller atomic wri

Re: [PERFORM] BBU Cache vs. spindles

2010-10-21 Thread Kevin Grittner
Bruce Momjian wrote: > full_page_writes is designed to guard against a partial write to a > device. I don't think the raid cache can be partially written to So you're confident that an 8kB write to the controller will not be done as a series of smaller atomic writes by the OS file system? -

Re: [PERFORM] BBU Cache vs. spindles

2010-10-21 Thread Greg Smith
Bruce Momjian wrote: With a BBU you can turn off full_page_writes, which should decrease the WAL traffic. However, I don't see this mentioned in our documentation. Should I add it? What I would like to do is beef up the documentation with some concrete examples of how to figure out if you

Re: [PERFORM] BBU Cache vs. spindles

2010-10-21 Thread Bruce Momjian
Kevin Grittner wrote: > Bruce Momjian wrote: > > > With a BBU you can turn off full_page_writes > > My understanding is that that is not without risk. What happens if > the WAL is written, there is a commit, but the data page has not yet > been written to the controller? Don't we still have

Re: [PERFORM] BBU Cache vs. spindles

2010-10-21 Thread Kevin Grittner
Bruce Momjian wrote: > With a BBU you can turn off full_page_writes My understanding is that that is not without risk. What happens if the WAL is written, there is a commit, but the data page has not yet been written to the controller? Don't we still have a torn page? -Kevin

Re: [PERFORM] BBU Cache vs. spindles

2010-10-21 Thread Bruce Momjian
Scott Marlowe wrote: > On Wed, Oct 20, 2010 at 8:25 PM, Joshua D. Drake > wrote: > > On Wed, 2010-10-20 at 22:13 -0400, Bruce Momjian wrote: > >> Ben Chobot wrote: > >> > On Oct 7, 2010, at 4:38 PM, Steve Crawford wrote: > >> > > >> > > I'm weighing options for a new server. In addition to Postgr

Re: [PERFORM] BBU Cache vs. spindles

2010-10-21 Thread Steve Crawford
On 10/20/2010 09:45 PM, Scott Marlowe wrote: On Wed, Oct 20, 2010 at 8:25 PM, Joshua D. Drake wrote: On Wed, 2010-10-20 at 22:13 -0400, Bruce Momjian wrote: Ben Chobot wrote: On Oct 7, 2010, at 4:38 PM, Steve Crawford wrote: I'm weighing options for a new server.

Re: [PERFORM] BBU Cache vs. spindles

2010-10-20 Thread Scott Marlowe
On Wed, Oct 20, 2010 at 8:25 PM, Joshua D. Drake wrote: > On Wed, 2010-10-20 at 22:13 -0400, Bruce Momjian wrote: >> Ben Chobot wrote: >> > On Oct 7, 2010, at 4:38 PM, Steve Crawford wrote: >> > >> > > I'm weighing options for a new server. In addition to PostgreSQL, this >> > > machine will hand

Re: [PERFORM] BBU Cache vs. spindles

2010-10-20 Thread Joshua D. Drake
On Wed, 2010-10-20 at 22:13 -0400, Bruce Momjian wrote: > Ben Chobot wrote: > > On Oct 7, 2010, at 4:38 PM, Steve Crawford wrote: > > > > > I'm weighing options for a new server. In addition to PostgreSQL, this > > > machine will handle some modest Samba and Rsync load. > > > > > > I will have en

Re: [PERFORM] BBU Cache vs. spindles

2010-10-20 Thread Bruce Momjian
Ben Chobot wrote: > On Oct 7, 2010, at 4:38 PM, Steve Crawford wrote: > > > I'm weighing options for a new server. In addition to PostgreSQL, this > > machine will handle some modest Samba and Rsync load. > > > > I will have enough RAM so that virtually all disk-read activity will be > > cached.

Re: [PERFORM] BBU Cache vs. spindles

2010-10-08 Thread Ben Chobot
On Oct 7, 2010, at 4:38 PM, Steve Crawford wrote: > I'm weighing options for a new server. In addition to PostgreSQL, this > machine will handle some modest Samba and Rsync load. > > I will have enough RAM so that virtually all disk-read activity will be > cached. The average PostgreSQL read act