Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-27 Thread Josh Berkus
Simon, > I guess I'd be concerned that the poor bgwriter can't do all of this > work. I was thinking about a separate log writer, so we could have both > bgwriter and logwriter active simultaneously on I/O. It has taken a > while to get bgwriter to perform its duties efficiently, so I'd rather > n

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-27 Thread Simon Riggs
On Tue, 2005-07-26 at 19:15 -0400, Tom Lane wrote: > Josh Berkus writes: > >> We should run tests with much higher wal_buffers numbers to nullify the > >> effect described above and reduce contention. That way we will move > >> towards the log disk speed being the limiting factor, patch or no patc

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-26 Thread Josh Berkus
Tom, > I have no idea whether the DBT benchmarks would benefit at all, but > given that they are affected positively by increasing wal_buffers, > they must have a fair percentage of not-small transactions. Even if they don't, we'll have series tests for DW here at GreenPlum soon, and I'll bet th

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-26 Thread Tom Lane
Josh Berkus writes: >> We should run tests with much higher wal_buffers numbers to nullify the >> effect described above and reduce contention. That way we will move >> towards the log disk speed being the limiting factor, patch or no patch. > I've run such tests, at a glance they do seem to impr

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-26 Thread Josh Berkus
Simon, > We should run tests with much higher wal_buffers numbers to nullify the > effect described above and reduce contention. That way we will move > towards the log disk speed being the limiting factor, patch or no patch. I've run such tests, at a glance they do seem to improve performance.

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-26 Thread Simon Riggs
On Fri, 2005-07-22 at 19:11 -0400, Tom Lane wrote: > Hmm. Eyeballing the NOTPM trace for cases 302912 and 302909, it sure > looks like the post-checkpoint performance recovery is *slower* in > the latter. And why is 302902 visibly slower overall than 302905? > I thought for a bit that you had got

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-25 Thread Mark Wong
On Fri, 22 Jul 2005 19:11:36 -0400 Tom Lane <[EMAIL PROTECTED]> wrote: > BTW, I'd like to look at 302906, but its [Details] link is broken. Ugh, I tried digging onto the internal systems and it looks like they were destroyed (or not saved) somehow. It'll have to be rerun. Sorry... Mark --

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-22 Thread Josh Berkus
Tom, > There's something awfully weird going on here. I was prepared to see > no statistically-significant differences, but not multiple cases that > seem to be going the "wrong direction". There's a lot of variance in the tests. I'm currently running a variance test battery on one machine to

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-22 Thread Tom Lane
Josh Berkus writes: >> Um, where are the test runs underlying this spreadsheet? I don't have a >> whole lot of confidence in looking at full-run average TPM numbers to >> discern whether transient dropoffs in TPM are significant or not. > Web in the form of: > http://khack.osdl.org/stp/#test_nu

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-22 Thread Tom Lane
Greg Stark <[EMAIL PROTECTED]> writes: > For any benchmarking to be meaningful you have to set the checkpoint interval > to something more realistic. Something like 5 minutes. That way when the final > checkpoint cycle isn't completely included in the timing data you'll at least > be missing a stat

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-22 Thread Josh Berkus
Tom, > Um, where are the test runs underlying this spreadsheet? I don't have a > whole lot of confidence in looking at full-run average TPM numbers to > discern whether transient dropoffs in TPM are significant or not. Web in the form of: http://khack.osdl.org/stp/#test_number#/ Where #test_nu

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-22 Thread Josh Berkus
Greg, > For any benchmarking to be meaningful you have to set the checkpoint > interval to something more realistic. Something like 5 minutes. That way > when the final checkpoint cycle isn't completely included in the timing > data you'll at least be missing a statistically insignificant portion

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-22 Thread Tom Lane
Josh Berkus writes: > Bruce, >> Did you test with full_page_writes on and off? > I didn't use your full_page_writes version because Tom said it was > problematic. This is CVS from July 3rd. We already know the results: should be equivalent to the hack Josh tried first. So what we know at thi

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-22 Thread Tom Lane
Josh Berkus writes: > Looks like the CRC calculation work isn't the issue. I did test runs of > no-CRC vs. regular DBT2 with different checkpoint timeouts, and didn't > discern any statistical difference. See attached spreadsheet chart (the > two different runs are on two different machines).

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-22 Thread Greg Stark
Josh Berkus writes: > I think this test run http://khack.osdl.org/stp/302903/results/0/, with a > 30-min checkpoint shows pretty clearly that the behavior of the > performance drop is consistent with needing to "re-prime" the WAL will > full page images. Each checkpoint drops performance a

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-22 Thread Bruce Momjian
Josh Berkus wrote: > Bruce, > > > I think we need those tests run. > > Sure. What CVS day should I grab? What's the option syntax? ( -c > full_page_writes=false)? Yes. You can grab any from the day Tom fixed it, which was I think two weeks ago. > I have about 20 tests in queue right no

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-22 Thread Josh Berkus
Bruce, > I think we need those tests run. Sure. What CVS day should I grab? What's the option syntax? ( -c full_page_writes=false)? I have about 20 tests in queue right now but can stack yours up behind them. -- --Josh Josh Berkus Aglio Database Solutions San Francisco -

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-22 Thread Bruce Momjian
Josh Berkus wrote: > Bruce, > > > Did you test with full_page_writes on and off? > > I didn't use your full_page_writes version because Tom said it was > problematic. This is CVS from July 3rd. I think we need those tests run. -- Bruce Momjian| http://candle.pha.p

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-22 Thread Josh Berkus
Bruce, > Did you test with full_page_writes on and off? I didn't use your full_page_writes version because Tom said it was problematic. This is CVS from July 3rd. -- --Josh Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)-

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-22 Thread Bruce Momjian
Did you test with full_page_writes on and off? --- Josh Berkus wrote: > Tom, > > > This will remove just the CRC calculation work associated with backed-up > > pages. ?Note that any attempt to recover from the WAL will fail

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-22 Thread Josh Berkus
Tom, > This will remove just the CRC calculation work associated with backed-up > pages.  Note that any attempt to recover from the WAL will fail, but I > assume you don't need that for the purposes of the test run. Looks like the CRC calculation work isn't the issue. I did test runs of no-CRC

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-21 Thread Josh Berkus
Tom, > Josh, I see that all of those runs seem to be using wal_buffers = 8. > Have you tried materially increasing wal_buffers (say to 100 or 1000) > and/or experimenting with different wal_sync_method options since we > fixed the bufmgrlock problem? I am wondering if the real issue is > WAL buff

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-19 Thread Tom Lane
Josh Berkus writes: > So, now that we know what the performance bottleneck is, how do we fix it? Josh, I see that all of those runs seem to be using wal_buffers = 8. Have you tried materially increasing wal_buffers (say to 100 or 1000) and/or experimenting with different wal_sync_method options s

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-16 Thread Kevin Brown
Bruce Momjian wrote: > > I don't think our problem is partial writes of WAL, which we already > check, but heap/index page writes, which we currently do not check for > partial writes. Hmm...I've read through the thread again and thought about the problem further, and now think I understand what

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-16 Thread Bruce Momjian
I don't think our problem is partial writes of WAL, which we already check, but heap/index page writes, which we currently do not check for partial writes. --- Kevin Brown wrote: > Tom Lane wrote: > > Simon Riggs <[EMAIL PRO

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-15 Thread Kevin Brown
Tom Lane wrote: > Simon Riggs <[EMAIL PROTECTED]> writes: > > I don't think we should care too much about indexes. We can rebuild > > them...but losing heap sectors means *data loss*. > > If you're so concerned about *data loss* then none of this will be > acceptable to you at all. We are talking

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-11 Thread Josh Berkus
Simon, Tom, > > Will do. Results in a few days. Actually, between the bad patch on the 5th and ongoing STP issues, I don't think I will have results before I leave town.Will e-mail you offlist to give you info to retrieve results. > Any chance you'd be able to do this with > > ext3 and a

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-11 Thread Simon Riggs
On Fri, 2005-07-08 at 09:34 +0200, Zeugswetter Andreas DAZ SD wrote: > >>> The point here is that fsync-off is only realistic for development > or > >>> playpen installations. You don't turn it off in a production > >>> machine, and I can't see that you'd turn off the full-page-write > >>> opti

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-11 Thread Simon Riggs
On Fri, 2005-07-08 at 14:45 -0400, Tom Lane wrote: > Simon Riggs <[EMAIL PROTECTED]> writes: > > I don't think we should care too much about indexes. We can rebuild > > them...but losing heap sectors means *data loss*. > > If you're so concerned about *data loss* then none of this will be > accept

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-09 Thread Hannu Krosing
On R, 2005-07-08 at 14:45 -0400, Tom Lane wrote: > Simon Riggs <[EMAIL PROTECTED]> writes: > > I don't think we should care too much about indexes. We can rebuild > > them...but losing heap sectors means *data loss*. There might be some merit in idea to disabling WAL/PITR for indexes, where one ca

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-08 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes: > I don't think we should care too much about indexes. We can rebuild > them...but losing heap sectors means *data loss*. If you're so concerned about *data loss* then none of this will be acceptable to you at all. We are talking about going from a system t

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-08 Thread Simon Riggs
On Fri, 2005-07-08 at 09:47 -0400, Tom Lane wrote: > Simon Riggs <[EMAIL PROTECTED]> writes: > > Having raised that objection, ISTM that checking for torn pages can be > > accomplished reasonably well using a few rules... > > I have zero confidence in this; the fact that you can think of > (incomp

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-08 Thread Josh Berkus
Tom, > Great. BTW, don't bother testing snapshots between 2005/07/05 2300 EDT > and just now --- Bruce's full_page_writes patch introduced a large > random negative component into the timing ... Ach. Starting over, then. --Josh -- Josh Berkus Aglio Database Solutions San Francisco -

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-08 Thread Heikki Linnakangas
On Thu, 7 Jul 2005, Tom Lane wrote: We still don't know enough about the situation to know what a solution might look like. Is the slowdown Josh is seeing due to the extra CPU cost of the CRCs, or the extra I/O cost, or excessive locking of the WAL-related data structures while we do this stuff

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-08 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes: > Is there also a potential showstopper in the redo machinery? We work on > the assumption that the post-checkpoint block is available in WAL as a > before image. Redo for all actions merely replay the write action again > onto the block. If we must reapply t

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-08 Thread Dawid Kuroczko
On 7/7/05, Bruce Momjian wrote: > One idea would be to just tie its behavior directly to fsync and remove > the option completely (that was the original TODO), or we can adjust it > so it doesn't have the same risks as fsync, or the same lack of failure > reporting as fsync. I wonder about one th

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-08 Thread Simon Riggs
On Thu, 2005-07-07 at 11:59 -0400, Bruce Momjian wrote: > Tom Lane wrote: > > Bruce Momjian writes: > > > Tom Lane wrote: > > >> The point here is that fsync-off is only realistic for development > > >> or playpen installations. You don't turn it off in a production > > >> machine, and I can't se

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-08 Thread Zeugswetter Andreas DAZ SD
>>> The point here is that fsync-off is only realistic for development or >>> playpen installations. You don't turn it off in a production >>> machine, and I can't see that you'd turn off the full-page-write >>> option either. So we have not solved anyone's performance problem. > >> Yes, thi

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Tom Lane
Josh Berkus writes: >> If so, please undo the previous patch (which disabled page dumping >> entirely) and instead try removing this block of code, starting >> at about xlog.c line 620 in CVS tip: > Will do. Results in a few days. Great. BTW, don't bother testing snapshots between 2005/07/05 2

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Kenneth Marshall
On Thu, Jul 07, 2005 at 11:36:40AM -0400, Tom Lane wrote: > Greg Stark <[EMAIL PROTECTED]> writes: > > Tom Lane <[EMAIL PROTECTED]> writes: > >> What we *could* do is calculate a page-level CRC and > >> store it in the page header just before writing out. Torn pages > >> would then manifest as a w

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Josh Berkus
Tom, > Josh, is OSDL up enough that you can try another comparison run? Thankfully, yes. > If so, please undo the previous patch (which disabled page dumping > entirely) and instead try removing this block of code, starting > at about xlog.c line 620 in CVS tip: Will do. Results in a few days.

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Bruce Momjian
Simon Riggs wrote: > On Wed, 2005-07-06 at 18:22 -0400, Bruce Momjian wrote: > > Well, I added #1 yesterday as 'full_page_writes', and it has the same > > warnings as fsync (namely, on crash, be prepared to recovery or check > > your system thoroughly. > > Yes, which is why I comment now that the

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Bruce Momjian
Joshua D. Drake wrote: > > >>Just to make my position perfectly clear: I don't want to see this > >>option shipped in 8.1. It's reasonable to have it in there for now > >>as an aid to our performance investigations, but I don't see that it > >>has any value for production. > > > > > > Well, thi

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Joshua D. Drake
Just to make my position perfectly clear: I don't want to see this option shipped in 8.1. It's reasonable to have it in there for now as an aid to our performance investigations, but I don't see that it has any value for production. Well, this is the first I am hearing that, and of course yo

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Bruce Momjian
Tom Lane wrote: > Bruce Momjian writes: > > Tom Lane wrote: > >> The point here is that fsync-off is only realistic for development > >> or playpen installations. You don't turn it off in a production > >> machine, and I can't see that you'd turn off the full-page-write > >> option either. So we

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Tom Lane
Bruce Momjian writes: > Tom Lane wrote: >> The point here is that fsync-off is only realistic for development >> or playpen installations. You don't turn it off in a production >> machine, and I can't see that you'd turn off the full-page-write >> option either. So we have not solved anyone's pe

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Bruce Momjian
Tom Lane wrote: > Bruce Momjian writes: > > As far as #2, my posted proposal was to write the full pages to WAL when > > they are written to the file system, and not when they are first > > modified in the shared buffers --- > > That is *completely* unworkable. Or were you planning to abandon th

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Tom Lane
Greg Stark <[EMAIL PROTECTED]> writes: > Tom Lane <[EMAIL PROTECTED]> writes: >> What we *could* do is calculate a page-level CRC and >> store it in the page header just before writing out. Torn pages >> would then manifest as a wrong CRC on read. No correction ability, >> but at least a reliable

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Greg Stark
Tom Lane <[EMAIL PROTECTED]> writes: > "Zeugswetter Andreas DAZ SD" <[EMAIL PROTECTED]> writes: > > Only workable solution would imho be to write the LSN to each 512 > > byte block (not that I am propagating that idea). > > We're not doing anything like that, as it would create an impossible > s

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Bruce Momjian
Zeugswetter Andreas DAZ SD wrote: > > >> Are you sure about that? That would probably be the normal case, but > >> are you promised that the hardware will write all of the sectors of a > > >> block in order? > > > > I don't think you can possibly assume that. If the block > > crosses a cylind

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Tom Lane
Bruce Momjian writes: > Yes, that is a good idea! ... which was shot down in the very next message. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choo

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Bruce Momjian
Simon Riggs wrote: > > SCSI tagged queueing certainly allows 512-byte blocks to be reordered > > during writes. > > Then a torn-page tell-tale is required that will tell us of any change > to any of the 512-byte sectors that make up a block/page. > > Here's an idea: > > We read the page that we

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Zeugswetter Andreas DAZ SD
>> Only workable solution would imho be to write the LSN to each 512 byte >> block (not that I am propagating that idea). "Only workable" was a stupid formulation, I meant a solution that works with a LSN. > We're not doing anything like that, as it would create an > impossible space-managemen

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Tom Lane
I wrote: > We still don't know enough about the situation to know what a solution > might look like. Is the slowdown Josh is seeing due to the extra CPU > cost of the CRCs, or the extra I/O cost, or excessive locking of the > WAL-related data structures while we do this stuff, or ???. Need more >

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Tom Lane
"Zeugswetter Andreas DAZ SD" <[EMAIL PROTECTED]> writes: > Only workable solution would imho be to write the LSN to each 512 > byte block (not that I am propagating that idea). We're not doing anything like that, as it would create an impossible space-management problem (or are you happy with lim

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Zeugswetter Andreas DAZ SD
> Here's an idea: > > We read the page that we would have backed up, calc the CRC and > write a short WAL record with just the CRC, not the block. When > we recover we re-read the database page, calc its CRC and > compare it with the CRC from the transaction log. If they > differ, we know tha

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Zeugswetter Andreas DAZ SD
>> Are you sure about that? That would probably be the normal case, but >> are you promised that the hardware will write all of the sectors of a >> block in order? > > I don't think you can possibly assume that. If the block > crosses a cylinder boundary then it's certainly an unsafe > assum

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Simon Riggs
On Thu, 2005-07-07 at 00:29 -0400, Bruce Momjian wrote: > Tom Lane wrote: > > Bruno Wolff III <[EMAIL PROTECTED]> writes: > > > Are you sure about that? That would probably be the normal case, but are > > > you promised that the hardware will write all of the sectors of a block > > > in order? > >

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Simon Riggs
On Wed, 2005-07-06 at 17:17 -0700, Joshua D. Drake wrote: > >>Tom, I think you're the only person that could or would be trusted to > >>make such a change. Even past the 8.1 freeze, I say we need to do > >>something now on this issue. > > > > > > I think if we document full_page_writes as similar

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-06 Thread Bruce Momjian
Tom Lane wrote: > Bruno Wolff III <[EMAIL PROTECTED]> writes: > > Are you sure about that? That would probably be the normal case, but are > > you promised that the hardware will write all of the sectors of a block > > in order? > > I don't think you can possibly assume that. If the block crosses

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-06 Thread Tom Lane
Bruno Wolff III <[EMAIL PROTECTED]> writes: > Are you sure about that? That would probably be the normal case, but are > you promised that the hardware will write all of the sectors of a block > in order? I don't think you can possibly assume that. If the block crosses a cylinder boundary then it

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-06 Thread Bruno Wolff III
On Wed, Jul 06, 2005 at 21:48:44 +0100, Simon Riggs <[EMAIL PROTECTED]> wrote: > > We could implement the torn-pages option, but that seems a lot of work. > Another way of implementing a tell-tale would be to append the LSN again > as a data page trailer as the last 4 bytes of the page. Thus the

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-06 Thread Tom Lane
Bruce Momjian writes: > As far as #2, my posted proposal was to write the full pages to WAL when > they are written to the file system, and not when they are first > modified in the shared buffers --- That is *completely* unworkable. Or were you planning to abandon the promise that a transaction

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-06 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes: > On Wed, 2005-07-06 at 18:22 -0400, Bruce Momjian wrote: >> Well, I added #1 yesterday as 'full_page_writes', and it has the same >> warnings as fsync (namely, on crash, be prepared to recovery or check >> your system thoroughly. > Yes, which is why I comme

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-06 Thread Oliver Jowett
Simon Riggs wrote: > I agree we *must* have the GUC, but we also *must* have a way for crash > recovery to tell us for certain that it has definitely worked, not just > maybe worked. Doesn't the same argument apply to the existing fsync = off case? i.e. we already have a case where we don't provi

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-06 Thread Joshua D. Drake
Tom, I think you're the only person that could or would be trusted to make such a change. Even past the 8.1 freeze, I say we need to do something now on this issue. I think if we document full_page_writes as similar to fsync in risk, we are OK for 8.1, but if something can be done easily, it

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-06 Thread Simon Riggs
On Wed, 2005-07-06 at 18:22 -0400, Bruce Momjian wrote: > Well, I added #1 yesterday as 'full_page_writes', and it has the same > warnings as fsync (namely, on crash, be prepared to recovery or check > your system thoroughly. Yes, which is why I comment now that the GUC alone is not enough. There

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-06 Thread Bruce Momjian
Simon Riggs wrote: > On Wed, 2005-06-29 at 23:23 -0400, Tom Lane wrote: > > Josh Berkus writes: > > >> Uh, what exactly did you cut out? I suggested dropping the dumping of > > >> full page images, but not removing CRCs altogether ... > > > > > Attached is the patch I used. > > > > OK, thanks f

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-06 Thread Simon Riggs
On Wed, 2005-06-29 at 23:23 -0400, Tom Lane wrote: > Josh Berkus writes: > >> Uh, what exactly did you cut out? I suggested dropping the dumping of > >> full page images, but not removing CRCs altogether ... > > > Attached is the patch I used. > > OK, thanks for the clarification. So it does s

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-03 Thread Tom Lane
Greg Stark <[EMAIL PROTECTED]> writes: > Tom Lane <[EMAIL PROTECTED]> writes: >> Partial writes. Without the full-page image, we do not have enough >> information in WAL to reconstruct the correct page contents. > Sure, but why not? > If a 8k page contains 16 low level segments on disk and the o

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-03 Thread Greg Stark
Tom Lane <[EMAIL PROTECTED]> writes: > Greg Stark <[EMAIL PROTECTED]> writes: > > Can someone explain exactly what the problem being defeated by writing whole > > pages to the WAL log? > > Partial writes. Without the full-page image, we do not have enough > information in WAL to reconstruct the

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-03 Thread Tom Lane
Greg Stark <[EMAIL PROTECTED]> writes: > Can someone explain exactly what the problem being defeated by writing whole > pages to the WAL log? Partial writes. Without the full-page image, we do not have enough information in WAL to reconstruct the correct page contents. >> A further optimization

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-03 Thread Russell Smith
On Sun, 3 Jul 2005 04:47 pm, Greg Stark wrote: > > Bruce Momjian writes: > > > I have an idea! Currently we write the backup pages (copies of pages > > modified since the last checkpoint) when we write the WAL changes as > > part of the commit. See the XLogCheckBuffer() call in XLogInsert(). >

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-02 Thread Greg Stark
Bruce Momjian writes: > I have an idea! Currently we write the backup pages (copies of pages > modified since the last checkpoint) when we write the WAL changes as > part of the commit. See the XLogCheckBuffer() call in XLogInsert(). Can someone explain exactly what the problem being defeated

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-02 Thread Bruce Momjian
Tom Lane wrote: > Josh Berkus writes: > >> Uh, what exactly did you cut out? I suggested dropping the dumping of > >> full page images, but not removing CRCs altogether ... > > > Attached is the patch I used. > > OK, thanks for the clarification. So it does seem that dumping full > page images

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-06-30 Thread Qingqing Zhou
""Magnus Hagander"" <[EMAIL PROTECTED]> writes > > FWIW, MSSQL deals with this using "Torn Page Detection". This is off by > default (no check at all!), but can be abled on a per-database level. > Note that it only *detects* torn pages. If it finds one, it won't start > and tell you to recover fro

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-06-30 Thread Josh Berkus
Tom, > > What I'm confused about is that this shouldn't be anything new for > > 8.1. Yet 8.1 has *worse* performance on the STP machines than 8.0 > > does, and it's pretty much entirely due to this check. > > That's simply not believable --- better recheck your analysis. If 8.1 > is worse it's n

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-06-30 Thread Tom Lane
Josh Berkus writes: > What I'm confused about is that this shouldn't be anything new for 8.1. Yet > 8.1 has *worse* performance on the STP machines than 8.0 does, and it's > pretty much entirely due to this check. That's simply not believable --- better recheck your analysis. If 8.1 is worse

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-06-30 Thread Josh Berkus
Tom, > Database pages. The current theory is that we can completely > reconstruct from WAL data every page that's been modified since the > last checkpoint. So the first write of any page after a checkpoint > dumps a full image of the page into WAL; subsequent writes only write > differences. W

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-06-30 Thread Magnus Hagander
> 2. Think of a better defense against partial-page writes. > > I like #2, or would if I could think of a better defense. > Ideas anyone? FWIW, MSSQL deals with this using "Torn Page Detection". This is off by default (no check at all!), but can be abled on a per-database level. Note that it on

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-06-29 Thread Josh Berkus
Tom, > 1. Offer a GUC to turn off full-page-image dumping, which you'd use only > if you really trust your hardware :-( Are these just WAL pages? Or database pages as well? -- --Josh Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)-

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-06-29 Thread Tom Lane
Josh Berkus writes: >> 1. Offer a GUC to turn off full-page-image dumping, which you'd use only >> if you really trust your hardware :-( > Are these just WAL pages? Or database pages as well? Database pages. The current theory is that we can completely reconstruct from WAL data every page that

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-06-29 Thread Tom Lane
Josh Berkus writes: >> Uh, what exactly did you cut out? I suggested dropping the dumping of >> full page images, but not removing CRCs altogether ... > Attached is the patch I used. OK, thanks for the clarification. So it does seem that dumping full page images is a pretty big hit these days.

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-06-29 Thread Josh Berkus
Tom, > Uh, what exactly did you cut out? I suggested dropping the dumping of > full page images, but not removing CRCs altogether ... Attached is the patch I used. (it's a -Urn patch 'cause that's what STP takes) -- --Josh Josh Berkus Aglio Database Solutions San Francisco diff -urN pgsql/s

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-06-29 Thread Tom Lane
Josh Berkus writes: > Ok, finally managed though the peristent efforts of Mark Wong to get some > tests through. Here are two tests with the CRC and wall buffer checking > completely cut out of the code, as Tom suggested: Uh, what exactly did you cut out? I suggested dropping the dumping of f

[HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-06-29 Thread Josh Berkus
Tom, All: Ok, finally managed though the peristent efforts of Mark Wong to get some tests through. Here are two tests with the CRC and wall buffer checking completely cut out of the code, as Tom suggested: 5-min checkpoint: http://khack.osdl.org/stp/302738/results/0/ http://khack.osdl.org/stp/