Simon,
> I guess I'd be concerned that the poor bgwriter can't do all of this
> work. I was thinking about a separate log writer, so we could have both
> bgwriter and logwriter active simultaneously on I/O. It has taken a
> while to get bgwriter to perform its duties efficiently, so I'd rather
> n
On Tue, 2005-07-26 at 19:15 -0400, Tom Lane wrote:
> Josh Berkus writes:
> >> We should run tests with much higher wal_buffers numbers to nullify the
> >> effect described above and reduce contention. That way we will move
> >> towards the log disk speed being the limiting factor, patch or no patc
Tom,
> I have no idea whether the DBT benchmarks would benefit at all, but
> given that they are affected positively by increasing wal_buffers,
> they must have a fair percentage of not-small transactions.
Even if they don't, we'll have series tests for DW here at GreenPlum soon,
and I'll bet th
Josh Berkus writes:
>> We should run tests with much higher wal_buffers numbers to nullify the
>> effect described above and reduce contention. That way we will move
>> towards the log disk speed being the limiting factor, patch or no patch.
> I've run such tests, at a glance they do seem to impr
Simon,
> We should run tests with much higher wal_buffers numbers to nullify the
> effect described above and reduce contention. That way we will move
> towards the log disk speed being the limiting factor, patch or no patch.
I've run such tests, at a glance they do seem to improve performance.
On Fri, 2005-07-22 at 19:11 -0400, Tom Lane wrote:
> Hmm. Eyeballing the NOTPM trace for cases 302912 and 302909, it sure
> looks like the post-checkpoint performance recovery is *slower* in
> the latter. And why is 302902 visibly slower overall than 302905?
> I thought for a bit that you had got
On Fri, 22 Jul 2005 19:11:36 -0400
Tom Lane <[EMAIL PROTECTED]> wrote:
> BTW, I'd like to look at 302906, but its [Details] link is broken.
Ugh, I tried digging onto the internal systems and it looks like they
were destroyed (or not saved) somehow. It'll have to be rerun.
Sorry...
Mark
--
Tom,
> There's something awfully weird going on here. I was prepared to see
> no statistically-significant differences, but not multiple cases that
> seem to be going the "wrong direction".
There's a lot of variance in the tests. I'm currently running a variance
test battery on one machine to
Josh Berkus writes:
>> Um, where are the test runs underlying this spreadsheet? I don't have a
>> whole lot of confidence in looking at full-run average TPM numbers to
>> discern whether transient dropoffs in TPM are significant or not.
> Web in the form of:
> http://khack.osdl.org/stp/#test_nu
Greg Stark <[EMAIL PROTECTED]> writes:
> For any benchmarking to be meaningful you have to set the checkpoint interval
> to something more realistic. Something like 5 minutes. That way when the final
> checkpoint cycle isn't completely included in the timing data you'll at least
> be missing a stat
Tom,
> Um, where are the test runs underlying this spreadsheet? I don't have a
> whole lot of confidence in looking at full-run average TPM numbers to
> discern whether transient dropoffs in TPM are significant or not.
Web in the form of:
http://khack.osdl.org/stp/#test_number#/
Where #test_nu
Greg,
> For any benchmarking to be meaningful you have to set the checkpoint
> interval to something more realistic. Something like 5 minutes. That way
> when the final checkpoint cycle isn't completely included in the timing
> data you'll at least be missing a statistically insignificant portion
Josh Berkus writes:
> Bruce,
>> Did you test with full_page_writes on and off?
> I didn't use your full_page_writes version because Tom said it was
> problematic. This is CVS from July 3rd.
We already know the results: should be equivalent to the hack Josh
tried first.
So what we know at thi
Josh Berkus writes:
> Looks like the CRC calculation work isn't the issue. I did test runs of
> no-CRC vs. regular DBT2 with different checkpoint timeouts, and didn't
> discern any statistical difference. See attached spreadsheet chart (the
> two different runs are on two different machines).
Josh Berkus writes:
> I think this test run http://khack.osdl.org/stp/302903/results/0/, with a
> 30-min checkpoint shows pretty clearly that the behavior of the
> performance drop is consistent with needing to "re-prime" the WAL with
> full page images. Each checkpoint drops performance a
Josh Berkus wrote:
> Bruce,
>
> > I think we need those tests run.
>
> Sure. What CVS day should I grab? What's the option syntax? ( -c
> full_page_writes=false)?
Yes. You can grab any from the day Tom fixed it, which was I think two
weeks ago.
> I have about 20 tests in queue right no
Bruce,
> I think we need those tests run.
Sure. What CVS day should I grab? What's the option syntax? ( -c
full_page_writes=false)?
I have about 20 tests in queue right now but can stack yours up behind
them.
--
--Josh
Josh Berkus
Aglio Database Solutions
San Francisco
Josh Berkus wrote:
> Bruce,
>
> > Did you test with full_page_writes on and off?
>
> I didn't use your full_page_writes version because Tom said it was
> problematic. This is CVS from July 3rd.
I think we need those tests run.
--
Bruce Momjian| http://candle.pha.p
Bruce,
> Did you test with full_page_writes on and off?
I didn't use your full_page_writes version because Tom said it was
problematic. This is CVS from July 3rd.
--
--Josh
Josh Berkus
Aglio Database Solutions
San Francisco
---(end of broadcast)-
Did you test with full_page_writes on and off?
---
Josh Berkus wrote:
> Tom,
>
> > This will remove just the CRC calculation work associated with backed-up
> > pages. Note that any attempt to recover from the WAL will fail
Tom,
> This will remove just the CRC calculation work associated with backed-up
> pages. Note that any attempt to recover from the WAL will fail, but I
> assume you don't need that for the purposes of the test run.
Looks like the CRC calculation work isn't the issue. I did test runs of
no-CRC
Tom,
> Josh, I see that all of those runs seem to be using wal_buffers = 8.
> Have you tried materially increasing wal_buffers (say to 100 or 1000)
> and/or experimenting with different wal_sync_method options since we
> fixed the bufmgrlock problem? I am wondering if the real issue is
> WAL buff
Josh Berkus writes:
> So, now that we know what the performance bottleneck is, how do we fix it?
Josh, I see that all of those runs seem to be using wal_buffers = 8.
Have you tried materially increasing wal_buffers (say to 100 or 1000)
and/or experimenting with different wal_sync_method options s
Bruce Momjian wrote:
>
> I don't think our problem is partial writes of WAL, which we already
> check, but heap/index page writes, which we currently do not check for
> partial writes.
Hmm...I've read through the thread again and thought about the problem
further, and now think I understand what
I don't think our problem is partial writes of WAL, which we already
check, but heap/index page writes, which we currently do not check for
partial writes.
---
Kevin Brown wrote:
> Tom Lane wrote:
> > Simon Riggs <[EMAIL PRO
Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > I don't think we should care too much about indexes. We can rebuild
> > them...but losing heap sectors means *data loss*.
>
> If you're so concerned about *data loss* then none of this will be
> acceptable to you at all. We are talking
Simon, Tom,
> > Will do. Results in a few days.
Actually, between the bad patch on the 5th and ongoing STP issues, I don't
think I will have results before I leave town. Will e-mail you offlist to
give you info to retrieve results.
> Any chance you'd be able to do this with
>
> ext3 and a
On Fri, 2005-07-08 at 09:34 +0200, Zeugswetter Andreas DAZ SD wrote:
> >>> The point here is that fsync-off is only realistic for development
> >>> or playpen installations. You don't turn it off in a production
> >>> machine, and I can't see that you'd turn off the full-page-write
> >>> opti
On Fri, 2005-07-08 at 14:45 -0400, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > I don't think we should care too much about indexes. We can rebuild
> > them...but losing heap sectors means *data loss*.
>
> If you're so concerned about *data loss* then none of this will be
> accept
On Fri, 2005-07-08 at 14:45 -0400, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > I don't think we should care too much about indexes. We can rebuild
> > them...but losing heap sectors means *data loss*.
There might be some merit in the idea of disabling WAL/PITR for indexes,
where one ca
Simon Riggs <[EMAIL PROTECTED]> writes:
> I don't think we should care too much about indexes. We can rebuild
> them...but losing heap sectors means *data loss*.
If you're so concerned about *data loss* then none of this will be
acceptable to you at all. We are talking about going from a system
t
On Fri, 2005-07-08 at 09:47 -0400, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > Having raised that objection, ISTM that checking for torn pages can be
> > accomplished reasonably well using a few rules...
>
> I have zero confidence in this; the fact that you can think of
> (incomp
Tom,
> Great. BTW, don't bother testing snapshots between 2005/07/05 2300 EDT
> and just now --- Bruce's full_page_writes patch introduced a large
> random negative component into the timing ...
Ach. Starting over, then.
--Josh
--
Josh Berkus
Aglio Database Solutions
San Francisco
On Thu, 7 Jul 2005, Tom Lane wrote:
We still don't know enough about the situation to know what a solution
might look like. Is the slowdown Josh is seeing due to the extra CPU
cost of the CRCs, or the extra I/O cost, or excessive locking of the
WAL-related data structures while we do this stuff
Simon Riggs <[EMAIL PROTECTED]> writes:
> Is there also a potential showstopper in the redo machinery? We work on
> the assumption that the post-checkpoint block is available in WAL as a
> before image. Redo for all actions merely replay the write action again
> onto the block. If we must reapply t
On 7/7/05, Bruce Momjian wrote:
> One idea would be to just tie its behavior directly to fsync and remove
> the option completely (that was the original TODO), or we can adjust it
> so it doesn't have the same risks as fsync, or the same lack of failure
> reporting as fsync.
I wonder about one th
On Thu, 2005-07-07 at 11:59 -0400, Bruce Momjian wrote:
> Tom Lane wrote:
> > Bruce Momjian writes:
> > > Tom Lane wrote:
> > >> The point here is that fsync-off is only realistic for development
> > >> or playpen installations. You don't turn it off in a production
> > >> machine, and I can't se
>>> The point here is that fsync-off is only realistic for development
>>> or playpen installations. You don't turn it off in a production
>>> machine, and I can't see that you'd turn off the full-page-write
>>> option either. So we have not solved anyone's performance problem.
>
>> Yes, thi
Josh Berkus writes:
>> If so, please undo the previous patch (which disabled page dumping
>> entirely) and instead try removing this block of code, starting
>> at about xlog.c line 620 in CVS tip:
> Will do. Results in a few days.
Great. BTW, don't bother testing snapshots between 2005/07/05 2
On Thu, Jul 07, 2005 at 11:36:40AM -0400, Tom Lane wrote:
> Greg Stark <[EMAIL PROTECTED]> writes:
> > Tom Lane <[EMAIL PROTECTED]> writes:
> >> What we *could* do is calculate a page-level CRC and
> >> store it in the page header just before writing out. Torn pages
> >> would then manifest as a w
Tom,
> Josh, is OSDL up enough that you can try another comparison run?
Thankfully, yes.
> If so, please undo the previous patch (which disabled page dumping
> entirely) and instead try removing this block of code, starting
> at about xlog.c line 620 in CVS tip:
Will do. Results in a few days.
Simon Riggs wrote:
> On Wed, 2005-07-06 at 18:22 -0400, Bruce Momjian wrote:
> > Well, I added #1 yesterday as 'full_page_writes', and it has the same
> > warnings as fsync (namely, on crash, be prepared to recover or check
> > your system thoroughly).
>
> Yes, which is why I comment now that the
Joshua D. Drake wrote:
>
> >>Just to make my position perfectly clear: I don't want to see this
> >>option shipped in 8.1. It's reasonable to have it in there for now
> >>as an aid to our performance investigations, but I don't see that it
> >>has any value for production.
> >
> >
> > Well, thi
Just to make my position perfectly clear: I don't want to see this
option shipped in 8.1. It's reasonable to have it in there for now
as an aid to our performance investigations, but I don't see that it
has any value for production.
Well, this is the first I am hearing that, and of course yo
Tom Lane wrote:
> Bruce Momjian writes:
> > Tom Lane wrote:
> >> The point here is that fsync-off is only realistic for development
> >> or playpen installations. You don't turn it off in a production
> >> machine, and I can't see that you'd turn off the full-page-write
> >> option either. So we
Bruce Momjian writes:
> Tom Lane wrote:
>> The point here is that fsync-off is only realistic for development
>> or playpen installations. You don't turn it off in a production
>> machine, and I can't see that you'd turn off the full-page-write
>> option either. So we have not solved anyone's pe
Tom Lane wrote:
> Bruce Momjian writes:
> > As far as #2, my posted proposal was to write the full pages to WAL when
> > they are written to the file system, and not when they are first
> > modified in the shared buffers ---
>
> That is *completely* unworkable. Or were you planning to abandon th
Greg Stark <[EMAIL PROTECTED]> writes:
> Tom Lane <[EMAIL PROTECTED]> writes:
>> What we *could* do is calculate a page-level CRC and
>> store it in the page header just before writing out. Torn pages
>> would then manifest as a wrong CRC on read. No correction ability,
>> but at least a reliable
Tom Lane <[EMAIL PROTECTED]> writes:
> "Zeugswetter Andreas DAZ SD" <[EMAIL PROTECTED]> writes:
> > Only workable solution would imho be to write the LSN to each 512
> > byte block (not that I am propagating that idea).
>
> We're not doing anything like that, as it would create an impossible
> s
Zeugswetter Andreas DAZ SD wrote:
>
> >> Are you sure about that? That would probably be the normal case, but
> >> are you promised that the hardware will write all of the sectors of a
> >> block in order?
> >
> > I don't think you can possibly assume that. If the block
> > crosses a cylind
Bruce Momjian writes:
> Yes, that is a good idea!
... which was shot down in the very next message.
regards, tom lane
Simon Riggs wrote:
> > SCSI tagged queueing certainly allows 512-byte blocks to be reordered
> > during writes.
>
> Then a torn-page tell-tale is required that will tell us of any change
> to any of the 512-byte sectors that make up a block/page.
>
> Here's an idea:
>
> We read the page that we
>> Only workable solution would imho be to write the LSN to each 512 byte
>> block (not that I am propagating that idea).
"Only workable" was a stupid formulation, I meant a solution that works
with a LSN.
> We're not doing anything like that, as it would create an
> impossible space-managemen
I wrote:
> We still don't know enough about the situation to know what a solution
> might look like. Is the slowdown Josh is seeing due to the extra CPU
> cost of the CRCs, or the extra I/O cost, or excessive locking of the
> WAL-related data structures while we do this stuff, or ???. Need more
>
"Zeugswetter Andreas DAZ SD" <[EMAIL PROTECTED]> writes:
> Only workable solution would imho be to write the LSN to each 512
> byte block (not that I am propagating that idea).
We're not doing anything like that, as it would create an impossible
space-management problem (or are you happy with lim
> Here's an idea:
>
> We read the page that we would have backed up, calc the CRC and
> write a short WAL record with just the CRC, not the block. When
> we recover we re-read the database page, calc its CRC and
> compare it with the CRC from the transaction log. If they
> differ, we know tha
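[Editorial sketch] The idea quoted above (log only a CRC of the page, then re-check it at recovery) might look roughly like this. It is an illustration only, not PostgreSQL code: the 8 kB page, the 512-byte sector, the use of zlib.crc32, and the helper names page_crc and differs_from_logged are all assumptions made for the example, and what recovery should do after a mismatch is not covered by the truncated message.

```python
import zlib

PAGE_SIZE = 8192    # assumed page size for this sketch
SECTOR_SIZE = 512   # assumed disk sector size


def page_crc(page: bytes) -> int:
    """CRC of a whole page; this is what the short WAL record would carry."""
    return zlib.crc32(page) & 0xFFFFFFFF


def differs_from_logged(page_on_disk: bytes, logged_crc: int) -> bool:
    """At recovery: re-read the page, recompute the CRC, compare with the log."""
    return page_crc(page_on_disk) != logged_crc


# Page as it was when the WAL record was written.
before = bytearray(PAGE_SIZE)
logged_crc = page_crc(bytes(before))

# A later change touches the first and the last sector of the page.
after = bytearray(before)
after[0:2] = b"\x01\x02"
after[PAGE_SIZE - 2:PAGE_SIZE] = b"\x03\x04"

# Simulated torn write: only the first sector of the new version reached disk.
torn = bytes(after[:SECTOR_SIZE]) + bytes(before[SECTOR_SIZE:])

print(differs_from_logged(bytes(before), logged_crc))  # untouched page
print(differs_from_logged(torn, logged_crc))           # torn page
```

Note that this only detects a difference; as the replies in the thread point out, a checksum gives no way to correct the page.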
>> Are you sure about that? That would probably be the normal case, but
>> are you promised that the hardware will write all of the sectors of a
>> block in order?
>
> I don't think you can possibly assume that. If the block
> crosses a cylinder boundary then it's certainly an unsafe
> assum
On Thu, 2005-07-07 at 00:29 -0400, Bruce Momjian wrote:
> Tom Lane wrote:
> > Bruno Wolff III <[EMAIL PROTECTED]> writes:
> > > Are you sure about that? That would probably be the normal case, but are
> > > you promised that the hardware will write all of the sectors of a block
> > > in order?
> >
On Wed, 2005-07-06 at 17:17 -0700, Joshua D. Drake wrote:
> >>Tom, I think you're the only person that could or would be trusted to
> >>make such a change. Even past the 8.1 freeze, I say we need to do
> >>something now on this issue.
> >
> >
> > I think if we document full_page_writes as similar
Tom Lane wrote:
> Bruno Wolff III <[EMAIL PROTECTED]> writes:
> > Are you sure about that? That would probably be the normal case, but are
> > you promised that the hardware will write all of the sectors of a block
> > in order?
>
> I don't think you can possibly assume that. If the block crosses
Bruno Wolff III <[EMAIL PROTECTED]> writes:
> Are you sure about that? That would probably be the normal case, but are
> you promised that the hardware will write all of the sectors of a block
> in order?
I don't think you can possibly assume that. If the block crosses a
cylinder boundary then it
On Wed, Jul 06, 2005 at 21:48:44 +0100,
Simon Riggs <[EMAIL PROTECTED]> wrote:
>
> We could implement the torn-pages option, but that seems a lot of work.
> Another way of implementing a tell-tale would be to append the LSN again
> as a data page trailer as the last 4 bytes of the page. Thus the
Bruce Momjian writes:
> As far as #2, my posted proposal was to write the full pages to WAL when
> they are written to the file system, and not when they are first
> modified in the shared buffers ---
That is *completely* unworkable. Or were you planning to abandon the
promise that a transaction
Simon Riggs <[EMAIL PROTECTED]> writes:
> On Wed, 2005-07-06 at 18:22 -0400, Bruce Momjian wrote:
>> Well, I added #1 yesterday as 'full_page_writes', and it has the same
>> warnings as fsync (namely, on crash, be prepared to recover or check
>> your system thoroughly).
> Yes, which is why I comme
Simon Riggs wrote:
> I agree we *must* have the GUC, but we also *must* have a way for crash
> recovery to tell us for certain that it has definitely worked, not just
> maybe worked.
Doesn't the same argument apply to the existing fsync = off case? i.e.
we already have a case where we don't provi
Tom, I think you're the only person that could or would be trusted to
make such a change. Even past the 8.1 freeze, I say we need to do
something now on this issue.
I think if we document full_page_writes as similar to fsync in risk, we
are OK for 8.1, but if something can be done easily, it
On Wed, 2005-07-06 at 18:22 -0400, Bruce Momjian wrote:
> Well, I added #1 yesterday as 'full_page_writes', and it has the same
> warnings as fsync (namely, on crash, be prepared to recover or check
> your system thoroughly).
Yes, which is why I comment now that the GUC alone is not enough.
There
Simon Riggs wrote:
> On Wed, 2005-06-29 at 23:23 -0400, Tom Lane wrote:
> > Josh Berkus writes:
> > >> Uh, what exactly did you cut out? I suggested dropping the dumping of
> > >> full page images, but not removing CRCs altogether ...
> >
> > > Attached is the patch I used.
> >
> > OK, thanks f
On Wed, 2005-06-29 at 23:23 -0400, Tom Lane wrote:
> Josh Berkus writes:
> >> Uh, what exactly did you cut out? I suggested dropping the dumping of
> >> full page images, but not removing CRCs altogether ...
>
> > Attached is the patch I used.
>
> OK, thanks for the clarification. So it does s
Greg Stark <[EMAIL PROTECTED]> writes:
> Tom Lane <[EMAIL PROTECTED]> writes:
>> Partial writes. Without the full-page image, we do not have enough
>> information in WAL to reconstruct the correct page contents.
> Sure, but why not?
> If an 8k page contains 16 low level segments on disk and the o
Tom Lane <[EMAIL PROTECTED]> writes:
> Greg Stark <[EMAIL PROTECTED]> writes:
> > Can someone explain exactly what problem is being defeated by writing whole
> > pages to the WAL log?
>
> Partial writes. Without the full-page image, we do not have enough
> information in WAL to reconstruct the
Greg Stark <[EMAIL PROTECTED]> writes:
> Can someone explain exactly what problem is being defeated by writing whole
> pages to the WAL log?
Partial writes. Without the full-page image, we do not have enough
information in WAL to reconstruct the correct page contents.
>> A further optimization
On Sun, 3 Jul 2005 04:47 pm, Greg Stark wrote:
>
> Bruce Momjian writes:
>
> > I have an idea! Currently we write the backup pages (copies of pages
> > modified since the last checkpoint) when we write the WAL changes as
> > part of the commit. See the XLogCheckBuffer() call in XLogInsert().
>
Bruce Momjian writes:
> I have an idea! Currently we write the backup pages (copies of pages
> modified since the last checkpoint) when we write the WAL changes as
> part of the commit. See the XLogCheckBuffer() call in XLogInsert().
Can someone explain exactly what problem is being defeated
Tom Lane wrote:
> Josh Berkus writes:
> >> Uh, what exactly did you cut out? I suggested dropping the dumping of
> >> full page images, but not removing CRCs altogether ...
>
> > Attached is the patch I used.
>
> OK, thanks for the clarification. So it does seem that dumping full
> page images
""Magnus Hagander"" <[EMAIL PROTECTED]> writes
>
> FWIW, MSSQL deals with this using "Torn Page Detection". This is off by
> default (no check at all!), but can be enabled on a per-database level.
> Note that it only *detects* torn pages. If it finds one, it won't start
> and tell you to recover fro
Tom,
> > What I'm confused about is that this shouldn't be anything new for
> > 8.1. Yet 8.1 has *worse* performance on the STP machines than 8.0
> > does, and it's pretty much entirely due to this check.
>
> That's simply not believable --- better recheck your analysis. If 8.1
> is worse it's n
Josh Berkus writes:
> What I'm confused about is that this shouldn't be anything new for 8.1. Yet
> 8.1 has *worse* performance on the STP machines than 8.0 does, and it's
> pretty much entirely due to this check.
That's simply not believable --- better recheck your analysis. If 8.1
is worse
Tom,
> Database pages. The current theory is that we can completely
> reconstruct from WAL data every page that's been modified since the
> last checkpoint. So the first write of any page after a checkpoint
> dumps a full image of the page into WAL; subsequent writes only write
> differences.
W
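[Editorial sketch] The rule described in the quoted text (a full page image on the first write after a checkpoint, only differences afterwards) can be modelled with a toy log like the one below. This is a simplified illustration, not the actual xlog.c logic: the ToyWal class, its record tuples, and the use of a set to remember which pages were already imaged are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ToyWal:
    """Toy write-ahead log: full page image on first change after a checkpoint."""
    records: list = field(default_factory=list)
    imaged_since_checkpoint: set = field(default_factory=set)

    def checkpoint(self) -> None:
        # After a checkpoint, every page needs a fresh full image again.
        self.imaged_since_checkpoint.clear()
        self.records.append(("checkpoint",))

    def log_change(self, page_no: int, page_image: bytes, delta: str) -> None:
        if page_no not in self.imaged_since_checkpoint:
            # First change since the checkpoint: dump the whole page.
            self.records.append(("full_page", page_no, bytes(page_image)))
            self.imaged_since_checkpoint.add(page_no)
        else:
            # Later changes: log only the difference.
            self.records.append(("delta", page_no, delta))


wal = ToyWal()
wal.log_change(7, b"x" * 8, "insert a")
wal.log_change(7, b"y" * 8, "insert b")
wal.checkpoint()
wal.log_change(7, b"z" * 8, "insert c")
kinds = [rec[0] for rec in wal.records]
print(kinds)
```

In this model a shorter checkpoint interval means more full-page records for the same workload, which is consistent with the post-checkpoint performance dips discussed elsewhere in the thread.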
> 2. Think of a better defense against partial-page writes.
>
> I like #2, or would if I could think of a better defense.
> Ideas anyone?
FWIW, MSSQL deals with this using "Torn Page Detection". This is off by
default (no check at all!), but can be enabled on a per-database level.
Note that it on
Tom,
> 1. Offer a GUC to turn off full-page-image dumping, which you'd use only
> if you really trust your hardware :-(
Are these just WAL pages? Or database pages as well?
--
--Josh
Josh Berkus
Aglio Database Solutions
San Francisco
Josh Berkus writes:
>> 1. Offer a GUC to turn off full-page-image dumping, which you'd use only
>> if you really trust your hardware :-(
> Are these just WAL pages? Or database pages as well?
Database pages. The current theory is that we can completely
reconstruct from WAL data every page that
Josh Berkus writes:
>> Uh, what exactly did you cut out? I suggested dropping the dumping of
>> full page images, but not removing CRCs altogether ...
> Attached is the patch I used.
OK, thanks for the clarification. So it does seem that dumping full
page images is a pretty big hit these days.
Tom,
> Uh, what exactly did you cut out? I suggested dropping the dumping of
> full page images, but not removing CRCs altogether ...
Attached is the patch I used. (it's a -Urn patch 'cause that's what STP
takes)
--
--Josh
Josh Berkus
Aglio Database Solutions
San Francisco
diff -urN pgsql/s
Josh Berkus writes:
> Ok, finally managed through the persistent efforts of Mark Wong to get some
> tests through. Here are two tests with the CRC and WAL buffer checking
> completely cut out of the code, as Tom suggested:
Uh, what exactly did you cut out? I suggested dropping the dumping of
f
Tom, All:
Ok, finally managed through the persistent efforts of Mark Wong to get some
tests through. Here are two tests with the CRC and WAL buffer checking
completely cut out of the code, as Tom suggested:
5-min checkpoint:
http://khack.osdl.org/stp/302738/results/0/
http://khack.osdl.org/stp/