Simon,
> I guess I'd be concerned that the poor bgwriter can't do all of this
> work. I was thinking about a separate log writer, so we could have both
> bgwriter and logwriter active simultaneously on I/O. It has taken a
> while to get bgwriter to perform its duties efficiently, so I'd rather not ...
On Tue, 2005-07-26 at 19:15 -0400, Tom Lane wrote:
> Josh Berkus writes:
> >> We should run tests with much higher wal_buffers numbers to nullify the
> >> effect described above and reduce contention. That way we will move
> towards the log disk speed being the limiting factor, patch or no patch.
Tom,
> I have no idea whether the DBT benchmarks would benefit at all, but
> given that they are affected positively by increasing wal_buffers,
> they must have a fair percentage of not-small transactions.
Even if they don't, we'll have a series of DW tests here at GreenPlum soon,
and I'll bet th
Josh Berkus writes:
>> We should run tests with much higher wal_buffers numbers to nullify the
>> effect described above and reduce contention. That way we will move
>> towards the log disk speed being the limiting factor, patch or no patch.
> I've run such tests, at a glance they do seem to improve performance.
Simon,
> We should run tests with much higher wal_buffers numbers to nullify the
> effect described above and reduce contention. That way we will move
> towards the log disk speed being the limiting factor, patch or no patch.
I've run such tests, at a glance they do seem to improve performance.
On Fri, 2005-07-22 at 19:11 -0400, Tom Lane wrote:
> Hmm. Eyeballing the NOTPM trace for cases 302912 and 302909, it sure
> looks like the post-checkpoint performance recovery is *slower* in
> the latter. And why is 302902 visibly slower overall than 302905?
> I thought for a bit that you had got
On Fri, 22 Jul 2005 19:11:36 -0400
Tom Lane <[EMAIL PROTECTED]> wrote:
> BTW, I'd like to look at 302906, but its [Details] link is broken.
Ugh, I tried digging into the internal systems and it looks like they
were destroyed (or not saved) somehow. It'll have to be rerun.
Sorry...
Mark
--
Tom,
> There's something awfully weird going on here. I was prepared to see
> no statistically-significant differences, but not multiple cases that
> seem to be going the "wrong direction".
There's a lot of variance in the tests. I'm currently running a variance
test battery on one machine to
Josh Berkus writes:
>> Um, where are the test runs underlying this spreadsheet? I don't have a
>> whole lot of confidence in looking at full-run average TPM numbers to
>> discern whether transient dropoffs in TPM are significant or not.
> Web in the form of:
> http://khack.osdl.org/stp/#test_number#/
Greg Stark <[EMAIL PROTECTED]> writes:
> For any benchmarking to be meaningful you have to set the checkpoint interval
> to something more realistic. Something like 5 minutes. That way when the final
> checkpoint cycle isn't completely included in the timing data you'll at least
> be missing a statistically insignificant portion ...
Tom,
> Um, where are the test runs underlying this spreadsheet? I don't have a
> whole lot of confidence in looking at full-run average TPM numbers to
> discern whether transient dropoffs in TPM are significant or not.
Web in the form of:
http://khack.osdl.org/stp/#test_number#/
Where #test_number# is the number of the test run.
Greg,
> For any benchmarking to be meaningful you have to set the checkpoint
> interval to something more realistic. Something like 5 minutes. That way
> when the final checkpoint cycle isn't completely included in the timing
> data you'll at least be missing a statistically insignificant portion
Josh Berkus writes:
> Bruce,
>> Did you test with full_page_writes on and off?
> I didn't use your full_page_writes version because Tom said it was
> problematic. This is CVS from July 3rd.
We already know the results: should be equivalent to the hack Josh
tried first.
So what we know at this point ...
Josh Berkus writes:
> Looks like the CRC calculation work isn't the issue. I did test runs of
> no-CRC vs. regular DBT2 with different checkpoint timeouts, and didn't
> discern any statistical difference. See attached spreadsheet chart (the
> two different runs are on two different machines).
Josh Berkus writes:
> I think this test run http://khack.osdl.org/stp/302903/results/0/, with a
> 30-min checkpoint shows pretty clearly that the behavior of the
> performance drop is consistent with needing to "re-prime" the WAL will
> full page images. Each checkpoint drops performance a
Josh Berkus wrote:
> Bruce,
>
> > I think we need those tests run.
>
> Sure. What CVS day should I grab? What's the option syntax? ( -c
> full_page_writes=false)?
Yes. You can grab any from the day Tom fixed it, which was I think two
weeks ago.
> I have about 20 tests in queue right now but can stack yours up behind them.
Bruce,
> I think we need those tests run.
Sure. What CVS day should I grab? What's the option syntax? ( -c
full_page_writes=false)?
I have about 20 tests in queue right now but can stack yours up behind
them.
--
--Josh
Josh Berkus
Aglio Database Solutions
San Francisco
Josh Berkus wrote:
> Bruce,
>
> > Did you test with full_page_writes on and off?
>
> I didn't use your full_page_writes version because Tom said it was
> problematic. This is CVS from July 3rd.
I think we need those tests run.
--
Bruce Momjian| http://candle.pha.p
Bruce,
> Did you test with full_page_writes on and off?
I didn't use your full_page_writes version because Tom said it was
problematic. This is CVS from July 3rd.
--
--Josh
Josh Berkus
Aglio Database Solutions
San Francisco
Did you test with full_page_writes on and off?
Josh Berkus wrote:
> Tom,
>
> > This will remove just the CRC calculation work associated with backed-up
> > > pages. Note that any attempt to recover from the WAL will fail, but I
> > > assume you don't need that for the purposes of the test run.
Tom,
> This will remove just the CRC calculation work associated with backed-up
> pages. Note that any attempt to recover from the WAL will fail, but I
> assume you don't need that for the purposes of the test run.
Looks like the CRC calculation work isn't the issue. I did test runs of
no-CRC vs. regular DBT2 with different checkpoint timeouts, and didn't
discern any statistical difference.
Tom,
> Josh, I see that all of those runs seem to be using wal_buffers = 8.
> Have you tried materially increasing wal_buffers (say to 100 or 1000)
> and/or experimenting with different wal_sync_method options since we
> fixed the bufmgrlock problem? I am wondering if the real issue is
> WAL buffer ...
Josh Berkus writes:
> So, now that we know what the performance bottleneck is, how do we fix it?
Josh, I see that all of those runs seem to be using wal_buffers = 8.
Have you tried materially increasing wal_buffers (say to 100 or 1000)
and/or experimenting with different wal_sync_method options since we
fixed the bufmgrlock problem? I am wondering if the real issue is WAL buffer ...
Bruce Momjian wrote:
>
> I don't think our problem is partial writes of WAL, which we already
> check, but heap/index page writes, which we currently do not check for
> partial writes.
Hmm...I've read through the thread again and thought about the problem
further, and now think I understand what
I don't think our problem is partial writes of WAL, which we already
check, but heap/index page writes, which we currently do not check for
partial writes.
Kevin Brown wrote:
> Tom Lane wrote:
> > Simon Riggs <[EMAIL PROTECTED]> writes:
Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > I don't think we should care too much about indexes. We can rebuild
> > them...but losing heap sectors means *data loss*.
>
> If you're so concerned about *data loss* then none of this will be
> acceptable to you at all. We are talking about going from a system that ...
Simon, Tom,
> > Will do. Results in a few days.
Actually, between the bad patch on the 5th and ongoing STP issues, I don't
think I will have results before I leave town. Will e-mail you offlist to
give you info to retrieve results.
> Any chance you'd be able to do this with
>
> ext3 and a
On Fri, 2005-07-08 at 09:34 +0200, Zeugswetter Andreas DAZ SD wrote:
> >>> The point here is that fsync-off is only realistic for development or
> >>> playpen installations. You don't turn it off in a production
> >>> machine, and I can't see that you'd turn off the full-page-write
> >>> option either. So we have not solved anyone's performance problem.
On Fri, 2005-07-08 at 14:45 -0400, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > I don't think we should care too much about indexes. We can rebuild
> > them...but losing heap sectors means *data loss*.
>
> If you're so concerned about *data loss* then none of this will be
> acceptable to you at all. We are talking about going from a system that ...
On R, 2005-07-08 at 14:45 -0400, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > I don't think we should care too much about indexes. We can rebuild
> > them...but losing heap sectors means *data loss*.
There might be some merit in the idea of disabling WAL/PITR for indexes,
where one can ...
Simon Riggs <[EMAIL PROTECTED]> writes:
> I don't think we should care too much about indexes. We can rebuild
> them...but losing heap sectors means *data loss*.
If you're so concerned about *data loss* then none of this will be
acceptable to you at all. We are talking about going from a system that ...
On Fri, 2005-07-08 at 09:47 -0400, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > Having raised that objection, ISTM that checking for torn pages can be
> > accomplished reasonably well using a few rules...
>
> I have zero confidence in this; the fact that you can think of
> (incomp
Tom,
> Great. BTW, don't bother testing snapshots between 2005/07/05 2300 EDT
> and just now --- Bruce's full_page_writes patch introduced a large
> random negative component into the timing ...
Ach. Starting over, then.
--Josh
--
Josh Berkus
Aglio Database Solutions
San Francisco
On Thu, 7 Jul 2005, Tom Lane wrote:
We still don't know enough about the situation to know what a solution
might look like. Is the slowdown Josh is seeing due to the extra CPU
cost of the CRCs, or the extra I/O cost, or excessive locking of the
WAL-related data structures while we do this stuff, or ???. Need more ...
Simon Riggs <[EMAIL PROTECTED]> writes:
> Is there also a potential showstopper in the redo machinery? We work on
> the assumption that the post-checkpoint block is available in WAL as a
> before image. Redo for all actions merely replays the write action again
> onto the block. If we must reapply t
On 7/7/05, Bruce Momjian wrote:
> One idea would be to just tie its behavior directly to fsync and remove
> the option completely (that was the original TODO), or we can adjust it
> so it doesn't have the same risks as fsync, or the same lack of failure
> reporting as fsync.
I wonder about one thing ...
On Thu, 2005-07-07 at 11:59 -0400, Bruce Momjian wrote:
> Tom Lane wrote:
> > Bruce Momjian writes:
> > > Tom Lane wrote:
> > >> The point here is that fsync-off is only realistic for development
> > >> or playpen installations. You don't turn it off in a production
> > >> machine, and I can't see that you'd turn off the full-page-write
> > >> option either. So we have not solved anyone's performance problem.
>>> The point here is that fsync-off is only realistic for development or
>>> playpen installations. You don't turn it off in a production
>>> machine, and I can't see that you'd turn off the full-page-write
>>> option either. So we have not solved anyone's performance problem.
>
>> Yes, thi
Josh Berkus writes:
>> If so, please undo the previous patch (which disabled page dumping
>> entirely) and instead try removing this block of code, starting
>> at about xlog.c line 620 in CVS tip:
> Will do. Results in a few days.
Great. BTW, don't bother testing snapshots between 2005/07/05 2300 EDT
and just now --- Bruce's full_page_writes patch introduced a large
random negative component into the timing ...
On Thu, Jul 07, 2005 at 11:36:40AM -0400, Tom Lane wrote:
> Greg Stark <[EMAIL PROTECTED]> writes:
> > Tom Lane <[EMAIL PROTECTED]> writes:
> >> What we *could* do is calculate a page-level CRC and
> >> store it in the page header just before writing out. Torn pages
> >> would then manifest as a wrong CRC on read. No correction ability,
> >> but at least a reliable ...
Tom,
> Josh, is OSDL up enough that you can try another comparison run?
Thankfully, yes.
> If so, please undo the previous patch (which disabled page dumping
> entirely) and instead try removing this block of code, starting
> at about xlog.c line 620 in CVS tip:
Will do. Results in a few days.
Simon Riggs wrote:
> On Wed, 2005-07-06 at 18:22 -0400, Bruce Momjian wrote:
> > Well, I added #1 yesterday as 'full_page_writes', and it has the same
> > warnings as fsync (namely, on crash, be prepared to recover or check
> > your system thoroughly).
>
> Yes, which is why I comment now that the GUC alone is not enough.
Joshua D. Drake wrote:
>
> >>Just to make my position perfectly clear: I don't want to see this
> >>option shipped in 8.1. It's reasonable to have it in there for now
> >>as an aid to our performance investigations, but I don't see that it
> >>has any value for production.
> >
> >
> > Well, this is the first I am hearing that, and of course you ...
Just to make my position perfectly clear: I don't want to see this
option shipped in 8.1. It's reasonable to have it in there for now
as an aid to our performance investigations, but I don't see that it
has any value for production.
Well, this is the first I am hearing that, and of course you ...
Tom Lane wrote:
> Bruce Momjian writes:
> > Tom Lane wrote:
> >> The point here is that fsync-off is only realistic for development
> >> or playpen installations. You don't turn it off in a production
> >> machine, and I can't see that you'd turn off the full-page-write
> >> option either. So we have not solved anyone's performance problem.
Bruce Momjian writes:
> Tom Lane wrote:
>> The point here is that fsync-off is only realistic for development
>> or playpen installations. You don't turn it off in a production
>> machine, and I can't see that you'd turn off the full-page-write
>> option either. So we have not solved anyone's performance problem.
Tom Lane wrote:
> Bruce Momjian writes:
> > As far as #2, my posted proposal was to write the full pages to WAL when
> > they are written to the file system, and not when they are first
> > modified in the shared buffers ---
>
> That is *completely* unworkable. Or were you planning to abandon the
> promise that a transaction ...
Greg Stark <[EMAIL PROTECTED]> writes:
> Tom Lane <[EMAIL PROTECTED]> writes:
>> What we *could* do is calculate a page-level CRC and
>> store it in the page header just before writing out. Torn pages
>> would then manifest as a wrong CRC on read. No correction ability,
>> but at least a reliable
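For concreteness, here is a minimal standalone sketch of the page-header-CRC
scheme quoted above, detection only, no correction. This is not PostgreSQL
code: the Page layout, the crc32() helper, and the function names are all
invented for illustration.

/* Standalone sketch of torn-page detection via a page-header CRC.
 * NOT PostgreSQL code: page layout and names are illustrative only. */
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 8192

typedef struct {
    uint32_t crc;   /* stamped just before the page is written out */
    char     data[PAGE_SIZE - sizeof(uint32_t)];
} Page;

/* Bitwise CRC-32 (no lookup table): slow but self-contained. */
uint32_t crc32(const void *buf, size_t len)
{
    const uint8_t *p = buf;
    uint32_t crc = 0xFFFFFFFFu;
    while (len--) {
        crc ^= *p++;
        for (int i = 0; i < 8; i++)
            crc = (crc >> 1) ^ (0xEDB88320u & -(crc & 1));
    }
    return ~crc;
}

/* Called just before handing the buffer to write(): the CRC covers
 * everything except the CRC field itself. */
void page_set_crc(Page *pg)
{
    pg->crc = crc32(pg->data, sizeof(pg->data));
}

/* Called after read(): a mismatch means some 512-byte sectors of the
 * page were written and others were not, i.e. a torn page.  Detection
 * only; the page cannot be reconstructed from this information. */
int page_is_torn(const Page *pg)
{
    return pg->crc != crc32(pg->data, sizeof(pg->data));
}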
Tom Lane <[EMAIL PROTECTED]> writes:
> "Zeugswetter Andreas DAZ SD" <[EMAIL PROTECTED]> writes:
> > Only workable solution would imho be to write the LSN to each 512
> > byte block (not that I am propagating that idea).
>
> We're not doing anything like that, as it would create an impossible
> space-management problem (or are you happy with ...
Zeugswetter Andreas DAZ SD wrote:
>
> >> Are you sure about that? That would probably be the normal case, but
> >> are you promised that the hardware will write all of the sectors of a
> >> block in order?
> >
> > I don't think you can possibly assume that. If the block
> > crosses a cylinder boundary then it's certainly an unsafe assumption ...
Bruce Momjian writes:
> Yes, that is a good idea!
... which was shot down in the very next message.
regards, tom lane
Simon Riggs wrote:
> > SCSI tagged queueing certainly allows 512-byte blocks to be reordered
> > during writes.
>
> Then a torn-page tell-tale is required that will tell us of any change
> to any of the 512-byte sectors that make up a block/page.
>
> Here's an idea:
>
> We read the page that we would have backed up, calc the CRC and
> write a short WAL record with just the CRC, not the block. When
> we recover we re-read the database page, calc its CRC and
> compare it with the CRC from the transaction log. If they
> differ, we know that ...
>> Only workable solution would imho be to write the LSN to each 512 byte
>> block (not that I am propagating that idea).
"Only workable" was a stupid formulation, I meant a solution that works
with an LSN.
> We're not doing anything like that, as it would create an
> impossible space-management problem (or are you happy with ...
I wrote:
> We still don't know enough about the situation to know what a solution
> might look like. Is the slowdown Josh is seeing due to the extra CPU
> cost of the CRCs, or the extra I/O cost, or excessive locking of the
> WAL-related data structures while we do this stuff, or ???. Need more ...
"Zeugswetter Andreas DAZ SD" <[EMAIL PROTECTED]> writes:
> Only workable solution would imho be to write the LSN to each 512
> byte block (not that I am propagating that idea).
We're not doing anything like that, as it would create an impossible
space-management problem (or are you happy with lim
> Here's an idea:
>
> We read the page that we would have backed up, calc the CRC and
> write a short WAL record with just the CRC, not the block. When
> we recover we re-read the database page, calc its CRC and
> compare it with the CRC from the transaction log. If they
> differ, we know tha
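A rough sketch of the CRC-only tell-tale described in the quote above,
reusing the crc32() helper from the earlier sketch. The record layout and
function names are hypothetical, not anything in the PostgreSQL source.

/* Sketch: log a CRC of the page instead of a full backup block, and
 * compare it against the on-disk page at recovery time.  Hypothetical
 * names; crc32() is the bitwise helper from the earlier sketch. */
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 8192

uint32_t crc32(const void *buf, size_t len);  /* from the earlier sketch */

typedef struct {
    uint32_t rel_id;     /* which relation the CRC describes   */
    uint32_t block_no;   /* which block within that relation   */
    uint32_t page_crc;   /* CRC of the page at WAL-insert time */
} CrcOnlyRecord;

/* Emitted where a full 8 KB backup block would otherwise go. */
CrcOnlyRecord make_crc_record(uint32_t rel, uint32_t blk, const void *page)
{
    CrcOnlyRecord rec = { rel, blk, crc32(page, PAGE_SIZE) };
    return rec;
}

/* During redo: re-read the page from the data file and compare.
 * Nonzero means the on-disk page does not match what was logged, so
 * the write may have been torn and the page cannot be rebuilt from
 * WAL alone; detection only, no repair. */
int crc_record_mismatch(const CrcOnlyRecord *rec, const void *disk_page)
{
    return crc32(disk_page, PAGE_SIZE) != rec->page_crc;
}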
>> Are you sure about that? That would probably be the normal case, but
>> are you promised that the hardware will write all of the sectors of a
>> block in order?
>
> I don't think you can possibly assume that. If the block
> crosses a cylinder boundary then it's certainly an unsafe assumption ...
On Thu, 2005-07-07 at 00:29 -0400, Bruce Momjian wrote:
> Tom Lane wrote:
> > Bruno Wolff III <[EMAIL PROTECTED]> writes:
> > > Are you sure about that? That would probably be the normal case, but are
> > > you promised that the hardware will write all of the sectors of a block
> > > in order?
> >
On Wed, 2005-07-06 at 17:17 -0700, Joshua D. Drake wrote:
> >>Tom, I think you're the only person that could or would be trusted to
> >>make such a change. Even past the 8.1 freeze, I say we need to do
> >>something now on this issue.
> >
> >
> > I think if we document full_page_writes as similar to fsync in risk, we
> > are OK for 8.1, but if something can be done easily, it ...
Tom Lane wrote:
> Bruno Wolff III <[EMAIL PROTECTED]> writes:
> > Are you sure about that? That would probably be the normal case, but are
> > you promised that the hardware will write all of the sectors of a block
> > in order?
>
> I don't think you can possibly assume that. If the block crosses a
> cylinder boundary then it's certainly an unsafe assumption ...
Bruno Wolff III <[EMAIL PROTECTED]> writes:
> Are you sure about that? That would probably be the normal case, but are
> you promised that the hardware will write all of the sectors of a block
> in order?
I don't think you can possibly assume that. If the block crosses a
cylinder boundary then it's certainly an unsafe assumption ...
On Wed, Jul 06, 2005 at 21:48:44 +0100,
Simon Riggs <[EMAIL PROTECTED]> wrote:
>
> We could implement the torn-pages option, but that seems a lot of work.
> Another way of implementing a tell-tale would be to append the LSN again
> as a data page trailer as the last 4 bytes of the page. Thus the
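A toy illustration of the LSN-trailer tell-tale quoted above, under an
invented page layout (this is not PostgreSQL's page format): the page LSN
sits at the front of the page and its low 4 bytes are duplicated at the
very end, so a header/trailer mismatch after a crash flags a torn write.

/* Sketch of the LSN-trailer tell-tale.  Invented layout and names. */
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 8192

/* Stamp both copies of the LSN whenever the page is modified. */
void page_set_lsn(char page[PAGE_SIZE], uint64_t lsn)
{
    uint32_t tail = (uint32_t) lsn;  /* 4-byte trailer, as in the proposal */
    memcpy(page, &lsn, sizeof lsn);                             /* header  */
    memcpy(page + PAGE_SIZE - sizeof tail, &tail, sizeof tail); /* trailer */
}

/* After a crash, a mismatch proves the first and last 512-byte sectors
 * came from different writes, i.e. the page is torn.  Note the
 * limitation implied in the thread's skepticism: a tear confined to the
 * middle sectors leaves both copies equal and goes undetected. */
int page_lsn_mismatch(const char page[PAGE_SIZE])
{
    uint64_t head;
    uint32_t tail;
    memcpy(&head, page, sizeof head);
    memcpy(&tail, page + PAGE_SIZE - sizeof tail, sizeof tail);
    return (uint32_t) head != tail;
}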
Bruce Momjian writes:
> As far as #2, my posted proposal was to write the full pages to WAL when
> they are written to the file system, and not when they are first
> modified in the shared buffers ---
That is *completely* unworkable. Or were you planning to abandon the
promise that a transaction
Simon Riggs <[EMAIL PROTECTED]> writes:
> On Wed, 2005-07-06 at 18:22 -0400, Bruce Momjian wrote:
>> Well, I added #1 yesterday as 'full_page_writes', and it has the same
>> warnings as fsync (namely, on crash, be prepared to recover or check
>> your system thoroughly).
> Yes, which is why I comment now that the GUC alone is not enough.
Simon Riggs wrote:
> I agree we *must* have the GUC, but we also *must* have a way for crash
> recovery to tell us for certain that it has definitely worked, not just
> maybe worked.
Doesn't the same argument apply to the existing fsync = off case? i.e.
we already have a case where we don't provide ...
Tom, I think you're the only person that could or would be trusted to
make such a change. Even past the 8.1 freeze, I say we need to do
something now on this issue.
I think if we document full_page_writes as similar to fsync in risk, we
are OK for 8.1, but if something can be done easily, it
On Wed, 2005-07-06 at 18:22 -0400, Bruce Momjian wrote:
> Well, I added #1 yesterday as 'full_page_writes', and it has the same
> warnings as fsync (namely, on crash, be prepared to recover or check
> your system thoroughly).
Yes, which is why I comment now that the GUC alone is not enough.
There
Simon Riggs wrote:
> On Wed, 2005-06-29 at 23:23 -0400, Tom Lane wrote:
> > Josh Berkus writes:
> > >> Uh, what exactly did you cut out? I suggested dropping the dumping of
> > >> full page images, but not removing CRCs altogether ...
> >
> > > Attached is the patch I used.
> >
> > OK, thanks for the clarification. So it does seem that dumping full
> > page images is a pretty big hit these days.
On Wed, 2005-06-29 at 23:23 -0400, Tom Lane wrote:
> Josh Berkus writes:
> >> Uh, what exactly did you cut out? I suggested dropping the dumping of
> >> full page images, but not removing CRCs altogether ...
>
> > Attached is the patch I used.
>
> OK, thanks for the clarification. So it does seem that dumping full
> page images is a pretty big hit these days.
Greg Stark <[EMAIL PROTECTED]> writes:
> Tom Lane <[EMAIL PROTECTED]> writes:
>> Partial writes. Without the full-page image, we do not have enough
>> information in WAL to reconstruct the correct page contents.
> Sure, but why not?
> If an 8k page contains 16 low level segments on disk and the o
Tom Lane <[EMAIL PROTECTED]> writes:
> Greg Stark <[EMAIL PROTECTED]> writes:
> > Can someone explain exactly what the problem being defeated by writing whole
> > pages to the WAL log?
>
> Partial writes. Without the full-page image, we do not have enough
> information in WAL to reconstruct the correct page contents.
Greg Stark <[EMAIL PROTECTED]> writes:
> Can someone explain exactly what the problem being defeated by writing whole
> pages to the WAL log?
Partial writes. Without the full-page image, we do not have enough
information in WAL to reconstruct the correct page contents.
>> A further optimization
On Sun, 3 Jul 2005 04:47 pm, Greg Stark wrote:
>
> Bruce Momjian writes:
>
> > I have an idea! Currently we write the backup pages (copies of pages
> > modified since the last checkpoint) when we write the WAL changes as
> > part of the commit. See the XLogCheckBuffer() call in XLogInsert().
>
Bruce Momjian writes:
> I have an idea! Currently we write the backup pages (copies of pages
> modified since the last checkpoint) when we write the WAL changes as
> part of the commit. See the XLogCheckBuffer() call in XLogInsert().
Can someone explain exactly what the problem being defeated by writing
whole pages to the WAL log?
Tom Lane wrote:
> Josh Berkus writes:
> >> Uh, what exactly did you cut out? I suggested dropping the dumping of
> >> full page images, but not removing CRCs altogether ...
>
> > Attached is the patch I used.
>
> OK, thanks for the clarification. So it does seem that dumping full
> page images is a pretty big hit these days.
"Magnus Hagander" <[EMAIL PROTECTED]> writes:
>
> FWIW, MSSQL deals with this using "Torn Page Detection". This is off by
> default (no check at all!), but can be enabled on a per-database level.
> Note that it only *detects* torn pages. If it finds one, it won't start
> and tell you to recover from ...
Tom,
> > What I'm confused about is that this shouldn't be anything new for
> > 8.1. Yet 8.1 has *worse* performance on the STP machines than 8.0
> > does, and it's pretty much entirely due to this check.
>
> That's simply not believable --- better recheck your analysis. If 8.1
> is worse it's not ...
Josh Berkus writes:
> What I'm confused about is that this shouldn't be anything new for 8.1. Yet
> 8.1 has *worse* performance on the STP machines than 8.0 does, and it's
> pretty much entirely due to this check.
That's simply not believable --- better recheck your analysis. If 8.1
is worse it's not ...
Tom,
> Database pages. The current theory is that we can completely
> reconstruct from WAL data every page that's been modified since the
> last checkpoint. So the first write of any page after a checkpoint
> dumps a full image of the page into WAL; subsequent writes only write
> differences.
W
> 2. Think of a better defense against partial-page writes.
>
> I like #2, or would if I could think of a better defense.
> Ideas anyone?
FWIW, MSSQL deals with this using "Torn Page Detection". This is off by
default (no check at all!), but can be enabled on a per-database level.
Note that it only *detects* torn pages. If it finds one, it won't start
and tell you to recover from ...
Tom,
> 1. Offer a GUC to turn off full-page-image dumping, which you'd use only
> if you really trust your hardware :-(
Are these just WAL pages? Or database pages as well?
--
--Josh
Josh Berkus
Aglio Database Solutions
San Francisco
Josh Berkus writes:
>> 1. Offer a GUC to turn off full-page-image dumping, which you'd use only
>> if you really trust your hardware :-(
> Are these just WAL pages? Or database pages as well?
Database pages. The current theory is that we can completely
reconstruct from WAL data every page that's been modified since the
last checkpoint. So the first write of any page after a checkpoint
dumps a full image of the page into WAL; subsequent writes only write
differences.
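To make that rule concrete, here is a toy sketch of the decision. It is not
the actual xlog.c code (the real test is made from XLogInsert() via
XLogCheckBuffer()); the types, the redo-pointer bookkeeping, and the
function names below are invented for illustration.

/* Toy sketch: first modification of a page after a checkpoint carries a
 * full-page image, later modifications carry deltas only.  Invented
 * names and bookkeeping, not PostgreSQL code. */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t Lsn;

typedef struct {
    Lsn  page_lsn;   /* LSN of the last WAL record that touched this page */
    char data[8192];
} BufferPage;

static Lsn redo_ptr; /* WAL position of the most recent checkpoint */

/* A full image is needed when the page has not been logged since the
 * checkpoint: replay would otherwise start from whatever (possibly
 * torn) version of the page is on disk. */
static bool needs_full_page_image(const BufferPage *buf)
{
    return buf->page_lsn <= redo_ptr;
}

void log_page_change(BufferPage *buf, Lsn new_lsn,
                     const void *delta, size_t delta_len)
{
    (void) delta;
    (void) delta_len;

    if (needs_full_page_image(buf)) {
        /* ... emit a WAL record carrying the whole 8 KB page image ... */
    } else {
        /* ... emit a WAL record carrying only the delta ... */
    }
    buf->page_lsn = new_lsn;  /* the page now reflects this WAL record */
}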
Josh Berkus writes:
>> Uh, what exactly did you cut out? I suggested dropping the dumping of
>> full page images, but not removing CRCs altogether ...
> Attached is the patch I used.
OK, thanks for the clarification. So it does seem that dumping full
page images is a pretty big hit these days.
Tom,
> Uh, what exactly did you cut out? I suggested dropping the dumping of
> full page images, but not removing CRCs altogether ...
Attached is the patch I used. (it's a -urN patch 'cause that's what STP
takes)
--
--Josh
Josh Berkus
Aglio Database Solutions
San Francisco
diff -urN pgsql/s
Josh Berkus writes:
> Ok, finally managed though the peristent efforts of Mark Wong to get some
> tests through. Here are two tests with the CRC and wall buffer checking
> completely cut out of the code, as Tom suggested:
Uh, what exactly did you cut out? I suggested dropping the dumping of
full page images, but not removing CRCs altogether ...