Bruce Momjian wrote:
> Greg Smith wrote:
> > Kevin Grittner wrote:
> > > I assume that we send a full
> > > 8K to the OS cache, and the file system writes disk sectors
> > > according to its own algorithm. With either platters or BBU cache,
> > > the data is persisted on fsync; why do you see a risk with one but
> > > not the other?
Pierre C wrote:
>
> > Is that true? I have no idea. I thought everything was done at the
> > 512-byte block level.
>
> Newer disks (2TB and up) can have 4k sectors, but this still means a page
> spans several sectors.
Yes, I had heard about that.
Bruce Momjian wrote:
> Is that true? I have no idea. I thought everything was done at the
> 512-byte block level.
Newer disks (2TB and up) can have 4k sectors, but this still means a page
spans several sectors.
Greg Smith wrote:
> Tom Lane wrote:
> > You've got entirely too simplistic a view of what the "delta" might be,
> > I fear. In particular there are various sorts of changes that involve
> > inserting the data carried in the WAL record and shifting pre-existing
> > data around to make room, or removing an item and moving remaining
> > items around...
Kevin Grittner wrote:
> Greg Smith wrote:
>
> > I think Kevin's point here may be that if your fsync isn't
> > reliable, you're always in trouble. But if your fsync is good,
> > even torn pages should be repairable by the deltas written to the
> > WAL
>
> I was actually just arguing that a BBU doesn't eliminate a risk
> here...
Greg Smith wrote:
> Kevin Grittner wrote:
> > I assume that we send a full
> > 8K to the OS cache, and the file system writes disk sectors
> > according to its own algorithm. With either platters or BBU cache,
> > the data is persisted on fsync; why do you see a risk with one but
> > not the other?
On Fri, 29 Oct 2010, James Mansion wrote:
> Tom Lane wrote:
> > Uh, no, it is not. The difference is that we can update a byte in a
> > shared buffer, and know that it *isn't* getting written out before we...
> Well, I don't know where you got the idea I was referring to that sort
> of thing - it's the same as writing to a buffer before copying to the...
Tom Lane wrote:
> Uh, no, it is not. The difference is that we can update a byte in a
> shared buffer, and know that it *isn't* getting written out before we...
Well, I don't know where you got the idea I was referring to that sort of
thing - it's the same as writing to a buffer before copying to the...
On Fri, 29 Oct 2010, Robert Haas wrote:
On Thu, Oct 28, 2010 at 5:26 PM, Tom Lane wrote:
James Mansion writes:
Tom Lane wrote:
The other and probably worse problem is that there's no application
control over how soon changes to mmap'd pages get to disk. An msync
will flush them out, but the kernel is free to write dirty pages sooner.
Excerpts from Greg Smith's message of Thu Oct 21 14:04:17 -0300 2010:
> What I would like to do is beef up the documentation with some concrete
> examples of how to figure out if your cache and associated write path
> are working reliably or not. It should be possible to include "does
> this hardware..."
On Fri, Oct 29, 2010 at 11:56 AM, Aidan Van Dyk wrote:
> 1) The pages you write to must be in the page cache, or your memcpy is
> going to fault them in. With a plain write, you don't need the
> over-written page in the cache.
I seem to remember a time many years ago when I got bitten by this
problem...
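Aidan's first point can be seen directly in code. A small sketch, assuming
ordinary POSIX calls (function names are made up; error handling omitted):

#include <string.h>
#include <unistd.h>

/* Plain write(): the old page contents are never needed, so the page
 * need not be (and is not) faulted in from disk. */
void overwrite_with_write(int fd, const char *newpage, off_t off)
{
    pwrite(fd, newpage, 8192, off);
}

/* mmap path: if the target page is not resident, this store first
 * faults the OLD page in from disk - wasted I/O, since every byte of
 * it is about to be overwritten. */
void overwrite_with_mmap(char *map, const char *newpage, off_t off)
{
    memcpy(map + off, newpage, 8192);
}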
Robert Haas writes:
> On Thu, Oct 28, 2010 at 5:26 PM, Tom Lane wrote:
>> It's true that we don't know whether write() causes an immediate or
>> delayed disk write, but we generally don't care that much. What we do
>> care about is being able to ensure that a WAL write happens before the
>> data write.
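Tom's invariant is easy to sketch (illustrative code, not PostgreSQL's
actual implementation): the WAL record must be made durable before the data
page it describes is handed to the OS.

#include <unistd.h>

void write_page_wal_first(int walfd, int datafd,
                          const char *walrec, size_t walrec_len,
                          const char *page, size_t page_len, off_t page_off)
{
    write(walfd, walrec, walrec_len);         /* 1. append WAL record    */
    fsync(walfd);                             /* 2. the ordering barrier */
    pwrite(datafd, page, page_len, page_off); /* 3. only now, the page   */
    /* When the data write reaches disk is the kernel's business; after
     * a crash the durable WAL can reconstruct the page either way. */
}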
On Fri, Oct 29, 2010 at 11:43 AM, Robert Haas wrote:
> Well, we COULD keep the data in shared buffers, and then copy it into
> an mmap()'d region rather than calling write(), but I'm not sure
> there's any advantage to it. Managing address space mappings is a
> pain in the butt.
I could see this...
On Thu, Oct 28, 2010 at 5:26 PM, Tom Lane wrote:
> James Mansion writes:
>> Tom Lane wrote:
>>> The other and probably worse problem is that there's no application
>>> control over how soon changes to mmap'd pages get to disk. An msync
>>> will flush them out, but the kernel is free to write dirty pages sooner.
James Mansion writes:
> Tom Lane wrote:
>> The other and probably worse problem is that there's no application
>> control over how soon changes to mmap'd pages get to disk. An msync
>> will flush them out, but the kernel is free to write dirty pages sooner.
>> So if they're depending for consistency on writes not happening until
>> msync, they're likely to get burned.
Tom Lane wrote:
> The other and probably worse problem is that there's no application
> control over how soon changes to mmap'd pages get to disk. An msync
> will flush them out, but the kernel is free to write dirty pages sooner.
> So if they're depending for consistency on writes not happening until
> msync, they're likely to get burned.
On Wed, Oct 27, 2010 at 6:55 PM, Robert Haas wrote:
> On Wed, Oct 27, 2010 at 12:41 AM, Rob Wultsch wrote:
>> On Tue, Oct 26, 2010 at 7:25 AM, Robert Haas wrote:
>>> On Tue, Oct 26, 2010 at 10:13 AM, Rob Wultsch wrote:
The double write buffer is one of the few areas where InnoDB does more
IO (in the form of fsyncs) than PG.
On Wed, Oct 27, 2010 at 12:41 AM, Rob Wultsch wrote:
> On Tue, Oct 26, 2010 at 7:25 AM, Robert Haas wrote:
>> On Tue, Oct 26, 2010 at 10:13 AM, Rob Wultsch wrote:
>>> The double write buffer is one of the few areas where InnoDB does more
>>> IO (in the form of fsyncs) than PG. InnoDB also has fuzzy
>>> checkpoints (which help to keep dirty pages in memory longer)...
On Tue, Oct 26, 2010 at 7:25 AM, Robert Haas wrote:
> On Tue, Oct 26, 2010 at 10:13 AM, Rob Wultsch wrote:
>> The double write buffer is one of the few areas where InnoDB does more
>> IO (in the form of fsyncs) than PG. InnoDB also has fuzzy
>> checkpoints (which help to keep dirty pages in memory longer),
>> buffering of writing out changes to secondary indexes...
On Tue, Oct 26, 2010 at 10:13 AM, Rob Wultsch wrote:
> The double write buffer is one of the few areas where InnoDB does more
> IO (in the form of fsyncs) than PG. InnoDB also has fuzzy
> checkpoints (which help to keep dirty pages in memory longer),
> buffering of writing out changes to secondary indexes...
On Tue, Oct 26, 2010 at 5:41 AM, Robert Haas wrote:
> On Fri, Oct 22, 2010 at 3:05 PM, Kevin Grittner
> wrote:
>> Rob Wultsch wrote:
>>
>>> I would think full_page_writes=off + double write buffer should be
>>> far superior, particularly given that the WAL is shipped over the
>>> network to slaves.
On Fri, Oct 22, 2010 at 3:05 PM, Kevin Grittner
wrote:
> Rob Wultsch wrote:
>
>> I would think full_page_writes=off + double write buffer should be
>> far superior, particularly given that the WAL is shipped over the
>> network to slaves.
>
> For a reasonably brief description of InnoDB double write buffers, I
> found this: http://www.mysqlperformanceblog.co
On Oct 22, 2010, at 1:06 PM, Rob Wultsch wrote:
> On Fri, Oct 22, 2010 at 12:05 PM, Kevin Grittner
> wrote:
>> Rob Wultsch wrote:
>>
>>> I would think full_page_writes=off + double write buffer should be
>>> far superior, particularly given that the WAL is shipped over the
>>> network to slaves.
Greg Smith writes:
> James Mansion wrote:
>> When I looked at the internals of TokyoCabinet for example, the design
>> was flawed but
>> would be 'fairly robust' so long as mmap'd pages that were dirtied did
>> not get persisted
>> until msync, and were then persisted atomically.
> If TokyoCabinet presumes that's true and overwrites...
Jesper Krogh wrote:
> Can you point to some ZFS docs that tell that this is the case.. I'd
> be surprised if it doesn't copy away the old block and replace it with
> the new one in-place. The other behaviour would quite quickly lead to a
> hugely fragmented filesystem that performs next to useless and...
James Mansion wrote:
> When I looked at the internals of TokyoCabinet for example, the design
> was flawed but would be 'fairly robust' so long as mmap'd pages that
> were dirtied did not get persisted until msync, and were then persisted
> atomically.
If TokyoCabinet presumes that's true and overwrites...
Kevin Grittner wrote:
> On what do you base that assumption? I assume that we send a full
> 8K to the OS cache, and the file system writes disk sectors
> according to its own algorithm. With either platters or BBU cache,
> the data is persisted on fsync; why do you see a risk with one but
> not the other?
Even if it's possible, it's far from clear to me that it would be an
improvement. The author estimates (apparently somewhat loosely)
that it's a 5% to 10% performance hit in InnoDB; I'm far from
certain that full_page_writes cost us that much. Does anyone have
benchmark numbers handy?
It mos...
Rob Wultsch wrote:
> I really would like to work with PG more and this seems like
> [full_page_writes] would be a significant hindrance for certain
> usage patterns. Lots of replication does not take place over gig...
Certainly most of the Wisconsin State Courts replication takes place
over WAN...
On Fri, Oct 22, 2010 at 1:15 PM, Kevin Grittner
wrote:
> Rob Wultsch wrote:
>
>> not needing full_page_writes would make geographically dispersed
>> replication possible for certain cases where it is not currently
>> (or at least rather painful).
>
> Do you have any hard numbers on WAL file size impact?
Rob Wultsch wrote:
> not needing full_page_writes would make geographically dispersed
> replication possible for certain cases where it is not currently
> (or at least rather painful).
Do you have any hard numbers on WAL file size impact? How much does
pglesslog help in a file-based WAL transmission...
On Fri, Oct 22, 2010 at 12:05 PM, Kevin Grittner
wrote:
> Rob Wultsch wrote:
>
>> I would think full_page_writes=off + double write buffer should be
>> far superior, particularly given that the WAL is shipped over the
>> network to slaves.
>
> For a reasonably brief description of InnoDB double write buffers, I
> found this: http://www.mysqlperformanceblog.co
Rob Wultsch wrote:
> I would think full_page_writes=off + double write buffer should be
> far superior, particularly given that the WAL is shipped over the
> network to slaves.
For a reasonably brief description of InnoDB double write buffers, I
found this:
http://www.mysqlperformanceblog.co
On Fri, Oct 22, 2010 at 10:28 AM, Kevin Grittner
wrote:
> Rob Wultsch wrote:
>
>> has PG considered using a double write buffer similar to InnoDB?
>
> That seems inferior to the full_page_writes strategy, where you only
> write a page twice the first time it is written after a checkpoint.
> We're talking about when we might be able to write *less*, not more.
Rob Wultsch wrote:
> has PG considered using a double write buffer similar to InnoDB?
That seems inferior to the full_page_writes strategy, where you only
write a page twice the first time it is written after a checkpoint.
We're talking about when we might be able to write *less*, not more.
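For readers following along, a rough sketch of the double-write sequence as
described in this thread (hypothetical code, not InnoDB's; names and layout
are made up):

#include <unistd.h>

#define PAGE_SIZE 8192

/* Write npages dirty pages: first sequentially into a scratch area
 * (the "double write buffer"), fsync it, then write each page to its
 * home location and fsync again.  After a crash, a page torn at its
 * home location can be restored from the intact scratch copy. */
void double_write(int datafd, int dblwrfd,
                  const char pages[][PAGE_SIZE],
                  const off_t offsets[], int npages)
{
    for (int i = 0; i < npages; i++)
        pwrite(dblwrfd, pages[i], PAGE_SIZE, (off_t) i * PAGE_SIZE);
    fsync(dblwrfd);                 /* copies durable first...           */

    for (int i = 0; i < npages; i++)
        pwrite(datafd, pages[i], PAGE_SIZE, offsets[i]);
    fsync(datafd);                  /* ...then the in-place writes       */
}

The second fsync per batch is the extra IO Rob mentions; full_page_writes
instead pays by logging a complete page image on the first touch of each
page after a checkpoint.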
On Fri, Oct 22, 2010 at 8:37 AM, Greg Smith wrote:
> Tom Lane wrote:
>>
>> You've got entirely too simplistic a view of what the "delta" might be,
>> I fear. In particular there are various sorts of changes that involve
>> inserting the data carried in the WAL record and shifting pre-existing
>> data around to make room, or removing an item and moving remaining
>> items around...
On 2010-10-22 17:37, Greg Smith wrote:
I think that most people who have thought they were safe to turn off
full_page_writes in the past did so because they believed they were
in category (1) here. I've never advised anyone to do that, because
it's so difficult to validate the truth of. Just...
Tom Lane wrote:
You've got entirely too simplistic a view of what the "delta" might be,
I fear. In particular there are various sorts of changes that involve
inserting the data carried in the WAL record and shifting pre-existing
data around to make room, or removing an item and moving remaining
items around...
Greg Smith wrote:
> I think Kevin's point here may be that if your fsync isn't
> reliable, you're always in trouble. But if your fsync is good,
> even torn pages should be repairable by the deltas written to the
> WAL
I was actually just arguing that a BBU doesn't eliminate a risk
here; if the...
Kevin Grittner wrote:
> With either platters or BBU cache,
> the data is persisted on fsync; why do you see a risk with one but
> not the other?
Forgot to address this part. The troublesome sequence if you don't have
a BBU is:
1) WAL data is written to the OS cache
2) PG calls fsync
3) Data is transferred into the drive's cache...
Greg Smith writes:
> At this point, you now have a torn 8K page, with 1/2 old and 1/2 new
> data.
Right.
> Without a full page write in the WAL, is it always possible to
> restore its original state now? In theory, I think you do. Since the
> delta in the WAL should be overwriting all of the...
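To make the two replay cases concrete, a hypothetical sketch (not
PostgreSQL's actual redo code; names are made up). Full-page-image replay
is insensitive to a torn page, while the delta case silently assumes the
page's prior state - the objection Tom raised earlier in the thread.

#include <string.h>

#define PAGE_SIZE 8192

/* Full-page-image replay: the torn page's contents are irrelevant,
 * because every byte is overwritten. */
void redo_full_page_image(char page[PAGE_SIZE], const char fpi[PAGE_SIZE])
{
    memcpy(page, fpi, PAGE_SIZE);
}

/* Delta replay: correctness depends on the page already being in the
 * state the record was generated against.  If the first half of the
 * page is new and the second half old, the memmove below shifts the
 * wrong bytes. */
void redo_insert_item(char page[PAGE_SIZE], int off, const char *item, int len)
{
    memmove(page + off + len, page + off, PAGE_SIZE - off - len);
    memcpy(page + off, item, len);
}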
Kevin Grittner wrote:
> I assume that we send a full
> 8K to the OS cache, and the file system writes disk sectors
> according to its own algorithm. With either platters or BBU cache,
> the data is persisted on fsync; why do you see a risk with one but
> not the other?
I'd like a 10 minute argument please.
On Thursday 21 October 2010 21:42:06 Kevin Grittner wrote:
> Bruce Momjian wrote:
> > I assume we send a full 8k to the controller, and a failure during
> > that write is not registered as a write.
>
> On what do you base that assumption? I assume that we send a full
> 8K to the OS cache, and the file system writes disk sectors
> according to its own algorithm...
Kevin Grittner wrote:
> Bruce Momjian wrote:
>
> > I assume we send a full 8k to the controller, and a failure during
> > that write is not registered as a write.
>
> On what do you base that assumption? I assume that we send a full
> 8K to the OS cache, and the file system writes disk sectors
> according to its own algorithm...
Bruce Momjian wrote:
> I assume we send a full 8k to the controller, and a failure during
> that write is not registered as a write.
On what do you base that assumption? I assume that we send a full
8K to the OS cache, and the file system writes disk sectors
according to its own algorithm. With either platters or BBU cache,
the data is persisted on fsync; why do you see a risk with one but
not the other?
Kevin Grittner wrote:
> Bruce Momjian wrote:
>
> > If the write fails to the controller, the page is not flushed and
> > PG does not continue. If the write fails, the fsync never
> > happens, and hence PG stops.
>
> PG stops? This case at issue is when the OS crashes or the plug is
> pulled in the middle of writing a page...
Bruce Momjian wrote:
> If the write fails to the controller, the page is not flushed and
> PG does not continue. If the write fails, the fsync never
> happens, and hence PG stops.
PG stops? This case at issue is when the OS crashes or the plug is
pulled in the middle of writing a page. I do...
Kevin Grittner wrote:
> Greg Smith wrote:
> > Kevin Grittner wrote:
>
> >> So you're confident that an 8kB write to the controller will not
> >> be done as a series of smaller atomic writes by the OS file
> >> system?
> >
> > Sure, that happens. But if the BBU has gotten an fsync call after
> > the 8K write, it shouldn't return success...
Greg Smith wrote:
> Kevin Grittner wrote:
>> So you're confident that an 8kB write to the controller will not
>> be done as a series of smaller atomic writes by the OS file
>> system?
>
> Sure, that happens. But if the BBU has gotten an fsync call after
> the 8K write, it shouldn't return success...
Kevin Grittner wrote:
> Bruce Momjian wrote:
> > full_page_writes is designed to guard against a partial write to a
> > device. I don't think the raid cache can be partially written to
> So you're confident that an 8kB write to the controller will not be
> done as a series of smaller atomic writes by the OS file system?
Bruce Momjian wrote:
> full_page_writes is designed to guard against a partial write to a
> device. I don't think the raid cache can be partially written to
So you're confident that an 8kB write to the controller will not be
done as a series of smaller atomic writes by the OS file system?
Bruce Momjian wrote:
> With a BBU you can turn off full_page_writes, which should decrease the
> WAL traffic.
> However, I don't see this mentioned in our documentation. Should I add
> it?
What I would like to do is beef up the documentation with some concrete
examples of how to figure out if your cache and associated write path
are working reliably or not. It should be possible to include "does
this hardware..."
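In that spirit, one concrete test might look like the bare-bones sketch
below (illustrative only, not an official tool): time a burst of tiny
write-plus-fsync cycles. A bare 7200 rpm drive can honestly complete only
on the order of its rotation rate (~120 fsyncs/sec); rates in the thousands
mean a cache is absorbing the flushes, and the question becomes whether
that cache survives power loss.

#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    int fd = open("fsync_test_file", O_WRONLY | O_CREAT, 0600);
    const int n = 1000;
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < n; i++) {
        pwrite(fd, "x", 1, 0);      /* tiny rewrite of the same sector */
        fsync(fd);                  /* ask for it to be made durable   */
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%.0f fsyncs/sec\n", n / secs);

    close(fd);
    unlink("fsync_test_file");
    return 0;
}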
Kevin Grittner wrote:
> Bruce Momjian wrote:
>
> > With a BBU you can turn off full_page_writes
>
> My understanding is that that is not without risk. What happens if
> the WAL is written, there is a commit, but the data page has not yet
> been written to the controller? Don't we still have a torn page?
Bruce Momjian wrote:
> With a BBU you can turn off full_page_writes
My understanding is that that is not without risk. What happens if
the WAL is written, there is a commit, but the data page has not yet
been written to the controller? Don't we still have a torn page?
-Kevin
Scott Marlowe wrote:
> On Wed, Oct 20, 2010 at 8:25 PM, Joshua D. Drake
> wrote:
> > On Wed, 2010-10-20 at 22:13 -0400, Bruce Momjian wrote:
> >> Ben Chobot wrote:
> >> > On Oct 7, 2010, at 4:38 PM, Steve Crawford wrote:
> >> >
> >> > > I'm weighing options for a new server. In addition to PostgreSQL,
> >> > > this machine will handle some modest Samba and Rsync load...
On 10/20/2010 09:45 PM, Scott Marlowe wrote:
> On Wed, Oct 20, 2010 at 8:25 PM, Joshua D. Drake wrote:
> > On Wed, 2010-10-20 at 22:13 -0400, Bruce Momjian wrote:
> > > Ben Chobot wrote:
> > > > On Oct 7, 2010, at 4:38 PM, Steve Crawford wrote:
> > > > > I'm weighing options for a new server...
On Wed, Oct 20, 2010 at 8:25 PM, Joshua D. Drake wrote:
> On Wed, 2010-10-20 at 22:13 -0400, Bruce Momjian wrote:
>> Ben Chobot wrote:
>> > On Oct 7, 2010, at 4:38 PM, Steve Crawford wrote:
>> >
>> > > I'm weighing options for a new server. In addition to PostgreSQL, this
>> > > machine will handle some modest Samba and Rsync load...
On Wed, 2010-10-20 at 22:13 -0400, Bruce Momjian wrote:
> Ben Chobot wrote:
> > On Oct 7, 2010, at 4:38 PM, Steve Crawford wrote:
> >
> > > I'm weighing options for a new server. In addition to PostgreSQL, this
> > > machine will handle some modest Samba and Rsync load.
> > >
> > > I will have enough RAM so that virtually all disk-read activity
> > > will be cached...
Ben Chobot wrote:
> On Oct 7, 2010, at 4:38 PM, Steve Crawford wrote:
>
> > I'm weighing options for a new server. In addition to PostgreSQL, this
> > machine will handle some modest Samba and Rsync load.
> >
> > I will have enough RAM so that virtually all disk-read activity will
> > be cached.
On Oct 7, 2010, at 4:38 PM, Steve Crawford wrote:
> I'm weighing options for a new server. In addition to PostgreSQL, this
> machine will handle some modest Samba and Rsync load.
>
> I will have enough RAM so that virtually all disk-read activity will be
> cached. The average PostgreSQL read activity...