On 09.07.18 15:49, Heikki Linnakangas wrote:
> The way
> forward is to test if we can get the same performance benefit from
> switching to CMPXCHG16B, and keep the WAL format unchanged.
I have implemented this. I didn't see any performance benefit using the
benchmark that Alexander Kuzmenkov ou
On 09/07/18 18:57, Simon Riggs wrote:
On 9 July 2018 at 14:49, Heikki Linnakangas wrote:
I'll mark this as "returned with feedback" in the commitfest. The way
forward is to test if we can get the same performance benefit from switching
to CMPXCHG16B, and keep the WAL format unchanged. If not, t
On 9 July 2018 at 14:49, Heikki Linnakangas wrote:
> On 03/04/18 19:20, Andres Freund wrote:
>>
>> On 2018-04-03 09:56:24 -0400, Tom Lane wrote:
>>>
>>> Heikki Linnakangas writes:
But let's go back to why we're considering this. The idea was to
optimize this block:
...
On
On 03/04/18 19:20, Andres Freund wrote:
On 2018-04-03 09:56:24 -0400, Tom Lane wrote:
Heikki Linnakangas writes:
But let's go back to why we're considering this. The idea was to
optimize this block:
...
One trick that we could do is to replace that with a 128-bit atomic
compare-and-swap instru
On 2018-04-03 09:56:24 -0400, Tom Lane wrote:
> Heikki Linnakangas writes:
> > But let's go back to why we're considering this. The idea was to
> > optimize this block:
> > ...
> > One trick that we could do is to replace that with a 128-bit atomic
> > compare-and-swap instruction. Modern 64-bit
Heikki Linnakangas writes:
> But let's go back to why we're considering this. The idea was to
> optimize this block:
> ...
> One trick that we could do is to replace that with a 128-bit atomic
> compare-and-swap instruction. Modern 64-bit Intel systems have that,
> it's called CMPXCHG16B. Don't
On Thu, Mar 29, 2018 at 5:18 PM, Tom Lane wrote:
> Tomas Vondra writes:
>> If each WAL record has xl_curr, then we know to which position the
>> record belongs (after verifying the checksum). And we do know the size
>> of each WAL record, so we should be able to deduce if two records are
>> immed
On 27/03/18 21:02, Tom Lane wrote:
Robert Haas writes:
On Tue, Mar 27, 2018 at 11:41 AM, Tom Lane wrote:
This is ignoring the possibility of damaged data in between, ie
A ... B ... CHKPT ... C ... a few zeroed pages ... D ... CHKPT ... E ... F
It's hard for me to believe that this case ma
On 29 March 2018 at 23:16, Tomas Vondra wrote:
>
>
> On 03/29/2018 11:18 PM, Tom Lane wrote:
>> Tomas Vondra writes:
>>> If each WAL record has xl_curr, then we know to which position the
>>> record belongs (after verifying the checksum). And we do know the size
>>> of each WAL record, so we shou
On 03/29/2018 11:18 PM, Tom Lane wrote:
> Tomas Vondra writes:
>> If each WAL record has xl_curr, then we know to which position the
>> record belongs (after verifying the checksum). And we do know the size
>> of each WAL record, so we should be able to deduce if two records are
>> immediately a
Tomas Vondra writes:
> If each WAL record has xl_curr, then we know to which position the
> record belongs (after verifying the checksum). And we do know the size
> of each WAL record, so we should be able to deduce if two records are
> immediately after each other.
Per my point earlier, XLOG_SWI
On 29 March 2018 at 21:03, Tomas Vondra wrote:
>
> On 03/29/2018 08:02 PM, Robert Haas wrote:
>> On Thu, Mar 29, 2018 at 1:13 PM, Tomas Vondra
>> wrote:
>>> On 03/29/2018 06:42 PM, Tom Lane wrote:
The long and the short of it is that this is a very dangerous-looking
proposal, we are at
On 29 March 2018 at 19:02, Robert Haas wrote:
>
>>> Also, I will say it once more: this change DOES decrease robustness.
>>> It's like blockchain without the chain aspect, or git commits without
>>> a parent pointer. We are not only interested in whether individual
>>> WAL records are valid, but
On 03/29/2018 08:02 PM, Robert Haas wrote:
> On Thu, Mar 29, 2018 at 1:13 PM, Tomas Vondra
> wrote:
>> On 03/29/2018 06:42 PM, Tom Lane wrote:
>>> The long and the short of it is that this is a very dangerous-looking
>>> proposal, we are at the tail end of a development cycle, and there are
>>> ~
On 29 March 2018 at 18:13, Tomas Vondra wrote:
> On 03/29/2018 06:42 PM, Tom Lane wrote:
>> Simon Riggs writes:
>>> I know the approach is new and surprising but I thought about it a lot
>>> before proposing it and I couldn't see any holes; still can't. Please
>>> give this some thought so we can
On Thu, Mar 29, 2018 at 1:13 PM, Tomas Vondra
wrote:
> On 03/29/2018 06:42 PM, Tom Lane wrote:
>> The long and the short of it is that this is a very dangerous-looking
>> proposal, we are at the tail end of a development cycle, and there are
>> ~100 other patches remaining in the commitfest that a
On 03/29/2018 06:42 PM, Tom Lane wrote:
> Simon Riggs writes:
>> I know the approach is new and surprising but I thought about it a lot
>> before proposing it and I couldn't see any holes; still can't. Please
>> give this some thought so we can get comfortable with this idea and
>> increase perfor
Simon Riggs writes:
> I know the approach is new and surprising but I thought about it a lot
> before proposing it and I couldn't see any holes; still can't. Please
> give this some thought so we can get comfortable with this idea and
> increase performance as a result. Thanks.
The long and the s
On 29 March 2018 at 01:50, Robert Haas wrote:
> On Tue, Mar 27, 2018 at 11:53 PM, Pavan Deolasee
> wrote:
>> Yeah, we should not do that. The patch surely does not intend to replay any
>> more WAL than what we do today. The idea is to just use a different
>> mechanism to find the prior checkpoint
On Wed, Mar 28, 2018 at 08:50:21PM -0400, Robert Haas wrote:
> On Tue, Mar 27, 2018 at 11:53 PM, Pavan Deolasee
> wrote:
>> Yeah, we should not do that. The patch surely does not intend to replay any
>> more WAL than what we do today. The idea is to just use a different
>> mechanism to find the p
On Tue, Mar 27, 2018 at 11:53 PM, Pavan Deolasee
wrote:
> Yeah, we should not do that. The patch surely does not intend to replay any
> more WAL than what we do today. The idea is to just use a different
> mechanism to find the prior checkpoint. But we should surely find the latest
> prior checkpo
On Wed, Mar 28, 2018 at 02:13:19PM +1030, Andrew Dunstan wrote:
> On Wed, Mar 28, 2018 at 12:57 PM, Michael Paquier wrote:
>> Something that would address the issue would be to enforce a segment
>> switch after each checkpoint, but that's a high price to pay on mostly
>> idle systems with large WA
On Wed, Mar 28, 2018 at 7:36 AM, Michael Paquier
wrote:
> On Tue, Mar 27, 2018 at 02:02:10PM -0400, Tom Lane wrote:
>
> >
> > Well, the point of checkpoints is that WAL data before the last one
> > should no longer matter anymore, isn't it?
>
> I have to agree with Tom here. If you force pg_rewi
On Wed, Mar 28, 2018 at 12:57 PM, Michael Paquier wrote:
> On Tue, Mar 27, 2018 at 10:09:59PM -0400, Robert Haas wrote:
>>> I have to agree with Tom here. If you force pg_rewind to replay more
>>> WAL records from a checkpoint older than the checkpoint prior to where
>>> WAL has forked at promoti
On Tue, Mar 27, 2018 at 10:09:59PM -0400, Robert Haas wrote:
>> I have to agree with Tom here. If you force pg_rewind to replay more
>> WAL records from a checkpoint older than the checkpoint prior to where
>> WAL has forked at promotion then you have a risk of losing data.
>
> Oh! I see now. G
On Tue, Mar 27, 2018 at 10:06 PM, Michael Paquier wrote:
> On Tue, Mar 27, 2018 at 02:02:10PM -0400, Tom Lane wrote:
>> Robert Haas writes:
>>> On Tue, Mar 27, 2018 at 11:41 AM, Tom Lane wrote:
This is ignoring the possibility of damaged data in between, ie
A ... B ... CHKPT ... C ...
On Tue, Mar 27, 2018 at 02:02:10PM -0400, Tom Lane wrote:
> Robert Haas writes:
>> On Tue, Mar 27, 2018 at 11:41 AM, Tom Lane wrote:
>>> This is ignoring the possibility of damaged data in between, ie
>>> A ... B ... CHKPT ... C ... a few zeroed pages ... D ... CHKPT ... E ... F
>
>> It's hard
On Wed, Mar 28, 2018 at 1:21 AM, Pavan Deolasee
wrote:
>
>
> TBH I still don't see why this does not provide the same guarantee that the
> current code provides, but given the concerns expressed by others, I am not
> gonna pursue beyond a point. But one last time :-)
>
> The current code uses xl_p
Robert Haas writes:
> On Tue, Mar 27, 2018 at 11:41 AM, Tom Lane wrote:
>> This is ignoring the possibility of damaged data in between, ie
>> A ... B ... CHKPT ... C ... a few zeroed pages ... D ... CHKPT ... E ... F
> It's hard for me to believe that this case matters very much. If
> you're t
On Tue, Mar 27, 2018 at 11:41 AM, Tom Lane wrote:
> Pavan Deolasee writes:
>> On Tue, Mar 27, 2018 at 7:28 PM, Tom Lane wrote:
>>> If you have to search backwards, this breaks it. Full stop.
>
>> We don't really need to fetch the previous record. We really need to find
>> the last checkpoint pr
Pavan Deolasee writes:
> On Tue, Mar 27, 2018 at 7:28 PM, Tom Lane wrote:
>> If you have to search backwards, this breaks it. Full stop.
> We don't really need to fetch the previous record. We really need to find
> the last checkpoint prior to a given LSN. That can be done by reading WAL
> segm
Simon Riggs writes:
> On 27 March 2018 at 14:58, Tom Lane wrote:
>> The point of the xl_prev links is, essentially, to allow verification
>> that we are reading a coherent series of WAL records, ie that record B
>> which appears to follow record A actually is supposed to follow it.
>> This throws
On Tue, Mar 27, 2018 at 7:28 PM, Tom Lane wrote:
>
>
> I had not noticed that in the kerfuffle over the more extreme version,
> but I think this is a different route to breaking the same guarantees.
> The point of the xl_prev links is, essentially, to allow verification
> that we are reading a co
On 27 March 2018 at 14:58, Tom Lane wrote:
> The point of the xl_prev links is, essentially, to allow verification
> that we are reading a coherent series of WAL records, ie that record B
> which appears to follow record A actually is supposed to follow it.
> This throws away that cross-check jus
Robert Haas writes:
> Taking a look at this version, I think the key thing we have to decide
> is whether we're comfortable with this:
> --- a/src/include/access/xlogrecord.h
> +++ b/src/include/access/xlogrecord.h
> @@ -42,7 +42,7 @@ typedef struct XLogRecord
> {
> uint32 xl_tot_len;
On Tue, Mar 27, 2018 at 6:35 AM, Andrew Dunstan
wrote:
> Leaving aside the arguments about process, the patch is pretty small
> and fairly straightforward. Given the claimed performance gains that's
> a nice bang for the buck.
>
> I haven't seen any obvious holes, but this is surely a case for as
On Sat, Mar 24, 2018 at 11:01 PM, Pavan Deolasee
wrote:
>
>
> TBH this is not a heavily redesigned version. There were two parts to the
> original patch:
>
> 1. Replacing 8-byte xl_prev with 2-byte xl_walid in order to reduce the size
> of the WAL record header
> 2. Changing XLogCtlInsert.CurrByte
On Sat, Mar 24, 2018 at 12:37:21PM +, Simon Riggs wrote:
> If you have anything further to say, make it a formal complaint
> offlist so we can discuss engineering here.
Commit fest operations and patch reviews concern the community. There
is always something to learn for other people followin
On 24 March 2018 at 12:19, Robert Haas wrote:
> On Sat, Mar 24, 2018 at 8:11 AM, Simon Riggs wrote:
>> On 24 March 2018 at 11:58, Robert Haas wrote:
>>> On Sat, Mar 24, 2018 at 7:42 AM, Simon Riggs wrote:
I suggest we focus on the engineering. I've not discussed this patch
with Pavan
On Sat, Mar 24, 2018 at 5:28 PM, Robert Haas wrote:
>
> This is a patch about which multiple people have expressed concerns.
> You're trying to jam a heavily redesigned version in at the last
> minute without adequate review. Please don't do that.
>
>
TBH this is not a heavily redesigned version
On Sat, Mar 24, 2018 at 8:11 AM, Simon Riggs wrote:
> On 24 March 2018 at 11:58, Robert Haas wrote:
>> On Sat, Mar 24, 2018 at 7:42 AM, Simon Riggs wrote:
>>> I suggest we focus on the engineering. I've not discussed this patch
>>> with Pavan offline.
>>
>> Well then proposing to commit moments
On 24 March 2018 at 11:58, Robert Haas wrote:
> On Sat, Mar 24, 2018 at 7:42 AM, Simon Riggs wrote:
>> I suggest we focus on the engineering. I've not discussed this patch
>> with Pavan offline.
>
> Well then proposing to commit moments after it's been posted is
> ridiculous. That's exactly the
On Sat, Mar 24, 2018 at 7:42 AM, Simon Riggs wrote:
> I suggest we focus on the engineering. I've not discussed this patch
> with Pavan offline.
Well then proposing to commit moments after it's been posted is
ridiculous. That's exactly the opposite of "focusing on the
engineering".
This is a pa
I suggest we focus on the engineering. I've not discussed this patch
with Pavan offline.
On 23 March 2018 at 23:32, Michael Paquier wrote:
> On Fri, Mar 23, 2018 at 11:06:48AM +, Simon Riggs wrote:
>> Your assumption that I would commit a new patch that was 29 mins old
>> is frankly pretty ri
On Fri, Mar 23, 2018 at 11:06:48AM +, Simon Riggs wrote:
> Your assumption that I would commit a new patch that was 29 mins old
> is frankly pretty ridiculous, so yes, lets keep calm.
When a committer says that a patch is "ready for commit" and that he
calls for "last objections", I am underst
On 2018-03-23 11:06:48 +, Simon Riggs wrote:
> Your assumption that I would commit a new patch that was 29 mins old
> is frankly pretty ridiculous, so yes, lets keep calm.
Uh.
On 23 March 2018 at 09:22, Michael Paquier wrote:
> On Fri, Mar 23, 2018 at 09:04:55AM +, Simon Riggs wrote:
>> So it shows clear benefit for both bulk actions and OLTP, with no
>> regressions.
>>
>> No objection exists to the approach used in the patch, so I'm now
>> ready to commit this.
>>
On Fri, Mar 23, 2018 at 09:04:55AM +, Simon Riggs wrote:
> So it shows clear benefit for both bulk actions and OLTP, with no regressions.
>
> No objection exists to the approach used in the patch, so I'm now
> ready to commit this.
>
> Last call for objections?
Please hold on. It is Friday
On 23 March 2018 at 08:35, Pavan Deolasee wrote:
>
>
> On Fri, Mar 9, 2018 at 8:49 PM, Peter Eisentraut
> wrote:
>>
>> On 2/1/18 19:21, Simon Riggs wrote:
>> > If we really can't persuade you of that, it doesn't sink the patch. We
>> > can have the WAL pointer itself - it wouldn't save space but
On 2 February 2018 at 02:17, Michael Paquier wrote:
> On Fri, Feb 02, 2018 at 12:21:49AM +, Simon Riggs wrote:
>> Yes, it would be about 99% of the time.
>
> When it comes to recovery, I don't think that 99% is a guarantee
> sufficient. (Wondering about the maths behind such a number as well.
On Fri, Mar 9, 2018 at 8:49 PM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:
> On 2/1/18 19:21, Simon Riggs wrote:
> > If we really can't persuade you of that, it doesn't sink the patch. We
> > can have the WAL pointer itself - it wouldn't save space but it would
> > at least allevi
On 2/1/18 19:21, Simon Riggs wrote:
> If we really can't persuade you of that, it doesn't sink the patch. We
> can have the WAL pointer itself - it wouldn't save space but it would
> at least alleviate the spinlock.
Do you want to send in an alternative patch that preserves the WAL
pointer and onl
On Mon, Feb 12, 2018 at 04:33:51PM +0530, Pavan Deolasee wrote:
> I'm not sure if Michael has spotted a real problem or was that just a
> concern. He himself later rightly pointed out that when a WAL file is
> switched, the old file is filled with zeros. So I don't see a problem
> there. May be I a
On Fri, Feb 2, 2018 at 9:07 PM, Robert Haas wrote:
> On Thu, Feb 1, 2018 at 7:21 PM, Simon Riggs wrote:
> > Yes, it would be about 99% of the time.
> >
> > But you have it backwards - we are not assuming that case. That is the
> > only case that has risk - the one where an old WAL record starts
On Thu, Feb 1, 2018 at 7:21 PM, Simon Riggs wrote:
> Yes, it would be about 99% of the time.
>
> But you have it backwards - we are not assuming that case. That is the
> only case that has risk - the one where an old WAL record starts at
> exactly the place the latest one stops. Otherwise the rest
On Fri, Feb 02, 2018 at 12:21:49AM +, Simon Riggs wrote:
> Yes, it would be about 99% of the time.
When it comes to recovery, I don't think that 99% is a guarantee
sufficient. (Wondering about the maths behind such a number as well.)
> But you have it backwards - we are not assuming that cas
On 1 February 2018 at 18:55, Robert Haas wrote:
> On Thu, Feb 1, 2018 at 1:00 AM, Pavan Deolasee
> wrote:
>> IMHO we're missing a point here. When xl_prev is changed to a 2-byte value
>> (as the patch suggests), the risk only comes when an old WAL file is
>> recycled for some future WAL file and
On Thu, Feb 1, 2018 at 1:00 AM, Pavan Deolasee wrote:
> IMHO we're missing a point here. When xl_prev is changed to a 2-byte value
> (as the patch suggests), the risk only comes when an old WAL file is
> recycled for some future WAL file and the old and the future WAL file's
> segment numbers shar
On Thu, Feb 1, 2018 at 11:05 AM, Michael Paquier
wrote:
> On Sat, Jan 13, 2018 at 03:40:01PM +, Simon Riggs wrote:
> > The new two byte value is protected by CRC. The 2 byte value repeats
> > every 32768 WAL files. Any bit error in that value that made it appear
> > to be a current value woul
On Sat, Jan 13, 2018 at 03:40:01PM +, Simon Riggs wrote:
> The new two byte value is protected by CRC. The 2 byte value repeats
> every 32768 WAL files. Any bit error in that value that made it appear
> to be a current value would need to have a rare set of circumstances.
If you use the two lo
On 12 January 2018 at 15:45, Tom Lane wrote:
>> I have some reservations about whether this makes the mechanism less
>> reliable.
>
> Yeah, it scares me too. The xl_prev field is our only way of detecting
> that we're looking at old WAL data when we cross a sector boundary.
> I have no faith tha
On Fri, Jan 12, 2018 at 10:49 PM, Tom Lane wrote:
> Claudio Freire writes:
> > On Sat, Dec 30, 2017 at 7:32 AM, Simon Riggs
> wrote:
> >> So we can't completely remove xl_prev field, without giving up some
> >> functionality.
>
> > Or, you can use the lower 16-bits of the previous record's CRC
Claudio Freire writes:
> On Sat, Dec 30, 2017 at 7:32 AM, Simon Riggs wrote:
>> So we can't completely remove xl_prev field, without giving up some
>> functionality.
> Or, you can use the lower 16-bits of the previous record's CRC
Hmm ... that is an interesting idea, but I'm not sure it helps m
On Fri, Jan 12, 2018 at 8:43 PM, Claudio Freire
wrote:
>
>
> On Sat, Dec 30, 2017 at 7:32 AM, Simon Riggs
> wrote:
>
>> So we can't completely remove xl_prev field, without giving up some
>> functionality. But we don't really need to store the 8-byte previous
>> WAL pointer in order to detect to
On Sat, Dec 30, 2017 at 7:32 AM, Simon Riggs wrote:
> So we can't completely remove xl_prev field, without giving up some
> functionality. But we don't really need to store the 8-byte previous
> WAL pointer in order to detect torn pages. Something else which can
> tell us that the WAL record does
On 2018-01-12 17:43:00 -0500, Tom Lane wrote:
> Andres Freund writes:
> > On 2018-01-12 17:24:54 -0500, Tom Lane wrote:
> >> Andres Freund writes:
> >>> Right. I wonder if it be reasonable to move that to a page's header
> >>> instead of individual records? To avoid torn page issues we'd have to
Andres Freund writes:
> On 2018-01-12 17:24:54 -0500, Tom Lane wrote:
>> Andres Freund writes:
>>> Right. I wonder if it be reasonable to move that to a page's header
>>> instead of individual records? To avoid torn page issues we'd have to
>>> reduce the page size to a sector size, but I'm not
On 2018-01-12 17:24:54 -0500, Tom Lane wrote:
> Andres Freund writes:
> > On 2018-01-12 10:45:54 -0500, Tom Lane wrote:
> >> The xl_prev field is our only way of detecting that we're looking at
> >> old WAL data when we cross a sector boundary.
>
> > Right. I wonder if it be reasonable to move th
Andres Freund writes:
> On 2018-01-12 10:45:54 -0500, Tom Lane wrote:
>> The xl_prev field is our only way of detecting that we're looking at
>> old WAL data when we cross a sector boundary.
> Right. I wonder if it be reasonable to move that to a page's header
> instead of individual records? To
On 2018-01-12 10:45:54 -0500, Tom Lane wrote:
> Robert Haas writes:
> > On Sat, Dec 30, 2017 at 5:32 AM, Simon Riggs wrote:
> > I have some reservations about whether this makes the mechanism less
> > reliable.
>
> Yeah, it scares me too.
Same here.
> The xl_prev field is our only way of dete
I wrote:
> Robert Haas writes:
>> On Sat, Dec 30, 2017 at 5:32 AM, Simon Riggs wrote:
>>> For this to work, we must ensure that WAL files are either recycled in
>>> such a way that the "xl_walid" of the previous (to be recycled) WAL
>>> differs from the new WAL or we zero-out the new WAL file. Se
Robert Haas writes:
> On Sat, Dec 30, 2017 at 5:32 AM, Simon Riggs wrote:
>> So we can't completely remove xl_prev field, without giving up some
>> functionality. But we don't really need to store the 8-byte previous
>> WAL pointer in order to detect torn pages. Something else which can
>> tell u
On Sat, Dec 30, 2017 at 5:32 AM, Simon Riggs wrote:
> So we can't completely remove xl_prev field, without giving up some
> functionality. But we don't really need to store the 8-byte previous
> WAL pointer in order to detect torn pages. Something else which can
> tell us that the WAL record does
Hi Simon, Pavan,
I tried benchmarking your patch. I ran pgbench on a 72-core machine with
a simple append-only script:
BEGIN;
INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid,
:bid, :aid, :delta, CURRENT_TIMESTAMP);
the insert repeated 20 times total ---
END;
Comments in ReserveXLogInsertLocation() says
"* This is the performance critical part of XLogInsert that must be serialized
* across backends. The rest can happen mostly in parallel. Try to keep this
* section as short as possible, insertpos_lck can be heavily contended on a
* busy system."
75 matches
Mail list logo