Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-06-03 Thread Thomas Munro
On Wed, Jun 3, 2015 at 3:42 PM, Alvaro Herrera  wrote:
> Thomas Munro wrote:
>> On Tue, Jun 2, 2015 at 9:30 AM, Alvaro Herrera  
>> wrote:
>> > My guess is that the file existed, and perhaps had one or more pages,
>> > but the wanted page doesn't exist, so we tried to read but got 0 bytes
>> > back.  read() returns 0 in this case but doesn't set errno.
>> >
>> > I didn't find a way to set things so that the file exists but is of
>> > shorter contents than oldestMulti by the time the checkpoint record is
>> > replayed.
>>
>> I'm just starting to learn about the recovery machinery, so forgive me
>> if I'm missing something basic here, but I just don't get this.  As I
>> understand it, offsets/0046 should either have been copied with that
>> page present in it if it existed before the backup started (apparently
>> not in this case), or extended to contain it by WAL records that come
>> after the backup label but before the checkpoint record that
>> references it (also apparently not in this case).
>
> Exactly --- that's the spot at which I am, also.  I have had this
> spinning in my head for three days now, and tried every single variation
> that I could think of, but like you I was unable to reproduce the issue.
> However, our customer took a second base backup and it failed in exactly
> the same way, modulo some changes to the counters (the file that
> didn't exist was 004B rather than 0046).  I'm still at a loss as to what
> the failure mode is.  We must be missing some crucial detail ...

I have finally reproduced that error!  See attached repro shell script.

The conditions are:

1.  next multixact == oldest multixact (no active multixacts, pointing
past the end)
2.  next multixact would be the first item on a new page (multixact % 2048 == 0)
3.  the page must not be the first in a segment (or we'd get the
read-zeroes case)

That gives you odds of 1/2048 * 31/32 * (probability of a wraparound
vacuum followed by no multixact creations right before your backup
checkpoint).  That seems like reasonably low odds... if it happened
twice in a row, maybe I'm missing something here and there is some
other way to get this...
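
To make the arithmetic concrete, here's a quick sanity check of those
conditions in SQL (a sketch, assuming 9.4's constants of 2048 multixact
offsets per page and 32 pages per SLRU segment, i.e. 65536 multis per
segment):

  -- condition 2 is "first item on a new page"; condition 3 excludes
  -- multis that are also first in a segment
  SELECT m AS multi,
         m % 2048 = 0                     AS first_on_page,
         m % 65536 = 0                    AS first_in_segment,
         m % 2048 = 0 AND m % 65536 <> 0  AS triggers_this_error
  FROM (VALUES (2048), (131072)) AS t(m);

2048 is first-on-page but not first-in-segment, so it hits this error;
131072 is first-in-segment, so it falls into the read-zeroes /
missing-file case instead.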

I realise now that this is actually a symptom of a problem spotted by
Noah recently:

http://www.postgresql.org/message-id/20150601045534.gb23...@tornado.leadboat.com

He noticed the problem for segment boundaries, when not in recovery.
In recovery, segment boundaries don't raise an error (the read-zeroes
case applies), but page boundaries do.  The fix is probably to do
nothing if they are the same, as we do elsewhere, like in the attached
patch.

-- 
Thomas Munro
http://www.enterprisedb.com


fix-truncate-none.patch
Description: Binary data


copy-page-boundary.sh
Description: Bourne shell script

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Planner cost adjustments

2015-06-03 Thread Daniel Begin
Thanks Bill,

About disk performance, all drives are identical and connected using USB3,
and yes, I can tweak values and restart Postgres without any hardship!-)
About seq_page_cost and random_page_cost, I am about to test different lower
values as you and Tomas propose.
Raising the statistics target is a good idea. Since most of the data follow a
power-law distribution, it might improve the statistics.
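For example (just a sketch of what I plan to try; the column and the target
value are only illustrative):

  ALTER TABLE changesets ALTER COLUMN user_id SET STATISTICS 500;
  ANALYZE changesets;

or globally, by raising default_statistics_target and re-running ANALYZE.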

I will do as suggested and get back to the community for further comments. I
wished to follow the procedure proposed by PT, just in case I eventually had
to go on to step 4 (contact the PostgreSQL developers so they can improve the
planner).

And I am still open to other proposals.
Daniel

-Original Message-
From: pgsql-general-ow...@postgresql.org
[mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Bill Moran
Sent: June-02-15 23:31
To: Daniel Begin
Cc: pgsql-general@postgresql.org; 'Tomas Vondra'; 'Melvin Davidson'
Subject: Re: [GENERAL] Planner cost adjustments

On Tue, 2 Jun 2015 14:01:35 -0400
Daniel Begin  wrote:

> Here is a follow-up on the step-by-step procedure proposed by PT
> 
> #1 - setup postgresql planner's cost estimate settings for my hardware.
> -----------------------------------------------------------------------
> Current parameter values described in section 18.7.2 haven't been
> changed except for effective_cache_size:
>   seq_page_cost = 1
>   random_page_cost = 4
>   cpu_tuple_cost = 0.01
>   cpu_index_tuple_cost = 0.005
>   cpu_operator_cost = 0.0025
>   effective_cache_size = 10GB
> 
> I did a bunch of tests on frequently used queries to see how well they 
> perform - using SET enable_seqscan = ON/OFF.
> As described earlier in this thread, the planner uses Seq Scan on tables 
> even though using an Index Scan is in this case 5 times faster!
> Here are the logs of EXPLAIN ANALYSE on a query...

In an earlier message you mention that the drives are external to the
computer. I don't remember details, but external drives can be quite dicey
as far as performance goes, depending on the specific technology (USB vs.
external SATA vs. NAS, for example) as well as some pretty wild variances
between different brands of the same technology.
See:
http://www.databasesoup.com/2012/05/random-page-cost-revisited.html

As a result, I'm suspicious that the default values you're using for
random_page_cost and seq_page_cost are throwing things off because your
disks aren't performing like internally connected disks.

Correct me if I'm wrong on any of the assumptions I'm making here, but I got
the impression that you can tweak values and restart Postgres without any
hardship. If that's the case, I'm guessing that raising seq_page_cost
(possibly to 2) will cause Postgres to make better decisions about what are
good plans. My suggestion is to try some different values for those two
settings, doing several tests after each change, and see if you can find a
set of values that starts getting you good plans. It appears that the
planner thinks that it can get better performance by reading from the disk
in sequence than by picking random pages, and that makes me think that the
difference between seq_page_cost and random_page_cost is bigger than the
actual behavior of the drives.
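For example (only a sketch -- these numbers are just starting points to
experiment with, not recommendations):

  SET seq_page_cost = 2.0;     -- make sequential I/O look more expensive
  SET random_page_cost = 3.0;  -- or leave at 4 and only raise seq_page_cost
  EXPLAIN ANALYSE
  SELECT user_id, id AS changeset_id, closed
  FROM changesets
  WHERE changesets.user_id IN (SELECT id FROM sample.users);

SET only affects the current session, so you can compare plans side by side
without touching postgresql.conf.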

More comments below.

> osmdump=# SET enable_seqscan = ON;
> osmdump=# EXPLAIN ANALYSE SELECT user_id, id AS changeset_id,closed 
> FROM changesets WHERE changesets.user_id IN(SELECT id FROM 
> sample.users);
> -----------------------------------------------------------------------------
>  Hash Semi Join  (cost=21.50..819505.27 rows=726722 width=24) (actual
> time=1574.914..7444.938 rows=338568 loops=1)
>Hash Cond: (changesets.user_id = users.id)
>->  Seq Scan on changesets  (cost=0.00..745407.22 rows=25139722 
> width=24) (actual time=0.002..4724.578 rows=25133929 loops=1)
>->  Hash  (cost=14.00..14.00 rows=600 width=8) (actual 
> time=0.165..0.165
> rows=600 loops=1)
>  Buckets: 1024  Batches: 1  Memory Usage: 24kB
>  ->  Seq Scan on users  (cost=0.00..14.00 rows=600 width=8) 
> (actual
> time=0.003..0.073 rows=600 loops=1)
>  Total runtime: 7658.715 ms
> (7 rows)
> 
> osmdump=# SET enable_seqscan = OFF;
> osmdump=# EXPLAIN ANALYSE SELECT user_id, id AS changeset_id,closed 
> FROM changesets WHERE changesets.user_id IN(SELECT id FROM 
> sample.users);
> -----------------------------------------------------------------------------
>  Nested Loop  (cost=10000000115.94..10001072613.45 rows=726722 width=24)
> (actual time=0.268..1490.515 rows=338568 loops=1)
>    ->  HashAggregate  (cost=10000000115.50..10000000121.50 rows=600 width=8)
> (actual time=0.207..0.531 rows=600 loops=1)
>          ->  Seq Scan on users  (cost=10000000000.00..10000000014.00
> rows=600 width=8) (actual time=0.003..0.037 rows=600 loops=1)
>->  Index Scan using changesets_useridndx on c

Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-06-03 Thread Robert Haas
On Wed, Jun 3, 2015 at 4:48 AM, Thomas Munro wrote:
> On Wed, Jun 3, 2015 at 3:42 PM, Alvaro Herrera  
> wrote:
>> Thomas Munro wrote:
>>> On Tue, Jun 2, 2015 at 9:30 AM, Alvaro Herrera  
>>> wrote:
>>> > My guess is that the file existed, and perhaps had one or more pages,
>>> > but the wanted page doesn't exist, so we tried to read but got 0 bytes
>>> > back.  read() returns 0 in this case but doesn't set errno.
>>> >
>>> > I didn't find a way to set things so that the file exists but is of
>>> > shorter contents than oldestMulti by the time the checkpoint record is
>>> > replayed.
>>>
>>> I'm just starting to learn about the recovery machinery, so forgive me
>>> if I'm missing something basic here, but I just don't get this.  As I
>>> understand it, offsets/0046 should either have been copied with that
>>> page present in it if it existed before the backup started (apparently
>>> not in this case), or extended to contain it by WAL records that come
>>> after the backup label but before the checkpoint record that
>>> references it (also apparently not in this case).
>>
>> Exactly --- that's the spot at which I am, also.  I have had this
>> spinning in my head for three days now, and tried every single variation
>> that I could think of, but like you I was unable to reproduce the issue.
>> However, our customer took a second base backup and it failed in exactly
> the same way, modulo some changes to the counters (the file that
> didn't exist was 004B rather than 0046).  I'm still at a loss as to what
>> the failure mode is.  We must be missing some crucial detail ...
>
> I have finally reproduced that error!  See attached repro shell script.
>
> The conditions are:
>
> 1.  next multixact == oldest multixact (no active multixacts, pointing
> past the end)
> 2.  next multixact would be the first item on a new page (multixact % 2048 == 
> 0)
> 3.  the page must not be the first in a segment (or we'd get the
> read-zeroes case)
>
> That gives you odds of 1/2048 * 31/32 * (probability of a wraparound
> vacuum followed by no multixact creations right before your backup
> checkpoint).  That seems like reasonably low odds... if it happened
> twice in a row, maybe I'm missing something here and there is some
> other way to get this...
>
> I realise now that this is actually a symptom of a problem spotted by
> Noah recently:
>
> http://www.postgresql.org/message-id/20150601045534.gb23...@tornado.leadboat.com
>
> He noticed the problem for segment boundaries, when not in recovery.
> In recovery, segment boundaries don't raise an error (the read-zeroes
> case applies), but page boundaries do.  The fix is probably to do
> nothing if they are the same, as we do elsewhere, like in the attached
> patch.

Actually, we still need to call DetermineSafeOldestOffset in that
case.  Otherwise, if someone goes from lots of multixacts to none, the
stop point won't advance.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-06-03 Thread Robert Haas
On Tue, Jun 2, 2015 at 5:22 PM, Andres Freund  wrote:
>> > Hm. If GetOldestMultiXactOnDisk() gets the starting point by scanning
>> > the disk it'll always get one at a segment boundary, right? I'm not sure
>> > that's actually ok; because the value at the beginning of the segment
>> > can very well end up being a 0, as MaybeExtendOffsetSlru() will have
>> > filled the page with zeros.
>> >
>> > I think that should be harmless, the worst that can happen is that
>> > oldestOffset erroneously is 0, which should be correct, even though
>> > possibly overly conservative, in these cases.
>>
>> Uh oh.  That seems like a real bad problem for this approach.  What
>> keeps that from being the opposite of too conservative?  There's no
>> "safe" value in a circular numbering space.
>
> I think it *might* (I'm really jetlagged) be fine because that'll only
> happen after an upgrade from < 9.3. And in that case we initialize
> nextOffset to 0. That ought to save us?

That's pretty much betting the farm on the bugs we know about today
being the only ones there are.  That seems imprudent.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Problem when temp_tablespace get full?

2015-06-03 Thread Daniel Begin
Context:

A query almost filled an SSD drive (temp_tablespace) and my PC became
unresponsive.

I did SELECT pg_cancel_backend(6600); (6600 was the pid of the process) and
got a successful answer (t).

However, just after that, the disk simply turned off and the system crashed.

I was able to restart the PC - the SSD drive and the data are still there,
but the drive keeps going off without warning.

I know I will have to change my drive!-)

But my question is: what happens when the temp_tablespace drive gets full?

Daniel



Re: [GENERAL] Problem when temp_tablespace get full?

2015-06-03 Thread David G. Johnston
On Wed, Jun 3, 2015 at 11:14 AM, Daniel Begin  wrote:

>
> A query
>

OK...

> But my question is: what happens when the temp_tablespace drive gets full?

The query, probably...

"There is also a temp_tablespaces parameter, which determines the placement
of temporary tables and indexes, as well as temporary files that are used
for purposes such as sorting large data sets."

http://www.postgresql.org/docs/9.4/static/manage-ag-tablespaces.html

David J.


Re: [GENERAL] Database designpattern - product feature

2015-06-03 Thread Gmail


Sent from my iPad

> On Jun 3, 2015, at 7:50 AM, Adrian Stern  wrote:
> 
> Hi William, thanks for joining the conversation.
> 
> 1) We do hope for constraints since a connection to an ERP system is possible 
> in the future. We want to plan ahead. 
> 
> 2) As for the subclass approach: I would need about 30 subclasses and it will 
> get really hard to add new products since a change in the database will be 
> necessary each time. That's why we want a more generic approach.
> 
> Maybe I don't understand you right, because of the language barrier. Can you 
> provide me a link to a subclassing example?
> -> 
> https://docs.djangoproject.com/en/1.8/topics/db/models/#multi-table-inheritance
>  ?
> 
> ORM is a given in my case. This is not a high performance application. 
> 
> Freundliche Grüsse
> 
I don't understand 'ORM is a given' in conjunction with a reluctance to db 
evolution? These seem oxymoronic.




Re: [GENERAL] Problem when temp_tablespace get full?

2015-06-03 Thread Daniel Begin
Sorry, my question might not have been clear…

I set temp_tablespace to that location myself, but did not expect the drive
could get full;

Multiple factors may have caused the drive to turn off (not necessarily
postgresql);

So, if that temp_tablespace gets full, how will postgresql react to / manage
the situation?

Daniel

From: pgsql-general-ow...@postgresql.org 
[mailto:pgsql-general-ow...@postgresql.org] On Behalf Of David G. Johnston
Sent: June-03-15 11:54
To: Daniel Begin
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Problem when temp_tablespace get full?

On Wed, Jun 3, 2015 at 11:14 AM, Daniel Begin  wrote:

> A query

OK...

> But my question is: what happens when the temp_tablespace drive gets full?

The query, probably...

"There is also a temp_tablespaces parameter, which determines the placement of 
temporary tables and indexes, as well as temporary files that are used for 
purposes such as sorting large data sets."

http://www.postgresql.org/docs/9.4/static/manage-ag-tablespaces.html

David J.



Re: [GENERAL] Problem when temp_tablespace get full?

2015-06-03 Thread Jan Lentfer

On 2015-06-03 19:00, Daniel Begin wrote:

Sorry, my question might not have been clear…

I set temp_tablespace to that location myself, but did not expect
the drive could get full;

Multiple factors may have caused the drive to turn off (not
necessarily postgresql);

So, if that temp_tablespace gets full, how will postgresql
react to / manage the situation?


Queries running out of space in pgsql_tmp will just cancel (and 
rollback).
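
If you want queries to fail cleanly before the drive fills up again, one
option is temp_file_limit (available since 9.2; a sketch -- the tablespace
name is a placeholder, and changing temp_file_limit per session requires
superuser):

  SET temp_tablespaces = 'ssd_temp';
  SET temp_file_limit = '50GB';  -- error out once a session's temp files
                                 -- exceed this, instead of filling the disk

Both can also be set in postgresql.conf.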


Jan


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-06-03 Thread Alvaro Herrera
Thomas Munro wrote:

> I have finally reproduced that error!  See attached repro shell script.
> 
> The conditions are:
> 
> 1.  next multixact == oldest multixact (no active multixacts, pointing
> past the end)
> 2.  next multixact would be the first item on a new page (multixact % 2048 == 
> 0)
> 3.  the page must not be the first in a segment (or we'd get the
> read-zeroes case)
> 
> That gives you odds of 1/2048 * 31/32 * (probability of a wraparound
> vacuum followed by no multixact creations right before your backup
> checkpoint).  That seems like reasonably low odds... if it happened
> twice in a row, maybe I'm missing something here and there is some
> other way to get this...

Thanks, this is pretty cool (as far as these things go), but it's not
the scenario I see, in which the complained-about segment is very old by
the time it's truncated away by a checkpoint after freeze.  Segment
requested by checkpoint.oldestMulti is 0046 (offset 140k something --
just to clarify it's not at segment boundary), and the base backup
contains segments from 004B to 00D5.  My problem scenario has
oldestMulti close to 5 million and nextMulti close to 14 million.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-06-03 Thread Andres Freund
On 2015-06-03 00:42:55 -0300, Alvaro Herrera wrote:
> Thomas Munro wrote:
> > On Tue, Jun 2, 2015 at 9:30 AM, Alvaro Herrera  
> > wrote:
> > > My guess is that the file existed, and perhaps had one or more pages,
> > > but the wanted page doesn't exist, so we tried to read but got 0 bytes
> > > back.  read() returns 0 in this case but doesn't set errno.
> > >
> > > I didn't find a way to set things so that the file exists but is of
> > > shorter contents than oldestMulti by the time the checkpoint record is
> > > replayed.
> > 
> > I'm just starting to learn about the recovery machinery, so forgive me
> > if I'm missing something basic here, but I just don't get this.  As I
> > understand it, offsets/0046 should either have been copied with that
> > page present in it if it existed before the backup started (apparently
> > not in this case), or extended to contain it by WAL records that come
> > after the backup label but before the checkpoint record that
> > references it (also apparently not in this case).

That's not necessarily the case though, given how the code currently
works. In a bunch of places the SLRUs are accessed *before* having been
made consistent by WAL replay. Especially if several checkpoints/vacuums
happened during the base backup, the assumed state of the data directory
(i.e. the mxacts the checkpoint refers to) soon after the initial start and
the actual state of pg_multixact/ won't necessarily match at all.

> Exactly --- that's the spot at which I am, also.  I have had this
> spinning in my head for three days now, and tried every single variation
> that I could think of, but like you I was unable to reproduce the issue.
> However, our customer took a second base backup and it failed in exactly
> the same way, modulo some changes to the counters (the file that
> didn't exist was 004B rather than 0046).  I'm still at a loss as to what
> the failure mode is.  We must be missing some crucial detail ...

I might have missed it in this already long thread. Could you share a
bunch of details about the case? It'd be very interesting to see the
contents of the backup label (to see where start/end are), the contents
of the initial checkpoint (to see which mxacts we assume to exist at
start) and what the initial contents of pg_multixact are (to match up).

Greetings,

Andres Freund


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-06-03 Thread Alvaro Herrera
Andres Freund wrote:
> On 2015-06-03 00:42:55 -0300, Alvaro Herrera wrote:
> > Thomas Munro wrote:
> > > On Tue, Jun 2, 2015 at 9:30 AM, Alvaro Herrera  
> > > wrote:
> > > > My guess is that the file existed, and perhaps had one or more pages,
> > > > but the wanted page doesn't exist, so we tried to read but got 0 bytes
> > > > back.  read() returns 0 in this case but doesn't set errno.
> > > >
> > > > I didn't find a way to set things so that the file exists but is of
> > > > shorter contents than oldestMulti by the time the checkpoint record is
> > > > replayed.
> > > 
> > > I'm just starting to learn about the recovery machinery, so forgive me
> > > if I'm missing something basic here, but I just don't get this.  As I
> > > understand it, offsets/0046 should either have been copied with that
> > > page present in it if it existed before the backup started (apparently
> > > not in this case), or extended to contain it by WAL records that come
> > > after the backup label but before the checkpoint record that
> > > references it (also apparently not in this case).
> 
> That's not necessarily the case though, given how the code currently
> works. In a bunch of places the SLRUs are accessed *before* having been
> made consistent by WAL replay. Especially if several checkpoints/vacuums
> happened during the base backup, the assumed state of the data directory
> (i.e. the mxacts the checkpoint refers to) soon after the initial start and
> the actual state of pg_multixact/ won't necessarily match at all.

Well, the log extract is quite simple; it says that as soon as the
standby starts replaying WAL, it fetches exactly one WAL segment, which
contains exactly one online checkpoint record; this record contains
oldestMulti=4624559, which belongs in offset file 0046.  But the
basebackup only contains files starting from 004B onwards.  The base
backup takes a long time, so my guess of what happened is that the
backup label points to the start of the WAL stream (obviously), then the
data files are copied to the standby; during this long process, a
multixact truncation happens.  So by the time the base backup reaches
the pg_multixact directory, the 0046 file has been removed.
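
The arithmetic checks out, too (assuming 2048 offsets per page, 32 pages per
segment, and 8192-byte pages):

  SELECT 4624559 / 2048                 AS page,        -- 2258
         (4624559 / 2048) / 32          AS segment,     -- 70 = 0x46 -> "0046"
         ((4624559 / 2048) % 32) * 8192 AS byte_offset; -- 147456, as in the log

so oldestMulti=4624559 lives at byte offset 147456 of file 0046, exactly
where the read failed.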

2015-05-27 16:15:17 UTC [4782]: [3-1] user=,db= LOG: entering standby mode
2015-05-27 16:15:18 UTC [4782]: [4-1] user=,db= LOG: restored log file 
"000173DD00AD" from archive
2015-05-27 16:15:18 UTC [4782]: [5-1] user=,db= FATAL: could not access status 
of transaction 4624559
2015-05-27 16:15:18 UTC [4782]: [6-1] user=,db= DETAIL: Could not read from 
file "pg_multixact/offsets/0046" at offset 147456: Success.
2015-05-27 16:15:18 UTC [4778]: [4-1] user=,db= LOG: startup process (PID 4782) 
exited with exit code 1
2015-05-27 16:15:18 UTC [4778]: [5-1] user=,db= LOG: aborting startup due to 
startup process failure

One idea I had was: what if the oldestMulti pointed to another multi
earlier in the same 0046 file, so that it is read-as-zeroes (and the
file is created), and then a subsequent multixact truncate tries to read
a later page in the file.  In SlruPhysicalReadPage() this would give a
chance for open() to not fail, and then read() can fail as above.
However, we would have an earlier LOG message about "reading as zeroes".

Really, the whole question of how this code goes past the open() failure
in SlruPhysicalReadPage baffles me.  I don't see any possible way for
the file to be created ...

> > Exactly --- that's the spot at which I am, also.  I have had this
> > spinning in my head for three days now, and tried every single variation
> > that I could think of, but like you I was unable to reproduce the issue.
> > However, our customer took a second base backup and it failed in exactly
> > the same way, modulo some changes to the counters (the file that
> > didn't exist was 004B rather than 0046).  I'm still at a loss as to what
> > the failure mode is.  We must be missing some crucial detail ...
> 
> I might have missed it in this already long thread. Could you share a
> bunch of details about the case? It'd be very interesting to see the
> contents of the backup label (to see where start/end are), the contents
> of the initial checkpoint (to see which mxacts we assume to exist at
> start) and what the initial contents of pg_multixact are (to match up).

pg_xlogdump says about the checkpoint record:

rmgr: XLOG  len (rec/tot): 72/104, tx: 0, lsn: 
73DD/AD834470, prev 73DD/AD8343B8, bkp: , desc: checkpoint: redo 
73DD/4003F648; tli 1; prev tli 1; fpw true; xid 1/3530575042; oid 81303440; 
multi 14003656; offset 66311341; oldest xid 2530472730 in DB 17081243; oldest 
multi 4624559 in DB 17081243; oldest running xid 3530551992; online

I don't have the backup label.  But a listing of the files in
pg_multixact/offsets in the master has this:

./offsets:
total 35816
drwx------ 2 postgres postgres   4096 May 27 16:25 .
drwx------ 4 postgres postgres   4096 Jan  3 03:32 ..
-rw------- 1 postgres postgres 262144 Apr 20 13:07 004

Re: [GENERAL] Database designpattern - product feature

2015-06-03 Thread Roxanne Reid-Bennett

On 6/3/2015 2:50 AM, Adrian Stern wrote:

Hi William, thanks for joining the conversation.

1) We do hope for constraints since a connection to an ERP system is 
possible in the future. We want to plan ahead.


2) As for the subclass approach: I would need about 30 subclasses and 
it will get really hard to add new products since a change in the 
database will be necessary each time. That's why we want a more 
generic approach.


At the root of your description it appears to me that you are choosing 
essentially an EAV design pattern.  Constraints and case specific 
validation become difficult (or impossible) to consistently enforce.  I 
have personal experience in cleaning up a system that used this *exact* 
pattern (product -> product attribute). Different developers approached 
updating information in different ways.  They also chose to store the 
attributes as text to avoid the text vs number issue.  However, they 
failed to force any validation and any hint of attempting to prevent 
duplicates.  That destroyed hope of supporting an underlying business 
need to search for specific values during data analysis.  (think of how 
many different ways you can misspell "poughkeepsie" - but hey... it's 
"just a name")


I inherited the results of poor development controls and poor control 
over the weakness of the design - e.g. validation... and the cleanup has 
been long, and painful.


I think you should evaluate your unease with having to update the 
database on release (potentially many times) carefully for what it is 
and why you have it.  [I'm not saying it is invalid - just know why you 
have it]  Because no matter how well you design your system - databases 
evolve.  Manage that.


Anybody can muck up part of a project and cause garbage - but speaking 
from experience... this design pattern really encourages it.  If you 
choose to use it - then you HAVE to control where and when 
inserts/updates are done and be very careful with specifying how 
validation is to be done to the entire development team (and potentially 
the users)... and then review the code (and/or data) regularly.


wide tables/sparse fill on the columns ... I haven't tried to calculate 
the overhead on this... but disk space is generally considered to be 
cheap.  [that doesn't mean your case wouldn't be criminally wasteful]  
Choosing 1 wide table or 30 sub-tables to deal with detail data. I don't 
know how that directly affects Postgres' performance if you choose 
30 tables ... meh... at least you'll know exactly where your data is - 
and exactly what is allowed for each and every variant.  Remember to 
enforce a consistent guessable naming convention.


All that said - there are reasons to use this pattern to gain the power 
of the dynamically expandable configuration of allowed values.  I just 
see the use cases where the gain (flexibility) is worth the pain 
(validation control) as few and far between.


just my $0.01

Roxanne




Maybe I don't understand you right, because of the language barrier. 
Can you provide me a link to a subclassing example?
-> 
https://docs.djangoproject.com/en/1.8/topics/db/models/#multi-table-inheritance 
?


ORM is a given in my case. This is not a high performance application.

Freundliche Grüsse

Adrian Stern
unchained - web solutions

adrian.st...@unchained.ch 
+41 79 292 83 47

On Tue, Jun 2, 2015 at 5:35 PM, William Dunn wrote:


Hello Adrian,

May I ask why you need a non-standard model? By standard models I
mean the following:

1) When you don't need to have subclass specific database
constraints: All subclasses in the same table, subclasses that do
not have an attribute have that column null. This has the best
performance because no joins are required when querying both
superclass and subclass attributes, and all the data for an object
will be in the same block on disk. The disadvantage of this is
that you cannot enforce constraints, such as not-null, on subclass
specific attributes columns because the constraints would also be
applied to the superclasses. If you can ensure that your
application is the only way data can be inserted those constraints
will naturally be enforced there.

2) When you need subclass specific database constraints: Use an
ORM such as Django's ORM or SQLAlchemy which has one table with
the superclass attributes and a table for each subclass with their
subclass specific attributes. This is slower because joins will be
needed and the data for an object will be in 2 different blocks of
disk but it allows you to enforce constraints within the database
which will be checked whenever any application tries to insert
values. There is a lot of complexity added because there will be
so many small tables and indexes but the ORM takes care of that
for you.
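
A minimal sketch of model (2) in plain SQL (table and column names are
purely illustrative):

  CREATE TABLE product (
      id   serial PRIMARY KEY,
      name text NOT NULL
  );
  CREATE TABLE product_cable (
      product_id int PRIMARY KEY REFERENCES product(id),
      length_m   numeric NOT NULL CHECK (length_m > 0)  -- subclass-specific
  );

Fetching a cable's full attribute set is then a single join on product_id.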

Will J. Dunn
willjdunn.com


Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-06-03 Thread Alvaro Herrera
Alvaro Herrera wrote:

> Really, the whole question of how this code goes past the open() failure
> in SlruPhysicalReadPage baffles me.  I don't see any possible way for
> the file to be created ...

Hmm, the checkpointer can call TruncateMultiXact when in recovery, on
restartpoints. I wonder if in recovery this could trigger file creation.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-06-03 Thread Andres Freund
On 2015-06-03 15:01:46 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
> > That's not necessarily the case though, given how the code currently
> > works. In a bunch of places the SLRUs are accessed *before* having been
> > made consistent by WAL replay. Especially if several checkpoints/vacuums
> > happened during the base backup, the assumed state of the data directory
> > (i.e. the mxacts the checkpoint refers to) soon after the initial start and
> > the actual state of pg_multixact/ won't necessarily match at all.
> 
> Well, the log extract is quite simple; it says that as soon as the
> standby starts replaying WAL, it fetches exactly one WAL segment, which
> contains exactly one online checkpoint record; this record contains
> oldestMulti=4624559, which belongs in offset file 0046.  But the
> basebackup only contains files starting from 004B onwards.  The base
> backup takes a long time, so my guess of what happened is that the
> backup label points to the start of the WAL stream (obviously), then the
> data files are copied to the standby; during this long process, a
> multixact truncation happens.  So by the time the base backup reaches
> the pg_multixact directory, the 0046 file has been removed.

Yes. That's precisely what I was concerned about above and elsewhere in
the thread. It's simply not ok to access a SLRU while not yet
consistent...

> One idea I had was: what if the oldestMulti pointed to another multi
> earlier in the same 0046 file, so that it is read-as-zeroes (and the
> file is created), and then a subsequent multixact truncate tries to read
> a later page in the file.  In SlruPhysicalReadPage() this would give a
> chance for open() to not fail, and then read() can fail as above.
> However, we would have an earlier LOG message about "reading as zeroes".
> 
> Really, the whole question of how this code goes past the open() failure
> in SlruPhysicalReadPage baffles me.  I don't see any possible way for
> the file to be created ...

Wouldn't a previous WAL record zeroing another part of that segment
explain this? A zero-sized segment pretty much would lead to this error,
right? Or were you able to check how things look after the failure?

> 2015-05-27 16:15:17 UTC [4782]: [3-1] user=,db= LOG: entering standby mode
> 2015-05-27 16:15:18 UTC [4782]: [4-1] user=,db= LOG: restored log file 
> "000173DD00AD" from archive
> 2015-05-27 16:15:18 UTC [4782]: [5-1] user=,db= FATAL: could not access 
> status of transaction 4624559
> 2015-05-27 16:15:18 UTC [4782]: [6-1] user=,db= DETAIL: Could not read from 
> file "pg_multixact/offsets/0046" at offset 147456: Success.
> 2015-05-27 16:15:18 UTC [4778]: [4-1] user=,db= LOG: startup process (PID 
> 4782) exited with exit code 1
> 2015-05-27 16:15:18 UTC [4778]: [5-1] user=,db= LOG: aborting startup due to 
> startup process failure

From this it's not entirely clear where this error was triggered from.

Greetings,

Andres Freund


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-06-03 Thread Alvaro Herrera
Andres Freund wrote:
> On 2015-06-03 15:01:46 -0300, Alvaro Herrera wrote:

> > One idea I had was: what if the oldestMulti pointed to another multi
> > earlier in the same 0046 file, so that it is read-as-zeroes (and the
> > file is created), and then a subsequent multixact truncate tries to read
> > a later page in the file.  In SlruPhysicalReadPage() this would give a
> > chance for open() to not fail, and then read() can fail as above.
> > However, we would have an earlier LOG message about "reading as zeroes".
> > 
> > Really, the whole question of how this code goes past the open() failure
> > in SlruPhysicalReadPage baffles me.  I don't see any possible way for
> > the file to be created ...
> 
> Wouldn't a previous WAL record zeroing another part of that segment
> explain this? A zero-sized segment pretty much would lead to this error,
> right? Or were you able to check how things look after the failure?

But why would there be a previous WAL record zeroing another part of
that segment?  Note that this segment is very old -- hasn't been written
in quite a while, it's certainly not in slru buffers anymore.

> > 2015-05-27 16:15:17 UTC [4782]: [3-1] user=,db= LOG: entering standby mode
> > 2015-05-27 16:15:18 UTC [4782]: [4-1] user=,db= LOG: restored log file 
> > "000173DD00AD" from archive
> > 2015-05-27 16:15:18 UTC [4782]: [5-1] user=,db= FATAL: could not access 
> > status of transaction 4624559
> > 2015-05-27 16:15:18 UTC [4782]: [6-1] user=,db= DETAIL: Could not read from 
> > file "pg_multixact/offsets/0046" at offset 147456: Success.
> > 2015-05-27 16:15:18 UTC [4778]: [4-1] user=,db= LOG: startup process (PID 
> > 4782) exited with exit code 1
> > 2015-05-27 16:15:18 UTC [4778]: [5-1] user=,db= LOG: aborting startup due 
> > to startup process failure
> 
> From this it's not entirely clear where this error was triggered from.

Well, reading the code, it seems reasonable to assume that replay of
the checkpoint record I mentioned leads to that error message when the
file exists but is not long enough to contain the given offset.  There
are no MultiXact WAL records in the segment.  Also note that there's no
other "restored log file" message after the "entering standby mode"
message.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-06-03 Thread Robert Haas
On Wed, Jun 3, 2015 at 8:24 AM, Robert Haas  wrote:
> On Tue, Jun 2, 2015 at 5:22 PM, Andres Freund  wrote:
>>> > Hm. If GetOldestMultiXactOnDisk() gets the starting point by scanning
>>> > the disk it'll always get one at a segment boundary, right? I'm not sure
>>> > that's actually ok; because the value at the beginning of the segment
>>> > can very well end up being a 0, as MaybeExtendOffsetSlru() will have
>>> > filled the page with zeros.
>>> >
>>> > I think that should be harmless, the worst that can happen is that
>>> > oldestOffset erroneously is 0, which should be correct, even though
>>> > possibly overly conservative, in these cases.
>>>
>>> Uh oh.  That seems like a real bad problem for this approach.  What
>>> keeps that from being the opposite of too conservative?  There's no
>>> "safe" value in a circular numbering space.
>>
>> I think it *might* (I'm really jetlagged) be fine because that'll only
>> happen after an upgrade from < 9.3. And in that case we initialize
>> nextOffset to 0. That ought to save us?
>
> That's pretty much betting the farm on the bugs we know about today
> being the only ones there are.  That seems imprudent.

So here's a patch taking a different approach.  In this approach, if
the multixact whose members we want to look up doesn't exist, we don't
use a later one (that might or might not be valid).  Instead, we
attempt to cope with the unknown.  That means:

1. In TruncateMultiXact(), we don't truncate.

2. If setting the offset stop limit (the point where we refuse to
create new multixact space), we don't arm the stop point.  This means
that if you're in this situation, you run without member wraparound
protection until it's corrected.  A message gets logged once per
checkpoint telling you that you have this problem, and another message
gets logged when things get straightened out and the guards are
enabled.

3. If setting the vacuum force point, we assume that it's appropriate
to immediately force vacuum.

I've only tested this very lightly - this is just to see what you and
Noah and others think of the approach.  As compared with the previous
approach, it has the advantage of making minimal assumptions about the
sanity of what's on disk.  It has the disadvantage that, for some
people, the member-wraparound guard won't be enabled at startup -- but
note that those people can't start 9.3.7/9.4.2 *at all*, so currently
they are either running without member wraparound protection anyway
(if they haven't upgraded to those releases) or they're down entirely.
Another disadvantage is that we'll be triggering what may be quite a
bit of autovacuum activity for some people, which could be painful.
On the plus side, they'll hopefully end up with sane relminmxid and
datminmxid guards afterwards.

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 9568ff1..4400fc5 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -198,13 +198,24 @@ typedef struct MultiXactStateData
 	/* next-to-be-assigned offset */
 	MultiXactOffset nextOffset;
 
+	/* Have we completed multixact startup? */
+	bool		finishedStartup;
+
 	/*
-	 * Oldest multixact that is still on disk.  Anything older than this
-	 * should not be consulted.  These values are updated by vacuum.
+	 * Oldest multixact that is still potentially referenced by a relation.
+	 * Anything older than this should not be consulted.  These values are
+	 * updated by vacuum.
 	 */
 	MultiXactId oldestMultiXactId;
 	Oid			oldestMultiXactDB;
+
+	/*
+	 * Oldest multixact offset that is potentially referenced by a
+	 * multixact referenced by a relation.  We don't always know this value,
+	 * so there's a flag here to indicate whether or not we currently do.
+	 */
 	MultiXactOffset oldestOffset;
+	bool		oldestOffsetKnown;
 
 	/*
 	 * This is what the previous checkpoint stored as the truncate position.
@@ -221,6 +232,7 @@ typedef struct MultiXactStateData
 
 	/* support for members anti-wraparound measures */
 	MultiXactOffset offsetStopLimit;
+	bool offsetStopLimitKnown;
 
 	/*
 	 * Per-backend data starts here.  We have two arrays stored in the area
@@ -350,10 +362,11 @@ static bool MultiXactOffsetPrecedes(MultiXactOffset offset1,
 		MultiXactOffset offset2);
 static void ExtendMultiXactOffset(MultiXactId multi);
 static void ExtendMultiXactMember(MultiXactOffset offset, int nmembers);
-static void DetermineSafeOldestOffset(MultiXactId oldestMXact);
+static void DetermineSafeOldestOffset(MultiXactOffset oldestMXact);
 static bool MultiXactOffsetWouldWrap(MultiXactOffset boundary,
 		 MultiXactOffset start, uint32 distance);
-static MultiXactOffset find_multixact_start(MultiXactId multi);
+static bool SetOffsetVacuumLimit(bool finish_setup);
+static bool find_multixact_start(MultiXactId multi, MultiXactOffset *result);
 static void WriteMZero

Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-06-03 Thread Alvaro Herrera
Robert Haas wrote:

> So here's a patch taking a different approach.

I tried to apply this to 9.3 but it's messy because of pgindent.  Would
anyone have a problem with me backpatching a pgindent run of multixact.c?

Also, you have a new function SlruPageExists, but we already have
SimpleLruDoesPhysicalPageExist.  No need for two copies of pretty much
the same code, I think.  Your code uses fstat() instead of
lseek(.., SEEK_END) but the exact technique used is probably not
relevant.

I think I like this approach better than your other patch, FWIW --
mainly because it seems simpler.

Will review.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-06-03 Thread Thomas Munro
On Mon, Jun 1, 2015 at 4:55 PM, Noah Misch  wrote:
> While testing this (with inconsistent-multixact-fix-master.patch applied,
> FWIW), I noticed a nearby bug with a similar symptom.  TruncateMultiXact()
> omits the nextMXact==oldestMXact special case found in each other
> find_multixact_start() caller, so it reads the offset of a not-yet-created
> MultiXactId.  The usual outcome is to get rangeStart==0, so we truncate less
> than we could.  This can't make us truncate excessively, because
> nextMXact==oldestMXact implies no table contains any mxid.  If nextMXact
> happens to be the first of a segment, an error is possible.  Procedure:
>
> 1. Make a fresh cluster.
> 2. UPDATE pg_database SET datallowconn = true
> 3. Consume precisely 131071 mxids.  Number of offsets per mxid is unimportant.
> 4. vacuumdb --freeze --all
>
> Expected state after those steps:
> $ pg_controldata | grep NextMultiXactId
> Latest checkpoint's NextMultiXactId:  131072
>
> Checkpoint will fail like this:
> 26699 2015-05-31 17:22:33.134 GMT LOG:  statement: checkpoint
> 26661 2015-05-31 17:22:33.134 GMT DEBUG:  performing replication slot 
> checkpoint
> 26661 2015-05-31 17:22:33.136 GMT ERROR:  could not access status of 
> transaction 131072
> 26661 2015-05-31 17:22:33.136 GMT DETAIL:  Could not open file 
> "pg_multixact/offsets/0002": No such file or directory.
> 26699 2015-05-31 17:22:33.234 GMT ERROR:  checkpoint request failed
> 26699 2015-05-31 17:22:33.234 GMT HINT:  Consult recent messages in the 
> server log for details.
> 26699 2015-05-31 17:22:33.234 GMT STATEMENT:  checkpoint
>
> This does not block startup, and creating one mxid hides the problem again.
> Thus, it is not a top-priority bug like some other parts of this thread.  I
> mention it today mostly so it doesn't surprise hackers testing other fixes.

Thanks.   As mentioned elsewhere in the thread, I discovered that the
same problem exists for page boundaries, with a different error
message.  I've tried the attached repro scripts on 9.3.0, 9.3.5, 9.4.1
and master with the same results:

FATAL:  could not access status of transaction 2048
DETAIL:  Could not read from file "pg_multixact/offsets/" at
offset 8192: Undefined error: 0.

FATAL:  could not access status of transaction 131072
DETAIL:  Could not open file "pg_multixact/offsets/0002": No such file
or directory.
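
Both messages line up with the offsets-page arithmetic (assuming 2048
multixacts per page, 32 pages per segment, 8192-byte pages):

  SELECT m AS multi,
         (m / 2048) / 32          AS segment,     -- 0 -> "0000", 2 -> "0002"
         ((m / 2048) % 32) * 8192 AS byte_offset  -- 8192 for multi 2048
  FROM (VALUES (2048), (131072)) AS t(m);

multi 2048 maps to byte offset 8192 of file 0000 (the short read), and
131072 to the very start of file 0002 (the missing file).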

But, yeah, this isn't the bug we're looking for.

-- 
Thomas Munro
http://www.enterprisedb.com


checkpoint-page-boundary.sh
Description: Bourne shell script


checkpoint-segment-boundary.sh
Description: Bourne shell script

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Database designpattern - product feature

2015-06-03 Thread Jan de Visser
On June 3, 2015 02:04:28 PM Roxanne Reid-Bennett wrote:
> I think you should evaluate your unease with having to update the database
> on release (potentially many times) carefully for what it is and why you
> have it.  [I'm not saying it is invalid - just know why you have it] 
> Because no matter how well you design your system - databases evolve. 
> Manage that. 

Having been guilty of designing an EAV system before, I think I know his 
hesitation is due to the fact that some databases (*cough* Oracle *cough*) 
don't allow DDL in transactions (or at least back when I worked with Oracle it 
didn't), making dynamic table creation a bit of a problem, especially when you 
run in a container which gives you little flexibility in your tx handling 
(i.e. you get it all the time, or never). This used to be a problem in many 
iterations of J2EE containers. Also, lots of DBAs get skittish when they hear 
about applications doing DDL. And again, many of the technological roadblocks 
are fixed by now, but the hesitation remains.

Combine this with the fact that you want users to be able to create new 
products, which should be built up out of existing and/or newly defined 
attributes, you quickly end up with something EAV like. Because you don't want 
your product management people coming crying to your DBAs to have a new table 
for a new product defined. You want the product management people to point-
and-click their way through a nice GUI.

So this is what led me to that EAV design for this exact problem: we started 
off with Oracle as the database and a temperamental J2EE container, found out 
we couldn't do DDL (or at least DDL was hard), and by the time we were on less 
brain dead containers and databases the "damage" was done and there was no 
going back.

But in my defense I will say that mine was one of the prettiest EAV systems 
ever built. In my opinion at least :-)


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Database designpattern - product feature

2015-06-03 Thread Adrian Stern
(Sorry Jan for receiving this twice)

@ Roxanne & Jan:
Thank you both for your Input. I absolutely see your point against EAV.

I will work out two variants: one based on the specific solution, where I
would create a (bigger) table for each group of similar products, and one
based on the dynamic JSON solution (see the sketch below).
With this we can have a good discussion about the pros and cons in our next
meeting. In the end, the customer has to decide how important the
flexibility is for him. (And yes, the customer has IT people who will
understand why and what we talk about.)
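
(For reference, a minimal sketch of the JSON variant -- names are
placeholders, and it assumes 9.4's jsonb:

  CREATE TABLE product (
      id         serial PRIMARY KEY,
      group_name text NOT NULL,
      attributes jsonb NOT NULL DEFAULT '{}'
  );
  CREATE INDEX product_attributes_idx ON product USING gin (attributes);

The gin index supports containment queries such as
attributes @> '{"color": "red"}'.)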

So thanks for doubting my approach, it has been helpful!

Freundliche Grüsse

Adrian Stern
unchained - web solutions

adrian.st...@unchained.ch
+41 79 292 83 47

On Thu, Jun 4, 2015 at 4:02 AM, Jan de Visser  wrote:

> On June 3, 2015 02:04:28 PM Roxanne Reid-Bennett wrote:
> > I think you should evaluate your unease with having to update the
> database
> > on release (potentially many times) carefully for what it is and why you
> > have it.  [I'm not saying it is invalid - just know why you have it]
> > Because no matter how well you design your system - databases evolve.
> > Manage that.
>
> Having been guilty of designing an EAV system before, I think I know his
> hesitation is due to the fact that some databases (*cough* Oracle *cough*)
> don't allow DDL in transactions (or at least back when I worked with
> Oracle it
> didn't), making dynamic table creation a bit of a problem, especially when
> you
> run in a container which gives you little flexibility in your tx handling
> (i.e. you get it all the time, or never). This used to be a problem in many
> iterations of J2EE containers. Also, lots of DBAs get skittish when they
> hear
> about applications doing DDL. And again, many of the technological
> roadblocks
> are fixed by now, but the hesitation remains.
>
> Combine this with the fact that you want users to be able to create new
> products, which should be built up out of existing and/or newly defined
> attributes, you quickly end up with something EAV like. Because you don't
> want
> your product management people coming crying to your DBAs to have a new
> table
> for a new product defined. You want the product management people to point-
> and-click their way through a nice GUI.
>
> So this is what led me to that EAV design for this exact problem: we
> started
> off with Oracle as the database and a temperamental J2EE container, found
> out
> we couldn't do DDL (or at least DDL was hard), and by the time we were on
> less
> brain dead containers and databases the "damage" was done and there was no
> going back.
>
> But in my defense I will say that mine was one of the prettiest EAV systems
> ever built. In my opinion at least :-)
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-06-03 Thread Noah Misch
On Wed, Jun 03, 2015 at 04:53:46PM -0400, Robert Haas wrote:
> So here's a patch taking a different approach.  In this approach, if
> the multixact whose members we want to look up doesn't exist, we don't
> use a later one (that might or might not be valid).  Instead, we
> attempt to cope with the unknown.  That means:
> 
> 1. In TruncateMultiXact(), we don't truncate.

I like that change a lot.  It's much easier to seek forgiveness for wasting <=
28 GiB of disk than for deleting visibility information wrongly.

> 2. If setting the offset stop limit (the point where we refuse to
> create new multixact space), we don't arm the stop point.  This means
> that if you're in this situation, you run without member wraparound
> protection until it's corrected.  A message gets logged once per
> checkpoint telling you that you have this problem, and another message
> gets logged when things get straightened out and the guards are
> enabled.
> 
> 3. If setting the vacuum force point, we assume that it's appropriate
> to immediately force vacuum.

Those seem reasonable, too.

> I've only tested this very lightly - this is just to see what you and
> Noah and others think of the approach.  As compared with the previous
> approach, it has the advantage of making minimal assumptions about the
> sanity of what's on disk.  It has the disadvantage that, for some
> people, the member-wraparound guard won't be enabled at startup -- but
> note that those people can't start 9.3.7/9.4.2 *at all*, so currently
> they are either running without member wraparound protection anyway
> (if they haven't upgraded to those releases) or they're down entirely.

That disadvantage is negligible, considering.

> Another disadvantage is that we'll be triggering what may be quite a
> bit of autovacuum activity for some people, which could be painful.
> On the plus side, they'll hopefully end up with sane relminmxid and
> datminmxid guards afterwards.

That sounds good so long as each table requires just one successful emergency
autovacuum.  I'm not seeing code to ensure that the launched autovacuum will
indeed perform a full-table scan and update relminmxid; is it there?

For sites that can't tolerate an autovacuum storm, what alternative can we
provide?  Is "SET vacuum_multixact_freeze_table_age = 0; VACUUM " of
every table, done before applying the minor update, sufficient?

>  static void
> -DetermineSafeOldestOffset(MultiXactId oldestMXact)
> +DetermineSafeOldestOffset(MultiXactOffset oldestMXact)

Leftover change from an earlier iteration?  The values passed continue to be
MultiXactId values.

>   /* move back to start of the corresponding segment */
> - oldestOffset -= oldestOffset %
> - (MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT);
> + offsetStopLimit = oldestOffset - (oldestOffset %
> + (MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT));
> + /* always leave one segment before the wraparound point */
> + offsetStopLimit -= (MULTIXACT_MEMBERS_PER_PAGE * 
> SLRU_PAGES_PER_SEGMENT);
> +
> + /* if nothing has changed, we're done */
> + if (prevOffsetStopLimitKnown && offsetStopLimit == prevOffsetStopLimit)
> + return;
>  
>   LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
> - /* always leave one segment before the wraparound point */
> - MultiXactState->offsetStopLimit = oldestOffset -
> - (MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT);
> + MultiXactState->offsetStopLimit = oldestOffset;

That last line needs s/oldestOffset/offsetStopLimit/, I presume.

Thanks,
nm


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general