Re: amazon Aurora config - seriously overcommitted defaults? (May be Off Topic)

2018-12-09 Thread Andrew Dunstan



On 12/8/18 6:38 PM, Andres Freund wrote:

On 2018-12-08 15:23:19 -0800, Rob Sargent wrote:



On Dec 8, 2018, at 3:12 PM, Andres Freund  wrote:

On 2018-12-08 12:06:23 -0800, Jeremy Schneider wrote:

On RDS PostgreSQL, the default is 25% of your server memory. This seems
to be pretty widely accepted as a good starting point on PostgreSQL.

FWIW, I think it's widely cited, but also bad advice.  25% for an OLTP
workload on a 1TB machine with a database size above 25% is a terrible
idea.


Sorry, could you please expand “database size above 25%”?  25% of what?

Memory available to postgres (i.e. 100% of the server's memory on a
server dedicated to postgres, less if it's shared duty).




I think the best advice these days is that you need to triangulate to 
find the best setting for shared_buffers. It's very workload dependent, 
and there isn't even a semi-reliable rule of thumb.
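
A minimal sketch of a first data point for that triangulation, assuming a
server dedicated to Postgres: compare the configured shared_buffers with the
on-disk size of the databases it has to serve.

-- current setting
SHOW shared_buffers;

-- database sizes, largest first
SELECT datname,
       pg_size_pretty(pg_database_size(datname)) AS size
FROM pg_database
ORDER BY pg_database_size(datname) DESC;

If the working set is far larger than any plausible shared_buffers value (the
1TB case Andres mentions), the 25% rule stops being meaningful and only
workload testing will settle it.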



cheers


andrew


--
Andrew Dunstan   https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Errors with schema migration and logical replication — expected?

2018-12-09 Thread Adrian Klaver

On 12/8/18 11:26 PM, Mike Lissner wrote:

Hi, first time poster.

I just ran into a rather messy problem when doing a schema migration 
with logical replication. I'm not entirely sure what went wrong, why, or 
how to prevent it in the future. The migration I ran was pretty simple 
(though auto-generated by Django):




And after running this migration, I started getting this error on the 
subscriber:


2018-12-09 05:59:45 UTC::@:[13373]:LOG: logical replication apply worker for subscription "replicasubscription" has started
2018-12-09 05:59:45 UTC::@:[13373]:ERROR: null value in column "recap_sequence_number" violates not-null constraint
2018-12-09 05:59:45 UTC::@:[13373]:DETAIL: Failing row contains (48064261, 2018-12-07 04:48:40.388377+00, 2018-12-07 04:48:40.388402+00, null, 576, , 4571214, null, null).
2018-12-09 05:59:45 UTC::@:[6342]:LOG: worker process: logical replication worker for subscription 18390 (PID 13373) exited with exit code 1



So, my migration created a new column with a NOT NULL constraint, and somehow 
the subscriber got data that violated it. I don't know how that's possible, 
since this was a new column and it was never nullable.



The above seems to be the crux of the problem, how did NULL get into the 
column data?


The DDL migration did what it was supposed to do.



I applied the above migration simultaneously on my publisher and 
subscriber thinking that postgresql was smart enough to do the right 
thing. I think the subscriber finished first (it has less traffic).


And it did, as far as the schema changes were concerned.



The docs hint that postgresql might be smart enough to not worry about 
the order you do migrations:


 > /Logical replication is robust when schema definitions change in a 
live database:/ When the schema is changed on the publisher and 
replicated data starts arriving at the subscriber but does not fit into 
the table schema, replication will error until the schema is updated.


And it even hints that doing a migration on the subscriber first is a 
good thing in some cases:


 > In many cases, intermittent errors can be avoided by applying 
additive schema changes to the subscriber first.


But I'm now supremely skeptical that doing anything at the subscriber 
first is a good idea. Are the docs wrong? Does the above error make 
sense? Is the process for schema migrations documented somewhere beyond 
the above?


I have lots of questions because I thought this would have gone smoother 
than it did.


As for the fix: I made the column nullable on the subscriber and I'm 
waiting for it to catch up. Once it does I'll re-sync its schema with 
the publisher. Anybody interested in following along with all this (or 
finding this later and having questions) can follow the issue here:


https://github.com/freelawproject/courtlistener/issues/919

Thank you for the lovely database! I hope this is helpful.

Mike



--
Adrian Klaver
adrian.kla...@aklaver.com



Re: Errors with schema migration and logical replication — expected?

2018-12-09 Thread Mike Lissner
>
>
> The above seems to be the crux of the problem, how did NULL get into the
> column data?
>
>
I agree. My queries are generated by Django (so I never write SQL myself),
but:

 - the column has always been NOT NULL for its entire lifetime
 - we don't send *any* SQL commands to the replica yet, so that's not a
factor (for now it's just a live backup)
 - the publisher now has a NOT NULL constraint on that column. I never had
to clear out null values to put it in place. I assume that if that column
ever had a null value and I tried to run a DDL to add a null constraint,
the DDL would have failed, right?

Something feels wrong here, the more I think about it.
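
(A quick sketch on a throwaway table, just to confirm the assumption above:
PostgreSQL does refuse to add a NOT NULL constraint while NULLs are present.)

CREATE TABLE t (n varchar(50));
INSERT INTO t VALUES (NULL);
ALTER TABLE t ALTER COLUMN n SET NOT NULL;
-- fails: the column contains null values
UPDATE t SET n = '' WHERE n IS NULL;
ALTER TABLE t ALTER COLUMN n SET NOT NULL;
-- succeeds once the NULLs are gone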


Re: Errors with schema migration and logical replication — expected?

2018-12-09 Thread Adrian Klaver

On 12/9/18 8:03 AM, Mike Lissner wrote:


The above seems to be the crux of the problem, how did NULL get into
the
column data?


I agree. My queries are generated by Django (so I never write SQL 
myself), but:


  - the column has always been NOT NULL for its entire lifetime


The lifetime being since the migration did this?:

ALTER TABLE "search_docketentry" ADD COLUMN "recap_sequence_number" 
varchar(50) DEFAULT '' NOT NULL;
ALTER TABLE "search_docketentry" ALTER COLUMN "recap_sequence_number" 
DROP DEFAULT;


Also, does the column recap_sequence_number appear in any other tables? 
Just wondering if the error was on another table.



  - we don't send *any* SQL commands to the replica yet, so that's not a 
factor (for now it's just a live backup)
  - the publisher now has a NOT NULL constraint on that column. I never 
had to clear out null values to put it in place. I assume that if that 


This part confuses me. You seem to imply that the column existed before 
the migration and you just added a NOT NULL constraint. The migration 
shows the column being created with a NOT NULL constraint.


column ever had a null value and I tried to run a DDL to add a null 
constraint, the DDL would have failed, right?


Something feels wrong here, the more I think about it.


A start would be to figure out what generated?:

failing row contains (48064261, 2018-12-07 04:48:40.388377+00, 
2018-12-07 04:48:40.388402+00, null, 576, , 4571214, null, null)
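
One hedged way to start — assuming the leading 48064261 is the
Django-generated "id" primary key of search_docketentry — is to check what
the publisher actually holds for that row:

SELECT id, recap_sequence_number
FROM search_docketentry
WHERE id = 48064261;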



--
Adrian Klaver
adrian.kla...@aklaver.com



Re: Errors with schema migration and logical replication — expected?

2018-12-09 Thread Mike Lissner
On Sun, Dec 9, 2018 at 8:43 AM Adrian Klaver wrote:

> On 12/9/18 8:03 AM, Mike Lissner wrote:
> >
> > The above seems to be the crux of the problem, how did NULL get into
> > the
> > column data?
> >
> >
> > I agree. My queries are generated by Django (so I never write SQL
> > myself), but:
> >
> >   - the column has always been NOT NULL for its entire lifetime
>
> The lifetime being since the migration did this?:
>
> ALTER TABLE "search_docketentry" ADD COLUMN "recap_sequence_number"
> varchar(50) DEFAULT '' NOT NULL;
> ALTER TABLE "search_docketentry" ALTER COLUMN "recap_sequence_number"
> DROP DEFAULT;
>

"Lifetime" meaning that there was never a time when this column allowed
nulls.


> Also does the column recap_sequence_number appear in any other tables.
> Just wondering if the error was on another table?
>

Good idea, but no. This column only exists in one table.


> >   - we don't send *any* SQL commands to the replica yet, so that's not a
> > factor (for now it's just a live backup)
> >   - the publisher now has a NOT NULL constraint on that column. I never
> > had to clear out null values to put it in place. I assume that if that
>
> This part confuses me. You seem to imply that the column existed before
> the migration and you just added a NOT NULL constraint. The migration
> shows the column being created with a NOT NULL constraint.
>

Sorry, what I mean is that if *somehow* the master had null values in that
column at some point — which I don't see how would even be possible, because
it only came into existence with the command above — then I'd know, because I
wouldn't have been *able* to add the NOT NULL constraint without first fixing
the data in that column, which I never did.

My contention is that for all these reasons, there should *never* have been
a null value in that column on master.


>
> > column ever had a null value and I tried to run a DDL to add a null
> > constraint, the DDL would have failed, right?
> >
> > Something feels wrong here, the more I think about it.
>
> A start would be to figure out what generated?:
>
> failing row contains (48064261, 2018-12-07 04:48:40.388377+00,
> 2018-12-07 04:48:40.388402+00, null, 576, , 4571214, null, null)
>

Yes, I completely agree. I can't think of any way that that should have
ever been created.


>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>


Temp tables

2018-12-09 Thread bhargav kamineni
Hi,

What happens when we create and insert/update data in TEMP tables? Does
that data really get written at the disk level or only at the buffer level,
and what happens to this data after the transaction completes?



Thanks
Banu


Re: Temp tables

2018-12-09 Thread David G. Johnston
On Sunday, December 9, 2018, bhargav kamineni  wrote:
>
> What happens when we create and insert/update data in TEMP tables? Does
> that data really get written at the disk level or only at the buffer level
>

Disk


> and what happens to this data after the transaction completes?
>

Your choice. See the ON COMMIT clause of the CREATE TEMPORARY TABLE command.
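
A small sketch of that choice (clause name and behavior as I understand them;
the default is ON COMMIT PRESERVE ROWS):

BEGIN;
CREATE TEMP TABLE scratch (id int) ON COMMIT DELETE ROWS;
INSERT INTO scratch VALUES (1);
SELECT count(*) FROM scratch;   -- 1 inside the transaction
COMMIT;
SELECT count(*) FROM scratch;   -- 0: the rows were discarded at commit
-- ON COMMIT DROP would remove the table itself; PRESERVE ROWS keeps the data
-- for the life of the session.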

David J.


Re: Errors with schema migration and logical replication — expected?

2018-12-09 Thread Adrian Klaver

On 12/9/18 10:47 AM, Mike Lissner wrote:




My contention is that for all these reasons, there should *never* have 
been a null value in that column on master.



 > column ever had a null value and I tried to run a DDL to add a null
 > constraint, the DDL would have failed, right?
 >
 > Something feels wrong here, the more I think about it.

A start would be to figure out what generated?:

failing row contains (48064261, 2018-12-07 04:48:40.388377+00,
2018-12-07 04:48:40.388402+00, null, 576, , 4571214, null, null)


Yes, I completely agree. I can't think of any way that that should have 
ever been created.


1) Using psql have you verified that NOT NULL is set on that column on 
the publisher?


2) And that the row that failed in the subscriber is in the publisher table?

3) That there are no NULL values in the publisher column?

Whatever the answers to 1), 2) and 3) are, the next question is:

4) Do you want/need recap_sequence_number to be NOT NULL?

a) If not then you could leave things as they are.

b) If so then you:

1) Have to figure out what is sending NULL values to the column.

   Maybe a model that has null=True set when it shouldn't be?

   A Form/ModelForm that is allowing None/Null?

   Some code that is operating outside the ORM e.g. doing a
   direct query using from django.db import connection.

2) Clean up the NULL values in the column in the subscriber
   and/or publisher.









--
Adrian Klaver
adrian.kla...@aklaver.com



Re: Errors with schema migration and logical replication — expected?

2018-12-09 Thread Mike Lissner
On Sun, Dec 9, 2018 at 12:42 PM Adrian Klaver wrote:

>
> 1) Using psql have you verified that NOT NULL is set on that column on
> the publisher?
>

Yes, on the publisher and the subscriber. That was my first step when I saw
the log lines about this.

> 2) And that the row that failed in the subscriber is in the publisher table?
>

Yep, it's there (though it doesn't show a null for that column, and I don't
know how it ever could have).


> 3) That there are no NULL values in the publisher column?
>

This on the publisher:

select * from search_docketentry where recap_sequence_number is null;

returns zero rows, so yeah, no nulls in there (which makes sense since
they're not allowed).

> Whatever the answers to 1), 2) and 3) are, the next question is:
>
> 4) Do you want/need recap_sequence_number to be NOT NULL?
>

Yes, and indeed that's how it always has been.

> a) If not then you could leave things as they are.
>

Well, I was able to fix this by briefly allowing nulls on the subscriber,
letting it catch up with the publisher, setting all nulls to empty strings
(a Django convention), and then disallowing nulls again. After letting it
catch up, there were 118 nulls on the subscriber in this column:

https://github.com/freelawproject/courtlistener/issues/919#issuecomment-445520185

That shouldn't be possible since nulls were never allowed in this column on
the publisher.
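
The recovery sequence described above, as a sketch run on the subscriber only
(the empty-string backfill follows the Django convention mentioned earlier):

ALTER TABLE search_docketentry ALTER COLUMN recap_sequence_number DROP NOT NULL;
-- ... let the subscription catch up, then:
UPDATE search_docketentry SET recap_sequence_number = '' WHERE recap_sequence_number IS NULL;
ALTER TABLE search_docketentry ALTER COLUMN recap_sequence_number SET NOT NULL;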


> b) If so then you:
>
> 1) Have to figure out what is sending NULL values to the column.
>
> Maybe a model that has null=True set when it shouldn't be?
>

Nope, never had that. I'm 100% certain.


> A Form/ModelForm that is allowing None/Null?
>

Even if that was the case, the error wouldn't have shown up on the
subscriber since that null would have never been allowed in the publisher.
But anyway, I don't use any forms with this column.


> Some code that is operating outside the ORM e.g. doing a
>direct query using from django.db import connection.
>

That's an idea, but like I said, nothing sends SQL to the subscriber (not
even read requests), and this shouldn't have been possible in the publisher
due to the NOT NULL constraint that has *always* been on that column.

> 2) Clean up the NULL values in the column in the subscriber
>and/or publisher.
>

There were only NULL values in the subscriber, never in the publisher.
Something is amiss here.

I appreciate all the responses. I'm scared to say so, but I think this is a
bug in logical replication. Somehow a null value appeared at the subscriber
that was never in the publisher.

I also still have this question/suggestion from my first email:

> Is the process for schema migrations documented somewhere beyond the
above?

Thank you again,

Mike


Re: Tables(s) that feed pg_controldata

2018-12-09 Thread Ian Barwick

On 12/09/2018 01:25 AM, Ron wrote:
>
> In v9.2 (yes, I know it's EOL; there's nothing I can do about it), what tables
> do I query to get these values, and can I also get them from the streamed
> replication host?
>
> - Database cluster state
> - Latest checkpoint location
> - Time of latest checkpoint

The pg_control file is written directly by PostgreSQL; it does not derive from
any tables.

From 9.6 there are a bunch of functions which will report values contained in
pg_controldata:


https://www.postgresql.org/docs/current/functions-info.html#FUNCTIONS-CONTROLDATA

but you're out of luck for 9.2. The only option to query the desired values via
SQL would be to write an extension which reads pg_controldata
(possibly as a backport of the above-mentioned functions).
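
For reference, on 9.6 and later the checkpoint values can be read with a plain
query (a sketch; column names per the functions-info page linked above):

SELECT checkpoint_lsn, checkpoint_time FROM pg_control_checkpoint();
SELECT * FROM pg_control_system();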


Regards

Ian Barwick


--
 Ian Barwick   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Tables(s) that feed pg_controldata

2018-12-09 Thread Ron

On 12/09/2018 07:51 PM, Ian Barwick wrote:

On 12/09/2018 01:25 AM, Ron wrote:
>
> In v9.2 (yes, I know it's EOL; there's nothing I can do about it), what tables
> do I query to get these values, and can I also get them from the streamed
> replication host?
>
> - Database cluster state
> - Latest checkpoint location
> - Time of latest checkpoint

The pg_control file is written directly by PostgreSQL; it does not derive from
any tables.

From 9.6 there are a bunch of functions which will report values contained in
pg_controldata:

https://www.postgresql.org/docs/current/functions-info.html#FUNCTIONS-CONTROLDATA

but you're out of luck for 9.2. The only option to query the desired values via
SQL would be to write an extension which reads pg_controldata
(possibly as a backport of the above-mentioned functions).


Thanks.  grep(1) and cut(1) make an adequate work-around.

--
Angular momentum makes the world go 'round.



Re: Tables(s) that feed pg_controldata

2018-12-09 Thread Michael Paquier
On Sun, Dec 09, 2018 at 08:10:57PM -0600, Ron wrote:
> On 12/09/2018 07:51 PM, Ian Barwick wrote:
>> but you're out of luck for 9.2. The only option to query the desired
>> values via
>> SQL would be to write an extension which reads pg_controldata
>> (possibly as a backport of the above-mentioned functions).
> 
> Thanks.  grep(1) and cut(1) make an adequate work-around.

Perhaps.  Even with that, writing an extension for 9.2 would require
copy-pasting a lot of code from pg_controldata.c, as well as the code
which was introduced for the various control functions.  Dirty, but
still doable.  Adding the control functions also refactored the code so
that getting control file data is easy for backend code (see
get_controlfile in controldata_utils.c).

(9.2 has been EOL for a year now; you may want to upgrade.)
--
Michael




Re: Tables(s) that feed pg_controldata

2018-12-09 Thread Ron

On 12/09/2018 08:50 PM, Michael Paquier wrote:
[snip]

(9.2 has been EOL for a year now; you may want to upgrade.)


You assume that the DBA controls the data; he doesn't.  It's the customer's 
data, and they control the OS, RDBMS and application versions.  We just 
migrated the big databases off ancient physical servers, RHEL5 and Pg 8.4 
only because the PCI auditors wouldn't pass it without the upgrade.  There 
are still some Windows 2003 / SQL 2005 servers running production.  That's 
the Pg 8.0 era...


--
Angular momentum makes the world go 'round.



Re: Tables(s) that feed pg_controldata

2018-12-09 Thread Ian Barwick

On 12/10/2018 11:50 AM, Michael Paquier wrote:

On Sun, Dec 09, 2018 at 08:10:57PM -0600, Ron wrote:

On 12/09/2018 07:51 PM, Ian Barwick wrote:

but you're out of luck for 9.2. The only option to query the desired
values via
SQL would be to write an extension which reads pg_controldata
(possibly as a backport of the above-mentioned functions).


Thanks.  grep(1) and cut(1) make an adequate work-around.


Perhaps.  Even with that, writing an extension for 9.2 would require
copy-pasting a lot of code from pg_controldata.c, as well as the code
which was introduced for the various control functions.  Dirty, but
still doable.


I've got some code which does just that to read pg_control as far
back as 9.3, and it is indeed not pretty ;).


Regards

Ian Barwick


--
 Ian Barwick   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: pg_ctl kill INT vs. pg_cancel_backend

2018-12-09 Thread Arthur Zakirov

Hello,

On 08.12.2018 20:30, Ron wrote:

Hi,

Is there a substantive difference between the two, or are they just 
different interfaces to the same action?


I think they both just execute kill() and send a signal to a process. 
But pg_cancel_backend(), after checking privileges, sends only SIGINT and 
only to a PostgreSQL backend process.

pg_ctl can send a specified signal to any process. From the documentation:

pg_ctl kill signal_name process_id

Here signal_name is HUP, INT and others.
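
A short sketch of the SQL-level route (the pg_stat_activity filter is just an
illustration — as written it cancels every other active query):

SELECT pg_cancel_backend(pid)
FROM pg_stat_activity
WHERE state = 'active'
  AND pid <> pg_backend_pid();

pg_ctl kill INT process_id would deliver the same SIGINT from the shell, but
without the privilege checks that pg_cancel_backend() performs.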

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company