Re: [BUGS] Pg_stat_replication shows sync standby with flush location behind primary in 9.1.5

2012-10-04 Thread Mark Kirkwood

On 04/10/12 19:06, Simon Riggs wrote:

On 4 October 2012 05:32, Mark Kirkwood  wrote:

I am seeing the situation where the reported flush location for the sync
standby (standby1 below) is *behind* the reported current xlog location of
the primary. This is Postgres 9.1.5 , and I was under the impression that
transactions initiated on the master do not commit until the corresponding
wal is flushed on the sync standby.

Now the standby is definitely working in sync mode, because stopping it
halts all write transactions on the primary (sync_standby_names contains
only standby1). So is the reported lag in flush location merely an artifact
of timing in the query, or is there something else going on? [1]


The writing of new WAL is independent of the wait that occurs on
commit, so it is entirely possible, even desirable, that the observed
effect occurs.



Ah right - it did occur to me (after posting of course), that *other* 
non commit wal could be causing the effect... thank you for clarifying!


This could be worth mentioning in docs for the view - as the context 
I've encountered this effect is folks writing scripts for replication 
lag etc.


Cheers

Mark


--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown

2012-10-04 Thread Amit kapila
On Tuesday, October 02, 2012 1:56 PM Heikki Linnakangas wrote:
On 02.10.2012 10:36, Amit kapila wrote:
> On Monday, October 01, 2012 4:08 PM Heikki Linnakangas wrote:
>>> So let's think how this should ideally work from a user's point of view.
>>> I think there should be just two settings: walsender_timeout and
>>> walreceiver_timeout. walsender_timeout specifies how long a walsender
>>> will keep a connection open if it doesn't hear from the walreceiver, and
>>> walreceiver_timeout is the same for walreceiver. The system should


>>> The Ping/Pong messages don't necessarily need to be new message types,
>>> we can use the message types we currently have, perhaps with an
>>> additional flag attached to them, to request the other side to reply
>>> immediately.
>
>> Can't we make the decision to send reply immediately based on message type, 
>> because these message types will be unique.
>
>> To clarify my understanding,
>> 1. the heartbeat message from walsender side will be keepalive message ('k') 
>> and from walreceiver side it will be Hot Standby feedback message ('h').
>> 2. the reply message from walreceiver side will be current reply message 
>> ('r').

> Yep. I wonder why need separate message types for Hot Standby Feedback
> 'h' and Reply 'r', though. Seems it would be simpler to have just one
> messasge type that includes all the fields from both messages.

moved the contents for Hot Standby Feedback 'h' to Reply 'r' and use 'h' for 
heart-beat purpose.

>> 3. currently there is no reply kind of message from walsender, so do we need 
>> to introduce one new message for it or can use some existing message only?
>>  if new, do we need to send any additional information along with it, 
>> for existing messages can we use keepalive message it self as reply message 
>> but with an additional byte
>>  to indicate it is reply?

> Hmm, I think I'd prefer to use the existing Keepalive message 'k', with an 
> additional flag.
   Okay. I have done it in Patch.

Thank you for suggestions. 
I have addressed your suggestions in patch attached with this mail.

Following changes are done to support replication timeout in sender as well as 
receiver: 

1. One new configuration parameter wal_receiver_timeout is added to detect 
timeout at receiver task. 
2. Existing parameter replication_timeout is renamed to wal_sender_timeout. 
3. Now PrimaryKeepaliveMessage structure is modified to add one more field to 
indicate whether keep-alive is of type 'r' (i.e. 
reply) or 'h' (i.e. heart-beat). 
4. Now the keep-alive message from sender will be sent to standby if it was 
idle for more than or equal to half of wal_sender_timeout. 
In this case it will send keep-alive of type 'h'. 
5. Once the standby receiver a keep-alive, it needs to send an immediate reply 
to primary to indicate connection is alive. 
6. Now Reply message to send wal offset and Feedback message to send oldest 
transaction are merged into single Reply message. 
So now the structure StandbyReplyMessage is changed to add two more fields 
as xmin and epoch. Also StandbyHSFeedbackMessage 
structure is changed to remove xmin and epoch fields (as these are moved to 
StandbyReplyMessage). 
7. Because of changes as in step-6, once receiver task receives some data from 
primary then it will only send Reply Message. 
8. Same Reply message is sent in step-5 and step-7 but incase of step-5, then 
reply is sent immediately but incase of step-7, reply is sent 
 if wal_receiver_status_interval has lapsed (this part is same as earlier). 
9. Similar to sender, if receiver finds itself idle for more than or equal to 
half of configured wal_receiver_timeout, then it will send the 
 hot-standby heartbeat. This heart-beat has been modified to send only 
sendTime. 
10. Once sender task receiver heart-beat message from standby then it sends 
back the reply immediately. In this keep-alive message is 
   sent of type 'r'. 
11. If even after wal_sender_timeout no message received from standby then it 
will be considered as network break at sender task. 
12. If even after wal_receiver_timeout no message received from primary then it 
will be considered as network break at receiver task. 


With Regards,
Amit Kapila.

replication_timeout_patch_v3.patch
Description: replication_timeout_patch_v3.patch

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown

2012-10-04 Thread Amit Kapila


> -Original Message-
> From: pgsql-bugs-ow...@postgresql.org [mailto:pgsql-bugs-
> ow...@postgresql.org] On Behalf Of Amit kapila
> Sent: Thursday, October 04, 2012 3:43 PM
> To: Heikki Linnakangas
> Cc: Fujii Masao; pgsql-bugs@postgresql.org; pgsql-hack...@postgresql.org
> Subject: Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w
> breakdown
> 
> On Tuesday, October 02, 2012 1:56 PM Heikki Linnakangas wrote:
> On 02.10.2012 10:36, Amit kapila wrote:
> > On Monday, October 01, 2012 4:08 PM Heikki Linnakangas wrote:
> >>> So let's think how this should ideally work from a user's point of
> view.
> >>> I think there should be just two settings: walsender_timeout and
> >>> walreceiver_timeout. walsender_timeout specifies how long a
> >>> walsender will keep a connection open if it doesn't hear from the

> 
> Thank you for suggestions.
> I have addressed your suggestions in patch attached with this mail.
> 
> Following changes are done to support replication timeout in sender as
> well as receiver:


Testing Done for the Patch

1. Verified the value of new configuration parameter and changed
configuration parameter using the show command (using Show of specific 
   parameter as well as show all). 
2. Verified the new configuration parameter in --describe-config. 
3. Verified the existing parameter replication_timeout's new name in
--describe-config. 
4. Start primary and standby node with default timeout, leave it for
sometime in idle situation. 
   It should not error out due to network break error. 
5. a. Start primary and standby node with default timeout, bring down the
network. 
   b. Both sender and receiver should be able to detect network break-down
almost at same time. 
   c. Once the network is up again, connection should get re-established
successfully. 
5. a. Start primary and standby node with wal_sender_timeout less than
wal_receiver_timeout, bring down the network. 
   b. Sender should be able to detect network break-down before receiver
task. 
   c. Once the network is up again, connection should get re-established
successfully. 
6. a. Start primary and standby node with wal_receiver_timeout less than
wal_sender_timeout, bring down the network. 
   b. Receiver should be able to detect network break-down before sender
task. 
   c. Once the network is up again, connection should get re-established
successfully. 
7. a. In 5th test case, change the value of wal_receiver_status_interval to
more than wal_receiver_timeout and hence more than  
  wal_sender_timeout. 
   b. Then bring down the network down.
   c. Sender task should be able to detect network break-down once
wal_sender_timeout has lapsed. 
   d. Once the network is up again, connection should get re-established
successfully.
   Intent of this test is to check there is no dependency of
wal_sender_timeout on wal_receiver_status_interval for detection of
   Network break.

All the above tests are passed. 

With Regards,
Amit Kapila.
 



-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


[BUGS] BUG #7583: Problem using INHERITS and LIKE

2012-10-04 Thread bryan . love
The following bug has been logged on the website:

Bug reference:  7583
Logged by:  Bryan Love
Email address:  bryan.l...@iovation.com
PostgreSQL version: 9.2.1
Operating system:   Centos 6
Description:

After creating a table using LIKE and INHERITS, if you drop a column from
the parent table that existed prior to creation of the child table, the
column will not be dropped from the child table.

Test Case:

=# create table foo(col1 int, col2 int);
=# create table bar (like foo) inherits (foo);
NOTICE:  merging column "col1" with inherited definition
NOTICE:  merging column "col2" with inherited definition
CREATE TABLE
=# alter table foo drop col1;
ALTER TABLE
=# \d bar
  Table "public.bar"
 Column |  Type   | Modifiers
+-+---
 col1   | integer |
 col2   | integer |
Inherits: foo

--- copy/paste block commands 
drop table bar;
drop table foo;
create table foo(col1 int, col2 int);
create table bar (like foo) inherits (foo);
alter table foo drop col1;
\d bar



-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #7583: Problem using INHERITS and LIKE

2012-10-04 Thread Tom Lane
bryan.l...@iovation.com writes:
> After creating a table using LIKE and INHERITS, if you drop a column from
> the parent table that existed prior to creation of the child table, the
> column will not be dropped from the child table.

This is not a bug.  LIKE says the column isn't inherited, so it exists
in the child independently of whether it exists in the parent.

Or, if you want to add it up: the child starts out with one local
definition of the column (from LIKE) and one inherited (from INHERITS).
Dropping the column from the parent removes the latter, but you still
have the former, so the column stays.

> =# create table bar (like foo) inherits (foo);

Just out of curiosity, is there any actual use-case for such a silly
thing?

regards, tom lane


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #7573: data loss in corner case using delete_old_cluster.sh (pg_upgrade)

2012-10-04 Thread Bruce Momjian
On Thu, Oct  4, 2012 at 10:40:19AM +1000, Maxim Boguk wrote:
> 
> > Test database have a bit unusual tablespace layout:
> > main tablespace partition was mounted inside data directory of the old
> > cluster...
> > E.g.:
> > data directory - /var/lib/postgresql/9.2/main
> > main tablespace (another partition mount point) -
> > /var/lib/postgresql/9.2/main/largedb
> 
> Can you show us the data directory path of the old and new clusters?
> 
> 
> --old-datadir=/var/lib/postgresql/9.0/main
> --new-datadir=/var/lib/postgresql/9.2/main
> 
> second partition used as tablespace were mounted as:
> /var/lib/postgresql/9.0/main/largedb
> 
> 
> 
> pg_upgrade really doesn't know what is inside that old cluster, so it
> just deletes everything under the data directory.
> 
> 
> Hmm... may be good idea to try opposite way:
> default directories and files layout in PostgreSQL data directory well
> documented and almost never changes.
> May be instead of rm -rf whole data directory try rm -rf only files and
> directories which sure belong to the PostgreSQL?
> 
> Something along with:
> 1)rm -rf base global pg_clog pg_multixact ... and so on
> 2)produce warning if any unusual files left in data directory after that (but
> not delete them).
> 3)delete data directory itself only if that directory completely empty after
> step 1 and 2
> 
> PS: I know that solution will be not completely error-prone but it will 
> prevent
> most probably data-loss scenarios. So it's better then nothing.
> 
> PS: I also think deleting postgresql.conf and pg_hba.conf from old data
> directory is wrong move too... if admin forget copy pg_hba.conf to the new
> cluster - these settings could be lost forever after delete_old_cluster.sh .

This all seems like a step backwards and adds complexity that will fail.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #7583: Problem using INHERITS and LIKE

2012-10-04 Thread Bryan Love
Understood, thanks.

Our use case is that we want to copy the indexes from the parent table as
well, so we have to use (like ... including indexes).

On Thu, Oct 4, 2012 at 7:38 AM, Tom Lane  wrote:

> bryan.l...@iovation.com writes:
> > After creating a table using LIKE and INHERITS, if you drop a column from
> > the parent table that existed prior to creation of the child table, the
> > column will not be dropped from the child table.
>
> This is not a bug.  LIKE says the column isn't inherited, so it exists
> in the child independently of whether it exists in the parent.
>
> Or, if you want to add it up: the child starts out with one local
> definition of the column (from LIKE) and one inherited (from INHERITS).
> Dropping the column from the parent removes the latter, but you still
> have the former, so the column stays.
>
> > =# create table bar (like foo) inherits (foo);
>
> Just out of curiosity, is there any actual use-case for such a silly
> thing?
>
> regards, tom lane
>