Re: [BUGS] Pg_stat_replication shows sync standby with flush location behind primary in 9.1.5
On 04/10/12 19:06, Simon Riggs wrote: On 4 October 2012 05:32, Mark Kirkwood wrote: I am seeing the situation where the reported flush location for the sync standby (standby1 below) is *behind* the reported current xlog location of the primary. This is Postgres 9.1.5 , and I was under the impression that transactions initiated on the master do not commit until the corresponding wal is flushed on the sync standby. Now the standby is definitely working in sync mode, because stopping it halts all write transactions on the primary (sync_standby_names contains only standby1). So is the reported lag in flush location merely an artifact of timing in the query, or is there something else going on? [1] The writing of new WAL is independent of the wait that occurs on commit, so it is entirely possible, even desirable, that the observed effect occurs. Ah right - it did occur to me (after posting of course), that *other* non commit wal could be causing the effect... thank you for clarifying! This could be worth mentioning in docs for the view - as the context I've encountered this effect is folks writing scripts for replication lag etc. Cheers Mark -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown
On Tuesday, October 02, 2012 1:56 PM Heikki Linnakangas wrote: On 02.10.2012 10:36, Amit kapila wrote: > On Monday, October 01, 2012 4:08 PM Heikki Linnakangas wrote: >>> So let's think how this should ideally work from a user's point of view. >>> I think there should be just two settings: walsender_timeout and >>> walreceiver_timeout. walsender_timeout specifies how long a walsender >>> will keep a connection open if it doesn't hear from the walreceiver, and >>> walreceiver_timeout is the same for walreceiver. The system should >>> The Ping/Pong messages don't necessarily need to be new message types, >>> we can use the message types we currently have, perhaps with an >>> additional flag attached to them, to request the other side to reply >>> immediately. > >> Can't we make the decision to send reply immediately based on message type, >> because these message types will be unique. > >> To clarify my understanding, >> 1. the heartbeat message from walsender side will be keepalive message ('k') >> and from walreceiver side it will be Hot Standby feedback message ('h'). >> 2. the reply message from walreceiver side will be current reply message >> ('r'). > Yep. I wonder why need separate message types for Hot Standby Feedback > 'h' and Reply 'r', though. Seems it would be simpler to have just one > messasge type that includes all the fields from both messages. moved the contents for Hot Standby Feedback 'h' to Reply 'r' and use 'h' for heart-beat purpose. >> 3. currently there is no reply kind of message from walsender, so do we need >> to introduce one new message for it or can use some existing message only? >> if new, do we need to send any additional information along with it, >> for existing messages can we use keepalive message it self as reply message >> but with an additional byte >> to indicate it is reply? > Hmm, I think I'd prefer to use the existing Keepalive message 'k', with an > additional flag. Okay. I have done it in Patch. Thank you for suggestions. I have addressed your suggestions in patch attached with this mail. Following changes are done to support replication timeout in sender as well as receiver: 1. One new configuration parameter wal_receiver_timeout is added to detect timeout at receiver task. 2. Existing parameter replication_timeout is renamed to wal_sender_timeout. 3. Now PrimaryKeepaliveMessage structure is modified to add one more field to indicate whether keep-alive is of type 'r' (i.e. reply) or 'h' (i.e. heart-beat). 4. Now the keep-alive message from sender will be sent to standby if it was idle for more than or equal to half of wal_sender_timeout. In this case it will send keep-alive of type 'h'. 5. Once the standby receiver a keep-alive, it needs to send an immediate reply to primary to indicate connection is alive. 6. Now Reply message to send wal offset and Feedback message to send oldest transaction are merged into single Reply message. So now the structure StandbyReplyMessage is changed to add two more fields as xmin and epoch. Also StandbyHSFeedbackMessage structure is changed to remove xmin and epoch fields (as these are moved to StandbyReplyMessage). 7. Because of changes as in step-6, once receiver task receives some data from primary then it will only send Reply Message. 8. Same Reply message is sent in step-5 and step-7 but incase of step-5, then reply is sent immediately but incase of step-7, reply is sent if wal_receiver_status_interval has lapsed (this part is same as earlier). 9. Similar to sender, if receiver finds itself idle for more than or equal to half of configured wal_receiver_timeout, then it will send the hot-standby heartbeat. This heart-beat has been modified to send only sendTime. 10. Once sender task receiver heart-beat message from standby then it sends back the reply immediately. In this keep-alive message is sent of type 'r'. 11. If even after wal_sender_timeout no message received from standby then it will be considered as network break at sender task. 12. If even after wal_receiver_timeout no message received from primary then it will be considered as network break at receiver task. With Regards, Amit Kapila. replication_timeout_patch_v3.patch Description: replication_timeout_patch_v3.patch -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown
> -Original Message- > From: pgsql-bugs-ow...@postgresql.org [mailto:pgsql-bugs- > ow...@postgresql.org] On Behalf Of Amit kapila > Sent: Thursday, October 04, 2012 3:43 PM > To: Heikki Linnakangas > Cc: Fujii Masao; pgsql-bugs@postgresql.org; pgsql-hack...@postgresql.org > Subject: Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w > breakdown > > On Tuesday, October 02, 2012 1:56 PM Heikki Linnakangas wrote: > On 02.10.2012 10:36, Amit kapila wrote: > > On Monday, October 01, 2012 4:08 PM Heikki Linnakangas wrote: > >>> So let's think how this should ideally work from a user's point of > view. > >>> I think there should be just two settings: walsender_timeout and > >>> walreceiver_timeout. walsender_timeout specifies how long a > >>> walsender will keep a connection open if it doesn't hear from the > > Thank you for suggestions. > I have addressed your suggestions in patch attached with this mail. > > Following changes are done to support replication timeout in sender as > well as receiver: Testing Done for the Patch 1. Verified the value of new configuration parameter and changed configuration parameter using the show command (using Show of specific parameter as well as show all). 2. Verified the new configuration parameter in --describe-config. 3. Verified the existing parameter replication_timeout's new name in --describe-config. 4. Start primary and standby node with default timeout, leave it for sometime in idle situation. It should not error out due to network break error. 5. a. Start primary and standby node with default timeout, bring down the network. b. Both sender and receiver should be able to detect network break-down almost at same time. c. Once the network is up again, connection should get re-established successfully. 5. a. Start primary and standby node with wal_sender_timeout less than wal_receiver_timeout, bring down the network. b. Sender should be able to detect network break-down before receiver task. c. Once the network is up again, connection should get re-established successfully. 6. a. Start primary and standby node with wal_receiver_timeout less than wal_sender_timeout, bring down the network. b. Receiver should be able to detect network break-down before sender task. c. Once the network is up again, connection should get re-established successfully. 7. a. In 5th test case, change the value of wal_receiver_status_interval to more than wal_receiver_timeout and hence more than wal_sender_timeout. b. Then bring down the network down. c. Sender task should be able to detect network break-down once wal_sender_timeout has lapsed. d. Once the network is up again, connection should get re-established successfully. Intent of this test is to check there is no dependency of wal_sender_timeout on wal_receiver_status_interval for detection of Network break. All the above tests are passed. With Regards, Amit Kapila. -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
[BUGS] BUG #7583: Problem using INHERITS and LIKE
The following bug has been logged on the website: Bug reference: 7583 Logged by: Bryan Love Email address: bryan.l...@iovation.com PostgreSQL version: 9.2.1 Operating system: Centos 6 Description: After creating a table using LIKE and INHERITS, if you drop a column from the parent table that existed prior to creation of the child table, the column will not be dropped from the child table. Test Case: =# create table foo(col1 int, col2 int); =# create table bar (like foo) inherits (foo); NOTICE: merging column "col1" with inherited definition NOTICE: merging column "col2" with inherited definition CREATE TABLE =# alter table foo drop col1; ALTER TABLE =# \d bar Table "public.bar" Column | Type | Modifiers +-+--- col1 | integer | col2 | integer | Inherits: foo --- copy/paste block commands drop table bar; drop table foo; create table foo(col1 int, col2 int); create table bar (like foo) inherits (foo); alter table foo drop col1; \d bar -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #7583: Problem using INHERITS and LIKE
bryan.l...@iovation.com writes: > After creating a table using LIKE and INHERITS, if you drop a column from > the parent table that existed prior to creation of the child table, the > column will not be dropped from the child table. This is not a bug. LIKE says the column isn't inherited, so it exists in the child independently of whether it exists in the parent. Or, if you want to add it up: the child starts out with one local definition of the column (from LIKE) and one inherited (from INHERITS). Dropping the column from the parent removes the latter, but you still have the former, so the column stays. > =# create table bar (like foo) inherits (foo); Just out of curiosity, is there any actual use-case for such a silly thing? regards, tom lane -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #7573: data loss in corner case using delete_old_cluster.sh (pg_upgrade)
On Thu, Oct 4, 2012 at 10:40:19AM +1000, Maxim Boguk wrote: > > > Test database have a bit unusual tablespace layout: > > main tablespace partition was mounted inside data directory of the old > > cluster... > > E.g.: > > data directory - /var/lib/postgresql/9.2/main > > main tablespace (another partition mount point) - > > /var/lib/postgresql/9.2/main/largedb > > Can you show us the data directory path of the old and new clusters? > > > --old-datadir=/var/lib/postgresql/9.0/main > --new-datadir=/var/lib/postgresql/9.2/main > > second partition used as tablespace were mounted as: > /var/lib/postgresql/9.0/main/largedb > > > > pg_upgrade really doesn't know what is inside that old cluster, so it > just deletes everything under the data directory. > > > Hmm... may be good idea to try opposite way: > default directories and files layout in PostgreSQL data directory well > documented and almost never changes. > May be instead of rm -rf whole data directory try rm -rf only files and > directories which sure belong to the PostgreSQL? > > Something along with: > 1)rm -rf base global pg_clog pg_multixact ... and so on > 2)produce warning if any unusual files left in data directory after that (but > not delete them). > 3)delete data directory itself only if that directory completely empty after > step 1 and 2 > > PS: I know that solution will be not completely error-prone but it will > prevent > most probably data-loss scenarios. So it's better then nothing. > > PS: I also think deleting postgresql.conf and pg_hba.conf from old data > directory is wrong move too... if admin forget copy pg_hba.conf to the new > cluster - these settings could be lost forever after delete_old_cluster.sh . This all seems like a step backwards and adds complexity that will fail. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #7583: Problem using INHERITS and LIKE
Understood, thanks. Our use case is that we want to copy the indexes from the parent table as well, so we have to use (like ... including indexes). On Thu, Oct 4, 2012 at 7:38 AM, Tom Lane wrote: > bryan.l...@iovation.com writes: > > After creating a table using LIKE and INHERITS, if you drop a column from > > the parent table that existed prior to creation of the child table, the > > column will not be dropped from the child table. > > This is not a bug. LIKE says the column isn't inherited, so it exists > in the child independently of whether it exists in the parent. > > Or, if you want to add it up: the child starts out with one local > definition of the column (from LIKE) and one inherited (from INHERITS). > Dropping the column from the parent removes the latter, but you still > have the former, so the column stays. > > > =# create table bar (like foo) inherits (foo); > > Just out of curiosity, is there any actual use-case for such a silly > thing? > > regards, tom lane >