[BUGS] BUG #7532: cannot update to 9.2
The following bug has been logged on the website:

Bug reference:      7532
Logged by:          arnes
Email address:      arsen_...@163.com
PostgreSQL version: 9.1.5
Operating system:   Ubuntu 11.04 (natty)
Description:

I cannot update my PostgreSQL to 9.2. Is there anything wrong with the
repository? My friend uses Ubuntu 11.10, and he can update to 9.2.
Re: [BUGS] BUG #7532: cannot update to 9.2
Hello Arsen,

arsen_...@163.com [2012-09-12 1:48 +0000]:
> Bug reference:      7532
> Logged by:          arnes
> Email address:      arsen_...@163.com
> PostgreSQL version: 9.1.5
> Operating system:   ubuntu 11.04 natty
> Description:
>
> I cannot update my postgresql to 9.2. Is there anything wrong with the
> repository?
> My friend uses ubuntu 11.10, and he can update to 9.2.

This is not an upstream bug for this list. Neither Ubuntu 11.04 nor
Ubuntu 11.10 contains 9.2, as at that time 9.2 wasn't released yet.
Presumably your friend is using
https://launchpad.net/~pitti/+archive/postgresql or
http://pgapt.debian.net/ (but be aware that neither is officially
supported).

Martin

--
Martin Pitt                       | http://www.piware.de
Ubuntu Developer (www.ubuntu.com) | Debian Developer (www.debian.org)
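For readers following Martin's pointer, a hedged sketch of adding that PPA
on Ubuntu — the ppa: name is derived from the Launchpad URL above, and the
postgresql-9.2 package name is an assumption about the PPA's contents; as
noted, the archive is not officially supported:

    # Sketch only: add the backport PPA and install 9.2 from it
    sudo add-apt-repository ppa:pitti/postgresql
    sudo apt-get update
    sudo apt-get install postgresql-9.2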
Re: [BUGS] BUG #6704: ALTER EXTENSION postgis SET SCHEMA leaves dangling relations
Hi,

Sorry for being late at the party… been distracted away…

Bruce Momjian writes:
> On Fri, Jun 22, 2012 at 10:37:10PM -0400, Tom Lane wrote:
>> j...@pgexperts.com writes:
>> > DROP and CREATE extension appear to work fine, but if you ALTER EXTENSION
>> > postgis SET SCHEMA foo, it leaves a few relations behind.
>>
>> What it seems to be leaving behind is indexes ... also relation rowtypes.
>>
>> A bit of looking shows that ALTER EXTENSION SET SCHEMA calls
>> AlterObjectNamespace_oid on the table.  AlterObjectNamespace_oid
>> calls AlterRelationNamespaceInternal, and nothing else.  In comparison,
>> ALTER TABLE SET SCHEMA (AlterTableNamespace) calls
>> AlterRelationNamespaceInternal and about four other things.  I'm not
>> sure if this was broken before the last round of refactoring in this
>> area, but for sure it's broken now.

Looking at that code, my theory of how we got here is that in the
submitted extension patch I only used DEPENDENCY_INTERNAL, and Tom
introduced the much better DEPENDENCY_EXTENSION tracking. With the
former, indexes and sequences and constraints were found by the
dependency-walking code, but in the latter model only the main relation
is registered.

I need to do some testing on dependency tracking for SERIAL-generated
sequences compared to manually created sequences in extension scripts; I
think we track sequences directly only in the manual case.

I think we need to share more code between
AlterRelationNamespaceInternal and AlterTableNamespace, but I'm not sure
whether that's exactly what Álvaro tried with his _oid() attempt that
failed.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
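For reference, the DEPENDENCY_EXTENSION rows under discussion are visible
in pg_depend as deptype 'e'; a hedged query sketch ('postgis' is a
placeholder extension name):

    -- Sketch: list objects recorded as belonging to an extension
    -- ('e' = DEPENDENCY_EXTENSION).
    SELECT classid::regclass AS catalog, objid, deptype
    FROM pg_depend
    WHERE refclassid = 'pg_extension'::regclass
      AND refobjid = (SELECT oid FROM pg_extension WHERE extname = 'postgis')
      AND deptype = 'e';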
Re: [BUGS] BUG #7516: PL/Perl crash
On 9/12/12 1:50 AM, Tom Lane wrote:
> Marko Tiikkaja writes:
>> Joel Jacobson managed to narrow it down to this test case, which
>> crashes consistently on Ubuntu 12.04 both with and without your patch.
>> I, however, wasn't able to reproduce the problem on my OS X Mountain Lion.
>
> Doesn't reproduce for me either ...

Ok, I can reproduce it on my Ubuntu virtual machine. The problem looks
like this (several values below appear as <optimized out> in gdb; the
archive had stripped the angle-bracketed text, reconstructed here):

#0  free_plperl_function (prodesc=0x24195a0) at plperl.c:2428
#1  0x7ff47babc045 in plperl_call_handler (fcinfo=0x247c160) at plperl.c:1710
#2  0x005663fa in ExecMakeFunctionResult (fcache=0x247c0f0, econtext=0x247bf00, isNull=0x247ca78 "", isDone=0x247cb90) at execQual.c:1917
#3  0x00568942 in ExecTargetList (isDone=0x7fff1402fc0c, itemIsDone=0x247cb90, isnull=0x247ca78 "", values=0x247ca60, econtext=0x247bf00, targetlist=0x247cb60) at execQual.c:5210
#4  ExecProject (projInfo=<optimized out>, isDone=0x7fff1402fc0c) at execQual.c:5425
#5  0x00578aea in ExecResult (node=0x247bdf0) at nodeResult.c:155
#6  0x00561b18 in ExecProcNode (node=0x247bdf0) at execProcnode.c:367
#7  0x0055ef2a in ExecutePlan (dest=0xad4600, direction=<optimized out>, numberTuples=0, sendTuples=1 '\001', operation=CMD_SELECT, planstate=0x247bdf0, estate=0x247bce0) at execMain.c:1440
#8  standard_ExecutorRun (queryDesc=0x244f6d0, direction=<optimized out>, count=0) at execMain.c:314
#9  0x0058192d in _SPI_pquery (tcount=<optimized out>, fire_triggers=1 '\001', queryDesc=<optimized out>) at spi.c:2110
#10 _SPI_execute_plan (paramLI=0x245e1f0, snapshot=<optimized out>, crosscheck_snapshot=0x0, read_only=0 '\000', fire_triggers=1 '\001', tcount=0, plan=<optimized out>) at spi.c:1922
#11 0x00581e57 in SPI_execute_plan_with_paramlist (plan=0x2476c30, params=0x245e1f0, read_only=0 '\000', tcount=0) at spi.c:423
#12 0x7ff47b0c7075 in exec_run_select (estate=0x7fff14030270, expr=0x24569f0, maxtuples=0, portalP=0x0) at pl_exec.c:4573
#13 0x7ff47b0ca7b4 in exec_stmt_perform (estate=0x7fff14030270, stmt=<optimized out>) at pl_exec.c:1413
#14 exec_stmt (stmt=0x2456ab0, estate=0x7fff14030270) at pl_exec.c:1289
#15 exec_stmts (estate=0x7fff14030270, stmts=<optimized out>) at pl_exec.c:1248
#16 0x7ff47b0cce59 in exec_stmt_block (estate=0x7fff14030270, block=0x2457448) at pl_exec.c:1047
#17 0x7ff47b0ca798 in exec_stmt (stmt=0x2457448, estate=0x7fff14030270) at pl_exec.c:1281
#18 exec_stmts (estate=0x7fff14030270, stmts=<optimized out>) at pl_exec.c:1248
#19 0x7ff47b0ccfcc in exec_stmt_block (estate=0x7fff14030270, block=0x2457508) at pl_exec.c:1186
#20 0x7ff47b0cda37 in plpgsql_exec_function (func=0x243a140, fcinfo=0x7fff14030580) at pl_exec.c:324
#21 0x7ff47b0c26d3 in plpgsql_call_handler (fcinfo=0x7fff14030580) at pl_handler.c:122
#22 0x00566d4d in ExecMakeTableFunctionResult (funcexpr=0x2443368, econtext=0x24428b0, expectedDesc=0x2443110, randomAccess=0 '\000') at execQual.c:2146
#23 0x00578381 in FunctionNext (node=0x24427a0) at nodeFunctionscan.c:65
#24 0x00568dde in ExecScanFetch (recheckMtd=0x578300 <FunctionRecheck>, accessMtd=0x578310 <FunctionNext>, node=0x24427a0) at execScan.c:82
#25 ExecScan (node=0x24427a0, accessMtd=0x578310 <FunctionNext>, recheckMtd=0x578300 <FunctionRecheck>) at execScan.c:132
#26 0x00561a78 in ExecProcNode (node=0x24427a0) at execProcnode.c:416
#27 0x0055ef2a in ExecutePlan (dest=0xad4600, direction=<optimized out>, numberTuples=0, sendTuples=1 '\001', operation=CMD_SELECT, planstate=0x24427a0, estate=0x2442690) at execMain.c:1440
#28 standard_ExecutorRun (queryDesc=0x243f700, direction=<optimized out>, count=0) at execMain.c:314
#29 0x0058192d in _SPI_pquery (tcount=<optimized out>, fire_triggers=1 '\001', queryDesc=<optimized out>) at spi.c:2110
#30 _SPI_execute_plan (paramLI=0x243f670, snapshot=<optimized out>, crosscheck_snapshot=0x0, read_only=0 '\000', fire_triggers=1 '\001', tcount=0, plan=<optimized out>) at spi.c:1922
#31 0x00581f39 in SPI_execute_plan (plan=0x23ee750, Values=0x243df38, Nulls=0x24376b0 " RCHAR", read_only=0 '\000', tcount=0) at spi.c:391
#32 0x7ff47babce2d in plperl_spi_exec_prepared (query=0x2437690 "0x22930e0", attr=0x0, argc=2, argv=0x243df18) at plperl.c:3425
#33 0x7ff47babe810 in XS__spi_exec_prepared (my_perl=<optimized out>, cv=<optimized out>) at SPI.xs:141
#34 0x7ff47b7e67ff in Perl_pp_entersub (my_perl=0x237d800) at pp_hot.c:3046
#35 0x7ff47b7ddc96 in Perl_runops_standard (my_perl=0x237d800) at run.c:41
#36 0x7ff47b77959e in Perl_call_sv (my_perl=0x237d800, sv=0x24017f8, flags=10) at perl.c:2647
#37 0x7ff47bab5f62 in plperl_call_perl_func (desc=0x24195a0, fcinfo=0x242fa90) at plperl.c:2056
#38 0x7ff47baba45b in plperl_func_handler (fcinfo=0x242fa90) at plperl.c:2196
#39 plperl_call_handler (fcinfo=0x242fa90) at plperl.c:1705
#40 0x005663fa in ExecMakeFunctionResult (fcache=0x242fa20, econtext=0x242f830, isNull=0x24303a8 "", isDone=0x24304c0) at execQual.c:1917
#41 0x00568942 in ExecTargetList (isDone=0x7fff140312fc, itemIsDone=0x24304c0, isnull=0x24303a8 "", values=0x2430390, econtext=0x242f830, targetlist=0x2430490) at execQual.c:5210
#42 ExecProject (projInfo=<optimized out>, is
[BUGS] BUG #7533: Client is not able to connect cascade standby incase basebackup is taken from hot standby
The following bug has been logged on the website:

Bug reference:      7533
Logged by:          Amit Kapila
Email address:      amit.kap...@huawei.com
PostgreSQL version: 9.2.0
Operating system:   Suse
Description:

M host is primary, S host is standby and CS host is cascaded standby.

1. Set up postgresql-9.2beta2/RC1 on all hosts.
2. Execute command initdb on host M to create a fresh database.
3. Modify the configuration file postgresql.conf on host M like this:
       listen_addresses = 'M'
       port = 15210
       wal_level = hot_standby
       max_wal_senders = 4
       hot_standby = on
4. Modify the configuration file pg_hba.conf on host M like this:
       host  replication  repl  M/24  md5
5. Start the server on host M as primary.
6. Connect one client to the primary server and create a user ‘repl’:
       Create user repl superuser password '123';
7. Use the command pg_basebackup on host S to retrieve the database of the
   primary host:
       pg_basebackup -D /opt/t38917/data -F p -x fetch -c fast -l repl_backup -P -v -h M -p 15210 -U repl -W
8. Copy recovery.conf.sample from the share folder of the package to the
   database folder of host S, then rename this file to recovery.conf.
9. Modify the file recovery.conf on host S as below:
       standby_mode = on
       primary_conninfo = 'host=M port=15210 user=repl password=123'
10. Modify the file postgresql.conf on host S as follows:
       listen_addresses = 'S'
11. Start the server on host S as standby server.
12. Use the command pg_basebackup on host CS to retrieve the database of
    the standby host:
       pg_basebackup -D /opt/t38917/data -F p -x fetch -c fast -l repl_backup -P -v -h M -p 15210 -U repl -W
13. Modify the file recovery.conf on host CS as below:
       standby_mode = on
       primary_conninfo = 'host=S port=15210 user=repl password=123'
14. Modify the file postgresql.conf on host CS as follows:
       listen_addresses = 'CS'
15. Start the server on host CS as cascaded standby server node.
16. Try to connect a client to host CS, but it gives the error:
       FATAL: the database system is starting up

Observations related to bug
---------------------------
In the above scenario it is observed that the startup process has read all
data (in our defect scenario minRecoveryPoint is 5016220) up to the
position 5016220, and then it checks for recovery consistency with the
following condition in function CheckRecoveryConsistency:

    if (!reachedConsistency &&
        XLByteLE(minRecoveryPoint, EndRecPtr) &&
        XLogRecPtrIsInvalid(ControlFile->backupStartPoint))

At this point the first two conditions are true, but the last condition is
not true because redo has not yet been applied and hence backupStartPoint
has not been reset. So it does not signal the postmaster about the
consistent state. After this it applies the redo, resets backupStartPoint,
and goes to read the next set of records. Since all records have already
been read, it starts waiting for a new record from the standby node. But
since no new record is coming from the standby node, it keeps waiting and
never gets a chance to recheck the recovery consistency level, and hence
the client connection is not allowed.
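For reference, the failure in step 16 as seen from the command line would
look roughly like this (host and port as in the steps above;
pg_is_in_recovery() is just an arbitrary probe query — the connection is
rejected before any query runs):

    $ psql -h CS -p 15210 -U repl postgres -c 'select pg_is_in_recovery()'
    psql: FATAL:  the database system is starting up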
[BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown
The following bug has been logged on the website:

Bug reference:      7534
Logged by:          Amit Kapila
Email address:      amit.kap...@huawei.com
PostgreSQL version: 9.2.0
Operating system:   Suse 10
Description:

1. Both master and standby machines are connected normally.
2. Then use the command "ifconfig ip down" to take the network card
   between master and standby down.

Observation
-----------
The master can detect the connection abnormality, but the standby can't
detect it and shows a connected channel for a long time.
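A hedged way to observe the asymmetry described above: on the master, the
walsender entry drops out of pg_stat_replication once a send fails, while
the standby's walreceiver still believes the channel is up. A sketch
(columns per 9.2):

    -- Run on the master: one row per connected walsender
    SELECT pid, state, client_addr, sent_location
    FROM pg_stat_replication;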
Re: [BUGS] BUG #6704: ALTER EXTENSION postgis SET SCHEMA leaves dangling relations
Excerpts from Dimitri Fontaine's message of mié sep 12 06:51:43 -0300 2012:
> Hi,
>
> Sorry for being late at the party… been distracted away…

Welcome ;-)

> > On Fri, Jun 22, 2012 at 10:37:10PM -0400, Tom Lane wrote:
> >> A bit of looking shows that ALTER EXTENSION SET SCHEMA calls
> >> AlterObjectNamespace_oid on the table.  AlterObjectNamespace_oid
> >> calls AlterRelationNamespaceInternal, and nothing else.  In comparison,
> >> ALTER TABLE SET SCHEMA (AlterTableNamespace) calls
> >> AlterRelationNamespaceInternal and about four other things.  I'm not
> >> sure if this was broken before the last round of refactoring in this
> >> area, but for sure it's broken now.
>
> Looking at that code, my theory of how we got there is that in the
> submitted extension patch I did only use DEPENDENCY_INTERNAL and Tom
> introduced the much better DEPENDENCY_EXTENSION tracking. With the
> former, indexes and sequences and constraints were found in the
> dependency walking code, but only the main relation is now registered in
> the latter model.
>
> I need to do some testing about dependency tracking on SERIAL generated
> sequences compared to manually created sequences in extension scripts, I
> think we track sequences directly only in the manual case.

Well, what I saw was that both the table and its SERIAL-generated
sequence got a DEPENDENCY_EXTENSION row in pg_depend, which is exactly
what (IMV) causes the problem. One of my proposals is to tweak the code
to avoid that row (but if we do that, then we need to do something about
databases that contain such rows today).

> I think we need to share more code in between
> AlterRelationNamespaceInternal and AlterTableNamespace, but I'm not sure
> if that's not exactly what Álvaro did try with his _oid() attempt that
> failed.

What I did was create an AlterTableNamespaceInternal that goes through
all the things that must be moved (this means calling
AlterRelationNamespaceInternal for the table, and then doing some more
calls to move the sequences, indexes, constraints). With this change,
both AlterTableNamespace and AlterObjectNamespace_oid can call it.
Previously, AlterObjectNamespace_oid was calling
AlterRelationNamespaceInternal directly for the relation only.

As far as I can see, that split of AlterTableNamespace is still needed.
The problem here is that we need something *beyond* that fix.

Did you look at my patches? Am I making sense?

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
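A hedged sketch of the symptom Álvaro describes — a table and its
SERIAL-generated sequence each carrying their own extension-dependency
row (this lists every relation directly pinned by any extension; a
sequence appearing here alongside its table is the double registration
in question):

    -- Sketch: relations with a direct DEPENDENCY_EXTENSION ('e') row
    SELECT d.objid::regclass AS relation, c.relkind
    FROM pg_depend d
    JOIN pg_class c ON c.oid = d.objid
    WHERE d.refclassid = 'pg_extension'::regclass
      AND d.classid = 'pg_class'::regclass
      AND d.deptype = 'e';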
Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown
On Wed, Sep 12, 2012 at 1:54 PM, wrote:
> The following bug has been logged on the website:
>
> Bug reference:      7534
> Logged by:          Amit Kapila
> Email address:      amit.kap...@huawei.com
> PostgreSQL version: 9.2.0
> Operating system:   Suse 10
> Description:
>
> 1. Both master and standby machine are connected normally,
> 2. then you use the command: ifconfig ip down; make the network card of
> master and standby down,
>
> Observation
> master can detect connect abnormal, but the standby can't detect connect
> abnormal and show a connected channel long time.

The master will detect it quicker, because it will get an error when it
tries to send something. But the standby should detect it either when
sending the feedback message (what's your wal_receiver_status_interval
set to?) or when the kernel does (have you configured the tcp keepalive
on the slave somehow?)

Oh, and what do you actually mean by "long time"?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
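For reference, the feedback interval Magnus refers to is a standby-side
setting; a minimal postgresql.conf sketch (10s is the 9.2 default, and 0
disables the status messages entirely):

    # Standby postgresql.conf: how often the walreceiver sends a
    # status (feedback) message back to the master
    wal_receiver_status_interval = 10s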
Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown
On Wed, Sep 12, 2012 at 8:54 PM, wrote:
> The following bug has been logged on the website:
>
> Bug reference:      7534
> Logged by:          Amit Kapila
> Email address:      amit.kap...@huawei.com
> PostgreSQL version: 9.2.0
> Operating system:   Suse 10
> Description:
>
> 1. Both master and standby machine are connected normally,
> 2. then you use the command: ifconfig ip down; make the network card of
> master and standby down,
>
> Observation
> master can detect connect abnormal, but the standby can't detect connect
> abnormal and show a connected channel long time.

What about setting keepalives_xxx libpq parameters?
http://www.postgresql.org/docs/devel/static/libpq-connect.html#LIBPQ-PARAMKEYWORDS

Keepalives are not a perfect solution for detecting a terminated
connection, but they would help to a certain extent.

If you need something like a walreceiver version of replication_timeout,
such a feature has not been implemented yet. Please feel free to
implement that!

Regards,

--
Fujii Masao
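A hedged sketch of what Fujii's suggestion looks like in the standby's
recovery.conf — the keepalive values are illustrative, and host, port and
credentials follow the placeholders used in the report:

    standby_mode = on
    primary_conninfo = 'host=M port=15210 user=repl password=123 keepalives=1 keepalives_idle=60 keepalives_interval=10 keepalives_count=3'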
Re: [BUGS] initdb.exe changes --locale option
I've found a general solution: in the locale string, replace the first
", " (comma space) with "_". Around line 33 of initcluster.vbs, add:

    strLocale = Replace(strLocale, ", ", "_", 1, 1)

I think it is fine to show "English, New Zealand" in the drop-down menu
for the GUI installer, but initcluster.vbs needs to do the replacement to
"English_New Zealand" in order to perform the correct initialisation.

My testing was conducted using a Python script
(http://pastebin.com/9epyWz7x) which produces a tab-delimited table of
input locales, the locale chosen by initdb.exe, and the default language
for text search configuration. The results from 200 locales show some
significant problems with locale detection, such that most
"Language, Country" inputs are substituted with only one country (you
will pick up the pattern if you look at the data). Secondly, there are
cases that are completely off: "Tamazight (Latin), Algeria" becomes
"English_United Kingdom.1252", which is corrected to
"Tamazight (Latin)_Algeria.1252" with the proper substitution.

However, there are three corner cases (of 200) that either sort-of break
things, or don't resolve anything:

    Original: Chinese (Traditional), Macao S.A.R. : Chinese (Traditional)_Taiwan.950
    Replaced: Chinese (Traditional)_Macao S.A.R. : English_United Kingdom.1252

    Original: Lao, Lao P.D.R. : Lao_Lao P.D.R..1252
    Replaced: Lao_Lao P.D.R. : English_United Kingdom.1252

    Original: Norwegian (Bokmål), Norway : English_United Kingdom.1252
    Replaced: Norwegian (Bokmål)_Norway : English_United Kingdom.1252

(Note: I'm testing on a Windows Vista computer from the UK.)

Lastly, I had a look at the source code of initdb.c, which appears to
assume only a POSIX locale of the format:

    [language[_territory][.codeset][@modifier]]

E.g., see find_matching_ts_config, which assumes this locale format:
http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/bin/initdb/initdb.c;h=824c7fa7e4c76e0a3b8204ce0cdd21564f23d5df;hb=HEAD#l886

It should probably handle the WIN32 logic separately from POSIX locales,
but that's a deeper matter.

-Mike
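For illustration, the transformed string is what ends up being handed to
initdb on Windows; a sketch (the data directory is a placeholder):

    initdb.exe --locale="English_New Zealand" -D "C:\pgdata"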
[BUGS] Re: Probable bug with CreateFakeRelcacheEntry (now with reproducible test case)
Indeed, this is a nasty bug that leads to data corruption. The following
sequence results in corruption of the visibility map, but I believe it
can be shown to cause problems for a btree or GIN index as well. So it's
recoverable if you do a VACUUM or a reindex.

    drop table foo;
    create table foo(i int);
    create index foo_idx on foo(i);
    insert into foo values(1);
    vacuum foo;
    checkpoint;
    insert into foo values(2);

    -- Send a SIGQUIT to the postmaster, and restart it.
    -- Now the buffer for the VM is in shared memory without BM_PERMANENT.
    -- VM block comes from backup block which still has VM bit set, but wal
    -- record for insert (2) unsets it in memory.

    delete from foo where i = 2;

    -- This checkpoint will *not* write out the buffer because it's not
    -- BM_PERMANENT. Therefore, it remains the same on disk, with the VM
    -- bit set.
    checkpoint;

    -- Send a SIGQUIT to the postmaster and restart. WAL records prior to
    -- that last checkpoint aren't replayed, so VM bit is still set.

    set enable_seqscan=false;
    select * from foo where i = 2;
    vacuum foo;
    WARNING:  page is not marked all-visible but visibility map bit is set in relation "foo" page 0
    VACUUM
    select * from foo;
     i
    ---
     1
    (1 row)

This bug seems particularly troublesome because the right fix would be to
include the relpersistence in the WAL records that need it. But that
can't be backported (right?).

The problem might not be a live problem before
7e4911b2ae33acff7b85234b91372133ec6df9d4, because that bug was hiding
this one.

Regards,
	Jeff Davis
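If contrib/pg_buffercache is installed, the suspect buffer's presence and
dirty state can be inspected directly (the BM_PERMANENT flag itself is
not exposed by the view); a hedged sketch, where fork number 2 is the
visibility-map fork:

    -- Sketch: visibility-map buffers for table foo currently in
    -- shared buffers; requires contrib/pg_buffercache
    SELECT b.relfilenode, b.relblocknumber, b.isdirty, b.usagecount
    FROM pg_buffercache b
    WHERE b.relforknumber = 2
      AND b.relfilenode = (SELECT relfilenode FROM pg_class
                           WHERE relname = 'foo');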
Re: [BUGS] BUG #7533: Client is not able to connect cascade standby incase basebackup is taken from hot standby
On Wed, Sep 12, 2012 at 8:47 PM, wrote:
> The following bug has been logged on the website:
>
> Bug reference:      7533
> Logged by:          Amit Kapila
> Email address:      amit.kap...@huawei.com
> PostgreSQL version: 9.2.0
> Operating system:   Suse
> Description:
>
> M host is primary, S host is standby and CS host is cascaded standby.
>
> 1. Set up postgresql-9.2beta2/RC1 on all hosts.
> 2. Execute command initdb on host M to create a fresh database.
> 3. Modify the configuration file postgresql.conf on host M like this:
>        listen_addresses = 'M'
>        port = 15210
>        wal_level = hot_standby
>        max_wal_senders = 4
>        hot_standby = on
> 4. Modify the configuration file pg_hba.conf on host M like this:
>        host  replication  repl  M/24  md5
> 5. Start the server on host M as primary.
> 6. Connect one client to the primary server and create a user ‘repl’:
>        Create user repl superuser password '123';
> 7. Use the command pg_basebackup on host S to retrieve the database of
>    the primary host:
>        pg_basebackup -D /opt/t38917/data -F p -x fetch -c fast -l repl_backup -P -v -h M -p 15210 -U repl -W
> 8. Copy recovery.conf.sample from the share folder of the package to the
>    database folder of host S, then rename this file to recovery.conf.
> 9. Modify the file recovery.conf on host S as below:
>        standby_mode = on
>        primary_conninfo = 'host=M port=15210 user=repl password=123'
> 10. Modify the file postgresql.conf on host S as follows:
>        listen_addresses = 'S'
> 11. Start the server on host S as standby server.
> 12. Use the command pg_basebackup on host CS to retrieve the database of
>     the standby host:
>        pg_basebackup -D /opt/t38917/data -F p -x fetch -c fast -l repl_backup -P -v -h M -p 15210 -U repl -W
> 13. Modify the file recovery.conf on host CS as below:
>        standby_mode = on
>        primary_conninfo = 'host=S port=15210 user=repl password=123'
> 14. Modify the file postgresql.conf on host CS as follows:
>        listen_addresses = 'CS'
> 15. Start the server on host CS as cascaded standby server node.
> 16. Try to connect a client to host CS, but it gives the error:
>        FATAL: the database system is starting up

This procedure didn't reproduce the problem in HEAD. But when I restarted
the master server between steps 11 and 12, I was able to reproduce the
problem.

> Observations related to bug
> ---------------------------
> In the above scenario it is observed that the startup process has read
> all data (in our defect scenario minRecoveryPoint is 5016220) up to the
> position 5016220, and then it checks for recovery consistency with the
> following condition in function CheckRecoveryConsistency:
>
>     if (!reachedConsistency &&
>         XLByteLE(minRecoveryPoint, EndRecPtr) &&
>         XLogRecPtrIsInvalid(ControlFile->backupStartPoint))
>
> At this point the first two conditions are true, but the last condition
> is not true because redo has not yet been applied and hence
> backupStartPoint has not been reset. So it does not signal the
> postmaster about the consistent state. After this it applies the redo,
> resets backupStartPoint, and goes to read the next set of records. Since
> all records have already been read, it starts waiting for a new record
> from the standby node. But since no new record is coming from the
> standby node, it keeps waiting and never gets a chance to recheck the
> recovery consistency level, and hence the client connection is not
> allowed.

If the cascaded standby starts recovery at a normal checkpoint record,
this problem will not happen, because if wal_level is set to hot_standby,
an XLOG_RUNNING_XACTS WAL record always follows the normal checkpoint
record. So while the XLOG_RUNNING_XACTS record is being replayed,
ControlFile->backupStartPoint can be reset, and then the cascaded standby
can pass the consistency test.

The problem happens when the cascaded standby starts recovery at a
shutdown checkpoint record. In this case, no WAL record might follow the
checkpoint one yet. So, after replaying the shutdown checkpoint record,
the cascaded standby needs to wait for a new WAL record to appear before
reaching the code block that resets ControlFile->backupStartPoint. The
cascaded standby cannot reach a consistent state, and a client cannot
connect to it, until new WAL has arrived.

The attached patch will fix the problem. In this patch, if recovery
begins at a shutdown checkpoint record, any ControlFile fields (like
backupStartPoint) required for checking whether an end-of-backup is
reached are not set at first. IOW, the cascaded standby considers the
database consistent from the beginning. This is safe because a shutdown
checkpoint record means that there is no running database activity at
that point and the database is in a consistent state.

Comments? Review?

Regards,

--
Fujii Masao

cascaded_standby_and_shutdown_checkpoint_v1.patch
Description: Binary data
[BUGS] BUG #7535: ERROR: variable not found in subplan target list
The following bug has been logged on the website:

Bug reference:      7535
Logged by:          Louis-David Mitterrand
Email address:      l...@apartia.fr
PostgreSQL version: 9.2.0
Operating system:   debian testing
Description:

/* error in 9.2 */
drop database error92;
create database error92;
\c error92

CREATE TABLE price (
    id_price serial primary key
);

CREATE TABLE cabin_class (
    id_cabin_class serial primary key,
    cabin_class_name text not null
);

CREATE TABLE cabin_type (
    id_cabin_type serial primary key,
    id_cabin_class integer references cabin_class not null,
    cabin_type_name text not null,
    cabin_type_code text not null
);

CREATE TABLE cabin_category (
    id_cabin_category serial primary key,
    id_cabin_type integer references cabin_type not null,
    cabin_cat_name text,
    cabin_cat_code text
);

CREATE TABLE alert_cruise (
    id_alert_cruise serial primary key,
    /* id_cruise integer references cruise not null,
       id_currency integer references currency, */
    enabled boolean default true not null,
    md5_code text DEFAULT md5(now()::text || random()::text)
);

CREATE TABLE alert_cabin_category (
    id_alert_cruise integer references alert_cruise,
    id_cabin_category integer references cabin_category not null
);

CREATE TABLE alert_cabin_type (
    id_alert_cruise integer references alert_cruise,
    id_cabin_type integer references cabin_type not null
);

CREATE TABLE alert_cabin_class (
    id_alert_cruise integer references alert_cruise,
    id_cabin_class integer references cabin_class not null
);

CREATE VIEW alert_cruise_all AS
select c.*, ac.enabled from (
    -- ac.id_cabin_class is the first one, so other UNION parts will
    -- use that column name as well
    select 'class' as type, cl.cabin_class_name as type_name, ac.*
    from alert_cabin_class ac
    join cabin_class cl using (id_cabin_class)
    union all
    select 'type' as type,
        cl.cabin_class_name || case when cl.cabin_class_name = ct.cabin_type_name
            then '' else ' ~ ' || ct.cabin_type_name end as type_name,
        ac.*
    from alert_cabin_type ac
    join cabin_type ct using (id_cabin_type)
    join cabin_class cl using (id_cabin_class)
    union all
    select 'category' as type,
        cl.cabin_class_name || case when cl.cabin_class_name = ct.cabin_type_name
            then '' else ' ~ ' || ct.cabin_type_name end
            || ' ~ ' || cc.cabin_cat_code as type_name,
        ac.*
    from alert_cabin_category ac
    join cabin_category cc
        join cabin_type ct using (id_cabin_type)
        join cabin_class cl using (id_cabin_class)
    using (id_cabin_category)
) as c
join alert_cruise ac using (id_alert_cruise);

create view alert_to_category as
select ac.*, cl.cabin_class_name, ct.cabin_type_name, cc.cabin_cat_name,
    cc.id_cabin_category, cc.cabin_cat_code,
    case when type='class' then cl.cabin_class_name
         when type='type' then ct.cabin_type_name
         when type='category' then ct.cabin_type_name || ' ' || cc.cabin_cat_code
    end as cabin_name
from alert_cruise_all ac
left join cabin_class cl on (
    (ac.type = 'class' and cl.id_cabin_class = ac.id_cabin_class)
    or (ac.type = 'type' and cl.id_cabin_class = (select id_cabin_class
        from cabin_type where id_cabin_type = ac.id_cabin_class))
    or (ac.type = 'category' and cl.id_cabin_class = (select ct2.id_cabin_class
        from cabin_type ct2 join cabin_category cc2 using (id_cabin_type)
        where cc2.id_cabin_category = ac.id_cabin_class))
)
join cabin_type ct on (
    (ac.type = 'class' and ct.id_cabin_class = cl.id_cabin_class)
    or (ac.type = 'type' and ct.id_cabin_type = ac.id_cabin_class)
    or (ac.type = 'category' and ct.id_cabin_type = (select id_cabin_type
        from cabin_category where id_cabin_category = ac.id_cabin_class))
)
join cabin_category cc on (
    (ac.type = 'category' and cc.id_cabin_category = ac.id_cabin_class)
    or (ac.type != 'category' and ct.id_cabin_type = cc.id_cabin_type)
);

select first_value(max(p.id_price)) over () as id_price1, ac.cabin_name
from alert_to_category ac
join price p on (p.id_price = ac.id_alert_cruise)
group by ac.cabin_name;

/*
select first_value(max(p.id_price)) over () as id_price1, ac.cabin_name
from alert_to_category ac
join price p on (p.id_cruise = ac.id_cruise)
group by ac.cabin_name;
*/

/* EOF */
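Per the subject line, the final grouped window-function query above is the
one that fails; the expected transcript on 9.2.0 is roughly:

    ERROR:  variable not found in subplan target list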
Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown
On Wednesday, September 12, 2012 10:12 PM Magnus Hagander wrote:
> On Wed, Sep 12, 2012 at 1:54 PM, wrote:
>> The following bug has been logged on the website:
>>
>> Bug reference:      7534
>> Logged by:          Amit Kapila
>> Email address:      amit.kap...@huawei.com
>> PostgreSQL version: 9.2.0
>> Operating system:   Suse 10
>> Description:
>>
>> 1. Both master and standby machine are connected normally,
>> 2. then you use the command: ifconfig ip down; make the network card of
>> master and standby down,
>>
>> Observation
>> master can detect connect abnormal, but the standby can't detect connect
>> abnormal and show a connected channel long time.

> The master will detect it quicker, because it will get an error when
> it tries to send something.
> But the standby should detect it either when sending the feedback
> message (what's your wal_receiver_status_interval set to?) or when
> the kernel does (have you configured the tcp keepalive on the slave
> somehow?)

wal_receiver_status_interval - 10s (we have not changed this; it is used
as the default).

We have tried using tcp keepalive as well; it might not be able to detect
the failure because the walreceiver is anyway trying to send the receiver
status. The send() socket call from XLogWalRcvSendReply() fails only
after being called many times, because send() keeps accumulating data in
the socket's internal buffer until that buffer is full, even if the other
side's recv() has not received the data. Also, the walsender fails due to
the replication_timeout parameter, not due to a send failure.

So in my opinion, the foolproof solution would be to have a mechanism
(replication_timeout) in the walreceiver similar to the one in the
walsender.

> Oh, and what do you actually mean by "long time"?

15-20 mins.
Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown
On Wednesday, September 12, 2012 10:15 PM Fujii Masao wrote:
> On Wed, Sep 12, 2012 at 8:54 PM, wrote:
>> The following bug has been logged on the website:
>>
>> Bug reference:      7534
>> Logged by:          Amit Kapila
>> Email address:      amit.kap...@huawei.com
>> PostgreSQL version: 9.2.0
>> Operating system:   Suse 10
>> Description:
>>
>> 1. Both master and standby machine are connected normally,
>> 2. then you use the command: ifconfig ip down; make the network card of
>> master and standby down,
>>
>> Observation
>> master can detect connect abnormal, but the standby can't detect connect
>> abnormal and show a connected channel long time.

> What about setting keepalives_xxx libpq parameters?
> http://www.postgresql.org/docs/devel/static/libpq-connect.html#LIBPQ-PARAMKEYWORDS
>
> Keepalives are not a perfect solution for the termination of connection,
> but it would help to a certain extent.

We have tried enabling keepalives, but it didn't work, maybe because the
walreceiver is trying to send the receiver status; the send fails only
after many attempts.

> If you need something like walreceiver-version of replication_timeout,
> such feature has not been implemented yet.
> Please feel free to implement that!

I would like to implement such a feature for the walreceiver, but there
is one open question: whether to use the same configuration parameter
(replication_timeout) for the walreceiver as for the master, or to
introduce a new configuration parameter (receiver_replication_timeout).
The only point in having different timeout parameters for walsender and
walreceiver is the case of a standby which has both a walsender and a
walreceiver to send logs to a cascaded standby; in such a case somebody
might want different timeout parameters for the two. OTOH it will create
confusion to have too many parameters.

My opinion is to have one timeout parameter for both walsender and
walreceiver. Let me know your suggestion/opinion about the same. A sketch
of the two options follows below.

Note: I am marking cc to pgsql-hackers, as it will be a feature request.

With Regards,
Amit Kapila.
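To make the two naming options concrete, a hypothetical postgresql.conf
sketch — a walreceiver-side timeout does not exist in 9.2, and
receiver_replication_timeout is purely a name proposed in this thread:

    # Option 1: reuse the existing GUC for both walsender and walreceiver
    replication_timeout = 60s             # existing walsender GUC in 9.2

    # Option 2: a separate, hypothetical walreceiver-side GUC
    #receiver_replication_timeout = 60s   # does not exist; proposed above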
Re: [BUGS] BUG #7533: Client is not able to connect cascade standby incase basebackup is taken from hot standby
On Thursday, September 13, 2012 12:34 AM Fujii Masao wrote:
> On Wed, Sep 12, 2012 at 8:47 PM, wrote:
>> The following bug has been logged on the website:
>>
>> Bug reference:      7533
>> Logged by:          Amit Kapila
>> Email address:      amit.kap...@huawei.com
>> PostgreSQL version: 9.2.0
>> Operating system:   Suse
>> Description:
>>
>> M host is primary, S host is standby and CS host is cascaded standby.

> This procedure didn't reproduce the problem in HEAD. But when I restarted
> the master server between steps 11 and 12, I was able to reproduce the
> problem.

We also observed that it doesn't appear in 9.2rc1, due to commit
b8b69d89905e04b910bcd65efce1791477b45d35 by Tom. The reason is that after
the above fix a checkpoint WAL record will come from the master, and the
cascaded standby will not get stuck in the loop. However, if we increase
the checkpoint interval, it does appear; sometimes we need to try 4-5
times.

>> Observations related to bug
>> ---------------------------
>> In the above scenario it is observed that the startup process has read
>> all data (in our defect scenario minRecoveryPoint is 5016220) up to the
>> position 5016220, and then it checks for recovery consistency with the
>> following condition in function CheckRecoveryConsistency:
>>
>>     if (!reachedConsistency &&
>>         XLByteLE(minRecoveryPoint, EndRecPtr) &&
>>         XLogRecPtrIsInvalid(ControlFile->backupStartPoint))
>>
>> At this point the first two conditions are true, but the last condition
>> is not true because redo has not yet been applied and hence
>> backupStartPoint has not been reset. So it does not signal the
>> postmaster about the consistent state. After this it applies the redo,
>> resets backupStartPoint, and goes to read the next set of records.
>> Since all records have already been read, it starts waiting for a new
>> record from the standby node. But since no new record is coming from
>> the standby node, it keeps waiting and never gets a chance to recheck
>> the recovery consistency level, and hence the client connection does
>> not get allowed.

> If the cascaded standby starts recovery at a normal checkpoint record,
> this problem will not happen, because if wal_level is set to
> hot_standby, an XLOG_RUNNING_XACTS WAL record always follows the normal
> checkpoint record. So while the XLOG_RUNNING_XACTS record is being
> replayed, ControlFile->backupStartPoint can be reset, and then the
> cascaded standby can pass the consistency test.

> The problem happens when the cascaded standby starts recovery at a
> shutdown checkpoint record. In this case, no WAL record might follow the
> checkpoint one yet. So, after replaying the shutdown checkpoint record,
> the cascaded standby needs to wait for a new WAL record to appear before
> reaching the code block that resets ControlFile->backupStartPoint. The
> cascaded standby cannot reach a consistent state and a client cannot
> connect to it until new WAL has arrived.

In the above scenario we are not doing a shutdown, so how can a shutdown
checkpoint record come?

Also, for the normal checkpoint case I have done a brief analysis: I have
observed in the code that ControlFile->minRecoveryPoint is updated while
replaying the XLOG_BACKUP_END WAL record. On hot standby S, this means
that ControlFile->minRecoveryPoint will point to an LSN after the
checkpoint record. Now, when recovery starts on cascaded standby CS after
a basebackup from hot standby S, minRecoveryPoint should point to an LSN
after the checkpoint record's LSN, so it might create a problem.

> Attached patch will fix the problem. In this patch, if recovery is
> beginning at a shutdown checkpoint record, any ControlFile fields (like
> backupStartPoint) required for checking that an end-of-backup is reached
> are not set at first. IOW, cascaded standby thinks that the database is
> consistent from the beginning. This is safe because a shutdown
> checkpoint record means that there is no running database activity at
> that point and the database is in consistent state.

I shall test with the patch.

With Regards,
Amit Kapila.