Re: [HACKERS] Sync Rep v17

Yeb Havinga Mon, 28 Feb 2011 01:33:11 -0800

On 2011-02-25 20:40, Jaime Casanova wrote:

On Fri, Feb 25, 2011 at 10:41 AM, Yeb Havinga <yebhavi...@gmail.com> wrote:

I also did some initial testing on this patch and got the queue-related
errors with > 1 clients. With the code change from Jaime above I still got a
lot of 'not on queue' warnings.


I tried to understand how the queue was supposed to work, resulting in the
changes below, which also incorporate a suggestion from Fujii upthread to
exit early once myproc is found.

Yes, looking at the code, the warning and your patch, it seems yours
is the right solution.
I'm compiling right now to test again and see the effects. Robert,
maybe you can test your failure case again? I'm really sure it's
related to this.
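The early-exit idea can be illustrated with a minimal C sketch. It is not the Sync Rep code: WaitProc, WaitQueue, queue_append() and queue_remove() are hypothetical stand-ins for the backend entries (myproc) on the sync rep wait queue, and a plain singly linked list replaces the shared-memory queue. Removal stops scanning as soon as myproc is found, and a false return corresponds to the situation that produces a 'not on queue' warning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical waiter entry; a stand-in for myproc, not PostgreSQL's PGPROC. */
typedef struct WaitProc
{
    int              pid;
    struct WaitProc *next;
} WaitProc;

/* Hypothetical queue of backends waiting for a sync standby ack. */
typedef struct WaitQueue
{
    WaitProc *head;
} WaitQueue;

/* Append proc at the tail so waiters stay in the order they arrived. */
static void
queue_append(WaitQueue *q, WaitProc *proc)
{
    WaitProc **link = &q->head;

    assert(q != NULL && proc != NULL);
    while (*link != NULL)
        link = &(*link)->next;
    proc->next = NULL;
    *link = proc;
}

/*
 * Unlink myproc from the queue. The scan stops as soon as myproc is found
 * (the early exit). Returns false if myproc was not on the queue, which is
 * the case that would trigger a "not on queue" warning.
 */
static bool
queue_remove(WaitQueue *q, WaitProc *myproc)
{
    for (WaitProc **link = &q->head; *link != NULL; link = &(*link)->next)
    {
        if (*link == myproc)
        {
            *link = myproc->next;
            myproc->next = NULL;
            return true;        /* early exit: nothing left to scan */
        }
    }
    return false;
}
```

With a queue in shared memory, the whole removal would presumably still have to run under a lock; the sketch only shows the list manipulation.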

I did some more testing over the weekend with this patched v17. Since you've posted a v18 patch, let me write down some findings with the v17 patch before continuing with v18.

The tests were done on an x86_64 platform, 1 Gbit network interfaces, 3 servers. Non-default configuration changes are copy-pasted at the end of this mail.


1) no automatic switch to other synchronous standby
- start master server, add synchronous standby 1
- change allow_standalone_primary to off
- add second synchronous standby
- wait until pg_stat_replication shows both standbys are in STREAMING state
- stop standby 1

What happens is that the master stalls, whereas I expected that it would have switched to standby 2 to acknowledge commits.
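The behaviour I expected in test 1 can be sketched as follows, assuming synchronous_standby_names is read as a priority list: the first listed name whose standby is currently streaming is the one that acknowledges commits, so losing standby1 should hand that role to standby2. This is a hypothetical C sketch (Standby and pick_sync_standby() are made up for illustration), not what the v17 patch implements; it also does not trim whitespace around the names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical view of one connected standby, as pg_stat_replication shows it. */
typedef struct Standby
{
    const char *application_name;
    bool        streaming;      /* true when in STREAMING state */
} Standby;

/*
 * Return the index into standbys[] of the first name in names_csv whose
 * standby is streaming, or -1 if none qualifies. With
 * allow_standalone_primary = off, -1 is the only case where commits
 * should have to wait.
 */
static int
pick_sync_standby(const char *names_csv, const Standby *standbys, int nstandbys)
{
    const char *p = names_csv;

    assert(names_csv != NULL && (standbys != NULL || nstandbys == 0));
    while (*p != '\0')
    {
        size_t len = strcspn(p, ",");   /* length of the current name */

        for (int i = 0; i < nstandbys; i++)
        {
            const char *name = standbys[i].application_name;

            if (standbys[i].streaming &&
                strlen(name) == len && strncmp(name, p, len) == 0)
                return i;
        }
        p += len;
        if (*p == ',')
            p++;                        /* skip the separator */
    }
    return -1;
}
```

With 'standby1,standby2,standby3' and both standbys streaming this picks standby1; once standby1 stops, the same call picks standby2 instead of stalling.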

The following was pilot error, but since I was test-piloting a new plane, I still think it might be useful feedback. In my opinion, any number and order of pg_ctl stops and starts on both the master and standby servers, as long as they are not done with -m immediate, should never cause the state I reached.


2) reaching some sort of shutdown deadlock state
- start master server, add synchronous standby
- change allow_standalone_primary to off

Then I did all sorts of test things, and everything was still OK. Then I wanted to shut down everything, and, maybe because of some (stack-like) symmetry, I did the following, because I didn't think it through:
- pg_ctl stop on standby (I didn't wait until it was done, but immediately ran the next step in another terminal)
- pg_ctl stop on master
Oh wait... the master needs to sync transactions
- start standby again, but now: FATAL:  the database system is shutting down

There is no clean way to get out of this situation. allow_standalone_primary in the face of shutdowns might be tricky. Maybe shutdown should be prohibited from entering the shutting down phase when allow_standalone_primary = off and no sync standby is connected; that would allow the sync standby to attach again.
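A minimal C sketch of the shutdown rule suggested above. PrimaryState, on_shutdown_request() and accepts_replication_connection() are hypothetical names, and holding off shutdown only while commits are actually waiting (waiting_sync_commits > 0) is my own assumption on top of the suggestion. The idea is that the master stays in an intermediate state that still admits a reconnecting sync standby, instead of answering it with "the database system is shutting down".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical primary states; not PostgreSQL's actual postmaster states. */
typedef enum
{
    PRIMARY_RUNNING,            /* normal operation */
    PRIMARY_WAIT_FOR_STANDBY,   /* shutdown requested, sync standby may still attach */
    PRIMARY_SHUTTING_DOWN       /* new connections are rejected */
} PrimaryState;

/*
 * Next state after a non-immediate shutdown request. With
 * allow_standalone_primary = off, commits waiting for an ack, and no sync
 * standby connected, do not enter the shutting-down phase yet.
 * (Requiring waiting commits is an assumption, not part of the proposal.)
 */
static PrimaryState
on_shutdown_request(bool allow_standalone_primary,
                    int waiting_sync_commits,
                    bool sync_standby_connected)
{
    assert(waiting_sync_commits >= 0);

    if (!allow_standalone_primary &&
        waiting_sync_commits > 0 &&
        !sync_standby_connected)
        return PRIMARY_WAIT_FOR_STANDBY;

    return PRIMARY_SHUTTING_DOWN;
}

/* Only the final shutting-down phase turns away a reconnecting standby. */
static bool
accepts_replication_connection(PrimaryState state)
{
    return state != PRIMARY_SHUTTING_DOWN;
}
```

Once the standby has attached and acknowledged the waiting commits, calling on_shutdown_request() again would move the master on to PRIMARY_SHUTTING_DOWN.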


3) PANIC on standby server

At some point a standby suddenly disconnected after I started a new pgbench run on an existing master/standby pair, with the following error in the logfile:


LOCATION:  libpqrcv_connect, libpqwalreceiver.c:171
PANIC:  XX000: heap_update_redo: failed to add tuple
CONTEXT:  xlog redo hot_update: rel 1663/16411/16424; tid 305453/15; new 305453/102
LOCATION:  heap_xlog_update, heapam.c:4724
LOG:  00000: startup process (PID 32597) was terminated by signal 6: Aborted

This might be due to pilot error as well; I did several tests over the weekend, and after this error I was more careful to remember immediate shutdowns and to start from a clean backup after one, and I haven't seen similar errors since.

4) The performance of this syncrep patch seems to be quite an improvement over the previous syncrep patches: I've seen tps figures of O(650) where the others were more like O(20). The O(650) tps is limited by the speed of the standby server I used; at several points the master would halt only because of heavy disk activity on the standby. A warning in the docs might be appropriate: be sure to use good I/O hardware for your synchronous replicas! With that bottleneck gone, I suspect the current syncrep version can go beyond 1000 tps over 1 Gbit.


regards,
Yeb Havinga

recovery.conf:
standby_mode = 'on'

primary_conninfo = 'host=mg73 user=repuser password=pwd application_name=standby1'

trigger_file = '/tmp/postgresql.trigger.5432'

postgresql.conf nondefault parameters:
log_error_verbosity = verbose
log_min_messages = warning
log_min_error_statement = warning
listen_addresses = '*'                # what IP address(es) to listen on;
search_path='\"$user\", public, hl7'
archive_mode = on

archive_command = 'test ! -f /data/backup_in_progress || cp -i %p /archive/%f < /dev/null'

checkpoint_completion_target = 0.9
checkpoint_segments = 16
default_statistics_target = 500
constraint_exclusion = on
max_connections = 120
maintenance_work_mem = 128MB
effective_cache_size = 1GB
work_mem = 44MB
wal_buffers = 8MB
shared_buffers = 128MB
wal_level = 'archive'
max_wal_senders = 4
wal_keep_segments = 1000 # 16000MB (for production increase this)
synchronous_standby_names = 'standby1,standby2,standby3'
synchronous_replication = on
allow_standalone_primary = off


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
