Re: [HACKERS] pg_basebackup may fail to send feedbacks.

Kyotaro HORIGUCHI Tue, 24 Feb 2015 01:47:10 -0800

Hello, attached is the v4 patch, which checks whether feedback is
due at every WAL segment boundary.
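The per-boundary decision can be modelled in isolation. The following is a simplified, hypothetical sketch, not code from receivelog.c. Timestamps are plain int64 microseconds, timestamp_difference_exceeds() is only modelled on feTimestampDifferenceExceeds(), and mock_send_feedback() stands in for sendFeedback() by counting calls instead of writing to a connection.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in: timestamps in microseconds. */
typedef int64_t TimestampUs;

/* Modelled on feTimestampDifferenceExceeds(): have at least msec elapsed? */
static bool
timestamp_difference_exceeds(TimestampUs start, TimestampUs stop, int msec)
{
	return (stop - start) >= (int64_t) msec * 1000;
}

/* Hypothetical mock: counts "sent" feedback messages. */
static int	feedback_sent = 0;

static bool
mock_send_feedback(TimestampUs now)
{
	(void) now;
	feedback_sent++;
	return true;
}

/*
 * Check done at a WAL segment boundary, after the segment has been synced
 * and closed: send feedback only if the status interval has elapsed.
 * A timeout of 0 disables status messages entirely.
 */
static bool
maybe_send_feedback_at_segment_end(TimestampUs now,
								   int standby_message_timeout_ms,
								   TimestampUs *last_status)
{
	if (standby_message_timeout_ms > 0 &&
		timestamp_difference_exceeds(*last_status, now,
									 standby_message_timeout_ms))
	{
		if (!mock_send_feedback(now))
			return false;
		*last_status = now;
	}
	return true;
}
```

With a 10-second interval, a boundary reached 5 seconds after the last status sends nothing, one reached at 12 seconds sends feedback and resets last_status, and a timeout of 0 disables the check, matching the `standby_message_timeout > 0` guard in the patch.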


At Fri, 20 Feb 2015 17:29:14 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
<horiguchi.kyot...@lab.ntt.co.jp> wrote in 
<20150220.172914.241732690.horiguchi.kyot...@lab.ntt.co.jp>
> > Some users may complain about the performance impact by such
> > frequent calls and we may want to get rid of them from
> > walreceiver loop in the future.  If we adopt your idea now,
> > I'm afraid that it would tie our hands in that case.
> > 
> > How much impact can such frequent calls of gettimeofday()
> > have on replication performance? If it's not negligible,
> > probably we should remove them at first and find out another
> > idea to fix the problem you pointed out. ISTM that it's not so
> > difficult to remove them. Thoughts? Do you have any numbers
> > which can prove that such frequent gettimeofday() calls have
> > only a negligible impact on performance?
> 
> The attached patch is 'the more sober' version of the SIGALRM patch.
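As a first rough answer to the gettimeofday() cost question quoted above, the per-call cost can be measured directly. This microbenchmark is a hypothetical sketch, not something from the thread: the function name gettimeofday_ns_per_call() is made up, and the number it returns depends heavily on the platform and clock source (for example, whether the call is served without entering the kernel), so it only bounds the cost of the clock read itself, not the end-to-end replication impact.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>
#include <time.h>

/*
 * Average wall-clock cost of one gettimeofday() call in nanoseconds,
 * measured over `iterations` calls using CLOCK_MONOTONIC.
 */
static double
gettimeofday_ns_per_call(long iterations)
{
	struct timespec start, end;
	struct timeval tv;
	volatile long sink = 0;		/* keep the calls from being optimized away */

	assert(iterations > 0);
	clock_gettime(CLOCK_MONOTONIC, &start);
	for (long i = 0; i < iterations; i++)
	{
		gettimeofday(&tv, NULL);
		sink += tv.tv_usec;
	}
	clock_gettime(CLOCK_MONOTONIC, &end);
	(void) sink;

	double		elapsed_ns = (double) (end.tv_sec - start.tv_sec) * 1e9 +
		(double) (end.tv_nsec - start.tv_nsec);

	return elapsed_ns / (double) iterations;
}
```

Multiplying the result by the number of calls per second in the receive loop gives a crude upper estimate of the CPU time spent reading the clock.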

I said earlier that checking whether to send feedback only at
every boundary between WAL segments seemed too infrequent, but
after some rethinking I have changed my mind.

 - The largest possible source of delay in the busy receive loop
   is the fsync performed when closing the WAL segment file that
   was just written, which can take several seconds.  A stall
   longer than the timeout interval cannot be rescued by this
   check, and is not worth rescuing anyway.

 - Reading the clock once per 16MB of disk writes seems gentle
   enough, even on platforms where gettimeofday() is relatively
   expensive.
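To put the second point in numbers: if the clock is read once per completed segment, the number of reads per second is the receive rate divided by the segment size, and the gap between two checks is the time to receive one segment plus the fsync stall on closing it. The sketch below assumes the default 16MB segment size; the function names are hypothetical, and fsync_ms is an input you would have to measure, not a figure from this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed default WAL segment size (16MB); it is a build-time option. */
#define WAL_SEGMENT_BYTES ((int64_t) 16 * 1024 * 1024)

/* Clock reads per second when checking once per completed segment. */
static double
checks_per_second(int64_t bytes_per_second, int64_t segment_bytes)
{
	assert(segment_bytes > 0 && bytes_per_second >= 0);
	return (double) bytes_per_second / (double) segment_bytes;
}

/*
 * Interval between two consecutive checks, in milliseconds: the time to
 * receive one segment plus the fsync stall when closing it.
 */
static double
check_interval_ms(int64_t bytes_per_second, int64_t segment_bytes,
				  double fsync_ms)
{
	assert(bytes_per_second > 0 && segment_bytes > 0 && fsync_ms >= 0.0);
	return 1000.0 * (double) segment_bytes / (double) bytes_per_second +
		fsync_ms;
}
```

For example, at a sustained 160MB/s (1MB = 2^20 bytes) this gives 10 clock reads per second, and at 16MB/s with a 2-second fsync stall the gap between checks is 3 seconds.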


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

From 945e18713af86a357a7ac24ff5cd855e1f79a927 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyot...@lab.ntt.co.jp>
Date: Tue, 24 Feb 2015 17:52:01 +0900
Subject: [PATCH] Make an effort to send feedback regularly under heavy load.

pg_basebackup and pg_receivexlog may be kept from sending feedback
for a long time by a continuous replication stream, which can be
caused by heavy load on the receiver side. Keepalives from the server
can also be delayed in such a situation. This patch makes them try to
send feedback regularly in that case: at every boundary between WAL
segments, just after syncing and closing the finished segment, send
feedback if the status interval has elapsed.
---
 src/bin/pg_basebackup/receivelog.c | 26 +++++++++++++++++++++++---
 1 file changed, 23 insertions(+), 3 deletions(-)

diff --git a/src/bin/pg_basebackup/receivelog.c b/src/bin/pg_basebackup/receivelog.c
index 8caedff..5d15b11 100644
--- a/src/bin/pg_basebackup/receivelog.c
+++ b/src/bin/pg_basebackup/receivelog.c
@@ -45,7 +45,8 @@ static bool ProcessKeepaliveMsg(PGconn *conn, char *copybuf, int len,
 static bool ProcessXLogDataMsg(PGconn *conn, char *copybuf, int len,
 							   XLogRecPtr *blockpos, uint32 timeline,
 							   char *basedir, stream_stop_callback stream_stop,
-							   char *partial_suffix, bool mark_done);
+							   char *partial_suffix, bool mark_done,
+							   int standby_message_timeout, int64 *last_status);
 static PGresult *HandleEndOfCopyStream(PGconn *conn, char *copybuf,
 									   XLogRecPtr blockpos, char *basedir, char *partial_suffix,
 									   XLogRecPtr *stoppos, bool mark_done);
@@ -906,7 +907,8 @@ HandleCopyStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline,
 			{
 				if (!ProcessXLogDataMsg(conn, copybuf, r, &blockpos,
 										timeline, basedir, stream_stop,
-										partial_suffix, mark_done))
+										partial_suffix, mark_done,
+										standby_message_timeout, &last_status))
 					goto error;
 
 				/*
@@ -1115,7 +1117,8 @@ static bool
 ProcessXLogDataMsg(PGconn *conn, char *copybuf, int len,
 				   XLogRecPtr *blockpos, uint32 timeline,
 				   char *basedir, stream_stop_callback stream_stop,
-				   char *partial_suffix, bool mark_done)
+				   char *partial_suffix, bool mark_done,
+				   int standby_message_timeout, int64 *last_status)
 {
 	int			xlogoff;
 	int			bytes_left;
@@ -1223,12 +1226,29 @@ ProcessXLogDataMsg(PGconn *conn, char *copybuf, int len,
 		/* Did we reach the end of a WAL segment? */
 		if (*blockpos % XLOG_SEG_SIZE == 0)
 		{
+			int64 now;
 			if (!close_walfile(basedir, partial_suffix, *blockpos, mark_done))
 				/* Error message written in close_walfile() */
 				return false;
 
 			xlogoff = 0;
 
+			/*
+			 * Continuous input stream might cause long duration after the
+			 * previous feedback. Here is a good point to check if the time to
+			 * feedback has come since the fsync done in close_walfile() might
+			 * have taken long time.
+			 */
+			now = feGetCurrentTimestamp();
+			if (standby_message_timeout > 0 &&
+				feTimestampDifferenceExceeds(*last_status, now,
+											 standby_message_timeout))
+			{
+				if (!sendFeedback(conn, *blockpos, now, false))
+					return false;
+				*last_status = now;
+			}
+
 			if (still_sending && stream_stop(*blockpos, timeline, true))
 			{
 				if (PQputCopyEnd(conn, NULL) <= 0 || PQflush(conn))
-- 
2.1.0.GIT
