Hello.

This version makes subscriber automatically set its database
encoding to clinet_encoding (if not explicitly set). And changed
the behavior when pg_server_to_client returns NULL to ERROR from
sending original string.

At Wed, 1 Feb 2017 08:39:41 +0000, "Shinoda, Noriyoshi" 
<noriyoshi.shin...@hpe.com> wrote in 
<at5pr84mb0084a18d3bf1d93b862e95e4ee...@at5pr84mb0084.namprd84.prod.outlook.com>
> Thank you for creating patches.
> I strongly hope that your patch will be merged into the new
> version.  Since all databases are not yet based on UTF - 8, I
> think conversion of character codes is still necessary.

Thanks.

> -----Original Message-----
> From: Kyotaro HORIGUCHI [mailto:horiguchi.kyot...@lab.ntt.co.jp] 
> Sent: Wednesday, February 01, 2017 3:31 PM
> To: Shinoda, Noriyoshi <noriyoshi.shin...@hpe.com>
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Logical Replication and Character encoding
> 
> At Wed, 01 Feb 2017 12:13:04 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
> <horiguchi.kyot...@lab.ntt.co.jp> wrote in 
> <20170201.121304.267734380.horiguchi.kyot...@lab.ntt.co.jp>
> > > >  I tried a committed Logical Replication environment. I found  
> > > > that replication between databases of different encodings did  not 
> > > > convert encodings in character type columns. Is this  behavior 
> > > > correct?
> > > 
> > > The output plugin for subscription is pgoutput and it currently 
> > > doesn't consider encoding but would easiliy be added if desired 
> > > encoding is informed.
> > > 
> > > The easiest (but somewhat seems fragile) way I can guess is,
> > > 
> > > - Subscriber connects with client_encoding specification and the
> > >   output plugin pgoutput decide whether it accepts the encoding
> > >   or not. If the subscriber doesn't, pgoutput send data without
> > >   conversion.
> > > 
> > > The attached small patch does this and works with the following 
> > > CREATE SUBSCRIPTION.
> > 
> > Oops. It forgets to care conversion failure. It is amended in the 
> > attached patch.
> > 
> > > CREATE SUBSCRIPTION sub1 CONNECTION 'host=/tmp port=5432 
> > > dbname=postgres client_encoding=EUC_JP' PUBLICATION pub1;
> > > 
> > > 
> > > Also we may have explicit negotiation on, for example, 
> > > CREATE_REPLICATION_SLOT.
> > > 
> > >  'CREATE_REPLICATION_SLOT sub1 LOGICAL pgoutput ENCODING EUC_JP'
> > > 
> > > Or output plugin may take options.
> > > 
> > >  'CREATE_REPLICATION_SLOT sub1 LOGICAL pgoutput OPTIONS(encoding EUC_JP)'
> > > 
> > > 
> > > Any opinions?
> 
> This patch chokes replication when the publisher finds an inconvertible 
> character in a tuple to be sent. For the case, dropping-then-recreating 
> subscription is necessary to go forward.

I think this behavior is right and inevitable in order to protect
subscribers from broken strings.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From e3af51822d1318743e554e24163390b74b254751 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyot...@lab.ntt.co.jp>
Date: Thu, 2 Feb 2017 09:05:20 +0900
Subject: [PATCH] Enable logical-replication between databases with different
 encodings

Different from physical replication, logical replication may run
between databases with different encodings. This patch makes
subscriber set client_encoding to database encoding and publisher
follow it.
---
 .../libpqwalreceiver/libpqwalreceiver.c            | 35 +++++++++++++++++++++-
 src/backend/replication/logical/proto.c            | 17 +++++++++++
 2 files changed, 51 insertions(+), 1 deletion(-)

diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 44a89c7..ef38af7 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -24,6 +24,7 @@
 #include "access/xlog.h"
 #include "miscadmin.h"
 #include "pgstat.h"
+#include "mb/pg_wchar.h"
 #include "replication/logicalproto.h"
 #include "replication/walreceiver.h"
 #include "storage/proc.h"
@@ -112,11 +113,43 @@ libpqrcv_connect(const char *conninfo, bool logical, const char *appname,
 				 char **err)
 {
 	WalReceiverConn *conn;
+	const char *myconninfo = conninfo;
 	const char *keys[5];
 	const char *vals[5];
 	int			i = 0;
 
 	/*
+	 * For logical replication, set database encoding as client_encoding if
+	 * not specified in conninfo
+	 */
+	if (logical)
+	{
+		PQconninfoOption   *opts = NULL;
+
+		opts = PQconninfoParse(conninfo, NULL);
+
+		if (opts != NULL)
+		{
+			while (opts->keyword != NULL &&
+				   strcmp(opts->keyword, "client_encoding") != 0)
+				opts++;
+
+			/* add client_encoding to conninfo if not set */
+			if (opts->keyword == NULL || opts->val == NULL)
+			{
+				StringInfoData s;
+
+				/* Assuming that the memory context here is properly set */
+				initStringInfo(&s);
+				appendStringInfoString(&s, conninfo);
+				appendStringInfo(&s, " client_encoding=\"%s\"",
+								 GetDatabaseEncodingName());
+				myconninfo = s.data;
+			}
+		}
+	}
+
+	/*
 	 * We use the expand_dbname parameter to process the connection string (or
 	 * URI), and pass some extra options. The deliberately undocumented
 	 * parameter "replication=true" makes it a replication connection. The
@@ -124,7 +157,7 @@ libpqrcv_connect(const char *conninfo, bool logical, const char *appname,
 	 * "replication" for .pgpass lookup.
 	 */
 	keys[i] = "dbname";
-	vals[i] = conninfo;
+	vals[i] = myconninfo;
 	keys[++i] = "replication";
 	vals[i] = logical ? "database" : "true";
 	if (!logical)
diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
index 1f30de6..97f928e 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -16,6 +16,7 @@
 #include "catalog/pg_namespace.h"
 #include "catalog/pg_type.h"
 #include "libpq/pqformat.h"
+#include "mb/pg_wchar.h"
 #include "replication/logicalproto.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
@@ -442,6 +443,22 @@ logicalrep_write_tuple(StringInfo out, Relation rel, HeapTuple tuple)
 		pq_sendbyte(out, 't');	/* 'text' data follows */
 
 		outputstr = OidOutputFunctionCall(typclass->typoutput, values[i]);
+		if (pg_get_client_encoding() != GetDatabaseEncoding())
+		{
+			char *p;
+
+			p = pg_server_to_client(outputstr, strlen(outputstr));
+
+			/* Error out if failed */
+			if (!p)
+				ereport(ERROR,
+						(errcode(ERRCODE_UNTRANSLATABLE_CHARACTER),
+						 errmsg("character conversion failed")));
+
+			pfree(outputstr);
+			outputstr = p;
+		}
+
 		len = strlen(outputstr) + 1;	/* null terminated */
 		pq_sendint(out, len, 4);		/* length */
 		appendBinaryStringInfo(out, outputstr, len); /* data */
-- 
2.9.2

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to