Re: pg_basebackup, walreceiver and wal_sender_timeout

2019-01-28 Thread Alex Kliukin
On Mon, Jan 28, 2019, at 10:25, Michael Paquier wrote:
> On Mon, Jan 28, 2019 at 09:05:26AM +0100, Magnus Hagander wrote:

> > And for plain format, we'd do the same -- sync after each file segment, and
> > then a final one of the directory when done, right?
> 
> Well, the code is doing a double amount of work in its current shape
> as we call fsync_pgdata() for the plain format, which cascades to
> pg_wal and all its files, so it seems to me that there is little point
> in issuing a sync when each segment is finished streaming if that's
> what you mean.

Agreed.

While reading the pg_basebackup doc page, I was confused by the statement that
WAL files will be written to tarballs (either base.tar or pg_wal.tar) when
pg_basebackup is instructed to stream WAL alongside the backup itself. I think
it makes sense to clarify that this only happens when the tar format is
specified (doc patch is attached).
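For context, a minimal sketch of the two modes (hypothetical target paths; assumes a reachable server):

```
# tar format: with -X stream the WAL ends up in a separate pg_wal.tar
pg_basebackup -D /backups/node1 -F tar -X stream

# plain format: WAL segments are written into the pg_wal/ subdirectory
# of the target directory, not into any tarball
pg_basebackup -D /backups/node1 -F plain -X stream
```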

Cheers,
Oleksii


diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 57dc83b620..c4f3950e5b 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -306,8 +306,8 @@ PostgreSQL documentation
  backup will fail and be unusable.


-The write-ahead log files will be written to
-the base.tar file.
+When tar format mode is used, the write-ahead log files will be
+written to the base.tar file.

   
  
@@ -326,9 +326,10 @@ PostgreSQL documentation
  requires no extra write-ahead logs to be saved on the master.


-The write-ahead log files are written to a separate file
-named pg_wal.tar (if the server is a version
-earlier than 10, the file will be named pg_xlog.tar).
+When tar format mode is used, the write-ahead log files will be
+written to a separate file named pg_wal.tar 
+(if the server is a version earlier than 10, the file will be named
+pg_xlog.tar).


 This value is the default.


'tuple concurrently updated' error w/o visible catalog updates

2018-05-17 Thread Alex Kliukin
Hello,

Earlier this week we have split our Postgres 9.6.8 shards, each having two
databases, into one database per shard setup. This was done by promoting
replicas and subsequently removing unused databases.

Immediately afterwards we have discovered repeated 'tuple concurrently updated'
errors on most of those new shards.

The error is always shown at the same statement:
ERROR,XX000,"tuple concurrently updated","SQL statement ""UPDATE
config_content SET cc_content = l_config_content WHERE cc_config_content_id =
l_ccm_content_id""

By searching the archives (i.e.
https://www.postgresql.org/messageid/flat/CAB7nPqSZCkVfibTvx9TYmHYhVtV_vOMNwOpLHnRU85qeiimUaQ%40mail.gmail.com#cab7npqszckvfibtvx9tymhyhvtv_vomnwoplhnru85qeiim...@mail.gmail.com)
I got the impression that this error manifests itself when system catalog
tuples are updated concurrently; however, I see none of that in the query that
leads to the ERROR. There are no triggers on the 'config_content' table, nor
are there any views referring to it.
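For reference, a minimal hypothetical reproduction of the error via concurrent catalog updates (unrelated to our schema, just to show the usual trigger condition):

```sql
-- Run in two sessions at (almost) the same time; one of them may fail
-- with "tuple concurrently updated", because both sessions try to
-- update the same pg_proc row:
CREATE OR REPLACE FUNCTION f() RETURNS int LANGUAGE sql AS 'SELECT 1';
```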

The errors stopped when we disabled a call to the 'upsert_foo_content' function
(here and below I obfuscated the real names). This is a fairly simple pl/pgsql
function that does a few selects and an upsert. The block inside that function
that contains the statement at fault is:

--
SELECT ccm_content_id, ccm_simple_update_received_at
INTO l_ccm_content_id, l_ccm_simple_update_received_at
FROM config_content_metadata
WHERE ccm_config_id = l_c_id AND ccm_sales_channel_id = l_sales_channel_id;

IF (l_ccm_content_id IS NULL) THEN
 -- insert config content --
 INSERT INTO config_content_metadata(ccm_config_id, ccm_sales_channel_id,
     ccm_update_caused_by, ccm_simple_update_eid, ccm_simple_update_received_at)
 VALUES (l_c_id, l_sales_channel_id, l_rp_id, l_content_update_eid,
     l_content_update_received_at)
 RETURNING ccm_content_id INTO l_ccm_content_id;

 INSERT INTO config_content(cc_config_content_id, cc_content)
 VALUES (l_ccm_content_id, l_config_content);

ELSIF (l_ccm_simple_update_received_at < l_content_update_received_at) THEN

 UPDATE config_content_metadata
 SET ccm_update_caused_by = l_rp_id,
     ccm_simple_update_eid = l_content_update_eid,
     ccm_simple_update_received_at = l_content_update_received_at,
     ccm_updated_at = now()
 WHERE ccm_content_id = l_ccm_content_id;

 -- XXX problematic statement XXX
 UPDATE config_content SET cc_content = l_config_content
 WHERE cc_config_content_id = l_ccm_content_id;

END IF;
--

Note that config_content references config_content_metadata with a foreign key;
however, the referenced column is not updated.

That 'upsert_foo_content' is called by another function, 'upsert_foo_content_batch',
in a loop over the elements of a JSON array, something like:

--
CREATE OR REPLACE FUNCTION upsert_foo_content_batch(p_batch jsonb)
RETURNS void LANGUAGE plpgsql
AS $function$
DECLARE
...
BEGIN
FOR item IN SELECT * FROM jsonb_array_elements(p_batch)
LOOP
  -- some unpacking of fields from json into the local variables
  PERFORM upsert_foo_content(..); -- called with the unpacked variables
END LOOP;
END;
$function$
--

'upsert_foo_content_batch' is called, in turn, at the end of a long pl/pgsql
function, 'upsert_foo_event_batch', which consists of a very long CTE that
extracts individual fields from a JSON argument and then performs a number of
inserts into some tables with ON CONFLICT DO NOTHING, afterwards performing
more inserts into the tables that reference the previous ones with ON CONFLICT
DO UPDATE. However, it modifies neither the 'config_content' nor the
'config_content_metadata' table.

So the chain of calls is
'upsert_foo_event_batch' -> 
'upsert_foo_content_batch' -> 
   'upsert_foo_content'.
(the last one contains the statement that leads to the "tuple concurrently 
updated" error).


It is possible that the 'upsert_foo_content' function is called with the same
data multiple times in different processes. However, I'd expect it either to
complete successfully or to throw an error because the PK already exists (this
runs at read committed isolation, so ISTM it is not immune to the case where
the row in the metadata table is inserted after another session does the check
but before the insert), not the error mentioned at the beginning of this
message.
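If it helps, that check-then-insert window could probably be narrowed by collapsing the metadata branch into a single statement. A rough sketch only, keeping the obfuscated names from above and assuming a unique constraint on (ccm_config_id, ccm_sales_channel_id), which I have not verified here:

```sql
INSERT INTO config_content_metadata (ccm_config_id, ccm_sales_channel_id,
    ccm_update_caused_by, ccm_simple_update_eid, ccm_simple_update_received_at)
VALUES (l_c_id, l_sales_channel_id, l_rp_id, l_content_update_eid,
    l_content_update_received_at)
ON CONFLICT (ccm_config_id, ccm_sales_channel_id) DO UPDATE
    SET ccm_update_caused_by = EXCLUDED.ccm_update_caused_by,
        ccm_simple_update_eid = EXCLUDED.ccm_simple_update_eid,
        ccm_simple_update_received_at = EXCLUDED.ccm_simple_update_received_at,
        ccm_updated_at = now()
    WHERE config_content_metadata.ccm_simple_update_received_at
          < EXCLUDED.ccm_simple_update_received_at
RETURNING ccm_content_id INTO l_ccm_content_id;
```

That would not explain the original error, but it removes the duplicate-key race between the SELECT and the INSERT.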

Are there any signs in this description that the queries might be doing
something unexpected to PostgreSQL, or that something went wrong during the
split? I am running out of options for what could cause the issue, so any
pointers or help in debugging it are appreciated (note that this is a
production database; I cannot just stop it at will).

Cheers,
Oleksii



Re: GSoC 2018

2017-12-15 Thread Alex Kliukin
Hello,

On Fri, Dec 15, 2017, at 14:30, Stephen Frost wrote:
> Aleksander,
> 
> * Aleksander Alekseev (a.aleks...@postgrespro.ru) wrote:
> 
> > Regarding the difficulty of the project - in fact it's not that
> > difficult. Particularly this project can rely on external tools, e.g.
> > use Consul for service discovery and leader election based on
> > leader-lease approach (implementation [1]). Having this the only thing
> > is missing is automatic replica promoting and (optionally)
> > re-configuring of HAProxy / pgbouncer / whatever. Yes, and lots of
> > Jepsen-like test of course. I believe it's not such a complicated
> > project.

Does it make sense to address the limitations of logical replication first,
i.e. the inability to replicate DDL, sequences and so on?

> 
> What you're talking about is rebuilding Patroni, but adding more into it
> than even Patroni tries to do, building it on Logical Replication
> instead of physical replication, and calling it simple and something
> that could get into core PostgreSQL over a 12 week GSoC project.  I've
> certainly got doubts about that, even if we decide that it'd be an
> external-to-PG project (like Patroni).
> 
> What might be interesting is seeing if Logical Replication could be
> added to Patroni as an option and then building on that..  Having
> someone involved in the Patroni project would be the right way to go
> about proposing that though to see what they think of it.  That would
> also be much more sensible as a GSoC project, since it'd be an addition
> to an existing project and not more-or-less starting a whole new
> project.

Right now logical replication and physical replication-based HA tools don't
work together nicely, since the logical replication position is not propagated
to the promoted replica (I think Craig Ringer has been tackling this issue for
a few releases already; the latest set of patches I could find is
https://commitfest.postgresql.org/15/788/). Perhaps there is an opportunity
for a GSoC student to help fix it. Until then we cannot use logical
replication for HA, and even doing something simpler, like automating the
creation of logical replicas in Patroni, makes little sense, as they are
doomed to be reinitialized on every failover.
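To illustrate the DDL/sequence gap mentioned above, a hypothetical minimal setup (made-up names, PostgreSQL 10+ syntax):

```sql
-- On the publisher:
CREATE TABLE items (id bigserial PRIMARY KEY, payload text);
CREATE PUBLICATION items_pub FOR TABLE items;

-- On the subscriber, the table must be created by hand,
-- since DDL is not replicated:
CREATE TABLE items (id bigserial PRIMARY KEY, payload text);
CREATE SUBSCRIPTION items_sub
    CONNECTION 'host=primary dbname=shop'
    PUBLICATION items_pub;

-- Inserted rows replicate, but the subscriber's items_id_seq does not
-- advance; after promoting the subscriber, nextval() would hand out
-- ids that are already taken.
```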

-- 
Sincerely,
Alex



Re: GSoC 2018

2017-12-15 Thread Alex Kliukin


On Fri, Dec 15, 2017, at 14:52, Aleksander Alekseev wrote:

> Completely agree, this project can be an improvement for Stolon (or
> Patroni, but I personally never tested or used it, also I got a feeling
> that Google guys will prefer a project that is written in Go). This
> would make much more sense.

I don't believe Google will reject a project based on the fact that it is
written in Python (in fact, the Python Software Foundation has successfully
participated in GSoC for many years). Based on the GitHub statistics, Patroni
started earlier and has more contributors than Stolon (including those who
contributed more than one patch/pull request).
-- 
Sincerely,
Alex