Re: Spurious "apparent wraparound" via SimpleLruTruncate() rounding
On Sun, Jan 10, 2021 at 11:44:14AM +0500, Andrey Borodin wrote: > > 10 янв. 2021 г., в 03:15, Noah Misch написал(а): > > > > No; it deletes the most recent ~1B and leaves the older segments. An > > exception is multixact, as described in the commit message and the patch's > > change to a comment in TruncateMultiXact(). > > Thanks for clarification. > One more thing: retention point at 3/4 of overall space (half of wraparound) > seems more or less random to me. Why not 5/8 or 9/16? No reason for that exact value. The purpose of that patch is to mitigate bugs that cause the server to write data into a region of the SLRU that we permit truncation to unlink. If the patch instead tested "diff > INT_MIN * .99", the new behavior would get little testing, because xidWarnLimit would start first. Also, the new behavior wouldn't mitigate bugs that trespass >~20M XIDs into unlink-eligible space. If the patch tested "diff > INT_MIN * .01", more sites would see disk consumption grow. I think reasonable multipliers range from 0.5 (in the patch today) to 0.9, but it's a judgment call. > Can you please send revised patches with fixes? Attached. Author: Noah Misch Commit: Noah Misch Prevent excess SimpleLruTruncate() deletion. Every core SLRU wraps around. With the exception of pg_notify, the wrap point can fall in the middle of a page. Account for this in the PagePrecedes callback specification and in SimpleLruTruncate()'s use of said callback. Update each callback implementation to fit the new specification. This changes SerialPagePrecedesLogically() from the style of asyncQueuePagePrecedes() to the style of CLOGPagePrecedes(). (Whereas pg_clog and pg_serial share a key space, pg_serial is nothing like pg_notify.) The bug fixed here has the same symptoms and user followup steps as 592a589a04bd456410b853d86bd05faa9432cbbb. Back-patch to 9.5 (all supported versions). Reviewed by Andrey Borodin and Tom Lane. Discussion: https://postgr.es/m/20190202083822.gc32...@gust.leadboat.com diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c index 69a81f3..410d02a 100644 --- a/src/backend/access/transam/clog.c +++ b/src/backend/access/transam/clog.c @@ -694,6 +694,7 @@ CLOGShmemInit(void) SimpleLruInit(XactCtl, "Xact", CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE, XactSLRULock, "pg_xact", LWTRANCHE_XACT_BUFFER, SYNC_HANDLER_CLOG); + SlruPagePrecedesUnitTests(XactCtl, CLOG_XACTS_PER_PAGE); } /* @@ -912,13 +913,22 @@ TruncateCLOG(TransactionId oldestXact, Oid oldestxid_datoid) /* - * Decide which of two CLOG page numbers is "older" for truncation purposes. + * Decide whether a CLOG page number is "older" for truncation purposes. * * We need to use comparison of TransactionIds here in order to do the right - * thing with wraparound XID arithmetic. However, if we are asked about - * page number zero, we don't want to hand InvalidTransactionId to - * TransactionIdPrecedes: it'll get weird about permanent xact IDs. So, - * offset both xids by FirstNormalTransactionId to avoid that. + * thing with wraparound XID arithmetic. However, TransactionIdPrecedes() + * would get weird about permanent xact IDs. So, offset both such that xid1, + * xid2, and xid2 + CLOG_XACTS_PER_PAGE - 1 are all normal XIDs; this offset + * is relevant to page 0 and to the page preceding page 0. + * + * The page containing oldestXact-2^31 is the important edge case. The + * portion of that page equaling or following oldestXact-2^31 is expendable, + * but the portion preceding oldestXact-2^31 is not. 
When oldestXact-2^31 is + * the first XID of a page and segment, the entire page and segment is + * expendable, and we could truncate the segment. Recognizing that case would + * require making oldestXact, not just the page containing oldestXact, + * available to this callback. The benefit would be rare and small, so we + * don't optimize that edge case. */ static bool CLOGPagePrecedes(int page1, int page2) @@ -927,11 +937,12 @@ CLOGPagePrecedes(int page1, int page2) TransactionId xid2; xid1 = ((TransactionId) page1) * CLOG_XACTS_PER_PAGE; - xid1 += FirstNormalTransactionId; + xid1 += FirstNormalTransactionId + 1; xid2 = ((TransactionId) page2) * CLOG_XACTS_PER_PAGE; - xid2 += FirstNormalTransactionId; + xid2 += FirstNormalTransactionId + 1; - return TransactionIdPrecedes(xid1, xid2); + return (TransactionIdPrecedes(xid1, xid2) && + TransactionIdPrecedes(xid1, xid2 + CLOG_XACTS_PER_PAGE - 1)); } diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c index b786eef..9f42461 100644 --- a/src/backend/access/transam/commit_ts.c +++ b/src/backend/access/transam/commit_ts.c @@ -557,6 +557,7 @@ CommitTsShmemInit(void)
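To make the new PagePrecedes rule above easier to follow, here is a small standalone sketch (illustration only, not PostgreSQL source: the constant is a stand-in for CLOG_XACTS_PER_PAGE, the comparison assumes the usual signed-difference behaviour of TransactionIdPrecedes() for normal XIDs, and the FirstNormalTransactionId offsetting is omitted). The point is that "page1 precedes page2" must mean page1's XIDs precede every XID on page2, because the wrap point can fall in the middle of a page:

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define XACTS_PER_PAGE 32768u           /* stand-in for CLOG_XACTS_PER_PAGE */

/* Wraparound-aware "a precedes b": the usual signed-difference test. */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

static bool
page_strictly_precedes(int page1, int page2)
{
    TransactionId xid1 = (TransactionId) page1 * XACTS_PER_PAGE;
    TransactionId xid2 = (TransactionId) page2 * XACTS_PER_PAGE;

    /*
     * Checking only the first XID of page2 could report "precedes" even
     * though the last XIDs of page2 wrap around far enough to precede xid1,
     * which is exactly the case truncation must not treat as safe to unlink.
     */
    return xid_precedes(xid1, xid2) &&
           xid_precedes(xid1, xid2 + XACTS_PER_PAGE - 1);
}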
Re: Added schema level support for publication.
On Sat, Jan 9, 2021 at 8:14 PM Bharath Rupireddy wrote: > > On Sat, Jan 9, 2021 at 8:08 PM Bharath Rupireddy > wrote: > > I think this feature can be useful, in case a user has a lot of tables > > to publish inside a schema. Having said that, I wonder if this feature > > mandates users to create the same schema with same > > permissions/authorizations manually on the subscriber, because logical > > replication doesn't propagate any ddl's so are the schema or schema > > changes? Or is it that the list of tables from the publisher can go > > into a different schema on the subscriber? > > > > Since the schema can have other objects such as data types, functions, > > operators, I'm sure with your feature, non-table objects will be > > skipped. > > > > As Amit pointed out earlier, the behaviour when schema dropped, I > > think we should also consider when schema is altered, say altered to a > > different name, maybe we should change that in the publication too. > > > > In general, what happens if we have some temporary tables or foreign > > tables inside the schema, will they be allowed to send the data to > > subscribers? > > > > And, with this feature, since there can be many huge tables inside a > > schema, the initial table sync phase of the replication can take a > > while. > > > > Say a user has created a publication for a schema with hundreds of > > tables in it, at some point later, can he stop replicating a single or > > some tables from that schema? > > > > IMO, it's better to have the syntax - CREATE PUBLICATION > > production_publication FOR ALL TABLES IN SCHEMA production - just > > added IN between for all tables and schema. > > > > Say a user has a schema with 121 tables in it, and wants to replicate > > only 120 or 199 or even lesser tables out of it, so can we have some > > skip option to the new syntax, something like below? > > CREATE PUBLICATION production_publication FOR ALL TABLES SCHEMA > > production WITH skip = marketing, accounts, sales; --> meaning is, > > replicate all the tables in the schema production except marketing, > > accounts, sales tables. > > One more point - if the publication is created for a schema with no or > some initial tables, will all the future tables that may get added to > the schema will be replicated too? > I expect this should be the behavior. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Inconsistent "" use
In doc/src/sgml/func.sgml the description of the SHOW command uses "<acronym>SQL</acronym>", while the SET command description in the same section does not use "<acronym>". Shouldn't the description of SET use "<acronym>" for "SQL" as well? Patch attached.

Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 02a37658ad..ecb66f9c3f 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -24551,7 +24551,7 @@ SELECT collation for ('foo' COLLATE "de_DE");
 If is_local is true, the new value will only apply for the current transaction. If you want the new value to apply for the current session, use false
-instead. This function corresponds to the SQL
+instead. This function corresponds to the <acronym>SQL</acronym>
 command SET.
invalid data in file backup_label problem on windows
Hi hackers. When I use a non-exclusive low-level backup on Windows, "invalid data in file backup_label" is reported. The code in question is:

if (fscanf(lfp, "START WAL LOCATION: %X/%X (file %08X%16s)%c", &hi, &lo, &tli_from_walseg, startxlogfilename, &ch) != 5 || ch != '\n')
    ereport(FATAL,
            (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
             errmsg("invalid data in file \"%s\"", BACKUP_LABEL_FILE)));

When I wrote the backup_label on Windows, the line ends with CRLF, so ch becomes '\r'. I could not find it mentioned in the manual[1] that these two files (tablespace_map and backup_label) must not use the CRLF newline style. How about setting text mode when reading these two files, as in the attached patch? Any thoughts?

[1] https://www.postgresql.org/docs/13/continuous-archiving.html#BACKUP-LOWLEVEL-BASE-BACKUP

Best Regards, Shenhao Wang

0001-set-text-mode-when-reading-backup_label-and-tablesap.patch Description: 0001-set-text-mode-when-reading-backup_label-and-tablesap.patch
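A standalone sketch of the failure and of the text-mode idea (illustration only, not the attached patch; the "t" in the fopen mode is a Windows CRT extension that asks for text mode, so the runtime translates CRLF to LF before fscanf() sees the line terminator):

#include <stdio.h>

static int
read_start_wal_location(const char *path)
{
    unsigned int hi, lo, tli;
    char        fname[64];
    char        ch;
    FILE       *lfp = fopen(path, "rt");    /* text mode; in binary mode a CRLF
                                             * line leaves ch == '\r' below */

    if (lfp == NULL)
        return -1;

    if (fscanf(lfp, "START WAL LOCATION: %X/%X (file %08X%16s)%c",
               &hi, &lo, &tli, fname, &ch) != 5 || ch != '\n')
    {
        fclose(lfp);
        return -1;                          /* "invalid data in file ..." */
    }

    fclose(lfp);
    return 0;
}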
Re: support for MERGE
On 1/10/21 2:44 AM, Tomas Vondra wrote: > 5) WHEN AND > > I admit the "WHEN AND" conditions sounds a bit cryptic - it took me a > while to realize what this refers to. Is that a term established by SQL > Standard, or something we invented? The grammar gets it right but the error messages are nonsensical to me. I would like to see all user-facing instances of "WHEN AND" be replaced by the admittedly more verbose "WHEN [NOT] MATCHED AND". -- Vik Fearing
Re: proposal: schema variables
pá 8. 1. 2021 v 18:54 odesílatel Erik Rijkers napsal: > On 2021-01-08 07:20, Pavel Stehule wrote: > > Hi > > > > just rebase > > > > [schema-variables-20200108.patch] > > Hey Pavel, > > My gcc 8.3.0 compile says: > (on debian 10/Buster) > > utility.c: In function ‘CreateCommandTag’: > utility.c:2332:8: warning: this statement may fall through > [-Wimplicit-fallthrough=] > tag = CMDTAG_SELECT; > ^~~ > utility.c:2334:3: note: here > case T_LetStmt: > ^~~~ > yes, there was an error from the last rebase. Fixed > > compile, check, check-world, runs without further problem. > > I also changed a few typos/improvements in the documentation, see > attached. > > One thing I wasn't sure of: I have assumed that >ON TRANSACTIONAL END RESET > > should be >ON TRANSACTION END RESET > > and changed it accordingly, please double-check. > It looks well, I merged these changes to patch. Thank you very much for these corectures Updated patch attached Regards Pavel > > Erik Rijkers > schema-variables-20210110.patch.gz Description: application/gzip
Re: [HACKERS] Custom compression methods
On Mon, Jan 04, 2021 at 04:57:16PM +0530, Dilip Kumar wrote: > On Mon, Jan 4, 2021 at 6:52 AM Justin Pryzby wrote: > > And fails pg_upgrade check, apparently losing track of the compression (?) > > > > CREATE TABLE public.cmdata2 ( > > -f1 text COMPRESSION lz4 > > +f1 text > > ); > > I did not get this? pg_upgrade check is passing for me. I realized that this was failing in your v16 patch sent Dec 25. It's passing on current patches because they do "DROP TABLE cmdata2", but that's only masking the error. I think this patch needs to be specifically concerned with pg_upgrade, so I suggest to not drop your tables and MVs, to allow the pg_upgrade test to check them. That exposes this issue: pg_dump: error: Error message from server: ERROR: cache lookup failed for access method 36447 pg_dump: error: The command was: COPY public.cmdata (f1) TO stdout; pg_dumpall: error: pg_dump failed on database "regression", exiting waiting for server to shut down done server stopped pg_dumpall of post-upgrade database cluster failed I found that's the AM's OID in the old clsuter: regression=# SELECT * FROM pg_am WHERE oid=36447; oid | amname | amhandler | amtype ---++-+ 36447 | pglz2 | pglzhandler | c But in the new cluster, the OID has changed. Since that's written into table data, I think you have to ensure that the compression OIDs are preserved on upgrade: 16755 | pglz2 | pglzhandler | c In my brief attempt to inspect it, I got this crash: $ tmp_install/usr/local/pgsql/bin/postgres -D src/bin/pg_upgrade/tmp_check/data & regression=# SELECT pg_column_compression(f1) FROM cmdata a; server closed the connection unexpectedly Thread 1 "postgres" received signal SIGSEGV, Segmentation fault. __strlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:120 120 ../sysdeps/x86_64/multiarch/../strlen.S: No such file or directory. (gdb) bt #0 __strlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:120 #1 0x55c6049fde62 in cstring_to_text (s=0x0) at varlena.c:193 #2 pg_column_compression () at varlena.c:5335 (gdb) up #2 pg_column_compression () at varlena.c:5335 5335PG_RETURN_TEXT_P(cstring_to_text(get_am_name( (gdb) l 5333varvalue = (struct varlena *) DatumGetPointer(value); 5334 5335PG_RETURN_TEXT_P(cstring_to_text(get_am_name( 5336 toast_get_compression_oid(varvalue; I guess a missing AM here is a "shouldn't happen" case, but I'd prefer it to be caught with an elog() (maybe in get_am_name()) or at least an Assert. -- Justin
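The guard being suggested could look roughly like this (a sketch only, using the names from the backtrace above; whether the check belongs in get_am_name() itself or in pg_column_compression() is the open question):

    Oid         cmoid = toast_get_compression_oid(varvalue);
    char       *amname = get_am_name(cmoid);

    /* Fail with a clear message instead of passing NULL to cstring_to_text(). */
    if (amname == NULL)
        elog(ERROR, "cache lookup failed for access method %u", cmoid);

    PG_RETURN_TEXT_P(cstring_to_text(amname));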
Re: Added schema level support for publication.
Thanks for your comments Bharath, please find my opinion below. On Sat, Jan 9, 2021 at 8:08 PM Bharath Rupireddy wrote: > I think this feature can be useful, in case a user has a lot of tables > to publish inside a schema. Having said that, I wonder if this feature > mandates users to create the same schema with same > permissions/authorizations manually on the subscriber, because logical > replication doesn't propagate any ddl's so are the schema or schema > changes? Or is it that the list of tables from the publisher can go > into a different schema on the subscriber? > DDL's will not be propagated to the subscriber. Users have to create the schema & tables in the subscriber. No change in Permissions/authorizations handling, it will be the same as the existing behavior for relations. > Since the schema can have other objects such as data types, functions, > operators, I'm sure with your feature, non-table objects will be > skipped. > Yes, only table data will be sent to subscribers, non-table objects will be skipped. > As Amit pointed out earlier, the behaviour when schema dropped, I > think we should also consider when schema is altered, say altered to a > different name, maybe we should change that in the publication too. > I agree that when schema is altered the renamed schema should be reflected in the publication. > In general, what happens if we have some temporary tables or foreign > tables inside the schema, will they be allowed to send the data to > subscribers? > Temporary tables & foreign tables will not be added to the publications. > And, with this feature, since there can be many huge tables inside a > schema, the initial table sync phase of the replication can take a > while. > Yes this is required. > Say a user has created a publication for a schema with hundreds of > tables in it, at some point later, can he stop replicating a single or > some tables from that schema? > There is no provision for this currently. > IMO, it's better to have the syntax - CREATE PUBLICATION > production_publication FOR ALL TABLES IN SCHEMA production - just > added IN between for all tables and schema. > I'm ok with the proposed syntax, I would like others' opinion too before making the change. > Say a user has a schema with 121 tables in it, and wants to replicate > only 120 or 199 or even lesser tables out of it, so can we have some > skip option to the new syntax, something like below? > CREATE PUBLICATION production_publication FOR ALL TABLES SCHEMA > production WITH skip = marketing, accounts, sales; --> meaning is, > replicate all the tables in the schema production except marketing, > accounts, sales tables. > Yes this is a good use case, will include this change. Thanks for the comments, I will handle the comments and post a patch for this. Regards, Vignesh EnterpriseDB: http://www.enterprisedb.com
Re: Added schema level support for publication.
On Sat, Jan 9, 2021 at 8:14 PM Bharath Rupireddy wrote: > > One more point - if the publication is created for a schema with no or > some initial tables, will all the future tables that may get added to > the schema will be replicated too? > I agree on this, when a relation is added to the schema it should be added to the publication. Regards, Vignesh EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [PATCH] Generic type subscripting
Hi > I'm thinking of the update path as a kind of implicit schema. JSON is > intentionally not bound to any schema on creation, so I don't see a > failure to enforce another schema at runtime (and outside the WHERE > clause, at that) as an error exactly. > This concept is not consistent with other implemented behaviour. 1. The schema is dynamically enhanced - so although the path doesn't exists, it is created and data are changed postgres=# create table foo(a jsonb); CREATE TABLE postgres=# insert into foo values('{}'); INSERT 0 1 postgres=# update foo set a['a']['a'][10] = '0'; UPDATE 1 postgres=# select * from foo; ┌───┐ │ a │ ╞═══╡ │ {"a": {"a": [null, null, null, null, null, null, null, null, null, null, 0]}} │ └───┘ (1 row) So although the path [a,a,10] was not exists, it was created. 2. this update fails (and it is correct) postgres=# update foo set a['a']['a']['b'] = '0'; ERROR: path element at position 3 is not an integer: "b" although the path [a,a,b] doesn't exists, and it is not ignored. This implementation doesn't do only UPDATE (and then analogy with WHERE clause isn't fully adequate). It does MERGE. This is necessary, because without it, the behaviour will be pretty unfriendly - because there is not any external schema. I think so this is important - and it can be little bit messy. I am not sure if I use correct technical terms - we try to use LAX update in first step, and if it is not successful, then we try to do LAX insert. This is maybe correct from JSON semantic - but for developer it is unfriendly, because he hasn't possibility to detect if insert was or was not successful. In special JSON functions I can control behave and can specify LAX or STRICT how it is necessity. But in this interface (subscripting) this possibility is missing. I think so there should be final check (semantically) if value was updated, and if the value was changed. If not, then error should be raised. It should be very similar like RLS update. I know and I understand so there should be more than one possible implementations, but safe is only one - after successful update I would to see new value inside, and when it is not possible, then I expect exception. I think so it is more practical too. I can control filtering with WHERE clause. But I cannot to control MERGE process. Manual recheck after every update can be terrible slow. Regards Pavel > But I looked into the bulk case a little further, and "outside the > WHERE clause" cuts both ways. The server reports an update whether or > not the JSON could have been modified, which suggests triggers will > fire for no-op updates. That's more clearly a problem. > > insert into j (val) values > ('{"a": 100}'), > ('{"a": "200"}'), > ('{"b": "300"}'), > ('{"c": {"d": 400}}'), > ('{"a": {"z": 500}}'); > > INSERT 0 5 > update j set val['a']['z'] = '600' returning *; > val > > {"a": 100} > {"a": "200"} > {"a": {"z": 600}, "b": "300"} > {"a": {"z": 600}, "c": {"d": 400}} > {"a": {"z": 600}} > (5 rows) > > *UPDATE 5* >
Re: new heapcheck contrib module
On Fri, Jan 8, 2021 at 6:33 AM Mark Dilger wrote:
> The attached patches, v31, are mostly the same, but with "getopt_long.h"
> included from pg_amcheck.c per Thomas's review, and a .gitignore file added
> in contrib/pg_amcheck/

A couple more little things from Windows CI:

C:\projects\postgresql\src\include\fe_utils/option_utils.h(19): fatal error C1083: Cannot open include file: 'libpq-fe.h': No such file or directory [C:\projects\postgresql\pg_amcheck.vcxproj]

Does contrib/amcheck/Makefile need to say "SHLIB_PREREQS = submake-libpq" like other contrib modules that use libpq?

pg_backup_utils.obj : error LNK2001: unresolved external symbol exit_nicely [C:\projects\postgresql\pg_dump.vcxproj]

I think this is probably because additions to src/fe_utils/Makefile's OBJS list need to be manually replicated in src/tools/msvc/Mkvcbuild.pm's @pgfeutilsfiles list. (If I'm right about that, perhaps it needs a comment to remind us Unix hackers of that, or perhaps it should be automated...)
Re: Terminate the idle sessions
Thomas Munro writes: > On Thu, Jan 7, 2021 at 4:51 PM Tom Lane wrote: >> Thomas Munro writes: >>> One of the strange things about these errors is that they're >>> asynchronous/unsolicited, but they appear to the client to be the >>> response to their next request (if it doesn't eat ECONNRESET instead). >> Right, which is what makes class 57 (operator intervention) seem >> attractive to me. From the client's standpoint these look little >> different from ERRCODE_ADMIN_SHUTDOWN or ERRCODE_CRASH_SHUTDOWN, >> which are in that category. > Yeah, that's a good argument. Given the lack of commentary on this thread, I'm guessing that people aren't so excited about this topic that a change in the existing sqlstate assignment for ERRCODE_IDLE_IN_TRANSACTION_SESSION_TIMEOUT would fly. So I propose to change the new ERRCODE_IDLE_SESSION_TIMEOUT to be in class 57 and call it good. regards, tom lane
Re: zstd compression for pg_dump
On 1/4/21 3:53 AM, Justin Pryzby wrote:
> About 89% smaller.

Did a quick code review of the patch. I have not taken it for a spin yet and there are parts of the code I have not read yet.

## Is there any reason for this diff?

-cfp        *fp = pg_malloc(sizeof(cfp));
+cfp        *fp = pg_malloc0(sizeof(cfp));

## Since we now have multiple returns in cfopen() I am not sure that setting fp to NULL is still clearer than just returning NULL.

## I do not like that this pretends to support r+, w+ and a+ but does not actually do so, since it does not create an input stream in those cases.

else if (mode[0] == 'w' || mode[0] == 'a' || strchr(mode, '+') != NULL)
[...]
else if (strchr(mode, 'r'))

## Wouldn't cfread(), cfwrite(), cfgetc(), cfgets(), cfclose() and cfeof() be cleaner with switch statements similar to cfopen()?

## "/* Should be called "method" or "library" ? */"

Maybe, but personally I think algorithm is fine too.

## "Is a nondefault level set ?"

The PostgreSQL project does not use a space before the question mark (at least not in English).

## Why isn't level_set just a local variable in parse_compression()? It does not seem to be used elsewhere.

## Shouldn't we call the Compression variable in OpenArchive() nocompress to match the naming convention in other places? And in general I wonder if we should not write "nocompression = {COMPR_ALG_NONE}" rather than "nocompression = {0}".

## Why not use const on the pointers to Compression for functions like cfopen()? As far as I can see several of them could be const.

## Shouldn't "AH->compression.alg = Z_DEFAULT_COMPRESSION" in ReadHead() be "AH->compression.alg = COMPR_ALG_DEFAULT"? Additionally I am not convinced that returning COMPR_ALG_DEFAULT will even work, but I have not had the time to test that theory yet. And in general I am quite sceptical that we really need COMPR_ALG_DEFAULT.

## Some whitespace issues

Add spaces around the plus in "atoi(1+eq)" and "pg_log_error("unknown compression algorithm: %s", 1+eq)". Add spaces around the plus in parse_compression(), e.g. in "strlen(1+eq)".

## Shouldn't hasSuffix() take the current compression algorithm as a parameter? Or alternatively look up which compression algorithm to use from the suffix?

## Why support multiple ways to write zlib on the command line? I do not see any advantage of being able to write it as libz.

## I feel renaming SaveOutput() to GetOutput() would make it more clear what it does now that you have changed the return type.

## You have accidentally committed "-runstatedir" in configure. I have no idea why we do not have it (maybe it is something Debian specific) but even if we are going to add it, it should not be in this patch. Same with the parenthesis changes to LARGE_OFF_T.

## This is probably out of scope of your patch, but I am not a fan of the fallback logic in cfopen_read(). I feel ideally we should always know if there is a suffix or not, and not try to guess file names and do pointless syscalls.

## COMPR_ALG_DEFAULT looks like it would error out for archive and directory if someone has neither zlib nor zstandard. It feels like it should default to uncompressed if we have neither. Or at least give a better error message.

> Note, there's currently several "compression" patches in CF app. This patch
> seems to be independent of the others, but probably shouldn't be totally
> uncoordinated (like adding lz4 in one and zstd in another might be poor
> execution). A thought here is that maybe we want to use the same values for
> the enums in all patches.

Especially if we write the numeric value to pg dump files.

Andreas
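To make the switch-statement suggestion above concrete, a rough sketch of the shape cfread() could take; everything below except fread() and gzread() is an assumption about the patch (stand-in type, field and enum names), not code taken from it:

#include <stdio.h>
#include <stdlib.h>
#include <zlib.h>

typedef enum { COMPR_ALG_NONE, COMPR_ALG_ZLIB, COMPR_ALG_ZSTD } CompressionAlgorithm;
typedef struct { CompressionAlgorithm alg; int level; } Compression;
typedef struct { Compression compression; FILE *plainfp; gzFile zlibfp; } cfp;

static int
cfread_sketch(void *ptr, int size, cfp *fp)
{
    switch (fp->compression.alg)
    {
        case COMPR_ALG_NONE:
            return fread(ptr, 1, size, fp->plainfp);
        case COMPR_ALG_ZLIB:
            return gzread(fp->zlibfp, ptr, size);
        case COMPR_ALG_ZSTD:
            /* zstd streaming read elided; only the dispatch shape matters here */
            return -1;
        default:
            fprintf(stderr, "invalid compression algorithm\n");
            exit(1);
    }
}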
Re: Inconsistent "" use
On Sun, Jan 10, 2021 at 08:22:42PM +0900, Tatsuo Ishii wrote:
> In doc/src/sgml/func.sgml the description of the SHOW command uses
> "<acronym>SQL</acronym>", while the SET command description in the same
> section does not use "<acronym>". Shouldn't the description of SET use
> "<acronym>" for "SQL" as well? Patch attached.

https://tdg.docbook.org/tdg/5.2/acronym.html suggests docbook formatters either ignore that or use it as a signal to substitute small caps. I don't consider small caps an improvement for "SQL", so I'd prefer to never use <acronym>SQL</acronym>. <acronym> also makes the markup longer (though one could mitigate that with an entity like &SQL). However, standardizing on either way is better than varying within the manual.
Re: [PATCH] Automatic HASH and LIST partition creation
On Wed, Oct 7, 2020 at 6:26 AM Anastasia Lubennikova wrote: > Do you think that it is a bug? For now, I removed this statement from > tests just to calm down the CI. I don't think we can use \d+ on a temporary table here, because the backend ID appears in the namespace, which is causing a failure on one of the CI OSes due to nondeterminism: CREATE TEMP TABLE temp_parted (a char) PARTITION BY LIST (a) CONFIGURATION (VALUES IN ('a') DEFAULT PARTITION temp_parted_default); \d+ temp_parted - Partitioned table "pg_temp_3.temp_parted" + Partitioned table "pg_temp_4.temp_parted"
Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY
On Fri, Jan 08, 2021 at 04:14:33PM -0300, Alvaro Herrera wrote: > > > I ended up with apparently broken constraint when running multiple loops > > > around > > > a concurrent detach / attach: > > > > > > while psql -h /tmp postgres -c "ALTER TABLE p ATTACH PARTITION p1 FOR > > > VALUES FROM (1)TO(2)" -c "ALTER TABLE p DETACH PARTITION p1 > > > CONCURRENTLY"; do :; done& > > > while psql -h /tmp postgres -c "ALTER TABLE p ATTACH PARTITION p1 FOR > > > VALUES FROM (1)TO(2)" -c "ALTER TABLE p DETACH PARTITION p1 > > > CONCURRENTLY"; do :; done& > > > > > > "p1_check" CHECK (true) > > > "p1_i_check" CHECK (i IS NOT NULL AND i >= 1 AND i < 2) > > > > Not good. > > Haven't had time to investigate this problem yet. I guess it's because you commited the txn and released lock in the middle of the command. -- Justin >From e18c11fd5bcc4f5cd981a3219383265b55974f34 Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Sun, 10 Jan 2021 15:41:43 -0600 Subject: [PATCH] fix --- src/backend/commands/tablecmds.c | 12 ++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index d7b9c63e5f..144c27c303 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -17131,6 +17131,7 @@ ATExecDetachPartition(List **wqueue, AlteredTableInfo *tab, Relation rel, Oid partrelid, parentrelid; LOCKTAG tag; + LockRelId partlockrelid; char *parentrelname; char *partrelname; @@ -17162,6 +17163,10 @@ ATExecDetachPartition(List **wqueue, AlteredTableInfo *tab, Relation rel, table_close(rel, NoLock); tab->rel = NULL; + partlockrelid.relId = parentrelid; + partlockrelid.dbId = MyDatabaseId; + LockRelationIdForSession(&partlockrelid, ShareUpdateExclusiveLock); + /* Make updated catalog entry visible */ PopActiveSnapshot(); CommitTransactionCommand(); @@ -17204,7 +17209,7 @@ ATExecDetachPartition(List **wqueue, AlteredTableInfo *tab, Relation rel, errmsg("partition \"%s\" was removed concurrently", partrelname))); tab->rel = rel; - + UnlockRelationIdForSession(&partlockrelid, ShareUpdateExclusiveLock); } /* Do the final part of detaching */ @@ -17444,7 +17449,10 @@ DetachAddConstraintIfNeeded(List **wqueue, Relation partRel) TupleDesc td = RelationGetDescr(partRel); Constraint *n; - constraintExpr = make_ands_explicit(RelationGetPartitionQual(partRel)); + List *l = RelationGetPartitionQual(partRel); + Assert(partRel->rd_rel->relispartition); + Assert(l != NIL); + constraintExpr = make_ands_explicit(l); /* If an identical constraint exists, we don't need to create one */ if (td->constr && td->constr->num_check > 0) -- 2.17.0
Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits
Hi, I started looking at this patch again, hoping to get it committed in this CF, but I think there's a regression in handling TOAST tables (compared to the v3 patch submitted by Pavan in February 2019). The test I'm running a very simple test (see test.sql): 1) start a transaction 2) create a table with a text column 3) copy freeze data into it 4) use pg_visibility to see how many blocks are all_visible both in the main table and it's TOAST table For v3 patch (applied on top of 278584b526 and s/hi_options/ti_options) I get this: pages NOT all_visible -- main 637 0 toast50001 3 There was some discussion about relcache invalidations causing a couple TOAST pages not be marked as all_visible, etc. However, for this patch on master I get this pages NOT all_visible -- main 637 0 toast50001 50001 So no pages in TOAST are marked as all_visible. I haven't investigated what's causing this, but IMO that needs fixing to make ths patch RFC. Attached is the test script I'm using, and a v5 of the patch - rebased on current master, with some minor tweaks to comments etc. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company >From 6ccaa4f526b38fbdd2f3a38ac3bd51e96fb140b6 Mon Sep 17 00:00:00 2001 From: Tomas Vondra Date: Sun, 10 Jan 2021 20:30:29 +0100 Subject: [PATCH] Set PD_ALL_VISIBLE and visibility map bits in COPY FREEZE Make sure COPY FREEZE marks the pages as PD_ALL_VISIBLE and updates the visibility map. Until now it only marked individual tuples as frozen, but page-level flags were not updated. This is a fairly old patch, and multiple people worked on it. The first version was written by Jeff Janes, and then reworked by Pavan Deolasee and Anastasia Lubennikova. Author: Pavan Deolasee, Anastasia Lubennikova, Jeff Janes Reviewed-by: Kuntal Ghosh, Jeff Janes, Tomas Vondra, Masahiko Sawada, Andres Freund, Ibrar Ahmed, Robert Haas, Tatsuro Ishii Discussion: https://postgr.es/m/caboikdn-ptgv0mzntrk2q8otfuuajqaymgmkdu1dckftuxv...@mail.gmail.com Discussion: https://postgr.es/m/CAMkU%3D1w3osJJ2FneELhhNRLxfZitDgp9FPHee08NT2FQFmz_pQ%40mail.gmail.com --- .../pg_visibility/expected/pg_visibility.out | 64 +++ contrib/pg_visibility/sql/pg_visibility.sql | 77 +++ src/backend/access/heap/heapam.c | 76 -- src/backend/access/heap/hio.c | 17 src/include/access/heapam_xlog.h | 3 + 5 files changed, 229 insertions(+), 8 deletions(-) diff --git a/contrib/pg_visibility/expected/pg_visibility.out b/contrib/pg_visibility/expected/pg_visibility.out index ca4b6e186b..0017e3415c 100644 --- a/contrib/pg_visibility/expected/pg_visibility.out +++ b/contrib/pg_visibility/expected/pg_visibility.out @@ -179,6 +179,69 @@ select pg_truncate_visibility_map('test_partition'); (1 row) +-- test copy freeze +create table copyfreeze (a int, b char(1500)); +-- load all rows via COPY FREEZE and ensure that all pages are set all-visible +-- and all-frozen. +begin; +truncate copyfreeze; +copy copyfreeze from stdin freeze; +commit; +select * from pg_visibility_map('copyfreeze'); + blkno | all_visible | all_frozen +---+-+ + 0 | t | t + 1 | t | t + 2 | t | t +(3 rows) + +select * from pg_check_frozen('copyfreeze'); + t_ctid + +(0 rows) + +-- load half the rows via regular COPY and rest via COPY FREEZE. The pages +-- which are touched by regular COPY must not be set all-visible/all-frozen. On +-- the other hand, pages allocated by COPY FREEZE should be marked +-- all-frozen/all-visible. 
+begin; +truncate copyfreeze; +copy copyfreeze from stdin; +copy copyfreeze from stdin freeze; +commit; +select * from pg_visibility_map('copyfreeze'); + blkno | all_visible | all_frozen +---+-+ + 0 | f | f + 1 | f | f + 2 | t | t +(3 rows) + +select * from pg_check_frozen('copyfreeze'); + t_ctid + +(0 rows) + +-- Try a mix of regular COPY and COPY FREEZE. +begin; +truncate copyfreeze; +copy copyfreeze from stdin freeze; +copy copyfreeze from stdin; +copy copyfreeze from stdin freeze; +commit; +select * from pg_visibility_map('copyfreeze'); + blkno | all_visible | all_frozen +---+-+ + 0 | t | t + 1 | f | f + 2 | t | t +(3 rows) + +select * from pg_check_frozen('copyfreeze'); + t_ctid + +(0 rows) + -- cleanup drop table test_partitioned; drop view test_view; @@ -188,3 +251,4 @@ drop server dummy_server; drop foreign data wrapper dummy; drop materialized view matview_visibility_test; drop table regular_table; +drop tabl
Re: Multiple hosts in connection string failed to failover in non-hot standby mode
I wrote: > The problems that I see in this area are first that there's no > real standardization in libpq as to whether to append error messages > together or just flush preceding messages; and second that no effort > is made in multi-connection-attempt cases to separate the errors from > different attempts, much less identify which error goes with which > host or IP address. I think we really need to put some work into > that. I spent some time on this, and here is a patch set that tries to improve matters. 0001 changes the libpq coding rules to be that all error messages should be appended to conn->errorMessage, never overwritten (there are a couple of exceptions in fe-lobj.c) and we reset conn->errorMessage to empty only at the start of a connection request or new query. This is something that's been bugging me for a long time and I'm glad to get it cleaned up. Formerly it seemed that a dartboard had been used to decide whether to use "printfPQExpBuffer" or "appendPQExpBuffer"; now it's always the latter. We can also get rid of several hacks that were used to get around the mess and force appending behavior. 0002 then changes the reporting rules in fe-connect.c as I suggested, so that you might get errors like this: $ psql -h localhost,/tmp -p 12345 psql: error: could not connect to host "localhost" (::1), port 12345: Connection refused Is the server running on that host and accepting TCP/IP connections? could not connect to host "localhost" (127.0.0.1), port 12345: Connection refused Is the server running on that host and accepting TCP/IP connections? could not connect to socket "/tmp/.s.PGSQL.12345": No such file or directory Is the server running locally and accepting connections on that socket? and we have a pretty uniform rule that errors coming back from a connection attempt will be prefixed with the server address. Then 0003 is the part of your patch that I'm happy with. Given 0001+0002 we could certainly consider looping back and retrying for more cases, but I still want to tread lightly on that. I don't see a lot of value in retrying errors that seem to be on the client side, such as failure to set socket properties; and in general I'm hesitant to add untestable code paths here. I feel pretty good about 0001: it might be committable as-is. 0002 is probably subject to bikeshedding, plus it has a problem in the ECPG tests. 
Two of the error messages are now unstable because they expose chosen-at-random socket paths: diff -U3 /home/postgres/pgsql/src/interfaces/ecpg/test/expected/connect-test5.stderr /home/postgres/pgsql/src/interfaces/ecpg/test/results/connect-test5.stderr --- /home/postgres/pgsql/src/interfaces/ecpg/test/expected/connect-test5.stderr 2020-08-04 14:59:45.617380050 -0400 +++ /home/postgres/pgsql/src/interfaces/ecpg/test/results/connect-test5.stderr 2021-01-10 16:12:22.539433702 -0500 @@ -36,7 +36,7 @@ [NO_PID]: sqlca: code: 0, state: 0 [NO_PID]: ECPGconnect: opening database on port for user regress_ecpg_user2 [NO_PID]: sqlca: code: 0, state: 0 -[NO_PID]: ECPGconnect: could not open database: FATAL: database "regress_ecpg_user2" does not exist +[NO_PID]: ECPGconnect: could not open database: could not connect to socket "/tmp/pg_regress-EbHubF/.s.PGSQL.58080": FATAL: database "regress_ecpg_user2" does not exist [NO_PID]: sqlca: code: 0, state: 0 [NO_PID]: ecpg_finish: connection main closed @@ -73,7 +73,7 @@ [NO_PID]: sqlca: code: -220, state: 08003 [NO_PID]: ECPGconnect: opening database on port for user regress_ecpg_user2 [NO_PID]: sqlca: code: 0, state: 0 -[NO_PID]: ECPGconnect: could not open database: FATAL: database "regress_ecpg_user2" does not exist +[NO_PID]: ECPGconnect: could not open database: could not connect to socket "/tmp/pg_regress-EbHubF/.s.PGSQL.58080": FATAL: database "regress_ecpg_user2" does not exist [NO_PID]: sqlca: code: 0, state: 0 [NO_PID]: ecpg_finish: connection main closed I don't have any non-hacky ideas what to do about that. The extra detail seems useful to end users, but we don't have any infrastructure that would allow filtering it out in the ECPG tests. regards, tom lane diff --git a/src/interfaces/libpq/fe-auth-scram.c b/src/interfaces/libpq/fe-auth-scram.c index b76f0befd0..8b60378379 100644 --- a/src/interfaces/libpq/fe-auth-scram.c +++ b/src/interfaces/libpq/fe-auth-scram.c @@ -208,13 +208,13 @@ pg_fe_scram_exchange(void *opaq, char *input, int inputlen, { if (inputlen == 0) { - printfPQExpBuffer(&conn->errorMessage, + appendPQExpBuffer(&conn->errorMessage, libpq_gettext("malformed SCRAM message (empty message)\n")); goto error; } if (inputlen != strlen(input)) { - printfPQExpBuffer(&conn->errorMessage, + appendPQExpBuffer(&conn->errorMessage, libpq_gettext("malformed SCRAM message (length mismatch)\n")); goto error; } @@ -258,14 +258,14 @@ pg_fe_scram_exchange(vo
Re: Inconsistent "" use
On Sun, Jan 10, 2021 at 01:11:07PM -0800, Noah Misch wrote:
> On Sun, Jan 10, 2021 at 08:22:42PM +0900, Tatsuo Ishii wrote:
> > In doc/src/sgml/func.sgml the description of the SHOW command uses
> > "<acronym>SQL</acronym>", while the SET command description in the same
> > section does not use "<acronym>". Shouldn't the description of SET use
> > "<acronym>" for "SQL" as well? Patch attached.
>
> https://tdg.docbook.org/tdg/5.2/acronym.html suggests docbook formatters
> either ignore that or use it as a signal to substitute small caps.
> I don't consider small caps an improvement for "SQL", so I'd prefer to never
> use <acronym>SQL</acronym>. <acronym> also makes the markup longer (though
> one could mitigate that with an entity like &SQL). However, standardizing on
> either way is better than varying within the manual.

I think smallcaps is almost always a win for acronyms.

-- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: Inconsistent "" use
Bruce Momjian writes:
> On Sun, Jan 10, 2021 at 01:11:07PM -0800, Noah Misch wrote:
>> https://tdg.docbook.org/tdg/5.2/acronym.html suggests docbook formatters
>> either ignore that or use it as a signal to substitute small caps.
>> I don't consider small caps an improvement for "SQL", so I'd prefer to never
>> use <acronym>SQL</acronym>. <acronym> also makes the markup longer (though
>> one could mitigate that with an entity like &SQL). However, standardizing on
>> either way is better than varying within the manual.

> I think smallcaps is almost always a win for acronyms.

I'm with Noah: small caps are *not* an improvement, they're just distractingly fussy. I note that the authors of the stylesheets we use seem to agree, because AFAICS <acronym> is not rendered specially in either HTML or PDF output.

Given this docbook.org advice, I'd be inclined to just remove our use of <acronym> altogether. Although, since it isn't actually making any difference, it's not clear that it's worth doing anything. The largest effect of trying to standardize (in either direction) would be to create back-patching hazards for docs fixes.

regards, tom lane
RE: Disable WAL logging to speed up data loading
From: Masahiko Sawada > I think it's better to have index AM (and perhaps table AM) control it > instead of filtering in XLogInsert(). Because otherwise third-party > access methods that use LSN like gist indexes cannot write WAL records > at all when wal_level = none even if they want. Hm, third-party extensions use RM_GENERIC_ID, so it should probably be allowed in XLogInsert() as well instead of allowing control in some other way. It's not unnatural that WAL records for in-core AMs are filtered in XLogInsert(). Regards Takayuki Tsunakawa
RE: Parallel Inserts in CREATE TABLE AS
> Attaching v21 patch set, which has following changes: > 1) 0001 - changed fpes->ins_cmd_type == > PARALLEL_INSERT_CMD_CREATE_TABLE_AS to fpes->ins_cmd_type != > PARALLEL_INSERT_CMD_UNDEF > 2) 0002 - reworded the commit message. > 3) 0003 - added cmin, xmin test case to one of the parallel insert cases > to ensure leader and worker insert the tuples in the same xact and replaced > memory usage output in numbers like 25kB to NkB to make the tests stable. > 4) 0004 - updated one of the test output to be in NkB and made the assertion > in SetParallelInsertState to be not under an if condition. > > There's one open point [1] on selective skipping of error "cannot insert > tuples in a parallel worker" in heap_prepare_insert(), thoughts are > welcome. > > Please consider the v21 patch set for further review. Hi, I took a look into the new patch and have some comments. 1. + /* +* Do not consider tuple cost in case of we intend to perform parallel +* inserts by workers. We would have turned on the ignore flag in +* apply_scanjoin_target_to_paths before generating Gather path for the +* upper level SELECT part of the query. +*/ + if ((root->parse->parallelInsCmdTupleCostOpt & +PARALLEL_INSERT_SELECT_QUERY) && + (root->parse->parallelInsCmdTupleCostOpt & +PARALLEL_INSERT_CAN_IGN_TUP_COST)) Can we just check PARALLEL_INSERT_CAN_IGN_TUP_COST here ? IMO, PARALLEL_INSERT_CAN_IGN_TUP_COST will be set only when PARALLEL_INSERT_SELECT_QUERY is set. 2. +static void +ParallelInsCmdEstimate(ParallelContext *pcxt, ParallelInsertCmdKind ins_cmd, + void *ins_info) ... + info = (ParallelInsertCTASInfo *) ins_info; + intoclause_str = nodeToString(info->intoclause); + intoclause_len = strlen(intoclause_str) + 1; +static void +SaveParallelInsCmdInfo(ParallelContext *pcxt, ParallelInsertCmdKind ins_cmd, + void *ins_info) ... + info = (ParallelInsertCTASInfo *)ins_info; + intoclause_str = nodeToString(info->intoclause); + intoclause_len = strlen(intoclause_str) + 1; + intoclause_space = shm_toc_allocate(pcxt->toc, intoclause_len); I noticed the above code will call nodeToString and strlen twice which seems unnecessary. Do you think it's better to store the result of nodetostring and strlen first and pass them when used ? 3. + if (node->need_to_scan_locally || node->nworkers_launched == 0) + { + EState *estate = node->ps.state; + TupleTableSlot *outerTupleSlot; + + for(;;) + { + /* Install our DSA area while executing the plan. */ + estate->es_query_dsa = + node->pei ? node->pei->area : NULL; ... + node->ps.state->es_processed++; + } How about use the variables estate like 'estate-> es_processed++;' Instead of node->ps.state->es_processed++; Best regards, houzj
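A sketch of what point 2 above is asking for (illustrative only; the extra parameters imply changing the two functions' signatures, which the current patch does not do):

    /* Serialize the IntoClause once in the common caller ... */
    char       *intoclause_str = nodeToString(info->intoclause);
    int         intoclause_len = strlen(intoclause_str) + 1;

    /*
     * ... and pass both the string and its length down, so neither
     * ParallelInsCmdEstimate() nor SaveParallelInsCmdInfo() has to repeat
     * the nodeToString() + strlen() work.
     */
    ParallelInsCmdEstimate(pcxt, ins_cmd, intoclause_str, intoclause_len);
    SaveParallelInsCmdInfo(pcxt, ins_cmd, intoclause_str, intoclause_len);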
Re: pg_preadv() and pg_pwritev()
On Wed, Dec 23, 2020 at 12:06 AM Thomas Munro wrote: > On Mon, Dec 21, 2020 at 11:40 AM Andres Freund wrote: > > Can we come up with a better name than 'uio'? I find that a not exactly > > meaningful name. > > Ok, let's try port/pg_iovec.h. I pushed it with that name, and a couple more cosmetic changes. I'll keep an eye on the build farm.
Re: Inconsistent "" use
> I'm with Noah: small caps are *not* an improvement, they're just
> distractingly fussy. I note that the authors of the stylesheets
> we use seem to agree, because AFAICS <acronym> is not rendered
> specially in either HTML or PDF output.
>
> Given this docbook.org advice, I'd be inclined to just remove
> our use of <acronym> altogether. Although, since it isn't actually
> making any difference, it's not clear that it's worth doing anything.
> The largest effect of trying to standardize (in either direction)
> would be to create back-patching hazards for docs fixes.

Yeah, a simple grep showed that there are almost 1k lines using <acronym>. I agree that the pain caused by fixing all of them is much larger than the benefit of standardizing the usage of <acronym>.

-- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
Re: pg_preadv() and pg_pwritev()
On Mon, Jan 11, 2021 at 3:34 PM Thomas Munro wrote: > On Wed, Dec 23, 2020 at 12:06 AM Thomas Munro wrote: > > On Mon, Dec 21, 2020 at 11:40 AM Andres Freund wrote: > > > Can we come up with a better name than 'uio'? I find that a not exactly > > > meaningful name. > > > > Ok, let's try port/pg_iovec.h. > > I pushed it with that name, and a couple more cosmetic changes. I'll > keep an eye on the build farm. Since only sifaka has managed to return a result so far (nice CPU), I had plenty of time to notice that macOS Big Sur has introduced preadv/pwritev. They were missing on Catalina[1]. [1] https://cirrus-ci.com/task/6002157537198080
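For readers wondering what the fallback costs on platforms that still lack these calls: a sketch of the general shape (plain POSIX, not the actual src/port implementation) is below; it simply issues one pwrite() per iovec instead of a single vectored system call:

#include <sys/uio.h>        /* struct iovec */
#include <unistd.h>         /* pwrite() */

static ssize_t
pwritev_fallback(int fd, const struct iovec *iov, int iovcnt, off_t offset)
{
    ssize_t     total = 0;

    for (int i = 0; i < iovcnt; i++)
    {
        ssize_t     part = pwrite(fd, iov[i].iov_base, iov[i].iov_len, offset);

        if (part < 0)
            return total > 0 ? total : -1;  /* report progress, or the error */
        total += part;
        offset += part;
        if ((size_t) part < iov[i].iov_len)
            break;                          /* partial write: caller retries */
    }
    return total;
}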
O(n^2) system calls in RemoveOldXlogFiles()
Hi, I noticed that RemoveXlogFile() has this code: /* * Before deleting the file, see if it can be recycled as a future log * segment. Only recycle normal files, pg_standby for example can create * symbolic links pointing to a separate archive directory. */ if (wal_recycle && endlogSegNo <= recycleSegNo && lstat(path, &statbuf) == 0 && S_ISREG(statbuf.st_mode) && InstallXLogFileSegment(&endlogSegNo, path, true, recycleSegNo, true)) { ereport(DEBUG2, (errmsg("recycled write-ahead log file \"%s\"", segname))); CheckpointStats.ckpt_segs_recycled++; /* Needn't recheck that slot on future iterations */ endlogSegNo++; } I didn't check the migration history of this code but it seems that endlogSegNo doesn't currently have the right scoping to achieve the goal of that last comment, so checkpoints finish up repeatedly search for the next free slot, starting at the low end each time, like so: stat("pg_wal/0001004F", {st_mode=S_IFREG|0600, st_size=16777216, ...}) = 0 ... stat("pg_wal/00010073", 0x7fff98b9e060) = -1 ENOENT (No such file or directory) stat("pg_wal/0001004F", {st_mode=S_IFREG|0600, st_size=16777216, ...}) = 0 ... stat("pg_wal/00010074", 0x7fff98b9e060) = -1 ENOENT (No such file or directory) ... and so on until we've recycled all our recyclable segments. Ouch.
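One possible shape of a fix, sketched with the names from the quoted code (illustration only, not a tested patch): make the caller own the recycle cursor and pass it down, so each file's free-slot search resumes where the previous one stopped instead of restarting at the low end:

static void
RemoveXlogFileSketch(const char *segname, XLogSegNo *endlogSegNo,
                     XLogSegNo recycleSegNo)
{
    char        path[MAXPGPATH];
    struct stat statbuf;

    snprintf(path, MAXPGPATH, XLOGDIR "/%s", segname);

    if (wal_recycle &&
        *endlogSegNo <= recycleSegNo &&
        lstat(path, &statbuf) == 0 && S_ISREG(statbuf.st_mode) &&
        InstallXLogFileSegment(endlogSegNo, path, true, recycleSegNo, true))
    {
        /* Recycled: the next call keeps searching from the slot after this one. */
        (*endlogSegNo)++;
    }
    else
    {
        /* ... otherwise unlink the old segment, as in the original code ... */
    }
}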
Re: Removing unneeded self joins
On 1/7/21 7:08 PM, Masahiko Sawada wrote: On Mon, Nov 30, 2020 at 2:51 PM Andrey V. Lepikhov wrote: Thanks, it is my fault. I tried to extend this patch with foreign key references and made a mistake. Currently I rollback this new option (see patch in attachment), but will be working for a while to simplify this patch. Are you working to simplify the patch? This patch has been "Waiting on Author" for 1 month and doesn't seem to pass cfbot tests. Please update the patch. Yes, I'm working to improve this feature. In attachment - fixed and rebased on ce6a71fa53. -- regards, Andrey Lepikhov Postgres Professional >From 3caeb297320af690be71b367329d86c49564e231 Mon Sep 17 00:00:00 2001 From: Andrey Lepikhov Date: Mon, 11 Jan 2021 09:01:11 +0500 Subject: [PATCH] Remove self-joins. Remove inner joins of a relation to itself if can be proven that such join can be replaced with a scan. We can build the required proofs of uniqueness using the existing innerrel_is_unique machinery. We can remove a self-join when for each outer row, if: 1. At most one inner row matches the join clauses. 2. If the join target list contains any inner vars then the inner row is (physically) same row as the outer one. In this patch we use Rowley's [1] approach to identify a self-join: 1. Collect all mergejoinable join quals looks like a.x = b.x 2. Collect all another join quals. 3. Check innerrel_is_unique() for the qual list from (1). If it returns true, then outer row matches only the same row from the inner relation. Proved, that this join is self-join and can be replaced by a scan. Some regression tests changed due to self-join removal logic. [1] https://www.postgresql.org/message-id/raw/CAApHDvpggnFMC4yP-jUO7PKN%3DfXeErW5bOxisvJ0HvkHQEY%3DWw%40mail.gmail.com --- src/backend/optimizer/plan/analyzejoins.c | 1186 + src/backend/optimizer/plan/planmain.c |5 + src/backend/optimizer/util/relnode.c | 26 +- src/backend/utils/misc/guc.c | 10 + src/include/optimizer/pathnode.h |4 + src/include/optimizer/planmain.h |2 + src/test/regress/expected/equivclass.out | 32 + src/test/regress/expected/join.out| 331 ++ src/test/regress/expected/sysviews.out|3 +- src/test/regress/sql/equivclass.sql | 16 + src/test/regress/sql/join.sql | 166 +++ 11 files changed, 1765 insertions(+), 16 deletions(-) diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c index 90460a69bd..d631e95f89 100644 --- a/src/backend/optimizer/plan/analyzejoins.c +++ b/src/backend/optimizer/plan/analyzejoins.c @@ -22,6 +22,7 @@ */ #include "postgres.h" +#include "catalog/pg_class.h" #include "nodes/nodeFuncs.h" #include "optimizer/clauses.h" #include "optimizer/joininfo.h" @@ -29,8 +30,12 @@ #include "optimizer/pathnode.h" #include "optimizer/paths.h" #include "optimizer/planmain.h" +#include "optimizer/restrictinfo.h" #include "optimizer/tlist.h" #include "utils/lsyscache.h" +#include "utils/memutils.h" + +bool enable_self_join_removal; /* local functions */ static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo); @@ -47,6 +52,7 @@ static bool is_innerrel_unique_for(PlannerInfo *root, RelOptInfo *innerrel, JoinType jointype, List *restrictlist); +static void change_rinfo(RestrictInfo* rinfo, Index from, Index to); /* @@ -1118,3 +1124,1183 @@ is_innerrel_unique_for(PlannerInfo *root, /* Let rel_is_distinct_for() do the hard work */ return rel_is_distinct_for(root, innerrel, clause_list); } + +typedef struct +{ + Index oldRelid; + Index newRelid; +} ChangeVarnoContext; + + +static bool +change_varno_walker(Node 
*node, ChangeVarnoContext *context) +{ + if (node == NULL) + return false; + + if (IsA(node, Var)) + { + Var* var = (Var*)node; + if (var->varno == context->oldRelid) + { + var->varno = context->newRelid; + var->varnosyn = context->newRelid; + var->location = -1; + } + else if (var->varno == context->newRelid) + var->location = -1; + + return false; + } + if (IsA(node, RestrictInfo)) + { + change_rinfo((RestrictInfo*)node, context->oldRelid, context->newRelid); + return false; + } + return expression_tree_walker(node, change_varno_walker, context); +} + +/* + * For all Vars in the expression that have varno = oldRelid, set + * varno = newRelid. + */ +static void +change_varno(Expr *expr, Index oldRelid, Index newRelid) +{ + ChangeVarnoContext context; + + context.oldRelid = oldRelid; + context.newRelid = newRelid; + change_varno_walker((Node *) expr, &context); +} + +/* + * Substitute newId for oldId in relids. + */ +static void +change_relid(Relids *relids, Index oldId, Index newId) +{ + if (bms_is_member(oldId, *relids)) + *relids = bms_add_member(bms_del_member(bms_copy(*relids), oldId), newId); +} + +static void +change_rinfo(RestrictInfo* rinfo, Index from, Index to) +{ + bool is_req_equal
Re: Single transaction in the tablesync worker?
On Fri, Jan 8, 2021 at 8:20 AM Amit Kapila wrote: > > On Fri, Jan 8, 2021 at 7:14 AM Peter Smith wrote: > > > > FYI, I was able to reproduce this case in debugger. PSA logs showing > > details. > > > > Thanks for reproducing as I was worried about exactly this case. I > have one question related to logs: > > ## > ## ALTER SUBSCRIPTION to REFRESH the publication > > ## This blocks on some latch until the tablesync worker dies, then it > continues > ## > > Did you check which exact latch or lock blocks this? > I have checked this myself and the command is waiting on the drop of origin till the tablesync worker is finished because replorigin_drop() requires state->acquired_by to be 0 which will only be true once the tablesync worker exits. I think this is the reason you might have noticed that the command can't be finished until the tablesync worker died. So this can't be an interlock between ALTER SUBSCRIPTION .. REFRESH command and tablesync worker and to that end it seems you have below Fixme's in the patch: + * FIXME - Usually this cleanup would be OK, but will not + * always be OK because the logicalrep_worker_stop_at_commit + * only "flags" the worker to be stopped in the near future + * but meanwhile it may still be running. In this case there + * could be a race between the tablesync worker and this code + * to see who will succeed with the tablesync drop (and the + * loser will ERROR). + * + * FIXME - Also, checking the state is also not guaranteed + * correct because state might be TCOPYDONE when we checked + * but has since progressed to SYNDONE + */ + + if (state == SUBREL_STATE_TCOPYDONE) + { I feel this was okay for an earlier code but now we need to stop the tablesync workers before trying to drop the slot as we do in DropSubscription. Now, if we do that then that would fix the race conditions mentioned in Fixme but still, there are few more things I am worried about: (a) What if the launcher again starts the tablesync worker? One idea could be to acquire AccessExclusiveLock on SubscriptionRelationId as we do in DropSubscription which is not a very good idea but I can't think of any other good way. (b) the patch is just checking SUBREL_STATE_TCOPYDONE before dropping the replication slot but the slot could be created even before that (in SUBREL_STATE_DATASYNC state). One idea could be we can try to drop the slot and if we are not able to drop then we can simply continue assuming it didn't exist. One minor comment: 1. + SpinLockAcquire(&MyLogicalRepWorker->relmutex); MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCDONE; MyLogicalRepWorker->relstate_lsn = current_lsn; - Spurious line removal. -- With Regards, Amit Kapila.
Re: logical replication worker accesses catalogs in error context callback
On Sat, Jan 9, 2021 at 2:57 PM Bharath Rupireddy wrote: > > On Thu, Jan 7, 2021 at 5:47 AM Masahiko Sawada wrote: > > > > On Wed, Jan 6, 2021 at 11:42 AM Masahiko Sawada > > wrote: > > > > > > On Wed, Jan 6, 2021 at 11:02 AM Andres Freund wrote: > > > > > > > > Hi, > > > > > > > > Due to a debug ereport I just noticed that worker.c's > > > > slot_store_error_callback is doing something quite dangerous: > > > > > > > > static void > > > > slot_store_error_callback(void *arg) > > > > { > > > > SlotErrCallbackArg *errarg = (SlotErrCallbackArg *) arg; > > > > LogicalRepRelMapEntry *rel; > > > > char *remotetypname; > > > > Oid remotetypoid, > > > > localtypoid; > > > > > > > > /* Nothing to do if remote attribute number is not set */ > > > > if (errarg->remote_attnum < 0) > > > > return; > > > > > > > > rel = errarg->rel; > > > > remotetypoid = rel->remoterel.atttyps[errarg->remote_attnum]; > > > > > > > > /* Fetch remote type name from the LogicalRepTypMap cache */ > > > > remotetypname = logicalrep_typmap_gettypname(remotetypoid); > > > > > > > > /* Fetch local type OID from the local sys cache */ > > > > localtypoid = get_atttype(rel->localreloid, > > > > errarg->local_attnum + 1); > > > > > > > > errcontext("processing remote data for replication target > > > > relation \"%s.%s\" column \"%s\", " > > > >"remote type %s, local type %s", > > > >rel->remoterel.nspname, > > > > rel->remoterel.relname, > > > > > > > > rel->remoterel.attnames[errarg->remote_attnum], > > > >remotetypname, > > > >format_type_be(localtypoid)); > > > > } > > > > > > > > > > > > that's not code that can run in an error context callback. It's > > > > theoretically possible (but unlikely) that > > > > logicalrep_typmap_gettypname() is safe to run in an error context > > > > callback. But get_atttype() definitely isn't. > > > > > > > > get_attype() may do catalog accesses. That definitely can't happen > > > > inside an error context callback - the entire transaction might be > > > > borked at this point! > > > > > > You're right. Perhaps calling to format_type_be() is also dangerous > > > since it does catalog access. We should have added the local type > > > names to SlotErrCallbackArg so we avoid catalog access in the error > > > context. > > > > > > I'll try to fix this. > > > > Attached the patch that fixes this issue. > > > > Since logicalrep_typmap_gettypname() could search the sys cache by > > calling to format_type_be(), I stored both local and remote type names > > to SlotErrCallbackArg so that we can just set the names in the error > > callback without sys cache lookup. > > > > Please review it. > > Patch looks good, except one minor comment - can we store > rel->remoterel.atttyps[remoteattnum] into a local variable and use it > in logicalrep_typmap_gettypname, just to save on indentation? Thank you for reviewing the patch! Agreed. Attached the updated patch. > > I quickly searched in places where error callbacks are being used, I > think we need a similar kind of fix for conversion_error_callback in > postgres_fdw.c, because get_attname and get_rel_name are being used > which do catalogue lookups. ISTM that all the other error callbacks > are good in the sense that they are not doing sys catalogue lookups. Indeed. If we need to disallow the catalog lookup during executing error callbacks we might want to have an assertion checking that in SearchCatCacheInternal(), in addition to Assert(IsTransactionState()). 
Regards, -- Masahiko Sawada EnterpriseDB: https://www.enterprisedb.com/ fix_slot_store_error_callback_v2.patch Description: Binary data
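The core of the approach in the attached patch, as a minimal sketch: resolve anything that needs a syscache or catalog lookup before the error context is active, and let the callback only format strings that were stashed in its argument (struct layout abbreviated):

    typedef struct SlotErrCallbackArg
    {
        LogicalRepRelMapEntry *rel;
        int         remote_attnum;
        char       *local_typename;     /* resolved before the callback can fire */
        char       *remote_typename;    /* ditto */
    } SlotErrCallbackArg;

    static void
    slot_store_error_callback(void *arg)
    {
        SlotErrCallbackArg *errarg = (SlotErrCallbackArg *) arg;
        LogicalRepRelMapEntry *rel = errarg->rel;

        /* Nothing to do if remote attribute number is not set */
        if (errarg->remote_attnum < 0)
            return;

        /* No catalog or syscache access here: everything was precomputed. */
        errcontext("processing remote data for replication target relation \"%s.%s\" column \"%s\", "
                   "remote type %s, local type %s",
                   rel->remoterel.nspname, rel->remoterel.relname,
                   rel->remoterel.attnames[errarg->remote_attnum],
                   errarg->remote_typename, errarg->local_typename);
    }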
Re: [HACKERS] Custom compression methods
On Sun, Jan 10, 2021 at 10:59 PM Justin Pryzby wrote: > > On Mon, Jan 04, 2021 at 04:57:16PM +0530, Dilip Kumar wrote: > > On Mon, Jan 4, 2021 at 6:52 AM Justin Pryzby wrote: > > > And fails pg_upgrade check, apparently losing track of the compression (?) > > > > > > CREATE TABLE public.cmdata2 ( > > > -f1 text COMPRESSION lz4 > > > +f1 text > > > ); > > > > I did not get this? pg_upgrade check is passing for me. > > I realized that this was failing in your v16 patch sent Dec 25. > It's passing on current patches because they do "DROP TABLE cmdata2", but > that's only masking the error. > > I think this patch needs to be specifically concerned with pg_upgrade, so I > suggest to not drop your tables and MVs, to allow the pg_upgrade test to check > them. That exposes this issue: Thanks for the suggestion I will try this. > pg_dump: error: Error message from server: ERROR: cache lookup failed for > access method 36447 > pg_dump: error: The command was: COPY public.cmdata (f1) TO stdout; > pg_dumpall: error: pg_dump failed on database "regression", exiting > waiting for server to shut down done > server stopped > pg_dumpall of post-upgrade database cluster failed > > I found that's the AM's OID in the old clsuter: > regression=# SELECT * FROM pg_am WHERE oid=36447; > oid | amname | amhandler | amtype > ---++-+ > 36447 | pglz2 | pglzhandler | c > > But in the new cluster, the OID has changed. Since that's written into table > data, I think you have to ensure that the compression OIDs are preserved on > upgrade: > > 16755 | pglz2 | pglzhandler | c Yeah, basically we are storing am oid in the compressed data so Oid must be preserved. I will look into this and fix it. > In my brief attempt to inspect it, I got this crash: > > $ tmp_install/usr/local/pgsql/bin/postgres -D > src/bin/pg_upgrade/tmp_check/data & > regression=# SELECT pg_column_compression(f1) FROM cmdata a; > server closed the connection unexpectedly > > Thread 1 "postgres" received signal SIGSEGV, Segmentation fault. > __strlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:120 > 120 ../sysdeps/x86_64/multiarch/../strlen.S: No such file or directory. > (gdb) bt > #0 __strlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:120 > #1 0x55c6049fde62 in cstring_to_text (s=0x0) at varlena.c:193 > #2 pg_column_compression () at varlena.c:5335 > > (gdb) up > #2 pg_column_compression () at varlena.c:5335 > 5335PG_RETURN_TEXT_P(cstring_to_text(get_am_name( > (gdb) l > 5333varvalue = (struct varlena *) DatumGetPointer(value); > 5334 > 5335PG_RETURN_TEXT_P(cstring_to_text(get_am_name( > 5336 > toast_get_compression_oid(varvalue; > > I guess a missing AM here is a "shouldn't happen" case, but I'd prefer it to > be > caught with an elog() (maybe in get_am_name()) or at least an Assert. Yeah, this makes sense. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: [PATCH] Feature improvement for CLOSE, FETCH, MOVE tab completion
On Thu, Jan 7, 2021 at 9:32 PM Fujii Masao wrote: > > > > On 2021/01/07 17:28, shinya11.k...@nttdata.com wrote: > >> On Thu, Jan 7, 2021 at 1:30 PM Fujii Masao > >> wrote: > >>> > >>> On 2021/01/07 12:42, Masahiko Sawada wrote: > On Thu, Jan 7, 2021 at 10:59 AM Fujii Masao > wrote: > > > > On 2021/01/07 10:01, Masahiko Sawada wrote: > >> On Wed, Jan 6, 2021 at 3:37 PM > >> wrote: > >>> > +#define Query_for_list_of_cursors \ > +" SELECT name FROM pg_cursors"\ > > This query should be the following? > > " SELECT pg_catalog.quote_ident(name) "\ > " FROM pg_catalog.pg_cursors "\ > " WHERE substring(pg_catalog.quote_ident(name),1,%d)='%s'" > > +/* CLOSE */ > + else if (Matches("CLOSE")) > + COMPLETE_WITH_QUERY(Query_for_list_of_cursors > + " UNION ALL SELECT 'ALL'"); > > "UNION ALL" should be "UNION"? > >>> > >>> Thank you for your advice, and I corrected them. > >>> > > + COMPLETE_WITH_QUERY(Query_for_list_of_cursors > > + " UNION SELECT 'ABSOLUTE'" > > + " UNION SELECT 'BACKWARD'" > > + " UNION SELECT 'FORWARD'" > > + " UNION SELECT 'RELATIVE'" > > + " UNION SELECT 'ALL'" > > + " UNION SELECT 'NEXT'" > > + " UNION SELECT 'PRIOR'" > > + " UNION SELECT 'FIRST'" > > + " UNION SELECT 'LAST'" > > + " UNION SELECT 'FROM'" > > + " UNION SELECT 'IN'"); > > > > This change makes psql unexpectedly output "FROM" and "IN" just > > after "FETCH". > > I think "FROM" and "IN" can be placed just after "FETCH". According > to > the documentation, the direction can be empty. For instance, we can > do > like: > >>> > >>> Thank you! > >>> > I've attached the patch improving the tab completion for DECLARE as > an > example. What do you think? > > BTW according to the documentation, the options of DECLARE statement > (BINARY, INSENSITIVE, SCROLL, and NO SCROLL) are order-sensitive. > > DECLARE name [ BINARY ] [ INSENSITIVE ] [ [ NO ] SCROLL ] > CURSOR [ { WITH | WITHOUT } HOLD ] FOR query > > But I realized that these options are actually order-insensitive. For > instance, we can declare a cursor like: > > =# declare abc scroll binary cursor for select * from pg_class; > DECLARE CURSOR > > The both parser code and documentation has been unchanged from 2003. > Is it a documentation bug? > >>> > >>> Thank you for your patch, and it is good. > >>> I cannot find the description "(BINARY, INSENSITIVE, SCROLL, and NO > >>> SCROLL) are order-sensitive." > >>> I saw "The key words BINARY, INSENSITIVE, and SCROLL can appear in > >>> any order." , according to the documentation. > >> > >> Thanks, you're right. I missed that sentence. I still don't think the > >> syntax of DECLARE statement in the documentation doesn't match its > >> implementation but I agree that it's order-insensitive. > >> > >>> I made a new patch, but the amount of codes was large due to > >>> order-insensitive. > >> > >> Thank you for updating the patch! > >> > >> Yeah, I'm also afraid a bit that conditions will exponentially > >> increase when a new option is added to DECLARE statement in the > >> future. Looking at the parser code for DECLARE statement, we can put > >> the same options multiple times (that's also why I don't think it > >> matches). For instance, > >> > >> postgres(1:44758)=# begin; > >> BEGIN > >> postgres(1:44758)=# declare test binary binary binary cursor for > >> select * from pg_class; > >> DECLARE CURSOR > >> > >> So how about simplify the above code as follows? 
> >> > >> @@ -3005,8 +3014,23 @@ psql_completion(const char *text, int start, > >> int end) > >> else if (Matches("DECLARE", MatchAny)) > >> COMPLETE_WITH("BINARY", "INSENSITIVE", "SCROLL", "NO SCROLL", > >> "CURSOR"); > >> + /* > >> + * Complete DECLARE with one of BINARY, INSENSITIVE, SCROLL, > >> + * NO SCROLL, and CURSOR. The tail doesn't match any keywords for > >> + * DECLARE, assume we want options. > >> + */ > >> + else if (HeadMatches("DECLARE", MatchAny, "*") && > >> + TailMatches(MatchAnyExcept("CURSOR|WITH|WITHOUT|HOLD|FOR"))) > >> + COMPLETE_WITH("BINARY", "INSENSITIVE", "SCROLL", "NO SCROLL", > >> + "CURSOR"); > > > > This change seems to cause "DECLARE ... CURSOR FOR SELECT " to > > unexpectedly output BINARY, INSENSITIVE, etc. > > Indeed. Is the following not complete but much better? > >>> > >>> Yes, I think that's better. > >>> > > @@
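For readability, the cursor-name query suggested earlier in this thread would look like this as a tab-complete.c macro:

    #define Query_for_list_of_cursors \
    " SELECT pg_catalog.quote_ident(name) "\
    "   FROM pg_catalog.pg_cursors "\
    "  WHERE substring(pg_catalog.quote_ident(name),1,%d)='%s'"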
Re: Added schema level support for publication.
On Sun, Jan 10, 2021 at 11:21 PM vignesh C wrote: > On Sat, Jan 9, 2021 at 8:08 PM Bharath Rupireddy > wrote: > > I think this feature can be useful, in case a user has a lot of tables > > to publish inside a schema. Having said that, I wonder if this feature > > mandates users to create the same schema with same > > permissions/authorizations manually on the subscriber, because logical > > replication doesn't propagate any ddl's so are the schema or schema > > changes? Or is it that the list of tables from the publisher can go > > into a different schema on the subscriber? > > > > DDL's will not be propagated to the subscriber. Users have to create > the schema & tables in the subscriber. No change in > Permissions/authorizations handling, it will be the same as the > existing behavior for relations. Looks like the existing behaviour already requires users to create the schema on the subscriber when publishing the tables from that schema. Otherwise, an error is thrown on the subscriber [1]. [1] on publisher: CREATE SCHEMA myschema; CREATE TABLE myschema.t1(a1 int, b1 int); INSERT INTO myschema.t1_myschema SELECT i, i+10 FROM generate_series(1,10) i; CREATE PUBLICATION testpub FOR TABLE myschema.t1; on subscriber: postgres=# CREATE SUBSCRIPTION testsub CONNECTION 'host=localhost dbname=postgres user=bharath port=5432' PUBLICATION testpub; ERROR: schema "myschema" does not exist CREATE SCHEMA myschema; CREATE TABLE myschema.t1(a1 int, b1 int); postgres=# CREATE SUBSCRIPTION testsub CONNECTION 'host=localhost dbname=postgres user=bharath port=5432' PUBLICATION testpub; NOTICE: created replication slot "testsub" on publisher CREATE SUBSCRIPTION > > Since the schema can have other objects such as data types, functions, > > operators, I'm sure with your feature, non-table objects will be > > skipped. > > > > Yes, only table data will be sent to subscribers, non-table objects > will be skipped. Looks like the existing CREATE PUBLICATION FOR ALL TABLES, which is for all the tables in the database, does this i.e. skips non-table objects and temporary tables, foreign tables and so on. So, your feature also can behave the same way, but within the scope of the given schema/s. > > As Amit pointed out earlier, the behaviour when schema dropped, I > > think we should also consider when schema is altered, say altered to a > > different name, maybe we should change that in the publication too. > > > > I agree that when schema is altered the renamed schema should be > reflected in the publication. I think, it's not only making sure that the publisher side has the new altered schema, but also the subscriber needs those alters. Having said that, since these alters come under DDL changes and in logical replication we don't publish the scheme changes to the subscriber, we may not need to anything extra for informing the schema alters to the subscriber from the publisher, the users might have to do the same schema alter on the subscriber and then a ALTER SUBSCRIPTION testsub REFRESH PUBLICATION; should work for them? If this understanding is correct, then we should document this. > > In general, what happens if we have some temporary tables or foreign > > tables inside the schema, will they be allowed to send the data to > > subscribers? > > > > Temporary tables & foreign tables will not be added to the publications. Yes the existing logical replication framework doesn't allow replication of temporary, unlogged, foreign tables and other non-table relations such as materialized views, indexes etc [1]. 
The CREATE PUBLICATION statement either fails in check_publication_add_relation or before that. CREATE PUBLICATION testpub FOR TABLE tab1, throwing the error if the single table tab1 is any of the above restricted tables, seems fine. But if there's a list of tables, as in CREATE PUBLICATION testpub FOR TABLE normal_tab1, temp_tab2, normal_tab3, foreign_tab4, unlogged_tab5, normal_tab6, normal_tab7 ..; the query fails on the first encounter of a restricted table, say at temp_tab2. Whereas CREATE PUBLICATION testpub FOR ALL TABLES; would skip the restricted tables and continue to add the accepted tables into the publication within the database. IMHO, if there's a list of tables specified with FOR TABLE, then instead of throwing an error for any restricted table, we can issue a warning and continue with the addition of the accepted tables into the publication. If done, this behaviour will be in sync with FOR ALL TABLES. Thoughts? If okay, I can work on a patch. Related to error messages: when a foreign table is specified in a CREATE PUBLICATION statement, "ERROR: "f1" is not a table" is thrown [1]; how about the error message "ERROR: foreign table "f1" cannot be replicated" instead? In general, it would be good if we could have error messages something like in [2] instead of the existing [1]. Thoughts? If okay, I can work on a patch. [1] t1 is a temporary table: postgres=# CREATE PUBLICATION testpub FOR TABLE
Re: Spurious "apparent wraparound" via SimpleLruTruncate() rounding
> 10 янв. 2021 г., в 14:43, Noah Misch написал(а): > >> Can you please send revised patches with fixes? > > Attached. > > I'm marking the patch as ready for committer. I can't tell whether we should backpatch the insurance patch or not: it potentially fixes unknown bugs, and potentially contains unknown bugs, so it is hard to reason about under such uncertainty. I've looked for any potential problem and so far I see none; chances are it makes the code less error-prone. The fix itself is certainly worth backpatching. Thanks! Best regards, Andrey Borodin.
Re: [HACKERS] Custom compression methods
On Mon, Jan 11, 2021 at 11:00 AM Dilip Kumar wrote: > > On Sun, Jan 10, 2021 at 10:59 PM Justin Pryzby wrote: > > > > On Mon, Jan 04, 2021 at 04:57:16PM +0530, Dilip Kumar wrote: > > > On Mon, Jan 4, 2021 at 6:52 AM Justin Pryzby wrote: > > > > And fails pg_upgrade check, apparently losing track of the compression > > > > (?) > > > > > > > > CREATE TABLE public.cmdata2 ( > > > > -f1 text COMPRESSION lz4 > > > > +f1 text > > > > ); > > > > > > I did not get this? pg_upgrade check is passing for me. > > > > I realized that this was failing in your v16 patch sent Dec 25. > > It's passing on current patches because they do "DROP TABLE cmdata2", but > > that's only masking the error. I tested specifically pg_upgrade by removing all the DROP table and MV and it is passing. I don't see the reason why should it fail. I mean after the upgrade why COMPRESSION lz4 is missing? > > I think this patch needs to be specifically concerned with pg_upgrade, so I > > suggest to not drop your tables and MVs, to allow the pg_upgrade test to > > check > > them. That exposes this issue: > > Thanks for the suggestion I will try this. > > > pg_dump: error: Error message from server: ERROR: cache lookup failed for > > access method 36447 > > pg_dump: error: The command was: COPY public.cmdata (f1) TO stdout; > > pg_dumpall: error: pg_dump failed on database "regression", exiting > > waiting for server to shut down done > > server stopped > > pg_dumpall of post-upgrade database cluster failed > > > > I found that's the AM's OID in the old clsuter: > > regression=# SELECT * FROM pg_am WHERE oid=36447; > > oid | amname | amhandler | amtype > > ---++-+ > > 36447 | pglz2 | pglzhandler | c > > > > But in the new cluster, the OID has changed. Since that's written into > > table > > data, I think you have to ensure that the compression OIDs are preserved on > > upgrade: > > > > 16755 | pglz2 | pglzhandler | c > > Yeah, basically we are storing am oid in the compressed data so Oid > must be preserved. I will look into this and fix it. On further analysis, if we are dumping and restoring then we will compress the data back while inserting it so why would we need to old OID. I mean in the new cluster we are inserting data again so it will be compressed again and now it will store the new OID. Am I missing something here? > > In my brief attempt to inspect it, I got this crash: > > > > $ tmp_install/usr/local/pgsql/bin/postgres -D > > src/bin/pg_upgrade/tmp_check/data & > > regression=# SELECT pg_column_compression(f1) FROM cmdata a; > > server closed the connection unexpectedly I tried to test this after the upgrade but I can get the proper value. 
Laptop309pnin:bin dilipkumar$ ./pg_ctl -D /Users/dilipkumar/Documents/PG/custom_compression/src/bin/pg_upgrade/tmp_check/data.old/ start waiting for server to start2021-01-11 11:53:28.153 IST [43412] LOG: starting PostgreSQL 14devel on x86_64-apple-darwin19.6.0, compiled by Apple clang version 11.0.3 (clang-1103.0.32.62), 64-bit 2021-01-11 11:53:28.170 IST [43412] LOG: database system is ready to accept connections done server started Laptop309pnin:bin dilipkumar$ ./psql -d regression regression[43421]=# SELECT pg_column_compression(f1) FROM cmdata a; pg_column_compression --- lz4 lz4 pglz2 (3 rows) Manual test: (dump and load on the new cluster) --- postgres[43903]=# CREATE ACCESS METHOD pglz2 TYPE COMPRESSION HANDLER pglzhandler; CREATE ACCESS METHOD postgres[43903]=# select oid from pg_am where amname='pglz2'; oid --- 16384 (1 row) postgres[43903]=# CREATE TABLE cmdata_test(f1 text COMPRESSION pglz2); CREATE TABLE postgres[43903]=# INSERT INTO cmdata_test VALUES(repeat('1234567890',1000)); INSERT 0 1 postgres[43903]=# SELECT pg_column_compression(f1) FROM cmdata_test; pg_column_compression --- pglz2 (1 row) Laptop309pnin:bin dilipkumar$ ./pg_dump -d postgres > 1.sql —restore on new cluster— postgres[44030]=# select oid from pg_am where amname='pglz2'; oid --- 16385 (1 row) postgres[44030]=# SELECT pg_column_compression(f1) FROM cmdata_test; pg_column_compression --- pglz2 (1 row) You can see on the new cluster the OID of the pglz2 is changed but there is no issue. Is it possible for you to give me a self-contained test case to reproduce the issue or a theory that why it should fail? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Custom compression methods
On Mon, Jan 11, 2021 at 12:11:54PM +0530, Dilip Kumar wrote: > On Mon, Jan 11, 2021 at 11:00 AM Dilip Kumar wrote: > > On Sun, Jan 10, 2021 at 10:59 PM Justin Pryzby wrote: > > > > > > On Mon, Jan 04, 2021 at 04:57:16PM +0530, Dilip Kumar wrote: > > > > On Mon, Jan 4, 2021 at 6:52 AM Justin Pryzby > > > > wrote: > > > > > And fails pg_upgrade check, apparently losing track of the > > > > > compression (?) > > > > > > > > > > CREATE TABLE public.cmdata2 ( > > > > > -f1 text COMPRESSION lz4 > > > > > +f1 text > > > > > ); > > > > > > > > I did not get this? pg_upgrade check is passing for me. > > > > > > I realized that this was failing in your v16 patch sent Dec 25. > > > It's passing on current patches because they do "DROP TABLE cmdata2", but > > > that's only masking the error. > > I tested specifically pg_upgrade by removing all the DROP table and MV > and it is passing. I don't see the reason why should it fail. I mean > after the upgrade why COMPRESSION lz4 is missing? How did you test it ? I'm not completely clear how this is intended to work... has it been tested before ? According to the comments, in binary upgrade mode, there's an ALTER which is supposed to SET COMPRESSION, but that's evidently not happening. > > > I found that's the AM's OID in the old clsuter: > > > regression=# SELECT * FROM pg_am WHERE oid=36447; > > > oid | amname | amhandler | amtype > > > ---++-+ > > > 36447 | pglz2 | pglzhandler | c > > > > > > But in the new cluster, the OID has changed. Since that's written into > > > table > > > data, I think you have to ensure that the compression OIDs are preserved > > > on > > > upgrade: > > > > > > 16755 | pglz2 | pglzhandler | c > > > > Yeah, basically we are storing am oid in the compressed data so Oid > > must be preserved. I will look into this and fix it. > > On further analysis, if we are dumping and restoring then we will > compress the data back while inserting it so why would we need to old > OID. I mean in the new cluster we are inserting data again so it will > be compressed again and now it will store the new OID. Am I missing > something here? I'm referring to pg_upgrade which uses pg_dump, but does *not* re-insert data, but rather recreates catalogs only and then links to the old tables (either with copy, link, or clone). Test with make -C src/bin/pg_upgrade (which is included in make check-world). -- Justin
Re: Moving other hex functions to /common
On Wed, Jan 06, 2021 at 08:58:23AM -0500, Bruce Momjian wrote: > Fine. Do you want to add the overflow to the patch I posted, for all > encoding types? Yeah. It looks like it would be good to be consistent for the escape case as well, so that it is possible to add a dstlen argument to struct pg_encoding for the encoding and decoding routines. I would also prefer the option to remove the argument "data" from the encode and decode length routines for the hex routines that are part of src/common/, even if it forces the creation of two small wrappers in encode.c to call the routines of src/common/. Would you prefer that I send a patch myself? Please note that anything I'd send would use elog() and pg_log() directly instead of returning status codes for the src/common/ routines, and of course would not touch ECPG, as that's the approach you are favoring. -- Michael signature.asc Description: PGP signature
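As a sketch of that direction (field names and integer types are illustrative, not necessarily what a final patch would use), the callback table could carry the destination length like this:

    struct pg_encoding
    {
        uint64      (*encode_len) (const char *data, size_t dlen);
        uint64      (*decode_len) (const char *data, size_t dlen);
        uint64      (*encode) (const char *data, size_t dlen,
                               char *res, size_t reslen);
        uint64      (*decode) (const char *data, size_t dlen,
                               char *res, size_t reslen);
    };

with each encode/decode implementation erroring out (via elog(), per the note above) rather than writing past res + reslen.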
Re: psql \df choose functions by their arguments
Hi I tried this patch out last year but was overrolled by Other Stuff before I got around to providing any feedback, and was reminded of it just now when I was trying to execute "\df somefunction text int" or similar, which had me confused until I remembered it's not a feature yet, so it would certainly be very welcome to have this. 2020年11月3日(火) 23:27 Greg Sabino Mullane : > > Thanks for looking this over! > >> >> some Abbreviations of common types are not added to the type_abbreviations[] >> Such as: >> >> Int8=> bigint > > > I wasn't aiming to provide a canonical list, as I personally have never seen > anyone use int8 instead of bigint when (for example) creating a function, but > I'm not strongly opposed to expanding the list. I have vague memories of working with "int8" a bit (possibly related to an Informix migration), anyway it seems easy enough to add them for completeness as someone (possibly migrating from another database) might wonder why it's not working. Just a small code readability suggestion - in exec_command_d(), it seems neater to put the funcargs declaration in a block together with the code with which uses it (see attached diff). Regards Ian Barwick -- EnterpriseDB: https://www.enterprisedb.com diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c index d449cea66c..e8c1fd64bc 100644 --- a/src/bin/psql/command.c +++ b/src/bin/psql/command.c @@ -781,8 +781,6 @@ exec_command_d(PsqlScanState scan_state, bool active_branch, const char *cmd) case 'f': /* function subsystem */ switch (cmd[2]) { - char *funcargs; - case '\0': case '+': case 'S': @@ -791,10 +789,14 @@ exec_command_d(PsqlScanState scan_state, bool active_branch, const char *cmd) case 'p': case 't': case 'w': + { + char *funcargs; + funcargs = psql_scan_slash_option(scan_state, OT_WHOLE_LINE, NULL, true); success = describeFunctions(&cmd[2], pattern, show_verbose, show_system, funcargs); free(funcargs); break; + } default: status = PSQL_CMD_UNKNOWN; break;
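Regarding the int8 example above, the additions being discussed would just be more entries in the patch's lookup table. A sketch, assuming a flat alias/canonical-name pair layout (the actual array in the patch may be organized differently):

    static const char *const type_abbreviations[] = {
        "bool", "boolean",
        "int2", "smallint",
        "int4", "integer",
        "int8", "bigint",
        "float4", "real",
        "float8", "double precision",
        /* ... */
        NULL
    };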
Re: logical replication worker accesses catalogs in error context callback
On Mon, Jan 11, 2021 at 10:56 AM Masahiko Sawada wrote: > Agreed. Attached the updated patch. Thanks for the updated patch. Looks like the comment crosses the 80 char limit, I have corrected it. And also changed the variable name from remotetypeid to remotetypid, so that logicalrep_typmap_gettypname will not cross the 80 char limit. And also added a commit message. Attaching v3 patch, please have a look. Both make check and make check-world passes. > > I quickly searched in places where error callbacks are being used, I > > think we need a similar kind of fix for conversion_error_callback in > > postgres_fdw.c, because get_attname and get_rel_name are being used > > which do catalogue lookups. ISTM that all the other error callbacks > > are good in the sense that they are not doing sys catalogue lookups. > > Indeed. If we need to disallow the catalog lookup during executing > error callbacks we might want to have an assertion checking that in > SearchCatCacheInternal(), in addition to Assert(IsTransactionState()). I tried to add(as attached in v1-0001-Avoid-Sys-Cache-Lookups-in-Error-Callbacks.patch) the Assert(!error_context_stack); in SearchCatCacheInternal, initdb itself fails [1]. That means, we are doing a bunch of catalogue access from error context callbacks. Given this, I'm not quite sure whether we can have such an assertion in SearchCatCacheInternal. Although unrelated to what we are discussing here - when I looked at SearchCatCacheInternal, I found that the function SearchCatCache has SearchCatCacheInternal in the function comment, I think we should correct it. Thoughts? If okay, I will post a separate patch for this minor comment fix. [1] running bootstrap script ... TRAP: FailedAssertion("error_context_stack", File: "catcache.c", Line: 1220, PID: 310728) /home/bharath/workspace/postgres/instnew/bin/postgres(ExceptionalCondition+0xd0)[0x56056984c8ba] /home/bharath/workspace/postgres/instnew/bin/postgres(+0x76eb1a)[0x560569826b1a] /home/bharath/workspace/postgres/instnew/bin/postgres(SearchCatCache1+0x3a)[0x5605698269af] /home/bharath/workspace/postgres/instnew/bin/postgres(SearchSysCache1+0xc1)[0x5605698448b2] /home/bharath/workspace/postgres/instnew/bin/postgres(get_typtype+0x1f)[0x56056982e389] /home/bharath/workspace/postgres/instnew/bin/postgres(CheckAttributeType+0x29)[0x5605692bafe9] /home/bharath/workspace/postgres/instnew/bin/postgres(CheckAttributeNamesTypes+0x2c9)[0x5605692bafac] /home/bharath/workspace/postgres/instnew/bin/postgres(heap_create_with_catalog+0x11f)[0x5605692bc436] /home/bharath/workspace/postgres/instnew/bin/postgres(boot_yyparse+0x7f0)[0x5605692a0d3f] /home/bharath/workspace/postgres/instnew/bin/postgres(+0x1ecb36)[0x5605692a4b36] /home/bharath/workspace/postgres/instnew/bin/postgres(AuxiliaryProcessMain+0x5e0)[0x5605692a4997] /home/bharath/workspace/postgres/instnew/bin/postgres(main+0x268)[0x5605694c6bce] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7fa2777ce0b3] /home/bharath/workspace/postgres/instnew/bin/postgres(_start+0x2e)[0x5605691794de] With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com From a806415a49e3fc8c1e71a102aefb0c10c08deb05 Mon Sep 17 00:00:00 2001 From: Bharath Rupireddy Date: Mon, 11 Jan 2021 12:26:12 +0530 Subject: [PATCH v3] fix slot store error callback Avoid accessing system catalogues inside slot_store_error_callback error context callback, because the the entire transaction might have been broken at this point. 
Since logicalrep_typmap_gettypname() could search the sys cache by calling to format_type_be(), store both local and remote type names to SlotErrCallbackArg so that we can just set the names in the error callback without sys cache lookup. Author: Masahiko Sawada --- src/backend/replication/logical/worker.c | 50 +--- 1 file changed, 28 insertions(+), 22 deletions(-) diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c index 1b1d70ed68..95f28c9acf 100644 --- a/src/backend/replication/logical/worker.c +++ b/src/backend/replication/logical/worker.c @@ -132,8 +132,9 @@ static dlist_head lsn_mapping = DLIST_STATIC_INIT(lsn_mapping); typedef struct SlotErrCallbackArg { LogicalRepRelMapEntry *rel; - int local_attnum; int remote_attnum; + char *local_typename; + char *remote_typename; } SlotErrCallbackArg; /* @@ -430,30 +431,19 @@ static void slot_store_error_callback(void *arg) { SlotErrCallbackArg *errarg = (SlotErrCallbackArg *) arg; - LogicalRepRelMapEntry *rel; - char *remotetypname; - Oid remotetypoid, -localtypoid; + LogicalRepRelMapEntry *rel = errarg->rel; /* Nothing to do if remote attribute number is not set */ if (errarg->remote_attnum < 0) return; - rel = errarg->rel; - remotetypoid = rel->remoterel.atttyps[errarg->remote_attnum]; - - /* Fetch remote type name from the LogicalRepTypMap cache */ - remotetypname = logicalrep_typmap_gettypname(remotetypoid); - - /* Fetch local type
Re: Reduce the time required for a database recovery from archive.
Hi Stephen,

Based on our last discussion I redesigned the implementation of the WAL archive recovery speed-up. The main idea of the new implementation was partly borrowed from your proposal, to be more precise from the following one:

On 9 Nov 2020, at 23:31, Stephen Frost wrote: > The relatively simple approach I was thinking was that a couple of workers would be started and they'd have some prefetch amount that needs to be kept out ahead of the applying process, which they could potentially calculate themselves without needing to be pushed forward by the applying process.

In the new implementation, several workers are spawned on server startup for delivering WAL segments from the archive. The number of workers to spawn is specified by the GUC parameter wal_prefetch_workers; the maximum number of files to preload from the archive is determined by the GUC parameter wal_max_prefetch_amount. The applier of WAL records still handles WAL files one by one, but since several prefetching processes are loading files from the archive, there is a high probability that when the next WAL file is requested by the applier of WAL records, it has already been delivered from the archive.

Every time any of the running workers is going to preload the next WAL file, it checks whether the limit imposed by the parameter wal_max_prefetch_amount has been reached. If it has, the process suspends preloading until the WAL applier process handles some of the already preloaded WAL files and the total number of files that are loaded but not yet processed drops below this limit.

At the moment I didn't implement a mechanism for dynamic calculation of the number of workers required for loading the WAL files in time. We can consider the current (simplified) implementation as a base for further discussion and turn to this matter in the next iteration if needed.

Also I would like to ask your opinion about an issue I'm thinking about: parallel workers spawned for preloading WAL files from the archive use the original mechanism for delivering files from the archive - they run a command specified by the GUC parameter restore_command. One of the possible parameters accepted by restore_command is %r, which specifies the filename of the last restart point. If several workers preload WAL files simultaneously with another process applying the preloaded WAL files, I'm not sure what the correct way is to determine the last restart point value that the WAL-preloading processes should use, because this value can be updated at any time by the process that applies WAL.

Another issue that I would like to ask your opinion about regards choosing the correct value for the maximum size of the hash table stored in shared memory. Currently, wal_max_prefetch_amount is passed as the value for the maximum hash table size, which I'm not sure is the best decision.

Thanks in advance for feedback.

Regards, Dmitry

0001-Reduce-time-required-to-recover-database-from-archiv.patch Description: Binary data
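To illustrate the throttling rule described in the message above, here is a very rough sketch of the check a prefetch worker might perform before preloading the next segment. All names apart from wal_max_prefetch_amount (PrefetchShared, prefetched_unprocessed, the wait event) are illustrative stand-ins, not taken from the attached patch:

    for (;;)
    {
        bool        below_limit;

        SpinLockAcquire(&PrefetchShared->mutex);
        below_limit = (PrefetchShared->prefetched_unprocessed <
                       wal_max_prefetch_amount);
        SpinLockRelease(&PrefetchShared->mutex);

        if (below_limit)
            break;              /* OK to preload the next WAL segment */

        /* Wait until the startup process consumes some preloaded files. */
        (void) WaitLatch(MyLatch, WL_LATCH_SET | WL_EXIT_ON_PM_DEATH,
                         -1L, WAIT_EVENT_RECOVERY_RETRIEVE_RETRY_INTERVAL);
        ResetLatch(MyLatch);
    }

The worker would then run restore_command for the next segment and bump prefetched_unprocessed, while the startup process decrements it as it consumes each preloaded file and sets the workers' latches.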