Re: [Bug] Logical Replication failing if the DateStyle is different in Publisher & Subscriber

2021-10-22 Thread Sadhuprasad Patro
On Fri, Oct 22, 2021 at 8:07 AM Japin Li  wrote:
>
>
> On Fri, 22 Oct 2021 at 08:26, Masahiko Sawada  wrote:
> > On Thu, Oct 21, 2021 at 11:18 PM Japin Li  wrote:
> >>
> >> How it breaks?
> >
> > I don't know the real case but for example, if an application gets
> > changes via pg_recvlogical with a decoding plugin (say wal2json) from
> > the database whose DateStyle setting is "SQL, MDY", it expects that
> > the date values in the streamed data are in the style of "ISO, MDY".
> > But with this change, it will get date values in the style of "ISO"
> > which could lead to a parse error in the application.
> >
> >>  The user also can specify the "options" to get date data
> >> in the style which they are wanted. Right?
> >
> > Right. But doesn't it mean breaking the compatibility?
> >
>
> Yeah, it might break the compatibility.
>
> In conclusion, there are two ways to fix this bug.
>
> 1. Set the parameters on the publisher; this might break the compatibility.

Is it not possible to set the parameter on the publisher as "ISO, MDY" or
"ISO, YMD", instead of only "ISO"?
DateStyle includes both the output format and the field order, so we may set
the parameter with both components...
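To make the two components of DateStyle concrete, here is a small illustrative
sketch (plain Python, not PostgreSQL code) of how the format and field-order
parts interact; a consumer expecting one style will fail to parse another:

```python
from datetime import date

def format_datestyle(d, style):
    """Render a date roughly the way PostgreSQL's DateStyle would (simplified
    sketch).  DateStyle has two components: the output format (ISO, SQL, ...)
    and the field order (MDY, DMY, YMD).  ISO output ignores the order
    component; it only matters for input parsing and for SQL-style output."""
    fmt, order = style.split(", ")
    if fmt == "ISO":
        return d.isoformat()               # e.g. 2021-10-22, order ignored
    if fmt == "SQL":
        if order == "MDY":
            return d.strftime("%m/%d/%Y")  # e.g. 10/22/2021
        if order == "DMY":
            return d.strftime("%d/%m/%Y")  # e.g. 22/10/2021
    raise ValueError(style)

d = date(2021, 10, 22)
assert format_datestyle(d, "ISO, MDY") == "2021-10-22"
assert format_datestyle(d, "SQL, MDY") == "10/22/2021"
assert format_datestyle(d, "SQL, DMY") == "22/10/2021"
```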

> 2. Set the parameters on the subscriber. In my first patch, I tried to set the
>    parameters after establishing the connection, but that leads to more network
>    round trips. We can instead set the parameters when connecting to the walsender
>    using "options".
>
> For the second way, should we set the parameters the same as the subscriber's
> or use fixed parameters (e.g. datestyle = "ISO") like postgres_fdw's
> set_transmission_modes()?
>
> Any thoughts?

IMO, setting the parameter values the same as the subscriber's is better. It
is always possible to set any datestyle in the plugins
themselves...


Thanks & Regards
SadhuPrasad
EnterpriseDB: http://www.enterprisedb.com




Re: Added schema level support for publication.

2021-10-22 Thread Masahiko Sawada
On Fri, Oct 22, 2021 at 2:25 PM Amit Kapila  wrote:
>
> On Thu, Oct 21, 2021 at 6:47 PM vignesh C  wrote:
> >
> >
> > Thanks for the comments, the attached v45 patch has the fix for the same.
> >
>
> The first patch is mostly looking good to me apart from the below
> minor comments:

Let me share other minor comments on v45-0001 patch:

>
> 1.
> +  
> +   The catalog pg_publication_namespace contains the
> +   mapping between schemas and publications in the database.  This is a
> +   many-to-many mapping.
>
> There are extra spaces after mapping at the end which are not required.

+   ADD and DROP  clauses will add and
+   remove one or more tables/schemas from the publication.  Note that adding
+   tables/schemas to a publication that is already subscribed to will require a

There is also an extra space after "adding".

-[ FOR TABLE [ ONLY ] table_name [ * ] [, ...]
-  | FOR ALL TABLES ]
+[ FOR ALL TABLES
+  | FOR publication_object [, ... ] ]

Similarly, after "TABLES".

+
+ 
+  Specifying a table that is part of a schema specified by
+  FOR ALL TABLES IN SCHEMA is not supported.
+ 

And, after "by".

---

+static void
+AlterPublicationSchemas(AlterPublicationStmt *stmt,
+HeapTuple tup, List
*schemaidlist)
+{
(snip)
+PublicationAddSchemas(pubform->oid, schemaidlist, true, stmt);
+}
+
+return;
+}

The "return" at the end of the function is not necessary.

---
+if (pubobj->name)
+    pubobj->pubobjtype = PUBLICATIONOBJ_REL_IN_SCHEMA;
+else if (!pubobj->name && !pubobj->pubtable)
+    pubobj->pubobjtype = PUBLICATIONOBJ_CURRSCHEMA;
+else if (!pubobj->name)
+    ereport(ERROR,
+            errcode(ERRCODE_SYNTAX_ERROR),
+            errmsg("invalid schema name at or near"),
+            parser_errposition(pubobj->location));

I think it's better to change the last "else if" to just "else".

---
+
+if (schemarelids)
+{
+    /*
+     * If the publication publishes partition changes via their
+     * respective root partitioned tables, we must exclude partitions
+     * in favor of including the root partitioned tables.  Otherwise,
+     * the function could return both the child and parent tables
+     * which could cause data of the child table to be double-published
+     * on the subscriber side.
+     *
+     * XXX As of now, we do this when a publication has associated
+     * schema or for all tables publication.  See
+     * GetAllTablesPublicationRelations().
+     */
+    tables = list_concat_unique_oid(relids, schemarelids);
+    if (publication->pubviaroot)
+        tables = filter_partitions(tables, schemarelids);
+}
+else
+    tables = relids;
+
+}

There is an extra newline after "tables = relids;".

The rest looks good to me.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: [PATCH] Fix memory corruption in pg_shdepend.c

2021-10-22 Thread Aleksander Alekseev
Hi Tom,

> BTW, I think there is an additional bug in copyTemplateDependencies:
> I do not see it initializing slot->tts_isnull[] anywhere.  It
> probably accidentally works (at least in devel builds) because we zero
> that memory somewhere else, but surely this code shouldn't assume that?

tts_isnull[] is zeroed in:
- copyTemplateDependencies
-- MakeSingleTupleTableSlot, which simply wraps:
--- MakeTupleTableSlot

... where the slot is allocated with palloc0. The assumption that
MakeSingleTupleTableSlot() returns valid TupleTableSlot* with zeroed
tts_isnull[] seems reasonable, no?

What confuses me is the fact that we have two procedures that do the
same thing. Maybe one is redundant.

-- 
Best regards,
Aleksander Alekseev




Add connection active, idle time to pg_stat_activity

2021-10-22 Thread Rafia Sabih
Hello there hackers,

We at Zalando have faced some issues around long-running idle
transactions and were thinking about increasing the visibility of the
pg_stat_* views to capture them easily. What I found is that
pg_stat_activity currently has a lot of good information about the
current state of the process, but it lacks cumulative information on
how much time the connection spent being idle, idle in transaction, or
active; we would like to see cumulative values for each of these per
connection. I believe it would be helpful for us and more people out
there if we could have total connection active and idle time displayed
in pg_stat_activity.

To provide this information I was digging into how the statistics
collector works and found that information like the total time a
connection is active as well as idle is already computed in
pgstat_report_activity[1]. Ideally, these are the values we would
like to see per process in pg_stat_activity.
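For illustration, the accumulation being described could be sketched like this
(a toy model, not the actual backend_status.c code; all names are made up):

```python
import time

class BackendStatus:
    """Sketch of accumulating per-state times for a connection, in the spirit
    of what pgstat_report_activity() computes for active/idle intervals."""
    def __init__(self):
        self.state = "idle"
        self.state_start = time.monotonic()
        self.totals = {"active": 0.0, "idle": 0.0, "idle in transaction": 0.0}

    def report_activity(self, new_state):
        now = time.monotonic()
        # charge the elapsed interval to the state we are leaving
        self.totals[self.state] += now - self.state_start
        self.state = new_state
        self.state_start = now

s = BackendStatus()
s.report_activity("active")   # idle -> active
s.report_activity("idle")     # active -> idle; totals now cumulative per state
assert set(s.totals) == {"active", "idle", "idle in transaction"}
assert all(t >= 0 for t in s.totals.values())
```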

Curious to know your thoughts on this.

[1]https://github.com/postgres/postgres/blob/cd3f429d9565b2e5caf0980ea7c707e37bc3b317/src/backend/utils/activity/backend_status.c#L593



-- 
Regards,
Rafia Sabih




Re: [PATCH] Fix memory corruption in pg_shdepend.c

2021-10-22 Thread Michael Paquier
On Fri, Oct 22, 2021 at 10:48:57AM +0300, Aleksander Alekseev wrote:
> ... where the slot is allocated with palloc0. The assumption that
> MakeSingleTupleTableSlot() returns valid TupleTableSlot* with zeroed
> tts_isnull[] seems reasonable, no?

Yes, I don't see any need to do something more here.  The number of
arguments is fetched from the tuple descriptor itself, so the
allocation is sufficient.

> What confuses me is the fact that we have two procedures that do the
> same thing. Maybe one is redundant.

Do you have something in mind here?

Taking advantage of the catalog types and knowing that this is a
one-off, it is possible to recover dbid, classid, objid, objsubid and
refclassid.  deptype can mostly be guessed from refclassid, but the
real problem is that refobjid is simply lost because of the cast to a
char from an Oid.

[ ... Thinks more ... ]

Hmm.  Wouldn't it be as simple as removing the entries in pg_shdepend
where dbid is NULL, and doing an INSERT/SELECT with the existing entries
in pg_shdepend from the template database, updating dbid to the new
database?  That would require users to know which template they used
as origin, and we would have to assume that no shared deps have changed,
though that can be checked by looking at all the misplaced fields.  It is
true enough that users could do a lot of damage with surgical DML on the
catalogs, though.
--
Michael


signature.asc
Description: PGP signature


Re: wait event and archive_command

2021-10-22 Thread Michael Paquier
On Thu, Oct 21, 2021 at 10:57:50PM +0900, Fujii Masao wrote:
> Also how about adding wait events for other commands like
> archive_cleanup_command, restore_command and recovery_end_command?

+1 to add something for all of them as we track the startup process in
pg_stat_activity.  Thinking with a larger picture, this comes down to
the usage of system().  We could introduce a small wrapper of system()
that takes as argument a wait event for the backend.
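A minimal sketch of such a wrapper, in Python rather than C, with made-up
wait-event names; the real implementation would report through the
backend-status machinery before and after calling system():

```python
import subprocess

# hypothetical wait-event names, one per external command
WAIT_EVENT_ARCHIVE_COMMAND = "ArchiveCommand"
WAIT_EVENT_RESTORE_COMMAND = "RestoreCommand"

current_wait_event = None  # stand-in for the backend's advertised wait event

def run_shell_command(cmd, wait_event):
    """Sketch of the proposed system() wrapper: advertise a wait event for
    the duration of the external command, then clear it afterwards."""
    global current_wait_event
    current_wait_event = wait_event
    try:
        return subprocess.call(cmd, shell=True)
    finally:
        current_wait_event = None

rc = run_shell_command("true", WAIT_EVENT_ARCHIVE_COMMAND)
assert rc == 0
assert current_wait_event is None  # event cleared once the command returns
```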
--
Michael


signature.asc
Description: PGP signature


Re: Adding CI to our tree

2021-10-22 Thread Vladimir Sitnikov
>Just add those files to the global gitignore on your machine

While global gitignore is a nice feature, it won't protect users who do not
know they need to create a global ignore file.
Adding explicit excludes for well-known temporary files into PostgreSQL
sources makes it easier to work with the sources for everybody.
Less manual configuration is better for having a productive environment.
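For concreteness, the kind of entries being proposed might look like this (a
sketch; the icon whitelisting is optional and only relevant if the project
ever ships one):

```
# JetBrains IDE state that should not be committed
.idea/*
# ...but a project icon can be whitelisted:
!.idea/icon.png
!.idea/icon_dark.png
```

Note the `.idea/*` form (rather than `.idea/`) so that the negation patterns
below it can re-include individual files.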

On top of that, there is even a use-case for having .idea folder in Git:
.idea/icon.png and .idea/icon_dark.png files are displayed in JetBrains
IDEs so the list of projects becomes easier to distinguish.
AFAIK, a standard icon configuration does not yet exist:
https://github.com/editorconfig/editorconfig/issues/425

Here are samples:
https://github.com/junit-team/junit5/blob/4ddc786728bc3fbc68d6a35d2eeeb63eb3e85609/.idea/icon.png
,
https://github.com/gradle/gradle/tree/1be71a9cd8882b08a9f8728d44eac8f65a33fbda/.idea

Vladimir


Re: logical decoding/replication: new functions pg_ls_logicaldir and pg_ls_replslotdir

2021-10-22 Thread Amit Kapila
On Fri, Oct 8, 2021 at 4:39 PM Bharath Rupireddy
 wrote:
>
> Hi,
>
> At times, users want to know what are the files (snapshot and mapping
> files) that are available under pg_logical directory and also the
> spill files that are under pg_replslot directory and how much space
> they occupy.
>

Why can't you use pg_ls_dir to see the contents of pg_replslot? To
know the space taken by spilling, you might want to check
pg_stat_replication_slots[1] as that gives information about
spill_bytes.

[1] - 
https://www.postgresql.org/docs/devel/monitoring-stats.html#MONITORING-PG-STAT-REPLICATION-SLOTS-VIEW

-- 
With Regards,
Amit Kapila.




Re: add retry mechanism for achieving recovery target before emitting FATA error "recovery ended before configured recovery target was reached"

2021-10-22 Thread Bharath Rupireddy
On Fri, Oct 22, 2021 at 5:54 AM Jeff Davis  wrote:
>
> On Wed, 2021-10-20 at 21:35 +0530, Bharath Rupireddy wrote:
> > The  FATAL error "recovery ended before configured recovery target
> > was
> > reached" introduced by commit at [1] in PG 14 is causing the standby
> > to go down after having spent a good amount of time in recovery.
> > There
> > can be cases where the arrival of required WAL (for reaching recovery
> > target) from the archive location to the standby may take time and
> > meanwhile the standby failing with the FATAL error isn't good.
> > Instead, how about we make the standby wait for a certain amount of
> > time (with a GUC) so that it can keep looking for the required WAL.
>
> How is archiving configured, and would it be possible to introduce
> logic into the restore_command to handle slow-to-arrive WAL?

Thanks Jeff!

If the suggestion is to have the wait-and-retry logic embedded in the
user-written restore_command, IMHO that's not a good idea: the
restore_command is external to core PG, while the FATAL error
"recovery ended before configured recovery target was reached" is an
internal thing. Having the retry logic (controlled with a GUC) within
the core, for when the startup process hits the end of WAL before the
target, is a better way and is something core PG can offer. With this,
the amount of work the standby has spent in recovery isn't wasted if
the GUC is set to the right value. The optimal value is the average
time it takes for the WAL to reach the archive location from the
primary plus from the archive location to the standby. By default the
new GUC can be disabled with value 0, so that whoever wants it can set it.
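The proposed behavior amounts to a bounded retry loop around the check for the
recovery target; a hedged sketch (the timeout GUC and the polling granularity
are invented for illustration):

```python
import time

def wait_for_recovery_target(check_reached, timeout_s, poll_interval_s=1.0):
    """Sketch of the proposal: instead of failing immediately with FATAL when
    WAL runs out before the recovery target, keep retrying the restore for up
    to timeout_s seconds (0 = current behavior: fail at once)."""
    deadline = time.monotonic() + timeout_s
    while True:
        if check_reached():
            return True
        if time.monotonic() >= deadline:
            return False          # caller raises the FATAL error here
        time.sleep(poll_interval_s)

# WAL "arrives" on the second poll in this toy example
arrivals = iter([False, True])
assert wait_for_recovery_target(lambda: next(arrivals),
                                timeout_s=5, poll_interval_s=0.01)
# timeout of 0 preserves today's immediate failure
assert not wait_for_recovery_target(lambda: False, timeout_s=0)
```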

Regards,
Bharath Rupireddy.




Re: logical decoding/replication: new functions pg_ls_logicaldir and pg_ls_replslotdir

2021-10-22 Thread Bharath Rupireddy
On Fri, Oct 22, 2021 at 3:18 PM Amit Kapila  wrote:
>
> On Fri, Oct 8, 2021 at 4:39 PM Bharath Rupireddy
>  wrote:
> >
> > Hi,
> >
> > At times, users want to know what are the files (snapshot and mapping
> > files) that are available under pg_logical directory and also the
> > spill files that are under pg_replslot directory and how much space
> > they occupy.
> >
>
> Why can't you use pg_ls_dir to see the contents of pg_replslot? To
> know the space taken by spilling, you might want to check
> pg_stat_replication_slots[1] as that gives information about
> spill_bytes.
>
> [1] - 
> https://www.postgresql.org/docs/devel/monitoring-stats.html#MONITORING-PG-STAT-REPLICATION-SLOTS-VIEW

Thanks Amit!

pg_ls_dir gives the list of directories and files, but not their
sizes. And it looks like spill_bytes from pg_stat_replication_slots is
an accumulated byte count (see [1]), not the current size of the spill
files, so it doesn't represent the spill files and their size at a
given moment.

If we have pg_ls_logicaldir and pg_ls_replslotdir returning the files,
sizes, and last modified times, it will be useful in production
environments to see the disk usage of those files at the current
moment. The data from these functions can be fed to an external
analytics tool that invokes them at regular intervals and reports the
disk usage of these folders. This will be super useful to analyze
questions like: Was the disk usage higher at time t1? What happened to
my database system at that time? And these functions can run
independently of the stats collector process, which is currently
required for the pg_stat_replication_slots view.
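The per-file output being proposed is essentially what pg_ls_waldir() already
returns for pg_wal; a small sketch of the equivalent listing (illustrative
only, not the server-side implementation):

```python
import os
import tempfile

def ls_dir_with_stats(path):
    """Sketch of what pg_ls_replslotdir/pg_ls_logicaldir would return per
    file: (name, size, modification time), like the existing pg_ls_waldir()."""
    rows = []
    for entry in os.scandir(path):
        st = entry.stat()
        rows.append((entry.name, st.st_size, st.st_mtime))
    return sorted(rows)

# demonstrate on a throwaway directory standing in for pg_replslot/<slot>
with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "state"), "wb") as f:
        f.write(b"x" * 128)
    rows = ls_dir_with_stats(d)

assert len(rows) == 1
assert rows[0][0] == "state" and rows[0][1] == 128
```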

Thoughts?

I plan to work on a patch if okay.

[1]
postgres=# select
pg_ls_dir('/home/bharath/postgres/inst/bin/data/pg_replslot/mysub');
 pg_ls_dir
---
 state
(1 row)

postgres=# select * from pg_stat_replication_slots;
 slot_name | spill_txns | spill_count | spill_bytes | stream_txns | stream_count | stream_bytes | total_txns | total_bytes | stats_reset
-----------+------------+-------------+-------------+-------------+--------------+--------------+------------+-------------+-------------
 mysub     |          3 |           6 |       39600 |           0 |            0 |            0 |          5 |   396001128 |
(1 row)

Regards,
Bharath Rupireddy.




Re: Multi-Column List Partitioning

2021-10-22 Thread Zhihong Yu
On Fri, Oct 22, 2021 at 2:48 AM Nitin Jadhav wrote:

> > While testing further I got a crash with partition wise join enabled for
> multi-col list partitions. please find test case & stack-trace below.
>
> Thanks for sharing. I have fixed the issue in the attached patch.
>
> Thanks & Regards,
> Nitin Jadhav
>

+isListBoundDuplicated(List *list_bounds, List *new_bound)

+   Const   *value1 = castNode(Const, list_nth(elem, i));
+   Const   *value2 = castNode(Const, list_nth(new_bound, i));

Should the upper bound for index i take into account the length of
new_bound ?
If the length of new_bound is always the same as that for elem, please add
an assertion.

For transformPartitionListBounds():
+   deparse_expression((Node *) list_nth(partexprs, j),
+                      deparse_context_for(RelationGetRelationName(parent),
+                                          RelationGetRelid(parent)),

Please consider calling RelationGetRelationName(parent)
and RelationGetRelid(parent) (and assigning to local variables) outside the
loop.

+get_list_datum_count(PartitionBoundSpec **boundspecs, int nparts)

get_list_datum_count -> get_list_datums_count

For partition_bounds_equal():

+   if (b1->isnulls)
+   b1_isnull = b1->isnulls[i][j];
+   if (b2->isnulls)
+   b2_isnull = b2->isnulls[i][j];

Should the initialization of b1_isnull and b2_isnull be done inside the
loop (so that they don't inherit value from previous iteration) ?
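The concern about initialization can be shown with a small sketch (a toy model
of partition_bounds_equal(), not the real code): resetting the defaults inside
the loop is what keeps an absent isnulls array from silently inheriting the
previous iteration's value.

```python
def bounds_equal(b1_isnulls, b2_isnulls, ndatums, nkeys):
    """Toy model of the isnull comparison in partition_bounds_equal().
    If b1_isnull/b2_isnull were initialized once outside the loops, a side
    with no isnulls array would reuse a stale value from a prior iteration."""
    for i in range(ndatums):
        for j in range(nkeys):
            # correct: reset the defaults on every iteration
            b1_isnull = False
            b2_isnull = False
            if b1_isnulls is not None:
                b1_isnull = b1_isnulls[i][j]
            if b2_isnulls is not None:
                b2_isnull = b2_isnulls[i][j]
            if b1_isnull != b2_isnull:
                return False
    return True

assert bounds_equal([[True]], [[True]], 1, 1)
assert not bounds_equal([[True]], None, 1, 1)   # NULL bound vs no NULLs
```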

Cheers


Re: prevent immature WAL streaming

2021-10-22 Thread Amul Sul
On Thu, Oct 14, 2021 at 6:14 PM Amul Sul  wrote:
>
> On Wed, Oct 13, 2021 at 10:58 PM Alvaro Herrera  
> wrote:
> >
> > On 2021-Oct-13, Amul Sul wrote:
> >
> > > I have one more question, regarding the need for other global
> > > variables i.e. abortedRecPtr.  (Sorry for coming back after so long.)
> > >
> > > Instead of abortedRecPtr point, isn't enough to write
> > > overwrite-contrecord at XLogCtl->lastReplayedEndRecPtr?  I think both
> > > are pointing to the same location then can't we use
> > > lastReplayedEndRecPtr instead of abortedRecPtr to write
> > > overwrite-contrecord and remove need of extra global variable, like
> > > attached?
> >
> > I'm a bit fuzzy on the difference "the end+1" and "the start of the next
> > record".  Are they always the same?  We do have XLogRecPtrToBytePos()
> > and XLogBytePosToEndRecPtr() to convert unadorned XLogRecPtr values to
> > "usable byte positions", which suggests to me that the proposed patch
> > may fail if end+1 is a page or segment boundary.
> >
>
> Yes, you are correct, that could be a possible failure.
>
> How about calculating that from the lastReplayedEndRecPtr by
> converting it first to "usable byte positions" and then recalculating
> the record pointer from that, like attached?
>
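For readers following along, the distinction Alvaro raises can be sketched
numerically (a simplified model, not the real xlog code; the actual
conversions are XLogRecPtrToBytePos()/XLogBytePosToEndRecPtr(), and the first
page of a segment carries a longer header):

```python
XLOG_BLCKSZ = 8192   # WAL page size (the default)
PAGE_HEADER = 24     # assume every page begins with a short page header

def next_record_start(end_plus_one):
    """'end+1' of a record equals the start of the next record only when it
    does not land on a page boundary: a record can never begin inside the
    page header, so in that case the next record starts after the header."""
    if end_plus_one % XLOG_BLCKSZ == 0:
        return end_plus_one + PAGE_HEADER
    return end_plus_one

assert next_record_start(100) == 100                      # mid-page: identical
assert next_record_start(2 * XLOG_BLCKSZ) == 2 * XLOG_BLCKSZ + PAGE_HEADER
```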

Any thoughts about the patch posted previously?

Regards,
Amul




Re: [PATCH] Fix memory corruption in pg_shdepend.c

2021-10-22 Thread Tom Lane
Michael Paquier  writes:
> On Fri, Oct 22, 2021 at 10:48:57AM +0300, Aleksander Alekseev wrote:
>> ... where the slot is allocated with palloc0. The assumption that
>> MakeSingleTupleTableSlot() returns valid TupleTableSlot* with zeroed
>> tts_isnull[] seems reasonable, no?

> Yes, I don't see any need to do something more here.

That assumption is exactly what I'm objecting to.  I don't think
we make it in other places, and I don't like making it here.
(By "here" I mean all of e3931d0, because it made the same omission
in several places.)

The primary reason why I think it's a bad idea is that only one
path in MakeSingleTupleTableSlot provides a pre-zeroed tts_isnull
array --- if you don't supply a tuple descriptor at creation,
the assumption falls down.  So even if this coding technique is
safe where it is, it is a hazard for anyone copying the code into
some other context.

I might be happier if we tried to guarantee that *every* way of
creating a slot will end with a pre-zeroed isnull array, and then
got rid of any thereby-duplicative memsets.  But that would be
a lot more invasive than just making these places get in step.

regards, tom lane




Refactoring: join MakeSingleTupleTableSlot() and MakeTupleTableSlot()

2021-10-22 Thread Aleksander Alekseev
Hi hackers,

During the discussion [1] it was discovered that we have two
procedures in execTuples.c that do the same thing:

* MakeSingleTupleTableSlot()
* MakeTupleTableSlot()

In fact, MakeSingleTupleTableSlot() is simply a wrapper for
MakeTupleTableSlot().

I propose keeping only one of these procedures to simplify navigating
through the code and debugging, and maybe saving a CPU cycle or two. A
search for MakeTupleTableSlot produced 8 matches across 2 files, while
MakeSingleTupleTableSlot is used 41 times across 26 files. Thus the
proposed patch removes MakeTupleTableSlot and keeps
MakeSingleTupleTableSlot to keep the patch less invasive and simplify
backporting of the other patches. Hopefully, this will not complicate
the life of the extension developers too much.

The patch was tested on MacOS against `master` branch b1ce6c28.

[1]: 
https://www.postgresql.org/message-id/flat/CAJ7c6TP0AowkUgNL6zcAK-s5HYsVHVBRWfu69FRubPpfwZGM9A%40mail.gmail.com

-- 
Best regards,
Aleksander Alekseev


v1-0001-refactoring-MakeTupleTableSlot.patch
Description: Binary data


Re: [PATCH] Fix memory corruption in pg_shdepend.c

2021-10-22 Thread Aleksander Alekseev
Hi Michael,

> Do you have something in mind here?

Yep. This is not a priority though, thus I created a separate CF entry:

https://commitfest.postgresql.org/35/3371/

-- 
Best regards,
Aleksander Alekseev




Re: parallelizing the archiver

2021-10-22 Thread Magnus Hagander
On Thu, Oct 21, 2021 at 11:05 PM Robert Haas  wrote:

> On Thu, Oct 21, 2021 at 4:29 PM Stephen Frost  wrote:
> > restore_command used to be in recovery.conf, which disappeared with v12
> > and it now has to go into postgresql.auto.conf or postgresql.conf.
> >
> > That's a huge breaking change.
>
> Not in the same sense. Moving the functionality to a different
> configuration file can and probably did cause a lot of problems for
> people, but the same basic functionality was still available.
>

Yeah.

And as a bonus it got a bunch of people to upgrade their backup software
that suddenly stopped working. Or in some case, to install backup software
instead of using the hand-rolled scripts. So there were some good
side-effects specifically to breaking it as well.



> (Also, I'm pretty sure that the recovery.conf changes would have
> happened years earlier if there hadn't been backward compatibility
> concerns, from Simon in particular. So saying that there was "hardly
> any complaint raised" in that case doesn't seem to me to be entirely
> accurate.)
>
> > > Also, more to the point, when there's a need to break backward
> > > compatibility in order to get some improvement, it's worth
> > > considering, but here there just isn't.
> >
> > There won't be any thought towards a backwards-incompatible capability
> > if everyone is saying that we can't possibly break it.  That's why I was
> > commenting on it.
>
> I can't speak for anyone else, but that is not what I am saying. I am
> open to the idea of breaking it if we thereby get some valuable
> benefit which cannot be obtained otherwise. But Nathan has now
> implemented something which, from the sound of it, will allow us to
> obtain all of the available benefits with no incompatibilities. If we
> think of additional benefits that we cannot obtain without
> incompatibilities, then we can consider that situation when it arises.
> In the meantime, there's no need to go looking for reasons to break
> stuff that works in existing releases.
>

 Agreed.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: parallelizing the archiver

2021-10-22 Thread Magnus Hagander
On Thu, Oct 21, 2021 at 9:51 PM Bossart, Nathan  wrote:

> On 10/20/21, 3:23 PM, "Bossart, Nathan"  wrote:
> > Alright, I reworked the patch a bit to maintain backward
> > compatibility.  My initial intent for 0001 was to just do a clean
> > refactor to move the shell archiving stuff to its own file.  However,
> > after I did that, I realized that adding the hook wouldn't be too much
> > more work, so I did that as well.  This seems to be enough to support
> > custom archiving modules.  I included a basic example of such a module
> > in 0002.  0002 is included primarily for demonstration purposes.
>
> It looks like the FreeBSD build is failing because sys/wait.h is
> missing.  Here is an attempt at fixing that.
>

I still like the idea of loading the library via a special parameter,
archive_library or such.

One reason for that is that adding/removing modules in
shared_preload_libraries has a terrible UX in that you have to replace the
whole thing. This makes it much more complex to deal with when different
modules just want to add to it.

E.g. my awesome backup program could set
archive_library='my_awesome_backups', and know it didn't break anything
else. But it couldn't set shared_preload_libraries='my_awesome_backups',
because then it might break a bunch of other modules that used to be there.
So it has to go try to parse the whole config and figure out where to make
such modifications.

Now, this could *also* be solved by allowing shared_preload_library to be a
"list" instead of a string, and allow postgresql.conf to accept syntax like
shared_preload_libraries+='my_awesome_backups'.
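To illustrate the UX point, the configuration might look like this (the +=
syntax is hypothetical and does not exist today; parameter and module names
are made up):

```
# today: a module must rewrite the whole list, risking clobbering other entries
shared_preload_libraries = 'pg_stat_statements,my_awesome_backups'

# with a dedicated parameter, a backup tool can set only its own knob:
archive_library = 'my_awesome_backups'

# hypothetical append syntax discussed above (not implemented):
# shared_preload_libraries += 'my_awesome_backups'
```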

But without that level of functionality available, I think a separate
parameter for the archive library would be a good thing.

Other than that:
+
+/*
+ * Is WAL archiving configured?  For consistency with previous releases, this
+ * checks that archive_command is set when archiving via shell is enabled.
+ * Otherwise, we just check that an archive function is set, and it is the
+ * responsibility of that archive function to ensure it is properly configured.
+ */
+#define XLogArchivingConfigured() \
+   (PG_archive && (PG_archive != shell_archive || XLogArchiveCommand[0] != '\0'))


Wouldn't that be better as a callback into the module? So that
shell_archive would implement the check for XLogArchiveCommand. Then
another third-party module can make its own decision on what to check. And
PGarchive would then be a struct that holds a function pointer to the
archive command and another function pointer to the is-enabled check? (I
think having a struct for it would be useful regardless -- for possible
future extensions with more API points.)
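A rough sketch of that struct-of-callbacks idea (rendered in Python for
brevity rather than C; all names such as PGarchive and check_configured are
hypothetical, not an existing API):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class PGarchive:
    """Sketch of the suggested archive-module API: a struct of callbacks
    rather than a single function pointer."""
    archive_file: Callable[[str], bool]    # archive one WAL segment
    check_configured: Callable[[], bool]   # module decides its own readiness

# the built-in shell module would implement check_configured by testing
# archive_command, much as the XLogArchivingConfigured() macro does today
archive_command = ""
shell_archive = PGarchive(
    archive_file=lambda path: True,        # stub: would run archive_command
    check_configured=lambda: archive_command != "",
)

assert not shell_archive.check_configured()
archive_command = "cp %p /archive/%f"
assert shell_archive.check_configured()
```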

--
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: Experimenting with hash tables inside pg_dump

2021-10-22 Thread Tom Lane
Andres Freund  writes:
> On 2021-10-21 22:13:22 -0400, Tom Lane wrote:
>> I've thought about doing something like
>> SELECT unsafe-functions FROM pg_class WHERE oid IN (someoid, someoid, ...)
>> but in cases with tens of thousands of tables, it seems unlikely that
>> that's going to behave all that nicely.

> That's kinda what I'm doing in the quick hack. But instead of using IN(...) I
> made it unnest('{oid, oid, ...}'), that scales much better.

I'm skeptical of that, mainly because it doesn't work in old servers,
and I really don't want to maintain two fundamentally different
versions of getTableAttrs().  I don't think you actually need the
multi-array form of unnest() here --- we know the TableInfo array
is in OID order --- but even the single-array form only works
back to 8.4.

However ... looking through getTableAttrs' main query, it seems
like the only thing there that's potentially unsafe is the
"format_type(t.oid, a.atttypmod)" call.  I wonder if it could be
sane to convert it into a single query that just scans all of
pg_attribute, and then deal with creating the formatted type names
separately, perhaps with an improved version of getFormattedTypeName
that could cache the results for non-default typmods.  The main
knock on this approach is the temptation for somebody to stick some
unsafe function into the query in future.  We could stick a big fat
warning comment into the code, but lately I despair of people reading
comments.

> To see where it's worth putting in time it'd be useful if getSchemaData() in
> verbose mode printed timing information...

I've been running test cases with log_min_duration_statement = 0,
which serves well enough.

regards, tom lane




Re: Experimenting with hash tables inside pg_dump

2021-10-22 Thread Andres Freund
Hi,

On 2021-10-22 10:53:31 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2021-10-21 22:13:22 -0400, Tom Lane wrote:
> >> I've thought about doing something like
> >> SELECT unsafe-functions FROM pg_class WHERE oid IN (someoid, someoid, ...)
> >> but in cases with tens of thousands of tables, it seems unlikely that
> >> that's going to behave all that nicely.
> 
> > That's kinda what I'm doing in the quick hack. But instead of using IN(...) I
> > made it unnest('{oid, oid, ...}'), that scales much better.
> 
> I'm skeptical of that, mainly because it doesn't work in old servers,
> and I really don't want to maintain two fundamentally different
> versions of getTableAttrs().  I don't think you actually need the
> multi-array form of unnest() here --- we know the TableInfo array
> is in OID order --- but even the single-array form only works
> back to 8.4.

I think we can address that, if we think it's overall a promising approach to
pursue. E.g. if we don't need the indexes, we can make it = ANY().


> However ... looking through getTableAttrs' main query, it seems
> like the only thing there that's potentially unsafe is the
> "format_type(t.oid, a.atttypmod)" call.

I assume the default expression bit would also be unsafe?

Greetings,

Andres Freund




Re: pgstat_assert_is_up() can fail in walsender

2021-10-22 Thread Andres Freund
Hi,

On 2021-10-22 02:14:40 +0900, Fujii Masao wrote:
> On 2021/10/19 22:14, Amit Langote wrote:
> > Hi,
> > 
> > I can (almost) consistently reproduce $subject by executing the
> > attached shell script, which I was using while constructing a test
> > case for another thread.
> 
> This seems the same issue that was reported at the thread [1].
> 
> [1]
> https://www.postgresql.org/message-id/os0pr01mb571621b206eeb17d8ab171f094...@os0pr01mb5716.jpnprd01.prod.outlook.com

Sorry for not working on that sooner, I got distracted. I'm experimenting with
a somewhat more fundamental fix for this. I'll not finish that today, but I'll
try to have something out Monday.

Greetings,

Andres Freund




Re: Added schema level support for publication.

2021-10-22 Thread vignesh C
On Fri, Oct 22, 2021 at 11:59 AM Greg Nancarrow  wrote:
>
> On Fri, Oct 22, 2021 at 12:41 PM Greg Nancarrow  wrote:
> >
> > I was also previously concerned about what the behavior should be when
> > only including just the partitions of a partitioned table in a
> > publication using ALL TABLES IN SCHEMA and having
> > publish_via_partition_root=true. It seems to implicitly include the
> > partitioned table in the publication. But I did some testing and found
> > that this is the current behavior when only the partitions are
> > individually included in a publication using TABLE, so it seems to be
> > OK.
> >
>
> Thinking some more about this, it may still be confusing to the
> user if not explicitly stated in the ALL TABLES IN SCHEMA case.
> How about adding some additional explanation to the end of the
> following paragraph:
>
> + 
> +  When a partitioned table is published via schema level publication, all
> +  of its existing and future partitions irrespective of it being from the
> +  publication schema or not are implicitly considered to be part of the
> +  publication. So, even operations that are performed directly on a
> +  partition are also published via publications that its ancestors are
> +  part of.
> + 
>
> Something like:
>
> Similarly, if a partition is published via schema level publication
> and publish_via_partition_root=true, the parent partitioned table is
> implicitly considered to be part of the publication, irrespective of
> it being from the publication schema or not.

I have made this change in the v46 patch attached at [1].
[1] - 
https://www.postgresql.org/message-id/CALDaNm3kBrMO5EyEgK_TyOrBuw%2BRvdJ2mJfpWb5e8FbuKg2cQw%40mail.gmail.com

Regards,
Vignesh




Re: Added schema level support for publication.

2021-10-22 Thread vignesh C
On Fri, Oct 22, 2021 at 1:03 PM Masahiko Sawada  wrote:
>
> On Fri, Oct 22, 2021 at 2:25 PM Amit Kapila  wrote:
> >
> > On Thu, Oct 21, 2021 at 6:47 PM vignesh C  wrote:
> > >
> > >
> > > Thanks for the comments, the attached v45 patch has the fix for the same.
> > >
> >
> > The first patch is mostly looking good to me apart from the below
> > minor comments:
>
> Let me share other minor comments on v45-0001 patch:
>
> >
> > 1.
> > +  
> > +   The catalog pg_publication_namespace contains 
> > the
> > +   mapping between schemas and publications in the database.  This is a
> > +   many-to-many mapping.
> >
> > There are extra spaces after mapping at the end which are not required.

Modified

> +   ADD and DROP  clauses will add and
> +   remove one or more tables/schemas from the publication.  Note that adding
> +   tables/schemas to a publication that is already subscribed to will 
> require a
>
> There is also an extra space after "adding".

Modified

> -[ FOR TABLE [ ONLY ]  class="parameter">table_name [ * ] [, ...]
> -  | FOR ALL TABLES ]
> +[ FOR ALL TABLES
> +  | FOR  class="parameter">publication_object [, ... ] ]
>
> Similarly, after "TABLES".

Modified

> +
> + 
> +  Specifying a table that is part of a schema specified by
> +  FOR ALL TABLES IN SCHEMA is not supported.
> + 
>
> And, after "by".

Modified

> ---
>
> +static void
> +AlterPublicationSchemas(AlterPublicationStmt *stmt,
> +HeapTuple tup, List
> *schemaidlist)
> +{
> (snip)
> +PublicationAddSchemas(pubform->oid, schemaidlist, true, 
> stmt);
> +}
> +
> +return;
> +}
>
> The "return" at the end of the function is not necessary.

Modified

> ---
> +if (pubobj->name)
> +pubobj->pubobjtype =
> PUBLICATIONOBJ_REL_IN_SCHEMA;
> +else if (!pubobj->name && !pubobj->pubtable)
> +pubobj->pubobjtype = 
> PUBLICATIONOBJ_CURRSCHEMA;
> +else if (!pubobj->name)
> +ereport(ERROR,
> +
> errcode(ERRCODE_SYNTAX_ERROR),
> +errmsg("invalid
> schema name at or near"),
> +
> parser_errposition(pubobj->location));
>
> I think it's better to change the last "else if" to just "else".

Modified

> ---
> +
> +if (schemarelids)
> +{
> +/*
> + * If the publication publishes
> partition changes via their
> + * respective root partitioned
> tables, we must exclude
> + * partitions in favor of including
> the root partitioned
> + * tables. Otherwise, the function
> could return both the child
> + * and parent tables which could
> cause data of the child table
> + * to be double-published on the
> subscriber side.
> + *
> + * XXX As of now, we do this when a
> publication has associated
> + * schema or for all tables publication. See
> + * GetAllTablesPublicationRelations().
> + */
> +tables =
> list_concat_unique_oid(relids, schemarelids);
> +if (publication->pubviaroot)
> +tables =
> filter_partitions(tables, schemarelids);
> +}
> +else
> +tables = relids;
> +
> +}
>
> There is an extra newline after "table = relids;".

Removed it

I have made this change in the v46 patch attached at [1].
[1] - https://www.postgresql.org/message-id/CALDaNm3kBrMO5EyEgK_TyOrBuw%2BRvdJ2mJfpWb5e8FbuKg2cQw%40mail.gmail.com

Regards,
Vignesh




Re: XTS cipher mode for cluster file encryption

2021-10-22 Thread Stephen Frost
Greetings,

* Sasasu (i...@sasa.su) wrote:
> On 2021/10/22 01:28, Stephen Frost wrote:
> >None of these are actually working with or changing the data though,
> >they're just copying it.  I don't think we'd actually want these to
> >decrypt and reencrypt the data.
> 
> OK, but why does ALTER TABLE SET TABLESPACE use smgrread() and smgrextend()
> instead of copy_file()?
> TDE needs to modify these code paths, which makes the patch bigger.

Tables and databases are handled differently, yes.

With ALTER TABLE SET TABLESPACE, we're allocating a new refilenode and
WAL'ing the table as FPIs.  What happens with databases is fundamentally
different- no one is allowed to be connected to the database being moved
and we write a single 'database changed tablespace' record in the WAL
for this case.  When it comes to TDE, this probably is actually helpful
as we're going to likely want the relfilenode to be included as part of
the IV.
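
As a side note, the tweak/IV construction alluded to here can be sketched in a few lines. This is a minimal illustration under assumed names — build_xts_tweak and the exact byte layout are not PostgreSQL code:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Minimal sketch: pack the relfilenode and block number into a 16-byte
 * XTS tweak.  The function name and byte layout are assumptions for
 * illustration only.
 */
void
build_xts_tweak(uint32_t relfilenode, uint32_t blockno, uint8_t tweak[16])
{
	memset(tweak, 0, 16);
	memcpy(tweak, &relfilenode, sizeof(relfilenode));
	memcpy(tweak + 4, &blockno, sizeof(blockno));
	/* remaining 8 bytes stay zero; they could carry e.g. a database OID */
}
```

Because the tweak is derived purely from where the block lives, rewriting a block in place reuses the same tweak — acceptable for XTS, but exactly the kind of reuse that is unsafe for CTR.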

> On 2021/10/22 01:28, Stephen Frost wrote:
> > No, the CTR approach isn't great because, as has been discussed quite a
> > bit already, using the LSN as the IV means that different data might be
> > re-encrypted with the same LSN and that's not an acceptable thing to
> > have happen with CTR.
> On 2021/10/22 01:28, Stephen Frost wrote:
> > Thankfully, for WAL
> > (unlike heap and index blocks) we don't really have that issue- we
> > hopefully aren't going to write different WAL records at the same LSN
> > and so using the LSN there should be alright.
> On 2021/10/22 01:28, Stephen Frost wrote:
> > We've discussed at length how using CTR for heap isn't a good idea even
> > if we're using the LSN for the IV, while if we use XTS then we don't
> > have the issues that CTR has with IV re-use and using the LSN (plus
> > block number and perhaps other things).  Nothing in what has been
> > discussed here has really changed anything there that I can see and so
> > it's unclear to me why we continue to go round and round with it.
> 
> I am not re-discussing using CTR for heap tables. I mean using some CTR-like
> algorithm *only* for WAL encryption. My idea is exactly the same as what you
> wrote: "we hopefully aren't going to write different WAL records at the
> same LSN and so using the LSN there should be alright."

I don't like the idea of "CTR-like".  What's wrong with using CTR for
WAL encryption?  Based on the available information, that seems like the
exact use-case for CTR.

> The point of disagreement between you and me is only on the block-based API.

I'm glad to hear that, at least.

Thanks,

Stephen


signature.asc
Description: PGP signature


Allow root ownership of client certificate key

2021-10-22 Thread David Steele

Hackers,

I noticed recently that permissions checking is done differently for the 
server certificate key than for the client key. Specifically, on the server 
the key can have 0640 permissions if it is owned by root.


On the server side this change was made in 9a83564c and I think the same 
rationale applies equally well to the client key. At the time, managed 
keys on the client may not have been common, but they are now.


Attached is a patch to make this change.

I was able to test this manually by hacking 001_ssltests.pl like so:

-   chmod 0640, "ssl/${key}_tmp.key"
+   chmod 0600, "ssl/${key}_tmp.key"
  or die "failed to change permissions on ssl/${key}_tmp.key: $!";
-   system_or_bail("sudo chown root ssl/${key}_tmp.key");

But this is clearly not going to work for general purpose testing. The 
server keys are also not tested for root ownership, so perhaps we do not 
need that here either.


I looked at trying to make this code common between the server and 
client but due to the differences in error reporting it seemed like more 
trouble than it was worth.


Regards,
--
-David
da...@pgmasters.net

diff --git a/src/interfaces/libpq/fe-secure-openssl.c b/src/interfaces/libpq/fe-secure-openssl.c
index 3a7cc8f774..285e772170 100644
--- a/src/interfaces/libpq/fe-secure-openssl.c
+++ b/src/interfaces/libpq/fe-secure-openssl.c
@@ -1234,11 +1234,38 @@ initialize_SSL(PGconn *conn)
 							  fnbuf);
 			return -1;
 		}
+
+		/*
+		 * Refuse to load key files owned by users other than us or root.
+		 *
+		 * XXX surely we can check this on Windows somehow, too.
+		 */
+#ifndef WIN32
+		if (buf.st_uid != geteuid() && buf.st_uid != 0)
+		{
+			appendPQExpBuffer(&conn->errorMessage,
+							  libpq_gettext("private key file \"%s\" must be owned by the current user or root\n"),
+							  fnbuf);
+			return -1;
+		}
+#endif
+
+		/*
+		 * Require no public access to key file. If the file is owned by us,
+		 * require mode 0600 or less. If owned by root, require 0640 or less to
+		 * allow read access through our gid, or a supplementary gid that allows
+		 * to read system-wide certificates.
+		 *
+		 * XXX temporarily suppress check when on Windows, because there may not
+		 * be proper support for Unix-y file permissions.  Need to think of a
+		 * reasonable check to apply on Windows.
+		 */
 #ifndef WIN32
-		if (!S_ISREG(buf.st_mode) || buf.st_mode & (S_IRWXG | S_IRWXO))
+		if ((buf.st_uid == geteuid() && buf.st_mode & (S_IRWXG | S_IRWXO)) ||
+			(buf.st_uid == 0 && buf.st_mode & (S_IWGRP | S_IXGRP | S_IRWXO)))
 		{
 			appendPQExpBuffer(&conn->errorMessage,
-							  libpq_gettext("private key file \"%s\" has group or world access; permissions should be u=rw (0600) or less\n"),
+							  libpq_gettext("private key file \"%s\" has group or world access; file must have permissions u=rw (0600) or less if owned by the current user, or permissions u=rw,g=r (0640) or less if owned by root.\n"),
 							  fnbuf);
 			return -1;
 		}
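
For reference, the combined ownership/mode rule the patch enforces can be sketched as a standalone predicate. This is a minimal sketch; key_file_perms_ok is a hypothetical name, not libpq code:

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Hypothetical helper mirroring the patched check: the key must be owned
 * by the current user (mode 0600 or less) or by root (mode 0640 or less).
 */
bool
key_file_perms_ok(uid_t st_uid, uid_t euid, mode_t st_mode)
{
	if (st_uid != euid && st_uid != 0)
		return false;			/* must be owned by us or root */
	if (st_uid == euid && (st_mode & (S_IRWXG | S_IRWXO)))
		return false;			/* owned by us: no group/world access */
	if (st_uid == 0 && (st_mode & (S_IWGRP | S_IXGRP | S_IRWXO)))
		return false;			/* owned by root: group read only */
	return true;
}
```

The asymmetry (0640 allowed only for root-owned files) is what lets system-managed keys be shared read-only through a group while still rejecting group access on user-owned keys.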


Re: Experimenting with hash tables inside pg_dump

2021-10-22 Thread Tom Lane
Andres Freund  writes:
> On 2021-10-22 10:53:31 -0400, Tom Lane wrote:
>> I'm skeptical of that, mainly because it doesn't work in old servers,

> I think we can address that, if we think it's overall a promising approach to
> pursue. E.g. if we don't need the indexes, we can make it = ANY().

Hmm ... yeah, I guess we could get away with that.  It might not scale
as nicely to a huge database, but probably dumping a huge database
from an ancient server isn't all that interesting.

I'm inclined to think that it could be sane to make getTableAttrs
and getIndexes use this style, but we probably still want functions
and such to use per-object queries.  In those other catalogs there
are many built-in objects that we don't really care about.  The
prepared-queries hack I was working on last night is probably plenty
good enough there, and it's a much less invasive patch.

Were you planning to pursue this further, or did you want me to?
I'd want to layer it on top of the work I did at [1], else there's
going to be lots of merge conflicts.

regards, tom lane

[1] https://www.postgresql.org/message-id/2273648.1634764485%40sss.pgh.pa.us
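
For readers following along, the catalog-id hash under discussion can be sketched as follows — a minimal, fixed-size stand-in for what simplehash.h generates. All names are hypothetical; the real simplehash.h table is template-generated, grows dynamically, and stores DumpableObject pointers:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical catalog id: (tableoid, oid) identifies any catalog row. */
typedef struct CatalogId { uint32_t tableoid; uint32_t oid; } CatalogId;
typedef struct Entry { CatalogId key; void *obj; int used; } Entry;

#define NBUCKETS 1024			/* power of two; fixed size for the sketch */
static Entry table_[NBUCKETS];

static uint32_t
hash_catalog_id(CatalogId id)
{
	uint32_t	h = id.tableoid * 0x9E3779B1u ^ id.oid * 0x85EBCA77u;

	return h ^ (h >> 16);
}

void
insert_object(CatalogId id, void *obj)
{
	uint32_t	i = hash_catalog_id(id) & (NBUCKETS - 1);

	while (table_[i].used)
		i = (i + 1) & (NBUCKETS - 1);	/* linear probing */
	table_[i].key = id;
	table_[i].obj = obj;
	table_[i].used = 1;
}

void *
find_object(CatalogId id)
{
	uint32_t	i = hash_catalog_id(id) & (NBUCKETS - 1);

	while (table_[i].used)
	{
		if (table_[i].key.tableoid == id.tableoid &&
			table_[i].key.oid == id.oid)
			return table_[i].obj;
		i = (i + 1) & (NBUCKETS - 1);
	}
	return NULL;				/* not found */
}
```

The maintenance argument above comes down to this: one table keyed by (tableoid, oid) can replace every per-object-type sorted index array, so adding a new object type needs no new lookup structure.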




Re: [RFC] building postgres with meson

2021-10-22 Thread John Naylor
On Thu, Oct 21, 2021 at 5:48 PM Andres Freund  wrote:

> However, update-unicode is a bit harder.  Partially not directly because
of
> meson, but because update-unicode as-is afaict doesn't support VPATH
builds,
> and meson enforces those.

> make update-unicode
> ...
> make -C src/common/unicode update-unicode
> '/usr/bin/perl' generate-unicode_norm_table.pl
> Can't open perl script "generate-unicode_norm_table.pl": No such file or
directory
>
> It's not too hard to fix. See attached for the minimal stuff that I
> immediately found to be needed.

Thanks for doing that, it works well enough for demonstration. With your
patch, and using an autoconf VPATH build, the unicode tables work fine, but
it complains of a permission error in generate_unaccent_rules.py. That
seems to be because the script is invoked directly rather than as an
argument to the python interpreter.

> The slightly bigger issue making update-unicode work with meson is that
meson
> doesn't provide support for invoking build targets in specific directories
> (because it doesn't map nicely to e.g. msbuild). But scripts like
> src/common/unicode/generate-unicode_norm_table.pl rely on CWD. It's not
hard
> to work around that, but IMO it's better for such scripts to not rely on
CWD.

Yeah. I encountered a further issue: With autoconf on HEAD, with a source
tree build executed in contrib/unaccent:

$ touch generate_unaccent_rules.py
$ make update-unicode
generate_unaccent_rules.py --unicode-data-file
../../src/common/unicode/UnicodeData.txt --latin-ascii-file Latin-ASCII.xml
>unaccent.rules
/bin/sh: generate_unaccent_rules.py: command not found
make: *** [unaccent.rules] Error 127
make: *** Deleting file `unaccent.rules'

...so in this case it seems not to know to use CWD here.

Anyway, this can be put off until the very end, since it's not run often.
You've demonstrated how these targets would work, and that's good enough
for now.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: logical decoding/replication: new functions pg_ls_logicaldir and pg_ls_replslotdir

2021-10-22 Thread Justin Pryzby
On Fri, Oct 22, 2021 at 04:18:04PM +0530, Bharath Rupireddy wrote:
> On Fri, Oct 22, 2021 at 3:18 PM Amit Kapila  wrote:
> > On Fri, Oct 8, 2021 at 4:39 PM Bharath Rupireddy 
> >  wrote:
> > >
> > > At times, users want to know what are the files (snapshot and mapping
> > > files) that are available under pg_logical directory and also the
> > > spill files that are under pg_replslot directory and how much space
> > > they occupy.
> >
> > Why can't you use pg_ls_dir to see the contents of pg_replslot? To
> 
> Thanks Amit!
> 
> pg_ls_dir gives the list of directories and files, but not their sizes.

Returning sizes is already possible by using pg_stat_file:

ts=# SELECT dd, a, ls, stat.* FROM (SELECT current_setting('data_directory') AS dd, 'pg_logical' AS a) AS a, pg_ls_dir(a) AS ls, pg_stat_file(dd ||'/'|| a ||'/'|| ls) AS stat;
           dd           |     a      |          ls           | size |         access         |      modification      |         change         | creation | isdir 
------------------------+------------+-----------------------+------+------------------------+------------------------+------------------------+----------+-------
 /var/lib/pgsql/14/data | pg_logical | replorigin_checkpoint |    8 | 2021-10-22 08:20:30-06 | 2021-10-22 08:20:30-06 | 2021-10-22 08:20:30-06 |          | f
 /var/lib/pgsql/14/data | pg_logical | mappings              | 4096 | 2021-10-21 19:54:19-06 | 2021-10-15 19:50:35-06 | 2021-10-15 19:50:35-06 |          | t
 /var/lib/pgsql/14/data | pg_logical | snapshots             | 4096 | 2021-10-21 19:54:19-06 | 2021-10-15 19:50:35-06 | 2021-10-15 19:50:35-06 |          | t

I agree that this isn't a very friendly query, so I had created a patch adding
pg_ls_dir_metadata():
https://commitfest.postgresql.org/33/2377/

postgres=# SELECT * FROM pg_ls_dir_metadata('pg_logical');
       filename        | size |         access         |      modification      |         change         | creation | type |               path               
-----------------------+------+------------------------+------------------------+------------------------+----------+------+----------------------------------
 mappings              | 4096 | 2021-10-22 09:15:29-05 | 2021-10-22 09:15:29-05 | 2021-10-22 09:15:29-05 |          | d    | pg_logical/mappings
 replorigin_checkpoint |    8 | 2021-10-22 09:15:47-05 | 2021-10-22 09:15:45-05 | 2021-10-22 09:15:45-05 |          | -    | pg_logical/replorigin_checkpoint
 .                     | 4096 | 2021-10-22 09:16:23-05 | 2021-10-22 09:15:45-05 | 2021-10-22 09:15:45-05 |          | d    | pg_logical/.
 ..                    | 4096 | 2021-10-22 09:16:01-05 | 2021-10-22 09:15:47-05 | 2021-10-22 09:15:47-05 |          | d    | pg_logical/..
 snapshots             | 4096 | 2021-10-22 09:15:29-05 | 2021-10-22 09:15:29-05 | 2021-10-22 09:15:29-05 |          | d    | pg_logical/snapshots
(5 rows)

I concluded that it's better to add a function to list metadata of an arbitrary
dir, rather than adding more functions to handle specific, hardcoded dirs:
https://www.postgresql.org/message-id/flat/20191227170220.ge12...@telsasoft.com

-- 
Justin




Re: Multi-Column List Partitioning

2021-10-22 Thread Zhihong Yu
On Fri, Oct 22, 2021 at 3:50 AM Zhihong Yu  wrote:

>
>
> On Fri, Oct 22, 2021 at 2:48 AM Nitin Jadhav <
> nitinjadhavpostg...@gmail.com> wrote:
>
>> > While testing further I got a crash with partition wise join enabled
>> for multi-col list partitions. please find test case & stack-trace below.
>>
>> Thanks for sharing. I have fixed the issue in the attached patch.
>>
>> Thanks & Regards,
>> Nitin Jadhav
>>
>>
>>>
>>> Hi,
>
> +isListBoundDuplicated(List *list_bounds, List *new_bound)
>
> +   Const   *value1 = castNode(Const, list_nth(elem, i));
> +   Const   *value2 = castNode(Const, list_nth(new_bound, i));
>
> Should the upper bound for index i take into account the length of
> new_bound ?
> If the length of new_bound is always the same as that for elem, please add
> an assertion.
>
> For transformPartitionListBounds():
> +   deparse_expression((Node *) list_nth(partexprs, j),
> +
>  deparse_context_for(RelationGetRelationName(parent),
> +
>  RelationGetRelid(parent)),
>
> Please consider calling RelationGetRelationName(parent)
> and RelationGetRelid(parent) (and assigning to local variables) outside the
> loop.
>
> +get_list_datum_count(PartitionBoundSpec **boundspecs, int nparts)
>
> get_list_datum_count -> get_list_datums_count
>
> For partition_bounds_equal():
>
> +   if (b1->isnulls)
> +   b1_isnull = b1->isnulls[i][j];
> +   if (b2->isnulls)
> +   b2_isnull = b2->isnulls[i][j];
>
> Should the initialization of b1_isnull and b2_isnull be done inside the
> loop (so that they don't inherit value from previous iteration) ?
>
> Cheers
>

Hi,
Continuing review.

+* For the multi-column case, we must make an BoolExpr that

an BoolExpr -> a BoolExpr

In get_qual_for_list(), it would be better if repetitive code can be
extracted into a helper method:

+   if (val->constisnull)
+   {
+   NullTest   *nulltest = makeNode(NullTest);
+
+   key_is_null[j] = true;
+
+   nulltest->arg = keyCol[j];
+   nulltest->nulltesttype = IS_NULL;
+   nulltest->argisrow = false;
+   nulltest->location = -1;
+
+   if (key->partnatts > 1)
+   and_args = lappend(and_args, nulltest);
+   else
+   is_null_test = (Expr *) nulltest;
+   }
+   else
+   {
+   if (key->partnatts > 1)
+   {
+   Expr *opexpr =
+   make_partition_op_expr(key, j,
+  BTEqualStrategyNumber,
+  keyCol[j],
+  (Expr *) val);
+   and_args = lappend(and_args, opexpr);
+   }
+   else
+   datum_elem = (Expr *) val;
+   }

For match_clause_to_partition_key():

+   if (part_scheme->strategy != PARTITION_STRATEGY_LIST)
+   {
+   *clause_is_not_null = (nulltest->nulltesttype == IS_NOT_NULL);
+   return PARTCLAUSE_MATCH_NULLNESS;
+   }
+   else

Since the if block ends with return, the 'else' is not needed - else block
can be indented to the left.

get_min_and_max_off(): I think get_min_and_max_offset as method name would
be more informative.

+   Assert(0 == partition_lbound_datum_cmp(partsupfunc, partcollation,
+  boundinfo->datums[off],
+  boundinfo->isnulls[off],
+  values, isnulls, nvalues));

If the 'while (off >= 1)' loop exits without modifying off, is the above
assertion always true (can boundinfo->datums[off] be accessed without
checking bound) ?

Cheers


Re: Assorted improvements in pg_dump

2021-10-22 Thread Hans Buschmann
Hello Tom!

I noticed you are improving pg_dump just now.

Some time ago I experimented with a customer database dump in parallel 
directory mode (-F directory -j 2-4).

I noticed it took quite long to complete.

Further investigation showed that in this mode with multiple jobs the tables 
are processed in decreasing size order, which makes sense to avoid a long tail 
of a big table in one of the jobs prolonging overall dump time.

Exactly one table took very long, but seemed to be of moderate size.

But the size determination fails to consider the size of TOAST tables, and 
this table had a big associated TOAST table from bytea column(s).
Even after an ANALYZE at load time, there was no size information for the 
TOAST table in the catalogs.

I thought of the following alternatives to ameliorate:

1. Using pg_table_size() function in the catalog query
Pos: This reflects the correct size of every relation
Neg: This goes out to disk and may have a huge impact on databases with very 
many tables

2. Teaching vacuum to set the toast-table size like it sets it on normal tables

3. Have a command/function for occasionally setting the (approximate) size of 
toast tables 

I think, with further work under way (not yet ready), pg_dump can really 
profit from parallel/non-compressing mode, especially considering the huge 
amounts of bytea/blob/string data in many big customer scenarios.

Thoughts?

Hans Buschmann




Re: [Bug] Logical Replication failing if the DateStyle is different in Publisher & Subscriber

2021-10-22 Thread Japin Li

On Fri, 22 Oct 2021 at 15:00, Sadhuprasad Patro  wrote:
> On Fri, Oct 22, 2021 at 8:07 AM Japin Li  wrote:
>>
>>
>> On Fri, 22 Oct 2021 at 08:26, Masahiko Sawada  wrote:
>> > On Thu, Oct 21, 2021 at 11:18 PM Japin Li  wrote:
>> >>
>> >> How it breaks?
>> >
>> > I don't know the real case but for example, if an application gets
>> > changes via pg_recvlogical with a decoding plugin (say wal2json) from
>> > the database whose DateStyle setting is "SQL, MDY", it expects that
>> > the date values in the streamed data are in the style of "ISO, MDY".
>> > But with this change, it will get date values in the style of "ISO"
>> > which could lead to a parse error in the application.
>> >
>> >>  The user also can specify the "options" to get date data
>> >> in the style which they are wanted. Right?
>> >
>> > Right. But doesn't it mean breaking the compatibility?
>> >
>>
>> Yeah, it might break the compatibility.
>>
>> In conclusion, this bug has two ways to fix.
>>
>> 1. Set the parameters on the publisher; this might break the compatibility.
>
> Is it not possible to set the parameter on publisher as "ISO, MDY" or
> "ISO, YMD", instead of only "ISO"?
> DateStyle includes both, so we may set the parameter with date format...
>
>> 2. Set the parameters on the subscriber. In my first patch, I tried to set
>>    the parameters after establishing the connection, but that leads to more
>>    network round trips. We can instead set the parameters when connecting to
>>    the walsender using "options".
>>
>> For the second way, should we set the parameters the same as the
>> subscriber's, or use fixed parameters (e.g. datestyle = "ISO") like
>> postgres_fdw's set_transmission_modes()?
>>
>> Any thoughts?
>
> IMO, setting the parameter value same as the subscriber is better. It
> is always possible that we can set any datestyle in the plugins
> itself...
>
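
To make the ambiguity under discussion concrete: the same SQL-style date text yields different dates depending on whether the reader assumes MDY or DMY field order, which is why forcing ISO output on the connection sidesteps the problem. A minimal sketch with a hypothetical helper, not PostgreSQL's actual date parser:

```c
#include <assert.h>
#include <stdio.h>

typedef struct { int year, month, day; } SimpleDate;

/* Parse "a/b/yyyy" under MDY (mdy != 0) or DMY field order. */
SimpleDate
parse_sql_date(const char *s, int mdy)
{
	int			a = 0, b = 0, y = 0;
	SimpleDate	d = {0, 0, 0};

	if (sscanf(s, "%d/%d/%d", &a, &b, &y) == 3)
	{
		d.year = y;
		d.month = mdy ? a : b;
		d.day = mdy ? b : a;
	}
	return d;
}
```

An ISO string like 2021-05-07 has only one reading, so with ISO output no DateStyle agreement between publisher and subscriber is needed.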

Attached is the v5 patch.  It sets the datestyle, intervalstyle and
extra_float_digits parameters when we connect to the publisher, which
avoids the extra network round trips (compared with the first patch).

OTOH, the patch uses the subscriber's parameters as connection parameters,
which is more complex.  If we instead use fixed parameters like postgres_fdw's
set_transmission_modes(), the code will be simpler [1].


[1]
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 5c6e56a5b2..0d03edd39f 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -128,8 +128,8 @@ libpqrcv_connect(const char *conninfo, bool logical, const char *appname,
 {
 	WalReceiverConn *conn;
 	PostgresPollingStatusType status;
-	const char *keys[5];
-	const char *vals[5];
+	const char *keys[6];
+	const char *vals[6];
 	int			i = 0;
 
 	/*
@@ -155,6 +155,8 @@ libpqrcv_connect(const char *conninfo, bool logical, const char *appname,
 	{
 		keys[++i] = "client_encoding";
 		vals[i] = GetDatabaseEncodingName();
+		keys[++i] = "options";
+		vals[i] = "-c datestyle=ISO,\\ YMD -c intervalstyle=postgres extra_float_digits=3";
 	}
 	keys[++i] = NULL;
 	vals[i] = NULL;

--
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.

diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 5c6e56a5b2..c1fb4fd3d4 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -30,6 +30,7 @@
 #include "pqexpbuffer.h"
 #include "replication/walreceiver.h"
 #include "utils/builtins.h"
+#include "utils/guc.h"
 #include "utils/memutils.h"
 #include "utils/pg_lsn.h"
 #include "utils/tuplestore.h"
@@ -82,6 +83,8 @@ static WalRcvExecResult *libpqrcv_exec(WalReceiverConn *conn,
 	   const int nRetTypes,
 	   const Oid *retTypes);
 static void libpqrcv_disconnect(WalReceiverConn *conn);
+static char *add_backslash(const char *value);
+static char *get_transmission_modes(void);
 
 static WalReceiverFunctionsType PQWalReceiverFunctions = {
 	libpqrcv_connect,
@@ -128,9 +131,10 @@ libpqrcv_connect(const char *conninfo, bool logical, const char *appname,
 {
 	WalReceiverConn *conn;
 	PostgresPollingStatusType status;
-	const char *keys[5];
-	const char *vals[5];
+	const char *keys[6];
+	const char *vals[6];
 	int			i = 0;
+	char	   *options;
 
 	/*
 	 * We use the expand_dbname parameter to process the connection string (or
@@ -155,6 +159,10 @@ libpqrcv_connect(const char *conninfo, bool logical, const char *appname,
 	{
 		keys[++i] = "client_encoding";
 		vals[i] = GetDatabaseEncodingName();
+
+		options = get_transmission_modes();
+		keys[++i] = "options";
+		vals[i] = options;
 	}
 	keys[++i] = NULL;
 	vals[i] = NULL;
@@ -164,6 +172

Re: [Bug] Logical Replication failing if the DateStyle is different in Publisher & Subscriber

2021-10-22 Thread Tom Lane
Japin Li  writes:
> Attach v5 patch.  This patch set the datestyle, intervalstyle and
> extra_float_digits parameters when we connect to publisher, this can
> avoid the network round trips (compare with the first patch).

You could make it a little less confusing by not insisting on a
space in the datestyle.  This should work fine:

    vals[i] = "-c datestyle=ISO,YMD -c intervalstyle=postgres extra_float_digits=3";

Also, I think some comments would be appropriate.

I don't see any value whatsoever in the more complicated version
of the patch.  It's just more code to maintain and more things
to go wrong.  And not only at our level, but the DBA's too.
What if the subscriber and publisher are of different PG versions
and have different ideas of the valid values of these settings?

regards, tom lane




Re: should we allow users with a predefined role to access pg_backend_memory_contexts view and pg_log_backend_memory_contexts function?

2021-10-22 Thread Bossart, Nathan
On 10/21/21, 6:51 PM, "Bharath Rupireddy" 
 wrote:
> Here's v3 for further review.

I've marked this as ready-for-committer.  The only other feedback I
would offer is nitpicking at the test code to clean it up a little
bit, but I don't think it is necessary to block on that.

Nathan



Re: Experimenting with hash tables inside pg_dump

2021-10-22 Thread Tom Lane
Andres Freund  writes:
>> On 2021-10-21 18:27:25 -0400, Tom Lane wrote:
>>> (a) the executable size increases by a few KB --- apparently, even
>>> the minimum subset of simplehash.h's functionality is code-wasteful.

> If I prevent the compiler from inlining findObjectByCatalogId() in all the
> find*ByOid() routines, your version is smaller than master even without other
> changes.

Hmm ... seems to depend a lot on which compiler you use.

I was originally looking at it with gcc 8.4.1 (RHEL8 default),
x86_64.  On that, adding pg_noinline to findObjectByCatalogId
helps a little, but it's still 3.6K bigger than HEAD.

I then tried gcc 11.2.1/x86_64, finding that the patch adds
about 2K and pg_noinline makes no difference.

I also tried it on Apple's clang 13.0.0, both x86_64 and ARM
versions.  On that, the change seems to be a wash or slightly
smaller, with maybe a little benefit from pg_noinline but not
much.  It's hard to tell for sure because size(1) seems to be
rounding off to a page boundary on that platform.

Anyway, these are all sub-one-percent changes in the code
size, so probably we should not sweat that much about it.
I'm kind of leaning now towards pushing the patch, just
on the grounds that getting rid of all those single-purpose
index arrays (and likely future need for more of them)
is worth it from a maintenance perspective.

regards, tom lane




Re: [Bug] Logical Replication failing if the DateStyle is different in Publisher & Subscriber

2021-10-22 Thread Japin Li

On Sat, 23 Oct 2021 at 00:55, Tom Lane  wrote:
> Japin Li  writes:
>> Attach v5 patch.  This patch set the datestyle, intervalstyle and
>> extra_float_digits parameters when we connect to publisher, this can
>> avoid the network round trips (compare with the first patch).
>
> You could make it a little less confusing by not insisting on a
> space in the datestyle.  This should work fine:
>
>   vals[i] = "-c datestyle=ISO,YMD -c intervalstyle=postgres extra_float_digits=3";
>

Oh.  My apologies.  I tried this style before and found that "ISO," was
rejected, so I added the backslash; but it seems that was because my
environment wasn't cleaned up.

Fixed.

> Also, I think some comments would be appropriate.
>

Add comments for it.

> I don't see any value whatsoever in the more complicated version
> of the patch.  It's just more code to maintain and more things
> to go wrong.  And not only at our level, but the DBA's too.

Agreed.

> What if the subscriber and publisher are of different PG versions
> and have different ideas of the valid values of these settings?
>

Sorry, I'm a bit confused.  Do you mean we should provide a choice for users
to set those parameters when establishing logical replication?

--
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.

diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 5c6e56a5b2..2c1a598300 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -128,8 +128,8 @@ libpqrcv_connect(const char *conninfo, bool logical, const char *appname,
 {
 	WalReceiverConn *conn;
 	PostgresPollingStatusType status;
-	const char *keys[5];
-	const char *vals[5];
+	const char *keys[6];
+	const char *vals[6];
 	int			i = 0;
 
 	/*
@@ -155,6 +155,13 @@ libpqrcv_connect(const char *conninfo, bool logical, const char *appname,
 	{
 		keys[++i] = "client_encoding";
 		vals[i] = GetDatabaseEncodingName();
+
+		/*
+		 * Force assorted GUC parameters to settings that ensure that we'll output
+		 * data values in a form that is unambiguous to the publisher.
+		 */
+		keys[++i] = "options";
+		vals[i] = "-c datestyle=ISO,YMD -c intervalstyle=postgres -c extra_float_digits=3";
 	}
 	keys[++i] = NULL;
 	vals[i] = NULL;
diff --git a/src/test/subscription/t/100_bugs.pl b/src/test/subscription/t/100_bugs.pl
index baa4a90771..d92ab60d86 100644
--- a/src/test/subscription/t/100_bugs.pl
+++ b/src/test/subscription/t/100_bugs.pl
@@ -6,7 +6,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 5;
+use Test::More tests => 7;
 
 # Bug #15114
 
@@ -224,3 +224,69 @@ $node_sub->safe_psql('postgres', "DROP TABLE tab1");
 $node_pub->stop('fast');
 $node_pub_sub->stop('fast');
 $node_sub->stop('fast');
+
+# Verify different datestyle between publisher and subscriber.
+$node_publisher = PostgresNode->new('datestyle_publisher');
+$node_publisher->init(allows_streaming => 'logical');
+$node_publisher->append_conf('postgresql.conf',
+	"datestyle = 'SQL, MDY'");
+$node_publisher->append_conf('postgresql.conf',
+	"extra_float_digits = '-4'");
+$node_publisher->start;
+
+$node_subscriber = PostgresNode->new('datestyle_subscriber');
+$node_subscriber->init(allows_streaming => 'logical');
+$node_subscriber->append_conf('postgresql.conf',
+	"datestyle = 'SQL, DMY'");
+$node_subscriber->start;
+
+# Table for datestyle
+$node_publisher->safe_psql('postgres',
+	"CREATE TABLE tab_rep(a date)");
+$node_subscriber->safe_psql('postgres',
+	"CREATE TABLE tab_rep(a date)");
+
+# Table for extra_float_digits
+$node_publisher->safe_psql('postgres',
+	"CREATE TABLE flt_rep(a real, d double precision)");
+$node_subscriber->safe_psql('postgres',
+	"CREATE TABLE flt_rep(a real, d double precision)");
+
+# Setup logical replication
+my $node_publisher_connstr = $node_publisher->connstr . ' dbname=postgres';
+$node_publisher->safe_psql('postgres',
+	"CREATE PUBLICATION tab_pub FOR ALL TABLES");
+$node_subscriber->safe_psql('postgres',
+	"CREATE SUBSCRIPTION tab_sub CONNECTION '$node_publisher_connstr' PUBLICATION tab_pub");
+
+$node_publisher->safe_psql('postgres',
+	"INSERT INTO tab_rep VALUES ('07-18-2021'), ('05-15-2018')");
+
+$node_publisher->wait_for_catchup('tab_sub');
+
+my $result = $node_subscriber->safe_psql('postgres',
+	"SELECT count(*) FROM tab_rep");
+is($result, qq(2), 'failed to replication date from different datestyle');
+
+$node_publisher->safe_psql('postgres',
+	"INSERT INTO flt_rep VALUES (1.2121323, 32.32321232132434)");
+
+$node_publisher->wait_for_catchup('tab_sub');
+
+$result = $node_subscriber->safe_psql('postgres',
+	"SELECT a, d FROM flt_rep");
+is($result, qq(1.2121323|32.32321232132434),
+	'failed to replication floating-point values');
+
+# Clean up the tables on both publisher and subscriber as we don't need them
+$node_publisher->safe_psql('postgres', 'DROP TABLE tab_rep');
+$node_subscr

Re: parallelizing the archiver

2021-10-22 Thread Bossart, Nathan
On 10/22/21, 7:43 AM, "Magnus Hagander"  wrote:
> I still like the idea of loading the library via a special
> parameter, archive_library or such.

My first attempt [0] added a GUC like this, so I can speak to some of
the interesting design decisions that follow.

The simplest thing we could do would be to add the archive_library GUC
and to load that library just as if it were listed at the end of
shared_preload_libraries.  This would mean that the archive library
could be specified in either GUC, and there would effectively be no
difference between the two.

The next thing we could consider doing is adding a new boolean called
process_archive_library_in_progress, which would be analogous to
process_shared_preload_libraries_in_progress.  If a library is loaded
from the archive_library GUC, its _PG_init() will be called with
process_archive_library_in_progress set.  This also means that if a
library is specified in both shared_preload_libraries and
archive_library, we'd call its _PG_init() twice.  The library could
then branch based on whether
process_shared_preload_libraries_in_progress or
process_archive_library_in_progress was set.

Another approach would be to add a new initialization function (e.g.,
PG_archive_init()) that would be used if the library is being loaded
from archive_library.  Like before, you can use the library for both
shared_preload_libraries and archive_library, but your initialization
logic would be expected to go in separate functions.  However, there
still wouldn't be anything forcing that.  A library could still break
the rules and do everything in _PG_init() and be loaded via
shared_preload_libraries.

One more thing we could do is to discover the relevant symbols for
archiving in the library loading function.  Rather than expecting the
initialization function to set the hook correctly, we'd just look up
the _PG_archive() function during loading.  Again, a library could
probably still break the rules and do everything in
_PG_init()/shared_preload_libraries, but there would at least be a
nicer interface available.

I believe the main drawbacks of going down this path are the
additional complexity in the backend and the slippery slope of adding
all kinds of new GUCs in the future.  My original patch also tried to
do something similar for some other shell command GUCs
(archive_cleanup_command, restore_command, and recovery_end_command).
While I'm going to try to keep this focused on archive_command for
now, presumably we'd eventually want the ability to use hooks for all
of these things.  I don't know if we really want to incur a new GUC
for every single one of these.  To be clear, I'm not against adding a
GUC if it seems like the right thing to do.  I just want to make sure
we are aware of the tradeoffs compared to a simple
shared_preload_libraries approach with its terrible UX.

> Wouldn't that be better as a callback into the module? So that
> shell_archive would implement the check for XLogArchiveCommand. Then
> another third party module can make its own decision on what to
> check. And PGarchive would then be a struct that holds a function
> pointer to the archive command and another function pointer to the
> isenabled command? (I think having a struct for it would be useful
> regardless -- for possible future extensions with more API points).

+1.  This crossed my mind, too.  I'll add this in the next revision.

Nathan

[0] https://postgr.es/m/E9035E94-EC76-436E-B6C9-1C03FBD8EF54%40amazon.com



Re: Make message at end-of-recovery less scary.

2021-10-22 Thread Bossart, Nathan
On 3/4/21, 10:50 PM, "Kyotaro Horiguchi"  wrote:
> As the result, the following messages are emitted with the attached.

I'd like to voice my support for this effort, and I intend to help
review the patch.  It looks like the latest patch no longer applies,
so I've marked the commitfest entry [0] as waiting-on-author.

Nathan

[0] https://commitfest.postgresql.org/35/2490/



Re: [Bug] Logical Replication failing if the DateStyle is different in Publisher & Subscriber

2021-10-22 Thread Tom Lane
Japin Li  writes:
> On Sat, 23 Oct 2021 at 00:55, Tom Lane  wrote:
>> What if the subscriber and publisher are of different PG versions
>> and have different ideas of the valid values of these settings?

> Sorry, I'm a bit confused.  Do you mean we should provide a choice for users
> to set those parameters when establishing logical replication?

No, I'm just pointing out that pushing the subscriber's settings
to the publisher wouldn't be guaranteed to work.  As long as we
use curated values that we know do what we want on all versions,
I think we're okay.

regards, tom lane




Re: [PATCH] Fix memory corruption in pg_shdepend.c

2021-10-22 Thread Daniel Gustafsson
> On 22 Oct 2021, at 15:38, Tom Lane  wrote:
> 
> Michael Paquier  writes:
>> On Fri, Oct 22, 2021 at 10:48:57AM +0300, Aleksander Alekseev wrote:
>>> ... where the slot is allocated with palloc0. The assumption that
>>> MakeSingleTupleTableSlot() returns valid TupleTableSlot* with zeroed
>>> tts_isnull[] seems reasonable, no?
> 
>> Yes, I don't see any need to do something more here.
> 
> That assumption is exactly what I'm objecting to.  I don't think
> we make it in other places, and I don't like making it here.
> (By "here" I mean all of e3931d0, because it made the same omission
> in several places.)

The attached fixes the two I spotted; are there any I missed?
Regardless of whether we want to change the API (as discussed elsewhere here and
in a new thread), something like the attached should be done first, and in 14 I
think.

--
Daniel Gustafsson   https://vmware.com/



tts_isnull_zeroed.diff
Description: Binary data


Re: Experimenting with hash tables inside pg_dump

2021-10-22 Thread Andres Freund
Hi, 

On October 22, 2021 8:54:13 AM PDT, Tom Lane  wrote:
>Andres Freund  writes:
>> On 2021-10-22 10:53:31 -0400, Tom Lane wrote:
>>> I'm skeptical of that, mainly because it doesn't work in old servers,
>
>> I think we can address that, if we think it's overall a promising approach to
>> pursue. E.g. if we don't need the indexes, we can make it = ANY().
>
>Hmm ... yeah, I guess we could get away with that.  It might not scale
>as nicely to a huge database, but probably dumping a huge database
>from an ancient server isn't all that interesting.

I think compared to the overhead of locking that many tables and sending O(N) 
queries it shouldn't be a huge factor.

One thing that looks like it might be worth doing, and not hard, is to use 
single-row mode. No need to materialize all that data twice in memory.


At a later stage it might be worth sending the array separately as a parameter. 
Perhaps even binary encoded.


>I'm inclined to think that it could be sane to make getTableAttrs
>and getIndexes use this style, but we probably still want functions
>and such to use per-object queries.  In those other catalogs there
>are many built-in objects that we don't really care about.  The
>prepared-queries hack I was working on last night is probably plenty
>good enough there, and it's a much less invasive patch.

Yes, that seems reasonable. I think the triggers query would benefit from the 
batch approach though - I see that taking a long time in aggregate on a test 
database with many tables I had around (partially due to the self join), and we 
already materialize it.


>Were you planning to pursue this further, or did you want me to?

It seems too nice an improvement to drop on the floor. That said, I don't 
really have the mental bandwidth to pursue this beyond the POC stage - it 
seemed complicated enough that a suggestion accompanied by a prototype was a 
good idea. So I'd be happy for you to incorporate this into your other changes.


>I'd want to layer it on top of the work I did at [1], else there's
>going to be lots of merge conflicts.

Makes sense. Even if nobody else were doing anything in the area I'd probably 
want to split it into one commit creating the query once, and then separately 
implement the batching.

Regards,

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: [PATCH] Fix memory corruption in pg_shdepend.c

2021-10-22 Thread Tom Lane
Daniel Gustafsson  writes:
>> On 22 Oct 2021, at 15:38, Tom Lane  wrote:
>> (By "here" I mean all of e3931d0, because it made the same omission
>> in several places.)

> The attached fixes the two I spotted; are there any I missed?

Ah, you're right, InsertPgAttributeTuples is the only other spot in
that patch that's actually touching slots.  I'd skimmed it a little
too quickly.

regards, tom lane




Re: Experimenting with hash tables inside pg_dump

2021-10-22 Thread Tom Lane
Andres Freund  writes:
> On October 22, 2021 8:54:13 AM PDT, Tom Lane  wrote:
>> Were you planning to pursue this further, or did you want me to?

> It seems too nice an improvement to drop on the floor. That said, I don't 
> really have the mental bandwidth to pursue this beyond the POC stage - it 
> seemed complicated enough that a suggestion accompanied by a prototype was a 
> good idea. So I'd be happy for you to incorporate this into your other 
> changes.

Cool, I'll see what I can do with it, as long as I'm poking around
in the area.

regards, tom lane




Re: Experimenting with hash tables inside pg_dump

2021-10-22 Thread Andres Freund
Hi, 

On October 22, 2021 10:32:30 AM PDT, Tom Lane  wrote:
>Andres Freund  writes:
>>> On 2021-10-21 18:27:25 -0400, Tom Lane wrote:
 (a) the executable size increases by a few KB --- apparently, even
 the minimum subset of simplehash.h's functionality is code-wasteful.
>
>> If I prevent the compiler from inlining findObjectByCatalogId() in all the
>> find*ByOid() routines, your version is smaller than master even without other
>> changes.
>
>Hmm ... seems to depend a lot on which compiler you use.

Inline heuristics change a lot over time, so that'd make sense.

I see some win by marking pg_log_error cold. That might be useful more 
generally too.


Which made me look at the code invoking it from simplehash. I think the patch 
that made simplehash work in frontend code isn't quite right, because 
pg_log_error() returns...


Wonder if we should mark simplehash's grow as noinline? Even with a single 
caller it seems better to not inline it to remove register allocator pressure.


>Anyway, these are all sub-one-percent changes in the code
>size, so probably we should not sweat that much about it.
>I'm kind of leaning now towards pushing the patch, just
>on the grounds that getting rid of all those single-purpose
>index arrays (and likely future need for more of them)
>is worth it from a maintenance perspective.

+1

The only thought I had wrt the patch is that I'd always create the hash table. 
That way the related branches can be removed, which is a win code size wise (as 
well as speed presumably, but I think we're far away from that mattering).


This type of code is where I most wish for a language with proper generic data 
types/containers...

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: [PATCH] Make ENOSPC not fatal in semaphore creation

2021-10-22 Thread Mikhail
On Mon, Oct 18, 2021 at 10:07:40AM +1300, Thomas Munro wrote:
> On Mon, Oct 18, 2021 at 4:49 AM Mikhail  wrote:
> > The logic works - the initial call to semget() in
> > InternalIpcSemaphoreCreate returns -1 and errno is set to ENOSPC - I
> > tested the patch on OpenBSD 7.0, it successfully recycles sem's after
> > previous "pkill -6 postgres". Verified it with 'ipcs -s'.
> 
> Since you mentioned OpenBSD, what do you think of the idea of making
> named POSIX semas the default on that platform?  You can't run out of
> those practically speaking, but then you get lots of little memory
> mappings (from memory, at least it does close the fd for each one,
> unlike some other OSes where we wouldn't want to use this technique).
> Trivial patch:
> 
> https://www.postgresql.org/message-id/CA%2BhUKGJVSjiDjbJpHwUrvA1TikFnJRfyJanrHofAWhnqcDJayQ%40mail.gmail.com
> 
> No strong opinion on the tradeoffs here, as I'm not an OpenBSD user,
> but it's something I think about whenever testing portability stuff
> there and having to adjust the relevant sysctls.
> 
> Note: The best kind would be *unnamed* POSIX semas, where we get to
> control their placement in existing memory; that's what we do on Linux
> and FreeBSD.  They weren't supported on OpenBSD last time we checked:
> it rejects requests for shared ones.  I wonder if someone could
> implement them with just a few lines of user space code, using atomic
> counters and futex() for waiting.

Hello, sorry for not replying earlier - I was able to think about and
test the patch only on the weekend.

I totally agree with your approach; in a conversation with one of the
OpenBSD developers, he supported using sem_open(), because most ports
use it and consistency is desirable across our ports tree. It looks like
PostgreSQL was the only port to use semget().

Switching to sem_open() also looks much safer than patching sysv_sema.c
for corner ENOSPC case as Tom already mentioned.

In your patch I've removed testing for 5.x versions, because official
releases are supported only for one year, no need to worry about them.
The patch is tested with 'make installcheck'; I can also confirm that
'ipcs' shows that no semaphores are used, and the server starts normally
after 'pkill -6 postgres' with the default semmns sysctl, which was the
original motivation for this work.


diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml
index d74d1ed7af..2dfea0662b 100644
--- a/doc/src/sgml/runtime.sgml
+++ b/doc/src/sgml/runtime.sgml
@@ -998,21 +998,7 @@ psql: error: connection to server on socket 
"/tmp/.s.PGSQL.5432" failed: No such

 The default shared memory settings are usually good enough, unless
 you have set shared_memory_type to 
sysv.
-You will usually want to
-increase kern.seminfo.semmni
-and kern.seminfo.semmns,
-as OpenBSD's default settings
-for these are uncomfortably small.
-   
-
-   
-IPC parameters can be adjusted using sysctl,
-for example:
-
-# sysctl kern.seminfo.semmni=100
-
-To make these settings persist over reboots, modify
-/etc/sysctl.conf.
+System V semaphores are not used on this platform.

 
   
diff --git a/src/template/openbsd b/src/template/openbsd
index 365268c489..41221af382 100644
--- a/src/template/openbsd
+++ b/src/template/openbsd
@@ -2,3 +2,7 @@
 
 # Extra CFLAGS for code that will go into a shared library
 CFLAGS_SL="-fPIC -DPIC"
+
+# OpenBSD 5.5 (2014) gained named POSIX semaphores.  They work out of the box
+# without changing any sysctl settings, unlike System V semaphores.
+USE_NAMED_POSIX_SEMAPHORES=1




Re: Experimenting with hash tables inside pg_dump

2021-10-22 Thread Tom Lane
Andres Freund  writes:
> Which made me look at the code invoking it from simplehash. I think the patch 
> that made simplehash work in frontend code isn't quite right, because 
> pg_log_error() returns...

Indeed, that's broken.  I guess we want pg_log_fatal then exit(1).

> Wonder if we should mark simplehash's grow as noinline? Even with a single 
> caller it seems better to not inline it to remove register allocator pressure.

Seems plausible --- you want me to go change that?

> The only thought I had wrt the patch is that I'd always create the hash
> table.

That'd require adding an explicit init function and figuring out where to
call it, which we could do but I didn't (and don't) think it's worth the
trouble.  One more branch here isn't going to matter, especially given
that we can't even measure the presumed macro improvement.

regards, tom lane




Re: [PATCH] Make ENOSPC not fatal in semaphore creation

2021-10-22 Thread Tom Lane
Mikhail  writes:
> In your patch I've removed testing for 5.x versions, because official
> releases are supported only for one year, no need to worry about them.

Official support or no, we have OpenBSD 5.9 in our buildfarm, so
ignoring the case isn't going to fly.

regards, tom lane




Re: [PATCH] Make ENOSPC not fatal in semaphore creation

2021-10-22 Thread Thomas Munro
On Sat, Oct 23, 2021 at 8:43 AM Tom Lane  wrote:
> Mikhail  writes:
> > In your patch I've removed testing for 5.x versions, because official
> > releases are supported only for one year, no need to worry about them.
>
> Official support or no, we have OpenBSD 5.9 in our buildfarm, so
> ignoring the case isn't going to fly.

It was a test for < 5.5, so that aspect's OK.




Re: [PATCH] Make ENOSPC not fatal in semaphore creation

2021-10-22 Thread Mikhail
On Fri, Oct 22, 2021 at 03:43:00PM -0400, Tom Lane wrote:
> Mikhail  writes:
> > In your patch I've removed testing for 5.x versions, because official
> > releases are supported only for one year, no need to worry about them.
> 
> Official support or no, we have OpenBSD 5.9 in our buildfarm, so
> ignoring the case isn't going to fly.

5.9 has support for unnamed POSIX semas. Do you think a new machine with
OpenBSD <5.5 (when unnamed POSIX semas were introduced) could appear in
the buildfarm or be used by a real customer?

I have no objection to testing "openbsd5.[01234]" and using SysV semas
there, and I can redo and test the patch, but isn't that overly cautious?




Re: [PATCH] Make ENOSPC not fatal in semaphore creation

2021-10-22 Thread Tom Lane
Mikhail  writes:
> On Fri, Oct 22, 2021 at 03:43:00PM -0400, Tom Lane wrote:
>> Official support or no, we have OpenBSD 5.9 in our buildfarm, so
>> ignoring the case isn't going to fly.

> 5.9 has support for unnamed POSIX semas. Do you think new machine with
> OpenBSD <5.5 (when unnamed POSIX semas were introduced) can appear in
> buildfarm or be used by real customer?

Nah, I misunderstood you to say that 5.9 would also be affected.

regards, tom lane




Re: pg_dump handling of ALTER DEFAULT PRIVILEGES IN SCHEMA

2021-10-22 Thread Bossart, Nathan
On 9/5/21, 11:33 PM, "Bossart, Nathan"  wrote:
> Attached is an attempt at cleaning the patch up and adding an
> informative comment and a test case.

For future reference, there was another thread for this bug [0], and a
fix was committed [1].

Nathan

[0] 
https://postgr.es/m/CAA3qoJnr2%2B1dVJObNtfec%3DqW4Z0nz%3DA9%2Br5bZKoTSy5RDjskMw%40mail.gmail.com
[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=2acc84c



Re: add retry mechanism for achieving recovery target before emitting FATA error "recovery ended before configured recovery target was reached"

2021-10-22 Thread Jeff Davis
On Fri, 2021-10-22 at 15:34 +0530, Bharath Rupireddy wrote:
> If the suggestion is to have the wait and retry logic embedded into
> the user-written restore_command, IMHO, it's not a good idea as the
> restore_command is external to the core PG and the FATAL error
> "recovery ended before configured recovery target was reached" is an
> internal thing. 

It seems likely that you'd want to tweak the exact behavior for the
given system. For instance, if the files are making some progress, and
you can estimate that in 2 more minutes everything will be fine, then
you may be more willing to wait those two minutes. But if no progress
has happened since recovery began 15 minutes ago, you may want to fail
immediately.

All of this nuance would be better captured in a specialized script
than a generic timeout in the server code.

What do you want to do after the timeout happens? If you want to issue
a WARNING instead of failing outright, perhaps that makes sense for
exploratory PITR cases. That could be a simple boolean GUC without
needing to introduce the timeout logic into the server.

I think it's an interesting point that it can be hard to choose a
reasonable recovery target if the system is completely down. We could
use some better tooling or metadata around the lsns, xids or timestamp
ranges available in a pg_wal directory or an archive. Even better would
be to see the available named restore points. This would make it easier
to calculate how long recovery might take for a given restore point, or
whether it's not going to work at all because there's not enough WAL.

Regards,
Jeff Davis






Re: Experimenting with hash tables inside pg_dump

2021-10-22 Thread Tom Lane
I wrote:
> Andres Freund  writes:
>> Wonder if we should mark simplehash's grow as noinline? Even with a single 
>> caller it seems better to not inline it to remove register allocator 
>> pressure.

> Seems plausible --- you want me to go change that?

Hmm, harder than it sounds.  If I remove "inline" from SH_SCOPE then
the compiler complains about unreferenced static functions, while
if I leave it there, then adding pg_noinline causes a complaint about
conflicting options.  Seems like we need a less quick-and-dirty
approach to dealing with unnecessary simplehash support functions.

regards, tom lane




Re: [PATCH] Fix memory corruption in pg_shdepend.c

2021-10-22 Thread Daniel Gustafsson
> On 22 Oct 2021, at 20:34, Tom Lane  wrote:
> 
> Daniel Gustafsson  writes:
>>> On 22 Oct 2021, at 15:38, Tom Lane  wrote:
>>> (By "here" I mean all of e3931d0, because it made the same omission
>>> in several places.)
> 
>> The attached fixes the the two ones I spotted, are there any I missed?
> 
> Ah, you're right, InsertPgAttributeTuples is the only other spot in
> that patch that's actually touching slots.  I'd skimmed it a little
> too quickly.

Thanks for confirming, unless there are objections I'll apply the fix to master
and backpatch to 14.

--
Daniel Gustafsson   https://vmware.com/





pg_dump versus ancient server versions

2021-10-22 Thread Tom Lane
While doing some desultory testing, I realized that the commit
I just pushed (92316a458) broke pg_dump against 8.0 servers:

$ pg_dump -p5480 -s regression
pg_dump: error: schema with OID 11 does not exist

The reason turns out to be something I'd long forgotten about: except
for the few "bootstrap" catalogs, our system catalogs didn't use to
have fixed OIDs.  That changed at 7c13781ee, but 8.0 predates that.
So when pg_dump reads a catalog on 8.0, it gets some weird number for
"tableoid", and the logic I just put into common.c's findNamespaceByOid
et al fails to find the resulting DumpableObjects.

So my first thought was just to revert 92316a458 and give up on it as
a bad idea.  However ... does anyone actually still care about being
able to dump from such ancient servers?  In addition to this issue,
I'm thinking of the discussion at [1] about wanting to use unnest()
in pg_dump, and of what we would need to do instead in pre-8.4 servers
that lack that.  Maybe it'd be better to move up pg_dump's minimum
supported server version to 8.4 or 9.0, and along the way whack a
few more lines of its backward-compatibility hacks.  If there is
anyone out there still using an 8.x server, they could use its
own pg_dump whenever they get around to migration.

Another idea would be to ignore "tableoid" and instead use the OIDs
we're expecting, but that's way too ugly for my taste, especially
given the rather thin argument for committing 92316a458 at all.

Anyway, I think the default answer is "revert 92316a458 and keep the
compatibility goalposts where they are".  But I wanted to open up a
discussion to see if anyone likes the other approach better.

regards, tom lane

[1] 
https://www.postgresql.org/message-id/20211022055939.z6fihsm7hdzbjttf%40alap3.anarazel.de




Re: [PATCH] Fix memory corruption in pg_shdepend.c

2021-10-22 Thread Michael Paquier
On Fri, Oct 22, 2021 at 10:49:38PM +0200, Daniel Gustafsson wrote:
> On 22 Oct 2021, at 20:34, Tom Lane  wrote:
>> Ah, you're right, InsertPgAttributeTuples is the only other spot in
>> that patch that's actually touching slots.  I'd skimmed it a little
>> too quickly.
> 
> Thanks for confirming, unless there are objections I'll apply the fix to 
> master
> and backpatch to 14.

Fine by me.  The patch looks OK.
--
Michael




Re: pg_dump versus ancient server versions

2021-10-22 Thread David G. Johnston
On Fri, Oct 22, 2021 at 3:42 PM Tom Lane  wrote:

> Anyway, I think the default answer is "revert 92316a458 and keep the
> compatibility goalposts where they are".  But I wanted to open up a
> discussion to see if anyone likes the other approach better.
>
> [1]
> https://www.postgresql.org/message-id/20211022055939.z6fihsm7hdzbjttf%40alap3.anarazel.de
>
>
I'd rather drop legacy support than revert.  Even if the benefit of
92316a458 is limited to refactoring, the fact that it was committed is enough
for me to feel it is a worthwhile improvement.  It's still yet another five
years before there won't be a supported release that can dump/restore this
- so 20 years for someone to have upgraded without having to go to the (not
that big a) hassle of installing an out-of-support version as a stop-over.

In short, IMO, the bar for this kind of situation should be 10 releases at
most - 5 of which would be in support at the time the patch goes in.  We
don't have to actively drop support of older stuff but anything older
shouldn't be preventing new commits.

David J.


Re: Reserve prefixes for loaded libraries proposal

2021-10-22 Thread Florin Irion
Hi,

Il giorno gio 21 ott 2021 alle ore 08:02 Michael Paquier <
mich...@paquier.xyz> ha scritto:
>
> On Thu, Sep 30, 2021 at 11:54:04PM +0200, Florin Irion wrote:
> > We could also help users get a warning if they set a parameter with the
> > `SET`
> > command. I've seen many cases where users make typos and break things
badly,
> > check the following example:
> >
> > ```
> > postgres=# BEGIN;
> > BEGIN
> > postgres=*# SET plpgsql.no_such_setting = false;
> > SET
> > postgres=*# -- do critical queries taking into account that
> > plpgsql.no_such_setting is false;
> > postgres=*# COMMIT;
> > COMMIT
> > ```
>
> Could you give a more concrete example here?  What kind of critical
> work are you talking about here when using prefixes?  Please note that
> I am not against the idea of improving the user experience in this
> area as that's inconsistent, as you say.
> --
> Michael

Thank you for the interest.

I used `plpgsql` in my example/test because it's something that many of
us know already.

However, for example, pglogical2 has the `pglogical.conflict_resolution`
configuration parameter that can be set to one of the following:

```
error
apply_remote
keep_local
last_update_wins
first_update_wins
```

You can imagine that conflicting queries could have different outcomes
based on this setting.
IMHO there are other settings like this, in other extensions, that can
be critical.

In any case, even setting something that is not critical could still
be important for a user, one example, `pg_stat_statements.track`.

Cheers,
Florin

--
Florin Irion
www.enterprisedb.com


Re: pg_dump versus ancient server versions

2021-10-22 Thread Robert Haas
On Fri, Oct 22, 2021 at 6:42 PM Tom Lane  wrote:
> So my first thought was just to revert 92316a458 and give up on it as
> a bad idea.  However ... does anyone actually still care about being
> able to dump from such ancient servers?

I think I recently heard about an 8.4 server still out there in the
wild, but AFAICR it's been a long time since I've heard about anything
older.

It seems to me that if you're upgrading by a dozen server versions in
one shot, it's not a totally crazy idea that you might want to do it
in steps, or use the pg_dump for the version you have and then hack
the dump. I kind of wonder if there's really any hope of a pain-free
upgrade across that many versions anyway. There are things that can
bite you despite all the work we've put into pg_dump, like having
objects that depend on system objects whose definition has changed
over the years, plus implicit casting differences, operator precedence
changes, => getting deprecated, lots of GUC changes, etc. You are
going to be able to upgrade in the end, but it's probably going to
take some work. So I'm not really sure that giving up pg_dump
compatibility for versions that old is losing as much as it may seem.

Another thing to think about in that regard: how likely is it that
PostgreSQL 7.4 and PostgreSQL 15 both compile and run on the same
operating system? I suspect the answer is "not very." I seem to recall
Greg Stark trying to compile really old versions of PostgreSQL for a
conference talk some years ago, and he got back to a point where it
just became impossible to make work on modern toolchains even with a
decent amount of hackery. One tends to think of C as about as static a
thing as can be, but that's kind of missing the point. On my laptop
for example, my usual configure invocation fails on 7.4 with:

checking for SSL_library_init in -lssl... no
configure: error: library 'ssl' is required for OpenSSL

In fact, I get that same failure on every branch older than 9.2. I
expect I could work around that by disabling SSL or finding an older
version of OpenSSL that works the way those branches expect, but that
might not be the only problem, either.  Now I understand you could
have PostgreSQL 15 on a new box and PostgreSQL 7.x on an ancient one
and connect via the network, and it would in all fairness be cool if
that Just Worked. But I suspect that even if that did happen in the
lab, reality wouldn't often be so kind.

--
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_dump versus ancient server versions

2021-10-22 Thread Tom Lane
"David G. Johnston"  writes:
> On Fri, Oct 22, 2021 at 3:42 PM Tom Lane  wrote:
>> Anyway, I think the default answer is "revert 92316a458 and keep the
>> compatibility goalposts where they are".  But I wanted to open up a
>> discussion to see if anyone likes the other approach better.

> ... IMO, the bar for this kind of situation should be 10 releases at
> most - 5 of which would be in support at the time the patch goes in.  We
> don't have to actively drop support of older stuff but anything older
> shouldn't be preventing new commits.

Yeah.  I checked into when it was that we dropped pre-8.0 support
from pg_dump, and the answer is just about five years ago (64f3524e2).
So moving the bar forward by five releases isn't at all out of line.
8.4 would be eight years past EOL by the time v15 comes out.

One of the arguments for the previous change was that it was getting
very hard to build old releases on modern platforms, thus making it
hard to do any compatibility testing.  I believe the same is starting
to become true of the 8.x releases, though I've not tried personally
to build any of them in some time.  (The executables I'm using for
them date from 2014 or earlier, and have not been recompiled in
subsequent platform upgrades ...)  Anyway it's definitely not free
to continue to support old source server versions.

regards, tom lane




Re: parallelizing the archiver

2021-10-22 Thread Robert Haas
On Fri, Oct 22, 2021 at 1:42 PM Bossart, Nathan  wrote:
> Another approach would be to add a new initialization function (e.g.,
> PG_archive_init()) that would be used if the library is being loaded
> from archive_library.  Like before, you can use the library for both
> shared_preload_libraries and archive_library, but your initialization
> logic would be expected to go in separate functions.  However, there
> still wouldn't be anything forcing that.  A library could still break
> the rules and do everything in _PG_init() and be loaded via
> shared_preload_libraries.

I was imagining something like what logical decoding does. In that
case, you make a _PG_output_plugin_init function and it returns a
table of callbacks. Then the core code invokes those callbacks at the
appropriate times.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_dump versus ancient server versions

2021-10-22 Thread Tom Lane
Robert Haas  writes:
> Another thing to think about in that regard: how likely is it that
> PostgreSQL 7.4 and PostgreSQL 15 both compile and run on the same
> operating system? I suspect the answer is "not very." I seem to recall
> Greg Stark trying to compile really old versions of PostgreSQL for a
> conference talk some years ago, and he got back to a point where it
> just became impossible to make work on modern toolchains even with a
> decent amount of hackery.

Right.  The toolchains keep moving, even if the official language
definition doesn't.  For grins, I just checked out REL8_4_STABLE
on my M1 Mac, and found that it only gets this far:

checking test program... ok
checking whether long int is 64 bits... no
checking whether long long int is 64 bits... no
configure: error: Cannot find a working 64-bit integer type.

which turns out to be down to a configure-script issue we fixed
some years ago, ie using exit() without a prototype:

conftest.c:158:3: error: implicitly declaring library function 'exit' with type 'void (int) __attribute__((noreturn))' [-Werror,-Wimplicit-function-declaration]
  exit(! does_int64_work());
  ^

I notice that the configure script is also selecting some warning
switches that this compiler doesn't much like, plus it doesn't
believe 2.6.x flex is usable.  So that's *at least* three things
that'd have to be hacked even to get to a successful configure run.

Individually such issues are (usually) not very painful, but when
you have to recreate all of them at once it's a daunting project.

So if I had to rebuild 8.4 from scratch right now, I would not be
a happy camper.  That seems like a good argument for not deeming
it to be something we still have to support.

regards, tom lane




Re: XTS cipher mode for cluster file encryption

2021-10-22 Thread Bruce Momjian
On Mon, Oct 18, 2021 at 11:56:03AM -0400, Stephen Frost wrote:
> Greetings,
> 
> * Tomas Vondra (tomas.von...@enterprisedb.com) wrote:
> > On 10/15/21 21:22, Stephen Frost wrote:
> > >Now, to address the concern around re-encrypting a block with the same
> > >key+IV but different data and leaking what parts of the page changed, I
> > >do think we should use the LSN and have it change regularly (including
> > >unlogged tables) but that's just because it's relatively easy for us to
> > >do and means an attacker wouldn't be able to tell what part of the page
> > >changed when the LSN was also changed.  That was also recommended by
> > >NIST and that's a pretty strong endorsement also.
> > 
> > Not sure - it seems a bit weird to force LSN change even in cases that don't
> > generate any WAL. I was not following the encryption thread and maybe it was
> > discussed/rejected there, but I've always imagined we'd have a global nonce
> > generator (similar to a sequence) and we'd store it at the end of each
> > block, or something like that.
> 
> The 'LSN' being referred to here isn't the regular LSN that is
> associated with the WAL but rather the separate FakeLSN counter which we
> already have.  I wasn't suggesting having the regular LSN change in
> cases that don't generate WAL.

Yes, my original patch created dummy WAL records for dummy LSNs but that
is no longer needed with XTS.

> > I'm not very convinced that using the LSN for any of this is a good
> > idea. Something that changes most of the time but not all the time
> > seems more like it could hurt by masking fuzzy thinking more than it
> > helps anything.
> 
> This argument doesn't come across as very strong at all to me,
> particularly when we have explicit recommendations from NIST that having
> the IV vary more is beneficial.  While this would be using the LSN, the
> fact that the LSN changes most of the time but not all of the time isn't
> new and is something we already have to deal with.  I'd think we'd
> address the concern about mis-thinking around how this works by
> providing a README and/or an appropriate set of comments around what's
> being done and why.

Agreed.  I think we would need to document the cases where we re-encrypt
a page with the same LSN, and of course the exposure to write-based attacks.

> > Do we think knowing which 16-byte blocks on an 8k page change would leak
> > useful information?  If so, we should use the LSN and just accept that
> > some cases might leak as described above.  If we don't care, then we can
> > skip the use of the LSN and simplify the patch.
> 
> While there may not be an active attack against PG that leverages such a
> leak, I have a hard time seeing why we would intentionally design this
> in when we have a great option that's directly available to us and
> doesn't cause such a leak with nearly such regularity as not using the
> LSN would, and also follows recommendations of using XTS from NIST.

Agreed.
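For concreteness, the leak in question can be demonstrated with a toy XTS-shaped cipher. The sketch below uses a SHA-256-based stand-in for AES (it is not a real cipher and is encrypt-only, and the tweak derivation is invented for illustration; real XTS uses AES with two keys and a Galois-field multiply per block index):

```python
import hashlib

BLOCK = 16  # XTS operates on 16-byte cipher blocks within a data unit (page)

def prf(key: bytes, data: bytes) -> bytes:
    # Hash-based stand-in for one AES block-encryption call.
    return hashlib.sha256(key + data).digest()[:BLOCK]

def xts_like_encrypt(key: bytes, tweak: bytes, page: bytes) -> bytes:
    # Per 16-byte block j: C_j = E(P_j XOR T_j) XOR T_j, with T_j derived
    # from the page tweak plus the block index.
    out = bytearray()
    for j in range(0, len(page), BLOCK):
        t_j = prf(key, tweak + j.to_bytes(4, "big"))
        p_j = page[j:j + BLOCK]
        masked = bytes(a ^ b for a, b in zip(p_j, t_j))
        out += bytes(a ^ b for a, b in zip(prf(key, masked), t_j))
    return bytes(out)

def page_tweak(block_number: int, lsn: int = 0) -> bytes:
    # The point under discussion: fold the (fake) LSN into the tweak or not.
    return block_number.to_bytes(8, "big") + lsn.to_bytes(8, "big")

key = b"k" * 32
page_v1 = bytes(8192)
page_v2 = bytearray(page_v1)
page_v2[96:112] = b"X" * 16            # modify exactly one 16-byte block

# Same tweak both times (no LSN change): only the changed block's
# ciphertext differs, so an attacker comparing snapshots sees *which*
# 16 bytes of the page changed.
c1 = xts_like_encrypt(key, page_tweak(7), page_v1)
c2 = xts_like_encrypt(key, page_tweak(7), bytes(page_v2))
diff_blocks = [j for j in range(0, 8192, BLOCK) if c1[j:j+BLOCK] != c2[j:j+BLOCK]]
print(diff_blocks)                     # -> [96]

# With an LSN that is bumped on write folded into the tweak, every
# block's tweak changes, so the whole ciphertext differs.
c3 = xts_like_encrypt(key, page_tweak(7, lsn=1234), bytes(page_v2))
diff_all = sum(c2[j:j+BLOCK] != c3[j:j+BLOCK] for j in range(0, 8192, BLOCK))
print(diff_all)                        # -> 512 (all 8192/16 blocks)
```

With a fixed tweak, comparing two snapshots of a page pinpoints the changed 16-byte block; varying the tweak per write hides which part changed, which is the benefit of using the (fake) LSN.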

> > I consider this a checkbox feature and making it too complex will cause
> > it to be rightly rejected.
> 
> Presuming that 'checkbox feature' here means "we need it to please
> $someone but no one will ever use it" or something along those lines,
> this is very clearly not the case and therefore we shouldn't be
> describing it or treating it as such.  Even if the meaning here is
> "there's other ways people could get this capability" the reality is
> that those other methods are simply not always available and in those
> cases, people will choose to not use PostgreSQL.  Nearly every other
> database system which we might compare ourselves to has a solution in
> this area and people actively use those solutions in a lot of
> deployments.

I think people will use this feature, but I called it a 'checkbox
feature' because they usually are not looking for a complex or flexible
feature, but rather something that is simple to set up and effective.

> > And if PostgreSQL is using XTS, there is no difference from dm-crypt.
> > The user can use dm-crypt directly.
> 
> dm-crypt is not always an option and it doesn't actually address the
> threat-model that Tomas brought up here anyway, as it would be below the
> level that the low-privileged OS user would be looking at.  That's not
> the only threat model to consider, but it is one which could potentially
> be addressed by either XTS or AES-GCM-SIV.  There are threat models
> which dm-crypt would address, of course, such as data-at-rest (hard
> drive theft, improper disposal of storage media, backups which don't
> have their own encryption, etc), but, again, dm-crypt isn't always an
> option that is available and so I don't agree that we should throw this
> out just because dm-crypt exists and may be useable in some cases.

I actually think a Postgres integrity-check feature would need to create
an abstraction layer on top of all writes to PGDATA and tablespaces so
the filesystem would look unencrypted to Postgres.

-- 
  Bruce Momjian  https://momjian.us
  EDB

Re: pg_dump versus ancient server versions

2021-10-22 Thread Alvaro Herrera
On 2021-Oct-22, Robert Haas wrote:

> In fact, I get that same failure on every branch older than 9.2. I
> expect I could work around that by disabling SSL or finding an older
> version of OpenSSL that works the way those branches expect, but that
> might not be the only problem, either.

I just tried to build 9.1.  My config line there doesn't have ssl, but I
do get this in the compile stage:

gram.c:69:25: error: conflicting types for ‘base_yylex’
   69 | #define yylex   base_yylex
  | ^~
scan.c:15241:12: note: in expansion of macro ‘yylex’
15241 | extern int yylex \
  |^
In file included from /pgsql/source/REL9_1_STABLE/src/backend/parser/gram.y:60:
/pgsql/source/REL9_1_STABLE/src/include/parser/gramparse.h:66:12: note: 
previous declaration of ‘base_yylex’ was here
   66 | extern int base_yylex(YYSTYPE *lvalp, YYLTYPE *llocp,
  |^~
gram.c:69:25: error: conflicting types for ‘base_yylex’
   69 | #define yylex   base_yylex
  | ^~
scan.c:15244:21: note: in expansion of macro ‘yylex’
15244 | #define YY_DECL int yylex \
  | ^
scan.c:15265:1: note: in expansion of macro ‘YY_DECL’
15265 | YY_DECL
  | ^~~
In file included from /pgsql/source/REL9_1_STABLE/src/backend/parser/gram.y:60:
/pgsql/source/REL9_1_STABLE/src/include/parser/gramparse.h:66:12: note: 
previous declaration of ‘base_yylex’ was here
   66 | extern int base_yylex(YYSTYPE *lvalp, YYLTYPE *llocp,
  |^~
make[3]: *** [../../../src/Makefile.global:655: gram.o] Error 1

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/
"El Maquinismo fue proscrito so pena de cosquilleo hasta la muerte"
(Ijon Tichy en Viajes, Stanislaw Lem)




Re: XTS cipher mode for cluster file encryption

2021-10-22 Thread Bruce Momjian
On Tue, Oct 19, 2021 at 02:44:26PM -0400, Stephen Frost wrote:
> There are ways around it.  There likely always will be.  We need to be
> clear about what it provides and what it doesn't.  We need to stop
> telling ourselves that the only answer is a 100% solution and therefore
> it's impossible to do.  Users who care about these capabilities will
> understand that it's not 100% and they will still happily use it because
> it's better than 0% which is where we are today and is why they are
> going with other solutions.  Yes, if it's trivial to get around then
> perhaps it's not much better than 0% and if that's the case then it
> doesn't make sense to do it, but none of what has been discussed here
> thus far has made me feel like either the XTS or the GCM-SIV approaches
> would be trivial to to circumvent for the threat models they're intended
> to address, though it certainly takes more care and more thought when
> we're trying to address someone who has write access to part of the
> system and that we need to be clear what is addressed and what isn't in
> all of these cases.

Stephen, your emails on this thread have been very helpful and on-topic.
I think the distinction above is that it is useful to fully protect
against some attack types, even if we don't protect against all attack
types.  For example, if we protect 100% against read attacks, it doesn't
mean that gets reduced to 50% because we don't protect against write
attacks --- we are still 100% read-protected and 0% write protected.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: XTS cipher mode for cluster file encryption

2021-10-22 Thread Bruce Momjian
On Mon, Oct 18, 2021 at 12:37:39PM -0400, Robert Haas wrote:
> > Thus, to me, it doesn't seem worth going down the XTS route, just to
> > temporarily save a bit of implementation effort. We'll have to endure that
> > pain anyway.
> 
> I agree up to a point, but I do also kind of feel like we should be
> leaving it up to whoever is working on an implementation to decide
> what they want to implement. I don't particularly like this discussion

Uh, our TODO has this list:

Desirability -> Design -> Implement -> Test -> Review -> Commit

I think we have to agree on Desirability and Design before anyone starts
work, since a patch is more likely to be rejected without that agreement.

> where it feels like people are trying to tell other people what they
> have to do because "the community has decided X." It's pretty clear
> that there are multiple opinions here, and I don't really see any of
> them to be without merit, nor do I see why Bruce or Stephen or you or
> anyone else should get to say "what the community has decided" in the
> absence of a clear consensus.

I don't see anyone saying we have agreed on anything, but I do hear
people say they are willing to work on some things, and not others.

> I do really like the idea of using AES-GCM-SIV not because I know
> anything about it, but because the integrity checking seems cool, and
> storing the nonce seems like it would improve security. However, based
> on what I know now, I would not vote to reject an XTS-based patch and,
> as Stephen and Bruce have said, maybe with the right design it permits
> upgrades from non-encrypted clusters to encrypted clusters. I'm
> actually kind of doubtful about that, because that seems to require
> some pretty specific and somewhat painful implementation decisions.
> For example, I think if your solution for rotating keys is to shut
> down the standby, re-encrypt it with a new key, start it up again, and
> fail over to it, then you probably can't ever do key rotation in any
> other way. The keys now have to be non-WAL-logged so that the standby
> can be different, which means you can't add a new key on the master
> and run around re-encrypting everything with it, WAL-logging those
> changes as you go. Now I realize that implementing that is really
> challenging anyway so maybe some people wouldn't like to go that way,
> but then maybe other people would. Another thing you probably can't do
> in this model is encrypt different parts of the database with
> different keys, because how would you keep track of that? Certainly
> not in the system catalogs, if none of that can show up in the WAL
> stream.

The design is to have a heap/index key and a WAL key.  You create a
binary replica that uses a different heap/index key but the same WAL
key, switch-over to it, and then change the WAL key.

> But, you know, still: if somebody showed up with a fully-working XTS
> patch with everything in good working order, I don't see that we have
> enough evidence to reject it just because it's XTS. And I would hope
> that the people favoring XTS would not vote to reject a fully working
> GCM patch just because it's GCM. I think what we ought to be doing at

I don't think that would happen, but I do think patch size, code
maintenance, and upgradability would be reasonable considerations.

> this point is combining our efforts to try to get some things
> committed which make future work in this area easier - like that
> patch to preserve relfilenode and database OID, or maybe some patches
> to drive all of our I/O through a smaller number of code paths instead
> of having every different type of temporary file we write reinvent the
> wheel. These discussions about what encryption type we ought to use
> are useful for ruling out options that we know are bad, but beyond
> that I'm not sure they have much value. AES in any mode could seem
> like a much less safe choice by the time we get a committed feature
> here than it does today - even if somehow that were to happen for v15.
> I expect there are people out there trying to break it even as I write
> these words, and it seems likely that they will eventually succeed,
> but as to when, who can say?

Yes, we should start on things we know we need, but we will have to have
these discussions until we have desirability and design most people
agree on.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: XTS cipher mode for cluster file encryption

2021-10-22 Thread Bruce Momjian
On Fri, Oct 22, 2021 at 11:36:37AM -0400, Stephen Frost wrote:
> > I am not re-discuss using CTR for heap table. I mean use some CTR-like
> > algorithm *only* for WAL encryption. My idea is exactly the same when you
> > are typing "we hopefully aren't going to write different WAL records at the
> > same LSN and so using the LSN there should be alright."
> 
> I don't like the idea of "CTR-like".  What's wrong with using CTR for
> WAL encryption?  Based on the available information, that seems like the
> exact use-case for CTR.

Agreed.
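The reason "no two different WAL records at the same LSN" is the load-bearing assumption for CTR can be shown with a toy keystream. The sketch below uses a SHA-256-based stand-in for the AES-CTR keystream (not a real cipher; the reuse property it demonstrates is generic to any stream cipher):

```python
import hashlib

def ctr_keystream(key: bytes, start_counter: int, nbytes: int) -> bytes:
    # Hash-based stand-in for an AES-CTR keystream: one block per counter value.
    out = bytearray()
    counter = start_counter
    while len(out) < nbytes:
        out += hashlib.sha256(key + counter.to_bytes(16, "big")).digest()
        counter += 1
    return bytes(out[:nbytes])

def ctr_encrypt(key: bytes, counter: int, plaintext: bytes) -> bytes:
    ks = ctr_keystream(key, counter, len(plaintext))
    return bytes(a ^ b for a, b in zip(plaintext, ks))

key = b"w" * 32
# Using the LSN as the counter is fine as long as two different records
# are never encrypted at the same LSN with the same key...
rec1 = b"INSERT INTO t VALUES (1);"
rec2 = b"DELETE FROM t WHERE a=1;!"
c1 = ctr_encrypt(key, 0x0100_0028, rec1)
c2 = ctr_encrypt(key, 0x0100_0028, rec2)   # same (key, LSN) reused

# ...because reuse hands any observer the XOR of the two plaintexts:
leaked = bytes(a ^ b for a, b in zip(c1, c2))
expected = bytes(a ^ b for a, b in zip(rec1, rec2))
print(leaked == expected)                  # -> True
```

As long as the LSN of every encrypted WAL chunk is unique for a given key, none of the keystream is ever reused and this failure mode does not arise, which is why CTR fits the WAL case.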

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: [PATCH] Make ENOSPC not fatal in semaphore creation

2021-10-22 Thread Tom Lane
Mikhail  writes:
> +# OpenBSD 5.5 (2014) gained named POSIX semaphores.  They work out of the box
> +# without changing any sysctl settings, unlike System V semaphores.
> +USE_NAMED_POSIX_SEMAPHORES=1

I tried this on an OpenBSD 6.0 image I had handy.  The good news is
that it works, and I can successfully start the postmaster with a lot
of semaphores (I tried with max_connections=1) without any special
system configuration.  The bad news is it's *slow*.  It takes the
postmaster over a minute to start up at 1 max_connections, and
also about 15 seconds to shut down.  The regression tests also appear
noticeably slower, even at the default max_connections=100.  I'm
afraid that those "lots of tiny mappings" that Thomas noted have
a nasty impact on our process launch times, since the kernel
presumably has to do work to clone them into the child process.

Now this lashup that I'm testing on is by no means well suited for
performance tests, so maybe my numbers are bogus.  Also, maybe it's
better in more recent OpenBSD releases.  But I think we need to take a
harder look at performance before we decide that it's okay to change
the default semaphore type for this platform.

regards, tom lane




Re: Delegating superuser tasks to new security roles (Was: Granting control of SUSET gucs to non-superusers)

2021-10-22 Thread Noah Misch
On Wed, Oct 20, 2021 at 11:27:11AM -0700, Jeff Davis wrote:
> On Wed, 2021-10-20 at 10:32 -0700, Mark Dilger wrote:
> > I'd like to have a much clearer understanding of Noah's complaint
> > first.  There are multiple things to consider: (1) the role which
> > owns the trigger, (2) the role which is performing an action which
> > would cause the trigger to fire, (3) the set of roles role #1 belongs
> > to, (4) the set of roles role #1 has ADMIN privilege on, (5) the set
> > of roles that role #2 belongs to, and (6) the set of roles that role
> > #2 has ADMIN privilege on.  Maybe more?

> > I'd like to know precisely which combinations of these six things are
> > objectionable, and why.  There may be a way around the objections
> > without needing to create new user options or new privileged roles.
> 
> I can't speak for Noah, but my interpretation is that it would be
> surprising if GRANT/REVOKE or membership in an ordinary role had
> effects other than "permission denied" errors. It might make sense for
> event trigger firing in all the cases we can think of, but it would
> certainly be strange if we started accumulating a collection of
> behaviors that implicitly change when you move in or out of a role.
> 
> That's pretty general, so to answer your question: it seems like a
> problem to use #3-6 in the calculation about whether to fire an event
> trigger.

Exactly.  That's the main point.  Also, it's currently a best practice for
only non-LOGIN roles to have members.  The proposed approach invites folks to
abandon that best practice.  I take the two smells as a sign that we'll regret
adopting the proposal, despite not knowing how it will go seriously wrong.

On Wed, Oct 20, 2021 at 12:09:08PM -0700, Jeff Davis wrote:
> On Thu, 2021-05-27 at 23:06 -0700, Noah Misch wrote:
> > pg_logical_replication would not be safe to delegate that way:
> > 
> https://postgr.es/m/flat/CACqFVBbx6PDq%2B%3DvHM0n78kHzn8tvOM-kGO_2q_q0zNAMT%2BTzdA%40mail.gmail.com
> 
> What do you mean "that way"? Do you mean it's not safe to delegate
> subscription creation to non-superusers at all?

I meant "pg_logical_replication would not be safe to delegate to the tenant of
a database provided as a service."  It's not safe today, but it can be made
safe:

> From the thread above, I don't see anything so dangerous that it can't
> be delegated:
> 
>   * persistent background workers on subscriber
> - still seems reasonable to delegate to a privileged user

Agreed, I don't have a problem with pg_logical_replication implying that
ability.  If you can create this worker, you can bypass ADMIN OPTION by
running the GRANT/REVOKE inside a subscription.  That's probably fine if
documented, or else is_admin_of_role() could prevent it.


>   * arbitrary code execution by the apply worker on subscriber
> - apply worker runs as subscription owner, so doesn't seem
>   like a problem?

Sounds right.  I think Mark Dilger drafted a patch to add ACL checks and a TAP
test confirming that the worker does get permission denied.  That change has
no disadvantage, so this problem is on the way to getting solved.

>   * connection info may be visible to non-superusers
> - seems either solvable or not necessarily a problem

Yes.

The other matter from the thread you linked is "the connection to the
publisher must enforce the equivalent of dblink_security_check()".  I think
Mark Dilger drafted a patch for that, too.




Re: add retry mechanism for achieving recovery target before emitting FATAL error "recovery ended before configured recovery target was reached"

2021-10-22 Thread Bharath Rupireddy
On Sat, Oct 23, 2021 at 1:46 AM Jeff Davis  wrote:
>
> On Fri, 2021-10-22 at 15:34 +0530, Bharath Rupireddy wrote:
> > If the suggestion is to have the wait and retry logic embedded into
> > the user-written restore_command, IMHO, it's not a good idea as the
> > restore_command is external to the core PG and the FATAL error
> > "recovery ended before configured recovery target was reached" is an
> > internal thing.
>
> What do you want to do after the timeout happens? If you want to issue
> a WARNING instead of failing outright, perhaps that makes sense for
> exploratory PITR cases. That could be a simple boolean GUC without
> needing to introduce the timeout logic into the server.

If you are suggesting giving the user more control over what should
happen to the standby after the timeout, then two new GUCs,
recovery_target_retry_timeout (int) and
recovery_target_continue_after_timeout (bool), would let users choose
what they want. I'm not sure it is okay to add two new GUCs, though.
Let's hear what other hackers think about this.
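For concreteness, the two settings being floated would look something like this in postgresql.conf (hypothetical names from this thread; neither GUC exists in any released PostgreSQL):

```
# Keep retrying restore_command for up to 60 seconds before concluding
# that the configured recovery target cannot be reached:
recovery_target_retry_timeout = 60s

# After the timeout, emit a WARNING and continue instead of failing
# with a FATAL error:
recovery_target_continue_after_timeout = on
```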

> I think it's an interesting point that it can be hard to choose a
> reasonable recovery target if the system is completely down. We could
> use some better tooling or metadata around the lsns, xids or timestamp
> ranges available in a pg_wal directory or an archive. Even better would
be to see the available named restore points. This would make it easier
> to calculate how long recovery might take for a given restore point, or
> whether it's not going to work at all because there's not enough WAL.

I think pg_waldump can help here with exploratory analysis of the WAL
files present in a given directory. Since it is an independent C
program, it can run even when the server is down, and it can also be
run against an archive location.

Regards,
Bharath Rupireddy.