Re: Possible missing segments in archiving on standby

2021-09-08 Thread Fujii Masao



On 2021/09/08 10:45, Kyotaro Horiguchi wrote:

Anyway there's no guarantee on the archive ordering. As discussed in
the nearby thread [1], a newer segment is often archived earlier. I
agree that that happens mainly on busy servers, though. The archiver
is designed to handle such "gaps" and/or out-of-order segment
notifications.  We could impose a strict ordering on archiving, but I
think we would rather take total performance over such strictness.


Yes, there are other cases that cause a newer WAL file to be archived earlier.
The issue can happen if XLogArchiveNotify() fails to create the .ready file,
for example. Fixing only the case that we're discussing here is not enough.
If a *general* fix is being discussed in the thread you mentioned, it's better to
do nothing more here for this issue and to just make the startup process call
XLogArchiveCheckDone() if it finds a WAL file including an XLOG_SWITCH record.



At least currently, recovery fetches segments by a single process and
every file is marked immediately after being filled up, so all files
other than the latest one in pg_wal, including history files, should
have been marked for sure unless the file system gets into trouble.


You can reproduce that situation easily by starting the server with
archive_mode=off, generating WAL files, occasionally running pg_switch_wal(),
causing the server to crash, and then restarting the server with
archive_mode=on. At the beginning of recovery, none of the WAL files in pg_wal
have their archive notification files. Then, with the patch,
only the WAL files including XLOG_SWITCH are notified for WAL archiving
during recovery. The other WAL files will be notified at the subsequent
checkpoint.



I'm not sure I like that XLogWalRcvClose hides the segment-switch
condition.  If we do that check in the function, I'd like to name the
function XLogWalRcvCloseIfSwitched or something that indicates the
condition.  I'd like to invert the condition to reduce indentation,
too.


We can move the condition check out of the function, as the attached patch does.



Why don't we call it just after writing data, as my first proposal
did? There's no difference in functionality between doing that and the
patch.  If we do so, recvFile >= 0 is always true and that condition can
be removed, but that would be optional.  Anyway, by doing that, we no
longer need to call the function twice, or we can even live without the
new function.


I think that it's better and more robust to confirm that the currently-opened
WAL file is the valid target to write WAL to *before* actually writing any data
into it.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 9a2bc37fd7..8d7f52352d 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -125,6 +125,7 @@ static void WalRcvDie(int code, Datum arg);
 static void XLogWalRcvProcessMsg(unsigned char type, char *buf, Size len);
 static void XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr);
 static void XLogWalRcvFlush(bool dying);
+static void XLogWalRcvClose(XLogRecPtr recptr);
 static void XLogWalRcvSendReply(bool force, bool requestReply);
 static void XLogWalRcvSendHSFeedback(bool immed);
 static void ProcessWalSndrMessage(XLogRecPtr walEnd, TimestampTz sendTime);
@@ -883,42 +884,12 @@ XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr)
 {
     int         segbytes;
 
-    if (recvFile < 0 || !XLByteInSeg(recptr, recvSegNo, wal_segment_size))
+    /* Close the current segment if it's completed */
+    if (recvFile >= 0 && !XLByteInSeg(recptr, recvSegNo, wal_segment_size))
+        XLogWalRcvClose(recptr);
+
+    if (recvFile < 0)
     {
-        /*
-         * fsync() and close current file before we switch to next one. We
-         * would otherwise have to reopen this file to fsync it later
-         */
-        if (recvFile >= 0)
-        {
-            char        xlogfname[MAXFNAMELEN];
-
-            XLogWalRcvFlush(false);
-
-            XLogFileName(xlogfname, recvFileTLI, recvSegNo, wal_segment_size);
-
-            /*
-             * XLOG segment files will be re-read by recovery in startup
-             * process soon, so we don't advise the OS to release cache
-             * pages associated with the file like XLogFileClose() does.
-             */
-            if (close(recvFile) != 0)
-                ereport(PANIC,
-                        (errcode_for_file_access(),

Re: prevent immature WAL streaming

2021-09-08 Thread Kyotaro Horiguchi
At Tue, 7 Sep 2021 18:41:57 +, "Bossart, Nathan"  
wrote in 
> On 9/4/21, 10:26 AM, "Alvaro Herrera"  wrote:
> > Attached are the same patches as last night, except I added a test for
> > XLOG_DEBUG where pertinent.  (The elog(PANIC) is not made conditional on
> > that, since it's a cross-check rather than informative.)  Also fixed the
> > silly pg_rewind mistake I made.
> >
> > I'll work on the new xlog record early next week.
> 
> Are these patches in a good state for some preliminary testing?  I'd
> like to try them out, but I'll hold off if they're not quite ready
> yet.

Thanks!  As I understand it, the new record adds the ability to
cross-check between a torn-off contrecord and the new record inserted
after the torn-off record.  I didn't test this version myself, but
the previous version implemented the essential machinery, and that
won't change fundamentally with the new record.

So I think the current patch is enough to see whether the algorithm
actually works against the problem.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Estimating HugePages Requirements?

2021-09-08 Thread Fujii Masao




On 2021/09/08 12:50, Michael Paquier wrote:

On Tue, Sep 07, 2021 at 05:08:43PM +, Bossart, Nathan wrote:

On 9/6/21, 9:00 PM, "Michael Paquier"  wrote:

+   sprintf(buf, "%lu MB", size_mb);
+   SetConfigOption("shared_memory_size", buf, PGC_INTERNAL, PGC_S_OVERRIDE);
One small-ish comment about 0002: there is no need to add the unit
into the buffer set as GUC_UNIT_MB would take care of that.  The patch
looks fine.


I fixed this in v7.


Switched the variable name to shared_memory_size_mb for easier
grepping, moved it to a more correct location with the other read-only
GUCs, and applied 0002.  Well, 0001 here.


Thanks for adding this useful feature!

+   {"shared_memory_size", PGC_INTERNAL, RESOURCES_MEM,

While reading the applied code, I noticed that the category of shared_memory_size
is RESOURCES_MEM. Why? That seems reasonable because the parameter is related
to a memory resource, but since its context is PGC_INTERNAL, wouldn't PRESET_OPTIONS
be the more appropriate category? BTW, the category of every other
PGC_INTERNAL parameter seems to be PRESET_OPTIONS.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




RE: [PATCH] New default role allowing to change per-role/database settings

2021-09-08 Thread Shinya11.Kato
>Thanks for letting me know, I've attached a rebased v4 of this patch, no other
>changes.
I tried it, but when I used the SET command, tab completion did not work properly
and an error occurred.

---
postgres=> \conninfo
You are connected to database "postgres" as user "aaa" via socket in "/tmp" at port "5432".
postgres=> \du
                                         List of roles
 Role name |                         Attributes                         |         Member of
-----------+------------------------------------------------------------+----------------------------
 aaa       |                                                            | {pg_change_role_settings}
 penguin   | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
postgres=> show log
log_autovacuum_min_duration  log_executor_stats          log_min_error_statement            log_replication_commands   log_timezone
log_checkpoints              log_file_mode               log_min_messages                   log_rotation_age           log_transaction_sample_rate
log_connections              log_hostname                log_parameter_max_length           log_rotation_size          log_truncate_on_rotation
log_destination              log_line_prefix             log_parameter_max_length_on_error  log_statement              logging_collector
log_disconnections           log_lock_waits              log_parser_stats                   log_statement_sample_rate  logical_decoding_work_mem
log_duration                 log_min_duration_sample     log_planner_stats                  log_statement_stats
log_error_verbosity          log_min_duration_statement  log_recovery_conflict_waits        log_temp_files
postgres=> show log_duration ;
 log_duration 
--------------
 off
(1 row)

postgres=> set log
log_parameter_max_length_on_error  logical_decoding_work_mem
postgres=> set log_duration to on;
2021-09-08 16:23:39.216 JST [533860] ERROR:  permission denied to set parameter "log_duration"
2021-09-08 16:23:39.216 JST [533860] STATEMENT:  set log_duration to on;
ERROR:  permission denied to set parameter "log_duration"
---

Regards,

--
Shinya Kato
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Possible missing segments in archiving on standby

2021-09-08 Thread Kyotaro Horiguchi
At Wed, 8 Sep 2021 16:01:22 +0900, Fujii Masao  
wrote in 
> 
> 
> On 2021/09/08 10:45, Kyotaro Horiguchi wrote:
> > Anyway there's no guarantee on the archive ordering. As discussed in
> > the nearby thread [1], a newer segment is often archived earlier. I
> > agree that that happens mainly on busy servers, though. The archiver
> > is designed to handle such "gaps" and/or out-of-order segment
> > notifications.  We could impose a strict ordering on archiving, but I
> > think we would rather take total performance over such strictness.
> 
> Yes, there are other cases that cause a newer WAL file to be archived
> earlier. The issue can happen if XLogArchiveNotify() fails to create
> the .ready file, for example. Fixing only the case that we're
> discussing here is not enough. If a *general* fix is being discussed
> in the thread you mentioned, it's better to do nothing more here for
> this issue and to just make the startup process call
> XLogArchiveCheckDone() if it finds a WAL file including an XLOG_SWITCH
> record.

No. The discussion there is not about permanently missing .ready
files, but about .ready files created out of order.  So I don't think
the outcome of that thread will *fix* this issue.

> > At least currently, recovery fetches segments by a single process and
> > every file is marked immediately after being filled up, so all files
> > other than the latest one in pg_wal, including history files, should
> > have been marked for sure unless the file system gets into trouble.
> 
> You can reproduce that situation easily by starting the server with
> archive_mode=off, generating WAL files, occasionally running
> pg_switch_wal(), causing the server to crash, and then restarting the
> server with archive_mode=on. At the beginning of recovery, none of the
> WAL files in pg_wal have their archive notification files. Then, with
> the patch, only the WAL files including XLOG_SWITCH are notified for
> WAL archiving during recovery. The other WAL files will be notified at
> the subsequent checkpoint.

I don't think we need that degree of perfection for the case
where some archive-related parameters are changed after a
crash. Anyway, we need to take a backup after that, and at least all
segments required for the backup will be properly archived.  The
segments up to the XLOG_SWITCH segment are harmless garbage, or a bit
of food for the disk.

> > I'm not sure I like that XLogWalRcvClose hides the segment-switch
> > condition.  If we do that check in the function, I'd like to name the
> > function XLogWalRcvCloseIfSwitched or something that indicates the
> > condition.  I'd like to invert the condition to reduce indentation,
> > too.
> 
> We can move the condition check out of the function, as the attached
> patch does.

Thanks!

> > Why don't we call it just after writing data, as my first proposal
> > did? There's no difference in functionality between doing that and the
> > patch.  If we do so, recvFile >= 0 is always true and that condition can
> > be removed, but that would be optional.  Anyway, by doing that, we no
> > longer need to call the function twice, or we can even live without the
> > new function.
> 
> I think that it's better and more robust to confirm that the
> currently-opened WAL file is the valid target to write WAL to *before*
> actually writing any data into it.

Sounds convincing.  Ok, I agree to that.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Gather performance analysis

2021-09-08 Thread Dilip Kumar
On Wed, Sep 8, 2021 at 12:03 PM Andres Freund  wrote:

> Hi,
>
> On 2021-09-08 11:45:16 +0530, Dilip Kumar wrote:
> > On Wed, Sep 8, 2021 at 3:08 AM Andres Freund  wrote:
> >
> >
> > > Looking at this profile made me wonder if this was a build without
> > > optimizations. The pg_atomic_read_u64()/pg_atomic_read_u64_impl() calls
> > > should be inlined. And while perf can reconstruct inlined functions when
> > > using --call-graph=dwarf, they show up like "pg_atomic_read_u64 (inlined)"
> > > for me.
> > >
> >
> > Yeah, for profiling generally I build without optimizations so that I can
> > see all the functions in the stack, so the profile results are from a build
> > without optimizations, but the performance results are from an optimized
> > build.
>
> I'm afraid that makes the profiles just about meaningless :(.
>

Maybe it can be misleading sometimes, but I feel it is sometimes more
informative than an optimized build, where some functions are inlined
and then it becomes really hard to distinguish which function really
has the problem.  But your point is taken and I will run with an
optimized build.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


RE: Improve logging when using Huge Pages

2021-09-08 Thread Shinoda, Noriyoshi (PN Japan FSIP)
Hello,

Thank you everyone for the comments.
I have attached a patch that simply changes the message along the lines of
Horiguchi-san's advice.

> Even with the patch, there are still some cases where huge pages is disabled 
> silently. We should report something even in these cases?
> For example, in the platform where huge pages is not supported, it's silently 
> disabled when huge_pages=try.

The code this patch adds is inside the "#ifdef MAP_HUGETLB ... #endif"
block.
For this reason, I think it is excluded from binaries built in an environment
that does not define the MAP_HUGETLB macro.

> One big concern about the patch is that log message is always reported when 
> shared memory fails to be allocated with huge pages enabled when 
> huge_pages=try. Since 
> huge_pages=try is the default setting, many users would see this new log 
> message whenever they start the server. Those who don't need huge pages but 
> just use the default 
> setting might think that such log messages would be noisy.

This patch is meant to let the admin know that HugePages aren't being used, so
I'm sure you're right. I don't yet have an idea of what to do about it.

Regards,
Noriyoshi Shinoda

-Original Message-
From: Kyotaro Horiguchi [mailto:horikyota@gmail.com] 
Sent: Wednesday, September 8, 2021 11:18 AM
To: pry...@telsasoft.com
Cc: masao.fu...@oss.nttdata.com; Shinoda, Noriyoshi (PN Japan FSIP) 
; pgsql-hack...@postgresql.org; rjuju...@gmail.com; 
t...@sss.pgh.pa.us
Subject: Re: Improve logging when using Huge Pages

At Tue, 7 Sep 2021 08:16:53 -0500, Justin Pryzby  wrote 
in 
> On Tue, Sep 07, 2021 at 07:12:36PM +0900, Fujii Masao wrote:
> > One big concern about the patch is that log message is always 
> > reported when shared memory fails to be allocated with huge pages 
> > enabled when huge_pages=try. Since huge_pages=try is the default 
> > setting, many users would see this new log message whenever they 
> > start the server. Those who don't need huge pages but just use the 
> > default setting might think that such log messages would be noisy.
> 
> I don't see this as any issue.  We're only talking about a single 
> message on each restart, which would be added in a major release.  If 
> it's a problem, the message could be a NOTICE or INFO, and it won't be shown 
> by default.
> 
> I think it should say "with/out huge pages" without 
> "enabled/disabled", without "again", and without "The server", like:
> 
> +   (errmsg("could not map anonymous shared memory (%zu bytes)"
> +           " with huge pages.", allocsize),
> +    errdetail("Anonymous shared memory will be mapped "
> +              "without huge pages.")));

I don't dislike the message, but I'm afraid it is too verbose, especially in
that it has a DETAIL. It seems to me something like the following is
sufficient, or I'd even like to see it more concise.

"fall back anonymous shared memory to non-huge pages: required %zu bytes for
huge pages"

If we emit a message for something other than an mmap failure, it would be like
the following.

"fall back anonymous shared memory to non-huge pages: huge pages not available"

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


huge_pages_log_v4.diff
Description: huge_pages_log_v4.diff


Re: Added missing invalidations for all tables publication

2021-09-08 Thread Amit Kapila
On Wed, Sep 8, 2021 at 7:57 AM houzj.f...@fujitsu.com
 wrote:
>
> > From Mon, Sep 6, 2021 1:56 PM Amit Kapila  wrote:
> > > On Tue, Aug 31, 2021 at 8:54 PM vignesh C  wrote:
> > > > Thanks for the comments, the attached v3 patch has the changes for
> > > > the same.
> > > >
> > >
> > > I think this bug should be fixed in back branches (till v10). OTOH, as
> > > this is not reported by any user and we found it during code
> > > review, it seems either users don't have an exact use case or they
> > > haven't noticed it yet. What do you people think about back-patching?
> >
> > Personally, I think it's ok to back-patch.
>
> I found that the patch cannot be applied to the back-branches (v10-v14) cleanly,
> so I generated patches for the back-branches. All the attached patches have
> passed the regression tests.
>

Pushed!

-- 
With Regards,
Amit Kapila.




Re: Added schema level support for publication.

2021-09-08 Thread Amit Kapila
On Wed, Sep 8, 2021 at 10:48 AM vignesh C  wrote:
>
> On Tue, Sep 7, 2021 at 5:10 PM Amit Kapila  wrote:
> >
> > On Tue, Sep 7, 2021 at 12:45 PM vignesh C  wrote:
> > >
> > > On Fri, Sep 3, 2021 at 4:49 PM Amit Kapila  
> > > wrote:
> > > >
> > >
> > > > 5.
> > > > If I modify the search path to remove public schema then I get the
> > > > below error message:
> > > > postgres=# Create publication mypub for all tables in schema 
> > > > current_schema;
> > > > ERROR:  no schema has been selected
> > > >
> > > > I think this message is not very clear. How about changing to
> > > > something like "current_schema doesn't contain any valid schema"? This
> > > > message is used in more than one place, so let's keep it the same at
> > > > all the places if you agree to change it.
> > >
> > > I would prefer to use the existing messages as we have used this in a
> > > few other places similarly. Thoughts?
> > >
> >
> > Yeah, I also see the same message in the code, but I think the usage here is a
> > bit different. If you see a similar SQL statement that causes the same
> > error message, then can you please give an example?
>
> I was referring to the error message in create table
> postgres=# set search_path='non_existent_schema';
> SET
> postgres=# create table t1(c1 int);
> ERROR:  no schema has been selected to create in
> LINE 1: create table t1(c1 int);
>
> If it is not very useful in the case of creating a publication, then I
> can change it. Thoughts?
>

If you want to be consistent with the existing message, then why did
you leave out the trailing part (" to create in") of the sentence?

-- 
With Regards,
Amit Kapila.




Re: public schema default ACL

2021-09-08 Thread Peter Eisentraut

On 04.09.21 18:18, Noah Misch wrote:

I tried a couple of upgrade scenarios and it appeared to do the right thing.

This patch is actually two separate changes: First, change the owner of the
public schema to "pg_database_owner"; second, change the default privileges
set on the public schema by initdb.  I was a bit surprised that the former
hadn't already been done in PG14.


Interesting.  That change requires a7a7be1, which is also not in v14.

Do you plan to change the CF entry, or should it remain in Needs Review with
no assigned reviewer?


I've set it to ready for committer now.




Re: Proposal: More structured logging

2021-09-08 Thread Peter Eisentraut

On 01.09.21 10:00, Ronan Dunklau wrote:

In-core it would open up the possibility to split log messages into different
fields, for example the different statistics reported in the logs by VACUUM /
ANALYZE VERBOSE and make it easier to consume the output without having to
parse the message. Parsing the message also means that any tool parsing it
needs to either be aware of the locale, or to force the user to use a specific
one.


I think those messages would themselves have substructure.  For example, 
the current message


"I/O timings: read: %.3f ms, write: %.3f ms\n"

might be

{"I/O timings": {"read": ..., "write": ...}}

and that in turn is already part of a larger message.

So just having a single level of tags wouldn't really work for this.




Re: row filtering for logical replication

2021-09-08 Thread Ajin Cherian
On Wed, Sep 1, 2021 at 9:23 PM Euler Taveira  wrote:
>
> On Sun, Aug 29, 2021, at 11:14 PM, Peter Smith wrote:
>
> Here are the new v26* patches. This is a refactoring of the row-filter
> caches to remove all the logic from the get_rel_sync_entry function
> and delay it until if/when needed in the pgoutput_row_filter function.
> This is now implemented per Amit's suggestion to move all the cache
> code [1]. It is a replacement for the v25* patches.
>
> The make check and TAP subscription tests are all OK. I have repeated
> the performance tests [2] and those results are good too.
>
> v26-0001 <--- v23 (base RF patch)
> v26-0002 <--- ExprState cache mods (refactored row filter caching)
> v26-0002 <--- ExprState cache extra debug logging (temp)
>
> Peter, I'm still reviewing this new cache mechanism. I will provide feedback
> as soon as I integrate it as part of this recent modification.
>
> I'm attaching a new version that simply includes Houzj's review [1]. This is
> based on v23.
>
> There has been a discussion about which row should be used by the row filter. We
> don't have a unanimous choice, so I think it is prudent to provide a way for
> the user to change it. I suggested in a previous email [2] that a publication
> option should be added. Hence, the row filter can be applied to the old tuple,
> the new tuple, or both. This approach is simpler than using OLD/NEW references
> (less code, and it avoids validation such as a NEW reference for DELETEs and an
> OLD reference for INSERTs). I thought about a reasonable default value and it
> seems the _new_ tuple is a good one because (i) it is always available and (ii)
> the user doesn't have to figure out that replication is broken due to a column
> that is not part of the replica identity. I'm attaching a POC that implements
> it. I'm still polishing it. Adding tests for multiple row filters and
> integrating Peter's caching mechanism [3] are the next steps.
>

Assuming this _new_tuple option is enabled:
1. An UPDATE where the new_tuple satisfies the row filter, but the
old_tuple did not (it is not checked). Since the row filter check passed
but the actual row never existed on the subscriber, would this patch
convert the UPDATE to an INSERT, or would the UPDATE be ignored? Based
on the tests that I did, I see that it is ignored.
2. An UPDATE where the new_tuple does not satisfy the row filter but
the old_tuple did. Since the new_tuple did not match the row filter,
wouldn't this row now remain divergent on the replica?

Somehow this approach of either new_tuple or old_tuple doesn't seem
very fruitful if the user requires that the replica be up-to-date with
respect to the filter condition. For that, I think you will need to
convert UPDATEs to either INSERTs or DELETEs when only one of the
new_tuple and the old_tuple matches the filter condition:

UPDATE
old-row (match)     new-row (no match)  -> DELETE
old-row (no match)  new-row (match)     -> INSERT
old-row (match)     new-row (match)     -> UPDATE
old-row (no match)  new-row (no match)  -> (drop change)
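
In other words, the desired mapping would be something like the following
sketch (hypothetical C, purely illustrative; not code from the patch):

```
typedef enum RowFilterAction
{
    RF_ACTION_DROP,             /* don't replicate the change at all */
    RF_ACTION_INSERT,           /* row enters the filtered set */
    RF_ACTION_DELETE,           /* row leaves the filtered set */
    RF_ACTION_UPDATE            /* row stays in the filtered set */
} RowFilterAction;

/* Decide what to publish for an UPDATE, given both filter results. */
static RowFilterAction
row_filter_update_action(bool old_matches, bool new_matches)
{
    if (old_matches && new_matches)
        return RF_ACTION_UPDATE;
    if (old_matches)
        return RF_ACTION_DELETE;
    if (new_matches)
        return RF_ACTION_INSERT;
    return RF_ACTION_DROP;
}
```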

regards,
Ajin Cherian
Fujitsu Australia




Re: The Free Space Map: Problems and Opportunities

2021-09-08 Thread Hannu Krosing
On Wed, Sep 8, 2021 at 6:52 AM Peter Geoghegan  wrote:
>
> On Tue, Sep 7, 2021 at 5:25 AM Hannu Krosing  wrote:
> > Are you speaking of just heap pages here or also index pages ?
>
> Mostly heap pages, but FWIW I think it could work for index tuples
> too, with retail index tuple deletion. Because that allows you to even
> remove would-be LP_DEAD item pointers.

But it does need info from the heap page(s) the index entry points to,
which can become quite expensive if that heap page is locked by other
processes or even not cached at all.

Also, parallel processes may have moved the index entry to other pages etc.

> > Or are you expecting these to be kept in good-enough shape by your
> > earlier index manager work?
>
> It's very workload dependent. Some things were very much improved by
> bottom-up index deletion in Postgres 14, for example (non-hot updates
> with lots of logically unchanged indexes). Other things weren't helped
> at all, or were barely helped. I think it's important to cover all
> cases.
>
> > A minimal useful patch emerging from that understanding could be
> > something which just adds hysteresis to FSM management. (TBH, I
> > actually kind of expected some hysteresis to be there already, as it
> > is in my mental model of "how things should be done" for managing
> > almost any resource :) )
>
> I think that you need to do the FSM before the aborted-heap-tuple
> cleanup. Otherwise you don't really know when or where to apply the
> special kind of pruning that the patch invents, which targets aborts
> specifically.

An alternative is to not actually prune the page at all at this time (to
avoid re-dirtying it and/or writing WAL) but just inspect it and record the
*potential* free space in the FSM,
and then do the pruning at the time of the next INSERT or UPDATE, when the
page is dirtied and WAL is written anyway.

> > Adding hysteresis to FSM management can hopefully be done independent
> > of all the other stuff and also seems to be something that is
> > unobtrusive and non-controversial enough to fit in current release and
> > possibly be even back-ported .
>
> I don't know about that! Seems kind of an invasive patch to me.

Why do you think that changing the calculation of when a page is added
back to the FSM is "kind of invasive"?

Not having looked at the code, I would assume from the discussion here
that we remove the page from the FSM when free space goes above FILLFACTOR
and we add it back when it goes back below FILLFACTOR.

The code change would just change the "add it back" code to use
FILLFACTOR - HYSTERESIS_FACTOR instead.

What other parts does this need to touch? Or is it just that the
calculation is not well localised in the code and is written
multiple times all over the place?
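
A minimal sketch of that hysteresis idea (under the assumption that a page
stops being advertised in the FSM once it fills past fillfactor and is
re-advertised once it has enough free space again; all names here are
hypothetical, not existing PostgreSQL symbols):

```
#define HYSTERESIS_FACTOR   0.10    /* assumed width of the dead band */

/*
 * Only re-advertise a page in the FSM after its fill level has dropped an
 * extra margin below fillfactor, rather than as soon as it crosses it.
 */
static bool
page_should_reenter_fsm(double fill_fraction, double fillfactor)
{
    return fill_fraction < fillfactor - HYSTERESIS_FACTOR;
}
```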

> > I did not mean CHECKPOINT as a command, but more the concept of
> > writing back / un-dirtying the page. In this sense it *is* special
> > because it is the last point in time where you are guaranteed to have
> > the page available in buffercache and thus cheap to access for
> > modifications plus you will avoid a second full-page writeback because
> > of cleanup. Also you do not want to postpone the cleanup to actual
> > page eviction, as that is usually in the critical path for some user
> > query or command.
>
> I intend to do some kind of batching, but only at the level of small
> groups of related transactions. Writing a page that was quickly opened
> for new inserts, filled with newly inserted heap tuples, then closed,
> and finally cleaned up doesn't seem like it needs to take any direct
> responsibility for writeback.

Agreed that this code does not need to care about writeback.

I approached it from the other end and proposed "just before
writeback" as a strategy which is easy to understand conceptually and
also is mostly "already there" code-wise; that is, there is a location
in the write-back code to plug the cleanup call into.

Cheers
-
Hannu Krosing
Google Cloud - We have a long list of planned contributions and we are hiring.
Contact me if interested.




RE: [PATCH] Add tab-complete for backslash commands

2021-09-08 Thread tanghy.f...@fujitsu.com
On Wednesday, September 8, 2021 5:05 AM, Tom Lane  wrote:
>Sure, but he'd still get all the commands, just not all the possible
>spellings of each one.  And a person who's not sure what's available
>is unlikely to be helped by an entry for "\c", because it's entirely
>not clear which command that's an abbreviation for.
>
>In any case, my main point is that the primary usage of tab-completion
>is as a typing aid, not documentation.  I do not think we should make
>the behavior less useful for typing in order to be exhaustive on the
>documentation side.

You are right. I think I've got your point.
Here is the updated patch, in which I added the multiple-character versions of the
backslash commands
and removed their corresponding single-character versions.
Of course, for backslash commands with only a single-character version, no change
was made.

BTW, I've run the existing TAP tests for tab-completion with this patch, and all
tests passed.

Regards,
Tang


v3-0001-Replace-the-single-character-with-multiple-charac.patch
Description:  v3-0001-Replace-the-single-character-with-multiple-charac.patch


Re: Atomic rename feature for Windows.

2021-09-08 Thread Juan José Santamaría Flecha
On Thu, Jul 8, 2021 at 12:32 AM Victor Spirin 
wrote:

>
> >   #if defined(_MSC_VER) && _MSC_VER >= 1900
> > -#define MIN_WINNT 0x0600
> > +#define MIN_WINNT 0x0A00
> >   #else
> >   #define MIN_WINNT 0x0501
> >   #endif
> > This is a large bump for Studio >= 2015 I am afraid.  That does not
> > seem acceptable, as it means losing support for GetLocaleInfoEx()
> > across older versions.
> >
> It seems that the MIN_WINNT value 0x0600 or 0x0A00 does not affect the
> use of the GetLocaleInfoEx () function
>
Anything below Windows Server 2012 (_WIN32_WINNT = 0x0602) is no longer
supported. A patch with a bump on MIN_WINNT might be due.

Regards,

Juan José Santamaría Flecha


drop tablespace failed when location contains .. on win32

2021-09-08 Thread wangsh.f...@fujitsu.com
Hi,

I found a problem related to tablespaces on win32 (Server 2019).

> postgres=# create tablespace tbs location 
> 'C:\Users\postgres\postgres_install\aa\..\aa';
> CREATE TABLESPACE
> postgres=# create table tbl(col int) tablespace tbs;
> ERROR:  could not stat directory "pg_tblspc/16384/PG_15_202109061/12754": 
> Invalid argument
> postgres=# drop tablespace tbs;
> WARNING:  could not open directory "pg_tblspc/16384/PG_15_202109061": No such 
> file or directory
> ERROR:  could not stat file "pg_tblspc/16384": Invalid argument

I find that canonicalize_path() only removes a trailing '..'; in this case,
the '..' is not removed, and
pgsymlink() succeeds.

But, in fact, if I double-click the directory (%PGDATA%\pg_tblspc\16387), the following
error message is shown:
> The filename, directory name, or volume label syntax is incorrect.

Since pgsymlink() seems right and I'm not sure I can change the behavior of
canonicalize_path(),
I want to add an error check (patch is attached).

Any comments?

Regards,
Shenhao Wang




problem.diff
Description: problem.diff


Re: ResourceOwner refactoring

2021-09-08 Thread Aleksander Alekseev
Hi Heikki,

> Yeah, needed some manual fixing, but here you go.

Thanks for working on this!

v8-0002 didn't apply to the current master, so I rebased it. See
attached v9-* patches. I also included v9-0004 with some minor tweaks
from me. I have several notes regarding the code.

1. I'm not sure I understand this code in ResourceOwnerReleaseAll():
```
/* First handle all the entries in the array. */
do
{
found = false;
for (int i = 0; i < owner->narr; i++)
{
if (owner->arr[i].kind->phase == phase)
{
Datumvalue = owner->arr[i].item;
ResourceOwnerFuncs *kind = owner->arr[i].kind;

if (printLeakWarnings)
kind->PrintLeakWarning(value);
kind->ReleaseResource(value);
found = true;
}
}

/*
 * If any resources were released, check again because some of the
 * elements might have been moved by the callbacks. We don't want to
 * miss them.
 */
} while (found && owner->narr > 0);
```

Shouldn't we mark the resource as released and/or decrease narr and/or
save the last processed i? Why won't this call ReleaseResource()
endlessly on the same resource (array item)? The same question applies to the
following code that iterates over the hash table.

2. Just an idea/observation. It's possible that the performance of
ResourceOwnerEnlarge() can be slightly improved. Since the size of the
hash table is always a power of 2 and the code always doubles the size
of the hash table, the (idx & mask) value will gain one extra bit, which
can be 0 or 1. If it's 0, the value is already in its place;
otherwise, it should be moved by a known distance. In other words,
it's possible to copy `oldhash` to `newhash` and then move only about half
of the items. I don't claim that this code necessarily would be
faster, or that this should be checked in the scope of this work.
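
To illustrate the observation (a hypothetical sketch, not resowner.c code):
after the table doubles, an entry either keeps its old slot or moves exactly
oldsize slots up, depending on the newly exposed hash bit.

```
/* Hypothetical helper, assuming oldsize is a power of 2. */
static inline uint32
slot_after_doubling(uint32 hashvalue, uint32 oldsize)
{
    uint32      oldidx = hashvalue & (oldsize - 1);

    /* The doubled mask exposes exactly one more bit: oldsize itself. */
    return (hashvalue & oldsize) ? (oldidx + oldsize) : oldidx;
}
```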

-- 
Best regards,
Aleksander Alekseev


v9-0003-Optimize-hash-function.patch
Description: Binary data


v9-0001-Move-a-few-ResourceOwnerEnlarge-calls-for-safety.patch
Description: Binary data


v9-0002-Make-resowners-more-easily-extensible.patch
Description: Binary data


v9-0004-Minor-tweaks.patch
Description: Binary data


Re: Gather performance analysis

2021-09-08 Thread Dilip Kumar
On Wed, Sep 8, 2021 at 3:28 PM Tomas Vondra 
wrote:

On 9/8/21 9:40 AM, Dilip Kumar wrote:
>
> > Maybe it can be misleading sometimes, but I feel it is sometimes more
> > informative than an optimized build, where some functions are inlined
> > and then it becomes really hard to distinguish which function really
> > has the problem.  But your point is taken and I will run with an
> > optimized build.
> >
>
> IMHO Andres is right - the lack of optimization may make profiles mostly
> useless in most cases: it may skew timings for different parts differently, so
> something that'd be optimized out may take much more time.
>
> It may provide valuable insights, but we definitely should not use such
> binaries for benchmarking and comparisons of the patches.
>

Yeah, I completely agree that those binaries should not be used for
benchmarking and patch comparison, and I never used them for that purpose.  I
was also making the same point: with debug binaries we sometimes get
some valuable insight during profiling.



> As mentioned, I did some benchmarks, and I do see some nice improvements
> even with properly optimized builds -O2.
>
> Attached is a simple script that varies a bunch of parameters (number of
> workers, number of rows/columns, ...) and then measures duration of a
> simple query, similar to what you did. I haven't varied the queue size,
> that might be interesting too.
>
> The PDF shows a comparison of master and the two patches. For 10k rows
> there's not much difference, but for 1M and 10M rows there are some nice
> improvements in the 20-30% range. Of course, it's just a single query in
> a simple benchmark.
>

Thanks for the benchmarking.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: Small documentation improvement for ALTER SUBSCRIPTION

2021-09-08 Thread Masahiko Sawada
On Tue, Sep 7, 2021 at 9:01 PM Daniel Gustafsson  wrote:
>
> > On 7 Sep 2021, at 13:36, Peter Eisentraut 
> >  wrote:
> >
> > On 12.08.21 04:52, Masahiko Sawada wrote:
> >> On Wed, Aug 11, 2021 at 5:42 PM Daniel Gustafsson  wrote:
> >>>
>  On 11 Aug 2021, at 09:57, Masahiko Sawada  wrote:
> >>>
>  Additionally, refresh options as described in
>  refresh_option of
>  REFRESH PUBLICATION may be specified,
>  except in the case of DROP PUBLICATION.
> >>>
> >>> Since this paragraph is under the literal option “refresh”, which takes a
> >>> value, I still find your original patch to be the clearest.
> >> Yeah, I prefer my original patch over this idea. On the other hand, I
> >> can see the point of review comment on it that Amit pointed out[1].
> >
> > How about this:
> >
> > -  Additionally, refresh options as described
> > -  under REFRESH PUBLICATION may be specified.
> > +  Additionally, the options described under REFRESH
> > +  PUBLICATION may be specified, to control the implicit 
> > refresh
> > +  operation.
>
> LGTM.

+1

I've attached the patch.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/


alter_subscription_doc_v2.patch
Description: Binary data


Re: Added schema level support for publication.

2021-09-08 Thread vignesh C
On Mon, Sep 6, 2021 at 6:56 AM houzj.f...@fujitsu.com
 wrote:
>
> From Thur, Sep 2, 2021 2:33 PM vignesh C  wrote:
> > On Wed, Sep 1, 2021 at 6:58 AM houzj.f...@fujitsu.com 
> >  wrote:
> > >
> > > Here are some other comments for v23-000x patches.
> > > 3)
> > >
> > > + .description = "PUBLICATION SCHEMA",
> > > + .section = SECTION_POST_DATA,
> > > + .createStmt = query->data));
> > >
> > > Is it better to use something like 'PUBLICATION TABLES IN SCHEMA' to
> > > describe the schema-level table publication? Because there could be
> > > some other type of publication, such as 'ALL SEQUENCES IN SCHEMA', in the
> > > future, it would be better to make it clear that we only publish tables in a
> > > schema in this patch.
> >
> > Modified
>
> Thanks for updating the patch.
>
> I think we might also need to mention the publication object 'table' in the
> following types:
>
> 1)
> +   /* OCLASS_PUBLICATION_SCHEMA */
> +   {
> +   "publication schema", OBJECT_PUBLICATION_SCHEMA
> +   },
>
> 2)
> +   PUBLICATIONOBJ_SCHEMA,  /* Schema type */
> +   PUBLICATIONOBJ_UNKNOWN  /* Unknown type */
> +} PublicationObjSpecType;
>
> 3)
> +   DO_PUBLICATION_SCHEMA,
>
> I think it might be better to change the type names to something like
> XX_REL_IN_SCHEMA, and adjust the comments.

Thanks for the comments, this is handled in the v26 patch attached at [1]
[1] - 
https://www.postgresql.org/message-id/CALDaNm3EwAVma8n4YpV1%2BQWiccuVPxpqNfbbrUU3s3XTHcTXew%40mail.gmail.com

Regards,
Vignesh




Re: RFC: Logging plan of the running query

2021-09-08 Thread torikoshia

On 2021-09-07 12:39, torikoshia wrote:

On 2021-08-20 01:12, Fujii Masao wrote:

On 2021/08/11 21:14, torikoshia wrote:
As far as I looked into, pg_log_current_plan() can call 
InstrEndLoop() through ExplainNode().
I added a flag to ExplainState to avoid calling InstrEndLoop() when 
ExplainNode() is called from pg_log_current_plan().


Thanks for updating the patch!
I tried to test the patch again and encountered two issues.


Thanks for finding these issues!



(1)
The following WITH RECURSIVE query failed with the error
"ERROR:  failed to find plan for CTE sg" when I ran
pg_log_current_query_plan() against the backend executing that query.
Is this a bug?

create table graph0( f int, t int, label text );
insert into graph0 values (1, 2, 'arc 1 -> 2'),(1, 3, 'arc 1 ->
3'),(2, 3, 'arc 2 -> 3'),(1, 4, 'arc 1 -> 4'),(4, 5, 'arc 4 -> 5');

with recursive search_graph(f, t, label, i) as (
select *, 1||pg_sleep(1)::text from graph0 g
union distinct
select g.*,1||pg_sleep(1)::text
from graph0 g, search_graph sg
   where g.f = sg.t
) search breadth first by f, t set seq
select * from search_graph order by seq;


This ERROR occurred even without applying the patch, just by calling
EXPLAIN (VERBOSE) on a CTE with SEARCH BREADTH FIRST.

I'm going to start another thread to discuss it.


(2)
When I ran pg_log_current_query_plan() while the "make installcheck" tests
were running, I got the following assertion failure.

TRAP: FailedAssertion("!IsPageLockHeld || (locktag->locktag_type == LOCKTAG_RELATION_EXTEND)", File: "lock.c", Line: 894, PID: 61512)

0   postgres            0x00010ec23557 ExceptionalCondition + 231
1   postgres            0x00010e9eff15 LockAcquireExtended + 1461
2   postgres            0x00010e9ed14d LockRelationOid + 61
3   postgres            0x00010e41251b relation_open + 91
4   postgres            0x00010e509679 table_open + 25
5   postgres            0x00010ebf9462 SearchCatCacheMiss + 274
6   postgres            0x00010ebf5979 SearchCatCacheInternal + 761
7   postgres            0x00010ebf566c SearchCatCache + 60
8   postgres            0x00010ec1a9e0 SearchSysCache + 144
9   postgres            0x00010ec1ae03 SearchSysCacheExists + 51
10  postgres            0x00010e58ce35 TypeIsVisible + 437
11  postgres            0x00010ea98e4c format_type_extended + 1964
12  postgres            0x00010ea9900e format_type_with_typemod + 30
13  postgres            0x00010eb78d76 get_const_expr + 742
14  postgres            0x00010eb79bc8 get_rule_expr + 232
15  postgres            0x00010eb8140f get_func_expr + 1247
16  postgres            0x00010eb79dcd get_rule_expr + 749
17  postgres            0x00010eb81688 get_rule_expr_paren + 136
18  postgres            0x00010eb7bf38 get_rule_expr + 9304
19  postgres            0x00010eb72ad5 deparse_expression_pretty + 149
20  postgres            0x00010eb73463 deparse_expression + 83
21  postgres            0x00010e68eaf1 show_plan_tlist + 353
22  postgres            0x00010e68adaf ExplainNode + 4991
23  postgres            0x00010e688b4b ExplainPrintPlan + 283
24  postgres            0x00010e68e1aa ProcessLogCurrentPlanInterrupt + 266
25  postgres            0x00010ea133bb ProcessInterrupts + 3435
26  postgres            0x00010e738c97 vacuum_delay_point + 55
27  postgres            0x00010e42bb4b ginInsertCleanup + 1531
28  postgres            0x00010e42d418 gin_clean_pending_list + 776
29  postgres            0x00010e74955a ExecInterpExpr + 2522
30  postgres            0x00010e7487e2 ExecInterpExprStillValid + 82
31  postgres            0x00010e7ae83b ExecEvalExprSwitchContext + 59
32  postgres            0x00010e7ae7be ExecProject + 78
33  postgres            0x00010e7ae4e9 ExecResult + 345
34  postgres            0x00010e764e42 ExecProcNodeFirst + 82
35  postgres            0x00010e75ccb2 ExecProcNode + 50
36  postgres            0x00010e758301 ExecutePlan + 193
37  postgres            0x00010e7581d1 standard_ExecutorRun + 609
38  auto_explain.so     0x00010f1df3a7 explain_ExecutorRun + 247
39  postgres            0x00010e757f3b ExecutorRun + 91
40  postgres            0x00010ea1cb49 P

Re: row filtering for logical replication

2021-09-08 Thread Peter Smith
Hi Euler,

As you probably know, the "base" Row-Filter patch v27-0001 got seriously
messed up by a recent commit that has lots of overlaps with your code
[1].

e.g. It broke trying to apply on HEAD as follows:

[postgres@CentOS7-x64 oss_postgres_RowFilter]$ git apply
v27-0001-Row-filter-for-logical-replication.patch
error: patch failed: src/backend/catalog/pg_publication.c:141
error: src/backend/catalog/pg_publication.c: patch does not apply
error: patch failed: src/backend/commands/publicationcmds.c:384
error: src/backend/commands/publicationcmds.c: patch does not apply
error: patch failed: src/backend/parser/gram.y:426
error: src/backend/parser/gram.y: patch does not apply
error: patch failed: src/include/catalog/pg_publication.h:83
error: src/include/catalog/pg_publication.h: patch does not apply
error: patch failed: src/include/nodes/nodes.h:490
error: src/include/nodes/nodes.h: patch does not apply
error: patch failed: src/include/nodes/parsenodes.h:3625
error: src/include/nodes/parsenodes.h: patch does not apply
error: patch failed: src/test/regress/expected/publication.out:158
error: src/test/regress/expected/publication.out: patch does not apply
error: patch failed: src/test/regress/sql/publication.sql:93
error: src/test/regress/sql/publication.sql: patch does not apply

~~

I know you are having discussions in the other (Col-Filtering) thread
about the names PublicationRelationInfo versus PublicationRelInfo etc,
but meanwhile, I am in need of a working "base" Row-Filter patch so
that I can post my incremental work, and so that the cfbot can
continue to run ok.

Since your v27 has been broken for several days already I've taken it
upon myself to re-base it. PSA.

v27-0001 --> v28-0001.

(AFAIK this new v28 applies ok and passes all regression and TAP
subscription tests)

Note: This v28 patch was made only so that I can (soon) post some
other small incremental patches on top of it, and also so the cfbot
will be able to run them ok. If you do not like it then just overwrite
it - I am happy to work with whatever latest "base" patch you provide
so long as it is compatible with the current master code.

--

[1] 
https://github.com/postgres/postgres/commit/0c6828fa987b791744b9c8685aadf1baa21f8977#

Kind Regards,
Peter Smith.
Fujitsu Australia


v28-0001-Row-filter-for-logical-replication.patch
Description: Binary data


RE: Allow escape in application_name

2021-09-08 Thread kuroda.hay...@fujitsu.com
Dear Horiguchi-san,

Thank you for reviewing! I attached the fixed version.

> Is "the previous comment" "the comment above"?

Yeah, fixed.

> + for (i = n -1; i >= 0; i--)
> 
> You might want a space between - and 1.

Fixed.

> +parse_application_name(StringInfo buf, const char *name)
> 
> The name seems a bit too generic as it is a function only for
> postgres_fdw.

Indeed. I renamed to parse_pgfdw_appname().

> + /* must be a '%', so skip to the next char */
> + p++;
> + if (*p == '\0')
> + break;  /* format error -
> ignore it */
> 
> I'm surprised by finding that undefined %-escapes and stray % behave
> differently between archive_command and log_line_prefix. I understand
> this behaves like the latter.

Indeed. pgarch_archiveXlog() treats undefined escapes as nothing special,
but log_line_prefix() stops parsing immediately.
There is no description of this in the docs.
I will not address it in this thread and will follow log_line_prefix(),
but I agree it is strange.

> + const char *username =
> MyProcPort->user_name;
> 
> I'm not sure but even if user_name doesn't seem to be NULL, don't we
> want to do the same thing with %u of log_line_prefix for safety?
> Namely, setting [unknown] if user_name is NULL or "". The same can be
> said for %d.

I think they will never be NULL in the current implementation,
but your suggestion is better. Checks were added for %a, %u and %d.

> + * process_padding --- helper function for processing the format
> + * string in log_line_prefix
> 
> Since this is no longer a static helper function for a specific
> function, the name and the comment should be more descriptive.
> 
> That being said, in the first place the function seems reducible
> almost to a call to atol. By a quick measurement the function is about
> 38% faster (0.024us/call for the function vs 0.039us/call for strtol), so I'm
> not sure we want to replace process_log_prefix_padding(), but we don't
> need to reuse the function in parse_application_name since it is
> called only once per connection.

My first impression is that they use that function
because they want to know the end of the sequence and do error handling,
but I agree it can be replaced by strtol().
I changed the code to use strtol() and restored process_log_prefix_padding().
What do you think? Does it follow your suggestion?
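
For what it's worth, the padding parse with strtol() boils down to something
like this (a hypothetical standalone helper, not the actual postgres_fdw code):

```
#include <stdlib.h>

/* Parse an optional padding number right after '%', advancing *fmt. */
static long
parse_escape_padding(const char **fmt)
{
    char       *endptr;
    long        padding = strtol(*fmt, &endptr, 10);

    if (endptr == *fmt)
        padding = 0;            /* no digits: no padding requested */
    *fmt = endptr;              /* resume scanning at the escape character */
    return padding;
}
```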

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v11_0002_allow_escapes.patch
Description: v11_0002_allow_escapes.patch


Re: Schema variables - new implementation for Postgres 15

2021-09-08 Thread Pavel Stehule
Hi

so 28. 8. 2021 v 11:57 odesílatel Gilles Darold  napsal:

> Hi,
>
> Review resume:
>
>
> This patch implements Schema Variables that are database objects that can
> hold a single or composite value following the data type used at variable
> declaration. Schema variables, like relations, exist within a schema and
> their access is controlled via GRANT and REVOKE commands. The schema
> variable can be created by the CREATE VARIABLE command, altered using ALTER
> VARIABLE and removed using DROP VARIABLE.
>
> The value of a schema variable is local to the current session. Retrieving
> a variable's value returns either a NULL or a default value, unless its
> value is set to something else in the current session with a LET command.
> The content of a variable is not transactional. This is the same as in
> regular variables in PL languages.
>
> Schema variables are retrieved by the SELECT SQL command. Their value is
> set with the LET SQL command. While schema variables share properties with
> tables, their value cannot be updated with an UPDATE command.
>
> The patch applies with the patch command without problems, and compilation
> reports no warnings or errors. Regression tests pass successfully using make
> check or make installcheck.
> It also includes all documentation and regression tests.
>
> Performance is close to that of plpgsql variable assignment, which is
> impressive:
>
> do $$
> declare var1 int ; i int;
> begin
>   for i in 1..100
>   loop
> var1 := i;
>   end loop;
> end;
> $$;
> DO
> Time: 71,515 ms
>
> CREATE VARIABLE var1 AS integer;
> do $$
> declare i int ;
> begin
>   for i in 1..100
>   loop
> let var1 = i;
>   end loop;
> end;
> $$;
> DO
> Time: 94,658 ms
>
> There is just one thing that puzzles me. We can use :
>
> CREATE VARIABLE var1 AS date NOT NULL;
> postgres=# SELECT var1;
> ERROR:  null value is not allowed for NOT NULL schema variable "var1"
>
> which I understand and is the right behavior. But if we use:
>
> CREATE IMMUTABLE VARIABLE var1 AS date NOT NULL;
> postgres=# SELECT var1;
> ERROR:  null value is not allowed for NOT NULL schema variable "var1"
> DETAIL:  The schema variable was not initialized yet.
> postgres=# LET var1=current_date;
> ERROR:  schema variable "var1" is declared IMMUTABLE
>
> It would probably be better to not allow NOT NULL when IMMUTABLE is used,
> because the variable cannot be used at all.  Also, IMMUTABLE
> without a DEFAULT value should probably be restricted, as it makes no sense. If
> the user wants the variable to be NULL he must use DEFAULT NULL. This is
> just a thought; the above error messages are explicit and the user can
> understand what wrong declaration he has made.
>

I wrote a check that disables this case.  Please see the attached patch. I
agree that this case is confusing, and it is better to disable it.

Regards

Pavel


> Apart from that, I think this patch is ready for committers, so if there is no
> other opinion in favor of restricting the use of IMMUTABLE with NOT NULL
> and DEFAULT, I will change the status to ready for committers.
>
> --
> Gilles Daroldhttp://www.darold.net/
>
>


schema-variables-20210908.patch.gz
Description: application/gzip


Re: Non-decimal integer literals

2021-09-08 Thread Vik Fearing
On 8/16/21 11:51 AM, Peter Eisentraut wrote:
> Here is a patch to add support for hexadecimal, octal, and binary
> integer literals:
> 
>     0x42E
>     0o112
>     0b100101
> 
> per SQL:202x draft.

Is there any hope of adding the optional underscores?  I see a potential
problem there as SELECT 1_a; is currently parsed as SELECT 1 AS _a; when
it should be parsed as SELECT 1_ AS a; or perhaps even as an error since
0x1_a would be a valid number with no alias.

(The standard does not allow identifiers to begin with _ but we do...)
-- 
Vik Fearing




Re: Add jsonlog log_destination for JSON server logs

2021-09-08 Thread Andrew Dunstan


On 9/8/21 2:58 AM, Michael Paquier wrote:
> On Wed, Sep 01, 2021 at 04:39:43PM -0400, Sehrope Sarkuni wrote:
>> That makes the elog.c changes in the JSON logging patch minimal as all it's
>> really doing is invoking the new write_jsonlog(...) function.
> Looking at 0001, to do things in order.
>
>> @@ -46,8 +46,8 @@ typedef struct
>>  charnuls[2];/* always \0\0 */
>>  uint16  len;/* size of this chunk (counts 
>> data only) */
>>  int32   pid;/* writer's pid */
>> -charis_last;/* last chunk of message? 't' 
>> or 'f' ('T' or
>> - * 'F' for CSV 
>> case) */
>> +int32   dest;   /* log destination */
>> +charis_last;/* last chunk of message? 't' or 'f'*/
>>  chardata[FLEXIBLE_ARRAY_MEMBER];/* data payload starts 
>> here */
>>  } PipeProtoHeader;
> Making PipeProtoHeader larger is not free, and that could penalize
> workloads with a lot of short messages and many backends as the
> syslogger relies on pipes with sync calls.  Why not switching is_last
> to bits8 flags instead?  That should be enough for the addition of
> JSON.  3 bits are enough at the end: one to know if it is the last
> chunk of message, one for CSV and one for JSON.



Yeah. A very simple change would be to use two different values for json
(say 'y' and 'n'). A slightly more principled scheme might use the top
bit for the end marker and the bottom 3 bits for the dest type (so up to
8 types possible), with the rest available for future use.
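
For example, the flag layout could look like this sketch (the macro names are
illustrative only, not from any committed patch):

```
/* Hypothetical flag layout for a bits8 field replacing is_last. */
#define PIPE_PROTO_IS_LAST      0x80    /* top bit: last chunk of the message */
#define PIPE_PROTO_DEST_MASK    0x07    /* bottom 3 bits: destination type */

#define PIPE_PROTO_DEST_STDERR  0x00
#define PIPE_PROTO_DEST_CSVLOG  0x01
#define PIPE_PROTO_DEST_JSONLOG 0x02

/* e.g. the final chunk of a JSON log message: */
/*   flags = PIPE_PROTO_IS_LAST | PIPE_PROTO_DEST_JSONLOG; */
```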


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Parallelize correlated subqueries that execute within each worker

2021-09-08 Thread James Coleman
On Tue, Sep 7, 2021 at 11:06 AM Zhihong Yu  wrote:
>
>
>
> On Tue, Sep 7, 2021 at 6:17 AM James Coleman  wrote:
>>
>> On Wed, Sep 1, 2021 at 7:06 AM Daniel Gustafsson  wrote:
>> >
>> > > On 7 May 2021, at 18:30, James Coleman  wrote:
>> >
>> > > ..here we are now, and I finally have this patch cleaned up
>> > > enough to share.
>> >
>> > This patch no longer applies to HEAD, can you please submit a rebased 
>> > version?
>>
>> See attached.
>>
>> Thanks,
>> James
>
> Hi,
> For v2-0002-Parallel-query-support-for-basic-correlated-subqu.patch :
>
> +* is when we're going to execute multiple partial parths in parallel
>
> parths -> paths
>
> if (index->amcanparallel &&
> -   rel->consider_parallel && outer_relids == NULL &&
> -   scantype != ST_BITMAPSCAN)
> +   rel->consider_parallel && outer_relids == NULL &&
> +   scantype != ST_BITMAPSCAN)
>
> the change above seems unnecessary since the first line of if condition 
> doesn't change.
> Similar comment for the next hunk.
>
> +* It's not a partial path; it'a a full path that is executed as 
> a subquery.
>
> it'a a -> it's a
>
> +   /* rel->consider_parallel_rechecking_params = false; */
> +   /* rel->partial_pathlist = NIL; */
>
> The commented code can be taken out.

Thanks for taking a look at this.

See updated patch series attached.

James Coleman


v3-0001-Allow-parallel-LATERAL-subqueries-with-LIMIT-OFFS.patch
Description: Binary data


v3-0002-Parallel-query-support-for-basic-correlated-subqu.patch
Description: Binary data


v3-0003-Other-places-to-consider-for-completeness.patch
Description: Binary data


Re: drop tablespace failed when location contains .. on win32

2021-09-08 Thread Andrew Dunstan


On 9/8/21 6:16 AM, wangsh.f...@fujitsu.com wrote:
> Hi,
>
> I found a problem related to tablespaces on win32 (Server 2019).
>
>> postgres=# create tablespace tbs location 
>> 'C:\Users\postgres\postgres_install\aa\..\aa';
>> CREATE TABLESPACE
>> postgres=# create table tbl(col int) tablespace tbs;
>> ERROR:  could not stat directory "pg_tblspc/16384/PG_15_202109061/12754": 
>> Invalid argument
>> postgres=# drop tablespace tbs;
>> WARNING:  could not open directory "pg_tblspc/16384/PG_15_202109061": No 
>> such file or directory
>> ERROR:  could not stat file "pg_tblspc/16384": Invalid argument
> I find that canonicalize_path() only removes a trailing '..'; in this case,
> the '..' is not removed, and
> pgsymlink() succeeds.


That seems like a bug. It's not very canonical :-)


cheers


andrew

-- 

Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: On login trigger: take three

2021-09-08 Thread Daniel Gustafsson
> On 19 Jul 2021, at 15:25, Greg Nancarrow  wrote:

> Attached a rebased patch (minor updates to the test code).

I took a look at this, and while I like the proposed feature I think the patch
needs a bit more work.


+END IF;
+
+-- 2) Initialize some user session data
+   CREATE TEMP TABLE session_storage (x float, y integer);
The example in the documentation uses a mix of indentation styles, neither of which is
the 2-space indentation used elsewhere in the docs.


+   /* Get the command tag. */
+   tag = parsetree ? CreateCommandTag(parsetree) : CMDTAG_CONNECT;
This hardcodes the knowledge that currently only this trigger won't have a
parsetree, and sets the tag accordingly.  This should instead check the
event and set CMDTAG_UNKNOWN if it isn't the expected one.


+   /* database has on-login triggers */
+   booldathaslogontriggers;
This patch uses three different names for the same thing: client connection,
logon trigger and login trigger.  Since this trigger only fires after successful
authentication, it's not accurate to name it client connection, as that implies
it runs on connections rather than logins.  The nomenclature used in the
tree is "login", so the patch should be adjusted everywhere to use that.


+/* Hook for plugins to get control at start of session */
+client_connection_hook_type client_connection_hook = EventTriggerOnConnect;
I don't see the reason for adding core functionality by hooks.  Having a hook
might be a useful thing on its own (to be discussed in a new thread, not hidden
here), but core functionality should not depend on it IMO.


+   {"enable_client_connection_trigger", PGC_SU_BACKEND, DEVELOPER_OPTIONS,
+   gettext_noop("Enables the client_connection event trigger."),
+   gettext_noop("In case of errors in the ON client_connection 
EVENT TRIGGER procedure, "
 ..and..
+   /*
+* Try to ignore error for superuser to make it possible to login even 
in case of errors
+* during trigger execution
+*/
+   if (!is_superuser)
+   PG_RE_THROW();
This patch adds two ways for superusers to bypass this event trigger in case of
it being faulty, but for every other event trigger we've documented that one
should restart in single-user mode and fix it there.  Why does this need to be different?
This clearly has a bigger chance of being a footgun but I don't see that as a
reason to add a GUC and a bypass that other footguns lack.


+   elog(NOTICE, "client_connection trigger failed with message: %s", 
error->message);
Calling elog() in a PG_CATCH() block isn't allowed, is it?


+   /* Runtlist is empty: clear dathaslogontriggers flag
+*/
s/Runtlist/Runlist/ and also commenting style.


The logic for resetting the pg_database flag when firing event trigger in case
the trigger has gone away is messy and a problem waiting to happen IMO.  For
normal triggers we don't bother with that on the grounds of it being racy, and
instead perform relhastrigger removal during VACUUM.  Another approach is using
a counter as proposed upthread, since updating that can be made safer.  The
current coding also isn't instructing concurrent backends to perform a relcache
rebuild.

--
Daniel Gustafsson   https://vmware.com/





Re: Non-decimal integer literals

2021-09-08 Thread Tom Lane
Vik Fearing  writes:
> On 8/16/21 11:51 AM, Peter Eisentraut wrote:
>> Here is a patch to add support for hexadecimal, octal, and binary
>> integer literals:
>> 
>>     0x42E
>>     0o112
>>     0b100101
>> 
>> per SQL:202x draft.

> Is there any hope of adding the optional underscores?  I see a potential
> problem there as SELECT 1_a; is currently parsed as SELECT 1 AS _a; when
> it should be parsed as SELECT 1_ AS a; or perhaps even as an error since
> 0x1_a would be a valid number with no alias.

Even without that point, this patch *is* going to break valid queries,
because every one of those cases is a valid number-followed-by-identifier
today, e.g.

regression=# select 0x42e;
 x42e 
------
    0
(1 row)

AFAIR we've seen exactly zero field demand for this feature,
so I kind of wonder why we're in such a hurry to adopt something
that hasn't even made it past draft-standard status.

regards, tom lane




Re: Possible missing segments in archiving on standby

2021-09-08 Thread Fujii Masao



On 2021/09/08 16:40, Kyotaro Horiguchi wrote:

No. The discussion taken there is not about permanently missing .ready
files, but about .ready files created out-of-order.  So I don't think
the outcome from the thread does *fix* this issue.


Hmm...


I don't think we need that degree of perfection at all for the case
where some archive-related parameters are changed after a
crash. Anyway we need to take a backup after that and at least all
segments required for the backup will be properly archived.  The
segments up to the XLOG_SWITCH segment are harmless garbage, or a bit
of food for disk.


So we have probably reached a consensus that something like the attached patch
(XLogArchiveCheckDone_walfile_xlog_switch.patch) is enough for the corner
case where walreceiver fails to create the .ready file of a WAL file including
XLOG_SWITCH?


Sounds convincing.  Ok, I agree to that.


So barring any objection, I will commit the patch
and back-patch it to all supported versions.

walreceiver_notify_archive_soon_v5.patch
walreceiver_notify_archive_soon_v5_pg14-13.patch
walreceiver_notify_archive_soon_v5_pg12-11.patch
walreceiver_notify_archive_soon_v5_pg10.patch
walreceiver_notify_archive_soon_v5_pg96.patch

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
diff --git a/src/backend/access/transam/xlog.c 
b/src/backend/access/transam/xlog.c
index e51a7a749d..6046e24f0f 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7392,6 +7392,27 @@ StartupXLOG(void)
/* Handle interrupt signals of startup process 
*/
HandleStartupProcInterrupts();
 
+   /*
+* In standby mode, create an archive 
notification file of a
+* WAL segment if it includes an XLOG_SWITCH 
record and its
+* notification file has not been created yet. 
This is
+* necessary to handle the corner case that 
walreceiver may
+* fail to create such notification file if it 
exits after it
+* receives XLOG_SWITCH record but while it's 
receiving the
+* remaining bytes in the segment. Without this 
handling, WAL
+* archiving of the segment will be delayed 
until subsequent
+* checkpoint creates its notification file 
when removing it
+* even though it can be archived soon.
+*/
+   if (StandbyMode && record->xl_rmid == 
RM_XLOG_ID &&
+   (record->xl_info & ~XLR_INFO_MASK) == 
XLOG_SWITCH)
+   {
+   char
xlogfilename[MAXFNAMELEN];
+
+   XLogFileName(xlogfilename, curFileTLI, 
readSegNo, wal_segment_size);
+   XLogArchiveCheckDone(xlogfilename);
+   }
+
/*
 * Pause WAL replay, if requested by a 
hot-standby session via
 * SetRecoveryPause().
diff --git a/src/backend/replication/walreceiver.c 
b/src/backend/replication/walreceiver.c
index 75ec985953..2818bf5e25 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -118,6 +118,7 @@ static void WalRcvDie(int code, Datum arg);
 static void XLogWalRcvProcessMsg(unsigned char type, char *buf, Size len);
 static void XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr);
 static void XLogWalRcvFlush(bool dying);
+static void XLogWalRcvClose(XLogRecPtr recptr);
 static void XLogWalRcvSendReply(bool force, bool requestReply);
 static void XLogWalRcvSendHSFeedback(bool immed);
 static void ProcessWalSndrMessage(XLogRecPtr walEnd, TimestampTz sendTime);
@@ -920,46 +921,16 @@ XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr)
{
int segbytes;
 
-   if (recvFile < 0 || !XLByteInSeg(recptr, recvSegNo))
-   {
-   booluse_existent;
-
-   /*
-* fsync() and close current file before we switch to 
next one. We
-* would otherwise have to reopen this file to fsync it 
later
-*/
-   if (recvFile >= 0)
-   {
-   charxlogfname[MAXFNAMELEN];
+   /* Close the current segment if it's completed */
+   if (recvFile >= 0 && !XLByteInSeg(recptr, recvSegNo))
+   XLogWalRcvClose(recptr);
 
-  

pg_walinspect - a new extension to get raw WAL data and WAL stats

2021-09-08 Thread Bharath Rupireddy
Hi,

While working on one of our internal features, we found that it is a
bit difficult for a normal user to run pg_waldump to get WAL info and
stats of a running postgres database instance in the cloud. Users,
DBAs, or developers would often want to get and analyze the following:
1) raw WAL record associated with an LSN or raw WAL records between a
start LSN and end LSN for feeding to some other functionality
2) WAL statistics associated with an LSN or between a start LSN and end
LSN for debugging or analytical purposes. The WAL stats are the number
of inserts, updates, deletes, index inserts, commits, checkpoints,
aborts, WAL record sizes, FPI (Full Page Image) count, etc., which is
basically everything that we get with the pg_waldump --stats option plus
some other information that we may feel will be useful.

An available option is to use pg_waldump, a standalone program that
emits human-readable WAL info to standard output or a file. This
works well when users have access to the system on which postgres is
running. But for a postgres database instance running in a cloud
environment, starting pg_waldump, then fetching and presenting its
output to the users in a structured way, may be a bit hard to do.

How about we create a new extension, called pg_walinspect (analogous
to the pageinspect extension), with a bunch of SQL-callable functions that
get the raw WAL records or stats out of a running postgres database
instance in a more structured way that is easily consumable by all
users, DBAs, or developers? We could also provide this functionality
in core postgres (in xlogfuncs.c) instead of a new extension,
but we would like it to be pluggable so that the functions are
used only if required.

[1] shows a rough sketch of the functions that the new pg_walinspect
extension can provide. These are not exhaustive; we can
add/remove/modify them as we move forward.

We would like to invite more thoughts from the hackers.

Credits: Thanks to Satya Narlapuram, Chen Liang (for some initial
work), Tianyu Zhang and Ashutosh Sharma (copied in cc) for internal
discussions.

[1]
a)  bytea pg_get_wal_record(pg_lsn lsn); and bytea
pg_get_wal_record(pg_lsn lsn, text wal_dir); - Returns a single row of
raw WAL record of bytea type. WAL data is read from pg_wal or
specified wal_dir directory.

b)  bytea[] pg_get_wal_record(pg_lsn start_lsn, pg_lsn end_lsn); and
bytea[] pg_get_wal_record(pg_lsn start_lsn, pg_lsn end_lsn, text
wal_dir); - Returns multiple rows of raw WAL records of bytea type,
one row per each WAL record. WAL data is read from pg_wal or specified
wal_dir directory.

CREATE TYPE walinspect_stats_type AS (stat1, stat2, stat3 …. statN);
c)  walinspect_stats_type  pg_get_wal_stats(pg_lsn lsn); and
walinspect_stats_type  pg_get_wal_stats(pg_lsn lsn, text wal_dir); -
Returns a single row of WAL record’s stats of walinspect_stats_type
type. WAL data is read from pg_wal or specified wal_dir directory.

d)  walinspect_stats_type[] pg_get_wal_stats(pg_lsn start_lsn, pg_lsn
end_lsn); and walinspect_stats_type[] pg_get_wal_stats(pg_lsn
start_lsn, pg_lsn end_lsn, text wal_dir); - Returns multiple rows of
WAL record stats of walinspect_stats_type type, one row per each WAL
record. WAL data is read from pg_wal or specified wal_dir directory.

e)  walinspect_stats_type  pg_get_wal_stats(bytea wal_record); -
Returns a single row of provided WAL record (wal_record) stats.

f)  walinspect_stats_type pg_get_wal_stats_aggr(pg_lsn start_lsn,
pg_lsn end_lsn); and walinspect_stats_type
pg_get_wal_stats_aggr(pg_lsn start_lsn, pg_lsn end_lsn, text wal_dir);
- Returns a single row of aggregated stats of all the WAL records
between start_lsn and end_lsn. WAL data is read from pg_wal or
specified wal_dir directory.

CREATE TYPE walinspect_lsn_range_type AS (pg_lsn start_lsn, pg_lsn end_lsn);
g)  walinspect_lsn_range_type   walinspect_get_lsn_range(text
wal_dir); - Returns a single row of start LSN and end LSN of the WAL
records available under pg_wal or specified wal_dir directory.
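To illustrate the intended usage, a session with the proposed extension could
look roughly like this (function names and signatures as sketched above; the
LSN values and the wal_dir path are just placeholders):

SELECT pg_get_wal_record('0/17331E8');                  -- one raw record
SELECT pg_get_wal_record('0/17331E8', '0/1800000');     -- raw records in a range
SELECT pg_get_wal_stats('0/17331E8', '0/1800000');      -- per-record stats
SELECT pg_get_wal_stats_aggr('0/17331E8', '0/1800000'); -- aggregated stats
SELECT walinspect_get_lsn_range('/path/to/wal_dir');    -- available LSN range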

Regards,
Bharath Rupireddy.




Re: On login trigger: take three

2021-09-08 Thread Pavel Stehule
Hi


> +   {"enable_client_connection_trigger", PGC_SU_BACKEND,
> DEVELOPER_OPTIONS,
> +   gettext_noop("Enables the client_connection event
> trigger."),
> +   gettext_noop("In case of errors in the ON
> client_connection EVENT TRIGGER procedure, "
>  ..and..
> +   /*
> +* Try to ignore error for superuser to make it possible to login
> even in case of errors
> +* during trigger execution
> +*/
> +   if (!is_superuser)
> +   PG_RE_THROW();
> This patch adds two ways for superusers to bypass this event trigger in
> case of
> it being faulty, but for every other event trigger we've documented to
> restart
> in single-user mode and fixing it there.  Why does this need to be
> different?
> This clearly has a bigger chance of being a footgun but I don't see that
> as a
> reason to add a GUC and a bypass that other footguns lack.
>
>
>
When event triggers were introduced, managed services were not as widely
used as they are now. When we discussed this feature we thought about
environments where users have no superuser rights and no possibility to
go to single-user mode.

Personally, I would prefer to introduce some bypass for event triggers
instead of removing the bypass from login triggers.

Regards

Pavel


Re: use AV worker items infrastructure for GIN pending list's cleanup

2021-09-08 Thread Jaime Casanova
On Mon, May 17, 2021 at 01:46:37PM +0900, Masahiko Sawada wrote:
> On Mon, Apr 5, 2021 at 3:31 PM Jaime Casanova
>  wrote:
> >
> > Hi,
> >
> > When AV worker items were introduced 4 years ago, it was suggested that
> > they could be used for other things, like cleaning the pending list of a
> > GIN index when it reaches gin_pending_list_limit, instead of making a
> > user-visible operation pay the price.
> >
> > That never happened though. So, here is a little patch for that.
> 
> Thank you for working on this.
> 
> I like the idea of cleaning the GIN pending list using an autovacuum
> work item. But with the patch, we request and skip the pending list
> cleanup if the pending list size exceeds gin_pending_list_limit during
> insertion. But autovacuum work items are executed after an autovacuum
> runs. So if many insertions happen before executing the autovacuum
> work item, we will end up greatly exceeding the threshold
> (gin_pending_list_limit) and registering the same work item again and
> again. Maybe we need something like a soft limit and a hard limit?
> That is, if the pending list size exceeds the soft limit, we request
> the work item. OTOH, if it exceeds the hard limit
> (gin_pending_list_limit) we cleanup the pending list before insertion.
> We might also need to have autovacuum work items ignore the work item
> if the same work item with the same arguments is already registered.
> In addition to that, I think we should avoid the work item for
> cleaning the pending list from being executed if an autovacuum runs on
> the gin index before executing the work item.
> 

Thanks for your comments on this. I have been working on a rebased
version, but ENOTIME right now. 

I will mark this one as "Returned with feedback" and resubmit it in
November.

-- 
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL




Re: Postgres perl module namespace

2021-09-08 Thread Mark Dilger



> On Sep 7, 2021, at 9:00 PM, Noah Misch  wrote:
> 
> I wondered about using PGXS:: as the namespace for all these modules

That immediately suggests perl modules wrapping C code, which is misleading for 
these.  See `man perlxstut`

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Migration from PostgreSQL 8.3 to PostgreSQL 9.3

2021-09-08 Thread oswaldo . bregnoles
Good morning,

I am having a problem performing the migration using the pg_upgrade
command.

When I run it, it shows me an error about an incompatibility in the
date/time format.

With the pg_controldata command I can see that on the 8.3 cluster the
date/time type is: floating-point numbers.

On the 9.3 cluster the date/time type is: 64-bit integers.

I am running these commands:

1 - pg_upgrade.exe -u postgres -d "F:\Arquivos de Programas
(x86)\PostgreSQL\8.3\data" -D "F:\PostgreSQL\9.3\data" -b "F:\Arquivos
de Programas (x86)\PostgreSQL\8.3\bin" -B "F:\PostgreSQL\9.3\bin"

Error: mapped win32 error code 2 to 2
Old and new pg_controldata date/time storage types do not
match.

 You will need to rebuild the new server with configure
option
--disable-integer-datetimes or get server binaries built
with those options.

2 - pg_upgrade.exe -u postgres --disable-integer-datetimes -d
"F:\Arquivos de Programas (x86)\PostgreSQL\8.3\data" -D
"F:\PostgreSQL\9.3\data" -b "F:\Arquivos de Programas
(x86)\PostgreSQL\8.3\bin" -B "F:\PostgreSQL\9.3\bin"

Error:  pg_upgrade.exe: illegal option -- disable-integer-datetimes

 Try "pg_upgrade --help" for more information.

Could you please help me figure out how to resolve this problem so that
I can deploy the new version?

Thank you, and I look forward to your reply.

-- 
Regards

Oswaldo
Planin Sistemas

Re: Bug in query rewriter - hasModifyingCTE not getting set

2021-09-08 Thread Tom Lane
Greg Nancarrow  writes:
> On Wed, Sep 8, 2021 at 8:00 AM Tom Lane  wrote:
>> Now, we could potentially make this work if we wrote code to run
>> through the copied rtable entries (recursively) and increment the
>> appropriate ctelevelsup fields by one.  That would essentially
>> have to be a variant of IncrementVarSublevelsUp that *only* acts
>> on ctelevelsup and not other level-dependent fields.  That's
>> what I meant when I spoke of moving mountains: the amount of code
>> that would need to go into this seems out of all proportion to
>> the value.  I think we should just throw an error, instead.
>> At least till such time as we see actual field complaints.

> [I don't think Tsunakawa-san will be responding to this any time soon]

Oh!  I'd not realized that he'd dropped out of the community, but
checking my mail folder, I don't see any messages from him in months
... and his email address is bouncing, too.  Too bad.

> I proposed a patch for this issue in a separate thread:
> https://www.postgresql.org/message-id/CAJcOf-f68DT=26YAMz_i0+Au3TcLO5oiHY5=fl6sfuits6r...@mail.gmail.com

Right, that one looks like an appropriate amount of effort
(at least till someone gets way more excited about the case
than I am).  I will mark this CF item Returned With Feedback
and go see about that one.

regards, tom lane




Re: Avoid stuck of pgbench due to skipped transactions

2021-09-08 Thread Fujii Masao



On 2021/09/07 18:24, Fabien COELHO wrote:


Hello Fujii-san,


Stop counting skipped transactions under -T as soon as the timer is exceeded. 
Because otherwise it can take a very long time to count all of them especially 
when quite a lot of them happen with unrealistically high rate setting in -R, 
which would prevent pgbench from ending immediately. Because of this behavior, 
note that there is no guarantee that all skipped transactions are counted under 
-T though there is under -t. This is OK in practice because it's very unlikely 
to happen with realistic setting.


Ok, I find this text quite clear.


Thanks for the check! So attached is the updated version of the patch.



One question is; which version do we want to back-patch to?


If we consider it a "very minor bug fix" which is triggered by somewhat unrealistic
options, I'd say 14 & dev, or possibly only dev.


Agreed. Since it's hard to imagine the issue happening in practice,
we don't need to bother back-patching it to the older stable branches.
So I'm thinking of committing the patch to 15dev and 14.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 4c9952a85a..433abd954b 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3233,31 +3233,36 @@ advanceConnectionState(TState *thread, CState *st, 
StatsData *agg)
/*
 * If --latency-limit is used, and this slot is 
already late
 * so that the transaction will miss the 
latency limit even if
-* it completed immediately, skip this time 
slot and schedule
-* to continue running on the next slot that 
isn't late yet.
-* But don't iterate beyond the -t limit, if 
one is given.
+* it completed immediately, skip this time 
slot and loop to
+* reschedule.
 */
if (latency_limit)
{
pg_time_now_lazy(&now);
 
-   while (thread->throttle_trigger < now - 
latency_limit &&
-  (nxacts <= 0 || st->cnt < 
nxacts))
+   if (thread->throttle_trigger < now - 
latency_limit)
{
processXactStats(thread, st, 
&now, true, agg);
-   /* next rendez-vous */
-   thread->throttle_trigger +=
-   
getPoissonRand(&thread->ts_throttle_rs, throttle_delay);
-   st->txn_scheduled = 
thread->throttle_trigger;
-   }
 
-   /*
-* stop client if -t was exceeded in 
the previous skip
-* loop
-*/
-   if (nxacts > 0 && st->cnt >= nxacts)
-   {
-   st->state = CSTATE_FINISHED;
+   /*
+* Finish client if -T or -t 
was exceeded.
+*
+* Stop counting skipped 
transactions under -T as soon
+* as the timer is exceeded. 
Because otherwise it can
+* take a very long time to 
count all of them
+* especially when quite a lot 
of them happen with
+* unrealistically high rate 
setting in -R, which
+* would prevent pgbench from 
ending immediately.
+* Because of this behavior, 
note that there is no
+* guarantee that all skipped 
transactions are counted
+* under -T though there is 
under -t. This is OK in
+* practice because it's very 
unlikely to happen with
+* realistic setting.
+*/
+   if (timer_exceeded || (nxacts > 
0 && st->cnt >= nxacts))
+ 

Re: Add option --drop-cascade for pg_dump/restore

2021-09-08 Thread Wu Haotian
Hi,
here's the rebased patch.


0005-pg_dump-restore-add-drop-cascade-option.patch
Description: Binary data


Re: Use "superuser" instead of "super user" in code comments

2021-09-08 Thread Daniel Gustafsson
> On 7 Sep 2021, at 15:48, Bharath Rupireddy 
>  wrote:
> 
> On Tue, Sep 7, 2021 at 6:40 PM Daniel Gustafsson  wrote:
>> 
>>> On 7 Sep 2021, at 14:44, Bharath Rupireddy 
>>>  wrote:
>> 
>>> It seems like we use "superuser" as a standard term across the entire
>>> code base i.e. error messages, docs, code comments. But there are
>>> still a few code comments that use the term "super user". Can we
>>> replace those with "superuser"? Attaching a tiny patch to do that.
>> 
>> Good catch, superuser is the term we should use for this.  There is one
>> additional “super user” in src/test/regress/sql/conversion.sql (and its
>> corresponding resultfile) which can be included in this.  Unless there are
>> objections I’ll apply this with the testfile fixup.
> 
> Thanks for picking this up. Here's v2 including the change in
> conversion.sql and .out.

Done, thanks!

--
Daniel Gustafsson   https://vmware.com/





Re: PoC Refactor AM analyse API

2021-09-08 Thread Jaime Casanova
On Fri, Feb 19, 2021 at 12:06:12PM +1000, Denis Smirnov wrote:
> Thanks for your review, Heikki.
> 
> I have made the changes you have requested.
> 
> 1. All modifications interconnected with column projection were reverted 
> (they should be added in https://commitfest.postgresql.org/31/2922 if the 
> current patch would be merged one day).
> 2. I have returned PROGRESS_ANALYZE_* states.
> 3. qsort() was moved into heapam_acquire_sample_rows(). Also a comment was 
> added, that the acquire_sample_rows() AM function must return the tuples in a 
> physical table order.
> 4. table_beginscan_analyze() was removed as a redundant function.
> 5. acquire_sample_rows() comment about reservoir was changed.
> 

Hi Denis,

This doesn't apply anymore because of c6fc50c, can you resubmit a new
patch?

Please note that the patch must be submitted with a .patch extension
instead of .txt, that way the CI at http://commitfest.cputube.org/ will
be able to execute automatic tests on it.

Regards,

-- 
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL




Re: LogwrtResult contended spinlock

2021-09-08 Thread Jaime Casanova
On Tue, Feb 02, 2021 at 08:19:19PM -0300, Alvaro Herrera wrote:
> Hello,
> 
> So I addressed about half of your comments in this version merely by
> fixing silly bugs.  The problem I had which I described as
> "synchronization fails" was one of those silly bugs.
> 

Hi Álvaro,

Are we waiting for another version of the patch based on Andres'
comments? Or is this version good enough for testing?

BTW, current patch still applies cleanly.

regards,

-- 
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL




Re: The Free Space Map: Problems and Opportunities

2021-09-08 Thread Robert Haas
On Tue, Sep 7, 2021 at 9:37 PM Peter Geoghegan  wrote:
> What I really like about the idea of doing the work in the foreground
> (when it's clearly not going to be too much of a latency hit) is that
> it would act as a natural form of backpressure.

Right, that's fair. But, prune-before-evict can be done in the first
instance by the bgwriter and then by the foreground process doing the
eviction if the bgwriter doesn't keep up. At least, assuming we had a
bgwriter that worked, which I think right now we don't, but fixing
that might be a good idea anyway.

> I didn't mean to suggest that it had to happen in perfect lockstep.
> But I do think that it should be pretty close to perfect. What I
> actually do right now is prune an open page when it *appears* to be
> full inside the loop in RelationGetBufferForTuple().

That seems like a good idea.

> As crude as that
> is, it sets hint bits and prunes away aborted transactions in mostly
> the right way, at least as far as TPC-C is concerned. (This only works
> because it fits into the wider framework of my much larger patch,
> which introduces ownership semantics, open and closed pages, empty
> page freelists, etc.)

I don't know, I'm not really convinced that "much larger patches" that
change a lot of loosely related things all at once are good for the
project. It seems to me that there's a reasonably good chance of
replacing an annoying set of problems that existing PostgreSQL users
have worked around to some degree, knowing or unknowingly, with a
different annoying set of problems that may cause fewer or more
problems in practice. Sometimes there's no way to improve something
short of a giant project that changes a lot of things at the same
time, but a series of incremental changes is a lot less risky.

> It seems to me that this leaves one harder question unanswered: at
> what point does a "medium sized" transaction become so large that it
> just doesn't make sense to do either? What's the crossover point at
> which background processing and foreground processing like this should
> be assumed to be not worth it? That I won't speculate about just yet.
> I suspect that at some point it really does make sense to leave it all
> up to a true table-level batch operation, like a conventional VACUUM.

I doubt it makes sense to define a limit here explicitly. At some
point strategies will naturally start to fail, e.g. prune-before-evict
won't work once the operation becomes large enough that pages have to
be evicted while the transaction is still running.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: VARDATA_COMPRESSED_GET_COMPRESS_METHOD comment?

2021-09-08 Thread Christoph Berg
Re: Robert Haas
> But I agree with you that referring to the argument to
> VARDATA_COMPRESSED_GET_EXTSIZE or
> VARDATA_COMPRESSED_GET_COMPRESS_METHOD as an "external compressed
> Datum" doesn't seem quite right. It is compressed, but it is not
> external, at least in the sense that I understand that term.

How about "compressed-in-line Datum" like on the comment 5 lines above?

/* caution: this will not work on an external or compressed-in-line Datum */
/* caution: this will return a possibly unaligned pointer */
#define VARDATA_ANY(PTR) \
 (VARATT_IS_1B(PTR) ? VARDATA_1B(PTR) : VARDATA_4B(PTR))

/* Decompressed size and compression method of an external compressed Datum */
#define VARDATA_COMPRESSED_GET_EXTSIZE(PTR) \
(((varattrib_4b *) (PTR))->va_compressed.va_tcinfo & VARLENA_EXTSIZE_MASK)
#define VARDATA_COMPRESSED_GET_COMPRESS_METHOD(PTR) \
(((varattrib_4b *) (PTR))->va_compressed.va_tcinfo >> VARLENA_EXTSIZE_BITS)

This "external" there cost me about one hour of extra poking around
until I realized this is actually the macro I wanted.

-> /* Decompressed size and compression method of a compressed-in-line Datum */

Christoph




Re: Fix erroneous parallel execution when modifying CTE is present in rewritten query

2021-09-08 Thread Tom Lane
Greg Nancarrow  writes:
> [ v1-0001-Propagate-CTE-property-flags-when-copying-a-CTE-list.patch ]

Pushed with a couple of adjustments:

* I rewrote the comment, mostly so as to include an explanation of how
the error could be removed, in case anyone ever wants to go to the
trouble.

* The existing test case can be fixed up without fundamentally changing
what it's testing, by replacing INSERT...SELECT with INSERT...VALUES.
(That should likely also be our first suggestion to any complainers.)

Thanks for the patch!

regards, tom lane




Re: Migration from PostgreSQL 8.3 to PostgreSQL 9.3

2021-09-08 Thread Ranier Vilela
On Wed, Sep 8, 2021 at 11:21, 
wrote:

> Good morning,
>
> I am having a problem performing the migration using the pg_upgrade
> command.
>
> When I run it, it shows me an error about an incompatibility in the
> date/time format.
>
> With the pg_controldata command I can see that on the 8.3 cluster the
> date/time type is: floating-point numbers.
>
> On the 9.3 cluster the date/time type is: 64-bit integers.
>
> I am running these commands:
>
> 1 - pg_upgrade.exe -u postgres -d "F:\Arquivos de Programas
> (x86)\PostgreSQL\8.3\data" -D "F:\PostgreSQL\9.3\data" -b "F:\Arquivos de
> Programas (x86)\PostgreSQL\8.3\bin" -B "F:\PostgreSQL\9.3\bin"
>
> Error: mapped win32 error code 2 to 2
> Old and new pg_controldata date/time storage types do not
> match.
>
>  You will need to rebuild the new server with configure option
> --disable-integer-datetimes or get server binaries built with
> those options.
>
> 2 - pg_upgrade.exe -u postgres --disable-integer-datetimes -d "F:\Arquivos
> de Programas (x86)\PostgreSQL\8.3\data" -D "F:\PostgreSQL\9.3\data" -b
> "F:\Arquivos de Programas (x86)\PostgreSQL\8.3\bin" -B
> "F:\PostgreSQL\9.3\bin"
>
> Error:  pg_upgrade.exe: illegal option -- disable-integer-datetimes
>
>  Try "pg_upgrade --help" for more information.
>
> Could you please help me figure out how to resolve this problem so that I
> can deploy the new version?
>
> Thank you, and I look forward to your reply.
>
Hello Oswaldo,
Sorry, but this list is not the appropriate place for these questions.
Please send them to:
https://www.postgresql.org/list/pgsql-general/

Ranier Vilela


Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2021-09-08 Thread Robert Haas
On Fri, Sep 3, 2021 at 5:54 PM Andres Freund  wrote:
> > I think we already have such a code in multiple places where we bypass the
> > shared buffers for copying the relation
> > e.g. index_copy_data(), heapam_relation_copy_data().
>
> That's not at all comparable. We hold an exclusive lock on the relation at
> that point, and we don't have a separate implementation of reading tuples from
> the table or something like that.

I don't think there's a way to do this that is perfectly clean, so the
discussion here is really about finding the least unpleasant
alternative. I *really* like the idea of using pg_class to figure out
what relations to copy. As far as I'm concerned, pg_class is the
canonical list of what's in the database, and to the extent that the
filesystem happens to agree, that's good luck. From that perspective,
using the filesystem to figure out what to copy is by definition a
hack.

Now, having to use dedicated tuple-reading code is also a hack, but to
me that's largely an accident of questionable design decisions
elsewhere. You can't read a buffer with just the minimal amount of
information that you need to read a buffer; you have to have a
relcache entry, so we have things like ReadBufferWithoutRelcache and
CreateFakeRelcacheEntry. It's a little crazy to me that someone saw
that ReadBuffer() needed a thing which some callers might not have and
instead of saying "hmm, maybe we ought to change the arguments so that
anyone with enough information to call this function can do so," they
said "hmm, let's create a fake object that is not really the same as a
real one but good enough to fool the function into doing the right
thing, probably." I think the code layering here is just flat-out
broken and ought to be fixed. A layer whose job it is to read and
write blocks should not know that relations are even a thing. (The
widespread use of global variables in the relcache code, the catcache
code, and many other places in lieu of explicit parameter-passing just
makes everything a lot worse.)

So I think if we commit to the hackiness of the sort that this patch
introduces, there is some hope of things getting better in the future.
I don't think it's a real easy path forward, but maybe it's possible.
If on the other hand we commit to using the filesystem, I don't see
how it ever gets any better. Unlogged tables are a great example of a
feature that depended on the filesystem and it now seems to me to be -
by far - the worst thing about that feature. I have no idea how to get
rid of that dependency or all of the associated problems without
reverting the feature. But in this case, we seem to have another
option, and so I think we should take it.

Your (or other people's mileage) may vary ... this is just my view of it.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: The Free Space Map: Problems and Opportunities

2021-09-08 Thread Peter Geoghegan
On Wed, Sep 8, 2021 at 8:20 AM Robert Haas  wrote:
> > I didn't mean to suggest that it had to happen in perfect lockstep.
> > But I do think that it should be pretty close to perfect. What I
> > actually do right now is prune an open page when it *appears* to be
> > full inside the loop in RelationGetBufferForTuple().
>
> That seems like a good idea.

> I don't know, I'm not really convinced that "much larger patches" that
> change a lot of loosely related things all at once are good for the
> project. It seems to me that there's a reasonably good chance of
> replacing an annoying set of problems that existing PostgreSQL users
> have worked around to some degree, knowing or unknowingly, with a
> different annoying set of problems that may cause fewer or more
> problems in practice. Sometimes there's no way to improve something
> short of a giant project that changes a lot of things at the same
> time, but a series of incremental changes is a lot less risky.

But these things are *highly* related.

The RelationGetBufferForTuple() prune mechanism I described (that
targets aborted xact tuples and sets hint bits) is fundamentally built
on top of the idea of ownership of heap pages by backends/transactions
-- that was what I meant before. We *need* to have context. This isn't an
ordinary heap prune -- it doesn't have any of the prechecks to avoid
useless pruning that you see at the top of heap_page_prune_opt(). It's
possible that we won't be able to get a super-exclusive lock in the
specialized prune code path, but that's considered a rare corner case.
There is no question of concurrent inserters senselessly blocking the
prune, which is not at all true with the current approach to free
space management. So there is no way I could extract a minimal "prune
inside RelationGetBufferForTuple()" patch that would actually work.

Systems that follow ARIES closely and have UNDO *must* treat free
space as a qualitative thing, something that is meaningful only with
associated information about a deleting or inserting transaction, and
its status. There is logical UNDO for the free space management
structure, and even getting free space from a page can involve
heavyweight locking. Postgres works differently, but there is no
reason why Postgres should not do a lightweight approximate version of
the same thing - the laws of physics favor carefully grouping
logically related data, and working to keep the physical database
representation as clean a representation of the logical database as
possible, right from the start.

> > It seems to me that this leaves one harder question unanswered: at
> > what point does a "medium sized" transaction become so large that it
> > just doesn't make sense to do either? What's the crossover point at
> > which background processing and foreground processing like this should
> > be assumed to be not worth it? That I won't speculate about just yet.
> > I suspect that at some point it really does make sense to leave it all
> > up to a true table-level batch operation, like a conventional VACUUM.
>
> I doubt it makes sense to define a limit here explicitly. At some
> point strategies will naturally start to fail, e.g. prune-before-evict
> won't work once the operation becomes large enough that pages have to
> be evicted while the transaction is still running.

Perhaps. As you know I'm generally in favor of letting things fail
naturally, and then falling back on an alternative strategy.


--
Peter Geoghegan




Re: VARDATA_COMPRESSED_GET_COMPRESS_METHOD comment?

2021-09-08 Thread Robert Haas
On Wed, Sep 8, 2021 at 11:33 AM Christoph Berg  wrote:
> How about "compressed-in-line Datum" like on the comment 5 lines above?

That seems reasonable to me, but I think Tom Lane is responsible for
the current form of that comment, so it'd be nice to hear what he
thinks.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: The Free Space Map: Problems and Opportunities

2021-09-08 Thread Robert Haas
On Wed, Sep 8, 2021 at 12:27 PM Peter Geoghegan  wrote:
> But these things are *highly* related.
>
> The RelationGetBufferForTuple() prune mechanism I described (that
> targets aborted xact tuples and sets hint bits) is fundamentally built
> on top of the idea of ownership of heap pages by backends/transactions
> -- that was what I meant before. We *need* to have context. This isn't an
> ordinary heap prune -- it doesn't have any of the prechecks to avoid
> useless pruning that you see at the top of heap_page_prune_opt(). It's
> possible that we won't be able to get a super-exclusive lock in the
> specialized prune code path, but that's considered a rare corner case.
> There is no question of concurrent inserters senselessly blocking the
> prune, which is not at all true with the current approach to free
> space management. So there is no way I could extract a minimal "prune
> inside RelationGetBufferForTuple()" patch that would actually work.

I'm not trying to argue for slimming down your patches to a size that
is so small that they no longer work.

However, I *am* arguing that, like bottom-up index deletion and the
emergency vacuum mechanism, this change sounds like something that
could *easily* have unforeseen consequences. And therefore a lot of
caution is needed. And part of that caution is not changing more
things at the same time than is really necessary. And that it's worth
thinking *hard* about how much change is *really* necessary.

It's very easy to convince oneself that everything is connected to
everything else and therefore we have to change a lot of things all at
once, but that's often not true.

--
Robert Haas
EDB: http://www.enterprisedb.com




Re: Schema variables - new implementation for Postgres 15

2021-09-08 Thread Gilles Darold

On 08/09/2021 at 13:41, Pavel Stehule wrote:

Hi

On Sat, 28 Aug 2021 at 11:57, Gilles Darold wrote:


Hi,

Review summary:


This patch implements Schema Variables, which are database objects
that can hold a single or composite value of the data type
used at variable declaration. Schema variables, like relations,
exist within a schema and their access is controlled via GRANT and
REVOKE commands. The schema variable can be created by the CREATE
VARIABLE command, altered using ALTER VARIABLE and removed using
DROP VARIABLE.

The value of a schema variable is local to the current session.
Retrieving a variable's value returns either a NULL or a default
value, unless its value is set to something else in the current
session with a LET command. The content of a variable is not
transactional. This is the same as in regular variables in PL
languages.

Schema variables are retrieved by the SELECT SQL command. Their
value is set with the LET SQL command. While schema variables
share properties with tables, their value cannot be updated with
an UPDATE command.


The patch applies with the patch command without problems and
compilation reports no warnings or errors. Regression tests pass
successfully using make check or make installcheck.
The patch also includes all documentation and regression tests.

Performance is close to that of setting a plpgsql variable, which
is impressive:

do $$
declare var1 int ; i int;
begin
  for i in 1..100
  loop
    var1 := i;
  end loop;
end;
$$;
DO
Time: 71,515 ms

CREATE VARIABLE var1 AS integer;
do $$
declare i int ;
begin
  for i in 1..100
  loop
    let var1 = i;
  end loop;
end;
$$;
DO
Time: 94,658 ms

There is just one thing that puzzles me. We can use:

    CREATE VARIABLE var1 AS date NOT NULL;
    postgres=# SELECT var1;
    ERROR:  null value is not allowed for NOT NULL schema variable "var1"

which I understand and is the right behavior. But if we use:

    CREATE IMMUTABLE VARIABLE var1 AS date NOT NULL;
    postgres=# SELECT var1;
    ERROR:  null value is not allowed for NOT NULL schema variable "var1"
    DETAIL:  The schema variable was not initialized yet.
    postgres=# LET var1=current_date;
    ERROR:  schema variable "var1" is declared IMMUTABLE

It would probably be better to disallow NOT NULL when IMMUTABLE is
used, because the variable cannot be used at all. IMMUTABLE without
a DEFAULT value should probably also be restricted, as it makes no
sense. If the user wants the variable to be NULL he must use DEFAULT
NULL. This is just a thought; the above error messages are explicit
and the user can understand what wrong declaration he has made.


I wrote a check that disables this case.  Please see the attached
patch. I agree that this case is confusing, and it is better to disable it.




Great, I also think that it is better not to confuse the user.

    postgres=# CREATE IMMUTABLE VARIABLE var1 AS date NOT NULL;
    ERROR:  IMMUTABLE NOT NULL variable requires default expression

Working as expected. I have moved the patch to "Ready for committers". 
Thanks for this feature.
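As a side note, a declaration that satisfies the new check would presumably
look like this (assuming the DEFAULT clause syntax from the patch; var2 is
just an illustrative name):

    CREATE IMMUTABLE VARIABLE var2 AS date NOT NULL DEFAULT current_date;
    SELECT var2;   -- returns the default; a later LET fails, var2 is IMMUTABLE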



--
Gilles Darold
http://www.darold.net/



Re: [PATCH] Add tab-complete for backslash commands

2021-09-08 Thread Tom Lane
"tanghy.f...@fujitsu.com"  writes:
> Here is the updated patch in which I added the multiple-character versions
> for backslash commands
> and removed their corresponding single-character versions.
> Of course, for backslash commands with only single-character version, no 
> change added.

Pushed.  I tweaked your list to the extent of adding back "\ir",
because since it's two letters not one, the argument that it's
entirely useless for tab completion doesn't quite apply.  But
if we wanted to make a hard-and-fast policy of offering only
the long form when there are two forms, maybe we should remove
that one too.

regards, tom lane




Re: .ready and .done files considered harmful

2021-09-08 Thread Dipesh Pandit
> > I guess we still have to pick one or the other, but I don't really
> > know how to do that, since both methods seem to be relatively fine,
> > and the scenarios where one is better than the other all feel a little
> > bit contrived. I guess if no clear consensus emerges in the next week
> > or so, I'll just pick one and commit it. Not quite sure yet how I'll
> > do the picking, but we seem to all agree that something is better than
> > nothing, so hopefully nobody will be too sad if I make an arbitrary
> > decision. And if some clear agreement emerges before then, even
> > better.
>
> I will be happy to see this fixed either way.

+1

> > I agree, but it should probably be something like DEBUG3 instead of
> > LOG.
>
> I will update it in the next patch.

Updated log level to DEBUG3 and rebased the patch. PFA patch.

Thanks,
Dipesh
From e0ce2b85f9122963d820d005ca093e6c667ca7e6 Mon Sep 17 00:00:00 2001
From: Dipesh Pandit 
Date: Wed, 8 Sep 2021 21:47:16 +0530
Subject: [PATCH] keep trying the next file approach

---
 src/backend/access/transam/xlog.c|  20 
 src/backend/access/transam/xlogarchive.c |  23 
 src/backend/postmaster/pgarch.c  | 195 ++-
 src/include/access/xlogdefs.h|   2 +
 src/include/postmaster/pgarch.h  |   4 +
 5 files changed, 217 insertions(+), 27 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e51a7a7..7dd4b96 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -9285,6 +9285,16 @@ CreateCheckPoint(int flags)
 	RemoveOldXlogFiles(_logSegNo, RedoRecPtr, recptr);
 
 	/*
+	 * Force the archiver to perform a directory scan.
+	 *
+	 * Ordinarily, this should not be needed, but it seems like a good idea
+	 * to make sure we scan the archive_status directory every once in a
+	 * while to make sure we haven't left anything behind.  Calling it here
+	 * ensures we do a directory scan at least once per checkpoint.
+	 */
+	PgArchForceDirScan();
+
+	/*
 	 * Make more log segments if needed.  (Do this after recycling old log
 	 * segments, since that may supply some of the needed files.)
 	 */
@@ -9650,6 +9660,16 @@ CreateRestartPoint(int flags)
 	RemoveOldXlogFiles(_logSegNo, RedoRecPtr, endptr);
 
 	/*
+	 * Force the archiver to perform a directory scan.
+	 *
+	 * Ordinarily, this should not be needed, but it seems like a good idea
+	 * to make sure we scan the archive_status directory every once in a
+	 * while to make sure we haven't left anything behind.  Calling it here
+	 * ensures we do a directory scan at least once per restartpoint.
+	 */
+	PgArchForceDirScan();
+
+	/*
 	 * Make more log segments if needed.  (Do this after recycling old log
 	 * segments, since that may supply some of the needed files.)
 	 */
diff --git a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c
index 26b023e..0969f42 100644
--- a/src/backend/access/transam/xlogarchive.c
+++ b/src/backend/access/transam/xlogarchive.c
@@ -489,6 +489,29 @@ XLogArchiveNotify(const char *xlog)
 		return;
 	}
 
+	/* Force a directory scan if we are archiving anything but a regular
+	 * WAL file or if this WAL file is being created out-of-order.
+	 */
+	if (!IsXLogFileName(xlog))
+		PgArchForceDirScan();
+	else
+	{
+		TimeLineID tli;
+		XLogSegNo this_segno;
+		XLogSegNo arch_segno;
+
+		XLogFromFileName(xlog, &tli, &this_segno, wal_segment_size);
+		arch_segno = PgArchGetCurrentSegno();
+
+		/*
+		 * We must use <= because the archiver may have just completed a
+		 * directory scan and found a later segment (but hasn't updated
+		 * shared memory yet).
+		 */
+		if (this_segno <= arch_segno)
+			PgArchForceDirScan();
+	}
+
 	/* Notify archiver that it's got something to do */
 	if (IsUnderPostmaster)
 		PgArchWakeup();
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 74a7d7c..84ba1e0 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -47,6 +47,7 @@
 #include "storage/proc.h"
 #include "storage/procsignal.h"
 #include "storage/shmem.h"
+#include "storage/spin.h"
 #include "utils/guc.h"
 #include "utils/ps_status.h"
 
@@ -76,6 +77,20 @@
 typedef struct PgArchData
 {
 	int			pgprocno;		/* pgprocno of archiver process */
+
+	/*
+	 * Forces a directory scan in pgarch_readyXlog().  Protected by
+	 * arch_lck.
+	 */
+	bool		force_dir_scan;
+
+	/*
+	 * Current archiver state.  Protected by arch_lck.
+	 */
+	TimeLineID	lastTli;
+	XLogSegNo	lastSegNo;
+
+	slock_t		arch_lck;
 } PgArchData;
 
 
@@ -103,6 +118,7 @@ static bool pgarch_readyXlog(char *xlog);
 static void pgarch_archiveDone(char *xlog);
 static void pgarch_die(int code, Datum arg);
 static void HandlePgArchInterrupts(void);
+static bool higher_arch_priority(const char *a, const char *b);
 
 /* Report shared memory space needed by PgArchShmemInit */
 Size
@@ -129,6 +145,9 @@ PgArchShmemInit(void)
 		/* First time through,

Re: The Free Space Map: Problems and Opportunities

2021-09-08 Thread Peter Geoghegan
On Wed, Sep 8, 2021 at 9:35 AM Robert Haas  wrote:
> I'm not trying to argue for slimming down your patches to a size that
> is so small that they no longer work.

I was just explaining how the "eager physical rollback by pruning" thing works.

> However, I *am* arguing that, like bottom-up index deletion and the
> emergency vacuum mechanism, this change sounds like something that
> could *easily* have unforeseen consequences. And therefore a lot of
> caution is needed.

I agree that a lot of caution is needed. One can justify aggressive
intervention when the likely alternative is a severe and irreversible
adverse event, like a version-driven page split. On the other hand we
really should be very conservative when that isn't what we see.
Context is very important.

> And part of that caution is not changing more
> things at the same time than is really necessary. And that it's worth
> thinking *hard* about how much change is *really* necessary.

For the record bottom-up index deletion had very little code in the
grand scheme of things. Without documentation changes and comments it
is probably less than 1000 lines of C. And the emergency mechanism
(considered separately from the refactoring work) had very little
code. So I don't know what you mean about changing lots of things all
together at once. Huge big-bang style patches really aren't my style
at all. Don't expect that to change.

> It's very easy to convince oneself that everything is connected to
> everything else and therefore we have to change a lot of things all at
> once, but that's often not true.

There is compelling empirical evidence for my "chain of events"
explanation. You can make things much better for BenchmarkSQL on the
master branch today, either by artificially disabling the FSM
altogether (which hurts a little but helps a lot), *or* by configuring
the benchmark to not abort transactions in the first place (which
effectively simulates eager physical rollback of aborted xacts).

I don't want to change very much. Despite everything I've said, I
think it's true that the system does the right thing in most
situations -- even with BenchmarkSQL/TPC-C! I just find it useful to
focus on the exceptions, the cases where the system clearly does the
wrong thing.

-- 
Peter Geoghegan




Re: Estimating HugePages Requirements?

2021-09-08 Thread Bossart, Nathan
On 9/7/21, 8:50 PM, "Michael Paquier"  wrote:
> Switched the variable name to shared_memory_size_mb for easier
> grepping, moved it to a more correct location with the other read-only
> GUCS, and applied 0002.  Well, 0001 here.

Thanks!  And thanks for cleaning up the small mistake in aa37a43.

> +This can be used on a running server for most parameters.  However,
> +the server must be shut down for some runtime-computed parameters
> +(e.g., ).
> Perhaps we should add a couple of extra parameters here, like
> shared_memory_size, and perhaps wal_segment_size?  The list does not
> have to be complete, just meaningful enough.

Good idea.

Nathan



Re: VARDATA_COMPRESSED_GET_COMPRESS_METHOD comment?

2021-09-08 Thread Tom Lane
Robert Haas  writes:
> On Wed, Sep 8, 2021 at 11:33 AM Christoph Berg  wrote:
>> How about "compressed-in-line Datum" like on the comment 5 lines above?

> That seems reasonable to me, but I think Tom Lane is responsible for
> the current form of that comment, so it'd be nice to hear what he
> thinks.

Hmm ... looks like I copied-and-pasted that comment to the wrong place
while rearranging stuff in aeb1631ed.  The comment just below is
off-point too.  Will fix.

regards, tom lane




Re: refactoring basebackup.c

2021-09-08 Thread Jeevan Ladhe
>
> 0007 adds server-side compression; currently, it only supports
> server-side compression using gzip, but I hope that it won't be hard
> to generalize that to support LZ4 as well, and Andres told me he
> thinks we should aim to support zstd since that library has built-in
> parallel compression which is very appealing in this context.
>

Thanks, Robert, for laying the foundation here.
I gave the LZ4 streaming API a try for server-side compression.
The LZ4 APIs are documented here [1].

With the attached WIP patch, I am now able to take a backup using lz4
compression. The attached patch applies on top of Robert's V3
patch set [2].

I could take the backup using the command:
pg_basebackup -t server:/tmp/data_lz4 -Xnone --server-compression=lz4

Further, when I restored the backup `/tmp/data_lz4` and started the server, I
could see the tables I had created, along with the data inserted on the
original server.

When I looked at the binary difference between the original data
directory and the backup `data_lz4` directory, here is how it looked:

$ diff -qr data/ /tmp/data_lz4
Only in /tmp/data_lz4: backup_label
Only in /tmp/data_lz4: backup_manifest
Only in data/base: pgsql_tmp
Only in /tmp/data_lz4: base.tar
Only in /tmp/data_lz4: base.tar.lz4
Files data/global/pg_control and /tmp/data_lz4/global/pg_control differ
Files data/logfile and /tmp/data_lz4/logfile differ
Only in data/pg_stat: db_0.stat
Only in data/pg_stat: global.stat
Only in data/pg_subtrans: 
Only in data/pg_wal: 00010099.0028.backup
Only in data/pg_wal: 0001009A
Only in data/pg_wal: 0001009B
Only in data/pg_wal: 0001009C
Only in data/pg_wal: 0001009D
Only in data/pg_wal: 0001009E
Only in data/pg_wal/archive_status:
00010099.0028.backup.done
Only in data/: postmaster.opts

For now, what concerns me here is the following `LZ4F_compressUpdate()`
API, which does the core work of streaming compression:

size_t LZ4F_compressUpdate(LZ4F_cctx* cctx,
   void* dstBuffer, size_t dstCapacity,
 const void* srcBuffer, size_t srcSize,
 const LZ4F_compressOptions_t* cOptPtr);

where `dstCapacity` is provided by an earlier call to
`LZ4F_compressBound()`, which returns the minimum `dstCapacity` required to
guarantee success of `LZ4F_compressUpdate()`, given a `srcSize` and
`preferences`, in the worst-case scenario. `LZ4F_compressBound()` is:

size_t LZ4F_compressBound(size_t srcSize, const LZ4F_preferences_t*
prefsPtr);

Now, the hard lesson here is that the `dstCapacity` returned by
`LZ4F_compressBound()`, even for a `srcSize` of a single byte, is about
~256K (it seems to have something to do with the blockSize of the lz4 frame
that we chose; the minimum we can have is 64K), even though the actual length
of the data compressed by `LZ4F_compressUpdate()` is much smaller. By
contrast, the destination buffer length we have, i.e.
`mysink->base.bbs_next->bbs_buffer_length`, is only 32K. If, in the call to
`LZ4F_compressUpdate()`, I directly provide
`mysink->base.bbs_next->bbs_buffer + bytes_written` as `dstBuffer` and
the value returned by `LZ4F_compressBound()` as `dstCapacity`, that
seems clearly incorrect to me, since the actual output buffer length
remaining is much less than what is calculated for the worst case by
`LZ4F_compressBound()`.

For now, I am creating a temporary buffer of the required size, passing it
to the compression call, asserting that the actual compressed bytes fit in
whatever length we have available, and then copying the result to our output
buffer.

To give an example, I put some logging statements, and I can see in the log:
"
bytes remaining in mysink->base.bbs_next->bbs_buffer: 16537
input size to be compressed: 512
estimated size for compressed buffer by LZ4F_compressBound(): 262667
actual compressed size: 16
"

I will really appreciate any inputs, comments, or suggestions here.

Regards,
Jeevan Ladhe

[1] https://fossies.org/linux/lz4/doc/lz4frame_manual.html
[2]
https://www.postgresql.org/message-id/CA+TgmoYgVN=-yoh71r3p9n7ekysd7_9b9s+1qfffcs3w7z-...@mail.gmail.com


lz4_compress_wip.patch
Description: Binary data


Re: [PATCH] support tab-completion for single quote input with equal sign

2021-09-08 Thread Jacob Champion
On Sat, 2021-09-04 at 11:32 -0400, Tom Lane wrote:
> Jacob Champion  writes:
> > t/010_tab_completion.pl .. 17/? 
> > #   Failed test 'tab-completion after single quoted text input with 
> > equal sign'
> > #   at t/010_tab_completion.pl line 198.
> > # Actual output was "CREATE SUBSCRIPTION my_sub CONNECTION 
> > 'host=localhost port=5432 dbname=postgres' \a"
> > # Did not match "(?^:PUBLICATION)"
> > # Looks like you failed 1 test of 23.
> 
> Independently of the concerns I raised, I'm wondering how come you
> are getting different results.  Which readline or libedit version
> are you using, on what platform?

Now you have me worried...

- Ubuntu 20.04
- libedit, version 3.1-20191231-1

--Jacob


Re: On login trigger: take three

2021-09-08 Thread Daniel Gustafsson
> On 8 Sep 2021, at 16:02, Pavel Stehule  wrote:

> In the time when event triggers were introduced, managed services were not 
> too widely used like now. When we discussed this feature we thought about 
> environments when users have no superuser rights and have no possibility to 
> go to single mode.

In situations where you don't have superuser access and cannot restart in
single user mode, none of the bypasses in this patch would help anyways.

I understand the motivation, but continuing on even in the face of an
ereport(ERROR..  ) in the hopes of being able to turn off buggy code seems
pretty unsafe at best.

--
Daniel Gustafsson   https://vmware.com/





Re: On login trigger: take three

2021-09-08 Thread Pavel Stehule
On Wed, 8 Sep 2021 at 20:23, Daniel Gustafsson wrote:

> > On 8 Sep 2021, at 16:02, Pavel Stehule  wrote:
>
> > In the time when event triggers were introduced, managed services were
> not too widely used like now. When we discussed this feature we thought
> about environments when users have no superuser rights and have no
> possibility to go to single mode.
>
> In situations where you don't have superuser access and cannot restart in
> single user mode, none of the bypasses in this patch would help anyways.
>

If I remember correctly, it should be possible - you can set the GUC in the
connection string, and this GUC is limited to the database owner.


> I understand the motivation, but continuing on even in the face of an
> ereport(ERROR..  ) in the hopes of being able to turn off buggy code seems
> pretty unsafe at best.
>

I don't understand what you mean. I can disable the logon trigger via the GUC.
I cannot enable it and continue after an exception.

Regards

Pavel


>
> --
> Daniel Gustafsson   https://vmware.com/
>
>


Re: PROXY protocol support

2021-09-08 Thread Jacob Champion
On Tue, 2021-09-07 at 12:24 +0200, Magnus Hagander wrote:
> On Wed, Jul 14, 2021 at 8:24 PM Jacob Champion  wrote:
> > On Mon, 2021-07-12 at 18:28 +0200, Magnus Hagander wrote:
> > > Yeah, I have no problem being stricter than necessary, unless that
> > > actually causes any interop problems. It's a lot worse to not be
> > > strict enough..
> > 
> > Agreed. Haven't heard back from the HAProxy mailing list yet, so
> > staying strict seems reasonable in the meantime. That could always be
> > rolled back later.
> 
> Any further feedback from them now, two months later? :)

Not yet :( I've bumped the thread; in the meantime I still think the
stricter operation is fine, since in the worst case you just make it
less strict in the future.

> (Sorry, I was out on vacation for the end of the last CF, so didn't
> get around to this one, but it seemed there'd be plenty of time in
> this CF)

No worries!

> > > The question at that point extends to, would we also add extra
> > > functions to get the data on the proxy connection itself? Maybe add a
> > > inet_proxy_addr()/inet_proxy_port()? Better names?
> > 
> > What's the intended use case? I have trouble viewing those as anything
> > but information disclosure vectors, but I'm jaded. :)
> 
> "Covering all the bases"?
> 
> I'm not entirely sure what the point is of the *existing* functions
> for that though, so I'm definitely not wedded to including it.

I guess I'm in the same boat. I'm probably not the right person to
weigh in.

> > Looking good in local testing. I'm going to reread the spec with fresh
> > eyes and do a full review pass, but this is shaping up nicely IMO.
> 
> Thanks!

I still owe you that overall review. Hoping to get to it this week.

> > Something that I haven't thought about very hard yet is proxy
> > authentication, but I think the simple IP authentication will be enough
> > for a first version. For the Unix socket case, it looks like anyone
> > currently relying on peer auth will need to switch to a
> > unix_socket_group/_permissions model. For now, that sounds like a
> > reasonable v1 restriction, though I think not being able to set the
> > proxy socket's permissions separately from the "normal" one might lead
> > to some complications in more advanced setups.
> 
> Agreed in principle, but I think those are some quite uncommon
> usecases, so definitely something we don't need to cover in a first
> feature.

Hm. I guess I'm overly optimistic that "properly securing your
database" is not such an uncommon case, but... :)

--Jacob


Why does bootstrap and later initdb stages happen via client?

2021-09-08 Thread Andres Freund
Hi,

While hacking on AIO I wanted to build the windows portion from linux. That
works surprisingly well with cross-building using --host=x86_64-w64-mingw32 .

What didn't work as well was running things under wine. It turns out that the
server itself works ok, but that initdb hangs because of a bug in wine ([1]),
leading to the bootstrap process hanging while trying to read more input.


Which made me wonder: What is really the point of doing so much setup as part
of initdb? Of course a wine bug isn't a reason to change anything, but I see
other reasons it might be worth thinking about moving more of initdb's logic
into the backend.

There of course is historical raisins for things happening in initdb - the
setup logic didn't use to be C. But now that it is C, it seems a bit absurd to
read bootstrap data in initdb, write the data to a pipe, and then read it
again in the backend. It for sure doesn't make things faster.

If more of initdb happened in the backend, it seems plausible that we could
avoid the restart of the server between bootstrap and the later setup phases -
which likely would result in a decent speedup. And trialing different
max_connections and shared_buffers settings would be a lot faster without
retries.

Besides potential speedups I also think there are architectural reasons to
prefer doing some of initdb's work in the backend - it would allow us to avoid
some duplicated infrastructure and to avoid leaking subsystem details to one
more place outside the subsystem.


The reason I CCed Peter is that he at some point proposed ([2]) having the
backend initialize itself via a base backup. I think if we generally moved
more of the data directory initialization into the backend that'd probably
architecturally work a bit better.


I'm not planning to work on this in the near future. But I would like to do so
at some point. And it might be worth considering moving future additions to
initdb server-side, via functions that initdb calls, rather than having
initdb control everything.

Greetings,

Andres Freund

[1] https://bugs.winehq.org/show_bug.cgi?id=51719
[2] 
https://www.postgresql.org/message-id/61b8d18d-c922-ac99-b990-a31ba63cdcbb%402ndquadrant.com




Re: Schema variables - new implementation for Postgres 15

2021-09-08 Thread Pavel Stehule
Hi

Great, I also think that this is better to not confuse the user.
>
> postgres=# CREATE IMMUTABLE VARIABLE var1 AS date NOT NULL;
> ERROR:  IMMUTABLE NOT NULL variable requires default expression
>
> Working as expected. I have moved the patch to "Ready for committers".
> Thanks for this feature.
>

Thank you very much

Pavel


> --
> Gilles Daroldhttp://www.darold.net/
>
>


Re: refactoring basebackup.c

2021-09-08 Thread Robert Haas
On Wed, Sep 8, 2021 at 2:14 PM Jeevan Ladhe
 wrote:
> To give an example, I put some logging statements, and I can see in the log:
> "
> bytes remaining in mysink->base.bbs_next->bbs_buffer: 16537
> input size to be compressed: 512
> estimated size for compressed buffer by LZ4F_compressBound(): 262667
> actual compressed size: 16
> "

That is pretty lame. I don't know why it needs a ~256k buffer to
produce 16 bytes of output.

The way the gzip APIs I used work, you tell it how big the output
buffer is and it writes until it fills that buffer, or until the input
buffer is empty, whichever happens first. But this seems to be the
other way around: you tell it how much input you have, and it tells
you how big a buffer it needs. To handle that elegantly, I think I
need to make some changes to the design of the bbsink stuff. What I'm
thinking is that each bbsink somehow tells the next bbsink how big to
make the buffer. So say the LZ4 sink is told that its buffer should be
at least, I don't know, 64kB. Then it can compute how large an output
buffer the LZ4 library requires for 64kB of input. Hopefully we can
assume that liblz4 never needs a smaller buffer for a larger input.
Then we can assume that if a 64kB input requires, say, a 300kB output
buffer, every possible input < 64kB also requires an output buffer <=
300 kB.
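
For reference, the zlib-style contract mentioned above looks roughly like this
(a sketch only, not the patch's actual gzip code; error checking and the final
Z_FINISH pass are omitted):

	z_stream	zs = {0};

	deflateInit(&zs, Z_DEFAULT_COMPRESSION);
	zs.next_in = (Bytef *) input;
	zs.avail_in = input_len;
	zs.next_out = (Bytef *) output;
	zs.avail_out = output_len;

	/* returns once the output buffer is full or the input is consumed */
	(void) deflate(&zs, Z_NO_FLUSH);

	/* bytes produced so far: output_len - zs.avail_out */
	deflateEnd(&zs);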

But we can't just say, well, we were asked to create a 64kB buffer (or
whatever) so let's ask the next bbsink for a 300kB buffer (or
whatever), because then as soon as we write any data at all into it
the remaining buffer space might be insufficient for the next chunk.
So instead what I think we should do is have bbsink_lz4 set the size
of the next sink's buffer to its own buffer size +
LZ4F_compressBound(its own buffer size). So in this example if it's
asked to create a 64kB buffer and LZ4F_compressBound(64kB) = 300kB
then it asks the next sink to set the buffer size to 364kB. Now, that
means that there will always be at least 300 kB available in the
output buffer until we've accumulated a minimum of 64 kB of compressed
data, and then at that point we can flush.
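
In code form, the rule would be something like this (a sketch only; the helper
names are made up here, only bbs_buffer_length and LZ4F_compressBound() come
from the discussion above):

static size_t
lz4_sink_next_buffer_len(size_t my_buffer_len, const LZ4F_preferences_t *prefs)
{
	/* our own buffer size plus the worst case for one full input buffer */
	return my_buffer_len + LZ4F_compressBound(my_buffer_len, prefs);
}

static bool
lz4_sink_must_flush(size_t my_buffer_len, size_t bytes_written)
{
	/*
	 * Until a full input buffer's worth of compressed bytes has accumulated,
	 * at least LZ4F_compressBound(my_buffer_len) bytes remain free in the
	 * next sink's buffer, so any single LZ4F_compressUpdate() call over at
	 * most my_buffer_len input bytes is guaranteed to fit.
	 */
	return bytes_written >= my_buffer_len;
}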

I think this would be relatively clean and would avoid the need for
the double copying that the current design forced you to do. What do
you think?

+ /*
+ * If we do not have enough space left in the output buffer for this
+ * chunk to be written, first archive the already written contents.
+ */
+ if (nextChunkLen > mysink->base.bbs_next->bbs_buffer_length -
mysink->bytes_written ||
+ mysink->bytes_written >= mysink->base.bbs_next->bbs_buffer_length)
+ {
+ bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+ mysink->bytes_written = 0;
+ }

I think this is flat-out wrong. It assumes that the compressor will
never generate more than N bytes of output given N bytes of input,
which is not true. Not sure there's much point in fixing it now
because with the changes described above this code will have to change
anyway, but I think it's just lucky that this has worked for you in
your testing.

+ /*
+ * LZ4F_compressUpdate() returns the number of bytes written into output
+ * buffer. We need to keep track of how many bytes have been cumulatively
+ * written into the output buffer(bytes_written). But,
+ * LZ4F_compressUpdate() returns 0 in case the data is buffered and not
+ * written to output buffer, set autoFlush to 1 to force the writing to the
+ * output buffer.
+ */
+ prefs->autoFlush = 1;

I don't see why this should be necessary. Elsewhere you have code that
caters to bytes being stuck inside LZ4's buffer, so why do we also
require this?

Thanks for researching this!

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Why does bootstrap and later initdb stages happen via client?

2021-09-08 Thread Andrew Dunstan


On 9/8/21 3:07 PM, Andres Freund wrote:
> Hi,
>
> While hacking on AIO I wanted to build the windows portion from linux. That
> works surprisingly well with cross-building using --host=x86_64-w64-mingw32 .
>
> What didn't work as well was running things under wine. It turns out that the
> server itself works ok, but that initdb hangs because of a bug in wine ([1]),
> leading to the bootstrap process hanging while trying to read more input.
>
>
> Which made me wonder: What is really the point of doing so much setup as part
> of initdb? Of course a wine bug isn't a reason to change anything, but I see
> other reasons it might be worth thinking about moving more of initdb's logic
> into the backend.
>
> There of course is historical raisins for things happening in initdb - the
> setup logic didn't use to be C. But now that it is C, it seems a bit absurd to
> read bootstrap data in initdb, write the data to a pipe, and then read it
> again in the backend. It for sure doesn't make things faster.


I guess the downside would be that we'd need to teach the backend how to
do more stuff that only needs to be done once per cluster, and then that
code would be dead space for the rest of the lifetime of the cluster.


Maybe the difference is sufficiently small that it doesn't matter.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: unnesting multirange data types

2021-09-08 Thread Magnus Hagander
On Sun, Jul 18, 2021 at 8:20 PM Alexander Korotkov  wrote:
>
> On Thu, Jul 15, 2021 at 10:27 PM Jonathan S. Katz  
> wrote:
> > On 7/15/21 12:26 PM, Alexander Korotkov wrote:
> > > On Thu, Jul 15, 2021 at 6:47 PM Tom Lane  wrote:
> > >> Yeah, that seems pretty horrid.  I still don't like the way the
> > >> array casts were done, but I'd be okay with pushing the unnest
> > >> addition.
> > >
> > > +1 for just unnest().
> >
> > ...which was my original ask at the beginning of the thread :) So +1.
>
> Thanks for the feedback.  I've pushed the unnest() patch to master and
> pg14.  I've initially forgotten to change catversion.h for master, so
> I made it with an additional commit.
>
> I've double-checked that "make check-world" passes on my machine for
> both master and pg14.  And I'm keeping my fingers crossed looking at
> buildfarm.

This patch was closed with "moved to next commitfest" in the July
commitfest, and is currently sitting as "Needs review" in the
September one.

If it's committed, it should probably have been closed with that? And
if there are things still needed, they should perhaps have their own
CF entry instead since we clearly do have unnest() for multiranges?

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




Re: [PATCH] support tab-completion for single quote input with equal sign

2021-09-08 Thread Tom Lane
Jacob Champion  writes:
> On Sat, 2021-09-04 at 11:32 -0400, Tom Lane wrote:
>> Independently of the concerns I raised, I'm wondering how come you
>> are getting different results.  Which readline or libedit version
>> are you using, on what platform?

> Now you have me worried...
> - Ubuntu 20.04
> - libedit, version 3.1-20191231-1

We've had troubles with libedit before :-(, and that particular release
is known to be a bit buggy [1].

I can reproduce a problem using HEAD (no patch needed) and Fedora 32's
libedit-3.1-32.20191231cvs.fc32.x86_64: if I type

create subscription s connection foo 

then it happily completes "PUBLICATION", but if I type

create subscription s connection 'foo' 

I just get beeps.

I do *not* see this misbehavior on a nearby FreeBSD 13.0 machine
with libedit 3.1.20210216,1 (which isn't even the latest version).
So it's a fixed-some-time-ago bug.  Given the symptoms, I wonder
if it isn't closely related to the original complaint at [1].

Anyway, the bottom line from our standpoint is going to be that
we can't put a test case like this one into the TAP test.  I recall
that getting 010_tab_completion.pl to pass everywhere was a huge
headache at the outset, so this conclusion doesn't surprise me
much.

regards, tom lane

[1] http://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=54510




Re: Why does bootstrap and later initdb stages happen via client?

2021-09-08 Thread Andres Freund
Hi,

On September 8, 2021 1:24:00 PM PDT, Andrew Dunstan  wrote:
>
>On 9/8/21 3:07 PM, Andres Freund wrote:
>> There of course is historical raisins for things happening in initdb - the
>> setup logic didn't use to be C. But now that it is C, it seems a bit absurd 
>> to
>> read bootstrap data in initdb, write the data to a pipe, and then read it
>> again in the backend. It for sure doesn't make things faster.
>
>
>I guess the downside would be that we'd need to teach the backend how to
>do more stuff that only needs to be done once per cluster, and then that
>code would be dead space for the rest of the lifetime of the cluster.
>
>
>Maybe the difference is sufficiently small that it doesn't matter.

Unused code doesn't itself cost much - the OS won't even page it in. And disk 
space wise, there's not much difference between code in initdb and code in 
postgres. It's callsites to the code that can be problematic. But we're 
already paying the price via --boot and a fair number of if (bootstrap) blocks.

Regards,

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: [PATCH] support tab-completion for single quote input with equal sign

2021-09-08 Thread Jacob Champion
On Wed, 2021-09-08 at 17:08 -0400, Tom Lane wrote:
> Jacob Champion  writes:
> > On Sat, 2021-09-04 at 11:32 -0400, Tom Lane wrote:
> > > Independently of the concerns I raised, I'm wondering how come you
> > > are getting different results.  Which readline or libedit version
> > > are you using, on what platform?
> > Now you have me worried...
> > - Ubuntu 20.04
> > - libedit, version 3.1-20191231-1
> 
> We've had troubles with libedit before :-(, and that particular release
> is known to be a bit buggy [1].

That's... unfortunate. Thanks for the info; I wonder what other tab-
completion bugs I've just accepted as "standard behavior".

--Jacob


Regression in PG14 LookupFuncName

2021-09-08 Thread Sven Klemm
Hi hackers,

In PG12 and PG13 LookupFuncName would return procedures as well as
functions, while in PG14, since commit e56bce5d [0], it disregards all
procedures and does not return them as a match.

Is this intended behaviour or an unintended side effect of the refactoring?

Sven

[0] https://github.com/postgres/postgres/commit/e56bce5d




Re: Regression in PG14 LookupFuncName

2021-09-08 Thread Tom Lane
Sven Klemm  writes:
> In PG12 and PG13 LookupFuncName would return procedures as well as
> functions, while in PG14, since commit e56bce5d [0], it disregards all
> procedures and does not return them as a match.
> Is this intended behaviour or an unintended side effect of the refactoring?

It was intentional, because all internal callers of LookupFuncName only
want to see functions.  See the last few messages in the referenced
discussion thread:

https://www.postgresql.org/message-id/flat/3742981.1621533210%40sss.pgh.pa.us

You should be able to use LookupFuncWithArgs if you want a different
definition.
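
For illustration, a minimal sketch (untested) of that approach, assuming an
ObjectWithArgs node "owa" has already been built from the qualified name and
argument types (OBJECT_ROUTINE matches both functions and procedures):

	Oid			oid = LookupFuncWithArgs(OBJECT_ROUTINE, owa, true);	/* true = no error on miss */

	if (!OidIsValid(oid))
		ereport(ERROR,
				(errcode(ERRCODE_UNDEFINED_FUNCTION),
				 errmsg("routine %s does not exist",
						NameListToString(owa->objname))));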

regards, tom lane




Re: Estimating HugePages Requirements?

2021-09-08 Thread Michael Paquier
On Wed, Sep 08, 2021 at 04:10:41PM +0900, Fujii Masao wrote:
> + {"shared_memory_size", PGC_INTERNAL, RESOURCES_MEM,
> 
> When reading the applied code, I found the category of shared_memory_size
> is RESOURCES_MEM. Why? This seems right because the parameter is related
> to memory resource. But since its context is PGC_INTERNAL, PRESET_OPTIONS
> is more proper as the category? BTW, the category of any other
> PGC_INTERNAL parameters seems to be PRESET_OPTIONS.

Yes, that's an oversight from me.  I was looking at that yesterday,
noticed some exceptions in the GUC list with things not allowed in
files and just concluded that RESOURCES_MEM should be fine, but the
docs tell a different story.  Thanks, fixed.
--
Michael


signature.asc
Description: PGP signature


Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

2021-09-08 Thread Melanie Plageman
On Fri, Aug 13, 2021 at 3:08 AM Andres Freund  wrote:
>
> Hi,
>
> On 2021-08-11 16:11:34 -0400, Melanie Plageman wrote:
> > On Tue, Aug 3, 2021 at 2:13 PM Andres Freund  wrote:
> > > > Also, I'm unsure how writing the buffer action stats out in
> > > > pgstat_write_statsfiles() will work, since I think that backends can
> > > > update their buffer action stats after we would have already persisted
> > > > the data from the BufferActionStatsArray -- causing us to lose those
> > > > updates.
> > >
> > > I was thinking it'd work differently. Whenever a connection ends, it 
> > > reports
> > > its data up to pgstats.c (otherwise we'd loose those stats). By the time
> > > shutdown happens, they all need to have already have reported their stats 
> > > - so
> > > we don't need to do anything to get the data to pgstats.c during shutdown
> > > time.
> > >
> >
> > When you say "whenever a connection ends", what part of the code are you
> > referring to specifically?
>
> pgstat_beshutdown_hook()
>
>
> > Also, when you say "shutdown", do you mean a backend shutting down or
> > all backends shutting down (including postmaster) -- like pg_ctl stop?
>
> Admittedly our language is very imprecise around this :(. What I meant
> is that backends would report their own stats up to the stats collector
> when the connection ends (in pgstat_beshutdown_hook()). That means that
> when the whole server (pgstat and then postmaster, potentially via
> pg_ctl stop) shuts down, all the per-connection stats have already been
> reported up to pgstat.
>

So, I realized that the patch has a problem. I added the code to send
buffer actions stats to the stats collector
(pgstat_send_buffer_actions()) to pgstat_report_stat() and this isn't
getting called when all types of backends exit.

I originally thought to add pgstat_send_buffer_actions() to
pgstat_beshutdown_hook() (as suggested), but this is called after
pgstat_shutdown_hook(), so we aren't able to send stats to the stats
collector at that time. (pgstat_shutdown_hook() sets pgstat_is_shutdown
to true, and then in pgstat_beshutdown_hook(), which is called afterwards,
calling pgstat_send_buffer_actions() ends up in pgstat_send(), which calls
pgstat_assert_is_up(), which trips because pgstat_is_shutdown is true.)

After calling pgstat_send_buffer_actions() from pgstat_report_stat(), it
seems to miss checkpointer stats entirely. I did find that if I
sprinkled pgstat_send_buffer_actions() around in the various places that
pgstat_send_checkpointer() is called, I could get checkpointer stats
(see attached patch, capture_checkpointer_buffer_actions.patch), but,
that seems a little bit haphazard since pgstat_send_buffer_actions() is
supposed to capture stats for all backend types. Is there somewhere else
I can call it that is exercised by all backend types before
pgstat_shutdown_hook() is called but after they would have finished any
relevant buffer actions?

- Melanie


capture_checkpointer_buffer_actions.patch
Description: Binary data


Re: [PATCH] test/ssl: rework the sslfiles Makefile target

2021-09-08 Thread Michael Paquier
On Wed, Sep 08, 2021 at 07:32:03PM +, Jacob Champion wrote:
> On Fri, 2021-09-03 at 23:21 +, Jacob Champion wrote:
> > > One small-ish comment that I have is about all the .config files we
> > > have at the root of src/test/ssl/, bloating the whole.  I think that
> > > it would be a bit cleaner to put all of them in a different
> > > sub-directory, say just config/ or conf/.
> > 
> > That sounds reasonable. I won't be able to get to it before the holiday
> > weekend, but I can put up a patch sometime next week.
> 
> Done in v6, a three-patch squashable set. I chose conf/ as the
> directory.

Looks sensible to me.  One thing I noticed while poking at it is that
the README mentions running sslfiles to recreate the set of files.  But
it is necessary to do sslfiles-clean first, as sslfiles is a no-op if
the set of files already exists.

I have not been able to check that this is compatible across all the
versions of OpenSSL we support on HEAD.  By looking at the code, that
should be fine but it would be good to be sure.

Daniel, you are registered as a reviewer of this thread
(https://commitfest.postgresql.org/34/3029/).  So I guess that you
would prefer to look at that by yourself and perhaps take care of the
commit?
--
Michael


signature.asc
Description: PGP signature


Re: Add jsonlog log_destination for JSON server logs

2021-09-08 Thread Michael Paquier
On Wed, Sep 08, 2021 at 08:46:44AM -0400, Andrew Dunstan wrote:
> Yeah. A very simple change would be to use two different values for json
> (say 'y' and 'n'). A slightly more principled scheme might use the top
> bit for the end marker and the bottom 3 bits for the dest type (so up to
> 8 types possible), with the rest available for future use.

I was thinking of tracking stderr as the case where no destination bits
are set in the flags, but that's a bit crazy given how much margin we
have in what can be stored.  I have looked at that and ended up with the
attached, which is an improvement IMO, especially when it comes to the
header validation.

FWIW, while looking at my own external module for the JSON logs, I
noticed that I used the chunk protocol when the log redirection is
enabled, but just enforced everything to be sent to stderr.
--
Michael
diff --git a/src/include/postmaster/syslogger.h 
b/src/include/postmaster/syslogger.h
index 1491eecb0f..c79dfbeba2 100644
--- a/src/include/postmaster/syslogger.h
+++ b/src/include/postmaster/syslogger.h
@@ -46,8 +46,7 @@ typedef struct
charnuls[2];/* always \0\0 */
uint16  len;/* size of this chunk (counts 
data only) */
int32   pid;/* writer's pid */
-   charis_last;/* last chunk of message? 't' 
or 'f' ('T' or
-* 'F' for CSV 
case) */
+   bits8   flags;  /* bitmask of PIPE_PROTO_* */
chardata[FLEXIBLE_ARRAY_MEMBER];/* data payload starts 
here */
 } PipeProtoHeader;
 
@@ -60,6 +59,11 @@ typedef union
 #define PIPE_HEADER_SIZE  offsetof(PipeProtoHeader, data)
 #define PIPE_MAX_PAYLOAD  ((int) (PIPE_CHUNK_SIZE - PIPE_HEADER_SIZE))
 
+/* flag bits for PipeProtoHeader->flags */
+#define PIPE_PROTO_IS_LAST 0x01/* last chunk of message? */
+/* log destinations */
+#define PIPE_PROTO_DEST_STDERR 0x10
+#define PIPE_PROTO_DEST_CSVLOG 0x20
 
 /* GUC options */
 extern bool Logging_collector;
diff --git a/src/backend/postmaster/syslogger.c 
b/src/backend/postmaster/syslogger.c
index cad43bdef2..c00b2dc186 100644
--- a/src/backend/postmaster/syslogger.c
+++ b/src/backend/postmaster/syslogger.c
@@ -891,8 +891,7 @@ process_pipe_input(char *logbuffer, int *bytes_in_logbuffer)
if (p.nuls[0] == '\0' && p.nuls[1] == '\0' &&
p.len > 0 && p.len <= PIPE_MAX_PAYLOAD &&
p.pid != 0 &&
-   (p.is_last == 't' || p.is_last == 'f' ||
-p.is_last == 'T' || p.is_last == 'F'))
+   (p.flags & (PIPE_PROTO_DEST_STDERR | 
PIPE_PROTO_DEST_CSVLOG)) != 0)
{
List   *buffer_list;
ListCell   *cell;
@@ -906,8 +905,15 @@ process_pipe_input(char *logbuffer, int 
*bytes_in_logbuffer)
if (count < chunklen)
break;
 
-   dest = (p.is_last == 'T' || p.is_last == 'F') ?
-   LOG_DESTINATION_CSVLOG : LOG_DESTINATION_STDERR;
+   if ((p.flags & PIPE_PROTO_DEST_STDERR) != 0)
+   dest = LOG_DESTINATION_STDERR;
+   else if ((p.flags & PIPE_PROTO_DEST_CSVLOG) != 0)
+   dest = LOG_DESTINATION_CSVLOG;
+   else
+   {
+   /* this should never happen as of the header 
validation */
+   Assert(false);
+   }
 
/* Locate any existing buffer for this source pid */
buffer_list = buffer_lists[p.pid % NBUFFER_LISTS];
@@ -924,7 +930,7 @@ process_pipe_input(char *logbuffer, int *bytes_in_logbuffer)
free_slot = buf;
}
 
-   if (p.is_last == 'f' || p.is_last == 'F')
+   if ((p.flags & PIPE_PROTO_IS_LAST) == 0)
{
/*
 * Save a complete non-final chunk in a per-pid 
buffer
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 816b071afa..2af87ee3bd 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -3250,11 +3250,16 @@ write_pipe_chunks(char *data, int len, int dest)
 
p.proto.nuls[0] = p.proto.nuls[1] = '\0';
p.proto.pid = MyProcPid;
+   p.proto.flags = 0;
+   if (dest == LOG_DESTINATION_STDERR)
+   p.proto.flags |= PIPE_PROTO_DEST_STDERR;
+   else if (dest == LOG_DESTINATION_CSVLOG)
+   p.proto.flags |= PIPE_PROTO_DEST_CSVLOG;
 
/* write all but the last chunk */
while (len > PIPE_MAX_PAYLOAD)
  

Re: drop tablespace failed when location contains .. on win32

2021-09-08 Thread Michael Paquier
On Wed, Sep 08, 2021 at 08:54:25AM -0400, Andrew Dunstan wrote:
> That seems like a bug. It's not very canonical :-)

Yes, better to fix that.  I fear that more places are impacted than
just the tablespace code paths.
--
Michael


signature.asc
Description: PGP signature


RE: drop tablespace failed when location contains .. on win32

2021-09-08 Thread wangsh.f...@fujitsu.com
Hi, 

> -Original Message-
> From: Michael Paquier 
> 
> On Wed, Sep 08, 2021 at 08:54:25AM -0400, Andrew Dunstan wrote:
> > That seems like a bug. It's not very canonical :-)
> 
> Yes, better to fix that.  I fear that more places are impacted than
> just the tablespace code paths.
> --
> Michael

Do you mean changing the action of canonicalize_path(), like remove all the 
(..) ?

I'm willing to fix this problem.

Regards
Shenhao Wang




Re: Add jsonlog log_destination for JSON server logs

2021-09-08 Thread Greg Stark
Fwiw I was shocked when I saw the t/f T/F kluge when I went to work on
jsonlogging. That's the kind of dead-end short-sighted hack that just
lays traps and barriers for future hackers to have to clean up before
they can do the work they want to do.

Please just put a "format" field (or "channel" field -- the logging
daemon doesn't really care what format) with a list of defined formats
that can easily be extended in the future. If you want to steal the
high bit for "is last" and only allow 128 values instead of 256 so be
it.
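
As a sketch of what that could look like (illustrative names only, not from
any posted patch):

#define PIPE_CHUNK_FORMAT_MASK	0x7f	/* format/destination id, up to 128 values */
#define PIPE_CHUNK_IS_LAST		0x80	/* last chunk of the message */

#define PIPE_CHUNK_FORMAT(b)	((b) & PIPE_CHUNK_FORMAT_MASK)
#define PIPE_CHUNK_LAST(b)		(((b) & PIPE_CHUNK_IS_LAST) != 0)

/* e.g. the final chunk of a message for a hypothetical jsonlog format id: */
/* uint8 hdrbyte = PIPE_CHUNK_FORMAT_JSONLOG | PIPE_CHUNK_IS_LAST; */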




Re: drop tablespace failed when location contains .. on win32

2021-09-08 Thread Michael Paquier
On Thu, Sep 09, 2021 at 02:35:52AM +, wangsh.f...@fujitsu.com wrote:
> Do you mean changing the action of canonicalize_path(), like remove all the 
> (..) ?
> 
> I'm willing to fix this problem.

Looking at canonicalize_path(), we have already some logic around
pending_strips to remove paths when we find a "/.." in the path, so
that's a matter of adjusting this area to trim properly the previous
directory.
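
Just to illustrate the general idea (a toy sketch, not the actual
pending_strips logic in canonicalize_path()):

#include <string.h>

/* Collapse "dir/.." pairs over an already-split relative path. */
static int
collapse_dotdot(char **parts, int nparts)
{
	int			out = 0;

	for (int i = 0; i < nparts; i++)
	{
		if (strcmp(parts[i], "..") == 0 && out > 0 &&
			strcmp(parts[out - 1], "..") != 0)
			out--;				/* ".." cancels the previous component */
		else
			parts[out++] = parts[i];
	}
	return out;					/* new component count */
}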

On *nix platforms, we don't apply this much caution either, say a
simple /tmp/path/../path/ results in this same path used in the link
from pg_tblspc.  But we are speaking about Windows here, and junction
points.

Based on the lack of complaints over the years, that does not seem
really worth backpatching.  Just my 2c on this point.
--
Michael


signature.asc
Description: PGP signature


Re: Add jsonlog log_destination for JSON server logs

2021-09-08 Thread Michael Paquier
On Wed, Sep 08, 2021 at 10:58:51PM -0400, Greg Stark wrote:
> Please just put a "format" field (or "channel" field -- the logging
> daemon doesn't really care what format) with a list of defined formats
> that can easily be extended in the future. If you want to steal the
> high bit for "is last" and only allow 128 values instead of 256 so be
> it.

Which is what I just posted here:
https://www.postgresql.org/message-id/ytlunscidrl1z...@paquier.xyz

Well, we could also do things so as we have two fields, as of
something like:
typedef struct
{
[...]
bits8   flags:4, format:4;
[...]
} PipeProtoHeader;

I am not sure if this is an improvement in readability though, and
that's less consistent with the recent practice we've been trying to
follow with bitmasks and flag-like options.
--
Michael


signature.asc
Description: PGP signature


Re: Estimating HugePages Requirements?

2021-09-08 Thread Michael Paquier
On Wed, Sep 08, 2021 at 05:52:33PM +, Bossart, Nathan wrote:
> Yeah, I did wonder about this.  We're even listing it in the "Preset
> Options" section in the docs.  I updated this in the new patch set,
> which is attached.

I broke that again, so rebased as v9 attached.

FWIW, I don't have an environment at hand these days to test properly
0001, so this will have to wait a bit.  I really like the approach
taken by 0002, and it is independent of the other patch while
extending support for postgres -c to provide the correct runtime
values.  So let's wrap this part first.  No need to send a reorganized
patch set.
--
Michael


signature.asc
Description: PGP signature


Re: PG Docs - CREATE SUBSCRIPTION option list order

2021-09-08 Thread Amit Kapila
On Wed, Sep 8, 2021 at 12:24 PM Peter Smith  wrote:
>
> v2 --> v3
>
> The subscription_parameter names are now split into 2 groups using
> Amit's suggestion [1] on how to categorise them.
>
> I also made some grammar improvements to their descriptions.
>

I have made minor edits to your first patch and it looks good to me. I
am not sure what exactly Tom has in mind related to grammatical
improvements, so it is better if he can look into that part of your
proposal (basically second patch
v4-0002-PG-Docs-Create-Subscription-options-rewording).

-- 
With Regards,
Amit Kapila.


v4-0001-Doc-Change-optional-parameters-grouping-in-Create.patch
Description: Binary data


v4-0002-PG-Docs-Create-Subscription-options-rewording.patch
Description: Binary data


Re: Estimating HugePages Requirements?

2021-09-08 Thread Michael Paquier
On Thu, Sep 09, 2021 at 01:19:14PM +0900, Michael Paquier wrote:
> I broke that again, so rebased as v9 attached.

Well.
--
Michael
From 09b1f5a5e0e88a6d84bfc2f8a1bc410e21fb3775 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Thu, 9 Sep 2021 13:10:24 +0900
Subject: [PATCH v9 1/2] Introduce huge_pages_required GUC.

This runtime-computed GUC shows the number of huge pages required
for the server's main shared memory area.
---
 src/include/storage/pg_shmem.h |  1 +
 src/backend/port/sysv_shmem.c  | 16 +++-
 src/backend/port/win32_shmem.c | 14 ++
 src/backend/storage/ipc/ipci.c | 14 ++
 src/backend/utils/misc/guc.c   | 12 
 doc/src/sgml/config.sgml   | 21 +
 6 files changed, 73 insertions(+), 5 deletions(-)

diff --git a/src/include/storage/pg_shmem.h b/src/include/storage/pg_shmem.h
index 059df1b72c..518eb86065 100644
--- a/src/include/storage/pg_shmem.h
+++ b/src/include/storage/pg_shmem.h
@@ -87,5 +87,6 @@ extern PGShmemHeader *PGSharedMemoryCreate(Size size,

   PGShmemHeader **shim);
 extern bool PGSharedMemoryIsInUse(unsigned long id1, unsigned long id2);
 extern void PGSharedMemoryDetach(void);
+extern void GetHugePageSize(Size *hugepagesize, int *mmap_flags);
 
 #endif /* PG_SHMEM_H */
diff --git a/src/backend/port/sysv_shmem.c b/src/backend/port/sysv_shmem.c
index 9de96edf6a..125e2d47ec 100644
--- a/src/backend/port/sysv_shmem.c
+++ b/src/backend/port/sysv_shmem.c
@@ -456,8 +456,6 @@ PGSharedMemoryAttach(IpcMemoryId shmId,
return shmStat.shm_nattch == 0 ? SHMSTATE_UNATTACHED : 
SHMSTATE_ATTACHED;
 }
 
-#ifdef MAP_HUGETLB
-
 /*
  * Identify the huge page size to use, and compute the related mmap flags.
  *
@@ -476,11 +474,19 @@ PGSharedMemoryAttach(IpcMemoryId shmId,
  * such as increasing shared_buffers to absorb the extra space.
  *
  * Returns the (real, assumed or config provided) page size into *hugepagesize,
- * and the hugepage-related mmap flags to use into *mmap_flags.
+ * and the hugepage-related mmap flags to use into *mmap_flags.  If huge pages
+ * is not supported, *hugepagesize and *mmap_flags will be set to 0.
  */
-static void
+void
 GetHugePageSize(Size *hugepagesize, int *mmap_flags)
 {
+#ifndef MAP_HUGETLB
+
+   *hugepagesize = 0;
+   *mmap_flags = 0;
+
+#else
+
Sizedefault_hugepagesize = 0;
 
/*
@@ -553,9 +559,9 @@ GetHugePageSize(Size *hugepagesize, int *mmap_flags)
*mmap_flags |= (shift & MAP_HUGE_MASK) << MAP_HUGE_SHIFT;
}
 #endif
-}
 
 #endif /* MAP_HUGETLB */
+}
 
 /*
  * Creates an anonymous mmap()ed shared memory segment.
diff --git a/src/backend/port/win32_shmem.c b/src/backend/port/win32_shmem.c
index d7a71992d8..90de2ab4e1 100644
--- a/src/backend/port/win32_shmem.c
+++ b/src/backend/port/win32_shmem.c
@@ -605,3 +605,17 @@ pgwin32_ReserveSharedMemoryRegion(HANDLE hChild)
 
return true;
 }
+
+/*
+ * This function is provided for consistency with sysv_shmem.c and does not
+ * provide any useful information for Windows.  To obtain the large page size,
+ * use GetLargePageMinimum() instead.
+ *
+ * This always sets *hugepagesize and *mmap_flags to 0.
+ */
+void
+GetHugePageSize(Size *hugepagesize, int *mmap_flags)
+{
+   *hugepagesize = 0;
+   *mmap_flags = 0;
+}
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 13f3926ff6..91490653cf 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -326,6 +326,9 @@ InitializeShmemGUCs(void)
charbuf[64];
Sizesize_b;
Sizesize_mb;
+   Sizehp_size;
+   Sizehp_required;
+   int unused;
 
/*
 * Calculate the shared memory size and round up to the nearest 
megabyte.
@@ -334,4 +337,15 @@ InitializeShmemGUCs(void)
size_mb = add_size(size_b, (1024 * 1024) - 1) / (1024 * 1024);
sprintf(buf, "%zu", size_mb);
SetConfigOption("shared_memory_size", buf, PGC_INTERNAL, 
PGC_S_OVERRIDE);
+
+   /*
+* Calculate the number of huge pages required.
+*/
+   GetHugePageSize(&hp_size, &unused);
+   if (hp_size != 0)
+   {
+   hp_required = (size_b / hp_size) + 1;
+   sprintf(buf, "%zu", hp_required);
+   SetConfigOption("huge_pages_required", buf, PGC_INTERNAL, 
PGC_S_OVERRIDE);
+   }
 }
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index fd4ca83be1..8edfd07340 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -665,6 +665,7 @@ static int  max_identifier_length;
 static int block_size;
 static int segment_size;
 static int shared_memory_size_mb;
+static int huge_pages_requir

Re: strange case of "if ((a & b))"

2021-09-08 Thread Michael Paquier
On Tue, Sep 07, 2021 at 02:59:58PM +0900, Michael Paquier wrote:
> In 0002, everything is a boolean expression except for
> SpGistPageStoresNulls() and GistPageIsLeaf().  So that's a good
> cleanup overall.

I looked at 0002 again yesterday, and that was an improvement
for most of those locations, where we already use a boolean as
expression, so done mostly as of fd0625c.

> -   pathnode->parallel_aware = parallel_workers > 0 ? true : false;
> +   pathnode->parallel_aware = parallel_workers > 0;
> I also prefer that we keep the parenthesis for such things.  That's
> more readable and easier to reason about.

Adjusted these as well.
--
Michael


signature.asc
Description: PGP signature


Re: drop tablespace failed when location contains .. on win32

2021-09-08 Thread Kyotaro Horiguchi
At Thu, 9 Sep 2021 12:44:45 +0900, Michael Paquier  wrote 
in 
> On Thu, Sep 09, 2021 at 02:35:52AM +, wangsh.f...@fujitsu.com wrote:
> > Do you mean changing the action of canonicalize_path(), like remove all the 
> > (..) ?
> > 
> > I'm willing to fix this problem.
> 
> Looking at canonicalize_path(), we have already some logic around
> pending_strips to remove paths when we find a "/.." in the path, so
> that's a matter of adjusting this area to trim properly the previous
> directory.
> 
> On *nix platforms, we don't apply this much caution either, say a
> simple /tmp/path/../path/ results in this same path used in the link
> from pg_tblspc.  But we are speaking about Windows here, and junction
> points.
> 
> Based on the lack of complaints over the years, that does not seem
> really worth backpatching.  Just my 2c on this point.

Reading the first complaint, I remember I proposed that as a part of a
larger patch.

https://www.postgresql.org/message-id/20190425.170855.39056106.horiguchi.kyotaro%40lab.ntt.co.jp

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Schema variables - new implementation for Postgres 15

2021-09-08 Thread Pavel Stehule
Hi

fresh rebase

Regards

Pavel


schema-variables-20210909.patch.gz
Description: application/gzip


Re: strange case of "if ((a & b))"

2021-09-08 Thread Kyotaro Horiguchi
At Thu, 9 Sep 2021 13:28:54 +0900, Michael Paquier  wrote 
in 
> On Tue, Sep 07, 2021 at 02:59:58PM +0900, Michael Paquier wrote:
> > In 0002, everything is a boolean expression except for
> > SpGistPageStoresNulls() and GistPageIsLeaf().  So that's a good
> > cleanup overall.
> 
> I looked at 0002 again yesterday, and that was an improvement
> for most of those locations, where we already use a boolean as
> expression, so done mostly as of fd0625c.
> 
> > -   pathnode->parallel_aware = parallel_workers > 0 ? true : false;
> > +   pathnode->parallel_aware = parallel_workers > 0;
> > I also prefer that we keep the parenthesis for such things.  That's
> > more readable and easier to reason about.
> 
> Adjusted these as well.

Maybe I'm missing something, but I can see several instances of the
"eval-bool ? true : false" pattern after fd0625c7a9 that are not in
the latest 0002.

./backend/nodes/readfuncs.c

Re: Improve logging when using Huge Pages

2021-09-08 Thread Kyotaro Horiguchi
Thanks!

At Wed, 8 Sep 2021 07:52:35 +, "Shinoda, Noriyoshi (PN Japan FSIP)" 
 wrote in 
> Hello,
> 
> Thank you everyone for comments.
> I have attached a patch that simply changed the message like the advice from 
> Horiguchi-san.
> 
> > Even with the patch, there are still some cases where huge pages is 
> > disabled silently. We should report something even in these cases?
> > For example, in the platform where huge pages is not supported, it's 
> > silently disabled when huge_pages=try.
> 
> The area where this patch is written is inside the "#ifdef MAP_HUGETLB 
> #endif" block.
> For this reason, I think it is excluded from binaries created in an 
> environment that does not have the MAP_HUGETLB macro.

Ah, right.

> > One big concern about the patch is that log message is always reported when 
> > shared memory fails to be allocated with huge pages enabled when 
> > huge_pages=try. Since 
> > huge_pages=try is the default setting, many users would see this new log 
> > message whenever they start the server. Those who don't need huge pages but 
> > just use the default 
> > setting might think that such log messages would be noisy.
> 
> This patch is meant to let the admin know that HugePages isn't being used, so 
> I'm sure you're right. I have no idea what to do so far.

It seems *to me* sufficient. I'm not sure in which cases CreateFileMapping
returns ERROR_NO_SYSTEM_RESOURCES while non-huge pages can still be allocated
successfully, but that doesn't matter much, maybe.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [PROPOSAL] new diagnostic items for the dynamic sql

2021-09-08 Thread Pavel Stehule
Hi

I tested the last patch, and I think I found unwanted behavior.

The value of PG_SQL_TEXT is only non-empty when the error is raised at the
parser stage. When the error is raised at the query evaluation stage, the
value is empty.
I think this is too confusing. PL/pgSQL is a high-level language, and the
behaviour should be consistent, independent of the internal implementation. I
am afraid this feature requires much more work.

postgres=# DO $$
DECLARE
  err_sql_stmt TEXT;
  err_sql_pos INT;
BEGIN
  EXECUTE 'SELECT 1/0';
EXCEPTION
  WHEN OTHERS THEN
GET STACKED DIAGNOSTICS
  err_sql_stmt = PG_SQL_TEXT,
  err_sql_pos = PG_ERROR_LOCATION;
RAISE NOTICE 'exception sql "%"', err_sql_stmt;
RAISE NOTICE 'exception sql position %', err_sql_pos;
END;
$$;
NOTICE:  exception sql ""
NOTICE:  exception sql position 0
DO

For this case, the empty result is not acceptable in this language. It is
too confusing. The implemented behaviour is well described in the regression
tests, but I don't think it is user (developer) friendly. The location
field is not that important, and can sometimes be 0. But the query text should
not be empty in any case related to query evaluation. I think
this can be a nice and useful feature, but the behavior should be
consistent.

A second, minor objection to this patch is the lack of formatting in the
regression test.

Regards

Pavel


Show redo LSN in checkpoint logs

2021-09-08 Thread Kyotaro Horiguchi
Hello.

Sometimes, I needed to infer where a past checkpoint make wal segments
unnecessary up to, or just to know the LSN at a past point in
time. But there's no convenient source for that.

The attached small patch enables me (or us) to do that by looking into
server log files.


> LOG:  checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 
> removed, 0 recycled; write=0.008 s, sync=0.035 s, total=0.064 s; sync 
> files=4, longest=0.017 s, average=0.009 s; distance=16420 kB, estimate=16420 
> kB, redo=0/30091D8

Does that make sense?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e51a7a749d..b6d7013a73 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8761,7 +8761,7 @@ LogCheckpointEnd(bool restartpoint)
 		"%d WAL file(s) added, %d removed, %d recycled; "
 		"write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s; "
 		"sync files=%d, longest=%ld.%03d s, average=%ld.%03d s; "
-		"distance=%d kB, estimate=%d kB",
+		"distance=%d kB, estimate=%d kB, redo=%X/%X",
 		CheckpointStats.ckpt_bufs_written,
 		(double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers,
 		CheckpointStats.ckpt_segs_added,
@@ -8774,14 +8774,15 @@ LogCheckpointEnd(bool restartpoint)
 		longest_msecs / 1000, (int) (longest_msecs % 1000),
 		average_msecs / 1000, (int) (average_msecs % 1000),
 		(int) (PrevCheckPointDistance / 1024.0),
-		(int) (CheckPointDistanceEstimate / 1024.0))));
+		(int) (CheckPointDistanceEstimate / 1024.0),
+		LSN_FORMAT_ARGS(ControlFile->checkPointCopy.redo))));
 	else
 		ereport(LOG,
 (errmsg("checkpoint complete: wrote %d buffers (%.1f%%); "
 		"%d WAL file(s) added, %d removed, %d recycled; "
 		"write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s; "
 		"sync files=%d, longest=%ld.%03d s, average=%ld.%03d s; "
-		"distance=%d kB, estimate=%d kB",
+		"distance=%d kB, estimate=%d kB, redo=%X/%X",
 		CheckpointStats.ckpt_bufs_written,
 		(double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers,
 		CheckpointStats.ckpt_segs_added,
@@ -8794,7 +8795,8 @@ LogCheckpointEnd(bool restartpoint)
 		longest_msecs / 1000, (int) (longest_msecs % 1000),
 		average_msecs / 1000, (int) (average_msecs % 1000),
 		(int) (PrevCheckPointDistance / 1024.0),
-		(int) (CheckPointDistanceEstimate / 1024.0))));
+		(int) (CheckPointDistanceEstimate / 1024.0),
+		LSN_FORMAT_ARGS(ControlFile->checkPointCopy.redo))));
 }
 
 /*


Re: row filtering for logical replication

2021-09-08 Thread Peter Smith
PSA my new incremental patch (v28-0002) that introduces row filter
validation for the publish mode "delete". The validation requires that
any columns referred to in the filter expression must also be part of
REPLICA IDENTITY or PK.

[This v28-0001 is identical to the most recently posted rebased base
patch. It is included again here only so the cfbot will be happy]

~~

A requirement for some filter validation like this has been mentioned
several times in this thread [1][2][3][4][5].

I also added some test code for various kinds of replica identity.

A couple of existing tests had to be modified so they could continue
to work  (e.g. changed publish = "insert" or REPLICA IDENTITY FULL)

Feedback is welcome.

~~

NOTE: This validation currently only checks when the filters are first
created. Probably there are many other scenarios that need to be
properly handled. What to do if something which impacts the existing
filter is changed?

e.g.
- what if the user changes the publish parameter using ALTER
PUBLICATION set (publish="delete") etc?
- what if the user changes the replication identity?
- what if the user changes the filter using ALTER PUBLICATION in a way
that is no longer compatible with the necessary cols?
- what if the user changes the table (e.g. removes a column referred
to by a filter)?
- what if the user changes a referred column name?
- more...

(None of those are addressed yet - thoughts?)

--

[1] 
https://www.postgresql.org/message-id/92e5587d-28b8-5849-2374-5ca3863256f1%402ndquadrant.com
[2] 
https://www.postgresql.org/message-id/CAA4eK1JL2q%2BHENgiCf1HLRU7nD9jCcttB9sEqV1tech4mMv_0A%40mail.gmail.com
[3] 
https://www.postgresql.org/message-id/202107132106.wvjgvjgcyezo%40alvherre.pgsql
[4] 
https://www.postgresql.org/message-id/202107141452.edncq4ot5zkg%40alvherre.pgsql
[5] 
https://www.postgresql.org/message-id/CAA4eK1Kyax-qnVPcXzODu3JmA4vtgAjUSYPUK1Pm3vBL5gC81g%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia


v28-0002-Row-filter-validation-replica-identity.patch
Description: Binary data


v28-0001-Row-filter-for-logical-replication.patch
Description: Binary data


Re: [PROPOSAL] new diagnostic items for the dynamic sql

2021-09-08 Thread Dinesh Chemuduru
On Thu, 9 Sept 2021 at 11:07, Pavel Stehule  wrote:

> Hi
>
> I tested the last patch, and I think I found unwanted behavior.
>
> The value of PG_SQL_TEXT is only non-empty when the error is raised at the
> parser stage. When the error is raised at the query evaluation stage, the
> value is empty.
> I think this is too confusing. PL/pgSQL is a high-level language, and the
> behaviour should be consistent, independent of the internal implementation. I
> am afraid this feature requires much more work.
>
> postgres=# DO $$
> DECLARE
>   err_sql_stmt TEXT;
>   err_sql_pos INT;
> BEGIN
>   EXECUTE 'SELECT 1/0';
> EXCEPTION
>   WHEN OTHERS THEN
> GET STACKED DIAGNOSTICS
>   err_sql_stmt = PG_SQL_TEXT,
>   err_sql_pos = PG_ERROR_LOCATION;
> RAISE NOTICE 'exception sql "%"', err_sql_stmt;
> RAISE NOTICE 'exception sql position %', err_sql_pos;
> END;
> $$;
> NOTICE:  exception sql ""
> NOTICE:  exception sql position 0
> DO
>
> For this case, the empty result is not acceptable in this language. It is
> too confusing. The implemented behaviour is well described in the regression
> tests, but I don't think it is user (developer) friendly. The location
> field is not that important, and can sometimes be 0. But the query text
> should not be empty in any case related to query evaluation. I think
> this can be a nice and useful feature, but the behavior should be
> consistent.
>
> Thanks for your time in evaluating this patch.
Let me try to fix the suggested case (I tried to fix this case in the past,
but this time I will try to spend more time on it), and I will submit a new
patch.



> A second, minor objection to this patch is the lack of formatting in the
> regression test.
>
> Regards
>
> Pavel
>
>


missing warning in pg_import_system_collations

2021-09-08 Thread Anton Voloshin

Hello hackers,

In pg_import_system_collations() there is this fragment of code:

enc = pg_get_encoding_from_locale(localebuf, false);
if (enc < 0)
{
/* error message printed by pg_get_encoding_from_locale() */
continue;
}

However, the false passed to pg_get_encoding_from_locale() means that the
write_message argument is false, so no error message is ever printed.

I propose an obvious patch (see attachment).

Introduced in aa17c06fb in January 2017 when debug was replaced by 
false, so I guess back-patching through 10 would be appropriate.


--
Anton Voloshin
Postgres Professional, The Russian Postgres Company
https://postgrespro.ru
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 4075f991a03..b601ca4d68c 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -594,7 +594,7 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
 continue;
 			}
 
-			enc = pg_get_encoding_from_locale(localebuf, false);
+			enc = pg_get_encoding_from_locale(localebuf, /*write_message=*/true);
 			if (enc < 0)
 			{
 /* error message printed by pg_get_encoding_from_locale() */


Re: Added schema level support for publication.

2021-09-08 Thread Amit Kapila
On Mon, Aug 16, 2021 at 7:01 PM Tom Lane  wrote:
>
> Peter Smith  writes:
> > Then the question from Peter E. [2] "Why can't I have a publication
> > that publishes tables t1, t2, t3, *and* schemas s1, s2, s3." would
> > have an intuitive solution like:
>
> > CREATE PUBLICATION pub1
> > FOR TABLE t1,t2,t3 AND
> > FOR ALL TABLES IN SCHEMA s1,s2,s3;
>
> That seems a bit awkward, since the existing precedent is
> to use commas.
>

AFAICS, the closest we have to this proposal is the Grant/Revoke syntax,
where we can give privileges on individual objects and on all objects in
a schema. Is that the existing precedent you are referring to, or
something else?

>  We shouldn't need more than one FOR noise-word,
> either.  So I was imagining syntax more like, say,
>
> CREATE PUBLICATION pub1 FOR
>   TABLE t1,t2,t3, ALL TABLES IN SCHEMA s1,s2,
>   SEQUENCE seq1,seq2, ALL SEQUENCES IN SCHEMA s3,s4;
>
> Abstractly it'd be
>
> createpub := CREATE PUBLICATION pubname FOR cpitem [, ... ] [ WITH ... ]
>
> cpitem := ALL TABLES |
>   TABLE name |
>   ALL TABLES IN SCHEMA name |
>   ALL SEQUENCES |
>   SEQUENCE name |
>   ALL SEQUENCES IN SCHEMA name |
>   name
>
> The grammar output would need some post-analysis to attribute the
> right type to bare "name" items, but that doesn't seem difficult.
>

The current patch (v26-0002-Added-schema-level-support-for-publication
at [1]) implements this syntax in roughly the way you have proposed
here. But, one thing I find a bit awkward is how it needs to keep a
separate flag to distinguish between names of different objects for
the post-analysis phase. The reason is that in CREATE PUBLICATION
syntax [2] one could supply additional decorators like * or ONLY with a
table name, but the same shouldn't be allowed with a schema name or
other object names. Is that okay, or do you have any better ideas about
the same?

OTOH, if we implement something like Grant/Revoke, where we can give
privileges on individual objects and on all objects in a schema but not
in the same statement, then such special flags won't be required to
distinguish different object names, and we can build something along the
lines of the current "privilege_target:" in gram.y.

[1] - 
https://www.postgresql.org/message-id/CALDaNm3EwAVma8n4YpV1%2BQWiccuVPxpqNfbbrUU3s3XTHcTXew%40mail.gmail.com
[2] - https://www.postgresql.org/docs/devel/sql-createpublication.html

-- 
With Regards,
Amit Kapila.