Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-25 Thread Sutou Kouhei
Hi,

Thanks for trying these patches!

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Thu, 25 Jan 2024 10:53:58 +0800,
  jian he  wrote:

> COPY data TO '/dev/null' WITH (FORMAT csv) \watch count=5

Wow! I didn't know the "\watch count="!
I'll use it.

> Time: 668.996 ms
> Time: 596.254 ms
> Time: 592.723 ms
> Time: 591.663 ms
> Time: 590.803 ms

It seems that 5 times isn't enough for this case as Michael
said. But thanks for trying!


Thanks,
-- 
kou




Re: UUID v7

2024-01-25 Thread Sergey Prokhorenko
I am against turning the DBMS into another C++, where people spend less time 
designing something new than fixing bugs in production after a crash.
As for partitioning, I already wrote to Andrey Borodin that we need a special 
function to generate a partition id using the UUIDv7 timestamp or even 
simultaneously with the generation of the timestamp. For example, every month 
(or so, since precision is not needed here) a new partition is created. Here's 
a good example: 
https://elixirforum.com/t/partitioning-postgres-tables-by-timestamp-based-uuids/60916
But without a separate function for extracting the entire timestamp from the 
UUID! Let's solve this specific problem rather than hand developers a grenade 
with the pin pulled. Many developers have already decided to store the 
timestamp in the UUIDv7 so as not to create a separate created_at field. Then 
they will delete table records with old timestamps, etc. Horrible mistakes are 
simply guaranteed.
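
To make the idea concrete, here is a rough C sketch of what such a
partition-id helper could compute internally. It assumes the RFC layout where
the first 6 bytes of a UUIDv7 hold the big-endian Unix timestamp in
milliseconds, and deliberately uses an approximate 30-day "month" since
precision is not needed here; the function name is only illustrative.

#include <stdint.h>

static uint32_t
uuidv7_partition_id(const uint8_t uuid[16])
{
	uint64_t	unix_ms = 0;

	/* unix_ts_ms: the most significant 48 bits of the UUID, big-endian */
	for (int i = 0; i < 6; i++)
		unix_ms = (unix_ms << 8) | uuid[i];

	/* coarse ~30-day bucket; callers never see the raw timestamp */
	return (uint32_t) (unix_ms / (1000ULL * 60 * 60 * 24 * 30));
}

The point is that the raw timestamp stays an implementation detail of the
partition-id function instead of becoming a general-purpose extraction API.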

Sergey Prokhorenko sergeyprokhore...@yahoo.com.au 

On Thursday, 25 January 2024 at 09:51:58 am GMT+3, Andrey M. Borodin 
 wrote:  
 
 

> On 25 Jan 2024, at 09:40, Nikolay Samokhvalov  wrote:
> 
> From a practical point of view, these two things are extremely important to 
> have to support partitioning. It is better to implement limitations than 
> throw them away.

Postgres always was a bit hackerish, allowing slightly more than is safe. I.e. 
you can define an immutable function that is not really immutable, or turn off 
autovacuum or fsync. Why bother with safety guards here?
My opinion is that we should have this function to extract the timestamp, even 
if it can return strange values for imprecise RFC implementations.


> On 25 Jan 2024, at 02:15, Jelte Fennema-Nio  wrote:
> 
> So +1 for erroring when you provide a timestamp outside of that range
> (either too far in the past or too far in the future).


OK, it seems like we have some consensus on ERRORing..

Do we have any other open items? Does v13 address all open items? Maybe let’s 
compose a better error message?


Best regards, Andrey Borodin.  

Re: Use of backup_label not noted in log

2024-01-25 Thread Michael Paquier
On Mon, Jan 22, 2024 at 04:36:27PM +0900, Michael Paquier wrote:
> +   if (ControlFile->backupStartPoint != InvalidXLogRecPtr)
> 
> Nit 1: I would use XLogRecPtrIsInvalid here.
> 
> +   ereport(LOG,
> +   (errmsg("completed backup recovery with redo LSN %X/%X",
> +   LSN_FORMAT_ARGS(oldBackupStartPoint;
> 
> Nit 2: How about adding backupEndPoint in this LOG?  That would give:
> "completed backup recovery with redo LSN %X/%X and end LSN %X/%X".

Hearing nothing, I've just applied a version of the patch with these
two modifications on HEAD.  If this needs tweaks, just let me know.
--
Michael


signature.asc
Description: PGP signature


Re: A compiling warning in jsonb_populate_record_valid

2024-01-25 Thread Amit Langote
On Thu, Jan 25, 2024 at 4:47 PM Richard Guo  wrote:
> On Thu, Jan 25, 2024 at 2:28 PM Amit Langote  wrote:
>>
>> On Thu, Jan 25, 2024 at 2:59 PM Richard Guo  wrote:
>> > I came across a warning when building master (a044e61f1b) on old GCC
>> > (4.8.5).
>> >
>> > jsonfuncs.c: In function ‘jsonb_populate_record_valid’:
>> > ../../../../src/include/nodes/miscnodes.h:53:15: warning: the comparison 
>> > will always evaluate as ‘true’ for the address of ‘escontext’ will never 
>> > be NULL [-Waddress]
>> >   ((escontext) != NULL && IsA(escontext, ErrorSaveContext) && \
>> >^
>> > jsonfuncs.c:2481:23: note: in expansion of macro ‘SOFT_ERROR_OCCURRED’
>> >   return BoolGetDatum(!SOFT_ERROR_OCCURRED(&escontext));
>> >
>> > This was introduced in commit 1edb3b491b, and can be observed on several
>> > systems in the buildfarm too, such as:
>> > https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=arowana&dt=2024-01-25%2004%3A54%3A38&stg=build
>>
>> Thanks for the report.
>>
>> Will apply the attached, which does this:
>>
>> -   return BoolGetDatum(!SOFT_ERROR_OCCURRED(&escontext));
>> +   return BoolGetDatum(!escontext.error_occurred);
>
> Looks good to me.

Thanks for checking, pushed.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: Split index and table statistics into different types of stats

2024-01-25 Thread Bertrand Drouvot
Hi,

On Tue, Nov 14, 2023 at 09:04:03AM +0100, Drouvot, Bertrand wrote:
> On 11/13/23 9:44 PM, Andres Freund wrote:
> > Hi,
> > 
> > It's not nice from a layering POV that we need this level of awareness in
> > bufmgr.c.  I wonder if this is an argument for first splitting out stats 
> > like
> > blocks_hit, blocks_fetched into something like "relfilenode stats" - they're
> > agnostic of the relkind.
> 
> Thanks for looking at it! Yeah I think that would make a lot of sense
> to track some stats per relfilenode.
> 
> > There aren't that many such stats right now,
> > admittedly, but I think we'll want to also track dirtied, written blocks on 
> > a
> > per relation basis once we can (i.e. we key the relevant stats by 
> > relfilenode
> > instead of oid, so we can associate stats when writing out buffers).
> > 
> > 
> 
> Agree. Then, I think that would make sense to start this effort before the
> split index/table one. I can work on a per relfilenode stat patch first.
> 
> Does this patch ordering make sense to you?
> 
> 1) Introduce per relfilenode stats
> 2) Split index and table stats

Just a quick update on this: I had a chat with Andres at pgconf.eu and we agreed
on the above ordering so that:

1) I started working on relfilenode stats (I hope to be able to provide a POC
patch soon).

2) The CF entry [1] status related to this thread has been changed to "Waiting
on Author".

[1]: https://commitfest.postgresql.org/47/4792/

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-25 Thread Sutou Kouhei
Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Thu, 25 Jan 2024 12:17:55 +0900,
  Michael Paquier  wrote:

> +typedef bool (*CopyToProcessOption_function) (CopyToState cstate, DefElem 
> *defel);
> +typedef int16 (*CopyToGetFormat_function) (CopyToState cstate);
> +typedef void (*CopyToStart_function) (CopyToState cstate, TupleDesc tupDesc);
> +typedef void (*CopyToOneRow_function) (CopyToState cstate, TupleTableSlot 
> *slot);
> +typedef void (*CopyToEnd_function) (CopyToState cstate);
> 
> We don't really need a set of typedefs here, let's put the definitions
> in the CopyToRoutine struct instead.

OK. I'll do it.
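
Something along these lines, I assume (a sketch based only on the typedef
signatures quoted above, not necessarily the final field names):

typedef struct CopyToRoutine
{
	/* process one format-specific option; return true if recognized */
	bool		(*CopyToProcessOption) (CopyToState cstate, DefElem *defel);

	/* return the format code sent in the CopyOutResponse message */
	int16		(*CopyToGetFormat) (CopyToState cstate);

	/* per-COPY initialization, e.g. setting up output functions */
	void		(*CopyToStart) (CopyToState cstate, TupleDesc tupDesc);

	/* output one row */
	void		(*CopyToOneRow) (CopyToState cstate, TupleTableSlot *slot);

	/* per-COPY cleanup */
	void		(*CopyToEnd) (CopyToState cstate);
} CopyToRoutine;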

> +extern CopyToRoutine CopyToRoutineText;
> +extern CopyToRoutine CopyToRoutineCSV;
> +extern CopyToRoutine CopyToRoutineBinary;
> 
> All that should IMO remain in copyto.c and copyfrom.c in the initial
> patch doing the refactoring.  Why not using a fetch function instead
> that uses a string in input?  Then you can call that once after
> parsing the List of options in ProcessCopyOptions().

OK. How about the following for the fetch function
signature?

extern CopyToRoutine *GetBuiltinCopyToRoutine(const char *format);

We may introduce an enum and use it:

typedef enum CopyBuiltinFormat
{
	COPY_BUILTIN_FORMAT_TEXT = 0,
	COPY_BUILTIN_FORMAT_CSV,
	COPY_BUILTIN_FORMAT_BINARY,
} CopyBuiltinFormat;

extern CopyToRoutine *GetBuiltinCopyToRoutine(CopyBuiltinFormat format);
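
With the enum variant, the lookup itself would just be a switch over the
built-in routines mentioned above (a sketch only):

CopyToRoutine *
GetBuiltinCopyToRoutine(CopyBuiltinFormat format)
{
	switch (format)
	{
		case COPY_BUILTIN_FORMAT_TEXT:
			return &CopyToRoutineText;
		case COPY_BUILTIN_FORMAT_CSV:
			return &CopyToRoutineCSV;
		case COPY_BUILTIN_FORMAT_BINARY:
			return &CopyToRoutineBinary;
	}
	return NULL;				/* keep the compiler quiet */
}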

> +/* All "text" and "csv" options are parsed in ProcessCopyOptions(). We may
> + * move the code to here later. */
> Some areas, like this comment, are written in an incorrect format.

Oh, sorry. I assumed that the comment style was adjusted by
pgindent.

I'll use the following style:

/*
 * ...
 */

> +getTypeBinaryOutputInfo(attr->atttypid, &out_func_oid, &isvarlena);
> +fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
> 
> Actually, this split is interesting.  It is possible for a custom
> format to plug in a custom set of out functions.  Did you make use of
> something custom for your own stuff?

I didn't. My PoC custom COPY format handler for Apache Arrow
just handles integer and text for now. It doesn't use
cstate->out_functions because cstate->out_functions may not
return a valid binary format value for Apache Arrow. So it
formats each value by itself.

I'll choose one of them for a custom type (that isn't
supported by Apache Arrow, e.g. PostGIS types):

1. Report an unsupported error
2. Call output function for Apache Arrow provided by the
   custom type

>   Actually, could it make sense to
> split the assignment of cstate->out_functions into its own callback?

Yes. Because we need to use getTypeBinaryOutputInfo() for
"binary" and use getTypeOutputInfo() for "text" and "csv".

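For example, a start callback for "text"/"csv" could look roughly like this
(a sketch built around the two lines quoted above; the surrounding loop and
details such as skipping dropped attributes are assumed):

static void
CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
{
	for (int attnum = 1; attnum <= tupDesc->natts; attnum++)
	{
		Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
		Oid			out_func_oid;
		bool		isvarlena;

		/* the "binary" callback would call getTypeBinaryOutputInfo() here */
		getTypeOutputInfo(attr->atttypid, &out_func_oid, &isvarlena);
		fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
	}
}
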
> Sure, that's part of the start phase, but at least it would make clear
> that a custom method *has* to assign these OIDs to work.  The patch
> implies that as a rule, without a comment that CopyToStart *must* set
> up these OIDs.

CopyToStart doesn't need to set them up if the handler
doesn't use cstate->out_functions.

> I think that 0001 and 0005 should be handled first, as pieces
> independent of the rest.  Then we could move on with 0002~0004 and
> 0006~0008.

OK. I'll focus on 0001 and 0005 for now. I'll restart
0002-0004/0006-0008 after 0001 and 0005 are accepted.


Thanks,
-- 
kou




Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-25 Thread Sutou Kouhei
Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Thu, 25 Jan 2024 13:36:03 +0900,
  Masahiko Sawada  wrote:

> I've experimented with a similar optimization for csv
> and text format; have different callbacks for text and csv format and
> remove "if (cstate->opts.csv_mode)" branches. I've attached a patch
> for that. Here are results:
> 
> HEAD w/ 0001 patch + remove branches:
> binary 2824.502 ms
> text 2715.264 ms
> csv 2803.381 ms
> 
> The numbers look better now. I'm not sure these are within a noise
> range but it might be worth considering having different callbacks for
> text and csv formats.

Wow! Interesting. I tried that approach before but didn't
see any difference. It may depend on my environment.

I'll import the approach to the next patch set so that
others can try the approach easily.


Thanks,
-- 
kou




RE: speed up a logical replica setup

2024-01-25 Thread Hayato Kuroda (Fujitsu)
Dear hackers,

Here are comments for the v8 patch set. I may address them myself,
but I want to post them here to share them all.

01.
```
/* Options */
static char *pub_conninfo_str = NULL;
static SimpleStringList database_names = {NULL, NULL};
static int  wait_seconds = DEFAULT_WAIT;
static bool retain = false;
static bool dry_run = false;
```

Just to confirm - is there a policy for how we store the specified options? If
we want to store them as globals, username and port should follow (my fault...).
Or should we have a structure to store them?
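
For example, something like this (only a sketch; the struct and field names
are suggestions, not taken from the patch):

typedef struct CreateSubscriberOptions
{
	char	   *pub_conninfo_str;
	SimpleStringList database_names;
	char	   *username;
	int			port;
	int			wait_seconds;
	bool		retain;
	bool		dry_run;
} CreateSubscriberOptions;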

02.
```
{"subscriber-conninfo", required_argument, NULL, 'S'},
```

This is my fault, but "--subscriber-conninfo" still remains. It should be
removed if it is not really needed.

03.
```
{"username", required_argument, NULL, 'u'},
```

Should we accept 'U' instead of 'u'?

04.
```
{"dry-run", no_argument, NULL, 'n'},
```

I'm not sure why the dry_run mode exists. In pg_resetwal, it shows which
values would be changed based on the input. As for pg_upgrade, it checks
whether the node can be upgraded. I think we should have the checking
feature, so it should be renamed to --check. Also, the process should exit
earlier in that case.

05.
I felt we should accept some settings from environment variables, like
pg_upgrade. Currently, the below items should be accepted.

- data directory
- username
- port
- timeout

06.
```
pg_logging_set_level(PG_LOG_WARNING);
```

If the default log level is warning, there is no way to output debug logs.
(The -v option only raises the level by one, so INFO would be output.)
I think it should be PG_LOG_INFO.

07.
Can we combine verifications into two functions, e.g., check_primary() and 
check_standby/check_subscriber()?

08.
Not sure, but if we want to record outputs of pg_subscriber, a sub-directory
should be created. Its name should contain the timestamp.

09.
Not sure, but should we check max_slot_wal_keep_size on the primary server? It
could avoid failures when starting logical replication.

10.
```
nslots_new = nslots_old + dbarr.ndbs;

if (nslots_new > max_replication_slots)
{
	pg_log_error("max_replication_slots (%d) must be greater than or equal to "
				 "the number of replication slots required (%d)",
				 max_replication_slots, nslots_new);
	exit(1);
}
```

I think the standby server must not have replication slots, because the
subsequent pg_resetwal command discards all the WAL files, so the WAL records
they point to are removed. Currently pg_resetwal does not raise an ERROR in
that case.

11.
```
/*
 * Stop the subscriber if it is a standby server. Before executing the
 * transformation steps, make sure the subscriber is not running because
 * one of the steps is to modify some recovery parameters that require a
 * restart.
 */
if (stat(pidfile, &statbuf) == 0)
```

I kept it just in case, but I'm not sure it is still needed. What do you think?
Removing it would also remove the inclusion of pidfile.h.

12.
```
pg_ctl_cmd = psprintf("\"%s/pg_ctl\" stop -D \"%s\" -s",
					  standby.bindir, standby.pgdata);
rc = system(pg_ctl_cmd);
pg_ctl_status(pg_ctl_cmd, rc, 0);
```


There are two places that stop the instance. Can you extract that into a function?
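
For example (just a sketch; the helper name and parameters are suggestions,
not taken from the patch):

static void
stop_server(const char *bindir, const char *pgdata)
{
	char	   *pg_ctl_cmd;
	int			rc;

	pg_ctl_cmd = psprintf("\"%s/pg_ctl\" stop -D \"%s\" -s", bindir, pgdata);
	rc = system(pg_ctl_cmd);
	pg_ctl_status(pg_ctl_cmd, rc, 0);
}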

13.
```
 * A temporary replication slot is not used here to avoid keeping a
 * replication connection open (depending when base backup was taken, the
 * connection should be open for a few hours).
 */
conn = connect_database(primary.base_conninfo, dbarr.perdb[0].dbname);
if (conn == NULL)
	exit(1);
consistent_lsn = create_logical_replication_slot(conn, true,
												 &dbarr.perdb[0]);
```

I hadn't noticed the comment, but I'm still not sure of the reason. Why must we
keep the slot until pg_subscriber finishes? IIUC, the slot is never used; it is
created only to get a consistent_lsn, so we do not have to keep it. Also, just
before this, logical replication slots are created for each database, so the
WAL records are surely retained.

14.

```
pg_log_info("starting the subscriber");
start_standby_server(&standby, subport, server_start_log);
```

This info should be in the function.

15.
```
/*
 * Create a subscription for each database.
 */
for (i = 0; i < dbarr.ndbs; i++)
```

This can be divided into a function, like create_all_subscriptions().

16.
My fault: usage() must be updated.

17. use_primary_slot_name
```
if (PQntuples(res) != 1)
{
	pg_log_error("could not obtain replication slot information: got %d rows, expected %d row",
				 PQntuples(res), 1);
	return NULL;

Re: Improve WALRead() to suck data directly from WAL buffers when possible

2024-01-25 Thread Bharath Rupireddy
On Tue, Jan 23, 2024 at 9:37 AM Jeff Davis  wrote:
>
> On Mon, 2024-01-22 at 12:12 -0800, Andres Freund wrote:
> > I still think that anything that requires such checks shouldn't be
> > merged. It's completely bogus to check page contents for validity
> > when we
> > should have metadata telling us which range of the buffers is valid
> > and which
> > not.
>
> The check seems entirely unnecessary, to me. A leftover from v18?
>
> I have attached a new patch (version "19j") to illustrate some of my
> previous suggestions. I didn't spend a lot of time on it so it's not
> ready for commit, but I believe my suggestions are easier to understand
> in code form.

> Note that, right now, it only works for XLogSendPhysical(). I believe
> it's best to just make it work for 1-3 callers that we understand well,
> and we can generalize later if it makes sense.

+1 to do it for XLogSendPhysical() first. Enabling it for others can
just be done as something like the attached v20-0003.

> I'm still not clear on why some callers are reading XLOG_BLCKSZ
> (expecting zeros at the end), and if it's OK to just change them to use
> the exact byte count.

"expecting zeros at the end" - this can't always be true, as the WAL
can get flushed after determining the flush ptr but before reading it from
the WAL file. FWIW, here's what I've tried previously -
https://github.com/BRupireddy2/postgres/tree/ensure_extra_read_WAL_page_is_zero_padded_at_the_end_WIP,
where the tests hit the Assert(false) added there. Which means the zero-padding
comment around WALRead() call-sites isn't quite right.

/*
 * Even though we just determined how much of the page can be validly read
 * as 'count', read the whole page anyway. It's guaranteed to be
 * zero-padded up to the page boundary if it's incomplete.
 */
if (!WALRead(state, cur_page, targetPagePtr, XLOG_BLCKSZ, tli,

I think this needs to be discussed separately. If okay, I'll start a new thread.

> Also, if we've detected that the first requested buffer has been
> evicted, is there any value in continuing the loop to see if more
> recent buffers are available? For example, if the requested LSNs range
> over buffers 4, 5, and 6, and 4 has already been evicted, should we try
> to return LSN data from 5 and 6 at the proper offset in the dest
> buffer? If so, we'd need to adjust the API so the caller knows what
> parts of the dest buffer were filled in.

I'd second this capability for now to keep the API simple and clear,
but we can consider expanding it as needed.

I reviewed the v19j and attached v20 patch set:

1.
 * The caller must ensure that it's reasonable to read from the WAL buffers,
 * i.e. that the requested data is from the current timeline, that we're not
 * in recovery, etc.

I still think XLogReadFromBuffers can just return in any of the
above cases instead of relying on comments. I feel we must assume the caller
may ask for WAL from a different timeline and/or during recovery, and design
the API to deal with it. Done that way in the v20 patch.
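
Something along these lines (only a sketch of the idea, not the exact v20
code; "tli" stands for the caller-requested timeline):

	/*
	 * Not the current insertion timeline, or we're in recovery: read
	 * nothing and let the caller fall back to reading WAL files.
	 */
	if (RecoveryInProgress() || tli != GetWALInsertionTimeLine())
		return 0;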

2. Fixed some typos, reworded a few comments (i.e. used "current
insert/write position" instead of "Insert/Write pointer" like
elsewhere), ran pgindent.

3.
- * XXX probably this should be improved to suck data directly from the
- * WAL buffers when possible.

Removed the above comment before WALRead() since we have that facility
now. Perhaps we can say that callers can suck data directly from the
WAL buffers using XLogReadFromBuffers, but I have no strong opinion on
this.

4.
+ * Most callers will have already updated LogwrtResult when determining
+ * how far to read, but it's OK if it's out of date. (XXX: is it worth
+ * taking a spinlock to update LogwrtResult and check again before calling
+ * WaitXLogInsertionsToFinish()?)

If the callers use GetFlushRecPtr() to determine how far to read,
LogwrtResult will be *reasonably* up to date, otherwise not. If
LogwrtResult is a bit old, XLogReadFromBuffers will call
WaitXLogInsertionsToFinish, which will just loop over all insertion
locks and return.

As far as the current WAL readers are concerned, we don't need an
explicit spinlock to determine LogwrtResult because all of them use
GetFlushRecPtr() to determine how far to read. If there's any caller
that's not updating LogwrtResult at all, we can consider reading
LogwrtResult ourselves in the future.

5. I think the two requirements specified at
https://www.postgresql.org/message-id/20231109205836.zjoawdrn4q77yemv%40awork3.anarazel.de
still hold with the v19j.

5.1 Never allow WAL being read that's past
XLogBytePosToRecPtr(XLogCtl->Insert->CurrBytePos) as it does not
exist.
5.2 If the to-be-read LSN is between XLogCtl->LogwrtResult->Write and
XLogBytePosToRecPtr(Insert->CurrBytePos) we need to call
WaitXLogInsertionsToFinish() before copying the data.

+if (upto > LogwrtResult.Write)
+{
+XLogRecPtr writtenUpto = WaitXLogInsertionsToFinish(upto, false);
+
+upto = Min(upto, writtenUpto);
+nbytes = upto - st

Re: remaining sql/json patches

2024-01-25 Thread jian he
On 9.16.4. JSON_TABLE
`
name type FORMAT JSON [ENCODING UTF8] [ PATH json_path_specification ]
Inserts a composite SQL/JSON item into the output row
`
I am not sure about "Inserts a composite SQL/JSON item into the output row".
I think it means that, for any type whose typcategory is TYPCATEGORY_STRING,
if FORMAT JSON is specified explicitly, the output value (text type) will be a
legal json representation.

I also did a minor refactor on JSON_VALUE_OP, jsexpr->omit_quotes.
diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index 74bc6f49..56ab12ac 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -4289,7 +4289,7 @@ ExecInitJsonExpr(JsonExpr *jexpr, ExprState *state,
 	 * nodes.
 	 */
 	jsestate->escontext.type = T_ErrorSaveContext;
-	if (jexpr->result_coercion || jexpr->omit_quotes)
+	if (jexpr->result_coercion)
 	{
 		jsestate->jump_eval_result_coercion =
 			ExecInitJsonExprCoercion(state, jexpr->result_coercion,
diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index 31c0847e..9802b4ae 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -4363,6 +4363,7 @@ transformJsonFuncExpr(ParseState *pstate, JsonFuncExpr *func)
 			jsexpr->returning->format->format_type = JS_FORMAT_DEFAULT;
 			jsexpr->returning->format->encoding = JS_ENC_DEFAULT;
 
+			jsexpr->omit_quotes = true;
 			jsexpr->result_coercion = coerceJsonFuncExprOutput(pstate, jsexpr);
 
 			/*


v1-0001-refactor-sqljson_jsontable-related-tests.no-cfbot
Description: Binary data


Fix inappropriate comments in function ReplicationSlotAcquire

2024-01-25 Thread Wei Wang (Fujitsu)
Hi,

In the function ReplicationSlotAcquire(), I found a mistake in the
comments below:

```
/*
 * Search for the slot with the specified name if the slot to acquire is
 * not given. If the slot is not found, we either return -1 or error out.
 */
s = SearchNamedReplicationSlot(name, false);
if (s == NULL || !s->in_use)
{
	LWLockRelease(ReplicationSlotControlLock);

	ereport(ERROR,
			(errcode(ERRCODE_UNDEFINED_OBJECT),
			 errmsg("replication slot \"%s\" does not exist",
					name)));
}
```

It seems that when the slot is not found, we will only error out and will not
return -1. Attached is a patch to remove the inappropriate description of the
return value.

Regards,
Wang Wei


v1-0001-Fix-inappropriate-comments-in-function-Replicatio.patch
Description:  v1-0001-Fix-inappropriate-comments-in-function-Replicatio.patch


Re: speed up a logical replica setup

2024-01-25 Thread Peter Eisentraut

On 24.01.24 00:44, Euler Taveira wrote:

Subscriber has a different meaning than subscription. Subscription is an SQL
object. Subscriber is the server (node in replication terminology) where the
subscription resides. Having said that, pg_createsubscriber doesn't seem a bad
name because you are creating a new subscriber. (Indeed, you are transforming /
converting, but "create" seems closer and users can infer that it is a tool to
build a new logical replica.)


That makes sense.

(Also, the problem with "convert" etc. is that "convertsubscriber" would 
imply that you are converting an existing subscriber to something else. 
It would need to be something like "convertbackup" then, which doesn't 
seem helpful.)



I think "convert" and "transform" fit for this case. However, "create",
"convert" and "transform" have 6, 7 and 9 characters,  respectively. I 
suggest
that we avoid long names (subscriber already has 10 characters). My 
preference

is pg_createsubscriber.


That seems best to me.





Re: Fix inappropriate comments in function ReplicationSlotAcquire

2024-01-25 Thread Amit Kapila
On Thu, Jan 25, 2024 at 2:57 PM Wei Wang (Fujitsu)
 wrote:
>
> In the function ReplicationSlotAcquire(), I found there is a missing in the
> below comments:
>
> ```
> /*
>  * Search for the slot with the specified name if the slot to acquire 
> is
>  * not given. If the slot is not found, we either return -1 or error 
> out.
>  */
> s = SearchNamedReplicationSlot(name, false);
> if (s == NULL || !s->in_use)
> {
> LWLockRelease(ReplicationSlotControlLock);
>
> ereport(ERROR,
> (errcode(ERRCODE_UNDEFINED_OBJECT),
>  errmsg("replication slot \"%s\" does not 
> exist",
> name)));
> }
> ```
>
> It seems that when the slot is not found, we will only error out and will not
> return -1.
>

You seem to be correct. However, isn't the first part of the comment
also slightly confusing? In particular, "... if the slot to acquire is
not given." In this function, I don't see the case where a slot to
acquire is given.

-- 
With Regards,
Amit Kapila.




Re: Documentation to upgrade logical replication cluster

2024-01-25 Thread vignesh C
On Wed, 24 Jan 2024 at 15:16, Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear Vignesh,
>
> Thanks for updating the patch! Basically your patch looks good.
> Below lines are my comments for v3.
>
> 01.
>
> ```
>  
>   The output plugins referenced by the slots on the old cluster must be
>   installed in the new PostgreSQL executable directory.
>  
> ```
>
> PostgreSQL must be marked as .

Modified

> 02.
>
> ```
> 
> pg_ctl -D /opt/PostgreSQL/data1 stop -l logfile
> 
> ```
>
> I checked that found that -l was no-op when `pg_ctl stop` was specified. Can 
> we remove?
> The documentation is not listed -l for the stop command.
> All the similar lines should be fixed as well.

Modified

> 03.
>
> ```
> On node3, create any tables that were created in
> the upgraded node2 between
>  and now,
> ```
>
> If tables are newly defined on node1 between 1 - 11, they are not defined on 
> node3.
> So they must be defined on node3 as well.

The new node1 tables will be created on node2 in step 11. Since we have
mentioned creating the tables that were created between step 6 and now, all
of the node1 and node2 tables will get created.

> 04.
>
> ```
>   
>
> ```
>
> Even if the referred steps is correct, ID should be allocated to step, not 
> para.
> That's why the rendering is bit a strange.

Modified

The attached v4 version patch has the changes for the same.

Regards,
Vignesh
From 4abe4ed86d68a5ecb60c3bf29249bb883605fb76 Mon Sep 17 00:00:00 2001
From: Vignesh C 
Date: Wed, 13 Dec 2023 14:11:58 +0530
Subject: [PATCH v4] Documentation for upgrading logical replication cluster.

Documentation for upgrading logical replication cluster.
---
 doc/src/sgml/logical-replication.sgml | 807 ++
 doc/src/sgml/ref/pgupgrade.sgml   | 143 +
 2 files changed, 822 insertions(+), 128 deletions(-)

diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index ec2130669e..81501f9b92 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -1926,6 +1926,813 @@ CONTEXT:  processing remote data for replication origin "pg_16395" during "INSER
 
  
 
+ 
+  Upgrade
+
+  
+   Migration of logical replication clusters is possible only when all the
+   members of the old logical replication clusters are version 17.0 or later.
+   Before reading this section, refer  page for
+   more details about pg_upgrade.
+  
+
+  
+   Prepare for publisher upgrades
+
+   
+pg_upgrade attempts to migrate logical
+slots. This helps avoid the need for manually defining the same
+logical slots on the new publisher. Migration of logical slots is
+only supported when the old cluster is version 17.0 or later.
+Logical slots on clusters before version 17.0 will silently be
+ignored.
+   
+
+   
+Before you start upgrading the publisher cluster, ensure that the
+subscription is temporarily disabled, by executing
+ALTER SUBSCRIPTION ... DISABLE.
+Re-enable the subscription after the upgrade.
+   
+
+   
+There are some prerequisites for pg_upgrade to
+be able to upgrade the logical slots. If these are not met an error
+will be reported.
+   
+
+   
+
+ 
+  The new cluster must have
+  wal_level as
+  logical.
+ 
+
+
+ 
+  The new cluster must have
+  max_replication_slots
+  configured to a value greater than or equal to the number of slots
+  present in the old cluster.
+ 
+
+
+ 
+  The output plugins referenced by the slots on the old cluster must be
+  installed in the new PostgreSQL executable
+  directory.
+ 
+
+
+ 
+  The old cluster has replicated all the transactions and logical decoding
+  messages to subscribers.
+ 
+
+
+ 
+  All slots on the old cluster must be usable, i.e., there are no slots
+  whose
+  pg_replication_slots.conflicting
+  is true.
+ 
+
+
+ 
+  The new cluster must not have permanent logical slots, i.e.,
+  there must be no slots where
+  pg_replication_slots.temporary
+  is false.
+ 
+
+   
+  
+
+  
+   Prepare for subscriber upgrades
+
+   
+Setup the 
+subscriber configurations in the new subscriber.
+pg_upgrade attempts to migrate subscription
+dependencies which includes the subscription's table information present in
+pg_subscription_rel
+system catalog and also the subscription's replication origin. This allows
+logical replication on the new subscriber to continue from where the
+old subscriber was up to. Migration of subscription dependencies is only
+supported when the old cluster is version 17.0 or later. Subscription
+dependencies on clusters before version 17.0 will silently be ignored.
+   
+
+   
+There are some prerequisites for pg_upgrade to
+be able to upgrade the subscriptions. If these are not met an er

Re: Documentation to upgrade logical replication cluster

2024-01-25 Thread vignesh C
On Thu, 25 Jan 2024 at 05:45, Peter Smith  wrote:
>
> Here are some review comments for patch v3.
>
> ==
> doc/src/sgml/ref/pgupgrade.sgml
>
> 1.
> +
> +  
> +   This page does not cover steps to upgrade logical replication
> clusters, refer
> +for details on upgrading
> +   logical replication clusters.
> +  
> +
>
> I felt that maybe this note was misplaced. Won't it be better to put
> this down in the "Usage" section of this page?
>
> BEFORE
> These are the steps to perform an upgrade with pg_upgrade:
>
> SUGGESTION (or something like this)
> Below are the steps to perform an upgrade with pg_upgrade.
>
> Note, the steps to upgrade logical replication clusters are not
> covered here; refer to 
> for details.

Modified

> ~~~
>
> 2.
> Configure the servers for log shipping.  (You do not need to run
> pg_backup_start() and
> pg_backup_stop()
> or take a file system backup as the standbys are still synchronized
> -   with the primary.)  Only logical slots on the primary are copied to 
> the
> -   new standby, but other slots on the old standby are not copied so must
> -   be recreated manually.
> +   with the primary.)  In version 17.0 or later, only logical slots on 
> the
> +   primary are copied to the new standby, but other slots on the
> old standby
> +   are not copied so must be recreated manually.
>
>
> This para was still unclear to me. What is version 17.0 referring to
> -- the old_cluster version? Do you mean something like:
> If the old cluster is < v17 then logical slots are not copied. If the
> old_cluster is >= v17 then...

Yes, I have rephrased it now.

> ==
> doc/src/sgml/logical-replication.sgml
>
> 3.
> +   
> +While upgrading a subscriber, write operations can be performed in the
> +publisher, these changes will be replicated to the subscriber once the
> +subscriber upgradation is completed.
> +   
>
> 3a.
> /publisher, these changes/publisher. These changes/

Modified

> ~
>
> 3b.
> "upgradation" ??. See [1]
>
> maybe just /upgradation/upgrade/

Modified

> ~~~
>
> 4. GENERAL - prompts/paths
>
> I noticed in v3 you removed all the cmd prompts like:
> dba@node1:/opt/PostgreSQL/postgres/17/bin$
> dba@node2:/opt/PostgreSQL/postgres/18/bin$
> etc.
>
> I thought those were helpful to disambiguate which server/version was
> being operated on. I wonder if there is some way to keep information
> still but not make it look like a real current directory that
> Kuroda-san did not like:
>
> e.g. Maybe something like the below is possible?
>
> (dba@node1: v17) pg_upgrade...
> (dba@node2: v18) pg_upgrade...

I did not want to add this as our current documentation is consistent
with how it is documented in the pg_upgrade page at [1].

The v4 version patch attached at [2] has the changes for the same.

[1] - https://www.postgresql.org/docs/devel/pgupgrade.html
[2] - 
https://www.postgresql.org/message-id/CALDaNm1wCHmBwpLM%3Dd9oBoZqKXOe-TwC-LCcHC9gFy0bazZU6Q%40mail.gmail.com

Regards,
Vignesh




Re: Popcount optimization using AVX512

2024-01-25 Thread Alvaro Herrera
On 2024-Jan-25, Shankaran, Akash wrote:

> With the updated patch, we observed significant improvements and
> handily beat the previous popcount algorithm performance. No
> regressions in any scenario are observed:
> Platform: Intel Xeon Platinum 8360Y (Icelake) for data sizes 1kb - 64kb.
> Microbenchmark: 2x - 3x gains presently vs 19% previously, on the same
> microbenchmark described initially in this thread. 

These are great results.

However, it would be much better if the improved code were available for
all relevant builds and activated if a CPUID test determines that the
relevant instructions are available, instead of requiring a compile-time
flag -- which most builds are not going to use, thus wasting the
opportunity for running the optimized code.

I suppose this would require patching pg_popcount64_choose() to be more
specific.  Looking at the existing code, I would also consider renaming
the "_fast" variants to something like pg_popcount32_asml/
pg_popcount64_asmq so that you can name the new one pg_popcount64_asmdq
or such.  (Or maybe leave the 32-bit version alone as "fast/slow", since
there's no third option for that one -- or do I misread?)

I also think this needs to move the CFLAGS-decision-making elsewhere;
asking the user to get it right is too much of a burden.  Is it workable
to simply verify compiler support for the additional flags needed, and
if so add them to a new CFLAGS_BITUTILS variable or such?  We already
have the CFLAGS_CRC model that should be easy to follow.  Should be easy
enough to mostly copy what's in configure.ac and meson.build, right?

Finally, the matter of using ifunc as proposed by Noah seems to be still
in the air, with no patches offered for the popcount family.  Given that
Nathan reports [1] a performance decrease, maybe we should set that
thought aside for now and continue to use function pointers.  It's worth
keeping in mind that popcount is already using function pointers (at
least in the case where we try to use POPCNT directly), so patching to
select between three options instead of between two wouldn't be a
regression.
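
To illustrate, the existing choose-on-first-call pattern could grow a third
arm roughly like this (a sketch only: pg_popcount64_asmq/asmdq follow the
renaming suggested above, pg_avx512_popcnt_available() is a hypothetical
CPUID check rather than an existing symbol, and pg_popcount_available() is
the existing check in pg_bitutils.c):

static int pg_popcount64_slow(uint64 word);
static int pg_popcount64_asmq(uint64 word);		/* POPCNT */
static int pg_popcount64_asmdq(uint64 word);	/* AVX-512 VPOPCNTDQ */
static int pg_popcount64_choose(uint64 word);

int			(*pg_popcount64) (uint64 word) = pg_popcount64_choose;

static int
pg_popcount64_choose(uint64 word)
{
	if (pg_avx512_popcnt_available())
		pg_popcount64 = pg_popcount64_asmdq;
	else if (pg_popcount_available())
		pg_popcount64 = pg_popcount64_asmq;
	else
		pg_popcount64 = pg_popcount64_slow;

	return pg_popcount64(word);
}

The first call pays the CPUID cost once; every later call goes straight
through the function pointer, same as today.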

[1] https://postgr.es/m/20231107201441.GA898662@nathanxps13

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Nunca se desea ardientemente lo que solo se desea por razón" (F. Alexandre)




RE: speed up a logical replica setup

2024-01-25 Thread Hayato Kuroda (Fujitsu)
Dear Peter,

Thanks for giving your idea!

> > Subscriber has a different meaning of subscription. Subscription is an SQL
> > object. Subscriber is the server (node in replication terminology) where the
> > subscription resides. Having said that pg_createsubscriber doesn't seem
> > a bad
> > name because you are creating a new subscriber. (Indeed, you are
> > transforming /
> > converting but "create" seems closer and users can infer that it is a
> > tool to
> > build a new logical replica.
> 
> That makes sense.
> 
> (Also, the problem with "convert" etc. is that "convertsubscriber" would
> imply that you are converting an existing subscriber to something else.
> It would need to be something like "convertbackup" then, which doesn't
> seem helpful.)
> 
> > I think "convert" and "transform" fit for this case. However, "create",
> > "convert" and "transform" have 6, 7 and 9 characters,  respectively. I
> > suggest
> > that we avoid long names (subscriber already has 10 characters). My
> > preference
> > is pg_createsubscriber.
> 
> That seems best to me.

Just FYI - I'm ok to change the name to pg_createsubscriber.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



Re: Synchronizing slots from primary to standby

2024-01-25 Thread Amit Kapila
On Thu, Jan 25, 2024 at 1:25 PM Bertrand Drouvot
 wrote:
>
> On Thu, Jan 25, 2024 at 02:57:30AM +, Zhijie Hou (Fujitsu) wrote:
> >
> > Agreed. I split the original 0001 patch into 3 patches as suggested.
> > Here is the V68 patch set.

Thanks, I have pushed 0001.

>
> Thanks!
>
> Some comments.
>
> Looking at 0002:
>
> 1 ===
>
> +  The following options are supported:
>
> What about "The following option is supported"? (as currently only the 
> "FAILOVER"
> is)
>
> 2 ===
>
> What about adding some TAP tests too? (I can see that ALTER_REPLICATION_SLOT 
> test
> is added in v68-0004 but I think having some in 0002 would make sense too).
>

The subscription tests in v68-0003 will test this functionality. The
one advantage of adding separate tests for this is that if in the
future we extend this replication command, it could be convenient to
test various options. However, the same could be said about existing
replication commands as well. But is it worth having extra tests which
will be anyway covered in the next commit in a few days?

I understand that it is a good idea and makes one comfortable to have
tests for each separate commit, but OTOH, in the longer term it will
just be adding more test time without achieving much benefit. I think
we can say explicitly in the commit message of this patch that the
subsequent commit will cover the tests for this functionality.

One minor comment on 0002:
+  so that logical replication can be resumed after failover.
+ 

Can we move this and similar comments or doc changes to the later 0004
patch where we are syncing the slots?


-- 
With Regards,
Amit Kapila.




RE: Fix inappropriate comments in function ReplicationSlotAcquire

2024-01-25 Thread Wei Wang (Fujitsu)
On Thu, Jan 25, 2024 at 18:39 Amit Kapila  wrote:
>

Thanks for your review.

> On Thu, Jan 25, 2024 at 2:57 PM Wei Wang (Fujitsu)
>  wrote:
> >
> > In the function ReplicationSlotAcquire(), I found there is a missing in the
> > below comments:
> >
> > ```
> > /*
> >  * Search for the slot with the specified name if the slot to 
> > acquire is
> >  * not given. If the slot is not found, we either return -1 or 
> > error out.
> >  */
> > s = SearchNamedReplicationSlot(name, false);
> > if (s == NULL || !s->in_use)
> > {
> > LWLockRelease(ReplicationSlotControlLock);
> >
> > ereport(ERROR,
> > (errcode(ERRCODE_UNDEFINED_OBJECT),
> >  errmsg("replication slot \"%s\" does not
> exist",
> > name)));
> > }
> > ```
> >
> > It seems that when the slot is not found, we will only error out and will 
> > not
> > return -1.
> >
> 
> You seem to be correct. However, isn't the first part of the comment
> also slightly confusing? In particular, "... if the slot to acquire is
> not given." In this function, I don't see the case where a slot to
> acquire is given.

Yes, agreed. I think these two parts have become slightly outdated after
commit 1632ea4, so I also tried to fix the first part of the comment.
Attached is the new patch.

Regards,
Wang Wei


v2-0001-Fix-inappropriate-comments-in-function-Replicatio.patch
Description:  v2-0001-Fix-inappropriate-comments-in-function-Replicatio.patch


Re: index prefetching

2024-01-25 Thread Dilip Kumar
On Wed, Jan 24, 2024 at 11:43 PM Tomas Vondra
 wrote:

> On 1/22/24 07:35, Konstantin Knizhnik wrote:
> >
> > On 22/01/2024 1:47 am, Tomas Vondra wrote:
> >> h, right. Well, you're right in this case we perhaps could set just one
> >> of those flags, but the "purpose" of the two places is quite different.
> >>
> >> The "prefetch" flag is fully controlled by the prefetcher, and it's up
> >> to it to change it (e.g. I can easily imagine some new logic touching
> >> setting it to "false" for some reason).
> >>
> >> The "data" flag is fully controlled by the custom callbacks, so whatever
> >> the callback stores, will be there.
> >>
> >> I don't think it's worth simplifying this. In particular, I don't think
> >> the callback can assume it can rely on the "prefetch" flag.
> >>
> > Why not to add "all_visible" flag to IndexPrefetchEntry ? If will not
> > cause any extra space overhead (because of alignment), but allows to
> > avoid dynamic memory allocation (not sure if it is critical, but nice to
> > avoid if possible).
> >
>
While reading through the first patch I got some questions. I haven't
read it completely yet, but this is what I have so far.

1.
+static bool
+IndexPrefetchBlockIsSequential(IndexPrefetch *prefetch, BlockNumber block)
+{
+ int idx;
...
+ if (prefetch->blockItems[idx] != (block - i))
+ return false;
+
+ /* Don't prefetch if the block happens to be the same. */
+ if (prefetch->blockItems[idx] == block)
+ return false;
+ }
+
+ /* not sequential, not recently prefetched */
+ return true;
+}

The above function is named BlockIsSequential, but at the end it
returns true if the pattern is not sequential -- that seems like a problem.
Also, the two checks right above the end of the function return false
if the block is the same or the pattern is sequential; I think those
are wrong too.


2.
I have noticed that the prefetch history is maintained at the backend
level, but what if multiple backends are trying to fetch the same heap
blocks, perhaps scanning the same index -- should that be in some shared
structure? I haven't thought much more deeply about this from the
implementation POV, but should we think about it, or does it not matter?


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: A failure in t/038_save_logical_slots_shutdown.pl

2024-01-25 Thread Bharath Rupireddy
On Tue, Jan 16, 2024 at 12:13 PM Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear Amit, Bharath,
>
> > This is a more strict check because it is possible that even if the
> > latest confirmed_flush location is not persisted there is no
> > meaningful decodable WAL between whatever the last confirmed_flush
> > location saved on disk and the shutdown_checkpoint record.
> > Kuroda-San/Vignesh, do you have any suggestion on this one?
>
> I think it should be as testcase explicitly. There are two reasons:
>
> * e0b2eed is a commit for backend codes, so it should be tested by src/test/*
>   files. Each src/bin/XXX/*.pl files should test only their executable.
> * Assuming that the feature would be broken. In this case 003_logical_slots.pl
>   would fail, but we do not have a way to recognize on the build farm.
>   038_save_logical_slots_shutdown.pl helps to distinguish the case.

+1 to keep 038_save_logical_slots_shutdown.pl as-is.

> Based on that, I think it is OK to add advance_wal() and comments, like 
> Bharath's patch.

Thanks. I'll wait a while and then add it to CF to not lose it in the wild.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: UUID v7

2024-01-25 Thread Przemysław Sztoch

Andrey M. Borodin wrote on 25.01.2024 07:51:



On 25 Jan 2024, at 09:40, Nikolay Samokhvalov  wrote:

 From a practical point of view, these two things are extremely important to 
have to support partitioning. It is better to implement limitations than throw 
them away.

Postgres always was a bit hackerish, allowing slightly more than is safe. I.e. 
you can define an immutable function that is not really immutable, or turn off 
autovacuum or fsync. Why bother with safety guards here?
My opinion is that we should have this function to extract the timestamp, even 
if it can return strange values for imprecise RFC implementations.



On 25 Jan 2024, at 02:15, Jelte Fennema-Nio  wrote:

So +1 for erroring when you provide a timestamp outside of that range
(either too far in the past or too far in the future).


OK, it seems like we have some consensus on ERRORing..

Do we have any other open items? Does v13 address all open items? Maybe let’s 
compose a better error message?

+1 for erroring when ts is outside the range.

v13 looks good to me. I think we have reached an optimal compromise.

--
Przemysław Sztoch | Mobile +48 509 99 00 66


Re: Fix inappropriate comments in function ReplicationSlotAcquire

2024-01-25 Thread Amit Kapila
On Thu, Jan 25, 2024 at 4:01 PM Wei Wang (Fujitsu)
 wrote:
>
>
> Yes, agree. I think these two parts have become slightly outdated after the
> commit 1632ea4. So also tried to fix the first part of the comment.
> Attach the new patch.
>

How about changing it to something simple like:
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index f2781d0455..84c257a7aa 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -465,10 +465,7 @@ retry:

LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);

-   /*
-* Search for the slot with the specified name if the slot to acquire is
-* not given. If the slot is not found, we either return -1 or error out.
-*/
+/* Check if the slot exists with the given name. */
s = SearchNamedReplicationSlot(name, false);
if (s == NULL || !s->in_use)
{

-- 
With Regards,
Amit Kapila.




Re: A failure in t/038_save_logical_slots_shutdown.pl

2024-01-25 Thread Amit Kapila
On Thu, Jan 25, 2024 at 4:27 PM Bharath Rupireddy
 wrote:
>
> Thanks. I'll wait a while and then add it to CF to not lose it in the wild.
>

Feel free to add it to CF. However, I do plan to look at it in the
next few days.

-- 
With Regards,
Amit Kapila.




Re: cataloguing NOT NULL constraints

2024-01-25 Thread Andrew Bille
Hi Alvaro,
25.08.2023 14:38, Alvaro Herrera wrote:
> I have now pushed this again. Hopefully it'll stick this time.

Starting from b0e96f31, pg_upgrade fails with an inherited NOT NULL constraint.
For example, upgrading from 9c13b6814a (or REL_12_STABLE .. REL_16_STABLE) to
b0e96f31 (or master) with the following two tables (excerpt from
src/test/regress/sql/rules.sql):

create table test_0 (id serial primary key);
create table test_1 (id integer primary key) inherits (test_0);

I get the failure:

Setting frozenxid and minmxid counters in new cluster ok
Restoring global objects in the new cluster   ok
Restoring database schemas in the new cluster
 test
*failure*

Consult the last few lines of
"new/pg_upgrade_output.d/20240125T151231.112/log/pg_upgrade_dump_16384.log"
for
the probable cause of the failure.
Failure, exiting

In log:

pg_restore: connecting to database for restore
pg_restore: creating DATABASE "test"
pg_restore: connecting to new database "test"
pg_restore: creating DATABASE PROPERTIES "test"
pg_restore: connecting to new database "test"
pg_restore: creating pg_largeobject "pg_largeobject"
pg_restore: creating COMMENT "SCHEMA "public""
pg_restore: creating TABLE "public.test_0"
pg_restore: creating SEQUENCE "public.test_0_id_seq"
pg_restore: creating SEQUENCE OWNED BY "public.test_0_id_seq"
pg_restore: creating TABLE "public.test_1"
pg_restore: creating DEFAULT "public.test_0 id"
pg_restore: executing SEQUENCE SET test_0_id_seq
pg_restore: creating CONSTRAINT "public.test_0 test_0_pkey"
pg_restore: creating CONSTRAINT "public.test_1 test_1_pkey"
pg_restore: while PROCESSING TOC:
pg_restore: from TOC entry 3200; 2606 16397 CONSTRAINT test_1 test_1_pkey
andrew
pg_restore: error: could not execute query: ERROR:  cannot drop inherited
constraint "pgdump_throwaway_notnull_0" of relation "test_1"
Command was:
-- For binary upgrade, must preserve pg_class oids and relfilenodes
SELECT
pg_catalog.binary_upgrade_set_next_index_pg_class_oid('16396'::pg_catalog.oid);

SELECT
pg_catalog.binary_upgrade_set_next_index_relfilenode('16396'::pg_catalog.oid);


ALTER TABLE ONLY "public"."test_1"
   ADD CONSTRAINT "test_1_pkey" PRIMARY KEY ("id");

ALTER TABLE ONLY "public"."test_1" DROP CONSTRAINT
pgdump_throwaway_notnull_0;

Thanks!




On Thu, Jan 25, 2024 at 3:06 PM Alvaro Herrera 
wrote:

> I have now pushed this again.  Hopefully it'll stick this time.
>
> We may want to make some further tweaks to the behavior in some cases --
> for example, don't disallow ALTER TABLE DROP NOT NULL when the
> constraint is both inherited and has a local definition; the other
> option is to mark the constraint as no longer having a local definition.
> I left it the other way because that's what CHECK does; maybe we would
> like to change both at once.
>
> I ran it through CI, and the pg_upgrade test with a dump from 14's
> regression test database and everything worked well, but it's been a
> while since I tested the sepgsql part of it, so that might the first
> thing to explode.
>
> --
> Álvaro Herrera   48°01'N 7°57'E  —
> https://www.EnterpriseDB.com/
>


Re: Synchronizing slots from primary to standby

2024-01-25 Thread shveta malik
On Thu, Jan 25, 2024 at 10:39 AM Peter Smith  wrote:

> 2. synchronize_one_slot
>
> + /*
> + * Sanity check: Make sure that concerned WAL is received and flushed
> + * before syncing slot to target lsn received from the primary server.
> + *
> + * This check should never pass as on the primary server, we have waited
> + * for the standby's confirmation before updating the logical slot.
> + */
> + latestFlushPtr = GetStandbyFlushRecPtr(NULL);
> + if (remote_slot->confirmed_lsn > latestFlushPtr)
> + {
> + ereport(LOG,
> + errmsg("skipping slot synchronization as the received slot sync"
> +" LSN %X/%X for slot \"%s\" is ahead of the standby position %X/%X",
> +LSN_FORMAT_ARGS(remote_slot->confirmed_lsn),
> +remote_slot->name,
> +LSN_FORMAT_ARGS(latestFlushPtr)));
> +
> + return false;
> + }
>
> Previously in v65 this was an elog, but now it is an ereport. But
> since this is a sanity check condition that "should never pass" wasn't
> the elog the more appropriate choice?

We realized that this scenario can be frequently hit when the user has
not set standby_slot_names on the primary, and thus ereport makes more
sense. But I agree that the comment is misleading. We will adjust it in
the next version.

thanks
Shveta




Re: Current Connection Information

2024-01-25 Thread Aleksander Alekseev
Hi,

> It would be viable and appropriate to implement a unified function that 
> provides important information about the current connection?
>
> Just an example: "Current Connection Informations".
>
> I implemented it in PL/pgSQL to demonstrate the idea, see on GitHub:
> https://github.com/maiquelgrassi/DBA-toolkit/blob/main/cluster/dba_whoami_function.sql

I believe one would typically do something like this:

```
select * from pg_stat_activity where pid = pg_backend_pid();
```

On top of that psql can be configured to display useful information, e.g.:

```
$ cat ~/.psqlrc
\timing on
select (case when pg_is_in_recovery() then 'replica' else 'master'
end) as master_or_replica
\gset
\set PROMPT1 '%p (%:master_or_replica:) =# '
```

Personally I somewhat doubt that there is a one-size-fits-all
equivalent of `whoami` for Postgres. E.g. one person would like to see
a list of extensions available in the current database while for
another this is redundant information.

Even if we do this I don't think this should be a PL/pgSQL function
but rather a \whoami command for psql. This solution however will
leave users of DataGrip and similar products unhappy.

-- 
Best regards,
Aleksander Alekseev




Re: Synchronizing slots from primary to standby

2024-01-25 Thread Bertrand Drouvot
Hi,

On Thu, Jan 25, 2024 at 03:54:45PM +0530, Amit Kapila wrote:
> On Thu, Jan 25, 2024 at 1:25 PM Bertrand Drouvot
>  wrote:
> >
> > On Thu, Jan 25, 2024 at 02:57:30AM +, Zhijie Hou (Fujitsu) wrote:
> > >
> > > Agreed. I split the original 0001 patch into 3 patches as suggested.
> > > Here is the V68 patch set.
> 
> Thanks, I have pushed 0001.
> 
> >
> > Thanks!
> >
> > Some comments.
> >
> > Looking at 0002:
> >
> > 1 ===
> >
> > +  The following options are supported:
> >
> > What about "The following option is supported"? (as currently only the 
> > "FAILOVER"
> > is)
> >
> > 2 ===
> >
> > What about adding some TAP tests too? (I can see that 
> > ALTER_REPLICATION_SLOT test
> > is added in v68-0004 but I think having some in 0002 would make sense too).
> >
> 
> The subscription tests in v68-0003 will test this functionality. The
> one advantage of adding separate tests for this is that if in the
> future we extend this replication command, it could be convenient to
> test various options. However, the same could be said about existing
> replication commands as well.

I initially did check for "START_REPLICATION" and saw that it's part of
006_logical_decoding.pl (but I did not check all the "REPLICATION" commands).

That said, it's more of a nit and I think it's fine having the test in v68-0004
(as it is currently done) plus the ones in v68-0003.

> But is it worth having extra tests which
> will be anyway covered in the next commit in a few days?
> 
> I understand that it is a good idea and makes one comfortable to have
> tests for each separate commit but OTOH, in the longer term it will
> just be adding more test time without achieving much benefit. I think
> we can tell explicitly in the commit message of this patch that the
> subsequent commit will cover the tests for this functionality

Yeah, I think that's enough (at least someone reading the commit message and
the diff changes without following this dedicated thread closely would know
the lack of tests is not an oversight).

> One minor comment on 0002:
> +  so that logical replication can be resumed after failover.
> + 
> 
> Can we move this and similar comments or doc changes to the later 0004
> patch where we are syncing the slots?

Sure.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: remaining sql/json patches

2024-01-25 Thread Amit Langote
On Thu, Jan 25, 2024 at 6:09 PM Amit Langote  wrote:
> On Wed, Jan 24, 2024 at 10:11 PM Amit Langote  wrote:
> > I still need to take a look at your other report regarding typmod but
> > I'm out of energy today.
>
> The attached updated patch should address one of the concerns --
> JSON_QUERY() should now work appropriately with RETURNING type with
> typmod whether or not OMIT QUOTES is specified.
>
> But I wasn't able to address the problems with RETURNING
> record_type_with_typmod, that is, the following example you shared
> upthread:
>
> create domain char3_domain_not_null as char(3) NOT NULL;
> create domain hello as text not null check (value = 'hello');
> create domain int42 as int check (value = 42);
> create type comp_domain_with_typmod AS (a char3_domain_not_null, b int42);
> select json_value(jsonb'{"rec": "(abcd,42)"}', '$.rec' returning
> comp_domain_with_typmod);
>  json_value
> 
>
> (1 row)
>
> select json_value(jsonb'{"rec": "(abcd,42)"}', '$.rec' returning
> comp_domain_with_typmod error on error);
> ERROR:  value too long for type character(3)
>
> select json_value(jsonb'{"rec": "abcd"}', '$.rec' returning
> char3_domain_not_null error on error);
>  json_value
> 
>  abc
> (1 row)
>
> The problem with returning comp_domain_with_typmod from json_value()
> seems to be that it's using a text-to-record CoerceViaIO expression
> picked from JsonExpr.item_coercions, which behaves differently than
> the expression tree that the following uses:
>
> select ('abcd', 42)::comp_domain_with_typmod;
>row
> --
>  (abc,42)
> (1 row)

Oh, it hadn't occurred to me to check what trying to coerce a "string"
containing the record literal would do:

select '(''abcd'', 42)'::comp_domain_with_typmod;
ERROR:  value too long for type character(3)
LINE 1: select '(''abcd'', 42)'::comp_domain_with_typmod;

which is the same thing as what the JSON_QUERY() and JSON_VALUE() are
running into.  So, it might be fair to think that the error is not a
limitation of the SQL/JSON patch but an underlying behavior that it
has to accept as is.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: UUID v7

2024-01-25 Thread Aleksander Alekseev
Hi,

> Postgres always was a bit hackerish, allowing slightly more then is safe. 
> I.e. you can define immutable function that is not really immutable, turn off 
> autovacuum or fsync. Why bother with safety guards here?
> My opinion is that we should have this function to extract timestamp. Even if 
> it can return strange values for imprecise RFC implementation.

Completely agree.

Users who don't like or don't need it can pretend there are no
uuid_extract_time() and uuidv7(T) in Postgres. If we don't provide
them, however, users who need them will end up writing their own,
probably buggy and incompatible, implementations. That would be much
worse.

> So +1 for erroring when you provide a timestamp outside of that range
> (either too far in the past or too far in the future).
>
> OK, it seems like we have some consensus on ERRORing..
>
> Do we have any other open items? Does v13 address all open items? Maybe let’s 
> compose better error message?
>
> +1 for erroring when ts is outside range.
>
> v13 looks good for me. I think we have reached a optimal compromise.

Andrey, many thanks for the updated patch.

LGTM, cfbot is happy and I don't think we have any open items left. So
changing CF entry status back to RfC.

-- 
Best regards,
Aleksander Alekseev




Re: Add bump memory context type and use it for tuplesorts

2024-01-25 Thread Matthias van de Meent
On Mon, 6 Nov 2023 at 19:54, Matthias van de Meent
 wrote:
>
> On Tue, 11 Jul 2023 at 01:51, David Rowley  wrote:
>>
>> On Tue, 27 Jun 2023 at 21:19, David Rowley  wrote:
>>> I've attached the bump allocator patch and also the script I used to
>>> gather the performance results in the first 2 tabs in the attached
>>> spreadsheet.
>>
>> I've attached a v2 patch which changes the BumpContext a little to
>> remove some of the fields that are not really required.  There was no
>> need for the "keeper" field as the keeper block always comes at the
>> end of the BumpContext as these are allocated in a single malloc().
>> The pointer to the "block" also isn't really needed. This is always
>> the same as the head element in the blocks dlist.

>> +++ b/src/backend/utils/mmgr/bump.c
>> [...]
>> +MemoryContext
>> +BumpContextCreate(MemoryContext parent,
>> +  const char *name,
>> +  Size minContextSize,
>> +  Size initBlockSize,
>> +  Size maxBlockSize)
>> [...]
>> +/* Determine size of initial block */
>> +allocSize = MAXALIGN(sizeof(BumpContext)) + Bump_BLOCKHDRSZ +
>> +if (minContextSize != 0)
>> +allocSize = Max(allocSize, minContextSize);
>> +else
>> +allocSize = Max(allocSize, initBlockSize);

Shouldn't this be the following, considering the meaning of "initBlockSize"?

+allocSize = MAXALIGN(sizeof(BumpContext)) + Bump_BLOCKHDRSZ +
+Bump_CHUNKHDRSZ + initBlockSize;
+if (minContextSize != 0)
+allocSize = Max(allocSize, minContextSize);

>> + * BumpFree
>> + *Unsupported.
>> [...]
>> + * BumpRealloc
>> + *Unsupported.

Rather than the error, can't we make this a no-op (potentially
optionally, or in a different memory context?)

What I mean is, I get that it is an easy validator check that the code
that uses this context doesn't accidentally leak memory through
assumptions about pfree, but this does make this memory context
unusable for more generic operations where leaking a little memory is
preferred over the overhead of other memory contexts, as
MemoryContextReset is quite cheap in the grand scheme of things.

E.g. using a bump context in btree descent could speed up queries when
we use compare operators that do allocate memory (e.g. numeric, text),
because btree operators must not leak memory and thus always have to
manually keep track of all allocations, which can be expensive.

I understand that allowing pfree/repalloc in bump contexts requires
each allocation to have a MemoryChunk prefix in overhead, but I think
it's still a valid use case to have a very low overhead allocator with
no-op deallocator (except context reset). Do you have performance
comparison results between with and without the overhead of
MemoryChunk?



Kind regards,

Matthias van de Meent
Neon (https://neon.tech)




Re: UUID v7

2024-01-25 Thread Aleksander Alekseev
Hi,

> Andrey, many thanks for the updated patch.
>
> LGTM, cfbot is happy and I don't think we have any open items left. So
> changing CF entry status back to RfC.

PFA v14. I changed:

```
elog(ERROR, "Time argument of UUID v7 cannot exceed 6 bytes");
```

... to:

```
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("Time argument of UUID v7 is outside of the valid range")));
```

Which IMO tells a bit more to the average user and is translatable.

> At a quick glance, the patch needs improving English, IMO.

Agree. We could use some help from a native English speaker for this.

-- 
Best regards,
Aleksander Alekseev


v14-0001-Implement-UUID-v7.patch
Description: Binary data


Re: Use of backup_label not noted in log

2024-01-25 Thread David Steele

On 1/25/24 04:12, Michael Paquier wrote:

On Mon, Jan 22, 2024 at 04:36:27PM +0900, Michael Paquier wrote:

+   if (ControlFile->backupStartPoint != InvalidXLogRecPtr)

Nit 1: I would use XLogRecPtrIsInvalid here.

+   ereport(LOG,
+   (errmsg("completed backup recovery with redo LSN %X/%X",
+   LSN_FORMAT_ARGS(oldBackupStartPoint;

Nit 2: How about adding backupEndPoint in this LOG?  That would give:
"completed backup recovery with redo LSN %X/%X and end LSN %X/%X".


Hearing nothing, I've just applied a version of the patch with these
two modifications on HEAD.  If this needs tweaks, just let me know.


I had planned to update the patch this morning -- so thanks for doing 
that. I think having the end point in the message makes perfect sense.


I would still advocate for a back patch here. It is frustrating to get 
logs from users that just say:


LOG:  invalid checkpoint record
PANIC:  could not locate a valid checkpoint record

It would be very helpful to know what the checkpoint record LSN was in 
this case.


Regards,
-David




Re: Remove unused fields in ReorderBufferTupleBuf

2024-01-25 Thread Aleksander Alekseev
Hi,

> > Here is the corrected patch.
>
> Thank you for updating the patch! I have some comments:

Thanks for the review.

> -tuple = (ReorderBufferTupleBuf *)
> +tuple = (HeapTuple)
>  MemoryContextAlloc(rb->tup_context,
> -
> sizeof(ReorderBufferTupleBuf) +
> +   HEAPTUPLESIZE +
> MAXIMUM_ALIGNOF +
> alloc_len);
> -tuple->alloc_tuple_size = alloc_len;
> -tuple->tuple.t_data = ReorderBufferTupleBufData(tuple);
> +tuple->t_data = (HeapTupleHeader)((char *)tuple + HEAPTUPLESIZE);
>
> Why do we need to add MAXIMUM_ALIGNOF? since HEAPTUPLESIZE is the
> MAXALIGN'd size, it seems we don't need it. heap_form_tuple() does a
> similar thing but it doesn't add it.

Indeed. I gave it a try and nothing crashed, so it appears that
MAXIMUM_ALIGNOF is not needed.
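For the archives, the allocation in the updated patch therefore boils down to
roughly this (sketch):

tuple = (HeapTuple) MemoryContextAlloc(rb->tup_context,
                                       HEAPTUPLESIZE + alloc_len);
tuple->t_data = (HeapTupleHeader) ((char *) tuple + HEAPTUPLESIZE);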

> ---
>  xl_parameter_change *xlrec =
> -(xl_parameter_change *)
> XLogRecGetData(buf->record);
> +(xl_parameter_change *)
> XLogRecGetData(buf->record);
>
> Unnecessary change.

That's pgindent. Fixed.

> ---
> -/*
> - * Free a ReorderBufferTupleBuf.
> - */
> -void
> -ReorderBufferReturnTupleBuf(ReorderBuffer *rb, ReorderBufferTupleBuf *tuple)
> -{
> -pfree(tuple);
> -}
> -
>
> Why does ReorderBufferReturnTupleBuf need to be moved from
> reorderbuffer.c to reorderbuffer.h? It seems not related to this
> refactoring patch so I think we should do it in a separate patch if we
> really want it. I'm not sure it's necessary, though.

OK, fixed.

-- 
Best regards,
Aleksander Alekseev


v6-0001-Remove-ReorderBufferTupleBuf-structure.patch
Description: Binary data


Re: Use of backup_label not noted in log

2024-01-25 Thread Michael Banck
Hi,

On Thu, Jan 25, 2024 at 08:56:52AM -0400, David Steele wrote:
> I would still advocate for a back patch here. It is frustrating to get logs
> from users that just say:
> 
> LOG:  invalid checkpoint record
> PANIC:  could not locate a valid checkpoint record
> 
> It would be very helpful to know what the checkpoint record LSN was in this
> case.

I agree.


Michael




Re: [PATCH] Add native windows on arm64 support

2024-01-25 Thread Dave Cramer
On Wed, 24 Jan 2024 at 19:03, Michael Paquier  wrote:

> On Wed, Jan 24, 2024 at 06:45:21AM -0500, Dave Cramer wrote:
> > I managed to get it to build the vcvarsall arch needs to be x64. I need
> to
> > add some options, but the patch above needs to be applied to build it.
>
> Nice.  If I may ask, what kind of host and/or configuration have you
> used to reach a state where the code can be compiled and run tests
> with meson?  If you have found specific steps, it may be a good thing
> to document that on the wiki, say around [1].
>

I am running a mac-mini M2 with parallels running windows pro 11
The only thing required is this patch. Running in a native developer
prompt

meson setup build --prefix=c:\postgres
cd build
ninja
ninja install
ninja test

all run without errors
I also have buildfarm running and will run that today

Dave

>
> Perhaps you have not included TAP?  It may be fine in terms of runtime
> checks and coverage.
>

Does TAP have to be explicitly added to the meson build ?

Dave


Re: Unnecessary smgropen in {heapam_relation,index}_copy_data?

2024-01-25 Thread Aleksander Alekseev
Hi,

> I find heapam_relation_copy_data() and index_copy_data() have the following 
> code:
>
> dstrel = smgropen(*newrlocator, rel->rd_backend);
>
> ...
>
> RelationCreateStorage(*newrlocator, rel->rd_rel->relpersistence, 
> true);
>
> The smgropen() is also called by RelationCreateStorage(), why should we call
> smgropen() explicitly here?
>
> I try to remove the smgropen(), and all tests passed.

That's a very good question. Note that the second argument of
smgropen() used to create dstrel changes after applying your patch.
I'm not 100% sure whether this is significant or not.

I added your patch to the nearest open commitfest so that we will not lose it:

https://commitfest.postgresql.org/47/4794/


-- 
Best regards,
Aleksander Alekseev




Re: [PATCH] Add native windows on arm64 support

2024-01-25 Thread Dave Cramer
On Thu, 25 Jan 2024 at 08:31, Dave Cramer  wrote:

>
>
> On Wed, 24 Jan 2024 at 19:03, Michael Paquier  wrote:
>
>> On Wed, Jan 24, 2024 at 06:45:21AM -0500, Dave Cramer wrote:
>> > I managed to get it to build the vcvarsall arch needs to be x64. I need
>> to
>> > add some options, but the patch above needs to be applied to build it.
>>
>> Nice.  If I may ask, what kind of host and/or configuration have you
>> used to reach a state where the code can be compiled and run tests
>> with meson?  If you have found specific steps, it may be a good thing
>> to document that on the wiki, say around [1].
>>
>
> I am running a mac-mini M2 with parallels running windows pro 11
> The only thing required is this patch. Running in a native developer
> prompt
>
> meson setup build --prefix=c:\postgres
> cd build
> ninja
> ninja install
> ninja test
>
> all run without errors
> I also have buildfarm running and will run that today
>
> Dave
>
>>
>> Perhaps you have not included TAP?  It may be fine in terms of runtime
>> checks and coverage.
>>
>
> Does TAP have to be explicitly added to the meson build ?
>
> Dave
>


I tried running my buildfarm using my git repo and my branch, but got the
following error:
Status Line: 492 bad branch parameter
Content:
bad branch parameter fix_arm

Web txn failed with status: 1


Re: Emit fewer vacuum records by reaping removable tuples during pruning

2024-01-25 Thread Robert Haas
On Wed, Jan 24, 2024 at 9:13 PM Melanie Plageman
 wrote:
> v12 attached has my attempt at writing better comments for this
> section of lazy_scan_heap().

+ /*
+ * If we didn't get the cleanup lock and the page is not new or empty,
+ * we can still collect LP_DEAD items in the dead_items array for
+ * later vacuuming, count live and recently dead tuples for vacuum
+ * logging, and determine if this block could later be truncated. If
+ * we encounter any xid/mxids that require advancing the
+ * relfrozenxid/relminxid, we'll have to wait for a cleanup lock and
+ * call lazy_scan_prune().
+ */

I like this comment. I would probably drop "and the page is not new or
empty" from it since that's really speaking to the previous bit of
code, but it wouldn't be the end of the world to keep it, either.

  /*
- * Prune, freeze, and count tuples.
+ * If we got a cleanup lock, we can prune and freeze tuples and
+ * defragment the page. If we didn't get a cleanup lock, we will still
+ * consider whether or not to update the FSM.
  *
- * Accumulates details of remaining LP_DEAD line pointers on page in
- * dead_items array.  This includes LP_DEAD line pointers that we
- * pruned ourselves, as well as existing LP_DEAD line pointers that
- * were pruned some time earlier.  Also considers freezing XIDs in the
- * tuple headers of remaining items with storage. It also determines
- * if truncating this block is safe.
+ * Like lazy_scan_noprune(), lazy_scan_prune() will count
+ * recently_dead_tuples and live tuples for vacuum logging, determine
+ * if the block can later be truncated, and accumulate the details of
+ * remaining LP_DEAD line pointers on the page in the dead_items
+ * array. These dead items include those pruned by lazy_scan_prune()
+ * as well we line pointers previously marked LP_DEAD.
  */

To me, the first paragraph of this one misses the mark. What I thought
we should be saying here was something like "If we don't have a
cleanup lock, the code above has already processed this page to the
extent that is possible. Otherwise, we either got the cleanup lock
initially and have not processed the page yet, or we didn't get it
initially, attempted to process it without the cleanup lock, and
decided we needed one after all. Either way, if we now have the lock,
we must prune, freeze, and count tuples."

The second paragraph seems fine.

- * Note: It's not in fact 100% certain that we really will call
- * lazy_vacuum_heap_rel() -- lazy_vacuum() might yet opt to skip index
- * vacuuming (and so must skip heap vacuuming).  This is deemed okay
- * because it only happens in emergencies, or when there is very
- * little free space anyway. (Besides, we start recording free space
- * in the FSM once index vacuuming has been abandoned.)

Here's a suggestion from me:

Note: In corner cases, it's possible to miss updating the FSM
entirely. If index vacuuming is currently enabled, we'll skip the FSM
update now. But if failsafe mode is later activated, disabling index
vacuuming, there will also be no opportunity to update the FSM later,
because we'll never revisit this page. Since updating the FSM is
desirable but not absolutely required, that's OK.

I think this expresses the same sentiment as the current comment, but
IMHO more clearly. The one part of the current comment that I don't
understand at all is the remark about "when there is very little
freespace anyway". I get that if the failsafe activates we won't come
back to the page, which is the "only happens in emergencies" part of
the existing comment. But the current phrasing makes it sound like
there is a second case where it can happen -- "when there is very
little free space anyway" -- and I don't know what that is talking
about. If it's important, we should try to make it clearer.

We could also just decide to keep this entire paragraph as it is for
purposes of the present patch. The part I really thought needed
adjusting was "Prune, freeze, and count tuples."

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Emit fewer vacuum records by reaping removable tuples during pruning

2024-01-25 Thread Melanie Plageman
On Thu, Jan 25, 2024 at 8:57 AM Robert Haas  wrote:
>
> On Wed, Jan 24, 2024 at 9:13 PM Melanie Plageman
>  wrote:
> > v12 attached has my attempt at writing better comments for this
> > section of lazy_scan_heap().
>
> + /*
> + * If we didn't get the cleanup lock and the page is not new or empty,
> + * we can still collect LP_DEAD items in the dead_items array for
> + * later vacuuming, count live and recently dead tuples for vacuum
> + * logging, and determine if this block could later be truncated. If
> + * we encounter any xid/mxids that require advancing the
> + * relfrozenxid/relminxid, we'll have to wait for a cleanup lock and
> + * call lazy_scan_prune().
> + */
>
> I like this comment. I would probably drop "and the page is not new or
> empty" from it since that's really speaking to the previous bit of
> code, but it wouldn't be the end of the world to keep it, either.

Yes, probably best to get rid of the part about new or empty.

>   /*
> - * Prune, freeze, and count tuples.
> + * If we got a cleanup lock, we can prune and freeze tuples and
> + * defragment the page. If we didn't get a cleanup lock, we will still
> + * consider whether or not to update the FSM.
>   *
> - * Accumulates details of remaining LP_DEAD line pointers on page in
> - * dead_items array.  This includes LP_DEAD line pointers that we
> - * pruned ourselves, as well as existing LP_DEAD line pointers that
> - * were pruned some time earlier.  Also considers freezing XIDs in the
> - * tuple headers of remaining items with storage. It also determines
> - * if truncating this block is safe.
> + * Like lazy_scan_noprune(), lazy_scan_prune() will count
> + * recently_dead_tuples and live tuples for vacuum logging, determine
> + * if the block can later be truncated, and accumulate the details of
> + * remaining LP_DEAD line pointers on the page in the dead_items
> + * array. These dead items include those pruned by lazy_scan_prune()
> + * as well we line pointers previously marked LP_DEAD.
>   */
>
> To me, the first paragraph of this one misses the mark. What I thought
> we should be saying here was something like "If we don't have a
> cleanup lock, the code above has already processed this page to the
> extent that is possible. Otherwise, we either got the cleanup lock
> initially and have not processed the page yet, or we didn't get it
> initially, attempted to process it without the cleanup lock, and
> decided we needed one after all. Either way, if we now have the lock,
> we must prune, freeze, and count tuples."

I see. Your suggestion makes sense. The sentence starting with
"Otherwise" is a bit long. I started to lose the thread at "decided we
needed one after all". You previously referred to the cleanup lock as
"it" -- once you start referring to it as "one", I as the future
developer am no longer sure we are talking about the cleanup lock (as
opposed to the page or something else).

> - * Note: It's not in fact 100% certain that we really will call
> - * lazy_vacuum_heap_rel() -- lazy_vacuum() might yet opt to skip index
> - * vacuuming (and so must skip heap vacuuming).  This is deemed okay
> - * because it only happens in emergencies, or when there is very
> - * little free space anyway. (Besides, we start recording free space
> - * in the FSM once index vacuuming has been abandoned.)
>
> Here's a suggestion from me:
>
> Note: In corner cases, it's possible to miss updating the FSM
> entirely. If index vacuuming is currently enabled, we'll skip the FSM
> update now. But if failsafe mode is later activated, disabling index
> vacuuming, there will also be no opportunity to update the FSM later,
> because we'll never revisit this page. Since updating the FSM is
> desirable but not absolutely required, that's OK.
>
> I think this expresses the same sentiment as the current comment, but
> IMHO more clearly. The one part of the current comment that I don't
> understand at all is the remark about "when there is very little
> freespace anyway". I get that if the failsafe activates we won't come
> back to the page, which is the "only happens in emergencies" part of
> the existing comment. But the current phrasing makes it sound like
> there is a second case where it can happen -- "when there is very
> little free space anyway" -- and I don't know what that is talking
> about. If it's important, we should try to make it clearer.
>
> We could also just decide to keep this entire paragraph as it is for
> purposes of the present patch. The part I really thought needed
> adjusting was "Prune, freeze, and count tuples."

I think it would be nice to clarify this comment. I think the "when
there is little free space anyway" is referring to the case in
lazy_vacuum() where we set do_index_vacuuming to false because "there
are almost zero TIDs". I initially thought it was saying that in the
failsafe vacuum case the pages whose free space we wouldn't record in
the FSM have little free space anyway -- which I didn't get. But then
I looked at where we set do_index_vacuuming to false.

Re: remaining sql/json patches

2024-01-25 Thread jian he
On Thu, Jan 25, 2024 at 7:54 PM Amit Langote  wrote:
>
> >
> > The problem with returning comp_domain_with_typmod from json_value()
> > seems to be that it's using a text-to-record CoerceViaIO expression
> > picked from JsonExpr.item_coercions, which behaves differently than
> > the expression tree that the following uses:
> >
> > select ('abcd', 42)::comp_domain_with_typmod;
> >row
> > --
> >  (abc,42)
> > (1 row)
>
> Oh, it hadn't occurred to me to check what trying to coerce a "string"
> containing the record literal would do:
>
> select '(''abcd'', 42)'::comp_domain_with_typmod;
> ERROR:  value too long for type character(3)
> LINE 1: select '(''abcd'', 42)'::comp_domain_with_typmod;
>
> which is the same thing as what the JSON_QUERY() and JSON_VALUE() are
> running into.  So, it might be fair to think that the error is not a
> limitation of the SQL/JSON patch but an underlying behavior that it
> has to accept as is.
>

Hi, I have come to terms with these cases.
What bugs me now is the first query of the following 4 cases (for comparison).
SELECT JSON_QUERY(jsonb '[1,2]', '$' RETURNING char(3) omit quotes);
SELECT JSON_QUERY(jsonb '[1,2]', '$' RETURNING char(3) keep quotes);
SELECT JSON_QUERY(jsonb '[1,2]', '$' RETURNING text omit quotes);
SELECT JSON_QUERY(jsonb '[1,2]', '$' RETURNING text keep quotes);

I did some minor refactoring of the function coerceJsonFuncExprOutput.
It makes the following queries return NULL instead of an error; NULL
is what json_value returns.

SELECT JSON_QUERY(jsonb '"123"', '$' RETURNING int2);
SELECT JSON_QUERY(jsonb '"123"', '$' RETURNING int4);
SELECT JSON_QUERY(jsonb '"123"', '$' RETURNING int8);
SELECT JSON_QUERY(jsonb '"123"', '$' RETURNING bool);
SELECT JSON_QUERY(jsonb '"123"', '$' RETURNING numeric);
SELECT JSON_QUERY(jsonb '"123"', '$' RETURNING real);
SELECT JSON_QUERY(jsonb '"123"', '$' RETURNING float8);


v1-0001-minor-refactor-coerceJsonFuncExprOutput.no-cfbot
Description: Binary data


Re: A compiling warning in jsonb_populate_record_valid

2024-01-25 Thread Tom Lane
Amit Langote  writes:
> On Thu, Jan 25, 2024 at 2:59 PM Richard Guo  wrote:
>> I came across a warning when building master (a044e61f1b) on old GCC
>> (4.8.5).

> Will apply the attached, which does this:

> -   return BoolGetDatum(!SOFT_ERROR_OCCURRED(&escontext));
> +   return BoolGetDatum(!escontext.error_occurred);

-1 please.  We should not break that abstraction for the sake
of ignorable warnings on ancient compilers.

regards, tom lane




Re: cleanup patches for incremental backup

2024-01-25 Thread Robert Haas
On Wed, Jan 24, 2024 at 2:39 PM Nathan Bossart  wrote:
> That seems like a reasonable starting point.  Even if it doesn't help
> determine the root cause, it should at least help rule out concurrent
> summary removal.

Here is a patch for that.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v1-0001-Temporary-patch-to-help-debug-pg_walsummary-test-.patch
Description: Binary data


Re: A compiling warning in jsonb_populate_record_valid

2024-01-25 Thread Amit Langote
 On Thu, Jan 25, 2024 at 23:57 Tom Lane  wrote:

> Amit Langote  writes:
> > On Thu, Jan 25, 2024 at 2:59 PM Richard Guo 
> wrote:
> >> I came across a warning when building master (a044e61f1b) on old GCC
> >> (4.8.5).
>
> > Will apply the attached, which does this:
>
> > -   return BoolGetDatum(!SOFT_ERROR_OCCURRED(&escontext));
> > +   return BoolGetDatum(!escontext.error_occurred);
>
> -1 please.  We should not break that abstraction for the sake
> of ignorable warnings on ancient compilers.


Ignoring the warning was my 1st thought too, because an old discussion I
found about the warning was too old (2011).  The style I adopted in my
“fix” is used in a few other places too, so I thought I might as well
go for it.

Anyway, I’m fine with reverting.

>


Re: A compiling warning in jsonb_populate_record_valid

2024-01-25 Thread Tom Lane
Amit Langote  writes:
>  On Thu, Jan 25, 2024 at 23:57 Tom Lane  wrote:
>> -1 please.  We should not break that abstraction for the sake
>> of ignorable warnings on ancient compilers.

> Ignoring the warning was my 1st thought too, because an old discussion I
> found about the warning was too old (2011).  The style I adopted in my
> “fix” is used in a few other places too, so I thought I might as well
> go for it.

Oh, well maybe I'm missing some context.  What comparable places did
you see?

regards, tom lane




Re: Emit fewer vacuum records by reaping removable tuples during pruning

2024-01-25 Thread Robert Haas
On Thu, Jan 25, 2024 at 9:18 AM Melanie Plageman
 wrote:
> > To me, the first paragraph of this one misses the mark. What I thought
> > we should be saying here was something like "If we don't have a
> > cleanup lock, the code above has already processed this page to the
> > extent that is possible. Otherwise, we either got the cleanup lock
> > initially and have not processed the page yet, or we didn't get it
> > initially, attempted to process it without the cleanup lock, and
> > decided we needed one after all. Either way, if we now have the lock,
> > we must prune, freeze, and count tuples."
>
> I see. Your suggestion makes sense. The sentence starting with
> "Otherwise" is a bit long. I started to lose the thread at "decided we
> needed one after all". You previously referred to the cleanup lock as
> "it" -- once you start referring to it as "one", I as the future
> developer am no longer sure we are talking about the cleanup lock (as
> opposed to the page or something else).

Ok... trying again:

If we have a cleanup lock, we must now prune, freeze, and count
tuples. We may have acquired the cleanup lock originally, or we may
have gone back and acquired it after lazy_scan_noprune() returned
false. Either way, the page hasn't been processed yet.

> I think it would be nice to clarify this comment. I think the "when
> there is little free space anyway" is referring to the case in
> lazy_vacuum() where we set do_index_vacuuming to false because "there
> are almost zero TIDs". I initially thought it was saying that in the
> failsafe vacuum case the pages whose free space we wouldn't record in
> the FSM have little free space anyway -- which I didn't get. But then
> I looked at where we set do_index_vacuuming to false.

Oh... wow. That's kind of confusing; somehow I was thinking we were
talking about free space on the disk, rather than newly free space in
pages that could be added to the FSM. And it seems really questionable
whether that case is OK. I mean, in the emergency case, fine,
whatever, we should do whatever it takes to get the system back up,
and it should barely ever happen on a well-configured system. But this
case could happen regularly, and losing track of free space could
easily cause bloat.

This might be another argument for moving FSM updates to the first
heap pass, but that's a separate task from fixing the comment.

> As for the last sentence starting with "Besides", even with Peter's
> explanation I still am not sure what it should say. There are blocks
> whose free space we don't record in the first heap pass. Then, due to
> skipping index vacuuming and the second heap pass, we also don't
> record their free space in the second heap pass. I think he is saying
> that once we set do_index_vacuuming to false, we will stop skipping
> updating the FSM after the first pass for future blocks. So, future
> blocks will have their free space recorded in the FSM. But that feels
> self-evident.

Yes, I don't think that necessarily needs to be mentioned here.

> The more salient point is that there are some blocks
> whose free space is not recorded (those whose first pass happened
> before unsetting do_index_vacuuming and whose second pass did not
> happen before do_index_vacuuming is unset). The extra sentence made me
> think there was some way we might go back and record free space for
> those blocks, but that is not true.

I don't really see why that sentence made you think that, but it's not
important. I agree with you about what point we need to emphasize
here.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Unnecessary smgropen in {heapam_relation,index}_copy_data?

2024-01-25 Thread Japin Li


On Thu, 25 Jan 2024 at 21:43, Aleksander Alekseev  
wrote:
> Hi,
>
>> I find heapam_relation_copy_data() and index_copy_data() have the following 
>> code:
>>
>> dstrel = smgropen(*newrlocator, rel->rd_backend);
>>
>> ...
>>
>> RelationCreateStorage(*newrlocator, rel->rd_rel->relpersistence, 
>> true);
>>
>> The smgropen() is also called by RelationCreateStorage(), why should we call
>> smgropen() explicitly here?
>>
>> I try to remove the smgropen(), and all tests passed.
>
> That's a very good question. Note that the second argument of
> smgropen() used to create dstrel changes after applying your patch.
> I'm not 100% sure whether this is significant or not.
>

Thanks for the review.

According to the comments on RelationData->rd_backend, it is the backend ID
if the relation is temporary.  The difference is that RelationCreateStorage()
uses relpersistence to determine the backend ID.
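So the copy paths can simply rely on the SMgrRelation returned by
RelationCreateStorage() instead, i.e. roughly (sketch, not the literal patch
text):

dstrel = RelationCreateStorage(*newrlocator, rel->rd_rel->relpersistence, true);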

> I added your patch to the nearest open commitfest so that we will not lose it:
>
> https://commitfest.postgresql.org/47/4794/

Thank you.




Re: A compiling warning in jsonb_populate_record_valid

2024-01-25 Thread Amit Langote
On Fri, Jan 26, 2024 at 0:15 Tom Lane  wrote:

> Amit Langote  writes:
> >  On Thu, Jan 25, 2024 at 23:57 Tom Lane  wrote:
> >> -1 please.  We should not break that abstraction for the sake
> >> of ignorable warnings on ancient compilers.
>
> > Ignoring the warning was my 1st thought too, because an old discussion I
> > found about the warning was too old (2011).  The style I adopted in my
> > “fix” is used in a few other places too, so I thought I might as well
> > go for it.
>
> Oh, well maybe I'm missing some context.  What comparable places did
> you see?


Sorry, not on my computer right now so can’t paste any code, but I was able
to find a couple of functions (using Google ;) that refer to error_occurred
directly for one reason or another:

process_integer_literal(),
xml_is_document()


Re: More new SQL/JSON item methods

2024-01-25 Thread Andrew Dunstan


On 2024-01-18 Th 09:25, Jeevan Chalke wrote:



On Thu, Jan 18, 2024 at 1:03 AM Peter Eisentraut 
 wrote:


On 17.01.24 10:03, Jeevan Chalke wrote:
> I added unary '+' and '-' support as well and thus thought of having
> separate rules altogether rather than folding those in.
>
>     Per SQL standard, the precision and scale arguments are unsigned
>     integers, so unary plus and minus signs are not supported.
>     So my patch removes that support, but I didn't adjust the
>     regression tests for that.
>
>
> However, PostgreSQL numeric casting does support a negative scale.
> Here is an example:
>
> # select '12345'::numeric(4,-2);
>   numeric
> -
>     12300
> (1 row)
>
> And thus thought of supporting those.
> Do we want this JSON item method to behave differently here?

Ok, it would make sense to support this in SQL/JSON as well.


OK. So with this, we don't need changes done in your 0001 patches.


> I will merge them all into one and will try to keep them in the order
> specified in sql_features.txt.
> However, for documentation, it makes more sense to keep them in logical
> order than the alphabetical one. What are your views on this?

The documentation can be in a different order.


Thanks, Andrew and Peter for the confirmation.

Attached merged single patch along these lines.



Thanks, I have pushed this.


cheers


andrew


--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: A compiling warning in jsonb_populate_record_valid

2024-01-25 Thread Tom Lane
Amit Langote  writes:
> On Fri, Jan 26, 2024 at 0:15 Tom Lane  wrote:
>> Amit Langote  writes:
>>> Ignoring the warning was my 1st thought too, because an old discussion I
>>> found about the warning was too old (2011).  The style I adopted in my
>>> “fix” is used in a few other places too, so I thought I might as well
>>> go for it.

>> Oh, well maybe I'm missing some context.  What comparable places did
>> you see?

> Sorry, not on my computer right now so can’t paste any code, but I was able
> to find a couple of functions (using Google ;) that refer to error_occurred
> directly for one reason or another:

OK, looking around, it seems like our pattern is that direct access
to escontext.error_occurred is okay in the same function that sets up
the escontext (so that there's no possibility of a NULL pointer).
So this change is fine and I withdraw my objection.  Sorry for
the noise.
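For anyone reading along, the pattern is roughly this (sketch; the called
function is only a placeholder):

ErrorSaveContext escontext = {T_ErrorSaveContext};

(void) do_conversion_safe(value, (Node *) &escontext);  /* placeholder call */

/* direct field access is fine: escontext is a local, so never NULL */
if (escontext.error_occurred)
    return false;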

regards, tom lane




Re: Patch: Improve Boolean Predicate JSON Path Docs

2024-01-25 Thread Tom Lane
"David E. Wheeler"  writes:
> On Jan 24, 2024, at 16:32, Tom Lane  wrote:
>> +  
>> +   Predicate check expressions are required in the
>> +   @@ operator (and the
>> +   jsonb_path_match function), and should not be 
>> used
>> +   with the @? operator (or the
>> +   jsonb_path_exists function).
>> +  
>> + 
>> +

> I had this bit here:

>   
>Conversely, non-predicate jsonpath expressions should not 
> be
>used with the @@ operator (or the
>jsonb_path_match function).
>   

I changed the preceding para to say "... check expressions are
required in ...", which I thought was sufficient to cover that.
Also, the tabular description of the operator tells you not to do it.

> What do you think of also dropping the article from all the references to 
> “the strict mode” or “the lax mode”, to make them “strict mode” and “lax 
> mode”, respectively?

Certainly most of 'em don't need it.  I'll make it so.

regards, tom lane




Re: make dist using git archive

2024-01-25 Thread Peter Eisentraut

On 22.01.24 21:04, Tristan Partin wrote:

+git = find_program('git', required: false, native: true, disabler: true)
+bzip2 = find_program('bzip2', required: false, native: true, 
disabler: true)


This doesn't need to be a disabler. git is fine as-is. See later 
comment. Disablers only work like you are expecting when they are used 
like how git is used. Once you call a method like .path(), all bets are 
off.


ok, fixed


+distdir = meson.project_name() + '-' + meson.project_version()
+
+check_dirty_index = run_target('check-dirty-index',
+   command: [git, 'diff-index', 
'--quiet', 'HEAD'])


Seems like you might want to add -C here too?


done


+tar_bz2 = custom_target('tar.bz2',
+  build_always_stale: true,
+  command: [git, '-C', '@SOURCE_ROOT@', '-c', 'tar.tar.bz2.command=' 
+ bzip2.path() + ' -c', 'archive',

+    '--format', 'tar.bz2',
+    '--prefix', distdir + '/',


-    '-o', '@BUILD_ROOT@/@OUTPUT@',
+    '-o', join_paths(meson.build_root(), '@OUTPUT@'),

This will generate the tarballs in the build directory. Do the same for 
the previous target. Tested locally.


Fixed, thanks.  I had struggled with this one.


+    'HEAD', '.'],
+  install: false,
+  output: distdir + '.tar.bz2',
+)


The bz2 target should be wrapped in an `if bzip2.found()`.


Well, I think we want the dist step to fail if bzip2 is not there.  At 
least that is the current expectation.



+alias_target('pgdist', [check_dirty_index, tar_gz, tar_bz2])


Are you intending for the check_dirty_index target to prohibit the other 
two targets from running? Currently that is not the case.


Yes, that was the hope, and that's how the make dist variant works.  But 
I couldn't figure this out with meson.  Also, the above actually also 
doesn't work with older meson versions, so I had to comment this out to 
get CI to work.


If it is what 
you intend, use a stamp file or something to indicate a relationship. 
Alternatively, inline the git diff-index into the other commands. These 
might also do better as external scripts. It would reduce duplication 
between the autotools and Meson builds.


Yeah, maybe that's a direction.

The updated patch also supports vpath builds with make now.

I have also added a CI patch, for amusement.  Maybe we'll want to keep 
it, though.

From 13612f9a1c486e8acfe4156a7bc069a66fd30e77 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 25 Jan 2024 12:28:28 +0100
Subject: [PATCH v1 1/2] make dist uses git archive

Discussion: 
https://www.postgresql.org/message-id/flat/40e80f77-a294-4f29-a16f-e21bc7bc75fc%40eisentraut.org
---
 GNUmakefile.in | 34 --
 meson.build| 41 +
 2 files changed, 53 insertions(+), 22 deletions(-)

diff --git a/GNUmakefile.in b/GNUmakefile.in
index eba569e930e..680c765dd73 100644
--- a/GNUmakefile.in
+++ b/GNUmakefile.in
@@ -87,29 +87,19 @@ update-unicode: | submake-generated-headers 
submake-libpgport
 distdir= postgresql-$(VERSION)
 dummy  = =install=
 
+GIT = git
+
 dist: $(distdir).tar.gz $(distdir).tar.bz2
-   rm -rf $(distdir)
-
-$(distdir).tar: distdir
-   $(TAR) chf $@ $(distdir)
-
-.INTERMEDIATE: $(distdir).tar
-
-distdir-location:
-   @echo $(distdir)
-
-distdir:
-   rm -rf $(distdir)* $(dummy)
-   for x in `cd $(top_srcdir) && find . \( -name CVS -prune \) -o \( -name 
.git -prune \) -o -print`; do \
- file=`expr X$$x : 'X\./\(.*\)'`; \
- if test -d "$(top_srcdir)/$$file" ; then \
-   mkdir "$(distdir)/$$file" && chmod 777 "$(distdir)/$$file"; \
- else \
-   ln "$(top_srcdir)/$$file" "$(distdir)/$$file" >/dev/null 2>&1 \
- || cp "$(top_srcdir)/$$file" "$(distdir)/$$file"; \
- fi || exit; \
-   done
-   $(MAKE) -C $(distdir) distclean
+
+.PHONY: check-dirty-index
+check-dirty-index:
+   $(GIT) -C $(srcdir) diff-index --quiet HEAD
+
+$(distdir).tar.gz: check-dirty-index
+   $(GIT) -C $(srcdir) archive --format tar.gz --prefix $(distdir)/ HEAD 
-o $(abs_top_builddir)/$@
+
+$(distdir).tar.bz2: check-dirty-index
+   $(GIT) -C $(srcdir) -c tar.tar.bz2.command='$(BZIP2) -c' archive 
--format tar.bz2 --prefix $(distdir)/ HEAD -o $(abs_top_builddir)/$@
 
 distcheck: dist
rm -rf $(dummy)
diff --git a/meson.build b/meson.build
index 55184db2488..43884e7cfdd 100644
--- a/meson.build
+++ b/meson.build
@@ -3348,6 +3348,47 @@ run_target('help',
 
 
 
+###
+# Distribution archive
+###
+
+git = find_program('git', required: false, native: true)
+bzip2 = find_program('bzip2', required: false, native: true)
+
+distdir = meson.project_name() + '-' + meson.project_version()
+
+check_dirty_index = run_target('check-dirty-index',
+   command: [git, '-C', '@SOURCE_ROOT@',
+  

Re: cleanup patches for incremental backup

2024-01-25 Thread Nathan Bossart
On Thu, Jan 25, 2024 at 10:06:41AM -0500, Robert Haas wrote:
> On Wed, Jan 24, 2024 at 2:39 PM Nathan Bossart  
> wrote:
>> That seems like a reasonable starting point.  Even if it doesn't help
>> determine the root cause, it should at least help rule out concurrent
>> summary removal.
> 
> Here is a patch for that.

LGTM.  The only thing I might add is the cutoff_time in that LOG.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Emit fewer vacuum records by reaping removable tuples during pruning

2024-01-25 Thread Melanie Plageman
On Thu, Jan 25, 2024 at 10:19 AM Robert Haas  wrote:
>
> On Thu, Jan 25, 2024 at 9:18 AM Melanie Plageman
>  wrote:
> > > To me, the first paragraph of this one misses the mark. What I thought
> > > we should be saying here was something like "If we don't have a
> > > cleanup lock, the code above has already processed this page to the
> > > extent that is possible. Otherwise, we either got the cleanup lock
> > > initially and have not processed the page yet, or we didn't get it
> > > initially, attempted to process it without the cleanup lock, and
> > > decided we needed one after all. Either way, if we now have the lock,
> > > we must prune, freeze, and count tuples."
> >
> > I see. Your suggestion makes sense. The sentence starting with
> > "Otherwise" is a bit long. I started to lose the thread at "decided we
> > needed one after all". You previously referred to the cleanup lock as
> > "it" -- once you start referring to it as "one", I as the future
> > developer am no longer sure we are talking about the cleanup lock (as
> > opposed to the page or something else).
>
> Ok... trying again:
>
> If we have a cleanup lock, we must now prune, freeze, and count
> tuples. We may have acquired the cleanup lock originally, or we may
> have gone back and acquired it after lazy_scan_noprune() returned
> false. Either way, the page hasn't been processed yet.

Cool. I might add "successfully" or "fully" to "Either way, the page
hasn't been processed yet"

> > I think it would be nice to clarify this comment. I think the "when
> > there is little free space anyway" is referring to the case in
> > lazy_vacuum() where we set do_index_vacuuming to false because "there
> > are almost zero TIDs". I initially thought it was saying that in the
> > failsafe vacuum case the pages whose free space we wouldn't record in
> > the FSM have little free space anyway -- which I didn't get. But then
> > I looked at where we set do_index_vacuuming to false.
>
> Oh... wow. That's kind of confusing; somehow I was thinking we were
> talking about free space on the disk, rather than newly free space in
> pages that could be added to the FSM.

Perhaps I misunderstood. I interpreted it to refer to the bypass optimization.

> And it seems really questionable
> whether that case is OK. I mean, in the emergency case, fine,
> whatever, we should do whatever it takes to get the system back up,
> and it should barely ever happen on a well-configured system. But this
> case could happen regularly, and losing track of free space could
> easily cause bloat.
>
> This might be another argument for moving FSM updates to the first
> heap pass, but that's a separate task from fixing the comment.

Yes, it seems we could miss recording space freed in the first pass if
we never end up doing a second pass. consider_bypass_optimization is
set to false only if index cleanup is explicitly enabled or there are
dead items accumulated for vacuum's second pass at some point.

- Melanie




Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

2024-01-25 Thread Alvaro Herrera
Here's a touched-up version of this patch.

First, PROC_GLOBAL->clogGroupFirst and SlruCtl->latest_page_number
change from being protected by locks to being atomics, but there's no
mention of considering memory barriers that they should have.  Looking
at the code, I think the former doesn't need any additional barriers,
but latest_page_number is missing some, which I have added.  This
deserves another careful look.
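What I have in mind is essentially pairing the store with a write barrier and
the load with a read barrier, along these lines (simplified sketch with
approximate field names, not the literal patch hunks):

/* writer: make the new page's state visible before advertising it */
pg_write_barrier();
pg_atomic_write_u64(&shared->latest_page_number, pageno);

/* reader: load the value, then synchronize before examining other state */
pageno = pg_atomic_read_u64(&shared->latest_page_number);
pg_read_barrier();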

Second and most user visibly, the GUC names seem to have been chosen
based on the source-code variables, which were never meant to be user-
visible.  So I renamed a few:

xact_buffers -> transaction_buffers
subtrans_buffers -> subtransaction_buffers
serial_buffers -> serializable_buffers
commit_ts_buffers -> commit_timestamp_buffers

(unchanged: multixact_offsets_buffers, multixact_members_buffers,
notify_buffers)

I did this explicitly trying to avoid using the term SLRU in a
user-visible manner, because what do users care?  But immediately after
doing this I realized that we already have pg_stat_slru, so maybe the
cat is already out of the bag, and so perhaps we should name these GUCs
as, say, slru_transaction_buffers?  That may make the connection between
these things a little more explicit.  (I do think we need to add
cross-links in the documentation of those GUCs to the pg_stat_slru
docs and vice-versa.)


Another thing that bothered me a bit is that we have auto-tuning for
transaction_buffers and commit_timestamp_buffers, but not for
subtransaction_buffers.  (Autotuning means you set the GUC to 0 and it
scales with shared_buffers).  I don't quite understand what's the reason
for the omission, so I added it for subtrans too.  I think it may make
sense to do likewise for the multixact ones too, not sure.  It doesn't
seem worth having that for pg_serial and notify.

While messing about with these GUCs I realized that the usage of the
show_hook to print what the current number is, when autotuning is used,
was bogus: SHOW would print the number of blocks for (say)
transaction_buffers, but if you asked it to print (say)
multixact_offsets_buffers, it would give a size in MB or kB.  I'm sure
such an inconsistency would bite us.  So, digging around I found that a
good way to handle this is to remove the show_hook, and instead call
SetConfigOption() at the time when the ShmemInit function is called,
with the correct number of buffers determined.  This is pretty much what
is used for XLOGbuffers, and it works correctly as far as my testing
shows.
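Concretely, the ShmemInit functions now do essentially the same dance xlog.c
does for wal_buffers, something like (sketch; variable names made up):

char        buf[32];

snprintf(buf, sizeof(buf), "%d", nbuffers);
SetConfigOption("transaction_buffers", buf,
                PGC_POSTMASTER, PGC_S_DYNAMIC_DEFAULT);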

Still with these auto-tuning GUCs, I noticed that the auto-tuning code
would continue to grow the buffer sizes with shared_buffers to
arbitrarily large values.  I added an arbitrary maximum of 1024 (8 MB),
which is much higher than the current value of 128; but if you have
(say) 30 GB of shared_buffers (not uncommon these days), do you really
need 30MB of pg_clog cache?  It seems mostly unnecessary ... and you can
still set it manually that way if you need it.  So, largely I just
rewrote those small functions completely.
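To put rough numbers on the resulting formula (shared_buffers/512, clamped
between 16 and 1024 blocks): 256MB of shared_buffers is 32768 blocks, so the
auto-tuned SLRUs get 64 blocks (512kB) each, and anything from 4GB of
shared_buffers upwards hits the 1024-block cap, i.e. 8MB.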

I also made the SGML documentation and postgresql.conf.sample all match
what the code actually does.  The whole thing wasn't particularly
consistent.

I rewrote a bunch of code comments and moved stuff around to appear in
alphabetical order, etc.

More comment rewriting still pending.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 61038472c5..3e3119865a 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2006,6 +2006,145 @@ include_dir 'conf.d'
   
  
 
+ 
+  commit_timestamp_buffers (integer)
+  
+   commit_timestamp_buffers configuration parameter
+  
+  
+  
+   
+Specifies the amount of memory to use to cache the contents of
+pg_commit_ts (see
+).
+If this value is specified without units, it is taken as blocks,
+that is BLCKSZ bytes, typically 8kB.
+The default value is 0, which requests
+shared_buffers/512 up to 1024 blocks,
+but not fewer than 16 blocks.
+This parameter can only be set at server start.
+   
+  
+ 
+
+ 
+  multixact_members_buffers (integer)
+  
+   multixact_members_buffers configuration parameter
+  
+  
+  
+   
+Specifies the amount of shared memory to use to cache the contents
+of pg_multixact/members (see
+).
+If this value is specified without units, it is taken as blocks,
+that is BLCKSZ bytes, typically 8kB.
+The default value is 32.
+This parameter can only be set at server start.
+   
+  
+ 
+
+ 
+  multixact_offsets_buffers (integer)
+  
+   multixact_offsets_buffers configuration parameter
+  
+  
+  
+   
+Specifies the amount of shared memory to use to cache the contents
+of pg_multixact/offsets (see
+   

Re: make dist using git archive

2024-01-25 Thread Tristan Partin

On Thu Jan 25, 2024 at 10:04 AM CST, Peter Eisentraut wrote:

On 22.01.24 21:04, Tristan Partin wrote:
>> +    'HEAD', '.'],
>> +  install: false,
>> +  output: distdir + '.tar.bz2',
>> +)
> 
> The bz2 target should be wrapped in an `if bzip2.found()`.


The way that this currently works is that you will fail at configure 
time if bz2 doesn't exist on the system. Meson will try to resolve 
a .path() method on a NotFoundProgram. You might want to define the bz2 
target to just call `exit 1` in this case.


if bzip2.found()
 # do your current target
else
 bz2 = run_target('tar.bz2', command: ['exit', 1])
endif

This should cause pgdist to appropriately fail at run time when 
generating the bz2 tarball.


Well, I think we want the dist step to fail if bzip2 is not there.  At 
least that is the current expectation.


>> +alias_target('pgdist', [check_dirty_index, tar_gz, tar_bz2])
> 
> Are you intending for the check_dirty_index target to prohibit the other 
> two targets from running? Currently that is not the case.


Yes, that was the hope, and that's how the make dist variant works.  But 
I couldn't figure this out with meson.  Also, the above actually also 
doesn't work with older meson versions, so I had to comment this out to 
get CI to work.


> If it is what 
> you intend, use a stamp file or something to indicate a relationship. 
> Alternatively, inline the git diff-index into the other commands. These 
> might also do better as external scripts. It would reduce duplication 
> between the autotools and Meson builds.


Yeah, maybe that's a direction.


For what it's worth, I run Meson 1.3, and the behavior of generating the 
tarballs even though it is a dirty tree still occurred. In the new patch 
you seem to say it was fixed in 0.60.


--
Tristan Partin
Neon (https://neon.tech)




Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

2024-01-25 Thread Alvaro Herrera
On 2024-Jan-25, Alvaro Herrera wrote:

> Here's a touched-up version of this patch.

> diff --git a/src/backend/storage/lmgr/lwlock.c 
> b/src/backend/storage/lmgr/lwlock.c
> index 98fa6035cc..4a5e05d5e4 100644
> --- a/src/backend/storage/lmgr/lwlock.c
> +++ b/src/backend/storage/lmgr/lwlock.c
> @@ -163,6 +163,13 @@ static const char *const BuiltinTrancheNames[] = {
>   [LWTRANCHE_LAUNCHER_HASH] = "LogicalRepLauncherHash",
>   [LWTRANCHE_DSM_REGISTRY_DSA] = "DSMRegistryDSA",
>   [LWTRANCHE_DSM_REGISTRY_HASH] = "DSMRegistryHash",
> + [LWTRANCHE_COMMITTS_SLRU] = "CommitTSSLRU",
> + [LWTRANCHE_MULTIXACTOFFSET_SLRU] = "MultixactOffsetSLRU",
> + [LWTRANCHE_MULTIXACTMEMBER_SLRU] = "MultixactMemberSLRU",
> + [LWTRANCHE_NOTIFY_SLRU] = "NotifySLRU",
> + [LWTRANCHE_SERIAL_SLRU] = "SerialSLRU"
> + [LWTRANCHE_SUBTRANS_SLRU] = "SubtransSLRU",
> + [LWTRANCHE_XACT_SLRU] = "XactSLRU",
>  };

Eeek.  Last minute changes ...  Fixed here.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"La primera ley de las demostraciones en vivo es: no trate de usar el sistema.
Escriba un guión que no toque nada para no causar daños." (Jakob Nielsen)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 61038472c5..3e3119865a 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2006,6 +2006,145 @@ include_dir 'conf.d'
   
  
 
+ 
+  commit_timestamp_buffers (integer)
+  
+   commit_timestamp_buffers configuration parameter
+  
+  
+  
+   
+Specifies the amount of memory to use to cache the contents of
+pg_commit_ts (see
+).
+If this value is specified without units, it is taken as blocks,
+that is BLCKSZ bytes, typically 8kB.
+The default value is 0, which requests
+shared_buffers/512 up to 1024 blocks,
+but not fewer than 16 blocks.
+This parameter can only be set at server start.
+   
+  
+ 
+
+ 
+  multixact_members_buffers (integer)
+  
+   multixact_members_buffers configuration parameter
+  
+  
+  
+   
+Specifies the amount of shared memory to use to cache the contents
+of pg_multixact/members (see
+).
+If this value is specified without units, it is taken as blocks,
+that is BLCKSZ bytes, typically 8kB.
+The default value is 32.
+This parameter can only be set at server start.
+   
+  
+ 
+
+ 
+  multixact_offsets_buffers (integer)
+  
+   multixact_offsets_buffers configuration parameter
+  
+  
+  
+   
+Specifies the amount of shared memory to use to cache the contents
+of pg_multixact/offsets (see
+).
+If this value is specified without units, it is taken as blocks,
+that is BLCKSZ bytes, typically 8kB.
+The default value is 16.
+This parameter can only be set at server start.
+   
+  
+ 
+
+ 
+  notify_buffers (integer)
+  
+   notify_buffers configuration parameter
+  
+  
+  
+   
+Specifies the amount of shared memory to use to cache the contents
+of pg_notify (see
+).
+If this value is specified without units, it is taken as blocks,
+that is BLCKSZ bytes, typically 8kB.
+The default value is 16.
+This parameter can only be set at server start.
+   
+  
+ 
+
+ 
+  serializable_buffers (integer)
+  
+   serializable_buffers configuration parameter
+  
+  
+  
+   
+Specifies the amount of shared memory to use to cache the contents
+of pg_serial (see
+).
+If this value is specified without units, it is taken as blocks,
+that is BLCKSZ bytes, typically 8kB.
+The default value is 32.
+This parameter can only be set at server start.
+   
+  
+ 
+
+ 
+  subtransaction_buffers (integer)
+  
+   subtransaction_buffers configuration parameter
+  
+  
+  
+   
+Specifies the amount of shared memory to use to cache the contents
+of pg_subtrans (see
+).
+If this value is specified without units, it is taken as blocks,
+that is BLCKSZ bytes, typically 8kB.
+The default value is 0, which requests
+shared_buffers/512 up to 1024 blocks,
+but not fewer than 16 blocks.
+This parameter can only be set at server start.
+   
+  
+ 
+
+ 
+  transaction_buffers (integer)
+  
+   transaction_buffers configuration parameter
+  
+  
+  
+   
+Specifies the amount of shared memory to use to cache the contents
+of pg_xact (see
+).
+If this value is specified without units, it is taken as blocks,
+that is BLCKSZ bytes, typically 8kB.
+

Add new error_action COPY ON_ERROR "log"

2024-01-25 Thread torikoshia

Hi,

As described in 9e2d870119, COPY ON_ERROR is expected to have more 
"error_action".

(Note that option name was changed by b725b7eec)

I'd like to have a new option "log", which skips soft errors and logs 
information that should have resulted in errors to the PostgreSQL log.


I think this option has some advantages like below:

1) We can know which line of the input data was not loaded, and why.


  Example:

  =# copy t1 from stdin with (on_error log);
  Enter data to be copied followed by a newline.
  End with a backslash and a period on a line by itself, or an EOF 
signal.

  >> 1
  >> 2
  >> 3
  >> z
  >> \.
  LOG:  invalid input syntax for type integer: "z"
  NOTICE:  1 row was skipped due to data type incompatibility
  COPY 3

  =# \! tail data/log/postgresql*.log
  LOG:  22P02: invalid input syntax for type integer: "z"
  CONTEXT:  COPY t1, line 4, column i: "z"
  LOCATION:  pg_strtoint32_safe, numutils.c:620
  STATEMENT:  copy t1 from stdin with (on_error log);


2) Easier maintenance than storing error information in tables or 
proprietary log files.
For example, in case a large number of soft errors occur, some 
mechanisms are needed to prevent an infinite increase in the size of the 
destination data, but we can leave that to PostgreSQL's log rotation.



Attached a patch.
This basically comes from previous discussion[1] which did both "ignore" 
and "log" soft error.


As shown in the example above, the log output to the client does not 
contain CONTEXT, so I'm a little concerned that the client cannot see which
line of the input data had a problem without looking at the server log.



What do you think?

[1] 
https://www.postgresql.org/message-id/c0fb57b82b150953f26a5c7e340412e8%40oss.nttdata.com


--
Regards,

--
Atsushi Torikoshi
NTT DATA Group CorporationFrom 04e643facfea4b4e8dd174d22fbe5e008747a91a Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Fri, 26 Jan 2024 01:17:59 +0900
Subject: [PATCH v1] Add new error_action "log" to ON_ERROR option

Currently ON_ERROR option only has "ignore" to skip malformed data and
there are no ways to know where and why COPY skipped them.

"log" skips malformed data as well as "ignore", but it logs information that
should have resulted in errors to PostgreSQL log.


---
 doc/src/sgml/ref/copy.sgml  |  8 ++--
 src/backend/commands/copy.c |  4 +++-
 src/backend/commands/copyfrom.c | 24 
 src/include/commands/copy.h |  1 +
 src/test/regress/expected/copy2.out | 14 +-
 src/test/regress/sql/copy2.sql  |  9 +
 6 files changed, 48 insertions(+), 12 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 21a5c4a052..9662c90a8b 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -380,12 +380,16 @@ COPY { table_name [ ( 
   Specifies which 
   error_action to perform when there is malformed data in the input.
-  Currently, only stop (default) and ignore
-  values are supported.
+  Currently, only stop (default), ignore
+  and log values are supported.
   If the stop value is specified,
   COPY stops operation at the first error.
   If the ignore value is specified,
   COPY skips malformed data and continues copying data.
+  If the log value is specified,
+  COPY behaves the same as ignore, except that
+  it logs information that should have resulted in errors to PostgreSQL log at
+  INFO level.
   The option is allowed only in COPY FROM.
   Only stop value is allowed when
   using binary format.
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cc0786c6f4..812ca63350 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -415,13 +415,15 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
 		return COPY_ON_ERROR_STOP;
 
 	/*
-	 * Allow "stop", or "ignore" values.
+	 * Allow "stop", "ignore" or "log" values.
 	 */
 	sval = defGetString(def);
 	if (pg_strcasecmp(sval, "stop") == 0)
 		return COPY_ON_ERROR_STOP;
 	if (pg_strcasecmp(sval, "ignore") == 0)
 		return COPY_ON_ERROR_IGNORE;
+	if (pg_strcasecmp(sval, "log") == 0)
+		return COPY_ON_ERROR_LOG;
 
 	ereport(ERROR,
 			(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 1fe70b9133..7886bd5353 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1013,6 +1013,23 @@ CopyFrom(CopyFromState cstate)
  */
 cstate->escontext->error_occurred = false;
 
+			else if (cstate->opts.on_error == COPY_ON_ERROR_LOG)
+			{
+/* Adjust elevel so we don't jump out */
+cstate->escontext->error_data->elevel = LOG;
+
+/*
+ * Despite the name, this won't raise an error since elevel is
+ * LOG now.
+ */
+ThrowErrorData(cstate->escontext->error_data);
+
+/* Initialize escontext in preparation for next soft error */

Re: index prefetching

2024-01-25 Thread Tomas Vondra



On 1/25/24 11:45, Dilip Kumar wrote:
> On Wed, Jan 24, 2024 at 11:43 PM Tomas Vondra
>  wrote:
> 
>> On 1/22/24 07:35, Konstantin Knizhnik wrote:
>>>
>>> On 22/01/2024 1:47 am, Tomas Vondra wrote:
 h, right. Well, you're right in this case we perhaps could set just one
 of those flags, but the "purpose" of the two places is quite different.

 The "prefetch" flag is fully controlled by the prefetcher, and it's up
 to it to change it (e.g. I can easily imagine some new logic touching
 setting it to "false" for some reason).

 The "data" flag is fully controlled by the custom callbacks, so whatever
 the callback stores, will be there.

 I don't think it's worth simplifying this. In particular, I don't think
 the callback can assume it can rely on the "prefetch" flag.

>>> Why not to add "all_visible" flag to IndexPrefetchEntry ? If will not
>>> cause any extra space overhead (because of alignment), but allows to
>>> avoid dynamic memory allocation (not sure if it is critical, but nice to
>>> avoid if possible).
>>>
>>
> While reading through the first patch I got some questions, I haven't
> read it complete yet but this is what I got so far.
> 
> 1.
> +static bool
> +IndexPrefetchBlockIsSequential(IndexPrefetch *prefetch, BlockNumber block)
> +{
> + int idx;
> ...
> + if (prefetch->blockItems[idx] != (block - i))
> + return false;
> +
> + /* Don't prefetch if the block happens to be the same. */
> + if (prefetch->blockItems[idx] == block)
> + return false;
> + }
> +
> + /* not sequential, not recently prefetched */
> + return true;
> +}
> 
> The above function name is BlockIsSequential but at the end, it
> returns true if it is not sequential, seem like a problem?

Actually, I think it's the comment that's wrong - the last return is
reached only for a sequential pattern (and when the block was not
accessed recently).

> Also other 2 checks right above the end of the function are returning
> false if the block is the same or the pattern is sequential I think
> those are wrong too.
> 

Hmmm. You're right this is partially wrong. There are two checks:

/*
 * For a sequential pattern, blocks "k" step ago needs to have block
 * number by "k" smaller compared to the current block.
 */
if (prefetch->blockItems[idx] != (block - i))
return false;

/* Don't prefetch if the block happens to be the same. */
if (prefetch->blockItems[idx] == block)
return false;

The first condition is correct - we want to return "false" when the
pattern is not sequential.

But the second condition is wrong - we want to skip prefetching when the
block was already prefetched recently, so this should return true (which
is a bit misleading, as it seems to imply the pattern is sequential,
when it's not).

However, this is harmless, because we then identify this block as
recently prefetched in the "full" cache check, so we won't prefetch it
anyway. So it's harmless, although a bit more expensive.

There's another inefficiency - we stop looking for the same block once
we find the first block breaking the non-sequential pattern. Imagine a
sequence of blocks 1, 2, 3, 1, 2, 3, ... in which case we never notice
the block was recently prefetched, because we always find the break of
the sequential pattern. But again, it's harmless, thanks to the full
cache of recently prefetched blocks.
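
In other words, just swapping the order of the two checks inside the loop 
would match the intent (only a sketch, reusing the variables from the 
snippet above):

    /* Recently prefetched this very block - no point in prefetching again. */
    if (prefetch->blockItems[idx] == block)
        return true;

    /*
     * For a sequential pattern, the block "i" steps back must have a block
     * number smaller by exactly "i"; otherwise the pattern is not sequential.
     */
    if (prefetch->blockItems[idx] != (block - i))
        return false;

and then the comment on the final return needs fixing too, as discussed above.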

>  2.
>  I have noticed that the prefetch history is maintained at the backend
> level, but what if multiple backends are trying to fetch the same heap
> blocks maybe scanning the same index, so should that be in some shared
> structure?  I haven't thought much deeper about this from the
> implementation POV, but should we think about it, or it doesn't
> matter?

Yes, the cache is at the backend level - it's a known limitation, but I
see it more as a conscious tradeoff.

Firstly, while the LRU cache is at backend level, PrefetchBuffer also
checks shared buffers for each prefetch request. So with sufficiently
large shared buffers we're likely to find it there (and for direct I/O
there won't be any other place to check).

Secondly, the only other place to check is page cache, but there's no
good (sufficiently cheap) way to check that. See the preadv2/nowait
experiment earlier in this thread.

I suppose we could implement a similar LRU cache for shared memory (and
I don't think it'd be very complicated), but I did not plan to do that
in this patch unless absolutely necessary.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: UUID v7

2024-01-25 Thread Sergey Prokhorenko
Aleksander,

In this case the documentation must state that the functions 
uuid_extract_time() and uuidv7(T) are against the RFC requirements, that 
developers may use these functions with caution at their own risk, and that 
these functions are not recommended for production environments.

The function uuidv7(T) is not better than uuid_extract_time(). Careless 
developers may well pass any business date into this function: document date, 
registration date, payment date, reporting date, start date of the current 
month, data download date, and even a constant. This would be a profanation of 
UUIDv7 with very negative consequences.

Sergey Prokhorenko sergeyprokhore...@yahoo.com.au 

On Thursday, 25 January 2024 at 03:06:50 pm GMT+3, Aleksander Alekseev 
 wrote:  
 
 Hi,

> Postgres always was a bit hackerish, allowing slightly more then is safe. 
> I.e. you can define immutable function that is not really immutable, turn off 
> autovacuum or fsync. Why bother with safety guards here?
> My opinion is that we should have this function to extract timestamp. Even if 
> it can return strange values for imprecise RFC implementation.

Completely agree.

Users that don't like or don't need it can pretend there are no
uuid_extract_time() and uuidv7(T) in Postgres. If we don't provide
them however, users that need them will end up writing their own
probably buggy and not compatible implementations. That would be much
worse.

> So +1 for erroring when you provide a timestamp outside of that range
> (either too far in the past or too far in the future).
>
> OK, it seems like we have some consensus on ERRORing..
>
> Do we have any other open items? Does v13 address all open items? Maybe let’s 
> compose better error message?
>
> +1 for erroring when ts is outside range.
>
> v13 looks good for me. I think we have reached a optimal compromise.

Andrey, many thanks for the updated patch.

LGTM, cfbot is happy and I don't think we have any open items left. So
changing CF entry status back to RfC.

-- 
Best regards,
Aleksander Alekseev
  

Re: A performance issue with Memoize

2024-01-25 Thread Tom Lane
David Rowley  writes:
> I'd feel better about doing it your way if Tom could comment on if
> there was a reason he put the function calls that way around in
> 5ebaaa494.

Apologies for not having noticed this thread before.  I'm taking
a look at it now.  However, while sniffing around this I found
what seems like an oversight in paramassign.c's
assign_param_for_var(): it says it should compare all the same
fields as _equalVar except for varlevelsup, but it's failing to
compare varnullingrels.  Is that a bug?  It's conceivable that
it's not possible to get here with varnullingrels different and
all else the same, but I don't feel good about that proposition.

I tried adding

@@ -91,7 +91,10 @@ assign_param_for_var(PlannerInfo *root, Var *var)
 pvar->vartype == var->vartype &&
 pvar->vartypmod == var->vartypmod &&
 pvar->varcollid == var->varcollid)
+{
+Assert(bms_equal(pvar->varnullingrels, var->varnullingrels));
 return pitem->paramId;
+}
 }
 }

This triggers no failures in the regression tests, but we know
how little that proves.

Anyway, that's just a side observation unrelated to the problem
at hand.  More later.

regards, tom lane




Re: Emit fewer vacuum records by reaping removable tuples during pruning

2024-01-25 Thread Robert Haas
On Thu, Jan 25, 2024 at 11:19 AM Melanie Plageman
 wrote:
> Cool. I might add "successfully" or "fully" to "Either way, the page
> hasn't been processed yet"

I'm OK with that.

> > > I think it would be nice to clarify this comment. I think the "when
> > > there is little free space anyway" is referring to the case in
> > > lazy_vacuum() where we set do_index_vacuuming to false because "there
> > > are almost zero TIDs". I initially thought it was saying that in the
> > > failsafe vacuum case the pages whose free space we wouldn't record in
> > > the FSM have little free space anyway -- which I didn't get. But then
> > > I looked at where we set do_index_vacuuming to false.
> >
> > Oh... wow. That's kind of confusing; somehow I was thinking we were
> > talking about free space on the disk, rather than newly free space in
> > pages that could be added to the FSM.
>
> Perhaps I misunderstood. I interpreted it to refer to the bypass optimization.

I think you're probably correct. I just didn't realize what was meant.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

2024-01-25 Thread Robert Haas
On Thu, Jan 25, 2024 at 11:22 AM Alvaro Herrera  wrote:
> Still with these auto-tuning GUCs, I noticed that the auto-tuning code
> would continue to grow the buffer sizes with shared_buffers to
> arbitrarily large values.  I added an arbitrary maximum of 1024 (8 MB),
> which is much higher than the current value of 128; but if you have
> (say) 30 GB of shared_buffers (not uncommon these days), do you really
> need 30MB of pg_clog cache?  It seems mostly unnecessary ... and you can
> still set it manually that way if you need it.  So, largely I just
> rewrote those small functions completely.

Yeah, I think that if we're going to scale with shared_buffers, it
should be capped.
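
Presumably something of the same shape as the current formula, just with the 
new ceiling (the divisor and the floor below are only illustrative, whatever 
the patch settles on):

    /* scale the SLRU buffer count with shared_buffers, clamped on both ends */
    return Min(1024, Max(4, NBuffers / 512));    /* 1024 pages = 8 MB */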

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [PATCH] Add native windows on arm64 support

2024-01-25 Thread Andrew Dunstan


On 2024-01-24 We 19:02, Michael Paquier wrote:

On Wed, Jan 24, 2024 at 06:45:21AM -0500, Dave Cramer wrote:

I managed to get it to build the vcvarsall arch needs to be x64. I need to
add some options, but the patch above needs to be applied to build it.

Nice.  If I may ask, what kind of host and/or configuration have you
used to reach a state where the code can be compiled and run tests
with meson?  If you have found specific steps, it may be a good thing
to document that on the wiki, say around [1].

Perhaps you have not included TAP?  It may be fine in terms of runtime
checks and coverage.

[1]:https://wiki.postgresql.org/wiki/PostgreSQL_Buildfarm_Howto#Running_on_Windows




I now have an ARM64 machine, so I set up a W11 ARM64 VM. I think we 
really want to build with x64_arm64, i.e. to generate native arm64 
binaries. Setting just x64 will not do that, AIUI.


I tried that with the buildfarm, setting that in the config file's call 
to PGBuild::VSenv::getenv().


That upset msvc_gendef.pl, so I added this there to keep it happy:

   $arch = 'x86_64' if $arch eq 'aarch64';

After that things went ok until I got this:

[1453/2088] "link" @src/backend/postgres.exe.rsp
FAILED: src/backend/postgres.exe src/backend/postgres.pdb
"link" @src/backend/postgres.exe.rsp
   Creating library src\backend\postgres.exe.lib
storage_lmgr_s_lock.c.obj : error LNK2019: unresolved external symbol 
_mm_pause referenced in function perform_spin_delay

src\backend\postgres.exe : fatal error LNK1120: 1 unresolved externals


I haven't made further progress, but I will return to it in the next day 
or so.


While this will be nice to have, I think it won't really matter until 
there is ARM64 support in released versions of Windows Server. AFAICT 
they still only sell versions for x86_64




cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com


Re: A performance issue with Memoize

2024-01-25 Thread Tom Lane
David Rowley  writes:
> I think fixing it your way makes sense.  I don't really see any reason
> why we should have two. However...

> Another way it *could* be fixed would be to get rid of pull_paramids()
> and change create_memoize_plan() to set keyparamids to all the param
> IDs that match are equal() to each param_exprs.  That way nodeMemoize
> wouldn't purge the cache as we'd know the changing param is accounted
> for in the cache.  For the record, I don't think that's better, but it
> scares me a bit less as I don't know what other repercussions there
> are of applying your patch to get rid of the duplicate
> NestLoopParam.paramval.

> I'd feel better about doing it your way if Tom could comment on if
> there was a reason he put the function calls that way around in
> 5ebaaa494.

I'm fairly sure I thought it wouldn't matter because of the Param
de-duplication done in paramassign.c.  However, Richard's example
shows that's not so, because process_subquery_nestloop_params is
picky about the param ID assigned to a particular Var while
replace_nestloop_params is not.  So flipping the order makes sense.
I'd change the comment though, maybe to

/*
 * Replace any outer-relation variables with nestloop params.
 *
 * We must provide nestloop params for both lateral references of
 * the subquery and outer vars in the scan_clauses.  It's better
 * to assign the former first, because that code path requires
 * specific param IDs, while replace_nestloop_params can adapt
 * to the IDs assigned by process_subquery_nestloop_params.
 * This avoids possibly duplicating nestloop params when the
 * same Var is needed for both reasons.
 */

However ... it seems like we're not out of the woods yet.  Why
is Richard's proposed test case still showing

+ ->  Memoize (actual rows=5000 loops=N)
+   Cache Key: t1.two, t1.two

Seems like there is missing de-duplication logic, or something.

> I also feel like we might be getting a bit close to the minor version
> releases to be adjusting this stuff in back branches.

Yeah, I'm not sure I would change this in the back branches.

regards, tom lane




Re: [PATCH] Add native windows on arm64 support

2024-01-25 Thread Dave Cramer
On Thu, 25 Jan 2024 at 12:30, Andrew Dunstan  wrote:

>
> On 2024-01-24 We 19:02, Michael Paquier wrote:
>
> On Wed, Jan 24, 2024 at 06:45:21AM -0500, Dave Cramer wrote:
>
> I managed to get it to build the vcvarsall arch needs to be x64. I need to
> add some options, but the patch above needs to be applied to build it.
>
> Nice.  If I may ask, what kind of host and/or configuration have you
> used to reach a state where the code can be compiled and run tests
> with meson?  If you have found specific steps, it may be a good thing
> to document that on the wiki, say around [1].
>
> Perhaps you have not included TAP?  It may be fine in terms of runtime
> checks and coverage.
>
> [1]: 
> https://wiki.postgresql.org/wiki/PostgreSQL_Buildfarm_Howto#Running_on_Windows
>
>
>
> I now have an ARM64 machine, so I set up a W11 ARM64 VM. I think we really
> want to build with x64_arm64, i.e. to generate native arm64 binaries.
> Setting just x64 will not do that, AIUI.
>
> I tried that with the buidfarm, setting that in the config file's call to
> PGBuild::VSenv::getenv().
>
> That upset msvc_gendef.pl, so I added this there to keep it happy:
>
> $arch = 'x86_64' if $arch eq 'aarch64';
>
> After that things went ok until I got this:
>
> [1453/2088] "link" @src/backend/postgres.exe.rsp
> FAILED: src/backend/postgres.exe src/backend/postgres.pdb
> "link" @src/backend/postgres.exe.rsp
>Creating library src\backend\postgres.exe.lib
> storage_lmgr_s_lock.c.obj : error LNK2019: unresolved external symbol
> _mm_pause referenced in function perform_spin_delay
> src\backend\postgres.exe : fatal error LNK1120: 1 unresolved externals
>
>
> I haven't made further progress, but I will return to it in the next day
> or so.
>
> While this will be nice to have, I think it won't really matter until
> there is ARM64 support in released versions of Windows Server. AFAICT they
> still only sell versions for x86_64
>

I need it to build clients. The clients need arm64 libraries to link against

Dave


Re: More new SQL/JSON item methods

2024-01-25 Thread Tom Lane
Andrew Dunstan  writes:
> Thanks, I have pushed this.

The buildfarm is pretty widely unhappy, mostly failing on

select jsonb_path_query('1.23', '$.string()');

On a guess, I tried running that under valgrind, and behold it said

==00:00:00:05.637 435530== Conditional jump or move depends on uninitialised 
value(s)
==00:00:00:05.637 435530==at 0x8FD131: executeItemOptUnwrapTarget 
(jsonpath_exec.c:1547)
==00:00:00:05.637 435530==by 0x8FED03: executeItem (jsonpath_exec.c:626)
==00:00:00:05.637 435530==by 0x8FED03: executeNextItem 
(jsonpath_exec.c:1604)
==00:00:00:05.637 435530==by 0x8FCA58: executeItemOptUnwrapTarget 
(jsonpath_exec.c:956)
==00:00:00:05.637 435530==by 0x8FFDE4: executeItem (jsonpath_exec.c:626)
==00:00:00:05.637 435530==by 0x8FFDE4: executeJsonPath.constprop.30 
(jsonpath_exec.c:612)
==00:00:00:05.637 435530==by 0x8FFF8C: jsonb_path_query_internal 
(jsonpath_exec.c:438)

It's fairly obviously right about that:

JsonbValuejbv;
...
jb = &jbv;
Assert(tmp != NULL);/* We must have set tmp above */
jb->val.string.val = (jb->type == jbvString) ? tmp : 
pstrdup(tmp);
  ^

Presumably, this is a mistaken attempt to test the type
of the thing previously pointed to by "jb".

On the whole, what I'd be inclined to do here is get rid
of this test altogether and demand that every path through
the preceding "switch" deliver a value that doesn't need
pstrdup().  The only path that doesn't do that already is

case jbvBool:
tmp = (jb->val.boolean) ? "true" : "false";
break;

and TBH I'm not sure that we really need a pstrdup there
either.  The constants are immutable enough.  Is something
likely to try to pfree the pointer later?  I tried

@@ -1544,7 +1544,7 @@ executeItemOptUnwrapTarget(JsonPathExecContext *cxt, 
JsonPathItem *jsp,
 
jb = &jbv;
Assert(tmp != NULL);/* We must have set tmp above */
-   jb->val.string.val = (jb->type == jbvString) ? tmp : 
pstrdup(tmp);
+   jb->val.string.val = tmp;
jb->val.string.len = strlen(jb->val.string.val);
jb->type = jbvString;
 
and that quieted valgrind for this particular query and still
passes regression.

(The reported crashes seem to be happening later during a
recursive invocation, seemingly because JsonbType(jb) is
returning garbage.  So there may be another bug after this one.)

regards, tom lane




Re: [PATCH] Add native windows on arm64 support

2024-01-25 Thread Andrew Dunstan


On 2024-01-25 Th 08:45, Dave Cramer wrote:





I tried running my buildfarm using my git repo and my branch, but get 
the following error

Status Line: 492 bad branch parameter
Content:
bad branch parameter fix_arm

Web txn failed with status: 1




You can't use your own branch with the buildfarm, you need to let it set 
up its own branches.



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com


Re: More new SQL/JSON item methods

2024-01-25 Thread Andrew Dunstan



On 2024-01-25 Th 14:31, Tom Lane wrote:

Andrew Dunstan  writes:

Thanks, I have pushed this.

The buildfarm is pretty widely unhappy, mostly failing on

select jsonb_path_query('1.23', '$.string()');

On a guess, I tried running that under valgrind, and behold it said

==00:00:00:05.637 435530== Conditional jump or move depends on uninitialised 
value(s)
==00:00:00:05.637 435530==at 0x8FD131: executeItemOptUnwrapTarget 
(jsonpath_exec.c:1547)
==00:00:00:05.637 435530==by 0x8FED03: executeItem (jsonpath_exec.c:626)
==00:00:00:05.637 435530==by 0x8FED03: executeNextItem 
(jsonpath_exec.c:1604)
==00:00:00:05.637 435530==by 0x8FCA58: executeItemOptUnwrapTarget 
(jsonpath_exec.c:956)
==00:00:00:05.637 435530==by 0x8FFDE4: executeItem (jsonpath_exec.c:626)
==00:00:00:05.637 435530==by 0x8FFDE4: executeJsonPath.constprop.30 
(jsonpath_exec.c:612)
==00:00:00:05.637 435530==by 0x8FFF8C: jsonb_path_query_internal 
(jsonpath_exec.c:438)

It's fairly obviously right about that:

 JsonbValuejbv;
 ...
 jb = &jbv;
 Assert(tmp != NULL);/* We must have set tmp above */
 jb->val.string.val = (jb->type == jbvString) ? tmp : 
pstrdup(tmp);
   ^

Presumably, this is a mistaken attempt to test the type
of the thing previously pointed to by "jb".

On the whole, what I'd be inclined to do here is get rid
of this test altogether and demand that every path through
the preceding "switch" deliver a value that doesn't need
pstrdup().  The only path that doesn't do that already is

 case jbvBool:
 tmp = (jb->val.boolean) ? "true" : "false";
 break;

and TBH I'm not sure that we really need a pstrdup there
either.  The constants are immutable enough.  Is something
likely to try to pfree the pointer later?  I tried

@@ -1544,7 +1544,7 @@ executeItemOptUnwrapTarget(JsonPathExecContext *cxt, 
JsonPathItem *jsp,
  
 jb = &jbv;

 Assert(tmp != NULL);/* We must have set tmp above */
-   jb->val.string.val = (jb->type == jbvString) ? tmp : 
pstrdup(tmp);
+   jb->val.string.val = tmp;
 jb->val.string.len = strlen(jb->val.string.val);
 jb->type = jbvString;
  
and that quieted valgrind for this particular query and still

passes regression.

(The reported crashes seem to be happening later during a
recursive invocation, seemingly because JsonbType(jb) is
returning garbage.  So there may be another bug after this one.)





Argh! Will look, thanks.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Relation bulk write facility

2024-01-25 Thread Heikki Linnakangas

On 10/01/2024 18:17, Robert Haas wrote:

I think we should try to pick prefixes that are one or more words
rather than using word fragments. bulkw is an awkward prefix even for
people whose first language is English, and probably more awkward for
others.


Renamed the 'bulkw' variables to 'bulkstate', and the functions to have 
an smgr_bulk_* prefix.


I was tempted to use just bulk_* as the prefix, but I'm afraid e.g. 
bulk_write() is too generic.
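
To give an idea of the usage, the pattern in the callers now looks roughly 
like this (only a sketch -- see bulk_write.h in the patch for the exact 
signatures):

    bulkstate = smgr_bulk_start_rel(rel, MAIN_FORKNUM);
    for (blkno = 0; blkno < nblocks; blkno++)
    {
        buf = smgr_bulk_get_buf(bulkstate);
        /* ... fill in the page behind 'buf' ... */
        smgr_bulk_write(bulkstate, blkno, buf, true);
    }
    smgr_bulk_finish(bulkstate);    /* WAL-logs in batches, registers the fsync */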


On 22/01/2024 07:50, Peter Smith wrote:

Hi, This patch has a CF status of "Needs Review" [1], but it seems
there was a CFbot test failure last time it was run [2]. Please have a
look and post an updated version if necessary.


Fixed the headerscheck failure by adding appropriate #includes.

--
Heikki Linnakangas
Neon (https://neon.tech)
From c2e8cff9326fb874b2e1643f5c3c8a4952eaa3ac Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Thu, 25 Jan 2024 21:40:46 +0200
Subject: [PATCH v5 1/1] Introduce a new bulk loading facility.

The new facility makes it easier to optimize bulk loading, as the
logic for buffering, WAL-logging, and syncing the relation only needs
to be implemented once. It's also less error-prone: We have had a
number of bugs in how a relation is fsync'd - or not - at the end of a
bulk loading operation. By centralizing that logic to one place, we
only need to write it correctly once.

The new facility is faster for small relations: Instead of calling
smgrimmedsync(), we register the fsync to happen at next checkpoint,
which avoids the fsync latency. That can make a big difference if you
are e.g. restoring a schema-only dump with lots of relations.

It is also slightly more efficient with large relations, as the WAL
logging is performed multiple pages at a time. That avoids some WAL
header overhead. The sorted GiST index build did that already, this
moves the buffering to the new facility.

The changes to pageinspect GiST test needs an explanation: Before this
patch, the sorted GiST index build set the LSN on every page to the
special GistBuildLSN value, not the LSN of the WAL record, even though
they were WAL-logged. There was no particular need for it, it just
happened naturally when we wrote out the pages before WAL-logging
them. Now we WAL-log the pages first, like in B-tree build, so the
pages are stamped with the record's real LSN. When the build is not
WAL-logged, we still use GistBuildLSN. To make the test output
predictable, use an unlogged index.

Reviewed-by: Andres Freund
Discussion: https://www.postgresql.org/message-id/30e8f366-58b3-b239-c521-422122dd5150%40iki.fi
---
 contrib/pageinspect/expected/gist.out |  14 +-
 contrib/pageinspect/sql/gist.sql  |  16 +-
 src/backend/access/gist/gistbuild.c   | 122 +++
 src/backend/access/heap/rewriteheap.c |  72 ++
 src/backend/access/nbtree/nbtree.c|  33 +--
 src/backend/access/nbtree/nbtsort.c   | 135 
 src/backend/access/spgist/spginsert.c |  49 ++---
 src/backend/catalog/storage.c |  46 ++--
 src/backend/storage/smgr/Makefile |   1 +
 src/backend/storage/smgr/bulk_write.c | 303 ++
 src/backend/storage/smgr/md.c |  45 +++-
 src/backend/storage/smgr/meson.build  |   1 +
 src/backend/storage/smgr/smgr.c   |  31 +++
 src/include/storage/bulk_write.h  |  40 
 src/include/storage/md.h  |   1 +
 src/include/storage/smgr.h|   1 +
 src/tools/pgindent/typedefs.list  |   3 +
 17 files changed, 558 insertions(+), 355 deletions(-)
 create mode 100644 src/backend/storage/smgr/bulk_write.c
 create mode 100644 src/include/storage/bulk_write.h

diff --git a/contrib/pageinspect/expected/gist.out b/contrib/pageinspect/expected/gist.out
index d1adbab8ae2..2b1d54a6279 100644
--- a/contrib/pageinspect/expected/gist.out
+++ b/contrib/pageinspect/expected/gist.out
@@ -1,13 +1,6 @@
--- The gist_page_opaque_info() function prints the page's LSN. Normally,
--- that's constant 1 (GistBuildLSN) on every page of a freshly built GiST
--- index. But with wal_level=minimal, the whole relation is dumped to WAL at
--- the end of the transaction if it's smaller than wal_skip_threshold, which
--- updates the LSNs. Wrap the tests on gist_page_opaque_info() in the
--- same transaction with the CREATE INDEX so that we see the LSNs before
--- they are possibly overwritten at end of transaction.
-BEGIN;
--- Create a test table and GiST index.
-CREATE TABLE test_gist AS SELECT point(i,i) p, i::text t FROM
+-- The gist_page_opaque_info() function prints the page's LSN.
+-- Use an unlogged index, so that the LSN is predictable.
+CREATE UNLOGGED TABLE test_gist AS SELECT point(i,i) p, i::text t FROM
 generate_series(1,1000) i;
 CREATE INDEX test_gist_idx ON test_gist USING gist (p);
 -- Page 0 is the root, the rest are leaf pages
@@ -29,7 +22,6 @@ SELECT * FROM gist_page_opaque_info(get_raw_page('test_gist_idx', 2));
  0/1 | 0/0 | 1 | {leaf}
 (1 row)
 
-COMMIT;
 SELECT * FROM gist_page_items(get_raw_page

Re: More new SQL/JSON item methods

2024-01-25 Thread Andrew Dunstan



On 2024-01-25 Th 14:31, Tom Lane wrote:

Andrew Dunstan  writes:

Thanks, I have pushed this.

The buildfarm is pretty widely unhappy, mostly failing on

select jsonb_path_query('1.23', '$.string()');

On a guess, I tried running that under valgrind, and behold it said

==00:00:00:05.637 435530== Conditional jump or move depends on uninitialised 
value(s)
==00:00:00:05.637 435530==at 0x8FD131: executeItemOptUnwrapTarget 
(jsonpath_exec.c:1547)
==00:00:00:05.637 435530==by 0x8FED03: executeItem (jsonpath_exec.c:626)
==00:00:00:05.637 435530==by 0x8FED03: executeNextItem 
(jsonpath_exec.c:1604)
==00:00:00:05.637 435530==by 0x8FCA58: executeItemOptUnwrapTarget 
(jsonpath_exec.c:956)
==00:00:00:05.637 435530==by 0x8FFDE4: executeItem (jsonpath_exec.c:626)
==00:00:00:05.637 435530==by 0x8FFDE4: executeJsonPath.constprop.30 
(jsonpath_exec.c:612)
==00:00:00:05.637 435530==by 0x8FFF8C: jsonb_path_query_internal 
(jsonpath_exec.c:438)

It's fairly obviously right about that:

 JsonbValuejbv;
 ...
 jb = &jbv;
 Assert(tmp != NULL);/* We must have set tmp above */
 jb->val.string.val = (jb->type == jbvString) ? tmp : 
pstrdup(tmp);
   ^

Presumably, this is a mistaken attempt to test the type
of the thing previously pointed to by "jb".

On the whole, what I'd be inclined to do here is get rid
of this test altogether and demand that every path through
the preceding "switch" deliver a value that doesn't need
pstrdup().  The only path that doesn't do that already is

 case jbvBool:
 tmp = (jb->val.boolean) ? "true" : "false";
 break;

and TBH I'm not sure that we really need a pstrdup there
either.  The constants are immutable enough.  Is something
likely to try to pfree the pointer later?  I tried

@@ -1544,7 +1544,7 @@ executeItemOptUnwrapTarget(JsonPathExecContext *cxt, 
JsonPathItem *jsp,
  
 jb = &jbv;

 Assert(tmp != NULL);/* We must have set tmp above */
-   jb->val.string.val = (jb->type == jbvString) ? tmp : 
pstrdup(tmp);
+   jb->val.string.val = tmp;
 jb->val.string.len = strlen(jb->val.string.val);
 jb->type = jbvString;
  
and that quieted valgrind for this particular query and still

passes regression.




Your fix looks sane. I also don't see why we need the pstrdup.




(The reported crashes seem to be happening later during a
recursive invocation, seemingly because JsonbType(jb) is
returning garbage.  So there may be another bug after this one.)





I don't think so. AIUI The first call deals with the '$' and the second 
one deals with the '.string()', which is why we see the error on the 
second call.



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: More new SQL/JSON item methods

2024-01-25 Thread Tom Lane
Andrew Dunstan  writes:
> On 2024-01-25 Th 14:31, Tom Lane wrote:
>> (The reported crashes seem to be happening later during a
>> recursive invocation, seemingly because JsonbType(jb) is
>> returning garbage.  So there may be another bug after this one.)

> I don't think so. AIUI The first call deals with the '$' and the second 
> one deals with the '.string()', which is why we see the error on the 
> second call.

There's something else going on, because I'm still getting the
assertion failure on my Mac with this fix in place.  Annoyingly,
it goes away if I compile with -O0, so it's kind of hard to
identify what's going wrong.

regards, tom lane




Re: proposal: psql: show current user in prompt

2024-01-25 Thread Pavel Stehule
Hi

čt 11. 1. 2024 v 12:12 odesílatel Jelte Fennema-Nio 
napsal:

> On Tue, 12 Sept 2023 at 09:46, Peter Eisentraut 
> wrote:
> > ISTM that for a purpose like pgbouncer, it would be simpler to add a new
> > GUC "report these variables" and send that in the startup message?  That
> > might not help with the psql use case, but it would be much simpler.
>
> FYI I implemented it that way yesterday on this other thread (patch
> 0010 of that patchset):
>
> https://www.postgresql.org/message-id/flat/CAGECzQScQ3N-Ykv2j4NDyDtrPPc3FpRoa%3DLZ-2Uj2ocA4zr%3D4Q%40mail.gmail.com#cd9e8407820d492e8f677ee6a67c21ce


I read your patch, and I see some advantages and some disadvantages.

1. this doesn't need new protocol API just for this feature, what is nice

2. using one GUC for all reported GUCs doesn't look very readable. Maybe it should
be used just for customized reporting, not for the defaults

3. Another issue with your proposal is that enabling or disabling reporting of
a specific GUC is less friendly. Maintaining a list needs more work than just
enabling and disabling one specific GUC.
I think this is the main disadvantage of your proposal. In my proposal I
don't need to know the state of any GUC; I just send PQlinkParameterStatus
or PQunlinkParameterStatus. With your proposal, I need to read
_pq_.report_parameters, parse it, modify it, and send it back. This
doesn't look very practical.

Personally I would prefer a more generic API than my
PQlinkParameterStatus and PQunlinkParameterStatus. You talked about it with
Robert, if I remember correctly.

It would be nice if I could write just

/* a similar principle is used inside ncurses */
set_report_guc_message_no = PQgetMessageNo("set_report_guc");
/* the result can be processed by the server and by all proxies on the line */

if (set_report_guc_message_no == -1)
  fprintf(stderr, "feature is not supported");
result = PQsendMessage(set_report_guc_message_no, "user");
if (result == -1)
  fprintf(stderr, "some error ...");

With an API like this it could be easier to make small protocol enhancements.
Maybe this is overengineering. Enhancing the protocol is not too common, and
with the use of PQsendTypedCommand, enhancing the protocol is less work too.

Regards

Pavel


Re: [PATCH] Add native windows on arm64 support

2024-01-25 Thread Dave Cramer
On Thu, 25 Jan 2024 at 14:31, Andrew Dunstan  wrote:

>
> On 2024-01-25 Th 08:45, Dave Cramer wrote:
>
>
>
>
>
> I tried running my buildfarm using my git repo and my branch, but get the
> following error
> Status Line: 492 bad branch parameter
> Content:
> bad branch parameter fix_arm
>
> Web txn failed with status: 1
>
>
>
> You can't use your own branch with the buildfarm, you need to let it set
> up its own branches.
>

So I guess I have to wait until this patch is merged in ?

Dave


Re: More new SQL/JSON item methods

2024-01-25 Thread Tom Lane
I wrote:
> There's something else going on, because I'm still getting the
> assertion failure on my Mac with this fix in place.  Annoyingly,
> it goes away if I compile with -O0, so it's kind of hard to
> identify what's going wrong.

No, belay that: I must've got confused about which version I was
testing.  It's very unclear to me why the undefined reference
causes the preceding Assert to misbehave, but that is clearly
what's happening.  Compiler bug maybe?  My Mac has clang 15.0.0,
and the unhappy buildfarm members are also late-model clang.

Anyway, I did note that the preceding line

res = jperOk;

is dead code and might as well get removed while you're at it.

regards, tom lane




[Doc] Improvements to ddl.sgl Privileges Section and Glossary

2024-01-25 Thread David G. Johnston
Hey,

In a nearby user complaint email [1] some missing information regarding
ownership reassignment came to light.  I took that and went a bit further
to add what I felt was further missing information and context for how the
privilege system is designed.  I've tried to formalize and label existing
concepts a bit and updated the glossary accordingly.

The attached is a partial rewrite of the patch on the linked post after
those comments were taken into consideration.

The new glossary entry for privileges defines various qualifications of the
term that are used in the new prose.  I've marked up each of the variants
and point them all back to the main entry.  I didn't try to incorporate
those terms, or even really look, anywhere else in the documentation.  If
the general idea is accepted that kind of work can be done as a follow-up.

David J.

[1]
https://www.postgresql.org/message-id/d294818d12280f6223ddf169ab5454927f5186b6.camel%40cybertec.at
From a4d6a599a0b5d6b8f280e3d8489e7f4a4a555383 Mon Sep 17 00:00:00 2001
From: "David G. Johnston" 
Date: Thu, 25 Jan 2024 13:41:48 -0700
Subject: [PATCH] v1-improvements-to-ddl-priv Section

---
 doc/src/sgml/ddl.sgml  | 109 ++---
 doc/src/sgml/glossary.sgml |  40 ++
 2 files changed, 105 insertions(+), 44 deletions(-)

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index fc03a349f0..7c9c9d0dd1 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -1855,12 +1855,33 @@ ALTER TABLE products RENAME TO items;
   
 
   
-   When an object is created, it is assigned an owner. The
-   owner is normally the role that executed the creation statement.
-   For most kinds of objects, the initial state is that only the owner
-   (or a superuser) can do anything with the object. To allow
-   other roles to use it, privileges must be
-   granted.
+   The permissions needed to interact with an object are split between
+   privileges and
+   owner rights.
+   The owner has the right to modify the object, including dropping it, as well as
+   the right to grant and revoke privileges on the object to any role in the system.
+   Aside from a brief description, at the end, of what happens when you reassign the
+   owner of an object to some other role, this section describes privileges.
+  
+
+  
+   For most databases there are many roles and many objects, and thus a large amount
+   of privileges that need to be defined.  Two features exist to make management of
+   privileges easier: role membership (i.e., group roles, see )
+   including the PUBLIC  pseudo-role (which is a group role consisting
+   of all roles in the system) is one, while
+   default privileges
+   (see ) is the other.
+  
+
+  
+   The fundamental design of the privilege system is to disallow by default.  A role
+   must, directly or indirectly (via group role inheritance), hold the correct privilege
+   on the object to perform the corresponding action.
+   Furthermore, inheritance is strictly additive, there is no mechanism to block an
+   indirect privilege.
+   Revoking a privilege only removes
+   direct privileges.
   
 
   
@@ -1878,21 +1899,14 @@ ALTER TABLE products RENAME TO items;
   
 
   
-   The right to modify or destroy an object is inherent in being the
-   object's owner, and cannot be granted or revoked in itself.
-   (However, like all privileges, that right can be inherited by
-   members of the owning role; see .)
-  
-
-  
-   An object can be assigned to a new owner with an ALTER
-   command of the appropriate kind for the object, for example
-
-ALTER TABLE table_name OWNER TO new_owner;
-
-   Superusers can always do this; ordinary roles can only do it if they are
-   both the current owner of the object (or inherit the privileges of the
-   owning role) and able to SET ROLE to the new owning role.
+   Upon object creation, two sets of
+   implicit privileges
+   are established.
+   The owner is granted all applicable privileges on the object. All other roles,
+   including the PUBLIC  pseudo-role, get their default privileges.
+   From this point onward only
+   explicit privilege
+   modification is possible, and is described next.
   
 
   
@@ -1904,15 +1918,9 @@ ALTER TABLE table_name OWNER TO new_owne
 GRANT UPDATE ON accounts TO joe;
 
Writing ALL in place of a specific privilege grants all
-   privileges that are relevant for the object type.
-  
-
-  
-   The special role name PUBLIC can
-   be used to grant a privilege to every role on the system.  Also,
-   group roles can be set up to help manage privileges when
-   there are many users of a database — for details see
-   .
+   privileges that are relevant for the object type.  The pseudo-role 
+   PUBLIC is a valid role for this purpose.  All roles in the system
+   will immediately inherit the indirect privilege(s) named in the command.
   
 
   
@@ -1936,9 +1944,11 @@ REVOKE ALL ON accounts FROM PUBLIC;
 
   
An object's owner can choose to revoke

Re: [PATCH] Add native windows on arm64 support

2024-01-25 Thread Dave Cramer
On Thu, 25 Jan 2024 at 12:30, Andrew Dunstan  wrote:

>
> On 2024-01-24 We 19:02, Michael Paquier wrote:
>
> On Wed, Jan 24, 2024 at 06:45:21AM -0500, Dave Cramer wrote:
>
> I managed to get it to build the vcvarsall arch needs to be x64. I need to
> add some options, but the patch above needs to be applied to build it.
>
> Nice.  If I may ask, what kind of host and/or configuration have you
> used to reach a state where the code can be compiled and run tests
> with meson?  If you have found specific steps, it may be a good thing
> to document that on the wiki, say around [1].
>
> Perhaps you have not included TAP?  It may be fine in terms of runtime
> checks and coverage.
>
> [1]: 
> https://wiki.postgresql.org/wiki/PostgreSQL_Buildfarm_Howto#Running_on_Windows
>
>
>
> I now have an ARM64 machine, so I set up a W11 ARM64 VM. I think we really
> want to build with x64_arm64, i.e. to generate native arm64 binaries.
> Setting just x64 will not do that, AIUI.
>
> I tried that with the buidfarm, setting that in the config file's call to
> PGBuild::VSenv::getenv().
>
> That upset msvc_gendef.pl, so I added this there to keep it happy:
>
> $arch = 'x86_64' if $arch eq 'aarch64';
>
> After that things went ok until I got this:
>
> [1453/2088] "link" @src/backend/postgres.exe.rsp
> FAILED: src/backend/postgres.exe src/backend/postgres.pdb
> "link" @src/backend/postgres.exe.rsp
>Creating library src\backend\postgres.exe.lib
> storage_lmgr_s_lock.c.obj : error LNK2019: unresolved external symbol
> _mm_pause referenced in function perform_spin_delay
> src\backend\postgres.exe : fatal error LNK1120: 1 unresolved externals
>
>
> I haven't made further progress, but I will return to it in the next day
> or so.
>
> While this will be nice to have, I think it won't really matter until
> there is ARM64 support in released versions of Windows Server. AFAICT they
> still only sell versions for x86_64
>

I've tried it with my patch attached previously and x64_arm64 and it works
fine. It builds using the buildfarm as well.

Is there a definitive way to figure out if the binaries are x64_arm64?

Dave


Re: More new SQL/JSON item methods

2024-01-25 Thread Andrew Dunstan



On 2024-01-25 Th 15:33, Tom Lane wrote:

Andrew Dunstan  writes:

On 2024-01-25 Th 14:31, Tom Lane wrote:

(The reported crashes seem to be happening later during a
recursive invocation, seemingly because JsonbType(jb) is
returning garbage.  So there may be another bug after this one.)

I don't think so. AIUI The first call deals with the '$' and the second
one deals with the '.string()', which is why we see the error on the
second call.

There's something else going on, because I'm still getting the
assertion failure on my Mac with this fix in place.  Annoyingly,
it goes away if I compile with -O0, so it's kind of hard to
identify what's going wrong.





Curiouser and curiouser. On my Mac the error is manifest but the fix you 
suggested cures it. Built with -O2 -g, clang 15.0.0, Apple Silicon.



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: [PATCH] Add native windows on arm64 support

2024-01-25 Thread Anthony Roberts
Hi David,

Unix "file" or "dumpbin /headers" in vcvarsall are your best bets.

Thanks,
Anthony

On Thu, 25 Jan 2024, 21:01 Dave Cramer,  wrote:

>
>
> On Thu, 25 Jan 2024 at 12:30, Andrew Dunstan  wrote:
>
>>
>> On 2024-01-24 We 19:02, Michael Paquier wrote:
>>
>> On Wed, Jan 24, 2024 at 06:45:21AM -0500, Dave Cramer wrote:
>>
>> I managed to get it to build the vcvarsall arch needs to be x64. I need to
>> add some options, but the patch above needs to be applied to build it.
>>
>> Nice.  If I may ask, what kind of host and/or configuration have you
>> used to reach a state where the code can be compiled and run tests
>> with meson?  If you have found specific steps, it may be a good thing
>> to document that on the wiki, say around [1].
>>
>> Perhaps you have not included TAP?  It may be fine in terms of runtime
>> checks and coverage.
>>
>> [1]: 
>> https://wiki.postgresql.org/wiki/PostgreSQL_Buildfarm_Howto#Running_on_Windows
>>
>>
>>
>> I now have an ARM64 machine, so I set up a W11 ARM64 VM. I think we
>> really want to build with x64_arm64, i.e. to generate native arm64
>> binaries. Setting just x64 will not do that, AIUI.
>>
>> I tried that with the buidfarm, setting that in the config file's call to
>> PGBuild::VSenv::getenv().
>>
>> That upset msvc_gendef.pl, so I added this there to keep it happy:
>>
>> $arch = 'x86_64' if $arch eq 'aarch64';
>>
>> After that things went ok until I got this:
>>
>> [1453/2088] "link" @src/backend/postgres.exe.rsp
>> FAILED: src/backend/postgres.exe src/backend/postgres.pdb
>> "link" @src/backend/postgres.exe.rsp
>>Creating library src\backend\postgres.exe.lib
>> storage_lmgr_s_lock.c.obj : error LNK2019: unresolved external symbol
>> _mm_pause referenced in function perform_spin_delay
>> src\backend\postgres.exe : fatal error LNK1120: 1 unresolved externals
>>
>>
>> I haven't made further progress, but I will return to it in the next day
>> or so.
>>
>> While this will be nice to have, I think it won't really matter until
>> there is ARM64 support in released versions of Windows Server. AFAICT they
>> still only sell versions for x86_64
>>
>
> I've tried it with my patch attached previously and x64_arm64 and it works
> fine. builds using the buildfarm as well.
>
> Is there a definitive way to figure out if the binaries are x64_arm64
>
> Dave
>


Re: Remove unused fields in ReorderBufferTupleBuf

2024-01-25 Thread reid . thompson
On Thu, 2024-01-25 at 16:16 +0300, Aleksander Alekseev wrote:
> Hi,
> 
> > > Here is the corrected patch.
> > 
> > Thank you for updating the patch! I have some comments:
> 
> Thanks for the review.
> 
> > -tuple = (ReorderBufferTupleBuf *)
> > +tuple = (HeapTuple)
> >  MemoryContextAlloc(rb->tup_context,
> > -
> > sizeof(ReorderBufferTupleBuf) +
> > +   HEAPTUPLESIZE +
> > MAXIMUM_ALIGNOF +
> > alloc_len);
> > -tuple->alloc_tuple_size = alloc_len;
> > -tuple->tuple.t_data = ReorderBufferTupleBufData(tuple);
> > +tuple->t_data = (HeapTupleHeader)((char *)tuple + HEAPTUPLESIZE);
> > 
> > Why do we need to add MAXIMUM_ALIGNOF? since HEAPTUPLESIZE is the
> > MAXALIGN'd size, it seems we don't need it. heap_form_tuple() does a
> > similar thing but it doesn't add it.
> 
> Indeed. I gave it a try and nothing crashed, so it appears that
> MAXIMUM_ALIGNOF is not needed.
> 
> > ---
> >  xl_parameter_change *xlrec =
> > -(xl_parameter_change *)
> > XLogRecGetData(buf->record);
> > +(xl_parameter_change *)
> > XLogRecGetData(buf->record);
> > 
> > Unnecessary change.
> 
> That's pgindent. Fixed.
> 
> > ---
> > -/*
> > - * Free a ReorderBufferTupleBuf.
> > - */
> > -void
> > -ReorderBufferReturnTupleBuf(ReorderBuffer *rb, ReorderBufferTupleBuf 
> > *tuple)
> > -{
> > -pfree(tuple);
> > -}
> > -
> > 
> > Why does ReorderBufferReturnTupleBuf need to be moved from
> > reorderbuffer.c to reorderbuffer.h? It seems not related to this
> > refactoring patch so I think we should do it in a separate patch if we
> > really want it. I'm not sure it's necessary, though.
> 
> OK, fixed.
> 

I walked through v6 and didn't note any issues.

I do want to ask, the patch alters ReorderBufferReturnTupleBuf() and
drops the unused parameter ReorderBuffer *rb. It seems that
ReorderBufferFreeSnap(), ReorderBufferReturnRelids(),
ReorderBufferImmediateInvalidation(), and ReorderBufferRestoreCleanup()
also pass ReorderBuffer *rb but do not use it. Would it be beneficial
to implement a separate patch to remove this parameter from these
functions also?

Thanks,
Reid





Re: [PATCH] Add native windows on arm64 support

2024-01-25 Thread Andrew Dunstan


On 2024-01-25 Th 15:56, Dave Cramer wrote:



On Thu, 25 Jan 2024 at 14:31, Andrew Dunstan  wrote:


On 2024-01-25 Th 08:45, Dave Cramer wrote:





I tried running my buildfarm using my git repo and my branch, but
get the following error
Status Line: 492 bad branch parameter
Content:
bad branch parameter fix_arm

Web txn failed with status: 1




You can't use your own branch with the buildfarm, you need to let
it set up its own branches.


So I guess I have to wait until this patch is merged in ?




Not quite.

There's a switch that lets you test your own code. If you say 
--from-source /path/to/repo  it will run in whatever state the repo is 
in. It won't care what branch you are on, and it won't try to update the 
repo. It won't report the results to the server, but it will build and 
test everything just like in a regular run. (On the "eat your own dog 
food" principle I use this mode a lot.) If you use that mode you 
probably also want to specify --verbose as well.
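
For example (with your own checkout path, of course):

    ./run_build.pl --from-source /path/to/your/postgres --verbose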



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com


Re: [PATCH] Add native windows on arm64 support

2024-01-25 Thread Dave Cramer
On Thu, 25 Jan 2024 at 16:04, Anthony Roberts 
wrote:

> Hi David,
>
> Unix "file" or "dumpbin /headers" in vcvarsall are your best bets.
>
> Thanks,
> Anthony
>


So there is another way: select the file in Windows Explorer and right-click;
in the Compatibility tab, if "Windows on ARM" is greyed out, it is
an arm binary.

So far mine are not :(

Thanks,

Dave

>


Re: More new SQL/JSON item methods

2024-01-25 Thread Andrew Dunstan



On 2024-01-25 Th 15:58, Tom Lane wrote:

I wrote:

There's something else going on, because I'm still getting the
assertion failure on my Mac with this fix in place.  Annoyingly,
it goes away if I compile with -O0, so it's kind of hard to
identify what's going wrong.

No, belay that: I must've got confused about which version I was
testing.  It's very unclear to me why the undefined reference
causes the preceding Assert to misbehave, but that is clearly
what's happening.  Compiler bug maybe?  My Mac has clang 15.0.0,
and the unhappy buildfarm members are also late-model clang.

Anyway, I did note that the preceding line

res = jperOk;

is dead code and might as well get removed while you're at it.





OK, pushed those. Thanks.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: [PATCH] Add native windows on arm64 support

2024-01-25 Thread Andrew Dunstan


On 2024-01-25 Th 16:17, Dave Cramer wrote:



On Thu, 25 Jan 2024 at 16:04, Anthony Roberts 
 wrote:


Hi David,

Unix "file" or "dumpbin /headers" in vcvarsall are your best bets.

Thanks,
Anthony



So there is another way, select the file in Windows Explorer and right 
click, in the compatibility tab if the "Windows on ARM" is greyed out 
it is an arm binary.


So far mine are not :(



Yeah, I think the default Developer Command Prompt for VS2022 is set up 
for x86 builds. AIUI you should start by executing "vcvarsall x64_arm64".



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com


Re: Use of backup_label not noted in log

2024-01-25 Thread David Steele

On 1/25/24 09:29, Michael Banck wrote:

Hi,

On Thu, Jan 25, 2024 at 08:56:52AM -0400, David Steele wrote:

I would still advocate for a back patch here. It is frustrating to get logs
from users that just say:

LOG:  invalid checkpoint record
PANIC:  could not locate a valid checkpoint record

It would be very helpful to know what the checkpoint record LSN was in this
case.


I agree.


Another thing to note here -- knowing the LSN is important but also 
knowing that backup recovery was attempted (i.e. backup_label exists) is 
really crucial. Knowing both just saves so much time in back and forth 
debugging.


It appears the tally for back patching is:

For: Andres, David, Michael B
Not Sure: Robert, Laurenz, Michael P

It seems at least nobody is dead set against it.

Regards,
-David




Re: Guiding principle for dropping LLVM versions?

2024-01-25 Thread Thomas Munro
On Thu, Jan 25, 2024 at 4:44 PM Thomas Munro  wrote:
> ... A few build farm animals will
> now fail in the configure step as discussed, and need some adjustment
> (ie disable LLVM or upgrade to LLVM 10+ for the master branch).

Owners pinged.




Re: Use of backup_label not noted in log

2024-01-25 Thread Tom Lane
David Steele  writes:
> Another thing to note here -- knowing the LSN is important but also 
> knowing that backup recovery was attempted (i.e. backup_label exists) is 
> really crucial. Knowing both just saves so much time in back and forth 
> debugging.

> It appears the tally for back patching is:

> For: Andres, David, Michael B
> Not Sure: Robert, Laurenz, Michael P

> It seems at least nobody is dead set against it.

We're talking about 1d35f705e, right?  That certainly looks harmless
and potentially useful.  I'm +1 for back-patching.

regards, tom lane




Re: Use of backup_label not noted in log

2024-01-25 Thread David Steele

On 1/25/24 17:42, Tom Lane wrote:

David Steele  writes:

Another thing to note here -- knowing the LSN is important but also
knowing that backup recovery was attempted (i.e. backup_label exists) is
really crucial. Knowing both just saves so much time in back and forth
debugging.



It appears the tally for back patching is:



For: Andres, David, Michael B
Not Sure: Robert, Laurenz, Michael P



It seems at least nobody is dead set against it.


We're talking about 1d35f705e, right?  That certainly looks harmless
and potentially useful.  I'm +1 for back-patching.


That's the one. If we were modifying existing messages I would be 
against it, but new, infrequent (but oh so helpful) messages seem fine.


Regards,
-David




Re: [PATCH] Add native windows on arm64 support

2024-01-25 Thread Dave Cramer
On Thu, 25 Jan 2024 at 16:32, Andrew Dunstan  wrote:

>
> On 2024-01-25 Th 16:17, Dave Cramer wrote:
>
>
>
> On Thu, 25 Jan 2024 at 16:04, Anthony Roberts 
> wrote:
>
>> Hi David,
>>
>> Unix "file" or "dumpbin /headers" in vcvarsall are your best bets.
>>
>> Thanks,
>> Anthony
>>
>
>
> So there is another way, select the file in Windows Explorer and right
> click, in the compatibility tab if the "Windows on ARM" is greyed out it is
> an arm binary.
>
> So far mine are not :(
>
>
> Yeah, I think the default Developer Command Prompt for VS2022 is set up
> for x86 builds. AIUI you should start by executing "vcvarsall x64_arm64".
>

Yup, now I'm in the same state you are

Dave


Re: Guiding principle for dropping LLVM versions?

2024-01-25 Thread Mark Wong

On 1/25/24 13:41, Thomas Munro wrote:

On Thu, Jan 25, 2024 at 4:44 PM Thomas Munro  wrote:

... A few build farm animals will
now fail in the configure step as discussed, and need some adjustment
(ie disable LLVM or upgrade to LLVM 10+ for the master branch).


Owners pinged.


I think I fixed up the 4 or 6 under my name...

Regards,
Mark





Re: speed up a logical replica setup

2024-01-25 Thread Euler Taveira
On Tue, Jan 23, 2024, at 10:29 PM, Euler Taveira wrote:
> I'll post a new one soon.

I'm attaching another patch that fixes some of the issues pointed out by
Hayato, Shlok, and Junwang.

* publication doesn't exist. The analysis [1] was done by Hayato but I didn't
  use the proposed patch. Instead I refactored the code a bit [2] and call it
  from a new function (setup_publisher) that is called before the promotion.
* fix wrong path name in the initial comment [3]
* change terminology: logical replica -> physical replica [3]
* is the primary / standby ready for logical replication? setup_publisher() and
  setup_subscriber() check if the required GUCs are set accordingly. For the primary,
  it checks that wal_level = logical, that max_replication_slots has enough remaining
  replication slots for the proposed setup, and that max_wal_senders has free slots
  available. For the standby, it checks max_replication_slots for the replication
  origin and also the remaining number of background workers to start the subscriber.
* retain option: I extracted this one from Hayato's patch [4]
* target server must be a standby. It seems we agree that this restriction
  simplifies the code a bit but can be relaxed in the future (if/when base
  backup support is added.)
* recovery timeout option: I decided to include it but I think the use case is
  too narrow. It helps in broken setups. However, it can be an issue in some
  scenarios like time-delayed replica, large replication lag, slow hardware,
  slow network. I didn't use the proposed patch [5]. Instead, I came up with a
  simple one that defaults to forever. The proposed one defaults to 60 seconds
  but I'm afraid that due to one of the scenarios I said in a previous
  sentence, we cancel a legitimate case. Maybe we should add a message during
  dry run saying that due to a replication lag, it will take longer to run.
* refactor primary_slot_name code. With the new setup_publisher and
  setup_subscriber functions, I split the function that detects the
  primary_slot_name use into 2 pieces just to avoid extra connections to get
  the job done.
* remove fallback_application_name as suggested by Hayato [5] because logical
  replication already includes one.

I'm still thinking about replacing --subscriber-conninfo with separate items
(username, port, password?, host = socket dir). Maybe it is overengineering.
The user can always prepare the environment to avoid unwanted and/or external
connections.

I didn't change the name from pg_subscriber to pg_createsubscriber yet, but if I
don't hear objections to it, I'll do it in the next patch.


[1] 
https://www.postgresql.org/message-id/TY3PR01MB9889C5D55206DDD978627D07F5752%40TY3PR01MB9889.jpnprd01.prod.outlook.com
[2] 
https://www.postgresql.org/message-id/73ab86ca-3fd5-49b3-9c80-73d1525202f1%40app.fastmail.com
[3] 
https://www.postgresql.org/message-id/TY3PR01MB9889678E47B918F4D83A6FD8F57B2%40TY3PR01MB9889.jpnprd01.prod.outlook.com
[4] 
https://www.postgresql.org/message-id/TY3PR01MB9889678E47B918F4D83A6FD8F57B2%40TY3PR01MB9889.jpnprd01.prod.outlook.com
[5] 
https://www.postgresql.org/message-id/TY3PR01MB9889593399165B9A04106741F5662%40TY3PR01MB9889.jpnprd01.prod.outlook.com


--
Euler Taveira
EDB   https://www.enterprisedb.com/
From d9ef01a806c3d8697faa444283f19c2deaa58850 Mon Sep 17 00:00:00 2001
From: Euler Taveira 
Date: Mon, 5 Jun 2023 14:39:40 -0400
Subject: [PATCH v9] Creates a new logical replica from a standby server

A new tool called pg_subscriber can convert a physical replica into a
logical replica. It runs on the target server and should be able to
connect to the source server (publisher) and the target server
(subscriber).

The conversion requires a few steps. Check if the target data directory
has the same system identifier as the source data directory. Stop the
target server if it is running as a standby server. Create one
replication slot per specified database on the source server. One
additional replication slot is created at the end to get the consistent
LSN (This consistent LSN will be used as (a) a stopping point for the
recovery process and (b) a starting point for the subscriptions). Write
recovery parameters into the target data directory and start the target
server (Wait until the target server is promoted). Create one
publication (FOR ALL TABLES) per specified database on the source
server. Create one subscription per specified database on the target
server (Use replication slot and publication created in a previous step.
Don't enable the subscriptions yet). Set the replication progress to
the consistent LSN obtained in a previous step. Enable the
subscription for each specified database on the target server. Remove
the additional replication slot that was used to get the consistent LSN.
Stop the target server. Change the system identifier from the target
server.

Depending on your workload and database size, creating a logical replica
might not be an option due to resource constraints (WAL backlog should be
available until all table data i

Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-25 Thread Michael Paquier
On Thu, Jan 25, 2024 at 05:45:43PM +0900, Sutou Kouhei wrote:
> In 
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" 
> on Thu, 25 Jan 2024 12:17:55 +0900,
>   Michael Paquier  wrote:
>> +extern CopyToRoutine CopyToRoutineText;
>> +extern CopyToRoutine CopyToRoutineCSV;
>> +extern CopyToRoutine CopyToRoutineBinary;
>> 
>> All that should IMO remain in copyto.c and copyfrom.c in the initial
>> patch doing the refactoring.  Why not using a fetch function instead
>> that uses a string in input?  Then you can call that once after
>> parsing the List of options in ProcessCopyOptions().
> 
> OK. How about the following for the fetch function
> signature?
> 
> extern CopyToRoutine *GetBuiltinCopyToRoutine(const char *format);

Or CopyToRoutineGet()?  I am not wedded to my suggestion, got a bad
history with naming things around here.

> We may introduce an enum and use it:
> 
> typedef enum CopyBuiltinFormat
> {
>   COPY_BUILTIN_FORMAT_TEXT = 0,
>   COPY_BUILTIN_FORMAT_CSV,
>   COPY_BUILTIN_FORMAT_BINARY,
> } CopyBuiltinFormat;
> 
> extern CopyToRoutine *GetBuiltinCopyToRoutine(CopyBuiltinFormat format);

I am not sure that this is necessary as the option value is a string.

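For illustration, a minimal sketch of a string-keyed fetch function, using the
routine variables quoted above (the function name and the error handling are
placeholders, nothing is settled here):

/* Sketch only: look up a built-in CopyToRoutine from the parsed format string. */
static CopyToRoutine *
CopyToRoutineGet(const char *format)
{
	if (strcmp(format, "text") == 0)
		return &CopyToRoutineText;
	if (strcmp(format, "csv") == 0)
		return &CopyToRoutineCSV;
	if (strcmp(format, "binary") == 0)
		return &CopyToRoutineBinary;

	return NULL;		/* let the caller report an invalid format option */
}
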
> Oh, sorry. I assumed that the comment style was adjusted by
> pgindent.

No worries, that's just something we get used to.  I tend to fix a lot
of these things by myself when editing patches.

>> +getTypeBinaryOutputInfo(attr->atttypid, &out_func_oid, &isvarlena);
>> +fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
>> 
>> Actually, this split is interesting.  It is possible for a custom
>> format to plug in a custom set of out functions.  Did you make use of
>> something custom for your own stuff?
> 
> I didn't. My PoC custom COPY format handler for Apache Arrow
> just handles integer and text for now. It doesn't use
> cstate->out_functions because cstate->out_functions may not
> return a valid binary format value for Apache Arrow. So it
> formats each value by itself.

I mean, if you use a custom output function, you could tweak things
even more with byteas or such..  If a callback is expected to do
something, like setting the output function OIDs in the start
callback, we'd better document it rather than letting that be implied.

>>   Actually, could it make sense to
>> split the assignment of cstate->out_functions into its own callback?
> 
> Yes. Because we need to use getTypeBinaryOutputInfo() for
> "binary" and use getTypeOutputInfo() for "text" and "csv".

Okay.  After sleeping on it, a split makes sense here, because it also
reduces the presence of TupleDesc in the start callback.

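For illustration, such a split could shape the routine roughly like this
(member names and signatures are only a sketch, not the patch's actual
definitions):

/* Sketch only: a dedicated callback to set up per-attribute output functions. */
typedef struct CopyToRoutine
{
	void		(*CopyToOutFunc) (CopyToState cstate, Oid atttypid,
								  FmgrInfo *finfo);
	void		(*CopyToStart) (CopyToState cstate, TupleDesc tupDesc);
	void		(*CopyToOneRow) (CopyToState cstate, TupleTableSlot *slot);
	void		(*CopyToEnd) (CopyToState cstate);
} CopyToRoutine;
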
>> Sure, that's part of the start phase, but at least it would make clear
>> that a custom method *has* to assign these OIDs to work.  The patch
>> implies that as a rule, without a comment that CopyToStart *must* set
>> up these OIDs.
> 
> CopyToStart doesn't need to set up them if the handler
> doesn't use cstate->out_functions.

Noted.

>> I think that 0001 and 0005 should be handled first, as pieces
>> independent of the rest.  Then we could move on with 0002~0004 and
>> 0006~0008.
> 
> OK. I'll focus on 0001 and 0005 for now. I'll restart
> 0002-0004/0006-0008 after 0001 and 0005 are accepted.

Once you get these, I'd be interested in re-doing an evaluation of
COPY TO and more tests with COPY FROM while running Postgres on
scissors.  One thing I was thinking to use here is my blackhole_am for
COPY FROM:
https://github.com/michaelpq/pg_plugins/tree/main/blackhole_am

As per its name, it does nothing on INSERT, so you could create a
table using it as access method, and stress the COPY FROM execution
paths without having to mount Postgres on a tmpfs because the data is
sent to the void.  Perhaps it does not matter, but that moves the
tests to the bottlenecks we want to stress (aka the per-row callback
for large data sets).

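As a rough illustration of that setup (assuming the extension registers an
access method named blackhole_am; check the extension's SQL script to be sure):

-- Sketch: route COPY FROM into a table whose access method discards all rows,
-- so the benchmark stresses the input parsing paths rather than storage I/O.
CREATE EXTENSION blackhole_am;
CREATE TABLE copy_sink (a int, b text) USING blackhole_am;
COPY copy_sink FROM '/path/to/large_dataset.csv' WITH (FORMAT csv);
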
I've switched the patch to waiting on author for now.  Thanks for your
perseverance here.  I understand that's not easy to follow up with
patches and reviews (^_^;)
--
Michael


signature.asc
Description: PGP signature


Re: speed up a logical replica setup

2024-01-25 Thread Euler Taveira
On Thu, Jan 25, 2024, at 6:05 AM, Hayato Kuroda (Fujitsu) wrote:
> 01.
> ```
> /* Options */
> static char *pub_conninfo_str = NULL;
> static SimpleStringList database_names = {NULL, NULL};
> static int wait_seconds = DEFAULT_WAIT;
> static bool retain = false;
> static bool dry_run = false;
> ```
> 
> Just to confirm - is there a policy why we store the specified options? If you
> want to store as global ones, username and port should follow (my fault...).
> Or, should we have a structure to store them?

It is a matter of style I would say. Check other client applications. Some of
them also use global variables. There are others that group options into a
struct. I would say that since it has a short lifetime, I don't think the
current style is harmful.

> 04.
> ```
> {"dry-run", no_argument, NULL, 'n'},
> ```
> 
> I'm not sure why the dry_run mode exists. In terms of pg_resetwal, it shows
> which values would be changed based on the input. As for pg_upgrade, it
> checks
> whether the node can be upgraded for now. I think, we should have the checking
> feature, so it should be renamed to --check. Also, the process should exit 
> earlier
> at that time.

It is extremely useful because it checks (a) whether an existing physical
replication setup is prepared for logical replication, (b) the GUCs (is
max_wal_senders sufficient for this pg_subscriber command? Is
max_replication_slots sufficient to set up logical replication even though
some replication slots are already in use?), (c) connectivity, and (d)
credentials.

> 05.
> I felt we should accept some settings from environment variables, like 
> pg_upgrade.
> Currently, the below items should be accepted.
> 
> - data directory
> - username
> - port
> - timeout

Maybe PGDATA.

> 06.
> ```
> pg_logging_set_level(PG_LOG_WARNING);
> ```
> 
> If the default log level is warning, there are no ways to output debug logs.
> (-v option only raises one, so INFO would be output)
> I think it should be PG_LOG_INFO.

You need to specify multiple -v options.

> 07.
> Can we combine verifications into two functions, e.g., check_primary() and 
> check_standby/check_subscriber()?

I think v9 does it.

> 08.
> Not sure, but if we want to record outputs by pg_subscriber, the sub-directory
> should be created. The name should contain the timestamp.

The log file already contains the timestamp. Why?

> 09.
> Not sure, but should we check max_slot_wal_keep_size of primary server? It can
> avoid failures when starting logical replication.

A broken physical replication *before* running this tool is its responsibility?
Hmm. We might add another check that can be noticed during dry run mode.

> 10.
> ```
> nslots_new = nslots_old + dbarr.ndbs;
> 
> if (nslots_new > max_replication_slots)
> {
> pg_log_error("max_replication_slots (%d) must be greater than or equal to "
> "the number of replication slots required (%d)", max_replication_slots, 
> nslots_new);
> exit(1);
> }
> ```
> 
> I think standby server must not have replication slots. Because subsequent
> pg_resetwal command discards all the WAL file, so WAL records pointed by them
> are removed. Currently pg_resetwal does not raise ERROR at that time.

Again, dry run mode might provide a message for it.

> 11.
> ```
> /*
> * Stop the subscriber if it is a standby server. Before executing the
> * transformation steps, make sure the subscriber is not running because
> * one of the steps is to modify some recovery parameters that require a
> * restart.
> */
> if (stat(pidfile, &statbuf) == 0)
> ```
> 
> I kept just in case, but I'm not sure it is still needed. How do you think?
> Removing it can reduce an inclusion of pidfile.h.

Are you suggesting another way to check if the standby is up and running?

> 12.
> ```
> pg_ctl_cmd = psprintf("\"%s/pg_ctl\" stop -D \"%s\" -s",
>   standby.bindir, standby.pgdata);
> rc = system(pg_ctl_cmd);
> pg_ctl_status(pg_ctl_cmd, rc, 0);
> ```
> 
> 
> There are two places to stop the instance. Can you divide it into a function?

Yes.

> 13.
> ```
> * A temporary replication slot is not used here to avoid keeping a
> * replication connection open (depending when base backup was taken, the
> * connection should be open for a few hours).
> */
> conn = connect_database(primary.base_conninfo, dbarr.perdb[0].dbname);
> if (conn == NULL)
> exit(1);
> consistent_lsn = create_logical_replication_slot(conn, true,
> &dbarr.perdb[0]);
> ```
> 
> I didn't notice the comment, but still not sure the reason. Why we must 
> reserve
> the slot until pg_subscriber finishes? IIUC, the slot would be never used, it
> is created only for getting a consistent_lsn. So we do not have to keep.
> Also, just before, logical replication slots for each databases are created, 
> so
> WAL records are surely reserved.

This comment needs to be updated. It was written at the time I was pursuing
base backup support too. It doesn't matter if you remove this transient
replication slot earlier because all of the replication slots cr

Re: pg_column_toast_chunk_id: a function to get a chunk ID of a TOASTed value

2024-01-25 Thread Yugo NAGATA
On Tue, 2 Jan 2024 08:00:00 +0800
jian he  wrote:

> On Mon, Nov 6, 2023 at 8:00 AM jian he  wrote:
> >
> > minor doc issues.
> > Returns the chunk id of the TOASTed value, or NULL if the value is not 
> > TOASTed.
> > Should it be "chunk_id"?

Thank you for your suggestion. As you pointed out, it is called "chunk_id" 
in the documentation, so I rewrote it and also added a link to the section
where the TOAST table structure is explained.

> > you may place it after pg_create_logical_replication_slot entry to
> > make it look like alphabetical order.

I've been thinking about where we should place the function in the doc,
and I decided to place it in the table of Database Object Size Functions
because I think pg_column_toast_chunk_id would also assist in understanding
the result of the size functions, similar to pg_column_compression; that is,
those functions can explain why a column could hold such a large value.

> > There is no test. maybe we can add following to 
> > src/test/regress/sql/misc.sql
> > create table val(t text);
> > INSERT into val(t) SELECT string_agg(
> >   chr((ascii('B') + round(random() * 25)) :: integer),'')
> > FROM generate_series(1,2500);
> > select pg_column_toast_chunk_id(t) is  not null from val;
> > drop table val;

Thank you for the test proposal. However, if we add a test, I want
to check that the chunk_id returned by the function exists in the
TOAST table, and that it returns NULL if the value is not TOASTed.
For that purpose, I wrote a test using dynamic SQL, since the name
of the TOAST table has to be generated from the main table's OID.

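Roughly, such a check looks like the following sketch (not the exact
regression test in the attached patch; the generated string must be long
enough to be TOASTed):

-- Sketch: the chunk_id reported for a TOASTed value should exist in the
-- table's TOAST table, whose name is derived from the main table's OID.
CREATE TABLE toast_test (t text);
INSERT INTO toast_test
  SELECT string_agg(chr((ascii('B') + round(random() * 25))::int), '')
  FROM generate_series(1, 10000);
DO $$
DECLARE
  toast_rel regclass;
  found     boolean;
BEGIN
  SELECT reltoastrelid::regclass INTO toast_rel
    FROM pg_class WHERE oid = 'toast_test'::regclass;
  EXECUTE format(
    'SELECT EXISTS (SELECT 1 FROM %s WHERE chunk_id IN
        (SELECT pg_column_toast_chunk_id(t) FROM toast_test))', toast_rel)
    INTO found;
  RAISE NOTICE 'chunk_id found in TOAST table: %', found;
END $$;
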
> Hi
> the main C function (pg_column_toast_chunk_id)  I didn't change.
> I added tests as mentioned above.
> tests put it on src/test/regress/sql/misc.sql, i hope that's fine.
> I placed pg_column_toast_chunk_id in "Table 9.99. Database Object
> Location Functions" (below Table 9.98. Database Object Size
> Functions).

I could not find any change in your patch compared to my previous patch.
Maybe you attached the wrong file. I have attached an updated patch based
on your review, including the documentation fixes and a test.
What do you think about it?

Regards,
Yugo Nagata

-- 
Yugo NAGATA 
>From 97af1b2300ecf07a34923da87a9d84e6aa963956 Mon Sep 17 00:00:00 2001
From: Yugo Nagata 
Date: Wed, 29 Mar 2023 09:59:25 +0900
Subject: [PATCH v3] Add pg_column_toast_chunk_id function

This function returns the chunk_id of an on-disk TOASTed value, or
NULL if the value is un-TOASTed or not on disk. This enables users to
know which columns are actually TOASTed. This function is also useful
to identify a problematic row when an error like
  "ERROR:  unexpected chunk number ... (expected ...) for toast value"
occurs.
---
 doc/src/sgml/func.sgml  | 17 ++
 src/backend/utils/adt/varlena.c | 41 +
 src/include/catalog/pg_proc.dat |  3 +++
 3 files changed, 61 insertions(+)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 210c7c0b02..2d82331323 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -28078,6 +28078,23 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset

   
 
+  
+   
+
+ pg_column_toast_chunk_id
+
+pg_column_toast_chunk_id ( "any" )
+oid
+   
+   
+Shows the chunk_id of an on-disk
+TOASTed value. Returns NULL
+if the value is un-TOASTed or not on-disk.
+See  for details about
+TOAST.
+   
+  
+
   

 
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 543afb66e5..84d36781a4 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -5105,6 +5105,47 @@ pg_column_compression(PG_FUNCTION_ARGS)
 	PG_RETURN_TEXT_P(cstring_to_text(result));
 }
 
+/*
+ * Return the chunk_id of the on-disk TOASTed value.
+ * Return NULL if the value is unTOASTed or not on disk.
+ */
+Datum
+pg_column_toast_chunk_id(PG_FUNCTION_ARGS)
+{
+	int			typlen;
+	struct varlena *attr;
+	struct varatt_external toast_pointer;
+
+	/* On first call, get the input type's typlen, and save at *fn_extra */
+	if (fcinfo->flinfo->fn_extra == NULL)
+	{
+		/* Lookup the datatype of the supplied argument */
+		Oid			argtypeid = get_fn_expr_argtype(fcinfo->flinfo, 0);
+
+		typlen = get_typlen(argtypeid);
+		if (typlen == 0)		/* should not happen */
+			elog(ERROR, "cache lookup failed for type %u", argtypeid);
+
+		fcinfo->flinfo->fn_extra = MemoryContextAlloc(fcinfo->flinfo->fn_mcxt,
+	  sizeof(int));
+		*((int *) fcinfo->flinfo->fn_extra) = typlen;
+	}
+	else
+		typlen = *((int *) fcinfo->flinfo->fn_extra);
+
+	if (typlen != -1)
+		PG_RETURN_NULL();
+
+	attr = (struct varlena *) DatumGetPointer(PG_GETARG_DATUM(0));
+
+	if (!VARATT_IS_EXTERNAL_ONDISK(attr))
+		PG_RETURN_NULL();
+
+	VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
+
+	PG_RETURN_OID(toast_pointer.va_va

Rename setup_cancel_handler in pg_dump

2024-01-25 Thread Yugo NAGATA
Hi,

Attached is a simple patch to rename setup_cancel_handler()
in pg_dump/parallel.c. 

I am proposing it because there is a public function with
the same name in fe_utils/cancel.c. I know pg_dump/parallel.c
does not include fe_utils/cancel.h, so there is no conflict,
but I think it is better to use different names to reduce
possible confusion. 

I guess there were no concerns when setup_cancel_handler in
pg_dump/parallel.c was introduced because no function with the
same name existed yet in fe_utils to be shared between client
tools. The public setup_cancel_handler in fe_utils was
introduced in a4fd3aa719e, where this function was moved from
psql.

Regards,
Yugo Nagata

-- 
Yugo NAGATA 
diff --git a/src/bin/pg_dump/parallel.c b/src/bin/pg_dump/parallel.c
index 188186829c..261b23cb3f 100644
--- a/src/bin/pg_dump/parallel.c
+++ b/src/bin/pg_dump/parallel.c
@@ -204,7 +204,7 @@ static ParallelSlot *GetMyPSlot(ParallelState *pstate);
 static void archive_close_connection(int code, void *arg);
 static void ShutdownWorkersHard(ParallelState *pstate);
 static void WaitForTerminatingWorkers(ParallelState *pstate);
-static void setup_cancel_handler(void);
+static void pg_dump_setup_cancel_handler(void);
 static void set_cancel_pstate(ParallelState *pstate);
 static void set_cancel_slot_archive(ParallelSlot *slot, ArchiveHandle *AH);
 static void RunWorker(ArchiveHandle *AH, ParallelSlot *slot);
@@ -550,7 +550,7 @@ sigTermHandler(SIGNAL_ARGS)
 	/*
 	 * Some platforms allow delivery of new signals to interrupt an active
 	 * signal handler.  That could muck up our attempt to send PQcancel, so
-	 * disable the signals that setup_cancel_handler enabled.
+	 * disable the signals that pg_dump_setup_cancel_handler enabled.
 	 */
 	pqsignal(SIGINT, SIG_IGN);
 	pqsignal(SIGTERM, SIG_IGN);
@@ -605,7 +605,7 @@ sigTermHandler(SIGNAL_ARGS)
  * Enable cancel interrupt handler, if not already done.
  */
 static void
-setup_cancel_handler(void)
+pg_dump_setup_cancel_handler(void)
 {
 	/*
 	 * When forking, signal_info.handler_set will propagate into the new
@@ -705,7 +705,7 @@ consoleHandler(DWORD dwCtrlType)
  * Enable cancel interrupt handler, if not already done.
  */
 static void
-setup_cancel_handler(void)
+pg_dump_setup_cancel_handler(void)
 {
 	if (!signal_info.handler_set)
 	{
@@ -737,7 +737,7 @@ set_archive_cancel_info(ArchiveHandle *AH, PGconn *conn)
 	 * important that this happen at least once before we fork off any
 	 * threads.
 	 */
-	setup_cancel_handler();
+	pg_dump_setup_cancel_handler();
 
 	/*
 	 * On Unix, we assume that storing a pointer value is atomic with respect


Re: Use of backup_label not noted in log

2024-01-25 Thread Michael Paquier
On Thu, Jan 25, 2024 at 08:56:52AM -0400, David Steele wrote:
> I would still advocate for a back patch here. It is frustrating to get logs
> from users that just say:
> 
> LOG:  invalid checkpoint record
> PANIC:  could not locate a valid checkpoint record
> 
> It would be very helpful to know what the checkpoint record LSN was in this
> case.

Yes, I've been pestered by this one in the past when debugging corruption
issues.  To me, this would just mean appending an "at %X/%X" to the
PANIC, but perhaps you have more in mind for these code paths?
--
Michael


signature.asc
Description: PGP signature


RE: Fix inappropriate comments in function ReplicationSlotAcquire

2024-01-25 Thread Wei Wang (Fujitsu)
On Thu, Jan 25, 2024 at 20:33 Amit Kapila  wrote:
> On Thu, Jan 25, 2024 at 4:01 PM Wei Wang (Fujitsu)
>  wrote:
> >
> >
> > Yes, agree. I think these two parts have become slightly outdated after the
> > commit 1632ea4. So also tried to fix the first part of the comment.
> > Attach the new patch.
> >
> 
> How about changing it to something simple like:
> diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
> index f2781d0455..84c257a7aa 100644
> --- a/src/backend/replication/slot.c
> +++ b/src/backend/replication/slot.c
> @@ -465,10 +465,7 @@ retry:
> 
> LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
> 
> -   /*
> -* Search for the slot with the specified name if the slot to acquire 
> is
> -* not given. If the slot is not found, we either return -1 or
> error out.
> -*/
> +/* Check if the slot exits with the given name. */
> s = SearchNamedReplicationSlot(name, false);
> if (s == NULL || !s->in_use)
> {

It looks good to me. So, I updated the patch as suggested.

Regards,
Wang Wei


v3-0001-Fix-inappropriate-comments-in-function-Replicatio.patch
Description:  v3-0001-Fix-inappropriate-comments-in-function-Replicatio.patch


Re: [PATCH] Add native windows on arm64 support

2024-01-25 Thread Michael Paquier
On Thu, Jan 25, 2024 at 04:52:30PM -0500, Dave Cramer wrote:
> On Thu, 25 Jan 2024 at 16:32, Andrew Dunstan  wrote:
>> On 2024-01-25 Th 16:17, Dave Cramer wrote:
>> Yeah, I think the default Developer Command Prompt for VS2022 is set up
>> for x86 builds. AIUI you should start by executing "vcvarsall x64_arm64".
> 
> Yup, now I'm in the same state you are

Wait a minute here.  Based on [1], x64_arm64 means you can use an x64
host to produce ARM64 builds, but these will not be able to run on the
host where they were built.  How much of the patch posted upthread is
required to produce such builds?  Basically everything from it, I
guess, so that the build dependencies can be satisfied?

[1]: 
https://learn.microsoft.com/en-us/cpp/build/building-on-the-command-line?view=msvc-170
--
Michael


signature.asc
Description: PGP signature


Re: [EXTERNAL] Re: Add non-blocking version of PQcancel

2024-01-25 Thread vignesh C
On Wed, 20 Dec 2023 at 19:17, Jelte Fennema-Nio  wrote:
>
> On Thu, 14 Dec 2023 at 13:57, Jelte Fennema-Nio  wrote:
> > I changed all the places that were not adhering to those spellings.
>
> It seems I forgot a /g on my sed command to do this so it turned out I
> missed one that caused the test to fail to compile... Attached is a
> fixed version.
>
> I also updated the patchset to use the EOF detection provided by
> 0a5c46a7a488f2f4260a90843bb9de6c584c7f4e instead of introducing a new
> way of EOF detection using a -2 return value.

CFBot shows that the patch does not apply anymore as in [1]:
patching file doc/src/sgml/libpq.sgml
...
patching file src/interfaces/libpq/exports.txt
Hunk #1 FAILED at 191.
1 out of 1 hunk FAILED -- saving rejects to file
src/interfaces/libpq/exports.txt.rej
patching file src/interfaces/libpq/fe-connect.c

Please post an updated version for the same.

[1] - http://cfbot.cputube.org/patch_46_3511.log

Regards,
Vignesh




Re: Printing backtrace of postgres processes

2024-01-25 Thread vignesh C
On Sun, 5 Nov 2023 at 01:49, Bharath Rupireddy
 wrote:
>
> On Thu, Jul 20, 2023 at 8:22 PM Daniel Gustafsson  wrote:
> >
> > > On 11 Jan 2023, at 15:44, Bharath Rupireddy 
> > >  wrote:
> > >
> > > On Wed, Nov 30, 2022 at 11:43 AM Bharath Rupireddy
> > >  wrote:
> > >>
> > >> I'm attaching the v22 patch set for further review.
> > >
> > > Needed a rebase due to 216a784829c2c5f03ab0c43e009126cbb819e9b2.
> > > Attaching v23 patch set for further review.
> >
> > This thread has stalled for well over 6 months with the patch going from CF 
> > to
> > CF.  From skimming the thread it seems that a lot of the details have been
> > ironed out with most (all?) objections addressed.  Is any committer 
> > interested
> > in picking this up?  If not we should probably mark it returned with 
> > feedback.
>
> Rebase needed due to function oid clash. Picked the new OID with the
> help of src/include/catalog/unused_oids. PSA v24 patch set.

Rebase needed due to changes in parallel_schedule. PSA v25 patch set.

Regards,
Vignesh
From 32fa1d898b2599b836a2265f746acf230ffb358e Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Sat, 4 Nov 2023 18:06:36 +
Subject: [PATCH v25 1/2] Move sending multiplexed-SIGUSR1 signal code to a
 function

Add a new function hosting the common code for sending
multiplexed-SIGUSR1 signal to a backend process. This function
will also be used as-is by an upcoming commit reducing the code
duplication.
---
 src/backend/storage/ipc/procarray.c | 60 +
 src/backend/utils/adt/mcxtfuncs.c   | 49 ++-
 src/include/storage/procarray.h |  1 +
 3 files changed, 64 insertions(+), 46 deletions(-)

diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index ee2d7f8585..f6465529e7 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -3097,6 +3097,66 @@ HaveVirtualXIDsDelayingChkpt(VirtualTransactionId *vxids, int nvxids, int type)
 	return result;
 }
 
+/*
+ * SendProcSignalBackendOrAuxproc -- check if the process with given pid is a
+ * backend or an auxiliary process and send it the SIGUSR1 signal for a given
+ * reason.
+ *
+ * Returns true if sending the signal was successful, false otherwise.
+ */
+bool
+SendProcSignalBackendOrAuxproc(int pid, ProcSignalReason reason)
+{
+	PGPROC	   *proc;
+	BackendId	backendId = InvalidBackendId;
+
+	proc = BackendPidGetProc(pid);
+
+	/*
+	 * See if the process with given pid is a backend or an auxiliary process.
+	 *
+	 * If the given process is a backend, use its backend id in
+	 * SendProcSignal() later to speed up the operation. Otherwise, don't do
+	 * that because auxiliary processes (except the startup process) don't
+	 * have a valid backend id.
+	 */
+	if (proc != NULL)
+		backendId = proc->backendId;
+	else
+		proc = AuxiliaryPidGetProc(pid);
+
+	/*
+	 * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid
+	 * isn't valid; but by the time we reach kill(), a process for which we
+	 * get a valid proc here might have terminated on its own.  There's no way
+	 * to acquire a lock on an arbitrary process to prevent that. But since
+	 * this mechanism is usually used to debug a backend or an auxiliary
+	 * process running and consuming lots of memory or a long running process,
+	 * that it might end on its own first and its memory contexts are not
+	 * logged or backtrace not logged is not a problem.
+	 */
+	if (proc == NULL)
+	{
+		/*
+		 * This is just a warning so a loop-through-resultset will not abort
+		 * if one backend terminated on its own during the run.
+		 */
+		ereport(WARNING,
+(errmsg("PID %d is not a PostgreSQL server process", pid)));
+		return false;
+	}
+
+	if (SendProcSignal(pid, reason, backendId) < 0)
+	{
+		/* Again, just a warning to allow loops */
+		ereport(WARNING,
+(errmsg("could not send signal to process %d: %m", pid)));
+		return false;
+	}
+
+	return true;
+}
+
 /*
  * BackendPidGetProc -- get a backend's PGPROC given its PID
  *
diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
index 4708d73f5f..f7b4f8dac1 100644
--- a/src/backend/utils/adt/mcxtfuncs.c
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -145,51 +145,8 @@ Datum
 pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
 {
 	int			pid = PG_GETARG_INT32(0);
-	PGPROC	   *proc;
-	BackendId	backendId = InvalidBackendId;
+	bool		result;
 
-	proc = BackendPidGetProc(pid);
-
-	/*
-	 * See if the process with given pid is a backend or an auxiliary process.
-	 *
-	 * If the given process is a backend, use its backend id in
-	 * SendProcSignal() later to speed up the operation. Otherwise, don't do
-	 * that because auxiliary processes (except the startup process) don't
-	 * have a valid backend id.
-	 */
-	if (proc != NULL)
-		backendId = proc->backendId;
-	else
-		proc = AuxiliaryPidGetProc(pid);
-
-	/*
-	 * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid
-	 * isn't valid; but by the time w
