Re: Make COPY format extendable: Extract COPY TO format implementations
Hi,

Thanks for trying these patches!

In "Re: Make COPY format extendable: Extract COPY TO format implementations" on Thu, 25 Jan 2024 10:53:58 +0800, jian he wrote:

> COPY data TO '/dev/null' WITH (FORMAT csv) \watch count=5

Wow! I didn't know the "\watch count="! I'll use it.

> Time: 668.996 ms
> Time: 596.254 ms
> Time: 592.723 ms
> Time: 591.663 ms
> Time: 590.803 ms

It seems that 5 times isn't enough for this case, as Michael said. But thanks for trying!

Thanks,
--
kou
Re: UUID v7
I am against turning the DBMS into another C++, in which they do not so much design something new as fix bugs in production after a crash.

As for partitioning, I already wrote to Andrey Borodin that we need a special function to generate a partition id using the UUIDv7 timestamp, or even simultaneously with the generation of the timestamp. For example, every month (or so, since precision is not needed here) a new partition is created. Here's a good example: https://elixirforum.com/t/partitioning-postgres-tables-by-timestamp-based-uuids/60916

But without a separate function for extracting the entire timestamp from the UUID! Let's solve this specific problem, and not give the developers a grenade with the safety removed. Many developers have already decided to store the timestamp in UUIDv7, so as not to create a separate created_at field. Then they will delete table records with the old timestamp, etc. Horrible mistakes are simply guaranteed.

Sergey Prokhorenko sergeyprokhore...@yahoo.com.au

On Thursday, 25 January 2024 at 09:51:58 am GMT+3, Andrey M. Borodin wrote:

> On 25 Jan 2024, at 09:40, Nikolay Samokhvalov wrote:
>
> From a practical point of view, these two things are extremely important to
> have to support partitioning. It is better to implement limitations than
> throw them away.

Postgres always was a bit hackerish, allowing slightly more than is safe. I.e. you can define an immutable function that is not really immutable, or turn off autovacuum or fsync. Why bother with safety guards here?

My opinion is that we should have this function to extract the timestamp. Even if it can return strange values for an imprecise RFC implementation.

> On 25 Jan 2024, at 02:15, Jelte Fennema-Nio wrote:
>
> So +1 for erroring when you provide a timestamp outside of that range
> (either too far in the past or too far in the future).

OK, it seems like we have some consensus on ERRORing.

Do we have any other open items? Does v13 address all open items? Maybe let's compose a better error message?

Best regards, Andrey Borodin.
Re: Use of backup_label not noted in log
On Mon, Jan 22, 2024 at 04:36:27PM +0900, Michael Paquier wrote:
> +  if (ControlFile->backupStartPoint != InvalidXLogRecPtr)
>
> Nit 1: I would use XLogRecPtrIsInvalid here.
>
> +  ereport(LOG,
> +          (errmsg("completed backup recovery with redo LSN %X/%X",
> +                  LSN_FORMAT_ARGS(oldBackupStartPoint;
>
> Nit 2: How about adding backupEndPoint in this LOG? That would give:
> "completed backup recovery with redo LSN %X/%X and end LSN %X/%X".

Hearing nothing, I've just applied a version of the patch with these two modifications on HEAD. If this needs tweaks, just let me know.

--
Michael
Re: A compiling warning in jsonb_populate_record_valid
On Thu, Jan 25, 2024 at 4:47 PM Richard Guo wrote:
> On Thu, Jan 25, 2024 at 2:28 PM Amit Langote wrote:
>>
>> On Thu, Jan 25, 2024 at 2:59 PM Richard Guo wrote:
>> > I came across a warning when building master (a044e61f1b) on old GCC (4.8.5).
>> >
>> > jsonfuncs.c: In function ‘jsonb_populate_record_valid’:
>> > ../../../../src/include/nodes/miscnodes.h:53:15: warning: the comparison
>> > will always evaluate as ‘true’ for the address of ‘escontext’ will never
>> > be NULL [-Waddress]
>> >   ((escontext) != NULL && IsA(escontext, ErrorSaveContext) && \
>> >                ^
>> > jsonfuncs.c:2481:23: note: in expansion of macro ‘SOFT_ERROR_OCCURRED’
>> >   return BoolGetDatum(!SOFT_ERROR_OCCURRED(&escontext));
>> >
>> > This was introduced in commit 1edb3b491b, and can be observed on several
>> > systems in the buildfarm too, such as:
>> > https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=arowana&dt=2024-01-25%2004%3A54%3A38&stg=build
>>
>> Thanks for the report.
>>
>> Will apply the attached, which does this:
>>
>> -    return BoolGetDatum(!SOFT_ERROR_OCCURRED(&escontext));
>> +    return BoolGetDatum(!escontext.error_occurred);
>
> Looks good to me.

Thanks for checking, pushed.

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
Re: Split index and table statistics into different types of stats
Hi,

On Tue, Nov 14, 2023 at 09:04:03AM +0100, Drouvot, Bertrand wrote:
> On 11/13/23 9:44 PM, Andres Freund wrote:
> > Hi,
> >
> > It's not nice from a layering POV that we need this level of awareness in
> > bufmgr.c. I wonder if this is an argument for first splitting out stats like
> > blocks_hit, blocks_fetched into something like "relfilenode stats" - they're
> > agnostic of the relkind.
>
> Thanks for looking at it! Yeah I think that would make a lot of sense
> to track some stats per relfilenode.
>
> > There aren't that many such stats right now, admittedly, but I think we'll
> > want to also track dirtied, written blocks on a per relation basis once we
> > can (i.e. we key the relevant stats by relfilenode instead of oid, so we
> > can associate stats when writing out buffers).
>
> Agree. Then, I think that would make sense to start this effort before the
> split index/table one. I can work on a per relfilenode stat patch first.
>
> Does this patch ordering make sense to you?
>
> 1) Introduce per relfilenode stats
> 2) Split index and table stats

Just a quick update on this: I had a chat with Andres at pgconf.eu and we agreed on the above ordering so that:

1) I started working on relfilenode stats (I hope to be able to provide a POC patch soon).
2) The CF entry [1] status related to this thread has been changed to "Waiting on Author".

[1]: https://commitfest.postgresql.org/47/4792/

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Re: Make COPY format extendable: Extract COPY TO format implementations
Hi,

In "Re: Make COPY format extendable: Extract COPY TO format implementations" on Thu, 25 Jan 2024 12:17:55 +0900, Michael Paquier wrote:

> +typedef bool (*CopyToProcessOption_function) (CopyToState cstate, DefElem *defel);
> +typedef int16 (*CopyToGetFormat_function) (CopyToState cstate);
> +typedef void (*CopyToStart_function) (CopyToState cstate, TupleDesc tupDesc);
> +typedef void (*CopyToOneRow_function) (CopyToState cstate, TupleTableSlot *slot);
> +typedef void (*CopyToEnd_function) (CopyToState cstate);
>
> We don't really need a set of typedefs here, let's put the definitions
> in the CopyToRoutine struct instead.

OK. I'll do it.

> +extern CopyToRoutine CopyToRoutineText;
> +extern CopyToRoutine CopyToRoutineCSV;
> +extern CopyToRoutine CopyToRoutineBinary;
>
> All that should IMO remain in copyto.c and copyfrom.c in the initial
> patch doing the refactoring. Why not using a fetch function instead
> that uses a string in input? Then you can call that once after
> parsing the List of options in ProcessCopyOptions().

OK. How about the following for the fetch function signature?

    extern CopyToRoutine *GetBuiltinCopyToRoutine(const char *format);

We may introduce an enum and use it:

    typedef enum CopyBuiltinFormat
    {
        COPY_BUILTIN_FORMAT_TEXT = 0,
        COPY_BUILTIN_FORMAT_CSV,
        COPY_BUILTIN_FORMAT_BINARY,
    } CopyBuiltinFormat;

    extern CopyToRoutine *GetBuiltinCopyToRoutine(CopyBuiltinFormat format);

> +/* All "text" and "csv" options are parsed in ProcessCopyOptions(). We may
> + * move the code to here later. */
>
> Some areas, like this comment, are written in an incorrect format.

Oh, sorry. I assumed that the comment style was adjusted by pgindent. I'll use the following style:

    /*
     * ...
     */

> +getTypeBinaryOutputInfo(attr->atttypid, &out_func_oid, &isvarlena);
> +fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
>
> Actually, this split is interesting. It is possible for a custom
> format to plug in a custom set of out functions. Did you make use of
> something custom for your own stuff?

I didn't. My PoC custom COPY format handler for Apache Arrow just handles integer and text for now. It doesn't use cstate->out_functions, because cstate->out_functions may not return a valid binary format value for Apache Arrow. So it formats each value by itself.

I'll choose one of the following for a custom type (that isn't supported by Apache Arrow, e.g. PostGIS types):

1. Report an unsupported error
2. Call an output function for Apache Arrow provided by the custom type

> Actually, could it make sense to
> split the assignment of cstate->out_functions into its own callback?

Yes, because we need to use getTypeBinaryOutputInfo() for "binary" and getTypeOutputInfo() for "text" and "csv".

> Sure, that's part of the start phase, but at least it would make clear
> that a custom method *has* to assign these OIDs to work. The patch
> implies that as a rule, without a comment that CopyToStart *must* set
> up these OIDs.

CopyToStart doesn't need to set them up if the handler doesn't use cstate->out_functions.

> I think that 0001 and 0005 should be handled first, as pieces
> independent of the rest. Then we could move on with 0002~0004 and
> 0006~0008.

OK. I'll focus on 0001 and 0005 for now. I'll restart 0002-0004/0006-0008 after 0001 and 0005 are accepted.

Thanks,
--
kou
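For illustration, a minimal sketch of how the enum-based fetch function proposed above could dispatch to the built-in routines. The three CopyToRoutine externs and the CopyBuiltinFormat enum are the ones quoted in this mail; the function body and error message are only an assumption, not the patch's actual code:

    /* Sketch only: dispatch to a built-in CopyToRoutine based on the
     * proposed CopyBuiltinFormat enum; not actual patch code. */
    CopyToRoutine *
    GetBuiltinCopyToRoutine(CopyBuiltinFormat format)
    {
        switch (format)
        {
            case COPY_BUILTIN_FORMAT_TEXT:
                return &CopyToRoutineText;
            case COPY_BUILTIN_FORMAT_CSV:
                return &CopyToRoutineCSV;
            case COPY_BUILTIN_FORMAT_BINARY:
                return &CopyToRoutineBinary;
        }

        elog(ERROR, "unrecognized built-in COPY format: %d", (int) format);
        return NULL;            /* keep compiler quiet */
    }

One design point of the enum variant is that ProcessCopyOptions() would map the user-supplied format string to the enum exactly once, and every later lookup becomes a simple switch rather than repeated string comparisons.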
Re: Make COPY format extendable: Extract COPY TO format implementations
Hi,

In "Re: Make COPY format extendable: Extract COPY TO format implementations" on Thu, 25 Jan 2024 13:36:03 +0900, Masahiko Sawada wrote:

> I've experimented with a similar optimization for csv and text format;
> have different callbacks for text and csv format and remove
> "if (cstate->opts.csv_mode)" branches. I've attached a patch for that.
> Here are results:
>
> HEAD w/ 0001 patch + remove branches:
> binary 2824.502 ms
> text 2715.264 ms
> csv 2803.381 ms
>
> The numbers look better now. I'm not sure these are within a noise
> range but it might be worth considering having different callbacks for
> text and csv formats.

Wow! Interesting. I tried the approach before, but I didn't see any difference with it. It may depend on my environment, though. I'll import the approach into the next patch set so that others can try it easily.

Thanks,
--
kou
RE: speed up a logical replica setup
Dear hackers,

Here are comments for the v8 patch set. I may revise them by myself, but I want to post them here to share all of them.

01.

```
/* Options */
static char *pub_conninfo_str = NULL;
static SimpleStringList database_names = {NULL, NULL};
static int wait_seconds = DEFAULT_WAIT;
static bool retain = false;
static bool dry_run = false;
```

Just to confirm - is there a policy on why we store the specified options? If you want to store them as global ones, username and port should follow (my fault...). Or, should we have a structure to store them?

02.

```
{"subscriber-conninfo", required_argument, NULL, 'S'},
```

This is my fault, but "--subscriber-conninfo" still remains. It should be removed if it is not really needed.

03.

```
{"username", required_argument, NULL, 'u'},
```

Should we accept 'U' instead of 'u'?

04.

```
{"dry-run", no_argument, NULL, 'n'},
```

I'm not sure why the dry_run mode exists. In the case of pg_resetwal, it shows which values would be changed based on the input. As for pg_upgrade, it checks whether the node can be upgraded. I think we should have the checking feature, so it should be renamed to --check. Also, the process should exit earlier in that case.

05.

I felt we should accept some settings from environment variables, like pg_upgrade. Currently, the below items should be accepted:

- data directory
- username
- port
- timeout

06.

```
pg_logging_set_level(PG_LOG_WARNING);
```

If the default log level is warning, there is no way to output debug logs (the -v option only raises the level by one, so INFO would be output). I think it should be PG_LOG_INFO.

07.

Can we combine verifications into two functions, e.g., check_primary() and check_standby/check_subscriber()?

08.

Not sure, but if we want to record outputs from pg_subscriber, a sub-directory should be created. The name should contain the timestamp.

09.

Not sure, but should we check max_slot_wal_keep_size of the primary server? It can avoid failures when starting logical replication.

10.

```
nslots_new = nslots_old + dbarr.ndbs;

if (nslots_new > max_replication_slots)
{
    pg_log_error("max_replication_slots (%d) must be greater than or equal to "
                 "the number of replication slots required (%d)",
                 max_replication_slots, nslots_new);
    exit(1);
}
```

I think the standby server must not have replication slots, because the subsequent pg_resetwal command discards all the WAL files, so the WAL records pointed to by those slots are removed. Currently pg_resetwal does not raise an ERROR at that time.

11.

```
/*
 * Stop the subscriber if it is a standby server. Before executing the
 * transformation steps, make sure the subscriber is not running because
 * one of the steps is to modify some recovery parameters that require a
 * restart.
 */
if (stat(pidfile, &statbuf) == 0)
```

I kept this just in case, but I'm not sure it is still needed. What do you think? Removing it would avoid an inclusion of pidfile.h.

12.

```
pg_ctl_cmd = psprintf("\"%s/pg_ctl\" stop -D \"%s\" -s", standby.bindir,
                      standby.pgdata);
rc = system(pg_ctl_cmd);
pg_ctl_status(pg_ctl_cmd, rc, 0);
```

There are two places that stop the instance. Can you divide this into a function? (A sketch follows after this list of comments.)

13.

```
 * A temporary replication slot is not used here to avoid keeping a
 * replication connection open (depending when base backup was taken, the
 * connection should be open for a few hours).
 */
conn = connect_database(primary.base_conninfo, dbarr.perdb[0].dbname);
if (conn == NULL)
    exit(1);
consistent_lsn = create_logical_replication_slot(conn, true,
                                                 &dbarr.perdb[0]);
```

I hadn't noticed the comment, but I'm still not sure about the reason. Why must we reserve the slot until pg_subscriber finishes? IIUC, the slot is never used; it is created only for getting a consistent_lsn, so we do not have to keep it. Also, just before this, logical replication slots for each database are created, so WAL records are surely reserved.

14.

```
pg_log_info("starting the subscriber");
start_standby_server(&standby, subport, server_start_log);
```

This info message should be in the function.

15.

```
/*
 * Create a subscription for each database.
 */
for (i = 0; i < dbarr.ndbs; i++)
```

This can be divided into a function, like create_all_subscriptions().

16.

My fault: usage() must be updated.

17. use_primary_slot_name

```
if (PQntuples(res) != 1)
{
    pg_log_error("could not obtain replication slot information: got %d rows, expected %d row",
                 PQntuples(res), 1);
    return NULL;
```
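Regarding comment 12, a rough sketch of the kind of helper being suggested, built only from the calls quoted above. The function name and the type of the "standby" argument are assumptions for illustration, not the patch's actual code:

    /* Illustrative sketch only: wrap the quoted "pg_ctl stop" sequence so both
     * call sites can share it.  "StandbyInfo" is an assumed type name standing
     * in for whatever struct holds bindir/pgdata in the patch. */
    static void
    stop_standby_server(StandbyInfo *standby)
    {
        char       *pg_ctl_cmd;
        int         rc;

        pg_ctl_cmd = psprintf("\"%s/pg_ctl\" stop -D \"%s\" -s",
                              standby->bindir, standby->pgdata);
        rc = system(pg_ctl_cmd);
        pg_ctl_status(pg_ctl_cmd, rc, 0);
        pfree(pg_ctl_cmd);
    }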
Re: Improve WALRead() to suck data directly from WAL buffers when possible
On Tue, Jan 23, 2024 at 9:37 AM Jeff Davis wrote:
>
> On Mon, 2024-01-22 at 12:12 -0800, Andres Freund wrote:
> > I still think that anything that requires such checks shouldn't be
> > merged. It's completely bogus to check page contents for validity when we
> > should have metadata telling us which range of the buffers is valid and
> > which not.
>
> The check seems entirely unnecessary, to me. A leftover from v18?
>
> I have attached a new patch (version "19j") to illustrate some of my
> previous suggestions. I didn't spend a lot of time on it so it's not
> ready for commit, but I believe my suggestions are easier to understand
> in code form.
>
> Note that, right now, it only works for XLogSendPhysical(). I believe
> it's best to just make it work for 1-3 callers that we understand well,
> and we can generalize later if it makes sense.

+1 to do it for XLogSendPhysical() first. Enabling it for others can just be done as something like the attached v20-0003.

> I'm still not clear on why some callers are reading XLOG_BLCKSZ
> (expecting zeros at the end), and if it's OK to just change them to use
> the exact byte count.

"expecting zeros at the end" - this can't always be true, as the WAL can get flushed after determining the flush ptr and before reading it from the WAL file. FWIW, here's what I've tried previously - https://github.com/BRupireddy2/postgres/tree/ensure_extra_read_WAL_page_is_zero_padded_at_the_end_WIP; the tests hit the Assert(false); added. Which means the zero-padding comment around the WALRead() call sites isn't quite right:

    /*
     * Even though we just determined how much of the page can be validly read
     * as 'count', read the whole page anyway. It's guaranteed to be
     * zero-padded up to the page boundary if it's incomplete.
     */
    if (!WALRead(state, cur_page, targetPagePtr, XLOG_BLCKSZ, tli,

I think this needs to be discussed separately. If okay, I'll start a new thread.

> Also, if we've detected that the first requested buffer has been
> evicted, is there any value in continuing the loop to see if more
> recent buffers are available? For example, if the requested LSNs range
> over buffers 4, 5, and 6, and 4 has already been evicted, should we try
> to return LSN data from 5 and 6 at the proper offset in the dest
> buffer? If so, we'd need to adjust the API so the caller knows what
> parts of the dest buffer were filled in.

I'd second this capability for now to keep the API simple and clear, but we can consider expanding it as needed.

I reviewed the v19j and attached v20 patch set:

1.

    * The caller must ensure that it's reasonable to read from the WAL buffers,
    * i.e. that the requested data is from the current timeline, that we're not
    * in recovery, etc.

I still think XLogReadFromBuffers can just return in any of the above cases instead of relying on comments. I feel we must assume the caller is going to ask for WAL from a different timeline and/or in recovery and design the API to deal with it. Done that way in the v20 patch.

2. Fixed some typos, reworded a few comments (i.e. used "current insert/write position" instead of "Insert/Write pointer" like elsewhere), ran pgindent.

3.

    - * XXX probably this should be improved to suck data directly from the
    - * WAL buffers when possible.

Removed the above comment before WALRead() since we have that facility now. Perhaps we can say the callers can suck data directly from the WAL buffers using XLogReadFromBuffers. But I have no strong opinion on this.

4.

    + * Most callers will have already updated LogwrtResult when determining
    + * how far to read, but it's OK if it's out of date. (XXX: is it worth
    + * taking a spinlock to update LogwrtResult and check again before calling
    + * WaitXLogInsertionsToFinish()?)

If the callers use GetFlushRecPtr() to determine how far to read, LogwrtResult will be *reasonably* latest, otherwise not. If LogwrtResult is a bit old, XLogReadFromBuffers will call WaitXLogInsertionsToFinish, which will just loop over all insertion locks and return. As far as the current WAL readers are concerned, we don't need an explicit spinlock to determine LogwrtResult because all of them use GetFlushRecPtr() to determine how far to read. If there's any caller that's not updating LogwrtResult at all, we can consider reading LogwrtResult ourselves in future.

5. I think the two requirements specified at https://www.postgresql.org/message-id/20231109205836.zjoawdrn4q77yemv%40awork3.anarazel.de still hold with the v19j.

5.1 Never allow WAL being read that's past XLogBytePosToRecPtr(XLogCtl->Insert->CurrBytePos) as it does not exist.

5.2 If the to-be-read LSN is between XLogCtl->LogwrtResult->Write and XLogBytePosToRecPtr(Insert->CurrBytePos) we need to call WaitXLogInsertionsToFinish() before copying the data.

    +if (upto > LogwrtResult.Write)
    +{
    +    XLogRecPtr writtenUpto = WaitXLogInsertionsToFinish(upto, false);
    +
    +    upto = Min(upto, writtenUpto);
    +    nbytes = upto - st
Re: remaining sql/json patches
On 9.16.4. JSON_TABLE:

`
name type FORMAT JSON [ENCODING UTF8] [ PATH json_path_specification ]

Inserts a composite SQL/JSON item into the output row
`

I am not sure about "Inserts a composite SQL/JSON item into the output row". I think it means: for any type whose typcategory is TYPCATEGORY_STRING, if FORMAT JSON is specified explicitly, the output value (text type) will be a legal json type representation.

I also did a minor refactor on JSON_VALUE_OP, jsexpr->omit_quotes.

diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index 74bc6f49..56ab12ac 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -4289,7 +4289,7 @@ ExecInitJsonExpr(JsonExpr *jexpr, ExprState *state,
 	 * nodes.
 	 */
 	jsestate->escontext.type = T_ErrorSaveContext;
-	if (jexpr->result_coercion || jexpr->omit_quotes)
+	if (jexpr->result_coercion)
 	{
 		jsestate->jump_eval_result_coercion =
 			ExecInitJsonExprCoercion(state, jexpr->result_coercion,
diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index 31c0847e..9802b4ae 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -4363,6 +4363,7 @@ transformJsonFuncExpr(ParseState *pstate, JsonFuncExpr *func)
 			jsexpr->returning->format->format_type = JS_FORMAT_DEFAULT;
 			jsexpr->returning->format->encoding = JS_ENC_DEFAULT;
 
+			jsexpr->omit_quotes = true;
 			jsexpr->result_coercion = coerceJsonFuncExprOutput(pstate, jsexpr);
 
 			/*

v1-0001-refactor-sqljson_jsontable-related-tests.no-cfbot
Description: Binary data
Fix inappropriate comments in function ReplicationSlotAcquire
Hi,

In the function ReplicationSlotAcquire(), I found there is something missing in the comments below:

```
	/*
	 * Search for the slot with the specified name if the slot to acquire is
	 * not given. If the slot is not found, we either return -1 or error out.
	 */
	s = SearchNamedReplicationSlot(name, false);
	if (s == NULL || !s->in_use)
	{
		LWLockRelease(ReplicationSlotControlLock);

		ereport(ERROR,
				(errcode(ERRCODE_UNDEFINED_OBJECT),
				 errmsg("replication slot \"%s\" does not exist",
						name)));
	}
```

It seems that when the slot is not found, we will only error out and will not return -1.

Attached is a patch to remove the inappropriate description of the return value.

Regards,
Wang Wei

v1-0001-Fix-inappropriate-comments-in-function-Replicatio.patch
Description: v1-0001-Fix-inappropriate-comments-in-function-Replicatio.patch
Re: speed up a logical replica setup
On 24.01.24 00:44, Euler Taveira wrote:
> Subscriber has a different meaning than subscription. Subscription is an SQL
> object. Subscriber is the server (node in replication terminology) where the
> subscription resides. Having said that, pg_createsubscriber doesn't seem a bad
> name because you are creating a new subscriber. (Indeed, you are transforming /
> converting, but "create" seems closer, and users can infer that it is a tool to
> build a new logical replica.)

That makes sense.

(Also, the problem with "convert" etc. is that "convertsubscriber" would imply that you are converting an existing subscriber to something else. It would need to be something like "convertbackup" then, which doesn't seem helpful.)

> I think "convert" and "transform" fit for this case. However, "create",
> "convert" and "transform" have 6, 7 and 9 characters, respectively. I suggest
> that we avoid long names (subscriber already has 10 characters). My preference
> is pg_createsubscriber.

That seems best to me.
Re: Fix inappropriate comments in function ReplicationSlotAcquire
On Thu, Jan 25, 2024 at 2:57 PM Wei Wang (Fujitsu) wrote:
>
> In the function ReplicationSlotAcquire(), I found there is something missing in
> the comments below:
>
> ```
> 	/*
> 	 * Search for the slot with the specified name if the slot to acquire is
> 	 * not given. If the slot is not found, we either return -1 or error out.
> 	 */
> 	s = SearchNamedReplicationSlot(name, false);
> 	if (s == NULL || !s->in_use)
> 	{
> 		LWLockRelease(ReplicationSlotControlLock);
>
> 		ereport(ERROR,
> 				(errcode(ERRCODE_UNDEFINED_OBJECT),
> 				 errmsg("replication slot \"%s\" does not exist",
> 						name)));
> 	}
> ```
>
> It seems that when the slot is not found, we will only error out and will not
> return -1.

You seem to be correct. However, isn't the first part of the comment also slightly confusing? In particular, "... if the slot to acquire is not given." In this function, I don't see the case where a slot to acquire is given.

--
With Regards,
Amit Kapila.
Re: Documentation to upgrade logical replication cluster
On Wed, 24 Jan 2024 at 15:16, Hayato Kuroda (Fujitsu) wrote:
>
> Dear Vignesh,
>
> Thanks for updating the patch! Basically your patch looks good.
> Below lines are my comments for v3.
>
> 01.
>
> ```
> > The output plugins referenced by the slots on the old cluster must be
> > installed in the new PostgreSQL executable directory.
> ```
>
> PostgreSQL must be marked as <productname>.

Modified.

> 02.
>
> ```
> > pg_ctl -D /opt/PostgreSQL/data1 stop -l logfile
> ```
>
> I checked and found that -l was a no-op when `pg_ctl stop` was specified. Can
> we remove it? The documentation does not list -l for the stop command.
> All the similar lines should be fixed as well.

Modified.

> 03.
>
> ```
> On node3, create any tables that were created in
> the upgraded node2 between and now,
> ```
>
> If tables are newly defined on node1 between steps 1 - 11, they are not defined
> on node3. So they must be defined on node3 as well.

The new node1 tables will be created in node2 in step 11. Since we have mentioned to create the tables that were created between step 6 and now, all of the node1 and node2 tables will get created.

> 04.
>
> Even if the referred steps are correct, the ID should be allocated to the step,
> not the para. That's why the rendering is a bit strange.

Modified.

The attached v4 version patch has the changes for the same.

Regards,
Vignesh

From 4abe4ed86d68a5ecb60c3bf29249bb883605fb76 Mon Sep 17 00:00:00 2001
From: Vignesh C
Date: Wed, 13 Dec 2023 14:11:58 +0530
Subject: [PATCH v4] Documentation for upgrading logical replication cluster.

 doc/src/sgml/logical-replication.sgml | 807 ++
 doc/src/sgml/ref/pgupgrade.sgml       | 143 +
 2 files changed, 822 insertions(+), 128 deletions(-)

[The remainder of the attached documentation patch appeared inline here, but its SGML markup did not survive extraction and is omitted.]
Re: Documentation to upgrade logical replication cluster
On Thu, 25 Jan 2024 at 05:45, Peter Smith wrote:
>
> Here are some review comments for patch v3.
>
> ==
> doc/src/sgml/ref/pgupgrade.sgml
>
> 1.
> +
> +    This page does not cover steps to upgrade logical replication
> +    clusters, refer for details on upgrading
> +    logical replication clusters.
> +
>
> I felt that maybe this note was misplaced. Won't it be better to put
> this down in the "Usage" section of this page?
>
> BEFORE
> These are the steps to perform an upgrade with pg_upgrade:
>
> SUGGESTION (or something like this)
> Below are the steps to perform an upgrade with pg_upgrade.
>
> Note, the steps to upgrade logical replication clusters are not
> covered here; refer to the logical replication section for details.

Modified.

> ~~~
>
> 2.
>     Configure the servers for log shipping. (You do not need to run
>     pg_backup_start() and pg_backup_stop()
>     or take a file system backup as the standbys are still synchronized
> -   with the primary.) Only logical slots on the primary are copied to the
> -   new standby, but other slots on the old standby are not copied so must
> -   be recreated manually.
> +   with the primary.) In version 17.0 or later, only logical slots on the
> +   primary are copied to the new standby, but other slots on the old standby
> +   are not copied so must be recreated manually.
>
> This para was still unclear to me. What is version 17.0 referring to
> -- the old_cluster version? Do you mean something like:
> If the old cluster is < v17 then logical slots are not copied. If the
> old_cluster is >= v17 then...

Yes, I have rephrased it now.

> ==
> doc/src/sgml/logical-replication.sgml
>
> 3.
> +
> +    While upgrading a subscriber, write operations can be performed in the
> +    publisher, these changes will be replicated to the subscriber once the
> +    subscriber upgradation is completed.
> +
>
> 3a.
> /publisher, these changes/publisher. These changes/

Modified.

> ~
>
> 3b.
> "upgradation" ??. See [1]
>
> maybe just /upgradation/upgrade/

Modified.

> ~~~
>
> 4. GENERAL - prompts/paths
>
> I noticed in v3 you removed all the cmd prompts like:
> dba@node1:/opt/PostgreSQL/postgres/17/bin$
> dba@node2:/opt/PostgreSQL/postgres/18/bin$
> etc.
>
> I thought those were helpful to disambiguate which server/version was
> being operated on. I wonder if there is some way to keep that information
> still but not make it look like a real current directory that
> Kuroda-san did not like:
>
> e.g. Maybe something like the below is possible?
>
> (dba@node1: v17) pg_upgrade...
> (dba@node2: v18) pg_upgrade...

I did not want to add this, as our current documentation is consistent with how it is documented on the pg_upgrade page at [1].

The v4 version patch attached at [2] has the changes for the same.

[1] - https://www.postgresql.org/docs/devel/pgupgrade.html
[2] - https://www.postgresql.org/message-id/CALDaNm1wCHmBwpLM%3Dd9oBoZqKXOe-TwC-LCcHC9gFy0bazZU6Q%40mail.gmail.com

Regards,
Vignesh
Re: Popcount optimization using AVX512
On 2024-Jan-25, Shankaran, Akash wrote:

> With the updated patch, we observed significant improvements and
> handily beat the previous popcount algorithm performance. No
> regressions in any scenario are observed:
> Platform: Intel Xeon Platinum 8360Y (Icelake) for data sizes 1kb - 64kb.
> Microbenchmark: 2x - 3x gains presently vs 19% previously, on the same
> microbenchmark described initially in this thread.

These are great results.

However, it would be much better if the improved code were available for all relevant builds and activated if a CPUID test determines that the relevant instructions are available, instead of requiring a compile-time flag -- which most builds are not going to use, thus wasting the opportunity for running the optimized code. I suppose this would require patching pg_popcount64_choose() to be more specific.

Looking at the existing code, I would also consider renaming the "_fast" variants to something like pg_popcount32_asml / pg_popcount64_asmq so that you can name the new one pg_popcount64_asmdq or such. (Or maybe leave the 32-bit version alone as "fast/slow", since there's no third option for that one -- or do I misread?)

I also think this needs to move the CFLAGS-decision-making elsewhere; asking the user to get it right is too much of a burden. Is it workable to simply verify compiler support for the additional flags needed, and if so add them to a new CFLAGS_BITUTILS variable or such? We already have the CFLAGS_CRC model that should be easy to follow. Should be easy enough to mostly copy what's in configure.ac and meson.build, right?

Finally, the matter of using ifunc as proposed by Noah seems to be still in the air, with no patches offered for the popcount family. Given that Nathan reports [1] a performance decrease, maybe we should set that thought aside for now and continue to use function pointers. It's worth keeping in mind that popcount is already using function pointers (at least in the case where we try to use POPCNT directly), so patching to select between three options instead of between two wouldn't be a regression.

[1] https://postgr.es/m/20231107201441.GA898662@nathanxps13

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"Nunca se desea ardientemente lo que solo se desea por razón" (F. Alexandre)
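For illustration, the runtime-dispatch pattern being described above looks roughly like the following. pg_popcount64_choose(), pg_popcount64_fast()/pg_popcount64_slow(), and pg_popcount_available() follow the existing naming in src/port/pg_bitutils.c, while pg_popcount64_avx512() and cpu_supports_avx512_popcount() are placeholder names, not anything from the patch:

    /*
     * Sketch of CPUID-based selection via a function pointer, modeled on the
     * existing pg_popcount64_choose() pattern.  The AVX-512 implementation
     * and its capability check are hypothetical placeholders.
     */
    static int  pg_popcount64_avx512(uint64 word);  /* assumed AVX-512 version */
    static int  pg_popcount64_fast(uint64 word);    /* existing POPCNT version */
    static int  pg_popcount64_slow(uint64 word);    /* portable fallback */
    static int  pg_popcount64_choose(uint64 word);

    int         (*pg_popcount64) (uint64 word) = pg_popcount64_choose;

    static int
    pg_popcount64_choose(uint64 word)
    {
        /* Redirect the function pointer on first use, then call through it. */
        if (cpu_supports_avx512_popcount())     /* assumed CPUID/XGETBV check */
            pg_popcount64 = pg_popcount64_avx512;
        else if (pg_popcount_available())       /* existing CPUID check for POPCNT */
            pg_popcount64 = pg_popcount64_fast;
        else
            pg_popcount64 = pg_popcount64_slow;

        return pg_popcount64(word);
    }

The point of this shape is that the capability check runs once per process, after which every caller pays only an indirect call -- which is why adding a third option on top of the existing two would not be a regression.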
RE: speed up a logical replica setup
Dear Peter,

Thanks for giving your idea!

> > Subscriber has a different meaning than subscription. Subscription is an SQL
> > object. Subscriber is the server (node in replication terminology) where the
> > subscription resides. Having said that, pg_createsubscriber doesn't seem a bad
> > name because you are creating a new subscriber. (Indeed, you are transforming /
> > converting, but "create" seems closer, and users can infer that it is a tool to
> > build a new logical replica.)
>
> That makes sense.
>
> (Also, the problem with "convert" etc. is that "convertsubscriber" would
> imply that you are converting an existing subscriber to something else.
> It would need to be something like "convertbackup" then, which doesn't
> seem helpful.)
>
> > I think "convert" and "transform" fit for this case. However, "create",
> > "convert" and "transform" have 6, 7 and 9 characters, respectively. I suggest
> > that we avoid long names (subscriber already has 10 characters). My preference
> > is pg_createsubscriber.
>
> That seems best to me.

Just FYI - I'm OK to change the name to pg_createsubscriber.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
Re: Synchronizing slots from primary to standby
On Thu, Jan 25, 2024 at 1:25 PM Bertrand Drouvot wrote:
>
> On Thu, Jan 25, 2024 at 02:57:30AM +, Zhijie Hou (Fujitsu) wrote:
> >
> > Agreed. I split the original 0001 patch into 3 patches as suggested.
> > Here is the V68 patch set.

Thanks, I have pushed 0001.

> Thanks!
>
> Some comments.
>
> Looking at 0002:
>
> 1 ===
>
> + The following options are supported:
>
> What about "The following option is supported"? (as currently only the
> "FAILOVER" is)
>
> 2 ===
>
> What about adding some TAP tests too? (I can see that ALTER_REPLICATION_SLOT
> test is added in v68-0004 but I think having some in 0002 would make sense too).

The subscription tests in v68-0003 will test this functionality. The one advantage of adding separate tests for this is that if in the future we extend this replication command, it could be convenient to test various options. However, the same could be said about existing replication commands as well. But is it worth having extra tests which will be anyway covered in the next commit in a few days?

I understand that it is a good idea and makes one comfortable to have tests for each separate commit, but OTOH, in the longer term it will just be adding more test time without achieving much benefit. I think we can tell explicitly in the commit message of this patch that the subsequent commit will cover the tests for this functionality.

One minor comment on 0002:

+ so that logical replication can be resumed after failover.
+

Can we move this and similar comments or doc changes to the later 0004 patch where we are syncing the slots?

--
With Regards,
Amit Kapila.
RE: Fix inappropriate comments in function ReplicationSlotAcquire
On Thu, Jan 25, 2024 at 18:39 Amit Kapila wrote:

Thanks for your review.

> On Thu, Jan 25, 2024 at 2:57 PM Wei Wang (Fujitsu) wrote:
> >
> > In the function ReplicationSlotAcquire(), I found there is something missing in
> > the comments below:
> >
> > ```
> > 	/*
> > 	 * Search for the slot with the specified name if the slot to acquire is
> > 	 * not given. If the slot is not found, we either return -1 or error out.
> > 	 */
> > 	s = SearchNamedReplicationSlot(name, false);
> > 	if (s == NULL || !s->in_use)
> > 	{
> > 		LWLockRelease(ReplicationSlotControlLock);
> >
> > 		ereport(ERROR,
> > 				(errcode(ERRCODE_UNDEFINED_OBJECT),
> > 				 errmsg("replication slot \"%s\" does not exist",
> > 						name)));
> > 	}
> > ```
> >
> > It seems that when the slot is not found, we will only error out and will not
> > return -1.
>
> You seem to be correct. However, isn't the first part of the comment
> also slightly confusing? In particular, "... if the slot to acquire is
> not given." In this function, I don't see the case where a slot to
> acquire is given.

Yes, agree. I think these two parts have become slightly outdated after commit 1632ea4. So I also tried to fix the first part of the comment.

Attached is the new patch.

Regards,
Wang Wei

v2-0001-Fix-inappropriate-comments-in-function-Replicatio.patch
Description: v2-0001-Fix-inappropriate-comments-in-function-Replicatio.patch
Re: index prefetching
On Wed, Jan 24, 2024 at 11:43 PM Tomas Vondra wrote:
> On 1/22/24 07:35, Konstantin Knizhnik wrote:
> >
> > On 22/01/2024 1:47 am, Tomas Vondra wrote:
> >> h, right. Well, you're right in this case we perhaps could set just one
> >> of those flags, but the "purpose" of the two places is quite different.
> >>
> >> The "prefetch" flag is fully controlled by the prefetcher, and it's up
> >> to it to change it (e.g. I can easily imagine some new logic touching
> >> setting it to "false" for some reason).
> >>
> >> The "data" flag is fully controlled by the custom callbacks, so whatever
> >> the callback stores, will be there.
> >>
> >> I don't think it's worth simplifying this. In particular, I don't think
> >> the callback can assume it can rely on the "prefetch" flag.
> >
> > Why not add an "all_visible" flag to IndexPrefetchEntry? It will not
> > cause any extra space overhead (because of alignment), but allows us to
> > avoid dynamic memory allocation (not sure if it is critical, but nice to
> > avoid if possible).

While reading through the first patch I got some questions. I haven't read it completely yet, but this is what I have got so far.

1.

+static bool
+IndexPrefetchBlockIsSequential(IndexPrefetch *prefetch, BlockNumber block)
+{
+	int			idx;
...
+		if (prefetch->blockItems[idx] != (block - i))
+			return false;
+
+		/* Don't prefetch if the block happens to be the same. */
+		if (prefetch->blockItems[idx] == block)
+			return false;
+	}
+
+	/* not sequential, not recently prefetched */
+	return true;
+}

The above function is named BlockIsSequential, but at the end it returns true if it is not sequential -- that seems like a problem. Also, the two checks right above the end of the function return false if the block is the same or if the pattern is sequential; I think those are wrong too.

2.

I have noticed that the prefetch history is maintained at the backend level, but what if multiple backends are trying to fetch the same heap blocks, maybe scanning the same index -- should that be in some shared structure? I haven't thought much deeper about this from the implementation POV, but should we think about it, or does it not matter?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: A failure in t/038_save_logical_slots_shutdown.pl
On Tue, Jan 16, 2024 at 12:13 PM Hayato Kuroda (Fujitsu) wrote:
>
> Dear Amit, Bharath,
>
> > This is a more strict check because it is possible that even if the
> > latest confirmed_flush location is not persisted there is no
> > meaningful decodable WAL between whatever the last confirmed_flush
> > location saved on disk and the shutdown_checkpoint record.
> > Kuroda-San/Vignesh, do you have any suggestion on this one?
>
> I think it should be kept as an explicit testcase. There are two reasons:
>
> * e0b2eed is a commit for backend code, so it should be tested by src/test/*
>   files. Each src/bin/XXX/*.pl file should test only its executable.
> * Assume that the feature gets broken. In that case 003_logical_slots.pl
>   would fail, but we would not have a way to recognize it on the buildfarm.
>   038_save_logical_slots_shutdown.pl helps to distinguish the case.

+1 to keep 038_save_logical_slots_shutdown.pl as-is.

> Based on that, I think it is OK to add advance_wal() and comments, like
> Bharath's patch.

Thanks. I'll wait a while and then add it to CF to not lose it in the wild.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Re: UUID v7
Andrey M. Borodin wrote on 25.01.2024 07:51:
> > On 25 Jan 2024, at 09:40, Nikolay Samokhvalov wrote:
> >
> > From a practical point of view, these two things are extremely important to
> > have to support partitioning. It is better to implement limitations than
> > throw them away.
>
> Postgres always was a bit hackerish, allowing slightly more than is safe.
> I.e. you can define an immutable function that is not really immutable, or
> turn off autovacuum or fsync. Why bother with safety guards here?
>
> My opinion is that we should have this function to extract the timestamp.
> Even if it can return strange values for an imprecise RFC implementation.
>
> > On 25 Jan 2024, at 02:15, Jelte Fennema-Nio wrote:
> >
> > So +1 for erroring when you provide a timestamp outside of that range
> > (either too far in the past or too far in the future).
>
> OK, it seems like we have some consensus on ERRORing.
>
> Do we have any other open items? Does v13 address all open items? Maybe let's
> compose a better error message?

+1 for erroring when ts is outside the range.

v13 looks good to me. I think we have reached an optimal compromise.

--
Przemysław Sztoch | Mobile +48 509 99 00 66
Re: Fix inappropriate comments in function ReplicationSlotAcquire
On Thu, Jan 25, 2024 at 4:01 PM Wei Wang (Fujitsu) wrote:
>
> Yes, agree. I think these two parts have become slightly outdated after
> commit 1632ea4. So I also tried to fix the first part of the comment.
> Attached is the new patch.

How about changing it to something simple like:

diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index f2781d0455..84c257a7aa 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -465,10 +465,7 @@ retry:
 
 	LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
 
-	/*
-	 * Search for the slot with the specified name if the slot to acquire is
-	 * not given. If the slot is not found, we either return -1 or error out.
-	 */
+	/* Check if the slot exists with the given name. */
 	s = SearchNamedReplicationSlot(name, false);
 	if (s == NULL || !s->in_use)
 	{

--
With Regards,
Amit Kapila.
Re: A failure in t/038_save_logical_slots_shutdown.pl
On Thu, Jan 25, 2024 at 4:27 PM Bharath Rupireddy wrote:
>
> Thanks. I'll wait a while and then add it to CF to not lose it in the wild.

Feel free to add it to CF. However, I do plan to look at it in the next few days.

--
With Regards,
Amit Kapila.
Re: cataloguing NOT NULL constraints
Hi Alvaro,

25.08.2023 14:38, Alvaro Herrera wrote:
> I have now pushed this again. Hopefully it'll stick this time.

Starting from b0e96f31, pg_upgrade fails with an inherited NOT NULL constraint.

For example, upgrading from 9c13b6814a (or REL_12_STABLE .. REL_16_STABLE) to b0e96f31 (or master) with the following two tables (excerpt from src/test/regress/sql/rules.sql):

create table test_0 (id serial primary key);
create table test_1 (id integer primary key) inherits (test_0);

I get the failure:

Setting frozenxid and minmxid counters in new cluster      ok
Restoring global objects in the new cluster                ok
Restoring database schemas in the new cluster
  test
*failure*

Consult the last few lines of "new/pg_upgrade_output.d/20240125T151231.112/log/pg_upgrade_dump_16384.log" for
the probable cause of the failure.
Failure, exiting

In the log:

pg_restore: connecting to database for restore
pg_restore: creating DATABASE "test"
pg_restore: connecting to new database "test"
pg_restore: creating DATABASE PROPERTIES "test"
pg_restore: connecting to new database "test"
pg_restore: creating pg_largeobject "pg_largeobject"
pg_restore: creating COMMENT "SCHEMA "public""
pg_restore: creating TABLE "public.test_0"
pg_restore: creating SEQUENCE "public.test_0_id_seq"
pg_restore: creating SEQUENCE OWNED BY "public.test_0_id_seq"
pg_restore: creating TABLE "public.test_1"
pg_restore: creating DEFAULT "public.test_0 id"
pg_restore: executing SEQUENCE SET test_0_id_seq
pg_restore: creating CONSTRAINT "public.test_0 test_0_pkey"
pg_restore: creating CONSTRAINT "public.test_1 test_1_pkey"
pg_restore: while PROCESSING TOC:
pg_restore: from TOC entry 3200; 2606 16397 CONSTRAINT test_1 test_1_pkey andrew
pg_restore: error: could not execute query: ERROR:  cannot drop inherited constraint "pgdump_throwaway_notnull_0" of relation "test_1"
Command was:
-- For binary upgrade, must preserve pg_class oids and relfilenodes
SELECT pg_catalog.binary_upgrade_set_next_index_pg_class_oid('16396'::pg_catalog.oid);
SELECT pg_catalog.binary_upgrade_set_next_index_relfilenode('16396'::pg_catalog.oid);
ALTER TABLE ONLY "public"."test_1" ADD CONSTRAINT "test_1_pkey" PRIMARY KEY ("id");

ALTER TABLE ONLY "public"."test_1" DROP CONSTRAINT pgdump_throwaway_notnull_0;

Thanks!

On Thu, Jan 25, 2024 at 3:06 PM Alvaro Herrera wrote:
> I have now pushed this again. Hopefully it'll stick this time.
>
> We may want to make some further tweaks to the behavior in some cases --
> for example, don't disallow ALTER TABLE DROP NOT NULL when the
> constraint is both inherited and has a local definition; the other
> option is to mark the constraint as no longer having a local definition.
> I left it the other way because that's what CHECK does; maybe we would
> like to change both at once.
>
> I ran it through CI, and the pg_upgrade test with a dump from 14's
> regression test database and everything worked well, but it's been a
> while since I tested the sepgsql part of it, so that might be the first
> thing to explode.
>
> --
> Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
Re: Synchronizing slots from primary to standby
On Thu, Jan 25, 2024 at 10:39 AM Peter Smith wrote:
> 2. synchronize_one_slot
>
> + /*
> +  * Sanity check: Make sure that concerned WAL is received and flushed
> +  * before syncing slot to target lsn received from the primary server.
> +  *
> +  * This check should never pass as on the primary server, we have waited
> +  * for the standby's confirmation before updating the logical slot.
> +  */
> + latestFlushPtr = GetStandbyFlushRecPtr(NULL);
> + if (remote_slot->confirmed_lsn > latestFlushPtr)
> + {
> +     ereport(LOG,
> +             errmsg("skipping slot synchronization as the received slot sync"
> +                    " LSN %X/%X for slot \"%s\" is ahead of the standby position %X/%X",
> +                    LSN_FORMAT_ARGS(remote_slot->confirmed_lsn),
> +                    remote_slot->name,
> +                    LSN_FORMAT_ARGS(latestFlushPtr)));
> +
> +     return false;
> + }
>
> Previously in v65 this was an elog, but now it is an ereport. But
> since this is a sanity check condition that "should never pass" wasn't
> the elog the more appropriate choice?

We realized that this scenario can be frequently hit when the user has not set standby_slot_names on the primary, and thus ereport makes more sense. But I agree that this comment is misleading. We will adjust the comment in the next version.

thanks
Shveta
Re: Current Connection Information
Hi,

> It would be viable and appropriate to implement a unified function that
> provides important information about the current connection?
>
> Just an example: "Current Connection Informations".
>
> I implemented it in PL/pgSQL to demonstrate the idea, see on GitHub:
> https://github.com/maiquelgrassi/DBA-toolkit/blob/main/cluster/dba_whoami_function.sql

I believe one would typically do something like this:

```
select * from pg_stat_activity where pid = pg_backend_pid();
```

On top of that, psql can be configured to display useful information, e.g.:

```
$ cat ~/.psqlrc
\timing on

select (case when pg_is_in_recovery() then 'replica' else 'master' end) as master_or_replica
\gset

\set PROMPT1 '%p (%:master_or_replica:) =# '
```

Personally I somewhat doubt that there is a one-size-fits-all equivalent of `whoami` for Postgres. E.g. one person would like to see a list of extensions available in the current database, while for another this is redundant information.

Even if we do this, I don't think it should be a PL/pgSQL function but rather a \whoami command for psql. This solution, however, will leave users of DataGrip and similar products unhappy.

--
Best regards,
Aleksander Alekseev
Re: Synchronizing slots from primary to standby
Hi,

On Thu, Jan 25, 2024 at 03:54:45PM +0530, Amit Kapila wrote:
> On Thu, Jan 25, 2024 at 1:25 PM Bertrand Drouvot wrote:
> >
> > On Thu, Jan 25, 2024 at 02:57:30AM +, Zhijie Hou (Fujitsu) wrote:
> > >
> > > Agreed. I split the original 0001 patch into 3 patches as suggested.
> > > Here is the V68 patch set.
>
> Thanks, I have pushed 0001.

Thanks!

> > Some comments.
> >
> > Looking at 0002:
> >
> > 1 ===
> >
> > + The following options are supported:
> >
> > What about "The following option is supported"? (as currently only the
> > "FAILOVER" is)
> >
> > 2 ===
> >
> > What about adding some TAP tests too? (I can see that ALTER_REPLICATION_SLOT
> > test is added in v68-0004 but I think having some in 0002 would make sense too).
>
> The subscription tests in v68-0003 will test this functionality. The
> one advantage of adding separate tests for this is that if in the
> future we extend this replication command, it could be convenient to
> test various options. However, the same could be said about existing
> replication commands as well.

I initially did check for "START_REPLICATION" and I saw it's part of 006_logical_decoding.pl (but did not check all the "REPLICATION" commands). That said, it's more of a nit, and I think it's fine having the test in v68-0004 (as it is currently done) plus the ones in v68-0003.

> But is it worth having extra tests which
> will be anyway covered in the next commit in a few days?
>
> I understand that it is a good idea and makes one comfortable to have
> tests for each separate commit but OTOH, in the longer term it will
> just be adding more test time without achieving much benefit. I think
> we can tell explicitly in the commit message of this patch that the
> subsequent commit will cover the tests for this functionality

Yeah, I think that's enough (at least someone reading the commit message and the diff changes without following this dedicated thread closely would know the lack of tests is not a miss).

> One minor comment on 0002:
> + so that logical replication can be resumed after failover.
> +
>
> Can we move this and similar comments or doc changes to the later 0004
> patch where we are syncing the slots?

Sure.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Re: remaining sql/json patches
On Thu, Jan 25, 2024 at 6:09 PM Amit Langote wrote:
> On Wed, Jan 24, 2024 at 10:11 PM Amit Langote wrote:
> > I still need to take a look at your other report regarding typmod but
> > I'm out of energy today.
>
> The attached updated patch should address one of the concerns --
> JSON_QUERY() should now work appropriately with a RETURNING type with
> typmod, whether or not OMIT QUOTES is specified.
>
> But I wasn't able to address the problems with RETURNING
> record_type_with_typmod, that is, the following example you shared
> upthread:
>
> create domain char3_domain_not_null as char(3) NOT NULL;
> create domain hello as text not null check (value = 'hello');
> create domain int42 as int check (value = 42);
> create type comp_domain_with_typmod AS (a char3_domain_not_null, b int42);
>
> select json_value(jsonb'{"rec": "(abcd,42)"}', '$.rec' returning
> comp_domain_with_typmod);
>  json_value
> ------------
>
> (1 row)
>
> select json_value(jsonb'{"rec": "(abcd,42)"}', '$.rec' returning
> comp_domain_with_typmod error on error);
> ERROR:  value too long for type character(3)
>
> select json_value(jsonb'{"rec": "abcd"}', '$.rec' returning
> char3_domain_not_null error on error);
>  json_value
> ------------
>  abc
> (1 row)
>
> The problem with returning comp_domain_with_typmod from json_value()
> seems to be that it's using a text-to-record CoerceViaIO expression
> picked from JsonExpr.item_coercions, which behaves differently than
> the expression tree that the following uses:
>
> select ('abcd', 42)::comp_domain_with_typmod;
>    row
> ----------
>  (abc,42)
> (1 row)

Oh, it hadn't occurred to me to check what trying to coerce a "string" containing the record literal would do:

select '(''abcd'', 42)'::comp_domain_with_typmod;
ERROR:  value too long for type character(3)
LINE 1: select '(''abcd'', 42)'::comp_domain_with_typmod;

which is the same thing that JSON_QUERY() and JSON_VALUE() are running into. So it might be fair to think that the error is not a limitation of the SQL/JSON patch but an underlying behavior that it has to accept as is.

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
Re: UUID v7
Hi,

> Postgres always was a bit hackerish, allowing slightly more than is safe.
> I.e. you can define an immutable function that is not really immutable, or turn off
> autovacuum or fsync. Why bother with safety guards here?
> My opinion is that we should have this function to extract the timestamp. Even if
> it can return strange values for an imprecise RFC implementation.

Completely agree. Users that don't like or don't need it can pretend there are no uuid_extract_time() and uuidv7(T) in Postgres. If we don't provide them, however, users that need them will end up writing their own, probably buggy and incompatible, implementations. That would be much worse.

> So +1 for erroring when you provide a timestamp outside of that range
> (either too far in the past or too far in the future).
>
> OK, it seems like we have some consensus on ERRORing.
>
> Do we have any other open items? Does v13 address all open items? Maybe let's
> compose a better error message?
>
> +1 for erroring when ts is outside the range.
>
> v13 looks good to me. I think we have reached an optimal compromise.

Andrey, many thanks for the updated patch.

LGTM, cfbot is happy and I don't think we have any open items left. So I am changing the CF entry status back to RfC.

--
Best regards,
Aleksander Alekseev
Re: Add bump memory context type and use it for tuplesorts
On Mon, 6 Nov 2023 at 19:54, Matthias van de Meent wrote:
>
> On Tue, 11 Jul 2023 at 01:51, David Rowley wrote:
>>
>> On Tue, 27 Jun 2023 at 21:19, David Rowley wrote:
>>> I've attached the bump allocator patch and also the script I used to
>>> gather the performance results in the first 2 tabs in the attached
>>> spreadsheet.
>>
>> I've attached a v2 patch which changes the BumpContext a little to
>> remove some of the fields that are not really required. There was no
>> need for the "keeper" field as the keeper block always comes at the
>> end of the BumpContext as these are allocated in a single malloc().
>> The pointer to the "block" also isn't really needed. This is always
>> the same as the head element in the blocks dlist.

>> +++ b/src/backend/utils/mmgr/bump.c
>> [...]
>> +MemoryContext
>> +BumpContextCreate(MemoryContext parent,
>> +                  const char *name,
>> +                  Size minContextSize,
>> +                  Size initBlockSize,
>> +                  Size maxBlockSize)
>> [...]
>> +    /* Determine size of initial block */
>> +    allocSize = MAXALIGN(sizeof(BumpContext)) + Bump_BLOCKHDRSZ +
>> +    if (minContextSize != 0)
>> +        allocSize = Max(allocSize, minContextSize);
>> +    else
>> +        allocSize = Max(allocSize, initBlockSize);

Shouldn't this be the following, considering the meaning of "initBlockSize"?

+    allocSize = MAXALIGN(sizeof(BumpContext)) + Bump_BLOCKHDRSZ +
+        Bump_CHUNKHDRSZ + initBlockSize;
+    if (minContextSize != 0)
+        allocSize = Max(allocSize, minContextSize);

>> + * BumpFree
>> + *    Unsupported.
>> [...]
>> + * BumpRealloc
>> + *    Unsupported.

Rather than the error, can't we make this a no-op (potentially optionally, or in a different memory context?)

What I mean is, I get that it is an easy validator check that the code that uses this context doesn't accidentally leak memory through assumptions about pfree, but this does make this memory context unusable for more generic operations where leaking a little memory is preferred over the overhead of other memory contexts, as MemoryContextReset is quite cheap in the grand scheme of things.

E.g. using a bump context in btree descent could speed up queries when we use compare operators that do allocate memory (e.g. numeric, text), because btree operators must not leak memory and thus always have to manually keep track of all allocations, which can be expensive.

I understand that allowing pfree/repalloc in bump contexts requires each allocation to have a MemoryChunk prefix in overhead, but I think it's still a valid use case to have a very low overhead allocator with a no-op deallocator (except context reset). Do you have performance comparison results between with and without the overhead of MemoryChunk?

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)
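To make the btree-descent idea above a bit more concrete, here is a hedged sketch of the reset-only usage pattern being argued for. BumpContextCreate() and its parameters come from the quoted patch; the block sizes are arbitrary and the surrounding function is purely illustrative:

    /* Sketch only: a scratch context whose allocations are never pfree'd
     * individually; everything is released in one MemoryContextReset(). */
    static void
    compare_keys_with_bump_context(void)
    {
        MemoryContext scratch;
        MemoryContext oldcxt;

        scratch = BumpContextCreate(CurrentMemoryContext,
                                    "btree compare scratch",
                                    0,          /* minContextSize */
                                    8 * 1024,   /* initBlockSize */
                                    64 * 1024); /* maxBlockSize */

        oldcxt = MemoryContextSwitchTo(scratch);

        /* ... call comparison functions that may allocate (numeric, text) ... */

        MemoryContextSwitchTo(oldcxt);
        MemoryContextReset(scratch);    /* drop everything at once */
        MemoryContextDelete(scratch);
    }

This only works if pfree()/repalloc() inside the comparison functions are either never reached or tolerated as no-ops, which is exactly the trade-off being discussed.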
Re: UUID v7
Hi, > Andrey, many thanks for the updated patch. > > LGTM, cfbot is happy and I don't think we have any open items left. So > changing CF entry status back to RfC. PFA v14. I changed: ``` elog(ERROR, "Time argument of UUID v7 cannot exceed 6 bytes"); ``` ... to: ``` ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("Time argument of UUID v7 is outside of the valid range"))); ``` Which IMO tells a bit more to the average user and is translatable. > At a quick glance, the patch needs improving English, IMO. Agree. We could use some help from a native English speaker for this. -- Best regards, Aleksander Alekseev v14-0001-Implement-UUID-v7.patch Description: Binary data
Re: Use of backup_label not noted in log
On 1/25/24 04:12, Michael Paquier wrote: On Mon, Jan 22, 2024 at 04:36:27PM +0900, Michael Paquier wrote: + if (ControlFile->backupStartPoint != InvalidXLogRecPtr) Nit 1: I would use XLogRecPtrIsInvalid here. + ereport(LOG, + (errmsg("completed backup recovery with redo LSN %X/%X", + LSN_FORMAT_ARGS(oldBackupStartPoint; Nit 2: How about adding backupEndPoint in this LOG? That would give: "completed backup recovery with redo LSN %X/%X and end LSN %X/%X". Hearing nothing, I've just applied a version of the patch with these two modifications on HEAD. If this needs tweaks, just let me know. I had planned to update the patch this morning -- so thanks for doing that. I think having the end point in the message makes perfect sense. I would still advocate for a back patch here. It is frustrating to get logs from users that just say: LOG: invalid checkpoint record PANIC: could not locate a valid checkpoint record It would be very helpful to know what the checkpoint record LSN was in this case. Regards, -David
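For anyone skimming the quoted nits, the combined message with both LSNs would look roughly like the fragment below; the variable names are taken from the quoted snippets and are assumptions rather than the committed code.

```
/* Sketch only; names follow the quoted snippets, not necessarily the
 * committed patch. */
if (!XLogRecPtrIsInvalid(ControlFile->backupStartPoint))
    ereport(LOG,
            (errmsg("completed backup recovery with redo LSN %X/%X and end LSN %X/%X",
                    LSN_FORMAT_ARGS(oldBackupStartPoint),
                    LSN_FORMAT_ARGS(oldBackupEndPoint))));
```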
Re: Remove unused fields in ReorderBufferTupleBuf
Hi, > > Here is the corrected patch. > > Thank you for updating the patch! I have some comments: Thanks for the review. > -tuple = (ReorderBufferTupleBuf *) > +tuple = (HeapTuple) > MemoryContextAlloc(rb->tup_context, > - > sizeof(ReorderBufferTupleBuf) + > + HEAPTUPLESIZE + > MAXIMUM_ALIGNOF + > alloc_len); > -tuple->alloc_tuple_size = alloc_len; > -tuple->tuple.t_data = ReorderBufferTupleBufData(tuple); > +tuple->t_data = (HeapTupleHeader)((char *)tuple + HEAPTUPLESIZE); > > Why do we need to add MAXIMUM_ALIGNOF? since HEAPTUPLESIZE is the > MAXALIGN'd size, it seems we don't need it. heap_form_tuple() does a > similar thing but it doesn't add it. Indeed. I gave it a try and nothing crashed, so it appears that MAXIMUM_ALIGNOF is not needed. > --- > xl_parameter_change *xlrec = > -(xl_parameter_change *) > XLogRecGetData(buf->record); > +(xl_parameter_change *) > XLogRecGetData(buf->record); > > Unnecessary change. That's pgindent. Fixed. > --- > -/* > - * Free a ReorderBufferTupleBuf. > - */ > -void > -ReorderBufferReturnTupleBuf(ReorderBuffer *rb, ReorderBufferTupleBuf *tuple) > -{ > -pfree(tuple); > -} > - > > Why does ReorderBufferReturnTupleBuf need to be moved from > reorderbuffer.c to reorderbuffer.h? It seems not related to this > refactoring patch so I think we should do it in a separate patch if we > really want it. I'm not sure it's necessary, though. OK, fixed. -- Best regards, Aleksander Alekseev v6-0001-Remove-ReorderBufferTupleBuf-structure.patch Description: Binary data
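The allocation pattern the patch switches to, a single palloc covering the HeapTupleData header plus the tuple body with t_data pointing just past HEAPTUPLESIZE, is the same one heap_form_tuple() uses. An illustrative fragment (alloc_len and rb->tup_context come from the quoted diff; the rest is a sketch, not necessarily the final patch):

```
/* Illustrative sketch, not necessarily the final patch: one allocation
 * holds both the HeapTupleData header and the tuple body. */
HeapTuple   tuple;

tuple = (HeapTuple) MemoryContextAlloc(rb->tup_context,
                                       HEAPTUPLESIZE + alloc_len);
tuple->t_data = (HeapTupleHeader) ((char *) tuple + HEAPTUPLESIZE);

/*
 * HEAPTUPLESIZE is MAXALIGN(sizeof(HeapTupleData)) and palloc'd chunks are
 * maxaligned, so t_data needs no extra MAXIMUM_ALIGNOF padding, matching
 * the review comment above.
 */
```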
Re: Use of backup_label not noted in log
Hi, On Thu, Jan 25, 2024 at 08:56:52AM -0400, David Steele wrote: > I would still advocate for a back patch here. It is frustrating to get logs > from users that just say: > > LOG: invalid checkpoint record > PANIC: could not locate a valid checkpoint record > > It would be very helpful to know what the checkpoint record LSN was in this > case. I agree. Michael
Re: [PATCH] Add native windows on arm64 support
On Wed, 24 Jan 2024 at 19:03, Michael Paquier wrote: > On Wed, Jan 24, 2024 at 06:45:21AM -0500, Dave Cramer wrote: > > I managed to get it to build the vcvarsall arch needs to be x64. I need > to > > add some options, but the patch above needs to be applied to build it. > > Nice. If I may ask, what kind of host and/or configuration have you > used to reach a state where the code can be compiled and run tests > with meson? If you have found specific steps, it may be a good thing > to document that on the wiki, say around [1]. > I am running a mac-mini M2 with parallels running windows pro 11 The only thing required is this patch. Running in a native developer prompt meson setup build --prefix=c:\postgres cd build ninja ninja install ninja test all run without errors I also have buildfarm running and will run that today Dave > > Perhaps you have not included TAP? It may be fine in terms of runtime > checks and coverage. > Does TAP have to be explicitly added to the meson build ? Dave
Re: Unnecessary smgropen in {heapam_relation,index}_copy_data?
Hi, > I find heapam_relation_copy_data() and index_copy_data() have the following > code: > > dstrel = smgropen(*newrlocator, rel->rd_backend); > > ... > > RelationCreateStorage(*newrlocator, rel->rd_rel->relpersistence, > true); > > The smgropen() is also called by RelationCreateStorage(), why should we call > smgropen() explicitly here? > > I try to remove the smgropen(), and all tests passed. That's a very good question. Note that the second argument of smgropen() used to create dstrel changes after applying your patch. I'm not 100% sure whether this is significant or not. I added your patch to the nearest open commitfest so that we will not lose it: https://commitfest.postgresql.org/47/4794/ -- Best regards, Aleksander Alekseev
Re: [PATCH] Add native windows on arm64 support
On Thu, 25 Jan 2024 at 08:31, Dave Cramer wrote: > > > On Wed, 24 Jan 2024 at 19:03, Michael Paquier wrote: > >> On Wed, Jan 24, 2024 at 06:45:21AM -0500, Dave Cramer wrote: >> > I managed to get it to build the vcvarsall arch needs to be x64. I need >> to >> > add some options, but the patch above needs to be applied to build it. >> >> Nice. If I may ask, what kind of host and/or configuration have you >> used to reach a state where the code can be compiled and run tests >> with meson? If you have found specific steps, it may be a good thing >> to document that on the wiki, say around [1]. >> > > I am running a mac-mini M2 with parallels running windows pro 11 > The only thing required is this patch. Running in a native developer > prompt > > meson setup build --prefix=c:\postgres > cd build > ninja > ninja install > ninja test > > all run without errors > I also have buildfarm running and will run that today > > Dave > >> >> Perhaps you have not included TAP? It may be fine in terms of runtime >> checks and coverage. >> > > Does TAP have to be explicitly added to the meson build ? > > Dave > I tried running my buildfarm using my git repo and my branch, but get the following error Status Line: 492 bad branch parameter Content: bad branch parameter fix_arm Web txn failed with status: 1
Re: Emit fewer vacuum records by reaping removable tuples during pruning
On Wed, Jan 24, 2024 at 9:13 PM Melanie Plageman wrote: > v12 attached has my attempt at writing better comments for this > section of lazy_scan_heap(). + /* + * If we didn't get the cleanup lock and the page is not new or empty, + * we can still collect LP_DEAD items in the dead_items array for + * later vacuuming, count live and recently dead tuples for vacuum + * logging, and determine if this block could later be truncated. If + * we encounter any xid/mxids that require advancing the + * relfrozenxid/relminxid, we'll have to wait for a cleanup lock and + * call lazy_scan_prune(). + */ I like this comment. I would probably drop "and the page is not new or empty" from it since that's really speaking to the previous bit of code, but it wouldn't be the end of the world to keep it, either. /* - * Prune, freeze, and count tuples. + * If we got a cleanup lock, we can prune and freeze tuples and + * defragment the page. If we didn't get a cleanup lock, we will still + * consider whether or not to update the FSM. * - * Accumulates details of remaining LP_DEAD line pointers on page in - * dead_items array. This includes LP_DEAD line pointers that we - * pruned ourselves, as well as existing LP_DEAD line pointers that - * were pruned some time earlier. Also considers freezing XIDs in the - * tuple headers of remaining items with storage. It also determines - * if truncating this block is safe. + * Like lazy_scan_noprune(), lazy_scan_prune() will count + * recently_dead_tuples and live tuples for vacuum logging, determine + * if the block can later be truncated, and accumulate the details of + * remaining LP_DEAD line pointers on the page in the dead_items + * array. These dead items include those pruned by lazy_scan_prune() + * as well we line pointers previously marked LP_DEAD. */ To me, the first paragraph of this one misses the mark. What I thought we should be saying here was something like "If we don't have a cleanup lock, the code above has already processed this page to the extent that is possible. Otherwise, we either got the cleanup lock initially and have not processed the page yet, or we didn't get it initially, attempted to process it without the cleanup lock, and decided we needed one after all. Either way, if we now have the lock, we must prune, freeze, and count tuples." The second paragraph seems fine. - * Note: It's not in fact 100% certain that we really will call - * lazy_vacuum_heap_rel() -- lazy_vacuum() might yet opt to skip index - * vacuuming (and so must skip heap vacuuming). This is deemed okay - * because it only happens in emergencies, or when there is very - * little free space anyway. (Besides, we start recording free space - * in the FSM once index vacuuming has been abandoned.) Here's a suggestion from me: Note: In corner cases, it's possible to miss updating the FSM entirely. If index vacuuming is currently enabled, we'll skip the FSM update now. But if failsafe mode is later activated, disabling index vacuuming, there will also be no opportunity to update the FSM later, because we'll never revisit this page. Since updating the FSM is desirable but not absolutely required, that's OK. I think this expresses the same sentiment as the current comment, but IMHO more clearly. The one part of the current comment that I don't understand at all is the remark about "when there is very little freespace anyway". I get that if the failsafe activates we won't come back to the page, which is the "only happens in emergencies" part of the existing comment. 
But the current phrasing makes it sound like there is a second case where it can happen -- "when there is very little free space anyway" -- and I don't know what that is talking about. If it's important, we should try to make it clearer. We could also just decide to keep this entire paragraph as it is for purposes of the present patch. The part I really thought needed adjusting was "Prune, freeze, and count tuples." -- Robert Haas EDB: http://www.enterprisedb.com
Re: Emit fewer vacuum records by reaping removable tuples during pruning
On Thu, Jan 25, 2024 at 8:57 AM Robert Haas wrote: > > On Wed, Jan 24, 2024 at 9:13 PM Melanie Plageman > wrote: > > v12 attached has my attempt at writing better comments for this > > section of lazy_scan_heap(). > > + /* > + * If we didn't get the cleanup lock and the page is not new or empty, > + * we can still collect LP_DEAD items in the dead_items array for > + * later vacuuming, count live and recently dead tuples for vacuum > + * logging, and determine if this block could later be truncated. If > + * we encounter any xid/mxids that require advancing the > + * relfrozenxid/relminxid, we'll have to wait for a cleanup lock and > + * call lazy_scan_prune(). > + */ > > I like this comment. I would probably drop "and the page is not new or > empty" from it since that's really speaking to the previous bit of > code, but it wouldn't be the end of the world to keep it, either. Yes, probably best to get rid of the part about new or empty. > /* > - * Prune, freeze, and count tuples. > + * If we got a cleanup lock, we can prune and freeze tuples and > + * defragment the page. If we didn't get a cleanup lock, we will still > + * consider whether or not to update the FSM. > * > - * Accumulates details of remaining LP_DEAD line pointers on page in > - * dead_items array. This includes LP_DEAD line pointers that we > - * pruned ourselves, as well as existing LP_DEAD line pointers that > - * were pruned some time earlier. Also considers freezing XIDs in the > - * tuple headers of remaining items with storage. It also determines > - * if truncating this block is safe. > + * Like lazy_scan_noprune(), lazy_scan_prune() will count > + * recently_dead_tuples and live tuples for vacuum logging, determine > + * if the block can later be truncated, and accumulate the details of > + * remaining LP_DEAD line pointers on the page in the dead_items > + * array. These dead items include those pruned by lazy_scan_prune() > + * as well we line pointers previously marked LP_DEAD. > */ > > To me, the first paragraph of this one misses the mark. What I thought > we should be saying here was something like "If we don't have a > cleanup lock, the code above has already processed this page to the > extent that is possible. Otherwise, we either got the cleanup lock > initially and have not processed the page yet, or we didn't get it > initially, attempted to process it without the cleanup lock, and > decided we needed one after all. Either way, if we now have the lock, > we must prune, freeze, and count tuples." I see. Your suggestion makes sense. The sentence starting with "Otherwise" is a bit long. I started to lose the thread at "decided we needed one after all". You previously referred to the cleanup lock as "it" -- once you start referring to it as "one", I as the future developer am no longer sure we are talking about the cleanup lock (as opposed to the page or something else). > - * Note: It's not in fact 100% certain that we really will call > - * lazy_vacuum_heap_rel() -- lazy_vacuum() might yet opt to skip index > - * vacuuming (and so must skip heap vacuuming). This is deemed okay > - * because it only happens in emergencies, or when there is very > - * little free space anyway. (Besides, we start recording free space > - * in the FSM once index vacuuming has been abandoned.) > > Here's a suggestion from me: > > Note: In corner cases, it's possible to miss updating the FSM > entirely. If index vacuuming is currently enabled, we'll skip the FSM > update now. 
But if failsafe mode is later activated, disabling index > vacuuming, there will also be no opportunity to update the FSM later, > because we'll never revisit this page. Since updating the FSM is > desirable but not absolutely required, that's OK. > > I think this expresses the same sentiment as the current comment, but > IMHO more clearly. The one part of the current comment that I don't > understand at all is the remark about "when there is very little > freespace anyway". I get that if the failsafe activates we won't come > back to the page, which is the "only happens in emergencies" part of > the existing comment. But the current phrasing makes it sound like > there is a second case where it can happen -- "when there is very > little free space anyway" -- and I don't know what that is talking > about. If it's important, we should try to make it clearer. > > We could also just decide to keep this entire paragraph as it is for > purposes of the present patch. The part I really thought needed > adjusting was "Prune, freeze, and count tuples." I think it would be nice to clarify this comment. I think the "when there is little free space anyway" is referring to the case in lazy_vacuum() where we set do_index_vacuuming to false because "there are almost zero TIDs". I initially thought it was saying that in the failsafe vacuum case the pages whose free space we wouldn't record in the FSM have little free space anyway -- which I didn't get. But then I looked at
Re: remaining sql/json patches
On Thu, Jan 25, 2024 at 7:54 PM Amit Langote wrote: > > > > > The problem with returning comp_domain_with_typmod from json_value() > > seems to be that it's using a text-to-record CoerceViaIO expression > > picked from JsonExpr.item_coercions, which behaves differently than > > the expression tree that the following uses: > > > > select ('abcd', 42)::comp_domain_with_typmod; > >row > > -- > > (abc,42) > > (1 row) > > Oh, it hadn't occurred to me to check what trying to coerce a "string" > containing the record literal would do: > > select '(''abcd'', 42)'::comp_domain_with_typmod; > ERROR: value too long for type character(3) > LINE 1: select '(''abcd'', 42)'::comp_domain_with_typmod; > > which is the same thing as what the JSON_QUERY() and JSON_VALUE() are > running into. So, it might be fair to think that the error is not a > limitation of the SQL/JSON patch but an underlying behavior that it > has to accept as is. > Hi, I reconciled with these cases. What bugs me now is the first query of the following 4 cases (for comparison). SELECT JSON_QUERY(jsonb '[1,2]', '$' RETURNING char(3) omit quotes); SELECT JSON_QUERY(jsonb '[1,2]', '$' RETURNING char(3) keep quotes); SELECT JSON_QUERY(jsonb '[1,2]', '$' RETURNING text omit quotes); SELECT JSON_QUERY(jsonb '[1,2]', '$' RETURNING text keep quotes); I did some minor refactoring on the function coerceJsonFuncExprOutput. it will make the following queries return null instead of error. NULL is the return of json_value. SELECT JSON_QUERY(jsonb '"123"', '$' RETURNING int2); SELECT JSON_QUERY(jsonb '"123"', '$' RETURNING int4); SELECT JSON_QUERY(jsonb '"123"', '$' RETURNING int8); SELECT JSON_QUERY(jsonb '"123"', '$' RETURNING bool); SELECT JSON_QUERY(jsonb '"123"', '$' RETURNING numeric); SELECT JSON_QUERY(jsonb '"123"', '$' RETURNING real); SELECT JSON_QUERY(jsonb '"123"', '$' RETURNING float8); v1-0001-minor-refactor-coerceJsonFuncExprOutput.no-cfbot Description: Binary data
Re: A compiling warning in jsonb_populate_record_valid
Amit Langote writes: > On Thu, Jan 25, 2024 at 2:59 PM Richard Guo wrote: >> I came across a warning when building master (a044e61f1b) on old GCC >> (4.8.5). > Will apply the attached, which does this: > - return BoolGetDatum(!SOFT_ERROR_OCCURRED(&escontext)); > + return BoolGetDatum(!escontext.error_occurred); -1 please. We should not break that abstraction for the sake of ignorable warnings on ancient compilers. regards, tom lane
Re: cleanup patches for incremental backup
On Wed, Jan 24, 2024 at 2:39 PM Nathan Bossart wrote: > That seems like a reasonable starting point. Even if it doesn't help > determine the root cause, it should at least help rule out concurrent > summary removal. Here is a patch for that. -- Robert Haas EDB: http://www.enterprisedb.com v1-0001-Temporary-patch-to-help-debug-pg_walsummary-test-.patch Description: Binary data
Re: A compiling warning in jsonb_populate_record_valid
On Thu, Jan 25, 2024 at 23:57 Tom Lane wrote: > Amit Langote writes: > > On Thu, Jan 25, 2024 at 2:59 PM Richard Guo > wrote: > >> I came across a warning when building master (a044e61f1b) on old GCC > >> (4.8.5). > > > Will apply the attached, which does this: > > > - return BoolGetDatum(!SOFT_ERROR_OCCURRED(&escontext)); > > + return BoolGetDatum(!escontext.error_occurred); > > -1 please. We should not break that abstraction for the sake > of ignorable warnings on ancient compilers. Ignoring the warning was my 1st thought too, because an old discussion I found about the warning was too old (2011). The style I adopted in my “fix” is used in a few other places too, so I thought I might as well go for it. Anyway, I’m fine with reverting. >
Re: A compiling warning in jsonb_populate_record_valid
Amit Langote writes: > On Thu, Jan 25, 2024 at 23:57 Tom Lane wrote: >> -1 please. We should not break that abstraction for the sake >> of ignorable warnings on ancient compilers. > Ignoring the warning was my 1st thought too, because an old discussion I > found about the warning was too old (2011). The style I adopted in my > “fix” is used in a few other places too, so I thought I might as well > go for it. Oh, well maybe I'm missing some context. What comparable places did you see? regards, tom lane
Re: Emit fewer vacuum records by reaping removable tuples during pruning
On Thu, Jan 25, 2024 at 9:18 AM Melanie Plageman wrote: > > To me, the first paragraph of this one misses the mark. What I thought > > we should be saying here was something like "If we don't have a > > cleanup lock, the code above has already processed this page to the > > extent that is possible. Otherwise, we either got the cleanup lock > > initially and have not processed the page yet, or we didn't get it > > initially, attempted to process it without the cleanup lock, and > > decided we needed one after all. Either way, if we now have the lock, > > we must prune, freeze, and count tuples." > > I see. Your suggestion makes sense. The sentence starting with > "Otherwise" is a bit long. I started to lose the thread at "decided we > needed one after all". You previously referred to the cleanup lock as > "it" -- once you start referring to it as "one", I as the future > developer am no longer sure we are talking about the cleanup lock (as > opposed to the page or something else). Ok... trying again: If we have a cleanup lock, we must now prune, freeze, and count tuples. We may have acquired the cleanup lock originally, or we may have gone back and acquired it after lazy_scan_noprune() returned false. Either way, the page hasn't been processed yet. > I think it would be nice to clarify this comment. I think the "when > there is little free space anyway" is referring to the case in > lazy_vacuum() where we set do_index_vacuuming to false because "there > are almost zero TIDs". I initially thought it was saying that in the > failsafe vacuum case the pages whose free space we wouldn't record in > the FSM have little free space anyway -- which I didn't get. But then > I looked at where we set do_index_vacuuming to false. Oh... wow. That's kind of confusing; somehow I was thinking we were talking about free space on the disk, rather than newly free space in pages that could be added to the FSM. And it seems really questionable whether that case is OK. I mean, in the emergency case, fine, whatever, we should do whatever it takes to get the system back up, and it should barely ever happen on a well-configured system. But this case could happen regularly, and losing track of free space could easily cause bloat. This might be another argument for moving FSM updates to the first heap pass, but that's a separate task from fixing the comment. > As for the last sentence starting with "Besides", even with Peter's > explanation I still am not sure what it should say. There are blocks > whose free space we don't record in the first heap pass. Then, due to > skipping index vacuuming and the second heap pass, we also don't > record their free space in the second heap pass. I think he is saying > that once we set do_index_vacuuming to false, we will stop skipping > updating the FSM after the first pass for future blocks. So, future > blocks will have their free space recorded in the FSM. But that feels > self-evident. Yes, I don't think that necessarily needs to be mentioned here. > The more salient point is that there are some blocks > whose free space is not recorded (those whose first pass happened > before unsetting do_index_vacuuming and whose second pass did not > happen before do_index_vacuuming is unset). The extra sentence made me > think there was some way we might go back and record free space for > those blocks, but that is not true. I don't really see why that sentence made you think that, but it's not important. I agree with you about what point we need to emphasize here. 
-- Robert Haas EDB: http://www.enterprisedb.com
Re: Unnecessary smgropen in {heapam_relation,index}_copy_data?
On Thu, 25 Jan 2024 at 21:43, Aleksander Alekseev wrote: > Hi, > >> I find heapam_relation_copy_data() and index_copy_data() have the following >> code: >> >> dstrel = smgropen(*newrlocator, rel->rd_backend); >> >> ... >> >> RelationCreateStorage(*newrlocator, rel->rd_rel->relpersistence, >> true); >> >> The smgropen() is also called by RelationCreateStorage(), why should we call >> smgropen() explicitly here? >> >> I try to remove the smgropen(), and all tests passed. > > That's a very good question. Note that the second argument of > smgropen() used to create dstrel changes after applying your patch. > I'm not 100% sure whether this is significant or not. > Thanks for the review. According to the comments of RelationData->rd_backend, it is the backend ID if the relation is temporary. The difference is that RelationCreateStorage() uses relpersistence to determine the backend ID. > I added your patch to the nearest open commitfest so that we will not lose it: > > https://commitfest.postgresql.org/47/4794/ Thank you.
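Since RelationCreateStorage() already returns the SMgrRelation it opens, the call sites in question can presumably just capture that return value instead of doing a separate smgropen(). A sketch of the simplified shape, which may or may not match the submitted patch exactly:

```
/* Sketch of the simplified call site; may not match the submitted patch. */
SMgrRelation dstrel;

dstrel = RelationCreateStorage(*newrlocator,
                               rel->rd_rel->relpersistence,
                               true);
```

Whether rd_backend and the persistence-derived backend ID can ever disagree for the relation being copied is exactly the open question raised above.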
Re: A compiling warning in jsonb_populate_record_valid
On Fri, Jan 26, 2024 at 0:15 Tom Lane wrote: > Amit Langote writes: > > On Thu, Jan 25, 2024 at 23:57 Tom Lane wrote: > >> -1 please. We should not break that abstraction for the sake > >> of ignorable warnings on ancient compilers. > > > Ignoring the warning was my 1st thought too, because an old discussion I > > found about the warning was too old (2011). The style I adopted in my > > “fix” is used in a few other places too, so I thought I might as well > > go for it. > > Oh, well maybe I'm missing some context. What comparable places did > you see? Sorry, not on my computer right now so can’t paste any code, but I was able to find a couple of functions (using Google ;) that refer to error_occurred directly for one reason or another: process_integer_literal(), xml_is_document()
Re: More new SQL/JSON item methods
On 2024-01-18 Th 09:25, Jeevan Chalke wrote: On Thu, Jan 18, 2024 at 1:03 AM Peter Eisentraut wrote: On 17.01.24 10:03, Jeevan Chalke wrote: > I added unary '+' and '-' support as well and thus thought of having > separate rules altogether rather than folding those in. > > Per SQL standard, the precision and scale arguments are unsigned > integers, so unary plus and minus signs are not supported. So my patch > removes that support, but I didn't adjust the regression tests for that. > > > However, PostgreSQL numeric casting does support a negative scale. Here > is an example: > > # select '12345'::numeric(4,-2); > numeric > - > 12300 > (1 row) > > And thus thought of supporting those. > Do we want this JSON item method to behave differently here? Ok, it would make sense to support this in SQL/JSON as well. OK. So with this, we don't need changes done in your 0001 patches. > I will merge them all into one and will try to keep them in the order > specified in sql_features.txt. > However, for documentation, it makes more sense to keep them in logical > order than the alphabetical one. What are your views on this? The documentation can be in a different order. Thanks, Andrew and Peter for the confirmation. Attached merged single patch along these lines. Thanks, I have pushed this. cheers andrew -- Andrew Dunstan EDB:https://www.enterprisedb.com
Re: A compiling warning in jsonb_populate_record_valid
Amit Langote writes: > On Fri, Jan 26, 2024 at 0:15 Tom Lane wrote: >> Amit Langote writes: >>> Ignoring the warning was my 1st thought too, because an old discussion I >>> found about the warning was too old (2011). The style I adopted in my >>> “fix” is used in a few other places too, so I thought I might as well >>> go for it. >> Oh, well maybe I'm missing some context. What comparable places did >> you see? > Sorry, not on my computer right now so can’t paste any code, but I was able > to find a couple of functions (using Google ;) that refer to error_occurred > directly for one reason or another: OK, looking around, it seems like our pattern is that direct access to escontext.error_occurred is okay in the same function that sets up the escontext (so that there's no possibility of a NULL pointer). So this change is fine and I withdraw my objection. Sorry for the noise. regards, tom lane
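For readers wondering what the warning looks like in isolation, here is a stand-alone toy with hypothetical names (it is not the real SOFT_ERROR_OCCURRED macro) showing why old GCC complains and why dereferencing the local directly avoids it:

```
#include <stdbool.h>
#include <stddef.h>

typedef struct DemoErrorContext
{
    bool        error_occurred;
} DemoErrorContext;

/* same shape as a NULL-checking convenience macro */
#define DEMO_ERROR_OCCURRED(ctx) ((ctx) != NULL && (ctx)->error_occurred)

bool
demo(void)
{
    DemoErrorContext escontext = {false};

    /*
     * &escontext can never be NULL, so GCC 4.8 warns that the macro's
     * comparison always evaluates as true (-Waddress).  Writing
     * "!escontext.error_occurred" instead is warning-free, and it is safe
     * here precisely because this function set up the context itself.
     */
    return !DEMO_ERROR_OCCURRED(&escontext);
}
```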
Re: Patch: Improve Boolean Predicate JSON Path Docs
"David E. Wheeler" writes: > On Jan 24, 2024, at 16:32, Tom Lane wrote: >> + >> + Predicate check expressions are required in the >> + @@ operator (and the >> + jsonb_path_match function), and should not be >> used >> + with the @? operator (or the >> + jsonb_path_exists function). >> + >> + >> + > I had this bit here: > >Conversely, non-predicate jsonpath expressions should not > be >used with the @@ operator (or the >jsonb_path_match function). > I changed the preceding para to say "... check expressions are required in ...", which I thought was sufficient to cover that. Also, the tabular description of the operator tells you not to do it. > What do you think of also dropping the article from all the references to > “the strict mode” or “the lax mode”, to make them “strict mode” and “lax > mode”, respectively? Certainly most of 'em don't need it. I'll make it so. regards, tom lane
Re: make dist using git archive
On 22.01.24 21:04, Tristan Partin wrote: +git = find_program('git', required: false, native: true, disabler: true) +bzip2 = find_program('bzip2', required: false, native: true, disabler: true) This doesn't need to be a disabler. git is fine as-is. See later comment. Disablers only work like you are expecting when they are used like how git is used. Once you call a method like .path(), all bets are off. ok, fixed +distdir = meson.project_name() + '-' + meson.project_version() + +check_dirty_index = run_target('check-dirty-index', + command: [git, 'diff-index', '--quiet', 'HEAD']) Seems like you might want to add -C here too? done +tar_bz2 = custom_target('tar.bz2', + build_always_stale: true, + command: [git, '-C', '@SOURCE_ROOT@', '-c', 'tar.tar.bz2.command=' + bzip2.path() + ' -c', 'archive', + '--format', 'tar.bz2', + '--prefix', distdir + '/', - '-o', '@BUILD_ROOT@/@OUTPUT@', + '-o', join_paths(meson.build_root(), '@OUTPUT@'), This will generate the tarballs in the build directory. Do the same for the previous target. Tested locally. Fixed, thanks. I had struggled with this one. + 'HEAD', '.'], + install: false, + output: distdir + '.tar.bz2', +) The bz2 target should be wrapped in an `if bzip2.found()`. Well, I think we want the dist step to fail if bzip2 is not there. At least that is the current expectation. +alias_target('pgdist', [check_dirty_index, tar_gz, tar_bz2]) Are you intending for the check_dirty_index target to prohibit the other two targets from running? Currently that is not the case. Yes, that was the hope, and that's how the make dist variant works. But I couldn't figure this out with meson. Also, the above actually also doesn't work with older meson versions, so I had to comment this out to get CI to work. If it is what you intend, use a stamp file or something to indicate a relationship. Alternatively, inline the git diff-index into the other commands. These might also do better as external scripts. It would reduce duplication between the autotools and Meson builds. Yeah, maybe that's a direction. The updated patch also supports vpath builds with make now. I have also added a CI patch, for amusement. Maybe we'll want to keep it, though.From 13612f9a1c486e8acfe4156a7bc069a66fd30e77 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Thu, 25 Jan 2024 12:28:28 +0100 Subject: [PATCH v1 1/2] make dist uses git archive Discussion: https://www.postgresql.org/message-id/flat/40e80f77-a294-4f29-a16f-e21bc7bc75fc%40eisentraut.org --- GNUmakefile.in | 34 -- meson.build| 41 + 2 files changed, 53 insertions(+), 22 deletions(-) diff --git a/GNUmakefile.in b/GNUmakefile.in index eba569e930e..680c765dd73 100644 --- a/GNUmakefile.in +++ b/GNUmakefile.in @@ -87,29 +87,19 @@ update-unicode: | submake-generated-headers submake-libpgport distdir= postgresql-$(VERSION) dummy = =install= +GIT = git + dist: $(distdir).tar.gz $(distdir).tar.bz2 - rm -rf $(distdir) - -$(distdir).tar: distdir - $(TAR) chf $@ $(distdir) - -.INTERMEDIATE: $(distdir).tar - -distdir-location: - @echo $(distdir) - -distdir: - rm -rf $(distdir)* $(dummy) - for x in `cd $(top_srcdir) && find . 
\( -name CVS -prune \) -o \( -name .git -prune \) -o -print`; do \ - file=`expr X$$x : 'X\./\(.*\)'`; \ - if test -d "$(top_srcdir)/$$file" ; then \ - mkdir "$(distdir)/$$file" && chmod 777 "$(distdir)/$$file"; \ - else \ - ln "$(top_srcdir)/$$file" "$(distdir)/$$file" >/dev/null 2>&1 \ - || cp "$(top_srcdir)/$$file" "$(distdir)/$$file"; \ - fi || exit; \ - done - $(MAKE) -C $(distdir) distclean + +.PHONY: check-dirty-index +check-dirty-index: + $(GIT) -C $(srcdir) diff-index --quiet HEAD + +$(distdir).tar.gz: check-dirty-index + $(GIT) -C $(srcdir) archive --format tar.gz --prefix $(distdir)/ HEAD -o $(abs_top_builddir)/$@ + +$(distdir).tar.bz2: check-dirty-index + $(GIT) -C $(srcdir) -c tar.tar.bz2.command='$(BZIP2) -c' archive --format tar.bz2 --prefix $(distdir)/ HEAD -o $(abs_top_builddir)/$@ distcheck: dist rm -rf $(dummy) diff --git a/meson.build b/meson.build index 55184db2488..43884e7cfdd 100644 --- a/meson.build +++ b/meson.build @@ -3348,6 +3348,47 @@ run_target('help', +### +# Distribution archive +### + +git = find_program('git', required: false, native: true) +bzip2 = find_program('bzip2', required: false, native: true) + +distdir = meson.project_name() + '-' + meson.project_version() + +check_dirty_index = run_target('check-dirty-index', + command: [git, '-C', '@SOURCE_ROOT@', +
Re: cleanup patches for incremental backup
On Thu, Jan 25, 2024 at 10:06:41AM -0500, Robert Haas wrote: > On Wed, Jan 24, 2024 at 2:39 PM Nathan Bossart > wrote: >> That seems like a reasonable starting point. Even if it doesn't help >> determine the root cause, it should at least help rule out concurrent >> summary removal. > > Here is a patch for that. LGTM. The only thing I might add is the cutoff_time in that LOG. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Emit fewer vacuum records by reaping removable tuples during pruning
On Thu, Jan 25, 2024 at 10:19 AM Robert Haas wrote: > > On Thu, Jan 25, 2024 at 9:18 AM Melanie Plageman > wrote: > > > To me, the first paragraph of this one misses the mark. What I thought > > > we should be saying here was something like "If we don't have a > > > cleanup lock, the code above has already processed this page to the > > > extent that is possible. Otherwise, we either got the cleanup lock > > > initially and have not processed the page yet, or we didn't get it > > > initially, attempted to process it without the cleanup lock, and > > > decided we needed one after all. Either way, if we now have the lock, > > > we must prune, freeze, and count tuples." > > > > I see. Your suggestion makes sense. The sentence starting with > > "Otherwise" is a bit long. I started to lose the thread at "decided we > > needed one after all". You previously referred to the cleanup lock as > > "it" -- once you start referring to it as "one", I as the future > > developer am no longer sure we are talking about the cleanup lock (as > > opposed to the page or something else). > > Ok... trying again: > > If we have a cleanup lock, we must now prune, freeze, and count > tuples. We may have acquired the cleanup lock originally, or we may > have gone back and acquired it after lazy_scan_noprune() returned > false. Either way, the page hasn't been processed yet. Cool. I might add "successfully" or "fully" to "Either way, the page hasn't been processed yet" > > I think it would be nice to clarify this comment. I think the "when > > there is little free space anyway" is referring to the case in > > lazy_vacuum() where we set do_index_vacuuming to false because "there > > are almost zero TIDs". I initially thought it was saying that in the > > failsafe vacuum case the pages whose free space we wouldn't record in > > the FSM have little free space anyway -- which I didn't get. But then > > I looked at where we set do_index_vacuuming to false. > > Oh... wow. That's kind of confusing; somehow I was thinking we were > talking about free space on the disk, rather than newly free space in > pages that could be added to the FSM. Perhaps I misunderstood. I interpreted it to refer to the bypass optimization. > And it seems really questionable > whether that case is OK. I mean, in the emergency case, fine, > whatever, we should do whatever it takes to get the system back up, > and it should barely ever happen on a well-configured system. But this > case could happen regularly, and losing track of free space could > easily cause bloat. > > This might be another argument for moving FSM updates to the first > heap pass, but that's a separate task from fixing the comment. Yes, it seems we could miss recording space freed in the first pass if we never end up doing a second pass. consider_bypass_optimization is set to false only if index cleanup is explicitly enabled or there are dead items accumulated for vacuum's second pass at some point. - Melanie
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock
Here's a touched-up version of this patch. First, PROC_GLOBAL->clogGroupFirst and SlruCtl->latest_page_number change from being protected by locks to being atomics, but there's no mention of considering memory barriers that they should have. Looking at the code, I think the former doesn't need any additional barriers, but latest_page_number is missing some, which I have added. This deserves another careful look. Second, and most user-visibly, the GUC names seem to have been chosen based on the source-code variables, which were never meant to be user-visible. So I renamed a few: xact_buffers -> transaction_buffers subtrans_buffers -> subtransaction_buffers serial_buffers -> serializable_buffers commit_ts_buffers -> commit_timestamp_buffers (unchanged: multixact_offsets_buffers, multixact_members_buffers, notify_buffers) I did this explicitly trying to avoid using the term SLRU in a user-visible manner, because what do users care? But immediately after doing this I realized that we already have pg_stat_slru, so maybe the cat is already out of the bag, and so perhaps we should name these GUCs as, say, slru_transaction_buffers? That may make the connection between these things a little more explicit. (I do think we need to add cross-links in the documentation of those GUCs to the pg_stat_slru docs and vice-versa.) Another thing that bothered me a bit is that we have auto-tuning for transaction_buffers and commit_timestamp_buffers, but not for subtransaction_buffers. (Autotuning means you set the GUC to 0 and it scales with shared_buffers.) I don't quite understand the reason for the omission, so I added it for subtrans too. I think it may make sense to do likewise for the multixact ones too, not sure. It doesn't seem worth having that for pg_serial and notify. While messing about with these GUCs I realized that the usage of the show_hook to print what the current number is, when autotuning is used, was bogus: SHOW would print the number of blocks for (say) transaction_buffers, but if you asked it to print (say) multixact_offsets_buffers, it would give a size in MB or kB. I'm sure such an inconsistency would bite us. So, digging around I found that a good way to handle this is to remove the show_hook, and instead call SetConfigOption() at the time when the ShmemInit function is called, with the correct number of buffers determined. This is pretty much what is used for XLOGbuffers, and it works correctly as far as my testing shows. Still with these auto-tuning GUCs, I noticed that the auto-tuning code would continue to grow the buffer sizes with shared_buffers to arbitrarily large values. I added an arbitrary maximum of 1024 (8 MB), which is much higher than the current value of 128; but if you have (say) 30 GB of shared_buffers (not uncommon these days), do you really need 30MB of pg_clog cache? It seems mostly unnecessary ... and you can still set it manually that way if you need it. So, largely I just rewrote those small functions completely. I also made the SGML documentation and postgresql.conf.sample all match what the code actually does. The whole thing wasn't particularly consistent. I rewrote a bunch of code comments and moved stuff around to appear in alphabetical order, etc. More comment rewriting still pending.
-- Álvaro HerreraBreisgau, Deutschland — https://www.EnterpriseDB.com/ diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 61038472c5..3e3119865a 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -2006,6 +2006,145 @@ include_dir 'conf.d' + + commit_timestamp_buffers (integer) + + commit_timestamp_buffers configuration parameter + + + + +Specifies the amount of memory to use to cache the contents of +pg_commit_ts (see +). +If this value is specified without units, it is taken as blocks, +that is BLCKSZ bytes, typically 8kB. +The default value is 0, which requests +shared_buffers/512 up to 1024 blocks, +but not fewer than 16 blocks. +This parameter can only be set at server start. + + + + + + multixact_members_buffers (integer) + + multixact_members_buffers configuration parameter + + + + +Specifies the amount of shared memory to use to cache the contents +of pg_multixact/members (see +). +If this value is specified without units, it is taken as blocks, +that is BLCKSZ bytes, typically 8kB. +The default value is 32. +This parameter can only be set at server start. + + + + + + multixact_offsets_buffers (integer) + + multixact_offsets_buffers configuration parameter + + + + +Specifies the amount of shared memory to use to cache the contents +of pg_multixact/offsets (see +
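The autotuning rule described above and in the documentation hunk ("shared_buffers/512 up to 1024 blocks, but not fewer than 16 blocks") boils down to something like the following; this is a reading of the documented behavior, not the patch's actual function:

```
/* A reading of the documented rule, not the patch's actual code. */
static int
slru_buffers_autotune(int guc_value)
{
    if (guc_value != 0)
        return guc_value;       /* an explicit setting is used as-is */

    /* 0 means: scale with shared_buffers, clamped to [16, 1024] blocks */
    return Min(1024, Max(16, NBuffers / 512));
}
```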
Re: make dist using git archive
On Thu Jan 25, 2024 at 10:04 AM CST, Peter Eisentraut wrote: On 22.01.24 21:04, Tristan Partin wrote: >> + 'HEAD', '.'], >> + install: false, >> + output: distdir + '.tar.bz2', >> +) > > The bz2 target should be wrapped in an `if bzip2.found()`. The way that this currently works is that you will fail at configure time if bz2 doesn't exist on the system. Meson will try to resolve a .path() method on a NotFoundProgram. You might want to define the bz2 target to just call `exit 1` in this case. if bzip2.found() # do your current target else bz2 = run_target('tar.bz2', command: ['exit', 1]) endif This should cause pgdist to appropriately fail at run time when generating the bz2 tarball. Well, I think we want the dist step to fail if bzip2 is not there. At least that is the current expectation. >> +alias_target('pgdist', [check_dirty_index, tar_gz, tar_bz2]) > > Are you intending for the check_dirty_index target to prohibit the other > two targets from running? Currently that is not the case. Yes, that was the hope, and that's how the make dist variant works. But I couldn't figure this out with meson. Also, the above actually also doesn't work with older meson versions, so I had to comment this out to get CI to work. > If it is what > you intend, use a stamp file or something to indicate a relationship. > Alternatively, inline the git diff-index into the other commands. These > might also do better as external scripts. It would reduce duplication > between the autotools and Meson builds. Yeah, maybe that's a direction. For what it's worth, I run Meson 1.3, and the behavior of generating the tarballs even though it is a dirty tree still occurred. In the new patch you seem to say it was fixed in 0.60. -- Tristan Partin Neon (https://neon.tech)
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock
On 2024-Jan-25, Alvaro Herrera wrote: > Here's a touched-up version of this patch. > diff --git a/src/backend/storage/lmgr/lwlock.c > b/src/backend/storage/lmgr/lwlock.c > index 98fa6035cc..4a5e05d5e4 100644 > --- a/src/backend/storage/lmgr/lwlock.c > +++ b/src/backend/storage/lmgr/lwlock.c > @@ -163,6 +163,13 @@ static const char *const BuiltinTrancheNames[] = { > [LWTRANCHE_LAUNCHER_HASH] = "LogicalRepLauncherHash", > [LWTRANCHE_DSM_REGISTRY_DSA] = "DSMRegistryDSA", > [LWTRANCHE_DSM_REGISTRY_HASH] = "DSMRegistryHash", > + [LWTRANCHE_COMMITTS_SLRU] = "CommitTSSLRU", > + [LWTRANCHE_MULTIXACTOFFSET_SLRU] = "MultixactOffsetSLRU", > + [LWTRANCHE_MULTIXACTMEMBER_SLRU] = "MultixactMemberSLRU", > + [LWTRANCHE_NOTIFY_SLRU] = "NotifySLRU", > + [LWTRANCHE_SERIAL_SLRU] = "SerialSLRU" > + [LWTRANCHE_SUBTRANS_SLRU] = "SubtransSLRU", > + [LWTRANCHE_XACT_SLRU] = "XactSLRU", > }; Eeek. Last minute changes ... Fixed here. -- Álvaro HerreraBreisgau, Deutschland — https://www.EnterpriseDB.com/ "La primera ley de las demostraciones en vivo es: no trate de usar el sistema. Escriba un guión que no toque nada para no causar daños." (Jakob Nielsen) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 61038472c5..3e3119865a 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -2006,6 +2006,145 @@ include_dir 'conf.d' + + commit_timestamp_buffers (integer) + + commit_timestamp_buffers configuration parameter + + + + +Specifies the amount of memory to use to cache the contents of +pg_commit_ts (see +). +If this value is specified without units, it is taken as blocks, +that is BLCKSZ bytes, typically 8kB. +The default value is 0, which requests +shared_buffers/512 up to 1024 blocks, +but not fewer than 16 blocks. +This parameter can only be set at server start. + + + + + + multixact_members_buffers (integer) + + multixact_members_buffers configuration parameter + + + + +Specifies the amount of shared memory to use to cache the contents +of pg_multixact/members (see +). +If this value is specified without units, it is taken as blocks, +that is BLCKSZ bytes, typically 8kB. +The default value is 32. +This parameter can only be set at server start. + + + + + + multixact_offsets_buffers (integer) + + multixact_offsets_buffers configuration parameter + + + + +Specifies the amount of shared memory to use to cache the contents +of pg_multixact/offsets (see +). +If this value is specified without units, it is taken as blocks, +that is BLCKSZ bytes, typically 8kB. +The default value is 16. +This parameter can only be set at server start. + + + + + + notify_buffers (integer) + + notify_buffers configuration parameter + + + + +Specifies the amount of shared memory to use to cache the contents +of pg_notify (see +). +If this value is specified without units, it is taken as blocks, +that is BLCKSZ bytes, typically 8kB. +The default value is 16. +This parameter can only be set at server start. + + + + + + serializable_buffers (integer) + + serializable_buffers configuration parameter + + + + +Specifies the amount of shared memory to use to cache the contents +of pg_serial (see +). +If this value is specified without units, it is taken as blocks, +that is BLCKSZ bytes, typically 8kB. +The default value is 32. +This parameter can only be set at server start. + + + + + + subtransaction_buffers (integer) + + subtransaction_buffers configuration parameter + + + + +Specifies the amount of shared memory to use to cache the contents +of pg_subtrans (see +). 
+If this value is specified without units, it is taken as blocks, +that is BLCKSZ bytes, typically 8kB. +The default value is 0, which requests +shared_buffers/512 up to 1024 blocks, +but not fewer than 16 blocks. +This parameter can only be set at server start. + + + + + + transaction_buffers (integer) + + transaction_buffers configuration parameter + + + + +Specifies the amount of shared memory to use to cache the contents +of pg_xact (see +). +If this value is specified without units, it is taken as blocks, +that is BLCKSZ bytes, typically 8kB. +
Add new error_action COPY ON_ERROR "log"
Hi, As described in 9e2d870119, COPY ON_EEOR is expected to have more "error_action". (Note that option name was changed by b725b7eec) I'd like to have a new option "log", which skips soft errors and logs information that should have resulted in errors to PostgreSQL log. I think this option has some advantages like below: 1) We can know which number of line input data was not loaded and reason. Example: =# copy t1 from stdin with (on_error log); Enter data to be copied followed by a newline. End with a backslash and a period on a line by itself, or an EOF signal. >> 1 >> 2 >> 3 >> z >> \. LOG: invalid input syntax for type integer: "z" NOTICE: 1 row was skipped due to data type incompatibility COPY 3 =# \! tail data/log/postgresql*.log LOG: 22P02: invalid input syntax for type integer: "z" CONTEXT: COPY t1, line 4, column i: "z" LOCATION: pg_strtoint32_safe, numutils.c:620 STATEMENT: copy t1 from stdin with (on_error log); 2) Easier maintenance than storing error information in tables or proprietary log files. For example, in case a large number of soft errors occur, some mechanisms are needed to prevent an infinite increase in the size of the destination data, but we can left it to PostgreSQL's log rotation. Attached a patch. This basically comes from previous discussion[1] which did both "ignore" and "log" soft error. As shown in the example above, the log output to the client does not contain CONTEXT, so I'm a little concerned that client cannot see what line of the input data had a problem without looking at the server log. What do you think? [1] https://www.postgresql.org/message-id/c0fb57b82b150953f26a5c7e340412e8%40oss.nttdata.com -- Regards, -- Atsushi Torikoshi NTT DATA Group CorporationFrom 04e643facfea4b4e8dd174d22fbe5e008747a91a Mon Sep 17 00:00:00 2001 From: Atsushi Torikoshi Date: Fri, 26 Jan 2024 01:17:59 +0900 Subject: [PATCH v1] Add new error_action "log" to ON_ERROR option Currently ON_ERROR option only has "ignore" to skip malformed data and there are no ways to know where and why COPY skipped them. "log" skips malformed data as well as "ignore", but it logs information that should have resulted in errors to PostgreSQL log. --- doc/src/sgml/ref/copy.sgml | 8 ++-- src/backend/commands/copy.c | 4 +++- src/backend/commands/copyfrom.c | 24 src/include/commands/copy.h | 1 + src/test/regress/expected/copy2.out | 14 +- src/test/regress/sql/copy2.sql | 9 + 6 files changed, 48 insertions(+), 12 deletions(-) diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml index 21a5c4a052..9662c90a8b 100644 --- a/doc/src/sgml/ref/copy.sgml +++ b/doc/src/sgml/ref/copy.sgml @@ -380,12 +380,16 @@ COPY { table_name [ ( Specifies which error_action to perform when there is malformed data in the input. - Currently, only stop (default) and ignore - values are supported. + Currently, only stop (default), ignore + and log values are supported. If the stop value is specified, COPY stops operation at the first error. If the ignore value is specified, COPY skips malformed data and continues copying data. + If the log value is specified, + COPY behaves the same as ignore, exept that + it logs information that should have resulted in errors to PostgreSQL log at + INFO level. The option is allowed only in COPY FROM. Only stop value is allowed when using binary format. 
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index cc0786c6f4..812ca63350 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -415,13 +415,15 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from) return COPY_ON_ERROR_STOP; /* - * Allow "stop", or "ignore" values. + * Allow "stop", "ignore" or "log" values. */ sval = defGetString(def); if (pg_strcasecmp(sval, "stop") == 0) return COPY_ON_ERROR_STOP; if (pg_strcasecmp(sval, "ignore") == 0) return COPY_ON_ERROR_IGNORE; + if (pg_strcasecmp(sval, "log") == 0) + return COPY_ON_ERROR_LOG; ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c index 1fe70b9133..7886bd5353 100644 --- a/src/backend/commands/copyfrom.c +++ b/src/backend/commands/copyfrom.c @@ -1013,6 +1013,23 @@ CopyFrom(CopyFromState cstate) */ cstate->escontext->error_occurred = false; + else if (cstate->opts.on_error == COPY_ON_ERROR_LOG) + { +/* Adjust elevel so we don't jump out */ +cstate->escontext->error_data->elevel = LOG; + +/* + * Despite the name, this won't raise an error since elevel is + * LOG now. + */ +ThrowErrorData(cstate->escontext->error_data); + +/* Initialize escontext in preparation for next soft error */
Re: index prefetching
On 1/25/24 11:45, Dilip Kumar wrote: > On Wed, Jan 24, 2024 at 11:43 PM Tomas Vondra > wrote: > >> On 1/22/24 07:35, Konstantin Knizhnik wrote: >>> >>> On 22/01/2024 1:47 am, Tomas Vondra wrote: h, right. Well, you're right in this case we perhaps could set just one of those flags, but the "purpose" of the two places is quite different. The "prefetch" flag is fully controlled by the prefetcher, and it's up to it to change it (e.g. I can easily imagine some new logic touching setting it to "false" for some reason). The "data" flag is fully controlled by the custom callbacks, so whatever the callback stores, will be there. I don't think it's worth simplifying this. In particular, I don't think the callback can assume it can rely on the "prefetch" flag. >>> Why not to add "all_visible" flag to IndexPrefetchEntry ? If will not >>> cause any extra space overhead (because of alignment), but allows to >>> avoid dynamic memory allocation (not sure if it is critical, but nice to >>> avoid if possible). >>> >> > While reading through the first patch I got some questions, I haven't > read it complete yet but this is what I got so far. > > 1. > +static bool > +IndexPrefetchBlockIsSequential(IndexPrefetch *prefetch, BlockNumber block) > +{ > + int idx; > ... > + if (prefetch->blockItems[idx] != (block - i)) > + return false; > + > + /* Don't prefetch if the block happens to be the same. */ > + if (prefetch->blockItems[idx] == block) > + return false; > + } > + > + /* not sequential, not recently prefetched */ > + return true; > +} > > The above function name is BlockIsSequential but at the end, it > returns true if it is not sequential, seem like a problem? Actually, I think it's the comment that's wrong - the last return is reached only for a sequential pattern (and when the block was not accessed recently). > Also other 2 checks right above the end of the function are returning > false if the block is the same or the pattern is sequential I think > those are wrong too. > Hmmm. You're right this is partially wrong. There are two checks: /* * For a sequential pattern, blocks "k" step ago needs to have block * number by "k" smaller compared to the current block. */ if (prefetch->blockItems[idx] != (block - i)) return false; /* Don't prefetch if the block happens to be the same. */ if (prefetch->blockItems[idx] == block) return false; The first condition is correct - we want to return "false" when the pattern is not sequential. But the second condition is wrong - we want to skip prefetching when the block was already prefetched recently, so this should return true (which is a bit misleading, as it seems to imply the pattern is sequential, when it's not). However, this is harmless, because we then identify this block as recently prefetched in the "full" cache check, so we won't prefetch it anyway. So it's harmless, although a bit more expensive. There's another inefficiency - we stop looking for the same block once we find the first block breaking the non-sequential pattern. Imagine a sequence of blocks 1, 2, 3, 1, 2, 3, ... in which case we never notice the block was recently prefetched, because we always find the break of the sequential pattern. But again, it's harmless, thanks to the full cache of recently prefetched blocks. > 2. > I have noticed that the prefetch history is maintained at the backend > level, but what if multiple backends are trying to fetch the same heap > blocks maybe scanning the same index, so should that be in some shared > structure? 
I haven't thought much deeper about this from the > implementation POV, but should we think about it, or it doesn't > matter? Yes, the cache is at the backend level - it's a known limitation, but I see it more as a conscious tradeoff. Firstly, while the LRU cache is at backend level, PrefetchBuffer also checks shared buffers for each prefetch request. So with sufficiently large shared buffers we're likely to find it there (and for direct I/O there won't be any other place to check). Secondly, the only other place to check is page cache, but there's no good (sufficiently cheap) way to check that. See the preadv2/nowait experiment earlier in this thread. I suppose we could implement a similar LRU cache for shared memory (and I don't think it'd be very complicated), but I did not plan to do that in this patch unless absolutely necessary. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
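The two checks described above (skip explicit prefetching when the recent request history is exactly sequential, and when the same block was requested very recently) can be illustrated stand-alone as follows; the names are hypothetical, and the patch's IndexPrefetchBlockIsSequential() and its cache are more involved than this.

```
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t BlockNumber;

#define HISTORY_SIZE 8

typedef struct PrefetchHistory
{
    BlockNumber blocks[HISTORY_SIZE];
    uint64_t    nrequests;      /* total requests so far */
} PrefetchHistory;

/* true if the last k requests were exactly block-1, block-2, ..., block-k */
static bool
history_is_sequential(const PrefetchHistory *h, BlockNumber block, int k)
{
    if (h->nrequests < (uint64_t) k)
        return false;           /* not enough history yet */

    for (int i = 1; i <= k; i++)
    {
        uint64_t idx = (h->nrequests - i) % HISTORY_SIZE;

        if (h->blocks[idx] != block - i)
            return false;
    }
    return true;
}

/* true if the same block appears anywhere in the recent history */
static bool
history_contains(const PrefetchHistory *h, BlockNumber block)
{
    uint64_t n = h->nrequests < HISTORY_SIZE ? h->nrequests : HISTORY_SIZE;

    for (uint64_t i = 1; i <= n; i++)
    {
        if (h->blocks[(h->nrequests - i) % HISTORY_SIZE] == block)
            return true;
    }
    return false;
}

static void
history_add(PrefetchHistory *h, BlockNumber block)
{
    h->blocks[h->nrequests % HISTORY_SIZE] = block;
    h->nrequests++;
}

/* prefetch only when the pattern is neither sequential nor a recent repeat */
bool
should_prefetch(PrefetchHistory *h, BlockNumber block)
{
    bool        skip = history_is_sequential(h, block, 4) ||
        history_contains(h, block);

    history_add(h, block);
    return !skip;
}
```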
Re: UUID v7
Aleksander, In this case the documentation must state that the functions uuid_extract_time() and uuidv7(T) are against the RFC requirements, and that developers may use these functions with caution at their own risk, and these functions are not recommended for production environments. The function uuidv7(T) is not better than uuid_extract_time(). Careless developers may well pass any business date into this function: document date, registration date, payment date, reporting date, start date of the current month, data download date, and even a constant. This would be a profanation of UUIDv7 with very negative consequences. Sergey Prokhorenko sergeyprokhore...@yahoo.com.au On Thursday, 25 January 2024 at 03:06:50 pm GMT+3, Aleksander Alekseev wrote: Hi, > Postgres always was a bit hackerish, allowing slightly more then is safe. > I.e. you can define immutable function that is not really immutable, turn off > autovacuum or fsync. Why bother with safety guards here? > My opinion is that we should have this function to extract timestamp. Even if > it can return strange values for imprecise RFC implementation. Completely agree. Users that don't like or don't need it can pretend there are no uuid_extract_time() and uuidv7(T) in Postgres. If we don't provide them however, users that need them will end up writing their own probably buggy and incompatible implementations. That would be much worse. > So +1 for erroring when you provide a timestamp outside of that range > (either too far in the past or too far in the future). > > OK, it seems like we have some consensus on ERRORing.. > > Do we have any other open items? Does v13 address all open items? Maybe let’s > compose better error message? > > +1 for erroring when ts is outside range. > > v13 looks good for me. I think we have reached an optimal compromise. Andrey, many thanks for the updated patch. LGTM, cfbot is happy and I don't think we have any open items left. So changing CF entry status back to RfC. -- Best regards, Aleksander Alekseev
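For what it's worth, the extraction that application authors already hand-roll looks roughly like this in plain SQL, assuming the RFC 9562 layout in which the first 48 bits of a UUIDv7 carry the Unix timestamp in milliseconds; the UUID literal below is a made-up example, and this is an illustration rather than a proposed implementation.

    SELECT to_timestamp(
             (('x' || substr(replace(u::text, '-', ''), 1, 12))::bit(48)::bigint)::float8
             / 1000) AS embedded_ts
    FROM (VALUES ('018d61c0-2abc-7def-8123-456789abcdef'::uuid)) AS v(u);

This is the kind of snippet that ends up copied around, with or without bugs, when no core function is provided.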
Re: A performance issue with Memoize
David Rowley writes: > I'd feel better about doing it your way if Tom could comment on if > there was a reason he put the function calls that way around in > 5ebaaa494. Apologies for not having noticed this thread before. I'm taking a look at it now. However, while sniffing around this I found what seems like an oversight in paramassign.c's assign_param_for_var(): it says it should compare all the same fields as _equalVar except for varlevelsup, but it's failing to compare varnullingrels. Is that a bug? It's conceivable that it's not possible to get here with varnullingrels different and all else the same, but I don't feel good about that proposition. I tried adding @@ -91,7 +91,10 @@ assign_param_for_var(PlannerInfo *root, Var *var) pvar->vartype == var->vartype && pvar->vartypmod == var->vartypmod && pvar->varcollid == var->varcollid) +{ +Assert(bms_equal(pvar->varnullingrels, var->varnullingrels)); return pitem->paramId; +} } } This triggers no failures in the regression tests, but we know how little that proves. Anyway, that's just a side observation unrelated to the problem at hand. More later. regards, tom lane
Re: Emit fewer vacuum records by reaping removable tuples during pruning
On Thu, Jan 25, 2024 at 11:19 AM Melanie Plageman wrote: > Cool. I might add "successfully" or "fully" to "Either way, the page > hasn't been processed yet" I'm OK with that. > > > I think it would be nice to clarify this comment. I think the "when > > > there is little free space anyway" is referring to the case in > > > lazy_vacuum() where we set do_index_vacuuming to false because "there > > > are almost zero TIDs". I initially thought it was saying that in the > > > failsafe vacuum case the pages whose free space we wouldn't record in > > > the FSM have little free space anyway -- which I didn't get. But then > > > I looked at where we set do_index_vacuuming to false. > > > > Oh... wow. That's kind of confusing; somehow I was thinking we were > > talking about free space on the disk, rather than newly free space in > > pages that could be added to the FSM. > > Perhaps I misunderstood. I interpreted it to refer to the bypass optimization. I think you're probably correct. I just didn't realize what was meant. -- Robert Haas EDB: http://www.enterprisedb.com
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock
On Thu, Jan 25, 2024 at 11:22 AM Alvaro Herrera wrote: > Still with these auto-tuning GUCs, I noticed that the auto-tuning code > would continue to grow the buffer sizes with shared_buffers to > arbitrarily large values. I added an arbitrary maximum of 1024 (8 MB), > which is much higher than the current value of 128; but if you have > (say) 30 GB of shared_buffers (not uncommon these days), do you really > need 30MB of pg_clog cache? It seems mostly unnecessary ... and you can > still set it manually that way if you need it. So, largely I just > rewrote those small functions completely. Yeah, I think that if we're going to scale with shared_buffers, it should be capped. -- Robert Haas EDB: http://www.enterprisedb.com
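In other words, the capped auto-tuning being discussed has this general shape; only the 1024-buffer (8 MB) ceiling comes from the message above, while the divisor and the floor below are made-up placeholders.

    /*
     * Illustrative sketch only.  Derive an SLRU buffer count from
     * shared_buffers (both counted in 8 kB pages), but clamp it so that very
     * large shared_buffers settings do not produce an oversized SLRU cache.
     */
    static int
    autotune_slru_buffers(int shared_buffers_pages)
    {
        int     nbuffers = shared_buffers_pages / 512;

        if (nbuffers < 16)
            nbuffers = 16;          /* floor */
        if (nbuffers > 1024)
            nbuffers = 1024;        /* ceiling: 1024 * 8 kB = 8 MB */

        return nbuffers;
    }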
Re: [PATCH] Add native windows on arm64 support
On 2024-01-24 We 19:02, Michael Paquier wrote: On Wed, Jan 24, 2024 at 06:45:21AM -0500, Dave Cramer wrote: I managed to get it to build the vcvarsall arch needs to be x64. I need to add some options, but the patch above needs to be applied to build it. Nice. If I may ask, what kind of host and/or configuration have you used to reach a state where the code can be compiled and run tests with meson? If you have found specific steps, it may be a good thing to document that on the wiki, say around [1]. Perhaps you have not included TAP? It may be fine in terms of runtime checks and coverage. [1]:https://wiki.postgresql.org/wiki/PostgreSQL_Buildfarm_Howto#Running_on_Windows I now have an ARM64 machine, so I set up a W11 ARM64 VM. I think we really want to build with x64_arm64, i.e. to generate native arm64 binaries. Setting just x64 will not do that, AIUI. I tried that with the buidfarm, setting that in the config file's call to PGBuild::VSenv::getenv(). That upset msvc_gendef.pl, so I added this there to keep it happy: $arch = 'x86_64' if $arch eq 'aarch64'; After that things went ok until I got this: [1453/2088] "link" @src/backend/postgres.exe.rsp FAILED: src/backend/postgres.exe src/backend/postgres.pdb "link" @src/backend/postgres.exe.rsp Creating library src\backend\postgres.exe.lib storage_lmgr_s_lock.c.obj : error LNK2019: unresolved external symbol _mm_pause referenced in function perform_spin_delay src\backend\postgres.exe : fatal error LNK1120: 1 unresolved externals I haven't made further progress, but I will return to it in the next day or so. While this will be nice to have, I think it won't really matter until there is ARM64 support in released versions of Windows Server. AFAICT they still only sell versions for x86_64 cheers andrew -- Andrew Dunstan EDB:https://www.enterprisedb.com
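The link error is unsurprising in hindsight: _mm_pause() is an x86 SSE2 intrinsic, so an ARM64 build needs a different spin-wait hint in the code behind perform_spin_delay(). The following is only a hedged sketch of the kind of conditional that might be needed, using MSVC's documented ARM64 intrinsics; it is not the actual patch.

    #include <intrin.h>

    static inline void
    spin_delay_hint(void)
    {
    #if defined(_M_ARM64)
        __isb(_ARM64_BARRIER_SY);   /* instruction synchronization barrier as a pause */
    #elif defined(_M_AMD64) || defined(_M_IX86)
        _mm_pause();
    #endif
    }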
Re: A performance issue with Memoize
David Rowley writes: > I think fixing it your way makes sense. I don't really see any reason > why we should have two. However... > Another way it *could* be fixed would be to get rid of pull_paramids() > and change create_memoize_plan() to set keyparamids to all the param > IDs that match are equal() to each param_exprs. That way nodeMemoize > wouldn't purge the cache as we'd know the changing param is accounted > for in the cache. For the record, I don't think that's better, but it > scares me a bit less as I don't know what other repercussions there > are of applying your patch to get rid of the duplicate > NestLoopParam.paramval. > I'd feel better about doing it your way if Tom could comment on if > there was a reason he put the function calls that way around in > 5ebaaa494. I'm fairly sure I thought it wouldn't matter because of the Param de-duplication done in paramassign.c. However, Richard's example shows that's not so, because process_subquery_nestloop_params is picky about the param ID assigned to a particular Var while replace_nestloop_params is not. So flipping the order makes sense. I'd change the comment though, maybe to /* * Replace any outer-relation variables with nestloop params. * * We must provide nestloop params for both lateral references of * the subquery and outer vars in the scan_clauses. It's better * to assign the former first, because that code path requires * specific param IDs, while replace_nestloop_params can adapt * to the IDs assigned by process_subquery_nestloop_params. * This avoids possibly duplicating nestloop params when the * same Var is needed for both reasons. */ However ... it seems like we're not out of the woods yet. Why is Richard's proposed test case still showing + -> Memoize (actual rows=5000 loops=N) + Cache Key: t1.two, t1.two Seems like there is missing de-duplication logic, or something. > I also feel like we might be getting a bit close to the minor version > releases to be adjusting this stuff in back branches. Yeah, I'm not sure I would change this in the back branches. regards, tom lane
Re: [PATCH] Add native windows on arm64 support
On Thu, 25 Jan 2024 at 12:30, Andrew Dunstan wrote: > > On 2024-01-24 We 19:02, Michael Paquier wrote: > > On Wed, Jan 24, 2024 at 06:45:21AM -0500, Dave Cramer wrote: > > I managed to get it to build the vcvarsall arch needs to be x64. I need to > add some options, but the patch above needs to be applied to build it. > > Nice. If I may ask, what kind of host and/or configuration have you > used to reach a state where the code can be compiled and run tests > with meson? If you have found specific steps, it may be a good thing > to document that on the wiki, say around [1]. > > Perhaps you have not included TAP? It may be fine in terms of runtime > checks and coverage. > > [1]: > https://wiki.postgresql.org/wiki/PostgreSQL_Buildfarm_Howto#Running_on_Windows > > > > I now have an ARM64 machine, so I set up a W11 ARM64 VM. I think we really > want to build with x64_arm64, i.e. to generate native arm64 binaries. > Setting just x64 will not do that, AIUI. > > I tried that with the buidfarm, setting that in the config file's call to > PGBuild::VSenv::getenv(). > > That upset msvc_gendef.pl, so I added this there to keep it happy: > > $arch = 'x86_64' if $arch eq 'aarch64'; > > After that things went ok until I got this: > > [1453/2088] "link" @src/backend/postgres.exe.rsp > FAILED: src/backend/postgres.exe src/backend/postgres.pdb > "link" @src/backend/postgres.exe.rsp >Creating library src\backend\postgres.exe.lib > storage_lmgr_s_lock.c.obj : error LNK2019: unresolved external symbol > _mm_pause referenced in function perform_spin_delay > src\backend\postgres.exe : fatal error LNK1120: 1 unresolved externals > > > I haven't made further progress, but I will return to it in the next day > or so. > > While this will be nice to have, I think it won't really matter until > there is ARM64 support in released versions of Windows Server. AFAICT they > still only sell versions for x86_64 > I need it to build clients. The clients need arm64 libraries to link against Dave
Re: More new SQL/JSON item methods
Andrew Dunstan writes: > Thanks, I have pushed this. The buildfarm is pretty widely unhappy, mostly failing on select jsonb_path_query('1.23', '$.string()'); On a guess, I tried running that under valgrind, and behold it said ==00:00:00:05.637 435530== Conditional jump or move depends on uninitialised value(s) ==00:00:00:05.637 435530==at 0x8FD131: executeItemOptUnwrapTarget (jsonpath_exec.c:1547) ==00:00:00:05.637 435530==by 0x8FED03: executeItem (jsonpath_exec.c:626) ==00:00:00:05.637 435530==by 0x8FED03: executeNextItem (jsonpath_exec.c:1604) ==00:00:00:05.637 435530==by 0x8FCA58: executeItemOptUnwrapTarget (jsonpath_exec.c:956) ==00:00:00:05.637 435530==by 0x8FFDE4: executeItem (jsonpath_exec.c:626) ==00:00:00:05.637 435530==by 0x8FFDE4: executeJsonPath.constprop.30 (jsonpath_exec.c:612) ==00:00:00:05.637 435530==by 0x8FFF8C: jsonb_path_query_internal (jsonpath_exec.c:438) It's fairly obviously right about that: JsonbValuejbv; ... jb = &jbv; Assert(tmp != NULL);/* We must have set tmp above */ jb->val.string.val = (jb->type == jbvString) ? tmp : pstrdup(tmp); ^ Presumably, this is a mistaken attempt to test the type of the thing previously pointed to by "jb". On the whole, what I'd be inclined to do here is get rid of this test altogether and demand that every path through the preceding "switch" deliver a value that doesn't need pstrdup(). The only path that doesn't do that already is case jbvBool: tmp = (jb->val.boolean) ? "true" : "false"; break; and TBH I'm not sure that we really need a pstrdup there either. The constants are immutable enough. Is something likely to try to pfree the pointer later? I tried @@ -1544,7 +1544,7 @@ executeItemOptUnwrapTarget(JsonPathExecContext *cxt, JsonPathItem *jsp, jb = &jbv; Assert(tmp != NULL);/* We must have set tmp above */ - jb->val.string.val = (jb->type == jbvString) ? tmp : pstrdup(tmp); + jb->val.string.val = tmp; jb->val.string.len = strlen(jb->val.string.val); jb->type = jbvString; and that quieted valgrind for this particular query and still passes regression. (The reported crashes seem to be happening later during a recursive invocation, seemingly because JsonbType(jb) is returning garbage. So there may be another bug after this one.) regards, tom lane
Re: [PATCH] Add native windows on arm64 support
On 2024-01-25 Th 08:45, Dave Cramer wrote: I tried running my buildfarm using my git repo and my branch, but get the following error Status Line: 492 bad branch parameter Content: bad branch parameter fix_arm Web txn failed with status: 1 You can't use your own branch with the buildfarm, you need to let it set up its own branches. cheers andrew -- Andrew Dunstan EDB:https://www.enterprisedb.com
Re: More new SQL/JSON item methods
On 2024-01-25 Th 14:31, Tom Lane wrote: Andrew Dunstan writes: Thanks, I have pushed this. The buildfarm is pretty widely unhappy, mostly failing on select jsonb_path_query('1.23', '$.string()'); On a guess, I tried running that under valgrind, and behold it said ==00:00:00:05.637 435530== Conditional jump or move depends on uninitialised value(s) ==00:00:00:05.637 435530==at 0x8FD131: executeItemOptUnwrapTarget (jsonpath_exec.c:1547) ==00:00:00:05.637 435530==by 0x8FED03: executeItem (jsonpath_exec.c:626) ==00:00:00:05.637 435530==by 0x8FED03: executeNextItem (jsonpath_exec.c:1604) ==00:00:00:05.637 435530==by 0x8FCA58: executeItemOptUnwrapTarget (jsonpath_exec.c:956) ==00:00:00:05.637 435530==by 0x8FFDE4: executeItem (jsonpath_exec.c:626) ==00:00:00:05.637 435530==by 0x8FFDE4: executeJsonPath.constprop.30 (jsonpath_exec.c:612) ==00:00:00:05.637 435530==by 0x8FFF8C: jsonb_path_query_internal (jsonpath_exec.c:438) It's fairly obviously right about that: JsonbValuejbv; ... jb = &jbv; Assert(tmp != NULL);/* We must have set tmp above */ jb->val.string.val = (jb->type == jbvString) ? tmp : pstrdup(tmp); ^ Presumably, this is a mistaken attempt to test the type of the thing previously pointed to by "jb". On the whole, what I'd be inclined to do here is get rid of this test altogether and demand that every path through the preceding "switch" deliver a value that doesn't need pstrdup(). The only path that doesn't do that already is case jbvBool: tmp = (jb->val.boolean) ? "true" : "false"; break; and TBH I'm not sure that we really need a pstrdup there either. The constants are immutable enough. Is something likely to try to pfree the pointer later? I tried @@ -1544,7 +1544,7 @@ executeItemOptUnwrapTarget(JsonPathExecContext *cxt, JsonPathItem *jsp, jb = &jbv; Assert(tmp != NULL);/* We must have set tmp above */ - jb->val.string.val = (jb->type == jbvString) ? tmp : pstrdup(tmp); + jb->val.string.val = tmp; jb->val.string.len = strlen(jb->val.string.val); jb->type = jbvString; and that quieted valgrind for this particular query and still passes regression. (The reported crashes seem to be happening later during a recursive invocation, seemingly because JsonbType(jb) is returning garbage. So there may be another bug after this one.) Argh! Will look, thanks. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Relation bulk write facility
On 10/01/2024 18:17, Robert Haas wrote: I think we should try to pick prefixes that are one or more words rather than using word fragments. bulkw is an awkward prefix even for people whose first language is English, and probably more awkward for others. Renamed the 'bulkw' variables to 'bulkstate, and the functions to have smgr_bulk_* prefix. I was tempted to use just bulk_* as the prefix, but I'm afraid e.g. bulk_write() is too generic. On 22/01/2024 07:50, Peter Smith wrote: Hi, This patch has a CF status of "Needs Review" [1], but it seems there was a CFbot test failure last time it was run [2]. Please have a look and post an updated version if necessary. Fixed the headerscheck failure by adding appropriate #includes. -- Heikki Linnakangas Neon (https://neon.tech) From c2e8cff9326fb874b2e1643f5c3c8a4952eaa3ac Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas Date: Thu, 25 Jan 2024 21:40:46 +0200 Subject: [PATCH v5 1/1] Introduce a new bulk loading facility. The new facility makes it easier to optimize bulk loading, as the logic for buffering, WAL-logging, and syncing the relation only needs to be implemented once. It's also less error-prone: We have had a number of bugs in how a relation is fsync'd - or not - at the end of a bulk loading operation. By centralizing that logic to one place, we only need to write it correctly once. The new facility is faster for small relations: Instead of of calling smgrimmedsync(), we register the fsync to happen at next checkpoint, which avoids the fsync latency. That can make a big difference if you are e.g. restoring a schema-only dump with lots of relations. It is also slightly more efficient with large relations, as the WAL logging is performed multiple pages at a time. That avoids some WAL header overhead. The sorted GiST index build did that already, this moves the buffering to the new facility. The changes to pageinspect GiST test needs an explanation: Before this patch, the sorted GiST index build set the LSN on every page to the special GistBuildLSN value, not the LSN of the WAL record, even though they were WAL-logged. There was no particular need for it, it just happened naturally when we wrote out the pages before WAL-logging them. Now we WAL-log the pages first, like in B-tree build, so the pages are stamped with the record's real LSN. When the build is not WAL-logged, we still use GistBuildLSN. To make the test output predictable, use an unlogged index. 
Reviewed-by: Andres Freund Discussion: https://www.postgresql.org/message-id/30e8f366-58b3-b239-c521-422122dd5150%40iki.fi --- contrib/pageinspect/expected/gist.out | 14 +- contrib/pageinspect/sql/gist.sql | 16 +- src/backend/access/gist/gistbuild.c | 122 +++ src/backend/access/heap/rewriteheap.c | 72 ++ src/backend/access/nbtree/nbtree.c| 33 +-- src/backend/access/nbtree/nbtsort.c | 135 src/backend/access/spgist/spginsert.c | 49 ++--- src/backend/catalog/storage.c | 46 ++-- src/backend/storage/smgr/Makefile | 1 + src/backend/storage/smgr/bulk_write.c | 303 ++ src/backend/storage/smgr/md.c | 45 +++- src/backend/storage/smgr/meson.build | 1 + src/backend/storage/smgr/smgr.c | 31 +++ src/include/storage/bulk_write.h | 40 src/include/storage/md.h | 1 + src/include/storage/smgr.h| 1 + src/tools/pgindent/typedefs.list | 3 + 17 files changed, 558 insertions(+), 355 deletions(-) create mode 100644 src/backend/storage/smgr/bulk_write.c create mode 100644 src/include/storage/bulk_write.h diff --git a/contrib/pageinspect/expected/gist.out b/contrib/pageinspect/expected/gist.out index d1adbab8ae2..2b1d54a6279 100644 --- a/contrib/pageinspect/expected/gist.out +++ b/contrib/pageinspect/expected/gist.out @@ -1,13 +1,6 @@ --- The gist_page_opaque_info() function prints the page's LSN. Normally, --- that's constant 1 (GistBuildLSN) on every page of a freshly built GiST --- index. But with wal_level=minimal, the whole relation is dumped to WAL at --- the end of the transaction if it's smaller than wal_skip_threshold, which --- updates the LSNs. Wrap the tests on gist_page_opaque_info() in the --- same transaction with the CREATE INDEX so that we see the LSNs before --- they are possibly overwritten at end of transaction. -BEGIN; --- Create a test table and GiST index. -CREATE TABLE test_gist AS SELECT point(i,i) p, i::text t FROM +-- The gist_page_opaque_info() function prints the page's LSN. +-- Use an unlogged index, so that the LSN is predictable. +CREATE UNLOGGED TABLE test_gist AS SELECT point(i,i) p, i::text t FROM generate_series(1,1000) i; CREATE INDEX test_gist_idx ON test_gist USING gist (p); -- Page 0 is the root, the rest are leaf pages @@ -29,7 +22,6 @@ SELECT * FROM gist_page_opaque_info(get_raw_page('test_gist_idx', 2)); 0/1 | 0/0 | 1 | {leaf} (1 row) -COMMIT; SELECT * FROM gist_page_items(get_raw_page
Re: More new SQL/JSON item methods
On 2024-01-25 Th 14:31, Tom Lane wrote: Andrew Dunstan writes: Thanks, I have pushed this. The buildfarm is pretty widely unhappy, mostly failing on select jsonb_path_query('1.23', '$.string()'); On a guess, I tried running that under valgrind, and behold it said ==00:00:00:05.637 435530== Conditional jump or move depends on uninitialised value(s) ==00:00:00:05.637 435530==at 0x8FD131: executeItemOptUnwrapTarget (jsonpath_exec.c:1547) ==00:00:00:05.637 435530==by 0x8FED03: executeItem (jsonpath_exec.c:626) ==00:00:00:05.637 435530==by 0x8FED03: executeNextItem (jsonpath_exec.c:1604) ==00:00:00:05.637 435530==by 0x8FCA58: executeItemOptUnwrapTarget (jsonpath_exec.c:956) ==00:00:00:05.637 435530==by 0x8FFDE4: executeItem (jsonpath_exec.c:626) ==00:00:00:05.637 435530==by 0x8FFDE4: executeJsonPath.constprop.30 (jsonpath_exec.c:612) ==00:00:00:05.637 435530==by 0x8FFF8C: jsonb_path_query_internal (jsonpath_exec.c:438) It's fairly obviously right about that: JsonbValuejbv; ... jb = &jbv; Assert(tmp != NULL);/* We must have set tmp above */ jb->val.string.val = (jb->type == jbvString) ? tmp : pstrdup(tmp); ^ Presumably, this is a mistaken attempt to test the type of the thing previously pointed to by "jb". On the whole, what I'd be inclined to do here is get rid of this test altogether and demand that every path through the preceding "switch" deliver a value that doesn't need pstrdup(). The only path that doesn't do that already is case jbvBool: tmp = (jb->val.boolean) ? "true" : "false"; break; and TBH I'm not sure that we really need a pstrdup there either. The constants are immutable enough. Is something likely to try to pfree the pointer later? I tried @@ -1544,7 +1544,7 @@ executeItemOptUnwrapTarget(JsonPathExecContext *cxt, JsonPathItem *jsp, jb = &jbv; Assert(tmp != NULL);/* We must have set tmp above */ - jb->val.string.val = (jb->type == jbvString) ? tmp : pstrdup(tmp); + jb->val.string.val = tmp; jb->val.string.len = strlen(jb->val.string.val); jb->type = jbvString; and that quieted valgrind for this particular query and still passes regression. Your fix looks sane. I also don't see why we need the pstrdup. (The reported crashes seem to be happening later during a recursive invocation, seemingly because JsonbType(jb) is returning garbage. So there may be another bug after this one.) I don't think so. AIUI The first call deals with the '$' and the second one deals with the '.string()', which is why we see the error on the second call. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: More new SQL/JSON item methods
Andrew Dunstan writes: > On 2024-01-25 Th 14:31, Tom Lane wrote: >> (The reported crashes seem to be happening later during a >> recursive invocation, seemingly because JsonbType(jb) is >> returning garbage. So there may be another bug after this one.) > I don't think so. AIUI The first call deals with the '$' and the second > one deals with the '.string()', which is why we see the error on the > second call. There's something else going on, because I'm still getting the assertion failure on my Mac with this fix in place. Annoyingly, it goes away if I compile with -O0, so it's kind of hard to identify what's going wrong. regards, tom lane
Re: proposal: psql: show current user in prompt
Hi On Thu, 11 Jan 2024 at 12:12, Jelte Fennema-Nio wrote: > On Tue, 12 Sept 2023 at 09:46, Peter Eisentraut > wrote: > > ISTM that for a purpose like pgbouncer, it would be simpler to add a new > > GUC "report these variables" and send that in the startup message? That > > might not help with the psql use case, but it would be much simpler. > > FYI I implemented it that way yesterday on this other thread (patch > 0010 of that patchset): > > https://www.postgresql.org/message-id/flat/CAGECzQScQ3N-Ykv2j4NDyDtrPPc3FpRoa%3DLZ-2Uj2ocA4zr%3D4Q%40mail.gmail.com#cd9e8407820d492e8f677ee6a67c21ce I read your patch, and I see some advantages and some disadvantages. 1. This doesn't need a new protocol API just for this feature, which is nice. 2. Using a single GUC to list all reported GUCs doesn't look very readable. Maybe it should be used just for customized reporting, not for the defaults. 3. Another issue with your proposal is that enabling or disabling reporting of a specific GUC is less friendly. Maintaining a list needs more work than just enabling or disabling one specific GUC. I think this is the main disadvantage of your proposal. In my proposal I don't need to know the state of any GUC; I just send PQlinkParameterStatus or PQunlinkParameterStatus. With your proposal, I need to read _pq_.report_parameters, parse it, modify it, and send it back. This doesn't look very practical. Personally I would prefer a more generic API than my PQlinkParameterStatus and PQunlinkParameterStatus. You talked about it with Robert, if I remember correctly. It would be nice if I could write just /* a similar principle is used inside ncurses */ set_report_guc_message_no = PQgetMessageNo("set_report_guc"); /* the result can be processed by the server and by all proxies on the line */ if (set_report_guc_message_no == -1) fprintf(stderr, "feature is not supported"); result = PQsendMessage(set_report_guc_message, "user"); if (result == -1) fprintf(stderr, "some error ..."); With an API like this it would be easier to make small protocol enhancements. Maybe this is overengineering. Protocol enhancements are not too common, and with PQsendTypedCommand an enhancement is less work too. Regards Pavel
Re: [PATCH] Add native windows on arm64 support
On Thu, 25 Jan 2024 at 14:31, Andrew Dunstan wrote: > > On 2024-01-25 Th 08:45, Dave Cramer wrote: > > > > > > I tried running my buildfarm using my git repo and my branch, but get the > following error > Status Line: 492 bad branch parameter > Content: > bad branch parameter fix_arm > > Web txn failed with status: 1 > > > > You can't use your own branch with the buildfarm, you need to let it set > up its own branches. > So I guess I have to wait until this patch is merged in ? Dave
Re: More new SQL/JSON item methods
I wrote: > There's something else going on, because I'm still getting the > assertion failure on my Mac with this fix in place. Annoyingly, > it goes away if I compile with -O0, so it's kind of hard to > identify what's going wrong. No, belay that: I must've got confused about which version I was testing. It's very unclear to me why the undefined reference causes the preceding Assert to misbehave, but that is clearly what's happening. Compiler bug maybe? My Mac has clang 15.0.0, and the unhappy buildfarm members are also late-model clang. Anyway, I did note that the preceding line res = jperOk; is dead code and might as well get removed while you're at it. regards, tom lane
[Doc] Improvements to ddl.sgl Privileges Section and Glossary
Hey, In a nearby user complaint email [1] some missing information regarding ownership reassignment came to light. I took that and went a bit further to add what I felt was further missing information and context for how the privilege system is designed. I've tried to formalize and label existing concepts a bit and updated the glossary accordingly. The attached is a partial rewrite of the patch on the linked post after those comments were taken into consideration. The new glossary entry for privileges defines various qualifications of the term that are used in the new prose. I've marked up each of the variants and point them all back to the main entry. I didn't try to incorporate those terms, or even really look, anywhere else in the documentation. If the general idea is accepted that kind of work can be done as a follow-up. David J. [1] https://www.postgresql.org/message-id/d294818d12280f6223ddf169ab5454927f5186b6.camel%40cybertec.at From a4d6a599a0b5d6b8f280e3d8489e7f4a4a555383 Mon Sep 17 00:00:00 2001 From: "David G. Johnston" Date: Thu, 25 Jan 2024 13:41:48 -0700 Subject: [PATCH] v1-improvements-to-ddl-priv Section --- doc/src/sgml/ddl.sgml | 109 ++--- doc/src/sgml/glossary.sgml | 40 ++ 2 files changed, 105 insertions(+), 44 deletions(-) diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml index fc03a349f0..7c9c9d0dd1 100644 --- a/doc/src/sgml/ddl.sgml +++ b/doc/src/sgml/ddl.sgml @@ -1855,12 +1855,33 @@ ALTER TABLE products RENAME TO items; - When an object is created, it is assigned an owner. The - owner is normally the role that executed the creation statement. - For most kinds of objects, the initial state is that only the owner - (or a superuser) can do anything with the object. To allow - other roles to use it, privileges must be - granted. + The permissions needed to interact with an object are split between + privileges and + owner rights. + The owner has the right to modify the object, including dropping it, as well as + the right to grant and revoke privileges on the object to any role in the system. + Aside from a brief description, at the end, of what happens when you reassign the + owner of an object to some other role, this section describes privileges. + + + + For most databases there are many roles and many objects, and thus a large amount + of privileges that need to be defined. Two features exist to make management of + privileges easier: role membership (i.e., group roles, see ) + including the PUBLIC pseudo-role (which is a group role consisting + of all roles in the system) is one, while + default privileges + (see ) is the other. + + + + The fundamental design of the privilege system is to disallow by default. A role + must, directly or indirectly (via group role inheritance), hold the correct privilege + on the object to peform the corresponding action. + Furthermore, inheritance is strictly additive, there is no mechanism to block an + indirect privilege. + Revoking a privilege only removes + direct privileges. @@ -1878,21 +1899,14 @@ ALTER TABLE products RENAME TO items; - The right to modify or destroy an object is inherent in being the - object's owner, and cannot be granted or revoked in itself. - (However, like all privileges, that right can be inherited by - members of the owning role; see .) 
- - - - An object can be assigned to a new owner with an ALTER - command of the appropriate kind for the object, for example - -ALTER TABLE table_name OWNER TO new_owner; - - Superusers can always do this; ordinary roles can only do it if they are - both the current owner of the object (or inherit the privileges of the - owning role) and able to SET ROLE to the new owning role. + Upon object creation, two sets of + implicit privileges + are established. + The owner is granted all applicable privileges on the object. All other roles, + including the PUBLIC pseudo-role, get their default privileges. + From this point onward only + explicit priviliege + modification is possible, and is described next. @@ -1904,15 +1918,9 @@ ALTER TABLE table_name OWNER TO new_owne GRANT UPDATE ON accounts TO joe; Writing ALL in place of a specific privilege grants all - privileges that are relevant for the object type. - - - - The special role name PUBLIC can - be used to grant a privilege to every role on the system. Also, - group roles can be set up to help manage privileges when - there are many users of a database — for details see - . + privileges that are relevant for the object type. The pseudo-role + PUBLIC is a valid role for this purpose. All roles in the system + will immediately inherit the indirect privilege(s) named in the command. @@ -1936,9 +1944,11 @@ REVOKE ALL ON accounts FROM PUBLIC; An object's owner can choose to revoke
Re: [PATCH] Add native windows on arm64 support
On Thu, 25 Jan 2024 at 12:30, Andrew Dunstan wrote: > > On 2024-01-24 We 19:02, Michael Paquier wrote: > > On Wed, Jan 24, 2024 at 06:45:21AM -0500, Dave Cramer wrote: > > I managed to get it to build the vcvarsall arch needs to be x64. I need to > add some options, but the patch above needs to be applied to build it. > > Nice. If I may ask, what kind of host and/or configuration have you > used to reach a state where the code can be compiled and run tests > with meson? If you have found specific steps, it may be a good thing > to document that on the wiki, say around [1]. > > Perhaps you have not included TAP? It may be fine in terms of runtime > checks and coverage. > > [1]: > https://wiki.postgresql.org/wiki/PostgreSQL_Buildfarm_Howto#Running_on_Windows > > > > I now have an ARM64 machine, so I set up a W11 ARM64 VM. I think we really > want to build with x64_arm64, i.e. to generate native arm64 binaries. > Setting just x64 will not do that, AIUI. > > I tried that with the buidfarm, setting that in the config file's call to > PGBuild::VSenv::getenv(). > > That upset msvc_gendef.pl, so I added this there to keep it happy: > > $arch = 'x86_64' if $arch eq 'aarch64'; > > After that things went ok until I got this: > > [1453/2088] "link" @src/backend/postgres.exe.rsp > FAILED: src/backend/postgres.exe src/backend/postgres.pdb > "link" @src/backend/postgres.exe.rsp >Creating library src\backend\postgres.exe.lib > storage_lmgr_s_lock.c.obj : error LNK2019: unresolved external symbol > _mm_pause referenced in function perform_spin_delay > src\backend\postgres.exe : fatal error LNK1120: 1 unresolved externals > > > I haven't made further progress, but I will return to it in the next day > or so. > > While this will be nice to have, I think it won't really matter until > there is ARM64 support in released versions of Windows Server. AFAICT they > still only sell versions for x86_64 > I've tried it with my patch attached previously and x64_arm64 and it works fine. builds using the buildfarm as well. Is there a definitive way to figure out if the binaries are x64_arm64 Dave
Re: More new SQL/JSON item methods
On 2024-01-25 Th 15:33, Tom Lane wrote: Andrew Dunstan writes: On 2024-01-25 Th 14:31, Tom Lane wrote: (The reported crashes seem to be happening later during a recursive invocation, seemingly because JsonbType(jb) is returning garbage. So there may be another bug after this one.) I don't think so. AIUI The first call deals with the '$' and the second one deals with the '.string()', which is why we see the error on the second call. There's something else going on, because I'm still getting the assertion failure on my Mac with this fix in place. Annoyingly, it goes away if I compile with -O0, so it's kind of hard to identify what's going wrong. Curiouser and curiouser. On my Mac the error is manifest but the fix you suggested cures it. Built with -O2 -g, clang 15.0.0, Apple Silicon. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: [PATCH] Add native windows on arm64 support
Hi David, Unix "file" or "dumpbin /headers" in vcvarsall are your best bets. Thanks, Anthony On Thu, 25 Jan 2024, 21:01 Dave Cramer, wrote: > > > On Thu, 25 Jan 2024 at 12:30, Andrew Dunstan wrote: > >> >> On 2024-01-24 We 19:02, Michael Paquier wrote: >> >> On Wed, Jan 24, 2024 at 06:45:21AM -0500, Dave Cramer wrote: >> >> I managed to get it to build the vcvarsall arch needs to be x64. I need to >> add some options, but the patch above needs to be applied to build it. >> >> Nice. If I may ask, what kind of host and/or configuration have you >> used to reach a state where the code can be compiled and run tests >> with meson? If you have found specific steps, it may be a good thing >> to document that on the wiki, say around [1]. >> >> Perhaps you have not included TAP? It may be fine in terms of runtime >> checks and coverage. >> >> [1]: >> https://wiki.postgresql.org/wiki/PostgreSQL_Buildfarm_Howto#Running_on_Windows >> >> >> >> I now have an ARM64 machine, so I set up a W11 ARM64 VM. I think we >> really want to build with x64_arm64, i.e. to generate native arm64 >> binaries. Setting just x64 will not do that, AIUI. >> >> I tried that with the buidfarm, setting that in the config file's call to >> PGBuild::VSenv::getenv(). >> >> That upset msvc_gendef.pl, so I added this there to keep it happy: >> >> $arch = 'x86_64' if $arch eq 'aarch64'; >> >> After that things went ok until I got this: >> >> [1453/2088] "link" @src/backend/postgres.exe.rsp >> FAILED: src/backend/postgres.exe src/backend/postgres.pdb >> "link" @src/backend/postgres.exe.rsp >>Creating library src\backend\postgres.exe.lib >> storage_lmgr_s_lock.c.obj : error LNK2019: unresolved external symbol >> _mm_pause referenced in function perform_spin_delay >> src\backend\postgres.exe : fatal error LNK1120: 1 unresolved externals >> >> >> I haven't made further progress, but I will return to it in the next day >> or so. >> >> While this will be nice to have, I think it won't really matter until >> there is ARM64 support in released versions of Windows Server. AFAICT they >> still only sell versions for x86_64 >> > > I've tried it with my patch attached previously and x64_arm64 and it works > fine. builds using the buildfarm as well. > > Is there a definitive way to figure out if the binaries are x64_arm64 > > Dave >
Re: Remove unused fields in ReorderBufferTupleBuf
On Thu, 2024-01-25 at 16:16 +0300, Aleksander Alekseev wrote: > Hi, > > > > Here is the corrected patch. > > > > Thank you for updating the patch! I have some comments: > > Thanks for the review. > > > -tuple = (ReorderBufferTupleBuf *) > > +tuple = (HeapTuple) > > MemoryContextAlloc(rb->tup_context, > > - > > sizeof(ReorderBufferTupleBuf) + > > + HEAPTUPLESIZE + > > MAXIMUM_ALIGNOF + > > alloc_len); > > -tuple->alloc_tuple_size = alloc_len; > > -tuple->tuple.t_data = ReorderBufferTupleBufData(tuple); > > +tuple->t_data = (HeapTupleHeader)((char *)tuple + HEAPTUPLESIZE); > > > > Why do we need to add MAXIMUM_ALIGNOF? since HEAPTUPLESIZE is the > > MAXALIGN'd size, it seems we don't need it. heap_form_tuple() does a > > similar thing but it doesn't add it. > > Indeed. I gave it a try and nothing crashed, so it appears that > MAXIMUM_ALIGNOF is not needed. > > > --- > > xl_parameter_change *xlrec = > > -(xl_parameter_change *) > > XLogRecGetData(buf->record); > > +(xl_parameter_change *) > > XLogRecGetData(buf->record); > > > > Unnecessary change. > > That's pgindent. Fixed. > > > --- > > -/* > > - * Free a ReorderBufferTupleBuf. > > - */ > > -void > > -ReorderBufferReturnTupleBuf(ReorderBuffer *rb, ReorderBufferTupleBuf > > *tuple) > > -{ > > -pfree(tuple); > > -} > > - > > > > Why does ReorderBufferReturnTupleBuf need to be moved from > > reorderbuffer.c to reorderbuffer.h? It seems not related to this > > refactoring patch so I think we should do it in a separate patch if we > > really want it. I'm not sure it's necessary, though. > > OK, fixed. > I walked through v6 and didn't note any issues. I do want to ask, the patch alters ReorderBufferReturnTupleBuf() and drops the unused parameter ReorderBuffer *rb. It seems that ReorderBufferFreeSnap(), ReorderBufferReturnRelids(), ReorderBufferImmediateInvalidation(), and ReorderBufferRestoreCleanup() also pass ReorderBuffer *rb but do not use it. Would it be beneficial to implement a separate patch to remove this parameter from these functions also? Thanks, Reid
Re: [PATCH] Add native windows on arm64 support
On 2024-01-25 Th 15:56, Dave Cramer wrote: On Thu, 25 Jan 2024 at 14:31, Andrew Dunstan wrote: On 2024-01-25 Th 08:45, Dave Cramer wrote: I tried running my buildfarm using my git repo and my branch, but get the following error Status Line: 492 bad branch parameter Content: bad branch parameter fix_arm Web txn failed with status: 1 You can't use your own branch with the buildfarm, you need to let it set up its own branches. So I guess I have to wait until this patch is merged in ? Not quite. There's a switch that lets you test your own code. If you say --from-source /path/to/repo it will run in whatever state the repo is in. It won't care what branch you are on, and it won't try to update the repo. It won't report the results to the server, but it will build and test everything just like in a regular run. (On the "eat your own dog food" principle I use this mode a lot.) If you use that mode you probably also want to specify --verbose as well. cheers andrew -- Andrew Dunstan EDB:https://www.enterprisedb.com
Re: [PATCH] Add native windows on arm64 support
On Thu, 25 Jan 2024 at 16:04, Anthony Roberts wrote: > Hi David, > > Unix "file" or "dumpbin /headers" in vcvarsall are your best bets. > > Thanks, > Anthony > So there is another way: select the file in Windows Explorer and right-click; in the Compatibility tab, if "Windows on ARM" is greyed out, it is an ARM binary. So far mine are not :( Thanks, Dave
Re: More new SQL/JSON item methods
On 2024-01-25 Th 15:58, Tom Lane wrote: I wrote: There's something else going on, because I'm still getting the assertion failure on my Mac with this fix in place. Annoyingly, it goes away if I compile with -O0, so it's kind of hard to identify what's going wrong. No, belay that: I must've got confused about which version I was testing. It's very unclear to me why the undefined reference causes the preceding Assert to misbehave, but that is clearly what's happening. Compiler bug maybe? My Mac has clang 15.0.0, and the unhappy buildfarm members are also late-model clang. Anyway, I did note that the preceding line res = jperOk; is dead code and might as well get removed while you're at it. OK, pushed those. Thanks. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: [PATCH] Add native windows on arm64 support
On 2024-01-25 Th 16:17, Dave Cramer wrote: On Thu, 25 Jan 2024 at 16:04, Anthony Roberts wrote: Hi David, Unix "file" or "dumpbin /headers" in vcvarsall are your best bets. Thanks, Anthony So there is another way, select the file in Windows Explorer and right click, in the compatibility tab if the "Windows on ARM" is greyed out it is an arm binary. So far mine are not :( Yeah, I think the default Developer Command Prompt for VS2022 is set up for x86 builds. AIUI you should start by executing "vcvarsall x64_arm64". cheers andrew -- Andrew Dunstan EDB:https://www.enterprisedb.com
Re: Use of backup_label not noted in log
On 1/25/24 09:29, Michael Banck wrote: Hi, On Thu, Jan 25, 2024 at 08:56:52AM -0400, David Steele wrote: I would still advocate for a back patch here. It is frustrating to get logs from users that just say: LOG: invalid checkpoint record PANIC: could not locate a valid checkpoint record It would be very helpful to know what the checkpoint record LSN was in this case. I agree. Another thing to note here -- knowing the LSN is important but also knowing that backup recovery was attempted (i.e. backup_label exists) is really crucial. Knowing both just saves so much time in back and forth debugging. It appears the tally for back patching is: For: Andres, David, Michael B Not Sure: Robert, Laurenz, Michael P It seems at least nobody is dead set against it. Regards, -David
Re: Guiding principle for dropping LLVM versions?
On Thu, Jan 25, 2024 at 4:44 PM Thomas Munro wrote: > ... A few build farm animals will > now fail in the configure step as discussed, and need some adjustment > (ie disable LLVM or upgrade to LLVM 10+ for the master branch). Owners pinged.
Re: Use of backup_label not noted in log
David Steele writes: > Another thing to note here -- knowing the LSN is important but also > knowing that backup recovery was attempted (i.e. backup_label exists) is > really crucial. Knowing both just saves so much time in back and forth > debugging. > It appears the tally for back patching is: > For: Andres, David, Michael B > Not Sure: Robert, Laurenz, Michael P > It seems at least nobody is dead set against it. We're talking about 1d35f705e, right? That certainly looks harmless and potentially useful. I'm +1 for back-patching. regards, tom lane
Re: Use of backup_label not noted in log
On 1/25/24 17:42, Tom Lane wrote: David Steele writes: Another thing to note here -- knowing the LSN is important but also knowing that backup recovery was attempted (i.e. backup_label exists) is really crucial. Knowing both just saves so much time in back and forth debugging. It appears the tally for back patching is: For: Andres, David, Michael B Not Sure: Robert, Laurenz, Michael P It seems at least nobody is dead set against it. We're talking about 1d35f705e, right? That certainly looks harmless and potentially useful. I'm +1 for back-patching. That's the one. If we were modifying existing messages I would be against it, but new, infrequent (but oh so helpful) messages seem fine. Regards, -David
Re: [PATCH] Add native windows on arm64 support
On Thu, 25 Jan 2024 at 16:32, Andrew Dunstan wrote: > > On 2024-01-25 Th 16:17, Dave Cramer wrote: > > > > On Thu, 25 Jan 2024 at 16:04, Anthony Roberts > wrote: > >> Hi David, >> >> Unix "file" or "dumpbin /headers" in vcvarsall are your best bets. >> >> Thanks, >> Anthony >> > > > So there is another way, select the file in Windows Explorer and right > click, in the compatibility tab if the "Windows on ARM" is greyed out it is > an arm binary. > > So far mine are not :( > > > Yeah, I think the default Developer Command Prompt for VS2022 is set up > for x86 builds. AIUI you should start by executing "vcvarsall x64_arm64". > Yup, now I'm in the same state you are Dave
Re: Guiding principle for dropping LLVM versions?
On 1/25/24 13:41, Thomas Munro wrote: On Thu, Jan 25, 2024 at 4:44 PM Thomas Munro wrote: ... A few build farm animals will now fail in the configure step as discussed, and need some adjustment (ie disable LLVM or upgrade to LLVM 10+ for the master branch). Owners pinged. I think I fixed up the 4 or 6 under my name... Regards, Mark
Re: speed up a logical replica setup
On Tue, Jan 23, 2024, at 10:29 PM, Euler Taveira wrote: > I'll post a new one soon. I'm attaching another patch that fixes some of the issues pointed out by Hayato, Shlok, and Junwang. * publication doesn't exist. The analysis [1] was done by Hayato but I didn't use the proposed patch. Instead I refactored the code a bit [2] and call it from a new function (setup_publisher) that is called before the promotion. * fix wrong path name in the initial comment [3] * change terminology: logical replica -> physical replica [3] * primary / standby is ready for logical replication? setup_publisher() and setup_subscriber() check if required GUCs are set accordingly. For primary, it checks wal_level = logical, max_replication_slots has remain replication slots for the proposed setup and also max_wal_senders available. For standby, it checks max_replication_slots for replication origin and also remain number of background workers to start the subscriber. * retain option: I extracted this one from Hayato's patch [4] * target server must be a standby. It seems we agree that this restriction simplifies the code a bit but can be relaxed in the future (if/when base backup support is added.) * recovery timeout option: I decided to include it but I think the use case is too narrow. It helps in broken setups. However, it can be an issue in some scenarios like time-delayed replica, large replication lag, slow hardware, slow network. I didn't use the proposed patch [5]. Instead, I came up with a simple one that defaults to forever. The proposed one defaults to 60 seconds but I'm afraid that due to one of the scenarios I said in a previous sentence, we cancel a legitimate case. Maybe we should add a message during dry run saying that due to a replication lag, it will take longer to run. * refactor primary_slot_name code. With the new setup_publisher and setup_subscriber functions, I splitted the function that detects the primary_slot_name use into 2 pieces just to avoid extra connections to have the job done. * remove fallback_application_name as suggested by Hayato [5] because logical replication already includes one. I'm still thinking about replacing --subscriber-conninfo with separate items (username, port, password?, host = socket dir). Maybe it is an overengineering. The user can always prepare the environment to avoid unwanted and/or external connections. I didn't change the name from pg_subscriber to pg_createsubscriber yet but if I didn't hear objections about it, I'll do it in the next patch. [1] https://www.postgresql.org/message-id/TY3PR01MB9889C5D55206DDD978627D07F5752%40TY3PR01MB9889.jpnprd01.prod.outlook.com [2] https://www.postgresql.org/message-id/73ab86ca-3fd5-49b3-9c80-73d1525202f1%40app.fastmail.com [3] https://www.postgresql.org/message-id/TY3PR01MB9889678E47B918F4D83A6FD8F57B2%40TY3PR01MB9889.jpnprd01.prod.outlook.com [4] https://www.postgresql.org/message-id/TY3PR01MB9889678E47B918F4D83A6FD8F57B2%40TY3PR01MB9889.jpnprd01.prod.outlook.com [5] https://www.postgresql.org/message-id/TY3PR01MB9889593399165B9A04106741F5662%40TY3PR01MB9889.jpnprd01.prod.outlook.com -- Euler Taveira EDB https://www.enterprisedb.com/ From d9ef01a806c3d8697faa444283f19c2deaa58850 Mon Sep 17 00:00:00 2001 From: Euler Taveira Date: Mon, 5 Jun 2023 14:39:40 -0400 Subject: [PATCH v9] Creates a new logical replica from a standby server A new tool called pg_subscriber can convert a physical replica into a logical replica. 
It runs on the target server and should be able to connect to the source server (publisher) and the target server (subscriber). The conversion requires a few steps. Check if the target data directory has the same system identifier than the source data directory. Stop the target server if it is running as a standby server. Create one replication slot per specified database on the source server. One additional replication slot is created at the end to get the consistent LSN (This consistent LSN will be used as (a) a stopping point for the recovery process and (b) a starting point for the subscriptions). Write recovery parameters into the target data directory and start the target server (Wait until the target server is promoted). Create one publication (FOR ALL TABLES) per specified database on the source server. Create one subscription per specified database on the target server (Use replication slot and publication created in a previous step. Don't enable the subscriptions yet). Sets the replication progress to the consistent LSN that was got in a previous step. Enable the subscription for each specified database on the target server. Remove the additional replication slot that was used to get the consistent LSN. Stop the target server. Change the system identifier from the target server. Depending on your workload and database size, creating a logical replica couldn't be an option due to resource constraints (WAL backlog should be available until all table data i
Re: Make COPY format extendable: Extract COPY TO format implementations
On Thu, Jan 25, 2024 at 05:45:43PM +0900, Sutou Kouhei wrote: > In > "Re: Make COPY format extendable: Extract COPY TO format implementations" > on Thu, 25 Jan 2024 12:17:55 +0900, > Michael Paquier wrote: >> +extern CopyToRoutine CopyToRoutineText; >> +extern CopyToRoutine CopyToRoutineCSV; >> +extern CopyToRoutine CopyToRoutineBinary; >> >> All that should IMO remain in copyto.c and copyfrom.c in the initial >> patch doing the refactoring. Why not using a fetch function instead >> that uses a string in input? Then you can call that once after >> parsing the List of options in ProcessCopyOptions(). > > OK. How about the following for the fetch function > signature? > > extern CopyToRoutine *GetBuiltinCopyToRoutine(const char *format); Or CopyToRoutineGet()? I am not wedded to my suggestion, got a bad history with naming things around here. > We may introduce an enum and use it: > > typedef enum CopyBuiltinFormat > { > COPY_BUILTIN_FORMAT_TEXT = 0, > COPY_BUILTIN_FORMAT_CSV, > COPY_BUILTIN_FORMAT_BINARY, > } CopyBuiltinFormat; > > extern CopyToRoutine *GetBuiltinCopyToRoutine(CopyBuiltinFormat format); I am not sure that this is necessary as the option value is a string. > Oh, sorry. I assumed that the comment style was adjusted by > pgindent. No worries, that's just something we get used to. I tend to fix a lot of these things by myself when editing patches. >> +getTypeBinaryOutputInfo(attr->atttypid, &out_func_oid, &isvarlena); >> +fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]); >> >> Actually, this split is interesting. It is possible for a custom >> format to plug in a custom set of out functions. Did you make use of >> something custom for your own stuff? > > I didn't. My PoC custom COPY format handler for Apache Arrow > just handles integer and text for now. It doesn't use > cstate->out_functions because cstate->out_functions may not > return a valid binary format value for Apache Arrow. So it > formats each value by itself. I mean, if you use a custom output function, you could tweak things even more with byteas or such.. If a callback is expected to do something, like setting the output function OIDs in the start callback, we'd better document it rather than letting that be implied. >> Actually, could it make sense to >> split the assignment of cstate->out_functions into its own callback? > > Yes. Because we need to use getTypeBinaryOutputInfo() for > "binary" and use getTypeOutputInfo() for "text" and "csv". Okay. After sleeping on it, a split makes sense here, because it also reduces the presence of TupleDesc in the start callback. >> Sure, that's part of the start phase, but at least it would make clear >> that a custom method *has* to assign these OIDs to work. The patch >> implies that as a rule, without a comment that CopyToStart *must* set >> up these OIDs. > > CopyToStart doesn't need to set up them if the handler > doesn't use cstate->out_functions. Noted. >> I think that 0001 and 0005 should be handled first, as pieces >> independent of the rest. Then we could move on with 0002~0004 and >> 0006~0008. > > OK. I'll focus on 0001 and 0005 for now. I'll restart > 0002-0004/0006-0008 after 0001 and 0005 are accepted. Once you get these, I'd be interested in re-doing an evaluation of COPY TO and more tests with COPY FROM while running Postgres on scissors. 
One thing I was thinking to use here is my blackhole_am for COPY FROM: https://github.com/michaelpq/pg_plugins/tree/main/blackhole_am As per its name, it does nothing on INSERT, so you could create a table using it as access method, and stress the COPY FROM execution paths without having to mount Postgres on a tmpfs because the data is sent to the void. Perhaps it does not matter, but that moves the tests to the bottlenecks we want to stress (aka the per-row callback for large data sets). I've switched the patch as waiting on author for now. Thanks for your perseverance here. I understand that's not easy to follow up with patches and reviews (^_^;) -- Michael signature.asc Description: PGP signature
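To make that benchmarking recipe concrete, the setup would look roughly like the following, assuming the extension installs a table access method of the same name; the table definition and file path are placeholders.

    -- blackhole_am is the external extension linked above, not something
    -- shipped with PostgreSQL.
    CREATE EXTENSION blackhole_am;
    CREATE TABLE copy_sink (a int, b text) USING blackhole_am;

    -- Rows are discarded on insert, so the run time is dominated by the
    -- per-row COPY FROM machinery rather than by storage.
    COPY copy_sink FROM '/path/to/large-dataset.csv' WITH (FORMAT csv);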
Re: speed up a logical replica setup
On Thu, Jan 25, 2024, at 6:05 AM, Hayato Kuroda (Fujitsu) wrote: > 01. > ``` > /* Options */ > static char *pub_conninfo_str = NULL; > static SimpleStringList database_names = {NULL, NULL}; > static int wait_seconds = DEFAULT_WAIT; > static bool retain = false; > static bool dry_run = false; > ``` > > Just to confirm - is there a policy why we store the specified options? If you > want to store as global ones, username and port should follow (my fault...). > Or, should we have a structure to store them? It is a matter of style I would say. Check other client applications. Some of them also use global variable. There are others that group options into a struct. I would say that since it has a short lifetime, I don't think the current style is harmful. > 04. > ``` > {"dry-run", no_argument, NULL, 'n'}, > ``` > > I'm not sure why the dry_run mode exists. In terms pg_resetwal, it shows the > which value would be changed based on the input. As for the pg_upgrade, it > checks > whether the node can be upgraded for now. I think, we should have the checking > feature, so it should be renamed to --check. Also, the process should exit > earlier > at that time. It is extremely useful because (a) you have a physical replication setup and don't know if it is prepared for logical replication, (b) check GUCs (is max_wal_senders sufficient for this pg_subscriber command? Or is max_replication_slots sufficient to setup the logical replication even though I already have some used replication slots?), (c) connectivity and (d) credentials. > 05. > I felt we should accept some settings from enviroment variables, like > pg_upgrade. > Currently, below items should be acceted. > > - data directory > - username > - port > - timeout Maybe PGDATA. > 06. > ``` > pg_logging_set_level(PG_LOG_WARNING); > ``` > > If the default log level is warning, there are no ways to output debug logs. > (-v option only raises one, so INFO would be output) > I think it should be PG_LOG_INFO. You need to specify multiple -v options. > 07. > Can we combine verifications into two functions, e.g., check_primary() and > check_standby/check_subscriber()? I think v9 does it. > 08. > Not sure, but if we want to record outputs by pg_subscriber, the sub-directory > should be created. The name should contain the timestamp. The log file already contains the timestamp. Why? > 09. > Not sure, but should we check max_slot_wal_keep_size of primary server? It can > avoid to fail starting of logical replicaiton. A broken physical replication *before* running this tool is its responsibility? Hmm. We might add another check that can be noticed during dry run mode. > 10. > ``` > nslots_new = nslots_old + dbarr.ndbs; > > if (nslots_new > max_replication_slots) > { > pg_log_error("max_replication_slots (%d) must be greater than or equal to " > "the number of replication slots required (%d)", max_replication_slots, > nslots_new); > exit(1); > } > ``` > > I think standby server must not have replication slots. Because subsequent > pg_resetwal command discards all the WAL file, so WAL records pointed by them > are removed. Currently pg_resetwal does not raise ERROR at that time. Again, dry run mode might provide a message for it. > 11. > ``` > /* > * Stop the subscriber if it is a standby server. Before executing the > * transformation steps, make sure the subscriber is not running because > * one of the steps is to modify some recovery parameters that require a > * restart. 
> */ > if (stat(pidfile, &statbuf) == 0) > ``` > > I kept it just in case, but I'm not sure it is still needed. What do you think? > Removing it would avoid the inclusion of pidfile.h. Are you suggesting another way to check if the standby is up and running? > 12. > ``` > pg_ctl_cmd = psprintf("\"%s/pg_ctl\" stop -D \"%s\" -s", > standby.bindir, standby.pgdata); > rc = system(pg_ctl_cmd); > pg_ctl_status(pg_ctl_cmd, rc, 0); > ``` > > > There are two places to stop the instance. Can you divide it into a function? Yes. > 13. > ``` > * A temporary replication slot is not used here to avoid keeping a > * replication connection open (depending when base backup was taken, the > * connection should be open for a few hours). > */ > conn = connect_database(primary.base_conninfo, dbarr.perdb[0].dbname); > if (conn == NULL) > exit(1); > consistent_lsn = create_logical_replication_slot(conn, true, > &dbarr.perdb[0]); > ``` > > I didn't notice the comment, but I'm still not sure of the reason. Why must we > reserve > the slot until pg_subscriber finishes? IIUC, the slot would never be used; it > is created only for getting a consistent_lsn, so we do not have to keep it. > Also, just before this, logical replication slots for each database are created, > so > WAL records are surely reserved. This comment needs to be updated. It was written at the time I was pursuing base backup support too. It doesn't matter if you remove this transient replication slot earlier because all of the replication slots cr
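To illustrate the subscriber-side GUC and slot checks discussed above (point (b) of the dry-run reply and comment 10), here is a rough SQL sketch of the kind of information a dry run could report; the exact set of checks is an assumption for illustration, not what the patch currently implements:

```
-- Settings that bound logical replication setup on the target server.
SELECT name, setting
FROM pg_settings
WHERE name IN ('wal_level', 'max_replication_slots', 'max_wal_senders');

-- Replication slots already in use, to compare against max_replication_slots.
SELECT count(*) AS used_slots FROM pg_replication_slots;
```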
Re: pg_column_toast_chunk_id: a function to get a chunk ID of a TOASTed value
On Tue, 2 Jan 2024 08:00:00 +0800 jian he wrote: > On Mon, Nov 6, 2023 at 8:00 AM jian he wrote: > > > > minor doc issues. > > Returns the chunk id of the TOASTed value, or NULL if the value is not > > TOASTed. > > Should it be "chunk_id"? Thank you for your suggestion. As you pointed out, it is called "chunk_id" in the documentation, so I rewrote it and also added a link to the section where the TOAST table structure is explained. > > you may place it after the pg_create_logical_replication_slot entry to > > keep the alphabetical order. I've been thinking about where we should place the function in the doc, and I decided to place it in the table of Database Object Size Functions because I think pg_column_toast_chunk_id would also assist in understanding the result of size functions, similar to pg_column_compression; that is, those functions can explain why a large value could be stored in a column. > > There is no test. maybe we can add the following to > > src/test/regress/sql/misc.sql > > create table val(t text); > > INSERT into val(t) SELECT string_agg( > > chr((ascii('B') + round(random() * 25)) :: integer),'') > > FROM generate_series(1,2500); > > select pg_column_toast_chunk_id(t) is not null from val; > > drop table val; Thank you for the test proposal. However, if we add a test, I want to check that the chunk_id returned by the function exists in the TOAST table, and that it returns NULL if the value is not TOASTed. For this purpose, I wrote a test using dynamic SQL, since the name of the TOAST table has to be generated from the main table's OID. > Hi > the main C function (pg_column_toast_chunk_id) I didn't change. > I added tests as mentioned above. > the tests are put in src/test/regress/sql/misc.sql, i hope that's fine. > I placed pg_column_toast_chunk_id in "Table 9.99. Database Object > Location Functions" (below Table 9.98. Database Object Size > Functions). I could not find any change in your patch from my previous patch. Maybe you attached the wrong file. I attached a patch updated based on your review, including the documentation fixes and a test. What do you think about it? Regards, Yugo Nagata -- Yugo NAGATA >From 97af1b2300ecf07a34923da87a9d84e6aa963956 Mon Sep 17 00:00:00 2001 From: Yugo Nagata Date: Wed, 29 Mar 2023 09:59:25 +0900 Subject: [PATCH v3] Add pg_column_toast_chunk_id function This function returns the chunk_id of an on-disk TOASTed value, or NULL if the value is un-TOASTed or not on disk. This enables users to know which columns are actually TOASTed. This function is also useful to identify a problematic row when an error like "ERROR: unexpected chunk number ... (expected ...) for toast value" occurs. --- doc/src/sgml/func.sgml | 17 ++ src/backend/utils/adt/varlena.c | 41 + src/include/catalog/pg_proc.dat | 3 +++ 3 files changed, 61 insertions(+) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 210c7c0b02..2d82331323 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -28078,6 +28078,23 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset + + + + pg_column_toast_chunk_id + +pg_column_toast_chunk_id ( "any" ) +oid + + +Shows the chunk_id of an on-disk +TOASTed value. Returns NULL +if the value is un-TOASTed or not on-disk. +See for details about +TOAST. 
+ + + diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c index 543afb66e5..84d36781a4 100644 --- a/src/backend/utils/adt/varlena.c +++ b/src/backend/utils/adt/varlena.c @@ -5105,6 +5105,47 @@ pg_column_compression(PG_FUNCTION_ARGS) PG_RETURN_TEXT_P(cstring_to_text(result)); } +/* + * Return the chunk_id of the on-disk TOASTed value. + * Return NULL if the value is unTOASTed or not on disk. + */ +Datum +pg_column_toast_chunk_id(PG_FUNCTION_ARGS) +{ + int typlen; + struct varlena *attr; + struct varatt_external toast_pointer; + + /* On first call, get the input type's typlen, and save at *fn_extra */ + if (fcinfo->flinfo->fn_extra == NULL) + { + /* Lookup the datatype of the supplied argument */ + Oid argtypeid = get_fn_expr_argtype(fcinfo->flinfo, 0); + + typlen = get_typlen(argtypeid); + if (typlen == 0) /* should not happen */ + elog(ERROR, "cache lookup failed for type %u", argtypeid); + + fcinfo->flinfo->fn_extra = MemoryContextAlloc(fcinfo->flinfo->fn_mcxt, + sizeof(int)); + *((int *) fcinfo->flinfo->fn_extra) = typlen; + } + else + typlen = *((int *) fcinfo->flinfo->fn_extra); + + if (typlen != -1) + PG_RETURN_NULL(); + + attr = (struct varlena *) DatumGetPointer(PG_GETARG_DATUM(0)); + + if (!VARATT_IS_EXTERNAL_ONDISK(attr)) + PG_RETURN_NULL(); + + VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr); + + PG_RETURN_OID(toast_pointer.va_va
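To sketch the test idea described above (checking that the returned chunk_id actually exists in the table's TOAST table), something along these lines can be run once the patch is applied; the table name and the DO-block form are only for illustration and are not the exact regression test:

```
CREATE TABLE toast_test (t text);
-- Force out-of-line storage so the value is reliably TOASTed.
ALTER TABLE toast_test ALTER COLUMN t SET STORAGE EXTERNAL;
INSERT INTO toast_test
    SELECT string_agg(md5(i::text), '') FROM generate_series(1, 1000) i;

DO $$
DECLARE
    toastrel oid;
    found    bool;
BEGIN
    -- The TOAST table's name depends on the main table's OID, hence dynamic SQL.
    SELECT reltoastrelid INTO toastrel
    FROM pg_class WHERE oid = 'toast_test'::regclass;

    EXECUTE format(
        'SELECT EXISTS (SELECT 1 FROM %s WHERE chunk_id = '
        '(SELECT pg_column_toast_chunk_id(t) FROM toast_test))',
        toastrel::regclass)
    INTO found;

    RAISE NOTICE 'chunk_id found in TOAST table: %', found;
END
$$;
```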
Rename setup_cancel_handler in pg_dump
Hi, Attached is a simple patch to rename setup_cancel_handler() in pg_dump/parallel.c. I am proposing it because there is a public function with the same name in fe_utils/cancel.c. I know pg_dump/parallel.c does not include fe_utils/cancel.h, so there is no conflict, but I think it is better to use different names to reduce possible confusion. I guess there were no concerns when setup_cancel_handler in pg_dump/parallel.c was introduced, because a function with the same name that could be shared between client tools did not yet exist in fe_utils. The public setup_cancel_handler in fe_utils was introduced in a4fd3aa719e, where this function was moved from psql. Regards, Yugo Nagata -- Yugo NAGATA diff --git a/src/bin/pg_dump/parallel.c b/src/bin/pg_dump/parallel.c index 188186829c..261b23cb3f 100644 --- a/src/bin/pg_dump/parallel.c +++ b/src/bin/pg_dump/parallel.c @@ -204,7 +204,7 @@ static ParallelSlot *GetMyPSlot(ParallelState *pstate); static void archive_close_connection(int code, void *arg); static void ShutdownWorkersHard(ParallelState *pstate); static void WaitForTerminatingWorkers(ParallelState *pstate); -static void setup_cancel_handler(void); +static void pg_dump_setup_cancel_handler(void); static void set_cancel_pstate(ParallelState *pstate); static void set_cancel_slot_archive(ParallelSlot *slot, ArchiveHandle *AH); static void RunWorker(ArchiveHandle *AH, ParallelSlot *slot); @@ -550,7 +550,7 @@ sigTermHandler(SIGNAL_ARGS) /* * Some platforms allow delivery of new signals to interrupt an active * signal handler. That could muck up our attempt to send PQcancel, so - * disable the signals that setup_cancel_handler enabled. + * disable the signals that pg_dump_setup_cancel_handler enabled. */ pqsignal(SIGINT, SIG_IGN); pqsignal(SIGTERM, SIG_IGN); @@ -605,7 +605,7 @@ sigTermHandler(SIGNAL_ARGS) * Enable cancel interrupt handler, if not already done. */ static void -setup_cancel_handler(void) +pg_dump_setup_cancel_handler(void) { /* * When forking, signal_info.handler_set will propagate into the new @@ -705,7 +705,7 @@ consoleHandler(DWORD dwCtrlType) * Enable cancel interrupt handler, if not already done. */ static void -setup_cancel_handler(void) +pg_dump_setup_cancel_handler(void) { if (!signal_info.handler_set) { @@ -737,7 +737,7 @@ set_archive_cancel_info(ArchiveHandle *AH, PGconn *conn) * important that this happen at least once before we fork off any * threads. */ - setup_cancel_handler(); + pg_dump_setup_cancel_handler(); /* * On Unix, we assume that storing a pointer value is atomic with respect
Re: Use of backup_label not noted in log
On Thu, Jan 25, 2024 at 08:56:52AM -0400, David Steele wrote: > I would still advocate for a back patch here. It is frustrating to get logs > from users that just say: > > LOG: invalid checkpoint record > PANIC: could not locate a valid checkpoint record > > It would be very helpful to know what the checkpoint record LSN was in this > case. Yes, I've puzzled over this one in the past when debugging corruption issues. To me, this would just mean appending an "at %X/%X" to the PANIC, but perhaps you have more in mind for these code paths? -- Michael signature.asc Description: PGP signature
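As a side note, while the server is up the checkpoint and redo LSNs recorded in pg_control can already be inspected from SQL (after a crash that fails to recover, one would have to fall back to pg_controldata on the data directory instead). A minimal example:

```
-- Checkpoint locations currently recorded in pg_control.
SELECT checkpoint_lsn, redo_lsn, timeline_id
FROM pg_control_checkpoint();
```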
RE: Fix inappropriate comments in function ReplicationSlotAcquire
On Thu, Jan 25, 2024 at 20:33 Amit Kapila wrote: > On Thu, Jan 25, 2024 at 4:01 PM Wei Wang (Fujitsu) > wrote: > > > > > > Yes, agreed. I think these two parts have become slightly outdated after the > > commit 1632ea4. So I also tried to fix the first part of the comment. > > Attached the new patch. > > > > How about changing it to something simple like: > diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c > index f2781d0455..84c257a7aa 100644 > --- a/src/backend/replication/slot.c > +++ b/src/backend/replication/slot.c > @@ -465,10 +465,7 @@ retry: > > LWLockAcquire(ReplicationSlotControlLock, LW_SHARED); > > - /* > -* Search for the slot with the specified name if the slot to acquire > is > -* not given. If the slot is not found, we either return -1 or > error out. > -*/ > +/* Check if the slot exists with the given name. */ > s = SearchNamedReplicationSlot(name, false); > if (s == NULL || !s->in_use) > { It looks good to me. So, I updated the patch as suggested. Regards, Wang Wei v3-0001-Fix-inappropriate-comments-in-function-Replicatio.patch Description: v3-0001-Fix-inappropriate-comments-in-function-Replicatio.patch
Re: [PATCH] Add native windows on arm64 support
On Thu, Jan 25, 2024 at 04:52:30PM -0500, Dave Cramer wrote: > On Thu, 25 Jan 2024 at 16:32, Andrew Dunstan wrote: >> On 2024-01-25 Th 16:17, Dave Cramer wrote: >> Yeah, I think the default Developer Command Prompt for VS2022 is set up >> for x86 builds. AIUI you should start by executing "vcvarsall x64_arm64". > > Yup, now I'm in the same state you are Wait a minute here. Based on [1], x64_arm64 means you can use an x64 host and you'll be able to produce ARM64 builds, but these will not be able to run on the host where they were built. How much of the patch posted upthread is required to produce such builds? Basically everything from it, I guess, so that build dependencies can be satisfied? [1]: https://learn.microsoft.com/en-us/cpp/build/building-on-the-command-line?view=msvc-170 -- Michael signature.asc Description: PGP signature
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Wed, 20 Dec 2023 at 19:17, Jelte Fennema-Nio wrote: > > On Thu, 14 Dec 2023 at 13:57, Jelte Fennema-Nio wrote: > > I changed all the places that were not adhering to those spellings. > > It seems I forgot a /g on my sed command to do this, so it turned out I > missed one, which caused the test to fail to compile... Attached is a > fixed version. > > I also updated the patchset to use the EOF detection provided by > 0a5c46a7a488f2f4260a90843bb9de6c584c7f4e instead of introducing a new > way of EOF detection using a -2 return value. CFBot shows that the patch does not apply anymore as in [1]: patching file doc/src/sgml/libpq.sgml ... patching file src/interfaces/libpq/exports.txt Hunk #1 FAILED at 191. 1 out of 1 hunk FAILED -- saving rejects to file src/interfaces/libpq/exports.txt.rej patching file src/interfaces/libpq/fe-connect.c Please post an updated version. [1] - http://cfbot.cputube.org/patch_46_3511.log Regards, Vignesh
Re: Printing backtrace of postgres processes
On Sun, 5 Nov 2023 at 01:49, Bharath Rupireddy wrote: > > On Thu, Jul 20, 2023 at 8:22 PM Daniel Gustafsson wrote: > > > > > On 11 Jan 2023, at 15:44, Bharath Rupireddy > > > wrote: > > > > > > On Wed, Nov 30, 2022 at 11:43 AM Bharath Rupireddy > > > wrote: > > >> > > >> I'm attaching the v22 patch set for further review. > > > > > > Needed a rebase due to 216a784829c2c5f03ab0c43e009126cbb819e9b2. > > > Attaching v23 patch set for further review. > > > > This thread has stalled for well over 6 months with the patch going from CF > > to > > CF. From skimming the thread it seems that a lot of the details have been > > ironed out with most (all?) objections addressed. Is any committer > > interested > > in picking this up? If not we should probably mark it returned with > > feedback. > > Rebase needed due to function oid clash. Picked the new OID with the > help of src/include/catalog/unused_oids. PSA v24 patch set. Rebase needed due to changes in parallel_schedule. PSA v25 patch set. Regards, Vignesh From 32fa1d898b2599b836a2265f746acf230ffb358e Mon Sep 17 00:00:00 2001 From: Bharath Rupireddy Date: Sat, 4 Nov 2023 18:06:36 + Subject: [PATCH v25 1/2] Move sending multiplexed-SIGUSR1 signal code to a function Add a new function hosting the common code for sending multiplexed-SIGUSR1 signal to a backend process. This function will also be used as-is by an upcoming commit reducing the code duplication. --- src/backend/storage/ipc/procarray.c | 60 + src/backend/utils/adt/mcxtfuncs.c | 49 ++- src/include/storage/procarray.h | 1 + 3 files changed, 64 insertions(+), 46 deletions(-) diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c index ee2d7f8585..f6465529e7 100644 --- a/src/backend/storage/ipc/procarray.c +++ b/src/backend/storage/ipc/procarray.c @@ -3097,6 +3097,66 @@ HaveVirtualXIDsDelayingChkpt(VirtualTransactionId *vxids, int nvxids, int type) return result; } +/* + * SendProcSignalBackendOrAuxproc -- check if the process with given pid is a + * backend or an auxiliary process and send it the SIGUSR1 signal for a given + * reason. + * + * Returns true if sending the signal was successful, false otherwise. + */ +bool +SendProcSignalBackendOrAuxproc(int pid, ProcSignalReason reason) +{ + PGPROC *proc; + BackendId backendId = InvalidBackendId; + + proc = BackendPidGetProc(pid); + + /* + * See if the process with given pid is a backend or an auxiliary process. + * + * If the given process is a backend, use its backend id in + * SendProcSignal() later to speed up the operation. Otherwise, don't do + * that because auxiliary processes (except the startup process) don't + * have a valid backend id. + */ + if (proc != NULL) + backendId = proc->backendId; + else + proc = AuxiliaryPidGetProc(pid); + + /* + * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid + * isn't valid; but by the time we reach kill(), a process for which we + * get a valid proc here might have terminated on its own. There's no way + * to acquire a lock on an arbitrary process to prevent that. But since + * this mechanism is usually used to debug a backend or an auxiliary + * process running and consuming lots of memory or a long running process, + * that it might end on its own first and its memory contexts are not + * logged or backtrace not logged is not a problem. + */ + if (proc == NULL) + { + /* + * This is just a warning so a loop-through-resultset will not abort + * if one backend terminated on its own during the run. 
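For context, the function whose code is being moved into SendProcSignalBackendOrAuxproc() in patch 0001 is already exposed at the SQL level and can be exercised like this (the literal PID is a made-up example and must belong to a running backend or auxiliary process):

```
-- Ask our own backend to dump its memory contexts to the server log.
SELECT pg_log_backend_memory_contexts(pg_backend_pid());

-- Or target another process by PID; 12345 is only a placeholder.
SELECT pg_log_backend_memory_contexts(12345);
```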
+ */ + ereport(WARNING, +(errmsg("PID %d is not a PostgreSQL server process", pid))); + return false; + } + + if (SendProcSignal(pid, reason, backendId) < 0) + { + /* Again, just a warning to allow loops */ + ereport(WARNING, +(errmsg("could not send signal to process %d: %m", pid))); + return false; + } + + return true; +} + /* * BackendPidGetProc -- get a backend's PGPROC given its PID * diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c index 4708d73f5f..f7b4f8dac1 100644 --- a/src/backend/utils/adt/mcxtfuncs.c +++ b/src/backend/utils/adt/mcxtfuncs.c @@ -145,51 +145,8 @@ Datum pg_log_backend_memory_contexts(PG_FUNCTION_ARGS) { int pid = PG_GETARG_INT32(0); - PGPROC *proc; - BackendId backendId = InvalidBackendId; + bool result; - proc = BackendPidGetProc(pid); - - /* - * See if the process with given pid is a backend or an auxiliary process. - * - * If the given process is a backend, use its backend id in - * SendProcSignal() later to speed up the operation. Otherwise, don't do - * that because auxiliary processes (except the startup process) don't - * have a valid backend id. - */ - if (proc != NULL) - backendId = proc->backendId; - else - proc = AuxiliaryPidGetProc(pid); - - /* - * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid - * isn't valid; but by the time w