Re: XLog size reductions: Reduced XLog record header size for PG17
On Tue, Sep 19, 2023 at 12:07:07PM +0200, Matthias van de Meent wrote:
> V5 is a rebased version of v4, and includes the latest patch from
> "smaller XLRec block header" [0] as 0001.

0001 and 0007 are the meat of the changes.

-#define XLR_CHECK_CONSISTENCY 0x02
+#define XLR_CHECK_CONSISTENCY (0x20)

I can't help but notice that there are a few stylistic choices like this one
that are part of the patch.  Using parentheses in the case of hexa values is
inconsistent with the usual practices I've seen in the tree.

 #define COPY_HEADER_FIELD(_dst, _size)\
 	do {\
-		if (remaining < _size)\
+		if (remaining < (_size))\
 			goto shortdata_err;\

There are a couple of stylistic changes like this one, that I guess could
just use their own patch to make these macros easier to use.

-#define XLogRecGetInfo(decoder) ((decoder)->record->header.xl_info)
+#define XLogRecGetInfo(decoder) ((decoder)->record->header.xl_info & XLR_INFO_MASK)
+#define XLogRecGetRmgrInfo(decoder) (((decoder)->record->header.xl_info) & XLR_RMGR_INFO_MASK)

This stuff in 0002 is independent of 0001, am I right?  Doing this split
with an extra macro is okay by me, reducing the presence of XLR_INFO_MASK
and bitwise operations based on it.

0003 is also mechanical, but if you begin to enforce the use of
XLR_RMGR_INFO_MASK as the bits allowed to be passed down to the RMGR
identity callback, we should have at least a validity check to make sure
that nothing, even custom RMGRs, passes down unexpected bits.

I am not convinced that XLOG_INCLUDE_XID is a good interface, TBH, and I
fear that people are going to forget to set it.  Wouldn't it be better to
use an option where the XID is excluded instead, making the inclusion of
an XID the default?

> The resource manager has ID = 0, thus requiring some special
> handling in other code.  Apart from being generally useful, it is
> used in future patches to detect the end of WAL in lieu of a zero-ed
> fixed-size xl_tot_len field.

Err, no, that may not be true.  See for example this thread, where the topic
is improving the checks of xl_tot_len so as to rely on this value only once
a record header has been validated, even across page borders:
https://www.postgresql.org/message-id/17928-aa92416a70ff4...@postgresql.org

Except that, in which cases could an invalid RMGR be useful?
--
Michael
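For illustration, here is a rough sketch of the kind of validity check
suggested above for the RMGR identity callback path.  This is not code from
the patch; the placement and the error wording are assumptions, and only
XLR_RMGR_INFO_MASK is taken from the discussion:

	/*
	 * Hypothetical sanity check: reject any info bits outside
	 * XLR_RMGR_INFO_MASK before handing them to the resource manager's
	 * identity callback, so that even custom RMGRs cannot pass down
	 * unexpected bits.
	 */
	if ((info & ~XLR_RMGR_INFO_MASK) != 0)
		elog(ERROR, "unexpected rmgr info bits 0x%02X",
			 (unsigned int) (info & ~XLR_RMGR_INFO_MASK));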
Re: Oversight in reparameterize_path_by_child leading to executor crash
On 23/8/2023 12:37, Richard Guo wrote:
> If we go with the "tablesample scans can't be reparameterized" approach in
> the back branches, I'm a little concerned about what happens if we find
> more cases in the future where we need to modify RTEs for
> reparameterization.  So I spent some time seeking and have managed to find
> one: there might be lateral references in a scan path's restriction
> clauses, and currently reparameterize_path_by_child fails to adjust them.

It may help you somehow: in [1], we designed a feature where the
partitionwise join technique can be applied to a JOIN of partitioned and
non-partitioned tables.  Unfortunately, it has fallen out of community
discussions, but we still support it for sharding usage - it is helpful for
the implementation of 'global' tables in a distributed configuration.  And
there we got stuck on the same problem with lateral relids adjustment.  So
you can build a more general view of the problem with this patch.

[1] Asymmetric partition-wise JOIN
https://www.postgresql.org/message-id/flat/CAOP8fzaVL_2SCJayLL9kj5pCA46PJOXXjuei6-3aFUV45j4LJQ%40mail.gmail.com

--
regards, Andrey Lepikhov
Postgres Professional
Re: disfavoring unparameterized nested loops
On 29/9/2022 21:32, Benjamin Coutu wrote:
> I'd like to revamp this important discussion.
>
> As is well described in this fairly recent paper,
> https://www.vldb.org/pvldb/vol9/p204-leis.pdf (which also looks at
> Postgres), "estimation errors quickly grow as the number of joins
> increases, and that these errors are usually the reason for bad plans" - I
> think we can all get behind that statement.
>
> While nested loop joins work great when cardinality estimates are correct,
> they are notoriously bad when the optimizer underestimates, and they
> degrade very fast in such cases - the good vs. bad here is very asymmetric.
> On the other hand, hash joins degrade much more gracefully - they are
> considered very robust against underestimation.  The above-mentioned paper
> illustrates that all major DBMSs (including Postgres) tend to
> underestimate, and usually that underestimation increases drastically with
> the number of joins (see Figures 3+4 of the paper).
>
> Now, a simple approach to guarding against bad plans that arise from
> underestimation could be to use what I would call a
> nested-loop-conviction-multiplier based on the current depth of the join
> tree, e.g. for a base table that multiplier would obviously be 1, but would
> then grow (e.g.) quadratically.  That conviction-multiplier would *NOT* be
> used to skew the cardinality estimates themselves, but rather be applied to
> the overall nested loop join cost at each particular stage of the plan when
> comparing it to other more robust join strategies like hash or sort-merge
> joins.  That way, when we can be sure to have a good estimate at the bottom
> of the join tree, we treat all things equal, but favor nested loops less
> and less as we move up the join tree, for the sake of robustness.
>
> Also, we can expand the multiplier whenever we fall back to using the
> default cardinality constant, as surely all bets are off at that point - we
> should definitely treat nested loop joins as out of favor in this instance,
> and that could easily be incorporated by simply increasing the
> conviction-multiplier.
>
> What are your thoughts on this simple idea - is it perhaps too simple?

In my practice, a parameterized nested loop reduces execution time, sometimes
drastically.  If your query touches a lot of tables but extracts only a tiny
part of the data, and you have good coverage by indexes, PNL works great.
Moreover, I have pondered extending parameterization through subqueries and
groupings.

What could you say about a different way: hybrid join?  In MS SQL Server,
they have such a feature [1], and, according to their description, it
requires low overhead.  They start from HashJoin and switch to NestLoop if
the inner input contains too few tuples.  It solves the issue, doesn't it?

[1]
https://techcommunity.microsoft.com/t5/sql-server-blog/introducing-batch-mode-adaptive-joins/ba-p/385411

--
regards, Andrey Lepikhov
Postgres Professional
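To make the conviction-multiplier idea quoted above a bit more concrete, here
is a minimal sketch of how such a penalty could be applied when costing a
nested loop.  Every name here is hypothetical; this is not planner code from
any patch, just an illustration of scaling the cost rather than the row
estimate:

static Cost
apply_nestloop_conviction_multiplier(Cost nestloop_total_cost,
									 int join_depth,
									 bool used_default_selectivity)
{
	/* grow the penalty with the depth of the join tree, e.g. quadratically */
	double		multiplier = 1.0 + 0.1 * join_depth * join_depth;

	/* be even more pessimistic when a default selectivity was used */
	if (used_default_selectivity)
		multiplier *= 2.0;

	/* the cardinality estimates themselves are left untouched */
	return nestloop_total_cost * multiplier;
}

The multiplier would only matter when comparing the nested loop against more
robust strategies such as hash or merge joins; a nested loop at the bottom of
the join tree (depth 0) would be costed exactly as today.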
Re: Show WAL write and fsync stats in pg_stat_io
Hi,

Thanks for the review!

Current status of the patch is:
- IOOBJECT_WAL / IOCONTEXT_NORMAL read, write and fsync stats are added.
- IOOBJECT_WAL / IOCONTEXT_NORMAL write and fsync tests are added.
- IOOBJECT_WAL / IOCONTEXT_INIT stats are added.
- pg_stat_io shows different op_bytes for the IOOBJECT_WAL operations.
- Working on which 'BackendType / IOContext / IOOp' should be banned in
  pg_stat_io.
- PendingWalStats.wal_sync and PendingWalStats.wal_write_time /
  PendingWalStats.wal_sync_time are moved to pgstat_count_io_op_n() /
  pgstat_count_io_op_time() respectively.

TODOs:
- Documentation.
- Try to set op_bytes for BackendType / IOContext.
- Decide which 'BackendType / IOContext / IOOp' should not be tracked.
- Add IOOBJECT_WAL / IOCONTEXT_NORMAL read tests.
- Add IOOBJECT_WAL / IOCONTEXT_INIT tests.

I am adding tracking of BackendType / IOContext / IOOp as tables; an empty
cell means it is not decided yet.

IOCONTEXT_NORMAL / Backend / IOOp table:

  IOCONTEXT_NORMAL       read    write   fsync
  autovacuum launcher    FALSE   FALSE   FALSE
  autovacuum worker      FALSE   TRUE    TRUE
  client backend                 TRUE    TRUE
  background worker
  background writer              TRUE    TRUE
  checkpointer                   TRUE    TRUE
  standalone backend     TRUE    TRUE    TRUE
  startup                TRUE
  walreceiver
  walsender
  walwriter                      TRUE    TRUE

IOCONTEXT_WAL_INIT / Backend / IOOp table:

  IOCONTEXT_WAL_INIT     write   fsync
  autovacuum launcher
  autovacuum worker
  client backend         TRUE    TRUE
  background worker
  background writer
  checkpointer
  standalone backend     TRUE    TRUE
  startup
  walreceiver
  walsender
  walwriter

On Wed, 9 Aug 2023 at 21:52, Melanie Plageman wrote:
>
> > On Sat, 22 Jul 2023 at 01:30, Melanie Plageman
> > wrote:
> > > I think it would be good to count WAL reads even though they are not
> > > currently represented in pg_stat_wal. Here is a thread discussing this
> > > [1].
> >
> > I used the same implementation in the thread link [1]. I added 'WAL
> > read' to only xlogrecovery.c for now. I didn't add 'WAL read' to
> > xlogreader.c and walsender.c because they cause some failures on:
> > '!pgStatLocal.shmem->is_shutdown' asserts. I will spend more time on
> > these. Also, I added Bharath to CC. I have a question about 'WAL
> > read':
> >
> > 1. There are two places where 'WAL read' happens.
> >
> > a. In WALRead() in xlogreader.c, it reads 'count' bytes, most of the
> > time count is equal to XLOG_BLCKSZ but there are some cases it is not.
> > For example
> > - in XLogSendPhysical() in walsender.c WALRead() is called by nbytes
> > - in WALDumpReadPage() in pg_waldump.c WALRead() is called by count
> > These nbytes and count variables could be different from XLOG_BLCKSZ.
> >
> > b. in XLogPageRead() in xlogreader.c, it reads exactly XLOG_BLCKSZ bytes:
> > pg_pread(readFile, readBuf, XLOG_BLCKSZ, (off_t) readOff);
> >
> > So, what should op_bytes be set to for 'WAL read' operations?
>
> If there is any combination of BackendType and IOContext which will
> always read XLOG_BLCKSZ bytes, we could use XLOG_BLCKSZ for that row's
> op_bytes.  For other cases, we may have to consider using op_bytes 1 and
> tracking reads and write IOOps in number of bytes (instead of number of
> pages). I don't actually know if there is a clear separation by
> BackendType for these different cases.

I agree. I will edit that later, added to TODOs.

> The other alternative I see
Re: pg_rewind with cascade standby doesn't work well
On 2023/09/20 12:04, Michael Paquier wrote:
> This is a known issue.  I guess that this is the same as this thread and
> this CF entry:
> https://commitfest.postgresql.org/44/4244/
> https://www.postgresql.org/message-id/flat/zarvomifjze7f...@paquier.xyz

I think this is a separate issue, and we should still use Kuwamura-san's
patch even after the one you posted on the thread gets accepted.

BTW, I was able to reproduce the assertion failure Kuwamura-san reported,
even after applying your latest patch from the thread.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Re: [PoC] pg_upgrade: allow to upgrade publisher node
On Wed, Sep 20, 2023 at 12:12 PM Amit Kapila wrote:
>
> On Wed, Sep 20, 2023 at 11:51 AM Dilip Kumar wrote:
> >
> > On Wed, Sep 20, 2023 at 11:00 AM Hayato Kuroda (Fujitsu)
> > wrote:
> > >
> > > Dear Amit,
> > >
> > > Thank you for reviewing! PSA new version. In this version I ran pgindent
> > > again.
> > >
> >
> > + /*
> > +  * There is a possibility that following records may be generated
> > +  * during the upgrade.
> > +  */
> > + if (!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_SHUTDOWN) &&
> > +     !CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_ONLINE) &&
> > +     !CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_SWITCH) &&
> > +     !CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_FPI_FOR_HINT) &&
> > +     !CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_PARAMETER_CHANGE) &&
> > +     !CHECK_WAL_RECORD(rmid, info, RM_STANDBY_ID, XLOG_RUNNING_XACTS) &&
> > +     !CHECK_WAL_RECORD(rmid, info, RM_HEAP2_ID, XLOG_HEAP2_PRUNE))
> > +     is_valid = false;
> > +
> > + CHECK_FOR_INTERRUPTS();
> >
> > Just wondering why XLOG_HEAP2_VACUUM or other vacuum-related commands
> > can not occur during the upgrade?
> >
>
> Because autovacuum is disabled during upgrade. See comment: "Use -b to
> disable autovacuum" in start_postmaster().

Okay got it, thanks.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: Fix error handling in be_tls_open_server()
> On 19 Sep 2023, at 10:06, Sergey Shinderuk wrote:
>
> On 19.09.2023 03:54, Michael Paquier wrote:
>> One doubt that I have is if we shouldn't let X509_NAME_print_ex() be
>> as it is now, and not force a failure on the bio if this calls fails.
>
> If malloc fails inside X509_NAME_print_ex, then we will be left with empty
> port->peer_dn.

Looking at the OpenSSL code, there are other (albeit esoteric) errors that
return -1 as well.  I agree that we should handle this error.

X509_NAME_print_ex is not documented to return -1 in OpenSSL 1.0.2, but
reading the code it's clear that it does, so checking for -1 is safe for all
supported OpenSSL versions (supported by us, that is).

Attached is a v2 on top of HEAD with commit message etc, which I propose to
backpatch to v15 where it was introduced.

--
Daniel Gustafsson
Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }
Hi,

On Wed, 20 Sep 2023 at 9:34, Maciek Sakrejda wrote:
> On Tue, Sep 19, 2023 at 5:56 PM Jeff Davis wrote:
> >...
> > On Tue, 2023-09-19 at 11:41 -0400, Robert Haas wrote:
> > > That leads to a second idea, which is having it continue
> > > to be a GUC but only affect directly-entered SQL, with all
> > > indirectly-entered SQL either being stored as a node tree or having a
> > > search_path property attached somewhere.
> >
> > That's not too far from the proposed patch and I'd certainly be
> > interested to hear more and/or adapt my patch towards this idea.
>
> As an interested bystander, that's the same thing I was thinking when
> reading this. I reread your original e-mail, Jeff, and I still think
> that.
>
> I wonder if something like CURRENT (i.e., the search path at function
> creation time) might be a useful keyword addition. I can see some uses
> (more forgiving than SYSTEM but not as loose as SESSION), but I don't
> know if it would justify its presence.

Personally, I dislike this, because the value of the search path is hidden
in this case.  I agree that it can be convenient, but it can be confusing
for review, migration, ...

Regards

Pavel

> Thanks for working on this.
>
> Thanks,
> Maciek
Re: Fix error handling in be_tls_open_server()
On 20.09.2023 11:42, Daniel Gustafsson wrote:
> Attached is a v2 on top of HEAD with commit message etc, which I propose to
> backpatch to v15 where it was introduced.

There is a typo: "upon en error".  Otherwise, looks good to me.  Thank you.

--
Sergey Shinderuk
https://postgrespro.com/
Re: Fix error handling in be_tls_open_server()
> On 20 Sep 2023, at 10:55, Sergey Shinderuk wrote: > > On 20.09.2023 11:42, Daniel Gustafsson wrote: >> Attached is a v2 on top of HEAD with commit message etc, which I propose to >> backpatch to v15 where it was introduced. > There is a typo: upon en error. Otherwise, looks good to me. Thank you. Thanks, will fix before pushing. -- Daniel Gustafsson
Re: There should be a way to use the force flag when restoring databases
On 06.08.23 21:39, Ahmed Ibrahim wrote:
> I have addressed the pg version compatibility with the FORCE option in
> drop.  Here is the latest version of the patch.

The patch is pretty small, but I think there is some disagreement about
whether we want this option at all?  Maybe some more people can make their
opinions more explicit?
Re: POC, WIP: OR-clause support for indexes
On 29.08.23 05:37, a.rybakina wrote:
> Thank you for your interest in this problem and help, and I'm sorry that I
> didn't respond to this email for a long time. To be honest, I wanted to
> investigate the problems in more detail and already answer more clearly,
> but unfortunately I have not found anything more significant yet.

What is the status of this patch?  It is registered in the commitfest.  It
looks like a stalled research project?  The last posted patch doesn't contain
any description or tests, so it doesn't look very ready.
Re: bug fix and documentation improvement about vacuumdb
Thank you for all your reviews!

>>> PATTERN should be changed to SCHEMA because -n and -N options don't support
>>> pattern matching for schema names. The attached patch 0001 fixes this.
>>
>> True, there is no pattern matching performed. I wonder if it's worth lifting
>> the pattern matching from pg_dump into common code such that tools like this
>> can use it?
>
> I agree that this should be changed to SCHEMA. It might be tough to add
> pattern matching with the current catalog query, and I don't know whether
> there is demand for such a feature, but I wouldn't discourage someone from
> trying.

I think that supporting pattern matching would be quite nice, but I wonder if
it would be not only tough but also a breaking change.  So I guess this
change should be committed either way.

>>> Yeah, I think we can fix the JOIN as you suggest. I quickly put a patch
>>> together to demonstrate.
>
> Looks good from a quick skim.

I agree with these updates.  Thank you!

>> We should probably add some tests...
>
> Agreed.

The attached patch includes new tests for this bug.  Also, I fixed the
existing test for the -N option, which seems to be incorrect.

>>> It seems to work fine. However, if we're aiming for consistent
>>> spacing, the "IS NULL" (two spaces in between) might be a concern.
>>
>> I don't think that's a problem. I would rather have readable C code and two
>> spaces in the generated SQL than contorting the C code to produce less
>> whitespace in a query few will read in its generated form.
>
> I think we could pretty easily avoid the extra space and keep the C code
> relatively readable. These sorts of things bug me, too (see 2af3336).

Though I don't think it affects readability, I'm neutral about this.

>> >> Third, for the description of the -N option, I wonder if "vacuum all tables except
>> >> in the specified schema(s)" might be clearer. The current one says nothing about
>> >> tables not in the specified schema.
>> >
>> > Maybe, but the point of vacuumdb is to analyze a database so I'm not sure who
>> > would expect anything else than vacuuming everything but the excluded schema
>> > when specifying -N. What else could "vacuumdb -N foo" be interpreted to do
>> > that can be confusing?
>>
>> I agree with Daniel on this one.
>
> +1.

That makes sense.  I retract my suggestion.
Re: disfavoring unparameterized nested loops
On Wed, 20 Sept 2023 at 19:56, Andrey Lepikhov wrote:
> What could you say about a different way: hybrid join?  In MS SQL Server,
> they have such a feature [1], and, according to their description, it
> requires low overhead.  They start from HashJoin and switch to NestLoop if
> the inner input contains too few tuples.  It solves the issue, doesn't it?

A complexity which you may not be considering here is that Nested Loop joins
always preserve the tuple order from the outer side of the join, whereas hash
joins will not do this when multi-batching.

I've no idea how the SQL Server engineers solved that.

David

> [1]
> https://techcommunity.microsoft.com/t5/sql-server-blog/introducing-batch-mode-adaptive-joins/ba-p/385411
Re: logical decoding and replication of sequences, take 2
On Wed, Aug 16, 2023 at 7:57 PM Tomas Vondra wrote:
>

I was reading through 0001, and I noticed this comment in the
ReorderBufferSequenceIsTransactional() function:

+ * To decide if a sequence change should be handled as transactional or applied
+ * immediately, we track (sequence) relfilenodes created by each transaction.
+ * We don't know if the current sub-transaction was already assigned to the
+ * top-level transaction, so we need to check all transactions.

It says "We don't know if the current sub-transaction was already assigned to
the top-level transaction, so we need to check all transactions".  But IIRC,
as part of the streaming of in-progress transactions, we have ensured that
whenever we are logging the first change by any subtransaction we include the
top transaction ID in it.  Refer to this code:

LogicalDecodingProcessRecord(LogicalDecodingContext *ctx, XLogReaderState *record)
{
...
	/*
	 * If the top-level xid is valid, we need to assign the subxact to the
	 * top-level xact. We need to do this for all records, hence we do it
	 * before the switch.
	 */
	if (TransactionIdIsValid(txid))
	{
		ReorderBufferAssignChild(ctx->reorder, txid, XLogRecGetXid(record),
								 buf.origptr);
	}
}

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: [PoC] Reducing planning time when tables have many partitions
Hello Ashutosh and Andrey,

Thank you for your emails, and I really apologize for my late response.

On Thu, Sep 7, 2023 at 3:43 PM Ashutosh Bapat wrote:
> It seems that you are still investigating and fixing issues. But the
> CF entry is marked as "needs review". I think a better status is
> "WoA". Do you agree with that?

Yes, I am now investigating and fixing issues.  I agree with you and changed
the entry's status to "Waiting on Author".  Thank you for your advice.

On Tue, Sep 19, 2023 at 5:21 PM Andrey Lepikhov wrote:
> Working on self-join removal in the thread [1] nearby, I got stuck on a
> problem which gave me an additional argument for working in this new
> direction, beyond the couple of previous ones.
> With indexing positions in the list of equivalence members, we make some
> optimizations like join elimination more complicated - they may need to
> remove some clauses and equivalence class members.
> For changing the lists of derived clauses or ec_members, we should go
> through all the index lists and fix them, which is a non-trivial operation.

Thank you for looking into this and pointing that out.  I understand that
this problem will occur somewhere like your patch [1], quoted below, because
we need to modify RelOptInfo->eclass_child_members in addition to ec_members.
Is my understanding correct?  (Of course, I know ec_[no]rel_members, but I
doubt we need them.)

=====
+static void
+update_eclass(EquivalenceClass *ec, int from, int to)
+{
+	List	   *new_members = NIL;
+	ListCell   *lc;
+
+	foreach(lc, ec->ec_members)
+	{
+		EquivalenceMember *em = lfirst_node(EquivalenceMember, lc);
+		bool		is_redundant = false;
+
...
+
+		if (!is_redundant)
+			new_members = lappend(new_members, em);
+	}
+
+	list_free(ec->ec_members);
+	ec->ec_members = new_members;
=====

I think we may be able to remove the eclass_child_members field by making
child members on demand.  v20 makes child members in
add_[child_]join_rel_equivalences() and adds them to
RelOptInfo->eclass_child_members.  Instead of doing that, if we translate on
demand when child members are requested, RelOptInfo->eclass_child_members is
no longer necessary.  After that, there is only ec_members, which consists of
parent members, so removing clauses will still be simple.  Do you think this
idea will solve your problem?  If so, I will experiment with this and share a
new patch version.  The main concern with this idea is that the same child
member will be created many times, wasting time and memory.  Some techniques
like caching might solve this.

[1]
https://www.postgresql.org/message-id/flat/64486b0b-0404-e39e-322d-0801154901f3%40postgrespro.ru

--
Best regards,
Yuya Watari
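For illustration, here is a very rough sketch of the "translate child members
on demand" idea described above.  All of the function names below are
hypothetical and do not exist in any posted patch; the only point is that the
translation happens (and can be cached) when a child member is first
requested, instead of being stored eagerly in
RelOptInfo->eclass_child_members:

static EquivalenceMember *
get_child_eclass_member(PlannerInfo *root, EquivalenceClass *ec,
						EquivalenceMember *parent_em,
						RelOptInfo *parent_rel, RelOptInfo *child_rel)
{
	EquivalenceMember *child_em;

	/* Reuse a previously translated member, if we already built one. */
	child_em = lookup_translated_member(ec, parent_em, child_rel);
	if (child_em != NULL)
		return child_em;

	/* Otherwise translate the parent member's expression for this child. */
	child_em = translate_member_to_child(root, parent_em,
										 parent_rel, child_rel);

	/* Remember it so repeated requests don't re-translate. */
	cache_translated_member(ec, parent_em, child_rel, child_em);

	return child_em;
}

With something like this, ec_members itself keeps only parent members, so
removing clauses or members (as needed by self-join removal) stays simple.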
Re: [PoC] pg_upgrade: allow to upgrade publisher node
On Wed, Sep 20, 2023 at 12:16 PM Amit Kapila wrote:
>
> On Wed, Sep 20, 2023 at 11:00 AM Hayato Kuroda (Fujitsu)
> wrote:
> >
> > Dear Amit,
>
> +int
> +count_old_cluster_logical_slots(void)
> +{
> +	int			dbnum;
> +	int			slot_count = 0;
> +
> +	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
> +		slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;
> +
> +	return slot_count;
> +}
>
> In this code, aren't we assuming that 'slot_arr.nslots' will be zero
> for versions <=PG16? On my Windows machine, this value is not zero but
> rather some uninitialized negative value which makes its caller try to
> allocate some undefined memory and fail. I think you need to
> initialize this in get_old_cluster_logical_slot_infos() for lower
> versions.
>

+{ oid => '8046', descr => 'for use by pg_upgrade',
+  proname => 'binary_upgrade_validate_wal_records',
+  prorows => '10', proretset => 't', provolatile => 's', prorettype => 'bool',
+  proargtypes => 'pg_lsn', proallargtypes => '{pg_lsn,bool}',
+  proargmodes => '{i,o}', proargnames => '{start_lsn,is_ok}',
+  prosrc => 'binary_upgrade_validate_wal_records' },

In this entry, many of the fields seem bogus.  For example, we don't need
prorows => '10', proretset => 't' for this function.  Similarly, proargmodes
also looks incorrect as we don't have any OUT parameter.

--
With Regards,
Amit Kapila.
Re: Comment about set_join_pathlist_hook()
On Wed, 20 Sept 2023 at 22:06, Etsuro Fujita wrote: > So I would like to propose to extend the comment to explain what they > can do, as in the comment about set_rel_pathlist_hook() in allpaths.c. > Attached is a patch for that. Looks good to me. I see you've copy/edited the comment just above the call to set_rel_pathlist_hook(). That makes sense. David
Re: [PoC] Reducing planning time when tables have many partitions
On Wed, Sep 20, 2023 at 3:35 PM Yuya Watari wrote:
> I think we may be able to remove the eclass_child_members field by
> making child members on demand. v20 makes child members at
> add_[child_]join_rel_equivalences() and adds them into
> RelOptInfo->eclass_child_members. Instead of doing that, if we
> translate on demand when child members are requested,
> RelOptInfo->eclass_child_members is no longer necessary. After that,
> there is only ec_members, which consists of parent members, so
> removing clauses will still be simple. Do you think this idea will
> solve your problem? If so, I will experiment with this and share a new
> patch version. The main concern with this idea is that the same child
> member will be created many times, wasting time and memory. Some
> techniques like caching might solve this.

While working on the RestrictInfo translations patch, I was thinking along
these lines.  [1] uses a hash table for storing translated RestrictInfos.
An EC can have a hash table to store ec_member translations.  The same
patchset also has some changes in the code which generates RestrictInfo
clauses from ECs.  I think that code will be simplified by your approach.

[1]
https://www.postgresql.org/message-id/caexhw5u0yyyr2mwvlrvvy_qnld65kpc9u-bo0ox7bglkgba...@mail.gmail.com

--
Best Wishes,
Ashutosh Bapat
Re: bug fix and documentation improvement about vacuumdb
> On 20 Sep 2023, at 11:46, Kuwamura Masaki wrote:
>
> I think that supporting pattern matching would be quite nice, but I wonder
> if it would be not only tough but also a breaking change.  So I guess this
> change should be committed either way.

I agree.  Supporting pattern matching should, if anyone is interested in
trying, be done separately in its own thread; no need to move the goalposts
here.  Sorry if I made it sound like that upthread.

> The attached patch includes new tests for this bug.  Also, I fixed the
> existing test for the -N option, which seems to be incorrect.

When sending an update, please include the previous patch as well with your
new tests as a 0002 patch in a patchset.  The CFBot can only apply and
build/test patches when the entire patchset is attached to the email.  The
test results below indicate that the patch failed the new tests (which in a
way is good, since without the fix the tests *should* fail), because the fix
patch was not applied:

http://cfbot.cputube.org/masaki-kuwamura.html

--
Daniel Gustafsson
Re: persist logical slots to disk during shutdown checkpoint
On Thu, Sep 14, 2023 at 2:41 PM Amit Kapila wrote:
>
> On Thu, Sep 14, 2023 at 7:20 AM Michael Paquier wrote:
> >
> > On Wed, Sep 13, 2023 at 04:20:37PM +0530, Amit Kapila wrote:
> > > The patch is updated as per recent discussion.
> >
> > WFM. Thanks for the updated version.
> >
>
> Pushed.

The commitfest entry https://commitfest.postgresql.org/44/4536/ is in "Ready
for committer" state.  Is there something remaining here?  We should probably
set it as "committed".

--
Best Wishes,
Ashutosh Bapat
Re: persist logical slots to disk during shutdown checkpoint
On Wed, Sep 20, 2023 at 04:48:00PM +0530, Ashutosh Bapat wrote: > The commitfest entry https://commitfest.postgresql.org/44/4536/ is in > "Ready for committer" state. Is there something remaining here? We > should probably set it as "committed". Thanks, I've switched that to "Committed". -- Michael
RE: [PoC] pg_upgrade: allow to upgrade publisher node
Dear Amit,

> +int
> +count_old_cluster_logical_slots(void)
> +{
> +	int			dbnum;
> +	int			slot_count = 0;
> +
> +	for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
> +		slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;
> +
> +	return slot_count;
> +}
>
> In this code, aren't we assuming that 'slot_arr.nslots' will be zero
> for versions <=PG16? On my Windows machine, this value is not zero but
> rather some uninitialized negative value which makes its caller try to
> allocate some undefined memory and fail. I think you need to
> initialize this in get_old_cluster_logical_slot_infos() for lower
> versions.

Good catch; I did not notice it because it worked well on my RHEL machine.
Here is the updated version.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
Comment about set_join_pathlist_hook()
Hi,

What I am concerned about from the report [1] is that this comment is a bit
too terse; it might cause a misunderstanding that extensions can do different
things than we intend to allow:

	/*
	 * 6. Finally, give extensions a chance to manipulate the path list.
	 */
	if (set_join_pathlist_hook)
		set_join_pathlist_hook(root, joinrel, outerrel, innerrel,
							   jointype, &extra);

So I would like to propose to extend the comment to explain what they can do,
as in the comment about set_rel_pathlist_hook() in allpaths.c.  Attached is a
patch for that.

Best regards,
Etsuro Fujita

[1]
https://www.postgresql.org/message-id/CACawEhV%3D%2BQ0HXrcDergbTR9EkVFukgRPMTZbRFL-YK5CRmvYag%40mail.gmail.com
RE: [PoC] pg_upgrade: allow to upgrade publisher node
Dear Amit,

Thank you for reviewing!  A new version is available in [1].

> +{ oid => '8046', descr => 'for use by pg_upgrade',
> +  proname => 'binary_upgrade_validate_wal_records',
> +  prorows => '10', proretset => 't', provolatile => 's', prorettype => 'bool',
> +  proargtypes => 'pg_lsn', proallargtypes => '{pg_lsn,bool}',
> +  proargmodes => '{i,o}', proargnames => '{start_lsn,is_ok}',
> +  prosrc => 'binary_upgrade_validate_wal_records' },
>
> In this entry, many of the fields seem bogus.  For example, we don't need
> prorows => '10', proretset => 't' for this function.  Similarly,
> proargmodes also looks incorrect as we don't have any OUT parameter.

That part was made in old versions and has been kept till now.  I rechecked
the fields and changed them as below:

* This function just returns a boolean, so proretset was changed to 'f'.
* Based on the above, prorows should be zero.  Removed.
* The returned value depends heavily on the internal state, so provolatile
  was changed to 'v'.
* There are no OUT or INOUT arguments, so there is no need to set
  proallargtypes and proargmodes.  Removed.
* Anonymous arguments are allowed, so proargnames was removed.
* This function is not expected to be called in parallel, so proparallel was
  set to 'u'.
* The argument must not be NULL, and we should error out in that case, so
  proisstrict was changed to 'f'.  Also, the check was added to the function.

[1]:
https://www.postgresql.org/message-id/TYAPR01MB586615579356A84A8CF29A00F5F9A%40TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning
Hi Ashutosh,

On Wed, Aug 16, 2023 at 2:28 PM Ashutosh Bapat wrote:
> On Fri, Aug 4, 2023 at 2:11 PM Ashutosh Bapat wrote:
> >
> > Attached patchset fixing those.
> > 0001 - patch to report planning memory, with to explain.out regression
> > output fix. We may consider committing this as well.
> > 0002 - with your comment addressed above.
> > 0003 - Added this patch for handling SpecialJoinInfos for inner joins.
> These SpecialJoinInfos are created on the fly for parent joins. They
> can be created on the fly for child joins as well without requiring
> any translations. Thus they do not require any new memory. This patch
> is intended to be merged into 0002 finally.

I read this thread and have been reading the latest patch.  At first glance,
it seems quite straightforward to me.  I agree with Richard that pfree()'ing
4 bitmapsets may not be a lot of added overhead.  I will study the patch a
bit more.

Just one comment on 0003:

+	/*
+	 * Dummy SpecialJoinInfos do not have any translated fields and hence have
+	 * nothing to free.
+	 */
+	if (child_sjinfo->jointype == JOIN_INNER)
+		return;

Should this instead be Assert(child_sjinfo->jointype != JOIN_INNER)?

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
Re: There should be a way to use the force flag when restoring databases
> On 20 Sep 2023, at 11:24, Peter Eisentraut wrote:
>
> On 06.08.23 21:39, Ahmed Ibrahim wrote:
>> I have addressed the pg version compatibility with the FORCE option in
>> drop.  Here is the latest version of the patch.
>
> The patch is pretty small, but I think there is some disagreement about
> whether we want this option at all?  Maybe some more people can make their
> opinions more explicit?

My concern is that a --force parameter conveys to the user that it's a big
hammer to override things and get them done, when in reality this doesn't do
that.  Taking the example from the pg_restore documentation which currently
has a dropdb step:

:~ $ ./bin/createdb foo
:~ $ ./bin/psql -c "create table t(a int);" foo
CREATE TABLE
:~ $ ./bin/pg_dump --format=custom -f foo.dump foo
:~ $ ./bin/pg_restore -d foo -C -c --force foo.dump
pg_restore: error: could not execute query: ERROR:  cannot drop the currently open database
Command was: DROP DATABASE foo WITH(FORCE);
pg_restore: error: could not execute query: ERROR:  database "foo" already exists
Command was: CREATE DATABASE foo WITH TEMPLATE = template0 ENCODING = 'UTF8' LOCALE_PROVIDER = libc LOCALE = 'en_US.UTF-8';
pg_restore: error: could not execute query: ERROR:  relation "t" already exists
Command was: CREATE TABLE public.t (
    a integer
);
pg_restore: warning: errors ignored on restore: 3

Without knowing the internals, I would expect an option named --force to make
that just work, especially given the documentation provided in this patch.
I think the risk of user confusion outweighs the benefits, or maybe I'm just
not smart enough to see all the benefits?  If so, I would argue that more
documentation is required.

Skimming the patch itself, it updates the --help output with --force for
pg_dump but not for pg_restore.  Additionally, it produces a compiler
warning:

pg_restore.c:127:26: warning: incompatible pointer types initializing 'int *' with an expression of type 'bool *' [-Wincompatible-pointer-types]
        {"force", no_argument, &force, 1},
                               ^~~~~~
1 warning generated.

--
Daniel Gustafsson
Re: POC, WIP: OR-clause support for indexes
Hi!

When I sent the patch version to the commitfest, I thought that the work on
this topic was completed.  The patch version and test results are in [0].

But in the process of discussing this patch, we found out that there is
another place where the transformation can be made, specifically during the
calculation of selectivity.  I implemented a raw version [1], but
unfortunately it didn't pass the regression tests.

I'm sorry that I didn't write about the status earlier; I was very
overwhelmed with tasks at work due to releases and preparations for the
conference.  I have returned to working on this patch and will post the new
version today or tomorrow.

[0]
https://www.postgresql.org/message-id/4bac271d-1700-db24-74ac-8414f2baf9fd%40postgrespro.ru
https://www.postgresql.org/message-id/11403645-b342-c400-859e-47d0f41ec22a%40postgrespro.ru
[1]
https://www.postgresql.org/message-id/b301dce1-09fd-72b1-834a-527ca428db5e%40yandex.ru

On 20.09.2023 12:37, Peter Eisentraut wrote:
> On 29.08.23 05:37, a.rybakina wrote:
>> Thank you for your interest in this problem and help, and I'm sorry that I
>> didn't respond to this email for a long time. To be honest, I wanted to
>> investigate the problems in more detail and already answer more clearly,
>> but unfortunately I have not found anything more significant yet.
>
> What is the status of this patch?  It is registered in the commitfest.  It
> looks like a stalled research project?  The last posted patch doesn't
> contain any description or tests, so it doesn't look very ready.
Re: Infinite Interval
On Wed, 20 Sept 2023 at 11:27, jian he wrote:
>
> if I remove IntervalAggState's element: MemoryContext, it will not work.
> so I don't understand what the above sentence means.  Sorry. (it's
> my problem)

I don't see why it won't work.  The point is to try to simplify
do_interval_accum() as much as possible.  Looking at the current code, I see
a few places that could be simpler:

+		X.day = newval->day;
+		X.month = newval->month;
+		X.time = newval->time;
+
+		temp.day = state->sumX.day;
+		temp.month = state->sumX.month;
+		temp.time = state->sumX.time;

Why do we need these local variables X and temp?  It could just add the
values from newval directly to those in state->sumX.

+		/* The rest of this needs to work in the aggregate context */
+		old_context = MemoryContextSwitchTo(state->agg_context);

Why?  It's not allocating any memory here, so I don't see a need to switch
context.

So basically, do_interval_accum() could be simplified to:

static void
do_interval_accum(IntervalAggState *state, Interval *newval)
{
	/* Count infinite intervals separately from all else */
	if (INTERVAL_IS_NOBEGIN(newval))
	{
		state->nInfcount++;
		return;
	}

	if (INTERVAL_IS_NOEND(newval))
	{
		state->pInfcount++;
		return;
	}

	/* Update count of finite intervals */
	state->N++;

	/* Update sum of finite intervals */
	if (unlikely(pg_add_s32_overflow(state->sumX.month, newval->month,
									 &state->sumX.month)))
		ereport(ERROR,
				errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
				errmsg("interval out of range"));

	if (unlikely(pg_add_s32_overflow(state->sumX.day, newval->day,
									 &state->sumX.day)))
		ereport(ERROR,
				errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
				errmsg("interval out of range"));

	if (unlikely(pg_add_s64_overflow(state->sumX.time, newval->time,
									 &state->sumX.time)))
		ereport(ERROR,
				errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
				errmsg("interval out of range"));

	return;
}

and that can be further refactored, as described below, and similarly for
do_interval_discard(), except using pg_sub_s32/64_overflow().

> > Also, this needs to include serialization and deserialization
> > functions, otherwise these aggregates will no longer be able to use
> > parallel workers. That makes a big difference to queryE, if the size
> > of the test data is scaled up.
>
> I tried, but failed. sum(interval) result is correct, but
> avg(interval) result is wrong.
>
> SELECT sum(b), avg(b)
>       ,avg(b) = sum(b)/count(*) as should_be_true
>       ,avg(b) * count(*) = sum(b) as should_be_true_too
> from interval_aggtest_1m; -- 1 million rows
>
> The above query expects the two bool columns to return true, but actually
> both return false.  (I spent some time and found out that parallel mode
> makes the number of rows 1_000_002 when it should be 1_000_000.)

I think the reason for your wrong results is this code in
interval_avg_combine():

+	if (state2->N > 0)
+	{
+		/* The rest of this needs to work in the aggregate context */
+		old_context = MemoryContextSwitchTo(agg_context);
+
+		/* Accumulate interval values */
+		do_interval_accum(state1, &state2->sumX);
+
+		MemoryContextSwitchTo(old_context);
+	}

The problem is that using do_interval_accum() to add the 2 sums together also
adds 1 to the count N, making it incorrect.  This code should only be adding
state2->sumX to state1->sumX, not touching state1->N.  And, as in
do_interval_accum(), there is no need to switch memory context.

Given that there are multiple places in this file that need to add intervals,
I think it makes sense to further refactor, and add a local function to add 2
finite intervals, along the lines of the code above.
This can then be called from do_interval_accum(), interval_avg_combine(), and interval_pl(). And similarly for subtracting 2 finite intervals. Regards, Dean
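For illustration, a minimal sketch of the kind of local helper suggested
above, reusing the IntervalAggState/Interval handling from the code already
shown; the function name and its exact placement are assumptions rather than
something taken from a posted patch:

static void
finite_interval_pl(const Interval *span1, const Interval *span2, Interval *result)
{
	/* Add two known-finite intervals, checking for overflow as we go. */
	if (unlikely(pg_add_s32_overflow(span1->month, span2->month, &result->month)) ||
		unlikely(pg_add_s32_overflow(span1->day, span2->day, &result->day)) ||
		unlikely(pg_add_s64_overflow(span1->time, span2->time, &result->time)) ||
		INTERVAL_NOT_FINITE(result))
		ereport(ERROR,
				errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
				errmsg("interval out of range"));
}

Such a helper could then be called from do_interval_accum(),
interval_avg_combine() and interval_pl(), with a matching subtraction helper
built on pg_sub_s32/64_overflow() for do_interval_discard() and interval_mi().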
Re: should frontend tools use syncfs() ?
On Thu, 7 Sept 2023 at 03:34, Nathan Bossart wrote:
> Committed.

Hi!

Great job!  But here is one problem I've encountered while working on some
unrelated stuff.  Now we have two different things called by the same name -
sync_method.  One in xlog:

	int			sync_method = DEFAULT_SYNC_METHOD;

...and another one in the "bins":

	static DataDirSyncMethod sync_method = DATA_DIR_SYNC_METHOD_FSYNC;

With the current include order, this is not a problem, but imagine you add a
couple of new includes, for example:

--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -18,6 +18,8 @@
 #include "storage/block.h"
 #include "storage/item.h"
 #include "storage/off.h"
+#include "postgres.h"
+#include "utils/rel.h"

And the build will be broken, because we now have two different things called
"sync_method" with different types:

In file included from .../src/bin/pg_rewind/pg_rewind.c:33:
In file included from .../src/include/storage/bufpage.h:22:
In file included from .../src/include/utils/rel.h:18:
.../src/include/access/xlog.h:27:24: error: redeclaration of 'sync_method' with a different type: 'int' vs 'DataDirSyncMethod' (aka 'enum DataDirSyncMethod')
extern PGDLLIMPORT int sync_method;
...

As a solution, I suggest renaming sync_method in the xlog module to
wal_sync_method.  In fact, the corresponding GUC for this variable is called
"wal_sync_method", and I see no reason not to use the exact same name for the
variable in the xlog module.

--
Best regards,
Maxim Orlov.
Re: Infinite Interval
On Wed, 20 Sept 2023 at 13:09, Dean Rasheed wrote: > > So basically, do_interval_accum() could be simplified to: > Oh, and I guess it also needs an INTERVAL_NOT_FINITE() check, to make sure that finite values don't sum to our representation of infinity, as in interval_pl(). Regards, Dean
Re: Improve the log message output of basic_archive when basic_archive.archive_directory parameter is not set
> On 19 Sep 2023, at 11:21, bt23nguyent wrote:
> Please let me know if you have any further suggestions that I can improve
> more.

+		*logdetail = pstrdup("WAL archiving failed because basic_archive.archive_directory is not set");

Nitpick: detail messages should end with a period per the error message style
guide [0].

-archiving will proceed only when it returns true.
+archiving will proceed only when it returns true.  The
+archiver may also emit the detail explaining how the module is not configured
+to the sever log if the archive module has any.

I think this paragraph needs to be updated to include how the returned
logdetail is emitted, since it currently shows the WARNING without mentioning
the added detail in case returned.  It would also be good to mention that it
should be an allocated string which the caller can free.

--
Daniel Gustafsson

[0] https://www.postgresql.org/docs/devel/error-style-guide.html
Re: pg_upgrade and logical replication
On Fri, Sep 15, 2023 at 3:08 PM vignesh C wrote: > > The attached v8 version patch has the changes for the same. > Is the check to ensure remote_lsn is valid correct in function check_for_subscription_state()? How about the case where the apply worker didn't receive any change but just marked the relation as 'ready'? Also, the patch seems to be allowing subscription relations from PG >=10 to be migrated but how will that work if the corresponding publisher is also upgraded without slots? Won't the corresponding workers start failing as soon as you restart the upgrade server? Do we need to document the steps for users? -- With Regards, Amit Kapila.
Dump-restore loosing 'attnotnull' bit for DEFERRABLE PRIMARY KEY column(s).
Hi,

On the latest master head, I can see a $subject bug that seems to be related
to commit #b0e96f311985.

Here is the table definition:

create table foo(i int, j int, CONSTRAINT pk PRIMARY KEY(i) DEFERRABLE);

And after a restore from the dump, it shows a descriptor where column 'i' is
not marked NOT NULL:

=# \d foo
                 Table "public.foo"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 i      | integer |           |          |
 j      | integer |           |          |
Indexes:
    "pk" PRIMARY KEY, btree (i) DEFERRABLE

The pg_attribute entry:

=# select attname, attnotnull from pg_attribute
   where attrelid = 'foo'::regclass and attnum > 0;

 attname | attnotnull
---------+------------
 i       | f
 j       | f
(2 rows)

--
Regards,
Amul Sul
EDB: http://www.enterprisedb.com
Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning
On Wed, Sep 20, 2023 at 5:24 PM Amit Langote wrote:
>
> I read this thread and have been reading the latest patch. At first
> glance, it seems quite straightforward to me. I agree with Richard
> that pfree()'ing 4 bitmapsets may not be a lot of added overhead. I
> will study the patch a bit more.

Thanks.

> Just one comment on 0003:
>
> +	/*
> +	 * Dummy SpecialJoinInfos do not have any translated fields and hence have
> +	 * nothing to free.
> +	 */
> +	if (child_sjinfo->jointype == JOIN_INNER)
> +		return;
>
> Should this instead be Assert(child_sjinfo->jointype != JOIN_INNER)?

try_partitionwise_join() calls free_child_sjinfo_members() unconditionally,
i.e. it will be called even for SpecialJoinInfos of INNER joins.  The above
condition filters those out early.  In fact, if we convert it into an Assert,
in a production build the rest of the code would free Relids which are
referenced somewhere else, causing dangling pointers.

--
Best Wishes,
Ashutosh Bapat
Re: Infinite Interval
On Mon, Sep 18, 2023 at 5:09 PM Dean Rasheed wrote:
>
> On Wed, 13 Sept 2023 at 11:13, Ashutosh Bapat wrote:
> >
> > On Tue, Sep 12, 2023 at 2:39 PM Dean Rasheed wrote:
> > >
> > > and it looks like the infinite interval
> > > input code is broken.
> >
> > The code required to handle 'infinity' as an input value was removed by
> > d6d1430f404386162831bc32906ad174b2007776. I have added a separate
> > commit which reverts that commit as 0004, which should be merged into
> > 0003.
>
> I think that simply reverting d6d1430f404386162831bc32906ad174b2007776
> is not sufficient. This does not make it clear what the point is of
> the code in the "case RESERV" block. That code really should check the
> value returned by DecodeSpecial(), otherwise invalid inputs are not
> caught until later, and the error reported is not ideal. For example:
>
> select interval 'now';
> ERROR:  unexpected dtype 12 while parsing interval "now"
>
> So DecodeInterval() should return DTERR_BAD_FORMAT in such cases (see
> similar code in DecodeTimeOnly(), for example).

Since the code was there earlier, I missed that part.  Sorry.

> I'd also suggest a comment to indicate why itm_in isn't updated in
> this case (see similar case in DecodeDateTime(), for example).

Added, but in the function prologue, since it's part of the API.

> Another point to consider is what should happen if "ago" is specified
> with infinite inputs. As it stands, it is accepted, but does nothing:
>
> select interval 'infinity ago';
>  interval
> ----------
>  infinity
> (1 row)
>
> select interval '-infinity ago';
>  interval
> -----------
>  -infinity
> (1 row)
>
> This could be made to invert the sign, as it does for finite inputs,
> but I think perhaps it would be better to simply reject such inputs.

Fixed this.  After studying what DecodeInterval is doing, I think it's better
to treat all infinity specifications similarly to "ago": they need to be the
last part of the input string.  The rest of the code makes sure that nothing
precedes an infinity specification, as the other case blocks do not handle
RESERV or DTK_LATE and DTK_EARLY.  This means that "+infinity", "-infinity"
or "infinity" can be the only allowed word as a valid interval input when
either of them is specified.

I will post these changes in another email along with the other patches.

--
Best Wishes,
Ashutosh Bapat
Re: Parallel Bitmap Heap Scan reports per-worker stats in EXPLAIN ANALYZE
Hi Dmitry,

Thanks for looking at the patch, and sorry for the long line.

On 3/17/23 21:14, Dmitry Dolgov wrote:
> The idea sounds reasonable to me, but I have to admit snapshot_and_stats
> implementation looks awkward. Maybe it would be better to have a separate
> structure field for both stats and snapshot, which will be set to point to
> a corresponding place in the shared FAM e.g. when the worker is getting
> initialized? shm_toc_allocate mentions BUFFERALIGN to handle possibility of
> some atomic operations needing it, so I guess that would have to be an
> alignment in this case as well. Probably another option would be to
> allocate two separate pieces of shared memory, which resolves questions
> like proper alignment, but annoyingly will require an extra lookup and a
> new key.

I considered the other options, and it seems to me that none of them is
particularly superior.  All of them have pros and cons, with the cons mostly
outweighing the pros.  Let me quickly elaborate:

1. Use multiple shm_toc entries: Shared state is split into multiple pieces,
and extra pointers in BitmapHeapScanState are needed to point at the
split-out data.  BitmapHeapScanState already has a shared_info member, which
is not a pointer to the shared memory but a pointer to the leader-local data
allocated to store the instrumentation data from the workers.  This is
confusing, but at least consistent with how it's done in other places (e.g.
execSort.c, nodeHash.c, nodeIncrementalSort.c).  Having another pointer there
which points to the shared memory makes it even more confusing.  If we go
this way we would have e.g. shared_info_copy and shared_info members in
BitmapHeapScanState.

2. Store two extra pointers to the shared FAM entries in BitmapHeapScanState:
IMHO, that is the better alternative of (1) as it doesn't need an extra TOC
entry, but it comes with the same confusion of multiple pointers to
SharedBitmapHeapScanInfo in BitmapHeapScanState.  But maybe that's not too
bad?

3. Solution in the initial patch (use two functions to obtain pointers
where/when needed): Avoids the need for another pointer in
BitmapHeapScanState at the cost / ugliness of having to call the helper
functions.

Another, not yet discussed, option I can see working is:

4. Allocate a fixed amount of memory for the instrumentation stats based on
MAX_PARALLEL_WORKER_LIMIT: MAX_PARALLEL_WORKER_LIMIT is 1024 and is used as
the limit of the max_parallel_workers GUC.  This way,
MAX_PARALLEL_WORKER_LIMIT * sizeof(BitmapHeapScanInstrumentation) = 1024 * 8
= 8192 bytes would be allocated.  To cut this down in half, we could
additionally change the type of lossy_pages and exact_pages from long to
uint32.  Only the possibly needed memory would have to get initialized; the
remaining unused memory would remain untouched so as not to waste cycles.

My first preference is the new option (4).  My second preference is option
(1).  What's your take?

--
David Geier
(ServiceNow)
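To make option (4) above concrete, here is a rough sketch of what the shared
state could look like.  The struct layout is an assumption for illustration;
only the names SharedBitmapHeapScanInfo, BitmapHeapScanInstrumentation,
lossy_pages, exact_pages and MAX_PARALLEL_WORKER_LIMIT are taken from the
discussion above:

typedef struct BitmapHeapScanInstrumentation
{
	uint32		lossy_pages;	/* narrowed from long, as suggested */
	uint32		exact_pages;
} BitmapHeapScanInstrumentation;

typedef struct SharedBitmapHeapScanInfo
{
	/* ... snapshot data and other shared state would go here ... */

	int			num_workers;

	/*
	 * Fixed-size array sized for the absolute maximum number of workers;
	 * only the first num_workers entries are ever initialized or read.
	 */
	BitmapHeapScanInstrumentation sinstrument[MAX_PARALLEL_WORKER_LIMIT];
} SharedBitmapHeapScanInfo;

With a fixed-size array there is no variable-length layout to compute, no
extra shm_toc key, and no alignment question beyond that of the struct
itself.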
Re: Infinite Interval
On Wed, Sep 20, 2023 at 5:44 PM Dean Rasheed wrote: > > On Wed, 20 Sept 2023 at 13:09, Dean Rasheed wrote: > > > > So basically, do_interval_accum() could be simplified to: > > > > Oh, and I guess it also needs an INTERVAL_NOT_FINITE() check, to make > sure that finite values don't sum to our representation of infinity, > as in interval_pl(). Fixed in the latest patch set. -- Best Wishes, Ashutosh Bapat
Re: Disabling Heap-Only Tuples
Greetings,

* Laurenz Albe (laurenz.a...@cybertec.at) wrote:
> On Tue, 2023-09-19 at 12:52 -0400, Robert Haas wrote:
> > On Tue, Sep 19, 2023 at 12:30 PM Alvaro Herrera wrote:
> > > I was thinking something vaguely like "a table size that's roughly what
> > > an optimal autovacuuming schedule would leave the table at" assuming 0.2
> > > vacuum_scale_factor. You would determine the absolute minimum size for
> > > the table given the current live tuples in the table, then add 20% to
> > > account for a steady state of dead tuples and vacuumed space. So it's
> > > not 1.2x of the "current" table size at the time the local_update_limit
> > > feature is installed, but 1.2x of the optimal table size.
> >
> > Right, that would be great. And honestly if that's something we can
> > figure out, then why does the parameter even need to be an integer
> > instead of a Boolean? If the system knows the optimal table size, then
> > the user can just say "try to compact this table" and need not say to
> > what size. The 1.2 multiplier is probably situation dependent and
> > maybe the multiplier should indeed be a configuration parameter, but
> > we would be way better off if the absolute size didn't need to be.
>
> I don't have high hopes for a reliable way to automatically determine
> the target table size. There are these queries floating around to estimate
> table bloat, which are used by various monitoring systems. I find that they
> get it right a lot of the time, but sometimes they get it wrong. Perhaps
> we can do better than that, but I vastly prefer a setting that I can control
> (even at the danger that I can misconfigure it) over an automatism that I
> cannot control and that sometimes gets it wrong.

Not completely against a setting, but would certainly prefer that this be
done in a more automated way, if possible.

To that end, my thought would be some kind of regular review of the FSM, or
maybe actual review by walking through the table (as VACUUM already does...)
to get an idea of where there's space and where there's used up areas, and
then use that to inform various operations (either VACUUM itself or perhaps
UPDATEs from SQL).

We could also try to 'start simple' and look for cases that we can say
"well, that's definitely not good" and address those initially.

Consider (imagine as a histogram; X is used space, . is empty):

1: XXX
2: XXX
3: XXX
4: XXX
5: X
6: X
7: .
8: .
9: .
10: .
11: .
12: .
13: .
14: .
15: .
16: .
17: .
18: .
19: .
20: X

Well, obviously there's tons of free space in the middle and if we could just
move those few tuples/pages/whatever that are near the end to earlier in the
table then we'd be able to truncate off and shrink a lot of the table.

> I like Alvaro's idea to automatically reset "local_update_limit" when the
> table has shrunk enough. Why not perform that task during vacuum truncation?
> If vacuum truncation has taken place, check if the table size is no bigger
> than "local_update_limit" * (1 + "autovacuum_vacuum_scale_factor"), and if
> it is no bigger, reset "local_update_limit". That way, we would not have
> to worry about a lock, because vacuum truncation already has the table
> locked.

Agreed on this too.  Essentially, once we've done some truncation, we should
'reset'.
I've no doubt that there's some better algorithm for this, but I keep coming
back to something as simple as: if the entire second half of the table will
fit into the entire first half, then the table is twice as large as it needs
to be, and perhaps that triggers a preference for placing tuples in the first
half of the table.

As for what handles this: maybe have both UPDATE and VACUUM able to, but
prefer for UPDATE to do so, and only have VACUUM kick in once the tuples at
the end of the relation are older than some xid-based threshold (perhaps all
of the tuples on a given page have to be old enough?).  While it feels a bit
'late' in terms of when to start taking this action, we could possibly start
with 'all frozen' as an indicator of 'old enough'?

Then, between the FSM and the VM, VACUUM could decide that pages at the end
of the table should be moved to be earlier and go about making that happen.
I'm a bit concerned about the risk of some kind of deadlock or similar
happening between VACUUM and user processes if we're trying to do this with
multiple tuples at a time, but hopefully we could come up with a way to avoid
that.  This process naturally would have to involve updating indexes and the
VM and FSM as the tuples get moved.

In terms of what this would look like, my thinking is that VACUUM would scan
the table and the FSM and perhaps the VM and then say "ok, this table is
bigger than it needs to be, let's try to fix that" and then set a flag on the
table, which a user could also explicitly set to give them control over this
process happening sooner or not happening at all, and that would indicate to
UPDATE to prefer earlier pages over the current page or HOT updates, w
Re: Index range search optimization
Hi, Peter! Thank you for your interest in this patch. On Tue, Sep 19, 2023 at 1:48 AM Peter Geoghegan wrote: > > On Thu, Sep 14, 2023 at 3:23 AM Alexander Korotkov > wrote: > > The attached patch allows Postgres to skip scan keys required for > > directional scans (even when other keys are present in the scan). I'll > > soon post the testing results and a more polished version of this patch. > > This is very interesting to me, partly because it seems related to my > ongoing work on SAOP execution within nbtree. > > My patch gives _bt_readpage and particularly _bt_checkkeys more > high-level context, which they use to intelligently control the scan. > That enables us to dynamically decide whether we should now perform > another descent of the index via another call to _bt_first, or if we > should prefer to continue on the leaf level for now. Maybe we will > match many distinct sets of array keys on the same leaf page, in the > same call to _bt_readpage. We don't want to miss out on such > opportunities, but we also want to quickly notice when we're on a page > where matching any more array keys is just hopeless. > > There is a need to keep these two things in balance. We need to notice > the hopeless cases before wasting too many cycles on it. That creates > a practical need to do an early precheck of the high key (roughly the > same check that we do already). If the high key indicates that > continuing on this page is truly hopeless, then we should give up and > do another primitive index scan -- _bt_first will reposition us onto > the leaf page that we need to go to next, which is (hopefully) far > away from the leaf page we started on. This is a pretty neat optimization indeed! > Your patch therefore has the potential to help my own patch. But, it > also has some potential to conflict with it, because my patch makes > the meaning of SK_BT_REQFWD and SK_BT_REQBKWD more complicated (though > only in cases where we have SK_SEARCHARRAY scan keys). I'm sure that > this can be managed sensibly, though. OK! Let me know if you feel that I need to change something in this patch to lower the potential conflict. > Some feedback on your patch: > > * I notice that you're not using the high key for this, even in a > forward scan -- you're using the last non-pivot tuple on the page. Why > is that? (I have some idea why, actually, but I'd like to hear your > thoughts first.) I'm using the last non-pivot tuple on the page instead of hikey because it's lower than hikey. As you stated below, the most significant column in the hikey is likely different from that of the last non-pivot tuple. So, it's more likely to use the optimization with the last non-pivot tuple. > * Separately, I don't think that it makes sense to use the same > requiredDirMatched value (which came from the last non-pivot tuple on > the page) when the special _bt_checkkeys call for the high key takes > place. I don't think that this will lead to wrong answers, but it's > weird, and is likely to defeat the existing optimization in some > important cases. > > Due to the influence of suffix truncation, it's relatively likely that > the most significant column in the high key will be different to the > corresponding value from the last few non-pivot tuples on the page -- > the high key tends to be "aligned with natural boundaries in the key > space", and so "gives us a good preview of the right sibling page". 
We > don't want to treat it the same as non-pivot tuples here, because it's > quite different, in ways that are subtle but still important. This definitely makes sense. I've removed the usage of requiredDirMatched from this _bt_checkkeys() call. > * I would avoid using the terminology "preprocess scan keys" for this. > That exact terminology is already used to describe what > _bt_preprocess_keys() does. > > That function is actually involved in Konstantin's patch, so that > could be very confusing. When we "preprocess" a scan key, it outputs a > processed scan key with markings such as the required markings that > you're using in the patch -- it's something that acts on/changes the > scan keys themselves. Whereas your patch is exploiting information > from already-processed scan keys, by applying it to the key space of a > page. > > I suggest calling it "prechecking the page", or something like that. I > don't feel very strongly about what you call it, provided it isn't > confusing or ambiguous. This also makes sense. I've rephrased the comment. -- Regards, Alexander Korotkov 0001-Skip-checking-of-scan-keys-required-for-direction-v2.patch Description: Binary data
Re: Infinite Interval
On Wed, 20 Sept 2023 at 15:00, Ashutosh Bapat wrote: > > 0005 - Refactored Jian's code fixing window functions. Does not > contain the changes for serialization and deserialization. Jian, > please let me know if I have missed anything else. > That looks a lot neater. One thing I don't care for is this code pattern in finite_interval_pl(): +result->month = span1->month + span2->month; +/* overflow check copied from int4pl */ +if (SAMESIGN(span1->month, span2->month) && +!SAMESIGN(result->month, span1->month)) +ereport(ERROR, The problem is that this is a bug waiting to happen for anyone who uses this function with "result" pointing to the same Interval struct as "span1" or "span2". I understand that the current code avoids this by careful use of temporary Interval structs, but it's still a pretty ugly pattern. This can be avoided by using pg_add_s32/64_overflow(), which then allows the callers to be simplified, getting rid of the temporary Interval structs and memcpy()'s. Also, in do_interval_discard(), this seems a bit risky: +neg_val.day = -newval->day; +neg_val.month = -newval->month; +neg_val.time = -newval->time; because it could in theory turn a finite large negative interval into an infinite one (-INT_MAX -> INT_MAX), leading to an assertion failure in finite_interval_pl(). Now maybe that's not possible for some other reasons, but I think we may as well do the same refactoring for interval_mi() as we're doing for interval_pl() -- i.e., introduce a finite_interval_mi() function, making the addition and subtraction code match, and removing the need for neg_val in do_interval_discard(). Regards, Dean
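For concreteness, here is a minimal sketch of the shape being suggested (not the actual patch): with pg_add_s32_overflow()/pg_add_s64_overflow(), each input field is passed by value before the corresponding output field is written, so the function stays correct even when "result" points at the same Interval as "span1" or "span2".

#include "postgres.h"

#include "common/int.h"			/* pg_add_s32_overflow(), pg_add_s64_overflow() */
#include "datatype/timestamp.h"	/* Interval */

/*
 * Sketch only: add two finite intervals, erroring out on overflow of any
 * field.  Safe for result to alias span1 or span2, because the inputs are
 * read before *result is written.
 */
static void
finite_interval_pl(const Interval *span1, const Interval *span2, Interval *result)
{
	if (pg_add_s32_overflow(span1->month, span2->month, &result->month) ||
		pg_add_s32_overflow(span1->day, span2->day, &result->day) ||
		pg_add_s64_overflow(span1->time, span2->time, &result->time))
		ereport(ERROR,
				(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
				 errmsg("interval out of range")));
}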
Re: Dump-restore loosing 'attnotnull' bit for DEFERRABLE PRIMARY KEY column(s).
On 2023-Sep-20, Amul Sul wrote: > On the latest master head, I can see a $subject bug that seems to be related > commit #b0e96f311985: > > Here is the table definition: > create table foo(i int, j int, CONSTRAINT pk PRIMARY KEY(i) DEFERRABLE); Interesting, thanks for the report. Your attribution to that commit is correct. The table is dumped like this: CREATE TABLE public.foo ( i integer CONSTRAINT pgdump_throwaway_notnull_0 NOT NULL NO INHERIT, j integer ); ALTER TABLE ONLY public.foo ADD CONSTRAINT pk PRIMARY KEY (i) DEFERRABLE; ALTER TABLE ONLY public.foo DROP CONSTRAINT pgdump_throwaway_notnull_0; so the problem here is that the deferrable PK is not considered a reason to keep attnotnull set, so we produce a throwaway constraint that we then drop. This is already bogus, but what is more bogus is the fact that the backend accepts the DROP CONSTRAINT at all. The pg_dump failing should be easy to fix, but fixing the backend to error out sounds more critical. So, the reason for this behavior is that RelationGetIndexList doesn't want to list an index that isn't marked indimmediate as a primary key. I can easily hack around that by doing diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c index 7234cb3da6..971d9c8738 100644 --- a/src/backend/utils/cache/relcache.c +++ b/src/backend/utils/cache/relcache.c @@ -4794,7 +4794,6 @@ RelationGetIndexList(Relation relation) * check them. */ if (!index->indisunique || - !index->indimmediate || !heap_attisnull(htup, Anum_pg_index_indpred, NULL)) continue; @@ -4821,6 +4820,9 @@ RelationGetIndexList(Relation relation) relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)) pkeyIndex = index->indexrelid; + if (!index->indimmediate) + continue; + if (!index->indisvalid) continue; But of course this is not great, since it impacts unrelated bits of code that are relying on relation->pkindex or RelationGetIndexAttrBitmap having their current behavior with non-immediate index. I think a real solution is to stop relying on RelationGetIndexAttrBitmap in ATExecDropNotNull(). (And, again, pg_dump needs some patch as well to avoid printing a throwaway NOT NULL constraint at this point.) -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
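As a purely illustrative sketch of that direction (not Álvaro's patch), ATExecDropNotNull() could consult pg_index itself instead of the relcache bitmaps, so that primary keys are caught whether or not they are indimmediate. The helper name below is invented, and rel/attnum/colName stand for the local state that function already has.

#include "postgres.h"

#include "access/htup_details.h"
#include "catalog/pg_index.h"
#include "nodes/pg_list.h"
#include "utils/rel.h"
#include "utils/relcache.h"
#include "utils/syscache.h"

/*
 * Hypothetical helper: refuse to drop NOT NULL on a column that is part of
 * any primary key index, including DEFERRABLE (non-indimmediate) ones.
 */
static void
check_not_in_primary_key(Relation rel, AttrNumber attnum, const char *colName)
{
	List	   *indexoidlist = RelationGetIndexList(rel);
	ListCell   *lc;

	foreach(lc, indexoidlist)
	{
		Oid			indexoid = lfirst_oid(lc);
		HeapTuple	indexTuple;
		Form_pg_index indexStruct;

		indexTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indexoid));
		if (!HeapTupleIsValid(indexTuple))
			elog(ERROR, "cache lookup failed for index %u", indexoid);
		indexStruct = (Form_pg_index) GETSTRUCT(indexTuple);

		if (indexStruct->indisprimary)
		{
			for (int i = 0; i < indexStruct->indnkeyatts; i++)
			{
				if (indexStruct->indkey.values[i] == attnum)
					ereport(ERROR,
							(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
							 errmsg("column \"%s\" is in a primary key",
									colName)));
			}
		}
		ReleaseSysCache(indexTuple);
	}
	list_free(indexoidlist);
}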
Re: disfavoring unparameterized nested loops
On Wed, Sep 20, 2023, at 4:49 PM, David Rowley wrote: > On Wed, 20 Sept 2023 at 19:56, Andrey Lepikhov > wrote: >> What could you say about a different way: hybrid join? In MS SQL Server, >> they have such a feature [1], and, according to their description, it >> requires low overhead. They start from HashJoin and switch to NestLoop >> if the inner input contains too small tuples. It solves the issue, Isn't it? > > A complexity which you may not be considering here is that Nested Loop > joins always preserve the tuple order from the outer side of the join, > whereas hash joins will not do this when multi-batching. My idea here is the same as what the MS SQL engineers did: prefetch a predefined number of tuples from the HashJoin's inner side and, if the planner has made a mistake and overestimated it, use that inner side as the outer side of a NestLoop instead. The opposite strategy for underestimation - starting with a NestLoop and switching to a HashJoin - looks more difficult, but the main question is: is it worthwhile to research? > I've no idea how the SQL Server engineers solved that. >> [1] >> https://techcommunity.microsoft.com/t5/sql-server-blog/introducing-batch-mode-adaptive-joins/ba-p/385411 -- Regards, Andrei Lepikhov
Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
Although v7 patch doesn't have commit messages on the patch, I think leave commit message is good for reviewers. Sure, didn't notice it. Added the commit message to the updated patch. * Note: DON'T convert this error to "soft" style (errsave/ereturn). We * want this data type to stay permanently in the hard-error world so that * it can be used for testing that such cases still work reasonably. From this point of view, I think this is a supposed way of using widget. I agree, it's a good approach for checking datatype errors, because that's what was intended. OTOH widget is declared in create_type.sql and I'm not sure it's ok to use it in another test copy2.sql. I think that other regress tests with 'widget' type that will be created in the future can be not only in the create_type.sql. So it's not a problem that some type or function is taken from another regress test. For example, the table 'onek' is used in many regress tests. Regards, Damir Belyalov Postgres Professional From 0e1193e00bb5ee810a015a2baaf7c79e395a54c7 Mon Sep 17 00:00:00 2001 From: Damir Belyalov Date: Fri, 15 Sep 2023 11:14:57 +0300 Subject: [PATCH v7] Add new COPY option IGNORE_DATATYPE_ERRORS Currently entire COPY fails even when there is one unexpected data regarding data type or range. IGNORE_DATATYPE_ERRORS ignores these errors and skips them and COPY data which don't contain problem. This patch uses the soft error handling infrastructure, which is introduced by d9f7f5d32f20. Author: Damir Belyalov, Atsushi Torikoshi --- doc/src/sgml/ref/copy.sgml | 13 + src/backend/commands/copy.c | 13 + src/backend/commands/copyfrom.c | 37 src/backend/commands/copyfromparse.c | 20 ++--- src/bin/psql/tab-complete.c | 3 +- src/include/commands/copy.h | 1 + src/include/commands/copyfrom_internal.h | 3 ++ src/test/regress/expected/copy2.out | 28 ++ src/test/regress/sql/copy2.sql | 26 + 9 files changed, 139 insertions(+), 5 deletions(-) diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml index 4d614a0225..d5cdbb4025 100644 --- a/doc/src/sgml/ref/copy.sgml +++ b/doc/src/sgml/ref/copy.sgml @@ -43,6 +43,7 @@ COPY { table_name [ ( column_name [, ...] ) | * } FORCE_NOT_NULL ( column_name [, ...] ) FORCE_NULL ( column_name [, ...] ) +IGNORE_DATATYPE_ERRORS [ boolean ] ENCODING 'encoding_name' @@ -370,6 +371,18 @@ COPY { table_name [ ( + +IGNORE_DATATYPE_ERRORS + + + Drops rows that contain malformed data while copying. These are rows + with columns where the data type's input-function raises an error. + This option is not allowed when using binary format. Note that this + is only supported in current COPY syntax. 
+ + + + ENCODING diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index f14fae3308..beb73f5357 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -419,6 +419,7 @@ ProcessCopyOptions(ParseState *pstate, bool format_specified = false; bool freeze_specified = false; bool header_specified = false; + bool ignore_datatype_errors_specified = false; ListCell *option; /* Support external use for option sanity checking */ @@ -458,6 +459,13 @@ ProcessCopyOptions(ParseState *pstate, freeze_specified = true; opts_out->freeze = defGetBoolean(defel); } + else if (strcmp(defel->defname, "ignore_datatype_errors") == 0) + { + if (ignore_datatype_errors_specified) +errorConflictingDefElem(defel, pstate); + ignore_datatype_errors_specified = true; + opts_out->ignore_datatype_errors = defGetBoolean(defel); + } else if (strcmp(defel->defname, "delimiter") == 0) { if (opts_out->delim) @@ -594,6 +602,11 @@ ProcessCopyOptions(ParseState *pstate, (errcode(ERRCODE_SYNTAX_ERROR), errmsg("cannot specify DEFAULT in BINARY mode"))); + if (opts_out->binary && opts_out->ignore_datatype_errors) + ereport(ERROR, +(errcode(ERRCODE_SYNTAX_ERROR), + errmsg("cannot specify IGNORE_DATATYPE_ERRORS in BINARY mode"))); + /* Set defaults for omitted options */ if (!opts_out->delim) opts_out->delim = opts_out->csv_mode ? "," : "\t"; diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c index 70871ed819..b18aea6376 100644 --- a/src/backend/commands/copyfrom.c +++ b/src/backend/commands/copyfrom.c @@ -752,6 +752,14 @@ CopyFrom(CopyFromState cstate) ti_options |= TABLE_INSERT_FROZEN; } + /* Set up soft error handler for IGNORE_DATATYPE_ERRORS */ + if (cstate->opts.ignore_datatype_errors) + { + ErrorSaveContext escontext = {T_ErrorSaveContext}; + escontext.details_wanted = true; + cstate->escontext = escontext; + } + /* * We need a ResultRelInfo so we can use the regular executor's * index-entry-making mach
pg_upgrade --check fails to warn about abstime
I got a complaint that pg_upgrade --check fails to raise red flags when the source database contains type abstime when upgrading from pg11. The type (along with reltime and tinterval) was removed by pg12. In passing, while testing this, I noticed that the translation infrastructure in pg_upgrade/util.c is broken: we do have the messages in the translation catalog, but the translations for the messages from prep_status are never displayed. So I made the quick hack of adding _() around the fmt, and this was the result: Verificando Consistencia en Vivo en el Servidor Antiguo --- Verificando las versiones de los clústerséxito Verificando que el usuario de base de datos es el usuario de instalaciónéxito Verificando los parámetros de conexión de bases de datoséxito Verificando transacciones preparadas éxito Verificando tipos compuestos definidos por el sistema en tablas de usuarioéxito Verificando tipos de datos reg* en datos de usuario éxito Verificando contrib/isn con discordancia en mecanismo de paso de bigintéxito Checking for incompatible "aclitem" data type in user tables éxito Checking for removed "abstime" data type in user tables éxito Checking for removed "reltime" data type in user tables éxito Checking for removed "tinterval" data type in user tables éxito Verificando conversiones de codificación definidas por el usuarioéxito Verificando operadores postfix definidos por el usuario éxito Verificando funciones polimórficas incompatibles éxito Verificando tablas WITH OIDS éxito Verificando columnas de usuario del tipo «sql_identifier» éxito Verificando la presencia de las bibliotecas requeridaséxito Verificando que el usuario de base de datos es el usuario de instalaciónéxito Verificando transacciones preparadas éxito Verificando los directorios de tablespaces para el nuevo clústeréxito Note how nicely they line up ... not. There is some code that claims to do this correctly, but apparently it counts bytes, not characters, and also it appears to be measuring the original rather than the translation. I think we're trimming the strings in the wrong places. We need to apply _() to the originals, not the trimmed ones. Anyway, clearly nobody has looked at this very much. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "We’ve narrowed the problem down to the customer’s pants being in a situation of vigorous combustion" (Robert Haas, Postgres expert extraordinaire) >From 0c11fb69019e0c694249eaa70eb325a015f107f1 Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Tue, 19 Sep 2023 12:30:11 +0200 Subject: [PATCH] pg_upgrade: check for types removed in pg12 Commit cda6a8d01d39 removed a few datatypes, but didn't update pg_upgrade --check to throw error if these types are used. So the users find that pg_upgrade --check tells them that everything is fine, only to fail when the real upgrade is attempted. 
--- src/bin/pg_upgrade/check.c | 48 +- 1 file changed, 47 insertions(+), 1 deletion(-) diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c index 56e313f562..3a72c5bfd7 100644 --- a/src/bin/pg_upgrade/check.c +++ b/src/bin/pg_upgrade/check.c @@ -26,6 +26,8 @@ static void check_for_tables_with_oids(ClusterInfo *cluster); static void check_for_composite_data_type_usage(ClusterInfo *cluster); static void check_for_reg_data_type_usage(ClusterInfo *cluster); static void check_for_aclitem_data_type_usage(ClusterInfo *cluster); +static void check_for_removed_data_type_usage(ClusterInfo *cluster, + const char *datatype); static void check_for_jsonb_9_4_usage(ClusterInfo *cluster); static void check_for_pg_role_prefix(ClusterInfo *cluster); static void check_for_new_tablespace_dir(void); @@ -111,6 +113,16 @@ check_and_dump_old_cluster(bool live_check) if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1500) check_for_aclitem_data_type_usage(&old_cluster); + /* + * PG 12 removed types abstime, reltime, tinterval. + */ + if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1100) + { + check_for_removed_data_type_usage(&old_cluster, "abstime"); + check_for_removed_data_type_usage(&old_cluster, "reltime"); + check_for_removed_data_type_usage(&old_cluster, "tinterval"); + } + /* * PG 14 changed the function signature of encoding conversion functions. * Conversions from older versions cannot be upgraded automatically @@ -1215,7 +1227,8 @@ check_for_aclitem_data_type_usage(ClusterInfo *cluster) { char output_path[MAXPGPATH]; - prep_status("Checking for incompatible \"aclitem\" data type in user tables"); + prep_status("Checking for incompatible \"%s\" data type in user tables", +"aclitem"); snprintf(output_path, sizeof(output_path), "tables_using_aclitem.txt"); @@ -12
Re: pg_upgrade --check fails to warn about abstime
+/* + * check_for_removed_data_type_usage + * + *similar to the above, but for types that were removed in 12. + */ +static void +check_for_removed_data_type_usage(ClusterInfo *cluster, const char *datatype) Seems like you could make this more generic instead of hardcoding version 12, and then you could use it for any future removed types as well. Just a thought. -- Tristan Partin Neon (https://neon.tech)
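Something along those lines might look like the sketch below — illustrative only, reusing the helpers already present in pg_upgrade's check.c (prep_status, pg_fatal, check_ok); the per-database scan is elided, and the extra parameter is the generalization being suggested.

/*
 * Sketch of a more generic check: the major version in which the type was
 * removed is passed in, so the same function can serve future removals too.
 */
static void
check_for_removed_data_type_usage(ClusterInfo *cluster, const char *datatype,
								  int removed_in_version)
{
	char		output_path[MAXPGPATH];
	bool		found = false;

	prep_status("Checking for removed \"%s\" data type in user tables",
				datatype);

	snprintf(output_path, sizeof(output_path), "tables_using_%s.txt",
			 datatype);

	/*
	 * ... scan every database for user columns of this type, writing any
	 * offenders to output_path and setting "found", as the existing checks
	 * do ...
	 */

	if (found)
		pg_fatal("Your installation contains the \"%s\" data type in user tables.\n"
				 "The \"%s\" data type has been removed in PostgreSQL version %d,\n"
				 "so this cluster cannot currently be upgraded.  You can drop the\n"
				 "problem columns, or change them to another data type, and restart\n"
				 "the upgrade.  A list of the problem columns is in the file:\n"
				 "    %s", datatype, datatype, removed_in_version, output_path);
	else
		check_ok();
}

The call sites in check_and_dump_old_cluster() would then pass 12 explicitly for abstime, reltime and tinterval.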
Re: pg_upgrade --check fails to warn about abstime
On 2023-Sep-20, Tristan Partin wrote: > > +/* > > + * check_for_removed_data_type_usage > > + * > > + *similar to the above, but for types that were removed in 12. > > + */ > > +static void > > +check_for_removed_data_type_usage(ClusterInfo *cluster, const char > > *datatype) > > Seems like you could make this more generic instead of hardcoding version > 12, and then you could use it for any future removed types as well. Yeah, I thought about that, and then closed that with "we can whack it around when we need it". At this point I imagine there's very few other datatypes we can remove from the core server, if any. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "La libertad es como el dinero; el que no la sabe emplear la pierde" (Alvarez)
Re: dikkop seems unhappy because of openssl stuff (FreeBSD 14-BETA1)
On 9/20/23 01:24, Tom Lane wrote: > Tomas Vondra writes: >> bsd@freebsd:~ $ tclsh8.6 >> % clock scan "1/26/2010" >> time value too large/small to represent > > In hopes of replicating this, I tried installing FreeBSD 14-BETA2 > aarch64 on my Pi 3B. This test case works fine: > > $ tclsh8.6 > % clock scan "1/26/2010" > 1264482000 > > $ tclsh8.7 > % clock scan "1/26/2010" > 1264482000 > > and unsurprisingly, pltcl's regression tests pass. I surmise > that something is broken in BETA1 that they fixed in BETA2. > > plpython works too, with the python 3.9 package (and no older > python). > > However, all is not peachy, because plperl doesn't work. > Trying to CREATE EXTENSION either plperl or plperlu leads > to a libperl panic: > > pl_regression=# create extension plperl; > server closed the connection unexpectedly > This probably means the server terminated abnormally > before or while processing the request. > The connection to the server was lost. Attempting reset: Succeeded. > > with this in the postmaster log: > > panic: pthread_key_create failed > > That message is certainly not ours, so it must be coming out of libperl. > > Another thing that seemed strange is that ecpg's preproc.o takes > O(forever) to compile. I killed the build after observing that the > compiler had gotten to 40 minutes of CPU time, and redid that step > with PROFILE=-O0, which allowed it to compile in 20 seconds or so. > (I also tried -O1, but gave up after a few minutes.) This machine > can compile the main backend grammar in a minute or two, so there is > something very odd there. > > I'm coming to the conclusion that 14-BETA is, well, beta grade. > I'll be interested to see if you get the same results when you > update to BETA2. Thanks, I'll try that when I'll be at the office next week. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: dikkop seems unhappy because of openssl stuff (FreeBSD 14-BETA1)
On 9/20/23 19:59, Tomas Vondra wrote: > > > On 9/20/23 01:24, Tom Lane wrote: >> Tomas Vondra writes: >>> bsd@freebsd:~ $ tclsh8.6 >>> % clock scan "1/26/2010" >>> time value too large/small to represent >> >> In hopes of replicating this, I tried installing FreeBSD 14-BETA2 >> aarch64 on my Pi 3B. This test case works fine: >> >> $ tclsh8.6 >> % clock scan "1/26/2010" >> 1264482000 >> >> $ tclsh8.7 >> % clock scan "1/26/2010" >> 1264482000 >> >> and unsurprisingly, pltcl's regression tests pass. I surmise >> that something is broken in BETA1 that they fixed in BETA2. >> >> plpython works too, with the python 3.9 package (and no older >> python). >> >> However, all is not peachy, because plperl doesn't work. >> Trying to CREATE EXTENSION either plperl or plperlu leads >> to a libperl panic: >> >> pl_regression=# create extension plperl; >> server closed the connection unexpectedly >> This probably means the server terminated abnormally >> before or while processing the request. >> The connection to the server was lost. Attempting reset: Succeeded. >> >> with this in the postmaster log: >> >> panic: pthread_key_create failed >> >> That message is certainly not ours, so it must be coming out of libperl. >> >> Another thing that seemed strange is that ecpg's preproc.o takes >> O(forever) to compile. I killed the build after observing that the >> compiler had gotten to 40 minutes of CPU time, and redid that step >> with PROFILE=-O0, which allowed it to compile in 20 seconds or so. >> (I also tried -O1, but gave up after a few minutes.) This machine >> can compile the main backend grammar in a minute or two, so there is >> something very odd there. >> >> I'm coming to the conclusion that 14-BETA is, well, beta grade. >> I'll be interested to see if you get the same results when you >> update to BETA2. > > Thanks, I'll try that when I'll be at the office next week. > FWIW when I disabled tcl, the tests pass (it's running with --nostatus --nosend, so it's not visible on the buildfarm site). Including the plperl stuff: == running regression test queries== test plperl ... ok 397 ms test plperl_lc... ok 152 ms test plperl_trigger ... ok 374 ms test plperl_shared... ok 163 ms test plperl_elog ... ok 184 ms test plperl_util ... ok 210 ms test plperl_init ... ok 150 ms test plperlu ... ok 117 ms test plperl_array ... ok 228 ms test plperl_call ... ok 189 ms test plperl_transaction ... ok 412 ms test plperl_plperlu ... ok 238 ms == All 12 tests passed. == I wonder if this got broken between BETA1 and BETA2. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: pg_upgrade --check fails to warn about abstime
On Wed Sep 20, 2023 at 12:58 PM CDT, Alvaro Herrera wrote: On 2023-Sep-20, Tristan Partin wrote: > > +/* > > + * check_for_removed_data_type_usage > > + * > > + *similar to the above, but for types that were removed in 12. > > + */ > > +static void > > +check_for_removed_data_type_usage(ClusterInfo *cluster, const char *datatype) > > Seems like you could make this more generic instead of hardcoding version > 12, and then you could use it for any future removed types as well. Yeah, I thought about that, and then closed that with "we can whack it around when we need it". At this point I imagine there's very few other datatypes we can remove from the core server, if any. Makes complete sense to me. Patch looks good to me with one comment. +pg_fatal("Your installation contains the \"%s\" data type in user tables.\n" + "Data type \"%s\" has been removed in PostgreSQL version 12,\n" + "so this cluster cannot currently be upgraded. You can drop the\n" + "problem columns, or change them to another data type, and restart\n" + "the upgrade. A list of the problem columns is in the file:\n" + "%s", datatype, datatype, output_path); I would wrap the second \"%s\" in commas. Data type, "abstime", has been... Maybe also add a "The" to start that sentence to make it less terse. Up to you. -- Tristan Partin Neon (https://neon.tech)
Re: should frontend tools use syncfs() ?
On Wed, Sep 20, 2023 at 03:12:56PM +0300, Maxim Orlov wrote: > As a solution, I suggest renaming sync_method in xlog module to > wal_sync_method. In fact, > appropriate GUC for this variable, called "wal_sync_method" and I see no > reason not to use > the exact same name for a variable in xlog module. +1 I think we should also consider renaming things like SYNC_METHOD_FSYNC to WAL_SYNC_METHOD_FSYNC, and sync_method_options to wal_sync_method_options. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: remaining sql/json patches
On 2023-09-19 Tu 23:07, Amit Langote wrote: On Tue, Sep 19, 2023 at 9:00 PM Amit Langote wrote: On Tue, Sep 19, 2023 at 7:37 PM jian he wrote: --- https://www.postgresql.org/docs/current/extend-type-system.html#EXTEND-TYPES-POLYMORPHIC When the return value of a function is declared as a polymorphic type, there must be at least one argument position that is also polymorphic, and the actual data type(s) supplied for the polymorphic arguments determine the actual result type for that call. select json_query(jsonb'{"a":[{"a":[2,3]},{"a":[4,5]}]}','$.a[*].a?(@<=3)' returning anyrange); should fail. Now it returns NULL. Maybe we can validate it in transformJsonFuncExpr? --- I'm not sure whether we should make the parser complain about the weird types being specified in RETURNING. Sleeping over this, maybe adding the following to transformJsonOutput() does make sense? + if (get_typtype(ret->typid) == TYPTYPE_PSEUDO) + ereport(ERROR, + errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("returning pseudo-types is not supported in SQL/JSON functions")); + Seems reasonable. cheers andrew -- Andrew Dunstan EDB:https://www.enterprisedb.com
Re: Add 'worker_type' to pg_stat_subscription
On Tue, Sep 19, 2023 at 08:36:35AM +0530, Amit Kapila wrote: > I am of the opinion that worker_type should be 'apply' instead of > 'leader apply' because even when it is a leader for parallel apply > worker, it could perform individual transactions apply. For reference, > I checked pg_stat_activity.backend_type, there is nothing called main > or leader backend even when the backend is involved in parallel query. Okay. Here is v9 of the patch with this change. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com >From b8268a4c95ff217742bc2e127f74f67c9a417233 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Wed, 20 Sep 2023 12:22:21 -0700 Subject: [PATCH v9 1/1] Add worker type to pg_stat_subscription. Thanks to 2a8b40e368, the logical replication worker type is easily determined, and this information is a nice addition to the pg_stat_subscription view. The worker type could already be deduced via other columns such as leader_pid and relid, but that is unnecessary complexity for users. Bumps catversion. Author: Peter Smith Reviewed-by: Michael Paquier, Maxim Orlov, Amit Kapila Discussion: https://postgr.es/m/CAHut%2BPtmbSMfErSk0S7xxVdZJ9XVE3xVLhqBTmT91kf57BeKDQ%40mail.gmail.com --- doc/src/sgml/monitoring.sgml | 13 - src/backend/catalog/system_views.sql | 1 + src/backend/replication/logical/launcher.c | 18 +- src/include/catalog/pg_proc.dat| 6 +++--- src/test/regress/expected/rules.out| 3 ++- src/test/subscription/t/004_sync.pl| 2 +- 6 files changed, 36 insertions(+), 7 deletions(-) diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 4ff415d6a0..9c4930e9ae 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -1993,6 +1993,17 @@ description | Waiting for a newly initialized WAL file to reach durable storage + + + worker_type text + + + Type of the subscription worker process. Possible types are + apply, parallel apply, and + table synchronization. + + + pid integer @@ -2008,7 +2019,7 @@ description | Waiting for a newly initialized WAL file to reach durable storage Process ID of the leader apply worker if this process is a parallel - apply worker; NULL if this process is a leader apply worker or a + apply worker; NULL if this process is a leader apply worker or a table synchronization worker diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index 77b06e2a7a..fcb14976c0 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -949,6 +949,7 @@ CREATE VIEW pg_stat_subscription AS SELECT su.oid AS subid, su.subname, +st.worker_type, st.pid, st.leader_pid, st.relid, diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c index 7882fc91ce..501910b445 100644 --- a/src/backend/replication/logical/launcher.c +++ b/src/backend/replication/logical/launcher.c @@ -1278,7 +1278,7 @@ GetLeaderApplyWorkerPid(pid_t pid) Datum pg_stat_get_subscription(PG_FUNCTION_ARGS) { -#define PG_STAT_GET_SUBSCRIPTION_COLS 9 +#define PG_STAT_GET_SUBSCRIPTION_COLS 10 Oid subid = PG_ARGISNULL(0) ? 
InvalidOid : PG_GETARG_OID(0); int i; ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo; @@ -1339,6 +1339,22 @@ pg_stat_get_subscription(PG_FUNCTION_ARGS) else values[8] = TimestampTzGetDatum(worker.reply_time); + switch (worker.type) + { + case WORKERTYPE_APPLY: +values[9] = CStringGetTextDatum("apply"); +break; + case WORKERTYPE_PARALLEL_APPLY: +values[9] = CStringGetTextDatum("parallel apply"); +break; + case WORKERTYPE_TABLESYNC: +values[9] = CStringGetTextDatum("table synchronization"); +break; + case WORKERTYPE_UNKNOWN: +/* Should never happen. */ +elog(ERROR, "unknown worker type"); + } + tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls); diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index 9805bc6118..f0b7b9cbd8 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -5484,9 +5484,9 @@ proname => 'pg_stat_get_subscription', prorows => '10', proisstrict => 'f', proretset => 't', provolatile => 's', proparallel => 'r', prorettype => 'record', proargtypes => 'oid', - proallargtypes => '{oid,oid,oid,int4,int4,pg_lsn,timestamptz,timestamptz,pg_lsn,timestamptz}', - proargmodes => '{i,o,o,o,o,o,o,o,o,o}', - proargnames => '{subid,subid,relid,pid,leader_pid,received_lsn,last_msg_send_time,last_msg_receipt_time,latest_end_lsn,latest_end_time}', + proallargtypes => '{oid,oid,oid,int4,int4,pg_lsn,timestamptz,timestamptz,pg_lsn,timestamptz,text}', + pro
Re: New WAL record to detect the checkpoint redo location
On Mon, Sep 18, 2023 at 2:57 PM Robert Haas wrote: > I've been brainstorming about this today, trying to figure out some > ideas to make it work. Here are some patches. 0001 refactors XLogInsertRecord to unify a couple of separate tests of isLogSwitch, hopefully making it cleaner and cheaper to add more special cases. 0002 is a very minimal patch to add XLOG_CHECKPOINT_REDO without using it for anything. 0003 modifies CreateCheckPoint() to insert an XLOG_CHECKPOINT_REDO record for any non-shutdown checkpoint, and modifies XLogInsertRecord() to treat that as a new special case, wherein after inserting the record the redo pointer is reset while still holding the WAL insertion locks. I've tested this to the extent of running the regression tests, and I also did one (1) manual test where it looked like the right thing was happening, but that's it, so this might be buggy or perform like garbage for all I know. But my hope is that it isn't buggy and performs adequately. If there's any chance of getting some comments on the basic design choices before I spend time testing and polishing it, that would be very helpful. Thanks, -- Robert Haas EDB: http://www.enterprisedb.com v6-0001-Unify-two-isLogSwitch-tests-in-XLogInsertRecord.patch Description: Binary data v6-0002-Minimally-add-XLOG_CHECKPOINT_REDO.patch Description: Binary data v6-0003-WIP-Insert-XLOG_CHECKPOINT_REDO-at-the-redo-point.patch Description: Binary data
Re: LLVM 16 (opaque pointers)
Belated thanks Dmitry, Ronan, Andres for your feedback. Here's a new version, also including Dmitry's patch for 17 which it is now also time to push. It required a bit more trivial #if magic to be conditional, as Dmitry already mentioned. I just noticed that Dmitry had the LLVMPassBuilderOptionsSetInlinerThreshold() function added to LLVM 17's C API for this patch. Thanks! (Better than putting stuff in llvmjit_wrap.c, if you can get it upstreamed in time.) I thought I needed to block users from building with too-old clang and too-new LLVM, but I haven't managed to find a combination that actually breaks anything. I wouldn't recommend it, but for example clang 10 bitcode seems to be inlinable without problems by LLVM 16 on my system (I didn't use an assert build of LLVM though). I think that could be a separate adjustment if we learn that we need to enforce or document a constraint there. So far I've tested LLVM versions 10, 15, 16, 17, 18 (= their main branch) against PostgreSQL versions 14, 15, 16. I've attached the versions that apply to master and 16, and pushed back-patches to 14 and 15 to public branches if anyone's interested[1]. Back-patching further seems a bit harder. I'm quite willing to do it, but ... do we actually need to, ie does anyone really *require* old PostgreSQL release branches to work with new LLVM? (I'll start a separate thread about the related question of when we get to drop support for old LLVMs.) One point from an earlier email: On Sat, Aug 12, 2023 at 6:09 AM Andres Freund wrote: > > AttributeTemplate(PG_FUNCTION_ARGS) > > { > > + PGFunction fp PG_USED_FOR_ASSERTS_ONLY; > > + > > + fp = &AttributeTemplate; > Other parts of the file do this by putting the functions into > referenced_functions[], i'd copy that here and below. Actually here I just wanted to assert that the 3 template functions match certain function pointer types. To restate what these functions are about: in the JIT code I need the function type, but we have only the function pointer type, and it is now impossible to go from a function pointer type to a function type, so I needed to define some example functions with the right prototype (well, one of them existed already but I needed more), and then I wanted to assert that they are assignable to the appropriate function pointer types. Does that make sense? In this version I changed it to what I hope is a more obvious/explicit expression of that goal: + AssertVariableIsOfType(&ExecEvalSubroutineTemplate, + ExecEvalSubroutine); [1] https://github.com/macdice/postgres/tree/llvm16-14 and -15 From d5e4ac6dd9c2016c00bedfd8cd404e2f277701e1 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Mon, 9 May 2022 08:27:47 +1200 Subject: [PATCH v2 1/2] jit: Support opaque pointers in LLVM 16. Remove use of LLVMGetElementType() and provide the type of all pointers to LLVMBuildXXX() functions when emitting IR, as required by modern LLVM versions[1]. * For LLVM <= 14, we'll still use the old LLVMBuildXXX() functions. * For LLVM == 15, we'll continue to do the same, explicitly opting out of opaque pointer mode. * For LLVM >= 16, we'll use the new LLVMBuildXXX2() functions that take the extra type argument. The difference is hidden behind some new IR emitting wrapper functions l_load(), l_gep(), l_call() etc. The change is mostly mechanical, except that at each site the correct type had to be provided. 
In some places we needed to do some extra work to get functions types, including some new wrappers for C++ APIs that are not yet exposed by in LLVM's C API, and some new "example" functions in llvmjit_types.c because it's no longer possible to start from the function pointer type and ask for the function type. Back-patch to 12, because it's a little tricker in 11 and we agreed not to put the latest LLVM support into the upcoming final release of 11. [1] https://llvm.org/docs/OpaquePointers.html Reviewed-by: Dmitry Dolgov <9erthali...@gmail.com> Reviewed-by: Ronan Dunklau Reviewed-by: Andres Freund Discussion: https://postgr.es/m/CA%2BhUKGKNX_%3Df%2B1C4r06WETKTq0G4Z_7q4L4Fxn5WWpMycDj9Fw%40mail.gmail.com --- src/backend/jit/llvm/llvmjit.c| 57 ++-- src/backend/jit/llvm/llvmjit_deform.c | 119 src/backend/jit/llvm/llvmjit_expr.c | 401 -- src/backend/jit/llvm/llvmjit_types.c | 39 ++- src/backend/jit/llvm/llvmjit_wrap.cpp | 12 + src/backend/jit/llvm/meson.build | 2 +- src/include/jit/llvmjit.h | 7 + src/include/jit/llvmjit_emit.h| 106 +-- 8 files changed, 479 insertions(+), 264 deletions(-) diff --git a/src/backend/jit/llvm/llvmjit.c b/src/backend/jit/llvm/llvmjit.c index 09650e2c70..06bb440503 100644 --- a/src/backend/jit/llvm/llvmjit.c +++ b/src/backend/jit/llvm/llvmjit.c @@ -85,8 +85,11 @@ LLVMTypeRef StructExprState; LLVMTypeRef StructAggState; LLVMTypeRef StructAggStatePerGroupData; LLVMTypeRef StructAggSt
Re: Improvements in pg_dump/pg_restore toc format and performances
On Tue, Sep 19, 2023 at 12:15:55PM +0200, Pierre Ducroquet wrote: > Attached updated patches fix this regression, I'm sorry I missed that. Thanks for the new patches. 0001 and 0002 look reasonable to me. This is a nitpick, but we might want to use atooid() in 0002, which is just shorthand for the strtoul() call you are using. > - WriteStr(AH, NULL); /* Terminate List */ > + WriteInt(AH, -1); /* Terminate List */ I think we need to be cautious about using WriteInt and ReadInt here. OIDs are unsigned, so we probably want to use InvalidOid (0) instead. > + if (AH->version >= K_VERS_1_16) > { > - depSize *= 2; > - deps = (DumpId *) pg_realloc(deps, > sizeof(DumpId) * depSize); > + depId = ReadInt(AH); > + if (depId == -1) > + break; /* end of list > */ > + if (depIdx >= depSize) > + { > + depSize *= 2; > + deps = (DumpId *) > pg_realloc(deps, sizeof(DumpId) * depSize); > + } > + deps[depIdx] = depId; > + } > + else > + { > + tmp = ReadStr(AH); > + if (!tmp) > + break; /* end of list > */ > + if (depIdx >= depSize) > + { > + depSize *= 2; > + deps = (DumpId *) > pg_realloc(deps, sizeof(DumpId) * depSize); > + } > + deps[depIdx] = strtoul(tmp, NULL, 10); > + free(tmp); > } I would suggest making the resizing logic common. if (AH->version >= K_VERS_1_16) { depId = ReadInt(AH); if (depId == InvalidOid) break; /* end of list */ } else { tmp = ReadStr(AH); if (!tmp) break; /* end of list */ depId = strtoul(tmp, NULL, 10); free(tmp); } if (depIdx >= depSize) { depSize *= 2; deps = (DumpId *) pg_realloc(deps, sizeof(DumpId) * depSize); } deps[depIdx] = depId; Also, can we make depId more locally scoped? I have yet to look into 0004 still. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Guiding principle for dropping LLVM versions?
Hi, Currently we claim to support all versions of LLVM from 3.9 up. It's now getting quite inconvenient to test changes on older releases with single digit major versions, because they aren't available through usual package channels on current distributions, and frankly it feels like pointless busy-work to build those older versions from source (not to mention that it takes huuurrs to compile that much C++). At the other end of the window, we've also been back-patching support for the latest LLVM versions into all supported releases, which might make slightly more sense, but I don't know. For the trailing end of the window, would it make sense to say that when PostgreSQL 17 ships, it doesn't need to support any LLVM versions that are no longer available in the default package repositories of current major distros? I'm trying to understand the practical constraints. Perhaps a package maintainer could correct me if I have this wrong. Distros typically support a range of releases from the past few years, and then bless one as 'default' by making it the one you get if you install a meta package eg 'llvm' without a number (for example, on Debian 12 this is LLVM 14, though LLVM 13 is still available). Having a default encourages sharing, eg one LLVM library can be used by many different things. The maintainer of the PostgreSQL package then chooses which one to link against, and it's usually the default one unless we can't use that one yet for technical reasons (a situation that might arise from time to time in bleeding edge distros). So if we just knew the *oldest default* on every live distro at release time, I assume no package maintainer would get upset if we ripped out support for everything older, and that'd let us vacuum a lot of old versions out of our tree. A more conservative horizon would be: which is the *oldest* LLVM you can still get through the usual channels on every relevant distro, for the benefit of people compiling from source, who for some reason want to use a version older then the default on their distro? I don't know what the motivation would be. What reason could there be to be more conservative than that? I wonder if there is a good way to make this sort of thing more systematic. If we could agree on a guiding principle vaguely like the above, then perhaps we just need a wiki page that lists relevant distributions, versions and EOL dates, that could be used to reduce the combinations of stuff we have to consider and make the pruning decisions into no-brainers.
Re: Add 'worker_type' to pg_stat_subscription
On Thu, Sep 21, 2023 at 5:30 AM Nathan Bossart wrote: > > On Tue, Sep 19, 2023 at 08:36:35AM +0530, Amit Kapila wrote: > > I am of the opinion that worker_type should be 'apply' instead of > > 'leader apply' because even when it is a leader for parallel apply > > worker, it could perform individual transactions apply. For reference, > > I checked pg_stat_activity.backend_type, there is nothing called main > > or leader backend even when the backend is involved in parallel query. > > Okay. Here is v9 of the patch with this change. > One question -- the patch comment still says "Bumps catversion.", but catversion.h change is missing from the v9 patch? -- Kind Regards, Peter Smith. Fujitsu Australia
Re: pg_upgrade and logical replication
On Wed, Sep 20, 2023 at 04:54:36PM +0530, Amit Kapila wrote: > Also, the patch seems to be allowing subscription relations from PG > >=10 to be migrated but how will that work if the corresponding > publisher is also upgraded without slots? Won't the corresponding > workers start failing as soon as you restart the upgrade server? Do we > need to document the steps for users? Hmm? How is that related to the upgrade of the subscribers? And how is that different from the case where a subscriber tries to connect back to a publisher where a slot has been dropped? There is no need of pg_upgrade to reach such a state: ERROR: could not start WAL streaming: ERROR: replication slot "popo" does not exist -- Michael signature.asc Description: PGP signature
Re: Add 'worker_type' to pg_stat_subscription
On Thu, Sep 21, 2023 at 09:01:01AM +1000, Peter Smith wrote: > One question -- the patch comment still says "Bumps catversion.", but > catversion.h change is missing from the v9 patch? Yeah, previous patches did that, but it is no big deal. My take is that it is a good practice to never do a catversion bump in posted patches, and that it is equally a good practice from Nathan to be reminded about that with the addition of a note in the commit message of the patch posted. -- Michael signature.asc Description: PGP signature
Re: Add 'worker_type' to pg_stat_subscription
On Thu, Sep 21, 2023 at 08:14:23AM +0900, Michael Paquier wrote: > On Thu, Sep 21, 2023 at 09:01:01AM +1000, Peter Smith wrote: >> One question -- the patch comment still says "Bumps catversion.", but >> catversion.h change is missing from the v9 patch? > > Yeah, previous patches did that, but it is no big deal. My take is > that it is a good practice to never do a catversion bump in posted > patches, and that it is equally a good practice from Nathan to be > reminded about that with the addition of a note in the commit message > of the patch posted. Right, I'll take care of it before committing. I'm trying to make sure I don't forget. :) -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Add 'worker_type' to pg_stat_subscription
On Thu, Sep 21, 2023 at 9:34 AM Nathan Bossart wrote: > > On Thu, Sep 21, 2023 at 08:14:23AM +0900, Michael Paquier wrote: > > On Thu, Sep 21, 2023 at 09:01:01AM +1000, Peter Smith wrote: > >> One question -- the patch comment still says "Bumps catversion.", but > >> catversion.h change is missing from the v9 patch? > > > > Yeah, previous patches did that, but it is no big deal. My take is > > that it is a good practice to never do a catversion bump in posted > > patches, and that it is equally a good practice from Nathan to be > > reminded about that with the addition of a note in the commit message > > of the patch posted. > > Right, I'll take care of it before committing. I'm trying to make sure I > don't forget. :) OK, all good. ~~~ This is a bit of a late entry, but looking at the PG DOCS, I felt it might be simpler if we don't always refer to every other worker type when explaining NULLs. The descriptions are already bigger than they need to be, and if more types ever get added they will keep growing. ~ BEFORE leader_pid integer Process ID of the leader apply worker if this process is a parallel apply worker; NULL if this process is a leader apply worker or a table synchronization worker SUGGESTION leader_pid integer Process ID of the leader apply worker; NULL if this process is not a parallel apply worker ~ BEFORE relid oid OID of the relation that the worker is synchronizing; NULL for the leader apply worker and parallel apply workers SUGGESTION relid oid OID of the relation being synchronized; NULL if this process is not a table synchronization worker -- Kind Regards, Peter Smith. Fujitsu Australia
Re: LLVM 16 (opaque pointers)
Hi Thomas, On Thu, 2023-09-21 at 08:22 +1200, Thomas Munro wrote: > So far I've tested LLVM versions 10, 15, 16, 17, 18 (= their main > branch) against PostgreSQL versions 14, 15, 16. I've attached the > versions that apply to master and 16, and pushed back-patches to 14 > and 15 to public branches if anyone's interested[1]. Back-patching > further seems a bit harder. I'm quite willing to do it, but ... do we > actually need to, ie does anyone really *require* old PostgreSQL > release branches to work with new LLVM? RHEL releases a new LLVM version along with their new minor releases every 6 months, and we have to build the older PostgreSQL versions with the new LLVM each time. From the RHEL point of view, it would be great if we could back-patch back to v12 :( Regards, -- Devrim Gündüz Open Source Solution Architect, PostgreSQL Major Contributor Twitter: @DevrimGunduz , @DevrimGunduzTR
Re: Guiding principle for dropping LLVM versions?
Hi, On Thu, 2023-09-21 at 10:54 +1200, Thomas Munro wrote: > I'm trying to understand the practical constraints. Perhaps a package > maintainer could correct me if I have this wrong. Distros typically > support a range of releases from the past few years, and then bless > one as 'default' by making it the one you get if you install a meta > package eg 'llvm' without a number (for example, on Debian 12 this is > LLVM 14, though LLVM 13 is still available). Having a default > encourages sharing, eg one LLVM library can be used by many different > things. The maintainer of the PostgreSQL package then chooses which > one to link against, and it's usually the default one unless we can't > use that one yet for technical reasons (a situation that might arise > from time to time in bleeding edge distros). So if we just knew the > *oldest default* on every live distro at release time, I assume no > package maintainer would get upset if we ripped out support for > everything older, and that'd let us vacuum a lot of old versions out > of our tree. RPM packager speaking: Even though older LLVM versions exist on both RHEL and Fedora, they don't provide older Clang packages, which means we have to link to the latest release anyway (like currently Fedora 38 packages are waiting for LLVM 16 patch, as they cannot be linked against LLVM 15) Regards, Regards, -- Devrim Gündüz Open Source Solution Architect, PostgreSQL Major Contributor Twitter: @DevrimGunduz , @DevrimGunduzTR
Re: LLVM 16 (opaque pointers)
On Thu, Sep 21, 2023 at 12:24 PM Devrim Gündüz wrote: > On Thu, 2023-09-21 at 08:22 +1200, Thomas Munro wrote: > > So far I've tested LLVM versions 10, 15, 16, 17, 18 (= their main > > branch) against PostgreSQL versions 14, 15, 16. I've attached the > > versions that apply to master and 16, and pushed back-patches to 14 > > and 15 to public branches if anyone's interested[1]. Back-patching > > further seems a bit harder. I'm quite willing to do it, but ... do we > > actually need to, ie does anyone really *require* old PostgreSQL > > release branches to work with new LLVM? > > RHEL releases new LLVM version along with their new minor releases every > 6 month, and we have to build older versions with new LLVM each time. > From RHEL point of view, it would be great if we can back-patch back to > v12 :( Got it. OK, I'll work on 12 and 13 now.
Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning
On Wed, Sep 20, 2023 at 10:24 PM Ashutosh Bapat wrote: > On Wed, Sep 20, 2023 at 5:24 PM Amit Langote wrote: > > Just one comment on 0003: > > > > + /* > > +* Dummy SpecialJoinInfos do not have any translated fields and hence > > have > > +* nothing to free. > > +*/ > > + if (child_sjinfo->jointype == JOIN_INNER) > > + return; > > > > Should this instead be Assert(child_sjinfo->jointype != JOIN_INNER)? > > try_partitionwise_join() calls free_child_sjinfo_members() > unconditionally i.e. it will be called even SpecialJoinInfos of INNER > join. Above condition filters those out early. In fact if we convert > into an Assert, in a production build the rest of the code will free > Relids which are referenced somewhere else causing dangling pointers. You're right. I hadn't realized that the parent SpecialJoinInfo passed to try_partitionwise_join() might itself be a dummy. I guess I should've read the whole thing before commenting. -- Thanks, Amit Langote EDB: http://www.enterprisedb.com
Bug fix in vacuumdb --buffer-usage-limit xxx -Z
Hi, When the --buffer-usage-limit option is specified, vacuumdb issues a VACUUM or VACUUM ANALYZE command with the BUFFER_USAGE_LIMIT option. Likewise, if --buffer-usage-limit and -Z are both specified, vacuumdb should issue an ANALYZE command with the BUFFER_USAGE_LIMIT option, but it does not. That is, vacuumdb -Z fails to handle the --buffer-usage-limit option. This seems to be a bug. You can see my patch in the attached file, and see how it works by adding the -e option to vacuumdb. Ryoga Yoshida diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c index f03d5b1c6c..57355639c0 100644 --- a/src/bin/scripts/vacuumdb.c +++ b/src/bin/scripts/vacuumdb.c @@ -964,6 +964,13 @@ prepare_vacuum_command(PQExpBuffer sql, int serverVersion, appendPQExpBuffer(sql, "%sVERBOSE", sep); sep = comma; } + if (vacopts->buffer_usage_limit) + { +Assert(serverVersion >= 16); +appendPQExpBuffer(sql, "%sBUFFER_USAGE_LIMIT '%s'", sep, + vacopts->buffer_usage_limit); +sep = comma; + } if (sep != paren) appendPQExpBufferChar(sql, ')'); }
Re: bug fix and documentation improvement about vacuumdb
> > I agree. Supporting pattern matching should, if anyone is interested in > trying, be done separately in its own thread, no need to move the goalposts > here. Sorry if I made it sound like so upthread. > I got it. > When sending an update, please include the previous patch as well with > your new > tests as a 0002 patch in a patchset. The CFBot can only apply and > build/test > patches when the entire patchset is attached to the email. The below > testresults indicate that the patch failed the new tests (which in a way is > good since without the fix the tests *should* fail), since the fix patch > was > not applied: > > http://cfbot.cputube.org/masaki-kuwamura.html > I'm sorry, I didn't know that. I attached both the test patch and the fix patch to this mail. (The fix patch is clearly Nathan-san's, though.) If I'm still doing it wrong, please let me know. Masaki Kuwamura v1_vacuumdb_add_tests.patch Description: Binary data vacuumdb_fix.patch Description: Binary data
Re: Improve the log message output of basic_archive when basic_archive.archive_directory parameter is not set
On 2023-09-20 21:14, Daniel Gustafsson wrote: On 19 Sep 2023, at 11:21, bt23nguyent wrote: Please let me know if you have any further suggestions that I can improve more. +*logdetail = pstrdup("WAL archiving failed because basic_archive.archive_directory is not set"); Nitpick: detail messages should end with a period per the error message style guide [0]. Yes! I totally missed this detail. -archiving will proceed only when it returns true. +archiving will proceed only when it returns true. The +archiver may also emit the detail explaining how the module is not configured +to the sever log if the archive module has any. I think this paragraph needs to be updated to include how the returned logdetail is emitted, since it currently shows the WARNING without mentioning the added detail in case returned. It would also be good to mention that it should be an allocated string which the caller can free. -- Daniel Gustafsson [0] https://www.postgresql.org/docs/devel/error-style-guide.html Thank you for your kind review comment! I agree with you that this document update is not explanatory enough. So here is an updated patch. If there is any further suggestion, please let me know. Best regards, Tung Nguyendiff --git a/contrib/basic_archive/basic_archive.c b/contrib/basic_archive/basic_archive.c index 4d78c31859..dd0f2816dc 100644 --- a/contrib/basic_archive/basic_archive.c +++ b/contrib/basic_archive/basic_archive.c @@ -48,7 +48,7 @@ typedef struct BasicArchiveData static char *archive_directory = NULL; static void basic_archive_startup(ArchiveModuleState *state); -static bool basic_archive_configured(ArchiveModuleState *state); +static bool basic_archive_configured(ArchiveModuleState *state, char **logdetail); static bool basic_archive_file(ArchiveModuleState *state, const char *file, const char *path); static void basic_archive_file_internal(const char *file, const char *path); static bool check_archive_directory(char **newval, void **extra, GucSource source); @@ -159,9 +159,15 @@ check_archive_directory(char **newval, void **extra, GucSource source) * Checks that archive_directory is not blank. */ static bool -basic_archive_configured(ArchiveModuleState *state) +basic_archive_configured(ArchiveModuleState *state, char **logdetail) { - return archive_directory != NULL && archive_directory[0] != '\0'; + if (archive_directory == NULL || archive_directory[0] == '\0') +{ +*logdetail = pstrdup("WAL archiving failed because basic_archive.archive_directory is not set."); +return false; +} +else +return true; } /* diff --git a/doc/src/sgml/archive-modules.sgml b/doc/src/sgml/archive-modules.sgml index 7064307d9e..b58ed238b4 100644 --- a/doc/src/sgml/archive-modules.sgml +++ b/doc/src/sgml/archive-modules.sgml @@ -101,7 +101,7 @@ typedef void (*ArchiveStartupCB) (ArchiveModuleState *state); assumes the module is configured. -typedef bool (*ArchiveCheckConfiguredCB) (ArchiveModuleState *state); +typedef bool (*ArchiveCheckConfiguredCB) (ArchiveModuleState *state, char **logdetail); If true is returned, the server will proceed with @@ -112,7 +112,10 @@ typedef bool (*ArchiveCheckConfiguredCB) (ArchiveModuleState *state); WARNING: archive_mode enabled, yet archiving is not configured In the latter case, the server will periodically call this function, and -archiving will proceed only when it returns true. +archiving will proceed only when it returns true. The +archiver may also emit the detail explaining how the module is not configured +to the sever log if the archive module has any. 
The detail message of archive +module should be an allocated string which the caller can free. diff --git a/src/backend/archive/shell_archive.c b/src/backend/archive/shell_archive.c index 157ca4c751..c957a18ee7 100644 --- a/src/backend/archive/shell_archive.c +++ b/src/backend/archive/shell_archive.c @@ -23,7 +23,7 @@ #include "common/percentrepl.h" #include "pgstat.h" -static bool shell_archive_configured(ArchiveModuleState *state); +static bool shell_archive_configured(ArchiveModuleState *state, char **logdetail); static bool shell_archive_file(ArchiveModuleState *state, const char *file, const char *path); @@ -43,7 +43,7 @@ shell_archive_init(void) } static bool -shell_archive_configured(ArchiveModuleState *state) +shell_archive_configured(ArchiveModuleState *state, char **logdetail) { return XLogArchiveCommand[0] != '\0'; } diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c index 46af349564..2fd1d03b09 100644 --- a/src/backend/postmaster/pgarch.c +++ b/src/backend/postmaster/pgarch.c @@ -390,7 +390,8 @@ pgarch_ArchiverCopyLoop(void) { struct stat stat_buf; char pathname[MAXPGPATH]; - + char *logdetail; + /* * Do not initiate any more archive commands after receiving * SIGTERM, nor
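For reference, a minimal sketch of how the archiver side could surface the returned detail is below; the callback and struct names follow the existing archive-module API, but the exact ereport wording and memory handling here are assumptions rather than the committed code:

    char       *logdetail = NULL;

    /* Sketch only: report the module's detail alongside the usual WARNING. */
    if (ArchiveCallbacks->check_configured_cb != NULL &&
        !ArchiveCallbacks->check_configured_cb(archive_module_state, &logdetail))
    {
        ereport(WARNING,
                (errmsg("archive_mode enabled, yet archiving is not configured"),
                 logdetail ? errdetail_internal("%s", logdetail) : 0));
        if (logdetail)
            pfree(logdetail);   /* the caller owns and frees the detail string */
        return;
    }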
Re: Bug fix in vacuumdb --buffer-usage-limit xxx -Z
On Thu, Sep 21, 2023 at 10:44:49AM +0900, Ryoga Yoshida wrote: > When --buffer-usage-limit option is specified, vacuumdb issues VACUUM or > VACUUM ANALYZE command with BUFFER_USAGE_LIMIT option. Also if > --buffer-usage-limit and -Z options are specified, vacuumdb should issue > ANALYZE command with BUFFER_USAGE_LIMIT option. But it does not. That is, > vacuumdb -Z seems to fail to handle --buffer-usage-limit option. This seems > a bug. > > You can see my patch in the attached file and how it works by adding -e > option in vacuumdb. Good catch, indeed the option is missing from the ANALYZE commands built under analyze_only. I can also notice that we have no tests for this option in src/bin/scripts/t checking the shape of the commands generated. Could you add something for ANALYZE and VACUUM? The option could just be appended in one of the existing cases, for instance. -- Michael signature.asc Description: PGP signature
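To make that concrete, the missing piece would presumably mirror the existing VACUUM handling, only in the ANALYZE branch of the command builder. A rough sketch follows; the variable names follow vacuumdb.c, but this is an illustration and not verified against the final commit:

    /* Sketch: append the option in the analyze_only branch as well. */
    if (vacopts->buffer_usage_limit)
    {
        Assert(serverVersion >= 160000);
        appendPQExpBuffer(sql, "%sBUFFER_USAGE_LIMIT '%s'", sep,
                          vacopts->buffer_usage_limit);
        sep = comma;
    }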
Re: Comment about set_join_pathlist_hook()
On Wed, Sep 20, 2023, at 5:05 PM, Etsuro Fujita wrote: > Hi, > > What I am concerned about from the report [1] is that this comment is > a bit too terse; it might cause a misunderstanding that extensions can > do different things than we intend to allow: > > /* > * 6. Finally, give extensions a chance to manipulate the path list. > */ > if (set_join_pathlist_hook) > set_join_pathlist_hook(root, joinrel, outerrel, innerrel, >jointype, &extra); > > So I would like to propose to extend the comment to explain what they > can do, as in the comment about set_rel_pathlist_hook() in allpaths.c. > Attached is a patch for that. It makes sense. But why do you restrict additions to the pathlist to only the add_path() routine? It can fail to add a path to the pathlist. We need to find out the result of the add_path() operation, which means checking the pathlist each time. So, sometimes, it can be better to add a path manually. One more slip-up could be prevented by the comment: when removing a path from the pathlist, we should remember to update the cheapest_* pointers. Also, it may be good to remind users that jointype and extra->sjinfo->jointype aren't always the same. > [1] > https://www.postgresql.org/message-id/CACawEhV%3D%2BQ0HXrcDergbTR9EkVFukgRPMTZbRFL-YK5CRmvYag%40mail.gmail.com -- Regards, Andrei Lepikhov
Re: Infinite Interval
> On Wed, 20 Sept 2023 at 15:00, Ashutosh Bapat > wrote: > > > > 0005 - Refactored Jian's code fixing window functions. Does not > > contain the changes for serialization and deserialization. Jian, > > please let me know if I have missed anything else. > > attached serialization and deserialization function. > > Also, in do_interval_discard(), this seems a bit risky: > > +neg_val.day = -newval->day; > +neg_val.month = -newval->month; > +neg_val.time = -newval->time; > we already have interval negate function, So I changed to interval_um_internal. based on 20230920 patches. I have made the attached changes. The serialization do make big difference when configure to parallel mode. From bff5e3dfa8607a8b45aa287a7c55fda9d984f339 Mon Sep 17 00:00:00 2001 From: pgaddict Date: Thu, 21 Sep 2023 10:05:21 +0800 Subject: [PATCH v22 7/7] interval aggregate serializatinn and deserialization add interval aggregate serialization and deserialization function. fix a typo. change manually negate a interval to use interval_um_internal. --- src/backend/utils/adt/timestamp.c| 86 +--- src/include/catalog/pg_aggregate.dat | 4 ++ src/include/catalog/pg_proc.dat | 6 ++ 3 files changed, 88 insertions(+), 8 deletions(-) diff --git a/src/backend/utils/adt/timestamp.c b/src/backend/utils/adt/timestamp.c index c6dc2d44..2b86171a 100644 --- a/src/backend/utils/adt/timestamp.c +++ b/src/backend/utils/adt/timestamp.c @@ -75,7 +75,7 @@ typedef struct /* * The transition datatype for interval aggregates is declared as Internal. It's - * a pointer to a NumericAggState allocated in the aggregate context. + * a pointer to a IntervalAggState allocated in the aggregate context. */ typedef struct IntervalAggState { @@ -3984,10 +3984,7 @@ interval_avg_combine(PG_FUNCTION_ARGS) state1->N = state2->N; state1->pInfcount = state2->pInfcount; state1->nInfcount = state2->nInfcount; - - state1->sumX.day = state2->sumX.day; - state1->sumX.month = state2->sumX.month; - state1->sumX.time = state2->sumX.time; + memcpy(&state1->sumX, &state2->sumX, sizeof(Interval)); PG_RETURN_POINTER(state1); } @@ -4008,6 +4005,81 @@ interval_avg_combine(PG_FUNCTION_ARGS) PG_RETURN_POINTER(state1); } +/* + * interval_avg_serialize + * Serialize IntervalAggState for interval aggregates. + */ +Datum +interval_avg_serialize(PG_FUNCTION_ARGS) +{ + IntervalAggState *state; + StringInfoData buf; + bytea *result; + /* Ensure we disallow calling when not in aggregate context */ + if (!AggCheckCallContext(fcinfo, NULL)) + elog(ERROR, "aggregate function called in non-aggregate context"); + state = (IntervalAggState *) PG_GETARG_POINTER(0); + pq_begintypsend(&buf); + /* N */ + pq_sendint64(&buf, state->N); + /* Interval struct elements, one by one. */ + pq_sendint64(&buf, state->sumX.time); + pq_sendint32(&buf, state->sumX.day); + pq_sendint32(&buf, state->sumX.month); + /* pInfcount */ + pq_sendint64(&buf, state->pInfcount); + /* nInfcount */ + pq_sendint64(&buf, state->nInfcount); + result = pq_endtypsend(&buf); + PG_RETURN_BYTEA_P(result); +} + +/* + * interval_avg_deserialize + * Deserialize bytea into IntervalAggState for interval aggregates. + */ +Datum +interval_avg_deserialize(PG_FUNCTION_ARGS) +{ + bytea *sstate; + IntervalAggState *result; + StringInfoData buf; + + if (!AggCheckCallContext(fcinfo, NULL)) + elog(ERROR, "aggregate function called in non-aggregate context"); + + sstate = PG_GETARG_BYTEA_PP(0); + + /* + * Copy the bytea into a StringInfo so that we can "receive" it using the + * standard recv-function infrastructure. 
+ */ + initStringInfo(&buf); + appendBinaryStringInfo(&buf, + VARDATA_ANY(sstate), VARSIZE_ANY_EXHDR(sstate)); + + result = makeIntervalAggState(fcinfo); + + /* N */ + result->N = pq_getmsgint64(&buf); + + /* Interval struct elements, one by one. */ + result->sumX.time = pq_getmsgint64(&buf); + result->sumX.day = pq_getmsgint(&buf, sizeof(result->sumX.day)); + result->sumX.month = pq_getmsgint(&buf, sizeof(result->sumX.month)); + + /* pInfcount */ + result->pInfcount = pq_getmsgint64(&buf); + + /* nInfcount */ + result->nInfcount = pq_getmsgint64(&buf); + + pq_getmsgend(&buf); + pfree(buf.data); + + PG_RETURN_POINTER(result); +} + /* * Remove the given interval value from the aggregated state. */ @@ -4034,9 +4106,7 @@ do_interval_discard(IntervalAggState *state, Interval *newval) Interval temp; Interval neg_val; - neg_val.day = -newval->day; - neg_val.month = -newval->month; - neg_val.time = -newval->time; + interval_um_internal(newval, &neg_val); memcpy(&temp, &state->sumX, sizeof(Interval)); finite_interv
CI: Unfamiliar error while testing macOS
Dear hackers, While developing my patch, I found that the CI for macOS failed with an unknown error [1]. Do you know why it happened? Please tell me if you have any workarounds... It failed at the "Upload 'ccache' cache" step. The Cirrus app reported the following message: > Persistent worker failed to start the task: remote agent failed: failed to > run agent: wait: remote command exited without exit status or exit signal [1]: https://cirrus-ci.com/task/5439712639320064 Best Regards, Hayato Kuroda FUJITSU LIMITED
Re: Requiring recovery.signal or standby.signal when recovering with a backup_label
Thanks for the patch. I rerun the test in https://www.postgresql.org/message-id/flat/ZQtzcH2lvo8leXEr%40paquier.xyz#cc5ed83e0edc0b9a1c1305f08ff7a335 . We can discuss all the problems in this thread. First I encountered the problem " FATAL: could not find recovery.signal or standby.signal when recovering with backup_label ", then I deleted the backup_label file and started the instance successfully. > Delete a backup_label from a fresh base backup can easily lead to data > corruption, as the startup process would pick up as LSN to start > recovery from the control file rather than the backup_label file. > This would happen if a checkpoint updates the redo LSN in the control > file while a backup happens and the control file is copied after the > checkpoint, for instance. If one wishes to deploy a new primary from > a base backup, recovery.signal is the way to go, making sure that the > new primary is bumped into a new timeline once recovery finishes, on > top of making sure that the startup process starts recovery from a > position where the cluster would be able to achieve a consistent > state. ereport(FATAL, (errmsg("could not find redo location referenced by checkpoint record"), errhint("If you are restoring from a backup, touch \"%s/recovery.signal\" and add required recovery options.\n" "If you are not restoring from a backup, try removing the file \"%s/backup_label\".\n" "Be careful: removing \"%s/backup_label\" will result in a corrupt cluster if restoring from a backup.", DataDir, DataDir, DataDir))); There are two similar error messages in xlogrecovery.c. Maybe we can modify the error messages to be similar. -- Bowen Shi On Thu, 21 Sept 2023 at 11:01, Michael Paquier wrote: > > On Wed, Jul 19, 2023 at 11:21:17AM -0700, David Zhang wrote: > > 1) simply start server from a base backup > > > > FATAL: could not find recovery.signal or standby.signal when recovering > > with backup_label > > > > HINT: If you are restoring from a backup, touch > > "/media/david/disk1/pg_backup1/recovery.signal" or > > "/media/david/disk1/pg_backup1/standby.signal" and add required recovery > > options. > > Note the difference when --write-recovery-conf is specified, where a > standby.conf is created with a primary_conninfo in > postgresql.auto.conf. So, yes, that's expected by default with the > patch. > > > 2) touch a recovery.signal file and then try to start the server, the > > following error was encountered: > > > > FATAL: must specify restore_command when standby mode is not enabled > > Yes, that's also something expected in the scope of the v1 posted. > The idea behind this restriction is that specifying recovery.signal is > equivalent to asking for archive recovery, but not specifying > restore_command is equivalent to not provide any options to be able to > recover. See validateRecoveryParameters() and note that this > restriction exists since the beginning of times, introduced in commit > 66ec2db. I tend to agree that there is something to be said about > self-contained backups taken from pg_basebackup, though, as these > would fail if no restore_command is specified, and this restriction is > in place before Postgres has introduced replication and easier ways to > have base backups. 
As a whole, I think that there is a good argument > in favor of removing this restriction for the case where archive > recovery is requested if users have all their WAL in pg_wal/ to be > able to recover up to a consistent point, keeping these GUC > restrictions if requesting a standby (not recovery.signal, only > standby.signal). > > > 3) touch a standby.signal file, then the server successfully started, > > however, it operates in standby mode, whereas the intended behavior was for > > it to function as a primary server. > > standby.signal implies that the server will start in standby mode. If > one wants to deploy a new primary, that would imply a timeline jump at > the end of recovery, you would need to specify recovery.signal > instead. > > We need more discussions and more opinions, but the discussion has > stalled for a few months now. In case, I am adding Thomas Munro in CC > who has mentioned to me at PGcon that he was interested in this patch > (this thread's problem is not directly related to the fact that the > checkpointer now runs in crash recovery, though). > > For now, I am attaching a rebased v2. > -- > Michael
Re: Synchronizing slots from primary to standby
On Tue, Sep 19, 2023 at 10:29 AM shveta malik wrote: > > On Fri, Sep 15, 2023 at 2:22 PM Peter Smith wrote: > > > > Hi. Here are some review comments for v17-0002. > > > > Thanks Peter for the feedback. I have addressed most of these in v18 > except 2. Please find my comments for the ones not addressed. > > > This is a WIP and a long way from complete, but I wanted to send what > > I have so far (while it is still current with your latest posted > > patches). > > > > == > > > > > 34. ListSlotDatabaseOIDs - sorting/logic > > > > Maybe explain better the reason for having the qsort and other logic. > > > > TBH, I was not sure of the necessity for the names lists and the > > sorting and bsearch logic. AFAICT these are all *only* used to check > > for uniqueness and existence of the slot name. So I was wondering if a > > hashmap keyed by the slot name might be more appropriate and also > > simpler than all this list sorting/searching. > > > > Pending. I will revisit this soon and will let you know more on this. > IMO, it was done to optimize the search as slot_names list could be > pretty huge if max_replication_slots is set to high value. > > > ~~ > > > > 35. ListSlotDatabaseOIDs > > > > + for (int slotno = 0; slotno < max_replication_slots; slotno++) > > + { > > > > loop variable declaration > > > > > > == > > src/backend/storage/lmgr/lwlock.c > > OK > > > > == > > src/backend/storage/lmgr/lwlocknames.txt > > OK > > > > == > > .../utils/activity/wait_event_names.txt > > TODO > > > > == > > src/backend/utils/misc/guc_tables.c > > OK > > > > == > > src/backend/utils/misc/postgresql.conf.sample > > > > 36. > > # primary to streaming replication standby server > > +#max_slotsync_workers = 2 # max number of slot synchronization > > workers on a standby > > > > IMO it is better to say "maximum" instead of "max" in the comment. > > > > (make sure the GUC description text is identical) > > > > == > > src/include/catalog/pg_proc.dat > > > > 37. > > +{ oid => '6312', descr => 'get invalidate cause of a replication slot', > > + proname => 'pg_get_invalidation_cause', provolatile => 's', > > proisstrict => 't', > > + prorettype => 'int2', proargtypes => 'name', > > + prosrc => 'pg_get_invalidation_cause' }, > > > > 37a. > > SUGGESTION (descr) > > what caused the replication slot to become invalid > > > > ~ > > > > 37b > > 'pg_get_invalidation_cause' seemed like a poor function name because > > it doesn't have any context -- not even the word "slot" in it. > > > > == > > src/include/commands/subscriptioncmds.h > > OK > > > > == > > src/include/nodes/replnodes.h > > OK > > > > == > > src/include/postmaster/bgworker_internals.h > > > > 38. > > #define MAX_PARALLEL_WORKER_LIMIT 1024 > > +#define MAX_SLOT_SYNC_WORKER_LIMIT 50 > > > > Consider SLOTSYNC instead of SLOT_SYNC for consistency with other > > names of this worker. > > > > == > > OK > > > > == > > src/include/replication/logicalworker.h > > > > 39. > > extern void ApplyWorkerMain(Datum main_arg); > > extern void ParallelApplyWorkerMain(Datum main_arg); > > extern void TablesyncWorkerMain(Datum main_arg); > > +extern void ReplSlotSyncMain(Datum main_arg); > > > > The name is not consistent with others nearby. At least it should > > include the word "Worker" like everything else does. > > > > == > > src/include/replication/slot.h > > OK > > > > == > > src/include/replication/walreceiver.h > > > > 40. 
> > +/* > > + * Slot's DBid related data > > + */ > > +typedef struct WalRcvRepSlotDbData > > +{ > > + Oid database; /* Slot's DBid received from remote */ > > + TimestampTz last_sync_time; /* The last time we tried to launch sync > > + * worker for above Dbid */ > > +} WalRcvRepSlotDbData; > > + > > > > > > Is that comment about field 'last_sync_time' correct? I thought this > > field is the last time the slot was synced -- not the last time the > > worker was launched. > > Sorry for confusion. Comment is correct, the name is misleading. I > have changed the name in v18. > > > > > == > > src/include/replication/worker_internal.h > > > > 41. > > - /* User to use for connection (will be same as owner of subscription). */ > > + /* User to use for connection (will be same as owner of subscription > > + * in case of LogicalRep worker). */ > > Oid userid; > > +} WorkerHeader; > > > > 41a. > > > > This is not the normal style for a multi-line comment. > > > > ~ > > > > 41b. > > I wondered if the name "WorkerHeader" is just a bit *too* generic and > > might cause future trouble because of the vague name. > > > > I agree. Can you please suggest a better name for it? > > > thanks > Shveta Currently in patch001, synchronize_slot_names is a GUC on both primary and physical standby. This GUC tells which all logical slots need to be synced on physical standbys from the primary. Ideally it should be a GUC on physical standby alone and each physical standby should be able to communicate the value to the primary
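As a concrete illustration of the hashmap idea from comment 34 above, something along these lines could replace the qsort/bsearch logic with simple keyed lookups. This is an untested sketch: the struct and function names are invented for illustration, and it assumes a plain list of C-string slot names:

    typedef struct SlotNameEntry
    {
        char        slotname[NAMEDATALEN];  /* hash key */
    } SlotNameEntry;

    static HTAB *
    build_slotname_hash(List *slot_names)
    {
        HASHCTL     ctl = {0};
        HTAB       *htab;
        ListCell   *lc;

        ctl.keysize = NAMEDATALEN;
        ctl.entrysize = sizeof(SlotNameEntry);
        htab = hash_create("slot sync slot names", 64, &ctl,
                           HASH_ELEM | HASH_STRINGS);

        foreach(lc, slot_names)
            (void) hash_search(htab, (char *) lfirst(lc), HASH_ENTER, NULL);

        return htab;
    }

Uniqueness and existence checks then become single hash_search() calls with HASH_FIND, with no sorting required.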
Re: Bug fix in vacuumdb --buffer-usage-limit xxx -Z
On Thu, 21 Sept 2023 at 13:45, Ryoga Yoshida wrote: > When --buffer-usage-limit option is specified, vacuumdb issues VACUUM or > VACUUM ANALYZE command with BUFFER_USAGE_LIMIT option. Also if > --buffer-usage-limit and -Z options are specified, vacuumdb should issue > ANALYZE command with BUFFER_USAGE_LIMIT option. But it does not. That > is, vacuumdb -Z seems to fail to handle --buffer-usage-limit option. > This seems a bug. > > You can see my patch in the attached file and how it works by adding -e > option in vacuumdb. Thanks for the report and the patch. I agree this has been overlooked. I also wonder if we should be escaping the buffer-usage-limit string sent in the command. It seems quite an unlikely attack vector, as the user would have command line access and could likely just use psql anyway, but I had thought about something along the lines of: $ vacuumdb --buffer-usage-limit "1MB'); drop database postgres;--" postgres vacuumdb: vacuuming database "postgres" vacuumdb: error: processing of database "postgres" failed: ERROR: VACUUM cannot run inside a transaction block It seems that won't work, due to sending multiple commands at once, but I think we should fix it anyway. David
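For the escaping itself, the natural route would be to quote the user-supplied value with the existing fe_utils helper instead of interpolating it verbatim, roughly like this (a sketch only, not the exact committed change):

    /* Sketch: quote the user-supplied limit rather than pasting it raw. */
    appendPQExpBuffer(sql, "%sBUFFER_USAGE_LIMIT ", sep);
    appendStringLiteralConn(sql, vacopts->buffer_usage_limit, conn);
    sep = comma;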
Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning
On Thu, Sep 21, 2023 at 6:37 AM Amit Langote wrote: > > On Wed, Sep 20, 2023 at 10:24 PM Ashutosh Bapat > wrote: > > On Wed, Sep 20, 2023 at 5:24 PM Amit Langote > > wrote: > > > Just one comment on 0003: > > > > > > + /* > > > +* Dummy SpecialJoinInfos do not have any translated fields and hence > > > have > > > +* nothing to free. > > > +*/ > > > + if (child_sjinfo->jointype == JOIN_INNER) > > > + return; > > > > > > Should this instead be Assert(child_sjinfo->jointype != JOIN_INNER)? > > > > try_partitionwise_join() calls free_child_sjinfo_members() > > unconditionally i.e. it will be called even SpecialJoinInfos of INNER > > join. Above condition filters those out early. In fact if we convert > > into an Assert, in a production build the rest of the code will free > > Relids which are referenced somewhere else causing dangling pointers. > > You're right. I hadn't realized that the parent SpecialJoinInfo > passed to try_partitionwise_join() might itself be a dummy. I guess I > should've read the whole thing before commenting. Maybe there's something to improve in terms of readability, I don't know. -- Best Wishes, Ashutosh Bapat
Re: Guiding principle for dropping LLVM versions?
Thomas Munro writes: > I wonder if there is a good way to make this sort of thing more > systematic. If we could agree on a guiding principle vaguely like the > above, then perhaps we just need a wiki page that lists relevant > distributions, versions and EOL dates, that could be used to reduce > the combinations of stuff we have to consider and make the pruning > decisions into no-brainers. FWIW, I think "compile older Postgres on newer infrastructure" is a more common and more defensible scenario than "compile newer Postgres on older infrastructure". We've spent a ton of effort on the latter scenario (and I've helped lead the charge in many cases), but I think the real-world demand for it isn't truly that high once you get beyond a year or two back. On the other hand, if you have an app that depends on PG 11 behavioral details and you don't want to update it right now, you might nonetheless need to put that server onto recent infrastructure for practical reasons. Thus, I think it's worthwhile to spend effort on back-patching new-LLVM compatibility fixes into old PG branches, but I agree that newer PG branches can drop compatibility with obsolete LLVM versions. LLVM is maybe not the poster child for these concerns -- for either direction of compatibility problems, someone could build without JIT support and not really be dead in the water. In any case, I agree with your prior decision to not touch v11 for this. With that branch's next release being its last, I think the odds of introducing a bug we can't fix later outweigh any arguable portability gain. regards, tom lane
Re: Bug fix in vacuumdb --buffer-usage-limit xxx -Z
On Thu, 21 Sept 2023 at 16:18, David Rowley wrote: > Thanks for the report and the patch. I agree this has been overlooked. > > I also wonder if we should be escaping the buffer-usage-limit string > sent in the comment. It seems quite an unlikely attack vector, as the > user would have command line access and could likely just use psql > anyway, but I had thought about something along the lines of: > > $ vacuumdb --buffer-usage-limit "1MB'); drop database postgres;--" postgres > vacuumdb: vacuuming database "postgres" > vacuumdb: error: processing of database "postgres" failed: ERROR: > VACUUM cannot run inside a transaction block > > seems that won't work, due to sending multiple commands at once, but I > think we should fix it anyway. I've pushed your patch plus some additional code to escape the text specified in --buffer-usage-limit before passing it to the server in commit 5cfba1ad6 Thanks again for the report. David
Re: Comment about set_join_pathlist_hook()
Hi, On Thu, Sep 21, 2023 at 11:49 AM Lepikhov Andrei wrote: > On Wed, Sep 20, 2023, at 5:05 PM, Etsuro Fujita wrote: > > What I am concerned about from the report [1] is that this comment is > > a bit too terse; it might cause a misunderstanding that extensions can > > do different things than we intend to allow: > > > > /* > > * 6. Finally, give extensions a chance to manipulate the path list. > > */ > > if (set_join_pathlist_hook) > > set_join_pathlist_hook(root, joinrel, outerrel, innerrel, > >jointype, &extra); > > > > So I would like to propose to extend the comment to explain what they > > can do, as in the comment about set_rel_pathlist_hook() in allpaths.c. > > Attached is a patch for that. > > It makes sense. But why do you restrict addition to pathlist by only the > add_path() routine? It can fail to add a path to the pathlist. We need to > find out the result of the add_path operation and need to check the pathlist > each time. So, sometimes, it can be better to add a path manually. I do not agree with you on this point; I think you can do so at your own responsibility, but I think it is better for extensions to use add_path(), because that makes them stable. (Assuming that add_path() has a bug and we change the logic of it to fix the bug, extensions that do not follow the standard procedure might not work anymore.) > One more slip-up could be prevented by the comment: removing a path from the > pathlist we should remember about the cheapest_* pointers. Do we really need to do this? I mean we do set_cheapest() afterward. See standard_join_search(). > Also, it may be good to remind a user, that jointype and > extra->sjinfo->jointype aren't the same all the time. That might be an improvement, but IMO that is not the point here, because the purpose to expand the comment is to avoid extensions doing different things than we intend to allow. Thanks for looking! Best regards, Etsuro Fujita
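To spell out the standard procedure being referred to, a typical extension would do no more than the following; this is a sketch, and the construction of the custom join path itself is elided:

    static set_join_pathlist_hook_type prev_join_pathlist_hook = NULL;

    static void
    my_join_pathlist_hook(PlannerInfo *root, RelOptInfo *joinrel,
                          RelOptInfo *outerrel, RelOptInfo *innerrel,
                          JoinType jointype, JoinPathExtraData *extra)
    {
        if (prev_join_pathlist_hook)
            prev_join_pathlist_hook(root, joinrel, outerrel, innerrel,
                                    jointype, extra);

        /*
         * Build a custom join path here (elided) and hand it to add_path(),
         * which decides whether to keep it; the path must not be referenced
         * again afterwards, since add_path() may free it.
         */
        /* add_path(joinrel, (Path *) my_custom_join_path); */
    }

    void
    _PG_init(void)
    {
        prev_join_pathlist_hook = set_join_pathlist_hook;
        set_join_pathlist_hook = my_join_pathlist_hook;
    }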
Re: pg_upgrade and logical replication
On Fri, Sep 15, 2023 at 03:08:21PM +0530, vignesh C wrote: > On Tue, 12 Sept 2023 at 14:25, Hayato Kuroda (Fujitsu) > wrote: >> Is there a possibility that apply worker on old cluster connects to the >> publisher during the upgrade? Regarding the pg_upgrade on publisher, the we >> refuse TCP/IP connections from remotes and port number is also changed, so >> we can >> assume that subscriber does not connect to. But IIUC such settings may not >> affect >> to the connection source, so that the apply worker may try to connect to the >> publisher. Also, is there any hazards if it happens? > > Yes, there is a possibility that the apply worker gets started and new > transaction data is being synced from the publisher. I have made a fix > not to start the launcher process in binary ugprade mode as we don't > want the launcher to start apply worker during upgrade. Hmm. I was wondering if 0001 is the right way to handle this case, but at the end I'm OK to paint one extra isBinaryUpgrade in the code path where apply launchers are registered. I don't think that the patch is complete, though. A comment should be added in pg_upgrade's server.c, exactly start_postmaster(), to tell that -b also stops apply workers. I am attaching a version updated as of the attached, that I'd be OK to apply. I don't really think that we need to worry about a subscriber connecting back to a publisher in this case, though? I mean, each postmaster instance started by pg_upgrade restricts the access to the instance with unix_socket_directories set to a custom path and permissions at 0700, and a subscription's connection string does not know the unix path used by pg_upgrade. I certainly agree that stopping these processes could lead to inconsistencies in the data the subscribers have been holding though, if we are not careful, so preventing them from running is a good practice anyway. I have also reviewed 0002. As a whole, I think that I'm OK with the main approach of the patch in pg_dump to use a new type of dumpable object for subscription relations that are dumped with their upgrade functions after. This still needs more work, and more documentation. Also, perhaps we should really have an option to control if this part of the copy happens or not. With a --no-subscription-relations for pg_dump at least? +{ oid => '4551', descr => 'add a relation with the specified relation state to pg_subscription_rel table', During a development cycle, any new function added needs to use an OID in range 8000-. Running unused_oids will suggest new random OIDs. FWIW, I am not convinced that there is a need for two functions to add an entry to pg_subscription_rel, with sole difference between both the handling of a valid or invalid LSN. We should have only one function that's able to handle NULL for the LSN. So let's remove rel_state_a and rel_state_b, and have a single rel_state(). The description of the SQL functions is inconsistent with the other binary upgrade ones, I would suggest for the two functions: "for use by pg_upgrade (relation for pg_subscription_rel)" "for use by pg_upgrade (remote_lsn for origin)" + i_srsublsn = PQfnumber(res, "srsublsn"); [...] + subrinfo[cur_rel].srsublsn = pg_strdup(PQgetvalue(res, i, i_srsublsn)); In getSubscriptionTables(), this should check for PQgetisnull() because we would have a NULL value for InvalidXLogRecPtr in the catalog. Using a char* for srsublsn is OK, but just assign NULL to it, then just pass a hardcoded NULL value to the function as we do in other places. 
So I don't quite get why this is not the same handling as suboriginremotelsn. getSubscriptionTables() is entirely skipped if we don't want any subscriptions, if we deal with a server of 9.6 or older or if we don't do binary upgrades, which is OK. +/* + * getSubscriptionTables + * get information about subscription membership for dumpable tables. + */ This comment is slightly misleading and should mention that this is an upgrade-only path? The code for dumpSubscriptionTable() is a copy-paste of dumpPublicationTable(), but a lot of what you are doing here is actually pointless if we are not in binary mode? Why should this code path not be taken only under dataOnly? I mean, this is a code path we should never take except if we are in binary mode. This should have at least a cross-check to make sure that we never have a DO_SUBSCRIPTION_REL in this code path if we are in non-binary mode. +if (dopt->binary_upgrade && subinfo->suboriginremotelsn) +{ +appendPQExpBufferStr(query, + "SELECT pg_catalog.binary_upgrade_replorigin_advance("); +appendStringLiteralAH(query, subinfo->dobj.name, fout); +appendPQExpBuffer(query, ", '%s');\n", subinfo->suboriginremotelsn); +} Hmm.. Could it be actually useful even for debugging to still have this query if suboriginremotelsn is an InvalidXLogRecPtr? I think that this should have a comment of the kind "\n-- For b
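In code form, the NULL handling suggested above would look roughly like this (illustrative only; the field names follow the patch rather than any final version):

    /* Sketch: keep a NULL srsublsn as NULL instead of copying the empty value. */
    if (PQgetisnull(res, i, i_srsublsn))
        subrinfo[cur_rel].srsublsn = NULL;
    else
        subrinfo[cur_rel].srsublsn = pg_strdup(PQgetvalue(res, i, i_srsublsn));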
Re: Bug fix in vacuumdb --buffer-usage-limit xxx -Z
On Thu, Sep 21, 2023 at 05:50:26PM +1200, David Rowley wrote: > I've pushed your patch plus some additional code to escape the text > specified in --buffer-usage-limit before passing it to the server in > commit 5cfba1ad6 That was fast. If I may ask, why don't you have some regression tests for the two code paths of vacuumdb that append this option to the commands generated for VACUUM and ANALYZE? -- Michael signature.asc Description: PGP signature
Re: pg_upgrade and logical replication
On Wed, Sep 20, 2023 at 04:54:36PM +0530, Amit Kapila wrote: > Is the check to ensure remote_lsn is valid correct in function > check_for_subscription_state()? How about the case where the apply > worker didn't receive any change but just marked the relation as > 'ready'? I may be missing, of course, but a relation is switched to SUBREL_STATE_READY only once a sync happened and its state was SUBREL_STATE_SYNCDONE, implying that SubscriptionRelState->lsn is never InvalidXLogRecPtr, no? For instance, nothing happens when a Assert(!XLogRecPtrIsInvalid(rstate->lsn)) is added in process_syncing_tables_for_apply(). -- Michael signature.asc Description: PGP signature
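Said differently, the invariant being relied on is roughly the following (the exact placement is illustrative):

    /* A READY relation should always carry a valid remote LSN. */
    if (rstate->state == SUBREL_STATE_READY)
        Assert(!XLogRecPtrIsInvalid(rstate->lsn));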
Re: Comment about set_join_pathlist_hook()
On Thu, Sep 21, 2023, at 12:53 PM, Etsuro Fujita wrote: > Hi, > > On Thu, Sep 21, 2023 at 11:49 AM Lepikhov Andrei > wrote: >> On Wed, Sep 20, 2023, at 5:05 PM, Etsuro Fujita wrote: >> > What I am concerned about from the report [1] is that this comment is >> > a bit too terse; it might cause a misunderstanding that extensions can >> > do different things than we intend to allow: >> > >> > /* >> > * 6. Finally, give extensions a chance to manipulate the path list. >> > */ >> > if (set_join_pathlist_hook) >> > set_join_pathlist_hook(root, joinrel, outerrel, innerrel, >> >jointype, &extra); >> > >> > So I would like to propose to extend the comment to explain what they >> > can do, as in the comment about set_rel_pathlist_hook() in allpaths.c. >> > Attached is a patch for that. >> >> It makes sense. But why do you restrict addition to pathlist by only the >> add_path() routine? It can fail to add a path to the pathlist. We need to >> find out the result of the add_path operation and need to check the pathlist >> each time. So, sometimes, it can be better to add a path manually. > > I do not agree with you on this point; I think you can do so at your > own responsibility, but I think it is better for extensions to use > add_path(), because that makes them stable. (Assuming that add_path() > has a bug and we change the logic of it to fix the bug, extensions > that do not follow the standard procedure might not work anymore.) Ok, I got it. This question relates to the add_path() interface itself, not to the comment. It is awkward to have to check the pathlist every time for the result of add_path(). >> One more slip-up could be prevented by the comment: removing a path from the >> pathlist we should remember about the cheapest_* pointers. > > Do we really need to do this? I mean we do set_cheapest() afterward. > See standard_join_search(). Agreed; in the case of the current join it doesn't make sense. I got stuck in this situation because, when providing an additional path at the current level, I need to arrange paths for the inner and outer relations too. Thanks for the explanation! -- Regards, Andrei Lepikhov
Re: Unlogged relation copy is not fsync'd
On Fri, Sep 15, 2023 at 02:47:45PM +0300, Heikki Linnakangas wrote: > On 05/09/2023 21:20, Robert Haas wrote: > Thinking about this some more, I think this is still not 100% correct, even > with the patch I posted earlier: > > > /* > > * When we WAL-logged rel pages, we must nonetheless fsync them. The > > * reason is that since we're copying outside shared buffers, a > > CHECKPOINT > > * occurring during the copy has no way to flush the previously written > > * data to disk (indeed it won't know the new rel even exists). A crash > > * later on would replay WAL from the checkpoint, therefore it wouldn't > > * replay our earlier WAL entries. If we do not fsync those pages here, > > * they might still not be on disk when the crash occurs. > > */ > > if (use_wal || relpersistence == RELPERSISTENCE_UNLOGGED) > > smgrimmedsync(dst, forkNum); > > Let's walk through each case one by one: > > 1. Temporary table. No fsync() needed. This is correct. > > 2. Unlogged rel, main fork. Needs to be fsync'd, because we skipped WAL, and > also because we bypassed the buffer manager. Correct. Agreed. > 3. Unlogged rel, init fork. Needs to be fsync'd, even though we WAL-logged > it, because we bypassed the buffer manager. Like the comment explains. This > is now correct with the patch. This has two subtypes: 3a. Unlogged rel, init fork, use_wal (wal_level!=minimal). Matches what you wrote. 3b. Unlogged rel, init fork, !use_wal (wal_level==minimal). Needs to be fsync'd because we didn't WAL-log it and RelationCreateStorage() won't call AddPendingSync(). (RelationCreateStorage() could start calling AddPendingSync() for this case. I think we chose not to do that because there will never be additional writes to the init fork, and smgrDoPendingSyncs() would send the fork to FlushRelationsAllBuffers() even though the fork will never appear in shared buffers. On the other hand, grouping the sync with the other end-of-xact syncs could improve efficiency for some filesystems. Also, the number of distinguishable cases is unpleasantly high.) > 4. Permanent rel, use_wal == true. Needs to be fsync'd, even though we > WAL-logged it, because we bypassed the buffer manager. Like the comment > explains. Correct. > > 5. Permanent rel, use_wal == false. We skip fsync, because it means that > it's a new relation, so we have a sync request pending for it. (An assertion > for that would be nice). At the end of transaction, in smgrDoPendingSyncs(), > we will either fsync it or we will WAL-log all the pages if it was a small > relation. I believe this is *wrong*. If smgrDoPendingSyncs() decides to > WAL-log the pages, we have the same race condition that's explained in the > comment, because we bypassed the buffer manager: > > 1. RelationCopyStorage() copies some of the pages. > 2. Checkpoint happens, which fsyncs the relation (smgrcreate() registered a > dirty segment when the relation was created) > 3. RelationCopyStorage() copies the rest of the pages. > 4. smgrDoPendingSyncs() WAL-logs all the pages. smgrDoPendingSyncs() delegates to log_newpage_range(). log_newpage_range() loads each page into the buffer manager and calls MarkBufferDirty() on each. Hence, ... > 5. Another checkpoint happens. It does *not* fsync the relation. ... I think this will fsync the relation. No? > 6. Crash.
Re: pg_upgrade --check fails to warn about abstime
Thanks, Alvaro, for working on this. The patch looks good to me. + * similar to the above, but for types that were removed in 12. Comment can start with a capital letter. Also, We need to backport the same, right? On Wed, Sep 20, 2023 at 10:24 PM Alvaro Herrera wrote: > I got a complaint that pg_upgrade --check fails to raise red flags when > the source database contains type abstime when upgrading from pg11. The > type (along with reltime and tinterval) was removed by pg12. > > > In passing, while testing this, I noticed that the translation > infrastructure in pg_upgrade/util.c is broken: we do have the messages > in the translation catalog, but the translations for the messages from > prep_status are never displayed. So I made the quick hack of adding _() > around the fmt, and this was the result: > > Verificando Consistencia en Vivo en el Servidor Antiguo > --- > Verificando las versiones de los clústerséxito > Verificando que el usuario de base de datos es el usuario de > instalaciónéxito > Verificando los parámetros de conexión de bases de datoséxito > Verificando transacciones preparadas éxito > Verificando tipos compuestos definidos por el sistema en tablas de > usuarioéxito > Verificando tipos de datos reg* en datos de usuario éxito > Verificando contrib/isn con discordancia en mecanismo de paso de > bigintéxito > Checking for incompatible "aclitem" data type in user tables éxito > Checking for removed "abstime" data type in user tables éxito > Checking for removed "reltime" data type in user tables éxito > Checking for removed "tinterval" data type in user tables éxito > Verificando conversiones de codificación definidas por el usuarioéxito > Verificando operadores postfix definidos por el usuario éxito > Verificando funciones polimórficas incompatibles éxito > Verificando tablas WITH OIDS éxito > Verificando columnas de usuario del tipo «sql_identifier» éxito > Verificando la presencia de las bibliotecas requeridaséxito > Verificando que el usuario de base de datos es el usuario de > instalaciónéxito > Verificando transacciones preparadas éxito > Verificando los directorios de tablespaces para el nuevo clústeréxito > > Note how nicely they line up ... not. There is some code that claims to > do this correctly, but apparently it counts bytes, not characters, and > also it appears to be measuring the original rather than the > translation. > > I think we're trimming the strings in the wrong places. We need to > apply _() to the originals, not the trimmed ones. Anyway, clearly > nobody has looked at this very much. > > -- > Álvaro Herrera PostgreSQL Developer — > https://www.EnterpriseDB.com/ > "We’ve narrowed the problem down to the customer’s pants being in a > situation > of vigorous combustion" (Robert Haas, Postgres expert extraordinaire) > -- -- Thanks & Regards, Suraj kharage, edbpostgres.com
Re: Bug fix in vacuumdb --buffer-usage-limit xxx -Z
On Thu, 21 Sept 2023 at 17:59, Michael Paquier wrote: > That was fast. If I may ask, why don't you have some regression tests > for the two code paths of vacuumdb that append this option to the > commands generated for VACUUM and ANALYZE? I think we have differing standards for what constitutes a useful test. For me, there would have to be a much higher likelihood of this ever getting broken again. I deem it pretty unlikely that someone will accidentally remove the code that I just committed, and a test checking that vacuumdb -Z --buffer-usage-limit ... passes the BUFFER_USAGE_LIMIT option would likely just forever mark that we once had a trivial bug where --buffer-usage-limit was not included when -Z was specified. If others feel strongly that a test is worthwhile, then I'll reconsider. David