Re: LSM tree for Postgres
On 09.08.2020 04:53, Alexander Korotkov wrote:
>> I realize that it is not a true LSM. But I still want to note that it is able
>> to provide a ~10x increase in insert speed when the size of the index is
>> comparable to RAM size. And the "true LSM" from RocksDB shows similar results.
> It's very far from being shown. All the things you've shown is a naive
> benchmark. I don't object that your design can work out some cases. And it's
> great that we have the lsm3 extension now. But I think for PostgreSQL core we
> should think about better design.

Sorry, I meant that on this particular benchmark and hardware, Lsm3 and RocksDB show similar performance. That definitely doesn't mean it will be true in all other cases. This is one of the reasons why I have published the Lsm3 and RocksDB FDW extensions: anybody can try to test them on their own workload. I would be very interested in those results, because I certainly understand that measuring random insert performance into a dummy table is not enough to draw conclusions.

And I certainly do not want to say that we do not need a "right" LSM implementation inside the Postgres core. It just requires an order of magnitude more effort, and there are many questions and challenges. For example, the Postgres buffer size (8kB) seems to be too small for an LSM. Should an LSM implementation bypass the Postgres buffer cache? There are pros and cons...

Another issue is logging. Should we just log all operations on the LSM in WAL in the usual way (as is done for nbtree and Lsm3)? It seems to me that for an LSM, alternative and more efficient solutions may be possible. For example, we might not log inserts into the top index at all and just replay them during recovery, assuming that this operation on a small index is fast enough. And the merge of the top index with the base index can be done atomically, so it also doesn't require WAL.

As far as I know, Anastasia Lubennikova implemented an LSM for Postgres several years ago. There were some performance issues (with concurrent access?). This is why the first thing I want to clarify for myself is what the bottlenecks of the LSM architecture are, and whether they are caused by the LSM itself or by its integration into the Postgres infrastructure.

In any case, before thinking about the details of an in-core LSM implementation for Postgres, I think it is necessary to demonstrate workloads at which RocksDB (or any other existing DBMS with an LSM) shows significant performance advantages compared with Postgres with nbtree/Lsm3.

>> Maybe if the size of the index is 100 times larger than the size of RAM,
>> RocksDB will be significantly faster than Lsm3. But modern servers have
>> 0.5-1Tb of RAM. I can't believe that there are databases with 100Tb indexes.
> Comparison of whole RAM size to single index size looks plain wrong for me.
> I think we can roughly compare whole RAM size to whole database size. But
> also not the whole RAM size is always available for caching data. Let's
> assume half of RAM is used for caching data. So, a modern server with
> 0.5-1Tb of RAM, which suffers from random B-tree insertions and badly needs
> LSM-like data-structure, runs a database of 25-50Tb. Frankly speaking,
> there is nothing counterintuitive for me.

There is indeed nothing counterintuitive here. I just mean that there are not that many 25-50Tb OLTP databases.
Re: Unnecessary delay in streaming replication due to replay lag
On Sun, Aug 09, 2020 at 05:54:32AM +, Asim Praveen wrote:
> I would like to revive this thread by submitting a rebased patch to
> start streaming replication without waiting for startup process to
> finish replaying all WAL. The start LSN for streaming is determined
> to be the LSN that points to the beginning of the most recently
> flushed WAL segment.
>
> The patch passes tests under src/test/recovery and top level “make check”.

I have not really looked at the proposed patch, but it would be good to have some documentation.
--
Michael
Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly
On Sun, Apr 26, 2020 at 12:56:14PM -0500, Justin Pryzby wrote: > Rebased onto d12bdba77b0fce9df818bc84ad8b1d8e7a96614b > > Restored two tests from Alexey's original patch which exposed issue with > REINDEX DATABASE when allow_system_table_mods=off. I have been looking at 0001 as a start, and your patch is incorrect on a couple of aspects for the completion of REINDEX: - "(" is not proposed as a completion option after the initial REINDEX, and I think that it should. - completion gets incorrect for all the commands once a parenthesized list of options is present, as CONCURRENTLY goes missing. The presence of CONCURRENTLY makes the completion a bit more complex than the other commands, as we need to add this keyword if still not specified with the other objects of the wanted type to reindex, but it can be done as the attached. What do you think? -- Michael diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c index c4af40bfa9..26a9b2a722 100644 --- a/src/bin/psql/tab-complete.c +++ b/src/bin/psql/tab-complete.c @@ -3431,27 +3431,48 @@ psql_completion(const char *text, int start, int end) /* REINDEX */ else if (Matches("REINDEX")) + COMPLETE_WITH("TABLE", "INDEX", "SYSTEM", "SCHEMA", "DATABASE", "("); + else if (Matches("REINDEX", "(*)")) COMPLETE_WITH("TABLE", "INDEX", "SYSTEM", "SCHEMA", "DATABASE"); - else if (Matches("REINDEX", "TABLE")) + else if (Matches("REINDEX", "TABLE") || + Matches("REINDEX", "(*)", "TABLE")) COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_indexables, " UNION SELECT 'CONCURRENTLY'"); - else if (Matches("REINDEX", "INDEX")) + else if (Matches("REINDEX", "INDEX") || + Matches("REINDEX", "(*)", "INDEX")) COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_indexes, " UNION SELECT 'CONCURRENTLY'"); - else if (Matches("REINDEX", "SCHEMA")) + else if (Matches("REINDEX", "SCHEMA") || + Matches("REINDEX", "(*)", "SCHEMA")) COMPLETE_WITH_QUERY(Query_for_list_of_schemas " UNION SELECT 'CONCURRENTLY'"); - else if (Matches("REINDEX", "SYSTEM|DATABASE")) + else if (Matches("REINDEX", "SYSTEM|DATABASE") || + Matches("REINDEX", "(*)", "DATABASE")) COMPLETE_WITH_QUERY(Query_for_list_of_databases " UNION SELECT 'CONCURRENTLY'"); - else if (Matches("REINDEX", "TABLE", "CONCURRENTLY")) + else if (Matches("REINDEX", "TABLE", "CONCURRENTLY") || + Matches("REINDEX", "(*)", "TABLE", "CONCURRENTLY")) COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_indexables, NULL); - else if (Matches("REINDEX", "INDEX", "CONCURRENTLY")) + else if (Matches("REINDEX", "INDEX", "CONCURRENTLY") || + Matches("REINDEX", "(*)", "INDEX", "CONCURRENTLY")) COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_indexes, NULL); - else if (Matches("REINDEX", "SCHEMA", "CONCURRENTLY")) + else if (Matches("REINDEX", "SCHEMA", "CONCURRENTLY") || + Matches("REINDEX", "(*)", "SCHEMA", "CONCURRENTLY")) COMPLETE_WITH_QUERY(Query_for_list_of_schemas); - else if (Matches("REINDEX", "SYSTEM|DATABASE", "CONCURRENTLY")) + else if (Matches("REINDEX", "SYSTEM|DATABASE", "CONCURRENTLY") || + Matches("REINDEX", "(*)", "SYSTEM|DATABASE", "CONCURRENTLY")) COMPLETE_WITH_QUERY(Query_for_list_of_databases); + else if (HeadMatches("REINDEX", "(*") && + !HeadMatches("REINDEX", "(*)")) + { + /* + * This fires if we're in an unfinished parenthesized option list. + * get_previous_words treats a completed parenthesized option list as + * one word, so the above test is correct. 
+ */ + if (ends_with(prev_wd, '(') || ends_with(prev_wd, ',')) + COMPLETE_WITH("VERBOSE"); + } /* SECURITY LABEL */ else if (Matches("SECURITY")) signature.asc Description: PGP signature
Optimising latch signals
Hi hackers, Here are some more experimental patches to reduce system calls. 0001 skips sending signals when the recipient definitely isn't waiting, using a new flag-and-memory-barrier dance. This seems to skip around 12% of the kill() calls for "make check", and probably helps with some replication configurations that do a lot of signalling. Patch by Andres (with a small memory barrier adjustment by me). 0002 gets rid of the latch self-pipe on Linux systems. 0003 does the same on *BSD/macOS systems. The idea for 0002 and 0003 is to use a new dedicated signal just for latch wakeups, and keep it blocked (Linux) or ignored (BSD), except while waiting. There may be other ways to achieve this without bringing in a new signal, but it seemed important to leave SIGUSR1 unblocked for procsignals, and hard to figure out how to multiplex with existing SIGUSR2 users, so for the first attempt at prototyping this I arbitrarily chose SIGURG. From a45e9edaa315e62a0f823dbd55b42950fc8d5df3 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Sat, 8 Aug 2020 15:08:09 +1200 Subject: [PATCH 1/3] Optimize latches to send fewer signals. Author: Andres Freund --- src/backend/storage/ipc/latch.c | 26 ++ src/include/storage/latch.h | 1 + 2 files changed, 27 insertions(+) diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c index 4153cc8557..37528f29b5 100644 --- a/src/backend/storage/ipc/latch.c +++ b/src/backend/storage/ipc/latch.c @@ -274,6 +274,7 @@ void InitLatch(Latch *latch) { latch->is_set = false; + latch->maybe_sleeping = false; latch->owner_pid = MyProcPid; latch->is_shared = false; @@ -321,6 +322,7 @@ InitSharedLatch(Latch *latch) #endif latch->is_set = false; + latch->maybe_sleeping = false; latch->owner_pid = 0; latch->is_shared = true; } @@ -523,6 +525,10 @@ SetLatch(Latch *latch) latch->is_set = true; + pg_memory_barrier(); + if (!latch->maybe_sleeping) + return; + #ifndef WIN32 /* @@ -589,6 +595,7 @@ ResetLatch(Latch *latch) { /* Only the owner should reset the latch */ Assert(latch->owner_pid == MyProcPid); + Assert(latch->maybe_sleeping == false); latch->is_set = false; @@ -1273,6 +1280,14 @@ WaitEventSetWait(WaitEventSet *set, long timeout, * ordering, so that we cannot miss seeing is_set if a notification * has already been queued. */ + if (set->latch && !set->latch->is_set) + { + /* about to sleep on a latch */ + set->latch->maybe_sleeping = true; + pg_memory_barrier(); + /* and recheck */ + } + if (set->latch && set->latch->is_set) { occurred_events->fd = PGINVALID_SOCKET; @@ -1283,6 +1298,9 @@ WaitEventSetWait(WaitEventSet *set, long timeout, occurred_events++; returned_events++; + /* could have been set above */ + set->latch->maybe_sleeping = false; + break; } @@ -1294,6 +1312,12 @@ WaitEventSetWait(WaitEventSet *set, long timeout, rc = WaitEventSetWaitBlock(set, cur_timeout, occurred_events, nevents); + if (set->latch) + { + Assert(set->latch->maybe_sleeping); + set->latch->maybe_sleeping = false; + } + if (rc == -1) break;/* timeout occurred */ else @@ -1383,6 +1407,8 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout, if (cur_event->events == WL_LATCH_SET && cur_epoll_event->events & (EPOLLIN | EPOLLERR | EPOLLHUP)) { + Assert(set->latch); + /* There's data in the self-pipe, clear it. 
*/ drainSelfPipe(); diff --git a/src/include/storage/latch.h b/src/include/storage/latch.h index 7c742021fb..d0da7e5d31 100644 --- a/src/include/storage/latch.h +++ b/src/include/storage/latch.h @@ -110,6 +110,7 @@ typedef struct Latch { sig_atomic_t is_set; + sig_atomic_t maybe_sleeping; bool is_shared; int owner_pid; #ifdef WIN32 -- 2.20.1 From 0cac26c1499fffa2a6f4e71f59e7a4e9a31be833 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Sat, 8 Aug 2020 16:32:22 +1200 Subject: [PATCH 2/3] Remove self-pipe in WAIT_USE_EPOLL builds. Use SIGURG instead of SIGUSR1 for latch wake-ups, and unblock it only while in epoll_pwait() to avoid the race traditionally solved by the self-pipe trick. --- src/backend/libpq/pqsignal.c| 13 +++- src/backend/postmaster/postmaster.c | 4 +- src/backend/storage/ipc/latch.c | 97 +++-- src/backend/tcop/postgres.c | 2 +- src/include/libpq/pqsignal.h| 3 +- src/include/storage/latch.h | 2 + 6 files changed, 96 insertions(+), 25 deletions(-) diff --git a/src/backend/libpq/pqsignal.c b/src/backend/libpq/pqsignal.c index 9289493118..13ec3cd533 100644 --- a/src/backend/libpq/pqsignal.c +++ b/src/backend/libpq/pqsignal.c @@ -20,6 +20,7 @@ /* Global variables */ sigset_t UnBlockSig, + UnBlockAllSig, BlockSig, StartupBlockSig; @@ -35,13 +36,21 @@ sigset_t UnBlockSig, * collection; it's essentially BlockSig minus SIGTERM, SIGQUIT, SIGALRM. * * UnBlockSig is the set of signals
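Separately from the patches themselves, here is a small standalone sketch of the 0001 flag-and-memory-barrier protocol, written with plain C11 atomics rather than the sig_atomic_t fields and pg_memory_barrier() used by the patch; send_wakeup() and do_blocking_wait() are hypothetical stand-ins for the kill()/epoll machinery, and none of this is PostgreSQL code:

/*
 * Minimal sketch (not PostgreSQL code) of the "maybe_sleeping" protocol
 * from the 0001 patch above.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

typedef struct Latch
{
    atomic_bool is_set;
    atomic_bool maybe_sleeping;
} Latch;

/* Hypothetical stand-ins for kill(owner_pid, ...) and epoll_wait(). */
static void send_wakeup(void)      { puts("wakeup signal sent"); }
static void do_blocking_wait(void) { puts("sleeping"); }

/* Called by another process to wake the latch owner. */
static void
set_latch(Latch *latch)
{
    atomic_store(&latch->is_set, true);

    /* Full barrier: publish is_set before reading maybe_sleeping. */
    atomic_thread_fence(memory_order_seq_cst);

    if (!atomic_load(&latch->maybe_sleeping))
        return;                 /* owner is not waiting: skip the syscall */

    send_wakeup();
}

/* Called by the latch owner. */
static void
wait_latch(Latch *latch)
{
    for (;;)
    {
        /* Announce the intent to sleep ... */
        atomic_store(&latch->maybe_sleeping, true);
        atomic_thread_fence(memory_order_seq_cst);

        /* ... then re-check, so a concurrent set_latch() cannot be missed. */
        if (atomic_load(&latch->is_set))
        {
            atomic_store(&latch->maybe_sleeping, false);
            return;
        }

        do_blocking_wait();
        atomic_store(&latch->maybe_sleeping, false);
    }
}

int
main(void)
{
    Latch latch = {false, false};

    set_latch(&latch);          /* no waiter yet: the wakeup is skipped */
    wait_latch(&latch);         /* returns immediately, is_set already true */
    return 0;
}

The key point is the symmetric ordering: the setter publishes is_set before reading maybe_sleeping, and the waiter publishes maybe_sleeping before re-reading is_set, so at least one side always notices the other and the signal can be skipped safely when nobody is waiting.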
tab completion of IMPORT FOREIGN SCHEMA
I use IMPORT FOREIGN SCHEMA a bit to set up systems for testing, but not enough that I can ever remember whether INTO or FROM SERVER comes first in the syntax. Here is an improvement to the tab completion, so I don't have to keep looking it up in the docs.

It accidentally (even before this patch) completes "IMPORT FOREIGN SCHEMA" with a list of local schemas. This is probably wrong, but I find it convenient, as I often do this in a loop-back setup where the list of foreign schemas would be the same as the local ones. So I don't countermand that behavior here.

Cheers,

Jeff

foreign_schema_tab_complete.patch Description: Binary data
Re: tab completion of IMPORT FOREIGN SCHEMA
Jeff Janes writes:
> It accidentally (even before this patch) completes "IMPORT FOREIGN SCHEMA"
> with a list of local schemas. This is probably wrong, but I find this
> convenient as I often do this in a loop-back setup where the list of
> foreign schema would be the same as the local ones. So I don't countermand
> that behavior here.

I don't see how psql could obtain a "real" list of foreign schemas from an arbitrary FDW, even if it magically knew which server the user would specify later in the command. So this behavior seems fine. It has some usefulness, while not completing at all would have none.

It might be a good idea to figure out where that completion is happening and annotate it about this point.

			regards, tom lane
Re: 回复:how to create index concurrently on partitioned table
Thanks for looking. On Sun, Aug 09, 2020 at 02:00:09PM +0900, Michael Paquier wrote: > > exactly what's needed. > > For now, I would recommend to focus first on 0001 to add support for > partitioned tables and indexes to REINDEX. CIC is much more > complicated btw, but I am not entering in the details now. > > + /* Avoid erroring out */ > if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) > { > This comment does not help, and actually this becomes incorrect as > reindex for this relkind becomes supported once 0001 is done. I made a minimal change to avoid forgetting to eventually change that part. > ReindexPartitionedRel() fails to consider the concurrent case here for > partition indexes and tables, as reindex_index()/reindex_relation() > are the APIs used in the non-concurrent case. Once you consider the > concurrent case correctly, we also need to be careful with partitions > that have a temporary persistency (note that we don't allow partition > trees to mix persistency types, all partitions have to be temporary or > permanent). Fixed. > I think that you are right to make the entry point to handle > partitioned index in ReindexIndex() and partitioned table in > ReindexTable(), but the structure of the patch should be different: > - The second portion of ReindexMultipleTables() should be moved into a > separate routine, taking in input a list of relation OIDs. This needs > to be extended a bit so as reindex_index() gets called for an index > relkind if the relpersistence is temporary or if we have a > non-concurrent reindex. The idea is that we finish with a single code > path able to work on a list of relations. And your patch adds more of > that as of ReindexPartitionedRel(). It's a good idea. > - We should *not* handle directly partitioned index and/or table in > ReindexRelationConcurrently() to not complicate the logic where we > gather all the indexes of a table/matview. So I think that the list > of partition indexes/tables to work on should be built directly in > ReindexIndex() and ReindexTable(), and then this should call the > second part of ReindexMultipleTables() refactored in the previous > point. I think I addressed these mostly as you intended. > This way, each partition index gets done individually in its > own transaction. For a partition table, all indexes of this partition > are rebuilt in the same set of transactions. For the concurrent case, > we have already reindex_concurrently_swap that it able to switch the > dependencies of two indexes within a partition tree, so we can rely on > that so as a failure in the middle of the operation never leaves the > a partition structure in an inconsistent state. -- Justin >From 25486d56303de512368f322c87528c2be29af0de Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Sun, 7 Jun 2020 15:59:54 -0500 Subject: [PATCH v5 1/3] Implement REINDEX of partitioned tables/indexes --- doc/src/sgml/ref/reindex.sgml | 5 - src/backend/catalog/index.c| 8 +- src/backend/commands/indexcmds.c | 137 ++--- src/backend/tcop/utility.c | 6 +- src/include/commands/defrem.h | 6 +- src/test/regress/expected/create_index.out | 18 +-- src/test/regress/sql/create_index.sql | 13 +- 7 files changed, 122 insertions(+), 71 deletions(-) diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml index aac5d5be23..e2e32b3ba0 100644 --- a/doc/src/sgml/ref/reindex.sgml +++ b/doc/src/sgml/ref/reindex.sgml @@ -258,11 +258,6 @@ REINDEX [ ( option [, ...] ) ] { IN can always reindex anything. 
- - Reindexing partitioned tables or partitioned indexes is not supported. - Each individual partition can be reindexed separately instead. - - Rebuilding Indexes Concurrently diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c index 1be27eec52..f69f027890 100644 --- a/src/backend/catalog/index.c +++ b/src/backend/catalog/index.c @@ -3660,12 +3660,8 @@ reindex_relation(Oid relid, int flags, int options) */ rel = table_open(relid, ShareLock); - /* - * This may be useful when implemented someday; but that day is not today. - * For now, avoid erroring out when called in a multi-table context - * (REINDEX SCHEMA) and happen to come across a partitioned table. The - * partitions may be reindexed on their own anyway. - */ + /* Skip partitioned tables if called in a multi-table context. The + * partitions are reindexed on their own. */ if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) { ereport(WARNING, diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c index 2baca12c5f..68d75916ae 100644 --- a/src/backend/commands/indexcmds.c +++ b/src/backend/commands/indexcmds.c @@ -88,7 +88,10 @@ static List *ChooseIndexColumnNames(List *indexElems); static void RangeVarCallbackForReindexIndex(const RangeVar *relation, Oid relId, Oid oldRelId, v
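Stepping back from the patch text, the per-partition transaction scheme described in the review comments above could look roughly like the sketch below, shown for the table case only. The function name reindex_partitions_sketch() and its placement are hypothetical; the actual patch folds this into the refactored ReindexMultipleTables() path, and a real implementation would also have to decide how the locks taken while building the list behave across the per-partition transaction boundaries.

/* Rough sketch only: one transaction per leaf partition, as discussed above. */
#include "postgres.h"

#include "access/xact.h"
#include "catalog/index.h"
#include "catalog/pg_class.h"
#include "catalog/pg_inherits.h"
#include "nodes/pg_list.h"
#include "storage/lockdefs.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/snapmgr.h"

static void
reindex_partitions_sketch(Oid parentOid, int options)
{
    MemoryContext oldcxt;
    List       *relids = NIL;
    ListCell   *lc;

    /*
     * Build the list of leaf partitions in a context that survives the
     * per-partition transactions below (ReindexMultipleTables uses a
     * private child of PortalContext for the same reason).
     */
    oldcxt = MemoryContextSwitchTo(PortalContext);
    foreach(lc, find_all_inheritors(parentOid, ShareLock, NULL))
    {
        Oid         partoid = lfirst_oid(lc);

        /* Partitioned parents have no storage of their own; skip them. */
        if (get_rel_relkind(partoid) == RELKIND_PARTITIONED_TABLE)
            continue;
        relids = lappend_oid(relids, partoid);
    }
    MemoryContextSwitchTo(oldcxt);

    /* Close the outer transaction, then use one transaction per partition. */
    CommitTransactionCommand();

    foreach(lc, relids)
    {
        Oid         partoid = lfirst_oid(lc);

        StartTransactionCommand();
        /* functions in index expressions may need an active snapshot */
        PushActiveSnapshot(GetTransactionSnapshot());
        (void) reindex_relation(partoid,
                                REINDEX_REL_PROCESS_TOAST |
                                REINDEX_REL_CHECK_CONSTRAINTS,
                                options);
        PopActiveSnapshot();
        CommitTransactionCommand();
    }

    /* Leave with an open transaction, as utility callers expect. */
    StartTransactionCommand();
}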
Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly
On Sun, Aug 09, 2020 at 08:02:52PM +0900, Michael Paquier wrote:
> I have been looking at 0001 as a start, and your patch is incorrect on
> a couple of aspects for the completion of REINDEX:
> - "(" is not proposed as a completion option after the initial
> REINDEX, and I think that it should.

That part of your patch handles REINDEX and REINDEX(*) differently than mine. Yours is technically more correct/complete. But I recall Tom objected to a different patch because it completed to a single character. I think the case is arguable either way: if only some completions are shown, then it hides the others.
https://www.postgresql.org/message-id/14255.1536781...@sss.pgh.pa.us

-	else if (Matches("REINDEX") || Matches("REINDEX", "(*)"))
+	else if (Matches("REINDEX"))
+		COMPLETE_WITH("TABLE", "INDEX", "SYSTEM", "SCHEMA", "DATABASE", "(");
+	else if (Matches("REINDEX", "(*)"))
 		COMPLETE_WITH("TABLE", "INDEX", "SYSTEM", "SCHEMA", "DATABASE");

> - completion gets incorrect for all the commands once a parenthesized
> list of options is present, as CONCURRENTLY goes missing.

The rest of your patch looks fine. In my mind, REINDEX(CONCURRENTLY) was the "new way" to write things, and it is what's easy to support, so I didn't put special effort into making the tab completion itself complete.

-- 
Justin
Re: Unnecessary delay in streaming replication due to replay lag
> On 09-Aug-2020, at 2:11 PM, Michael Paquier wrote:
>
> I have not really looked at the proposed patch, but it would be good
> to have some documentation.
>

Ah, right. The basic idea is to reuse the logic that allows read-only connections (reaching a consistent state during replay) to also decide when to start WAL streaming.

The patch borrows a new GUC, “wal_receiver_start_condition”, introduced by another patch alluded to upthread. It controls when the WAL receiver process is started on a standby. By default, the GUC is set to “replay”, which means no change in current behavior: the WAL receiver is started only after replaying all WAL already available in pg_wal. When set to “consistency”, the WAL receiver process is started earlier, as soon as a consistent state is reached during WAL replay.

The LSN to start streaming from is the one that points at the beginning of the WAL segment file most recently flushed into pg_wal. To find the most recently flushed WAL segment, the first blocks of all WAL segment files in pg_wal are inspected, starting from the segment that contains the currently replayed record. The search stops when the first page with no valid header is found.

The benefits of starting the WAL receiver early are mentioned upthread, but allow me to reiterate: as soon as WAL streaming starts, any commits that are waiting for synchronous replication on the master are unblocked. The benefit is apparent in situations where significant replay lag has built up and replication is configured to be synchronous.

Asim
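For illustration only, the segment search described above might be sketched as below. segment_has_valid_first_page() is a hypothetical stand-in for opening the segment file and validating its first XLOG page header; real code would also deal with timelines, the configured segment size, and partially written segments, none of which is shown here.

/* Sketch (not the actual patch code) of choosing the streaming start LSN. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define WAL_SEGMENT_SIZE  (16 * 1024 * 1024)    /* default 16MB segments */

typedef uint64_t XLogRecPtr;

/* Stub: pretend segments up to #9 have been flushed into pg_wal. */
static bool
segment_has_valid_first_page(uint64_t segno)
{
    return segno <= 9;
}

/* Return the LSN at which streaming should start. */
static XLogRecPtr
choose_stream_start(XLogRecPtr replay_lsn)
{
    uint64_t    segno = replay_lsn / WAL_SEGMENT_SIZE;
    uint64_t    last_valid = segno;    /* currently replayed segment exists */

    /* Walk forward until the first segment with no valid first-page header. */
    for (uint64_t candidate = segno + 1;; candidate++)
    {
        if (!segment_has_valid_first_page(candidate))
            break;              /* ran past the flushed WAL */
        last_valid = candidate;
    }

    /* Stream from the beginning of the most recently flushed segment. */
    return last_valid * WAL_SEGMENT_SIZE;
}

int
main(void)
{
    XLogRecPtr  start = choose_stream_start((XLogRecPtr) 5 * WAL_SEGMENT_SIZE + 1234);

    printf("start streaming at %llX\n", (unsigned long long) start);
    return 0;
}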
Re: display offset along with block number in vacuum errors
On Fri, 7 Aug 2020 at 13:04, Amit Kapila wrote:
>
> On Fri, Aug 7, 2020 at 8:10 AM Masahiko Sawada wrote:
> >
> > On Fri, 7 Aug 2020 at 10:49, Amit Kapila wrote:
> > >
> > > On Thu, Aug 6, 2020 at 7:51 PM Justin Pryzby wrote:
> > > >
> > > > On Thu, Aug 06, 2020 at 07:39:21PM +0530, Amit Kapila wrote:
> > > > > On Wed, Jul 29, 2020 at 1:09 AM Justin Pryzby wrote:
> > > > > >
> > > > > > lazy_check_needs_freeze iterates over blocks and this patch changes it to
> > > > > > update vacrelstats. I think it should do what
> > > > > > lazy_{vacuum/cleanup}_heap/page/index do and call update_vacuum_error_info() at
> > > > > > its beginning (even though only the offset is changed), and then
> > > > > > restore_vacuum_error_info() at its end (to "revert back" to the item number it
> > > > > > started with).
> > > > >
> > > > > I see that Mahendra has changed patch as per this suggestion but I am
> > > > > not convinced that it is a good idea to sprinkle
> > > > > update_vacuum_error_info()/restore_vacuum_error_info() at places more
> > > > > than required. I see that it might look a bit clean from the
> > > > > perspective that if tomorrow we use the function
> > > > > lazy_check_needs_freeze() for a different purpose then we don't need
> > > > > to worry about the wrong phase information. If we are worried about
> > > > > that then we should have an assert in that function to ensure that the
> > > > > current phase is VACUUM_ERRCB_PHASE_SCAN_HEAP.
> > > >
> > > > The motivation was to restore the offnum, which is set to Invalid at the start
> > > > of lazy_scan_heap(), and then set valid within lazy_check_needs_freeze, but
> > > > should be restored or re-set to Invalid when returns to lazy_scan_heap(). If
> > > > you think it's important, we could just set vacrelstats->offnum = Invalid
> > > > before returning,
> > >
> > > Yeah, I would prefer that and probably a comment to indicate why we
> > > are doing that.
> >
> > +1
> >
> > I'm concerned that we call the update and restore in
> > heap_page_is_all_visible(). Unlike lazy_check_needs_freeze(), this
> > function is called for every vacuumed page. I commented that if we
> > want to update the offset number during iterating tuples in the
> > function we should change the phase to SCAN_HEAP at the beginning of
> > the function because it's actually not vacuuming.
>
> AFAICS, heap_page_is_all_visible() is called from only one place and
> that is lazy_vacuum_page(), so I think if there is any error in
> heap_page_is_all_visible(), it should be considered as VACUUM_HEAP
> phase error.

It's true that heap_page_is_all_visible() is called only from lazy_vacuum_page(), but I'm concerned that it could be misleading, since that function is not actually removing tuples but just checking after vacuum. I guess that the errcontext should show what the process is actually doing and therefore help the investigation, so I thought VACUUM_HEAP might not be appropriate for this case. But I also see your point. The other vacuum error context phases match the vacuum progress information phases. So in heap_page_is_all_visible(), I agree it's better to update the offset number and keep the phase VACUUM_HEAP rather than do nothing.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
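Putting the conclusion above into code, a sketch of the shape being discussed for lazy_check_needs_freeze() might look like the fragment below: report the offset while walking the page, then reset it before returning. This assumes the function has been given a vacrelstats argument as in the patch under discussion, and it is a simplified fragment in vacuumlazy.c style (FreezeLimit, MultiXactCutoff and LVRelStats are the file-level definitions there), not the actual patch.

static bool
lazy_check_needs_freeze(Buffer buf, bool *hastup, LVRelStats *vacrelstats)
{
    Page        page = BufferGetPage(buf);
    OffsetNumber offnum,
                maxoff;

    *hastup = false;

    /* Quick exit for new or empty pages. */
    if (PageIsNew(page) || PageIsEmpty(page))
        return false;

    maxoff = PageGetMaxOffsetNumber(page);
    for (offnum = FirstOffsetNumber;
         offnum <= maxoff;
         offnum = OffsetNumberNext(offnum))
    {
        ItemId      itemid = PageGetItemId(page, offnum);

        /* Report the offset currently being examined in the error context. */
        vacrelstats->offnum = offnum;

        if (ItemIdIsUsed(itemid))
            *hastup = true;

        /* Dead and redirect items never need freezing. */
        if (!ItemIdIsNormal(itemid))
            continue;

        if (heap_tuple_needs_freeze((HeapTupleHeader) PageGetItem(page, itemid),
                                    FreezeLimit, MultiXactCutoff, buf))
            break;
    }

    /*
     * Reset the offset before returning, so the caller's error context goes
     * back to reporting only the block number.
     */
    vacrelstats->offnum = InvalidOffsetNumber;

    return (offnum <= maxoff);
}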
Re: Add information to rm_redo_error_callback()
On Wed, 5 Aug 2020 at 00:37, Drouvot, Bertrand wrote:
>
> Hi hackers,
>
> I've attached a small patch to add information to rm_redo_error_callback().
>
> The changes attached in this patch came while working on the "Add
> information during standby recovery conflicts" patch (See [1]).
>
> The goal is to add more information during the callback (if doable), so
> that something like:
>
> 2020-08-04 14:42:57.545 UTC [15459] CONTEXT: WAL redo at 0/4A3B0DE0 for
> Heap2/CLEAN: remxid 1168
>
> would get extra information that way:
>
> 2020-08-04 14:42:57.545 UTC [15459] CONTEXT: WAL redo at 0/4A3B0DE0 for
> Heap2/CLEAN: remxid 1168, blkref #0: rel 1663/13586/16850 fork main blk 0
>
> As this could be useful outside of [1], a dedicated "sub" patch has been
> created (thanks Sawada for the suggestion).
>
> I will add this patch to the next commitfest. I look forward to your
> feedback about the idea and/or implementation.
>

Thank you for starting the new thread for this patch! I think this patch is simple enough and improves information shown in errcontext. I have two comments on the patch:

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 756b838e6a..8b2024e9e9 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -11749,10 +11749,22 @@ rm_redo_error_callback(void *arg)
 {
 	XLogReaderState *record = (XLogReaderState *) arg;
 	StringInfoData buf;
+	int			block_id;
+	RelFileNode rnode;
+	ForkNumber	forknum;
+	BlockNumber blknum;
 
 	initStringInfo(&buf);
 	xlog_outdesc(&buf, record);
 
+	for (block_id = 0; block_id <= record->max_block_id; block_id++)
+	{
+		if (XLogRecGetBlockTag(record, block_id, &rnode, &forknum, &blknum))
+			appendStringInfo(&buf,", blkref #%d: rel %u/%u/%u fork %s blk %u",
+							 block_id, rnode.spcNode, rnode.dbNode,
+							 rnode.relNode, forkNames[forknum],
+							 blknum);
+	}
 	/* translator: %s is a WAL record description */
 	errcontext("WAL redo at %X/%X for %s",
 			   (uint32) (record->ReadRecPtr >> 32),

rnode, forknum and blknum can be declared within the for loop.

I think it's better to put a new line just before the comment starting from "translator:".

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
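To make the two review comments concrete, the relevant part of rm_redo_error_callback() with both suggestions applied might look like the fragment below. This is just a sketch of the suggested revision, not a tested patch; the trailing errcontext() arguments are reproduced from the existing xlog.c code.

    int         block_id;

    initStringInfo(&buf);
    xlog_outdesc(&buf, record);

    for (block_id = 0; block_id <= record->max_block_id; block_id++)
    {
        /* Declarations moved inside the loop, per the first comment. */
        RelFileNode rnode;
        ForkNumber  forknum;
        BlockNumber blknum;

        if (XLogRecGetBlockTag(record, block_id, &rnode, &forknum, &blknum))
            appendStringInfo(&buf, ", blkref #%d: rel %u/%u/%u fork %s blk %u",
                             block_id, rnode.spcNode, rnode.dbNode,
                             rnode.relNode, forkNames[forknum],
                             blknum);
    }

    /* Blank line added above, per the second comment. */
    /* translator: %s is a WAL record description */
    errcontext("WAL redo at %X/%X for %s",
               (uint32) (record->ReadRecPtr >> 32),
               (uint32) record->ReadRecPtr,
               buf.data);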
Re: Change a constraint's index - ALTER TABLE ... ALTER CONSTRAINT ... USING INDEX ...
On 2020-07-07 01:08, Tom Lane wrote:
> Alvaro Herrera writes:
>> On 2020-Jul-05, Anna Akenteva wrote:
>>> -- Swapping primary key's index for an equivalent index,
>>> -- but with INCLUDE-d attributes.
>>> CREATE UNIQUE INDEX new_idx ON target_tbl (id) INCLUDE (info);
>>> ALTER TABLE target_tbl ALTER CONSTRAINT target_tbl_pkey USING INDEX new_idx;
>>> ALTER TABLE referencing_tbl ALTER CONSTRAINT referencing_tbl_id_ref_fkey USING INDEX new_idx;
>> How is this state represented by pg_dump?
> Even if it's possible to represent, I think we should flat out reject
> this "feature". Primary keys that aren't primary keys don't seem like
> a good idea. For one thing, it won't be possible to describe the
> constraint accurately in the information_schema.

Do you think it could still be a good idea if we only swap the relfilenodes of indexes, as it was suggested in [1]?

The original use case was getting rid of index bloat, which is now solved by REINDEX CONCURRENTLY, but this feature still has its own use case of adding INCLUDE-d columns to constraint indexes.

[1] https://www.postgresql.org/message-id/flat/CABwTF4UxTg%2BkERo1Nd4dt%2BH2miJoLPcASMFecS1-XHijABOpPg%40mail.gmail.com

-- 
Anna Akenteva
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: Unnecessary delay in streaming replication due to replay lag
On Sun, 9 Aug 2020 at 14:54, Asim Praveen wrote:
>
> I would like to revive this thread by submitting a rebased patch to start
> streaming replication without waiting for startup process to finish replaying
> all WAL. The start LSN for streaming is determined to be the LSN that points
> to the beginning of the most recently flushed WAL segment.
>
> The patch passes tests under src/test/recovery and top level “make check”.
>

The patch applies cleanly to the current HEAD, but I got the following error when building the code with this patch:

xlog.c: In function 'StartupXLOG':
xlog.c:7315:6: error: too few arguments to function 'RequestXLogStreaming'
 7315 |      RequestXLogStreaming(ThisTimeLineID,
      |      ^~~~
In file included from xlog.c:59:
../../../../src/include/replication/walreceiver.h:463:13: note: declared here
  463 | extern void RequestXLogStreaming(TimeLineID tli, XLogRecPtr recptr,
      |             ^~~~

cfbot also complains about this. Could you please update the patch?

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services