Re: pg_basebackup has an accidentally separated help message
At Mon, 25 Dec 2023 15:42:41 +0900, Michael Paquier wrote in > On Mon, Dec 25, 2023 at 02:39:16PM +0900, Kyotaro Horiguchi wrote: > > The attached patch contains both of the above fixes. > > Good catches, let's fix them. You have noticed that while translating > these new messages, I guess? Yes. It turns out that such issues are only found after the messages have been committed. Because handling a large volume of translations all at once is daunting, I maintain translations locally to avoid that, which is how these were noticed. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: May be BUG. Periodic burst growth of the checkpoint_req counter on replica.
On 25.12.2023 02:38, Alexander Korotkov wrote: Pushed! Thanks a lot! With the best regards! -- Anton A. Melnikov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: Trigger violates foreign key constraint
On 22.12.2023 14:39, Laurenz Albe wrote: Yes, that is better - shorter and avoids the passive voice. Changed. Thanks. Also, I don't really like the "This is not considered a bug" part, since it looks like an excuse. In a way, it is an excuse, so why not be honest about it. I still think that the "this is not a bug" style is not suitable for documentation. But I checked the documentation and found 3-4 more such places. Therefore, it's OK from me, especially since it really does look like a bug. The example you provided in your other message (cascading triggers fail if the table owner has revoked the required permissions from herself) is not really about breaking foreign keys. You hit a surprising error, but referential integrity will be maintained. Yes, referential integrity will be maintained. But perhaps it is worth adding a section to the documentation about the system triggers that are used to implement foreign keys. Currently they are not mentioned anywhere, but questions and problems arise from time to time. Such a section, named "System triggers", could be added as a separate chapter to Part VII (Internals) or as a subsection of Chapter 38 (Triggers). I thought about this after your recent excellent article [1], which has an introduction to system triggers. This does not negate the need for the patch being discussed. Patch v3 is attached. For me, it is ready for committer. 1. https://www.cybertec-postgresql.com/en/broken-foreign-keys-postgresql/ -- Pavel Luzanov Postgres Professional: https://postgrespro.com
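For readers who have not seen them, the "system triggers" in question are the internal triggers that implement foreign keys; they appear in the pg_trigger catalog with tgisinternal = true and names of the form RI_ConstraintTrigger_*. A minimal way to list them for a hypothetical referencing table named child_table (the table name here is illustrative only):

    SELECT tgname, tgisinternal
    FROM pg_trigger
    WHERE tgrelid = 'child_table'::regclass;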
Proposal to include --exclude-extension Flag in pg_dump
Hi PostgreSQL Community, Recently I have been working on pg_dump regarding my project and wanted to exclude an extension from the dump generated. I wonder why it doesn't have --exclude-extension type of support whereas --extension exists! Since I needed that support, I took the initiative to contribute to the community by adding the --exclude-extension flag. Attached is the patch for the same. Looking forward to your feedback. Regards Ayush Vatsa Amazon Web services (AWS) v1-0001-Add-support-for-exclude-extension-in-pg_dump.patch Description: Binary data
Re: Proposal to include --exclude-extension Flag in pg_dump
Added a CF entry for the same https://commitfest.postgresql.org/46/4721/ Regards Ayush Vatsa Amazon Web Services (AWS) On Mon, 25 Dec 2023 at 15:48, Ayush Vatsa wrote: > Hi PostgreSQL Community, > Recently I have been working on pg_dump regarding my project and wanted to > exclude an extension from the dump generated. I wonder why it doesn't have > --exclude-extension type of support whereas --extension exists! > Since I needed that support, I took the initiative to contribute to the > community by adding the --exclude-extension flag. > Attached is the patch for the same. Looking forward to your feedback. > > Regards > Ayush Vatsa > Amazon Web services (AWS) >
Re: Proposal to include --exclude-extension Flag in pg_dump
Hi On Mon, Dec 25, 2023 at 6:22 PM Ayush Vatsa wrote: > > Added a CF entry for the same https://commitfest.postgresql.org/46/4721/ > > Regards > Ayush Vatsa > Amazon Web Services (AWS) > > On Mon, 25 Dec 2023 at 15:48, Ayush Vatsa wrote: >> >> Hi PostgreSQL Community, >> Recently I have been working on pg_dump regarding my project and wanted to >> exclude an extension from the dump generated. I wonder why it doesn't have >> --exclude-extension type of support whereas --extension exists! >> Since I needed that support, I took the initiative to contribute to the >> community by adding the --exclude-extension flag. >> Attached is the patch for the same. Looking forward to your feedback. >> >> Regards >> Ayush Vatsa >> Amazon Web services (AWS) printf(_(" -e, --extension=PATTERN dump the specified extension(s) only\n")); + printf(_(" --exclude-extension=PATTERN do NOT dump the specified extension(s)\n")); printf(_(" -E, --encoding=ENCODING dump the data in encoding ENCODING\n")); long options should not mess with short options, does the following make sense to you? printf(_(" --enable-row-security enable row security (dump only content user has\n" "access to)\n")); + printf(_(" --exclude-extension=PATTERN do NOT dump the specified extension(s)\n")); printf(_(" --exclude-table-and-children=PATTERN\n" -- Regards Junwang Zhao
Re: Avoid computing ORDER BY junk columns unnecessarily
On Sat, Dec 23, 2023 at 1:32 AM Tom Lane wrote: > Heikki Linnakangas writes: > > On 22/12/2023 17:24, Tom Lane wrote: > >> How much of your patchset still makes sense if we assume that we > >> can always extract the ORDER BY column values from the index? > > > That would make it much less interesting. But I don't think that's a > > good assumption. Especially in the kNN case, the ORDER BY value would > > not be stored in the index. Most likely the index needs to calculate it > > in some form, but it might take shortcuts like avoiding the sqrt(). > > Yeah, fair point. I'll try to take a look at your patchset after > the holidays. Agreed. I haven't looked into these patches, but it seems that there is an issue with how the targetlist is handled for foreign rels. The following test case for postgres_fdw hits the Assert in apply_tlist_labeling(). contrib_regression=# SELECT c3, c4 FROM ft1 ORDER BY c3, c1 LIMIT 1; server closed the connection unexpectedly When we create foreign final path in add_foreign_final_paths(), we use root->upper_targets[UPPERREL_FINAL] as the PathTarget. This PathTarget is set in grouping_planner(), without removing the junk columns. I think this is why the above query hits the Assert. Thanks Richard
Re: Optimization outcome depends on the index order
On Fri, Dec 22, 2023 at 7:24 PM Andrei Lepikhov wrote: > On 22/12/2023 11:48, Alexander Korotkov wrote: > > > Because we must trust all predictions made by the planner, we just > > > choose the most trustworthy path. According to the planner logic, it is > > > a path with a smaller selectivity. We can make mistakes anyway just > > > because of the nature of estimation. > > > > Even if we need to take selectivity into account here, it's still not > > clear why this should be on top of other logic later in add_path(). > I got your point now, thanks for pointing it out. In the next version of > the patch selectivity is used as a criteria only in the case of COSTS_EQUAL. It looks better now. But it's hard for me to judge these heuristics in add_path(). Tom, what do you think about this? -- Regards, Alexander Korotkov
Re: Proposal to include --exclude-extension Flag in pg_dump
Hi, > long options should not mess with short options, does the following > make sense to you? Yes, that makes sense; my reason for keeping them together was that they are of the same kind, but I will update the patch accordingly. One more thing I wanted to ask: should I separate them in the pg_dump.sgml file too, i.e. document --exclude-extension alongside the other long options? After your feedback I will update both places in a single patch. Regards, Ayush Vatsa Amazon Web Services (AWS) On Mon, 25 Dec 2023 at 16:28, Junwang Zhao wrote: > Hi > > On Mon, Dec 25, 2023 at 6:22 PM Ayush Vatsa > wrote: > > > > Added a CF entry for the same https://commitfest.postgresql.org/46/4721/ > > > > Regards > > Ayush Vatsa > > Amazon Web Services (AWS) > > > > On Mon, 25 Dec 2023 at 15:48, Ayush Vatsa > wrote: > >> > >> Hi PostgreSQL Community, > >> Recently I have been working on pg_dump regarding my project and wanted > to exclude an extension from the dump generated. I wonder why it doesn't > have --exclude-extension type of support whereas --extension exists! > >> Since I needed that support, I took the initiative to contribute to the > community by adding the --exclude-extension flag. > >> Attached is the patch for the same. Looking forward to your feedback. > >> > >> Regards > >> Ayush Vatsa > >> Amazon Web services (AWS) > > printf(_(" -e, --extension=PATTERN dump the specified > extension(s) only\n")); > + printf(_(" --exclude-extension=PATTERN do NOT dump the specified > extension(s)\n")); > printf(_(" -E, --encoding=ENCODING dump the data in encoding > ENCODING\n")); > > long options should not mess with short options, does the following > make sense to you? > > printf(_(" --enable-row-security enable row security (dump only > content user has\n" > "access to)\n")); > + printf(_(" --exclude-extension=PATTERN do NOT dump the specified > extension(s)\n")); > printf(_(" --exclude-table-and-children=PATTERN\n" > > -- > Regards > Junwang Zhao >
Re: ALTER COLUMN ... SET EXPRESSION to alter stored generated column's expression
On Mon, Dec 18, 2023 at 3:01 PM Peter Eisentraut wrote: > On 11.12.23 13:22, Amul Sul wrote: > > > > create table t1 (a int, b int generated always as (a + 1) stored); > > alter table t1 add column c int, alter column b set expression as (a > > + c); > > ERROR: 42703: column "c" does not exist > > > > I think intuitively, this ought to work. Maybe just moving the new > > pass > > after AT_PASS_ADD_COL would do it. > > > > > > I think we can't support that (like alter type) since we need to place > > this new > > pass before AT_PASS_OLD_INDEX & AT_PASS_OLD_CONSTR to re-add indexes and > > constraints for the validation. > > Could we have AT_PASS_ADD_COL before AT_PASS_OLD_*? So overall it would be > > ... > AT_PASS_ALTER_TYPE, > AT_PASS_ADD_COL, // moved > AT_PASS_SET_EXPRESSION, // new > AT_PASS_OLD_INDEX, > AT_PASS_OLD_CONSTR, > AT_PASS_ADD_CONSTR, > ... > > This appears to not break any existing tests. > (Sorry, for the delay) Agree. I did that change in 0001 patch. Regards, Amul v7-0001-Code-refactor-convert-macro-listing-to-enum.patch Description: Binary data v7-0002-Code-refactor-separate-function-to-find-all-depen.patch Description: Binary data v7-0003-Allow-to-change-generated-column-expression.patch Description: Binary data
Re: Show WAL write and fsync stats in pg_stat_io
Hi, Thanks for the review and feedback on your previous reply! On Mon, 25 Dec 2023 at 09:40, Michael Paquier wrote: > > On Mon, Dec 25, 2023 at 03:20:58PM +0900, Michael Paquier wrote: > > pgstat_tracks_io_bktype() does not look correct to me. Why is the WAL > > receiver considered as something correct in the list of backend types, > > while the intention is to *not* add it to pg_stat_io? I have tried to > > switche to the correct behavior of returning false for a > > B_WAL_RECEIVER, to notice that pg_rewind's test 002_databases.pl > > freezes on its shutdown sequence. Something weird is going on here. > > Could you look at it? See the XXX comment in the attached, which is > > the same behavior as v6-0002. It looks to me that the patch has > > introduced an infinite loop tweaking pgstat_tracks_io_bktype() in an > > incorrect way to avoid the root issue. > > Ah, that's because it would trigger an assertion failure: > TRAP: failed Assert("pgstat_tracks_io_op(MyBackendType, io_object, > io_context, io_op)"), File: "pgstat_io.c", Line: 89, PID: 6824 > postgres: standby_local: walreceiver > (ExceptionalCondition+0xa8)[0x560d1b4dd38a] > > And the backtrace just tells that this is the WAL receiver > initializing a WAL segment: > #5 0x560d1b3322c8 in pgstat_count_io_op_n > (io_object=IOOBJECT_WAL, io_context=IOCONTEXT_INIT, io_op=IOOP_WRITE, > cnt=1) at pgstat_io.c:89 > #6 0x560d1b33254a in pgstat_count_io_op_time > (io_object=IOOBJECT_WAL, io_context=IOCONTEXT_INIT, io_op=IOOP_WRITE, > start_time=..., cnt=1) at pgstat_io.c:181 > #7 0x560d1ae7f932 in XLogFileInitInternal (logsegno=3, logtli=1, > added=0x7ffd2733c6eb, path=0x7ffd2733c2e0 "pg_wal/0001", '0' > , "3") at xlog.c:3115 > #8 0x560d1ae7fc4e in XLogFileInit (logsegno=3, logtli=1) at > xlog.c:3215 Correct. > > Wouldn't it be simpler to just bite the bullet in this case and handle > WAL receivers in the IO tracking? There is one problem and I couldn't decide how to solve it. We need to handle read IO in WALRead() in xlogreader.c. How many bytes the WALRead() function will read is controlled by a variable and it can be different from XLOG_BLCKSZ. This is a problem because pg_stat_io's op_bytes column is a constant. Here are all WALRead() function calls: 1- read_local_xlog_page_guts() in xlogutils.c => WALRead(XLOG_BLCKSZ) => always reads XLOG_BLCKSZ. 2- summarizer_read_local_xlog_page() in walsummarizer.c => WALRead(XLOG_BLCKSZ) => always reads XLOG_BLCKSZ. 3- logical_read_xlog_page() in walsender.c => WALRead(XLOG_BLCKSZ) => always reads XLOG_BLCKSZ. 4- XLogSendPhysical() in walsender.c => WALRead(nbytes) => nbytes can be different from XLOG_BLCKSZ. 5- WALDumpReadPage() in pg_waldump.c => WALRead(count) => count can be different from XLOG_BLCKSZ. 4 and 5 are the problematic calls. Melanie's answer to this problem on previous discussions: On Wed, 9 Aug 2023 at 21:52, Melanie Plageman wrote: > > If there is any combination of BackendType and IOContext which will > always read XLOG_BLCKSZ bytes, we could use XLOG_BLCKSZ for that row's > op_bytes. For other cases, we may have to consider using op_bytes 1 and > tracking reads and write IOOps in number of bytes (instead of number of > pages). I don't actually know if there is a clear separation by > BackendType for these different cases. Using op_bytes as 1 solves this problem but since it will be different from the rest of the pg_stat_io view it could be hard to understand. There is no clear separation by backends as it can be seen from the walsender. 
> > The other alternative I see is to use XLOG_BLCKSZ as the op_bytes and > treat op_bytes * number of reads as an approximation of the number of > bytes read. I don't actually know what makes more sense. I don't think I > would like having a number for bytes that is not accurate. Also, we have a similar problem in XLogPageRead() in xlogrecovery.c. pg_pread() call tries to read XLOG_BLCKSZ but it is not certain and we don't count IO if it couldn't read XLOG_BLCKSZ. IMO, this is not as important as the previous problem but it still is a problem. I would be glad to hear opinions on these problems. -- Regards, Nazir Bilal Yavuz Microsoft
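For context, the shape of the existing view is what makes this awkward: in PostgreSQL 16 each pg_stat_io row carries a single op_bytes constant, so the per-row unit of I/O cannot vary. A query like the following works against the current view today; rows for a WAL object would only appear with the patch discussed in this thread:

    SELECT backend_type, object, context, reads, writes, op_bytes
    FROM pg_stat_io
    ORDER BY backend_type, object, context;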
Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements
Hello! It seems like the idea of an "old" snapshot is still a valid one. > Should this deal with any potential XID wraparound, too? As far as I understand, in our case we are not affected by this in any way. Vacuum on our table is not possible because of locking, so nothing may be frozen (see below). In the case of a very long index build, transaction-ID limits will stop new connections using the current regular infrastructure, because it is based on relation data (not the actual xmin of backends). > How does this behave when the newly inserted tuple's xmin gets frozen? > This would be allowed to happen during heap page pruning, afaik - no > rules that I know of which are against that - but it would create > issues where normal snapshot visibility rules would indicate it > visible to both snapshots regardless of whether it actually was > visible to the older snapshot when that snapshot was created... As far as I can see, heap_page_prune never freezes any tuples. In the case of regular vacuum, it is used this way: call heap_page_prune, then heap_prepare_freeze_tuple, and then heap_freeze_execute_prepared. Merry Christmas, Mikhail.
Re: apply pragma system_header to python headers
I wrote: > Probably a better way is to invent a separate header "plpython_system.h" > that just includes the Python headers, to scope the pragma precisely. > (I guess it could have the fixup #defines we're wrapping those headers > in, too.) > The same idea would work in plperl. After updating one of my test machines to Fedora 39, I'm seeing these Python warnings too. So here's a fleshed-out patch proposal for doing it like that. regards, tom lane diff --git a/src/pl/plperl/GNUmakefile b/src/pl/plperl/GNUmakefile index 975f540b3e..31fb212bf1 100644 --- a/src/pl/plperl/GNUmakefile +++ b/src/pl/plperl/GNUmakefile @@ -108,11 +108,11 @@ uninstall: uninstall-lib uninstall-data install-data: installdirs $(INSTALL_DATA) $(addprefix $(srcdir)/, $(DATA)) '$(DESTDIR)$(datadir)/extension/' - $(INSTALL_DATA) $(srcdir)/plperl.h $(srcdir)/ppport.h '$(DESTDIR)$(includedir_server)' + $(INSTALL_DATA) $(srcdir)/plperl.h $(srcdir)/plperl_system.h $(srcdir)/ppport.h '$(DESTDIR)$(includedir_server)' uninstall-data: rm -f $(addprefix '$(DESTDIR)$(datadir)/extension'/, $(notdir $(DATA))) - rm -f $(addprefix '$(DESTDIR)$(includedir_server)'/, plperl.h ppport.h) + rm -f $(addprefix '$(DESTDIR)$(includedir_server)'/, plperl.h plperl_system.h ppport.h) .PHONY: install-data uninstall-data diff --git a/src/pl/plperl/meson.build b/src/pl/plperl/meson.build index 6a28f40a11..182e0fe169 100644 --- a/src/pl/plperl/meson.build +++ b/src/pl/plperl/meson.build @@ -71,6 +71,7 @@ install_data( install_headers( 'plperl.h', + 'plperl_system.h', 'ppport.h', install_dir: dir_include_server, ) diff --git a/src/pl/plperl/plperl.h b/src/pl/plperl/plperl.h index 5d8e51089a..7f338084a9 100644 --- a/src/pl/plperl/plperl.h +++ b/src/pl/plperl/plperl.h @@ -18,200 +18,11 @@ /* defines free() by way of system headers, so must be included before perl.h */ #include "mb/pg_wchar.h" -/* stop perl headers from hijacking stdio and other stuff on Windows */ -#ifdef WIN32 -#define WIN32IO_IS_STDIO -#endif /* WIN32 */ - /* - * Supply a value of PERL_UNUSED_DECL that will satisfy gcc - the one - * perl itself supplies doesn't seem to. + * Pull in Perl headers via a wrapper header, to control the scope of + * the system_header pragma therein. */ -#define PERL_UNUSED_DECL pg_attribute_unused() - -/* - * Sometimes perl carefully scribbles on our *printf macros. - * So we undefine them here and redefine them after it's done its dirty deed. - */ -#undef vsnprintf -#undef snprintf -#undef vsprintf -#undef sprintf -#undef vfprintf -#undef fprintf -#undef vprintf -#undef printf - -/* - * Perl scribbles on the "_" macro too. - */ -#undef _ - -/* - * ActivePerl 5.18 and later are MinGW-built, and their headers use GCC's - * __inline__. Translate to something MSVC recognizes. Also, perl.h sometimes - * defines isnan, so undefine it here and put back the definition later if - * perl.h doesn't. - */ -#ifdef _MSC_VER -#define __inline__ inline -#ifdef isnan -#undef isnan -#endif -/* Work around for using MSVC and Strawberry Perl >= 5.30. */ -#define __builtin_expect(expr, val) (expr) -#endif - -/* - * Regarding bool, both PostgreSQL and Perl might use stdbool.h or not, - * depending on configuration. If both agree, things are relatively harmless. - * If not, things get tricky. If PostgreSQL does but Perl does not, define - * HAS_BOOL here so that Perl does not redefine bool; this avoids compiler - * warnings. If PostgreSQL does not but Perl does, we need to undefine bool - * after we include the Perl headers; see below. 
- */ -#ifdef PG_USE_STDBOOL -#define HAS_BOOL 1 -#endif - -/* - * Newer versions of the perl headers trigger a lot of warnings with our - * compiler flags (at least -Wdeclaration-after-statement, - * -Wshadow=compatible-local are known to be problematic). The system_header - * pragma hides warnings from within the rest of this file, if supported. - */ -#ifdef HAVE_PRAGMA_GCC_SYSTEM_HEADER -#pragma GCC system_header -#endif - -/* - * Get the basic Perl API. We use PERL_NO_GET_CONTEXT mode so that our code - * can compile against MULTIPLICITY Perl builds without including XSUB.h. - */ -#define PERL_NO_GET_CONTEXT -#include "EXTERN.h" -#include "perl.h" - -/* - * We want to include XSUB.h only within .xs files, because on some platforms - * it undesirably redefines a lot of libc functions. But it must appear - * before ppport.h, so use a #define flag to control inclusion here. - */ -#ifdef PG_NEED_PERL_XSUB_H -/* - * On Windows, win32_port.h defines macros for a lot of these same functions. - * To avoid compiler warnings when XSUB.h redefines them, #undef our versions. - */ -#ifdef WIN32 -#undef accept -#undef bind -#undef connect -#undef fopen -#undef fstat -#undef kill -#undef listen -#undef lstat -#undef mkdir -#undef open -#undef putenv -#undef recv -#undef rename -#undef select -#undef send -#undef socket -#undef stat -#undef unlink -#endif - -#include "XSUB.h" -#endif - -/* put back our
No LINGUAS file yet for pg_combinebackup
Since dc2123400, any "make" in src/bin/pg_combinebackup whines cat: ./po/LINGUAS: No such file or directory if you have selected --enable-nls. This is evidently from AVAIL_LANGUAGES := $(shell cat $(srcdir)/po/LINGUAS) I don't particularly care to see that warning until whenever it is that the translations first get populated. Perhaps we should hack up nls-global.mk to hide the warning, but that might bite somebody someday. I'm inclined to just add an empty LINGUAS file as a temporary measure. Thoughts? regards, tom lane
Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)
Hi, Alexander! On Mon, 25 Dec 2023 at 02:51, Alexander Korotkov wrote: > On Tue, Dec 12, 2023 at 3:22 PM Alexander Korotkov > wrote: > > > > On Mon, Dec 11, 2023 at 6:16 PM Peter Geoghegan wrote: > > > Will you be in Prague this week? If not this might have to wait. > > > > Sorry, I wouldn't be in Prague this week. Due to my current > > immigration status, I can't travel. > > I wish you to have a lovely time in Prague. I'm OK to wait, review > > once you can. I will probably provide a more polished version > > meanwhile. > > Please find the revised patchset attached. It comes with revised > comments and commit messages. Besides bug fixing the second patch > makes optimization easier to understand. Now the flag used for > skipping checks of same direction required keys is named > continuescanPrechecked and means exactly that *continuescan flag is > known to be true for the last item on the page. > > Any objections to pushing these two patches? I've reviewed both patches: 0001 is a pure refactoring that replaces passing an argument via a struct member with passing it explicitly as a function argument. It's justified by the fact that firstPage is used in only a few places. The patch looks simple and good enough. 0002: continuescanPrechecked is semantically much better than the previous requiredMatchedByPrecheck, which confused me earlier. Thanks! From the new comments, it is still a little bit hard to understand who does what. Phrasing them in terms of "if the caller told us" seems clearer to me. Could you give particular attention to these comments: "If they wouldn't be matched, then the *continuescan flag would be set for the current item and the last item on the page accordingly." "If the key is required for the opposite direction scan, we need to know there was already at least one matching item on the page. For those keys." > Prechecking the value of the continuescan flag for the last item on the >+ * page (according to the scan direction). Maybe, in this case, it would be clearer as: "...(for a backwards scan it will be the first item on the page)" Otherwise, patch 0002 looks like a good fix for the bug and ready to be pushed. Kind regards, Pavel Borisov
Re: Show WAL write and fsync stats in pg_stat_io
On Mon, Dec 25, 2023 at 04:09:34PM +0300, Nazir Bilal Yavuz wrote: > On Wed, 9 Aug 2023 at 21:52, Melanie Plageman > wrote: >> If there is any combination of BackendType and IOContext which will >> always read XLOG_BLCKSZ bytes, we could use XLOG_BLCKSZ for that row's >> op_bytes. For other cases, we may have to consider using op_bytes 1 and >> tracking reads and write IOOps in number of bytes (instead of number of >> pages). I don't actually know if there is a clear separation by >> BackendType for these different cases. > > Using op_bytes as 1 solves this problem but since it will be different > from the rest of the pg_stat_io view it could be hard to understand. > There is no clear separation by backends as it can be seen from the walsender. I find the use of 1 in this context a bit confusing, because a counter at N can be understood as meaning that the operation was done N times, while the actual number of operations would be much smaller. Another solution would be to use NULL (as a synonym for "I don't know") and then document that in this case all the bigint counters of pg_stat_io track the number of bytes rather than the number of operations? >> The other alternative I see is to use XLOG_BLCKSZ as the op_bytes and >> treat op_bytes * number of reads as an approximation of the number of >> bytes read. I don't actually know what makes more sense. I don't think I >> would like having a number for bytes that is not accurate. > > Also, we have a similar problem in XLogPageRead() in xlogrecovery.c. > pg_pread() call tries to read XLOG_BLCKSZ but it is not certain and we > don't count IO if it couldn't read XLOG_BLCKSZ. IMO, this is not as > important as the previous problem but it still is a problem. > > I would be glad to hear opinions on these problems. Correctness matters a lot for monitoring, IMO. -- Michael signature.asc Description: PGP signature
Re: pread, pwrite, etc return ssize_t not int
On Mon, Dec 25, 2023 at 7:09 AM Tom Lane wrote: > Coverity whinged this morning about the following bit in > the new pg_combinebackup code: > > 644 unsignedrb; > 645 > 646 /* Read the block from the correct source, except if > dry-run. */ > 647 rb = pg_pread(s->fd, buffer, BLCKSZ, offsetmap[i]); > 648 if (rb != BLCKSZ) > 649 { > >>> CID 1559912: Control flow issues (NO_EFFECT) > >>> This less-than-zero comparison of an unsigned value is never true. > >>> "rb < 0U". > 650 if (rb < 0) > 651 pg_fatal("could not read file \"%s\": %m", > s->filename); > > It's dead right to complain of course. (I kind of think that the > majority of places where reconstruct.c is using "unsigned" variables > are poorly-thought-through; many of them look like they should be > size_t, and I suspect some other ones beside this one are flat wrong > or at least unnecessarily fragile. But I digress.) Yeah. > While looking around for other places that might've made comparable > mistakes, I noted that we have places that are storing the result of > pg_pread[v]/pg_pwrite[v] into an "int" variable even though they are > passing a size_t count argument that there is no obvious reason to > believe must fit in int. This seems like trouble waiting to happen, > so I fixed some of these in the attached. The major remaining place > that I think we ought to change is the newly-minted > FileRead[V]/FileWrite[V] functions, which are declared to return int > but really should be returning ssize_t IMO. I didn't do that here > though. Agreed in theory. Note that we've only been using size_t in fd.c functions since: commit 2d4f1ba6cfc2f0a977f1c30bda9848041343e248 Author: Peter Eisentraut Date: Thu Dec 8 08:51:38 2022 +0100 Update types in File API Make the argument types of the File API match stdio better: - Change the data buffer to void *, from char *. - Change FileWrite() data buffer to const on top of that. - Change amounts to size_t, from int. I guess it was an oversight not to change the return type to match at the same time. That said, I think that would only be for tidiness. Some assorted observations: 1. We don't yet require "large file" support, meaning that we use off_t our fd.c and lseek()/p*() replacement functions, but we know it is only 32 bits on Windows, and we avoid creating large files. I think that means that a hypothetical very large write would break that assumption, creating data whose position cannot be named in those calls. We could fix that on Windows by adjusting our wrappers, either to work with pgoff_t instead off off_t (yuck) or redefining off_t (yuck), but given that both options are gross, so far we have imagined that we should move towards using large files only conditionally, on sizeof(off_t) >= 8. 2. Windows' native read() and write() functions still have prototypes like 1988 POSIX, eg int read(int filedes, void *buf, unsigned int nbyte). It was 2001 POSIX that changed them to ssize_t read(int filedesc, void *buf, size_t nbytes). I doubt there is much that can be done about that, except perhaps adding a wrapper that caps it. Seems like overkill for a hypothetical sort of a problem... 3. Windows' native ReadFile() and WriteFile() functions, which our 'positionified' pg_p*() wrapper functions use, work in terms of DWORD = unsigned long which is 32 bit on that cursed ABI. Our wrappers should probably cap. 
I think a number of Unixoid systems implemented the POSIX interface change by capping internally, which doesn't matter much in practice because no one really tries to transfer gigabytes at once, and any non-trivial transfer size probably requires handling short transfers. For example, man read on Linux: On Linux, read() (and similar system calls) will transfer at most 0x7ffff000 (2,147,479,552) bytes, returning the number of bytes actually transferred. (This is true on both 32-bit and 64-bit systems.) > We could go further by insisting that *all* uses of pg_pread/pg_pwrite > use ssize_t result variables. I think that's probably overkill --- in > the example above, which is only asking to write BLCKSZ worth of data, > surely an int is sufficient. But you could argue that allowing this > pattern at all creates risk of copy/paste errors. Yeah. > Of course the real elephant in the room is that plain old read(2) > and write(2) also return ssize_t. I've not attempted to vet every > call of those, and I think it'd likely be a waste of effort, as > we're unlikely to ever try to shove more than INT_MAX worth of > data through them. But it's a bit harder to make that argument > for the iovec-based file APIs. I think we ought to try to keep > our uses of those functions clean on this point. Yeah, I think it's OK for a caller that knows it's passing in an int value to (implicitly) cast the return to int.
Re: Statistics Import and Export
Hi, I finally had time to look at the last version of the patch, so here's a couple thoughts and questions in somewhat random order. Please take this as a bit of a brainstorming and push back if you disagree some of my comments. In general, I like the goal of this patch - not having statistics is a common issue after an upgrade, and people sometimes don't even realize they need to run analyze. So, it's definitely worth improving. I'm not entirely sure about the other use case - allowing people to tweak optimizer statistics on a running cluster, to see what would be the plan in that case. Or more precisely - I agree that would be an interesting and useful feature, but maybe the interface should not be the same as for the binary upgrade use case? interfaces -- When I thought about the ability to dump/load statistics in the past, I usually envisioned some sort of DDL that would do the export and import. So for example we'd have EXPORT STATISTICS / IMPORT STATISTICS commands, or something like that, and that'd do all the work. This would mean stats are "first-class citizens" and it'd be fairly straightforward to add this into pg_dump, for example. Or at least I think so ... Alternatively we could have the usual "functional" interface, with a functions to export/import statistics, replacing the DDL commands. Unfortunately, none of this works for the pg_upgrade use case, because existing cluster versions would not support this new interface, of course. That's a significant flaw, as it'd make this useful only for upgrades of future versions. So I think for the pg_upgrade use case, we don't have much choice other than using "custom" export through a view, which is what the patch does. However, for the other use case (tweaking optimizer stats) this is not really an issue - that always happens on the same instance, so no issue with not having the "export" function and so on. I'd bet there are more convenient ways to do this than using the export view. I'm sure it could share a lot of the infrastructure, ofc. I suggest we focus on the pg_upgrade use case for now. In particular, I think we really need to find a good way to integrate this into pg_upgrade. I'm not against having custom CLI commands, but it's still a manual thing - I wonder if we could extend pg_dump to dump stats, or make it built-in into pg_upgrade in some way (possibly disabled by default, or something like that). JSON format --- As for the JSON format, I wonder if we need that at all? Isn't that an unnecessary layer of indirection? Couldn't we simply dump pg_statistic and pg_statistic_ext_data in CSV, or something like that? The amount of new JSONB code seems to be very small, so it's OK I guess. I'm still a bit unsure about the "right" JSON schema. I find it a bit inconvenient that the JSON objects mimic the pg_statistic schema very closely. In particular, it has one array for stakind values, another array for stavalues, array for stanumbers etc. I understand generating this JSON in SQL is fairly straightforward, and for the pg_upgrade use case it's probably OK. But my concern is it's not very convenient for the "manual tweaking" use case, because the "related" fields are scattered in different parts of the JSON. 
That's pretty much why I envisioned a format "grouping" the arrays for a particular type of statistics (MCV, histogram) into the same object, as for example in { "mcv" : {"values" : [...], "frequencies" : [...]} "histogram" : {"bounds" : [...]} } But that's probably much harder to generate from plain SQL (at least I think so, I haven't tried). data missing in the export -- I think the data needs to include more information. Maybe not for the pg_upgrade use case, where it's mostly guaranteed not to change, but for the "manual tweak" use case it can change. And I don't think we want two different formats - we want one, working for everything. Consider for example about the staopN and stacollN fields - if we clone the stats from one table to the other, and the table uses different collations, will that still work? Similarly, I think we should include the type of each column, because it's absolutely not guaranteed the import function will fail if the type changes. For example, if the type changes from integer to text, that will work, but the ordering will absolutely not be the same. And so on. For the extended statistics export, I think we need to include also the attribute names and expressions, because these can be different between the statistics. And not only that - the statistics values reference the attributes by positions, but if the two tables have the attributes in a different order (when ordered by attnum), that will break stuff. more strict checks -- I think the code should be a bit more "defensive" when importing stuff, and do at least some sanity checks. For the pg_upgrade use case this should be mostly non-issue (except for maybe helping to d
Re: Track in pg_replication_slots the reason why slots conflict?
On Thu, Dec 21, 2023 at 8:21 PM Bertrand Drouvot wrote: > > On Thu, Dec 21, 2023 at 07:55:51PM +0530, Amit Kapila wrote: > > On Thu, Dec 21, 2023 at 5:05 PM Andres Freund wrote: > > > I'm not entirely sure I understand the difference - just whether we add > > > one > > > new column or replace the existing 'conflicting' column? I can see > > > arguments > > > for either. > > > > > > > Agreed. I think the argument against replacing the existing > > 'conflicting' column is that there is a chance that it is being used > > by some monitoring script which I guess shouldn't be a big deal to > > change. So, if we don't see that as a problem, I would prefer to have > > a single column with conflict reason where one of its values indicates > > there is no conflict. > > +1 > Does anyone else have a preference on whether to change the existing column or add a new one? -- With Regards, Amit Kapila.
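To make the two options concrete: the view currently exposes only a boolean, and the question is whether to add a reason column next to it or to fold both into a single column whose value also indicates "no conflict". A rough sketch (the conflict_reason column name below is purely illustrative, not an existing or agreed-upon name):

    -- existing boolean column
    SELECT slot_name, conflicting FROM pg_replication_slots;

    -- single-column variant under discussion (hypothetical column name)
    SELECT slot_name, conflict_reason FROM pg_replication_slots;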
Re: Avoid computing ORDER BY junk columns unnecessarily
On Fri, Dec 22, 2023 at 2:38 AM Heikki Linnakangas wrote: > v1-0004-Omit-columns-from-final-tlist-that-were-only-need.patch > > The main patch in this series. This patch filters out the junk columns created for ORDER BY/GROUP BY, and retains the junk columns created for RowLocks. I'm afraid this may have a problem with out-of-order resnos. For instance, create table mytable (foo text, bar text); # explain select foo from mytable order by bar for update of mytable; server closed the connection unexpectedly This query triggers another Assert in apply_tlist_labeling: Assert(dest_tle->resno == src_tle->resno); At first there are three TargetEntry items: foo, bar and ctid, with resnos 1, 2 and 3. Then the second item 'bar' is removed, leaving only two items: foo and ctid, with resnos 1 and 3. So now we have a missing resno, and we eventually hit the Assert. Thanks Richard
Re: Optimization outcome depends on the index order
On 25/12/2023 18:36, Alexander Korotkov wrote: On Fri, Dec 22, 2023 at 7:24 PM Andrei Lepikhov wrote: On 22/12/2023 11:48, Alexander Korotkov wrote: > Because we must trust all predictions made by the planner, we just > choose the most trustworthy path. According to the planner logic, it is > a path with a smaller selectivity. We can make mistakes anyway just > because of the nature of estimation. Even if we need to take selectivity into account here, it's still not clear why this should be on top of other logic later in add_path(). I got your point now, thanks for pointing it out. In the next version of the patch selectivity is used as a criteria only in the case of COSTS_EQUAL. It looks better now. But it's hard for me to judge these heuristics in add_path(). Tom, what do you think about this? Just food for thought: Another approach I have considered was to initialize the relation index list according to some consistent rule: place unique indexes at the head of the list, arrange indexes according to the number of columns involved and sort in some lexical order. But IMO, the implemented approach looks better. -- regards, Andrei Lepikhov Postgres Professional
Re: [PoC] Improve dead tuple storage for lazy vacuum
On Thu, Dec 21, 2023 at 4:41 PM John Naylor wrote: > > On Thu, Dec 21, 2023 at 8:33 AM Masahiko Sawada wrote: > > > > I found the following comment and wanted to discuss: > > > > // this might be better as "iterate over nodes", plus a callback to > > RT_DUMP_NODE, > > // which should really only concern itself with single nodes > > RT_SCOPE void > > RT_DUMP(RT_RADIX_TREE *tree) > > > > If it means we need to somehow use the iteration functions also for > > dumping the whole tree, it would probably need to refactor the > > iteration codes so that the RT_DUMP() can use them while dumping > > visited nodes. But we need to be careful of not adding overheads to > > the iteration performance. > > Yeah, some months ago I thought a callback interface would make some > things easier. I don't think we need that at the moment (possibly > never), so that comment can be just removed. As far as these debug > functions, I only found useful the stats and dumping a single node, > FWIW. > > I've attached v47, which is v46 plus some fixes for radix tree. > > 0004 - moves everything for "delete" to the end -- gradually other > things will be grouped together in a sensible order > > 0005 - trivial LGTM. > > 0006 - shrink nodes -- still needs testing, but nothing crashes yet. Cool. The coverage test results showed the shrink codes are also covered. > This shows some renaming might be good: Previously we had > RT_CHUNK_CHILDREN_ARRAY_COPY for growing nodes, but for shrinking I've > added RT_COPY_ARRAYS_AND_DELETE, since the deletion happens by simply > not copying the slot to be deleted. This means when growing it would > be more clear to call the former RT_COPY_ARRAYS_FOR_INSERT, since that > reserves a new slot for the caller in the new node, but the caller > must do the insert itself. Agreed. > Note that there are some practical > restrictions/best-practices on whether shrinking should happen after > deletion or vice versa. Hopefully it's clear, but let me know if the > description can be improved. Also, it doesn't yet shrink from size > class 32 to 16, but it could with a bit of work. Sounds reasonable. > > 0007 - trivial, but could use a better comment. I also need to make > sure stats reporting works (may also need some cleanup work). > > 0008 - fixes RT_FREE_RECURSE -- I believe you wondered some months ago > if DSA could just free all our allocated segments without throwing > away the DSA, and that's still a good question. LGTM. > > 0009 - fixes the assert in RT_ITER_SET_NODE_FROM (btw, I don't think > this name is better than RT_UPDATE_ITER_STACK, so maybe we should go > back to that). Will rename it. > The assert doesn't fire, so I guess it does what it's > supposed to? Yes. > For me, the iteration logic is still the most confusing > piece out of the whole radix tree. Maybe that could be helped with > some better variable names, but I wonder if it needs more invasive > work. True. Maybe more comments would also help. > > 0010 - some fixes for number of children accounting in node256 > > 0011 - Long overdue pgindent of radixtree.h, without trying to fix up > afterwards. Feel free to throw out and redo if this interferes with > ongoing work. > LGTM. I'm working on the below review comments and most of them are already incorporated on the local branch: > The rest are from your v46. The bench doesn't work for tid store > anymore, so I squashed "disable bench for CI" until we get back to > that. 
Some more review comments (note: patch numbers are for v47, but > I changed nothing from v46 in this area): > > 0013: > > + * Internally, a tid is encoded as a pair of 64-bit key and 64-bit value, > + * and stored in the radix tree. > > Recently outdated. The variable length values seems to work, so let's > make everything match. > > +#define MAX_TUPLES_PER_PAGE MaxOffsetNumber > > Maybe we don't need this macro anymore? The name no longer fits, in any case. Removed. > > +TidStoreSetBlockOffsets(TidStore *ts, BlockNumber blkno, OffsetNumber > *offsets, > + int num_offsets) > +{ > + char buf[MaxBlocktableEntrySize]; > + BlocktableEntry *page = (BlocktableEntry *) buf; > > I'm not sure this is safe with alignment. Maybe rather than plain > "char", it needs to be a union with BlocktableEntry, or something. I tried it in the new patch set but could you explain why it could not be safe with alignment? > > +static inline BlocktableEntry * > +tidstore_iter_kv(TidStoreIter *iter, uint64 *key) > +{ > + if (TidStoreIsShared(iter->ts)) > + return shared_rt_iterate_next(iter->tree_iter.shared, key); > + > + return local_rt_iterate_next(iter->tree_iter.local, key); > +} > > In the old encoding scheme, this function did something important, but > now it's a useless wrapper with one caller. Removed. > > + /* > + * In the shared case, TidStoreControl and radix_tree are backed by the > + * same DSA area and rt_memory_usage() returns the value including both. > + * So we don't need to add the size of TidStoreControl separ