kerberos/001_auth test fails on arm CPU darwin
Hi hackers, I wanted to add ARM CPU darwin to the CI, but it seems that the kerberos/001_auth test fails on ARM CPU darwin. OS: Darwin admins-Virtual-Machine.local 21.6.0 Darwin Kernel Version 21.6.0: Wed Aug 10 14:26:07 PDT 2022; root:xnu-8020.141.5~2/RELEASE_ARM64_VMAPPLE arm64 Error message: Can't exec "kdb5_util": No such file or directory at /Users/admin/pgsql/src/test/perl/PostgreSQL/Test/Utils.pm line 338. [02:53:37.177](0.043s) Bail out! failed to execute command "kdb5_util create -s -P secret0": No such file or directory It seems that Kerberos is installed at '/opt/homebrew/opt/krb5' on ARM CPU darwin instances instead of '/usr/local/opt/krb5'. I attached two patches: 0001-ci-Add-arm-CPU-for-darwin.patch adds ARM CPU darwin to the CI. 0002-fix-darwin-ARM-CPU-darwin-krb5-path-fix.patch fixes the error. CI run after ARM CPU darwin is added: https://cirrus-ci.com/build/5772792711872512 CI run after the fix is applied: https://cirrus-ci.com/build/5686842363215872 Regards, Nazir Bilal Yavuz 0001-ci-Add-arm-CPU-for-darwin.patch Description: Binary data 0002-fix-darwin-ARM-CPU-darwin-krb5-path-fix.patch Description: Binary data
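For context, the fix amounts to choosing the Homebrew krb5 prefix per architecture before the test invokes kdb5_util. A minimal sketch of the idea in Perl (the variable names here are illustrative, not necessarily what the attached patch uses):

my $krb5_prefix;
if ($^O eq 'darwin' && -d '/opt/homebrew/opt/krb5')
{
	# Homebrew on Apple Silicon installs under /opt/homebrew.
	$krb5_prefix = '/opt/homebrew/opt/krb5';
}
elsif ($^O eq 'darwin')
{
	# Homebrew on Intel macOS installs under /usr/local.
	$krb5_prefix = '/usr/local/opt/krb5';
}
my $kdb5_util = "$krb5_prefix/sbin/kdb5_util";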
Re: [RFC] building postgres with meson - v10
Hi, On 2022-07-18 23:23:27 +0300, Andres Freund wrote: > Bilal, Peter previously commented on the pg_regress change for ecpg, > perhaps > you can comment on that? > > In > https://postgr.es/m/0e81e45c-c9a5-e95b-2782-ab2dfec8bf57%40enterprisedb.com > On 2022-07-06 11:03:31 +0200, Peter Eisentraut wrote: > > dff7b5a960 meson: prereq: regress: allow to specify directory containing > > expected files. > > > > This could use a bit more explanation, but it doesn't look > > controversial so far. While testing ECPG, the C and executable files are generated by meson, so these files end up in meson's build directory, but the expected files are in the source directory. However, there was no way to set different paths for the inputs' (C and executable files') and the expected files' directories. So, I added `--expecteddir` to set the expected files' directory separately. Greetings, Nazir Bilal Yavuz On Mon, 18 Jul 2022 at 23:23, Andres Freund wrote: > Hi, > > On 2022-07-18 11:33:09 +0200, Peter Eisentraut wrote: > > The following patches are ok to commit IMO: > > > > a1c5542929 prereq: Deal with paths containing \ and spaces in > basebackup_to_shell tests > > e37951875d meson: prereq: psql: Output dir and dependency generation for > sql_help > > 18cc9fbd02 meson: prereq: ecpg: Add and use output directory argument > for preproc/*.pl > > 58a32694e9 meson: prereq: Move snowball_create.sql creation into perl > file > > 59b8bffdaf meson: prereq: Add output path arg in generate-lwlocknames.pl > > 2db97b00d5 meson: prereq: generate-errcodes.pl: Accept output file > > fb8f52f21d meson: prereq: unicode: Allow to specify output directory > > 8f1e4410d6 meson: prereq: Refactor dtrace postprocessing make rules > > 3d18a20b11 meson: prereq: Add --outdir option to gen_node_support.pl > > I pushed these. Thanks for the reviews and patches! > > The symbol export stuff has also been pushed (discussed in a separate > thread). > > It's nice to see the meson patchset length reduced by this much. > > I pushed a rebased version of the remaining branches to git. I'll be on > vacation for a bit, I'm not sure I can get a new version with further > cleanups > out before. > > > Given that we can't use src/tools/gen_versioning_script.pl for the make > build, > due to not depending on perl for tarball builds, I'm inclined to rewrite it > in python (which we depend on via meson anyway) and consider it a meson > specific > wrapper? > > > Bilal, Peter previously commented on the pg_regress change for ecpg, > perhaps > you can comment on that? > > In > https://postgr.es/m/0e81e45c-c9a5-e95b-2782-ab2dfec8bf57%40enterprisedb.com > On 2022-07-06 11:03:31 +0200, Peter Eisentraut wrote: > > dff7b5a960 meson: prereq: regress: allow to specify directory containing > > expected files. > > > > This could use a bit more explanation, but it doesn't look > > controversial so far. > > Greetings, > > Andres Freund >
Re: [RFC] building postgres with meson - v10
Hi, Sorry for the first email. On Mon, 18 Jul 2022 at 23:23, Andres Freund wrote: > > In https://postgr.es/m/0e81e45c-c9a5-e95b-2782-ab2dfec8bf57%40enterprisedb.com > On 2022-07-06 11:03:31 +0200, Peter Eisentraut wrote: > > dff7b5a960 meson: prereq: regress: allow to specify directory containing > > expected files. > > > > This could use a bit more explanation, but it doesn't look > > controversial so far. While testing ECPG, the C and executable files are generated by meson, so these files end up in meson's build directory, but the expected files are in the source directory. However, there was no way to set different paths for the inputs' (C and executable files') and the expected files' directories. So, I added `--expecteddir` to set the expected files' directory separately. Greetings, Nazir Bilal Yavuz
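To make the new option concrete, an ecpg pg_regress invocation could then point the inputs at the build tree and the expected files at the source tree, roughly like this (the paths are illustrative only):

pg_regress --inputdir=/path/to/build/src/interfaces/ecpg/test \
           --expecteddir=/path/to/source/src/interfaces/ecpg/test \
           ...

Without --expecteddir, pg_regress assumes the expected files live next to the inputs, which is not true for out-of-tree meson builds.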
Re: Show WAL write and fsync stats in pg_stat_io
Hi, On Wed, 10 Jan 2024 at 08:25, Michael Paquier wrote: > > On Wed, Jan 03, 2024 at 04:10:58PM +0300, Nazir Bilal Yavuz wrote: > > > > I thought removing op_bytes completely ( as you said "This patch > > extends it with two more operation sizes, and there are even cases > > where it may be a variable" ) from pg_stat_io view then adding > > something like {read | write | extend}_bytes and {read | write | > > extend}_calls could be better, so that we don't lose any information. > > But then you'd lose the possibility to analyze correlations between > the size and the number of the operations, which is something that > matters for more complex I/O scenarios. This does not need to be > tackled in this patch, which is useful on its own, though I am really > wondering if this is required for the recent work done by Thomas. > Perhaps Andres, Thomas or Melanie could comment on that? Yes, you are right. > >> Yeah, a limitation like that may be acceptable for now. Tracking the > >> WAL writer and WAL sender activities can be relevant in a lot of cases > >> even if we don't have the full picture for the WAL receiver yet. > > > > I added that and disabled B_WAL_RECEIVER backend with comments > > explaining why. v8 is attached. > > I can see that's what you have been adding here, which should be OK: > > > -if (track_io_timing) > > +/* > > + * B_WAL_RECEIVER backend does IOOBJECT_WAL IOObject & IOOP_READ IOOp > > IOs > > + * but these IOs are not countable for now because IOOP_READ IOs' > > op_bytes > > + * (number of bytes per unit of I/O) might not be the same all the > > time. > > + * The current implementation requires that the op_bytes must be the > > same > > + * for the same IOObject, IOContext and IOOp. To avoid confusion, the > > + * B_WAL_RECEIVER backend & IOOBJECT_WAL IOObject IOs are disabled for > > + * now. > > + */ > > +if (MyBackendType == B_WAL_RECEIVER && io_object == IOOBJECT_WAL) > > +return; > > This could be worded better, but that's one of these nits from me I > usually tweak when committing stuff. Thanks for doing that! Do you have any specific comments that can help improve it? > > +/* > > + * Decide if IO timings need to be tracked. Timings associated to > > + * IOOBJECT_WAL objects are tracked if track_wal_io_timing is enabled, > > + * else rely on track_io_timing. > > + */ > > +static bool > > +pgstat_should_track_io_time(IOObject io_object) > > +{ > > +if (io_object == IOOBJECT_WAL) > > +return track_wal_io_timing; > > + > > +return track_io_timing; > > +} > > One thing I was also considering is if eliminating this routine would > make pgstat_count_io_op_time() more readable as a result, but I cannot > get to that. I could not think of a way to eliminate the pgstat_should_track_io_time() routine without causing performance regressions. What do you think about moving the body of the 'if (pgstat_should_track_io_time(io_object))' check into another function and calling that function from pgstat_count_io_op_time()? This does not change any behavior, but IMO it increases the readability. 
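For illustration, the readability refactoring I have in mind would look roughly like this (a sketch only: the helper name is made up and the parameter lists are abbreviated, so this is not code from the patch):

/* Hypothetical helper holding the body of the timing branch. */
static void
pgstat_account_io_time(IOObject io_object, IOContext io_context,
					   IOOp io_op, instr_time io_time)
{
	/* ... all of the accounting currently done under the if ... */
}

void
pgstat_count_io_op_time(IOObject io_object, IOContext io_context,
						IOOp io_op, instr_time start_time)
{
	if (pgstat_should_track_io_time(io_object))
	{
		instr_time	io_time;

		INSTR_TIME_SET_CURRENT(io_time);
		INSTR_TIME_SUBTRACT(io_time, start_time);
		pgstat_account_io_time(io_object, io_context, io_op, io_time);
	}

	pgstat_count_io_op(io_object, io_context, io_op);
}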
> > if (io_op == IOOP_WRITE || io_op == IOOP_EXTEND) > > { > > - > > pgstat_count_buffer_write_time(INSTR_TIME_GET_MICROSEC(io_time)); > > +if (io_object != IOOBJECT_WAL) > > + > > pgstat_count_buffer_write_time(INSTR_TIME_GET_MICROSEC(io_time)); > > + > > if (io_object == IOOBJECT_RELATION) > > INSTR_TIME_ADD(pgBufferUsage.shared_blk_write_time, > > io_time); > > else if (io_object == IOOBJECT_TEMP_RELATION) > > @@ -139,7 +177,9 @@ pgstat_count_io_op_time(IOObject io_object, IOContext > > io_context, IOOp io_op, > > } > > else if (io_op == IOOP_READ) > > { > > - > > pgstat_count_buffer_read_time(INSTR_TIME_GET_MICROSEC(io_time)); > > +if (io_object != IOOBJECT_WAL) > > + > > pgstat_count_buffer_read_time(INSTR_TIME_GET_MICROSEC(io_time)); > > + > > if (io_object == IOOBJECT_RELATION) > > INSTR_TIME_ADD(pgBufferUsage.shared_blk_read_time, > > io_time); > > else if (io_object == IOOBJECT_TEMP_RELATION) > > A second thing is if this would be better with more switch/cases, say: > switch (io_op): > { > case IOOP_EXTEND: > case IOOP_WRITE: > switch (io_object): > {
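Since Michael's mail is cut off above, I read the switch/case suggestion as something along these lines (a hypothetical sketch, not code from any posted patch):

switch (io_op)
{
	case IOOP_EXTEND:
	case IOOP_WRITE:
		if (io_object != IOOBJECT_WAL)
			pgstat_count_buffer_write_time(INSTR_TIME_GET_MICROSEC(io_time));

		switch (io_object)
		{
			case IOOBJECT_RELATION:
				INSTR_TIME_ADD(pgBufferUsage.shared_blk_write_time, io_time);
				break;
			case IOOBJECT_TEMP_RELATION:
				INSTR_TIME_ADD(pgBufferUsage.local_blk_write_time, io_time);
				break;
			default:
				break;
		}
		break;

	case IOOP_READ:
		/* ... the analogous accounting for reads ... */
		break;

	default:
		break;
}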
Re: Show WAL write and fsync stats in pg_stat_io
Hi, On Thu, 11 Jan 2024 at 08:01, Michael Paquier wrote: > > On Wed, Jan 10, 2024 at 07:24:50PM -0500, Melanie Plageman wrote: > > I have code review feedback as well, but I've saved that for my next email. > > Ah, cool. > > > On Wed, Jan 3, 2024 at 8:11 AM Nazir Bilal Yavuz wrote: > >> On Sun, 31 Dec 2023 at 03:58, Michael Paquier wrote: > >> Oh, I understand it now. Yes, that makes sense. > >> I thought removing op_bytes completely ( as you said "This patch > >> extends it with two more operation sizes, and there are even cases > >> where it may be a variable" ) from pg_stat_io view then adding > >> something like {read | write | extend}_bytes and {read | write | > >> extend}_calls could be better, so that we don't lose any information. > > > > Upthread, Michael says: > > > >> I find the use of 1 in this context a bit confusing, because when > >> referring to a counter at N, then it can be understood as doing N > >> times an operation, > > > > I didn't understand this argument, so I'm not sure if I agree or > > disagree with it. > > Nazir has mentioned upthread one thing: what should we do for the case > where a combination of (io_object,io_context) does I/O with a > *variable* op_bytes, because that may be the case for the WAL > receiver? For this case, he has mentioned that we should set op_bytes > to 1, but that's something I find confusing because it would mean that > we are doing read, writes or extends 1 byte at a time. My suggestion > would be to use op_bytes = -1 or NULL for the variable case instead, > with reads, writes and extends referring to a number of bytes rather > than a number of operations. I agree, but we can't do this only for the *variable* cases since B_WAL_RECEIVER and other backends use the same pgstat_count_io_op_time(IOObject, IOContext, ...) call. What I mean is, if two backends use the same pgstat_count_io_op_time() function call in the code, they must count the same thing (number of calls, bytes, etc.). It could be better to count the number of bytes for all the IOOBJECT_WAL IOs. > > I think these are the three proposals for handling WAL reads: > > > > 1) setting op_bytes to 1 and the number of reads is the number of bytes > > 2) setting op_bytes to XLOG_BLCKSZ and the number of reads is the > > number of calls to pg_pread() or similar > > 3) setting op_bytes to NULL and the number of reads is the number of > > calls to pg_pread() or similar > > 3) could be a number of bytes, actually. One important point is that we can't change only reads: if we decide to count the number of bytes for reads, then writes and extends should be counted as a number of bytes as well. > > Looking at the patch, I think it is still doing 2. > > The patch disables stats for the WAL receiver, while the startup > process reads WAL with XLOG_BLCKSZ, so yeah that's 2) with a trick to > discard the variable case. > > > For an unpopular idea: we could add separate [IOOp]_bytes columns for > > all those IOOps for which it would be relevant. It kind of stinks but > > it would give us the freedom to document exactly what a single IOOp > > means for each combination of BackendType, IOContext, IOObject, and > > IOOp (as relevant) and still have an accurate number in the *bytes > > columns. Everyone will probably hate us if we do that, though. > > Especially because having bytes for the existing IOObjects is an > > existing feature. 
> > An issue I have with this one is that having multiple tuples for > each (object,context) if they have multiple op_bytes leads to > potentially a lot of bloat in the view. That would be up to 8k extra > tuples just for the sake of op_byte's variability. Agreed; that amount of bloat doesn't seem acceptable to me. > > A separate question: suppose [1] goes in (to read WAL from WAL buffers > > directly). Now, WAL reads are not from permanent storage anymore. Are > > we only tracking permanent storage I/O in pg_stat_io? I also had this > > question for some of the WAL receiver functions. Should we track any > > I/O other than permanent storage I/O? Or did I miss this being > > addressed upthread? > > That's a good point. I guess that this should just be a different > IOOp? That's not an IOOP_READ. An IOOP_HIT is also different. I think a different IOContext, rather than a different IOOp, suits this better. > > In terms of what I/O we should track in a streaming/asynchronous > > world, the options would be: > > > > 1) track read/write syscalls > > 2) track blocks of BLCKSZ submitted to the kernel > > 3) track bytes submitted to the kernel >
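As a concrete illustration of the separate-bytes-columns idea, a query against such a view might look like this (the *_bytes column names are purely hypothetical here, not something any posted patch defines):

SELECT backend_type, object, context,
       reads, read_bytes,
       writes, write_bytes,
       extends, extend_bytes
FROM pg_stat_io
WHERE object = 'wal';

With both a call counter and a byte counter per operation, the variable-sized WAL receiver reads would no longer need a fixed op_bytes to be interpretable.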
Re: make pg_ctl more friendly
Hi, On Wed, 10 Jan 2024 at 06:33, Junwang Zhao wrote: > > Hi Nazir, > > On Tue, Jan 9, 2024 at 9:23 PM Nazir Bilal Yavuz wrote: > > > > v2-0001-PITR-shutdown-should-not-report-error-by-pg_ctl.patch: > > > > - "Examine the log output.\n"), > > + "Examine the log output\n"), > > progname); > > > > I don't think that this is needed. > There seems to be no consistent convention for the ending dot when using > write_stderr, so I will leave this unchanged. > > > > > Other than that, the patch looks good and I confirm that after PITR > > shutdown: > > > > "PITR shutdown" > > "update configuration for startup again if needed" > > > > message shows up, instead of: > > > > "pg_ctl: could not start server" > > "Examine the log output.". > > > > nitpick: It would be better if the order of the error message cases > > matched the enum order (i.e. 'POSTMASTER_RECOVERY_SHUTDOWN' before > > 'POSTMASTER_FAILED' in the enum). > Agreed, fixed in V3 Thank you for the update. It looks good to me. -- Regards, Nazir Bilal Yavuz Microsoft
Re: Show WAL write and fsync stats in pg_stat_io
Hi, On Thu, 11 Jan 2024 at 17:28, Melanie Plageman wrote: > > On Thu, Jan 11, 2024 at 6:19 AM Nazir Bilal Yavuz wrote: > > > > On Thu, 11 Jan 2024 at 08:01, Michael Paquier wrote: > > > > > > On Wed, Jan 10, 2024 at 07:24:50PM -0500, Melanie Plageman wrote: > > > > On Wed, Jan 3, 2024 at 8:11 AM Nazir Bilal Yavuz > > > > wrote: > > > >> On Sun, 31 Dec 2023 at 03:58, Michael Paquier > > > >> wrote: > > > >> Oh, I understand it now. Yes, that makes sense. > > > >> I thought removing op_bytes completely ( as you said "This patch > > > >> extends it with two more operation sizes, and there are even cases > > > >> where it may be a variable" ) from pg_stat_io view then adding > > > >> something like {read | write | extend}_bytes and {read | write | > > > >> extend}_calls could be better, so that we don't lose any information. > > > > > > > > Upthread, Michael says: > > > > > > > >> I find the use of 1 in this context a bit confusing, because when > > > >> referring to a counter at N, then it can be understood as doing N > > > >> times an operation, > > > > > > > > I didn't understand this argument, so I'm not sure if I agree or > > > > disagree with it. > > > > > > Nazir has mentioned upthread one thing: what should we do for the case > > > where a combination of (io_object,io_context) does I/O with a > > > *variable* op_bytes, because that may be the case for the WAL > > > receiver? For this case, he has mentioned that we should set op_bytes > > > to 1, but that's something I find confusing because it would mean that > > > we are doing read, writes or extends 1 byte at a time. My suggestion > > > would be to use op_bytes = -1 or NULL for the variable case instead, > > > with reads, writes and extends referring to a number of bytes rather > > > than a number of operations. > > > > I agree, but we can't do this only for the *variable* cases since > > B_WAL_RECEIVER and other backends use the same > > pgstat_count_io_op_time(IOObject, IOContext, ...) call. What I mean > > is, if two backends use the same pgstat_count_io_op_time() function > > call in the code, they must count the same thing (number of calls, > > bytes, etc.). It could be better to count the number of bytes for all > > the IOOBJECT_WAL IOs. > > I'm a bit confused by this. pgstat_count_io_op_time() can check > MyBackendType. In fact, you do this to ban the wal receiver already. > It is true that you would need to count all wal receiver normal > context wal object IOOps in the variable way, but I don't see how > pgstat_count_io_op_time() is the limiting factor as long as the > callsite is always doing either the number of bytes or the number of > calls. Apologies for not being clear. Let me try to explain this by giving examples: Let's assume that there are 3 different pgstat_count_io_op_time() calls in the code base and they are labeled as 1, 2 and 3. And let's assume that: WAL receiver uses 1st and 2nd pgstat_count_io_op_time(), autovacuum uses 2nd and 3rd pgstat_count_io_op_time() and checkpointer uses 3rd pgstat_count_io_op_time() to count IOs. The 1st one is the only pgstat_count_io_op_time() call that must count the number of bytes because of the variable cases, and the others count the number of calls or blocks. a) WAL receiver uses both 1st and 2nd => 1st and 2nd pgstat_count_io_op_time() must count the same thing => 2nd pgstat_count_io_op_time() must count the number of bytes as well. 
b) 2nd pgstat_count_io_op_time() started to count the number of bytes => autovacuum will start to count the number of bytes => both the 2nd and 3rd are used by autovacuum => the 3rd pgstat_count_io_op_time() must count the number of bytes as well. c) 3rd pgstat_count_io_op_time() started to count the number of bytes => checkpointer will start to count the number of bytes. The list goes on like this, and if we don't have [write | read | extend]_bytes, this effect will be multiplied. > > > > I think these are the three proposals for handling WAL reads: > > > > > > > > 1) setting op_bytes to 1 and the number of reads is the number of bytes > > > > 2) setting op_bytes to XLOG_BLCKSZ and the number of reads is the > > > > number of calls to pg_pread() or similar > > > > 3) setting op_bytes to NULL and the number of reads is the number of > > > > calls to pg_pread() or similar > > > > > > 3) could be a number of bytes, actu
Re: speed up a logical replica setup
Hi, On Fri, 12 Jan 2024 at 09:32, Hayato Kuroda (Fujitsu) wrote: > > Good, all warnings were removed. However, the patch failed to pass tests on > FreeBSD twice. > I'm not quite sure about the ERROR, but is it related to us? The FreeBSD errors started after FreeBSD's CI image was updated [1]. I do not think the error is related to this patch. [1] https://cirrus-ci.com/task/4700394639589376 -- Regards, Nazir Bilal Yavuz Microsoft
Re: Create shorthand for including all extra tests
Hi, On Wed, 10 Jan 2024 at 23:48, Peter Eisentraut wrote: > > On 05.09.23 19:26, Nazir Bilal Yavuz wrote: > > Thanks for the feedback! I updated the patch, 'needs-private-lo' > > option enables kerberos, ldap, load_balance and ssl extra tests now. > > As was discussed, I don't think "needs private lo" is the only condition > for these tests. At least kerberos and ldap also need extra software > installed, and load_balance might need editing the system's hosts file. > So someone would still need to familiarize themselves with these tests > individually before setting a global option like this. > > Also, if we were to create test groupings like this, I think the > implementation should be different. The way you have it, there is a > sort of central registry of all affected tests in > src/test/perl/PostgreSQL/Test/Utils.pm and a mapping of groups to tests. > I would prefer a more decentralized approach where each test decides > on its own whether to run, with pseudo-code conditionals like > > if (!(PG_TEST_EXTRA contains "ldap" or PG_TEST_EXTRA contains > "needs-private-lo")) >skip_all > > Anyway, at the moment, I don't see a sensible way to group these things > beyond what we have now (effectively, "ldap" is already a group, because > it affects more than one test suite). Right now, we have six possible > values, which is probably just about doable to keep track of manually. > If we get a lot more, then we need to look into this again, but maybe > then we'll also have more patterns to group things around. I see your point. It looks like the best option is to reevaluate this if there are more PG_TEST_EXTRA options. -- Regards, Nazir Bilal Yavuz Microsoft
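For what it's worth, the decentralized per-suite check Peter describes could look roughly like this at the top of a TAP test (a sketch; the option names follow this discussion, the skip message is made up):

use strict;
use warnings;
use PostgreSQL::Test::Utils;
use Test::More;

if (!defined $ENV{PG_TEST_EXTRA}
	|| $ENV{PG_TEST_EXTRA} !~ /\b(?:ldap|needs-private-lo)\b/)
{
	plan skip_all =>
	  'test needs PG_TEST_EXTRA to contain "ldap" or "needs-private-lo"';
}

Each suite then owns its own condition, which is exactly what the pseudo-code in the quoted mail sketches.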
Re: Show WAL write and fsync stats in pg_stat_io
Hi, On Mon, 15 Jan 2024 at 09:27, Michael Paquier wrote: > > On Fri, Jan 12, 2024 at 04:23:26PM +0300, Nazir Bilal Yavuz wrote: > > On Thu, 11 Jan 2024 at 17:28, Melanie Plageman > > wrote: > >> Even if we made a separate view for WAL I/O stats, we would still have > >> this issue of variable sized I/O vs block sized I/O and would probably > >> end up solving it with separate columns for the number of bytes and > >> number of operations. > > > > Yes, I think it is more about flexibility and not changing the already > > published pg_stat_io view. > > I don't know. Adding more columns or changing op_bytes with an extra > mode that reflects on what the other columns mean is kind of the same > thing to me: we want pg_stat_io to report more modes so that all I/O can > be evaluated from a single view, but the complication is now that > everything is tied to BLCKSZ. > > IMHO, perhaps we'd better put this patch aside until we are absolutely > *sure* of what we want to achieve when it comes to WAL, and I am > afraid that this cannot happen until we're happy with the way we > handle WAL reads *and* writes, including WAL receiver or anything that > has the idea of pulling its own page callback with > XLogReaderAllocate() in the backend. Well, writes should be > relatively "easy" as things happen with XLOG_BLCKSZ, mainly, but > reads are the unknown part. > > That also seems furiously related to the work happening with async I/O > or the fact that we may want to have in the view a separate meaning > for cached pages or pages read directly from disk. The worst thing > that we would do is rush something into the tree and then have to deal > with the aftermath of what we'd need to deal with in terms of > compatibility depending on the state of the other I/O related work > when the new view is released. That would not be fun for the users > and any hackers who would have to deal with that (aka mainly me if I > were to commit something), because pg_stat_io could mean something in > version N, still mean something entirely different in version N+1. I agree with your points. While the other I/O-related work is happening, we can discuss what we should do for the variable op_bytes cases. Also, this is happening only for WAL right now, but if we try to extend pg_stat_io in the future, the same problem will likely arise again. So, it would be good to come up with a general solution, not one only for WAL. -- Regards, Nazir Bilal Yavuz Microsoft
Re: Simplify documentation related to Windows builds
Hi, On Sun, 31 Dec 2023 at 09:13, Michael Paquier wrote: > > Hi all, > > As promised around thread [1] that has moved the docs related to > Windows into a new sub-section for Visual, here is a follow-up to > improve portions of its documentation, for discussion in the next CF. > > Some of my notes: > - How much does command line editing work on Windows? When it came to > VS, I always got the impression that this never worked. Andres in [2] > mentioned otherwise because meson makes that easier? I do not know that either. > - ActiveState perl should not be recommended IMO, as being able to use > a perl binary requires one to *register* into their service for tokens > that can be used to kick perl sessions, last time I checked. Two > alternatives: > -- MinGW perl binary. > -- strawberry perl (?) and Chocolatey. > -- MSYS I agree. Also, its installation and usage steps are complicated IMO; it is not a simple 'install it, add it to PATH, and forget about it' experience. > - http://www.mingw.org/ is a dead end. This could be replaced by > links to https://www.mingw-w64.org/ instead? Correct. > At the end, the main issue that I have with this part of the > documentation is the lack of consistency leading to a confusing user > experience in the builds of Windows. My recent impressions were that > Andrew has picked up Chocolatey in some (all?) of his buildfarm > animals with Strawberry Perl. I've had a good experience with it, > FWIW, but Andres has also mentioned to me a couple of weeks ago while in > Prague that Strawberry could lead to unpredictable results (sorry I > don't remember all the exact details). Postgres CI uses Strawberry Perl [1] as well, but it is directly installed from strawberryperl.com and its version is locked to '5.26.3.1' for now. > Between MSYS2, mingw-w64 and Chocolatey, there are a lot of options > available to users. So shouldn't we try to recommend only one of them, > then align the buildfarm and the CI to use one of them? Supporting > more than one, at most two, would be OK for me, my end goal would be > to get rid entirely of the list of build dependencies in this "Visual" > section, because that's just a duplicate of what meson lists, except > that meson should do a better job at detecting dependencies than what > the now-dead MSVC scripts did. If we support two, the CI and the > buildfarm should run them. I agree. > I am attaching a patch that's an embryo of work (little time for > hacking as of life these days, still I wanted to get the discussion > started), but let's discuss which direction we should take moving > forward for 17~. The current changes look good. Both Bison and Flex are included in the msys tool suite, available -from <ulink url="http://www.mingw.org/wiki/MSYS"></ulink> as part of the -MinGW compiler suite. +from <ulink url="https://www.msys2.org/"></ulink>. Since we are changing that part, I think we need to change the 'You will need to add the directory containing flex.exe and bison.exe to the PATH environment variable. In the case of MinGW, the directory is the \msys\1.0\bin subdirectory of your MinGW installation directory.' sentence to its msys2 equivalent. My initial testing showed that the directory is the '\usr\bin' subdirectory of the msys2 installation directory in my environment. [1] https://github.com/anarazel/pg-vm-images/blob/main/scripts/windows_install_perl.ps1 -- Regards, Nazir Bilal Yavuz Microsoft
003_extrafiles.pl test fails on Windows with the newer Perl versions
Hi, I was trying to install newer Perl versions to the Windows CI images and found that the 003_extrafiles.pl test fails on Windows with:

(0.183s) not ok 2 - file lists match
(0.000s) # Failed test 'file lists match'
# at C:/cirrus/src/bin/pg_rewind/t/003_extrafiles.pl line 81.
(0.000s) # Structures begin differing at:
# $got->[0] = 'C:/cirrus/build/testrun/pg_rewind/003_extrafiles/data/t_003_extrafiles_primary_local_data/pgdata/tst_both_dir'
# $expected->[0] = 'C:\cirrus\build/testrun/pg_rewind/003_extrafiles\data/t_003_extrafiles_primary_local_data/pgdata/tst_both_dir'
(0.263s) not ok 5 - file lists match
(0.000s) # Failed test 'file lists match'
# at C:/cirrus/src/bin/pg_rewind/t/003_extrafiles.pl line 81.
(0.000s) # Structures begin differing at:
# $got->[0] = 'C:/cirrus/build/testrun/pg_rewind/003_extrafiles/data/t_003_extrafiles_primary_remote_data/pgdata/tst_both_dir'
# $expected->[0] = 'C:\cirrus\build/testrun/pg_rewind/003_extrafiles\data/t_003_extrafiles_primary_remote_data/pgdata/tst_both_dir'

It looks like File::Find converts backslashes to slashes in the newer Perl versions. I tried to find the related commit and found this: https://github.com/Perl/perl5/commit/414f14df98cb1c9a20f92c5c54948b67c09f072d To solve this, I did the same conversion for Windows before comparing the paths. And to support all Perl versions, I decided to always convert them on Windows regardless of the Perl version. The fix is attached. I looked at the other File::Find appearances in the code, but they do not compare paths, so I do not think there is any need to fix them. Any kind of feedback would be appreciated. -- Regards, Nazir Bilal Yavuz Microsoft

From b28f48fe7d98d3dd7d2fcf652bfa5c8a4cd1c2d6 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz
Date: Tue, 30 Jan 2024 13:12:41 +0300
Subject: [PATCH v1] Fix 003_extrafiles.pl test for Windows

File::Find converts backslashes to slashes in the newer Perl versions.
See: https://github.com/Perl/perl5/commit/414f14df98cb1c9a20f92c5c54948b67c09f072d

So, do the same conversion for Windows before comparing paths. To
support all Perl versions, always convert them on Windows regardless
of the Perl version.
---
 src/bin/pg_rewind/t/003_extrafiles.pl | 13 +
 1 file changed, 13 insertions(+)

diff --git a/src/bin/pg_rewind/t/003_extrafiles.pl b/src/bin/pg_rewind/t/003_extrafiles.pl
index d54dcddcbb6..7e4c2771ce7 100644
--- a/src/bin/pg_rewind/t/003_extrafiles.pl
+++ b/src/bin/pg_rewind/t/003_extrafiles.pl
@@ -78,6 +78,19 @@ sub run_test
 		},
 		$test_primary_datadir);
 	@paths = sort @paths;
+
+	# File::Find converts backslashes to slashes in the newer Perl
+	# versions. To support all Perl versions, do the same conversion
+	# for Windows before comparing the paths.
+	if ($PostgreSQL::Test::Utils::windows_os)
+	{
+		for my $filename (@paths)
+		{
+			$filename =~ s{\\}{/}g;
+		}
+		$test_primary_datadir =~ s{\\}{/}g;
+	}
+
 	is_deeply(
 		\@paths,
 		[
-- 
2.43.0
Re: 003_extrafiles.pl test fails on Windows with the newer Perl versions
Hi, On Tue, 30 Jan 2024 at 14:21, Nazir Bilal Yavuz wrote: > > It looks like File::Find converts backslashes to slashes in the newer > Perl versions. I tried to find the related commit and found this: > https://github.com/Perl/perl5/commit/414f14df98cb1c9a20f92c5c54948b67c09f072d I forgot to mention that I used Strawberry Perl and these errors started with 'Perl v5.36.1.1'. -- Regards, Nazir Bilal Yavuz Microsoft
Re: compiling postgres on windows arm using meson
Hi, On Thu, 18 Jan 2024 at 05:07, Dave Cramer wrote: > > Greetings, > Getting the following error > > [1146/2086] Generating src/backend/postgres.def with a custom command (wrapped by meson to set PATH) > FAILED: src/backend/postgres.def > "C:\Program Files\Meson\meson.exe" "--internal" "exe" "--unpickle" "C:\Users\davec\projects\postgresql\build\meson-private\meson_exe_perl.EXE_53b41ebc2e76cfc92dd6a2af212140770543faae.dat" > while executing ['c:\\perl\\bin\\perl.EXE', '../src/backend/../tools/msvc_gendef.pl', '--arch', 'aarch64', '--tempdir', 'src/backend/postgres.def.p', '--deffile', 'src/backend/postgres.def', 'src/backend/postgres_lib.a', 'src/common/libpgcommon_srv.a', 'src/port/libpgport_srv.a'] > --- stdout --- > > --- stderr --- > Usage: msvc_gendef.pl --arch --deffile --tempdir files-or-directories > arch: x86 | x86_64 > deffile: path of the generated file > tempdir: directory for temporary files > files or directories: object files or directory containing object files > > log attached From the docs [1]: PostgreSQL will only build for the x64 architecture on 64-bit Windows. So, I think that is expected. [1] https://www.postgresql.org/docs/current/install-windows-full.html#INSTALL-WINDOWS-FULL-64-BIT -- Regards, Nazir Bilal Yavuz Microsoft
Re: 003_extrafiles.pl test fails on Windows with the newer Perl versions
Hi, On Wed, 31 Jan 2024 at 01:18, Andrew Dunstan wrote: > > Pushed to all live branches. Thanks for the patch. Thanks for the push! -- Regards, Nazir Bilal Yavuz Microsoft
Re: Checking MINIMUM_VERSION_FOR_WAL_SUMMARIES
Hi, On Fri, 2 Feb 2024 at 03:11, Artur Zakirov wrote: > > Hi hackers, > > while reading the source code of the new incremental backup functionality > I noticed that the following condition may be unintentional: > > /* > * For newer server versions, likewise create pg_wal/summaries > */ > if (PQserverVersion(conn) < MINIMUM_VERSION_FOR_WAL_SUMMARIES) > { > ... > > if (pg_mkdir_p(summarydir, pg_dir_create_mode) != 0 && > errno != EEXIST) > pg_fatal("could not create directory \"%s\": %m", summarydir); > } > > This is from src/bin/pg_basebackup/pg_basebackup.c. > > Is the condition correct? Shouldn't it be ">="? Otherwise the function > will create "/summaries" only for older PostgreSQL versions. You seem to be right, nice catch. Also, this change makes the check in snprintf(summarydir, sizeof(summarydir), "%s/%s/summaries", basedir, PQserverVersion(conn) < MINIMUM_VERSION_FOR_PG_WAL ? "pg_xlog" : "pg_wal"); redundant: PQserverVersion(conn) will always be higher than MINIMUM_VERSION_FOR_PG_WAL there. -- Regards, Nazir Bilal Yavuz Microsoft
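Putting the two observations together, the corrected block would presumably end up looking like this (a sketch of the obvious fix, not the committed change):

	/*
	 * For newer server versions, likewise create pg_wal/summaries
	 */
	if (PQserverVersion(conn) >= MINIMUM_VERSION_FOR_WAL_SUMMARIES)
	{
		char		summarydir[MAXPGPATH];

		/* MINIMUM_VERSION_FOR_PG_WAL no longer matters here. */
		snprintf(summarydir, sizeof(summarydir), "%s/pg_wal/summaries",
				 basedir);

		if (pg_mkdir_p(summarydir, pg_dir_create_mode) != 0 &&
			errno != EEXIST)
			pg_fatal("could not create directory \"%s\": %m", summarydir);
	}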
Re: Checking MINIMUM_VERSION_FOR_WAL_SUMMARIES
Hi, On Fri, 2 Feb 2024 at 12:11, Artur Zakirov wrote: > > On Fri, 2 Feb 2024 at 09:41, Nazir Bilal Yavuz wrote: > > You seem right, nice catch. Also, this change makes the check in > > > > snprintf(summarydir, sizeof(summarydir), "%s/%s/summaries", > > basedir, > > PQserverVersion(conn) < MINIMUM_VERSION_FOR_PG_WAL ? > > "pg_xlog" : "pg_wal"); > > > > redundant. PQserverVersion(conn) will always be higher than > > MINIMUM_VERSION_FOR_PG_WAL. > > Thank you both for the comments. Indeed, that part now looks redundant. > I've attached a patch to remove checking MINIMUM_VERSION_FOR_PG_WAL. Thanks for the update. The patch looks good to me. -- Regards, Nazir Bilal Yavuz Microsoft
Re: Streaming I/O, vectored I/O (WIP)
Hi, Thanks for working on this! On Wed, 10 Jan 2024 at 07:14, Thomas Munro wrote: > Thanks! I committed the patches up as far as smgr.c before the > holidays. The next thing to commit would be the bufmgr.c one for > vectored ReadBuffer(), v5-0001. Here's my response to your review of > that, which triggered quite a few changes. > > See also new version of the streaming_read.c patch, with change list > at end. (I'll talk about v5-0002, the SMgrRelation lifetime one, over > on the separate thread about that where Heikki posted a better > version.) I have a couple of comments / questions. 0001-Provide-vectored-variant-of-ReadBuffer: - Do we need to pass the hit variable to ReadBuffer_common()? I think it can just be declared in ReadBuffer_common() now. 0003-Provide-API-for-streaming-reads-of-relations: - Do we need to rethink the logic for getting a victim buffer? The StrategyGetBuffer() function raises an error if it cannot find any unpinned buffers, and that can become more common in the async world since we pin buffers before completing the read (while looking ahead). - If the block returned from the callback is an invalid block, pg_streaming_read_look_ahead() sets pgsr->finished = true. Could there be cases where the returned block is invalid but we should continue reading after it? - The max_pinned_buffers and pinned_buffers_trigger variables are set in the initialization part (in the pg_streaming_read_buffer_alloc_internal() function) and then they do not change. In some cases there could be no acquirable buffers to pin while initializing the pgsr (LimitAdditionalPins() sets max_pinned_buffers to 1), but while the read is continuing there could be chances to create larger reads (other consecutive reads finish while this read is continuing). Do you think it makes sense to reset max_pinned_buffers and pinned_buffers_trigger to higher values after the initialization, to allow larger reads? +/* Is there a head range that we can't extend? */ +head_range = &pgsr->ranges[pgsr->head]; +if (head_range->nblocks > 0 && +(!need_complete || + !head_range->need_complete || + head_range->blocknum + head_range->nblocks != blocknum)) +{ +/* Yes, time to start building a new one. */ +head_range = pg_streaming_read_new_range(pgsr); - I think if both need_complete and head_range->need_complete are false, we can extend the head range regardless of the consecutiveness of the blocks (see the sketch after this mail). 0006-Allow-streaming-reads-to-ramp-up-in-size: - The ramp_up_pin_limit variable is declared as an int, but we do not check for overflow while doubling it. This could be a problem for longer reads. -- Regards, Nazir Bilal Yavuz Microsoft
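To make the head-range comment concrete, here is a hypothetical rewrite of that condition (a sketch only, not code from the patchset):

	bool		can_extend;

	/*
	 * The head range can be extended if it is empty, if neither side needs
	 * a complete read (ordering does not matter then), or if both need a
	 * complete read and the new block is consecutive with the range.
	 */
	can_extend = head_range->nblocks == 0 ||
		(!need_complete && !head_range->need_complete) ||
		(need_complete && head_range->need_complete &&
		 head_range->blocknum + head_range->nblocks == blocknum);

	if (!can_extend)
	{
		/* Time to start building a new range. */
		head_range = pg_streaming_read_new_range(pgsr);
	}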
Re: Simplify documentation related to Windows builds
Hi, On Tue, 30 Jan 2024 at 11:02, Michael Paquier wrote: > > On Fri, Jan 19, 2024 at 06:11:40AM -0500, Andrew Dunstan wrote: > > FYI Strawberry was a bit stuck for a while at 5.32, but they are now up to > > 5.38. See <https://strawberryperl.com/releases.html> > > > > I agree we shouldn't be recommending any particular perl distro, especially > > not ASPerl which now has annoying license issues. > > The more I think about this thread, the more I'd tend to wipe out most > of "windows-requirements" for the sole reason that it is the far-west > regarding the various ways it is possible to get the dependencies we > need for the build and at runtime. We could keep it minimal with the > set of requirements we are listing under meson in terms of versions: > https://www.postgresql.org/docs/devel/install-requirements.html > > Then we could have one sentence recommending one, at most two > facilities used by the buildfarm, like https://www.msys2.org/ or > chocolatey as these group basically all the dependencies we need for a > meson build (right?) while linking back to the meson page about the > version requirements. I tested both msys2 and chocolatey on fresh Windows containers, and I confirm that Postgres can be built using either of them. I tested the dependencies that are required to build and run Postgres; if more dependencies need to be checked, I can test again. As these will be continuously tested by the buildfarm, I agree that what you suggested looks better. > One issue I have with the meson page listing the requirements is that > we don't directly mention Diff, but that's critical for the tests. I think that is because most distros come with a preinstalled diffutils package. It is mentioned under the Windows requirements page [1] since it does not come preinstalled there. However, I agree that it would be good to mention it under the meson requirements page as well. [1] https://www.postgresql.org/docs/devel/installation-platform-notes.html#WINDOWS-REQUIREMENTS -- Regards, Nazir Bilal Yavuz Microsoft
Re: Refactor calculations to use instr_time
Hi, Thanks for the review. On Wed, 22 Feb 2023 at 05:50, Kyotaro Horiguchi wrote: > > PgStat_PendingStats should be included in typedefs.list. Done. > > + * Created for accumulating wal_write_time and wal_sync_time as a instr_time > + * but instr_time can't be used as a type where it ends up on-disk > + * because its units may change. PgStat_WalStats type is used for > + * in-memory/on-disk data. So, PgStat_PendingWalStats is created for > + * accumulating intervals as a instr_time. > + */ > +typedef struct PgStat_PendingWalStats > > IMHO, this comment looks somewhat off. Maybe we could try something > like the following instead? > > > This struct stores wal-related durations as instr_time, which makes it > > easier to accumulate them without requiring type conversions. Then, > > during stats flush, they will be moved into shared stats with type > > conversions. Done. And I think we should write why we didn't change PgStat_WalStats's variable types and instead created a new struct. Maybe we can explain it in the commit description? > > The aim of this patch is to keep using instr_time for accumulation. > So it seems like we could do the same refactoring for > pgStatBlockReadTime, pgStatBlockWriteTime, pgStatActiveTime and > pgStatTransactionIdleTime. What do you think - should we include this > additional refactoring in the same patch or make a separate one for > it? I tried a bit but it seems the required changes for additional refactoring aren't small. So, I think we can create a separate patch for these changes. Regards, Nazir Bilal Yavuz Microsoft v4-0001-Refactor-instr_time-calculations.patch Description: Binary data
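For readers skimming the thread, the heart of the change is replacing the subtract-then-convert pattern with an instr_time accumulator; for example, in XLogWrite() (excerpt matching the attached patch):

 			INSTR_TIME_SET_CURRENT(duration);
-			INSTR_TIME_SUBTRACT(duration, start);
-			PendingWalStats.wal_write_time += INSTR_TIME_GET_MICROSEC(duration);
+			INSTR_TIME_ACCUM_DIFF(PendingWalStats.wal_write_time, duration, start);

The conversion to microseconds then happens once, at flush time, instead of on every write.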
Re: meson: Optionally disable installation of test modules
Hi, Thanks for the review. On Mon, 20 Feb 2023 at 21:44, Peter Eisentraut wrote: > > I tested this a bit. It works fine. The approach makes sense to me. > > The install_additional_files script could be simplified a bit. You > could use os.makedirs(dest, exist_ok=True) and avoid the error checking. > I don't think any callers try to copy a directory source, so the > shutil.copytree() stuff isn't necessary. Run pycodestyle over the > script. And let's put the script into src/tools/ like the other support > scripts. I updated the patch in line with your comments. Regards, Nazir Bilal Yavuz Microsoft

From 7804aa928de7595cb89bfd7668952e1e1fe48038 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz
Date: Thu, 23 Feb 2023 20:35:22 +0300
Subject: [PATCH v3] meson: prevent installation of test files during main
 install
---
 meson.build                                   | 32 ++-
 src/backend/meson.build                       |  7
 src/test/modules/delay_execution/meson.build  |  6 ++--
 src/test/modules/dummy_index_am/meson.build   |  9 ++
 src/test/modules/dummy_seclabel/meson.build   |  9 ++
 src/test/modules/plsample/meson.build         | 10 ++
 src/test/modules/spgist_name_ops/meson.build  | 10 ++
 .../ssl_passphrase_callback/meson.build       |  6 ++--
 src/test/modules/test_bloomfilter/meson.build |  9 ++
 .../modules/test_copy_callbacks/meson.build   |  9 ++
 .../modules/test_custom_rmgrs/meson.build     |  9 ++
 src/test/modules/test_ddl_deparse/meson.build |  9 ++
 src/test/modules/test_extensions/meson.build  |  4 +--
 .../modules/test_ginpostinglist/meson.build   |  9 ++
 src/test/modules/test_integerset/meson.build  |  9 ++
 src/test/modules/test_lfind/meson.build       |  9 ++
 src/test/modules/test_oat_hooks/meson.build   |  6 ++--
 src/test/modules/test_parser/meson.build      |  9 ++
 .../test_pg_db_role_setting/meson.build       |  9 ++
 src/test/modules/test_pg_dump/meson.build     |  4 +--
 src/test/modules/test_predtest/meson.build    |  9 ++
 src/test/modules/test_rbtree/meson.build      |  9 ++
 src/test/modules/test_regex/meson.build       |  9 ++
 src/test/modules/test_rls_hooks/meson.build   |  6 ++--
 src/test/modules/test_shm_mq/meson.build      |  9 ++
 src/test/modules/test_slru/meson.build        |  9 ++
 src/test/modules/worker_spi/meson.build       |  9 ++
 src/test/regress/meson.build                  | 18 ---
 src/tools/install_additional_files            | 28
 29 files changed, 139 insertions(+), 151 deletions(-)
 create mode 100644 src/tools/install_additional_files

diff --git a/meson.build b/meson.build
index f5347044526..1e4b3d5445a 100644
--- a/meson.build
+++ b/meson.build
@@ -2791,6 +2791,10 @@ backend_code = declare_dependency(
   dependencies: os_deps + backend_both_deps + backend_deps,
 )
 
+# install these files only during test, not main install
+test_install_files = []
+test_install_libs = []
+
 # src/backend/meson.build defines backend_mod_code used for extension
 # libraries.
@@ -2811,6 +2815,10 @@
 subdir('doc/src/sgml')
 
 generated_sources_ac += {'': ['GNUmakefile']}
 
+# After processing src/test, add test_install_libs to the testprep_targets
+# to build them
+testprep_targets += test_install_libs
+
 # If there are any files in the source directory that we also generate in the
 # build directory, they might get preferred over the newly generated files,
@@ -2893,14 +2901,36 @@
 meson_install_args = meson_args + ['install'] + {
     'muon': []
 }[meson_impl]
 
+# setup tests should be run first,
+# so define priority for these
+setup_tests_priority = 100
 test('tmp_install',
     meson_bin, args: meson_install_args,
     env: {'DESTDIR': test_install_destdir},
-    priority: 100,
+    priority: setup_tests_priority,
     timeout: 300,
     is_parallel: false,
     suite: ['setup'])
 
+# get full paths of test_install_libs to copy them
+test_install_libs_fp = []
+foreach lib: test_install_libs
+  test_install_libs_fp += lib.full_path()
+endforeach
+
+install_additional_files = files('src/tools/install_additional_files')
+test('install_additional_files',
+    python, args: [
+      install_additional_files,
+      '--sharedir', test_install_location / contrib_data_args['install_dir'],
+      '--libdir', test_install_location / dir_lib_pkg,
+      '--install_files', test_install_files,
+      '--install_libs', test_install_libs_fp,
+    ],
+    priority: setup_tests_priority,
+    is_parallel: false,
+    suite: ['setup'])
+
 test_result_dir = meson.build_root() / 'testrun'
diff --git a/src/backend/meson.build b/src/backend/meson.bu
Re: allow meson to find ICU in non-standard localtion
Hi, Thanks for the patch. On Wed, 22 Feb 2023 at 21:26, Jeff Davis wrote: > > I'm not sure it's the right thing to do though. One downside is that it > doesn't output the version that it finds, it only outputs "YES". - icu = dependency('icu-uc', required: icuopt.enabled()) - icu_i18n = dependency('icu-i18n', required: icuopt.enabled()) I think you can do the dependency() checks with 'required: false' first and, if the libraries weren't found that way, fall back to the cc.find_library() checks. This also solves the only-outputting-"YES" problem for the case where they are found by the dependency() checks. Regards, Nazir Bilal Yavuz Microsoft
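A minimal sketch of what I mean in meson (illustrative only; the fallback library names and search directories would need to match whatever the patch uses for the non-standard location):

icu = dependency('icu-uc', required: false)
icu_i18n = dependency('icu-i18n', required: false)

if not icu.found()
  # fall back to a plain library search, honoring user-provided paths
  icu = cc.find_library('icuuc', dirs: test_lib_d,
                        required: icuopt.enabled())
  icu_i18n = cc.find_library('icui18n', dirs: test_lib_d,
                             required: icuopt.enabled())
endif

The dependency() path keeps the nice version output when pkg-config works, and the find_library() path only kicks in for non-standard locations.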
Re: meson: Non-feature feature options
Hi, On Wed, 22 Feb 2023 at 12:14, Peter Eisentraut wrote: > > On 21.02.23 17:32, Nazir Bilal Yavuz wrote: > >>> I like the second approach, with a 'uuid' feature option. As you wrote > >>> earlier, adding an 'auto' choice to a combo option doesn't work fully > >>> like a > >>> real feature option. > >> But we can make it behave exactly like one, by checking the auto_features > >> option. > > Yes, we can set it like `uuidopt = get_option('auto_features')`. > > However, if someone wants to set 'auto_features' to 'disabled' but > > 'uuid' to 'enabled' (to find at least one working uuid library), this > > won't be possible. We can add 'enabled', 'disabled' and 'auto' choices > > to the 'uuid' combo option to make all behaviours possible, but adding a > > 'uuid' feature option and a 'uuid_library' combo option seems better to > > me. > > I think the uuid side of this is making this way too complicated. I'm > content leaving this as a manual option for now. > > There is much more value in making the ssl option work automatically. > So I would welcome a patch that just makes -Dssl=auto work smoothly, > perhaps using the "trick" that Andres described. Thanks for the feedback. I updated the ssl patch, and if you like the changes, I can apply the same logic to uuid. Regards, Nazir Bilal Yavuz Microsoft v2-0001-meson-Refactor-SSL-option.patch Description: Binary data
Re: meson: Optionally disable installation of test modules
Hi, On Wed, 1 Mar 2023 at 22:21, Peter Eisentraut wrote: > > Looks good to me. I did a small pass over it to adjust some namings. > For example, I renamed test_install_files to test_install_data, so it's > consistent with the overall meson naming: > > -install_data( > +test_install_data += files( > > Let me know if you have any concerns about this version. Otherwise, I'm > happy to commit it. That makes sense, thanks! Regards, Nazir Bilal Yavuz Microsoft
Re: meson: Non-feature feature options
Hi, Thanks for the review. On Wed, 1 Mar 2023 at 18:52, Peter Eisentraut wrote: > > Maybe we can make some of the logic less nested. Right now there is > > if sslopt != 'none' >    if not ssl.found() and sslopt in ['auto', 'openssl'] > > I think at that point, ssl.found() is never true, so it can be removed. I agree, ssl.found() can be removed. > And the two checks for sslopt are nearly redundant. > > At the end of the block, there is > > # At least one SSL library must be found, otherwise throw an error > if sslopt == 'auto' and auto_features.enabled() > error('SSL Library could not be found') > endif > endif > > which also implies sslopt != 'none'. So I think the whole thing could be > > if sslopt in ['auto', 'openssl'] > > ... > > endif > > if sslopt == 'auto' and auto_features.enabled() > error('SSL Library could not be found') > endif > > both at the top level. I am kind of confused. I added these checks to allow for other SSL implementations in the future; that is why I have two nested if checks. The top one checks whether we need to search for an SSL library at all, and the nested one checks whether we need to search for this specific SSL library. What do you think? The other thing (which I forgot before) is that I need to add an "and not ssl.found()" condition to the "if sslopt == 'auto' and auto_features.enabled()" check. > Another issue, I think this is incorrect: > > + openssl_required ? error('openssl function @0@ is > required'.format(func)) : \ > + message('openssl function @0@ is > required'.format(func)) > > We don't want to issue a message like this when a non-required function > is missing. I agree, the message part can be removed. Regards, Nazir Bilal Yavuz Microsoft
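Putting Peter's flattening together with the extra not-found condition, the resulting shape would be roughly (a sketch in meson, not the final patch):

if sslopt in ['auto', 'openssl']
  # try pkg-config et al first, then fall back to cc.find_library()
  ...
endif

if sslopt == 'auto' and auto_features.enabled() and not ssl.found()
  error('SSL Library could not be found')
endif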
Re: meson: Non-feature feature options
Hi, On Fri, 3 Mar 2023 at 12:16, Peter Eisentraut wrote: > > On 02.03.23 11:41, Nazir Bilal Yavuz wrote: > > I am kind of confused. I added these checks to allow for other SSL > > implementations in the future; that is why I have two nested if > > checks. The top one checks whether we need to search for an SSL > > library at all, and the nested one checks whether we need to search for > > this specific SSL library. What do you think? > > I suppose that depends on how you envision integrating other SSL > libraries into this logic. It's not that important right now; if the > structure makes sense to you, that's fine. > > Please send an updated patch with the small changes that have been > mentioned. The updated patch is attached. Regards, Nazir Bilal Yavuz Microsoft

From 9cb9d50ba008e2a385a7b72219a759490a3de00e Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz
Date: Fri, 3 Mar 2023 12:24:46 +0300
Subject: [PATCH v3] meson: Refactor SSL option
---
 .cirrus.yml                      |   7 +-
 meson.build                      | 121 ++-
 meson_options.txt                |   4 +-
 src/interfaces/libpq/meson.build |   2 +-
 src/makefiles/meson.build        |   2 +-
 src/test/ssl/meson.build         |   2 +-
 6 files changed, 80 insertions(+), 58 deletions(-)

diff --git a/.cirrus.yml b/.cirrus.yml
index f2129787529..aaf4066366c 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -181,7 +181,7 @@ task:
     su postgres <<-EOF
       meson setup \
         --buildtype=debug \
-        -Dcassert=true -Dssl=openssl -Duuid=bsd -Dtcl_version=tcl86 -Ddtrace=auto \
+        -Dcassert=true -Duuid=bsd -Dtcl_version=tcl86 -Ddtrace=auto \
         -DPG_TEST_EXTRA="$PG_TEST_EXTRA" \
         -Dextra_lib_dirs=/usr/local/lib -Dextra_include_dirs=/usr/local/include/ \
         build
@@ -243,7 +243,6 @@ LINUX_CONFIGURE_FEATURES: &LINUX_CONFIGURE_FEATURES >-
 
 LINUX_MESON_FEATURES: &LINUX_MESON_FEATURES >-
   -Dllvm=enabled
-  -Dssl=openssl
   -Duuid=e2fs
@@ -497,7 +496,7 @@ task:
       -Dextra_include_dirs=${brewpath}/include \
       -Dextra_lib_dirs=${brewpath}/lib \
       -Dcassert=true \
-      -Dssl=openssl -Duuid=e2fs -Ddtrace=auto \
+      -Duuid=e2fs -Ddtrace=auto \
       -Dsegsize_blocks=6 \
       -DPG_TEST_EXTRA="$PG_TEST_EXTRA" \
       build
@@ -568,7 +567,7 @@ task:
   # Use /DEBUG:FASTLINK to avoid high memory usage during linking
   configure_script: |
     vcvarsall x64
-    meson setup --backend ninja --buildtype debug -Dc_link_args=/DEBUG:FASTLINK -Dcassert=true -Db_pch=true -Dssl=openssl -Dextra_lib_dirs=c:\openssl\1.1\lib -Dextra_include_dirs=c:\openssl\1.1\include -DTAR=%TAR% -DPG_TEST_EXTRA="%PG_TEST_EXTRA%" build
+    meson setup --backend ninja --buildtype debug -Dc_link_args=/DEBUG:FASTLINK -Dcassert=true -Db_pch=true -Dextra_lib_dirs=c:\openssl\1.1\lib -Dextra_include_dirs=c:\openssl\1.1\include -DTAR=%TAR% -DPG_TEST_EXTRA="%PG_TEST_EXTRA%" build
 
   build_script: |
     vcvarsall x64
diff --git a/meson.build b/meson.build
index 26be83afb61..1e9411eb247 100644
--- a/meson.build
+++ b/meson.build
@@ -43,6 +43,7 @@ cc = meson.get_compiler('c')
 
 not_found_dep = dependency('', required: false)
 thread_dep = dependency('threads')
+auto_features = get_option('auto_features')
@@ -1171,7 +1172,16 @@ cdata.set('USE_SYSTEMD', systemd.found() ? 1 : false)
 
 # Library: SSL
 ###
 
-if get_option('ssl') == 'openssl'
+ssl = not_found_dep
+ssl_library = 'none'
+sslopt = get_option('ssl')
+
+if (sslopt == 'auto' and auto_features.disabled())
+  sslopt = 'none'
+endif
+
+if sslopt in ['auto', 'openssl']
+  openssl_required = sslopt == 'openssl'
 
   # Try to find openssl via pkg-config et al, if that doesn't work
   # (e.g. because it's provided as part of the OS, like on FreeBSD), look for
@@ -1192,59 +1202,72 @@ if get_option('ssl') == 'openssl'
     ssl_int = [ssl_lib, crypto_lib]
 
     ssl = declare_dependency(dependencies: ssl_int,
-                             include_directories: postgres_inc)
-  else
-    cc.has_header('openssl/ssl.h', args: test_c_args, dependencies: ssl, required: true)
-    cc.has_header('openssl/err.h', args: test_c_args, dependencies: ssl, required: true)
-
-    ssl_int = [ssl]
+                             include_directories: postgres_inc)
+  elif cc.has_header('openssl/ssl.h', args: test_c_args, dependencies: ssl, required: openssl_required) and \
+       cc.has_header('openssl/err.h', args: test_c_args, dependencies: ssl, required: openssl_required)
+    ssl_int = [ssl]
   endif
 
-  check_funcs = [
-    ['CRYPTO_new_ex_data', {'required
Re: meson: Optionally disable installation of test modules
Hi, On Fri, 3 Mar 2023 at 22:43, Andrew Dunstan wrote: > These changes have broken the buildfarm adaptation work in different ways on > different platforms. > > On Windows (but not Linux), the install_test_files are apparently getting > installed under runpython in the build directory rather than in the > tmp_install location, so those tests fail. Meanwhile, it's not clear to me > how to install them in a standard installation, which means that on Linux the > corresponding -running tests are failing. Is there a way to see the 'meson-logs/testlog.txt' file under the build directory? Regards, Nazir Bilal Yavuz Microsoft
Re: meson: Optionally disable installation of test modules
Hi, On Mon, 6 Mar 2023 at 18:30, Andrew Dunstan wrote: > There are two separate issues here, but let's deal with the Windows issue. > Attached is the log output and also a listing of the runpython directory in > the build directory. Thanks for the logs, but I couldn't understand the problem. Is there a way to reproduce this? For the Linux problem, Andres's patch [1] solves it, but you need to run an extra command. After applying Andres's patch, you need to run: $ meson compile install-test-files -C $pgsql before running the 'running' tests. I tested this locally, and: $ meson compile install-test-files -C $pgsql $ meson test -C $pgsql --setup running --print-errorlogs --no-rebuild --logbase installcheckworld --no-suite regress-running --no-suite isolation-running --no-suite ecpg-running passed successfully. [1] https://www.postgresql.org/message-id/20230308012940.edexipb3vqylcu6r%40awork3.anarazel.de Regards, Nazir Bilal Yavuz Microsoft
Re: buildfarm + meson
Hi, On Wed, 8 Mar 2023 at 16:57, Andrew Dunstan wrote: > So if I understand this right, the way to use this would be something like: > > > local $ENV{DESTDIR} = $installdir; > > run_log("meson compile -C $pgsql install-test-files"); > > > Is that right? I did that but it didn't work :-( I think you shouldn't set DESTDIR to $installdir. If DESTDIR is set, it is joined with $install_dir (-Dprefix). So, when you run local $ENV{DESTDIR} = $installdir; run_log("meson compile -C $pgsql install-test-files"); it installs these files to '$install_dir/$install_dir'. Could you try running only 'run_log("meson compile -C $pgsql install-test-files");' without setting DESTDIR? That should work. Regards, Nazir Bilal Yavuz Microsoft
Re: Refactor calculations to use instr_time
Hi, There was a warning while applying the patch, v5 is attached. Regards, Nazir Bilal Yavuz Microsoft From dcd49e48a0784a95b8731df1c6ee7c3a612a8529 Mon Sep 17 00:00:00 2001 From: Nazir Bilal Yavuz Date: Thu, 9 Mar 2023 15:35:38 +0300 Subject: [PATCH v5] Refactor instr_time calculations In instr_time.h it is stated that: * When summing multiple measurements, it's recommended to leave the * running sum in instr_time form (ie, use INSTR_TIME_ADD or * INSTR_TIME_ACCUM_DIFF) and convert to a result format only at the end. So, this patch aims to refactor 'PendingWalStats' to use 'instr_time' instead of 'PgStat_Counter' while accumulating 'wal_write_time' and 'wal_sync_time'. Also, some calculations are refactored to use 'INSTR_TIME_ACCUM_DIFF' instead of using 'INSTR_TIME_SUBTRACT' and 'INSTR_TIME_ADD'. --- src/backend/access/transam/xlog.c | 6 ++--- src/backend/storage/file/buffile.c | 6 ++--- src/backend/utils/activity/pgstat_wal.c | 31 + src/include/pgstat.h| 17 +- src/tools/pgindent/typedefs.list| 1 + 5 files changed, 37 insertions(+), 24 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 543d4d897ae..46821ad6056 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -2206,8 +2206,7 @@ XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible) instr_time duration; INSTR_TIME_SET_CURRENT(duration); - INSTR_TIME_SUBTRACT(duration, start); - PendingWalStats.wal_write_time += INSTR_TIME_GET_MICROSEC(duration); + INSTR_TIME_ACCUM_DIFF(PendingWalStats.wal_write_time, duration, start); } PendingWalStats.wal_write++; @@ -8204,8 +8203,7 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli) instr_time duration; INSTR_TIME_SET_CURRENT(duration); - INSTR_TIME_SUBTRACT(duration, start); - PendingWalStats.wal_sync_time += INSTR_TIME_GET_MICROSEC(duration); + INSTR_TIME_ACCUM_DIFF(PendingWalStats.wal_sync_time, duration, start); } PendingWalStats.wal_sync++; diff --git a/src/backend/storage/file/buffile.c b/src/backend/storage/file/buffile.c index 0a51624df3b..e55f86b675e 100644 --- a/src/backend/storage/file/buffile.c +++ b/src/backend/storage/file/buffile.c @@ -469,8 +469,7 @@ BufFileLoadBuffer(BufFile *file) if (track_io_timing) { INSTR_TIME_SET_CURRENT(io_time); - INSTR_TIME_SUBTRACT(io_time, io_start); - INSTR_TIME_ADD(pgBufferUsage.temp_blk_read_time, io_time); + INSTR_TIME_ACCUM_DIFF(pgBufferUsage.temp_blk_read_time, io_time, io_start); } /* we choose not to advance curOffset here */ @@ -544,8 +543,7 @@ BufFileDumpBuffer(BufFile *file) if (track_io_timing) { INSTR_TIME_SET_CURRENT(io_time); - INSTR_TIME_SUBTRACT(io_time, io_start); - INSTR_TIME_ADD(pgBufferUsage.temp_blk_write_time, io_time); + INSTR_TIME_ACCUM_DIFF(pgBufferUsage.temp_blk_write_time, io_time, io_start); } file->curOffset += bytestowrite; diff --git a/src/backend/utils/activity/pgstat_wal.c b/src/backend/utils/activity/pgstat_wal.c index e8598b2f4e0..58daae3fbd6 100644 --- a/src/backend/utils/activity/pgstat_wal.c +++ b/src/backend/utils/activity/pgstat_wal.c @@ -21,7 +21,7 @@ #include "executor/instrument.h" -PgStat_WalStats PendingWalStats = {0}; +PgStat_PendingWalStats PendingWalStats = {0}; /* * WAL usage counters saved from pgWALUsage at the previous call to @@ -70,7 +70,7 @@ bool pgstat_flush_wal(bool nowait) { PgStatShared_Wal *stats_shmem = &pgStatLocal.shmem->wal; - WalUsage diff = {0}; + WalUsage wal_usage_diff = {0}; Assert(IsUnderPostmaster || !IsPostmasterEnvironment); Assert(pgStatLocal.shmem != NULL && @@ -88,25 +88,26 @@ 
pgstat_flush_wal(bool nowait) * Calculate how much WAL usage counters were increased by subtracting the * previous counters from the current ones. */ - WalUsageAccumDiff(&diff, &pgWalUsage, &prevWalUsage); - PendingWalStats.wal_records = diff.wal_records; - PendingWalStats.wal_fpi = diff.wal_fpi; - PendingWalStats.wal_bytes = diff.wal_bytes; + WalUsageAccumDiff(&wal_usage_diff, &pgWalUsage, &prevWalUsage); if (!nowait) LWLockAcquire(&stats_shmem->lock, LW_EXCLUSIVE); else if (!LWLockConditionalAcquire(&stats_shmem->lock, LW_EXCLUSIVE)) return true; -#define WALSTAT_ACC(fld) stats_shmem->stats.fld += PendingWalStats.fld - WALSTAT_ACC(wal_records); - WALSTAT_ACC(wal_fpi); - WALSTAT_ACC(wal_bytes); - WALSTAT_ACC(wal_buffers_full); - WALSTAT_ACC(wal_write); - WALSTAT_ACC(wal_sync); - WALSTAT_ACC(wal_write_time); - WALSTAT_ACC(wal_sync_time); +#define WALSTAT_ACC(fld, var_to_add) \ + (stats_shmem->stats.fld += var_to_add.fld) +#define WALLSTAT_ACC_INSTR_TIME_TYPE(fld) \ + (stats_shmem->stats.fld += INSTR_TIME_GET_MICROSEC(PendingWalStats.fl
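The pattern the patch adopts is the one recommended in instr_time.h: keep the running sum in instr_time form and convert to a result format only once, at the end. A minimal sketch of that pattern (the variable and function names here are illustrative, not taken from the patch):

	instr_time	start;
	instr_time	duration;
	instr_time	sum;			/* running total, kept in instr_time form */

	INSTR_TIME_SET_ZERO(sum);

	INSTR_TIME_SET_CURRENT(start);
	/* ... timed work, e.g. a write or an fsync ... */
	INSTR_TIME_SET_CURRENT(duration);

	/* sum += (duration - start); no unit conversion yet */
	INSTR_TIME_ACCUM_DIFF(sum, duration, start);

	/* convert exactly once, when the accumulated value is flushed */
	flush_microseconds(INSTR_TIME_GET_MICROSEC(sum));

Here flush_microseconds() is just a placeholder for wherever the converted value ends up (in the patch, the shared WAL stats counters).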
Re: meson: Non-feature feature options
Hi, On Thu, 9 Mar 2023 at 16:54, Daniel Gustafsson wrote: > > > On 9 Mar 2023, at 14:45, Peter Eisentraut > > wrote: > > > How about we just hardcode "openssl" here instead? We could build that > > array dynamically, of course, but maybe we leave that until we actually > > have a need? > > At least for 16 keeping it hardcoded is an entirely safe bet so +1 for leaving > additional complexity for when needed. We already have the 'ssl_library' variable. Can't we use that instead of hardcoding 'openssl'? e.g.:

summary(
  {
    'ssl': ssl.found() ? [ssl, '(@0@)'.format(ssl_library)] : ssl,
  },
  section: 'External libraries',
  list_sep: ', ',
)

And it will output:

ssl: YES 3.0.8, (openssl)

I don't think that using 'ssl_library' will increase the complexity. Regards, Nazir Bilal Yavuz Microsoft
Re: meson: Non-feature feature options
Hi, On Thu, 9 Mar 2023 at 17:18, Peter Eisentraut wrote: > > On 09.03.23 15:12, Nazir Bilal Yavuz wrote: > > Hi, > > > > On Thu, 9 Mar 2023 at 16:54, Daniel Gustafsson wrote: > >> > >>> On 9 Mar 2023, at 14:45, Peter Eisentraut > >>> wrote: > >> > >>> How about we just hardcode "openssl" here instead? We could build that > >>> array dynamically, of course, but maybe we leave that until we actually > >>> have a need? > >> > >> At least for 16 keeping it hardcoded is an entirely safe bet so +1 for > >> leaving > >> additional complexity for when needed. > > > > We already have the 'ssl_library' variable. Can't we use that instead > > of hardcoding 'openssl'? e.g: > > > > summary( > >{ > > 'ssl': ssl.found() ? [ssl, '(@0@)'.format(ssl_library)] : ssl, > >}, > >section: 'External libraries', > >list_sep: ', ', > > ) > > > > And it will output: > > ssl: YES 3.0.8, (openssl) > > > > I don't think that using 'ssl_library' will increase the complexity. > > Then we might as well use ssl_library as the key, like: > > { > ... > 'selinux': selinux, > ssl_library: ssl, > 'systemd': systemd, > ... > } > There will be a problem if ssl is not found. It will output 'none: NO' because 'ssl_library' is initialized as 'none' for now. We can initialize 'ssl_library' as 'ssl' but I am not sure that is a good idea. Regards, Nazir Bilal Yavuz Microsoft
Re: meson: Non-feature feature options
Hi, On Mon, 13 Mar 2023 at 21:04, Nathan Bossart wrote: > > On Mon, Mar 13, 2023 at 07:27:18AM +0100, Peter Eisentraut wrote: > > I have committed it like this. > > I noticed that after 6a30027, if you don't have the OpenSSL headers > installed, 'meson setup' will fail: > > meson.build:1195:4: ERROR: C header 'openssl/ssl.h' not found > > Shouldn't "auto" cause Postgres to be built without OpenSSL if the required > headers are not present? Yes, I tested again and it is working as expected on my end. It shouldn't fail like that unless the 'ssl' option is set to 'openssl'. Is it possible that it has been set to 'openssl' without you noticing? Regards, Nazir Bilal Yavuz Microsoft
Re: meson: Non-feature feature options
Hi, On Mon, 13 Mar 2023 at 23:14, Andres Freund wrote: > > Hi, > > On 2023-03-13 21:57:22 +0300, Nazir Bilal Yavuz wrote: > > On Mon, 13 Mar 2023 at 21:04, Nathan Bossart > > wrote: > > > > > > On Mon, Mar 13, 2023 at 07:27:18AM +0100, Peter Eisentraut wrote: > > > > I have committed it like this. > > > > > > I noticed that after 6a30027, if you don't have the OpenSSL headers > > > installed, 'meson setup' will fail: > > > > > > meson.build:1195:4: ERROR: C header 'openssl/ssl.h' not found > > > > > > Shouldn't "auto" cause Postgres to be built without OpenSSL if the > > > required > > > headers are not present? > > > > Yes, I tested again and it is working as expected on my end. It > > shouldn't fail like that unless the 'ssl' option is set to 'openssl'. > > Is it possible that it has been set to 'openssl' without you noticing? > > It worked for the dependency() path, but not the cc.find_library() path. See > the patch I just sent. Thanks for the patch, I understand the problem now and your patch fixes this. Regards, Nazir Bilal Yavuz Microsoft
Re: meson: Non-feature feature options
Hi, On Wed, 22 Feb 2023 at 12:14, Peter Eisentraut wrote: > > On 21.02.23 17:32, Nazir Bilal Yavuz wrote: > >>> I like the second approach, with a 'uuid' feature option. As you wrote > >>> earlier, adding an 'auto' choice to a combo option doesn't work fully > >>> like a > >>> real feature option. > >> But we can make it behave exactly like one, by checking the auto_features > >> option. > > Yes, we can set it like `uuidopt = get_option('auto_features')`. > > However, if someone wants to set 'auto_features' to 'disabled' but > > 'uuid' to 'enabled'(to find at least one working uuid library); this > > won't be possible. We can add 'enabled', 'disabled and 'auto' choices > > to 'uuid' combo option to make all behaviours possible but adding > > 'uuid' feature option and 'uuid_library' combo option seems better to > > me. > > I think the uuid side of this is making this way too complicated. I'm > content leaving this as a manual option for now. > > There is much more value in making the ssl option work automatically. > So I would welcome a patch that just makes -Dssl=auto work smoothly, > perhaps using the "trick" that Andres described. I tried to implement what we did for ssl to uuid as well, do you have any comments? Regards, Nazir Bilal Yavuz Microsoft From 0e92050c15ca6e9ebeddaafad229eee53fc9e7b0 Mon Sep 17 00:00:00 2001 From: Nazir Bilal Yavuz Date: Tue, 14 Mar 2023 16:38:02 +0300 Subject: [PATCH v1] meson: Make auto the default of the uuid option --- .cirrus.yml | 5 +-- meson.build | 80 ++- meson_options.txt | 4 +- src/makefiles/meson.build | 2 +- 4 files changed, 59 insertions(+), 32 deletions(-) diff --git a/.cirrus.yml b/.cirrus.yml index 505c50f3285..1ccb14c6340 100644 --- a/.cirrus.yml +++ b/.cirrus.yml @@ -181,7 +181,7 @@ task: su postgres <<-EOF meson setup \ --buildtype=debug \ --Dcassert=true -Duuid=bsd -Dtcl_version=tcl86 -Ddtrace=auto \ +-Dcassert=true -Dtcl_version=tcl86 -Ddtrace=auto \ -DPG_TEST_EXTRA="$PG_TEST_EXTRA" \ -Dextra_lib_dirs=/usr/local/lib -Dextra_include_dirs=/usr/local/include/ \ build @@ -243,7 +243,6 @@ LINUX_CONFIGURE_FEATURES: &LINUX_CONFIGURE_FEATURES >- LINUX_MESON_FEATURES: &LINUX_MESON_FEATURES >- -Dllvm=enabled - -Duuid=e2fs task: @@ -496,7 +495,7 @@ task: -Dextra_include_dirs=${brewpath}/include \ -Dextra_lib_dirs=${brewpath}/lib \ -Dcassert=true \ - -Duuid=e2fs -Ddtrace=auto \ + -Ddtrace=auto \ -Dsegsize_blocks=6 \ -DPG_TEST_EXTRA="$PG_TEST_EXTRA" \ build diff --git a/meson.build b/meson.build index 2ebdf914c1b..7955ae42d03 100644 --- a/meson.build +++ b/meson.build @@ -1282,34 +1282,61 @@ endif ### uuidopt = get_option('uuid') +uuid_library = 'none' +uuid = not_found_dep +if uuidopt == 'auto' and auto_features.disabled() + uuidopt = 'none' +endif + if uuidopt != 'none' - uuidname = uuidopt.to_upper() - if uuidopt == 'e2fs' -uuid = dependency('uuid', required: true) -uuidfunc = 'uuid_generate' -uuidheader = 'uuid/uuid.h' - elif uuidopt == 'bsd' -# libc should have uuid function -uuid = declare_dependency() -uuidfunc = 'uuid_to_string' -uuidheader = 'uuid.h' - elif uuidopt == 'ossp' -uuid = dependency('ossp-uuid', required: true) -uuidfunc = 'uuid_export' -uuidheader = 'uuid.h' - else -error('huh') - endif - if not cc.has_header_symbol(uuidheader, uuidfunc, args: test_c_args, dependencies: uuid) -error('uuid library @0@ missing required function @1@'.format(uuidopt, uuidfunc)) - endif - cdata.set('HAVE_@0@'.format(uuidheader.underscorify().to_upper()), 1) + uuids = { +'e2fs': { + 'uuiddep': 'uuid', + 'uuidfunc': 'uuid_generate', + 'uuidheader': 
'uuid/uuid.h', +}, +'bsd': { + 'uuiddep': '', + 'uuidfunc': 'uuid_to_string', + 'uuidheader': 'uuid.h', +}, +'ossp': { + 'uuiddep': 'ossp-uuid', + 'uuidfunc': 'uuid_export', + 'uuidheader': 'uuid.h', +} + } - cdata.set('HAVE_UUID_@0@'.format(uuidname), 1, - description: 'De
Re: meson: Non-feature feature options
Hi, On Wed, 15 Mar 2023 at 11:12, Peter Eisentraut wrote: > > On 14.03.23 15:07, Nazir Bilal Yavuz wrote: > >> I think the uuid side of this is making this way too complicated. I'm > >> content leaving this as a manual option for now. > >> > >> There is much more value in making the ssl option work automatically. > >> So I would welcome a patch that just makes -Dssl=auto work smoothly, > >> perhaps using the "trick" that Andres described. > > I tried to implement what we did for ssl to uuid as well, do you have > > any comments? > > For the uuid option, we have three different choices. What should be > the search order and why? Docs [1] say that: OSSP uuid library is not well maintained, and is becoming increasingly difficult to port to newer platforms; so we can put 'uuid-ossp' to the end. Between 'uuid-e2fs' and 'uuid-bsd', I believe 'uuid-e2fs' is used more often than 'uuid-bsd'. Hence, they can be ordered as 'uuid-e2fs', 'uuid-bsd', 'uuid-ossp'. Does that make sense? Regards, Nazir Bilal Yavuz Microsoft [1] https://www.postgresql.org/docs/current/uuid-ossp.html
Re: Refactor calculations to use instr_time
Hi, Thanks for the review. On Fri, 17 Mar 2023 at 02:02, Melanie Plageman wrote: > I think you want one less L here? > WALLSTAT_ACC_INSTR_TIME_TYPE -> WALSTAT_ACC_INSTR_TIME_TYPE Done. > Also, I don't quite understand why TYPE is at the end of the name. I > think it would still be clear without it. Done. > I might find it clearer if the WALSTAT_ACC_INSTR_TIME_TYPE macro was > defined before using it for those fields instead of defining it right > after defining WALSTAT_ACC. Since it is undefined together with WALSTAT_ACC, defining them together makes sense to me. > > + * This struct stores wal-related durations as instr_time, which makes it > > + * easier to accumulate them without requiring type conversions. Then, > > + * during stats flush, they will be moved into shared stats with type > > + * conversions. > > + */ > > +typedef struct PgStat_PendingWalStats > > +{ > > + PgStat_Counter wal_buffers_full; > > + PgStat_Counter wal_write; > > + PgStat_Counter wal_sync; > > + instr_time wal_write_time; > > + instr_time wal_sync_time; > > +} PgStat_PendingWalStats; > > + > > So, I am not a fan of having this second struct (PgStat_PendingWalStats) > which only has a subset of the members of PgStat_WalStats. It is pretty > confusing. > > It is okay to have two structs -- one that is basically "in-memory" and > one that is a format that can be on disk, but these two structs with > different members are confusing and don't convey why we have the two > structs. > > I would either put WalUsage into PgStat_PendingWalStats (so that it has > all the same members as PgStat_WalStats), or figure out a way to > maintain WalUsage separately from PgStat_WalStats or something else. > Worst case, add more comments to the struct definitions to explain why > they have the members they have and how WalUsage relates to them. Yes, but like Andres said this could be better done as a separate patch. v6 is attached. Regards, Nazir Bilal Yavuz Microsoft From 02f1b5426ad9394a673c28a6df53fc56ff6ffa89 Mon Sep 17 00:00:00 2001 From: Nazir Bilal Yavuz Date: Wed, 22 Mar 2023 15:25:44 +0300 Subject: [PATCH v6] Refactor instr_time calculations In instr_time.h it is stated that: * When summing multiple measurements, it's recommended to leave the * running sum in instr_time form (ie, use INSTR_TIME_ADD or * INSTR_TIME_ACCUM_DIFF) and convert to a result format only at the end. So, this patch aims to refactor 'PendingWalStats' to use 'instr_time' instead of 'PgStat_Counter' while accumulating 'wal_write_time' and 'wal_sync_time'. Also, some calculations are refactored to use 'INSTR_TIME_ACCUM_DIFF' instead of using 'INSTR_TIME_SUBTRACT' and 'INSTR_TIME_ADD'. 
--- src/backend/access/transam/xlog.c | 6 ++--- src/backend/storage/file/buffile.c | 6 ++--- src/backend/utils/activity/pgstat_wal.c | 31 + src/include/pgstat.h| 17 +- src/tools/pgindent/typedefs.list| 1 + 5 files changed, 37 insertions(+), 24 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 543d4d897ae..46821ad6056 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -2206,8 +2206,7 @@ XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible) instr_time duration; INSTR_TIME_SET_CURRENT(duration); - INSTR_TIME_SUBTRACT(duration, start); - PendingWalStats.wal_write_time += INSTR_TIME_GET_MICROSEC(duration); + INSTR_TIME_ACCUM_DIFF(PendingWalStats.wal_write_time, duration, start); } PendingWalStats.wal_write++; @@ -8204,8 +8203,7 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli) instr_time duration; INSTR_TIME_SET_CURRENT(duration); - INSTR_TIME_SUBTRACT(duration, start); - PendingWalStats.wal_sync_time += INSTR_TIME_GET_MICROSEC(duration); + INSTR_TIME_ACCUM_DIFF(PendingWalStats.wal_sync_time, duration, start); } PendingWalStats.wal_sync++; diff --git a/src/backend/storage/file/buffile.c b/src/backend/storage/file/buffile.c index 0a51624df3b..e55f86b675e 100644 --- a/src/backend/storage/file/buffile.c +++ b/src/backend/storage/file/buffile.c @@ -469,8 +469,7 @@ BufFileLoadBuffer(BufFile *file) if (track_io_timing) { INSTR_TIME_SET_CURRENT(io_time); - INSTR_TIME_SUBTRACT(io_time, io_start); - INSTR_TIME_ADD(pgBufferUsage.temp_blk_read_time, io_time); + INSTR_TIME_ACCUM_DIFF(pgBufferUsage.temp_blk_read_time, io_time, io_start); } /* we choose not to advance curOffset here */ @@ -544,8 +543,7 @@ BufFileDumpBuffer(BufFile *file) if (track_io_timing) { INSTR_TIME_SET_CURRENT(io_time); - INSTR_TIME_SUBTRACT(io_time, io_start); - INSTR_T
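Spelled out, the flush-time macro pair after the renames agreed above looks like this (reconstructed from the v5 hunk quoted earlier in the thread, which is truncated there, plus the agreed renames; the second macro is the one that converts the instr_time accumulators to microseconds only while flushing into shared stats):

	#define WALSTAT_ACC(fld, var_to_add) \
		(stats_shmem->stats.fld += var_to_add.fld)
	#define WALSTAT_ACC_INSTR_TIME(fld) \
		(stats_shmem->stats.fld += INSTR_TIME_GET_MICROSEC(PendingWalStats.fld))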
Re: Change prefetch and read strategies to use range in pg_prewarm ... and raise a question about posix_fadvise WILLNEED
Hi, On Thu, 7 Mar 2024 at 15:26, Cédric Villemain wrote: > > On 07/03/2024 12:19, Nazir Bilal Yavuz wrote: > > > > I did not test read performance but I am planning to do that soon. I did not have time to check the other things you mentioned, but I tested the read performance. The table size is 5.5GB and I did 20 runs in total.

When the OS cache is cleared:
Master  -> Median: 2266.293ms, Avg: 2265.5038ms
Patched -> Median: 2166.493ms, Avg: 2183.6208ms

When the buffers are in the OS cache:
Master  -> Median: 585.719ms, Avg: 583.5032ms
Patched -> Median: 533.071ms, Avg: 532.7119ms

The patched version is better in both cases: ~4% faster when the OS cache is cleared and ~9% faster when the buffers are in the OS cache. -- Regards, Nazir Bilal Yavuz Microsoft
Re: Building with meson on NixOS/nixpkgs
Hi, Thank you for the patches! On Sat, 16 Mar 2024 at 14:48, Wolfgang Walther wrote: > > To build on NixOS/nixpkgs I came up with a few small patches to > meson.build. All of this works fine with Autoconf/Make already. I do not have NixOS, but I confirm that the patches apply cleanly to master and pass CI. I have some small feedback:

0001 & 0002: Adding code comments to explain why they have a fallback could be nice.
0003: Looks good to me.

-- Regards, Nazir Bilal Yavuz Microsoft
Re: Streaming I/O, vectored I/O (WIP)
Hi, On Sat, 16 Mar 2024 at 02:53, Thomas Munro wrote: > > I am planning to push the bufmgr.c patch soon. At that point the new > API won't have any direct callers yet, but the traditional > ReadBuffer() family of functions will internally reach > StartReadBuffers(nblocks=1) followed by WaitReadBuffers(), > ZeroBuffer() or nothing as appropriate. Any more thoughts or > objections? Naming, semantics, correctness of buffer protocol, > sufficiency of comments, something else?

+	if (StartReadBuffers(bmr,
+						 &buffer,
+						 forkNum,
+						 blockNum,
+						 &nblocks,
+						 strategy,
+						 flags,
+						 &operation))
+		WaitReadBuffers(&operation);

I think we need to call WaitReadBuffers() when 'mode != RBM_ZERO_AND_CLEANUP_LOCK && mode != RBM_ZERO_AND_LOCK', or am I missing something?

A couple of nitpicks:

It would be nice to explain what the PrepareReadBuffer function does with a comment.

+	if (nblocks == 0)
+		return;				/* nothing to do */

It is guaranteed that nblocks will be greater than 0. Can't we just use Assert(operation->io_buffers_len > 0); instead?

-- Regards, Nazir Bilal Yavuz Microsoft
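A minimal sketch of the control flow being suggested above (this is the reviewer's reading, not committed code; 'mode' is the caller's ReadBufferMode, and for the zero-and-lock modes the page contents are never read, so there is no I/O to wait for):

	if (StartReadBuffers(bmr, &buffer, forkNum, blockNum,
						 &nblocks, strategy, flags, &operation) &&
		mode != RBM_ZERO_AND_LOCK &&
		mode != RBM_ZERO_AND_CLEANUP_LOCK)
		WaitReadBuffers(&operation);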
Re: Use streaming read API in ANALYZE
Hi, On Wed, 28 Feb 2024 at 14:42, Nazir Bilal Yavuz wrote: > > > The new version of the streaming read API [1] is posted. I updated the > streaming read API changes patch (0001), using the streaming read API > in ANALYZE patch (0002) remains the same. This should make it easier > to review as it can be applied on top of master > > The new version of the streaming read API is posted [1]. I rebased the patch on top of master and v9 of the streaming read API. There is a minimal change in the 'using the streaming read API in ANALYZE' patch (0002): I changed STREAMING_READ_FULL to STREAMING_READ_MAINTENANCE to copy exactly the same behavior as before. Also, some benchmarking results: I created a 22 GB table and set the size of shared buffers to 30GB; the rest is left at the defaults.

Avg timings in ms:

╔════════════════════════════╦══════════╦══════════╦════════════╗
║                            ║  master  ║ patched  ║ percentage ║
╠════════════════════════════╬══════════╬══════════╬════════════╣
║ Both OS cache and          ║          ║          ║            ║
║ shared buffers are clear   ║ 513.9247 ║ 463.1019 ║    9.9%    ║
╠════════════════════════════╬══════════╬══════════╬════════════╣
║ OS cache is loaded but     ║          ║          ║            ║
║ shared buffers are clear   ║ 423.1097 ║ 354.3277 ║   16.3%    ║
╠════════════════════════════╬══════════╬══════════╬════════════╣
║ Shared buffers are loaded  ║  89.2846 ║  84.6952 ║    5.1%    ║
╚════════════════════════════╩══════════╩══════════╩════════════╝

Any kind of feedback would be appreciated. [1]: https://www.postgresql.org/message-id/CA%2BhUKGL-ONQnnnp-SONCFfLJzqcpAheuzZ%2B-yTrD9WBM-GmAcg%40mail.gmail.com -- Regards, Nazir Bilal Yavuz Microsoft From 7278c354d80e4d1f21c6fa0d810a723789f2d722 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Mon, 26 Feb 2024 23:48:31 +1300 Subject: [PATCH v3 1/2] Streaming read API changes that are not committed to master yet Discussion: https://www.postgresql.org/message-id/CA%2BhUKGJkOiOCa%2Bmag4BF%2BzHo7qo%3Do9CFheB8%3Dg6uT5TUm2gkvA%40mail.gmail.com --- src/include/storage/bufmgr.h | 45 +- src/include/storage/streaming_read.h | 50 ++ src/backend/storage/Makefile | 2 +- src/backend/storage/aio/Makefile | 14 + src/backend/storage/aio/meson.build | 5 + src/backend/storage/aio/streaming_read.c | 678 + src/backend/storage/buffer/bufmgr.c | 706 -- src/backend/storage/buffer/localbuf.c | 14 +- src/backend/storage/meson.build | 1 + src/backend/utils/misc/guc_tables.c | 14 + src/backend/utils/misc/postgresql.conf.sample | 1 + doc/src/sgml/config.sgml | 14 + src/tools/pgindent/typedefs.list | 2 + 13 files changed, 1309 insertions(+), 237 deletions(-) create mode 100644 src/include/storage/streaming_read.h create mode 100644 src/backend/storage/aio/Makefile create mode 100644 src/backend/storage/aio/meson.build create mode 100644 src/backend/storage/aio/streaming_read.c diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h index d51d46d3353..1cc198bde21 100644 --- a/src/include/storage/bufmgr.h +++ b/src/include/storage/bufmgr.h @@ -14,6 +14,7 @@ #ifndef BUFMGR_H #define BUFMGR_H +#include "port/pg_iovec.h" #include "storage/block.h" #include "storage/buf.h" #include "storage/bufpage.h" @@ -133,6 +134,10 @@ extern PGDLLIMPORT bool track_io_timing; extern PGDLLIMPORT int effective_io_concurrency; extern PGDLLIMPORT int maintenance_io_concurrency; +#define MAX_BUFFER_IO_SIZE PG_IOV_MAX +#define DEFAULT_BUFFER_IO_SIZE Min(MAX_BUFFER_IO_SIZE, (128 * 1024) / BLCKSZ) +extern PGDLLIMPORT int buffer_io_size; + extern PGDLLIMPORT int checkpoint_flush_after; extern PGDLLIMPORT int backend_flush_after; extern PGDLLIMPORT int bgwriter_flush_after; @@ -158,7 +163,6 @@ extern PGDLLIMPORT int32 *LocalRefCount; #define BUFFER_LOCK_SHARE 1 #define BUFFER_LOCK_EXCLUSIVE 2 - /* * prototypes for functions in bufmgr.c */ @@ -177,6 +181,42 @@ extern Buffer
ReadBufferWithoutRelcache(RelFileLocator rlocator, ForkNumber forkNum, BlockNumber blockNum, ReadBufferMode mode, BufferAccessStrategy strategy, bool permanent); + +#define READ_BUFFERS_ZERO_ON_ERROR 0x01 +#define READ_BUFFERS_ISSUE_ADVICE 0x02 + +/* + * Private state used by StartReadBuffers() and WaitReadBuffers(). Declared + * in public header only to allow inclusion in other structs, but contents + * should not be accessed. + */ +struct ReadBuffersOperation +{ + /* Parameters passed in to StartReadBuffers(). */ + BufferManagerRelation bmr; + Buffer
Re: Building with meson on NixOS/nixpkgs
Hi, From your prior reply: On Thu, 21 Mar 2024 at 23:44, Wolfgang Walther wrote: > > Nazir Bilal Yavuz: > > 0001 & 0002: Adding code comments to explain why they have > > a fallback could be nice. > > 0003: Looks good to me. > > Added some comments in the attached. Comments look good, thanks. On Fri, 29 Mar 2024 at 21:48, wrote: > > In v3, I added another small patch for meson, this one about proper > handling of -Dlibedit_preferred when used together with -Dreadline=enabled. You are right. I confirm the bug and your proposed patch fixes this. -- Regards, Nazir Bilal Yavuz Microsoft
Re: Use streaming read API in ANALYZE
Hi, Thanks for the review! On Wed, 27 Mar 2024 at 23:15, Melanie Plageman wrote: > > On Tue, Mar 26, 2024 at 02:51:27PM +0300, Nazir Bilal Yavuz wrote: > > Hi, > > > > On Wed, 28 Feb 2024 at 14:42, Nazir Bilal Yavuz wrote: > > > > > > > > > The new version of the streaming read API [1] is posted. I updated the > > > streaming read API changes patch (0001), using the streaming read API > > > in ANALYZE patch (0002) remains the same. This should make it easier > > > to review as it can be applied on top of master > > > > > > > > > > The new version of the streaming read API is posted [1]. I rebased the > > patch on top of master and v9 of the streaming read API. > > > > There is a minimal change in the 'using the streaming read API in ANALYZE > > patch (0002)', I changed STREAMING_READ_FULL to STREAMING_READ_MAINTENANCE > > to copy exactly the same behavior as before. Also, some benchmarking > > results: > > > > I created a 22 GB table and set the size of shared buffers to 30GB, the > > rest is default. > > > > ╔═══╦═╦╗ > > ║ ║ Avg Timings in ms ║║ > > ╠═══╬══╦══╬╣ > > ║ ║ master ║ patched ║ percentage ║ > > ╠═══╬══╬══╬╣ > > ║ Both OS cache and ║ ║ ║║ > > ║ shared buffers are clear ║ 513.9247 ║ 463.1019 ║%9.9║ > > ╠═══╬══╬══╬╣ > > ║ OS cache is loaded but ║ ║ ║║ > > ║ shared buffers are clear ║ 423.1097 ║ 354.3277 ║%16.3 ║ > > ╠═══╬══╬══╬╣ > > ║ Shared buffers are loaded ║ ║ ║║ > > ║ ║ 89.2846 ║ 84.6952 ║%5.1║ > > ╚═══╩══╩══╩╝ > > > > Any kind of feedback would be appreciated. > > Thanks for working on this! > > A general review comment: I noticed you have the old streaming read > (pgsr) naming still in a few places (including comments) -- so I would > just make sure and update everywhere when you rebase in Thomas' latest > version of the read stream API. Done. > > > From c7500cc1b9068ff0b704181442999cd8bed58658 Mon Sep 17 00:00:00 2001 > > From: Nazir Bilal Yavuz > > Date: Mon, 19 Feb 2024 14:30:47 +0300 > > Subject: [PATCH v3 2/2] Use streaming read API in ANALYZE > > > > --- a/src/backend/commands/analyze.c > > +++ b/src/backend/commands/analyze.c > > @@ -1102,6 +1102,26 @@ examine_attribute(Relation onerel, int attnum, Node > > *index_expr) > > return stats; > > } > > > > +/* > > + * Prefetch callback function to get next block number while using > > + * BlockSampling algorithm > > + */ > > +static BlockNumber > > +pg_block_sampling_streaming_read_next(StreamingRead *stream, > > + > > void *user_data, > > + > > void *per_buffer_data) > > I don't think you need the pg_ prefix Done. > > > +{ > > + BlockSamplerData *bs = user_data; > > + BlockNumber *current_block = per_buffer_data; > > Why can't you just do BufferGetBlockNumber() on the buffer returned from > the read stream API instead of allocating per_buffer_data for the block > number? Done. > > > + > > + if (BlockSampler_HasMore(bs)) > > + *current_block = BlockSampler_Next(bs); > > + else > > + *current_block = InvalidBlockNumber; > > + > > + return *current_block; > > > I think we'd like to keep the read stream code in heapam-specific code. > Instead of doing streaming_read_buffer_begin() here, you could put this > in heap_beginscan() or initscan() guarded by > scan->rs_base.rs_flags & SO_TYPE_ANALYZE In the recent changes [1], heapam_scan_analyze_next_[block | tuple] are removed from tableam. They are directly called from heapam-specific code now. So, IMO, no need to do this now. v4 is rebased on top of v14 streaming read API changes. 
[1] 27bc1772fc814946918a5ac8ccb9b5c5ad0380aa -- Regards, Nazir Bilal Yavuz Microsoft From 725a3d875fb1d675f5d99d8602a87b5e47b765db Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Mon, 26 Feb 2024 23:48:31 +1300 Subjec
Re: Use streaming read API in ANALYZE
Hi, On Tue, 2 Apr 2024 at 10:23, Nazir Bilal Yavuz wrote: > > v4 is rebased on top of v14 streaming read API changes. Streaming API has been committed but the committed version has a minor change, the read_stream_begin_relation function takes Relation instead of BufferManagerRelation now. So, here is a v5 which addresses this change. -- Regards, Nazir Bilal Yavuz Microsoft From 43e2f2b32e2fdb7e1fd787b1d8595768741f4792 Mon Sep 17 00:00:00 2001 From: Nazir Bilal Yavuz Date: Wed, 3 Apr 2024 01:22:59 +0300 Subject: [PATCH v5] Use streaming read API in ANALYZE ANALYZE command gets random tuples using BlockSampler algorithm. Use streaming reads to get these tuples by using BlockSampler algorithm in streaming read API prefetch logic. --- src/include/access/heapam.h | 4 +- src/backend/access/heap/heapam_handler.c | 16 ++--- src/backend/commands/analyze.c | 85 3 files changed, 36 insertions(+), 69 deletions(-) diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h index b632fe953c4..4e35caeb42e 100644 --- a/src/include/access/heapam.h +++ b/src/include/access/heapam.h @@ -25,6 +25,7 @@ #include "storage/bufpage.h" #include "storage/dsm.h" #include "storage/lockdefs.h" +#include "storage/read_stream.h" #include "storage/shm_toc.h" #include "utils/relcache.h" #include "utils/snapshot.h" @@ -374,8 +375,7 @@ extern bool HeapTupleIsSurelyDead(HeapTuple htup, /* in heap/heapam_handler.c*/ extern void heapam_scan_analyze_next_block(TableScanDesc scan, - BlockNumber blockno, - BufferAccessStrategy bstrategy); + ReadStream *stream); extern bool heapam_scan_analyze_next_tuple(TableScanDesc scan, TransactionId OldestXmin, double *liverows, double *deadrows, diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c index c86000d245b..0533d9660c4 100644 --- a/src/backend/access/heap/heapam_handler.c +++ b/src/backend/access/heap/heapam_handler.c @@ -1054,16 +1054,15 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap, } /* - * Prepare to analyze block `blockno` of `scan`. The scan has been started - * with SO_TYPE_ANALYZE option. + * Prepare to analyze block returned from streaming object. The scan has been + * started with SO_TYPE_ANALYZE option. * * This routine holds a buffer pin and lock on the heap page. They are held * until heapam_scan_analyze_next_tuple() returns false. That is until all the * items of the heap page are analyzed. */ void -heapam_scan_analyze_next_block(TableScanDesc scan, BlockNumber blockno, - BufferAccessStrategy bstrategy) +heapam_scan_analyze_next_block(TableScanDesc scan, ReadStream *stream) { HeapScanDesc hscan = (HeapScanDesc) scan; @@ -1076,11 +1075,12 @@ heapam_scan_analyze_next_block(TableScanDesc scan, BlockNumber blockno, * doing much work per tuple, the extra lock traffic is probably better * avoided. 
*/ - hscan->rs_cblock = blockno; - hscan->rs_cindex = FirstOffsetNumber; - hscan->rs_cbuf = ReadBufferExtended(scan->rs_rd, MAIN_FORKNUM, - blockno, RBM_NORMAL, bstrategy); + hscan->rs_cbuf = read_stream_next_buffer(stream, NULL); + Assert(BufferIsValid(hscan->rs_cbuf)); LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_SHARE); + + hscan->rs_cblock = BufferGetBlockNumber(hscan->rs_cbuf); + hscan->rs_cindex = FirstOffsetNumber; } /* diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c index 2fb39f3ede1..105285c3ea2 100644 --- a/src/backend/commands/analyze.c +++ b/src/backend/commands/analyze.c @@ -1102,6 +1102,20 @@ examine_attribute(Relation onerel, int attnum, Node *index_expr) return stats; } +/* + * Prefetch callback function to get next block number while using + * BlockSampling algorithm + */ +static BlockNumber +block_sampling_streaming_read_next(ReadStream *stream, + void *user_data, + void *per_buffer_data) +{ + BlockSamplerData *bs = user_data; + + return BlockSampler_HasMore(bs) ? BlockSampler_Next(bs) : InvalidBlockNumber; +} + /* * acquire_sample_rows -- acquire a random sample of rows from the heap * @@ -1154,10 +1168,7 @@ acquire_sample_rows(Relation onerel, int elevel, TableScanDesc scan; BlockNumber nblocks; BlockNumber blksdone = 0; -#ifdef USE_PREFETCH - int prefetch_maximum = 0; /* blocks to prefetch if enabled */ - BlockSamplerData prefetch_bs; -#endif + ReadStream *stream; Assert(targrows > 0); @@ -1170,13 +1181,6 @@ acquire_sample_rows(Relation onerel, int elevel, randseed = pg_prng_uint32(&pg_global_prng_state); nblocks = BlockSampler_Init(&bs, totalblocks, targrows, randseed); -#ifdef USE_PREFETCH - prefetch_maximum = get_tablespace_maintenance_io_concurrency(onerel->rd_rel->reltablespace); - /* Create another BlockSam
Re: Add missing error codes to PANIC/FATAL error reports in xlog.c and relcache.c
Hi, On Wed, 3 Apr 2024 at 12:11, Daniel Gustafsson wrote: > > > On 6 Mar 2024, at 09:59, Daniel Gustafsson wrote: > > > Nothing sticks out from reading through these patches, they seem quite > > ready to > > me. > > Took another look at this today and committed it. Thanks! Thanks for the commit! -- Regards, Nazir Bilal Yavuz Microsoft
Re: Use streaming read API in ANALYZE
Hi Jakub, Thank you for looking into this and doing a performance analysis. On Wed, 3 Apr 2024 at 11:42, Jakub Wartak wrote: > > On Tue, Apr 2, 2024 at 9:24 AM Nazir Bilal Yavuz wrote: > [..] > > v4 is rebased on top of v14 streaming read API changes. > > Hi Nazir, so with streaming API committed, I gave a try to this patch. > With autovacuum=off and 30GB table on NVMe (with standard readahead of > 256kb and ext4, Debian 12, kernel 6.1.0, shared_buffers = 128MB > default) created using: create table t as select repeat('a', 100) || i > || repeat('b', 500) as filler from generate_series(1, 4500) as i; > > on master, effect of mainteance_io_concurency [default 10] is like > that (when resetting the fs cache after each ANALYZE): > > m_io_c = 0: > Time: 3137.914 ms (00:03.138) > Time: 3094.540 ms (00:03.095) > Time: 3452.513 ms (00:03.453) > > m_io_c = 1: > Time: 2972.751 ms (00:02.973) > Time: 2939.551 ms (00:02.940) > Time: 2904.428 ms (00:02.904) > > m_io_c = 2: > Time: 1580.260 ms (00:01.580) > Time: 1572.132 ms (00:01.572) > Time: 1558.334 ms (00:01.558) > > m_io_c = 4: > Time: 938.304 ms > Time: 931.772 ms > Time: 920.044 ms > > m_io_c = 8: > Time: 666.025 ms > Time: 660.241 ms > Time: 648.848 ms > > m_io_c = 16: > Time: 542.450 ms > Time: 561.155 ms > Time: 539.683 ms > > m_io_c = 32: > Time: 538.487 ms > Time: 541.705 ms > Time: 538.101 ms > > with patch applied: > > m_io_c = 0: > Time: 3106.469 ms (00:03.106) > Time: 3140.343 ms (00:03.140) > Time: 3044.133 ms (00:03.044) > > m_io_c = 1: > Time: 2959.817 ms (00:02.960) > Time: 2920.265 ms (00:02.920) > Time: 2911.745 ms (00:02.912) > > m_io_c = 2: > Time: 1581.912 ms (00:01.582) > Time: 1561.444 ms (00:01.561) > Time: 1558.251 ms (00:01.558) > > m_io_c = 4: > Time: 908.116 ms > Time: 901.245 ms > Time: 901.071 ms > > m_io_c = 8: > Time: 619.870 ms > Time: 620.327 ms > Time: 614.266 ms > > m_io_c = 16: > Time: 529.885 ms > Time: 526.958 ms > Time: 528.474 ms > > m_io_c = 32: > Time: 521.185 ms > Time: 520.713 ms > Time: 517.729 ms > > No difference to me, which seems to be good. I've double checked and > patch used the new way > > acquire_sample_rows -> heapam_scan_analyze_next_block -> > ReadBufferExtended -> ReadBuffer_common (inlined) -> WaitReadBuffers > -> mdreadv -> FileReadV -> pg_preadv (inlined) > acquire_sample_rows -> heapam_scan_analyze_next_block -> > ReadBufferExtended -> ReadBuffer_common (inlined) -> StartReadBuffer > -> ... > > I gave also io_combine_limit to 32 (max, 256kB) a try and got those > slightly better results: > > [..] > m_io_c = 16: > Time: 494.599 ms > Time: 496.345 ms > Time: 973.500 ms > > m_io_c = 32: > Time: 461.031 ms > Time: 449.037 ms > Time: 443.375 ms > > and that (last one) apparently was able to push it to ~50-60k still > random IOPS range, the rareq-sz was still ~8 (9.9) kB as analyze was > still reading random , so I assume no merging was done: > > Devicer/s rMB/s rrqm/s %rrqm r_await rareq-sz > w/s wMB/s wrqm/s %wrqm w_await wareq-sz d/s dMB/s > drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util > nvme0n1 61212.00591.82 0.00 0.000.10 9.90 > 2.00 0.02 0.00 0.000.0012.000.00 0.00 > 0.00 0.000.00 0.000.000.006.28 85.20 > > So in short it looks good to me. My results are similar to yours, also I realized a bug while working on your benchmarking cases. I will share the cause and the fix soon. -- Regards, Nazir Bilal Yavuz Microsoft
Re: Use streaming read API in ANALYZE
Hi, Thank you for looking into this! On Wed, 3 Apr 2024 at 20:17, Heikki Linnakangas wrote: > > On 03/04/2024 13:31, Nazir Bilal Yavuz wrote: > > Streaming API has been committed but the committed version has a minor > > change, the read_stream_begin_relation function takes Relation instead > > of BufferManagerRelation now. So, here is a v5 which addresses this > > change. > > I'm getting a repeatable segfault / assertion failure with this: > > postgres=# CREATE TABLE tengiga (i int, filler text) with (fillfactor=10); > CREATE TABLE > postgres=# insert into tengiga select g, repeat('x', 900) from > generate_series(1, 140) g; > INSERT 0 140 > postgres=# set default_statistics_target = 10; ANALYZE tengiga; > SET > ANALYZE > postgres=# set default_statistics_target = 100; ANALYZE tengiga; > SET > ANALYZE > postgres=# set default_statistics_target =1000; ANALYZE tengiga; > SET > server closed the connection unexpectedly > This probably means the server terminated abnormally > before or while processing the request. > > TRAP: failed Assert("BufferIsValid(hscan->rs_cbuf)"), File: > "heapam_handler.c", Line: 1079, PID: 262232 > postgres: heikki postgres [local] > ANALYZE(ExceptionalCondition+0xa8)[0x56488a0de9d8] > postgres: heikki postgres [local] > ANALYZE(heapam_scan_analyze_next_block+0x63)[0x5648899ece34] > postgres: heikki postgres [local] ANALYZE(+0x2d3f34)[0x564889b6af34] > postgres: heikki postgres [local] ANALYZE(+0x2d2a3a)[0x564889b69a3a] > postgres: heikki postgres [local] ANALYZE(analyze_rel+0x33e)[0x564889b68fa9] > postgres: heikki postgres [local] ANALYZE(vacuum+0x4b3)[0x564889c2dcc0] > postgres: heikki postgres [local] ANALYZE(ExecVacuum+0xd6f)[0x564889c2d7fe] > postgres: heikki postgres [local] > ANALYZE(standard_ProcessUtility+0x901)[0x564889f0b8b9] > postgres: heikki postgres [local] > ANALYZE(ProcessUtility+0x136)[0x564889f0afb1] > postgres: heikki postgres [local] ANALYZE(+0x6728c8)[0x564889f098c8] > postgres: heikki postgres [local] ANALYZE(+0x672b3b)[0x564889f09b3b] > postgres: heikki postgres [local] ANALYZE(PortalRun+0x320)[0x564889f09015] > postgres: heikki postgres [local] ANALYZE(+0x66b2c6)[0x564889f022c6] > postgres: heikki postgres [local] > ANALYZE(PostgresMain+0x80c)[0x564889f06fd7] > postgres: heikki postgres [local] ANALYZE(+0x667876)[0x564889efe876] > postgres: heikki postgres [local] > ANALYZE(postmaster_child_launch+0xe6)[0x564889e1f4b3] > postgres: heikki postgres [local] ANALYZE(+0x58e68e)[0x564889e2568e] > postgres: heikki postgres [local] ANALYZE(+0x58b7f0)[0x564889e227f0] > postgres: heikki postgres [local] > ANALYZE(PostmasterMain+0x152b)[0x564889e2214d] > postgres: heikki postgres [local] ANALYZE(+0xb4)[0x564889cdb4b4] > /lib/x86_64-linux-gnu/libc.so.6(+0x2724a)[0x7f7d83b6724a] > /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85)[0x7f7d83b67305] > postgres: heikki postgres [local] ANALYZE(_start+0x21)[0x564889971a61] > 2024-04-03 20:15:49.157 EEST [262101] LOG: server process (PID 262232) > was terminated by signal 6: Aborted I realized the same error while working on Jakub's benchmarking results. Cause: I was using the nblocks variable to check how many blocks will be returned from the streaming API. But I realized that sometimes the number returned from BlockSampler_Init() is not equal to the number of blocks that BlockSampler_Next() will return as BlockSampling algorithm decides how many blocks to return on the fly by using some random seeds. 
There are a couple of solutions I thought of:

1- Use BlockSampler_HasMore() instead of nblocks in the main loop in acquire_sample_rows(): The streaming API uses this function to prefetch block numbers, so BlockSampler_HasMore() will reach the end first; it will start to return false while there are still buffers left to return from the streaming API. That would cause some buffers at the end to not be processed.

2- Expose something (a function, variable, etc.) from the streaming API to indicate whether the read is finished and there is no buffer left to return: I think this would work, but I am not sure if the streaming API allows something like that.

3- Check every buffer returned from the streaming API and stop the main loop in acquire_sample_rows() when an invalid buffer is returned: This solves the problem, but there will be two if checks for each buffer returned: - in heapam_scan_analyze_next_block(), to check whether the returned buffer is invalid - in acquire_sample_rows(), to break the main loop if heapam_scan_analyze_next_block() returns false One of the if checks can be avoided by moving heapam_scan_analyze_next_block()'s code into the main loop in acquire_sample_rows().

I implemented the third solution; here is v6. A sketch of the resulting loop follows below. -- Regards, Nazir Bilal Yavuz Microsoft
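A minimal sketch of how solution 3 shapes the sampling loop (illustrative only, reconstructed from the description above; the real v6 code also locks the page and feeds the sampled tuples to the reservoir):

	for (;;)
	{
		Buffer		buf = read_stream_next_buffer(stream, NULL);

		if (!BufferIsValid(buf))
			break;				/* the stream is exhausted; sampling is done */

		/* ... lock buf, sample the tuples on the page, release it ... */
	}

With this shape, read_stream_next_buffer() returning InvalidBuffer is what signals the end, so the consumer side does not need to track a block count at all.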
Re: Use streaming read API in ANALYZE
Hi, On Wed, 3 Apr 2024 at 23:44, Melanie Plageman wrote: > > > I've reviewed the patches inline below and attached a patch that has > some of my ideas on top of your patch. Thank you! > > > From 8d396a42186325f920d5a05e7092d8e1b66f3cdf Mon Sep 17 00:00:00 2001 > > From: Nazir Bilal Yavuz > > Date: Wed, 3 Apr 2024 15:14:15 +0300 > > Subject: [PATCH v6] Use streaming read API in ANALYZE > > > > ANALYZE command gets random tuples using BlockSampler algorithm. Use > > streaming reads to get these tuples by using BlockSampler algorithm in > > streaming read API prefetch logic. > > --- > > src/include/access/heapam.h | 6 +- > > src/backend/access/heap/heapam_handler.c | 22 +++--- > > src/backend/commands/analyze.c | 85 > > 3 files changed, 42 insertions(+), 71 deletions(-) > > > > diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h > > index a307fb5f245..633caee9d95 100644 > > --- a/src/include/access/heapam.h > > +++ b/src/include/access/heapam.h > > @@ -25,6 +25,7 @@ > > #include "storage/bufpage.h" > > #include "storage/dsm.h" > > #include "storage/lockdefs.h" > > +#include "storage/read_stream.h" > > #include "storage/shm_toc.h" > > #include "utils/relcache.h" > > #include "utils/snapshot.h" > > @@ -388,9 +389,8 @@ extern bool HeapTupleIsSurelyDead(HeapTuple htup, > > struct > > GlobalVisState *vistest); > > > > /* in heap/heapam_handler.c*/ > > -extern void heapam_scan_analyze_next_block(TableScanDesc scan, > > - > > BlockNumber blockno, > > - > > BufferAccessStrategy bstrategy); > > +extern bool heapam_scan_analyze_next_block(TableScanDesc scan, > > + > > ReadStream *stream); > > extern bool heapam_scan_analyze_next_tuple(TableScanDesc scan, > > > > TransactionId OldestXmin, > > > > double *liverows, double *deadrows, > > diff --git a/src/backend/access/heap/heapam_handler.c > > b/src/backend/access/heap/heapam_handler.c > > index 0952d4a98eb..d83fbbe6af3 100644 > > --- a/src/backend/access/heap/heapam_handler.c > > +++ b/src/backend/access/heap/heapam_handler.c > > @@ -1054,16 +1054,16 @@ heapam_relation_copy_for_cluster(Relation OldHeap, > > Relation NewHeap, > > } > > > > /* > > - * Prepare to analyze block `blockno` of `scan`. The scan has been started > > - * with SO_TYPE_ANALYZE option. > > + * Prepare to analyze block returned from streaming object. If the block > > returned > > + * from streaming object is valid, true is returned; otherwise false is > > returned. > > + * The scan has been started with SO_TYPE_ANALYZE option. > > * > > * This routine holds a buffer pin and lock on the heap page. They are > > held > > * until heapam_scan_analyze_next_tuple() returns false. That is until > > all the > > * items of the heap page are analyzed. > > */ > > -void > > -heapam_scan_analyze_next_block(TableScanDesc scan, BlockNumber blockno, > > - > > BufferAccessStrategy bstrategy) > > +bool > > +heapam_scan_analyze_next_block(TableScanDesc scan, ReadStream *stream) > > { > > HeapScanDesc hscan = (HeapScanDesc) scan; > > > > @@ -1076,11 +1076,15 @@ heapam_scan_analyze_next_block(TableScanDesc scan, > > BlockNumber blockno, > >* doing much work per tuple, the extra lock traffic is probably > > better > >* avoided. > > Personally I think heapam_scan_analyze_next_block() should be inlined. > It only has a few lines. I would find it clearer inline. At the least, > there is no reason for it (or heapam_scan_analyze_next_tuple()) to take > a TableScanDesc instead of a HeapScanDesc. I agree. 
> > >*/ > > - hscan->rs_cblock = blockno; > > - hscan->rs_cindex = FirstOffsetNumber; > > - hscan->rs_cbuf = ReadBufferExtended(scan->rs_rd, MAIN_FORKNUM, > > -
Re: Change prefetch and read strategies to use range in pg_prewarm ... and raise a question about posix_fadvise WILLNEED
Hi Andrey, On Sun, 7 Apr 2024 at 08:29, Andrey M. Borodin wrote: > > > > > On 15 Mar 2024, at 17:12, Nazir Bilal Yavuz wrote: > > > > I did not have the time to check other things you mentioned but I > > tested the read performance. The table size is 5.5GB, I did 20 runs in > > total. > > Hi Nazir! > > Do you plan to review anything else? Or do you think it worth to look at by > someone else? Or is the patch Ready for Committer? If so, please swap CF > entry [0] to status accordingly, currently it's "Waiting on Author". Thanks for reminding me! I think this needs review by someone else (especially the prefetch part) so I changed it to 'Needs review'. -- Regards, Nazir Bilal Yavuz Microsoft
Re: Streaming I/O, vectored I/O (WIP)
Hi, On Tue, 2 Apr 2024 at 11:40, Thomas Munro wrote: > > I had been planning to commit v14 this morning but got cold feet with > the BMR-based interface. Heikki didn't like it much, and in the end, > neither did I. I have now removed it, and it seems much better. No > other significant changes, just parameter types and inlining details. > For example: > > * read_stream_begin_relation() now takes a Relation, likes its name says > * StartReadBuffers()'s operation takes smgr and optional rel > * ReadBuffer_common() takes smgr and optional rel Read stream objects can be created only using Relations now. There could be read stream users which do not have a Relation but SMgrRelations. So, I created another constructor for the read streams which use SMgrRelations instead of Relations. Related patch is attached. -- Regards, Nazir Bilal Yavuz Microsoft From 38b57ec7e54a54a0c7b0117a0ecaaf68c643e1b0 Mon Sep 17 00:00:00 2001 From: Nazir Bilal Yavuz Date: Sun, 7 Apr 2024 20:16:00 +0300 Subject: [PATCH] Add a way to create read stream object by using SMgrRelation Currently read stream object can be created only by using Relation, there is no way to create it by using SMgrRelation. To achieve that, read_stream_begin_impl() function is created and contents of the read_stream_begin_relation() is moved into this function. Both read_stream_begin_relation() and read_stream_begin_sgmr_relation() calls read_stream_begin_impl() by Relation and SMgrRelation respectively. --- src/include/storage/read_stream.h | 8 +++ src/backend/storage/aio/read_stream.c | 74 ++- 2 files changed, 69 insertions(+), 13 deletions(-) diff --git a/src/include/storage/read_stream.h b/src/include/storage/read_stream.h index fae09d2b4cc..601a7fcf92b 100644 --- a/src/include/storage/read_stream.h +++ b/src/include/storage/read_stream.h @@ -15,6 +15,7 @@ #define READ_STREAM_H #include "storage/bufmgr.h" +#include "storage/smgr.h" /* Default tuning, reasonable for many users. */ #define READ_STREAM_DEFAULT 0x00 @@ -56,6 +57,13 @@ extern ReadStream *read_stream_begin_relation(int flags, ReadStreamBlockNumberCB callback, void *callback_private_data, size_t per_buffer_data_size); +extern ReadStream *read_stream_begin_sgmr_relation(int flags, + BufferAccessStrategy strategy, + SMgrRelation smgr, + ForkNumber forknum, + ReadStreamBlockNumberCB callback, + void *callback_private_data, + size_t per_buffer_data_size); extern Buffer read_stream_next_buffer(ReadStream *stream, void **per_buffer_private); extern void read_stream_reset(ReadStream *stream); extern void read_stream_end(ReadStream *stream); diff --git a/src/backend/storage/aio/read_stream.c b/src/backend/storage/aio/read_stream.c index f54dacdd914..a9a3b0de6c9 100644 --- a/src/backend/storage/aio/read_stream.c +++ b/src/backend/storage/aio/read_stream.c @@ -406,14 +406,15 @@ read_stream_look_ahead(ReadStream *stream, bool suppress_advice) * write extra data for each block into the space provided to it. It will * also receive callback_private_data for its own purposes. 
*/ -ReadStream * -read_stream_begin_relation(int flags, - BufferAccessStrategy strategy, - Relation rel, - ForkNumber forknum, - ReadStreamBlockNumberCB callback, - void *callback_private_data, - size_t per_buffer_data_size) +static ReadStream * +read_stream_begin_impl(int flags, + BufferAccessStrategy strategy, + Relation rel, + SMgrRelation smgr, + ForkNumber forknum, + ReadStreamBlockNumberCB callback, + void *callback_private_data, + size_t per_buffer_data_size) { ReadStream *stream; size_t size; @@ -422,9 +423,6 @@ read_stream_begin_relation(int flags, int strategy_pin_limit; uint32 max_pinned_buffers; Oid tablespace_id; - SMgrRelation smgr; - - smgr = RelationGetSmgr(rel); /* * Decide how many I/Os we will allow to run at the same time. That @@ -434,7 +432,7 @@ read_stream_begin_relation(int flags, */ tablespace_id = smgr->smgr_rlocator.locator.spcOid; if (!OidIsValid(MyDatabaseId) || - IsCatalogRelation(rel) || + (rel && IsCatalogRelation(rel)) || IsCatalogRelationOid(smgr->smgr_rlocator.locator.relNumber)) { /* @@ -548,7 +546,7 @@ read_stream_begin_relation(int flags, for (int i = 0; i < max_ios; ++i) { stream->ios[i].op.rel = rel; - stream->ios[i].op.smgr = RelationGetSmgr(rel); + stream->ios[i].op.smgr = smgr; stream->ios[i].op.smgr_persistence = 0; stream->ios[i].op.forknum = forknum; stream->ios[i].op.strategy = strategy; @@ -557,6 +555,56 @@ read_stream_begin_relation(int flags, return stream; } +/* + * Create a new read stream for reading a relation. + * See read_stream_begi
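For context, a hypothetical caller that has only an SMgrRelation (for example, a recovery-side code path) would use the new constructor roughly as below. This is a sketch against the signature shown in the patch above; the callback and its state are made-up names, and the 'sgmr' spelling is corrected to 'smgr' later in the thread:

	ReadStream *stream;

	stream = read_stream_begin_sgmr_relation(READ_STREAM_DEFAULT,
											 NULL,	/* no buffer access strategy */
											 smgr,
											 MAIN_FORKNUM,
											 next_block_cb,	/* a ReadStreamBlockNumberCB */
											 &cb_state,
											 0);	/* no per-buffer data */

	for (;;)
	{
		Buffer		buf = read_stream_next_buffer(stream, NULL);

		if (!BufferIsValid(buf))
			break;				/* callback returned InvalidBlockNumber */
		/* ... use the pinned buffer ... */
		ReleaseBuffer(buf);
	}
	read_stream_end(stream);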
Re: Streaming I/O, vectored I/O (WIP)
Hi, On Sun, 7 Apr 2024 at 20:33, Nazir Bilal Yavuz wrote: > > Hi, > > On Tue, 2 Apr 2024 at 11:40, Thomas Munro wrote: > > > > I had been planning to commit v14 this morning but got cold feet with > > the BMR-based interface. Heikki didn't like it much, and in the end, > > neither did I. I have now removed it, and it seems much better. No > > other significant changes, just parameter types and inlining details. > > For example: > > > > * read_stream_begin_relation() now takes a Relation, likes its name says > > * StartReadBuffers()'s operation takes smgr and optional rel > > * ReadBuffer_common() takes smgr and optional rel > > Read stream objects can be created only using Relations now. There > could be read stream users which do not have a Relation but > SMgrRelations. So, I created another constructor for the read streams > which use SMgrRelations instead of Relations. Related patch is > attached. After sending this, I realized that I forgot to add the persistence value to the new constructor. While working on it I also realized that the current code sets persistence in the PinBufferForBlock() function, and this function is called for each block, which can be costly. So, I moved setting the persistence out of the PinBufferForBlock() function. Setting persistence outside of the PinBufferForBlock() function (0001) and creating the new constructor that uses SMgrRelations (0002) are attached. -- Regards, Nazir Bilal Yavuz Microsoft From 04fd860ce8c4c57830930cb362799fd155c92613 Mon Sep 17 00:00:00 2001 From: Nazir Bilal Yavuz Date: Sun, 7 Apr 2024 22:33:36 +0300 Subject: [PATCH 1/2] Refactor PinBufferForBlock() to remove if checks about persistence There are if checks in PinBufferForBlock() function to set persistence of the relation and this function is called for the each block in the relation. Instead of that, set persistence of the relation before PinBufferForBlock() function. --- src/backend/storage/aio/read_stream.c | 2 +- src/backend/storage/buffer/bufmgr.c | 31 +++ 2 files changed, 14 insertions(+), 19 deletions(-) diff --git a/src/backend/storage/aio/read_stream.c b/src/backend/storage/aio/read_stream.c index f54dacdd914..d155dde5ce3 100644 --- a/src/backend/storage/aio/read_stream.c +++ b/src/backend/storage/aio/read_stream.c @@ -549,7 +549,7 @@ read_stream_begin_relation(int flags, { stream->ios[i].op.rel = rel; stream->ios[i].op.smgr = RelationGetSmgr(rel); - stream->ios[i].op.smgr_persistence = 0; + stream->ios[i].op.smgr_persistence = rel->rd_rel->relpersistence; stream->ios[i].op.forknum = forknum; stream->ios[i].op.strategy = strategy; } diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c index 06e9ffd2b00..b4fcefed78a 100644 --- a/src/backend/storage/buffer/bufmgr.c +++ b/src/backend/storage/buffer/bufmgr.c @@ -1067,24 +1067,10 @@ PinBufferForBlock(Relation rel, BufferDesc *bufHdr; IOContext io_context; IOObject io_object; - char persistence; Assert(blockNum != P_NEW); - /* - * If there is no Relation it usually implies recovery and thus permanent, - * but we take an argmument because CreateAndCopyRelationData can reach us - * with only an SMgrRelation for an unlogged relation that we don't want - * to flag with BM_PERMANENT.
- */ - if (rel) - persistence = rel->rd_rel->relpersistence; - else if (smgr_persistence == 0) - persistence = RELPERSISTENCE_PERMANENT; - else - persistence = smgr_persistence; - - if (persistence == RELPERSISTENCE_TEMP) + if (smgr_persistence == RELPERSISTENCE_TEMP) { io_context = IOCONTEXT_NORMAL; io_object = IOOBJECT_TEMP_RELATION; @@ -1101,7 +1087,7 @@ PinBufferForBlock(Relation rel, smgr->smgr_rlocator.locator.relNumber, smgr->smgr_rlocator.backend); - if (persistence == RELPERSISTENCE_TEMP) + if (smgr_persistence == RELPERSISTENCE_TEMP) { bufHdr = LocalBufferAlloc(smgr, forkNum, blockNum, foundPtr); if (*foundPtr) @@ -1109,7 +1095,7 @@ PinBufferForBlock(Relation rel, } else { - bufHdr = BufferAlloc(smgr, persistence, forkNum, blockNum, + bufHdr = BufferAlloc(smgr, smgr_persistence, forkNum, blockNum, strategy, foundPtr, io_context); if (*foundPtr) pgBufferUsage.shared_blks_hit++; @@ -1157,6 +1143,7 @@ ReadBuffer_common(Relation rel, SMgrRelation smgr, char smgr_persistence, ReadBuffersOperation operation; Buffer buffer; int flags; + char persistence; /* * Backward compatibility path, most code should use ExtendBufferedRel() @@ -1195,7 +1182,15 @@ ReadBuffer_common(Relation rel, SMgrRelation smgr, char smgr_persistence, flags = 0; operation.smgr = smgr; operation.rel = rel; - operation.smgr_persistence = smgr_persistence; + + if (rel) + persistence = rel->rd_rel->relpersistence; + else if (smgr_persistence == 0) +
Re: Streaming I/O, vectored I/O (WIP)
Hi, On Mon, 8 Apr 2024 at 00:01, Nazir Bilal Yavuz wrote: > > Hi, > > On Sun, 7 Apr 2024 at 20:33, Nazir Bilal Yavuz wrote: > > > > Hi, > > > > On Tue, 2 Apr 2024 at 11:40, Thomas Munro wrote: > > > > > > I had been planning to commit v14 this morning but got cold feet with > > > the BMR-based interface. Heikki didn't like it much, and in the end, > > > neither did I. I have now removed it, and it seems much better. No > > > other significant changes, just parameter types and inlining details. > > > For example: > > > > > > * read_stream_begin_relation() now takes a Relation, likes its name says > > > * StartReadBuffers()'s operation takes smgr and optional rel > > > * ReadBuffer_common() takes smgr and optional rel > > > > Read stream objects can be created only using Relations now. There > > could be read stream users which do not have a Relation but > > SMgrRelations. So, I created another constructor for the read streams > > which use SMgrRelations instead of Relations. Related patch is > > attached. > > After sending this, I realized that I forgot to add persistence value > to the new constructor. While working on it I also realized that > current code sets persistence in PinBufferForBlock() function and this > function is called for each block, which can be costly. So, I moved > setting persistence to the out of PinBufferForBlock() function. > > Setting persistence outside of the PinBufferForBlock() function (0001) > and creating the new constructor that uses SMgrRelations (0002) are > attached. Melanie noticed there was a 'sgmr -> smgr' typo in 0002. Fixed in attached. -- Regards, Nazir Bilal Yavuz Microsoft From 04fd860ce8c4c57830930cb362799fd155c92613 Mon Sep 17 00:00:00 2001 From: Nazir Bilal Yavuz Date: Sun, 7 Apr 2024 22:33:36 +0300 Subject: [PATCH 1/2] Refactor PinBufferForBlock() to remove if checks about persistence There are if checks in PinBufferForBlock() function to set persistence of the relation and this function is called for the each block in the relation. Instead of that, set persistence of the relation before PinBufferForBlock() function. --- src/backend/storage/aio/read_stream.c | 2 +- src/backend/storage/buffer/bufmgr.c | 31 +++ 2 files changed, 14 insertions(+), 19 deletions(-) diff --git a/src/backend/storage/aio/read_stream.c b/src/backend/storage/aio/read_stream.c index f54dacdd914..d155dde5ce3 100644 --- a/src/backend/storage/aio/read_stream.c +++ b/src/backend/storage/aio/read_stream.c @@ -549,7 +549,7 @@ read_stream_begin_relation(int flags, { stream->ios[i].op.rel = rel; stream->ios[i].op.smgr = RelationGetSmgr(rel); - stream->ios[i].op.smgr_persistence = 0; + stream->ios[i].op.smgr_persistence = rel->rd_rel->relpersistence; stream->ios[i].op.forknum = forknum; stream->ios[i].op.strategy = strategy; } diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c index 06e9ffd2b00..b4fcefed78a 100644 --- a/src/backend/storage/buffer/bufmgr.c +++ b/src/backend/storage/buffer/bufmgr.c @@ -1067,24 +1067,10 @@ PinBufferForBlock(Relation rel, BufferDesc *bufHdr; IOContext io_context; IOObject io_object; - char persistence; Assert(blockNum != P_NEW); - /* - * If there is no Relation it usually implies recovery and thus permanent, - * but we take an argmument because CreateAndCopyRelationData can reach us - * with only an SMgrRelation for an unlogged relation that we don't want - * to flag with BM_PERMANENT. 
- */ - if (rel) - persistence = rel->rd_rel->relpersistence; - else if (smgr_persistence == 0) - persistence = RELPERSISTENCE_PERMANENT; - else - persistence = smgr_persistence; - - if (persistence == RELPERSISTENCE_TEMP) + if (smgr_persistence == RELPERSISTENCE_TEMP) { io_context = IOCONTEXT_NORMAL; io_object = IOOBJECT_TEMP_RELATION; @@ -1101,7 +1087,7 @@ PinBufferForBlock(Relation rel, smgr->smgr_rlocator.locator.relNumber, smgr->smgr_rlocator.backend); - if (persistence == RELPERSISTENCE_TEMP) + if (smgr_persistence == RELPERSISTENCE_TEMP) { bufHdr = LocalBufferAlloc(smgr, forkNum, blockNum, foundPtr); if (*foundPtr) @@ -1109,7 +1095,7 @@ PinBufferForBlock(Relation rel, } else { - bufHdr = BufferAlloc(smgr, persistence, forkNum, blockNum, + bufHdr = BufferAlloc(smgr, smgr_persistence, forkNum, blockNum, strategy, foundPtr, io_context); if (*foundPtr) pgBufferUsage.shared_blks_hit++; @@ -1157,6 +1143,7 @@ ReadBuffer_common(Relation rel, SMgrRelation smgr, char smgr_persistence, ReadBuffersOperation operation; Buffer buffer; int flags; + char persistence; /* * Backward compatibility path, most code should use ExtendBuffere
Re: Speed up clean meson builds by ~25%
Hi, On Mon, 8 Apr 2024 at 11:00, Alexander Lakhin wrote: > > Hello Michael, > > 08.04.2024 08:23, Michael Paquier wrote: > > On Fri, Apr 05, 2024 at 06:19:20PM +0200, Jelte Fennema-Nio wrote: > >> Agreed. While not a full solution, I think this patch is still good to > >> address some of the pain: Waiting 10 seconds at the end of my build > >> with only 1 of my 10 cores doing anything. > >> > >> So while this doesn't decrease CPU time spent it does decrease > >> wall-clock time significantly in some cases, with afaict no downsides. > > Well, this is also painful with ./configure. So, even if we are going > > to move away from it at this point, we still need to support it for a > > couple of years. It looks to me that letting the clang folks know > > about the situation is the best way forward. > > > > As I wrote in [1], I didn't observe the issue with clang-18, so maybe it > is fixed already. > Perhaps it's worth rechecking... > > [1] > https://www.postgresql.org/message-id/d2bf3727-bae4-3aee-65f6-caec2c4ebaa8%40gmail.com I had this problem on my local computer. My build times are: gcc: 20s clang-15: 24s clang-16: 105s clang-17: 111s clang-18: 25s -- Regards, Nazir Bilal Yavuz Microsoft
Re: Do away with zero-padding assumption before WALRead()
Hi, On Tue, 13 Feb 2024 at 09:17, Bharath Rupireddy wrote: > > Hi, > > I noticed an assumption [1] at WALRead() call sites expecting the > flushed WAL page to be zero-padded after the flush LSN. I think this > can't always be true as the WAL can get flushed after determining the > flush LSN before reading it from the WAL file using WALRead(). I've > hacked the code up a bit to check if that's true - > https://github.com/BRupireddy2/postgres/tree/ensure_extra_read_WAL_page_is_zero_padded_at_the_end_WIP, > the tests hit the Assert(false); added. Which means, the zero-padding > comment around WALRead() call sites isn't quite right. > > I'm wondering why the WALRead() callers are always reading XLOG_BLCKSZ > despite knowing exactly how much to read. Is it to tell the OS to > explicitly fetch the whole page from the disk? If yes, the OS will do > that anyway because the page transfers from disk to OS page cache are > always in terms of disk block sizes, no? I am curious about the same. The page size and the disk block size can differ, so the reason could be to explicitly fetch the whole page from disk, as you said. Is that the reason, or are there other benefits to always reading XLOG_BLCKSZ instead of reading only the bytes that are needed? I searched older threads and code comments but could not find an explanation. -- Regards, Nazir Bilal Yavuz Microsoft
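For readers comparing the two approaches, a self-contained sketch of the difference being asked about, assuming a POSIX pread()-style interface; the helper names are illustrative and the 8kB WAL block size is the default, not a given:

```c
#include <unistd.h>

#define WAL_BLCKSZ 8192         /* assumption: default XLOG_BLCKSZ */

/* What the WALRead() callers do today: fetch the whole containing block. */
static ssize_t
read_full_block(int fd, char *buf, off_t record_offset)
{
    off_t   block_start = record_offset - (record_offset % WAL_BLCKSZ);

    return pread(fd, buf, WAL_BLCKSZ, block_start);
}

/* What the thread asks about: fetch only the bytes the caller needs. */
static ssize_t
read_needed_bytes(int fd, char *buf, off_t record_offset, size_t needed)
{
    return pread(fd, buf, needed, record_offset);
}
```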
Re: Show WAL write and fsync stats in pg_stat_io
Hi, On Thu, 18 Jan 2024 at 04:22, Michael Paquier wrote: > > On Wed, Jan 17, 2024 at 03:20:39PM +0300, Nazir Bilal Yavuz wrote: > > I agree with your points. While the other I/O related work is > > happening we can discuss what we should do in the variable op_byte > > cases. Also, this is happening only for WAL right now but if we try to > > extend pg_stat_io in the future, that problem possibly will rise > > again. So, it could be good to come up with a general solution, not > > only for WAL. > > Okay, I've marked the patch as RwF for this CF. I wanted to let you know that commit 73f0a13266 changed all WALRead() calls to read variable bytes; previously, only the WAL receiver was reading variable bytes. -- Regards, Nazir Bilal Yavuz Microsoft
Use streaming read API in ANALYZE
Hi, I worked on using the currently proposed streaming read API [1] in ANALYZE. The patch is attached. 0001 contains the not-yet-merged streaming read API changes that can be applied to master; 0002 is the actual code. The blocks to analyze are obtained by using the streaming read API now. - Since streaming read API is already doing prefetch, I removed the #ifdef USE_PREFETCH code from acquire_sample_rows(). - Changed 'while (BlockSampler_HasMore(&bs))' to 'while (nblocks)' because the prefetch mechanism in the streaming read API will advance 'bs' before returning buffers. - Removed BlockNumber and BufferAccessStrategy from the declaration of scan_analyze_next_block(), passing pgsr (PgStreamingRead) instead of them. I counted the syscalls made while analyzing a ~5GB table. It can be seen that the patched version made ~1300 fewer read calls.

Patched:

% time     seconds  usecs/call     calls    errors syscall
 39.67    0.012128           0     29809           pwrite64
 36.96    0.011299           0     28594           pread64
 23.24    0.007104           0     27611           fadvise64

Master (21a71648d3):

% time     seconds  usecs/call     calls    errors syscall
 38.94    0.016457           0     29816           pwrite64
 36.79    0.015549           0     29850           pread64
 23.91    0.010106           0     29848           fadvise64

Any kind of feedback would be appreciated. [1]: https://www.postgresql.org/message-id/CA%2BhUKGJkOiOCa%2Bmag4BF%2BzHo7qo%3Do9CFheB8%3Dg6uT5TUm2gkvA%40mail.gmail.com -- Regards, Nazir Bilal Yavuz Microsoft From 509e55997c084f831fcbcb46cabe79d4f97aa93e Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Sat, 22 Jul 2023 17:31:54 +1200 Subject: [PATCH v1 1/2] Streaming read API changes that are not committed to master yet Discussion: https://www.postgresql.org/message-id/CA%2BhUKGJkOiOCa%2Bmag4BF%2BzHo7qo%3Do9CFheB8%3Dg6uT5TUm2gkvA%40mail.gmail.com --- src/include/storage/bufmgr.h | 22 + src/include/storage/streaming_read.h | 52 ++ src/backend/storage/Makefile | 2 +- src/backend/storage/aio/Makefile | 14 + src/backend/storage/aio/meson.build | 5 + src/backend/storage/aio/streaming_read.c | 472 ++ src/backend/storage/buffer/bufmgr.c | 592 +++ src/backend/storage/buffer/localbuf.c| 14 +- src/backend/storage/meson.build | 1 + src/tools/pgindent/typedefs.list | 2 + 10 files changed, 953 insertions(+), 223 deletions(-) create mode 100644 src/include/storage/streaming_read.h create mode 100644 src/backend/storage/aio/Makefile create mode 100644 src/backend/storage/aio/meson.build create mode 100644 src/backend/storage/aio/streaming_read.c diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h index d51d46d3353..a38f1acb37a 100644 --- a/src/include/storage/bufmgr.h +++ b/src/include/storage/bufmgr.h @@ -14,6 +14,7 @@ #ifndef BUFMGR_H #define BUFMGR_H +#include "port/pg_iovec.h" #include "storage/block.h" #include "storage/buf.h" #include "storage/bufpage.h" @@ -158,6 +159,11 @@ extern PGDLLIMPORT int32 *LocalRefCount; #define BUFFER_LOCK_SHARE 1 #define BUFFER_LOCK_EXCLUSIVE 2 +/* + * Maximum number of buffers for multi-buffer I/O functions. This is set to + * allow 128kB transfers, unless BLCKSZ and IOV_MAX imply a smaller maximum. 
+ */ +#define MAX_BUFFERS_PER_TRANSFER Min(PG_IOV_MAX, (128 * 1024) / BLCKSZ) /* * prototypes for functions in bufmgr.c @@ -177,6 +183,18 @@ extern Buffer ReadBufferWithoutRelcache(RelFileLocator rlocator, ForkNumber forkNum, BlockNumber blockNum, ReadBufferMode mode, BufferAccessStrategy strategy, bool permanent); +extern Buffer PrepareReadBuffer(BufferManagerRelation bmr, +ForkNumber forkNum, +BlockNumber blockNum, +BufferAccessStrategy strategy, +bool *foundPtr); +extern void CompleteReadBuffers(BufferManagerRelation bmr, +Buffer *buffers, +ForkNumber forknum, +BlockNumber blocknum, +int nblocks, +bool zero_on_error, +BufferAccessStrategy strategy); extern void ReleaseBuffer(Buffer buffer); extern void UnlockReleaseBuffer(Buffer buffer); extern bool BufferIsExclusiveLocked(Buffer buffer); @@ -247,9 +265,13 @@ extern void LockBufferForCleanup(Buffer buffer); extern bool ConditionalLockBufferForCleanup(Buffer buffer); extern bool IsBufferCleanupOK(Buffer buffer); extern bool HoldingBufferPinThatDelaysRecovery(void); +extern void ZeroBuffer(Buffer buffer, ReadBufferMode mode); extern bool BgBufferSync(struct WritebackContext *wb_context); +extern void LimitAdditionalPins(uint32 *additional_pins); +extern void LimitAdditionalLocalPins(uint32 *additional_pin
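To illustrate the 'while (BlockSampler_HasMore(&bs))' to 'while (nblocks)' change described above: with the streaming read API, the block sampler is advanced inside a producer callback rather than in the consumer loop. A sketch under stated assumptions (the callback name is hypothetical and the exact PgStreamingRead callback signature follows the referenced API thread, so it may differ in the committed form):

```c
/*
 * Producer callback: hands the next sampled block number to the streaming
 * read machinery. BlockSampler_HasMore()/BlockSampler_Next() are the
 * existing primitives from src/include/utils/sampling.h.
 */
static BlockNumber
block_sampling_read_next(PgStreamingRead *pgsr,
                         void *pgsr_private,
                         void *per_buffer_data)
{
    BlockSamplerData *bs = pgsr_private;

    return BlockSampler_HasMore(bs) ? BlockSampler_Next(bs) : InvalidBlockNumber;
}
```

Since the callback drains 'bs' ahead of the consumer (prefetch), the consumer can no longer test BlockSampler_HasMore() itself and has to count down the number of blocks it still expects, hence 'while (nblocks)'.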
Re: Add missing error codes to PANIC/FATAL error reports in xlog.c and relcache.c
Hi, Thanks for the review! On Thu, 22 Feb 2024 at 16:55, Daniel Gustafsson wrote: > > > On 6 Dec 2023, at 14:03, Nazir Bilal Yavuz wrote: > > > There is an ongoing thread [1] for adding missing SQL error codes to > > PANIC and FATAL error reports in xlogrecovery.c file. I did the same > > but for xlog.c and relcache.c files. > > - elog(PANIC, "space reserved for WAL record does not match what was > written"); > + ereport(PANIC, > + (errcode(ERRCODE_DATA_CORRUPTED), > +errmsg("space reserved for WAL record does not match > what was written"))); > > elogs turned into ereports should use errmsg_internal() to keep the strings > from being translated. Does errmsg_internal() need to be used all the time when turning elogs into ereports? errmsg_internal()'s comment says that "This should be used for "can't happen" cases that are probably not worth spending translation effort on.". Is it enough to check if the error message has a translation, and then decide the use of errmsg_internal() or errmsg()? > - elog(FATAL, "could not write init file"); > + ereport(FATAL, > + (errcode_for_file_access(), > +errmsg("could not write init file"))); > > Is it worthwhile adding %m on these to get a little more help when debugging > errors that shouldn't happen? I believe it is worthwhile, so I will add. > - elog(FATAL, "could not write init file"); > + ereport(FATAL, > + (errcode_for_file_access(), > > The extra parenthesis are no longer needed, I don't know if we have a policy > to > remove them when changing an ereport call but we should at least not introduce > new ones. > > - elog(FATAL, "cannot read pg_class without having selected a > database"); > + ereport(FATAL, > + (errcode(ERRCODE_INTERNAL_ERROR), > > ereport (and thus elog) already defaults to ERRCODE_INTERNAL_ERROR for ERROR > or > higher, so unless there is a better errcode an elog() call if preferrable > here. I did not know these, thanks. > > I couldn't find a suitable error code for the "cache lookup failed for > > relation" error in relcache.c and this error comes up in many places. > > Would it be reasonable to create a new error code specifically for > > this? > > We use ERRCODE_UNDEFINED_OBJECT for similar errors elsewhere, perhaps we can > use that for these as well? It looks okay to me, ERRCODE_UNDEFINED_OBJECT is mostly used for the 'not exist' errors and it seems the main reason for the 'cache lookup failed for relation' error is because heap tuple does not exist anymore. -- Regards, Nazir Bilal Yavuz Microsoft
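Putting the review points from this exchange together, the conversion pattern for an internal, untranslated error with a file-access cause would look roughly like this (a sketch of the agreed style, not a quote from the committed patch):

```c
/* before */
elog(FATAL, "could not write init file");

/* after: file-access errcode, %m for the OS error, no extra parentheses,
 * and errmsg_internal() so no translation effort is spent on it */
ereport(FATAL,
        errcode_for_file_access(),
        errmsg_internal("could not write init file: %m"));
```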
Re: Add missing error codes to PANIC/FATAL error reports in xlog.c and relcache.c
Hi, On Fri, 23 Feb 2024 at 15:34, Daniel Gustafsson wrote: > > > On 23 Feb 2024, at 13:09, Nazir Bilal Yavuz wrote: > > > Does errmsg_internal() need to be used all the time when turning elogs > > into ereports? errmsg_internal()'s comment says that "This should be > > used for "can't happen" cases that are probably not worth spending > > translation effort on.". Is it enough to check if the error message > > has a translation, and then decide the use of errmsg_internal() or > > errmsg()? > > If it's an elog then it won't have a translation as none are included in the > translation set. If the errmsg is generic enough to be translated anyways via > another (un)related ereport call then we of course use that, but ideally not > create new ones. Thanks for the explanation. All of your feedback is addressed in v2. -- Regards, Nazir Bilal Yavuz Microsoft From fa45a69731da1488560b2749023e9b9573231c2b Mon Sep 17 00:00:00 2001 From: Nazir Bilal Yavuz Date: Fri, 23 Feb 2024 16:49:10 +0300 Subject: [PATCH v2 1/2] (xlog.c) Add missing error codes to PANIC/FATAL error reports --- src/backend/access/transam/xlog.c | 77 +-- 1 file changed, 52 insertions(+), 25 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index c1162d55bff..d295cef9606 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -1354,7 +1354,9 @@ CopyXLogRecordToWAL(int write_len, bool isLogSwitch, XLogRecData *rdata, } if (CurrPos != EndPos) - elog(PANIC, "space reserved for WAL record does not match what was written"); + ereport(PANIC, +errcode(ERRCODE_DATA_CORRUPTED), +errmsg_internal("space reserved for WAL record does not match what was written")); } /* @@ -4040,7 +4042,8 @@ ValidateXLOGDirectoryStructure(void) if (stat(XLOGDIR, &stat_buf) != 0 || !S_ISDIR(stat_buf.st_mode)) ereport(FATAL, -(errmsg("required WAL directory \"%s\" does not exist", +(errcode_for_file_access(), + errmsg("required WAL directory \"%s\" does not exist", XLOGDIR))); /* Check for archive_status */ @@ -4050,7 +4053,8 @@ ValidateXLOGDirectoryStructure(void) /* Check for weird cases where it exists but isn't a directory */ if (!S_ISDIR(stat_buf.st_mode)) ereport(FATAL, - (errmsg("required WAL directory \"%s\" does not exist", + (errcode_for_file_access(), + errmsg("required WAL directory \"%s\" does not exist", path))); } else @@ -4059,7 +4063,8 @@ ValidateXLOGDirectoryStructure(void) (errmsg("creating missing WAL directory \"%s\"", path))); if (MakePGDirectory(path) < 0) ereport(FATAL, - (errmsg("could not create missing directory \"%s\": %m", + (errcode_for_file_access(), + errmsg("could not create missing directory \"%s\": %m", path))); } @@ -4296,7 +4301,8 @@ ReadControlFile(void) if (ControlFile->pg_control_version != PG_CONTROL_VERSION && ControlFile->pg_control_version % 65536 == 0 && ControlFile->pg_control_version / 65536 != 0) ereport(FATAL, -(errmsg("database files are incompatible with server"), +(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("database files are incompatible with server"), errdetail("The database cluster was initialized with PG_CONTROL_VERSION %d (0x%08x)," " but the server was compiled with PG_CONTROL_VERSION %d (0x%08x).", ControlFile->pg_control_version, ControlFile->pg_control_version, @@ -4305,7 +4311,8 @@ ReadControlFile(void) if (ControlFile->pg_control_version != PG_CONTROL_VERSION) ereport(FATAL, -(errmsg("database files are incompatible with server"), +(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("database files 
are incompatible with server"), errdetail("The database cluster was initialized with PG_CONTROL_VERSION %d," " but the server was compiled with PG_CONTROL_VERSION %d.", ControlFile->pg_control_version, PG_CONTROL_VERSION), @@ -4320,7 +4327,8 @@ ReadControlFile(void) if (!EQ_CRC32C(crc, ControlFile->crc)) ereport(FATAL, -(errmsg("incorrect checksum in control file"))); +(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("incorrect checksum in control file"))); /* * Do compatibility checking immediately. If the database isn't @@ -4329,68 +4337,78 @@ ReadControlFile(void) */ if (ControlFile->catalog_version_no != CATALOG_VERSION_NO) ereport(FATAL, -(errmsg("database files are incompatible wit
Re: Use streaming read API in ANALYZE
Hi, On Mon, 19 Feb 2024 at 18:13, Nazir Bilal Yavuz wrote: > > I worked on using the currently proposed streaming read API [1] in ANALYZE. > The patch is attached. 0001 is the not yet merged streaming read API code > changes that can be applied to the master, 0002 is the actual code. > > The blocks to analyze are obtained by using the streaming read API now. > > - Since streaming read API is already doing prefetch, I removed the #ifdef > USE_PREFETCH code from acquire_sample_rows(). > > - Changed 'while (BlockSampler_HasMore(&bs))' to 'while (nblocks)' because > the prefetch mechanism in the streaming read API will advance 'bs' before > returning buffers. > > - Removed BlockNumber and BufferAccessStrategy from the declaration of > scan_analyze_next_block(), passing pgsr (PgStreamingRead) instead of them. > > I counted syscalls of analyzing ~5GB table. It can be seen that the patched > version did ~1300 less read calls. > > Patched: > > % time seconds usecs/call callserrors syscall > -- --- --- - - > 39.670.012128 0 29809 pwrite64 > 36.960.011299 0 28594 pread64 > 23.240.007104 0 27611 fadvise64 > > Master (21a71648d3): > > % time seconds usecs/call callserrors syscall > -- --- --- - - > 38.940.016457 0 29816 pwrite64 > 36.790.015549 0 29850 pread64 > 23.910.010106 0 29848 fadvise64 > > > Any kind of feedback would be appreciated. > > [1]: > https://www.postgresql.org/message-id/CA%2BhUKGJkOiOCa%2Bmag4BF%2BzHo7qo%3Do9CFheB8%3Dg6uT5TUm2gkvA%40mail.gmail.com The new version of the streaming read API [1] is posted. I updated the streaming read API changes patch (0001), using the streaming read API in ANALYZE patch (0002) remains the same. This should make it easier to review as it can be applied on top of master [1]: https://www.postgresql.org/message-id/CA%2BhUKGJtLyxcAEvLhVUhgD4fMQkOu3PDaj8Qb9SR_UsmzgsBpQ%40mail.gmail.com -- Regards, Nazir Bilal Yavuz Microsoft From 21d9043501284c6bae996522ff2f3ac693f81986 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Mon, 26 Feb 2024 23:48:31 +1300 Subject: [PATCH v2 1/2] Streaming read API changes that are not committed to master yet Discussion: https://www.postgresql.org/message-id/CA%2BhUKGJkOiOCa%2Bmag4BF%2BzHo7qo%3Do9CFheB8%3Dg6uT5TUm2gkvA%40mail.gmail.com --- src/include/storage/bufmgr.h | 45 ++ src/include/storage/streaming_read.h | 52 ++ src/backend/storage/Makefile | 2 +- src/backend/storage/aio/Makefile | 14 + src/backend/storage/aio/meson.build | 5 + src/backend/storage/aio/streaming_read.c | 612 src/backend/storage/buffer/bufmgr.c | 687 +++ src/backend/storage/buffer/localbuf.c| 14 +- src/backend/storage/meson.build | 1 + src/tools/pgindent/typedefs.list | 3 + 10 files changed, 1202 insertions(+), 233 deletions(-) create mode 100644 src/include/storage/streaming_read.h create mode 100644 src/backend/storage/aio/Makefile create mode 100644 src/backend/storage/aio/meson.build create mode 100644 src/backend/storage/aio/streaming_read.c diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h index d51d46d3353..b57f71f97e3 100644 --- a/src/include/storage/bufmgr.h +++ b/src/include/storage/bufmgr.h @@ -14,6 +14,7 @@ #ifndef BUFMGR_H #define BUFMGR_H +#include "port/pg_iovec.h" #include "storage/block.h" #include "storage/buf.h" #include "storage/bufpage.h" @@ -158,6 +159,11 @@ extern PGDLLIMPORT int32 *LocalRefCount; #define BUFFER_LOCK_SHARE 1 #define BUFFER_LOCK_EXCLUSIVE 2 +/* + * Maximum number of buffers for multi-buffer I/O functions. 
This is set to + * allow 128kB transfers, unless BLCKSZ and IOV_MAX imply a a smaller maximum. + */ +#define MAX_BUFFERS_PER_TRANSFER Min(PG_IOV_MAX, (128 * 1024) / BLCKSZ) /* * prototypes for functions in bufmgr.c @@ -177,6 +183,42 @@ extern Buffer ReadBufferWithoutRelcache(RelFileLocator rlocator, ForkNumber forkNum, BlockNumber blockNum, ReadBufferMode mode, BufferAccessStrategy strategy, bool permanent); + +#define READ_BUFFERS_ZERO_ON_ERROR 0x01 +#define READ_BUFFERS_ISSUE_ADVICE 0x02 + +/* + * Private state used by StartReadBuffers() and WaitReadBuffers(). Declared + * in public header only to allow inclusion in other structs, but contents + * should not be accessed. + */ +struct ReadBuffersOperation +{ + /* Parameters passed in to StartReadBuffers(). */ + BufferManagerRelation bmr; + Buffer *buf
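As a worked example of the MAX_BUFFERS_PER_TRANSFER cap defined earlier in this hunk (the numbers assume the default 8kB BLCKSZ and a PG_IOV_MAX of 32, both assumptions rather than values quoted from the patch):

```c
#define BLCKSZ      8192                /* assumption: default block size */
#define PG_IOV_MAX  32                  /* assumption: typical value */
#define Min(x, y)   ((x) < (y) ? (x) : (y))

#define MAX_BUFFERS_PER_TRANSFER Min(PG_IOV_MAX, (128 * 1024) / BLCKSZ)
/* => Min(32, 131072 / 8192) = Min(32, 16) = 16 buffers per 128kB transfer */
```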
Re: Change prefetch and read strategies to use range in pg_prewarm ... and raise a question about posix_fadvise WILLNEED
lly significant: full > relation is 0.3ms and block by block is 1550ms! > You might think it's because of generate_series or whatever, but I have > the exact same behavior with pgfincore. > I can compare loading and unloading duration for similar "async" work, > here each call is from block 0 with len of 132744 and a range of 1 block > (i.e. posix_fadvise on 8kB at a time). > So they have exactly the same number of operations doing DONTNEED or > WILLNEED, but distinct duration on the first "load": > > ``` > > postgres=# select * from > vm_relation_fadvise('foo','main',0,132744,1,'POSIX_FADV_DONTNEED'); > vm_relation_fadvise > - > > (1 row) > > Time: 25.202 ms > postgres=# select * from > vm_relation_fadvise('foo','main',0,132744,1,'POSIX_FADV_WILLNEED'); > vm_relation_fadvise > - > > (1 row) > > Time: 1523.636 ms (00:01.524) <- not free ! > postgres=# select * from > vm_relation_fadvise('foo','main',0,132744,1,'POSIX_FADV_WILLNEED'); > vm_relation_fadvise > - > > (1 row) > > Time: 24.967 ms > ``` I confirm that there is a time difference between calling pg_prewarm on the full relation and calling it block by block, but IMO this is expected. When pg_prewarm is called on the full relation, it does the initialization part just once; when it is called block by block, it repeats that initialization for each call, right? I ran 'select pg_prewarm('foo','prefetch', 'main', n, n) FROM generate_series(0, 132744)n;' a couple of times consecutively, but I could not see a time difference between the first run (first load) and the subsequent runs. Am I doing something wrong? [1] https://man7.org/linux/man-pages/man2/posix_fadvise.2.html#DESCRIPTION -- Regards, Nazir Bilal Yavuz Microsoft
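The per-call cost being discussed is easiest to see side by side. A self-contained sketch of the two advisory patterns, assuming an open file descriptor and the standard POSIX interface; the 8kB step mirrors the one-block range used in the examples above:

```c
#include <fcntl.h>

/* One advisory syscall covering the whole range. */
static void
advise_whole_range(int fd, off_t len)
{
    (void) posix_fadvise(fd, 0, len, POSIX_FADV_WILLNEED);
}

/* One advisory syscall per 8kB block: same pages advised, many more calls. */
static void
advise_block_by_block(int fd, off_t len)
{
    for (off_t off = 0; off < len; off += 8192)
        (void) posix_fadvise(fd, off, 8192, POSIX_FADV_WILLNEED);
}
```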
Re: Change prefetch and read strategies to use range in pg_prewarm ... and raise a question about posix_fadvise WILLNEED
Hi, On Wed, 6 Mar 2024 at 18:23, Cédric Villemain wrote: > > Hi Nazir, > > > thank you for your review. I comment below. > > > On 05/03/2024 12:07, Nazir Bilal Yavuz wrote: > >> 2. The second one does implement smgrprefetch with range and loops by > >> default per segment to still have a check for interrupts. > > It looks good codewise but RELSEG_SIZE is too big to prefetch. Man > > page of posix_fadvise [1] states that: "The amount of data read may be > > decreased by the kernel depending on virtual memory load. (A few > > megabytes will usually be fully satisfied, and more is rarely > > useful.)". It is trying to prefetch 1GB data now. That could explain > > your observation about differences between nr_cache numbers. > > From an "adminsys" point of view I will find beneficial to get a single > syscall per file, respecting the logic and behavior of underlying system > call. I agree. > The behavior is 100% OK, and in fact it might a bad idea to prefetch > block by block as the result is just to put more pressure on a system if > it is already under pressure. > > Though there are use cases and it's nice to be able to do that too at > this per page level. Yes, I do not know which one is more important, cache more blocks but create more pressure or create less pressure but cache less blocks. Also, pg_prewarm is designed to be run at startup so I guess there will not be much pressure. > About [1], it's very old statement about resources. And Linux manages a > part of the problem for us here I think [2]: > > /* > * Chunk the readahead into 2 megabyte units, so that we don't pin too much > * memory at once. > */ > void force_page_cache_ra() Thanks for pointing out the actual code. Yes, it looks like the kernel is already doing that. I would like to do more testing when you forward vm_relation functions into pgfincore. > >> Q: posix_fadvise may not work exactly the way you think it does, or does > >> it ? > >> > >> > >> In details, and for the question: > >> > >> However, if instead you provide a real range, or the magic len=0 to > >> posix_fadvise, then blocks are "more" loaded according to effective vm > >> pressure (which is not the case on the previous example). > >> As a result only a small part of the relation might be loaded, and this > >> is probably not what end-users expect despite being probably a good > >> choice (you can still free cache beforehand to help the kernel). > > I think it's a matter of documenting well the feature, and if at all > possible, as usual, not let users be negatively impacted by default. 
> > > >> An example, below I'm using vm_relation_cachestat() which provides linux > >> cachestat output, and vm_relation_fadvise() to unload cache, and > >> pg_prewarm for the demo: > >> > >> # clear cache: (nr_cache is the number of file system pages in cache, > >> not postgres blocks) > >> > >> ``` > >> postgres=# select block_start, block_count, nr_pages, nr_cache from > >> vm_relation_cachestat('foo',range:=1024*32); > >> block_start | block_count | nr_pages | nr_cache > >> -+-+--+-- > >> 0 | 32768 |65536 |0 > >> 32768 | 32768 |65536 |0 > >> 65536 | 32768 |65536 |0 > >> 98304 | 32768 |65536 |0 > >>131072 |1672 | 3344 |0 > >> ``` > >> > >> # load full relation with pg_prewarm (patched) > >> > >> ``` > >> postgres=# select pg_prewarm('foo','prefetch'); > >> pg_prewarm > >> > >> 132744 > >> (1 row) > >> ``` > >> > >> # Checking results: > >> > >> ``` > >> postgres=# select block_start, block_count, nr_pages, nr_cache from > >> vm_relation_cachestat('foo',range:=1024*32); > >> block_start | block_count | nr_pages | nr_cache > >> -+-+--+-- > >> 0 | 32768 |65536 | 320 > >> 32768 | 32768 |65536 |0 > >> 65536 | 32768 |65536 |0 > >> 98304 | 32768 |65536 |0 > >>131072 |1672 | 3344 | 320 <-- segment 1 > >> > >> ``` > >> > >> # Load block by block and check: > >>
Use windows VMs instead of windows containers on the CI
Hi, I propose using Windows VMs instead of containers; the patch is attached. Currently, Windows containers are used on the CI, but these container images need to be pulled on every CI run, and they are also slow to run. These VM images are created in the same way as the container images [1]. The comparison between VMs and containers is (based on d952373a98, with the same number of CPUs and the same memory):

Scheduling step:

              | VS 2019 | MinGW64
VM [2]        | 00:17m  | 00:16m
Container [3] | 03:51m  | 04:28m

Execution step:

              | VS 2019 | MinGW64
VM [2]        | 12:16m  | 07:55m
Container [3] | 26:02m  | 16:34m

There is a more than 2x speed gain when VMs are used. [1] https://github.com/anarazel/pg-vm-images/blob/main/packer/windows.pkr.hcl [2] https://cirrus-ci.com/build/4720774045499392 [3] https://cirrus-ci.com/build/5468256027279360 Regards, Nazir Bilal Yavuz Microsoft From 6981319d054f0736e474bc315ab094094d159979 Mon Sep 17 00:00:00 2001 From: Nazir Bilal Yavuz Date: Tue, 10 Jan 2023 13:58:39 +0300 Subject: [PATCH] Use windows VMs instead of windows containers --- .cirrus.yml | 12 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/.cirrus.yml b/.cirrus.yml index 69837bcd5a..5700b8cd66 100644 --- a/.cirrus.yml +++ b/.cirrus.yml @@ -549,8 +549,10 @@ task: depends_on: SanityCheck only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*windows.*' - windows_container: -image: $CONTAINER_REPO/windows_ci_vs_2019:latest + compute_engine_instance: +image_project: $IMAGE_PROJECT +image: family/pg-ci-windows-ci-vs-2019 +platform: windows cpu: $CPUS memory: 4G @@ -589,8 +591,10 @@ task: # otherwise it'll be sorted before other tasks depends_on: SanityCheck - windows_container: -image: $CONTAINER_REPO/windows_ci_mingw64:latest + compute_engine_instance: +image_project: $IMAGE_PROJECT +image: family/pg-ci-windows-ci-mingw64 +platform: windows cpu: $CPUS memory: 4G -- 2.25.1
Re: Use windows VMs instead of windows containers on the CI
Hi, Tables didn't seem nice on the web interface. Re-sending with correct formatting.

Scheduling step:

          | VS 2019 | MinGW64
VM        | 00:17m  | 00:16m
Container | 03:51m  | 04:28m

Execution step:

          | VS 2019 | MinGW64
VM        | 12:16m  | 07:55m
Container | 26:02m  | 16:34m

Regards, Nazir Bilal Yavuz Microsoft
Re: Use windows VMs instead of windows containers on the CI
Hi, It didn't work again. Sending numbers until I figure out how to solve this.

Scheduling step:
VM + VS 2019: 00.17m
Container + VS 2019: 03.51m
VM + MinGW64: 00.16m
Container + MinGW64: 04.28m

Execution step:
VM + VS 2019: 12.16m
Container + VS 2019: 26.02m
VM + MinGW64: 07.55m
Container + MinGW64: 16.34m

Sorry for the multiple mails. Regards, Nazir Bilal Yavuz Microsoft
Re: pgBufferUsage.blk_{read|write}_time are zero although there are pgBufferUsage.local_blks_{read|written}
Hi, On Tue, 10 Oct 2023 at 03:54, Michael Paquier wrote: > > In ~14, as far as I can see blk_write_time is only incremented for > shared buffers. FWIW, I agree that we should improve these stats for > local buffers but I am not on board with a solution where we'd use the > same counter for local and shared buffers while we've historically > only counted the former, because that could confuse existing > monitoring queries. It seems to me that the right solution is to do > the same separation as temp blocks with two separate counters, without > a backpatch. I'd like to go as far as renaming blk_read_time and > blk_write_time to respectively shared_blk_read_time and > shared_blk_write_time to know exactly what the type of block dealt > with is when querying this data, particularly for pg_stat_statements's > sake. Yes, that could be a better solution. Also, having more detailed stats for shared and local buffers is helpful. I updated patches in line with that: 0001: Counts extends same way as a write. 0002: Rename blk_{read|write}_time as shared_blk_{read|write}_time. 0003: Add new local_blk_{read|write}_time variables. Regards, Nazir Bilal Yavuz Microsoft From acf5a2781760f284aef770da0f64acf8685b6445 Mon Sep 17 00:00:00 2001 From: Nazir Bilal Yavuz Date: Fri, 13 Oct 2023 17:35:00 +0300 Subject: [PATCH v3 1/3] Include IOOp IOOP_EXTENDs while calculating block write times Extends are counted as writes in block context, so include IOOP_EXTENDs while calculating block write times. --- src/backend/utils/activity/pgstat_io.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c index eb7d35d422..8ec8670199 100644 --- a/src/backend/utils/activity/pgstat_io.c +++ b/src/backend/utils/activity/pgstat_io.c @@ -119,7 +119,7 @@ pgstat_count_io_op_time(IOObject io_object, IOContext io_context, IOOp io_op, INSTR_TIME_SET_CURRENT(io_time); INSTR_TIME_SUBTRACT(io_time, start_time); - if (io_op == IOOP_WRITE) + if (io_op == IOOP_WRITE || io_op == IOOP_EXTEND) { pgstat_count_buffer_write_time(INSTR_TIME_GET_MICROSEC(io_time)); if (io_object == IOOBJECT_RELATION) -- 2.42.0 From 1f0933c7057e20c074f6b9355f6d91a9277a8c09 Mon Sep 17 00:00:00 2001 From: Nazir Bilal Yavuz Date: Thu, 12 Oct 2023 14:40:23 +0300 Subject: [PATCH v3 3/3] Add local_blk_{read|write}_time I/O timing statistics There was no I/O timing statistics for counting local blocks' timings. So, add local_blk_{read|write}_time variables for counting these. These counters are exposed in EXPLAIN and pg_stat_statements. pg_stat_statements is bumped to 1.12. 
--- contrib/pg_stat_statements/Makefile | 1 + .../expected/oldextversions.out | 60 contrib/pg_stat_statements/meson.build| 1 + .../pg_stat_statements--1.11--1.12.sql| 71 +++ .../pg_stat_statements/pg_stat_statements.c | 33 - .../pg_stat_statements.control| 2 +- .../pg_stat_statements/sql/oldextversions.sql | 5 ++ doc/src/sgml/pgstatstatements.sgml| 20 ++ src/backend/commands/explain.c| 23 +- src/backend/executor/instrument.c | 6 ++ src/backend/utils/activity/pgstat_io.c| 4 ++ src/include/executor/instrument.h | 2 + src/test/regress/expected/explain.out | 4 ++ 13 files changed, 228 insertions(+), 4 deletions(-) create mode 100644 contrib/pg_stat_statements/pg_stat_statements--1.11--1.12.sql diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile index eba4a95d91..b7d12bc7fb 100644 --- a/contrib/pg_stat_statements/Makefile +++ b/contrib/pg_stat_statements/Makefile @@ -7,6 +7,7 @@ OBJS = \ EXTENSION = pg_stat_statements DATA = pg_stat_statements--1.4.sql \ + pg_stat_statements--1.11--1.12.sql \ pg_stat_statements--1.10--1.11.sql \ pg_stat_statements--1.9--1.10.sql pg_stat_statements--1.8--1.9.sql \ pg_stat_statements--1.7--1.8.sql pg_stat_statements--1.6--1.7.sql \ diff --git a/contrib/pg_stat_statements/expected/oldextversions.out b/contrib/pg_stat_statements/expected/oldextversions.out index 9e18fe2e47..b84e0c8484 100644 --- a/contrib/pg_stat_statements/expected/oldextversions.out +++ b/contrib/pg_stat_statements/expected/oldextversions.out @@ -308,4 +308,64 @@ SELECT count(*) > 0 AS has_data FROM pg_stat_statements; t (1 row) +-- New views for pg_stat_statements in 1.12 +AlTER EXTENSION pg_stat_statements UPDATE TO '1.12'; +\d pg_stat_statements + View "public.pg_stat_statements" + Column | Type | Collation | Nullable | Default ++--+---+--+- + userid | oid | | | + dbid
Re: pgBufferUsage.blk_{read|write}_time are zero although there are pgBufferUsage.local_blks_{read|written}
Hi, Thanks for the review! On Tue, 17 Oct 2023 at 11:40, Michael Paquier wrote: > > On Mon, Oct 16, 2023 at 01:07:07PM +0300, Nazir Bilal Yavuz wrote: > > Yes, that could be a better solution. Also, having more detailed stats > > for shared and local buffers is helpful. I updated patches in line > > with that: > > > > 0001: Counts extends same way as a write. > > It can change existing query results on an already-released branch, > but we already count the number of blocks when doing a relation > extension, so counting the write time is something I'd rather fix in > v16. If you have any objections, let me know. I agree. > > > 0002: Rename blk_{read|write}_time as shared_blk_{read|write}_time. > > Note that `git diff --check` complains here. > > --- a/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql > +++ b/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql > @@ -30,8 +30,8 @@ CREATE FUNCTION pg_stat_statements(IN showtext boolean, > OUT local_blks_written int8, > OUT temp_blks_read int8, > OUT temp_blks_written int8, > -OUT blk_read_time float8, > -OUT blk_write_time float8 > +OUT shared_blk_read_time float8, > +OUT shared_blk_write_time float8 > > Doing that in an extension upgrade script is incorrect. These should > not be touched. > > - Total time the statement spent reading data file blocks, in milliseconds > + Total time the statement spent reading shared data file blocks, in > milliseconds > > Or just shared blocks? That's what we use elsewhere for > pg_stat_statements. "shared data file blocks" sounds a bit confusing > for relation file blocks read/written from/to shared buffers. > > > 0003: Add new local_blk_{read|write}_time variables. > > DATA = pg_stat_statements--1.4.sql \ > + pg_stat_statements--1.11--1.12.sql \ > pg_stat_statements--1.10--1.11.sql \ > > There is no need to bump again pg_stat_statements, as it has already > been bumped to 1.11 on HEAD per the recent commit 5a3423ad8ee1 from > Daniel. So the new changes can just be added to 1.11. I updated patches based on your comments. v4 is attached. Regards, Nazir Bilal Yavuz Microsoft From 4b744c0b4e2cfdee4e95779ee19ac214b85aa150 Mon Sep 17 00:00:00 2001 From: Nazir Bilal Yavuz Date: Fri, 13 Oct 2023 17:35:00 +0300 Subject: [PATCH v4 1/3] Include IOOp IOOP_EXTENDs while calculating block write times Extends are counted as writes in block context, so include IOOP_EXTENDs while calculating block write times. 
--- src/backend/utils/activity/pgstat_io.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c index eb7d35d422..8ec8670199 100644 --- a/src/backend/utils/activity/pgstat_io.c +++ b/src/backend/utils/activity/pgstat_io.c @@ -119,7 +119,7 @@ pgstat_count_io_op_time(IOObject io_object, IOContext io_context, IOOp io_op, INSTR_TIME_SET_CURRENT(io_time); INSTR_TIME_SUBTRACT(io_time, start_time); - if (io_op == IOOP_WRITE) + if (io_op == IOOP_WRITE || io_op == IOOP_EXTEND) { pgstat_count_buffer_write_time(INSTR_TIME_GET_MICROSEC(io_time)); if (io_object == IOOBJECT_RELATION) -- 2.42.0 From 5f299ee59d443a53bf6f3a8e18467270f5ee8318 Mon Sep 17 00:00:00 2001 From: Nazir Bilal Yavuz Date: Fri, 13 Oct 2023 16:28:03 +0300 Subject: [PATCH v4 2/3] Rename I/O timing statistics columns to shared_blk_{read|write}_time Since only shared blocks' timings are counted in blk_{read|write}_time, rename them as shared_blk_{read|write}_time --- .../expected/oldextversions.out | 4 +- .../pg_stat_statements--1.10--1.11.sql| 4 +- .../pg_stat_statements/pg_stat_statements.c | 14 ++- doc/src/sgml/pgstatstatements.sgml| 8 +- src/backend/commands/explain.c| 29 ++--- src/backend/executor/instrument.c | 12 +- src/backend/utils/activity/pgstat_io.c| 4 +- src/include/executor/instrument.h | 4 +- src/test/regress/expected/explain.out | 108 +- 9 files changed, 95 insertions(+), 92 deletions(-) diff --git a/contrib/pg_stat_statements/expected/oldextversions.out b/contrib/pg_stat_statements/expected/oldextversions.out index 64982aad60..c2e0140a47 100644 --- a/contrib/pg_stat_statements/expected/oldextversions.out +++ b/contrib/pg_stat_statements/expected/oldextversions.out @@ -284,8 +284,8 @@ AlTER EXTENSION pg_stat_statements UPDATE TO '1.11'; local_blks_written | bigint | | | temp_blks_read | bigint | | | temp_blks_written | bigint | | | - blk_read_time | double precision |
Re: gcc 12.1.0 warning
Hi, On Fri, 27 Oct 2023 at 12:34, Erik Rijkers wrote: > > Hi, > > Not sure if these compiler-mutterings are worth reporting but I guess > we're trying to get a silent compile. > > System: Debian 4.9.303-1 (2022-03-07) x86_64 GNU/Linux > Compiling with gcc 12.1.0 causes the below 'warning' and 'note'. > Compiling with --enable-cassert --enable-debug is silent, no warnings) > > In function ‘guc_var_compare’, > inlined from ‘bsearch’ at > /usr/include/x86_64-linux-gnu/bits/stdlib-bsearch.h:33:23, > inlined from ‘find_option’ at guc.c:5649:35: > guc.c:5736:38: warning: array subscript ‘const struct config_generic[0]’ > is partly outside array bounds of ‘const char[8]’ [-Warray-bounds] > 5736 | return guc_name_compare(confa->name, confb->name); >| ~^~ > > guc.c: In function ‘find_option’: > guc.c:5636:25: note: object ‘name’ of size 8 > 5636 | find_option(const char *name, bool create_placeholders, bool > skip_errors, >| ^~~~ > > (Compiling with gcc 6.3.0 does not complain.) I was testing 'upgrading CI Debian images to bookworm'. I tested the bookworm on REL_15, REL_16 and upstream. REL_16 and upstream finished successfully but the CompilerWarnings task failed on REL_15 with the same error. gcc version: 12.2.0 CI Run: https://cirrus-ci.com/task/6151742664998912 Regards, Nazir Bilal Yavuz Microsoft
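For context on what gcc is complaining about here: find_option() looks a GUC up by punning a string pointer as a pointer into an array of struct pointers, which -Warray-bounds in gcc 12 can see through once bsearch() is inlined. A reduced sketch of the pattern, simplified from the quoted diagnostics (the struct below is cut down to its first member, and strcmp stands in for guc_name_compare):

```c
#include <string.h>

struct config_generic
{
    const char *name;           /* first member, which is why the pun works */
    /* ... many more members in the real struct ... */
};

static int
guc_var_compare(const void *a, const void *b)
{
    const struct config_generic *confa = *(struct config_generic *const *) a;
    const struct config_generic *confb = *(struct config_generic *const *) b;

    return strcmp(confa->name, confb->name);    /* guc_name_compare() in reality */
}

static const char *
find_option_sketch(const char *name)
{
    const char **key = &name;   /* 'name' is the 8-byte object in gcc's note */

    /*
     * bsearch(&key, ...) hands &key to guc_var_compare(), which casts it and
     * reads the memory at &name as a struct config_generic. The struct
     * access therefore lands on an 8-byte pointer variable, hence the
     * "partly outside array bounds of 'const char[8]'" warning.
     */
    return *key;
}
```

So the warning flags the type pun rather than an actual out-of-bounds read, which also fits the observation that only some gcc versions report it.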
Re: pgBufferUsage.blk_{read|write}_time are zero although there are pgBufferUsage.local_blks_{read|written}
Hi, On Thu, 19 Oct 2023 at 08:26, Michael Paquier wrote: > > On Wed, Oct 18, 2023 at 02:56:42PM +0900, Michael Paquier wrote: > > Thanks for the new versions. I have applied 0001 and backpatched it > > for now. 0002 and 0003 look in much cleaner shape than previously. > > 0002 and 0003 have now been applied. I have split 0003 into two parts > at the end, mainly on clarity grounds: one for the counters with > EXPLAIN and a second for pg_stat_statements. > > There were a few things in the patch set. Per my notes: > - Some incorrect indentation. > - The additions of show_buffer_usage() did not handle correctly the > addition of a comma before/after the local timing block. The code > area for has_local_timing needs to check for has_temp_timing, while > the area of has_shared_timing needs to check for (has_local_timing || > has_temp_timing). > - explain.sgml was missing an update for the information related to > the read/write timings of the local blocks. Thanks for the changes, the push, and the feedback! > > Remains what we should do about the "shared/local" string in > show_buffer_usage() for v16 and v15, as "local" is unrelated to that. > Perhaps we should just switch to "shared" in this case or just remove > the string entirely? Still that implies changing the output of > EXPLAIN on a stable branch in this case, so there could be an argument > for leaving this stuff alone. I think switching it to 'shared' makes sense. That shouldn't confuse existing monitoring queries much, as the numbers won't change, right? Also, if we keep 'shared/local' there could be similar complaints to this thread in the future; so at least adding comments could be helpful. -- Regards, Nazir Bilal Yavuz Microsoft
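A sketch of the separator rule from the quoted notes, under the assumption that show_buffer_usage() emits the three timing blocks in shared, local, temp order (the boolean names follow the notes; the appendStringInfo calls are simplified placeholders for the real timing output):

```c
static void
show_timing_blocks(StringInfo str, bool has_shared_timing,
                   bool has_local_timing, bool has_temp_timing,
                   double rd_ms, double wr_ms)
{
    if (has_shared_timing)
    {
        appendStringInfo(str, "shared read=%0.3f write=%0.3f", rd_ms, wr_ms);
        if (has_local_timing || has_temp_timing)
            appendStringInfoString(str, ", ");
    }
    if (has_local_timing)
    {
        appendStringInfo(str, "local read=%0.3f write=%0.3f", rd_ms, wr_ms);
        if (has_temp_timing)
            appendStringInfoString(str, ", ");
    }
    if (has_temp_timing)
        appendStringInfo(str, "temp read=%0.3f write=%0.3f", rd_ms, wr_ms);
}
```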
Re: Show WAL write and fsync stats in pg_stat_io
Hi, Thank you for the feedback! On Thu, 26 Oct 2023 at 09:28, Michael Paquier wrote: > > On Wed, Sep 20, 2023 at 10:57:48AM +0300, Nazir Bilal Yavuz wrote: > > Any kind of feedback would be appreciated. > > This was registered in the CF, so I have given it a look. Note that > 0001 has a conflict with pgstat_count_io_op_time(), so it cannot be > applied. > > +pgstat_should_track_io_time(IOObject io_object, IOContext io_context) > +{ > + /* > +* io times of IOOBJECT_WAL IOObject needs to be tracked when > +* 'track_wal_io_timing' is set regardless of 'track_io_timing'. > +*/ > + if (io_object == IOOBJECT_WAL) > + return track_wal_io_timing; > + > + return track_io_timing; > > I can see the temptation to do that, but I have mixed feelings about > the approach of mixing two GUCs in a code path dedicated to pg_stat_io > where now we only rely on track_io_timing. The result brings > confusion, while making pg_stat_io, which is itself only used for > block-based operations, harder to read. > > The suggestion I am seeing here to have a pg_stat_io_wal (with a SRF) > is quite tempting, actually, creating a neat separation between the > existing pg_stat_io and pg_stat_wal (not a SRF), with a third view > that provides more details about the contexts and backend types for > the WAL stats with its relevant fields: > https://www.postgresql.org/message-id/caakru_bm55pj3pprw0nd_-pawhlrkou69r816aeztbba-n1...@mail.gmail.com > > And perhaps just putting that everything that calls > pgstat_count_io_op_time() under track_io_timing is just natural? > What's the performance regression you would expect if both WAL and > block I/O are controlled by that, still one would expect only one of > them? I will check these and I hope I will come back with something meaningful. > > + /* Report pending statistics to the cumulative stats system */ > + pgstat_report_wal(false); > > This is hidden in 0001, still would be better if handled as a patch on > its own and optionally backpatch it as we did for the bgwriter with > e64c733bb1? I thought about it again and found the use of 'pgstat_report_wal(false);' here wrong. This was mainly for flushing WAL stats because of the WAL reads but pg_stat_wal doesn't have WAL read stats, so there is no need to flush WAL stats here. I think this should be replaced with 'pgstat_flush_io(false);'. > > Side note: I think that we should spend more efforts in documenting > what IOContext and IOOp mean. Not something directly related to this > patch, still this patch or things similar make it a bit harder which > part of it is used for what by reading pgstat.h. I agree. Regards, Nazir Bilal Yavuz Microsoft
Re: Show WAL write and fsync stats in pg_stat_io
Hi, On Tue, 31 Oct 2023 at 16:57, Nazir Bilal Yavuz wrote: > On Thu, 26 Oct 2023 at 09:28, Michael Paquier wrote: > > > > And perhaps just putting that everything that calls > > pgstat_count_io_op_time() under track_io_timing is just natural? > > What's the performance regression you would expect if both WAL and > > block I/O are controlled by that, still one would expect only one of > > them? > > I will check these and I hope I will come back with something meaningful. I applied the patches on upstream postgres and then ran pgbench a couple of times for each available clock source:

# Set fsync = off and track_io_timing = on
# pgbench -i -s 100 test
# pgbench -M prepared -c16 -j8 -f <( echo "SELECT pg_logical_emit_message(true, \:client_id::text, '1234567890');") -T60 test

Results are:

clock source | track_wal_io_timing = on | track_wal_io_timing = off | change
tsc          | 514814.459170            | 519826.284139             | 1%
hpet         | 132116.272121            | 141820.548447             | 7%
acpi_pm      | 394793.092255            | 403723.874719             | 2%

Regards, Nazir Bilal Yavuz Microsoft
Re: Adding facility for injection points (or probe points?) for more advanced tests
Hi, On Wed, 25 Oct 2023 at 07:13, Michael Paquier wrote: > > Hi all, > > I don't remember how many times in the last few years when I've had to > hack the backend to produce a test case that involves a weird race > condition across multiple processes running in the backend, to be able > to prove a point or just test a fix (one recent case: 2b8e5273e949). > Usually, I come to hardcoding stuff for the following situations: > - Trigger a PANIC, to force recovery. > - A FATAL, to take down a session, or just an ERROR. > - palloc() failure injection. > - Sleep to slow down a code path. > - Pause and release with condition variable. I liked the idea; thanks for working on this! What do you think about creating a function for updating the already created injection point's callback or name (mostly callback)? For now, you need to drop and recreate the injection point to change the callback or the name. Here is my code correctness review: diff --git a/meson_options.txt b/meson_options.txt +option('injection_points', type: 'boolean', value: true, + description: 'Enable injection points') + It is enabled by default while building with meson. diff --git a/src/backend/utils/misc/injection_point.c b/src/backend/utils/misc/injection_point.c +LWLockRelease(InjectionPointLock); + +/* If not found, do nothing? */ +if (!found) +return; It would be good to log a warning message here. I tried to compile that with -Dwerror=true -Dinjection_points=false and got some errors (warnings): injection_point.c: In function ‘InjectionPointShmemSize’: injection_point.c:59:1: error: control reaches end of non-void function [-Werror=return-type] injection_point.c: At top level: injection_point.c:32:14: error: ‘InjectionPointHashByName’ defined but not used [-Werror=unused-variable] test_injection_points.c: In function ‘test_injection_points_run’: test_injection_points.c:69:21: error: unused variable ‘name’ [-Werror=unused-variable] The test_injection_points test runs and passes although I set -Dinjection_points=false. That could be misleading, IMO the test should be skipped if Postgres is not compiled with the injection points. Regards, Nazir Bilal Yavuz Microsoft
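For the "do nothing" case flagged above, a sketch of what logging a warning could look like, extending the quoted snippet (the surrounding lock and hash lookup are as in the patch under review):

```c
    LWLockRelease(InjectionPointLock);

    /* If not found, warn instead of returning silently. */
    if (!found)
    {
        elog(WARNING, "injection point \"%s\" not found", name);
        return;
    }
```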
Re: Remove unnecessary 'always:' from CompilerWarnings task
Hi, Thanks for the review. On Wed, 8 Nov 2023 at 10:31, Peter Eisentraut wrote: > > On 05.09.23 12:25, Nazir Bilal Yavuz wrote: > > There are multiple 'always:' keywords under the CompilerWarnings task. > > Instead of that, we can use one 'always:' and move the instructions > > under this. So, I removed unnecessary ones and rearranged indents > > according to that change. > > I'm not sure this change is beneficial. The way the code is currently > arranged, it's a bit easier to move or change individual blocks, and > it's also easier to read the file, because the "always:" is next to each > "script" and doesn't scroll off the screen. That makes sense. I am planning to withdraw this soon if there are no other objections. Regards, Nazir Bilal Yavuz Microsoft
Re: Show WAL write and fsync stats in pg_stat_io
Hi, Thanks for all the feedback! On Wed, 8 Nov 2023 at 08:59, Michael Paquier wrote: > > By the way, note that the patch is failing to apply, and that I've > switched it as waiting on author on 10/26. Here is an updated patchset in attachment. Rebased on the latest HEAD and changed 'pgstat_report_wal(false)' to 'pgstat_flush_io(false)' in xlogrecovery.c. I will share the new version of the patchset once I address the feedback. Regards, Nazir Bilal Yavuz Microsoft From e5db5cd6d8c47cadde0539f06bbee22368d17a41 Mon Sep 17 00:00:00 2001 From: Nazir Bilal Yavuz Date: Thu, 26 Oct 2023 12:12:32 +0300 Subject: [PATCH v4 1/2] Show WAL stats (except streaming replication WAL) on pg_stat_io This patch aims to showing WAL stats per backend on pg_stat_io view. With this patch, it can be seen how many WAL operations it makes, their context, types and total timings per backend in pg_stat_io view. This patchset currently covers: - IOOBJECT_WAL / IOCONTEXT_NORMAL read, write and fsync. - IOOBJECT_WAL / IOCONTEXT_INIT write and fsync. doesn't cover: - Streaming replication WAL IO. --- src/backend/access/transam/xlog.c | 60 - src/backend/access/transam/xlogrecovery.c | 10 ++ src/backend/utils/activity/pgstat_io.c| 144 -- src/backend/utils/adt/pgstatfuncs.c | 10 +- src/include/pgstat.h | 8 +- 5 files changed, 178 insertions(+), 54 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index b541be8eec..d265b8c032 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -2256,38 +2256,22 @@ XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible) Size nbytes; Size nleft; int written; - instr_time start; + instr_time io_start; /* OK to write the page(s) */ from = XLogCtl->pages + startidx * (Size) XLOG_BLCKSZ; nbytes = npages * (Size) XLOG_BLCKSZ; nleft = nbytes; + + io_start = pgstat_prepare_io_time(); do { errno = 0; -/* Measure I/O timing to write WAL data */ -if (track_wal_io_timing) - INSTR_TIME_SET_CURRENT(start); -else - INSTR_TIME_SET_ZERO(start); - pgstat_report_wait_start(WAIT_EVENT_WAL_WRITE); written = pg_pwrite(openLogFile, from, nleft, startoffset); pgstat_report_wait_end(); -/* - * Increment the I/O timing and the number of times WAL data - * were written out to disk. 
- */ -if (track_wal_io_timing) -{ - instr_time end; - - INSTR_TIME_SET_CURRENT(end); - INSTR_TIME_ACCUM_DIFF(PendingWalStats.wal_write_time, end, start); -} - PendingWalStats.wal_write++; if (written <= 0) @@ -2312,6 +2296,9 @@ XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible) startoffset += written; } while (nleft > 0); + pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_NORMAL, + IOOP_WRITE, io_start, npages); + npages = 0; /* @@ -3005,6 +2992,7 @@ XLogFileInitInternal(XLogSegNo logsegno, TimeLineID logtli, int fd; int save_errno; int open_flags = O_RDWR | O_CREAT | O_EXCL | PG_BINARY; + instr_time io_start; Assert(logtli != 0); @@ -3048,6 +3036,9 @@ XLogFileInitInternal(XLogSegNo logsegno, TimeLineID logtli, (errcode_for_file_access(), errmsg("could not create file \"%s\": %m", tmppath))); + /* start timing writes for stats */ + io_start = pgstat_prepare_io_time(); + pgstat_report_wait_start(WAIT_EVENT_WAL_INIT_WRITE); save_errno = 0; if (wal_init_zero) @@ -3083,6 +3074,9 @@ XLogFileInitInternal(XLogSegNo logsegno, TimeLineID logtli, } pgstat_report_wait_end(); + pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_INIT, IOOP_WRITE, + io_start, 1); + if (save_errno) { /* @@ -3099,6 +3093,9 @@ XLogFileInitInternal(XLogSegNo logsegno, TimeLineID logtli, errmsg("could not write to file \"%s\": %m", tmppath))); } + /* start timing fsyncs for stats */ + io_start = pgstat_prepare_io_time(); + pgstat_report_wait_start(WAIT_EVENT_WAL_INIT_SYNC); if (pg_fsync(fd) != 0) { @@ -3111,6 +3108,9 @@ XLogFileInitInternal(XLogSegNo logsegno, TimeLineID logtli, } pgstat_report_wait_end(); + pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_INIT, + IOOP_FSYNC, io_start, 1); + if (close(fd) != 0) ereport(ERROR, (errcode_for_file_access(), @@ -8282,7 +8282,7 @@ void issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli) { char *msg = NULL; - instr_time start; + instr_time io_start; Assert(tli != 0); @@ -8295,11 +8295,7 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli) wal_sync_method == WAL_SYNC_METHOD_OPEN_DSYNC) return; - /* Measure I/O timing to sync the WAL file */ - if (track_wal_io_timing) - INSTR_TIME_SET_CURRENT(start); - else - INSTR_TIME_SET_ZERO(start); + io_start = pgstat_prepare_io_time();
Re: Show WAL write and fsync stats in pg_stat_io
Hi, On Wed, 8 Nov 2023 at 04:19, Andres Freund wrote: > > Hi, > > On 2023-11-08 09:52:16 +0900, Michael Paquier wrote: > > By the way, if the write/sync quantities and times begin to be tracked > > by pg_stat_io, I'd see a pretty good argument in removing the > > equivalent columns in pg_stat_wal. It looks like this would reduce > > the confusion related to the handling of PendingWalStats added in > > pgstat_io.c, for one. > > Another approach would be to fetch the relevant columns from pg_stat_io in the > pg_stat_wal view. That'd avoid double accounting and breaking existing > monitoring. There are some differences between how pg_stat_wal and pg_stat_io collect WAL stats. For example, in the XLogWrite() function in xlog.c, pg_stat_wal counts wal_writes as write system calls. This is not what we want for pg_stat_io, since pg_stat_io counts the number of blocks rather than system calls, so pg_stat_io is instead incremented by npages. Could that cause a problem, given that pg_stat_wal's behaviour would change? Of course, as an alternative we could change pg_stat_io's behaviour, but in the end either pg_stat_wal's or pg_stat_io's behaviour will have to change. -- Regards, Nazir Bilal Yavuz Microsoft
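To spell out the counting difference with the numbers involved: for a single XLogWrite() flush that issues one pg_pwrite() covering npages WAL blocks, the two views would account it as follows (a sketch using the identifiers from the patches in this thread):

```c
    /* pg_stat_wal: one write system call */
    PendingWalStats.wal_write++;

    /* pg_stat_io: npages blocks written */
    pgstat_count_io_op_n(IOOBJECT_WAL, IOCONTEXT_NORMAL, IOOP_WRITE, npages);
```

So with 8kB blocks, one 128kB flush counts as 1 write in pg_stat_wal but 16 writes in pg_stat_io, and merging the columns would necessarily change one of the two definitions.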
Re: Failure during Building Postgres in Windows with Meson
Hi, Adding Kyotaro to CC because Kyotaro reported a similar issue before [1]. On Thu, 9 Nov 2023 at 11:59, Shlok Kyal wrote: > > Hi, > I am trying to build postgres with meson on Windows. And I am stuck in > the process. > > Steps I followed: > > 1. I clone postgres repo > > 2.Installed meson and ninja > pip install meson ninja > > 3. Then running following command: > meson setup build --buildtype debug > > 4. Then I ran > cd build > ninja > > Got following error > D:\project\repo\pg_meson\postgres\build>C:\Users\kyals\AppData\Roaming\Python\Python311\Scripts\ninja > ninja: error: 'src/backend/postgres_lib.a.p/meson_pch-c.obj', needed > by 'src/backend/postgres.exe', missing and no known rule to make it. I am able to reproduce the error. This error was introduced in meson v1.2.0; v1.1.0 and earlier work successfully. It seems meson tries to use pch files even though Postgres is compiled with b_pch=false. This error occurs when the Developer PowerShell for VS is used and Postgres is compiled with the b_pch=false option (which is the default for Postgres). If the -Db_pch=true option or the default PowerShell is used, Postgres builds successfully. [1] https://www.postgresql.org/message-id/20231018.113148.1275969479525954369.horikyota@gmail.com -- Regards, Nazir Bilal Yavuz Microsoft
Re: Failure during Building Postgres in Windows with Meson
Hi, On Thu, 9 Nov 2023 at 18:27, Tristan Partin wrote: > > Can you try with Meson v1.2.3? I tried with Meson v1.2.3 and with upstream; both failed with the same error. -- Regards, Nazir Bilal Yavuz Microsoft
Re: Show WAL write and fsync stats in pg_stat_io
Hi, Thanks for the feedback. On Mon, 20 Nov 2023 at 10:47, Michael Paquier wrote: > > On Thu, Nov 09, 2023 at 02:39:26PM +0300, Nazir Bilal Yavuz wrote: > > There are some differences between pg_stat_wal and pg_stat_io while > > collecting WAL stats. For example in the XLogWrite() function in the > > xlog.c file, pg_stat_wal counts wal_writes as write system calls. This > > is not something we want for pg_stat_io since pg_stat_io counts the > > number of blocks rather than the system calls, so instead incremented > > pg_stat_io by npages. > > > > Could that cause a problem since pg_stat_wal's behaviour will be > > changed? Of course, as an alternative we could change pg_stat_io's > > behaviour but in the end either pg_stat_wal's or pg_stat_io's > > behaviour will be changed. > > Yep, that could be confusing for existing applications that track the > information of pg_stat_wal. The number of writes is not something > that can be correctly shared between both. The timings for the writes > and the syncs could be shared at least, right? Yes, the timings for the writes and the syncs should work. Another question I have in mind is about the pg_stat_reset_shared() function: when we call it with 'io' it will reset pg_stat_wal's timings, but when we call it with 'wal' it won't reset them, right? > > This slightly relates to pgstat_count_io_op_n() in your latest patch, > where it feels a bit weird to see an update of > PendingWalStats.wal_sync sit in the middle of a routine dedicated to > pg_stat_io.. I am not completely sure what's the right balance here, > but I would try to implement things so as pg_stat_io paths does not > need to know about PendingWalStats. For writes there is a blocks-vs-system-calls difference, but syncs are counted the same way in both views. Because of that I moved PendingWalStats.wal_sync into the pg_stat_io path, but I agree that it looks a bit weird. -- Regards, Nazir Bilal Yavuz Microsoft
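For reference, the placement being debated looks roughly like this under the current patchset's approach (a sketch, assuming the pending-stats structures from this thread; the legacy counter is bumped inside the pg_stat_io path because both views count fsyncs the same way):

```c
void
pgstat_count_io_op_n(IOObject io_object, IOContext io_context, IOOp io_op,
                     uint32 cnt)
{
    /* ... existing assertions and bookkeeping ... */

    if (io_object == IOOBJECT_WAL && io_op == IOOP_FSYNC)
        PendingWalStats.wal_sync += cnt;    /* keep pg_stat_wal in sync */

    PendingIOStats.counts[io_object][io_context][io_op] += cnt;
}
```

Moving the PendingWalStats update out to the WAL-specific call sites would keep the pg_stat_io paths unaware of PendingWalStats, at the cost of duplicating the counting at each site.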
SSL tests fail on OpenSSL v3.2.0
Hi, SSL tests fail on OpenSSL v3.2.0. I tested on macOS (CI) and on Debian (my local machine), and both failed with the same errors. To trigger these errors on CI, you may need to clear the repository cache; otherwise macOS won't install OpenSSL v3.2.0.

001_ssltests: psql exited with signal 6 (core dumped): 'psql: error: connection to server at "127.0.0.1", port 56718 failed: server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. SSL SYSCALL error: Connection reset by peer' while running 'psql -XAtq -d sslkey=invalid sslcert=invalid sslrootcert=invalid sslcrl=invalid sslcrldir=invalid user=ssltestuser dbname=trustdb hostaddr=127.0.0.1 host=common-name.pg-ssltest.test sslrootcert=invalid sslmode=require -f - -w' at /Users/admin/pgsql/src/test/perl/PostgreSQL/Test/Cluster.pm

002_scram: psql exited with signal 6 (core dumped): 'psql: error: connection to server at "127.0.0.1", port 54531 failed: server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. SSL SYSCALL error: Connection reset by peer' while running 'psql -XAtq -d dbname=trustdb sslmode=require sslcert=invalid sslrootcert=invalid hostaddr=127.0.0.1 host=localhost user=ssltestuser -f - -w' at /Users/admin/pgsql/src/test/perl/PostgreSQL/Test/Cluster.pm line 1997.

003_sslinfo: psql exited with signal 6 (core dumped): 'psql: error: connection to server at "127.0.0.1", port 59337 failed: server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. SSL SYSCALL error: Connection reset by peer' while running 'psql -XAtq -d sslkey=invalid sslcert=invalid sslrootcert=invalid sslcrl=invalid sslcrldir=invalid sslrootcert=ssl/root+server_ca.crt sslmode=require dbname=certdb hostaddr=127.0.0.1 host=localhost user=ssltestuser sslcert=ssl/client_ext.crt sslkey=/Users/admin/pgsql/build/testrun/ssl/003_sslinfo/data/tmp_test_q11O/client_ext.key -f - -w' at /Users/admin/pgsql/src/test/perl/PostgreSQL/Test/Cluster.pm line 1997.

macOS CI run: https://cirrus-ci.com/task/5128008789393408 I haven't found the cause yet, but I wanted to report it. -- Regards, Nazir Bilal Yavuz Microsoft
Re: Show WAL write and fsync stats in pg_stat_io
Hi, On Wed, 8 Nov 2023 at 10:34, Bharath Rupireddy wrote: > > Is there any plan to account for WAL read stats in the WALRead() > function which will cover walsenders i.e. WAL read by logical and > streaming replication, WAL read by pg_walinspect and so on? I see the > patch already covers WAL read stats by recovery in XLogPageRead(), but > not other page_read callbacks which will end up in WALRead() > eventually. If added, the feature at > https://www.postgresql.org/message-id/CALj2ACXKKK%3DwbiG5_t6dGao5GoecMwRkhr7GjVBM_jg54%2BNa%3DQ%40mail.gmail.com > can then extend it to cover WAL read from WAL buffer stats. Yes, I am planning to create a patch for that after this one is done. Thanks for bringing it up! -- Regards, Nazir Bilal Yavuz Microsoft
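For reference, a rough sketch of what such instrumentation in WALRead() could look like (assumptions: the IOOBJECT_WAL constants from this patchset and the existing pg_stat_io timing helpers; the eventual patch may well differ):

    instr_time  io_start = pgstat_prepare_io_time();

    /* the actual read in xlogreader.c's WALRead() */
    readbytes = pg_pread(state->seg.ws_file, p, segbytes, (off_t) startoff);

    pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_NORMAL, IOOP_READ,
                            io_start, 1);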
Re: Show WAL write and fsync stats in pg_stat_io
Hi, Thanks for all the feedback. I am sharing the new version of the patchset. The current status of the patchset is: - IOOBJECT_WAL / IOCONTEXT_NORMAL / read, write, fsync stats and their tests are added. - IOOBJECT_WAL / IOCONTEXT_INIT stats and their tests are added. - Documentation is updated. - pg_stat_io shows different op_bytes for the IOOBJECT_WAL operations. - PendingWalStats.wal_sync and PendingWalStats.wal_write_time / PendingWalStats.wal_sync_time are moved to pgstat_count_io_op_n() / pgstat_count_io_op_time() respectively. Updates & Discussion items: - Try to set op_bytes for BackendType / IOContext: I think we don't need this now; we will need it when we add streaming replication WAL IOs. - Decide which 'BackendType / IOContext / IOOp' should not be tracked: -- IOOBJECT_WAL / IOCONTEXT_INIT + IOCONTEXT_NORMAL / write and fsync IOs can be done on every backend that tracks IO statistics. Because of that, and since we have a pgstat_tracks_io_bktype(bktype) check, I didn't add another check for this. -- I found that only the standalone backend and startup backend do IOOBJECT_WAL / IOCONTEXT_NORMAL / read IOs. So, I added a check for that, but I am not sure if there are more backends that do WAL reads during WAL recovery. - For the IOOBJECT_WAL / IOCONTEXT_INIT and IOOBJECT_WAL / IOCONTEXT_NORMAL / read tests, I used the initial WAL IOs to check these stats. I am not sure if that is the correct way, or enough, to test these stats. - To avoid calculating WAL timings for both the pg_stat_wal and pg_stat_io views, the pg_stat_wal view's WAL timings are fetched from pg_stat_io. Since these timings are fetched from pg_stat_io, pg_stat_reset_shared('io') will reset pg_stat_wal's timings too. - I didn't move 'PendingWalStats.wal_sync' out of the 'pgstat_count_io_op_n' function because both views count syncs the same way (there is no blocks vs. system calls difference for syncs), but I agree that this doesn't look good. Any kind of feedback would be appreciated. -- Regards, Nazir Bilal Yavuz Microsoft From 4ad85b11d418ae78237ed70eced6e3b46d086ef5 Mon Sep 17 00:00:00 2001 From: Nazir Bilal Yavuz Date: Fri, 1 Dec 2023 10:03:21 +0300 Subject: [PATCH v5 3/4] Add IOOBJECT_WAL / IOCONTEXT_INIT / write and fsync tests --- src/test/regress/expected/stats.out | 19 +++ src/test/regress/sql/stats.sql | 10 ++ 2 files changed, 29 insertions(+) diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out index 4d3a515bdd..4adda9e479 100644 --- a/src/test/regress/expected/stats.out +++ b/src/test/regress/expected/stats.out @@ -862,6 +862,25 @@ WHERE pg_stat_get_backend_pid(beid) = pg_backend_pid(); t (1 row) +-- Test pg_stat_io IOOBJECT_WAL / IOCONTEXT_INIT / write and fsync. +-- When the server starts, the initial WAL file must be created, +-- so check these stats before the stats get reset. +SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs + FROM pg_stat_io + WHERE context = 'init' and object = 'wal' \gset io_sum_wal_init_ +SELECT :io_sum_wal_init_writes > 0; + ?column? -- + t +(1 row) + +SELECT current_setting('fsync') = 'off' + OR :io_sum_wal_init_fsyncs > 0; + ?column? 
+-- + t +(1 row) + - -- Test that resetting stats works for reset timestamp - diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql index aa48e65dc8..72e864a0d2 100644 --- a/src/test/regress/sql/stats.sql +++ b/src/test/regress/sql/stats.sql @@ -442,6 +442,16 @@ SELECT (current_schemas(true))[1] = ('pg_temp_' || beid::text) AS match FROM pg_stat_get_backend_idset() beid WHERE pg_stat_get_backend_pid(beid) = pg_backend_pid(); +-- Test pg_stat_io IOOBJECT_WAL / IOCONTEXT_INIT / write and fsync. +-- When the server starts, the initial WAL file must be created, +-- so check these stats before the stats get reset. +SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs + FROM pg_stat_io + WHERE context = 'init' and object = 'wal' \gset io_sum_wal_init_ +SELECT :io_sum_wal_init_writes > 0; +SELECT current_setting('fsync') = 'off' + OR :io_sum_wal_init_fsyncs > 0; + - -- Test that resetting stats works for reset timestamp - -- 2.43.0 From c7ae6c12cd02806d9d8201d738920179985cee7a Mon Sep 17 00:00:00 2001 From: Nazir Bilal Yavuz Date: Fri, 10 Nov 2023 14:52:22 +0300 Subject: [PATCH v5 2/4] Add IOOBJECT_WAL / IOCONTEXT_NORMAL / write and fsync tests --- src/test/regress/expected/stats.out | 26 ++ src/test/regress/sql/stats.sql | 11 +++ 2 files changed, 37 insertions(+) diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out index 346e10a3d2..4d3a515bdd 100644 --- a/src/test/regress/expected/stats.out +++ b/src/test/regress/expected/stats.out @@ -1255,6 +1255,7 @@ SELECT pg_stat_get_subscription_stats(NULL); -- - extends of relations us
Add missing error codes to PANIC/FATAL error reports in xlog.c and relcache.c
Hi, There is an ongoing thread [1] for adding missing SQL error codes to PANIC and FATAL error reports in the xlogrecovery.c file. I did the same for the xlog.c and relcache.c files. I couldn't find a suitable error code for the "cache lookup failed for relation" error in relcache.c, and this error comes up in many places. Would it be reasonable to create a new error code specifically for it? Any kind of feedback would be appreciated. [1] https://www.postgresql.org/message-id/CAPMWgZ8g17Myb5ZRE5aTNowUohafk4j48mZ_5_Zn9JnR5p2u0w%40mail.gmail.com -- Regards, Nazir Bilal Yavuz Microsoft v1-0001-xlog.c-Add-missing-error-codes-to-PANIC-FATAL-err.patch Description: Binary data v1-0002-relcache.c-Add-missing-error-codes-to-PANIC-FATAL.patch Description: Binary data
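To illustrate the kind of change involved, a sketch (the message and the chosen error code here are only an example, not a hunk from the attached patches):

    /* before: no errcode(), so clients see the default XX000 */
    ereport(PANIC,
            (errmsg("could not write to file \"%s\": %m", path)));

    /* after: attach an appropriate SQLSTATE */
    ereport(PANIC,
            (errcode_for_file_access(),
             errmsg("could not write to file \"%s\": %m", path)));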
Re: Show WAL write and fsync stats in pg_stat_io
Hi, Thanks for the feedback! The new version of the patch is attached. On Tue, 5 Dec 2023 at 09:16, Michael Paquier wrote: > > - if (io_op == IOOP_WRITE || io_op == IOOP_EXTEND) > + if (io_op == IOOP_EXTEND || io_op == IOOP_WRITE) > > Unrelated diff. Done. > > + if (io_object == IOOBJECT_WAL && io_context == IOCONTEXT_NORMAL && > + io_op == IOOP_FSYNC) > + PendingWalStats.wal_sync += cnt; > > Nah, I really don't think that adding this dependency within > pg_stat_io is a good idea. > > - PendingWalStats.wal_sync++; > + pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_NORMAL, IOOP_FSYNC, > + io_start, 1); > > This is the only caller where this matters, and the count is always 1. I reverted that; pgstat_count_io_op_n() no longer counts PendingWalStats.wal_sync. > > + no_wal_normal_read = bktype == B_AUTOVAC_LAUNCHER || > + bktype == B_AUTOVAC_WORKER || bktype == B_BACKEND || > + bktype == B_BG_WORKER || bktype == B_BG_WRITER || > + bktype == B_CHECKPOINTER || bktype == B_WAL_RECEIVER || > + bktype == B_WAL_SENDER || bktype == B_WAL_WRITER; > + > + if (no_wal_normal_read && > + (io_object == IOOBJECT_WAL && > +io_op == IOOP_READ)) > + return false; > > This may be more readable if an enum is applied, without a default > clause so as it would not be forgotten if a new type is added, perhaps > in its own little routine. Done. > > - if (track_io_timing) > + if (track_io_timing || track_wal_io_timing) > INSTR_TIME_SET_CURRENT(io_start); > else > > This interface from pgstat_prepare_io_time() is not really good, > because we could finish by setting io_start in the existing code paths > calling this routine even if track_io_timing is false when > track_wal_io_timing is true. Why not changing this interface a bit > and pass down a GUC (track_io_timing or track_wal_io_timing) as an > argument of the function depending on what we expect to trigger the > timings? Done in 0001. > > - /* Convert counters from microsec to millisec for display */ > - values[6] = Float8GetDatum(((double) wal_stats->wal_write_time) / > 1000.0); > - values[7] = Float8GetDatum(((double) wal_stats->wal_sync_time) / > 1000.0); > + /* > +* There is no need to calculate timings for both pg_stat_wal and > +* pg_stat_io. So, fetch timings from pg_stat_io to make stats > gathering > +* cheaper. Note that, since timings are fetched from pg_stat_io; > +* pg_stat_reset_shared('io') will reset pg_stat_wal's timings too. > +* > +* Convert counters from microsec to millisec for display > +*/ > + values[6] = Float8GetDatum(pg_stat_get_io_time(IOOBJECT_WAL, > + > IOCONTEXT_NORMAL, > + > IOOP_WRITE)); > + values[7] = Float8GetDatum(pg_stat_get_io_time(IOOBJECT_WAL, > + > IOCONTEXT_NORMAL, > + > IOOP_FSYNC)); > > Perhaps it is simpler to remove these columns from pg_stat_get_wal() > and plug an SQL upgrade to the view definition of pg_stat_wal? Done in 0003, but I am not sure if that is what you expected. > Finding a good balance between the subroutines, the two GUCs, the > contexts, the I/O operation type and the objects is the tricky part of > this patch. If the dependency to PendingWalStats is removed and if > the interface of pgstat_prepare_io_time is improved, things are a bit > cleaner, but it feels like we could do more.. Nya. I agree. The patch is not logically complicated, but it is hard to choose the best approach. Any kind of feedback would be appreciated. 
-- Regards, Nazir Bilal Yavuz Microsoft From b7bf7b92fa274775136314ecfde90fa32ed435cb Mon Sep 17 00:00:00 2001 From: Nazir Bilal Yavuz Date: Wed, 29 Nov 2023 15:30:03 +0300 Subject: [PATCH v6 6/6] Add IOOBJECT_WAL / IOCONTEXT_NORMAL / read tests --- src/test/regress/expected/stats.out | 12 src/test/regress/sql/stats.sql | 8 2 files changed, 20 insertions(+) diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out index 4adda9e479..7f5340cd7e 100644 --- a/src/test/regress/expected/stats.out +++ b/src/test/regress/expected/stats.out @@ -881,6 +881,18 @@ SELECT current_sett
Re: Add code indentation check to cirrus-ci (was Re: Add BF member koel-like indentation checks to SanityCheck CI)
Hi, You may want to check out Andres' WIP patch [1] that adds meson targets for running pgindent. [1] https://www.postgresql.org/message-id/20231019044907.ph6dw637loqg3lqk%40awork3.anarazel.de -- Regards, Nazir Bilal Yavuz Microsoft
Re: Build the docs if there are changes in docs and don't run other tasks if the changes are only in docs
Hi, Sorry for the late reply. On Fri, 6 Oct 2023 at 17:07, Tom Lane wrote: > > As a quick cross-check, I searched our commit log to see how many > README-only commits there were so far this year. I found 11 since > January. (Several were triggered by the latest round of pgindent > code and process changes, so maybe this is more than typical.) > > Not sure what that tells us about the value of changing the CI > logic, but it does seem like it could be worth the one-liner > change needed to teach buildfarm animals to ignore READMEs. I agree that it could be worth implementing this logic on buildfarm animals. In case we want to implement the same logic on the CI, I added a new version of the patch; it skips CI completely if the changes are only in the README files. -- Regards, Nazir Bilal Yavuz Microsoft From 6c268233d13262a31965baf2dbb42913d83dab1d Mon Sep 17 00:00:00 2001 From: Nazir Bilal Yavuz Date: Thu, 14 Dec 2023 16:23:28 +0300 Subject: [PATCH v3] If the changes are only in the README files, skip all the tasks. If the changes are only in the README files, skip all the tasks to save CI time. --- .cirrus.tasks.yml | 17 - 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml index e137769850..dfd12f8b04 100644 --- a/.cirrus.tasks.yml +++ b/.cirrus.tasks.yml @@ -4,6 +4,12 @@ # further details, see src/tools/ci/README +# If the changes are only in the README files, skip all the tasks. +# +# This skip is overridden in the 'SanityCheck' task. So, the same check is +# added to the 'SanityCheck' task too. +skip: changesIncludeOnly('**/README') + env: # The lower depth accelerates git clone. Use a bit of depth so that # concurrent tasks and retrying older jobs have a chance of working. @@ -55,11 +61,12 @@ on_failure_meson: &on_failure_meson task: name: SanityCheck - # If a specific OS is requested, don't run the sanity check. This shortens - # push-wait-for-ci cycle time a bit when debugging operating system specific - # failures. Uses skip instead of only_if, as cirrus otherwise warns about - # only_if conditions not matching. - skip: $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:.*' + # Don't run the sanity check if the changes are only in the README files or + # if a specific OS is requested. The latter shortens push-wait-for-ci cycle + # time a bit when debugging operating system specific failures. Uses skip + # instead of only_if, as cirrus otherwise warns about only_if conditions not + # matching. + skip: $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:.*' || changesIncludeOnly('**/README') env: CPUS: 4 -- 2.43.0
Re: Remove MSVC scripts from the tree
Hi, On Tue, 19 Dec 2023 at 18:24, Peter Eisentraut wrote: > > On 18.12.23 14:52, Peter Eisentraut wrote: > >> 2) I had seen that if sed/gzip is not available meson build will fail: > >> 2.a) > >> Program gsed sed found: NO > >> meson.build:334:6: ERROR: Program 'gsed sed' not found or not executable > > > > Yes, this would need to be improved. Currently, sed is only required if > > either selinux or dtrace is enabled, which isn't supported on Windows. > > But we should adjust the build scripts to not fail the top-level setup > > run unless those options are enabled. > > > >> 2.b) > >> Program gzip found: NO > >> meson.build:337:7: ERROR: Program 'gzip' not found or not executable > > > > gzip is only required for certain test suites, so again we should adjust > > the build scripts to not fail the build but instead skip the tests as > > appropriate. > > Here are patches for these two issues. More testing would be appreciated. 0001-meson-Require-sed-only-when-needed: +sed = find_program(get_option('SED'), 'sed', native: true, + required: get_option('dtrace').enabled() or get_option('selinux').enabled()) dtrace is disabled by default but selinux is set to auto. So meson could find selinux (because of the auto) but fail to find sed, and then compilation fails with: contrib/sepgsql/meson.build:34:19: ERROR: Tried to use not-found external program in "command" I think we need to require sed when dtrace or selinux is found, not based on the return value of get_option().enabled(). The second patch looks good to me. -- Regards, Nazir Bilal Yavuz Microsoft
Re: Show WAL write and fsync stats in pg_stat_io
Hi, Thanks for the review and for the feedback in your previous reply! On Mon, 25 Dec 2023 at 09:40, Michael Paquier wrote: > > On Mon, Dec 25, 2023 at 03:20:58PM +0900, Michael Paquier wrote: > > pgstat_tracks_io_bktype() does not look correct to me. Why is the WAL > > receiver considered as something correct in the list of backend types, > > while the intention is to *not* add it to pg_stat_io? I have tried to > > switche to the correct behavior of returning false for a > > B_WAL_RECEIVER, to notice that pg_rewind's test 002_databases.pl > > freezes on its shutdown sequence. Something weird is going on here. > > Could you look at it? See the XXX comment in the attached, which is > > the same behavior as v6-0002. It looks to me that the patch has > > introduced an infinite loop tweaking pgstat_tracks_io_bktype() in an > > incorrect way to avoid the root issue. > > Ah, that's because it would trigger an assertion failure: > TRAP: failed Assert("pgstat_tracks_io_op(MyBackendType, io_object, > io_context, io_op)"), File: "pgstat_io.c", Line: 89, PID: 6824 > postgres: standby_local: walreceiver > (ExceptionalCondition+0xa8)[0x560d1b4dd38a] > > And the backtrace just tells that this is the WAL receiver > initializing a WAL segment: > #5 0x560d1b3322c8 in pgstat_count_io_op_n > (io_object=IOOBJECT_WAL, io_context=IOCONTEXT_INIT, io_op=IOOP_WRITE, > cnt=1) at pgstat_io.c:89 > #6 0x560d1b33254a in pgstat_count_io_op_time > (io_object=IOOBJECT_WAL, io_context=IOCONTEXT_INIT, io_op=IOOP_WRITE, > start_time=..., cnt=1) at pgstat_io.c:181 > #7 0x560d1ae7f932 in XLogFileInitInternal (logsegno=3, logtli=1, > added=0x7ffd2733c6eb, path=0x7ffd2733c2e0 "pg_wal/0001", '0' > , "3") at xlog.c:3115 > #8 0x560d1ae7fc4e in XLogFileInit (logsegno=3, logtli=1) at > xlog.c:3215 Correct. > > Wouldn't it be simpler to just bite the bullet in this case and handle > WAL receivers in the IO tracking? There is one problem, and I couldn't decide how to solve it. We need to handle read IO in WALRead() in xlogreader.c. The number of bytes WALRead() reads is controlled by a variable, and it can differ from XLOG_BLCKSZ. This is a problem because pg_stat_io's op_bytes column is a constant. Here are all the WALRead() function calls: 1- read_local_xlog_page_guts() in xlogutils.c => WALRead(XLOG_BLCKSZ) => always reads XLOG_BLCKSZ. 2- summarizer_read_local_xlog_page() in walsummarizer.c => WALRead(XLOG_BLCKSZ) => always reads XLOG_BLCKSZ. 3- logical_read_xlog_page() in walsender.c => WALRead(XLOG_BLCKSZ) => always reads XLOG_BLCKSZ. 4- XLogSendPhysical() in walsender.c => WALRead(nbytes) => nbytes can be different from XLOG_BLCKSZ. 5- WALDumpReadPage() in pg_waldump.c => WALRead(count) => count can be different from XLOG_BLCKSZ. 4 and 5 are the problematic calls. Melanie's answer to this problem in earlier discussions: On Wed, 9 Aug 2023 at 21:52, Melanie Plageman wrote: > > If there is any combination of BackendType and IOContext which will > always read XLOG_BLCKSZ bytes, we could use XLOG_BLCKSZ for that row's > op_bytes. For other cases, we may have to consider using op_bytes 1 and > tracking reads and write IOOps in number of bytes (instead of number of > pages). I don't actually know if there is a clear separation by > BackendType for these different cases. Using an op_bytes of 1 solves this problem, but since it would differ from the rest of the pg_stat_io view it could be hard to understand. There is no clear separation by backend type, as can be seen from the walsender. 
> > The other alternative I see is to use XLOG_BLCKSZ as the op_bytes and > treat op_bytes * number of reads as an approximation of the number of > bytes read. I don't actually know what makes more sense. I don't think I > would like having a number for bytes that is not accurate. Also, we have a similar problem in XLogPageRead() in xlogrecovery.c. The pg_pread() call tries to read XLOG_BLCKSZ bytes, but that is not guaranteed, and we don't count the IO at all if it reads less than XLOG_BLCKSZ. IMO this is not as important as the previous problem, but it is still a problem. I would be glad to hear opinions on these problems. -- Regards, Nazir Bilal Yavuz Microsoft
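For context, the short-read case in XLogPageRead() looks roughly like this (a simplified sketch: the pg_pread() call matches xlogrecovery.c, while the IOOBJECT_WAL counting line is from this patchset, not committed code):

    r = pg_pread(readFile, readBuf, XLOG_BLCKSZ, (off_t) readOff);
    if (r != XLOG_BLCKSZ)
    {
        /* short read: we bail out here, so no IO is counted for it */
        goto next_record_is_invalid;
    }

    /* only a successful full-block read reaches the counting call */
    pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_NORMAL, IOOP_READ,
                            io_start, 1);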
Re: Show WAL write and fsync stats in pg_stat_io
Hi, On Tue, 26 Dec 2023 at 03:06, Michael Paquier wrote: > > On Mon, Dec 25, 2023 at 04:09:34PM +0300, Nazir Bilal Yavuz wrote: > > On Wed, 9 Aug 2023 at 21:52, Melanie Plageman > > wrote: > >> If there is any combination of BackendType and IOContext which will > >> always read XLOG_BLCKSZ bytes, we could use XLOG_BLCKSZ for that row's > >> op_bytes. For other cases, we may have to consider using op_bytes 1 and > >> tracking reads and write IOOps in number of bytes (instead of number of > >> pages). I don't actually know if there is a clear separation by > >> BackendType for these different cases. > > > > Using op_bytes as 1 solves this problem but since it will be different > > from the rest of the pg_stat_io view it could be hard to understand. > > There is no clear separation by backends as it can be seen from the > > walsender. > > I find the use of 1 in this context a bit confusing, because when > referring to a counter at N, then it can be understood as doing N > times a operation, but it would be much less than that. Another > solution would be to use NULL (as a synonym of "I don't know") and > then document that in this case all the bigint counters of pg_stat_io > track the number of bytes rather than the number of operations? Yes, that makes sense. Maybe it is better to create a pg_stat_io_wal view like you said before. We could remove the unused columns and add an op_bytes for writes and for reads. Also, we could track both the number of bytes and the number of operations. This doesn't fully solve the problem, but it would be easier to modify such a view to meet our needs. -- Regards, Nazir Bilal Yavuz Microsoft
Re: Show WAL write and fsync stats in pg_stat_io
Hi, On Tue, 26 Dec 2023 at 13:10, Michael Paquier wrote: > > On Tue, Dec 26, 2023 at 11:27:16AM +0300, Nazir Bilal Yavuz wrote: > > Maybe it is better to create a pg_stat_io_wal view like you said > > before. We could remove unused columns and add op_bytes for each > > writes and reads. Also, we can track both the number of bytes and the > > number of the operations. This doesn't fully solve the problem but it > > will be easier to modify it to meet our needs. > > I am not sure while the whole point of the exercise is to have all the > I/O related data in a single view. Something that I've also found a > bit disturbing yesterday while looking at your patch is the fact that > the operation size is guessed from the context and object type when > querying the view because now everything is tied to BLCKSZ. This > patch extends it with two more operation sizes, and there are even > cases where it may be a variable. Could it be a better option to > extend pgstat_count_io_op_time() so as callers can themselves give the > size of the operation? Do you mean removing the op_bytes column and tracking the number of bytes in reads, writes, and extends? If so, that makes sense to me, but I don't want to lose the number of operations; I believe that has value too. We can extend pgstat_count_io_op_time() so it tracks both the number of bytes and the number of operations. Also, it is not directly related to this patch, but vectored IO [1] is coming soon; the number of operations could then be misleading, since vectored IO can merge several operations into one. > > The whole patch is kind of itself complicated enough, so I'd be OK to > discard the case of the WAL receiver for now. Now, if we do so, the > code stack of pgstat_io.c should handle WAL receivers as something > entirely disabled until all the known issues are solved. There is > still a lot of value in tracking WAL data associated to the WAL > writer, normal backends and WAL senders. Why can't we add comments and leave it as it is? Is it because this could cause misunderstandings? If we want to entirely disable it, we can add if (MyBackendType == B_WAL_RECEIVER && io_object == IOOBJECT_WAL) return; to the top of pgstat_count_io_op_time(), since all IOOBJECT_WAL counting goes through this function, and then disable it in pgstat_tracks_io_bktype(). [1] https://www.postgresql.org/message-id/CA%2BhUKGJkOiOCa%2Bmag4BF%2BzHo7qo%3Do9CFheB8%3Dg6uT5TUm2gkvA%40mail.gmail.com -- Regards, Nazir Bilal Yavuz Microsoft
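As a sketch of that suggestion, the counting routine could take the operation size from the caller (the extra parameter and its name are hypothetical, not an existing signature):

    /* hypothetical variant: callers report the actual byte count */
    void
    pgstat_count_io_op_time(IOObject io_object, IOContext io_context,
                            IOOp io_op, instr_time start_time,
                            uint32 cnt, uint64 bytes);

    /* e.g. XLogSendPhysical() could then pass its variable-sized read */
    pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_NORMAL, IOOP_READ,
                            io_start, 1, nbytes);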
Re: Show WAL write and fsync stats in pg_stat_io
Hi, On Sun, 31 Dec 2023 at 03:58, Michael Paquier wrote: > > On Tue, Dec 26, 2023 at 03:35:52PM +0300, Nazir Bilal Yavuz wrote: > > On Tue, 26 Dec 2023 at 13:10, Michael Paquier wrote: > >> I am not sure while the whole point of the exercise is to have all the > >> I/O related data in a single view. Something that I've also found a > >> bit disturbing yesterday while looking at your patch is the fact that > >> the operation size is guessed from the context and object type when > >> querying the view because now everything is tied to BLCKSZ. This > >> patch extends it with two more operation sizes, and there are even > >> cases where it may be a variable. Could it be a better option to > >> extend pgstat_count_io_op_time() so as callers can themselves give the > >> size of the operation? > > > > Do you mean removing the op_bytes column and tracking the number of > > bytes in reads, writes, and extends? If so, that makes sense to me but > > I don't want to remove the number of operations; I believe that has a > > value too. We can extend the pgstat_count_io_op_time() so it can both > > track the number of bytes and the number of operations. > > Apologies if my previous wording sounded confusing. The idea I had in > mind was to keep op_bytes in pg_stat_io, and extend it so as a value > of NULL (or 0, or -1) is a synonym as "writes", "extends" and "reads" > as a number of bytes. Oh, I understand it now. Yes, that makes sense. I thought removing op_bytes completely from the pg_stat_io view (as you said, "This patch extends it with two more operation sizes, and there are even cases where it may be a variable") and then adding something like {read | write | extend}_bytes and {read | write | extend}_calls could be better, so that we don't lose any information. > > Also, it is not directly related to this patch but vectored IO [1] is > > coming soon; so the number of operations could be wrong since vectored > > IO could merge a couple of operations. > > Hmm. I have not checked this patch series so I cannot say for sure, > but we'd likely just want to track the number of bytes if a single > operation has a non-equal size rather than registering in pg_stat_io N > rows with different op_bytes, no? Yes, that is correct. > I am looping in Thomas Munro in CC for comments. Thanks for doing that. > > If we want to entirely disable it, we can add > > > > if (MyBackendType == B_WAL_RECEIVER && io_object == IOOBJECT_WAL) > > return; > > > > to the top of the pgstat_count_io_op_time() since all IOOBJECT_WAL > > calls are done by this function, then we can disable it at > > pgstat_tracks_io_bktype(). > > Yeah, a limitation like that may be acceptable for now. Tracking the > WAL writer and WAL sender activities can be relevant in a lot of cases > even if we don't have the full picture for the WAL receiver yet. I added that and disabled the B_WAL_RECEIVER backend, with comments explaining why. v8 is attached. -- Regards, Nazir Bilal Yavuz Microsoft From dc258a5b168f15c9771f174924261b151193 Mon Sep 17 00:00:00 2001 From: Nazir Bilal Yavuz Date: Wed, 3 Jan 2024 15:36:19 +0300 Subject: [PATCH v8] Show WAL stats on pg_stat_io (except streaming replication) This patch aims to show WAL stats per backend in the pg_stat_io view. With this patch, it can be seen how many WAL operations each backend makes, their context and types, and the total timings, in the pg_stat_io view. In this patch a new IOContext, IOCONTEXT_INIT, is introduced; it is for IO operations done while creating things. 
Currently, it is used only together with IOObject IOOBJECT_WAL. IOOBJECT_WAL means IO operations related to WAL. IOOBJECT_WAL / IOCONTEXT_NORMAL means IO operations done on already created WAL segments. IOOBJECT_WAL / IOCONTEXT_INIT means IO operations done while creating the WAL segments. This patch currently covers: - Documentation - IOOBJECT_WAL / IOCONTEXT_NORMAL / read, write and fsync stats on pg_stat_io. - IOOBJECT_WAL / IOCONTEXT_INIT / write and fsync stats on pg_stat_io. doesn't cover: - Streaming replication WAL IO. --- src/include/catalog/pg_proc.dat | 6 +- src/include/pgstat.h | 6 +- src/backend/access/transam/xlog.c | 58 +-- src/backend/access/transam/xlogrecovery.c | 10 ++ src/backend/catalog/system_views.sql | 15 ++- src/backend/utils/activity/pgstat_io.c| 119 -- src/backend/utils/adt/pgstatfuncs.c | 24 ++--- src/test/regress/expected/rules.out | 27 +++-- src/test/regress/expected/stats.out | 53 ++ src/test/regress/sql/stats.sql| 25 + doc/src/sgml/monitoring.sgml |
Re: Confine vacuum skip logic to lazy_scan_skip
Hi, On Fri, 5 Jan 2024 at 02:25, Jim Nasby wrote: > > On 1/4/24 2:23 PM, Andres Freund wrote: > > On 2024-01-02 12:36:18 -0500, Melanie Plageman wrote: > > Subject: [PATCH v2 1/6] lazy_scan_skip remove unnecessary local var rel_pages > Subject: [PATCH v2 2/6] lazy_scan_skip remove unneeded local var > nskippable_blocks > > I think these may lead to worse code - the compiler has to reload > vacrel->rel_pages/next_unskippable_block for every loop iteration, because it > can't guarantee that they're not changed within one of the external functions > called in the loop body. > > Admittedly I'm not up to speed on recent vacuum changes, but I have to wonder > if the concept of skipping should go away in the context of vector IO? > Instead of thinking about "we can skip this range of blocks", why not > maintain a list of "here's the next X number of blocks that we need to > vacuum"? Sorry if I misunderstood. AFAIU, with the help of vectored IO, "the next X number of blocks that need to be vacuumed" will be prefetched by calculating the unskippable blocks (using the lazy_scan_skip() function), and X will be determined by Postgres itself. Do you have something different in mind? -- Regards, Nazir Bilal Yavuz Microsoft
Re: make pg_ctl more friendly
Hi, Thank you for working on this! I agree that the current message is not friendly. On Thu, 9 Nov 2023 at 10:19, Junwang Zhao wrote: > > On Thu, Nov 9, 2023 at 3:08 PM Junwang Zhao wrote: > > > > After a PITR shutdown, the db state should be *shut down in recovery*, try > > the > > patch attached. > > > > previous patch has some format issues, V2 attached. v2-0001-PITR-shutdown-should-not-report-error-by-pg_ctl.patch: - "Examine the log output.\n"), + "Examine the log output\n"), progname); I don't think this change is needed. Other than that, the patch looks good, and I confirm that after a PITR shutdown the "PITR shutdown" / "update configuration for startup again if needed" message shows up instead of "pg_ctl: could not start server" / "Examine the log output.". Nitpick: it would be better if the order of the error message cases matched the enum order (i.e. 'POSTMASTER_RECOVERY_SHUTDOWN' before 'POSTMASTER_FAILED' in the enum). -- Regards, Nazir Bilal Yavuz Microsoft
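For clarity, the suggested ordering would look something like this (a sketch: POSTMASTER_RECOVERY_SHUTDOWN is the value added by the patch, the other values already exist in pg_ctl.c):

    typedef enum
    {
        POSTMASTER_READY,
        POSTMASTER_STILL_STARTING,
        POSTMASTER_RECOVERY_SHUTDOWN,   /* new: shut down while in recovery */
        POSTMASTER_FAILED,
    } WaitPMResult;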
Use the same Windows image on both VS and MinGW tasks
Hi, The VS and MinGW Windows images have now been merged in Andres' pg-vm-images repository [1]. So, the old pg-ci-windows-ci-vs-2019 and pg-ci-windows-ci-mingw64 images will no longer be updated. The new merged image (pg-ci-windows-ci) needs to be used in both the VS and MinGW tasks. I attached a patch that uses the pg-ci-windows-ci Windows image in the VS and MinGW tasks. CI run when pg-ci-windows-ci is used: https://cirrus-ci.com/build/6063036847357952 [1]: https://github.com/anarazel/pg-vm-images/commit/6747f676b97348d47f041b05aa9b36cde43c33fe Regards, Nazir Bilal Yavuz Microsoft From 8db9f8672a9216f02e8e599626121c0de5befd59 Mon Sep 17 00:00:00 2001 From: Nazir Bilal Yavuz Date: Mon, 28 Aug 2023 13:23:41 +0300 Subject: [PATCH v1] windows: Use the same Windows image on both VS and MinGW tasks The VS and MinGW Windows images have now been merged [1]. So, the old pg-ci-windows-ci-vs-2019 and pg-ci-windows-ci-mingw64 images will no longer be updated. The new merged image (pg-ci-windows-ci) needs to be used in both the VS and MinGW tasks. [1]: https://github.com/anarazel/pg-vm-images/commit/6747f676b97348d47f041b05aa9b36cde43c33fe --- .cirrus.tasks.yml | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml index e137769850d..66aee48a5d3 100644 --- a/.cirrus.tasks.yml +++ b/.cirrus.tasks.yml @@ -514,6 +514,7 @@ WINDOWS_ENVIRONMENT_BASE: &WINDOWS_ENVIRONMENT_BASE PG_TEST_USE_UNIX_SOCKETS: 1 PG_REGRESS_SOCK_DIR: "c:/cirrus/" DISK_SIZE: 50 +IMAGE_FAMILY: pg-ci-windows-ci sysinfo_script: | chcp @@ -537,7 +538,6 @@ task: # given that it explicitly prevents crash dumps from working... # 0x8001 is SEM_FAILCRITICALERRORS | SEM_NOOPENFILEERRORBOX CIRRUS_WINDOWS_ERROR_MODE: 0x8001 -IMAGE_FAMILY: pg-ci-windows-ci-vs-2019 <<: *windows_task_template @@ -598,7 +598,6 @@ task: # Start bash in current working directory CHERE_INVOKING: 1 BASH: C:\msys64\usr\bin\bash.exe -l -IMAGE_FAMILY: pg-ci-windows-ci-mingw64 <<: *windows_task_template -- 2.40.1
Create shorthand for including all extra tests
Hi, I realized that I forgot to add the new extra test to my test scripts. So, I thought we could use a shorthand for including all extra tests. With that, running the full test suite is easier, without having to keep up with new tests and updates. I created an 'all' option for PG_TEST_EXTRA to enable all test suites defined under PG_TEST_EXTRA. I created the check_extra_tests_enabled() function in the Test/Utils.pm file. This function takes the test's name as input and checks whether PG_TEST_EXTRA contains 'all' or that test's name. Another advantage could be that this can be used in CI. But when 'wal_consistency_checking' is enabled, CI runs take longer, since it performs resource-intensive operations. Any kind of feedback would be appreciated. Regards, Nazir Bilal Yavuz Microsoft From be91a0aaf926c83bf266f92e0523f41ca333b048 Mon Sep 17 00:00:00 2001 From: Nazir Bilal Yavuz Date: Mon, 4 Sep 2023 16:58:42 +0300 Subject: [PATCH v1] Create shorthand for including all extra tests This patch aims to make running the full test suite easier without having to keep up with new tests and updates. Create an 'all' option for PG_TEST_EXTRA to enable all test suites defined under PG_TEST_EXTRA. That is achieved by creating the check_extra_tests_enabled() function in the Test/Utils.pm file. This function takes the test's name as input and checks if PG_TEST_EXTRA contains 'all' or that test's name. --- doc/src/sgml/regress.sgml| 9 + src/interfaces/libpq/t/004_load_balance_dns.pl | 2 +- src/test/kerberos/t/001_auth.pl | 2 +- src/test/ldap/t/001_auth.pl | 2 +- src/test/ldap/t/002_bindpasswd.pl| 2 +- src/test/modules/Makefile| 2 +- .../t/001_mutated_bindpasswd.pl | 2 +- src/test/perl/PostgreSQL/Test/Utils.pm | 16 src/test/recovery/t/027_stream_regress.pl| 3 +-- src/test/ssl/t/001_ssltests.pl | 2 +- src/test/ssl/t/002_scram.pl | 2 +- src/test/ssl/t/003_sslinfo.pl| 2 +- 12 files changed, 35 insertions(+), 11 deletions(-) diff --git a/doc/src/sgml/regress.sgml b/doc/src/sgml/regress.sgml index 675db86e4d7..40269c258ef 100644 --- a/doc/src/sgml/regress.sgml +++ b/doc/src/sgml/regress.sgml @@ -262,6 +262,15 @@ make check-world PG_TEST_EXTRA='kerberos ldap ssl load_balance' The following values are currently supported: + + all + + + Enables all extra tests. 
+ + + + kerberos diff --git a/src/interfaces/libpq/t/004_load_balance_dns.pl b/src/interfaces/libpq/t/004_load_balance_dns.pl index 875070e2120..62eeb21843e 100644 --- a/src/interfaces/libpq/t/004_load_balance_dns.pl +++ b/src/interfaces/libpq/t/004_load_balance_dns.pl @@ -6,7 +6,7 @@ use PostgreSQL::Test::Utils; use PostgreSQL::Test::Cluster; use Test::More; -if ($ENV{PG_TEST_EXTRA} !~ /\bload_balance\b/) +if (!PostgreSQL::Test::Utils::check_extra_text_enabled('load_balance')) { plan skip_all => 'Potentially unsafe test load_balance not enabled in PG_TEST_EXTRA'; diff --git a/src/test/kerberos/t/001_auth.pl b/src/test/kerberos/t/001_auth.pl index 0deb9bffc8d..59574178afc 100644 --- a/src/test/kerberos/t/001_auth.pl +++ b/src/test/kerberos/t/001_auth.pl @@ -28,7 +28,7 @@ if ($ENV{with_gssapi} ne 'yes') { plan skip_all => 'GSSAPI/Kerberos not supported by this build'; } -elsif ($ENV{PG_TEST_EXTRA} !~ /\bkerberos\b/) +elsif (!PostgreSQL::Test::Utils::check_extra_text_enabled('kerberos')) { plan skip_all => 'Potentially unsafe test GSSAPI/Kerberos not enabled in PG_TEST_EXTRA'; diff --git a/src/test/ldap/t/001_auth.pl b/src/test/ldap/t/001_auth.pl index 3e113fd6ebb..9db07e801e9 100644 --- a/src/test/ldap/t/001_auth.pl +++ b/src/test/ldap/t/001_auth.pl @@ -18,7 +18,7 @@ if ($ENV{with_ldap} ne 'yes') { plan skip_all => 'LDAP not supported by this build'; } -elsif ($ENV{PG_TEST_EXTRA} !~ /\bldap\b/) +elsif (!PostgreSQL::Test::Utils::check_extra_text_enabled('ldap')) { plan skip_all => 'Potentially unsafe test LDAP not enabled in PG_TEST_EXTRA'; diff --git a/src/test/ldap/t/002_bindpasswd.pl b/src/test/ldap/t/002_bindpasswd.pl index bcd4aa2b742..a1b1bd8c22f 100644 --- a/src/test/ldap/t/002_bindpasswd.pl +++ b/src/test/ldap/t/002_bindpasswd.pl @@ -18,7 +18,7 @@ if ($ENV{with_ldap} ne 'yes') { plan skip_all => 'LDAP not supported by this build'; } -elsif ($ENV{PG_TEST_EXTRA} !~ /\bldap\b/) +elsif (!PostgreSQL::Test::Utils::check_extra_text_enabled('ldap')) { plan skip_all => 'Potentially unsafe test LDAP not enabled in PG_TEST_EXTRA'; diff --git a/src/test/modules/Makefile b/src/test/modu
Remove unnecessary 'always:' from CompilerWarnings task
Hi, There are multiple 'always:' keywords under the CompilerWarnings task. Instead, we can use a single 'always:' and move the instructions under it. So, I removed the unnecessary ones and adjusted the indentation accordingly. Any kind of feedback would be appreciated. Regards, Nazir Bilal Yavuz Microsoft From 3bd8f6ff5b8add90dcfe42761a3c1fe81a5c174a Mon Sep 17 00:00:00 2001 From: Nazir Bilal Yavuz Date: Tue, 5 Sep 2023 13:04:24 +0300 Subject: [PATCH v1] Remove unnecessary 'always:' from CompilerWarnings task Warning scripts and upload_caches instructions can be defined under one 'always:' keyword, so remove the unnecessary ones and rearrange the indentation accordingly. --- .cirrus.tasks.yml | 43 ++- 1 file changed, 18 insertions(+), 25 deletions(-) diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml index e137769850d..e4a81a151b6 100644 --- a/.cirrus.tasks.yml +++ b/.cirrus.tasks.yml @@ -684,8 +684,8 @@ task: # different compilers to build with different combinations of dtrace on/off # and cassert on/off. - # gcc, cassert off, dtrace on always: +# gcc, cassert off, dtrace on gcc_warning_script: | time ./configure \ --cache gcc.cache \ @@ -695,8 +695,7 @@ task: make -s -j${BUILD_JOBS} clean time make -s -j${BUILD_JOBS} world-bin - # gcc, cassert on, dtrace off - always: +# gcc, cassert on, dtrace off gcc_a_warning_script: | time ./configure \ --cache gcc.cache \ @@ -706,8 +705,7 @@ task: make -s -j${BUILD_JOBS} clean time make -s -j${BUILD_JOBS} world-bin - # clang, cassert off, dtrace off - always: +# clang, cassert off, dtrace off clang_warning_script: | time ./configure \ --cache clang.cache \ @@ -716,8 +714,7 @@ task: make -s -j${BUILD_JOBS} clean time make -s -j${BUILD_JOBS} world-bin - # clang, cassert on, dtrace on - always: +# clang, cassert on, dtrace on clang_a_warning_script: | time ./configure \ --cache clang.cache \ @@ -728,8 +725,7 @@ task: make -s -j${BUILD_JOBS} clean time make -s -j${BUILD_JOBS} world-bin - # cross-compile to windows - always: +# cross-compile to windows mingw_cross_warning_script: | time ./configure \ --host=x86_64-w64-mingw32 \ @@ -740,11 +736,10 @@ task: make -s -j${BUILD_JOBS} clean time make -s -j${BUILD_JOBS} world-bin - ### - # Verify docs can be built - ### - # XXX: Only do this if there have been changes in doc/ since last build - always: +### +# Verify docs can be built +### +# XXX: Only do this if there have been changes in doc/ since last build docs_build_script: | time ./configure \ --cache gcc.cache \ @@ -754,16 +749,15 @@ task: make -s -j${BUILD_JOBS} clean time make -s -j${BUILD_JOBS} -C doc - ### - # Verify headerscheck / cpluspluscheck succeed - # - # - Don't use ccache, the files are uncacheable, polluting ccache's - # cache - # - Use -fmax-errors, as particularly cpluspluscheck can be very verbose - # - XXX have to disable ICU to avoid errors: - # https://postgr.es/m/20220323002024.f2g6tivduzrktgfa%40alap3.anarazel.de - ### - always: +### +# Verify headerscheck / cpluspluscheck succeed +# +# - Don't use ccache, the files are uncacheable, polluting ccache's +# cache +# - Use -fmax-errors, as particularly cpluspluscheck can be very verbose +# - XXX have to disable ICU to avoid errors: +# https://postgr.es/m/20220323002024.f2g6tivduzrktgfa%40alap3.anarazel.de +### headers_headerscheck_script: | time ./configure \ ${LINUX_CONFIGURE_FEATURES} \ @@ -775,5 +769,4 @@ task: headers_cpluspluscheck_script: | time make -s cpluspluscheck EXTRAFLAGS='-fmax-errors=10' - always: upload_caches: ccache -- 2.40.1
Re: Create shorthand for including all extra tests
Hi, Thanks for the feedback! I updated the patch; the 'needs-private-lo' option now enables the kerberos, ldap, load_balance and ssl extra tests. > On Mon, Sep 04, 2023 at 04:30:31PM -0400, Tom Lane wrote: > > Yeah, I could live with something like that from the security standpoint. > > Not sure if it helps Nazir's use-case though. Maybe we could invent > > categories that can be used in place of individual test names? > For now, Yes, that is not ideal for my use case, but it is still better. On Tue, 5 Sept 2023 at 00:09, Noah Misch wrote: > > I could imagine categories for filesystem bytes and RAM bytes. Also, while > needs-private-lo has a bounded definition, "slow" doesn't. If today's one > "slow" test increases check-world duration by 1.1x, we may not let a > 100x-increase test use the same keyword. I agree. I didn't create a new 'slow' category, but I am still open to suggestions. I am not very familiar with Perl syntax, so I would like to hear your opinions on how the implementation of the check_extra_text_enabled() function could be improved. Regards, Nazir Bilal Yavuz Microsoft From 3a33161eef699e4fbcfeb2d62ec387621a331d78 Mon Sep 17 00:00:00 2001 From: Nazir Bilal Yavuz Date: Tue, 5 Sep 2023 20:14:15 +0300 Subject: [PATCH v2] Create shorthand for including extra tests that treat the loopback interface as private This patch aims to make running the full test suite easier without having to keep up with new tests and updates. Create a 'needs-private-lo' option for PG_TEST_EXTRA to enable extra test suites that treat the loopback interface as private. That is achieved by creating the check_extra_tests_enabled() function in the Test/Utils.pm file. This function takes the test's name as input and checks if PG_TEST_EXTRA contains this extra test or any option that enables it. --- doc/src/sgml/regress.sgml | 10 + .../libpq/t/004_load_balance_dns.pl | 2 +- src/test/kerberos/t/001_auth.pl | 2 +- src/test/ldap/t/001_auth.pl | 2 +- src/test/ldap/t/002_bindpasswd.pl | 2 +- src/test/modules/Makefile | 2 +- .../t/001_mutated_bindpasswd.pl | 2 +- src/test/perl/PostgreSQL/Test/Utils.pm| 45 +++ src/test/recovery/t/027_stream_regress.pl | 3 +- src/test/ssl/t/001_ssltests.pl| 2 +- src/test/ssl/t/002_scram.pl | 2 +- src/test/ssl/t/003_sslinfo.pl | 2 +- 12 files changed, 65 insertions(+), 11 deletions(-) diff --git a/doc/src/sgml/regress.sgml b/doc/src/sgml/regress.sgml index 675db86e4d7..a9ceb0342d3 100644 --- a/doc/src/sgml/regress.sgml +++ b/doc/src/sgml/regress.sgml @@ -313,6 +313,16 @@ make check-world PG_TEST_EXTRA='kerberos ldap ssl load_balance' + + + needs_private_lo + + + Enables the extra tests that treat the loopback interface as private. + Currently enables kerberos, ldap, load_balance and ssl extra tests. 
+ + + Tests for features that are not supported by the current build diff --git a/src/interfaces/libpq/t/004_load_balance_dns.pl b/src/interfaces/libpq/t/004_load_balance_dns.pl index 875070e2120..62eeb21843e 100644 --- a/src/interfaces/libpq/t/004_load_balance_dns.pl +++ b/src/interfaces/libpq/t/004_load_balance_dns.pl @@ -6,7 +6,7 @@ use PostgreSQL::Test::Utils; use PostgreSQL::Test::Cluster; use Test::More; -if ($ENV{PG_TEST_EXTRA} !~ /\bload_balance\b/) +if (!PostgreSQL::Test::Utils::check_extra_text_enabled('load_balance')) { plan skip_all => 'Potentially unsafe test load_balance not enabled in PG_TEST_EXTRA'; diff --git a/src/test/kerberos/t/001_auth.pl b/src/test/kerberos/t/001_auth.pl index 0deb9bffc8d..59574178afc 100644 --- a/src/test/kerberos/t/001_auth.pl +++ b/src/test/kerberos/t/001_auth.pl @@ -28,7 +28,7 @@ if ($ENV{with_gssapi} ne 'yes') { plan skip_all => 'GSSAPI/Kerberos not supported by this build'; } -elsif ($ENV{PG_TEST_EXTRA} !~ /\bkerberos\b/) +elsif (!PostgreSQL::Test::Utils::check_extra_text_enabled('kerberos')) { plan skip_all => 'Potentially unsafe test GSSAPI/Kerberos not enabled in PG_TEST_EXTRA'; diff --git a/src/test/ldap/t/001_auth.pl b/src/test/ldap/t/001_auth.pl index 3e113fd6ebb..9db07e801e9 100644 --- a/src/test/ldap/t/001_auth.pl +++ b/src/test/ldap/t/001_auth.pl @@ -18,7 +18,7 @@ if ($ENV{with_ldap} ne 'yes') { plan skip_all => 'LDAP not supported by this build'; } -elsif ($ENV{PG_TEST_EXTRA} !~ /\bldap\b/) +elsif (!PostgreSQL::Test::Utils::check_extra_text_enabled('ldap')) { plan skip_all => 'Potentially unsafe test LDAP not enabled in PG_TEST_EXTRA'; diff --git a/src/test/ldap/t/002_bindpasswd.pl b/sr
Add resource intensiveness as a reason for not running tests by default
Hi, With f47ed79cc8, the test suite doesn't run 'wal_consistency_checking' by default because it is resource intensive; but the regress docs don't state resource intensiveness as a reason for not running tests by default. So, I created a patch to update the docs. Any kind of feedback would be appreciated. Regards, Nazir Bilal Yavuz Microsoft From 847a06a3a0546a14f07654ff79da4039c5c7cc34 Mon Sep 17 00:00:00 2001 From: Nazir Bilal Yavuz Date: Thu, 7 Sep 2023 13:33:46 +0300 Subject: [PATCH v1] Update the reasons for tests not running by default in the docs With f47ed79cc8, the test suite doesn't run 'wal_consistency_checking' by default because it is resource intensive. However, the regress documentation only states 'not secure to run on a multiuser system' or 'requiring special software' as reasons not to run the tests by default. Therefore, update the docs by adding 'resource intensive' as a reason not to run tests by default. --- doc/src/sgml/regress.sgml | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/doc/src/sgml/regress.sgml b/doc/src/sgml/regress.sgml index de065c0564a..69f627d7f43 100644 --- a/doc/src/sgml/regress.sgml +++ b/doc/src/sgml/regress.sgml @@ -253,11 +253,11 @@ make check-world -j8 >/dev/null Some test suites are not run by default, either because they are not secure - to run on a multiuser system or because they require special software. You - can decide which test suites to run additionally by setting the - make or environment variable - PG_TEST_EXTRA to a whitespace-separated list, for - example: + to run on a multiuser system, because they require special software or + because they are resource intensive. You can decide which test suites to + run additionally by setting the make or environment + variable PG_TEST_EXTRA to a whitespace-separated list, + for example: make check-world PG_TEST_EXTRA='kerberos ldap ssl load_balance' -- 2.40.1
Build the docs if there are changes in docs and don't run other tasks if the changes are only in docs
Hi, There is an old work item about building the docs only if there are changes in the docs. I wanted to add to that idea: if the changes are only in the docs, don't run any tasks except the build-the-docs task; this could help save more CI time. I attached two patches. I assumed that docs-related changes are limited to changes in the doc folder, but I am not really sure about that. v1-0001-Only-built-the-docs-if-there-are-changes-are-in-t.patch: This patch creates a new task named 'Building the Docs' and moves the docs build script from the 'CompilerWarnings' task into it. This task only runs if there are changes in the docs (under doc/**) or in the CI files ('.cirrus.yml', '.cirrus.tasks.yml'), and if a specific OS is not requested. v1-0002-Just-run-the-Build-the-Docs-task-if-the-changes-a.patch: This patch adds the following: if the changes are *only* in the docs (under doc/**), *only* run the build-the-docs task. As a summary: 1- If the changes are not in the docs: don't run the build-the-docs task. 2- If the changes are in the docs or in the CI files: run the build-the-docs task. 3- If the changes are only in the docs: only run the build-the-docs task. 4- If 'ci-os-only:' is set (there could be changes in the docs): don't run the build-the-docs task. Any kind of feedback would be appreciated. Regards, Nazir Bilal Yavuz Microsoft From 43e18ed9eb328b4703ad3d8cb95d7d4b6140db89 Mon Sep 17 00:00:00 2001 From: Nazir Bilal Yavuz Date: Thu, 7 Sep 2023 17:58:06 +0300 Subject: [PATCH v1 1/2] Only build the docs if the changes are in the docs Building the docs used to be triggered although there were no changes in the docs. So, a new 'Building the Docs' task is created. This task only runs if a specific OS is not requested and if there are changes in the docs or in the CI files. --- .cirrus.tasks.yml | 67 --- 1 file changed, 52 insertions(+), 15 deletions(-) diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml index e137769850d..21e276beb3b 100644 --- a/.cirrus.tasks.yml +++ b/.cirrus.tasks.yml @@ -740,21 +740,6 @@ task: make -s -j${BUILD_JOBS} clean time make -s -j${BUILD_JOBS} world-bin - ### - # Verify docs can be built - ### - # XXX: Only do this if there have been changes in doc/ since last build - always: -docs_build_script: | - time ./configure \ ---cache gcc.cache \ -CC="ccache gcc" \ -CXX="ccache g++" \ -CLANG="ccache clang" - make -s -j${BUILD_JOBS} clean - time make -s -j${BUILD_JOBS} -C doc - - ### # Verify headerscheck / cpluspluscheck succeed # # - Don't use ccache, the files are uncacheable, polluting ccache's @@ -777,3 +762,55 @@ task: always: upload_caches: ccache + + +task: + name: Build the Docs + depends_on: SanityCheck + # Only run if a specific OS is not requested and if there are changes in the + # docs or in the CI files. + skip: $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:.*' || !changesInclude('doc/**', '.cirrus.yml', '.cirrus.tasks.yml') + + env: +CPUS: 4 +BUILD_JOBS: 4 +IMAGE_FAMILY: pg-ci-bullseye +CCACHE_MAXSIZE: "10M" +CCACHE_DIR: "/tmp/ccache_dir" + +LINUX_CONFIGURE_FEATURES: *LINUX_CONFIGURE_FEATURES +LINUX_MESON_FEATURES: *LINUX_MESON_FEATURES + + <<: *linux_task_template + + sysinfo_script: | +id +uname -a +cat /proc/cmdline +ulimit -a -H && ulimit -a -S +gcc -v +clang -v +export + + ccache_cache: +folder: $CCACHE_DIR + + setup_additional_packages_script: | +#apt-get update +#DEBIAN_FRONTEND=noninteractive apt-get -y install ... 
+ + setup_script: echo "COPT=-Werror" > src/Makefile.custom + + # Verify docs can be built + always: +docs_build_script: | + time ./configure \ +--cache gcc.cache \ +CC="ccache gcc" \ +CXX="ccache g++" \ +CLANG="ccache clang" + make -s -j${BUILD_JOBS} clean + time make -s -j${BUILD_JOBS} -C doc + + always: +upload_caches: ccache -- 2.40.1 From d82eb7349924f6a2a32b05435086a91a0c767796 Mon Sep 17 00:00:00 2001 From: Nazir Bilal Yavuz Date: Thu, 7 Sep 2023 17:10:45 +0300 Subject: [PATCH v1 2/2] Just run the Build the Docs task if the changes are only in docs If the changes are only in the docs, skip all tasks except the 'Build the Docs' task. --- .cirrus.tasks.yml | 16 ++-- 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml index 21e276beb3b..4ccf75cefda 100644 --- a/.cirrus.tasks.yml +++ b/.cirrus.tasks.yml @@ -3,6 +3,11 @@ # For instructions on how to enable the CI integration in a repository and # further details, see src/to
Re: Build the docs if there are changes in docs and don't run other tasks if the changes are only in docs
Hi, Thanks for the reply! On Fri, 8 Sept 2023 at 11:05, Daniel Gustafsson wrote: > > > On 7 Sep 2023, at 18:06, Nazir Bilal Yavuz wrote: > > > if the changes are only in the docs, don't run > > all tasks except building the docs task; this could help to save more > > CI times. > > A related idea for docs in order to save CI time: if the changes are only in > internal docs, ie README files, then don't run any tasks at all. Looking at > src/backend/parser/README the last two commits only touched that file, and > while such patches might not be all that common, spending precious CI > resources > on them seems silly if we can avoid it. > > It doesn't have to be included in this, just wanted to bring it up as it's > related. I like the idea; I am planning to edit the 0002 patch so that CI won't run any tasks if the changes are only in the README files. > Almost, but not entirely. There are a set of scripts which generate content > for the docs based on files in src/, like src/backend/catalog/sql_features.txt > and src/include/parser/kwlist.h. If those source files change, or their > scripts, it would be helpful to build docs. Thanks! Are these the only files that are not in the doc subfolders but affect the docs? Regards, Nazir Bilal Yavuz Microsoft
Re: BufferUsage counters' values have changed
Hi, On Mon, 11 Sept 2023 at 14:28, Karina Litskevich wrote: > > Hi hackers, > > I noticed that BufferUsage counters' values are strangely different for the > same queries on REL_15_STABLE and REL_16_STABLE. For example, I run > > CREATE EXTENSION pg_stat_statements; > CREATE TEMP TABLE test(b int); > INSERT INTO test(b) SELECT generate_series(1,1000); > SELECT query, local_blks_hit, local_blks_read, local_blks_written, >local_blks_dirtied, temp_blks_written FROM pg_stat_statements; > > and output on REL_15_STABLE contains > > query | INSERT INTO test(b) SELECT generate_series($1,$2) > local_blks_hit | 1005 > local_blks_read| 2 > local_blks_written | 5 > local_blks_dirtied | 5 > temp_blks_written | 0 > > while output on REL_16_STABLE contains > > query | INSERT INTO test(b) SELECT generate_series($1,$2) > local_blks_hit | 1006 > local_blks_read| 0 > local_blks_written | 0 > local_blks_dirtied | 5 > temp_blks_written | 8 > > > I found a bug that causes one of the differences. Wrong counter is incremented > in ExtendBufferedRelLocal(). The attached patch fixes it and should be applied > to REL_16_STABLE and master. With the patch applied output contains Nice find! I agree, it should be changed. > query | INSERT INTO test(b) SELECT generate_series($1,$2) > local_blks_hit | 1006 > local_blks_read| 0 > local_blks_written | 8 > local_blks_dirtied | 5 > temp_blks_written | 0 > > > I still wonder why local_blks_written is greater than it was on REL_15_STABLE, > and why local_blks_read became zero. These changes are caused by fcdda1e4b5. > This code is new to me, and I'm still trying to understand whether it's a bug > in computing the counters or just changes in how many blocks are read/written > during the query execution. If anyone can help me, I would be grateful. I spent some time on it: local_blks_read became zero because: 1_ One more cache hit. It was supposed to be local_blks_read but it is local_blks_hit now. This is an assumption; I didn't check it deeply. 2_ Before fcdda1e4b5, there was one local_blks_read coming from buf = ReadBufferExtended(rel, VISIBILITYMAP_FORKNUM, blkno, RBM_ZERO_ON_ERROR, NULL) in freespace.c -> ReadBuffer_common() -> pgBufferUsage.local_blks_read++. But buf = ReadBufferExtended(rel, VISIBILITYMAP_FORKNUM, blkno, RBM_ZERO_ON_ERROR, NULL) was moved into the else branch, so it isn't called and local_blks_read isn't incremented. local_blks_written is greater because of the combination of fcdda1e4b5 and 00d1e02be2. In PG_15: RelationGetBufferForTuple() -> ReadBufferBI(P_NEW, RBM_ZERO_AND_LOCK) -> ReadBufferExtended() -> ReadBuffer_common() -> pgBufferUsage.local_blks_written++; (called 5 times) [0] In PG_16: 1_ 5 of the local_blks_written come from: RelationGetBufferForTuple() -> RelationAddBlocks() -> ExtendBufferedRelBy() -> ExtendBufferedRelCommon() -> ExtendBufferedRelLocal() -> pgBufferUsage.local_blks_written += extend_by; (extend_by is 1, this is called 5 times) [1] 2_ 3 of the local_blks_written come from: RelationGetBufferForTuple() -> RecordAndGetPageWithFreeSpace() -> fsm_set_and_search() -> fsm_readbuf() -> fsm_extend() -> ExtendBufferedRelTo() -> ExtendBufferedRelCommon() -> ExtendBufferedRelLocal() -> pgBufferUsage.local_blks_written += extend_by; (extend_by is 3, this is called 1 time) [2] I think [0] is the same path as [1] but [2] is new. 'fsm extends' weren't counted in local_blks_written in PG_15. Calling ExtendBufferedRelTo() from fsm_extend() caused 'fsm extends' to be counted in local_blks_written. I am not sure which behaviour is correct. 
I hope this helps. Regards, Nazir Bilal Yavuz Microsoft
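For reference, the wrong-counter bug Karina found is, as the before/after numbers suggest, presumably a one-line counter swap in ExtendBufferedRelLocal() (a sketch of the fix, not necessarily the exact committed hunk):

    - 	pgBufferUsage.temp_blks_written += extend_by;
    + 	pgBufferUsage.local_blks_written += extend_by;

This matches the observed outputs: before the fix the eight local-buffer extends showed up as temp_blks_written, afterwards as local_blks_written.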
pgBufferUsage.blk_{read|write}_time are zero although there are pgBufferUsage.local_blks_{read|written}
Hi, I found that pgBufferUsage.blk_{read|write}_time are zero although there are pgBufferUsage.local_blks_{read|written}. For example, when you run (track_io_timing should be on): CREATE EXTENSION pg_stat_statements; CREATE TEMP TABLE example_table (id serial PRIMARY KEY, data text); INSERT INTO example_table (data) SELECT 'Some data' FROM generate_series(1, 10); UPDATE example_table SET data = 'Updated data'; SELECT query, local_blks_read, local_blks_written, blk_read_time, blk_write_time FROM pg_stat_statements WHERE query like '%UPDATE%'; on master: query | UPDATE example_table SET data = $1 local_blks_read| 467 local_blks_written | 2087 blk_read_time | 0 blk_write_time | 0 There are two reasons for that: 1- When local_blks_{read|written} are incremented, pgstat_count_io_op_time() is called with IOOBJECT_TEMP_RELATION. But in pgstat_count_io_op_time(), the pgBufferUsage.blk_{read|write}_time increments are done only when io_object is IOOBJECT_RELATION. The first attached patch fixes that. 2- In ExtendBufferedRelLocal() and ExtendBufferedRelShared(), extend calls increment local_blks_written and shared_blks_written respectively, but these extends are not counted when calculating blk_write_time. If there is no specific reason not to, I think these extends need to be counted in blk_write_time. The second attached patch does that. Results after applying the first patch: query | UPDATE example_table SET data = $1 local_blks_read| 467 local_blks_written | 2087 blk_read_time | 0.30085 blk_write_time | 1.475123 Results after applying both patches: query | UPDATE example_table SET data = $1 local_blks_read| 467 local_blks_written | 2087 blk_read_time | 0.329597 blk_write_time | 4.050305 Any kind of feedback would be appreciated. Regards, Nazir Bilal Yavuz Microsoft From 96f7b82e2ec5e8f68b509ea58131fba42deac7c0 Mon Sep 17 00:00:00 2001 From: Nazir Bilal Yavuz Date: Fri, 15 Sep 2023 11:55:43 +0300 Subject: [PATCH v1 2/2] Include IOOp IOOP_EXTENDs while calculating block write times --- src/backend/utils/activity/pgstat_io.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c index c8058b57962..56051fc6072 100644 --- a/src/backend/utils/activity/pgstat_io.c +++ b/src/backend/utils/activity/pgstat_io.c @@ -119,7 +119,7 @@ pgstat_count_io_op_time(IOObject io_object, IOContext io_context, IOOp io_op, INSTR_TIME_SET_CURRENT(io_time); INSTR_TIME_SUBTRACT(io_time, start_time); - if (io_op == IOOP_WRITE) + if (io_op == IOOP_WRITE || io_op == IOOP_EXTEND) { pgstat_count_buffer_write_time(INSTR_TIME_GET_MICROSEC(io_time)); if (io_object == IOOBJECT_RELATION -- 2.40.1 From de0f429280b24c42c951d86a468118ed09183a6e Mon Sep 17 00:00:00 2001 From: Nazir Bilal Yavuz Date: Fri, 15 Sep 2023 12:35:01 +0300 Subject: [PATCH v1 1/2] Increase pgBufferUsage.blk_{read|write}_time when IOObject is IOOBJECT_TEMP_RELATION --- src/backend/utils/activity/pgstat_io.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c index eb7d35d4225..c8058b57962 100644 --- a/src/backend/utils/activity/pgstat_io.c +++ b/src/backend/utils/activity/pgstat_io.c @@ -122,13 +122,15 @@ pgstat_count_io_op_time(IOObject io_object, IOContext io_context, IOOp io_op, if (io_op == IOOP_WRITE) { pgstat_count_buffer_write_time(INSTR_TIME_GET_MICROSEC(io_time)); - if (io_object == IOOBJECT_RELATION) + if (io_object == IOOBJECT_RELATION + || io_object == 
IOOBJECT_TEMP_RELATION) INSTR_TIME_ADD(pgBufferUsage.blk_write_time, io_time); } else if (io_op == IOOP_READ) { pgstat_count_buffer_read_time(INSTR_TIME_GET_MICROSEC(io_time)); - if (io_object == IOOBJECT_RELATION) + if (io_object == IOOBJECT_RELATION + || io_object == IOOBJECT_TEMP_RELATION) INSTR_TIME_ADD(pgBufferUsage.blk_read_time, io_time); } -- 2.40.1