Re: Patch: Range Merge Join
Dear all, thanks for the feedback! We had a closer look at the previous patches and the CustomScan infrastructure.

Compared to the previous patch, we do not (directly) focus on joins with the overlap (&&) condition in this patch. Instead we consider joins with containment (@>) between a range and an element, and joins with conditions over scalars of the form "right.element BETWEEN left.start AND left.end", or more generally left.start <(=) right.element AND right.element <(=) left.end. We call such conditions range conditions, and they can be combined with equality conditions in the Range Merge Join (a schematic example is given at the end of this message).

The Range Merge Join can use (optional) equality conditions and one range condition of the form shown above. In this case the inputs are sorted first by the attributes used for equality, and then one input by the range (or by start in the case of scalars) and the other input by the element. The Range Merge Join is then a simple extension of the Merge Join that, in addition to the (optional) equality attributes, also uses the range condition in the merge join states. This is similar to an index nested-loop join with scalars for cases where the relation containing the element has an index on the equality attributes followed by the element. The Range Merge Join uses sorting and thus does not require such an index for this purpose, and it performs better.

The patch uses the optimizer estimates to decide whether the Range Merge Join is beneficial compared to other execution strategies; when no equality attributes are present, it becomes the only efficient option for the above range conditions. If a join contains multiple range conditions, then based on the estimates the most effective one is chosen for the Range Merge Join.

Although we do not directly focus on joins with the overlap (&&) condition between two ranges, we show in [1] that these joins can be evaluated using the union (UNION ALL) of two joins with a range condition, where, intuitively, one tests that the start of one input falls within the range of the other and vice versa. We evaluated this using regular (B-tree) indices, compared it to joins with the overlap (&&) condition using GiST, SP-GiST and others, and found that it performs better. The Range Merge Join would improve this further and would not require the creation of an index.

We did not consider an implementation as a CustomScan, as we feel the join is rather general, can be implemented as a small extension of the existing Merge Join, and would otherwise require a substantial duplication of the Merge Join code.

Kind regards, Thomas, Anton, Johann, Michael, Peter

[1] https://doi.org/10.1007/s00778-021-00692-3 (open access)

On Tue., 5 Oct. 2021 at 02:30, Jaime Casanova < jcasa...@systemguards.com.ec> wrote: > > On Mon, Oct 04, 2021 at 04:27:54PM -0500, Jaime Casanova wrote: > >> On Thu, Jun 10, 2021 at 07:14:32PM -0700, Jeff Davis wrote: > >> > > >> > I'll start with the reason I set the work down before: it did not work > >> > well with multiple join keys. That might be fine, but I also started > >> > thinking it was specialized enough that I wanted to look into doing it > >> > as an extension using the CustomScan mechanism. > >> > > >> > Do you have any solution to working better with multiple join keys? > And > >> > do you have thoughts on whether it would be a good candidate for the > >> > CustomScan extension mechanism, which would make it easier to > >> > experiment with? > >> > > >> > >> Hi, > >> > >> It seems this has been stalled since jun-2021.
I intend to mark this as RwF unless someone speaks up in the next hour or so. > >> > > Thomas wrote me: > > > Hi, > > > > I registered this patch for the commitfest in July. It had not been > reviewed and was moved to the next CF. I would still like to submit it. > > > > Regards, > > Thomas > > > > Just for clarification, RwF doesn't imply rejection of the patch. > Nevertheless, given that there has been no real review, I will mark this > patch as "Waiting on Author" and move it to the next CF. > > Meanwhile, cfbot (aka http://commitfest.cputube.org) says this doesn't > compile. Here is a little patch to fix the compilation errors; after > that it passes all tests in make check-world. > > Also attached is a rebased version of your patch with the fixes, so that the > cfbot entry turns green again. > > -- > Jaime Casanova > Director de Servicios Profesionales > SystemGuards - Consultores de PostgreSQL > postgres.patch Description: Binary data
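To make the join shapes from the patch description above concrete, here is a schematic example (all table, column and period-bound names are made up; the overlap rewrite assumes half-open [s, e) periods):

-- A range condition combined with an equality condition, the case the
-- Range Merge Join targets (hypothetical tables):
SELECT *
FROM meetings m
JOIN readings r
  ON m.room = r.room                              -- equality condition (optional)
 AND r.ts BETWEEN m.valid_from AND m.valid_to;    -- range condition

-- The same idea with a range type and containment:
--   ... ON m.room = r.room AND m.valid_period @> r.ts

-- An overlap (&&) join rewritten as a UNION ALL of two range-condition joins,
-- roughly as in [1]; s and e are the period bounds of each input:
SELECT * FROM a JOIN b ON b.s >= a.s AND b.s < a.e
UNION ALL
SELECT * FROM a JOIN b ON a.s >  b.s AND a.s < b.e;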
Re: Patch: Range Merge Join
Thank you for the feedback and sorry for the oversight. I fixed the bug and attached a new version of the patch. Kind Regards, Thomas On Wed., 17 Nov. 2021 at 15:03, Daniel Gustafsson < dan...@yesql.se> wrote: > This patch fails to compile due to an incorrect function name in an > assertion: > > nodeMergejoin.c:297:9: warning: implicit declaration of function > 'list_legth' is invalid in C99 [-Wimplicit-function-declaration] > Assert(list_legth(node->rangeclause) < 3); > ^ > > -- > Daniel Gustafsson https://vmware.com/ > > postgres-rmj_11-17-21.patch Description: Binary data
Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert
On Tue, 25 Oct 2022 01:03:23 +0100, Jacob Champion said: > I'd like to try to get this conversation started again. To pique > interest I've attached a new version of 0001, which implements > `sslrootcert=system` instead as suggested upthread. In 0002 I went > further and switched the default sslmode to `verify-full` when using > the system CA roots, because I feel pretty strongly that anyone > interested in using public CA systems is also interested in verifying > hostnames. (Otherwise, why make the switch?) Yeah, I agree that not forcing verify-full when using system CAs is a giant foot-gun; many people will stop configuring as soon as it works. Is there any argument for not checking the hostname when using a CA pool for which literally anyone can create a cert that passes? Skipping the check makes sense for self-signed or "don't care" setups, since that still provides at least some protection against passive attacks, but if someone went out of their way to get a third-party-signed cert, it doesn't. One downside to this approach is that now one option changes the value of another option. For SSL mode (my rejected patch :-) ) that maybe makes a bit more sense. For users, what is more surprising: a foot-gun that sounds safe, or one option that overrides another? -- typedef struct me_s { char name[] = { "Thomas Habets" }; char email[] = { "tho...@habets.se" }; char kernel[]= { "Linux" }; char *pgpKey[] = { "http://www.habets.pp.se/pubkey.txt"; }; char pgp[] = { "9907 8698 8A24 F52F 1C2E 87F6 39A4 9EEA 460A 0169" }; char coolcmd[] = { "echo '. ./_&. ./_'>_;. ./_" }; } me_t;
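To spell out what that would look like from the client side, as I read 0001+0002 a connection string along these lines should both use the system CA pool and verify the host name (the host, database and user here are made up):

# hypothetical server name and credentials, for illustration only
psql "host=db.example.com dbname=app user=alice sslrootcert=system"
# equivalent, spelling out the sslmode that 0002 would make the default here:
psql "host=db.example.com dbname=app user=alice sslrootcert=system sslmode=verify-full"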
Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert
On Mon, 31 Oct 2022 23:36:16 +, Jacob Champion said: >> I wanted to get feedback on the approach before wordsmithing too >> much. > > I've added this to tomorrow's CF [1]. Thomas, if you get (or already > have) a PG community username, I can add you as an author. Sweet. I just created an account with username `habets`. -- typedef struct me_s { char name[] = { "Thomas Habets" }; char email[] = { "tho...@habets.se" }; char kernel[]= { "Linux" }; char *pgpKey[] = { "http://www.habets.pp.se/pubkey.txt"; }; char pgp[] = { "9907 8698 8A24 F52F 1C2E 87F6 39A4 9EEA 460A 0169" }; char coolcmd[] = { "echo '. ./_&. ./_'>_;. ./_" }; } me_t;
Re: Patch: Range Merge Join
Thank you for the feedback. I removed the redundant comments from the patch and added this thread to the July CF [1]. Best Regards, Thomas Mannhart [1] https://commitfest.postgresql.org/33/3160/ On Thu., 10 June 2021 at 05:10, David Rowley < dgrowle...@gmail.com> wrote: > On Thu, 10 Jun 2021 at 03:05, Thomas wrote: > > We have implemented the Range Merge Join algorithm by extending the > > existing Merge Join to also support range conditions, i.e., BETWEEN-AND > > or @> (containment for range types). > > It shouldn't be a blocker for you, but just so you're aware, there was > a previous proposal for this in [1] and a patch in [2]. I've included > Jeff here just so he's aware of this. Jeff may wish to state his > intentions with his own patch. It's been a few years now. > > I only just glanced over the patch. I'd suggest getting rid of the /* > Thomas */ comments. We use git, so if you need an audit trail about > changes then you'll find it in git blame. If you have those for an > internal audit trail then you should consider using git. No committer > would commit those to PostgreSQL, so they might as well disappear. > > For further review, please add the patch to the July commitfest [3]. > We should be branching for pg15 sometime before the start of July. > There will be more focus on new patches around that time. Further > details in [4]. > > Also, I see this is your first post to this list, so welcome, and > thank you for the contribution. Also, just to set expectations: > patches like this almost always take a while to get into shape for > PostgreSQL. Please expect a lot of requests to change things. That's > fairly standard procedure. The process often drags on for months and > in some less common cases, years. > > David > > [1] > https://www.postgresql.org/message-id/flat/6227.1334559170%40sss.pgh.pa.us#82c771950ba486dec911923a5e91 > [2] > https://www.postgresql.org/message-id/flat/CAMp0ubfwAFFW3O_NgKqpRPmm56M4weTEXjprb2gP_NrDaEC4Eg%40mail.gmail.com > [3] https://commitfest.postgresql.org/33/ > [4] https://wiki.postgresql.org/wiki/CommitFest > postgres-rmj.patch Description: Binary data
Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert
On Mon, 6 Sep 2021 20:47:37 +0100, Tom Lane said: > I'm confused by your description of this patch. AFAIK, OpenSSL verifies > against the system-wide CA pool by default. Why do we need to do > anything? Experimentally, no, it doesn't. Or if it does, it doesn't verify the CN/altnames of the cert. sslmode=require allows self-signed certs and name mismatches. verify-ca errors out if there is no ~/.postgresql/root.crt; verify-full too. It seems that currently postgresql verifies the name if and only if verify-full is used, and then only against the ~/.postgresql/root.crt CA file. But it could be that I missed a config option? -- typedef struct me_s { char name[] = { "Thomas Habets" }; char email[] = { "tho...@habets.se" }; char kernel[]= { "Linux" }; char *pgpKey[] = { "http://www.habets.pp.se/pubkey.txt"; }; char pgp[] = { "9907 8698 8A24 F52F 1C2E 87F6 39A4 9EEA 460A 0169" }; char coolcmd[] = { "echo '. ./_&. ./_'>_;. ./_" }; } me_t;
Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert
On Tue, 7 Sep 2021 15:16:51 +0100, Andrew Dunstan said: > can't you specify a CA cert in the system's > CA store if necessary? e.g. on my Fedora system I think it's > /etc/pki/ca-trust/extracted/openssl/ca-bundle.trust.crt I could, but that seems more like a workaround, where I have to change things around if LetsEncrypt switches to another root (I believe they have in the past, but I'm not sure), or the server decides to switch from LetsEncrypt to something else. Then all clients need to update. Such a decision could actually be made by whoever runs the webserver, not the database, and the database just reuses the cert and gets a free ride for cert renewals. So in other words postgresql currently doesn't use the system CA store at all, and the workaround is to find the system store and point the client at it. I agree that is a workaround. If you think this is enough of a corner case that the workaround is acceptable, or that the added complexity of another sslmode setting isn't worth fixing this edge case, then I assume you have more knowledge about how postgres is used in the field than I do. But it's not just about today. I would hope that now, with LE, every user of SSL starts using "real" certs. Postgres default settings imply that most people who even enable SSL will verify neither the CA nor the name, which is a shame. -- typedef struct me_s { char name[] = { "Thomas Habets" }; char email[] = { "tho...@habets.se" }; char kernel[]= { "Linux" }; char *pgpKey[] = { "http://www.habets.pp.se/pubkey.txt"; }; char pgp[] = { "9907 8698 8A24 F52F 1C2E 87F6 39A4 9EEA 460A 0169" }; char coolcmd[] = { "echo '. ./_&. ./_'>_;. ./_" }; } me_t;
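For the record, the workaround being discussed amounts to pointing libpq at the distro's bundle explicitly, e.g. with the Fedora path Andrew mentioned (other distros keep it elsewhere; the host name is made up):

# illustration only: server name is hypothetical, CA bundle path is the Fedora one from above
psql "host=db.example.com sslmode=verify-full sslrootcert=/etc/pki/ca-trust/extracted/openssl/ca-bundle.trust.crt"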
Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert
On Tue, 28 Sep 2021 02:09:11 +0100, Bruce Momjian said: > I don't think public CA's are a good idea for complex setups since > they open the ability for an external party to create certificates that > are trusted by your server's CA, e.g., certificate authentication. I'm not arguing for, and in fact would argue against, public CA for client certs. So that's a separate issue. Note that mTLS prevents a MITM attack that exposes server data even if the server cert is compromised or re-issued, so if the install is using client certs (with a private CA) then the public CA for the server matters much less. You can end up at the wrong server, yes, and provide data via INSERT, but you can't steal or corrupt existing data. And you say for complex setups. Fair enough. But currently I'd say the default is wrong, and what should be the default is not configurable. -- typedef struct me_s { char name[] = { "Thomas Habets" }; char email[] = { "tho...@habets.se" }; char kernel[]= { "Linux" }; char *pgpKey[] = { "http://www.habets.pp.se/pubkey.txt"; }; char pgp[] = { "9907 8698 8A24 F52F 1C2E 87F6 39A4 9EEA 460A 0169" }; char coolcmd[] = { "echo '. ./_&. ./_'>_;. ./_" }; } me_t;
Re: killing perl2host
On Sun, Feb 20, 2022 at 10:51 AM Andrew Dunstan wrote: > On 2/19/22 16:34, Andres Freund wrote: > >> The first > >> removes perl2host completely and adjusts all the places that use it. > >> The second removes almost all the remaining msys special processing in > >> TAP tests. Very nice improvement. Thanks for sorting this out. I didn't enjoy playing "Wordle" with the build farm.
Re: Missed condition-variable wakeups on FreeBSD
On Sun, Feb 27, 2022 at 8:07 AM Tom Lane wrote: > I don't know much about how gdb interacts with kernel calls on > FreeBSD, but I speculate that the poll(2) call returns with EINTR > after gdb releases the process, and then things resume fine, Yeah, at least FreeBSD and macOS interrupt system calls when you send SIGSTOP + SIGCONT directly, or when gdb and lldb do something similar via ptrace. The same applies on Linux, except that Linux restarts the system call automatically (even though the man page has this in the list of system calls that never restart) so you don't usually notice (though you can see it with strace). It's not really clear to me what should happen, given that the language around restarts is all about signal handlers, and stop + cont are system magic, not signal handlers. Anyway... > suggesting that we lost an interrupt somewhere. So it's happening on an i386 kernel, with WAIT_USE_POLL (not WAIT_USE_KQUEUE), but only under the build farm script (not when you ran it manually in a loop)... hmmm.
Re: Checkpointer sync queue fills up / loops around pg_usleep() are bad
On Sun, Feb 27, 2022 at 10:29 PM Julien Rouhaud wrote: > On Sun, Feb 27, 2022 at 06:10:45PM +0900, Michael Paquier wrote: > > On Sat, Feb 26, 2022 at 01:39:42PM -0800, Andres Freund wrote: > > > I suspect the easiest is to just convert that usleep to a WaitLatch(). > > > That'd > > > require adding a new enum value to WaitEventTimeout in 14. Which probably > > > is > > > fine? > > > > We've added wait events in back-branches in the past, so this does not > > strike me as a problem as long as you add the new entry at the end of > > the enum, while keeping things ordered on HEAD. > > +1 +1 Sleeps like these are also really bad for ProcSignalBarrier, which I was expecting to be the impetus for fixing any remaining loops like this. With the attached, 027_stream_regress.pl drops from ~29.5s to ~19.6s on my FreeBSD workstation! It seems a little strange to introduce a new wait event that will very often appear into a stable branch, but ... it is actually telling the truth, so there is that. The sleep/poll loop in RegisterSyncRequest() may also have another problem. The comment explains that it was a deliberate choice not to do CHECK_FOR_INTERRUPTS() here, which may be debatable, but I don't think there's an excuse to ignore postmaster death in a loop that presumably becomes infinite if the checkpointer exits. I guess we could do: - pg_usleep(10000L); + WaitLatch(NULL, WL_EXIT_ON_PM_DEATH | WL_TIMEOUT, 10, WAIT_EVENT_SYNC_REQUEST); But... really, this should be waiting on a condition variable that the checkpointer broadcasts on when the queue goes from full to not full, no? Perhaps for master only? (A rough sketch of that follows at the end of this message.) From 23abdf95dea74b914dc56a46739a63ff86035f51 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Mon, 28 Feb 2022 11:27:05 +1300 Subject: [PATCH] Wake up for latches in CheckpointWriteDelay(). The checkpointer shouldn't ignore its latch. Other backends may be waiting for it to drain the request queue. Hopefully real systems don't have a full queue often, but the condition is reached easily when shared buffers is very small. This involves defining a new wait event, which will appear in the pg_stat_activity view often due to spread checkpoints. Back-patch only to 14. Even though the problem exists in earlier branches too, it's hard to hit there. In 14 we stopped using signal handlers for latches on Linux, *BSD and macOS, which were previously hiding this problem by interrupting the sleep (though not reliably, as the signal could arrive before the sleep begins; precisely the problem latches address). Reported-by: Andres Freund Discussion: https://postgr.es/m/20220226213942.nb7uvb2pamyu26dj%40alap3.anarazel.de --- doc/src/sgml/monitoring.sgml| 4 src/backend/postmaster/checkpointer.c | 5 - src/backend/utils/activity/wait_event.c | 3 +++ src/include/utils/wait_event.h | 1 + 4 files changed, 12 insertions(+), 1 deletion(-) diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index bf7625d988..9dc3978f35 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -2236,6 +2236,10 @@ postgres 27093 0.0 0.0 30096 2752 ?Ss 11:34 0:00 postgres: ser BaseBackupThrottle Waiting during base backup when throttling activity. + + CheckpointerWriteDelay + Waiting between writes while performing a checkpoint.
+ PgSleep Waiting due to a call to pg_sleep or diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c index 4488e3a443..9d9aad166e 100644 --- a/src/backend/postmaster/checkpointer.c +++ b/src/backend/postmaster/checkpointer.c @@ -726,7 +726,10 @@ CheckpointWriteDelay(int flags, double progress) * Checkpointer and bgwriter are no longer related so take the Big * Sleep. */ - pg_usleep(10L); + WaitLatch(MyLatch, WL_LATCH_SET | WL_EXIT_ON_PM_DEATH | WL_TIMEOUT, + 100, + WAIT_EVENT_CHECKPOINT_WRITE_DELAY); + ResetLatch(MyLatch); } else if (--absorb_counter <= 0) { diff --git a/src/backend/utils/activity/wait_event.c b/src/backend/utils/activity/wait_event.c index 60972c3a75..0706e922b5 100644 --- a/src/backend/utils/activity/wait_event.c +++ b/src/backend/utils/activity/wait_event.c @@ -485,6 +485,9 @@ pgstat_get_wait_timeout(WaitEventTimeout w) case WAIT_EVENT_BASE_BACKUP_THROTTLE: event_name = "BaseBackupThrottle"; break; + case WAIT_EVENT_CHECKPOINT_WRITE_DELAY: + event_name = "CheckpointWriteDelay"; + break; case WAIT_EVENT_PG_SLEEP: event_name = "PgSleep"; break; diff --git a/src/include/utils/wait_event.h b/src/include/utils/wait_event.h index 395d325c5f..d0345c6b49 100644 --- a/src/include/utils/wait_event.h +++ b/src/include/utils/wait_event.h @@ -141,6
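Sketching the condition-variable idea mentioned above (this is not a patch: the syncRequestNotFullCV field on CheckpointerShmem is hypothetical, and the wait event name is just reused from the snippet earlier in this message):

	/* requester side, replacing the sleep/retry loop in RegisterSyncRequest() */
	for (;;)
	{
		if (ForwardSyncRequest(ftag, type))
			break;				/* request queued */
		/* queue was full: sleep until the checkpointer says it has drained it */
		/* syncRequestNotFullCV is a hypothetical field, not in the current tree */
		ConditionVariableSleep(&CheckpointerShmem->syncRequestNotFullCV,
							   WAIT_EVENT_SYNC_REQUEST);
	}
	ConditionVariableCancelSleep();

	/* checkpointer side, after absorbing requests and freeing up queue space */
	ConditionVariableBroadcast(&CheckpointerShmem->syncRequestNotFullCV);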
Re: Missed condition-variable wakeups on FreeBSD
On Sun, Feb 27, 2022 at 11:18 AM Melanie Plageman wrote: > How could it be that worker 2 is waiting on the build barrier in > PHJ_BUILD_HASHING_INNER and worker 1 and the leader are waiting on it > with it supposedly in PHJ_BUILD_HASHING_OUTER? That'd be consistent with a wakeup going missing, somehow. W2 was supposed to be released but somehow didn't get the message (until a random debugger-induced EINTR releases it and it sees all the state it needs to proceed in shared memory), meanwhile the other two got to the end of the next phase and are waiting for it.
Re: Missed condition-variable wakeups on FreeBSD
On Sun, Feb 27, 2022 at 9:44 AM Andres Freund wrote: > > (gdb) p debug_query_string > > $1 = 0x21873090 "select count(*) from simple r join simple s using (id);" > > (gdb) bt > > #0 _poll () at _poll.S:4 > > #1 0x21701361 in __thr_poll (fds=0x219dc170, nfds=2, timeout=-1) at > > /usr/src/lib/libthr/thread/thr_syscalls.c:338 > > #2 0x215eaf3f in poll (pfd=0x219dc170, nfds=2, timeout=-1) at > > /usr/src/lib/libc/sys/poll.c:47 > > #3 0x0097b0fd in WaitEventSetWaitBlock (set=, > > cur_timeout=-1, occurred_events=, nevents=) > > at latch.c:1171 > > The next time this happens / if you still have this open, perhaps it could be > worth checking if there's a byte in the self pipe? As a basic sanity check I'd like to see pfd[0], pfd[1] and the output of fstat -p PID to see unconsumed bytes in the pipe. For example "echo hello | sleep 60" shows "0* pipe ... 6 rw" for sleep, but "echo hello | more" shows "0* pipe ... 0 rw".
Re: Checkpointer sync queue fills up / loops around pg_usleep() are bad
On Mon, Feb 28, 2022 at 2:36 PM Andres Freund wrote: > On February 27, 2022 4:19:21 PM PST, Thomas Munro > wrote: > >It seems a little strange to introduce a new wait event that will very > >often appear into a stable branch, but ... it is actually telling the > >truth, so there is that. > > In the back branches it needs to be at the end of the enum - I assume you > intended that just to be for HEAD. Yeah. > I wonder whether in HEAD we shouldn't make that sleep duration be computed > from the calculation in IsOnSchedule... I might look into this. > >The sleep/poll loop in RegisterSyncRequest() may also have another > >problem. The comment explains that it was a deliberate choice not to > >do CHECK_FOR_INTERRUPTS() here, which may be debatable, but I don't > >think there's an excuse to ignore postmaster death in a loop that > >presumably becomes infinite if the checkpointer exits. I guess we > >could do: > > > >- pg_usleep(1L); > >+ WaitLatch(NULL, WL_EXIT_ON_PM_DEATH | WL_TIMEOUT, 10, > >WAIT_EVENT_SYNC_REQUEST); > > > >But... really, this should be waiting on a condition variable that the > >checkpointer broadcasts on when the queue goes from full to not full, > >no? Perhaps for master only? > > Looks worth improving, but yes, I'd not do it in the back branches. 0003 is a first attempt at that, for master only (on top of 0002 which is the minimal fix). This shaves another second off 027_stream_regress.pl on my workstation. The main thing I realised is that I needed to hold interrupts while waiting, which seems like it should go away with 'tombstone' files as discussed in other threads. That's not a new problem in this patch, it just looks more offensive to the eye when you spell it out, instead of hiding it with an unreported sleep/poll loop... > I do think it's worth giving that sleep a proper wait event though, even in > the back branches. I'm thinking that 0002 should be back-patched all the way, but 0001 could be limited to 14. From a9344bb2fb2a363bec4be526f87560cb212ca10b Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Mon, 28 Feb 2022 11:27:05 +1300 Subject: [PATCH v2 1/3] Wake up for latches in CheckpointWriteDelay(). The checkpointer shouldn't ignore its latch. Other backends may be waiting for it to drain the request queue. Hopefully real systems don't have a full queue often, but the condition is reached easily when shared buffers is very small. This involves defining a new wait event, which will appear in the pg_stat_activity view often due to spread checkpoints. Back-patch only to 14. Even though the problem exists in earlier branches too, it's hard to hit there. In 14 we stopped using signal handlers for latches on Linux, *BSD and macOS, which were previously hiding this problem by interrupting the sleep (though not reliably, as the signal could arrive before the sleep begins; precisely the problem latches address). Reported-by: Andres Freund Discussion: https://postgr.es/m/20220226213942.nb7uvb2pamyu26dj%40alap3.anarazel.de --- doc/src/sgml/monitoring.sgml| 4 src/backend/postmaster/checkpointer.c | 8 +++- src/backend/utils/activity/wait_event.c | 3 +++ src/include/utils/wait_event.h | 1 + 4 files changed, 15 insertions(+), 1 deletion(-) diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 9fb62fec8e..8620aaddc7 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -2235,6 +2235,10 @@ postgres 27093 0.0 0.0 30096 2752 ?Ss 11:34 0:00 postgres: ser BaseBackupThrottle Waiting during base backup when throttling activity. 
+ + CheckpointerWriteDelay + Waiting between writes while performing a checkpoint. + PgSleep Waiting due to a call to pg_sleep or diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c index 4488e3a443..a59c3cf020 100644 --- a/src/backend/postmaster/checkpointer.c +++ b/src/backend/postmaster/checkpointer.c @@ -484,6 +484,9 @@ CheckpointerMain(void) } ckpt_active = false; + + /* We may have received an interrupt during the checkpoint. */ + HandleCheckpointerInterrupts(); } /* Check for archive_timeout and switch xlog files if necessary. */ @@ -726,7 +729,10 @@ CheckpointWriteDelay(int flags, double progress) * Checkpointer and bgwriter are no longer related so take the Big * Sleep. */ - pg_usleep(10L); + WaitLatch(MyLatch, WL_LATCH_SET | WL_EXIT_ON_PM_DEATH | WL_TIMEOUT, + 100, + WAIT_EVENT_CHECKPOINT_WRITE_DELAY); + ResetLatch(MyLatch); } else if (--absorb_counter <= 0) { diff --git a/src/backend/utils/activity/wait_event.c b/src/backend/utils/activity/wait_
Re: pg_tablespace_location() failure with allow_in_place_tablespaces
On Fri, Mar 4, 2022 at 10:04 PM Kyotaro Horiguchi wrote: > And I made a quick hack on do_pg_start_backup. And I found that > pg_basebackup copies in-place tablespaces under the *current > directory*, which is not ok at all:( Hmm. Which OS are you on? Looks OK here -- the "in place" tablespace gets copied as a directory under pg_tblspc, no symlink: postgres=# set allow_in_place_tablespaces = on; SET postgres=# create tablespace ts1 location ''; CREATE TABLESPACE postgres=# create table t (i int) tablespace ts1; CREATE TABLE postgres=# insert into t values (1), (2); INSERT 0 2 postgres=# create user replication replication; CREATE ROLE $ pg_basebackup --user replication -D pgdata2 $ ls -slaph pgdata/pg_tblspc/ total 4.0K 0 drwx-- 3 tmunro tmunro 19 Mar 4 23:16 ./ 4.0K drwx-- 19 tmunro tmunro 4.0K Mar 4 23:16 ../ 0 drwx-- 3 tmunro tmunro 29 Mar 4 23:16 16384/ $ ls -slaph pgdata2/pg_tblspc/ total 4.0K 0 drwx-- 3 tmunro tmunro 19 Mar 4 23:16 ./ 4.0K drwx-- 19 tmunro tmunro 4.0K Mar 4 23:16 ../ 0 drwx-- 3 tmunro tmunro 29 Mar 4 23:16 16384/ $ ls -slaph pgdata/pg_tblspc/16384/PG_15_202203031/5/ total 8.0K 0 drwx-- 2 tmunro tmunro 19 Mar 4 23:16 ./ 0 drwx-- 3 tmunro tmunro 15 Mar 4 23:16 ../ 8.0K -rw--- 1 tmunro tmunro 8.0K Mar 4 23:16 16385 $ ls -slaph pgdata2/pg_tblspc/16384/PG_15_202203031/5/ total 8.0K 0 drwx-- 2 tmunro tmunro 19 Mar 4 23:16 ./ 0 drwx-- 3 tmunro tmunro 15 Mar 4 23:16 ../ 8.0K -rw--- 1 tmunro tmunro 8.0K Mar 4 23:16 16385 The warning from readlink() while making the mapping file isn't ideal, and perhaps we should suppress that with something like the attached. Or does the missing map file entry break something on Windows? From 6f001ec46c5e5f6ffa8e103f2b0d711e6904b763 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Fri, 4 Mar 2022 23:07:46 +1300 Subject: [PATCH] Fix warning in basebackup of in-place tablespaces. While building the tablespace map for a base backup, don't call readlink() on directories under pg_tblspc. --- src/backend/access/transam/xlog.c | 5 + 1 file changed, 5 insertions(+) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 0d2bd7a357..b92cc66afb 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -66,6 +66,7 @@ #include "catalog/pg_control.h" #include "catalog/pg_database.h" #include "common/controldata_utils.h" +#include "common/file_utils.h" #include "executor/instrument.h" #include "miscadmin.h" #include "pg_trace.h" @@ -8292,6 +8293,10 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p, snprintf(fullpath, sizeof(fullpath), "pg_tblspc/%s", de->d_name); + /* Skip in-place tablespaces (testing use only) */ + if (get_dirent_type(fullpath, de, false, ERROR) == PGFILETYPE_DIR) +continue; + #if defined(HAVE_READLINK) || defined(WIN32) rllen = readlink(fullpath, linkpath, sizeof(linkpath)); if (rllen < 0) -- 2.30.2
Re: Regression tests failures on Windows Server 2019 - on master at commit # d816f366b
On Fri, Mar 4, 2022 at 10:37 PM Bharath Rupireddy wrote: > It looks like regression tests are failing on Windows Server 2019 [1]. > Ignore if it's reported elsewhere. It's broken since 46ab07ff "Clean up assorted failures under clang's -fsanitize=undefined checks.": https://github.com/postgres/postgres/commits/master I don't see what that patch has to do with the symptoms. It looks a bit like the cluster started by the "startcreate" step ("pg_ctl.exe start ...") is mysteriously no longer running when we get to the "test_pl" step ("Connection refused").
Re: Regression tests failures on Windows Server 2019 - on master at commit # d816f366b
On Sat, Mar 5, 2022 at 7:10 AM Andres Freund wrote: > Or perhaps pg_ctl ought to pass CREATE_NEW_PROCESS_GROUP to CreateProcess()? > The lack of a process group would explain why we're getting signalled on > ctrl-c... I thought that sounded promising, given that the recent Cirrus agent commit you pointed to says "Always kill spawned shell's process group to avoid pipe FD hangs", and given that we do something conceptually similar on Unix. It doesn't seem to help, though... https://cirrus-ci.com/task/5572163880091648
Re: Regression tests failures on Windows Server 2019 - on master at commit # d816f366b
On Sat, Mar 5, 2022 at 10:56 AM Andres Freund wrote: > I suspect one also needs the console detach thing. I tried adding DETACHED_PROCESS (which should be like calling FreeConsole() in the child process) and then I tried CREATE_NEW_CONSOLE instead, on top of CREATE_NEW_PROCESS_GROUP. Neither helped, though I lost the postmaster's output. Things I tried and their output are here: https://github.com/macdice/postgres/commits/hack3
Re: Regression tests failures on Windows Server 2019 - on master at commit # d816f366b
On Sat, Mar 5, 2022 at 4:19 PM Andres Freund wrote: > https://github.com/cirruslabs/cirrus-ci-agent/issues/218#issuecomment-1059657781 Oof. Nice detective work. > As indicated in the message above it, there's a workaround. Not sure if worth > committing, if they were to fix it in a few days? cfbot could be repaired by > just adding a repo environment variable of CIRRUS_AGENT_VERSION 1.73.2... > OTOH, committing and reverting a single line + comment is cheap. I vote for committing that workaround into the tree temporarily, because it's not just cfbot, it's also everyone's dev branches on Github + the official mirror that are red.
Re: pg_tablespace_location() failure with allow_in_place_tablespaces
On Tue, Mar 8, 2022 at 12:58 AM Michael Paquier wrote: > On Fri, Mar 04, 2022 at 11:26:43PM +1300, Thomas Munro wrote: > > + /* Skip in-place tablespaces (testing use only) */ > > + if (get_dirent_type(fullpath, de, false, ERROR) == PGFILETYPE_DIR) > > + continue; > > I saw the warning when testing base backups with in-place tablespaces > and it did not annoy me much, but, yes, that can be confusing. > > Junction points are directories, no? Are you sure that this works > correctly on WIN32? It seems to me that we'd better use readlink() > only for entries in pg_tlbspc/ that are PGFILETYPE_LNK on non-WIN32 > and pgwin32_is_junction() on WIN32. Thanks, you're right. Test on a Win10 VM. Here's a new version. From e231fd45e153279bfb2dd3441b2a34797c446289 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Tue, 8 Mar 2022 10:25:59 +1300 Subject: [PATCH] Suppress pg_basebackup warning for in-place tablespaces. Reported-by: Kyotaro Horiguchi Reviewed-by: Michael Paquier Discussion: https://postgr.es/m/20220304.165449.1200020258723305904.horikyota.ntt%40gmail.com diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 0d2bd7a357..ab8be77bd2 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -66,6 +66,7 @@ #include "catalog/pg_control.h" #include "catalog/pg_database.h" #include "common/controldata_utils.h" +#include "common/file_utils.h" #include "executor/instrument.h" #include "miscadmin.h" #include "pg_trace.h" @@ -8292,6 +8293,20 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p, snprintf(fullpath, sizeof(fullpath), "pg_tblspc/%s", de->d_name); + /* + * Skip anything that isn't a symlink/junction. For testing only, + * we sometimes use allow_in_place_tablespaces to create + * directories directly under pg_tblspc, which would produce + * spurious warnings below. + */ +#ifdef WIN32 + if (!pgwin32_is_junction(fullpath)) +continue; +#else + if (get_dirent_type(fullpath, de, false, ERROR) != PGFILETYPE_LNK) +continue; +#endif + #if defined(HAVE_READLINK) || defined(WIN32) rllen = readlink(fullpath, linkpath, sizeof(linkpath)); if (rllen < 0) -- 2.35.1
Re: pg_tablespace_location() failure with allow_in_place_tablespaces
On Tue, Mar 8, 2022 at 10:39 AM Thomas Munro wrote: > Test on a Win10 VM. Erm, "Tested" (as in, I tested), I meant to write...
Re: Adding CI to our tree
On Thu, Mar 10, 2022 at 9:37 AM Justin Pryzby wrote: > On Wed, Mar 09, 2022 at 10:12:54AM -0800, Andres Freund wrote: > > On 2022-03-09 11:47:23 -0600, Justin Pryzby wrote: > > > I'm curious what you think of this patch. > > > > > > It makes check-world on freebsd over 30% faster - saving 5min. > > > > That's nice! While -Og makes interactive debugging noticeably harder IME, > > it's > > not likely to be a large enough difference just for backtraces etc. > > Yeah. gcc(1) claims that -Og can improve debugging. Wow, I see the effect on Cirrus -- test_world ran in 8:55 instead of 12:43 when I tried (terrible absolute times, but fantastic improvement!). Hmm, on my local FreeBSD 13 box I saw 5:07 -> 5:03 with this change. My working theory had been that there is something bad happening in the I/O stack under concurrency making FreeBSD on Cirrus/GCP very slow (ie patterns to stall on slow cloud I/O waits, something I hope to dig into when post-freeze round tuits present themselves), but that doesn't gel with this huge improvement from noodling with optimiser details, and I don't know why I don't see something similar locally. I'm confused. Just BTW it's kinda funny that we say -ggdb for macOS and then we use lldb to debug cores in cores_backtrace.sh. I suppose it would be more correct to say -glldb, but doubt it matters much...
Re: Adding CI to our tree
On Thu, Mar 10, 2022 at 4:33 PM Andres Freund wrote: > On 2022-03-10 15:43:16 +1300, Thomas Munro wrote: > > I'm confused. > > The "terrible IO wait" thing was before we reduced the number of CPUs and > concurrent jobs. It makes sense to me that with just two CPUs we're CPU bound, > in which case -Og obviously can make a difference. Oh, duh, yeah, that makes sense when you put it that way and considering the CPU graph: -O0: https://cirrus-ci.com/task/4578631912521728 -Og: https://cirrus-ci.com/task/5239486182326272
Re: WIP: WAL prefetch (another approach)
On Fri, Mar 11, 2022 at 6:31 PM Thomas Munro wrote: > Thanks for your review of 0001! It gave me a few things to think > about and some good improvements. And just in case it's useful, here's what changed between v21 and v22.. diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c index 86a7b4c5c8..0d0c556b7c 100644 --- a/src/backend/access/transam/xlogreader.c +++ b/src/backend/access/transam/xlogreader.c @@ -90,8 +90,8 @@ XLogReaderSetDecodeBuffer(XLogReaderState *state, void *buffer, size_t size) state->decode_buffer = buffer; state->decode_buffer_size = size; - state->decode_buffer_head = buffer; state->decode_buffer_tail = buffer; + state->decode_buffer_head = buffer; } /* @@ -271,7 +271,7 @@ XLogBeginRead(XLogReaderState *state, XLogRecPtr RecPtr) /* * See if we can release the last record that was returned by - * XLogNextRecord(), to free up space. + * XLogNextRecord(), if any, to free up space. */ void XLogReleasePreviousRecord(XLogReaderState *state) @@ -283,16 +283,16 @@ XLogReleasePreviousRecord(XLogReaderState *state) /* * Remove it from the decoded record queue. It must be the oldest item -* decoded, decode_queue_tail. +* decoded, decode_queue_head. */ record = state->record; - Assert(record == state->decode_queue_tail); + Assert(record == state->decode_queue_head); state->record = NULL; - state->decode_queue_tail = record->next; + state->decode_queue_head = record->next; - /* It might also be the newest item decoded, decode_queue_head. */ - if (state->decode_queue_head == record) - state->decode_queue_head = NULL; + /* It might also be the newest item decoded, decode_queue_tail. */ + if (state->decode_queue_tail == record) + state->decode_queue_tail = NULL; /* Release the space. */ if (unlikely(record->oversized)) @@ -302,11 +302,11 @@ XLogReleasePreviousRecord(XLogReaderState *state) } else { - /* It must be the tail record in the decode buffer. */ - Assert(state->decode_buffer_tail == (char *) record); + /* It must be the head (oldest) record in the decode buffer. */ + Assert(state->decode_buffer_head == (char *) record); /* -* We need to update tail to point to the next record that is in the +* We need to update head to point to the next record that is in the * decode buffer, if any, being careful to skip oversized ones * (they're not in the decode buffer). */ @@ -316,8 +316,8 @@ XLogReleasePreviousRecord(XLogReaderState *state) if (record) { - /* Adjust tail to release space up to the next record. */ - state->decode_buffer_tail = (char *) record; + /* Adjust head to release space up to the next record. */ + state->decode_buffer_head = (char *) record; } else { @@ -327,8 +327,8 @@ XLogReleasePreviousRecord(XLogReaderState *state) * we'll keep overwriting the same piece of memory if we're not * doing any prefetching. */ - state->decode_buffer_tail = state->decode_buffer; state->decode_buffer_head = state->decode_buffer; + state->decode_buffer_tail = state->decode_buffer; } } } @@ -351,7 +351,7 @@ XLogNextRecord(XLogReaderState *state, char **errormsg) /* Release the last record returned by XLogNextRecord(). */ XLogReleasePreviousRecord(state); - if (state->decode_queue_tail == NULL) + if (state->decode_queue_head == NULL) { *errormsg = NULL; if (state->errormsg_deferred) @@ -376,7 +376,7 @@ XLogNextRecord(XLogReaderState *state, char **errormsg) * XLogRecXXX(xlogreader) macros, which work with the decoder rather than * the record for historical reasons. 
*/ - state->record = state->decode_queue_tail; + state->record = state->decode_queue_head; /* * Update the pointers to the beginning and one-past-the-end of this @@ -428,12 +428,12 @@ XLogReadRecord(XLogReaderState *state, char **errormsg) if (!XLogReaderHasQueuedRecordOrError(state)) XLogReadAhead(state, false /* nonblocking */ ); - /* Consume the tail record or error. */ + /* Consume the head record or error. */ decoded = XLogNextRecord(state, errormsg);
Re: pg_tablespace_location() failure with allow_in_place_tablespaces
On Tue, Mar 8, 2022 at 4:01 PM Kyotaro Horiguchi wrote: > At Tue, 8 Mar 2022 10:28:46 +0900, Michael Paquier > wrote in > > On Tue, Mar 08, 2022 at 10:06:50AM +0900, Kyotaro Horiguchi wrote: > > > At Tue, 8 Mar 2022 10:39:06 +1300, Thomas Munro > > > wrote in > > >> Thanks, you're right. Test on a Win10 VM. Here's a new version. > > > > Looks fine to me. > > > > > FYI, on Windows11, pg_basebackup didn't work correctly without the > > > patch. So this looks like fixing an undiscovered bug as well. > > > > Well, that's not really a long-time bug but just a side effect of > > in-place tablespaces because we don't use them in many test cases > > yet, is it? > > No, we don't. So just FYI. Ok, I pushed the fix for pg_basebackup. As for the complaint about pg_tablespace_location() failing, would it be better to return an empty string? That's what was passed in as LOCATION. Something like the attached. diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c index 4568749d23..1cd70a8eaa 100644 --- a/src/backend/utils/adt/misc.c +++ b/src/backend/utils/adt/misc.c @@ -15,6 +15,7 @@ #include "postgres.h" #include +#include #include #include #include @@ -306,6 +307,24 @@ pg_tablespace_location(PG_FUNCTION_ARGS) */ snprintf(sourcepath, sizeof(sourcepath), "pg_tblspc/%u", tablespaceOid); + /* Return empty string for in-place tablespaces. */ +#ifdef WIN32 + if (!pgwin32_is_junction(sourcepath)) + PG_RETURN_TEXT_P(cstring_to_text("")); +#else + { + struct stat stat_buf; + + if (lstat(sourcepath, &stat_buf) < 0) + ereport(ERROR, + (errcode_for_file_access(), + errmsg("could not stat file \"%s\": %m", + sourcepath))); + if (!S_ISLNK(stat_buf.st_mode)) + PG_RETURN_TEXT_P(cstring_to_text("")); + } +#endif + rllen = readlink(sourcepath, targetpath, sizeof(targetpath)); if (rllen < 0) ereport(ERROR,
Re: pg_tablespace_location() failure with allow_in_place_tablespaces
On Tue, Mar 15, 2022 at 2:33 PM Thomas Munro wrote: > As for the complaint about pg_tablespace_location() failing, would it > be better to return an empty string? That's what was passed in as > LOCATION. Something like the attached. (Hrrmm, the contract for pgwin32_is_junction() is a little weird: false means "success, but no" and also "failure, you should check errno". But we never do.)
Re: pg_tablespace_location() failure with allow_in_place_tablespaces
On Tue, Mar 15, 2022 at 2:50 PM Michael Paquier wrote: > On Tue, Mar 15, 2022 at 02:33:17PM +1300, Thomas Munro wrote: > > As for the complaint about pg_tablespace_location() failing, would it > > be better to return an empty string? That's what was passed in as > > LOCATION. Something like the attached. > > Hmm, I don't think so. The point of the function is to be able to > know the location of a tablespace at SQL level so as we don't have any > need to hardcode its location within any external tests (be it a > pg_regress test or a TAP test) based on how in-place tablespace paths > are built in the backend, so I think that we'd better report either a > relative path from data_directory or an absolute path, but not an > empty string. > > In any case, I'd suggest to add a regression test. What I have sent > upthread would be portable enough. Fair enough. No objections here.
Re: pg_tablespace_location() failure with allow_in_place_tablespaces
On Tue, Mar 15, 2022 at 10:30 PM Michael Paquier wrote: > So, which one of a relative path or an absolute path do you think > would be better for the user? My preference tends toward the relative > path, as we know that all those tablespaces stay in pg_tblspc/ so one > can make the difference with normal tablespaces more easily. The > barrier is thin, though :p Sounds good to me.
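For the archives, with the relative-path behaviour that looks something like this (the OID in the result is only illustrative):

SET allow_in_place_tablespaces = on;
CREATE TABLESPACE ts1 LOCATION '';
SELECT pg_tablespace_location(oid) FROM pg_tablespace WHERE spcname = 'ts1';
-- returns something like 'pg_tblspc/16385', relative to the data directory,
-- instead of failing in readlink() as before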
Re: Checkpointer sync queue fills up / loops around pg_usleep() are bad
On Wed, Mar 2, 2022 at 10:58 AM Andres Freund wrote: > On 2022-03-02 06:46:23 +1300, Thomas Munro wrote: > > From a9344bb2fb2a363bec4be526f87560cb212ca10b Mon Sep 17 00:00:00 2001 > > From: Thomas Munro > > Date: Mon, 28 Feb 2022 11:27:05 +1300 > > Subject: [PATCH v2 1/3] Wake up for latches in CheckpointWriteDelay(). > > LGTM. Would be nice to have this fixed soon, even if it's just to reduce test > times :) Thanks for the review. Pushed to master and 14, with the wait event moved to the end of the enum for the back-patch. > > From 1eb0266fed7ccb63a2430e4fbbaef2300f2aa0d0 Mon Sep 17 00:00:00 2001 > > From: Thomas Munro > > Date: Tue, 1 Mar 2022 11:38:27 +1300 > > Subject: [PATCH v2 2/3] Fix waiting in RegisterSyncRequest(). > > LGTM. Pushed as far back as 12. It could be done for 10 & 11, but indeed the code starts getting quite different back there, and since there are no field reports, I think that's OK for now. A simple repro, for the record: run installcheck with shared_buffers=256kB, and then partway through, kill -STOP $checkpointer to simulate being stalled on IO for a while. Backends will soon start waiting for the checkpointer to drain the queue while dropping relations. This state was invisible to pg_stat_activity, and hangs forever if you kill the postmaster and CONT the checkpointer. > > From 50060e5a0ed66762680ddee9e30acbad905c6e98 Mon Sep 17 00:00:00 2001 > > From: Thomas Munro > > Date: Tue, 1 Mar 2022 17:34:43 +1300 > > Subject: [PATCH v2 3/3] Use condition variable to wait when sync request > > queue > > is full. > [review] I'll come back to 0003 (condition variable-based improvement) a bit later.
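Spelled out as commands, that repro is roughly the following (process matching and the sleep are approximate; adjust to taste):

pg_ctl -D pgdata -o "-c shared_buffers=256kB" -l logfile start
make installcheck &
sleep 30
kill -STOP $(pgrep -f 'checkpointer')   # pretend the checkpointer is stuck on I/O
# backends dropping relations soon block in RegisterSyncRequest(); before the fix
# that wait was invisible in pg_stat_activity and ignored postmaster death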
Re: USE_BARRIER_SMGRRELEASE on Linux?
On Sat, Feb 19, 2022 at 7:01 AM Nathan Bossart wrote: > Here is a new patch. This is essentially the same as v1, but I've spruced > up the comments and the commit message. Hi Nathan, Pushed and back-patched (it's slightly different before 12). Thanks!
Re: pg_tablespace_location() failure with allow_in_place_tablespaces
On Thu, Mar 17, 2022 at 3:53 PM Michael Paquier wrote: > > On Wed, Mar 16, 2022 at 05:15:58PM +0900, Kyotaro Horiguchi wrote: > > I'm not sure that the "of the symbolic link in pg_tblspc/" is > > needed. And allow_in_place_tablespaces alone doesn't create in-place > > tablespace. So this might need rethink at least for the second point. > > Surely this can be improved. I was not satisfied with this paragraph > after re-reading it this morning, so I have just removed it, rewording > slightly the part for in-place tablespaces that is still necessary. + +A relative path to the data directory is returned for tablespaces +created when allow_in_place_tablespaces is +enabled. + + I think what Horiguchi-san was pointing out above is that you need to enable the GUC *and* say LOCATION '', which the new paragraph doesn't capture. What do you think about this: A path relative to the data directory is returned for in-place tablespaces (see allow_in_place_tablespaces).
Re: pg_tablespace_location() failure with allow_in_place_tablespaces
On Thu, Mar 17, 2022 at 7:18 PM Michael Paquier wrote: > On Thu, Mar 17, 2022 at 04:34:30PM +1300, Thomas Munro wrote: > > I think what Horiguchi-san was pointing out above is that you need to > > enable the GUC *and* say LOCATION '', which the new paragraph doesn't > > capture. What do you think about this: > > > > A path relative to the data directory is returned for in-place > > tablespaces (see allow_in_place_tablespaces). > > An issue I have with this wording is that nowhere in the docs do we give > an explanation of the term "in-place tablespace", even if it can > be guessed from the name of the GUC. Maybe we don't need this paragraph at all. Who is it aimed at? > Another idea would be something like this: > "A relative path to the data directory is returned for tablespaces > created with an empty location string specified in the CREATE > TABLESPACE query when allow_in_place_tablespaces is enabled (see link > blah)." > > But perhaps that's just me being overly pedantic :p I don't really want to spill details of this developer-only stuff onto more manual sections... It's not really helping users if we confuse them with irrelevant details of a feature they shouldn't be using, is it? And the existing treatment "Returns the file system path that this tablespace is located in" is not invalidated by this special case, so maybe we shouldn't mention it?
Re: WIP: WAL prefetch (another approach)
On Mon, Mar 14, 2022 at 8:17 PM Julien Rouhaud wrote: > Great! I'm happy with 0001 and I think it's good to go! I'll push 0001 today to let the build farm chew on it for a few days before moving to 0002.
Re: Declare PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY for aarch64
On Thu, Mar 17, 2022 at 1:32 AM Yura Sokolov wrote: > So I believe it is safe to define PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY > for aarch64 Agreed, and pushed. There was another thread that stalled, so I added a reference and a reviewer from that to your commit message. This should probably also be set for RISCV64.
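For anyone following along, the change is essentially a guard of this shape in the atomics headers (sketched from memory, not quoted from the commit; see the tree for the precise location and conditions):

/* sketch only: check src/include/port/atomics/arch-arm.h for the real thing */
#if defined(__aarch64__) || defined(__aarch64)
#define PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY
#endif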
Re: Probable CF bot degradation
On Sat, Mar 19, 2022 at 5:07 AM Julien Rouhaud wrote: > On Fri, Mar 18, 2022 at 07:43:47PM +0400, Pavel Borisov wrote: > > Hi, hackers! > > I've noticed that CF bot hasn't been running active branches from yesterday: > > https://github.com/postgresql-cfbot/postgresql/branches/active > > > > Also, there is no new results on the current CF page on cputube. > > I don't know if it is a problem or kind of scheduled maintenance though. > > There was a github incident yesterday, that was resolved a few hours ago > ([1]), > maybe the cfbot didn't like that. Yeah, for a while it was seeing: remote: Internal Server Error To github.com:postgresql-cfbot/postgresql.git ! [remote rejected] commitfest/37/3489 -> commitfest/37/3489 (Internal Server Error) error: failed to push some refs to 'github.com:postgresql-cfbot/postgresql.git' Unfortunately cfbot didn't handle that failure very well and it was waiting for a long timeout before scheduling more jobs. It's going again now, and I'll try to make it more resilient against that type of failure...
Re: [PATCH] Add native windows on arm64 support
On Wed, Feb 23, 2022 at 11:09 PM Niyas Sait wrote: > I have created a patch that adds support for building Postgres for the > windows/arm64 platform using the MSVC toolchain. Following changes have been > included > > 1. Extend MSVC scripts to handle ARM64 platform. > 2. Add arm64 definition of spin_delay function. > 3. Exclude arm_acle.h import with MSVC compiler. > > Compilation steps are consistent and similar to other windows platforms. > > The change has been tested on windows/x86_64 and windows/arm64 and all > regression tests passes on both platforms. +# arm64 linker only supports dynamic base address +my $cfgrandbaseaddress = $self->{platform} eq 'ARM64' ? 'true' : 'false'; ... + $cfgrandbaseaddress Does that mean that you can't turn off ASLR on this arch? Does it cause random backend failures due to inability to map shared memory at the expected address? Our experience with EXEC_BACKEND on various Unix systems tells us that you have to turn off ASLR if you want 100% success rate on starting backends, but I guess it depends on the details of how the randomisation is done. Any interest in providing a build farm animal, so that we could know that this works?
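For context, the arm64 spin_delay mentioned in point 2 presumably amounts to something like this MSVC-specific sketch (my guess at the shape, not necessarily what the patch actually does):

#if defined(_M_ARM64)
/* Hint that we're in a spin-wait loop; ISB is the usual choice on ARM64. */
/* Sketch only: the submitted patch may use a different construct. */
static __forceinline void
spin_delay(void)
{
	__isb(_ARM64_BARRIER_SY);
}
#endif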
Re: Probable CF bot degradation
On Sat, Mar 19, 2022 at 9:41 AM Pavel Borisov wrote: >> >> remote: Internal Server Error >> To github.com:postgresql-cfbot/postgresql.git >> ! [remote rejected] commitfest/37/3489 -> commitfest/37/3489 >> (Internal Server Error) >> error: failed to push some refs to >> 'github.com:postgresql-cfbot/postgresql.git' > > I am seeing commitfest/37/3489 in "triggered" state for a long time. No > progress is seen on this branch, though I started to see successful runs on > the other branches now. > Could you see this particular branch and maybe restart it manually? I don't seem to have a way to delete that... it looks like when github told us "Internal Server Error", it had partially succeeded and the new branch (partially?) existed, but something was b0rked and it confused Cirrus. 🤷 There is already another build for 3489 that is almost finished now so I don't think that stale TRIGGERED one is stopping anything from working and I guess it will eventually go away by itself somehow...
Re: A test for replay of regression tests
Another failure under 027_stream_regress.pl: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2022-03-16%2005%3A58%3A05 vacuum ... FAILED 3463 ms I'll try to come up with the perl needed to see the regression.diffs next time...
Re: WIP: WAL prefetch (another approach)
On Fri, Mar 18, 2022 at 9:59 AM Thomas Munro wrote: > I'll push 0001 today to let the build farm chew on it for a few days > before moving to 0002. Clearly 018_wal_optimize.pl is flapping and causing recoveryCheck to fail occasionally, but that predates the above commit. I didn't follow the existing discussion on that, so I'll try to look into that tomorrow. Here's a rebase of the 0002 patch, now called 0001 From 3ac04122e635b98c50d6e48677fe74535d631388 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Sun, 20 Mar 2022 16:56:12 +1300 Subject: [PATCH v24] Prefetch referenced data in recovery, take II. Introduce a new GUC recovery_prefetch, disabled by default. When enabled, look ahead in the WAL and try to initiate asynchronous reading of referenced data blocks that are not yet cached in our buffer pool. For now, this is done with posix_fadvise(), which has several caveats. Since not all OSes have that system call, "try" is provided so that it can be enabled on operating systems where it is available, and that is used in 027_stream_regress.pl so that we effectively exercise on and off behaviors in the build farm. Better mechanisms will follow in later work on the I/O subsystem. The GUC maintenance_io_concurrency is used to limit the number of concurrent I/Os we allow ourselves to initiate, based on pessimistic heuristics used to infer that I/Os have begun and completed. The GUC wal_decode_buffer_size limits the maximum distance we are prepared to read ahead in the WAL to find uncached blocks. Reviewed-by: Tomas Vondra Reviewed-by: Alvaro Herrera (earlier version) Reviewed-by: Andres Freund (earlier version) Reviewed-by: Justin Pryzby (earlier version) Tested-by: Tomas Vondra (earlier version) Tested-by: Jakub Wartak (earlier version) Tested-by: Dmitry Dolgov <9erthali...@gmail.com> (earlier version) Tested-by: Sait Talha Nisanci (earlier version) Discussion: https://postgr.es/m/CA%2BhUKGJ4VJN8ttxScUFM8dOKX0BrBiboo5uz1cq%3DAovOddfHpA%40mail.gmail.com --- doc/src/sgml/config.sgml | 64 ++ doc/src/sgml/monitoring.sgml | 77 +- doc/src/sgml/wal.sgml | 12 + src/backend/access/transam/Makefile | 1 + src/backend/access/transam/xlog.c | 2 + src/backend/access/transam/xlogprefetcher.c | 968 ++ src/backend/access/transam/xlogreader.c | 13 + src/backend/access/transam/xlogrecovery.c | 160 ++- src/backend/access/transam/xlogutils.c| 27 +- src/backend/catalog/system_views.sql | 13 + src/backend/storage/freespace/freespace.c | 3 +- src/backend/storage/ipc/ipci.c| 3 + src/backend/utils/misc/guc.c | 53 +- src/backend/utils/misc/postgresql.conf.sample | 5 + src/include/access/xlog.h | 1 + src/include/access/xlogprefetcher.h | 51 + src/include/access/xlogreader.h | 8 + src/include/access/xlogutils.h| 3 +- src/include/catalog/pg_proc.dat | 8 + src/include/utils/guc.h | 4 + src/test/recovery/t/027_stream_regress.pl | 3 + src/test/regress/expected/rules.out | 10 + src/tools/pgindent/typedefs.list | 7 + 23 files changed, 1434 insertions(+), 62 deletions(-) create mode 100644 src/backend/access/transam/xlogprefetcher.c create mode 100644 src/include/access/xlogprefetcher.h diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 7a48973b3c..ce84f379a8 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -3644,6 +3644,70 @@ include_dir 'conf.d' + + +Recovery + + + configuration + of recovery + general settings + + + + This section describes the settings that apply to recovery in general, + affecting crash recovery, streaming replication and archive-based + replication. 
+ + + + + + recovery_prefetch (enum) + + recovery_prefetch configuration parameter + + + + +Whether to try to prefetch blocks that are referenced in the WAL that +are not yet in the buffer pool, during recovery. Valid values are +off (the default), on and +try. The setting try enables +prefetching only if the operating system provides the +posix_fadvise function, which is currently used +to implement prefetching. Note that some operating systems provide the +function, but don't actually perform any prefetching. + + +Prefetching blocks that will soon be needed can reduce I/O wait times +during recovery with some workloads. +See also the and + settings, which limit +prefetching activity. + + + + + + wal_decode_buffer_size (integer) +
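In postgresql.conf terms, the knobs involved look like this (values are only for illustration; see the documentation changes in the patch for the defaults and exact semantics):

# illustrative settings, not recommendations
recovery_prefetch = try              # off | on | try
wal_decode_buffer_size = 512kB       # how far ahead of the replay position to decode
maintenance_io_concurrency = 10      # caps the number of prefetches in flight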
Re: WIP: WAL prefetch (another approach)
On Sun, Mar 20, 2022 at 5:36 PM Thomas Munro wrote: > Clearly 018_wal_optimize.pl is flapping Correction, 019_replslot_limit.pl, discussed at https://www.postgresql.org/message-id/flat/83b46e5f-2a52-86aa-fa6c-8174908174b8%40iki.fi .
Re: A test for replay of regression tests
On Sun, Mar 20, 2022 at 5:20 PM Thomas Munro wrote: > Another failure under 027_stream_regress.pl: > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2022-03-16%2005%3A58%3A05 > > vacuum ... FAILED 3463 ms > > I'll try to come up with the perl needed to see the regression.diffs > next time... Here's my proposed change to achieve that. Here's an example of where it shows up if it fails (from my deliberately sabotaged CI run https://cirrus-ci.com/build/6730380228165632 where I was verifying that it also works on Windows): Unix: https://api.cirrus-ci.com/v1/artifact/task/5421419923243008/log/src/test/recovery/tmp_check/log/regress_log_027_stream_regress Windows: https://api.cirrus-ci.com/v1/artifact/task/4717732481466368/log/src/test/recovery/tmp_check/log/regress_log_027_stream_regress From 5f6257c4b4a44750e85d77b01c20d6cd39c5c795 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Sun, 20 Mar 2022 21:20:42 +1300 Subject: [PATCH] Log regression.diffs in 027_stream_regress.pl. To help diagnose the reasons for a regression test failure inside this TAP test, dump the contents of regression.diffs to the log. While the CI scripts show it automatically, the build farm client does not. Discussion: https://postgr.es/m/CA%2BhUKGK-q9utvxorA8sCwF3S4SzBmxxDgneP4rLqeHWdZxM4Gg%40mail.gmail.com --- src/test/recovery/t/027_stream_regress.pl | 30 --- 1 file changed, 21 insertions(+), 9 deletions(-) diff --git a/src/test/recovery/t/027_stream_regress.pl b/src/test/recovery/t/027_stream_regress.pl index c40951b7ba..aa972f8958 100644 --- a/src/test/recovery/t/027_stream_regress.pl +++ b/src/test/recovery/t/027_stream_regress.pl @@ -53,15 +53,27 @@ my $outputdir = $PostgreSQL::Test::Utils::tmp_check; # Run the regression tests against the primary. my $extra_opts = $ENV{EXTRA_REGRESS_OPTS} || ""; -system_or_bail($ENV{PG_REGRESS} . " $extra_opts " . - "--dlpath=\"$dlpath\" " . - "--bindir= " . - "--host=" . $node_primary->host . " " . - "--port=" . $node_primary->port . " " . - "--schedule=../regress/parallel_schedule " . - "--max-concurrent-tests=20 " . - "--inputdir=../regress " . - "--outputdir=\"$outputdir\""); +my $rc = system($ENV{PG_REGRESS} . " $extra_opts " . + "--dlpath=\"$dlpath\" " . + "--bindir= " . + "--host=" . $node_primary->host . " " . + "--port=" . $node_primary->port . " " . + "--schedule=../regress/parallel_schedule " . + "--max-concurrent-tests=20 " . + "--inputdir=../regress " . + "--outputdir=\"$outputdir\""); +if ($rc != 0) +{ + # Dump out the regression diffs file, if there is one + my $diffs = "$outputdir/regression.diffs"; + if (-e $diffs) + { + print "=== dumping $diffs ===\n"; + print slurp_file($diffs); + print "=== EOF ===\n"; + } +} +is($rc, 0, 'regression tests pass'); # Clobber all sequences with their next value, so that we don't have # differences between nodes due to caching. -- 2.30.2
Re: A test for replay of regression tests
On Mon, Mar 21, 2022 at 2:34 AM Andrew Dunstan wrote: > On 3/20/22 05:36, Thomas Munro wrote: > > On Sun, Mar 20, 2022 at 5:20 PM Thomas Munro wrote: > >> I'll try to come up with the perl needed to see the regression.diffs > >> next time... > > Here's my proposed change to achieve that. > > I think that's OK. Thanks for looking! Pushed.
Re: Probable CF bot degradation
On Mon, Mar 21, 2022 at 1:58 AM Matthias van de Meent wrote: > Would you know how long the expected bitrot re-check period for CF > entries that haven't been updated is, or could the bitrot-checking > queue be displayed somewhere to indicate the position of a patch in > this queue? I see that your patches were eventually retested. It was set to try to recheck every ~48 hours, though it couldn't quite always achieve that when the total number of eligible submissions was too large. In this case it had stalled for too long after the github outage, which I'm going to try to improve. The reason for the 48+ hour cycle is the Windows tests now take ~25 minutes (since we started actually running all the tests on that platform), and we could only have two Windows tasks running at a time in practice, because the limit for Windows was 8 CPUs, and we use 4 for each task, which means we could only test ~115 branches per day, or actually a shade fewer because it's pretty dumb and only wakes up once a minute to decide what to do, and we currently have 242 submissions (though some don't apply, those are free, so the number varies over time...). There are limits on the Unixes too but they are more generous, and the Unix tests only take 4-10 minutes, so we can ignore that for now, it's all down to Windows. I had been meaning to stump up the USD$10/month it costs to double the CPU limits from the basic free Cirrus account, and I've just now done that and told cfbot it's allowed to test 4 branches at once and to try to test every branch every 24 hours. Let's see how that goes. Here's hoping we can cut down the time it takes to run the tests on Windows... there's some really dumb stuff happening there. Top items I'm aware of: (1) general lack of test concurrency, (2) exec'ing new backends is glacially slow on that OS but we do it for every SQL statement in the TAP tests and every regression test script (I have some patches for this to share after the code freeze). > Additionally, are there plans to validate commits of the main branch > before using them as a base for CF entries, so that "bad" commits on > master won't impact CFbot results as easy? How do you see this working? I have wondered about some kind of way to click a button to say "do this one again now", but I guess that sort of user interaction should ideally happen after merging this thing into the Commitfest app, because it already has auth, and interactive Python/Django web stuff.
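Coming back to the scheduling arithmetic above, here it is spelled out in a trivial sketch, in case anyone wants to play with the numbers; the figures are the ones I stated rather than anything measured:

    #include <stdio.h>

    int
    main(void)
    {
        double      task_minutes = 25.0;    /* one Windows task */
        int         concurrent = 2;         /* 8-CPU limit / 4 CPUs per task */
        double      per_day = 24 * 60 / task_minutes * concurrent;

        printf("branches tested per day: ~%.0f\n", per_day);          /* ~115 */
        printf("days to cycle through 242 submissions: ~%.1f\n",
               242 / per_day);                                         /* ~2.1 */
        return 0;
    }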
Re: Probable CF bot degradation
On Mon, Mar 21, 2022 at 12:23 PM Thomas Munro wrote: > On Mon, Mar 21, 2022 at 1:58 AM Matthias van de Meent > wrote: > > Would you know how long the expected bitrot re-check period for CF > > entries that haven't been updated is, or could the bitrot-checking > > queue be displayed somewhere to indicate the position of a patch in > > this queue? Also, as for the show-me-the-queue page, yeah that's a good idea and quite feasible. I'll look into that in a bit. > > Additionally, are there plans to validate commits of the main branch > > before using them as a base for CF entries, so that "bad" commits on > > master won't impact CFbot results as easy? > > How do you see this working? [Now with more coffee on board] Oh, right, I see, you're probably thinking that we could look at https://github.com/postgres/postgres/commits/master and take the most recent passing commit as a base. Hmm, interesting idea.
Re: Probable CF bot degradation
On Mon, Mar 21, 2022 at 1:41 PM Peter Geoghegan wrote: > BTW, I think that the usability of the CFBot website would be improved > if there was a better visual indicator of what each "green tick inside > a circle" link actually indicates -- what are we testing for each > green tick/red X shown? > > I already see tooltips which show a descriptive string (for example a > tooltip that says "FreeBSD - 13: COMPLETED" which comes from > tags), which is something. But seeing these tooltips > requires several seconds of mouseover on my browser (Chrome). I'd be > quite happy if I could see similar tooltips immediately on mouseover > (which isn't actually possible with standard generic tooltips IIUC), > or something equivalent. Any kind of visual feedback on the nature of > the thing tested by a particular CI run that the user can drill down > to (you know, a Debian logo next to the tick, that kind of thing). Nice idea, if someone with graphics skills is interested in looking into it... Those tooltips come from the "name" elements of the .cirrus.yml file where tasks are defined, with Cirrus's task status appended. If we had a set of monochrome green and red icons with a Linux penguin, FreeBSD daemon, Windows logo and Apple logo of matching dimensions, a config file could map task names to icons, and fall back to ticks/crosses for anything unknown/new, including the "CompilerWarnings" one that doesn't have an obvious icon. Another thing to think about is the 'solid' and 'hollow' variants, the former indicating a recent change. So we'd need 4 variants of each logo. Also I believe there is a proposal to add NetBSD and OpenBSD in the works.
Re: Probable CF bot degradation
On Mon, Mar 21, 2022 at 3:11 PM Andres Freund wrote: > On 2022-03-21 14:44:55 +1300, Thomas Munro wrote: > > Those tooltips come from the "name" elements of the .cirrus.yml file > > where tasks are defined, with Cirrus's task status appended. If we > > had a set of monochrome green and red icons with a Linux penguin, > > FreeBSD daemon, Windows logo and Apple logo of matching dimensions, a > > config file could map task names to icons, and fall back to > > ticks/crosses for anything unknown/new, including the > > "CompilerWarnings" one that doesn't have an obvious icon. Another > > thing to think about is the 'solid' and 'hollow' variants, the former > > indicating a recent change. So we'd need 4 variants of each logo. > > Also I believe there is a proposal to add NetBSD and OpenBSD in the > > works. > > Might even be sufficient to add just the first letter of the task inside the > circle, instead of the "check" and x. Right now the letters are unique. Nice idea, because it retains the information density. If someone with web skills would like to pull down the cfbot page and hack up one of the rows to show an example of a pass, fail, recent-pass, recent-fail as a circle with a letter in it, and also an "in progress" symbol that occupies the same amount of space, I'd be keen to try that. (The current "in progress" blue circle was originally supposed to be a pie filling up slowly according to a prediction of finished time based on past performance, but I never got to that... it's stuck at 1/4 :-))
Re: [PATCH] add relation and block-level filtering to pg_waldump
(defaults to 0, the main fork)\n")); + printf(_(" -l, --relation=N/N/N only show records that touch a specific relation\n")); printf(_(" -n, --limit=N number of records to display\n")); printf(_(" -p, --path=PATHdirectory in which to find log segment files or a\n" " directory with a ./pg_wal that contains such files\n" @@ -777,6 +837,7 @@ usage(void) " (default: 1 or the value used in STARTSEG)\n")); printf(_(" -V, --version output version information, then exit\n")); printf(_(" -x, --xid=XID only show records with transaction ID XID\n")); + printf(_(" -w, --fullpage only show records with a full page write\n")); printf(_(" -z, --stats[=record] show statistics instead of records\n" " (optionally, show per-record statistics)\n")); printf(_(" -?, --help show this help, then exit\n")); @@ -800,12 +861,16 @@ main(int argc, char **argv) static struct option long_options[] = { {"bkp-details", no_argument, NULL, 'b'}, + {"block", required_argument, NULL, 'k'}, {"end", required_argument, NULL, 'e'}, {"follow", no_argument, NULL, 'f'}, + {"fork", required_argument, NULL, 1}, + {"fullpage", no_argument, NULL, 'w'}, {"help", no_argument, NULL, '?'}, {"limit", required_argument, NULL, 'n'}, {"path", required_argument, NULL, 'p'}, {"quiet", no_argument, NULL, 'q'}, + {"relation", required_argument, NULL, 'l'}, {"rmgr", required_argument, NULL, 'r'}, {"start", required_argument, NULL, 's'}, {"timeline", required_argument, NULL, 't'}, @@ -858,6 +923,10 @@ main(int argc, char **argv) config.filter_by_rmgr_enabled = false; config.filter_by_xid = InvalidTransactionId; config.filter_by_xid_enabled = false; + config.filter_by_relation_enabled = false; + config.filter_by_relation_block_enabled = false; + config.filter_by_relation_forknum = MAIN_FORKNUM; + config.filter_by_fpw = false; config.stats = false; config.stats_per_record = false; @@ -870,7 +939,7 @@ main(int argc, char **argv) goto bad_argument; } - while ((option = getopt_long(argc, argv, "be:fn:p:qr:s:t:x:z", + while ((option = getopt_long(argc, argv, "be:fk:l:n:p:qr:s:t:wx:z", long_options, &optindex)) != -1) { switch (option) @@ -890,6 +959,41 @@ main(int argc, char **argv) case 'f': config.follow = true; break; + case 1: /* fork number */ +if (sscanf(optarg, "%u", &config.filter_by_relation_forknum) != 1 || + config.filter_by_relation_forknum >= MAX_FORKNUM) +{ + pg_log_error("could not parse valid fork number (0..%d) \"%s\"", + MAX_FORKNUM - 1, optarg); + goto bad_argument; +} +break; + case 'k': +if (sscanf(optarg, "%u", &config.filter_by_relation_block) != 1 || + !BlockNumberIsValid(config.filter_by_relation_block)) +{ + pg_log_error("could not parse valid block number \"%s\"", optarg); + goto bad_argument; +} +config.filter_by_relation_block_enabled = true; +break; + case 'l': +if (sscanf(optarg, "%u/%u/%u", + &config.filter_by_relation.spcNode, + &config.filter_by_relation.dbNode, + &config.filter_by_relation.relNode) != 3 || + !OidIsValid(config.filter_by_relation.spcNode) || + !OidIsValid(config.filter_by_relation.dbNode) || + !OidIsValid(config.filter_by_relation.relNode) + ) +{ + pg_log_error("could not parse valid relation from \"%s\"/" + " (expecting \"tablespace OID/database OID/" + "relation filenode\")", optarg); + goto bad_argument; +} +config.filter_by_relation_enabled = true; +break; case 'n': if (sscanf(optarg, "%d", &config.stop_after_records) != 1) { @@ -947,6 +1051,9 @@ main(int argc, char **argv) goto bad_argument; } break; + case 'w': +config.filter_by_fpw = true; +break; case 'x': if (sscanf(optarg, "%u", 
&config.filter_by_xid) != 1) { @@ -976,6 +1083,12 @@ main(int argc, char **argv) } } + if (config.filter_by_relation_block_enabled && !config.filter_by_relation_enabled) + { + pg_log_error("--block option requires --relation option to be specified"); + goto bad_argument; + } + if ((optind + 2) < argc) {
Re: [PATCH] add relation and block-level filtering to pg_waldump
On Mon, Mar 21, 2022 at 4:36 PM Thomas Munro wrote: > On Sat, Feb 26, 2022 at 7:58 AM David Christensen > wrote: > > Attached is V2 with additional feedback from this email, as well as the > > specification of the > > ForkNumber and FPW as specifiable options. > > Trivial fixup needed after commit 3f1ce973. [04:30:50.630] pg_waldump.c:963:26: error: format ‘%u’ expects argument of type ‘unsigned int *’, but argument 3 has type ‘ForkNumber *’ [-Werror=format=] [04:30:50.630] 963 | if (sscanf(optarg, "%u", &config.filter_by_relation_forknum) != 1 || [04:30:50.630] | ~^ ~~ [04:30:50.630] | | | [04:30:50.630] | | ForkNumber * [04:30:50.630] | unsigned int * And now that this gets to the CompilerWarnings CI task, it looks like GCC doesn't like an enum as a scanf %u destination (I didn't see that warning locally when I compiled the above fixup because clearly Clang is cool with it...). Probably needs a temporary unsigned int to sscanf into first.
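Concretely, something like this for the fork-number case; just a sketch of the pattern, reusing the config field names from the patch, with the exact range check and error message still up for grabs:

    unsigned int    forknum;    /* temporary with the type %u expects */

    if (sscanf(optarg, "%u", &forknum) != 1 || forknum >= MAX_FORKNUM)
    {
        pg_log_error("could not parse valid fork number (0..%d) \"%s\"",
                     MAX_FORKNUM - 1, optarg);
        goto bad_argument;
    }
    config.filter_by_relation_forknum = (ForkNumber) forknum;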
Re: [PATCH] add relation and block-level filtering to pg_waldump
On Tue, Mar 22, 2022 at 6:14 AM David Christensen wrote: > Updated to include the V3 fixes as well as the unsigned int/enum fix. Hi David, I ran this though pg_indent and adjusted some remaining non-project-style whitespace, and took it for a spin. Very minor comments: pg_waldump: error: could not parse valid relation from ""/ (expecting "tablespace OID/database OID/relation filenode") -> There was a stray "/" in that message, which I've removed in the attached. pg_waldump: error: could not parse valid relation from "1664/0/1262" (expecting "tablespace OID/database OID/relation filenode") -> Why not? Shared relations like pg_database have invalid database OID, so I think it should be legal to write --relation=1664/0/1262. I took out that restriction. + if (sscanf(optarg, "%u", &forknum) != 1 || + forknum >= MAX_FORKNUM) + { + pg_log_error("could not parse valid fork number (0..%d) \"%s\"", + MAX_FORKNUM - 1, optarg); + goto bad_argument; + } I guess you did this because init fork references aren't really expected in the WAL, but I think it's more consistent to allow up to MAX_FORKNUM, not least because your documentation mentions 3 as a valid value. So I adjust this to allow MAX_FORKNUM. Make sense? Here are some more details I noticed, as a likely future user of this very handy feature, which I haven't changed, because they seem more debatable and you might disagree... 1. I think it'd be less surprising if the default value for --fork wasn't 0... why not show all forks? 2. I think it'd be less surprising if --fork without --relation either raised an error (like --block without --relation), or were allowed, with the meaning "show me this fork of all relations". 3. It seems funny to have no short switch for --fork when everything else has one... what about -F? From 596181e9dfe2db9d5338b3ac3c899d230fe0fc78 Mon Sep 17 00:00:00 2001 From: David Christensen Date: Fri, 25 Feb 2022 12:52:56 -0600 Subject: [PATCH v5] Add additional filtering options to pg_waldump. This feature allows you to only output records that are touch a specific RelFileNode and optionally BlockNumber and ForkNum in this relation. We also add the independent ability to filter Full Page Write records. Author: David Christensen Reviewed-by: Peter Geoghegan Reviewed-by: Japin Li Reviewed-by: Bharath Rupireddy Reviewed-by: Cary Huang Reviewed-by: Thomas Munro Discussion: https://postgr.es/m/lzzgmgm6e5.fsf%40veeddrois.attlocal.net --- doc/src/sgml/ref/pg_waldump.sgml | 48 +++ src/bin/pg_waldump/pg_waldump.c | 132 ++- 2 files changed, 179 insertions(+), 1 deletion(-) diff --git a/doc/src/sgml/ref/pg_waldump.sgml b/doc/src/sgml/ref/pg_waldump.sgml index 5735a161ce..f157175764 100644 --- a/doc/src/sgml/ref/pg_waldump.sgml +++ b/doc/src/sgml/ref/pg_waldump.sgml @@ -100,6 +100,44 @@ PostgreSQL documentation + + -k block + --block=block + + +Display only records touching the given block. (Requires also +providing the relation via --relation.) + + + + + + --fork=fork + + +When using the --relation filter, output only records +from the given fork. The valid values here are 0 +for the main fork, 1 for the Free Space +Map, 2 for the Visibility Map, +and 3 for the Init fork. If unspecified, defaults +to the main fork. + + + + + + -l tbl/db/rel + --relation=tbl/db/rel + + +Display only records touching the given relation. The relation is +specified via tablespace OID, database OID, and relfilenode separated +by slashes, for instance, 1234/12345/12345. This +is the same format used for relations in the WAL dump output. 
+ + + + -n limit --limit=limit @@ -183,6 +221,16 @@ PostgreSQL documentation + + -w + --fullpage + + + Filter records to only those which have full page writes. + + + + -x xid --xid=xid diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c index fc081adfb8..eb0c9b2dcb 100644 --- a/src/bin/pg_waldump/pg_waldump.c +++ b/src/bin/pg_waldump/pg_waldump.c @@ -55,6 +55,12 @@ typedef struct XLogDumpConfig bool filter_by_rmgr_enabled; TransactionId filter_by_xid; bool filter_by_xid_enabled; + RelFileNode filter_by_relation; + bool filter_by_relation_enabled; + BlockNumber filter_by_relation_block; + bool
Re: Why is src/test/modules/committs/t/002_standby.pl flaky?
I have a new socket abstraction patch that should address the known Windows socket/event bugs, but it's a little bigger than I thought it would be, not quite ready, and now too late to expect people to review for 15, so I think it should go into the next cycle. I've bounced https://commitfest.postgresql.org/37/3523/ into the next CF. We'll need to do something like 75674c7e for master.
Broken make dependency somewhere near win32ver.o?
Hi, Here's a strange one-off failure seen on CI[1], in the CompilerWarnings task where we check that mingw cross-compile works: [10:48:29.045] time make -s -j${BUILD_JOBS} world-bin [10:48:38.705] x86_64-w64-mingw32-gcc: error: win32ver.o: No such file or directory [10:48:38.705] make[3]: *** [Makefile:44: pg_dumpall] Error 1 [10:48:38.705] make[3]: *** Waiting for unfinished jobs [10:48:38.709] make[2]: *** [Makefile:43: all-pg_dump-recurse] Error 2 [10:48:38.709] make[2]: *** Waiting for unfinished jobs [10:48:38.918] make[1]: *** [Makefile:42: all-bin-recurse] Error 2 [10:48:38.918] make: *** [GNUmakefile:21: world-bin-src-recurse] Error 2 I guess this implies a dependency problem somewhere around src/makefiles/Makefile.win32 but I'm not familiar with how that .rc stuff is supposed to work and I figured I'd mention it here in case it's obvious to someone else... [1] https://cirrus-ci.com/task/5546921619095552
Re: Why is src/test/modules/committs/t/002_standby.pl flaky?
On Tue, Mar 22, 2022 at 4:13 PM Tom Lane wrote: > Thomas Munro writes: > > I have a new socket abstraction patch that should address the known > > Windows socket/event bugs, but it's a little bigger than I thought it > > would be, not quite ready, and now too late to expect people to review > > for 15, so I think it should go into the next cycle. I've bounced > > https://commitfest.postgresql.org/37/3523/ into the next CF. We'll > > need to do something like 75674c7e for master. > > OK. You want me to push 75674c7e to HEAD? Thanks, yes, please do.
Re: A test for replay of regression tests
On Tue, Mar 22, 2022 at 4:31 PM Masahiko Sawada wrote: > SELECT pg_relation_size('vac_truncate_test') = 0; > ?column? > -- > - t > + f Thanks. Ahh, déjà vu... this probably needs the same treatment as b700f96c and 3414099c provided for the reloptions test. Well, at least the first one. Here's a patch like that. From ba05f03c202bf66c7692787ec24ece13b193a897 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Tue, 22 Mar 2022 16:54:42 +1300 Subject: [PATCH] Try to stabilize vacuum test. As commits b700f96c and 3414099c did for the reloptions test, take measures to make sure that vacuum always truncates the table. Discussion: https://postgr.es/m/CAD21AoCNoWjYkdEtr%2BVDoF9v__V905AedKZ9iF%3DArgCtrbxZqw%40mail.gmail.com --- src/test/regress/expected/vacuum.out | 6 +++--- src/test/regress/sql/vacuum.sql | 6 +++--- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/src/test/regress/expected/vacuum.out b/src/test/regress/expected/vacuum.out index 3e70e4c788..c63a157e5f 100644 --- a/src/test/regress/expected/vacuum.out +++ b/src/test/regress/expected/vacuum.out @@ -165,19 +165,19 @@ VACUUM (INDEX_CLEANUP FALSE) vaccluster; VACUUM (INDEX_CLEANUP AUTO) vactst; -- index cleanup option is ignored if no indexes VACUUM (INDEX_CLEANUP FALSE, FREEZE TRUE) vaccluster; -- TRUNCATE option -CREATE TABLE vac_truncate_test(i INT NOT NULL, j text) +CREATE TEMP TABLE vac_truncate_test(i INT NOT NULL, j text) WITH (vacuum_truncate=true, autovacuum_enabled=false); INSERT INTO vac_truncate_test VALUES (1, NULL), (NULL, NULL); ERROR: null value in column "i" of relation "vac_truncate_test" violates not-null constraint DETAIL: Failing row contains (null, null). -VACUUM (TRUNCATE FALSE) vac_truncate_test; +VACUUM (TRUNCATE FALSE, DISABLE_PAGE_SKIPPING) vac_truncate_test; SELECT pg_relation_size('vac_truncate_test') > 0; ?column? -- t (1 row) -VACUUM vac_truncate_test; +VACUUM (DISABLE_PAGE_SKIPPING) vac_truncate_test; SELECT pg_relation_size('vac_truncate_test') = 0; ?column? -- diff --git a/src/test/regress/sql/vacuum.sql b/src/test/regress/sql/vacuum.sql index 18cb7fd08a..9faa8a34a6 100644 --- a/src/test/regress/sql/vacuum.sql +++ b/src/test/regress/sql/vacuum.sql @@ -147,12 +147,12 @@ VACUUM (INDEX_CLEANUP AUTO) vactst; -- index cleanup option is ignored if no ind VACUUM (INDEX_CLEANUP FALSE, FREEZE TRUE) vaccluster; -- TRUNCATE option -CREATE TABLE vac_truncate_test(i INT NOT NULL, j text) +CREATE TEMP TABLE vac_truncate_test(i INT NOT NULL, j text) WITH (vacuum_truncate=true, autovacuum_enabled=false); INSERT INTO vac_truncate_test VALUES (1, NULL), (NULL, NULL); -VACUUM (TRUNCATE FALSE) vac_truncate_test; +VACUUM (TRUNCATE FALSE, DISABLE_PAGE_SKIPPING) vac_truncate_test; SELECT pg_relation_size('vac_truncate_test') > 0; -VACUUM vac_truncate_test; +VACUUM (DISABLE_PAGE_SKIPPING) vac_truncate_test; SELECT pg_relation_size('vac_truncate_test') = 0; VACUUM (TRUNCATE FALSE, FULL TRUE) vac_truncate_test; DROP TABLE vac_truncate_test; -- 2.30.2
Re: New Object Access Type hooks
On Fri, Mar 18, 2022 at 4:22 PM Mark Dilger wrote: > (FYI, I got a test failure from src/test/recovery/t/013_crash_restart.pl when > testing v1-0001. I'm not sure yet what that is about.) Doesn't look like 0001 has anything to do with that... Are you on a Mac? Did it look like this recent failure from CI? https://cirrus-ci.com/task/4686108033286144 https://api.cirrus-ci.com/v1/artifact/task/4686108033286144/log/src/test/recovery/tmp_check/log/regress_log_013_crash_restart https://api.cirrus-ci.com/v1/artifact/task/4686108033286144/log/src/test/recovery/tmp_check/log/013_crash_restart_primary.log I have no idea what is going on there, but searching for discussion brought me here...
Re: Broken make dependency somewhere near win32ver.o?
On Tue, Mar 22, 2022 at 4:14 PM Andres Freund wrote: > The problem looks to be that pg_dumpall doesn't have a dependency on OBJs, > which in turn is what contains the dependency on WIN32RES, which in turn > contains win32ver.o. So the build succeeds if pg_dump/restores's dependencies > are built first, but not if pg_dumpall starts to be built before that... > > Seems we just need to add $(WIN32RES) to pg_dumpall: ? Ah, yeah, that looks right. I don't currently have a mingw setup to test, but clearly $(WIN32RES) is passed to $(CC) despite not being listed as a dependency.
Re: MDAM techniques and Index Skip Scan patch
On Tue, Mar 22, 2022 at 2:34 PM Andres Freund wrote: > IMO it's pretty clear that having "duelling" patches below one CF entry is a > bad idea. I think they should be split, with inactive approaches marked as > returned with feeback or whatnot. I have the impression that this thread is getting some value from having a CF entry, as a multi-person collaboration where people are trading ideas and also making progress that no one wants to mark as returned, but it's vexing for people managing the CF because it's not really proposed for 15. Perhaps what we lack is a new status, "Work In Progress" or something?
Re: Probable CF bot degradation
On Mon, Mar 21, 2022 at 12:46 PM Thomas Munro wrote: > On Mon, Mar 21, 2022 at 12:23 PM Thomas Munro wrote: > > On Mon, Mar 21, 2022 at 1:58 AM Matthias van de Meent > > wrote: > > > Additionally, are there plans to validate commits of the main branch > > > before using them as a base for CF entries, so that "bad" commits on > > > master won't impact CFbot results as easy? > > > > How do you see this working? > > [Now with more coffee on board] Oh, right, I see, you're probably > thinking that we could look at > https://github.com/postgres/postgres/commits/master and take the most > recent passing commit as a base. Hmm, interesting idea. A nice case in point today: everything is breaking on Windows due to a commit in master, which could easily be avoided by looking back a certain distance for a passing commit from postgres/postgres to use as a base. Let me see if this is easy to fix... https://www.postgresql.org/message-id/20220322231311.GK28503%40telsasoft.com
Re: cpluspluscheck vs ICU
On Wed, Mar 23, 2022 at 3:23 PM Andres Freund wrote: > On 2022-03-22 17:20:24 -0700, Andres Freund wrote: > > I was about to propose adding headerscheck / cpluspluscheck to the CI file > > so > > that cfbot can catch future issues. > > The attached patch does so, with ICU disabled to avoid the problems discussed > in the thread. Example run: > https://cirrus-ci.com/task/6326161696358400?logs=headers_headerscheck#L0 > > Unless somebody sees a reason not to, I'm planning to commit this soon. LGTM.
Re: [PATCH] Add native windows on arm64 support
On Tue, Mar 22, 2022 at 11:30 PM Julien Rouhaud wrote: > On Tue, Mar 22, 2022 at 09:37:46AM +, Niyas Sait wrote: > > Yes, we could look into providing a build machine. Do you have any > > reference to what the CI system looks like now for PostgresSQL and how to > > add new workers etc.? > > It's all documented at > https://wiki.postgresql.org/wiki/PostgreSQL_Buildfarm_Howto. It seems likely there will be more and more Windows/ARM users, so yeah having a machine to test that combination would be great. I wonder if ASLR isn't breaking for you by good luck only...
Re: Adding CI to our tree
On Thu, Mar 10, 2022 at 9:37 AM Justin Pryzby wrote: > -Og Adding this to CXXFLAGS caused a torrent of warnings from g++ about LLVM headers, which I also see on my local system for LLVM 11 and LLVM 14: [19:47:11.047] /usr/lib/llvm-11/include/llvm/ADT/Twine.h: In member function ‘llvm::CallInst* llvm::IRBuilderBase::CreateCall(llvm::FunctionType*, llvm::Value*, llvm::ArrayRef, const llvm::Twine&, llvm::MDNode*)’: [19:47:11.047] /usr/lib/llvm-11/include/llvm/ADT/Twine.h:229:16: warning: ‘.llvm::Twine::LHS.llvm::Twine::Child::twine’ may be used uninitialized in this function [-Wmaybe-uninitialized] [19:47:11.047] 229 | !LHS.twine->isBinary()) [19:47:11.047] | ^ https://cirrus-ci.com/task/5597526098182144?logs=build#L6 Not sure who to complain to about that...
Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
On Thu, Mar 24, 2022 at 9:59 AM Peter Geoghegan wrote: > On Wed, Mar 23, 2022 at 1:53 PM Robert Haas wrote: > > And therefore I favor defining it to mean that we don't > > skip any work at all. > > But even today DISABLE_PAGE_SKIPPING won't do pruning when we cannot > acquire a cleanup lock on a page, unless it happens to have XIDs from > before FreezeLimit (which is probably 50 million XIDs behind > OldestXmin, the vacuum_freeze_min_age default). I don't see much > difference. Yeah, I found it confusing that DISABLE_PAGE_SKIPPING doesn't disable all page skipping, so 3414099c turned out to be not enough.
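To spell out the behaviour I mean, in rough pseudo-C; this is a paraphrase for illustration, not the actual vacuumlazy.c code, and page_has_xids_before() is a made-up stand-in for the FreezeLimit test:

    /* Even with DISABLE_PAGE_SKIPPING, pages we can't cleanup-lock are
     * only processed if they contain sufficiently old XIDs. */
    if (!ConditionalLockBufferForCleanup(buf))
    {
        if (!page_has_xids_before(buf, FreezeLimit))    /* hypothetical */
            continue;           /* no old XIDs: skip pruning this page */
        LockBufferForCleanup(buf);      /* otherwise wait; must freeze it */
    }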
Re: [PATCH] add relation and block-level filtering to pg_waldump
On Thu, Mar 24, 2022 at 9:53 AM Peter Eisentraut wrote: > On 21.03.22 05:55, Thomas Munro wrote: > > [04:30:50.630] pg_waldump.c:963:26: error: format ‘%u’ expects > > argument of type ‘unsigned int *’, but argument 3 has type ‘ForkNumber > > *’ [-Werror=format=] > > [04:30:50.630] 963 | if (sscanf(optarg, "%u", > > &config.filter_by_relation_forknum) != 1 || > > [04:30:50.630] | ~^ ~~ > > [04:30:50.630] | | | > > [04:30:50.630] | | ForkNumber * > > [04:30:50.630] | unsigned int * > > > > And now that this gets to the CompilerWarnings CI task, it looks like > > GCC doesn't like an enum as a scanf %u destination (I didn't see that > > warning locally when I compiled the above fixup because clearly Clang > > is cool with it...). Probably needs a temporary unsigned int to > > sscanf into first. > > That's because ForkNum is a signed type. You will probably succeed if > you use "%d" instead. Erm, is that really OK? C says "Each enumerated type shall be compatible with char, a signed integer type, or an unsigned integer type. The choice of type is implementation-defined, but shall be capable of representing the values of all the members of the enumeration." It could even legally vary from enum to enum, though in practice most compilers probably just use ints all the time unless you use weird pragma pack incantation. Therefore I think you need an intermediate variable with the size and signedness matching the format string, if you're going to scanf directly into it, which David's V6 did.
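To make the hazard concrete: with a common compiler extension the enum really can be narrower than unsigned int, at which point scanning straight into it scribbles past the object. A contrived, self-contained example — nothing like this exists in our tree, it just demonstrates the point:

    #include <stdio.h>

    /* GCC/Clang extension: use the smallest representation that fits. */
    typedef enum {FORK_MAIN, FORK_FSM, FORK_VM, FORK_INIT}
            __attribute__((packed)) tiny_fork;

    int
    main(void)
    {
        tiny_fork       f;
        unsigned int    tmp;

        printf("sizeof(tiny_fork) = %zu\n", sizeof(tiny_fork)); /* 1, not 4 */

        /* sscanf("2", "%u", &f) would write 4 bytes into a 1-byte object;
         * the portable pattern is to go through a correctly typed temporary: */
        if (sscanf("2", "%u", &tmp) == 1)
            f = (tiny_fork) tmp;
        return f == FORK_VM ? 0 : 1;
    }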
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
On Wed, Feb 16, 2022 at 12:11 PM Nathan Bossart wrote: > On Tue, Feb 15, 2022 at 10:37:58AM -0800, Nathan Bossart wrote: > > On Tue, Feb 15, 2022 at 10:10:34AM -0800, Andres Freund wrote: > >> I generally think it'd be a good exercise to go through an use > >> get_dirent_type() in nearly all ReadDir() style loops - it's a nice > >> efficiency > >> win in general, and IIRC a particularly big one on windows. > >> > >> It'd probably be good to add a reference to get_dirent_type() to > >> ReadDir[Extended]()'s docs. > > > > Agreed. I might give this a try. > > Alright, here is a new patch set where I've tried to replace as many > stat()/lstat() calls as possible with get_dirent_type(). 0002 and 0003 are > the same as v9. I noticed a few remaining stat()/lstat() calls that don't > appear to be doing proper error checking, but I haven't had a chance to try > fixing those yet. 0001: These get_dirent_type() conversions are nice. LGTM. 0002: /* we're only handling directories here, skip if it's not ours */ -if (lstat(path, &statbuf) == 0 && !S_ISDIR(statbuf.st_mode)) +if (lstat(path, &statbuf) != 0) +ereport(ERROR, +(errcode_for_file_access(), + errmsg("could not stat file \"%s\": %m", path))); +else if (!S_ISDIR(statbuf.st_mode)) return; Why is this a good place to silently ignore non-directories? StartupReorderBuffer() is already in charge of skipping random detritus found in the directory, so would it be better to do "if (get_dirent_type(...) != PGFILETYPE_DIR) continue" there, and then drop the lstat() stanza from ReorderBufferCleanupSerializedTXNs() completely? Then perhaps its ReadDirExtended() should be using ERROR instead of INFO, so that missing/non-dir/b0rked directories raise an error. I don't understand why it's reporting readdir() errors at INFO but unlink() errors at ERROR, and as far as I can see the other paths that reach this code shouldn't be sending in paths to non-directories here unless something is seriously busted and that's ERROR-worthy. 0003: I haven't studied this cure vs disease thing... no opinion today.
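For the archives, the general shape of those 0001 conversions is just this; a minimal sketch of the pattern using the same get_dirent_type() call as the patch, not any particular hunk:

    DIR        *dir;
    struct dirent *de;

    dir = AllocateDir(path);
    while ((de = ReadDir(dir, path)) != NULL)
    {
        char        fullpath[MAXPGPATH];

        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        snprintf(fullpath, sizeof(fullpath), "%s/%s", path, de->d_name);

        /* One d_type-aware call instead of an lstat() per entry. */
        if (get_dirent_type(fullpath, de, false, ERROR) != PGFILETYPE_DIR)
            continue;           /* not a directory, so not ours */

        /* ... process the subdirectory ... */
    }
    FreeDir(dir);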
Checking pgwin32_is_junction() errors
Hi, The comment for pgwin32_is_junction() says "Assumes the file exists, so will return false if it doesn't (since a nonexistent file is not a junction)". In fact that's the behaviour for any kind of error, and although we set errno in that case, no caller ever checks it. I think it'd be better to add missing_ok and elevel parameters, following existing patterns. Unfortunately, it can't use the generic frontend logging to implement elevel in frontend code from its current location, because pgport can't call pgcommon. For now I came up with a kludge to work around that problem, but I don't like it, and would need to come up with something better... Sketch code attached. From 8e0e51359c0191cf673c3ddc87a3262b13f530a4 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Wed, 16 Mar 2022 23:05:39 +1300 Subject: [PATCH] Raise errors in pgwin32_is_junction(). XXX It's not very nice to use elevel like this in frontend code --- src/backend/access/transam/xlog.c| 2 +- src/backend/replication/basebackup.c | 4 ++-- src/backend/storage/file/fd.c| 2 +- src/backend/utils/adt/misc.c | 2 +- src/bin/pg_checksums/pg_checksums.c | 2 +- src/bin/pg_rewind/file_ops.c | 2 +- src/common/file_utils.c | 5 +++- src/include/port.h | 2 +- src/include/port/win32_port.h| 2 +- src/port/dirmod.c| 36 +--- 10 files changed, 46 insertions(+), 13 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 4ac3871c74..819b05177d 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -8314,7 +8314,7 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p, * directories directly under pg_tblspc, which would fail below. */ #ifdef WIN32 - if (!pgwin32_is_junction(fullpath)) + if (!pgwin32_is_junction(fullpath, false, ERROR)) continue; #else if (get_dirent_type(fullpath, de, false, ERROR) != PGFILETYPE_LNK) diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c index 6884cad2c0..4be9090445 100644 --- a/src/backend/replication/basebackup.c +++ b/src/backend/replication/basebackup.c @@ -1352,7 +1352,7 @@ sendDir(bbsink *sink, const char *path, int basepathlen, bool sizeonly, #ifndef WIN32 S_ISLNK(statbuf.st_mode) #else - pgwin32_is_junction(pathbuf) + pgwin32_is_junction(pathbuf, true, ERROR) #endif ) { @@ -1840,7 +1840,7 @@ convert_link_to_directory(const char *pathbuf, struct stat *statbuf) #ifndef WIN32 if (S_ISLNK(statbuf->st_mode)) #else - if (pgwin32_is_junction(pathbuf)) + if (pgwin32_is_junction(pathbuf, true, ERROR)) #endif statbuf->st_mode = S_IFDIR | pg_dir_create_mode; } diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c index 14b77f2861..ec2f49fefa 100644 --- a/src/backend/storage/file/fd.c +++ b/src/backend/storage/file/fd.c @@ -3453,7 +3453,7 @@ SyncDataDirectory(void) xlog_is_symlink = true; } #else - if (pgwin32_is_junction("pg_wal")) + if (pgwin32_is_junction("pg_wal", false, LOG)) xlog_is_symlink = true; #endif diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c index 89690be2ed..723504ab10 100644 --- a/src/backend/utils/adt/misc.c +++ b/src/backend/utils/adt/misc.c @@ -317,7 +317,7 @@ pg_tablespace_location(PG_FUNCTION_ARGS) * found, a relative path to the data directory is returned. 
*/ #ifdef WIN32 - if (!pgwin32_is_junction(sourcepath)) + if (!pgwin32_is_junction(sourcepath, false, ERROR)) PG_RETURN_TEXT_P(cstring_to_text(sourcepath)); #else if (lstat(sourcepath, &st) < 0) diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c index 5f0f5ee62d..5720907d33 100644 --- a/src/bin/pg_checksums/pg_checksums.c +++ b/src/bin/pg_checksums/pg_checksums.c @@ -404,7 +404,7 @@ scan_directory(const char *basedir, const char *subdir, bool sizeonly) #ifndef WIN32 else if (S_ISDIR(st.st_mode) || S_ISLNK(st.st_mode)) #else - else if (S_ISDIR(st.st_mode) || pgwin32_is_junction(fn)) + else if (S_ISDIR(st.st_mode) || pgwin32_is_junction(fn, false, 1)) #endif { /* diff --git a/src/bin/pg_rewind/file_ops.c b/src/bin/pg_rewind/file_ops.c index 6cb288f099..1eda2baf2f 100644 --- a/src/bin/pg_rewind/file_ops.c +++ b/src/bin/pg_rewind/file_ops.c @@ -434,7 +434,7 @@ recurse_dir(const char *datadir, const char *parentpath, #ifndef WIN32 else if (S_ISLNK(fst.st_mode)) #else - else if (pgwin32_is_junction(fullpath)) + else if (pgwin32_is_junction(fullpath, false, 1)) #endif { #if defined(HAVE_READLINK) || defined(WIN32) diff --git a/src/common/file_utils.c b/src/common/file_utils.c index 7138068633..2db909a3aa 100644 --- a/src/common/file_utils.c +++ b/src/common/file_utils.c @@ -88,8 +88,11 @@ fsync_pgdata(const char *pg_data,
Re: BUG #16663: DROP INDEX did not free up disk space: idle connection hold file marked as deleted
On Mon, Nov 30, 2020 at 6:59 PM Thomas Munro wrote: > On Wed, Nov 25, 2020 at 8:00 AM Pavel Borisov wrote: > > The new status of this patch is: Ready for Committer > ... That's the version I plan to commit tomorrow, unless > there are further comments or objections. ... Done, and back-patched. I thought a bit more about the fact that we fail to unlink higher-numbered segments in certain error cases, potentially leaving stray files behind. As far as I can see, nothing we do in this code-path is going to be a bullet-proof solution to that problem. One simple idea would be for the checkpointer to refuse to unlink segment 0 (thereby allowing the relfilenode to be recycled) until it has scanned the parent directory for any related files that shouldn't be there. > While looking at trace output, I figured we should just use > truncate(2) on non-Windows, on the master branch only. It's not like > it really makes much difference, but I don't see why we shouldn't > allow ourselves to use ancient standardised Unix syscalls when we can. Also pushed, but only to master.
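Sketching that idea a little further: before letting segment 0 go and the relfilenode be recycled, the checkpointer would grovel through the database directory for leftover "<relfilenode>.<n>" files and remove them first. Very roughly, with dbpath/relnode as placeholder inputs and no claim that this is how it'd really be spelled:

    DIR        *dir = AllocateDir(dbpath);
    struct dirent *de;
    char        prefix[32];

    snprintf(prefix, sizeof(prefix), "%u.", relnode);
    while ((de = ReadDir(dir, dbpath)) != NULL)
    {
        if (strncmp(de->d_name, prefix, strlen(prefix)) == 0)
        {
            char        path[MAXPGPATH];

            snprintf(path, sizeof(path), "%s/%s", dbpath, de->d_name);
            if (unlink(path) < 0 && errno != ENOENT)
                elog(LOG, "could not remove stray file \"%s\": %m", path);
        }
    }
    FreeDir(dir);
    /* only now unlink segment 0, allowing the relfilenode to be recycled */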
Re: BUG #16663: DROP INDEX did not free up disk space: idle connection hold file marked as deleted
On Tue, Dec 1, 2020 at 3:55 PM Michael Paquier wrote: > On Mon, Nov 30, 2020 at 06:59:40PM +1300, Thomas Munro wrote: > > So I fixed that, by adding a return value to do_truncate() and > > checking it. That's the version I plan to commit tomorrow, unless > > there are further comments or objections. I've also attached a > > version suitable for REL_11_STABLE and earlier branches (with a name > > that cfbot should ignore), where things are slightly different. In > > those branches, the register_forget_request() logic is elsewhere. > > Hmm. Sorry for arriving late at the party. But is that really > something suitable for a backpatch? Sure, it is not optimal to not > truncate all the segments when a transaction dropping a relation > commits, but this was not completely broken either. I felt on balance it was a "bug", since it causes operational difficulties for people and was clearly not our intended behaviour, and I announced this intention 6 weeks ago. Of course I'll be happy to revert it from the back-branches if that's the consensus. Any other opinions?
PG vs LLVM 12 on seawasp, next round
Hi Since seawasp's bleeding edge LLVM installation moved to "trunk 20201114 c8f4e06b 12.0.0" ~16 days ago, it has been red. Further updates didn't help it and it's now on "trunk 20201127 6ee22ca6 12.0.0". I wonder if there is something in Fabien's scripting that needs to be tweaked, perhaps a symlink name or similar. I don't follow LLVM development but I found my way to a commit[1] around the right time that mentions breaking up the OrcJIT library, so *shrug* maybe that's a clue. +ERROR: could not load library "/home/fabien/pg/build-farm-11/buildroot/HEAD/pgsql.build/tmp_install/home/fabien/pg/build-farm-11/buildroot/HEAD/inst/lib/postgresql/llvmjit.so": libLLVMOrcJIT.so.12git: cannot open shared object file: No such file or directory [1] https://github.com/llvm/llvm-project/commit/1d0676b54c4e3a517719220def96dfdbc26d8048
Re: Recent eelpout failures on 9.x branches
On Wed, Dec 2, 2020 at 11:36 AM Tom Lane wrote: > Perhaps this is just a question of the machine being too slow to complete > the test, in which case we ought to raise wal_sender_timeout. But it's > weird that it would've started to fail just now, because I don't really > see any changes in those branches that would explain a week-old change > in the test runtime. Unfortunately, eelpout got kicked off the nice shiny ARM server it was running on so last week I moved it to a rack mounted Raspberry Pi. It seems to be totally I/O starved causing some timeouts to be reached, and I'm looking into fixing that by adding fast storage. This may take a few days. Sorry for the noise.
Re: Recent eelpout failures on 9.x branches
On Wed, Dec 2, 2020 at 12:07 PM Tom Lane wrote: > I'm also wondering a bit why the issue isn't affecting the newer > branches. It's certainly not because we made the test shorter ... I looked at htop while it was building the 9.x branches and saw pg_basebackup sitting in D state waiting for glacial I/O. Perhaps this was improved by the --no-sync option that got sprinkled around the place in later releases?
Re: Commitfest 2020-11 is closed
On Thu, Dec 3, 2020 at 10:00 AM Tom Lane wrote: > This is actually a bit problematic, because now the cfbot is ignoring > those patches (or if it's not, I don't know where it's displaying the > results). Please go ahead and move the remaining open patches, or > else re-open the CF if that's possible. As of quite recently, Travis CI doesn't seem to like cfbot's rate of build jobs. Recently it's been taking a very long time to post results for new patches and taking so long to get around to retesting patches for bitrot that my "delete old results after a week" logic started wiping out some results while they are still the most recent, leading to the blank spaces where the results are supposed to be. D'oh. I'm looking into a couple of options.
Re: Commitfest 2020-11 is closed
On Thu, Dec 3, 2020 at 10:51 AM David Steele wrote: > On 12/2/20 4:15 PM, Thomas Munro wrote: > > On Thu, Dec 3, 2020 at 10:00 AM Tom Lane wrote: > >> This is actually a bit problematic, because now the cfbot is ignoring > >> those patches (or if it's not, I don't know where it's displaying the > >> results). Please go ahead and move the remaining open patches, or > >> else re-open the CF if that's possible. > > > > As of quite recently, Travis CI doesn't seem to like cfbot's rate of > > build jobs. Recently it's been taking a very long time to post > > results for new patches and taking so long to get around to retesting > > patches for bitrot that the my "delete old results after a week" logic > > started wiping out some results while they are still the most recent, > > leading to the blank spaces where the results are supposed to be. > > D'oh. I'm looking into a couple of options. > > pgBackRest test runs have gone from ~17 minutes to 3-6 hours over the > last two months. Ouch. > Also keep in mind that travis-ci.org will be shut down at the end of the > month. Some users who have migrated to travis-ci.com have complained > that it is not any faster, but I have not tried myself (yet). Oh. > Depending on how you have Github organized migrating to travis-ci.com > may be bit tricky because it requires full access to all private > repositories in your account and orgs administrated by your account. I'm experimenting with Github's built in CI. All other ideas welcome.
Github Actions (CI)
Hi hackers, I'm looking for more horsepower for testing commitfest entries automatically, and today I tried out $SUBJECT. The attached is a rudimentary first attempt, for show-and-tell. If you have a Github account, you just have to push it to a branch there and look at the Actions tab on the web page for the results. Does anyone else have .github files and want to share, to see if we can combine efforts here? The reason for creating three separate "workflows" for Linux, Windows and macOS rather than three separate "jobs" inside one workflow is so that cfbot.cputube.org could potentially get separate pass/fail results for each OS out of the API rather than one combined result. I rather like that feature of cfbot's results. (I could be wrong about needing to do that, this is the first time I've ever looked at this stuff.) The Windows test actually fails right now, exactly as reported by Ranier[1]. It is a release build on a recent MSVC, so I guess that is expected and off-topic for this thread. But generally, .github/workflows/ci-windows.yml is the weakest part of this. It'd be great to get a debug/assertion build, show backtraces when it crashes, run more of the tests, etc etc, but I don't know nearly enough about Windows to do that myself. Another thing is that it uses Choco for flex and bison; it'd be better to find those on the image, if possible. Also, for all 3 OSes, it's not currently attempting to cache build results or anything like that. I'm a bit sad that GH doesn't have FreeBSD build runners. Those are now popping up on other CIs, but I'm not sure if their free/open source tiers have enough resources for cfbot. [1] https://www.postgresql.org/message-id/flat/CAEudQArhn8bH836OB%2B3SboiaeEcgOtrJS58Bki4%3D5yeVqToxgw%40mail.gmail.com From 424bcae7f1fcb1ada0e7046bfc5e0c5254c6f439 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Thu, 3 Dec 2020 10:58:16 +1300 Subject: [PATCH] Github CI WIP --- .github/workflows/ci-linux.yml | 43 ++ .github/workflows/ci-macos.yml | 38 +++ .github/workflows/ci-windows-buildsetup.pl | 35 ++ .github/workflows/ci-windows-dumpregr.pl | 22 +++ .github/workflows/ci-windows.yml | 32 5 files changed, 170 insertions(+) create mode 100644 .github/workflows/ci-linux.yml create mode 100644 .github/workflows/ci-macos.yml create mode 100644 .github/workflows/ci-windows-buildsetup.pl create mode 100644 .github/workflows/ci-windows-dumpregr.pl create mode 100644 .github/workflows/ci-windows.yml diff --git a/.github/workflows/ci-linux.yml b/.github/workflows/ci-linux.yml new file mode 100644 index 00..8ee32770cd --- /dev/null +++ b/.github/workflows/ci-linux.yml @@ -0,0 +1,43 @@ +name: ci-linux +on: [push] +jobs: + ci-linux: +runs-on: ubuntu-latest +steps: +- uses: actions/checkout@v2 +- name: Install packages + run: | +sudo apt-get --yes update +sudo apt-get --yes install gcc libreadline-dev flex bison make perl libipc-run-perl clang llvm-dev libperl-dev libpython-dev tcl-dev libldap2-dev libicu-dev docbook-xml docbook-xsl fop libxml2-utils xsltproc krb5-admin-server krb5-kdc krb5-user slapd ldap-utils libssl-dev pkg-config locales-all gdb + env: +DEBIAN_FRONTEND: noninteractive +- name: Configure + run: ./configure --enable-cassert --enable-debug --enable-tap-tests --with-tcl --with-python --with-perl --with-ldap --with-openssl --with-icu --with-llvm +- name: Build + run: | +echo "COPT=-Wall -Werror" > src/Makefile.custom +make -s -j3 +- name: Check world + run: | +echo '/tmp/%e-%s-%p.core' | sudo tee /proc/sys/kernel/core_pattern +ulimit -c unlimited +make -s 
check-world + env: +PG_TEST_EXTRA: "ssl kerberos" +#PG_TEST_EXTRA: "ssl ldap kerberos" why does slapd fail to start? +- name: Look for clues + if: ${{ failure() }} + run: | +# dump useful logs +for F in ` find . -name initdb.log -o -name regression.diffs ` ; do + echo === $F === + head -1000 $F +done +# look for core files and spit out backtraces +for corefile in $(find /tmp/ -name '*.core' 2>/dev/null) ; do + binary=$(gdb -quiet -core $corefile -batch -ex 'info auxv' | grep AT_EXECFN | perl -pe "s/^.*\"(.*)\"\$/\$1/g") + echo dumping $corefile for $binary + gdb --batch --quiet -ex "thread apply all bt full" -ex "quit" $binary $corefile +done +- name: Documentation + run: make -s docs diff --git a/.github/workflows/ci-macos.yml b/.github/workflows/ci-macos.yml new file mode 100644 index 00..2c20a5a279 --- /dev/null +++ b/.github/workflows/ci-
Re: Commitfest 2020-11 is closed
On Thu, Dec 3, 2020 at 12:02 PM Josef Šimánek wrote: > st 2. 12. 2020 v 23:36 odesílatel Andrew Dunstan napsal: > > On 12/2/20 5:13 PM, Thomas Munro wrote: > > > I'm experimenting with Github's built in CI. All other ideas welcome. > > > > I'd look very closely at gitlab. > > I was about to respond before with the same idea. Feel free to ping > regarding another CI. Also there is possibility to move to GitHub > Actions (free open source CI). I got some experience with that as > well. I spent today getting something working on Github just to try it out, and started a new thread[1] about that. I've not tried Gitlab and have no opinion about that; if someone has a working patch similar to that, I'd definitely be interested to take a look. I've looked at some others. For what cfbot is doing (namely: concentrating the work of hundreds into one "account" via hundreds of branches), the spin-up times and concurrency limits are a bit of a problem on many of them. FWIW I think they're all wonderful for offering this service to open source projects and I'm grateful that they do it! [1] https://www.postgresql.org/message-id/flat/CA%2BhUKG%2By_SHVQcU3CPokmJxuHp1niebCjq4XzZizf8SR9ZdQRQ%40mail.gmail.com
Re: SELECT INTO deprecation
Stephen Frost wrote on 02.12.2020 at 18:58: > We should either remove it, or remove the comments that it's deprecated, > not try to make it more deprecated or try to somehow increase the > recommendation to not use it. (I am writing from a "user only" perspective, not a developer) I don't see any warning about the syntax being "deprecated" in the current manual. There is only a note that says that CTAS is "recommended" instead of SELECT INTO, but for me that's something entirely different than "deprecating" it. I personally have nothing against removing it, but I still see it used a lot in questions on various online forums, and I would think that a lot of people would be very unpleasantly surprised if a feature gets removed without any warning (the current "recommendation" does not constitute a deprecation or even removal warning for most people I guess) I would vote for a clear deprecation message as suggested by Peter, but I would add "and will be removed in a future version" to it. Not sure if maybe even back-patching that warning would make sense as well, so that users of older versions also get to see that warning. Then target 15 or 16 as the release for removal, but not 14. Thomas
Re: Github Actions (CI)
On Fri, Dec 4, 2020 at 2:55 AM Josef Šimánek wrote: > Any chance to also share links to failing/passing testing builds? https://github.com/macdice/postgres/runs/1490727809 https://github.com/macdice/postgres/runs/1490727838 However, looking these in a clean/cookieless browser, I see that it won't show you anything useful unless you're currently logged in to Github, so that's a major point against using it for cfbot (it's certainly not my intention to make cfbot only useful to people who have Github accounts).
Re: Index Skip Scan (new UniqueKeys)
On Wed, Dec 2, 2020 at 9:59 AM Heikki Linnakangas wrote: > On 01/12/2020 22:21, Dmitry Dolgov wrote: > >> On Mon, Nov 30, 2020 at 04:42:20PM +0200, Heikki Linnakangas wrote: > >> - I'm surprised you need a new index AM function (amskip) for this. Can't > >> you just restart the scan with index_rescan()? The btree AM can check if > >> the > >> new keys are on the same page, and optimize the rescan accordingly, like > >> amskip does. That would speed up e.g. nested loop scans too, where the keys > >> just happen to be clustered. > > > > An interesting point. At the moment I'm not sure whether it's possible > > to implement skipping via index_rescan or not, need to take a look. But > > checking if the new keys are on the same page would introduce some > > overhead I guess, wouldn't it be too invasive to add it into already > > existing btree AM? > > I think it'll be OK. But if it's not, you could add a hint argument to > index_rescan() to hint the index AM that the new key is known to be > greater than the previous key. FWIW here's what I wrote about that years ago[1]: > It works by adding a new index operation 'skip' which the executor > code can use during a scan to advance to the next value (for some > prefix of the index's columns). That may be a terrible idea and > totally unnecessary... but let me explain my > reasoning: > > 1. Perhaps some index implementations can do something better than a > search for the next key value from the root. Is it possible or > desirable to use the current position as a starting point for a btree > traversal? I don't know. > > 2. It seemed that I'd need to create a new search ScanKey to use the > 'rescan' interface for skipping to the next value, but I already had > an insertion ScanKey so I wanted a way to just reuse that. But maybe > there is some other way to reuse existing index interfaces, or maybe > there is an easy way to make a new search ScanKey from the existing >insertion ScanKey? [1] https://www.postgresql.org/message-id/CADLWmXWALK8NPZqdnRQiPnrzAnic7NxYKynrkzO_vxYr8enWww%40mail.gmail.com
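For what it's worth, the hint-argument idea would presumably amount to something like the following at the interface level; purely illustrative, this signature exists nowhere and the name is invented:

    /* Existing interface: */
    void        index_rescan(IndexScanDesc scan,
                             ScanKey keys, int nkeys,
                             ScanKey orderbys, int norderbys);

    /* Hypothetical variant carrying a "keys only moved forward" hint, so
     * that e.g. btree can try the current page before searching again from
     * the root: */
    void        index_rescan_forward(IndexScanDesc scan,
                                     ScanKey keys, int nkeys,
                                     ScanKey orderbys, int norderbys,
                                     bool keys_advanced);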
Re: Blocking I/O, async I/O and io_uring
On Tue, Dec 8, 2020 at 3:56 PM Craig Ringer wrote: > I thought I'd start the discussion on this and see where we can go with it. > What incremental steps can be done to move us toward parallelisable I/O > without having to redesign everything? > > I'm thinking that redo is probably a good first candidate. It doesn't depend > on the guts of the executor. It is much less sensitive to ordering between > operations in shmem and on disk since it runs in the startup process. And it > hurts REALLY BADLY from its single-threaded blocking approach to I/O - as > shown by an extension written by 2ndQuadrant that can double redo performance > by doing read-ahead on btree pages that will soon be needed. About the redo suggestion: https://commitfest.postgresql.org/31/2410/ does exactly that! It currently uses POSIX_FADV_WILLNEED because that's what PrefetchSharedBuffer() does, but when combined with a "real AIO" patch set (see earlier threads and conference talks on this by Andres) and a few small tweaks to control batching of I/O submissions, it does exactly what you're describing. I tried to keep the WAL prefetcher project entirely disentangled from the core AIO work, though, hence the "poor man's AIO" for now.
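Concretely, the "poor man's AIO" there is just the advice call below, issued for blocks that upcoming WAL records reference; a minimal standalone sketch rather than the actual PrefetchSharedBuffer() path:

    #include <fcntl.h>

    /*
     * Ask the kernel to start reading a range we'll need soon, without
     * blocking; a later pread() of the same range should then hit cache.
     */
    static void
    hint_future_read(int fd, off_t offset, off_t len)
    {
    #ifdef POSIX_FADV_WILLNEED
        (void) posix_fadvise(fd, offset, len, POSIX_FADV_WILLNEED);
    #endif
    }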
Re: Commitfest 2020-11 is closed
On Fri, Dec 4, 2020 at 1:29 AM Peter Eisentraut wrote: > On 2020-12-02 23:13, Thomas Munro wrote: > > I'm experimenting with Github's built in CI. All other ideas welcome. > > You can run Linux builds on AppVeyor, too. Yeah. This would be the easiest change to make, because I already have configuration files, cfbot already knows how to talk to Appveyor to collect results, and the build results are public URLs. So I'm leaning towards this option in the short term, so that cfbot keeps working after December 31. We can always review the options later. Appveyor's free-for-open-source plan has no cap on minutes, but limits concurrent jobs to 1. Gitlab's free-for-open-source plan is based on metered CI minutes, but cfbot is a bit too greedy for the advertised limits. An annual allotment of 50,000 minutes would run out some time in February assuming we rebase each of ~250 branches every few days. GitHub Actions has very generous resource limits, nice UX and API, etc etc, so it's tempting, but its build log URLs can only be accessed by people with accounts. That limits their usefulness when discussing test failures on our public mailing list. I thought about teaching cfbot to pull down the build logs and host them on its own web server, but that feels like going against the grain.
Re: Autovacuum worker doesn't immediately exit on postmaster death
On Fri, Dec 11, 2020 at 8:34 AM Robert Haas wrote: > On Thu, Oct 29, 2020 at 5:36 PM Alvaro Herrera > wrote: > > Maybe instead of thinking specifically in terms of vacuum, we could > > count buffer accesses (read from kernel) and check the latch once every > > 1000th such, or something like that. Then a very long query doesn't > > have to wait until it's run to completion. The cost is one integer > > addition per syscall, which should be bearable. > > Interesting idea. One related case is where everything is fine on the > server side but the client has disconnected and we don't notice that > the socket has changed state until something makes us try to send a > message to the client, which might be a really long time if the > server's doing like a lengthy computation before generating any rows. > It would be really nice if we could find a cheap way to check for both > postmaster death and client disconnect every now and then, like if a > single system call could somehow answer both questions. For the record, an alternative approach was proposed[1] that periodically checks for disconnected sockets using a timer, which will then cause the next CFI() to abort. Doing the check (a syscall) based on elapsed time rather than every nth CFI() or buffer access or whatever seems better in some ways, considering the difficulty of knowing what the frequency will be. One of the objections was that it added unacceptable setitimer() calls. We discussed an idea to solve that problem generally, and then later I prototyped that idea in another thread[2] about idle session timeouts (not sure about that yet, comments welcome). I've also wondered about checking postmaster_possibly_dead in CFI() on platforms where we have it (and working to increase that set of platforms), instead of just reacting to PM death when sleeping. But it seems like the real problem in this specific case is the use of pg_usleep() where WaitLatch() should be used, no? The recovery loop is at the opposite end of the spectrum: while vacuum doesn't check for postmaster death often enough, the recovery loop checks potentially hundreds of thousands or millions of times per second, which sucks on systems that don't have parent-death signals and slows down recovery quite measurably. In the course of the discussion about fixing that[3] we spotted other places that are using a pg_usleep() where they ought to be using WaitLatch() (which comes with exit-on-PM-death behaviour built-in). By the way, the patch in that thread does almost what Robert described, namely check for PM death every nth time (which in this case means every nth WAL record), except it's not in the main CFI(), it's in a special variant used just for recovery. [1] https://www.postgresql.org/message-id/flat/77def86b27e41f0efcba411460e929ae%40postgrespro.ru [2] https://www.postgresql.org/message-id/flat/763a0689-f189-459e-946f-f0ec44589...@hotmail.com [3] https://www.postgresql.org/message-id/flat/CA+hUKGK1607VmtrDUHQXrsooU=ap4g4r2yaobywooa3m8xe...@mail.gmail.com
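Alvaro's counting idea quoted at the top would amount to something like this at the read-from-kernel path; a sketch only, with the threshold and where exactly it would live invented for illustration:

    static uint64 reads_since_check = 0;

    static inline void
    CheckPostmasterEveryNthRead(void)
    {
        /* one integer addition per read; a real check only every 1000th */
        if (++reads_since_check >= 1000)
        {
            reads_since_check = 0;
            if (!PostmasterIsAlive())
                proc_exit(1);
        }
    }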
Re: Cache relation sizes?
Hi Andy, On Thu, Dec 17, 2020 at 7:29 PM Andy Fan wrote: > I spent one day studying the patch and I want to talk about one question for > now. > What is the purpose of calling smgrimmedsync to evict a DIRTY sr (what will > happen > if we remove it and the SR_SYNCING and SR_JUST_DIRTIED flags)? The underlying theory of the patch is that after fsync() succeeds, the new size is permanent, but before that it could change at any time because of asynchronous activities in the kernel*. Therefore, if you want to evict the size from shared memory, you have to fsync() the file first. I used smgrimmedsync() to do that. *I suspect that it can really only shrink with network file systems such as NFS and GlusterFS, where the kernel has to take liberties to avoid network round trips for every file extension, but I don't know the details on all 9 of our supported OSes and standards aren't very helpful for understanding buffered I/O.
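In other words, the eviction path has to follow roughly this rule (pseudo-code sketch with made-up helper names, not the patch's actual functions):

/* Sketch only: evicting a shared relation-size entry safely. */
static void
evict_cached_size(SMgrRelation reln, ForkNumber forknum)
{
    if (cached_size_is_dirty(reln, forknum))    /* hypothetical helper */
        smgrimmedsync(reln, forknum);           /* size is durable only after fsync() succeeds */

    /* Now the kernel has promised not to forget the new size. */
    forget_cached_size(reln, forknum);          /* hypothetical helper */
}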
pg_preadv() and pg_pwritev()
Hello hackers, I want to be able to do synchronous vectored file I/O, so I made wrapper macros for preadv() and pwritev() with fallbacks for systems that don't have them. Following the precedent of the pg_pread() and pg_pwrite() macros, the "pg_" prefix reflects a subtle contract change: the fallback paths might have the side effect of changing the file position. They're non-standard system calls, but the BSDs and Linux have had them for a long time, and for other systems we can use POSIX readv()/writev() with an additional lseek(). The worst case is Windows (and maybe our favourite antique Unix build farm animal?) which has none of those things, so there is a further fallback to a loop. Windows does have ReadFileScatter() and WriteFileGather(), but those only work for overlapped (= asynchronous), unbuffered, page aligned access. They'll very likely be useful for native AIO+DIO support in the future, but don't fit the bill here. This is part of a project to consolidate and offload I/O (about which more soon), but seemed isolated enough to post separately and I guess it could be independently useful. From 4f4e26d0e2c023d62865365e3cb5ce1c89e63cb0 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Thu, 26 Nov 2020 15:48:31 +1300 Subject: [PATCH 07/11] Add pg_preadv() and pg_pwritev(). Provide synchronous scatter/gather I/O routines. These map to preadv(), pwritev() with various fallbacks for systems that don't have them. --- configure | 30 ++-- configure.ac | 9 +++-- src/include/pg_config.h.in | 15 ++ src/include/port.h | 37 ++ src/port/Makefile | 2 ++ src/port/pread.c | 41 -- src/port/pwrite.c | 41 -- src/tools/msvc/Solution.pm | 5 + 8 files changed, 146 insertions(+), 34 deletions(-) diff --git a/configure b/configure index cc2089e596..d25b8f2900 100755 --- a/configure +++ b/configure @@ -13199,7 +13199,7 @@ $as_echo "#define HAVE_STDBOOL_H 1" >>confdefs.h fi -for ac_header in atomic.h copyfile.h execinfo.h getopt.h ifaddrs.h langinfo.h mbarrier.h poll.h sys/epoll.h sys/event.h sys/ipc.h sys/prctl.h sys/procctl.h sys/pstat.h sys/resource.h sys/select.h sys/sem.h sys/shm.h sys/sockio.h sys/tas.h sys/un.h termios.h ucred.h wctype.h +for ac_header in atomic.h copyfile.h execinfo.h getopt.h ifaddrs.h langinfo.h mbarrier.h poll.h sys/epoll.h sys/event.h sys/ipc.h sys/prctl.h sys/procctl.h sys/pstat.h sys/resource.h sys/select.h sys/sem.h sys/shm.h sys/sockio.h sys/tas.h sys/uio.h sys/un.h termios.h ucred.h wctype.h do : as_ac_Header=`$as_echo "ac_cv_header_$ac_header" | $as_tr_sh` ac_fn_c_check_header_mongrel "$LINENO" "$ac_header" "$as_ac_Header" "$ac_includes_default" @@ -15293,7 +15293,7 @@ fi LIBS_including_readline="$LIBS" LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'` -for ac_func in backtrace_symbols clock_gettime copyfile fdatasync getifaddrs getpeerucred getrlimit kqueue mbstowcs_l memset_s poll posix_fallocate ppoll pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open strchrnul strsignal symlink sync_file_range uselocale wcstombs_l +for ac_func in backtrace_symbols clock_gettime copyfile fdatasync getifaddrs getpeerucred getrlimit kqueue mbstowcs_l memset_s poll posix_fallocate ppoll pread preadv pstat pthread_is_threaded_np pwrite pwritev readlink readv setproctitle setproctitle_fast setsid shm_open strchrnul strsignal symlink sync_file_range uselocale wcstombs_l writev do : as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh` ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var" @@ -15970,32 +15970,6 @@ esac fi -ac_fn_c_check_func 
"$LINENO" "pread" "ac_cv_func_pread" -if test "x$ac_cv_func_pread" = xyes; then : - $as_echo "#define HAVE_PREAD 1" >>confdefs.h - -else - case " $LIBOBJS " in - *" pread.$ac_objext "* ) ;; - *) LIBOBJS="$LIBOBJS pread.$ac_objext" - ;; -esac - -fi - -ac_fn_c_check_func "$LINENO" "pwrite" "ac_cv_func_pwrite" -if test "x$ac_cv_func_pwrite" = xyes; then : - $as_echo "#define HAVE_PWRITE 1" >>confdefs.h - -else - case " $LIBOBJS " in - *" pwrite.$ac_objext "* ) ;; - *) LIBOBJS="$LIBOBJS pwrite.$ac_objext" - ;; -esac - -fi - ac_fn_c_check_func "$LINENO" "random" "ac_cv_func_random" if test "x$ac_cv_func_random" = xyes; then : $as_echo "#define HAVE_RANDOM 1" >>confdefs.h diff --git a/configure.ac b/configure.ac index c819aeab26..9c21e46ec8 100644 ---
Re: pg_preadv() and pg_pwritev()
On Sun, Dec 20, 2020 at 12:34 PM Tom Lane wrote: > Thomas Munro writes: > > I want to be able to do synchronous vectored file I/O, so I made > > wrapper macros for preadv() and pwritev() with fallbacks for systems > > that don't have them. Following the precedent of the pg_pread() and > > pg_pwrite() macros, the "pg_" prefix reflects a subtle contract > > change: the fallback paths might have the side effect of changing the > > file position. > > In a quick look, seems OK with some nits: Thanks for looking! > 1. port.h cannot assume that has already been included; > nor do I want to fix that by including there. Do we > really need to define a fallback value of IOV_MAX? If so, > maybe the answer is to put the replacement struct iovec and > IOV_MAX in some new header. Ok, I moved all this stuff into port/pg_uio.h. > 2. I'm not really that happy about loading into > every compilation we do, which would be another reason for a > new specialized header that either includes or > provides fallback definitions. Ack. > 3. The patch as given won't prove anything except that the code > compiles. Is it worth fixing at least one code path to make > use of pg_preadv and pg_pwritev, so we can make sure this code > is tested before there's layers of other new code on top? OK, here's a patch to zero-fill fresh WAL segments with pwritev(). I'm drawing a blank on trivial candidate uses for preadv(), without infrastructure from later patches. From d7178f296642e177bc57fabe93abec395cbaac5f Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Thu, 26 Nov 2020 15:48:31 +1300 Subject: [PATCH v2 1/2] Add pg_preadv() and pg_pwritev(). Provide synchronous scatter/gather I/O routines. These map to preadv(), pwritev() with various fallbacks for systems that don't have them. Reviewed-by: Tom Lane Discussion: https://postgr.es/m/CA%2BhUKGJA%2Bu-220VONeoREBXJ9P3S94Y7J%2BkqCnTYmahvZJwM%3Dg%40mail.gmail.com --- configure | 30 ++ configure.ac | 9 +-- src/include/pg_config.h.in | 15 +++ src/include/port.h | 2 ++ src/include/port/pg_uio.h | 51 ++ src/port/Makefile | 2 ++ src/port/pread.c | 43 ++-- src/port/pwrite.c | 43 ++-- src/tools/msvc/Solution.pm | 5 9 files changed, 166 insertions(+), 34 deletions(-) create mode 100644 src/include/port/pg_uio.h diff --git a/configure b/configure index 11a4284e5b..5887c712cc 100755 --- a/configure +++ b/configure @@ -13061,7 +13061,7 @@ $as_echo "#define HAVE_STDBOOL_H 1" >>confdefs.h fi -for ac_header in atomic.h copyfile.h execinfo.h getopt.h ifaddrs.h langinfo.h mbarrier.h poll.h sys/epoll.h sys/event.h sys/ipc.h sys/prctl.h sys/procctl.h sys/pstat.h sys/resource.h sys/select.h sys/sem.h sys/shm.h sys/sockio.h sys/tas.h sys/un.h termios.h ucred.h wctype.h +for ac_header in atomic.h copyfile.h execinfo.h getopt.h ifaddrs.h langinfo.h mbarrier.h poll.h sys/epoll.h sys/event.h sys/ipc.h sys/prctl.h sys/procctl.h sys/pstat.h sys/resource.h sys/select.h sys/sem.h sys/shm.h sys/sockio.h sys/tas.h sys/uio.h sys/un.h termios.h ucred.h wctype.h do : as_ac_Header=`$as_echo "ac_cv_header_$ac_header" | $as_tr_sh` ac_fn_c_check_header_mongrel "$LINENO" "$ac_header" "$as_ac_Header" "$ac_includes_default" @@ -15155,7 +15155,7 @@ fi LIBS_including_readline="$LIBS" LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'` -for ac_func in backtrace_symbols clock_gettime copyfile fdatasync getifaddrs getpeerucred getrlimit kqueue mbstowcs_l memset_s poll posix_fallocate ppoll pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open strchrnul strsignal symlink 
sync_file_range uselocale wcstombs_l +for ac_func in backtrace_symbols clock_gettime copyfile fdatasync getifaddrs getpeerucred getrlimit kqueue mbstowcs_l memset_s poll posix_fallocate ppoll pread preadv pstat pthread_is_threaded_np pwrite pwritev readlink readv setproctitle setproctitle_fast setsid shm_open strchrnul strsignal symlink sync_file_range uselocale wcstombs_l writev do : as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh` ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var" @@ -15832,32 +15832,6 @@ esac fi -ac_fn_c_check_func "$LINENO" "pread" "ac_cv_func_pread" -if test "x$ac_cv_func_pread" = xyes; then : - $as_echo "#define HAVE_PREAD 1" >>confdefs.h - -else - case " $LIBOBJS " in - *" pread.$ac_objext "* ) ;; - *) LIBOBJS="$LIBOBJS pread.$ac_objext" - ;; -esac - -fi - -ac_fn_c_check_func &
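For anyone skimming past the attachment: the zero-fill change amounts to roughly this (paraphrased sketch, error handling elided; see the patch for the real thing):

char        zbuffer[XLOG_BLCKSZ] = {0};
struct iovec iov[Min(IOV_MAX, 1024)];    /* cap stack space */
int         blocks = wal_segment_size / XLOG_BLCKSZ;

/* Every iovec points at the same zeroed block. */
for (int i = 0; i < (int) lengthof(iov); ++i)
{
    iov[i].iov_base = zbuffer;
    iov[i].iov_len = XLOG_BLCKSZ;
}

for (int block = 0; block < blocks;)
{
    int         iovcnt = Min(blocks - block, (int) lengthof(iov));
    off_t       offset = (off_t) block * XLOG_BLCKSZ;

    if (pg_pwritev(fd, iov, iovcnt, offset) < 0)
        break;                  /* caller reports errno and unlinks the file */
    block += iovcnt;
}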
Re: pg_preadv() and pg_pwritev()
On Sun, Dec 20, 2020 at 8:07 PM Tom Lane wrote: > One minor thought is that in > > + struct iovec iov[Min(IOV_MAX, 1024)]; /* cap stack space */ > > it seems like pretty much every use of IOV_MAX would want some > similar cap. Should we centralize that idea with, say, > > #define PG_IOV_MAX Min(IOV_MAX, 1024) > > ? Or will the plausible cap vary across uses? Hmm. For the real intended user of this, namely worker processes that simulate AIO when native AIO isn't available, higher level code will limit the iov count to much smaller numbers anyway. It wants to try to stay under typical device limits for vectored I/O, because split requests would confound attempts to model and limit queue depth and control latency. In Andres's AIO prototype he currently has a macro PGAIO_MAX_COMBINE set to 16 (meaning approximately 16 data block or wal reads/writes = 128KB worth of scatter/gather per I/O request); I guess it should really be Min(IOV_MAX, ), but I don't currently have an opinion on the , except that it should surely be closer to 16 than 1024 (for example /sys/block/nvme0n1/queue/max_segments is 33 here). I mention all this to explain that I don't think the code in patch 0002 is going to turn out to be very typical: it's trying to minimise system calls by staying under an API limit (though I cap it for allocation sanity), whereas more typical code probably wants to stay under a device limit, so I don't immediately have another use for eg PG_IOV_MAX.
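For reference, the centralised version being suggested would presumably look something like this in a port header (sketch only; the cap value is the debatable part):

#ifndef IOV_MAX
#define IOV_MAX 16    /* fallback; POSIX guarantees at least _XOPEN_IOV_MAX = 16 */
#endif

/* A stack-friendly cap for local iovec arrays. */
#define PG_IOV_MAX Min(IOV_MAX, 32)    /* or 1024, or whatever cap we settle on */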
Re: pg_preadv() and pg_pwritev()
On Mon, Dec 21, 2020 at 8:25 PM Jakub Wartak wrote: > > > I'm drawing a blank on trivial candidate uses for preadv(), without > > > infrastructure from later patches. > > > > Can't immediately think of something either. > > This might be not that trivial , but maybe acquire_sample_rows() from > analyze.c ? > > Please note however there's patch > https://www.postgresql.org/message-id/20201109180644.GJ16415%40tamriel.snowman.net > ( https://commitfest.postgresql.org/30/2799/ ) for adding fadvise, but maybe > those two could be even combined so you would be doing e.g. 16x fadvise() and > then grab 8 pages in one preadv() call ? I'm find unlikely however that > preadv give any additional performance benefit there after having fadvise, > but clearly a potential place to test. Oh, interesting, that looks like another test case to study with the AIO patch set, but I don't think it's worth trying to do a simpler/half-baked version in the meantime. (Since that ANALYZE patch uses PrefetchBuffer() it should automatically benefit: the posix_fadvise() calls will be replaced with consolidated preadv() calls in a worker process or native AIO equivalent so that system calls are mostly gone from the initiating process, and by the time you try to access the buffer it'll hopefully see that it's finished without any further system calls. Refinements are possible though, like making use of recent_buffer to avoid double-lookup, and tuning/optimisation for how often IOs should be consolidated and submitted.)
Re: pg_preadv() and pg_pwritev()
On Mon, Dec 21, 2020 at 11:40 AM Andres Freund wrote: > On 2020-12-20 16:26:42 +1300, Thomas Munro wrote: > > > 1. port.h cannot assume that has already been included; > > > nor do I want to fix that by including there. Do we > > > really need to define a fallback value of IOV_MAX? If so, > > > maybe the answer is to put the replacement struct iovec and > > > IOV_MAX in some new header. > > > > Ok, I moved all this stuff into port/pg_uio.h. > > Can we come up with a better name than 'uio'? I find that a not exactly > meaningful name. Ok, let's try port/pg_iovec.h. > Or perhaps we could just leave the functions in port/port.h, but extract > the value of the define at configure time? That way only pread.c / > pwrite.c would need it. That won't work for the struct definition, so client code would need to remember to do: #ifdef HAVE_SYS_UIO_H #include #endif ... which is a little tedious, or port.h would need to do that and incur an overhead in every translation unit, which Tom objected to. That's why I liked the separate header idea. > > > 3. The patch as given won't prove anything except that the code > > > compiles. Is it worth fixing at least one code path to make > > > use of pg_preadv and pg_pwritev, so we can make sure this code > > > is tested before there's layers of other new code on top? > > > > OK, here's a patch to zero-fill fresh WAL segments with pwritev(). > > I think that's a good idea. However, I see two small issues: 1) If we > write larger amounts at once, we need to handle partial writes. That's a > large enough amount of IO that it's much more likely to hit a memory > shortage or such in the kernel - we had to do that a while a go for WAL > writes (which can also be larger), if memory serves. > > Perhaps we should have pg_pwritev/readv unconditionally go through > pwrite.c/pread.c and add support for "continuing" partial writes/reads > in one central place? Ok, here's a new version with the following changes: 1. Define PG_IOV_MAX, a reasonable size for local variables, not larger than IOV_MAX. 2 Use 32 rather than 1024, based on off-list complaint about 1024 potentially swamping the IO system unfairly. 3. Add a wrapper pg_pwritev_retry() to retry automatically on short writes. (I didn't write pg_preadv_retry(), because I don't currently need it for anything so I didn't want to have to think about EOF vs short-reads-for-implementation-reasons.) 4. I considered whether pg_pwrite() should have built-in retry instead of a separate wrapper, but I thought of an argument against hiding the "raw" version: the AIO patch set already understands short reads/writes and knows how to retry at a higher level, as that's needed for native AIO too, so I think it makes sense to be able to keep things the same and not solve the same problem twice. A counter argument would be that you could get the retry underway sooner with a tight loop, but I'm not expecting this to be common. > > I'm drawing a blank on trivial candidate uses for preadv(), without > > infrastructure from later patches. > > Can't immediately think of something either. One idea I had for the future is for xlogreader.c to read the WAL into a large multi-page circular buffer, which could wrap around using a pair of iovecs, but that requires lots more work. From 3da0b25b4ce0804355f912bd026ef9c9eee146f3 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Thu, 26 Nov 2020 15:48:31 +1300 Subject: [PATCH v3 1/2] Add pg_preadv() and pg_pwritev(). Provide synchronous scatter/gather I/O routines. 
These map to preadv(), pwritev() with various fallbacks for systems that don't have them. Also provide a wrapper pg_pwritev_retry() which automatically retries on short write. Reviewed-by: Tom Lane Reviewed-by: Andres Freund Discussion: https://postgr.es/m/CA%2BhUKGJA%2Bu-220VONeoREBXJ9P3S94Y7J%2BkqCnTYmahvZJwM%3Dg%40mail.gmail.com --- configure | 30 +- configure.ac| 9 ++- src/include/pg_config.h.in | 15 + src/include/port.h | 2 + src/include/port/pg_iovec.h | 57 +++ src/port/Makefile | 2 + src/port/pread.c| 43 +- src/port/pwrite.c | 109 +++- src/tools/msvc/Solution.pm | 5 ++ 9 files changed, 238 insertions(+), 34 deletions(-) create mode 100644 src/include/port/pg_iovec.h diff --git a/configure b/configure index 11a4284e5b..5887c712cc 100755 --- a/configure +++ b/configure @@ -13061,7 +13061,7 @@ $as_echo "#define HAVE_STDBOOL_H 1" >>confdefs.h fi -for ac_header in atomic.h copyfile.h execinfo.h getopt.h ifaddrs.h langinfo.h mbarrier.h poll.h sys
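For reviewers who don't want to open the attachment, the retry logic in the new wrapper amounts to roughly this (condensed sketch, not the exact patch text; the zero-length-write case is glossed over):

ssize_t
pg_pwritev_retry(int fd, const struct iovec *iov, int iovcnt, off_t offset)
{
    struct iovec iov_copy[PG_IOV_MAX];
    ssize_t     sum = 0;

    if (iovcnt > PG_IOV_MAX)
    {
        errno = EINVAL;
        return -1;
    }

    for (;;)
    {
        ssize_t     part = pg_pwritev(fd, iov, iovcnt, offset);

        if (part < 0)
            return -1;
        sum += part;
        offset += part;

        /* Step over iovecs that were written out completely. */
        while (iovcnt > 0 && (size_t) part >= iov->iov_len)
        {
            part -= iov->iov_len;
            ++iov;
            --iovcnt;
        }
        if (iovcnt == 0)
            return sum;         /* all done */

        /* Adjust the partially written iovec in a copy we're allowed to modify. */
        memmove(iov_copy, iov, sizeof(struct iovec) * iovcnt);
        iov_copy[0].iov_base = (char *) iov_copy[0].iov_base + part;
        iov_copy[0].iov_len -= part;
        iov = iov_copy;
    }
}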
Re: Cache relation sizes?
On Thu, Dec 17, 2020 at 10:22 PM Andy Fan wrote: > Let me try to understand your point. Suppose process 1 extends a file to > 2 blocks from 1 block, and fsync is not called, then a). the lseek *may* still > return 1 based on the comments in the ReadBuffer_common ("because > of buggy Linux kernels that sometimes return an seek SEEK_END result > that doesn't account for a recent write"). b). When this patch is introduced, > we would get 2 blocks for sure if we can get the size from cache. c). After > user reads the 2 blocks from cache and then the cache is evicted, and user > reads the blocks again, it is possible to get 1 again. > > Do we need to consider case a) in this patch? and Is case c). the situation > you > want to avoid and do we have to introduce fsync to archive this? Basically I > think case a) should not happen by practice so we don't need to handle case > c) and finally we don't need the fsync/SR_SYNCING/SR_JUST_DIRTIED flag. > I'm not opposed to adding them, I am just interested in the background in case > you are also willing to discuss it. Sorry for the lack of background -- there was some discussion on the thread "Optimize dropping of relation buffers using dlist", but it's long and mixed up with other topics. I'll try to summarise here. It's easy to reproduce files shrinking asynchronously on network filesystems that reserve space lazily[1], and we know that there have historically been problems even with local filesystems[2], leading to that comment about buggy kernels. I am only interested in the behaviour I can demonstrate, because I believe Linux is working as intended when it does that (hypothetical/unknown bugs would probably be helped by this approach too, but I'm not interested in speculating about that beyond these parentheses). So, why should we consider this? It's true that commit ffae5cc5a60's change to ReadBuffer_common() already prevents us from eating data by overwriting an existing buffer due to this phenomenon, but there are other problems: 1. A system that in this state can give an incorrect answer to a query: heapam.c calls RelationGetNumberOfBlocks() at the beginning of a scan, so it might not see recently added blocks containing committed data. Hopefully we'll panic when we try to create a checkpoint and see errors, but we could be giving bad results for a while. 2. Commitfest entry #2319 needs an accurate size, or it might leave stray blocks in the buffer pool that will cause failures or corruption later. It's true that we could decide not to worry about this, and perhaps even acknowledge in some official way that PostgreSQL doesn't work reliably on some kinds of filesystems. But in this prototype I thought it was fun to try to fix our very high rate of lseek() syscalls and this reliability problem, at the same time. I also speculate that we'll eventually have other reasons to want a demand-paged per-relation shared memory object. > I would suggest the below change for smgrnblock_fast, FYI > > @@ -182,7 +182,11 @@ smgrnblocks_fast(SMgrRelation reln, ForkNumber forknum) > /* > * The generation doesn't match, the shared relation must > have been > * evicted since we got a pointer to it. We'll need to do > more work. > +* and invalid the fast path for next time. > */ > + reln->smgr_shared = NULL; > } Thanks, makes sense -- I'll add this to the next version. 
[1] https://www.postgresql.org/message-id/CA%2BhUKGLZTfKuXir9U4JQkY%3Dk3Tb6M_3toGrGOK_fa2d4MPQQ_w%40mail.gmail.com [2] https://www.postgresql.org/message-id/BAY116-F146A38326C5630EA33B36BD12B0%40phx.gbl
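To spell out the fast path under discussion, it looks approximately like this (field and helper names are approximations, see the patch for the real thing):

BlockNumber
smgrnblocks_fast(SMgrRelation reln, ForkNumber forknum)
{
    SMgrSharedRelation *sr = reln->smgr_shared;

    if (sr != NULL)
    {
        pg_read_barrier();

        /* Only trust the cached value if the SR hasn't been recycled. */
        if (reln->smgr_shared_generation == sr->generation)
            return sr->nblocks[forknum];

        /*
         * The generation doesn't match: the SR was evicted since we cached
         * the pointer, so invalidate the fast path for next time.
         */
        reln->smgr_shared = NULL;
    }
    return smgrnblocks_slow(reln, forknum);    /* slow path: mapping table, maybe lseek() */
}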
Re: Re: Cache relation sizes?
On Wed, Dec 23, 2020 at 1:31 AM 陈佳昕(步真) wrote: > I studied your patch these days and found there might be a problem. > When executing 'drop database', the smgr shared pool will not be removed > because 'smgr_drop_sr' is never called. Function 'dropdb' in dbcommands.c removes > the buffers from the buffer pool and unlinks the real files with 'rmtree'; it doesn't > call smgrdounlinkall, so the smgr shared cache will not be dropped although > the table has been removed. This will cause some errors in smgr_alloc_sr > -> smgropen, smgrimmedsync. The table file has been removed, so smgropen and > smgrimmedsync will get an unexpected result. Hi Buzhen, Thanks, you're right -- it needs to scan the pool of SRs and forget everything from the database you're dropping.

Re: doc review for v14
On Thu, Dec 24, 2020 at 9:12 PM Michael Paquier wrote: > On Mon, Dec 21, 2020 at 10:11:53PM -0600, Justin Pryzby wrote: > > Specifies the amount of memory that should be allocated at server > > -startup time for use by parallel queries. When this memory region > > is > > +startup for use by parallel queries. When this memory region is > > insufficient or exhausted by concurrent queries, new parallel > > queries > > try to allocate extra shared memory temporarily from the operating > > system using the method configured with > > dynamic_shared_memory_type, which may be slower > > due > > to memory management overheads. Memory that is allocated at > > startup > > -time with min_dynamic_shared_memory is affected > > by > > +with min_dynamic_shared_memory is affected by > > the huge_pages setting on operating systems > > where > > that is supported, and may be more likely to benefit from larger > > pages > > on operating systems where that is managed automatically. > > The current formulation is not that confusing, but I agree that this > is an improvement. Thomas, you are behind this one. What do you > think? LGTM.
Re: Parallel Full Hash Join
On Mon, Dec 28, 2020 at 9:49 PM Masahiko Sawada wrote: > On Thu, Nov 5, 2020 at 7:34 AM Melanie Plageman > wrote: > > I've attached a patch with the corrections I mentioned upthread. > > I've gone ahead and run pgindent, though, I can't say that I'm very > > happy with the result. > > > > I'm still not quite happy with the name > > BarrierArriveAndDetachExceptLast(). It's so literal. As you said, there > > probably isn't a nice name for this concept, since it is a function with > > the purpose of terminating parallelism. > > You sent in your patch, v3-0001-Support-Parallel-FOJ-and-ROJ.patch to > pgsql-hackers on Nov 5, but you did not post it to the next > CommitFest[1]. If this was intentional, then you need to take no > action. However, if you want your patch to be reviewed as part of the > upcoming CommitFest, then you need to add it yourself before > 2021-01-01 AOE[2]. Also, rebasing to the current HEAD may be required > as almost two months passed since when this patch is submitted. Thanks > for your contributions. Thanks for this reminder Sawada-san. I had some feedback I meant to post in November but didn't get around to: +bool +BarrierArriveAndDetachExceptLast(Barrier *barrier) I committed this part (7888b099). I've attached a rebase of the rest of Melanie's v3 patch. +WAIT_EVENT_HASH_BATCH_PROBE, That new wait event isn't needed (we can't and don't wait). * PHJ_BATCH_PROBING-- all probe - * PHJ_BATCH_DONE -- end + + * PHJ_BATCH_DONE -- queries not requiring inner fill done + * PHJ_BATCH_FILL_INNER_DONE -- inner fill completed, all queries done Would it be better/tidier to keep _DONE as the final phase? That is, to switch around these two final phases. Or does that make it too hard to coordinate the detach-and-cleanup logic? +/* + * ExecPrepHashTableForUnmatched + * set up for a series of ExecScanHashTableForUnmatched calls + * return true if this worker is elected to do the unmatched inner scan + */ +bool +ExecParallelPrepHashTableForUnmatched(HashJoinState *hjstate) Comment name doesn't match function name. From 9199bfcfa84acbcfeb9a8d3c21962096c7ff645c Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Thu, 5 Nov 2020 16:20:24 +1300 Subject: [PATCH v4 1/2] Parallel Hash Full/Right Join. Previously we allowed PHJ only for inner and left outer joins. Share the hash table match bits between backends, so we can also also handle full and right joins. In order to do that without introducing any deadlock risks, for now we drop down to a single process for the unmatched scan at the end of each batch. Other processes detach and look for another batch to help with. If there aren't any more batches, they'll finish the hash join early, making work distribution suboptimal. Improving that might require bigger executor changes. Also switch the unmatched tuple scan to work in memory-chunk order, rather than bucket order. This prepares for potential later improvements that would use chunks as parallel grain, and seems to be a little cache-friendlier than the bucket-scan scheme scheme in the meantime. 
Author: Melanie Plageman Reviewed-by: Thomas Munro Discussion: https://postgr.es/m/CA%2BhUKG%2BA6ftXPz4oe92%2Bx8Er%2BxpGZqto70-Q_ERwRaSyA%3DafNg%40mail.gmail.com --- src/backend/executor/nodeHash.c | 190 ++-- src/backend/executor/nodeHashjoin.c | 61 src/backend/optimizer/path/joinpath.c | 14 +- src/include/executor/hashjoin.h | 15 +- src/include/executor/nodeHash.h | 3 + src/test/regress/expected/join_hash.out | 56 ++- src/test/regress/sql/join_hash.sql | 23 ++- 7 files changed, 274 insertions(+), 88 deletions(-) diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c index ea69eeb2a1..36cc752163 100644 --- a/src/backend/executor/nodeHash.c +++ b/src/backend/executor/nodeHash.c @@ -510,6 +510,7 @@ ExecHashTableCreate(HashState *state, List *hashOperators, List *hashCollations, hashtable->spaceAllowed * SKEW_HASH_MEM_PERCENT / 100; hashtable->chunks = NULL; hashtable->current_chunk = NULL; + hashtable->current_chunk_idx = 0; hashtable->parallel_state = state->parallel_state; hashtable->area = state->ps.state->es_query_dsa; hashtable->batches = NULL; @@ -2053,9 +2054,58 @@ ExecPrepHashTableForUnmatched(HashJoinState *hjstate) * hj_CurTuple: last tuple returned, or NULL to start next bucket *-- */ + HashJoinTable hashtable = hjstate->hj_HashTable; + hjstate->hj_CurBucketNo = 0; hjstate->hj_CurSkewBucketNo = 0; hjstate->hj_CurTuple = NULL; + hashtable->current_chunk = hashtable->chunks; + hashtable->current_chunk_idx = 0; +} + +/* + * ExecPrepHashTableForUnmatched + * set up for a
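To summarise the coordination for anyone following along, the end-of-probing step in the patch reads approximately like this (abridged sketch):

Barrier    *batch_barrier =
    &hashtable->batches[hashtable->curbatch].shared->batch_barrier;

if (BarrierArriveAndDetachExceptLast(batch_barrier))
{
    /*
     * We were the last worker still attached, so we remain attached and
     * perform the unmatched inner scan alone, avoiding deadlock risk.
     */
    ExecParallelPrepHashTableForUnmatched(hjstate);
}
else
{
    /*
     * We detached: the last remaining worker will handle the unmatched
     * scan.  Move on and try to find another batch to help with, or
     * finish this join early if there are none.
     */
}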
Re: Let's start using setenv()
On Tue, Dec 29, 2020 at 4:21 PM Tom Lane wrote: > Back in 2001 we decided to prefer putenv() over setenv() because the > latter wasn't in POSIX (SUSv2) at the time. That decision has been > overtaken by events: more recent editions of POSIX not only provide > setenv() but recommend it over putenv(). So I think it's time to > change our policy and prefer setenv(). We've had previous discussions > about that but nobody did the legwork yet. > > The attached patch provides the infrastructure to allow using setenv(). > I added a src/port/ implementation of setenv() for any machines that > haven't caught up with POSIX lately. (I've tested that code on my old > dinosaur gaur, and I would not be surprised if that is the only machine > anywhere that ever runs it. But I could be wrong.) I also extended > win32env.c to support setenv(). +1, nice modernisation. > I also changed our unsetenv() emulations to make them return an int > error indicator, as per POSIX. I have no desire to change the call > sites to check for errors, but it seemed like our compatibility stubs > should be compatible with the spec. (Note: I think that unsetenv() > did return void in old BSD versions, before POSIX standardized it. > So that might be a real reason not to change the callers.) +1 for conformance. I suppose it's out of spec that unsetenv() can return -1 with errno = ENOMEM (from malloc()), but that hardly matters. Hmm... could a static buffer be used for that clobbering trick? For the note in parens: it looks like the 3 main BSDs all switched from the historical void function to the POSIX one 12-18 years ago[1][2][3], so I wouldn't worry about that. Glibc made a similar change 19 years ago. [1] https://github.com/NetBSD/src/commit/13dee93fb3a93189a74a3705a5f7cd1c6b290125 [2] https://github.com/openbsd/src/commit/471b62eeaa7f3a18c0aa98b5d605e5cec1625b62 [3] https://github.com/freebsd/freebsd/commit/196b6346ba4e13a3f7679e2de3317b6aa65983df
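For those who haven't looked at the patch yet, the src/port fallback is presumably the classic putenv()-based emulation, along these lines (sketch, not the exact patch text):

int
setenv(const char *name, const char *value, int overwrite)
{
    char       *envstr;

    if (!overwrite && getenv(name) != NULL)
        return 0;

    /* Intentionally leaked on redefinition, as with any putenv()-based emulation. */
    envstr = malloc(strlen(name) + strlen(value) + 2);
    if (envstr == NULL)
        return -1;              /* malloc set errno = ENOMEM */

    sprintf(envstr, "%s=%s", name, value);
    return putenv(envstr);
}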
Re: Let's start using setenv()
On Wed, Dec 30, 2020 at 5:45 AM Tom Lane wrote: > Since the cfbot seems happy with v1, here's a v2 that runs around > and converts all putenv() uses to setenv(). In some places there's > no notational savings, but it seems to me that standardizing > on just one way to do it is beneficial. +if (setenv("KRB5_KTNAME", pg_krb_server_keyfile, 0) != 0) { -size_tkt_len = strlen(pg_krb_server_keyfile) + 14; -char *kt_path = malloc(kt_len); - -if (!kt_path || -snprintf(kt_path, kt_len, "KRB5_KTNAME=%s", - pg_krb_server_keyfile) != kt_len - 2 || -putenv(kt_path) != 0) -{ -ereport(LOG, -(errcode(ERRCODE_OUT_OF_MEMORY), - errmsg("out of memory"))); -return STATUS_ERROR; -} +/* We assume the error must be "out of memory" */ +ereport(LOG, +(errcode(ERRCODE_OUT_OF_MEMORY), + errmsg("out of memory"))); +return STATUS_ERROR; } Wouldn't it be better to do "cannot set environment: %m" or similar, and let ENOMEM do its thing if that be the cause? It's true that POSIX only allows EINVAL or ENOMEM and we can see no reason for EINVAL here, but still it seems unnecessary to assume. -if (getenv("PGLOCALEDIR") == NULL) -{ -/* set for libpq to use */ -snprintf(env_path, sizeof(env_path), "PGLOCALEDIR=%s", path); -canonicalize_path(env_path + 12); -dup_path = strdup(env_path); -if (dup_path) -putenv(dup_path); -} +/* set for libpq to use, but don't override existing setting */ +setenv("PGLOCALEDIR", path, 0); Did you want to drop the canonicalize_path() logic here and nearby?
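To be concrete about that first hunk, the suggestion amounts to roughly this (sketch):

if (setenv("KRB5_KTNAME", pg_krb_server_keyfile, 0) != 0)
{
    /* Don't assume the cause; %m reports whatever errno setenv() left behind. */
    ereport(LOG,
            (errmsg("cannot set environment: %m")));
    return STATUS_ERROR;
}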
Re: Cache relation sizes?
On Mon, Dec 28, 2020 at 5:24 PM Andy Fan wrote: > lseek(..., SEEK_END) = 9670656 > write(...) = 8192 > lseek(..., SEEK_END) = 9678848 > fsync(...) = -1 > lseek(..., SEEK_END) = 9670656 > > I got 2 information from above. a) before the fsync, the lseek(fd, 0, > SEEK_END) > returns a correct value, however after the fsync, it returns a wrong value. > b). > the fsync failed(return values is -1) in the above case. I'm more confused > because of point a). Since the fsync can't correct some wrong value, what > is the point to call fsync in this patch (I agree that it is good to fix this > reliability problem within this patch, but I'm doubtful if fsync can help in > this case). Am I missing something obviously? The point of the fsync() call isn't to correct the value, it's to find out if it is safe to drop the SR object from shared memory because the operating system has promised that it won't forget about the new size. If that fails, we'll report an ERROR or PANIC (depending on data_sync_retry), but not drop the SR. By the way, you don't need fsync(fd) for the size to change, as shown by the other experiment in that other thread that just did sleep(60). It might also happen asynchronously. fsync(fd) forces it, but also tells you about errors. > I read your patch with details some days ago and feel it is in pretty good > shape. > and I think you are clear about the existing issues very well. like a). > smgr_alloc_sr use a > FIFO design. b). SR_PARTITION_LOCK doesn't use a partition lock really. c). > and Yeah, it probably should have its own bank of locks (I wish they were easier to add, via lwlocknames.txt or similar). > looks like you have some concern about the concurrency issue, which I didn't > find > anything wrong. d). how to handle the DIRTY sr during evict. I planned to > recheck So yeah, mostly this discussion has been about not trusting lseek() and using our own tracking of relation size instead. But there is a separate question about how "fresh" the value needs to be. The question is: under what circumstances could the unlocked read of nblocks from shared memory give you a value that isn't fresh enough for your snapshot/scan? This involves thinking about implicit memory barriers. The idea of RECENTLY_DIRTIED is similar to what we do for buffers: while you're trying to "clean" it (in this case by calling fsync()) someone else can extend the relation again, and in that case we don't know whether the new extension was included in the fsync() or not, so we can't clear the DIRTY flag when it completes. > item c) before this reply, but looks like the time is not permitted. May I > know what > Is your plan about this patch? Aside from details, bugs etc discussed in this thread, one major piece remains incomplete: something in the background to "clean" SRs so that foreground processes rarely have to block.
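The "clean while someone may re-dirty" protocol is roughly this (sketch using the patch's flag names, details simplified):

/* Mark the fork as being synced, then do the potentially slow part unlocked. */
flags = smgr_lock_sr(sr);
flags |= SR_SYNCING(forknum);
smgr_unlock_sr(sr, flags);

smgrimmedsync(reln, forknum);       /* others may extend the relation meanwhile */

flags = smgr_lock_sr(sr);
flags &= ~SR_SYNCING(forknum);
if (flags & SR_JUST_DIRTIED(forknum))
{
    /* Extended during the fsync(): must stay dirty, try again later. */
}
else
    flags &= ~SR_DIRTY(forknum);    /* provably durable: safe to evict now */
smgr_unlock_sr(sr, flags);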
Re: Re: Re: Cache relation sizes?
On Wed, Dec 30, 2020 at 4:13 AM 陈佳昕(步真) wrote: > I found some other problems which I want to share my change with you to make > you confirm. > <1> I changed the code in smgr_alloc_sr to avoid dead lock. > > LWLockAcquire(mapping_lock, LW_EXCLUSIVE); > flags = smgr_lock_sr(sr); > Assert(flags & SR_SYNCING(forknum)); > + flags &= ~SR_SYNCING(forknum); > if (flags & SR_JUST_DIRTIED(forknum)) > { >/* > * Someone else dirtied it while we were syncing, so we can't mark > * it clean. Let's give up on this SR and go around again. > */ >smgr_unlock_sr(sr, flags); >LWLockRelease(mapping_lock); >goto retry; > } > /* This fork is clean! */ > - flags &= ~SR_SYNCING(forknum); > flags &= ~SR_DIRTY(forknum); > } > > In smgr_drop_sr, if the sr is SR_SYNCING, it will retry until the sr is not > SR_SYNCING. But in smgr_alloc_sr, if the sr is SR_JUST_DIRTIED, it will > retry to get another sr, although it has been synced by smgrimmedsync, the > flag SR_SYNCING doesn't changed. This might cause dead lock. So I changed > your patch as above. Right. Thanks! I also added smgrdropdb() to handle DROP DATABASE (the problem you reported in your previous email). While tweaking that code, I fixed it so that it uses a condition variable to wait (instead of the silly sleep loop) when it needs to drop an SR that is being sync'd. Also, it now releases the mapping lock while it's doing that, and requires it on retry. > <2> I changed the code in smgr_drop_sr to avoid some corner cases > /* Mark it invalid and drop the mapping. */ > smgr_unlock_sr(sr, ~SR_VALID); > + for (forknum = 0; forknum <= MAX_FORKNUM; forknum++) > + sr->nblocks[forknum] = InvalidBlockNumber; > hash_search_with_hash_value(sr_mapping_table, >&reln->smgr_rnode, >hash, >HASH_REMOVE, >NULL); > > smgr_drop_sr just removes the hash entry from sr_mapping_table, but doesn't > remove the sr_pool. But in smgrnblocks_fast, it just get sr from sr_pool, so > I add some codes as above to avoid some corner cases get an unexpected result > from smgrnblocks_fast. Is it necessary, I also want some advice from you. Hmm. I think it might be better to increment sr->generation. That was already done in the "eviction" code path, where other processes might still have references to the SR object, and I don't think it's possible for anyone to access a dropped relation, but I suppose it's more consistent to do that anyway. Fixed. Thanks for the review! From 1d49405b41b3cdd6baf6f29219f18c6adefb5547 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Fri, 13 Nov 2020 14:38:41 +1300 Subject: [PATCH v3 1/2] WIP: Track relation sizes in shared memory. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Introduce a fixed size pool of SMgrSharedRelation objects. A new GUC smgr_shared_relation controls how many can exist at once, and they are evicted as required. "Dirty" SMgrSharedRelations can only be evicted after being synced to disk. Goals: 1. Provide faster lookups of relation sizes, cutting down on lseek() calls. This supercedes the recovery-only caching added recently, and replaces preexisting FSM and VM caching schemes. 2. Stop trusting the operating system to keep track of the size of files that we have recently extended, until fsync() has been called. XXX smgrimmedsync() is maybe too blunt an instrument? XXX perhaps mdsyncfiletag should tell smgr.c when it cleans forks via some new interface, but it doesn't actually know if it's cleaning a fork that extended a relation... XXX perhaps bgwriter should try to clean them? 
XXX currently reusing the array of locks also used for buffer mapping, need to define some more in lwlocks.c... Discussion: https://postgr.es/m/CAEepm%3D3SSw-Ty1DFcK%3D1rU-K6GSzYzfdD4d%2BZwapdN7dTa6%3DnQ%40mail.gmail.com Reviewed-by: 陈佳昕(步真) Reviewed-by: Andy Fan --- contrib/pg_visibility/pg_visibility.c | 1 - src/backend/access/heap/visibilitymap.c | 28 +- src/backend/catalog/storage.c | 2 - src/backend/commands/dbcommands.c | 3 + src/backend/postmaster/pgstat.c | 3 + src/backend/storage/freespace/freespace.c | 35 +- src/backend/storage/ipc/ipci.c| 3 + src/backend/storage/smgr/md.c | 10 + src/backend/storage/smgr/smgr.c | 540 -- src/backend/utils/misc/guc.c | 11 + src/include/pgstat.h | 1 + src/include/storage/smgr.h| 18 +- 12 files changed, 566 insertions(+), 89 deletions(-) diff --git a/contrib/pg_visibility/pg_visibility.c b/contrib/pg_visibility/pg_visibility.c index 54e47b810f..702951e487 1
Re: Re: Re: Cache relation sizes?
On Wed, Dec 30, 2020 at 5:52 PM Thomas Munro wrote: > and requires it on retry s/requires/reacquires/
Re: pgbench: option delaying queries till connections establishment?
On Sun, Nov 15, 2020 at 4:53 AM Fabien COELHO wrote: > In the attached version, I just comment out the call and add an > explanation about why it is commented out. If pg overall version > requirements are changed on windows, then it could be reinstated. It looks like macOS doesn't have pthread barriers (via cfbot 2021, now with more operating systems): pgbench.c:326:8: error: unknown type name 'pthread_barrier_t' static pthread_barrier_t barrier; ^ pgbench.c:6128:2: error: implicit declaration of function 'pthread_barrier_init' is invalid in C99 [-Werror,-Wimplicit-function-declaration] pthread_barrier_init(&barrier, NULL, nthreads); ^
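A possible fallback, since the barrier is only needed once at startup, is a single-use emulation on top of a mutex and condition variable, roughly like this (untested sketch, names made up; requires <pthread.h>, which pgbench already uses):

/* Single-use barrier emulation for systems without pthread_barrier_t. */
typedef struct
{
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    int             remaining;
} pgbench_barrier;

static int
pgbench_barrier_init(pgbench_barrier *b, int nthreads)
{
    b->remaining = nthreads;
    if (pthread_mutex_init(&b->mutex, NULL) != 0)
        return -1;
    return pthread_cond_init(&b->cond, NULL) != 0 ? -1 : 0;
}

static void
pgbench_barrier_wait(pgbench_barrier *b)
{
    pthread_mutex_lock(&b->mutex);
    if (--b->remaining == 0)
        pthread_cond_broadcast(&b->cond);   /* last thread releases everyone */
    else
    {
        while (b->remaining > 0)
            pthread_cond_wait(&b->cond, &b->mutex);
    }
    pthread_mutex_unlock(&b->mutex);
}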
Re: new heapcheck contrib module
On Tue, Oct 27, 2020 at 5:12 AM Mark Dilger wrote: > The v20 patches 0002, 0003, and 0005 still apply cleanly, but 0004 required a > rebase. (0001 was already committed last week.) > > Here is a rebased set of 4 patches, numbered 0002..0005 to be consistent with > the previous naming. There are no substantial changes. Hi Mark, The command line stuff fails to build on Windows[1]. I think it's just missing #include "getopt_long.h" (see contrib/vacuumlo/vacuumlo.c). [1] https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.123328
Re: WIP: BRIN multi-range indexes
On Sun, Dec 20, 2020 at 1:16 PM Tomas Vondra wrote: > Attached is an updated version of the patch series, rebased on current > master, and results for benchmark comparing the various bloom variants. Perhaps src/include/utils/inet.h needs to include <sys/socket.h>, because FreeBSD says: brin_minmax_multi.c:1693:24: error: use of undeclared identifier 'AF_INET' if (ip_family(ipa) == PGSQL_AF_INET) ^ ../../../../src/include/utils/inet.h:39:24: note: expanded from macro 'PGSQL_AF_INET' #define PGSQL_AF_INET (AF_INET + 0) ^
Re: psql \df choose functions by their arguments
On Thu, Dec 31, 2020 at 7:01 AM Greg Sabino Mullane wrote: > Attached is the latest patch against HEAD - basically fixes a few typos. Hi Greg, It looks like there is a collation dependency here that causes the test to fail on some systems: === ./src/test/regress/regression.diffs === diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/psql.out /tmp/cirrus-ci-build/src/test/regress/results/psql.out --- /tmp/cirrus-ci-build/src/test/regress/expected/psql.out 2021-01-01 16:05:25.749692000 + +++ /tmp/cirrus-ci-build/src/test/regress/results/psql.out 2021-01-01 16:11:28.525632000 + @@ -5094,8 +5094,8 @@ public | mtest | integer | double precision, double precision, integer | func public | mtest | integer | integer | func public | mtest | integer | integer, text | func - public | mtest | integer | timestamp without time zone, timestamp with time zone | func public | mtest | integer | time without time zone, time with time zone | func + public | mtest | integer | timestamp without time zone, timestamp with time zone | func
Re: Context diffs
On Tue, Jan 5, 2021 at 8:07 AM Bruce Momjian wrote: > * "git apply" and "git am" can't process context diffs (they throw an > error once a context-like section of the diff is hit; simple > adding/removing lines in a block works) > > * the commit-fest doesn't recognize context diff attachments as > patches: > > https://commitfest.postgresql.org/31/2912/ > > * cfbot can't process file renames/adds/deletes from context diffs For the record, cfbot just uses plain old GNU patch, because that seems to accept nearly everything that anyone posts here (after a step that tries to unpack tarballs etc.). Several people have suggested I change it to use git apply instead (IIRC it works better for patches containing binary files such as cryptographic keys?), but then it wouldn't accept ye olde context diffs.
Re: Handing off SLRU fsyncs to the checkpointer
On Mon, Jan 4, 2021 at 3:35 AM Tomas Vondra wrote: > Seems this commit left behind a couple unnecessary prototypes in a bunch > of header files. In particular, it removed these functions > > - ShutdownCLOG(); > - ShutdownCommitTs(); > - ShutdownSUBTRANS(); > - ShutdownMultiXact(); Thanks. Fixed.