Hi,

On 2024-10-30 10:47:35 -0700, Jeff Davis wrote:
> On Tue, 2024-09-24 at 11:55 -0400, Andres Freund wrote:
> > What I suspect we might want instead is something inbetween a share and an
> > exclusive lock, which is taken while setting a hint bit and which conflicts
> > with having an IO in progress.
>
> I am starting to wonder if a shared content locks are really the right
> concept at all. It makes sense for simple mutexes, but we are already
> more complex than that, and you are suggesting adding on to that
> complexity.

What I am proposing isn't making the content lock more complicated, it's
orthogonal to the content lock.

> Which I agree is a good idea, I'm just wondering if we could go even
> further.
>
> The README states that a shared lock is necessary for visibility
> checking, but can we just be more careful with the ordering and
> atomicity of visibility changes in the page?
>
>  * carefully order reads and writes of xmin/xmax/hints (would
>    that create too many read barriers in the tqual.c code?)
>  * write line pointer after tuple is written

It's possible, but it'd be a lot of work. And you wouldn't need to just do
this for heap, but all the indexes too, to make progress on the
don't-set-hint-bits-during-io front. So I don't think it makes sense to tie
these things together. I do think that it's an argument for not importing all
the complexity into lwlock.c though.

> We would still have pins and cleanup locks to prevent data removal.

As-is cleanup locks only work in coordination with content locks. While
cleanup is ongoing we need to prevent anybody from starting to look at the
page - without acquiring something like a shared lock that's not easy.

> We'd have the logic you suggest that would prevent modification during
> IO. And there would still need to be an exclusive content locks so that
> two inserts don't try to allocate the same line pointer, or lock the
> same tuple.
>
> If PD_ALL_VISIBLE is set it's even simpler.
>
> Am I missing some major hazards?

I don't think anything fundamental, but it's a decidedly nontrivial amount of
work.

Greetings,

Andres Freund