On 11/18/23 19:12, Andres Freund wrote:
> Hi,
>
> On 2023-11-18 11:56:47 +0100, Tomas Vondra wrote:
>>> I guess it's not really feasible to just increase the lock level here
>>> though :(. The use of ShareUpdateExclusiveLock isn't new, and suddenly
>>> using AEL would perhaps lead to new deadlocks and such? But it also
>>> seems quite wrong.
>>>
>>
>> If this really is about the lock being too weak, then I don't see why
>> it would be wrong?
>
> Sorry, that was badly formulated. The wrong bit is the use of
> ShareUpdateExclusiveLock.
>

Ah, you meant the current lock mode seems wrong, not that changing the
locks seems wrong. Yeah, true.

>
>> If it's required for correctness, it's not really wrong, IMO. Sure,
>> stronger locks are not great ...
>>
>> I'm not sure about the risk of deadlocks. If you do
>>
>>    ALTER PUBLICATION ... ADD TABLE
>>
>> it's not holding many other locks. It essentially gets just a lock on
>> the pg_publication catalog, and then one on the publication row.
>> That's it.
>>
>> If we increase the locks from ShareUpdateExclusive to ShareRowExclusive,
>> we're making it conflict with RowExclusive. Which is just DML, and I
>> think we need to do that.
>
> From what I can tell it needs to be an AccessExclusiveLock. Completely
> independent of logical decoding. The way the cache stays coherent is
> catalog modifications conflicting with anything that builds cache
> entries. We have a few cases where we do use lower-level locks, but for
> those we have explicit analysis for why that's OK (see e.g.
> reloptions.c), or we block until nobody could have an old view of the
> catalog (the various CONCURRENTLY operations).
>

Yeah, I got too focused on the issue I triggered, which seems to be
fixed by using SRE (I still don't understand why ...). But you're
probably right that there may be other cases where SRE would not be
sufficient - I certainly can't prove it'd be safe.
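For concreteness, the SRE variant would presumably be a one-line change,
assuming the relevant spot is where OpenTableList() in publicationcmds.c
opens the tables being added to the publication (a sketch of the idea
only, untested, not a reviewed patch):

    -    rel = table_openrv(t->relation, ShareUpdateExclusiveLock);
    +    rel = table_openrv(t->relation, ShareRowExclusiveLock);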
>
>> So maybe that's fine? For me, a detected deadlock is better than
>> silently missing some of the data.
>
> That certainly is true.
>
>
>>> We could brute force this in the logical decoding infrastructure, by
>>> distributing invalidations from catalog-modifying transactions to all
>>> concurrent in-progress transactions (like already done for the
>>> historic catalog snapshot, cf. SnapBuildDistributeNewCatalogSnapshot()).
>>> But I think that'd be a fairly significant increase in overhead.
>>>
>>
>> I have no idea what the overhead would be - perhaps not too bad,
>> considering catalog changes are not too common (though I'm sure there
>> are extreme cases). And maybe we could even restrict this to
>> "interesting" catalogs only, or something like that? (However, I hate
>> those weird differences in behavior - they can easily lead to bugs.)
>>
>> But it feels more like a band-aid than actually fixing the issue.
>
> Agreed.
>

... and it would not fix the other places outside logical decoding. (See
the PS for a rough sketch of what the distribution might look like.)

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
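PS: To make the "distribute invalidations" option above more concrete,
here is a rough, untested sketch modeled on
SnapBuildDistributeNewCatalogSnapshot() - the function name and placement
are made up, and it sidesteps the overhead question entirely:

    /*
     * Sketch only: queue the invalidation messages of a catalog-modifying
     * transaction into all other transactions the reorder buffer knows
     * about, mirroring how SnapBuildDistributeNewCatalogSnapshot()
     * distributes the new historic snapshot.
     */
    static void
    SnapBuildDistributeInvalidations(SnapBuild *builder, XLogRecPtr lsn,
                                     TransactionId xid, Size nmsgs,
                                     SharedInvalidationMessage *msgs)
    {
        dlist_iter  txn_i;

        dlist_foreach(txn_i, &builder->reorder->toplevel_by_lsn)
        {
            ReorderBufferTXN *txn;

            txn = dlist_container(ReorderBufferTXN, node, txn_i.cur);

            /* skip the transaction the invalidations came from */
            if (txn->xid == xid)
                continue;

            /*
             * Queued invalidations are executed as the transaction's
             * changes are replayed, so its historic catalog lookups stop
             * using stale cache entries.
             */
            ReorderBufferAddInvalidations(builder->reorder, txn->xid,
                                          lsn, nmsgs, msgs);
        }
    }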