Re: Status of the table access method work

Andres Freund Tue, 09 Apr 2019 05:33:02 -0700

Hi,

On 2019-04-09 11:17:29 +0200, Dmitry Dolgov wrote:
> I'm also curious about that. As far as I can see the main objection against
> that was that in this case the recovery process will depend on an extension,
> which could violate reliability.


I don't think that's a primary concern - although it is one. The mapping
from types of records to the handler function needs to be accessible at
a very early state, when the cluster isn't yet in a consistent state. So
we can't just go an look into pg_am, and look up a handler function, etc
- crash recovery happens much earlier than that is possible. Nor do we
want the mapping of 'rmgr id' -> 'extension' to be defined in the config
file, that's way too likely to be wrong.  So there needs to be a
different type of mapping, accessible outside the catalog. I supect we'd
have to end up with something very roughly like the relmapper
infrastructure.  A tertiary problem is then how to identify extensions
in that mapping - although I suspect just using any library name that
can be passed to load_library() will be OK.


> But I wonder if this argument is still valid for AM's, since the whole
> data is kind of depends on it, not only the recovery.

I don't buy that argument. If you have an AM that registers, using a new
facility, replay routines, and then it errors out / crashes during
those, there's no way to get the cluster back into a consistent
state. So it's not just the one table in that AM that's gone, it's the
entire cluster that's impacted.


> Btw, can someone elaborate, why exactly generic_xlog is not efficient enough?
> I've went through the corresponding thread, looks like generic WAL records are
> bigger than normal one - is it the only reason?

That's one big reason. But also, you just can't do much more than "write
this block into that file" during recovery with. A lot of our replay
routines intentionally do more complicated tasks.

Greetings,

Andres Freund

Re: Status of the table access method work

Reply via email to