On Tue, Jul 7, 2020 at 5:10 AM Amit Kapila <amit.kapil...@gmail.com> wrote:
> On Tue, Jul 7, 2020 at 7:07 AM Masahiko Sawada > <masahiko.saw...@2ndquadrant.com> wrote: > > > > On Mon, 6 Jul 2020 at 20:45, Amit Kapila <amit.kapil...@gmail.com> > wrote: > > > > > > > Second, as long as the unique identifier is the slot name there is no > > > > convenient way to distinguish between the same name old and new > > > > replication slots, so the backend process or wal sender process sends > > > > a message to the stats collector to drop the replication slot at > > > > ReplicationSlotDropPtr(). This strategy differs from what we do for > > > > table, index, and function statistics. It might not be a problem but > > > > I’m thinking a better way. > > > > > > > > > > Can we rely on message ordering in the transmission mechanism (UDP) > > > for stats? The wiki suggests [1] we can't. If so, then this might > > > not work. > > > > Yeah, I'm also concerned about this. Another idea would be to have > > another unique identifier to distinguish old and new replication slots > > with the same name. For example, creation timestamp. And then we > > reclaim the stats of unused slots later like table and function > > statistics. > > > > So, we need to have 'creation timestamp' as persistent data for slots > to achieve this? I am not sure of adding creation_time as a parameter > to identify for this case because users can change timings on systems > so it might not be a bullet-proof method but I agree that it can work > in general. > If we need them to be persistent across time like that, perhaps we simply need to assign oids to replication slots? That might simplify this problem quite a bit? > On the other hand, if the ordering were to be reversed, we would miss > > that stats but the next stat reporting would create the new entry. If > > the problem is unlikely to happen in common case we can live with > > that. > > > > Yeah, that is a valid point and I think otherwise also some UDP > packets can be lost so maybe we don't need to worry too much about > this. I guess we can add a comment in the code for such a case. > The fact that we may in theory lose some packages over UDP is the main reason we're using UDP in the first place, I believe :) But it's highly unlikely to happen in the real world I believe (and I think on some platforms impossible). > > > Aside from the above, this patch will change the most of the changes > > > > introduced by commit 9290ad198b1 and introduce new code much. I’m > > > > concerned whether such changes are acceptable at the time of beta 2. > > > > > > > > > > I think it depends on the final patch. My initial thought was that we > > > should do this for PG14 but if you are suggesting removing the changes > > > done by commit 9290ad198b1 then we need to think over it. I could > > > think of below options: > > > a. Revert 9290ad198b1 and introduce stats for spilling in PG14. We > > > were anyway having spilling without any work in PG13 but didn’t have > > > stats. > > > b. Try to get your patch in PG13 if we can, otherwise, revert the > > > feature 9290ad198b1. > > > c. Get whatever we have in commit 9290ad198b1 for PG13 and > > > additionally have what we are discussing here for PG14. This means > > > that spilled stats at slot level will be available in PG14 via > > > pg_stat_replication_slots and for individual WAL senders it will be > > > available via pg_stat_replication both in PG13 and PG14. Even if we > > > can get your patch in PG13, we can still keep those in > > > pg_stat_replication. > > > d. Get whatever we have in commit 9290ad198b1 for PG13 and change it > > > for PG14. I don't think this will be a popular approach. > > > > I was thinking option (a) or (b). I'm inclined to option (a) since the > > PoC patch added a certain amount of new codes. I agree with you that > > it depends on the final patch. > > > > Magnus, Tomas, others, do you have any suggestions on the above > options or let us know if you have any other option in mind? > > I have a feeling it's far too late for (b) at this time. Regardless of the size of the patch, it feels that this can end up being a rushed and not thought-through-all-the-way one, in which case we may end up in an even worse position. Much as I would like to have these stats earlier, I'm also leaning towards (a). //Magnus