On Tue, Jul 7, 2020 at 5:10 AM Amit Kapila <amit.kapil...@gmail.com> wrote:

> On Tue, Jul 7, 2020 at 7:07 AM Masahiko Sawada
> <masahiko.saw...@2ndquadrant.com> wrote:
> >
> > On Mon, 6 Jul 2020 at 20:45, Amit Kapila <amit.kapil...@gmail.com>
> wrote:
> > >
> > > > Second, as long as the unique identifier is the slot name there is no
> > > > convenient way to distinguish between the same name old and new
> > > > replication slots, so the backend process or wal sender process sends
> > > > a message to the stats collector to drop the replication slot at
> > > > ReplicationSlotDropPtr(). This strategy differs from what we do for
> > > > table, index, and function statistics. It might not be a problem but
> > > > I’m thinking a better way.
> > > >
> > >
> > > Can we rely on message ordering in the transmission mechanism (UDP)
> > > for stats?  The wiki suggests [1] we can't.  If so, then this might
> > > not work.
> >
> > Yeah, I'm also concerned about this. Another idea would be to have
> > another unique identifier to distinguish old and new replication slots
> > with the same name. For example, creation timestamp. And then we
> > reclaim the stats of unused slots later like table and function
> > statistics.
> >
>
> So, we need to have 'creation timestamp' as persistent data for slots
> to achieve this?  I am not sure of adding creation_time as a parameter
> to identify for this case because users can change timings on systems
> so it might not be a bullet-proof method but I agree that it can work
> in general.
>

If we need them to be persistent across time like that, perhaps we simply
need to assign oids to replication slots? That might simplify this problem
quite a bit?


> On the other hand, if the ordering were to be reversed, we would miss
> > that stats but the next stat reporting would create the new entry. If
> > the problem is unlikely to happen in common case we can live with
> > that.
> >
>
> Yeah, that is a valid point and I think otherwise also some UDP
> packets can be lost so maybe we don't need to worry too much about
> this.  I guess we can add a comment in the code for such a case.
>

The fact that we may in theory lose some packages over UDP is the main
reason we're using UDP in the first place, I believe :) But it's highly
unlikely to happen in the real world I believe (and I think on some
platforms impossible).


> > > Aside from the above, this patch will change the most of the changes
> > > > introduced by commit 9290ad198b1 and introduce new code much. I’m
> > > > concerned whether such changes are acceptable at the time of beta 2.
> > > >
> > >
> > > I think it depends on the final patch. My initial thought was that we
> > > should do this for PG14 but if you are suggesting removing the changes
> > > done by commit 9290ad198b1 then we need to think over it.  I could
> > > think of below options:
> > > a. Revert 9290ad198b1 and introduce stats for spilling in PG14.  We
> > > were anyway having spilling without any work in PG13 but didn’t have
> > > stats.
> > > b. Try to get your patch in PG13 if we can, otherwise, revert the
> > > feature 9290ad198b1.
> > > c. Get whatever we have in commit 9290ad198b1 for PG13 and
> > > additionally have what we are discussing here for PG14.  This means
> > > that spilled stats at slot level will be available in PG14 via
> > > pg_stat_replication_slots and for individual WAL senders it will be
> > > available via pg_stat_replication both in PG13 and PG14.  Even if we
> > > can get your patch in PG13, we can still keep those in
> > > pg_stat_replication.
> > > d. Get whatever we have in commit 9290ad198b1 for PG13 and change it
> > > for PG14.  I don't think this will be a popular approach.
> >
> > I was thinking option (a) or (b). I'm inclined to option (a) since the
> > PoC patch added a certain amount of new codes. I agree with you that
> > it depends on the final patch.
> >
>
> Magnus, Tomas, others, do you have any suggestions on the above
> options or let us know if you have any other option in mind?
>
>
I have a feeling it's far too late for (b) at this time. Regardless of the
size of the patch, it feels that this can end up being a rushed and not
thought-through-all-the-way one, in which case we may end up in an even
worse position.

Much as I would like to have these stats earlier, I'm also
leaning towards (a).

//Magnus

Reply via email to