On 2020/06/12 12:21, Amit Kapila wrote:
On Thu, Jun 11, 2020 at 7:39 PM Masahiko Sawada
<masahiko.saw...@2ndquadrant.com> wrote:

On Thu, 11 Jun 2020 at 20:02, Amit Kapila <amit.kapil...@gmail.com> wrote:

On Thu, Jun 11, 2020 at 3:07 PM Masahiko Sawada
<masahiko.saw...@2ndquadrant.com> wrote:

On Thu, 11 Jun 2020 at 18:11, Amit Kapila <amit.kapil...@gmail.com> wrote:

On Thu, Jun 11, 2020 at 1:46 PM Masahiko Sawada
<masahiko.saw...@2ndquadrant.com> wrote:

On Thu, 11 Jun 2020 at 12:30, Amit Kapila <amit.kapil...@gmail.com> wrote:


Now, thinking about this again, I am not sure if these stats are
directly related to slots. These are stats for logical decoding, which
can be performed either via a WALSender or via the decoding plugin
(through the SQL APIs). So, why not have them displayed in a new view
like pg_stat_logical (or pg_stat_logical_decoding/pg_stat_logical_replication)?
In the future, we will need to add similar stats for the streaming of
in-progress transactions as well (see patch 0007-Track-statistics-for-streaming
at [1]), so having a separate view for these doesn't sound illogical.


I think we need to decide how long we want to retain these statistics
values. That is, if we were to have such a pg_stat_logical view, these
values would remain only until logical decoding finishes, since I think
the view would display only running logical decoding sessions. OTOH, if
we were to associate these stats with slots, the values would persist
across multiple logical decoding SQL API calls.
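For context, the SQL-API style of decoding mentioned above looks like
this (using the contrib test_decoding plugin; the slot name is
illustrative). Each such call is a separate decoding run, which is why
the retention question matters:

```sql
-- Create a logical slot using the test_decoding output plugin:
SELECT pg_create_logical_replication_slot('myslot', 'test_decoding');

-- Each call decodes pending WAL; spilling can happen during this call,
-- and the stats in question would be accumulated here:
SELECT * FROM pg_logical_slot_get_changes('myslot', NULL, NULL);

-- Drop the slot when done:
SELECT pg_drop_replication_slot('myslot');
```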


I thought of keeping these until the process that performs these
operations exits. So for a WALSender, the stats will be valid until it
is restarted for some reason; when decoding is performed via a backend,
the stats will be valid until the corresponding backend exits.


The number of rows in that view could be up to (max_backends +
max_wal_senders). Is that right? And what if different backends used the
same replication slot one after the other?


Yeah, it would be tricky if multiple slots are used by the same
backend.  We could probably track the number of times decoding has
happened in the session, which would probably help us in averaging the
spill amount.  If we think that the aim is to help users tune
logical_decoding_work_mem to avoid frequent spilling or streaming, then
how would maintaining the stats at the slot level help?
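For reference, the knob these stats are meant to inform can be adjusted
per session like this (the value is illustrative; larger values mean
fewer spills to disk at the cost of more memory per decoding session):

```sql
-- Session-level override of the decoding memory limit:
SET logical_decoding_work_mem = '256MB';
```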

Since the logical decoding intermediate files are written to a per-slot
directory, I thought that associating these statistics with replication
slots would also be understandable for users.


What I wanted to know is how it will help users tune
logical_decoding_work_mem.  Different backends can process from the
same slot, so it is not clear how users would be able to make any
meaning out of those stats.  OTOH, it is easier to see how to make
meaning of these stats if we display them w.r.t. the process. Basically,
we have spill_count and spill_size, which can be used to tune
logical_decoding_work_mem, and the activity of spilling also happens at
the process level, so it sounds like a one-to-one mapping.  I am not
saying we should rule out maintaining them at the slot level, but I am
trying to see if we can come up with a clear definition.

I was thinking of something like a pg_stat_logical_replication_slot
view, which shows slot_name and statistics for only logical replication
slots. The view always shows as many rows as there are existing
replication slots, regardless of whether logical decoding is running. I
think there is no big difference in how users would use these statistics
between maintaining them at the slot level and at the logical decoding
level.
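A sketch of what querying such a view might look like; the view and
column names here are my assumption of the proposal, not an existing
catalog:

```sql
-- Hypothetical: one row per logical replication slot, with cumulative
-- spill statistics that persist across decoding sessions:
SELECT slot_name, spill_count, spill_bytes
FROM pg_stat_logical_replication_slot;
```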

In the logical replication case, since we generally don't support
setting a different logical_decoding_work_mem per walsender, every
walsender will decode the same WAL stream with the same setting, meaning
they will spill intermediate files similarly.

I was thinking we do support that. We can create multiple replication
users with different logical_decoding_work_mem settings, and each
walsender can use the logical_decoding_work_mem configured for its user.
No?
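If I understand the per-user idea correctly, it can be expressed with
the existing ALTER ROLE ... SET machinery (role name and value are
illustrative):

```sql
-- A replication role with its own decoding memory limit; walsenders
-- logging in as this role pick up the per-role setting:
CREATE ROLE repl_small WITH REPLICATION LOGIN;
ALTER ROLE repl_small SET logical_decoding_work_mem = '64MB';
```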

Regards,


--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

