On 2020/09/09 12:04, Amit Kapila wrote:
On Tue, Sep 8, 2020 at 7:03 PM Magnus Hagander <mag...@hagander.net> wrote:

On Tue, Sep 8, 2020 at 3:11 PM Fujii Masao <masao.fu...@oss.nttdata.com> wrote:



On 2020/09/08 19:28, Magnus Hagander wrote:


On Tue, Sep 8, 2020 at 8:10 AM Amit Kapila <amit.kapil...@gmail.com 
<mailto:amit.kapil...@gmail.com>> wrote:

     We use the timestamp of the global statfile if we are not able to
     determine it for a particular database either because the entry for
     that database doesn't exist or there is an error while reading the
     specific database entry. This was not taken care of while reading
     other entries like ArchiverStats or SLRUStats. See
     pgstat_read_db_statsfile_timestamp. I have observed this while
     reviewing Sawada-San's patch related to logical replication stats [1].

     I think this can only happen if due to some reason the statfile got
     corrupt or we
     have some bug in stats writing code, the chances of both are rare and even
     if that happens we will use stale statistics.

     The attached patch by Sawada-San will fix this issue. As the chances of 
this
     the problem is rare and nobody has reported any issue related to this,
     so it might be okay not to backpatch this.

     Thoughts?


Why are we reading the archiver statis and and slru stats in 
"pgstat_read_db_statsfile_timestamp" in the first place?

Maybe because they are written before database stats entries? That is,
to read the target database stats entry and get the timestamp from it,
we need to read (or lseek) all the global stats entries written before them.


Oh meh. Yeah, I'm reading this thing completely wrong :/ Ignore my comments :)



That seems well out of scope for that function.

If nothing else the comment at the top of that function is out of touch with 
reality. We do seem to read it into local buffers and then ignore the contents. 
I guess we're doing it just to verify that it's not corrupt -- so perhaps the 
function should actually have a different name than it does now, since it 
certainly seems to do more than actually read the statsfile timestamp.

Specifically in this patch it looks wrong -- in the case of say the SLRU stats 
being corrupt, we will now return the timestamp of the global stats file even 
if there is one existing for the database requested, which definitely breaks 
the contract of the function.

Yes.
We should return false when fread() for database entry fails, instead? That is,

1. If corrupted stats file is found, the function always returns false.
2. If the file is not currupted and the target database entry is found, its 
timestamp is returned.
3. If the file is not corrupted and the target is NOT found, the timestamp of 
global entry is returned


Yeah, with more coffee and re-reading it, I'm not sure how we could have the 
database stats being OK if the archiver or slru stats are not.

That said, I think it still makes sense to return false if the stats file is 
corrupt. How much can we trust the first block, if the block right after it is 
corrupt? So yeah, I think your described order seems correct. But that's also 
what the code already did before this patch, isn't it?


No, before patch as well, if we can't read the DB entry say because
the file is corrupt, we return true and use timestamp of global stats
file and this is what is established by the original commit 187492b6.
So, if we consider that as correct then the functionality is something
like once we have read the timestamp of the global statfile, we use
that if we can't read the actual db entry for whatever reason. It
seems if we return false, the caller will call this function again in
the loop. Now, I see the point that if we can't read some part of the
file we should return false instead but not sure if that is helpful
from the perspective of the caller of
pgstat_read_db_statsfile_timestamp.

When false is returned, the caller backend_read_statsfile() seems to
request the stats collector to write a fresh stats data into the file,
and then pgstat_read_db_statsfile_timestamp() can try to read the fresh
file that may not be corrupted. So returning false in that case seems ok
to me...

Regards,


--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


Reply via email to