On Tue, May 22, 2018 at 2:42 PM, Maxim Boguk <maxim.bo...@gmail.com> wrote:

>
>
> On Tue, May 22, 2018 at 10:30 PM, Andres Freund <and...@anarazel.de>
> wrote:
>
>> Hi,
>>
>> On 2018-05-22 22:18:15 +0300, Maxim Boguk wrote:
>> > On Tue, May 22, 2018 at 9:47 PM, Andres Freund <and...@anarazel.de>
>> wrote:
>> > > >  select relfrozenxid from pg_class where relname='pg_authid';
>> > > >  relfrozenxid
>> > > > --------------
>> > > >    2863429136
>>
> select txid_current();
>> >  txid_current
>> > --------------
>> >   41995913769
>>
>> So that's an xid of 3341208105, if you leave the epoch out. What's
>> autovacuum_freeze_max_age set to in that cluster?
>>
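For reference, txid_current() packs the xid epoch into the high 32 bits of
its result, so the on-disk xid falls out of the value above like this (a
quick psql sketch using the numbers from this thread):

postgres=# SELECT 41995913769 >> 32 AS epoch, 41995913769 % 4294967296 AS xid32;
 epoch |   xid32
-------+------------
     9 | 3341208105
(1 row)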
>
> postgres=# show autovacuum_freeze_max_age;
>  autovacuum_freeze_max_age
> ---------------------------
>  200000000
> (the default value, I think)
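For scale, taking oldestXID and NextXID from the pg_controldata output
below: the oldest unfrozen xid is roughly 526M transactions old, well past
autovacuum_freeze_max_age, so anti-wraparound autovacuums should already be
getting forced. A quick check:

SELECT 3341161759 - 2814548646 AS oldest_xid_age;  -- 526613113 > 200000000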
>
>
>
>> Can you show pg_controldata output, and
>> relminmxid from that cluster?
>>
>
> postgres@db:~$ /usr/lib/postgresql/9.6/bin/pg_controldata -D
> /var/lib/postgresql/9.6/main
> pg_control version number:            960
> Catalog version number:               201608131
> Database system identifier:           6469368654711450114
> Database cluster state:               in production
> pg_control last modified:             Tue 22 May 2018 10:20:14 PM MSK
> Latest checkpoint location:           CCB5/F9C37950
> Prior checkpoint location:            CCB0/43F316B0
> Latest checkpoint's REDO location:    CCB1/6706BD88
> Latest checkpoint's REDO WAL file:    000000010000CCB100000067
> Latest checkpoint's TimeLineID:       1
> Latest checkpoint's PrevTimeLineID:   1
> Latest checkpoint's full_page_writes: on
> Latest checkpoint's NextXID:          9:3341161759
> Latest checkpoint's NextOID:          190071899
> Latest checkpoint's NextMultiXactId:  59416233
> Latest checkpoint's NextMultiOffset:  215588532
> Latest checkpoint's oldestXID:        2814548646
> Latest checkpoint's oldestXID's DB:   16400
> Latest checkpoint's oldestActiveXID:  3341161759
> Latest checkpoint's oldestMultiXid:   54264778
> Latest checkpoint's oldestMulti's DB: 16400
> Latest checkpoint's oldestCommitTsXid:2814548646
> Latest checkpoint's newestCommitTsXid:3341161758
> Time of latest checkpoint:            Tue 22 May 2018 10:05:16 PM MSK
> Fake LSN counter for unlogged rels:   0/1
> Minimum recovery ending location:     0/0
> Min recovery ending loc's timeline:   0
> Backup start location:                0/0
> Backup end location:                  0/0
> End-of-backup record required:        no
> wal_level setting:                    replica
> wal_log_hints setting:                on
> max_connections setting:              2000
> max_worker_processes setting:         8
> max_prepared_xacts setting:           0
> max_locks_per_xact setting:           64
> track_commit_timestamp setting:       on
> Maximum data alignment:               8
> Database block size:                  8192
> Blocks per segment of large relation: 131072
> WAL block size:                       8192
> Bytes per WAL segment:                16777216
> Maximum length of identifiers:        64
> Maximum columns in an index:          32
> Maximum size of a TOAST chunk:        1996
> Size of a large-object chunk:         2048
> Date/time type storage:               64-bit integers
> Float4 argument passing:              by value
> Float8 argument passing:              by value
> Data page checksum version:           0
>
> postgres=# select datname,datfrozenxid,datminmxid from pg_database order
> by datname;
>   datname  | datfrozenxid | datminmxid
> -----------+--------------+------------
>  **        |   2815939794 |   54265194
>  postgres  |   2863429136 |   54280819
>  template0 |   3148297669 |   59342106
>  template1 |   2816765546 |   59261794
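The same distances can be read as ages directly, which is less error-prone
than eyeballing raw xids; a small sketch using age() and mxid_age()
(available on 9.5+):

SELECT datname,
       age(datfrozenxid)    AS xid_age,
       mxid_age(datminmxid) AS mxid_age
  FROM pg_database
 ORDER BY age(datfrozenxid) DESC;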
>
>
>
>
>
>>
>> I might be daft here, but it's surely curious that the relfrozenxid from
>> the error and pg_catalog are really different (number of digits):
>> catalog: 2863429136
>> error:    248712603
>>
>>
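To spot which catalog relations carry a suspicious relfrozenxid, something
like this helps (a sketch; run it in the affected database):

SELECT oid::regclass AS relation, relfrozenxid, age(relfrozenxid) AS xid_age
  FROM pg_class
 WHERE relkind IN ('r', 'm', 't')  -- plain tables, matviews, TOAST tables
 ORDER BY age(relfrozenxid) DESC
 LIMIT 10;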
>> > About the gdb bt - it's tricky because it's the mission-critical master
>> > DB of a huge project.
>> > I'll try to promote a backup replica and check whether the issue persists
>> > there, and if yes - we will have our playground for a while, but it will
>> > require some time to arrange.
>>
>> You should be OK to just bt that in the running cluster, but I
>> definitely understand if you don't want to do that...  I'd appreciate it
>> if you set up a playground, because this seems like something that'll
>> reappear.
>>
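For anyone following along, grabbing that backtrace just means attaching gdb
to the affected backend; a sketch, where <backend_pid> stands for the pid of
the backend hitting the error:

$ gdb -p <backend_pid>   # attach; the backend is paused while gdb is attached
(gdb) bt                 # print the stack trace
(gdb) detach             # let the backend continue
(gdb) quit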
>
> My gdb kung-fu is exceptionally rusty now, so I'd better play in a sandbox.
> And if the error doesn't exist on a freshly promoted replica, that will give
> us useful info as well.
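On 9.6, promoting the backup replica for that sandbox would look roughly
like this; a sketch, reusing the binary and data-directory paths from the
pg_controldata invocation above:

$ /usr/lib/postgresql/9.6/bin/pg_ctl promote -D /var/lib/postgresql/9.6/main
$ psql -d postgres -c "select relfrozenxid from pg_class where relname='pg_authid';"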
>
> --
> Maxim Boguk
> Senior Postgresql DBA
> http://dataegret.com/
>
> Phone RU: +7  985 433 0000
> Phone UA: +380 99 143 0000
> Phone AU: +61  45 218 5678
>
> LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
> Skype: maxim.boguk
>
> "Доктор, вы мне советовали так не делать, но почему мне по-прежнему больно
> когда я так делаю ещё раз?"
>
>
All -

We just noticed that this exact same problem has now returned for us.  It
has exactly the same symptoms as my original post (including the fact that
a snapshot does not have the issue).

I am open to any suggestions for troubleshooting this further!

Thank you,
Jeremy
