Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

2018-05-15 Thread Maxim Boguk
Hi everyone,

I just got the same issue on 9.6.8:

2018-05-15 11:52:01 MSK 33558 @ from  [vxid:317/92895305 txid:0] [] ERROR:
found xmin 2808837517 from before relfrozenxid 248712603
2018-05-15 11:52:01 MSK 33558 @ from  [vxid:317/92895305 txid:0] []
CONTEXT:  automatic vacuum of table "template0.pg_catalog.pg_authid"

Additional details:

1) bt_index_check reports no errors on either index.

2) SELECT pg_truncate_visibility_map('pg_authid'::regclass); followed by
VACUUM FREEZE doesn't help.

3) In total, 6 rows are affected:
template1=# select oid, ctid, xmin, xmax, cmin, cmax from pg_authid where
xmin::text::bigint > (select relfrozenxid::text::bigint from pg_class where
relname='pg_authid');
    oid    |  ctid  |    xmin    | xmax | cmin | cmax
-----------+--------+------------+------+------+------
 183671986 | (0,90) | 3039161773 |    0 |    0 |    0
 183106699 | (1,48) | 2576823237 |    0 |    0 |    0
 183921770 | (1,50) | 3265971811 |    0 |    0 |    0
 183921914 | (1,52) | 3266122344 |    0 |    0 |    0
 187988173 | (1,58) | 4258893789 |    0 |    0 |    0
 182424977 | (1,62) | 2808837517 |    0 |    0 |    0
(spread over two pages in total)

4) template1=# select relfrozenxid from pg_catalog.pg_class where relname =
'pg_authid';
 relfrozenxid
--------------
   2548304492


5) The rows themselves look perfectly valid and correspond to actual
database users.
6) No database/server crashes in the last few years, no disk
errors/problems.

I suspect it could be related to the combination of the vacuum
skip-locked-pages patch + freeze + shared catalogs, but I cannot prove it
yet.

Looking for a possible course of action.
Probably the simplest fix is to drop and recreate the 6 affected users, but
for now I'm willing to spend some time researching this issue.

-- 
Maxim Boguk
Senior Postgresql DBA
http://dataegret.com/

Phone RU: +7  985 433 
Phone UA: +380 99 143 
Phone AU: +61  45 218 5678

LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk

"Doctor, you advised me not to do this, but why does it still hurt when I
do it again?"


Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

2018-05-22 Thread Maxim Boguk
Hi Andres,


> ​
> > Looking for possible course of action.
> > Probably simplest fix - drop and recreate these 6 affected users, but so
> > far I willing spent some time research into this issue.
>
> Could you use pageinspect to get the infomasks for the affected tuples?
>
> Greetings,
>
> Andres Freund
>


For example:

postgres=# vacuum pg_catalog.pg_authid;
ERROR:  found xmin 2894889518 from before relfrozenxid 248712603

select ctid, xmin, xmax, cmin, cmax from pg_catalog.pg_authid where
xmin::text::bigint=2894889518;
  ctid  |    xmin    | xmax | cmin | cmax
--------+------------+------+------+------
 (1,26) | 2894889518 |    0 |    0 |    0


postgres=# SELECT * FROM heap_page_items(get_raw_page('pg_authid', 1))
where t_ctid::text='(1,26)';
-[ RECORD 1 ]---
lp  | 26
lp_off  | 4656
lp_flags| 1
lp_len  | 144
t_xmin  | 2894889518
t_xmax  | 0
t_field3| 0
t_ctid  | (1,26)
t_infomask2 | 32779
t_infomask  | 10507
t_hoff  | 32
t_bits  | 1100
t_oid   | 189787727
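For readers decoding the masks by hand: t_infomask is a plain bit field, so
a small sketch suffices (bit values copied from PostgreSQL's
src/include/access/htup_details.h; only a subset of the flags is listed):

```python
# Subset of t_infomask bit values from src/include/access/htup_details.h.
INFOMASK_BITS = {
    0x0001: "HEAP_HASNULL",
    0x0002: "HEAP_HASVARWIDTH",
    0x0004: "HEAP_HASEXTERNAL",
    0x0008: "HEAP_HASOID",
    0x0100: "HEAP_XMIN_COMMITTED",
    0x0200: "HEAP_XMIN_INVALID",
    0x0400: "HEAP_XMAX_COMMITTED",
    0x0800: "HEAP_XMAX_INVALID",
    0x1000: "HEAP_XMAX_IS_MULTI",
    0x2000: "HEAP_UPDATED",
}

def decode_infomask(mask: int) -> list:
    """Return the names of the flags set in a t_infomask value."""
    return [name for bit, name in sorted(INFOMASK_BITS.items()) if mask & bit]

print(decode_infomask(10507))
# ['HEAP_HASNULL', 'HEAP_HASVARWIDTH', 'HEAP_HASOID',
#  'HEAP_XMIN_COMMITTED', 'HEAP_XMAX_INVALID', 'HEAP_UPDATED']
```

Notably, HEAP_XMIN_COMMITTED and HEAP_XMIN_INVALID are not both set, i.e.
the tuple was never marked frozen.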

Any new role created in the DB is instantly affected by this issue.


At the same time:

 select relfrozenxid from pg_class where relname='pg_authid';
 relfrozenxid
--------------
   2863429136

So it's interesting where the value in "from before relfrozenxid 248712603"
comes from.
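Whatever its origin, the reported value does make the check fire: XID
comparisons wrap modulo 2^32, so against relfrozenxid 248712603 the tuple's
xmin 2894889518 counts as being in the past. A simplified sketch of the
comparison (modeled on TransactionIdPrecedes in transam.c; it ignores the
special permanent XIDs below FirstNormalTransactionId):

```python
def transaction_id_precedes(id1: int, id2: int) -> bool:
    """True if id1 logically precedes id2 in circular 32-bit XID space.
    Simplified: real PostgreSQL special-cases permanent XIDs < 3."""
    diff = (id1 - id2) & 0xFFFFFFFF
    return diff >= 0x80000000  # i.e. (int32)(id1 - id2) < 0

print(transaction_id_precedes(2894889518, 248712603))  # True -> error fires
print(transaction_id_precedes(248712603, 2894889518))  # False
```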





Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

2018-05-22 Thread Maxim Boguk
On Tue, May 22, 2018 at 9:47 PM, Andres Freund  wrote:

> Hi,
>
> On 2018-05-22 21:30:43 +0300, Maxim Boguk wrote:
> > ​For sample:
> >
> > postgres=# vacuum pg_catalog.pg_authid;
> > ERROR:  found xmin 2894889518 from before relfrozenxid 248712603
> >
> > select ctid, xmin, xmax, cmin, cmax from pg_catalog.pg_authid where
> > xmin::text::bigint=2894889518;
> >   ctid  |    xmin    | xmax | cmin | cmax
> > --------+------------+------+------+------
> >  (1,26) | 2894889518 |0 |0 |0
> >
> >
> > postgres=# SELECT * FROM heap_page_items(get_raw_page('pg_authid', 1))
> > where t_ctid::text='(1,26)';
> > -[ RECORD 1 ]---
> > lp  | 26
> > lp_off  | 4656
> > lp_flags| 1
> > lp_len  | 144
> > t_xmin  | 2894889518
> > t_xmax  | 0
> > t_field3| 0
> > t_ctid  | (1,26)
> > t_infomask2 | 32779
> > t_infomask  | 10507
> > t_hoff  | 32
> > t_bits  | 1100
> > t_oid   | 189787727
> >
>
> So this row has, if I didn't screw up decoding, the following infomask
> bits set:
> HEAP_HASNULL
> HEAP_HASVARWIDTH
> HEAP_HASOID
> HEAP_XMIN_COMMITTED
> HEAP_XMAX_INVALID
> HEAP_UPDATED
>
> So it's not been previously frozen, which I was wondering about.
>
>
> > Which indeed makes it
> > Any new role created in DB instantly affected by this issue.
>
> What's txid_current()?
>
>
> >
> > In the same time:
> >
> >  select relfrozenxid from pg_class where relname='pg_authid';
> >  relfrozenxid
> > --
> >2863429136
> >
> > So it's interesting where value of " from before relfrozenxid 248712603"
> > come from.
>
> Hm. That's indeed odd.  Could you get a backtrace of the error with "bt
> full" of the error?
>
> Greetings,
>
> Andres Freund
>

select txid_current();
 txid_current
--------------
  41995913769
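(As a side note, the 64-bit txid_current() result carries the wraparound
epoch in its high 32 bits; a quick sketch of splitting it, with a
hypothetical helper name:)

```python
def split_txid(txid64: int) -> tuple:
    """Split a 64-bit txid into (epoch, low 32-bit xid) - a sketch."""
    return txid64 >> 32, txid64 & 0xFFFFFFFF

print(split_txid(41995913769))  # (9, 3341208105)
```

The epoch of 9 matches the "Latest checkpoint's NextXID: 9:…" line in the
pg_controldata output.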
About the gdb bt - it's tricky because it is the mission-critical master DB
of a huge project.
I'll try to promote a backup replica and check whether the issue persists
there; if yes, we will have our playground for a while, but it will take
some time to arrange.




Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

2018-05-22 Thread Maxim Boguk
On Tue, May 22, 2018 at 10:30 PM, Andres Freund  wrote:

> Hi,
>
> On 2018-05-22 22:18:15 +0300, Maxim Boguk wrote:
> > On Tue, May 22, 2018 at 9:47 PM, Andres Freund 
> wrote:
> > > >  select relfrozenxid from pg_class where relname='pg_authid';
> > > >  relfrozenxid
> > > > --
> > > >2863429136
>
> > ​select txid_current();
> >  txid_current
> > --
> >   41995913769
>
> So that's an xid of 3341208114, if you leave the epoch out. What's
> ​​
> autovacuum_freeze_max_age set to in that cluster?
>

postgres=# show autovacuum_freeze_max_age;
 autovacuum_freeze_max_age
---------------------------
 2
(the default value, I think)
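For what it's worth, the age of the catalog's relfrozenxid (2863429136)
relative to the current low-order XID is far past the 200M default limit,
which is why autovacuum keeps attempting an aggressive freeze. A sketch of
the age computation (using txid_current() from earlier taken modulo 2^32;
that reading is an assumption on my part):

```python
def xid_age(current_xid: int, frozen_xid: int) -> int:
    """How many XIDs old frozen_xid is, in circular 32-bit space (sketch)."""
    return (current_xid - frozen_xid) & 0xFFFFFFFF

age = xid_age(41995913769 & 0xFFFFFFFF, 2863429136)
print(age, age > 200_000_000)  # ~478M XIDs old, well past the default limit
```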



> Can you show pg_controldata output, and
> ​​
> relminmxid from that cluster?
>

postgres@db:~$ /usr/lib/postgresql/9.6/bin/pg_controldata -D /var/lib/postgresql/9.6/main
pg_control version number:            960
Catalog version number:               201608131
Database system identifier:           6469368654711450114
Database cluster state:               in production
pg_control last modified:             Tue 22 May 2018 10:20:14 PM MSK
Latest checkpoint location:           CCB5/F9C37950
Prior checkpoint location:            CCB0/43F316B0
Latest checkpoint's REDO location:    CCB1/6706BD88
Latest checkpoint's REDO WAL file:    0001CCB10067
Latest checkpoint's TimeLineID:       1
Latest checkpoint's PrevTimeLineID:   1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:          9:3341161759
Latest checkpoint's NextOID:          190071899
Latest checkpoint's NextMultiXactId:  59416233
Latest checkpoint's NextMultiOffset:  215588532
Latest checkpoint's oldestXID:        2814548646
Latest checkpoint's oldestXID's DB:   16400
Latest checkpoint's oldestActiveXID:  3341161759
Latest checkpoint's oldestMultiXid:   54264778
Latest checkpoint's oldestMulti's DB: 16400
Latest checkpoint's oldestCommitTsXid: 2814548646
Latest checkpoint's newestCommitTsXid: 3341161758
Time of latest checkpoint:            Tue 22 May 2018 10:05:16 PM MSK
Fake LSN counter for unlogged rels:   0/1
Minimum recovery ending location:     0/0
Min recovery ending loc's timeline:   0
Backup start location:                0/0
Backup end location:                  0/0
End-of-backup record required:        no
wal_level setting:                    replica
wal_log_hints setting:                on
max_connections setting:              2000
max_worker_processes setting:         8
max_prepared_xacts setting:           0
max_locks_per_xact setting:           64
track_commit_timestamp setting:       on
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Size of a large-object chunk:         2048
Date/time type storage:               64-bit integers
Float4 argument passing:              by value
Float8 argument passing:              by value
Data page checksum version:           0
postgres=# select datname, datfrozenxid, datminmxid from pg_database order
by datname;
  datname  | datfrozenxid | datminmxid
-----------+--------------+------------
 **        |   2815939794 |   54265194
 postgres  |   2863429136 |   54280819
 template0 |   3148297669 |   59342106
 template1 |   2816765546 |   59261794





>
> I might be daft here, but it's surely curious that the relfrozenxid from
> the error and pg_catalog are really different (number of digits):
> catalog: 2863429136
> error:248712603
>
>
> > ​About gdb bt - it's tricky because it is mission critical master db of
> > huge project.
> > I'll will try promote backup replica and check is issue persist there and
> > if yes - we will have our playground for a while, but it will require
> > sometime to arrange.​
>
> You should be ok to just bt that in the running cluster, but I
> definitely understand if you don't want to do that...  I'd appreciate if
> you set up the a playground, because this seems like something that'll
> reappear.
>

My gdb kung-fu is exceptionally rusty now, so I'd better play in a sandbox.
And if the error doesn't exist on a freshly promoted replica, that will
give us useful info as well.



Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

2018-05-24 Thread Maxim Boguk
>
>
> > ​About gdb bt - it's tricky because it is mission critical master db of
> > huge project.
> > I'll will try promote backup replica and check is issue persist there and
> > if yes - we will have our playground for a while, but it will require
> > sometime to arrange.​
>
> You should be ok to just bt that in the running cluster, but I
> definitely understand if you don't want to do that...  I'd appreciate if
> you set up the a playground, because this seems like something that'll
> reappear.
>
> Greetings,
>
> Andres Freund
>

OK, this issue is reproducible on the promoted replica.
So now I have my playground.
Will provide gdb info in a few hours.




Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

2018-05-24 Thread Maxim Boguk
On Thu, May 24, 2018 at 12:38 PM, Maxim Boguk  wrote:

>
>
>>
>> > ​About gdb bt - it's tricky because it is mission critical master db of
>> > huge project.
>> > I'll will try promote backup replica and check is issue persist there
>> and
>> > if yes - we will have our playground for a while, but it will require
>> > sometime to arrange.​
>>
>> You should be ok to just bt that in the running cluster, but I
>> definitely understand if you don't want to do that...  I'd appreciate if
>> you set up the a playground, because this seems like something that'll
>> reappear.
>>
>> Greetings,
>>
>> Andres Freund
>>
>
> ​bt full :
>

​#0  errmsg_internal (fmt=0x555b62e6eb70 "found xmin %u from before
relfrozenxid %u") at
/build/postgresql-9.6-9VrjLn/postgresql-9.6-9.6.8/build/../src/backend/utils/error/elog.c:828
edata = 
__func__ = "errmsg_internal"
#1  0x555b62ad1cb9 in heap_prepare_freeze_tuple (tuple=,
relfrozenxid=relfrozenxid@entry=248720453,
relminmxid=relminmxid@entry=53644256,
cutoff_xid=3485221679,
cutoff_multi=, frz=frz@entry=0x555b640d1988,
totally_frozen_p=0x7ffca32c0e90 "\001")
at
/build/postgresql-9.6-9VrjLn/postgresql-9.6-9.6.8/build/../src/backend/access/heap/heapam.c:6685
changed = 
freeze_xmax = 0 '\000'
xid = 
totally_frozen = 
__func__ = "heap_prepare_freeze_tuple"
#2  0x555b62bfd2b5 in lazy_scan_heap (aggressive=0 '\000', nindexes=2,
Irel=0x555b64095948, vacrelstats=, options=26,
onerel=0x555b64029498)
at
/build/postgresql-9.6-9VrjLn/postgresql-9.6-9.6.8/build/../src/backend/commands/vacuumlazy.c:1090
tuple_totally_frozen = 1 '\001'
itemid = 0x7f2b7bc5d4fc
buf = 18138842
page = 
offnum = 26
maxoff = 26
hastup = 1 '\001'
nfrozen = 0
freespace = 
all_frozen = 1 '\001'
tupgone = 0 '\000'
prev_dead_count = 0
all_visible_according_to_vm = 0 '\000'
all_visible = 1 '\001'
has_dead_tuples = 0 '\000'
visibility_cutoff_xid = 3490221678
relname = 0x555b640296a8 "pg_authid"
relminmxid = 53644256
tups_vacuumed = 0
indstats = 0x555b64095928
relfrozenxid = 248720453
skipping_blocks = 0 '\000'
frozen = 0x555b640d1988
initprog_val = {1, 2, 582}
vacuumed_pages = 0
num_tuples = 80
nkeep = 0
ru0 = {tv = {tv_sec = 1527175061, tv_usec = 739743}, ru = {ru_utime
= {tv_sec = 0, tv_usec = 6}, ru_stime = {tv_sec = 1, tv_usec = 416000},
{ru_maxrss = 9704,
  __ru_maxrss_word = 9704}, {ru_ixrss = 0, __ru_ixrss_word =
0}, {ru_idrss = 0, __ru_idrss_word = 0}, {ru_isrss = 0, __ru_isrss_word =
0}, {ru_minflt = 33982,
  __ru_minflt_word = 33982}, {ru_majflt = 0, __ru_majflt_word =
0}, {ru_nswap = 0, __ru_nswap_word = 0}, {ru_inblock = 0, __ru_inblock_word
= 0}, {ru_oublock = 8,
  __ru_oublock_word = 8}, {ru_msgsnd = 0, __ru_msgsnd_word =
0}, {ru_msgrcv = 0, __ru_msgrcv_word = 0}, {ru_nsignals = 0,
__ru_nsignals_word = 0}, {ru_nvcsw = 38446,
  __ru_nvcsw_word = 38446}, {ru_nivcsw = 2, __ru_nivcsw_word =
2}}}
vmbuffer = 8763411
empty_pages = 0
nunused = 0
i = 
next_unskippable_block = 
buf = {data = 0x1 , len
= -1557393520, maxlen = 32764, cursor = -1557393616}
nblocks = 2
blkno = 
tuple = {t_len = 144, t_self = {ip_blkid = {bi_hi = 0, bi_lo = 1},
ip_posid = 26}, t_tableOid = 1260, t_data = 0x7f2b7bc5e6b0}
initprog_index = {0, 1, 5}
#3  lazy_vacuum_rel (onerel=onerel@entry=0x555b64029498,
options=options@entry=1, params=params@entry=0x7ffca32c11b0,
bstrategy=)
at
/build/postgresql-9.6-9VrjLn/postgresql-9.6-9.6.8/build/../src/backend/commands/vacuumlazy.c:253
vacrelstats = 
Irel = 0x555b64095948
nindexes = 2
ru0 = {tv = {tv_sec = 93850993708032, tv_usec = 23936}, ru =
{ru_utime = {tv_sec = 128, tv_usec = 93851007694584}, ru_stime = {tv_sec =
0, tv_usec = 0}, {ru_maxrss = 140723046059424,
  __ru_maxrss_word = 140723046059424}, {ru_ixrss =
93850989549501, __ru_ixrss_word = 93850989549501}, {ru_idrss = 16,
__ru_idrss_word = 16}, {ru_isrss = 93851007694584,
  __ru_isrss_word = 93851007694584}, {ru_minflt =
140723046059472, __ru_minflt_word = 140723046059472}, {ru_majflt =
93850989559601, __ru_majflt_word = 93850989559601}, {
  ru_nswap = 93850994010032, __ru_nswap_word = 93850994010032},
{ru_inblock = 139833259949608, __ru_inblock_word = 139833259949608},
{ru_oublock = 93850994010032,
  __ru_oublock_word = 9385099401003

Re: Missing query plan for auto_explain.

2022-09-05 Thread Maxim Boguk
On Tue, Aug 30, 2022 at 1:38 PM Matheus Martin <
matheus.mar...@voidbridge.com> wrote:

> Our Postgres recently started reporting considerably different execution 
> times for the same query. When executed from our JDBC application the 
> Postgres logs report an average execution time of 1500 ms but when the query 
> is manually executed through `psql` it doesn't take longer than 50 ms.
>
> With a view to investigate discrepancies in the plan we enabled 
> `auto_explain` in `session_preload_libraries` with 
> `auto_explain.log_min_duration = '1s'`. All application servers were bounced 
> to ensure new connections were created and picked up the changes. However 
> this trouble query does not have an explain plan printed, even when its 
> execution time exceeds the threshold (other queries do though).
>
> Does anyone have ideas of why the explain plan is not being printed?
>
> Sample log entry for trouble query executed from application:
> ```
> Aug 26 09:11:33 db-931 postgres[8106]: [66-1] 2022-08-26 09:11:33 GMT [8106]: 
> [5-1] db=betwave,user=betwave_app_readonly_user LOG:  duration: 1423.481 ms  
> bind :
>
>
My understanding of how auto_explain works: it only covers execution
calls, but in your case the duration of 1423.481 ms is on the BIND call,
before query execution.
At least in my understanding, auto_explain cannot work and will not help
in the case of a slow BIND call (because that is when the query is planned
but not executed).
According to the documentation:
"Query planning typically occurs when the Bind message is processed. If the
prepared statement has no parameters, or is executed repeatedly, the server
might save the created plan and re-use it during subsequent Bind messages
for the same prepared statement."
It's hard to say what the reason for the slow planning is, but one likely
reason (there could be others) is JIT work. Do you have JIT enabled?




Re: Missing query plan for auto_explain.

2022-09-09 Thread Maxim Boguk
On Thu, Sep 8, 2022 at 1:18 PM Matheus Martin 
wrote:

> We do have JIT enabled `jit=on` with `jit_above_cost=10`.
>
> I am sorry but I don't quite understand what role JIT plays in
> this situation with `auto_explain`. Could you please elaborate on that?
>
>
In your log the time is spent not during the execution stage (where
auto_explain can help) but during the bind/planning stage (i.e. generation
of the plan).
So you have a problem not with slow query execution but with slow query
planning, and auto_explain cannot help in that case.
JIT is one possible explanation of a planning-stage slowdown.

Can you run explain (analyze, costs, buffers, timing) of your query?
Then check how long the planning stage took and whether JIT was used
(and how much time was spent in JIT if it was).
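Something along these lines (a sketch; substitute the real query text, and
compare runs with JIT on and off):

```sql
-- Run the problem query by hand and inspect "Planning Time" plus the
-- "JIT" section of the output; then repeat with JIT disabled to compare.
EXPLAIN (ANALYZE, COSTS, BUFFERS, TIMING) SELECT /* your query here */ 1;
SET jit = off;
EXPLAIN (ANALYZE, COSTS, BUFFERS, TIMING) SELECT /* your query here */ 1;
```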




Unexpectedly huge memory usage (over 180x of work_mem) during hash join... confirmed by TopMemoryContext results (postgresql 13.10)

2023-04-21 Thread Maxim Boguk
hit=1795287, temp written=247932
   ->  Parallel Seq Scan on
vacancy v  (cost=0.00..414267.15 rows=23473845 width=8) (actual
time=0.011..5221.328 rows=18198520 loops=4)
 Buffers: shared
hit=1795287
   ->  Parallel Hash
 (cost=297333.92..297333.92 rows=15801812 width=8) (actual
time=4352.140..4352.141 rows=12280833 loops=4)
 Buckets: 1048576  Batches: 128
 Memory Usage: 23296kB
 Buffers: shared hit=1393158,
temp written=167644
 ->  Parallel Seq Scan on
account_service_package_entry aspe  (cost=0.00..297333.92 rows=15801812
width=8) (actual time=0.015..2520.045 rows=12280833 loops=4)
   Buffers: shared
hit=1393158
 ->  Parallel Hash
 (cost=258846.24..258846.24 rows=11429325 width=4) (actual
time=3424.467..3424.468 rows=8908616 loops=4)
   Buckets: 1048576  Batches: 64
 Memory Usage: 30080kB
   Buffers: shared hit=900350, temp
written=103252
   ->  Parallel Seq Scan on
employer_service es  (cost=0.00..258846.24 rows=11429325 width=4) (actual
time=0.010..2087.361 rows=8908616 loops=4)
 Filter: (price = 0)
 Rows Removed by Filter: 3624132
 Buffers: shared hit=900350
 Planning:
   Buffers: shared hit=236
 Planning Time: 25.494 ms
 Execution Time: 541359.085 ms

There are no big misestimations in rows, but during moments of high memory
usage each parallel worker used over 5GB of RAM (so over 20GB of RAM in
total: 3 workers + leader).

Memory context results of single worker:

TopMemoryContext: 3357032 total in 16 blocks; 46064 free (16 chunks);
3310968 used
...
  TopPortalContext: 8192 total in 1 blocks; 7656 free (0 chunks); 536 used
PortalHoldContext: 24632 total in 2 blocks; 7384 free (0 chunks); 17248
used
PortalContext: 540672 total in 69 blocks; 8568 free (10 chunks); 532104
used:
  ExecutorState: 1082567568 total in 124072 blocks; 4073608 free (7018
chunks); 1078493960 used
HashTableContext: 74808 total in 5 blocks; 25008 free (3 chunks);
49800 used
  HashBatchContext: 8192 total in 1 blocks; 7928 free (0 chunks);
264 used
HashTableContext: 74808 total in 5 blocks; 25056 free (3 chunks);
49752 used
  HashBatchContext: 8192 total in 1 blocks; 7928 free (0 chunks);
264 used
HashTableContext: 40960 total in 3 blocks; 12328 free (6 chunks);
28632 used
  HashBatchContext: 8192 total in 1 blocks; 7928 free (0 chunks);
264 used
HashTableContext: 4156883208 total in 124074 blocks; 146264 free
(30 chunks); 4156736944 used
  HashBatchContext: 8192 total in 1 blocks; 7928 free (0 chunks);
264 used
HashTableContext: 8192 total in 1 blocks; 4376 free (7 chunks);
3816 used
  HashBatchContext: 8192 total in 1 blocks; 7928 free (0 chunks);
264 used
ExprContext: 8192 total in 1 blocks; 7928 free (0 chunks); 264 used
...
ExprContext: 8192 total in 1 blocks; 7928 free (0 chunks); 264 used
  Relcache by OID: 16384 total in 2 blocks; 1424 free (2 chunks); 14960 used
  CacheMemoryContext: 1163704 total in 14 blocks; 197832 free (0 chunks);
965872 used
relation rules: 24576 total in 6 blocks; 4896 free (1 chunks); 19680
used: pg_settings
index info: 2048 total in 2 blocks; 528 free (1 chunks); 1520 used:
pg_toast_2619_index

48 more child contexts containing 91136 total in 83 blocks; 27632 free
(45 chunks); 63504 used
  PrivateRefCount: 8192 total in 1 blocks; 2616 free (0 chunks); 5576 used
  MdSmgr: 8192 total in 1 blocks; 3880 free (2 chunks); 4312 used
  LOCALLOCK hash: 32768 total in 3 blocks; 16824 free (8 chunks); 15944 used
  Timezones: 104120 total in 2 blocks; 2616 free (0 chunks); 101504 used
  ErrorContext: 8192 total in 1 blocks; 7928 free (4 chunks); 264 used
Grand total: 5245755200 bytes in 248596 blocks; 4995280 free (7224 chunks);
5240759920 used

Also, during query execution there was a huge number of small temp files:
find ~postgres/13/main/base/pgsql_tmp/ -type f | wc -l
1459383


There are two questions:
1) Is this behavior normal/expected?
2) Is there any way to limit the memory usage of such queries to prevent
OOM at random times?
(Memory usage of almost 200x work_mem surprised me a lot.)
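For context on (1): work_mem bounds only each batch's in-memory hash table,
not the per-batch bookkeeping. A rough sketch of how the batch count is
chosen (heavily simplified from ExecChooseHashTableSize; the 3.5 GB
inner-size and 32 MB work_mem figures are illustrative, not measured):

```python
def choose_nbatch(inner_bytes: int, work_mem_bytes: int) -> int:
    """Pick a power-of-two batch count so each batch's hash table roughly
    fits in work_mem. Heavily simplified sketch: the real code also accounts
    for tuple overhead and bucket arrays, and can double nbatch again at run
    time, multiplying the per-batch buffer/bookkeeping overhead."""
    nbatch = 1
    while inner_bytes // nbatch > work_mem_bytes:
        nbatch *= 2
    return nbatch

print(choose_nbatch(3_500_000_000, 32 * 1024 * 1024))  # 128
```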

PS: after a crash, recovery of a database with a few million files in
pgsql_tmp takes ages (however, this is fixed in newer versions, so it's not
a critical issue).




Re: Query planner / Analyse statistics bad estimate rows=1 with maximum statistics 10000 on PostgreSQL 10.2

2018-12-23 Thread Maxim Boguk
 Cond: (school_id = 36)
>  Planning time: 0.334 ms
>  Execution time: 6.542 ms
> (4 rows)
>
> db=> select school_id, count(*) from common_student  group by school_id
> order by count(*)  limit 6 ;
>  school_id | count
> ---+---
>  1 |50
> 88 |   161
> 53 |   252
> 94 |   422
> 31 |   434
> 68 |   454
> (6 rows)
>
> dvpjxbzc=> select school_id, count(*) from common_student where school_id
> = 36  group by school_id ;
>  school_id | count
> ---+---
> 36 |  1388
> (1 row)
>
> db=> explain analyse select * from common_student  where school_id = 1 ;
> QUERY PLAN
>
>
> --
>  Bitmap Heap Scan on common_student  (cost=4.91..243.76 rows=64 width=385)
> (actual time=0.128..0.267 rows=50 loops=1)
>Recheck Cond: (school_id = 1)
>Heap Blocks: exact=16
>->  Bitmap Index Scan on "unique common_student"  (cost=0.00..4.90
> rows=64 width=0) (actual time=0.110..0.110 rows=50 loops=1)
>  Index Cond: (school_id = 1)
>  Planning time: 0.177 ms
>  Execution time: 0.414 ms
> (7 rows)
>
> db=> select VERSION();
>version
>
>
> -
>  PostgreSQL 10.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.2,
> 64-bit
> (1 row)
>
>
