Hello, may I ask you for feedback on this? Thanks a lot

On Tue, Jan 21, 2020 at 5:14 PM Nicola Contu <nicola.co...@gmail.com> wrote:
> We also reverted this param:
>
> cmdv3=# show max_parallel_workers_per_gather;
>  max_parallel_workers_per_gather
> ---------------------------------
>  2
> (1 row)
>
> It was set to 8.
>
> On Tue, Jan 21, 2020 at 4:06 PM Nicola Contu <nicola.co...@gmail.com> wrote:
>
>> Hey Thomas,
>> after a few months, we started having this issue again.
>> So we reverted the work_mem parameter to 600MB instead of 2GB.
>> But the issue is still there. A query hit a segmentation fault, the DB
>> went into recovery mode, and our app went read-only for a few minutes.
>>
>> I understand we can increase max_connections so we can have many more
>> segments.
>>
>> My question is: is there a way to find out how many segments we have
>> reached?
>> Currently we have max_connections = 220, so by your formula, 64 + 2 * 220,
>> we have about 500 shared segments.
>> We would like to increase that number to 300 or 400, but it would be
>> great to know if there is a way to make sure this will solve the issue,
>> as it requires a restart of the service.
>>
>> I know you were also talking about redesigning this part of PostgreSQL.
>> Do you know if anything has changed in any of the newer versions after
>> 11.5?
>>
>> Thanks a lot,
>> Nicola
>>
>> On Thu, Sep 12, 2019 at 1:01 AM Thomas Munro <thomas.mu...@gmail.com> wrote:
>>
>>> On Wed, Sep 11, 2019 at 11:20 PM Nicola Contu <nicola.co...@gmail.com>
>>> wrote:
>>> > If the error persists I will try to revert the work_mem.
>>> > Thanks a lot
>>>
>>> Hi Nicola,
>>>
>>> It's hard to say exactly what the cause of the problem is in your case
>>> and how to avoid it without knowing what your query plans look like.
>>> PostgreSQL allows 64 + 2 * max_connections segments to exist at a time,
>>> and it needs a number of them that depends on work_mem (in the case of
>>> Parallel Hash Join and Parallel Bitmap Index Scan), and also on the
>>> number of Gather nodes that appear in the plan, which in some unusual
>>> cases can result from partitioning.
>>>
>>> I've seen people hit this error by running a lot of parallel queries
>>> concurrently. If that's the cause, then you can definitely get some
>>> relief by turning work_mem down, or by turning max_connections up
>>> (even though you don't want to allow more connections -- because it
>>> influences the formula for deciding on the DSM segment limit). We
>>> should probably adjust some of the internal constants to give us more
>>> slots, to avoid that problem, as discussed here:
>>>
>>> https://www.postgresql.org/message-id/flat/CA%2BhUKGL6H2BpGbiF7Lj6QiTjTGyTLW_vLR%3DSn2tEBeTcYXiMKw%40mail.gmail.com
>>>
>>> I've also seen people hit this error by somehow coming up with plans
>>> that had a very large number of Gather nodes in them, corresponding to
>>> partitions; that's probably a bad plan (it'd presumably be better to
>>> terminate parallelism higher up in the plan, but these plans do seem
>>> to exist in the wild; I don't recall exactly why). I think we need a
>>> bit of a redesign so that if there are multiple Gather nodes, they
>>> share the same main DSM segment, instead of blowing through this
>>> limit.
>>>
>>> --
>>> Thomas Munro
>>> https://enterprisedb.com
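
For reference, the segment limit Thomas describes can be computed directly on a running server. A minimal sketch using the 64 + 2 * max_connections formula from his reply; note that the server actually sizes the control table from MaxBackends, which also counts background and autovacuum workers, so the real limit can be a bit higher:

    SELECT 64 + 2 * current_setting('max_connections')::int
        AS approx_dsm_segment_limit;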
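On Nicola's question about observing how many segments have been reached: there is no statistics view for this in 11.x, but a rough sketch, assuming Linux with dynamic_shared_memory_type = posix (the default there), is to count the backing files, which are named PostgreSQL.<number> in /dev/shm. Run as a superuser, since pg_ls_dir on a path outside the data directory requires it:

    -- Each live dynamic shared memory segment is one file in /dev/shm.
    SELECT count(*) AS dsm_segments_in_use
    FROM pg_ls_dir('/dev/shm') AS f
    WHERE f LIKE 'PostgreSQL.%';

The count fluctuates as parallel queries start and finish, so sampling it under peak load gives the most useful picture of how close the system gets to the limit.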
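Since segment demand also scales with the number of Gather nodes, it can help to EXPLAIN the heaviest queries and count them. A hypothetical example; the tables are placeholders, not from the thread:

    EXPLAIN (COSTS OFF)
    SELECT o.customer_id, count(*)
    FROM orders o
    JOIN order_items i ON i.order_id = o.id
    GROUP BY o.customer_id;
    -- Every "Gather" or "Gather Merge" line in the output marks a point
    -- where the executor allocates dynamic shared memory for parallel
    -- workers; a plan with one Gather per partition multiplies the demand.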
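Finally, if raising max_connections is the chosen remedy, it can be staged ahead of a maintenance window. A sketch assuming superuser access, using the value Nicola mentions; max_connections is a postmaster-level setting, so it only takes effect after a restart:

    ALTER SYSTEM SET max_connections = 300;
    -- Written to postgresql.auto.conf; pg_reload_conf() is not enough
    -- for this parameter, a full server restart is required.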