On Tue, Jun 26, 2018 at 11:40 AM, Ivar Fredriksen <ivar.fredrik...@gmail.com > wrote:
> A single large query is able to spend all the system memory (as reported > by top), and the oom_reaper kills postgres. See bottom of email for an > example query and logs. > > > > Expected behavior would be that postgres is not killed and the query is > limited to the shared_buffer memory, potentially failing, but not killing > the postgres process. > Then your expectations are completely wrong. shared_buffers have nothing to do with limiting the memory of a query. shared_buffers set the size of the PostgreSQL cache, not the working set. That's controlled by work_mem (see below). > Do I have some fundamental flaw in my understanding of this, or is there > some sort of configuration that should be in place that is missing? The > offending sample query has been > > rewritten to not use joins, and will no longer crash postgres. I am not > asking anyone to spend a lot of time analyzing the query itself, it is just > provided as an example for when the problem will occur. > > My question is more in a broader sense why one query is eating through all > system memory, and is there anything I can do to prevent this from > happening? > > We have set shared_buffers to 256MB on a test-system that has 1GB memory. > Production machines have more resources, but the errors are happening in > exactly the same way, so I assume (perhaps wrongly) that using the test > > system specs and logs might give me the answers I need to figure out what > is happening. > > > > Technical details are provided below, a big thanks to anyone who can > provide any help or insight to this. > > > > Regards, > > Ivar Fredriksen > > > > PostgreSQL version number: > > PostgreSQL 10.4 (Debian 10.4-2.pgdg90+1) on x86_64-pc-linux-gnu, compiled > by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit > > Installed with the debian packages for postgres found at: deb > http://apt.postgresql.org/pub/repos/apt/ > > > > Changes made to the settings in the postgresql.conf file: > > name | current_setting | > source > > ------------------------------+----------------------------- > ------------+---------------------- > > application_name | psql | > client > > checkpoint_completion_target | 0.9 | > configuration file > > client_encoding | UTF8 | > client > > cluster_name | 10/main | > configuration file > > DateStyle | ISO, MDY | > configuration file > > default_statistics_target | 100 | > configuration file > > default_text_search_config | pg_catalog.english | > configuration file > > dynamic_shared_memory_type | posix | > configuration file > > effective_cache_size | 1536MB | > configuration file > > external_pid_file | /var/run/postgresql/10-main.pid | > configuration file > > lc_messages | C | > configuration file > > lc_monetary | C | > configuration file > > lc_numeric | en_US.UTF-8 | > configuration file > > lc_time | C | > configuration file > > listen_addresses | * | > configuration file > > log_line_prefix | %m [%p] %q%u@%d | > configuration file > > log_timezone | UTC | > configuration file > > maintenance_work_mem | 128MB | > configuration file > > max_connections | 100 | > configuration file > > max_stack_depth | 2MB | > environment variable > > max_wal_size | 2GB | > configuration file > > min_wal_size | 1GB | > configuration file > > port | 5432 | > configuration file > > search_path | "$user", public, pg_catalog | > configuration file > > shared_buffers | 256MB | > configuration file > > ssl | on | > configuration file > > ssl_cert_file | /etc/ssl/certs/ssl-cert-snakeoil.pem | > configuration file > > ssl_key_file | /etc/ssl/private/ssl-cert-snakeoil.key | > configuration file > > standard_conforming_strings | off | > configuration file > > stats_temp_directory | /var/run/postgresql/10-main.pg_stat_tmp | > configuration file > > tcp_keepalives_idle | 0 | > configuration file > > TimeZone | UTC | > configuration file > > unix_socket_directories | /var/run/postgresql | > configuration file > > wal_buffers | 16MB | > configuration file > > work_mem | 1310kB | > configuration file > > (35 rows) > > > > Operating system and version: > > Debian GNU/Linux 9.4 (stretch) > > Linux ip-172-31-30-104 4.9.0-6-amd64 #1 SMP Debian 4.9.88-1 (2018-04-29) > x86_64 GNU/Linux > > > > Machine specs: > > CPU: Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz > > Memory: 1GB > > HDD: General purpose SSD from amazon ec2, only one disk > With only 1GB of memory, your value for work_mem is probably too high for a query like yours. Even with 1MB, that's 1MB per join. You seem to have about 350 joins? That would be 350Mb right there. Plus a lot of more overhead. In general, what's up with 350 joins?! That seems to be, ahem, quite insane. For a query like that you probably have to reduce everything to bare minimums -- I'd try with shared_buffers at maybe 64MB and work_mem at 100kB and see if it passes. (it'll probably still be very slow, but that's to be expected from a query like that) But I'd sum it up as (1) don't do 350 joins. Just don't. And (2) if you have to, then a machine with just 1Gb is simply too small. //Magnus