On Tue, Jun 26, 2018 at 11:40 AM, Ivar Fredriksen <ivar.fredrik...@gmail.com
> wrote:

> A single large query is able to spend all the system memory (as reported
> by top), and the oom_reaper kills postgres. See bottom of email for an
> example query and logs.
>
>
>
> Expected behavior would be that postgres is not killed and the query is
> limited to the shared_buffer memory, potentially failing, but not killing
> the postgres process.
>

Then your expectations are completely wrong. shared_buffers have nothing to
do with limiting the memory of a query. shared_buffers set the size of the
PostgreSQL cache, not the working set. That's controlled by work_mem (see
below).



> Do I have some fundamental flaw in my understanding of this, or is there
> some sort of configuration that should be in place that is missing? The
> offending sample query has been
>
> rewritten to not use joins, and will no longer crash postgres. I am not
> asking anyone to spend a lot of time analyzing the query itself, it is just
> provided as an example for when the problem will occur.
>
> My question is more in a broader sense why one query is eating through all
> system memory, and is there anything I can do to prevent this from
> happening?
>
> We have set shared_buffers to 256MB on a test-system that has 1GB memory.
> Production machines have more resources, but the errors are happening in
> exactly the same way, so I assume (perhaps wrongly) that using the test
>
> system specs and logs might give me the answers I need to figure out what
> is happening.
>
>
>
> Technical details are provided below, a big thanks to anyone who can
> provide any help or insight to this.
>
>
>
> Regards,
>
> Ivar Fredriksen
>
>
>
> PostgreSQL version number:
>
> PostgreSQL 10.4 (Debian 10.4-2.pgdg90+1) on x86_64-pc-linux-gnu, compiled
> by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit
>
> Installed with the debian packages for postgres found at: deb
> http://apt.postgresql.org/pub/repos/apt/
>
>
>
> Changes made to the settings in the postgresql.conf file:
>
>              name             |             current_setting             |
> source
>
> ------------------------------+-----------------------------
> ------------+----------------------
>
>  application_name             | psql                                    |
> client
>
>  checkpoint_completion_target | 0.9                                     |
> configuration file
>
>  client_encoding              | UTF8                                    |
> client
>
>  cluster_name                 | 10/main                                 |
> configuration file
>
>  DateStyle                    | ISO, MDY                                |
> configuration file
>
>  default_statistics_target    | 100                                     |
> configuration file
>
>  default_text_search_config   | pg_catalog.english                      |
> configuration file
>
>  dynamic_shared_memory_type   | posix                                   |
> configuration file
>
>  effective_cache_size         | 1536MB                                  |
> configuration file
>
>  external_pid_file            | /var/run/postgresql/10-main.pid         |
> configuration file
>
>  lc_messages                  | C                                       |
> configuration file
>
>  lc_monetary                  | C                                       |
> configuration file
>
>  lc_numeric                   | en_US.UTF-8                             |
> configuration file
>
>  lc_time                      | C                                       |
> configuration file
>
>  listen_addresses             | *                                       |
> configuration file
>
>  log_line_prefix              | %m [%p] %q%u@%d                         |
> configuration file
>
>  log_timezone                 | UTC                                     |
> configuration file
>
>  maintenance_work_mem         | 128MB                                   |
> configuration file
>
>  max_connections              | 100                                     |
> configuration file
>
>  max_stack_depth              | 2MB                                     |
> environment variable
>
>  max_wal_size                 | 2GB                                     |
> configuration file
>
>  min_wal_size                 | 1GB                                     |
> configuration file
>
>  port                         | 5432                                    |
> configuration file
>
>  search_path                  | "$user", public, pg_catalog             |
> configuration file
>
>  shared_buffers               | 256MB                                   |
> configuration file
>
>  ssl                          | on                                      |
> configuration file
>
>  ssl_cert_file                | /etc/ssl/certs/ssl-cert-snakeoil.pem    |
> configuration file
>
>  ssl_key_file                 | /etc/ssl/private/ssl-cert-snakeoil.key  |
> configuration file
>
>  standard_conforming_strings  | off                                     |
> configuration file
>
>  stats_temp_directory         | /var/run/postgresql/10-main.pg_stat_tmp |
> configuration file
>
>  tcp_keepalives_idle          | 0                                       |
> configuration file
>
>  TimeZone                     | UTC                                     |
> configuration file
>
>  unix_socket_directories      | /var/run/postgresql                     |
> configuration file
>
>  wal_buffers                  | 16MB                                    |
> configuration file
>
>  work_mem                     | 1310kB                                  |
> configuration file
>
> (35 rows)
>
>
>
> Operating system and version:
>
> Debian GNU/Linux 9.4 (stretch)
>
> Linux ip-172-31-30-104 4.9.0-6-amd64 #1 SMP Debian 4.9.88-1 (2018-04-29)
> x86_64 GNU/Linux
>
>
>
> Machine specs:
>
> CPU: Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz
>
> Memory: 1GB
>
> HDD: General purpose SSD from amazon ec2, only one disk
>

With only 1GB of memory, your value for work_mem is probably too high for a
query like yours.

Even with 1MB, that's 1MB per join. You seem to have about 350 joins? That
would be 350Mb right there. Plus a lot of more overhead.

In general, what's up with 350 joins?! That seems to be, ahem, quite insane.

For a query like that you probably have to reduce everything to bare
minimums -- I'd try with shared_buffers at maybe 64MB and work_mem at 100kB
and see if it passes. (it'll probably still be very slow, but that's to be
expected from a query like that)

But I'd sum it up as (1) don't do 350 joins. Just don't. And (2) if you
have to, then a machine with just 1Gb is simply too small.


//Magnus

Reply via email to