Hello!

We have a mid-sized database on RDS running PostgreSQL 10.1 (32 cores, 240 GB
RAM, 5TB total disk space, 20k PIOPS) with several large (100GB+, tens of
millions of rows) tables that use GIN indexes for full-text search. At times
we need to index very large (hundreds of pages) documents, so our tables
contain a mix of small (tens of tokens) and very large (hundreds of thousands
of tokens, close to the 1MB tsvector limit) documents. All of our GIN indexes
have fastupdate turned off; we found that enabling it led to significant
blocking, and we get better average performance with it off. We've put a lot
of effort into tuning the database over the last several years, to the point
where read and write performance on these tables is acceptable.
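
For reference, the indexes are configured roughly like this (table, column,
and index names are illustrative, not our actual schema):

    -- Illustrative GIN index over a tsvector column, with fastupdate disabled
    CREATE INDEX documents_tsv_idx ON documents
        USING gin (tsv)
        WITH (fastupdate = off);

    -- For an existing index:
    ALTER INDEX documents_tsv_idx SET (fastupdate = off);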

One recurring and predictable issue we have experienced for multiple years is
that inserting or updating rows in any table with a GIN index causes extremely
large drops in free disk space. For example, inserting 10k rows with a total
size of 10GB can temporarily cost several hundred gigabytes of free disk space
over 2-3 hours (and possibly more; we try to keep a 10-15% buffer of free disk
space, so that often represents almost all of the available space). Once we
stop the operation, free disk space rapidly recovers, which leads us to
believe the space is being consumed by logs or some kind of temporary table.
Our work_mem and maintenance_work_mem settings are fairly large (12GB and
62GB, respectively). The database's size on disk scarcely budges during this
process.
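
To illustrate what we mean by a temp-file check, the sketch below reads the
cumulative temp-file counters; we're not certain these counters cover whatever
is actually consuming the space, since they only track files spilled by
queries:

    -- Cumulative temporary-file usage for the current database
    SELECT datname,
           temp_files,
           pg_size_pretty(temp_bytes) AS temp_size
      FROM pg_stat_database
     WHERE datname = current_database();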

Unfortunately, since we're on RDS we can't ssh into the instance to see which
files are so large, and none of the logs we can see (including the WAL) are
large enough to explain the usage. Any suggestions about where to look for the
cause of this problem, or about settings we could tune or changes we could
make to stop it, would be greatly appreciated.
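
In case it matters, the only related setting we know of is log_temp_files,
which we assume we would have to change through the DB parameter group on RDS
(0 logs every temporary file, -1 disables the logging):

    -- Check whether temp-file logging is currently enabled
    SHOW log_temp_files;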

Thank you!
