On Fri, Mar 2, 2018 at 6:30 PM, Darafei "Komяpa" Praliaskouski <m...@komzpa.net> wrote: > I gave this all some thought and it looks like it all could have not > happened if Postgres was able to cluster heap insertions by (id, ts) index. > We're ok with synchronuous_commit=off, so amplified write won't immediately > hit disk and can get cooled down in progress. Clustering doesn't require > perfect sorting: we need to minimize number of pages fetched, it's ok if the > pages are not consecutive on disk.
Data locality is indeed the key here. Specifically for non-cached data. It is possible to manually implement some approximation of clustering on SQL level with current PostgreSQL features. Insert incoming data into new data partitions and have a background job swap input to a new partition and then insert data from the previous new data partition to main storage sorting it by vehicle in the process. If you do this every few minutes or so you should be able to tune the system in a way that the new partition data isn't even written to disk, you only have to pay the cost of double WAL for insertion and the CPU work to perform the move. This approach mixes well with hash partitioning. It would be neat indeed if PostgreSQL do something equivalent on its own, and pluggable storage work being done could enable index organized tables that would help. But you probably need something right now. I guess I don't have to tell you that it looks like your needs have outgrown what RDS works well with and you are in for a painful move sooner or later. Regards, Ants Aasma -- +43-670-6056265 Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26, A-2700 Wiener Neustadt Web: https://www.cybertec-postgresql.com