Re: All Taxi Services need Index Clustered Heap Append

Ants Aasma Sun, 04 Mar 2018 02:03:15 -0800

On Fri, Mar 2, 2018 at 6:30 PM, Darafei "Komяpa" Praliaskouski
<m...@komzpa.net> wrote:
> I gave this all some thought and it looks like it all could have not
> happened if Postgres was able to cluster heap insertions by (id, ts) index.
> We're ok with synchronuous_commit=off, so amplified write won't immediately
> hit disk and can get cooled down in progress. Clustering doesn't require
> perfect sorting: we need to minimize number of pages fetched, it's ok if the
> pages are not consecutive on disk.


Data locality is indeed the key here. Specifically for non-cached
data. It is possible to manually implement some approximation of
clustering on SQL level with current PostgreSQL features. Insert
incoming data into new data partitions and have a background job swap
input to a new partition and then insert data from the previous new
data partition to main storage sorting it by vehicle in the process.
If you do this every few minutes or so you should be able to tune the
system in a way that the new partition data isn't even written to
disk, you only have to pay the cost of double WAL for insertion and
the CPU work to perform the move. This approach mixes well with hash
partitioning. It would be neat indeed if PostgreSQL do something
equivalent on its own, and pluggable storage work being done could
enable index organized tables that would help. But you probably need
something right now.

I guess I don't have to tell you that it looks like your needs have
outgrown what RDS works well with and you are in for a painful move
sooner or later.

Regards,
Ants  Aasma
-- 
+43-670-6056265
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: https://www.cybertec-postgresql.com

Re: All Taxi Services need Index Clustered Heap Append

Reply via email to