On Mon, Mar 5, 2018 at 2:11 PM, Darafei "Komяpa" Praliaskouski <m...@komzpa.net> wrote: >> This approach mixes well with hash >> partitioning. It would be neat indeed if PostgreSQL do something >> equivalent on its own, and pluggable storage work being done could >> enable index organized tables that would help. But you probably need >> something right now. > > > Fixing glaring issues (no vacuum and thus no Index-Only Scan on append-only > tables, vacuum processing all of the eternity of btree) by 11 will get most > of spike-nails out of the microservice code, and we can probably live with > them until 11 gets to RDS. > > I also don't see why a pluggable storage is a must for the clustered write. > Postgres does have a mechanism for selecting the next page to write tuple > to, right now it's just looking at FSM - but what if it just peeked at > existing index that already has enough the data to route tuple to correct > page on write?
The mechanism you outlined would likely work for your use case, but it has many issues that prevent it from being universally useful. From the top of my head: * One extra index descent per insertion (I/O for this is necessary anyway, but CPU work is duplicated). * We don't currently track the amount of bloat. A mechanism that does this needs to be added. * If table hits the bloat limit there will be a sudden change in behavior. This is pretty nasty from an operations point of view. * With your (id,ts) clustering and data coming in mostly ordered by timestamp, after initial warmup, each page will contain rows from a single id, but different ids are arbitrarily interleaved. This is better than current state, but people might want to have an interleaving step bigger than 8kB to better utilize storage hardware. * It seems that with a common (ts) clustering and age of timestamp coming from an exponential distribution, this will quickly bloat to threshold and then insert data in a rather arbitrary order. This is much worse than the default behavior. At least in my opinion these problems make it a special case optimization that is hard to justify in core. A decent alternative would be a plugin mechanism for locating free space for a tuple where you can write your extension to find a suitable location for the row. >> I guess I don't have to tell you that it looks like your needs have >> outgrown what RDS works well with and you are in for a painful move >> sooner or later. > > > Painful move where to? If we just run a Postgres instance without RDS we'll > get the pain of setting up Postgres and replication and backups and > autofailover, with no visible gain except if we get some private / > unaccepted patches applied to it. If we can get these things right upstream > why would we want to switch? EC2 for example. Mainly because I3 instances and ephemeral provide an order of magnitude or two of performance improvement while costing less. Being able to run custom extensions and patches if necessary is a nice bonus. Yes, setting up replication, autofailover and backups is extra work that you have to weigh against the benefits. But don't overestimate the effort - there are some pretty nice tools available that make a proper cluster relatively simple to set up. > Per my colleagues, MySQL offers clustered index, also MySQL is available on > RDS without the need of "painful move", which is doable by writing to two > locations for a day and then pointing readers to new DB. But if we can > instead do no move and be sure the issues are gone upstream before we hit > the limit of spike-nails we're running on currently, wouldn't that be > better? :) The move off of RDS is painful because getting data out of RDS involves either downtime or building an ad-hoc logical replication solution. You need to solve that regardless of where you move to. Providing an out-of-the-box solution in core PostgreSQL would of course be best, but realistically you will be waiting at least 2 years to get it on RDS. In the meanwhile either the buffer partition approach I described, or a buffering microservice in front of PostgreSQL like Aleksander recommended should fix data locality for you. If you weren't running on RDS I would even propose using Redis as the buffer with one key per driver and redis_fdw to make the data accessible from within PostgreSQL. Regards, Ants Aasma -- +43-670-6056265 Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26, A-2700 Wiener Neustadt Web: https://www.cybertec-postgresql.com