On Wed, Jan 15, 2014 at 3:41 PM, Stephen Frost <sfr...@snowman.net> wrote:
> * Claudio Freire (klaussfre...@gmail.com) wrote:
>> But, still, the implementation is very similar to what postgres needs:
>> sharing a physical page for two distinct logical pages, efficiently,
>> with efficient copy-on-write.
>
> Agreed, except that KSM seems like it'd be slow/lazy about it and I'm
> guessing there's a reason the pagecache isn't included normally..

KSM does an active de-duplication. That's slow. This would be
leveraging KSM structures in the kernel (page sharing) but without all
the de-duplication logic.

>> So it'd be just a matter of removing that limitation regarding page
>> cache and shared pages.
>
> Any idea why that limitation is there?

No, but I'm guessing it's because nobody bothered to implement the
required copy-on-write in the page cache, which would be a PITA to
write - think of all the complexities with privilege checks and
everything - even though the benefits for many kinds of applications
would be important.

>> If you asked me, I'd implement it as copy-on-write on the page cache
>> (not the user page). That ought to be low-overhead.
>
> Not entirely sure I'm following this- if it's a shared page, it doesn't
> matter who starts writing to it, as soon as that happens, it needs to get
> copied. Perhaps you mean that the application should keep the
> "original" and that the page-cache should get the "copy" (or, really,
> perhaps just forget about the page existing at that point- we won't want
> it again...).
>
> Would that be a way to go, perhaps? This does go back to the "make it
> act like mmap, but not *be* mmap", but the idea would be:
> open(..., O_ZEROCOPY_READ)
> read() - Goes to PG's shared buffers, pagecache and PG share the page
> page fault (PG writes to it) - pagecache forgets about the page
> write() / fsync() - operate as normal

Yep.
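
A rough illustration of the open/read/fault/write sequence quoted above. O_ZEROCOPY_READ is hypothetical and exists in no kernel, so this sketch substitutes the nearest thing that does exist: a private (copy-on-write) file mapping, which Python exposes as mmap.ACCESS_COPY. The function name cow_demo and the temp file are made up for the example. Note one difference from the proposal: with a private mapping the page cache keeps the original page and the process gets the copy, whereas the idea in the thread is that the page cache would simply forget the page.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE


def cow_demo():
    """Walk through share -> write fault -> explicit write-back on one page."""
    fd, path = tempfile.mkstemp()
    try:
        # Put one page of known data in the file (and thus in the page cache).
        os.write(fd, b"A" * PAGE)
        os.fsync(fd)

        # "read()": map the page privately. Until the first write, the
        # process and the page cache can share the same physical page.
        buf = mmap.mmap(fd, PAGE, access=mmap.ACCESS_COPY)
        before = buf[:4]

        # "page fault": the first write gives the process its own copy.
        buf[:4] = b"BBBB"
        private_view = buf[:4]

        # The file (and page cache) still hold the old contents.
        file_view = os.pread(fd, 4, 0)

        # "write() / fsync()": push the modified page back explicitly.
        os.pwrite(fd, buf[:PAGE], 0)
        os.fsync(fd)
        after_write = os.pread(fd, 4, 0)

        buf.close()
        return before, private_view, file_view, after_write
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    print(cow_demo())
```

The point of the sketch is only the ordering: the modification stays invisible to the file until the explicit write-back, which is the "operate as normal" step in the quoted list.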