2011/1/7 Garick Hamlin <gham...@isc.upenn.edu>:
> On Thu, Jan 06, 2011 at 07:47:39PM -0500, Cédric Villemain wrote:
>> 2011/1/5 Magnus Hagander <mag...@hagander.net>:
>> > On Wed, Jan 5, 2011 at 22:58, Dimitri Fontaine <dimi...@2ndquadrant.fr> wrote:
>> >> Magnus Hagander <mag...@hagander.net> writes:
>> >>> * Stefan mentioned it might be useful to put some
>> >>> posix_fadvise(POSIX_FADV_DONTNEED) in the process that streams all
>> >>> the files out. Seems useful, as long as that doesn't kick them out
>> >>> of the cache *completely*, for other backends as well. Do we know
>> >>> if that is the case?
>> >>
>> >> Maybe have a look at pgfincore to only tag DONTNEED for blocks that are
>> >> not already in SHM?
>> >
>> > I think that's way more complex than we want to go here.
>>
>> DONTNEED will remove the block from the OS buffer every time.
>>
>> It should not be that hard to implement a snapshot (it needs mincore())
>> and to restore the previous state. I don't know exactly how basebackup
>> is performed... so perhaps I am wrong.
>>
>> posix_fadvise support is already in PostgreSQL core... we can start by
>> just doing a snapshot of the files before starting, or at some point in
>> the basebackup; it will need only 256 kB per GB of data...
>
> It is actually possible to be more scalable than the simple solution you
> outline here (although that solution works pretty well).
Yes, I suggest something pretty simple to go with for a first shot.

> I've written a program that synchronizes the OS cache state between two
> computers using mmap()/mincore(). I haven't actually tested its impact
> on performance yet, but I was surprised by how fast it actually runs
> and how compact the cache maps can be.
>
> If one encodes the data so one remembers the number of zeros between
> 1s, storage scales with the amount of memory on each side rather than
> with the dataset size. I actually played with doing that, then doing
> Huffman encoding of that. I get around 1.2-1.3 bits per page of
> _physical memory_ in my tests.
>
> I don't have my notes handy, but here are some numbers from memory...

That is interesting, even if I haven't had issues with the size of the
maps so far; I thought a simple zlib compression should be enough.

> The obvious worst cases are 1 bit per page of _dataset_ or 19 bits per
> page of physical memory in the machine. The latter limit gets better,
> however, since there are < 1024 symbols possible for the encoder (since
> in this case the symbols are spans of zeros that need to fit in a file
> that is 1 GB in size). So the actual worst case is much closer to 1 bit
> per page of the dataset, or ~10 bits per page of physical memory. The
> real performance I see with Huffman is more like 1.3 bits per page of
> physical memory. All the encoding/decoding is actually very fast. zlib
> would actually compress even better than Huffman, but the Huffman
> encoder/decoder is pretty good and very straightforward code.

pgfincore currently holds this information in a flat file. The ongoing
development is simpler and provides the data as bits, so you can store
it in a table, restore it on your slave thanks to SR, and use it on the
slave.

> I would like to integrate something like this into PG or perhaps even
> into something like rsync, but it was written as a proof of concept and
> I haven't had time to work on it recently.
>
> Garick

--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers