2011/1/7 Garick Hamlin <gham...@isc.upenn.edu>:
> On Thu, Jan 06, 2011 at 07:47:39PM -0500, Cédric Villemain wrote:
>> 2011/1/5 Magnus Hagander <mag...@hagander.net>:
>> > On Wed, Jan 5, 2011 at 22:58, Dimitri Fontaine <dimi...@2ndquadrant.fr> wrote:
>> >> Magnus Hagander <mag...@hagander.net> writes:
>> >>> * Stefan mentioned it might be useful to put some
>> >>> posix_fadvise(POSIX_FADV_DONTNEED) in the process that streams all
>> >>> the files out. Seems useful, as long as that doesn't kick them out
>> >>> of the cache *completely*, for other backends as well. Do we know
>> >>> if that is the case?
>> >>
>> >> Maybe have a look at pgfincore to only tag DONTNEED for blocks that are
>> >> not already in SHM?
>> >
>> > I think that's way more complex than we want to go here.
>>
>> DONTNEED will remove the block from the OS buffer every time.
>>
>> It should not be that hard to implement a snapshot (it needs mincore())
>> and to restore the previous state. I don't know exactly how basebackup
>> is performed... so perhaps I am wrong.
>>
>> posix_fadvise support is already in PostgreSQL core... we can start by
>> just doing a snapshot of the files before starting, or at some point in
>> the basebackup; it will need only 256 kB per GB of data...
>
> It is actually possible to be more scalable than the simple solution you
> outline here (although that solution works pretty well).
Yes, I suggest something pretty simple to go with for a first shot.

> I've written a program that synchronizes the OS cache state between two
> computers using mmap()/mincore(). I haven't actually tested its impact
> on performance yet, but I was surprised by how fast it actually runs
> and how compact the cache maps can be.
>
> If one encodes the data so one remembers the number of zeros between
> 1s, storage scales with the amount of memory on each side rather than
> with the dataset size. I actually played with doing that, then doing
> Huffman encoding of that. I get around 1.2-1.3 bits per page of
> _physical memory_ in my tests.
>
> I don't have my notes handy, but here are some numbers from memory...

That is interesting, even if I haven't had issues with the size of the
maps so far; I thought a simple zlib compression should be enough.

> The obvious worst cases are 1 bit per page of _dataset_ or 19 bits per
> page of physical memory in the machine. The latter limit gets better,
> however, since there are < 1024 symbols possible for the encoder (since
> in this case the symbols are spans of zeros that need to fit in a file
> that is 1 GB in size). So the actual worst case is much closer to 1 bit
> per page of the dataset, or ~10 bits per page of physical memory. The
> real performance I see with Huffman is more like 1.3 bits per page of
> physical memory. All the encoding/decoding is actually very fast. zlib
> would actually compress even better than Huffman, but the Huffman
> encoder/decoder is pretty good and very straightforward code.

pgfincore currently holds this information in a flat file. The ongoing
development is simpler and provides the data as bits, so you can store
it in a table, restore it on your slave thanks to SR, and use it on the
slave.

> I would like to integrate something like this into PG or perhaps even
> into something like rsync, but it was written as a proof of concept and
> I haven't had time to work on it recently.
>
> Garick

--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers