On 20/11/2013 01:30, Jeff Janes wrote:
On Tuesday, November 19, 2013, Edson Richter wrote:
On 19/11/2013 22:29, Jeff Janes wrote:
On Sun, Nov 17, 2013 at 4:46 PM, Edson Richter
<edsonrich...@hotmail.com> wrote:
Yes, those are the optimizations I was talking about: having the
database server store the transaction log on high-speed solid-state
disks and consider the transaction done, while a background thread
updates the data on slower disks...
There is no reason to wait for fsync on the slow disks to guarantee
consistency... If the database server crashes, it just needs to
"redo" the logged transactions from the fast disk into the slower
data storage, and the database server is ready to go (I think this
has been the Sybase/MS SQL strategy for years).
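Something like the following, just to illustrate the idea (the
/ssd and /data/pgdata paths are hypothetical):

    # initialize the cluster with the WAL directory on the fast device
    initdb -D /data/pgdata -X /ssd/pg_xlog

    # or, for an existing (stopped) cluster, move pg_xlog and symlink it
    mv /data/pgdata/pg_xlog /ssd/pg_xlog
    ln -s /ssd/pg_xlog /data/pgdata/pg_xlog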
Using a nonvolatile write cache for pg_xlog is certainly possible
and often done with PostgreSQL. It is not important that the
nonvolatile write cache is fronting for SSD; fronting for HDD is
fine, as the write cache turns the xlog into purely sequential
writes, and an HDD should have no problem keeping up.
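If you want a rough idea of what the write cache buys you for that
write pattern, contrib's pg_test_fsync can measure fsync rates on a
given device; a sketch, with an illustrative test-file path:

    pg_test_fsync -f /ssd/pg_xlog/test.out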
Cheers,
Jeff
Hmm... I agree about the technology (SSD vs. HDD, etc.) - but maybe
I misunderstood: I have read that to keep data safe at all times I
must use fsync, and as a result every transaction must wait for the
data to be written to disk before returning success.
A transaction must wait for the *xlog* to be fsynced to "disk", but
a non-volatile write cache counts as disk. It does not need to wait
for the ordinary data files to be fsynced. Checkpoints do need to
wait for the ordinary data files to be fsynced, but the checkpoint
process is a background process and can wait for that without
impeding user processes.
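You can watch the checkpointer doing that background work by turning
on checkpoint logging; a minimal postgresql.conf sketch (values are
illustrative):

    log_checkpoints = on
    checkpoint_timeout = 5min    # how often a checkpoint is started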
If the checkpointer falls far enough behind, then things do start to
fall apart, but I think that is true of any system. So you can't
just get a BBU for the xlog and ignore all other IO
entirely--eventually the other data does need to reach disk, and if
it gets dirtied faster than it gets cleaned for a prolonged period,
things will freeze up.
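The settings that control how much work a checkpoint is allowed to
accumulate, and how gently it is written out, are the checkpoint_*
knobs; a sketch with illustrative values, not recommendations:

    checkpoint_segments = 32             # more WAL allowed between checkpoints
    checkpoint_completion_target = 0.9   # spread each checkpoint's writes out
    checkpoint_timeout = 10min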
By using the approach I've described you still have fsync (and the
data will be 100% safe), but the transaction is considered
successful once it is written to the transaction log, which is
purely sequential (and even uses pre-allocated space, with no need
to ask the OS for new files or new space) - and there is also no
need to wait for the slow operations that write data to the data
pages.
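As far as I can tell, the stock postgresql.conf already expresses
exactly this behavior (my reading, so treat it as a sketch):

    fsync = on                # commit waits for the WAL flush...
    synchronous_commit = on   # ...and only for the WAL flush, not data pages
    #wal_sync_method = fdatasync   # platform-dependent default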
Am I wrong?
No user-facing process needs to wait for the data pages to fsync,
unless things have really gotten fouled up.
Cheers,
Jeff
Ok, I still have one doubt (I'm learning a lot, thanks!):
What happens, then, if data has been committed (so it is in the
xlog) but is not in the data pages yet, and it no longer fits in the
memory buffers: how would PostgreSQL query that data without having
to wait for a checkpoint to happen and the data to become available
in the data pages?
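(While trying to see this for myself, I found the pg_buffercache
contrib module, which shows what is sitting in shared_buffers; a
sketch, assuming the extension is available:)

    CREATE EXTENSION pg_buffercache;
    -- count clean vs. dirty pages currently held in shared_buffers
    SELECT isdirty, count(*) FROM pg_buffercache GROUP BY isdirty;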
Regards,
Edson