Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept

Alexander Korotkov Mon, 27 Aug 2018 08:39:29 -0700

Hi!

Thank you for the feedback.

On Sun, Aug 26, 2018 at 4:09 AM Robert Haas <[email protected]> wrote:
> On Tue, Aug 21, 2018 at 9:10 AM, Alexander Korotkov
> <[email protected]> wrote:
> > After heap truncation using this algorithm, shared buffers may contain
> > past-EOF buffers.  But those buffers are empty (no used items) and
> > clean.  So, read-only queries shouldn't hint those buffers dirty
> > because there are no used items.  Normally, these buffers will be just
> > evicted away from the shared buffer arena.  If relation extension
> > happens shortly after heap truncation, then some of those buffers could be
> > found after relation extension.  I think this situation could be
> > handled.  For instance, we can teach vacuum to claim page as new once
> > all the tuples were gone.
>
> I think this all sounds pretty dangerous and fragile, especially in
> view of the pluggable storage work.  If we start to add new storage
> formats, deductions based on the specifics of the current heap's
> hint-bit behavior may turn out not to be valid.  Now maybe you could
> speculate that it won't matter because perhaps truncation will work
> differently in other storage formats too, but it doesn't sound to me
> like we'd be wise to bet on it working out that way.

Hmm, I'm not especially concerned about pluggable storage here.
Pluggable storage engines decide for themselves how they manage vacuum,
including relation truncation if needed.  They may or may not reuse the
relation truncation function we have for heap.  What we should provide
for that function is an understandable and predictable interface.  So,
if the relation truncation function cuts off the trailing pages of a
relation, which were previously cleaned as new, that looks fair enough
to me.

The aspect I'm more concerned about here is whether we lose the ability
to detect some IO errors if we don't distinguish new pages
from pages whose tuples were removed by vacuum.

> IIRC, Andres had some patches revising shared buffer management that
> allowed the ending block number to be maintained in shared memory.
> I'm waving my hands here, but with that kind of a system you can
> imagine that maybe there could also be a flag bit indicating whether a
> truncation is in progress.  So, reduce the number of pages and set the
> bit; then zap all the pages above that value that are still present in
> shared_buffers; then clear the bit.  Or maybe we don't even need the
> bit, but I think we do need some kind of race-free mechanism to make
> sure that we never try to read pages that either have been truncated
> away or in the process of being truncated away.

If we had some per-relation shared memory information, then we could
make it work even without a special write barrier bit.  Instead, we
could place a mark on the whole relation, which would say "please hold
off on writes past the following pending truncation point".  Also, having
the ending block number in shared memory can save us from trying to
read blocks past EOF.  So, I'm sure that when Andres's work on revising
shared buffer management is complete, we will be able to solve
these problems better.  But there is also a question of time.  As I
understand it, the shared buffer management revision could not
realistically get committed to PostgreSQL 12.  And we have a pretty nasty
set of problems here.  For me, it would be nice to do something about
them during this release cycle.  But for sure, we should keep in mind how
this solution should be revised once we have the new shared buffer
management.
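
To make the relation-wide mark a bit more concrete, here is a minimal
sketch in C.  It is only a toy, single-process model: RelSharedState and
all the function names are hypothetical and do not exist in PostgreSQL,
and a real implementation would need proper locking or atomics around
these fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define INVALID_BLOCK UINT32_MAX

/* Hypothetical per-relation shared state; not an existing PostgreSQL struct. */
typedef struct RelSharedState
{
    uint32_t nblocks;          /* current relation length in blocks */
    uint32_t pending_truncate; /* INVALID_BLOCK, or pending truncation point */
} RelSharedState;

/* Readers consult the shared ending block number instead of reading past EOF. */
static bool
block_is_readable(const RelSharedState *rel, uint32_t blkno)
{
    return blkno < rel->nblocks;
}

/* Writers hold off on blocks at or past a pending truncation point. */
static bool
block_write_allowed(const RelSharedState *rel, uint32_t blkno)
{
    if (rel->pending_truncate != INVALID_BLOCK &&
        blkno >= rel->pending_truncate)
        return false;
    return blkno < rel->nblocks;
}

/* Step 1: place the mark on the whole relation. */
static void
truncate_begin(RelSharedState *rel, uint32_t new_nblocks)
{
    assert(new_nblocks <= rel->nblocks);
    rel->pending_truncate = new_nblocks;
}

/*
 * Step 2 (not modelled here): drop the buffers past the truncation point
 * and truncate the file.  Step 3: publish the new length, clear the mark.
 */
static void
truncate_finish(RelSharedState *rel)
{
    assert(rel->pending_truncate != INVALID_BLOCK);
    rel->nblocks = rel->pending_truncate;
    rel->pending_truncate = INVALID_BLOCK;
}
```

In this model, between truncate_begin() and truncate_finish() the tail
blocks are still readable but no longer writable, and after
truncate_finish() they are past EOF for everybody.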

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
