Ivan,

1) The list is saved on each checkpoint, wholly (all transactions that are in
the active state at checkpoint begin).
We need the whole list to get the oldest transaction: after the previous
oldest tx finishes, we need to get the following one (a sketch of this
follows below).

2) There is a description of how the persistent storage works and how it
recovers after a crash [1]
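
To illustrate 1), a rough sketch of such a tx list (illustrative only, not
the actual Ignite code; WAL pointers are modeled as plain longs):

    import java.util.concurrent.ConcurrentSkipListMap;

    // Track active txs ordered by the WAL pointer of their first write,
    // so the earliest writer is always at the head of the map.
    class ActiveTxTracker {
        private final ConcurrentSkipListMap<Long, Long> txsByFirstWrite =
            new ConcurrentSkipListMap<>();

        void onFirstWrite(long firstWriteWalPtr, long txId) {
            txsByFirstWrite.put(firstWriteWalPtr, txId);
        }

        void onTxFinish(long firstWriteWalPtr) {
            txsByFirstWrite.remove(firstWriteWalPtr);
        }

        // After the previous oldest tx finishes, the next entry becomes the
        // new head -- this is why the whole list is kept, not just the
        // single oldest tx.
        Long oldestFirstWritePtr() {
            return txsByFirstWrite.isEmpty() ? null
                : txsByFirstWrite.firstKey();
        }
    }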

Vladimir,

here is the whole list of what we are going to store on checkpoint (updated),
with a code sketch after the list:
1) Partition counter low watermark (LWM)
2) WAL pointer of the earliest write to the partition made by a transaction
that was active at the time the checkpoint started
3) List of prepared txs with their acquired partition counters (counters
which were acquired but not applied yet)
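
In code form, the saved entry could look roughly like this (field and class
names are illustrative, not the actual implementation):

    import java.util.Map;

    // Per-partition data persisted on each checkpoint (sketch).
    class PartitionCheckpointInfo {
        long lwm;                    // 1) partition counter low watermark
        long earliestActiveTxWalPtr; // 2) WAL "back-pointer": first write of
                                     //    the earliest tx active at checkpoint
                                     //    begin (0 if no tx was active)
        Map<Long, Long> preparedTxCounters; // 3) txId -> partition counter
                                            //    acquired but not applied yet
    }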

This way we don't need any additional info in the demand message. The start
point can be easily determined using the stored WAL "back-pointer", as in
the sketch below.
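
A sketch of the start point selection under the same assumptions (WAL
pointers modeled as plain longs; lwmWalPtr is assumed to be the WAL position
that corresponds to the stored LWM):

    // Choose the WAL iteration start point for historical rebalance.
    static long rebalanceStartPtr(PartitionCheckpointInfo cp, long lwmWalPtr) {
        // If some tx was still active at checkpoint begin, its first write
        // may precede the LWM position, so start from the earlier of the two.
        return cp.earliestActiveTxWalPtr != 0
            ? Math.min(cp.earliestActiveTxWalPtr, lwmWalPtr)
            : lwmWalPtr;
    }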

[1]
https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood#IgnitePersistentStore-underthehood-LocalRecoveryProcess


On Tue, Nov 27, 2018 at 11:19 AM Vladimir Ozerov <voze...@gridgain.com> wrote:

> Igor,
>
> Could you please elaborate - what is the whole set of information we are
> going to save at checkpoint time? From what I understand this should be:
> 1) List of active transactions with WAL pointers of their first writes
> 2) List of prepared transactions with their update counters
> 3) Partition counter low watermark (LWM) - the smallest partition counter
> before which there are no prepared transactions.
>
> And then we send a message to the supplier node: "Give me all updates
> starting from that LWM plus data for those transactions which were active
> when I failed".
>
> Am I right?
>
> On Fri, Nov 23, 2018 at 11:22 AM Seliverstov Igor <gvvinbl...@gmail.com>
> wrote:
>
> > Hi Igniters,
> >
> > Currently I’m working on possible approaches to implementing historical
> > rebalance (delta rebalance using a WAL iterator) over MVCC caches.
> >
> > The main difficulty is that MVCC writes changes during the tx active
> > phase, while the partition update version, aka update counter, is applied
> > on tx finish. This means we cannot start iteration over the WAL right
> > from the pointer where the update counter was updated, but should also
> > include the updates made by the transaction that updated the counter.
> >
> > These updates may be much earlier than the point where the update counter
> > was updated, so we have to be able to identify the point where the first
> > update happened.
> >
> > The proposed approach includes:
> >
> > 1) preserve the list of active txs, sorted by the time of their first
> > update (using the WAL ptr of the first WAL record in the tx)
> >
> > 2) persist this list on each checkpoint (together with TxLog for example)
> >
> > 3) send the whole active tx list (transactions which were in active state
> > at the time the node crashed, an empty list in case of graceful node
> > stop) as a part of the partition demand message.
> >
> > 4) find a checkpoint where the earliest of those txs is present in the
> > persisted tx list and use its saved WAL ptr as the start point, or apply
> > the current approach in case the active tx list (sent on the previous
> > step) is empty
> >
> > 5) start iteration.
> >
> > Your thoughts?
> >
> > Regards,
> > Igor
>
