Hi,

Regarding Vladimir's new idea.

> We assume that a transaction can be represented as a set of independent
> operations, which are applied in the same order on both primary and backup
> nodes.

I have not understood why we can assume that reordering is not possible.
What have I missed?

On Tue, Nov 27, 2018 at 14:42, Seliverstov Igor <gvvinbl...@gmail.com> wrote:
>
> Vladimir,
>
> I think I got your point.
>
> It should work if we do the following:
> introduce two structures: an active list (txs) and a candidate list
> (updCntr -> txn pairs).
>
> Track active txs, mapping them to the actual update counter at update time.
> On each next update, put the update counter associated with the previous
> update into the candidate list, possibly overwriting the existing value
> (checking the txn).
> On tx finish, remove the tx from the active list only if the update counter
> associated with the finished tx is applied.
> On update counter update, set the minimal update counter from the candidate
> list as a back-counter, clear the candidate list and remove the associated
> tx from the active list if present.
> Use the back-counter instead of the actual update counter in the demand
> message.
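If I read these steps right, the bookkeeping could look roughly like the
sketch below. All names (BackCounterTracker, onUpdate, onCounterUpdate and
so on) are invented for illustration, this is not actual Ignite code, and
the exact semantics are only my interpretation of the steps above.

import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Sketch only: derives the "back-counter" to put into the demand message. */
class BackCounterTracker {
    /** Active list: txId -> update counter assigned to the tx's latest update. */
    private final Map<Long, Long> activeTxs = new HashMap<>();

    /** Candidate list: update counter of a tx's previous update -> that txId. */
    private final TreeMap<Long, Long> candidates = new TreeMap<>();

    /** Counter to send in the demand message instead of the actual update counter. */
    private long backCounter;

    /** A tx performed an update and was assigned a new update counter. */
    synchronized void onUpdate(long txId, long updCntr) {
        Long prevCntr = activeTxs.put(txId, updCntr);

        // The counter of the previous update becomes a candidate, possibly
        // overwriting an existing entry for the same counter.
        if (prevCntr != null)
            candidates.put(prevCntr, txId);
    }

    /** A tx finished; it stays in the active list until its counter is applied. */
    synchronized void onTxFinish(long txId, long appliedCntr) {
        Long cntr = activeTxs.get(txId);

        if (cntr != null && cntr <= appliedCntr)
            activeTxs.remove(txId);
    }

    /** The partition update counter was advanced. */
    synchronized void onCounterUpdate() {
        if (candidates.isEmpty())
            return;

        // The minimal candidate becomes the new back-counter.
        Map.Entry<Long, Long> min = candidates.firstEntry();

        backCounter = min.getKey();

        // Remove the associated tx from the active list if present.
        activeTxs.remove(min.getValue());

        candidates.clear();
    }

    /** Value to use in the demand message. */
    synchronized long backCounter() {
        return backCounter;
    }
}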
> On Tue, Nov 27, 2018 at 12:56, Seliverstov Igor <gvvinbl...@gmail.com> wrote:
> >
> > Ivan,
> >
> > 1) The list is saved on each checkpoint, wholly (all transactions in
> > active state at checkpoint begin). We need the whole list to get the
> > oldest transaction, because after the previous oldest tx finishes we need
> > to get the following one.
> >
> > 2) I guess there is a description of how the persistent storage works and
> > how it restores [1].
> >
> > Vladimir,
> >
> > The whole list of what we are going to store on checkpoint (updated):
> > 1) Partition counter low watermark (LWM)
> > 2) WAL pointer of the earliest active transaction write to the partition
> > at the time the checkpoint started
> > 3) List of prepared txs with acquired partition counters (which were
> > acquired but not applied yet)
> >
> > This way we don't need any additional info in the demand message. The
> > start point can be easily determined using the stored WAL "back-pointer".
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood#IgnitePersistentStore-underthehood-LocalRecoveryProcess
> >
> > On Tue, Nov 27, 2018 at 11:19, Vladimir Ozerov <voze...@gridgain.com> wrote:
> >
> >> Igor,
> >>
> >> Could you please elaborate - what is the whole set of information we are
> >> going to save at checkpoint time? From what I understand this should be:
> >> 1) List of active transactions with WAL pointers of their first writes
> >> 2) List of prepared transactions with their update counters
> >> 3) Partition counter low watermark (LWM) - the smallest partition counter
> >> before which there are no prepared transactions.
> >>
> >> And then we send to the supplier node a message: "Give me all updates
> >> starting from that LWM plus data for those transactions which were active
> >> when I failed".
> >>
> >> Am I right?
> >>
> >> On Fri, Nov 23, 2018 at 11:22 AM Seliverstov Igor <gvvinbl...@gmail.com>
> >> wrote:
> >>
> >> > Hi Igniters,
> >> >
> >> > Currently I'm working on possible approaches to implementing historical
> >> > rebalance (delta rebalance using a WAL iterator) over MVCC caches.
> >> >
> >> > The main difficulty is that MVCC writes changes during the tx active
> >> > phase, while the partition update version, aka the update counter, is
> >> > applied on tx finish. This means we cannot start iteration over the WAL
> >> > right from the pointer where the update counter was updated; we also
> >> > have to include the updates made by the transaction that updated the
> >> > counter.
> >> >
> >> > These updates may be much earlier than the point where the update
> >> > counter was updated, so we have to be able to identify the point where
> >> > the first update happened.
> >> >
> >> > The proposed approach includes:
> >> >
> >> > 1) preserve a list of active txs, sorted by the time of their first
> >> > update (using the WAL pointer of the first WAL record in the tx)
> >> >
> >> > 2) persist this list on each checkpoint (together with the TxLog, for
> >> > example)
> >> >
> >> > 3) send the whole active tx list (transactions which were in active
> >> > state at the time the node crashed; an empty list in case of a graceful
> >> > node stop) as a part of the partition demand message
> >> >
> >> > 4) find a checkpoint where the earliest tx exists in the persisted txs
> >> > and use the saved WAL pointer as a start point, or apply the current
> >> > approach in case the active tx list (sent on the previous step) is empty
> >> >
> >> > 5) start iteration.
> >> >
> >> > Your thoughts?
> >> >
> >> > Regards,
> >> > Igor
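For reference, a rough Java sketch of the per-partition record that would be
persisted with each checkpoint according to the updated list earlier in this
thread (LWM, WAL pointer of the earliest active transaction write, prepared
txs with acquired counters). All names and types are invented for
illustration, a WAL pointer is reduced to a plain long offset, and this is
not actual Ignite code.

import java.util.Collections;
import java.util.Map;

/**
 * Sketch only: the per-partition data that would be written with every
 * checkpoint, per the updated list earlier in the thread.
 */
class CheckpointTxSnapshot {
    /** 1) Partition counter low watermark (LWM). */
    final long lwm;

    /**
     * 2) WAL pointer (reduced to a plain offset here) of the earliest write
     * made to the partition by a tx that was active when the checkpoint
     * started, or -1 if there was no such tx.
     */
    final long earliestActiveTxWalPtr;

    /** 3) Prepared txs with partition counters they acquired but have not applied yet. */
    final Map<Long, Long> preparedTxCntrs;

    CheckpointTxSnapshot(long lwm, long earliestActiveTxWalPtr, Map<Long, Long> preparedTxCntrs) {
        this.lwm = lwm;
        this.earliestActiveTxWalPtr = earliestActiveTxWalPtr;
        this.preparedTxCntrs = Collections.unmodifiableMap(preparedTxCntrs);
    }

    /**
     * WAL "back-pointer" to start historical iteration from: rewind to the
     * first write of the earliest active tx if there was one, otherwise the
     * counter-based start point resolved from the LWM is enough.
     */
    long walStartPointer(long lwmBasedPointer) {
        return earliestActiveTxWalPtr >= 0 ? earliestActiveTxWalPtr : lwmBasedPointer;
    }
}

With a record like this the demand message can stay counter-based, and the
supplier rewinds to the stored back-pointer before starting WAL iteration,
as described above.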
--
Best regards,
Ivan Pavlukhin