Hi,

Regarding Vladimir's new idea.

> We assume that a transaction can be represented as a set of independent
> operations, which are applied in the same order on both primary and backup
> nodes.

I have not understood why we can assume that reordering is not possible.
What have I missed?

On Tue, Nov 27, 2018 at 14:42, Seliverstov Igor <gvvinbl...@gmail.com> wrote:
>
> Vladimir,
>
> I think I got your point.
>
> It should work if we do the following:
> introduce two structures: an active list (txs) and a candidate list
> (updCntr -> txn pairs).
>
> Track active txs, mapping them to the actual update counter at update time.
> On each next update, put the update counter associated with the previous
> update into the candidate list, possibly overwriting the existing value
> (checking the txn).
> On tx finish, remove the tx from the active list only if the update counter
> associated with the finished tx is applied.
> On update counter update, set the minimal update counter from the candidate
> list as a back-counter, clear the candidate list and remove the associated
> tx from the active list if present.
> Use the back-counter instead of the actual update counter in the demand
> message.
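If I read these steps right, the bookkeeping could look roughly like the
sketch below. All names (BackCounterTracker, onUpdate, onCounterUpdate and
so on) are invented for illustration, this is not actual Ignite code, and
the exact semantics are only my interpretation of the steps above.

import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Sketch only: derives the "back-counter" to put into the demand message. */
class BackCounterTracker {
    /** Active list: txId -> update counter assigned to the tx's latest update. */
    private final Map<Long, Long> activeTxs = new HashMap<>();

    /** Candidate list: update counter of a tx's previous update -> that txId. */
    private final TreeMap<Long, Long> candidates = new TreeMap<>();

    /** Counter to send in the demand message instead of the actual update counter. */
    private long backCounter;

    /** A tx performed an update and was assigned a new update counter. */
    synchronized void onUpdate(long txId, long updCntr) {
        Long prevCntr = activeTxs.put(txId, updCntr);

        // The counter of the previous update becomes a candidate, possibly
        // overwriting an existing entry for the same counter.
        if (prevCntr != null)
            candidates.put(prevCntr, txId);
    }

    /** A tx finished; it stays in the active list until its counter is applied. */
    synchronized void onTxFinish(long txId, long appliedCntr) {
        Long cntr = activeTxs.get(txId);

        if (cntr != null && cntr <= appliedCntr)
            activeTxs.remove(txId);
    }

    /** The partition update counter was advanced. */
    synchronized void onCounterUpdate() {
        if (candidates.isEmpty())
            return;

        // The minimal candidate becomes the new back-counter.
        Map.Entry<Long, Long> min = candidates.firstEntry();

        backCounter = min.getKey();

        // Remove the associated tx from the active list if present.
        activeTxs.remove(min.getValue());

        candidates.clear();
    }

    /** Value to use in the demand message. */
    synchronized long backCounter() {
        return backCounter;
    }
}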
> On Tue, Nov 27, 2018 at 12:56, Seliverstov Igor <gvvinbl...@gmail.com> wrote:
> >
> > Ivan,
> >
> > 1) The list is saved on each checkpoint, wholly (all transactions in
> > active state at checkpoint begin). We need the whole list to get the
> > oldest transaction, because after the previous oldest tx finishes we need
> > to get the following one.
> >
> > 2) I guess there is a description of how the persistent storage works and
> > how it restores [1].
> >
> > Vladimir,
> >
> > The whole list of what we are going to store on checkpoint (updated):
> > 1) Partition counter low watermark (LWM)
> > 2) WAL pointer of the earliest active transaction write to the partition
> > at the time the checkpoint started
> > 3) List of prepared txs with acquired partition counters (which were
> > acquired but not applied yet)
> >
> > This way we don't need any additional info in the demand message. The
> > start point can be easily determined using the stored WAL "back-pointer".
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood#IgnitePersistentStore-underthehood-LocalRecoveryProcess
> >
> > On Tue, Nov 27, 2018 at 11:19, Vladimir Ozerov <voze...@gridgain.com> wrote:
> >
> >> Igor,
> >>
> >> Could you please elaborate - what is the whole set of information we are
> >> going to save at checkpoint time? From what I understand this should be:
> >> 1) List of active transactions with WAL pointers of their first writes
> >> 2) List of prepared transactions with their update counters
> >> 3) Partition counter low watermark (LWM) - the smallest partition counter
> >> before which there are no prepared transactions.
> >>
> >> And then we send to the supplier node a message: "Give me all updates
> >> starting from that LWM plus data for those transactions which were active
> >> when I failed".
> >>
> >> Am I right?
> >>
> >> On Fri, Nov 23, 2018 at 11:22 AM Seliverstov Igor <gvvinbl...@gmail.com>
> >> wrote:
> >>
> >> > Hi Igniters,
> >> >
> >> > Currently I'm working on possible approaches to implementing historical
> >> > rebalance (delta rebalance using a WAL iterator) over MVCC caches.
> >> >
> >> > The main difficulty is that MVCC writes changes during the tx active
> >> > phase, while the partition update version, aka the update counter, is
> >> > applied on tx finish. This means we cannot start iteration over the WAL
> >> > right from the pointer where the update counter was updated; we also
> >> > have to include the updates made by the transaction that updated the
> >> > counter.
> >> >
> >> > These updates may be much earlier than the point where the update
> >> > counter was updated, so we have to be able to identify the point where
> >> > the first update happened.
> >> >
> >> > The proposed approach includes:
> >> >
> >> > 1) preserve a list of active txs, sorted by the time of their first
> >> > update (using the WAL pointer of the first WAL record in the tx)
> >> >
> >> > 2) persist this list on each checkpoint (together with the TxLog, for
> >> > example)
> >> >
> >> > 3) send the whole active tx list (transactions which were in active
> >> > state at the time the node crashed; an empty list in case of a graceful
> >> > node stop) as a part of the partition demand message
> >> >
> >> > 4) find a checkpoint where the earliest tx exists in the persisted txs
> >> > and use the saved WAL pointer as a start point, or apply the current
> >> > approach in case the active tx list (sent on the previous step) is empty
> >> >
> >> > 5) start iteration.
> >> >
> >> > Your thoughts?
> >> >
> >> > Regards,
> >> > Igor
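For reference, a rough Java sketch of the per-partition record that would be
persisted with each checkpoint according to the updated list earlier in this
thread (LWM, WAL pointer of the earliest active transaction write, prepared
txs with acquired counters). All names and types are invented for
illustration, a WAL pointer is reduced to a plain long offset, and this is
not actual Ignite code.

import java.util.Collections;
import java.util.Map;

/**
 * Sketch only: the per-partition data that would be written with every
 * checkpoint, per the updated list earlier in the thread.
 */
class CheckpointTxSnapshot {
    /** 1) Partition counter low watermark (LWM). */
    final long lwm;

    /**
     * 2) WAL pointer (reduced to a plain offset here) of the earliest write
     * made to the partition by a tx that was active when the checkpoint
     * started, or -1 if there was no such tx.
     */
    final long earliestActiveTxWalPtr;

    /** 3) Prepared txs with partition counters they acquired but have not applied yet. */
    final Map<Long, Long> preparedTxCntrs;

    CheckpointTxSnapshot(long lwm, long earliestActiveTxWalPtr, Map<Long, Long> preparedTxCntrs) {
        this.lwm = lwm;
        this.earliestActiveTxWalPtr = earliestActiveTxWalPtr;
        this.preparedTxCntrs = Collections.unmodifiableMap(preparedTxCntrs);
    }

    /**
     * WAL "back-pointer" to start historical iteration from: rewind to the
     * first write of the earliest active tx if there was one, otherwise the
     * counter-based start point resolved from the LWM is enough.
     */
    long walStartPointer(long lwmBasedPointer) {
        return earliestActiveTxWalPtr >= 0 ? earliestActiveTxWalPtr : lwmBasedPointer;
    }
}

With a record like this the demand message can stay counter-based, and the
supplier rewinds to the stored back-pointer before starting WAL iteration,
as described above.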
--
Best regards,
Ivan Pavlukhin