Hi Vladimir,

Thank you for your response. I tested the current behaviour, and it seems that order is maintained for notifications within a partition. Unfortunately, I don't know how it would behave in exceptional situations such as lost partitions or rebalancing. Do you think it would be possible to make that ordering guarantee part of the Ignite API? What I really need is ordering for notifications that share the same affinity key, not even for a whole partition, so I think it wouldn't require any cross-node ordering.
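One way to check this property on the listener side is sketched below. It is an illustration only, not the test that was actually run: the class name `AffinityOrderChecker` is hypothetical, and it assumes every entry carries an application-assigned sequence number that strictly increases per affinity key.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: flags notifications that arrive out of order for an affinity key. */
final class AffinityOrderChecker {
    // Last sequence number observed for each affinity key.
    private final Map<Object, Long> lastSeq = new HashMap<>();

    /**
     * @param affinityKey key shared by all entries of one business aggregate
     * @param seq application-assigned commit sequence, strictly increasing per key (assumption)
     * @return true if the notification arrived after all earlier ones for the same key
     */
    boolean inOrder(Object affinityKey, long seq) {
        Long prev = lastSeq.get(affinityKey);
        if (prev != null && seq <= prev) {
            return false; // an older or repeated update was seen after a newer one
        }
        lastSeq.put(affinityKey, seq);
        return true;
    }
}
```

A local listener would call `inOrder(...)` for every received entry and log any `false` result while partitions are being rebalanced or lost.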
Thank you,
Piotr

On Wed, 9 Jan 2019, 21:11 Vladimir Ozerov <voze...@gridgain.com> wrote:

> Hi,
>
> MVCC caches have the same ordering guarantees as non-MVCC caches, i.e. two subsequent updates on a single key will be delivered in the proper order. There are no further guarantees. The order of updates from two subsequent transactions affecting the same partition may be guaranteed by the current implementation (though I am not sure), but even if it is, I am not aware that this was ever our design goal. Most likely, this is an implementation artifact which may change in the future. Cache experts are needed to clarify this.
>
> As for MVCC, data anomalies are still possible in the current implementation, because we didn't rework initial query handling in the first iteration; technically this is not as simple as we thought. Once a snapshot is obtained, a query over that snapshot will return a data set consistent at some point in time. But the problem is that there is a time frame between snapshot acquisition and listener installation (or vice versa), which leads to either duplicates or lost entries. Some multi-step listener installation will be required here. We haven't designed it yet.
>
> Vladimir.
>
> On Mon, Dec 24, 2018 at 10:06 PM Denis Magda <dma...@apache.org> wrote:
>
> > > In my case, values are immutable - I never change them, I just add new entry for newer versions. Does it mean that I won't have any duplicates between the initial query and listener entries when using continuous queries on caches supporting MVCC?
> >
> > I'm afraid there still might be a race. Val, Vladimir, other Ignite experts, please confirm.
> >
> > > After reading the related thread (http://apache-ignite-developers.2346864.n4.nabble.com/Continuous-queries-and-MVCC-td33972.html) I'm now concerned about the ordering.
> > > My case assumes that there are groups of entries which belong to a business aggregate object and I would like to make sure that if I commit two records in two serial transactions then I have notifications in the same order. Those entries will have different keys so based on what you said ("we'd better to leave things as is and guarantee only per-key ordering"), it would seem that the order is not guaranteed. But do you think it would be possible to guarantee order when those entries share the same affinity key and they belong to the same partition?
> >
> > The order should be the same for key-value transactions. Vladimir, could you clarify the MVCC-based behavior?
> >
> > --
> > Denis
> >
> > On Mon, Dec 17, 2018 at 9:55 AM Piotr Romański <piotr.roman...@gmail.com> wrote:
> >
> > > Hi all, sorry for answering so late.
> > >
> > > I would like to use SqlQuery because I can leverage indexes there.
> > >
> > > As was already mentioned earlier, the partition update counter is exposed through CacheQueryEntryEvent. Initially, I thought that the partition update counter is something that's persisted together with the data but I'm guessing now that this is only a part of the notification mechanism.
> > >
> > > I imagined that I would be able to implement my own deduplication by having 3 stages on the client side: 1. Keep processing initial query results, store their keys in memory, 2. When initial query is over, then process listener entries but before that check if they have been already delivered in the first stage, 3. When we are sure that we are already processing notifications for commits executed after initial query was done, then we can process listener entries without any additional checks (so our key set from stage 1 can be removed from memory).
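The three stages just described can be sketched in plain Java as follows. This is only an illustration under stated assumptions: the class `ThreeStageDeduper` and all its method names are hypothetical, keys are assumed to be written once (immutable values), and the stage 2 to stage 3 transition is driven by an external `noMoreDuplicates()` call for which Ignite offers no signal according to this thread.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical client-side deduplicator following the three stages described above. */
final class ThreeStageDeduper<K> {
    private enum Stage { INITIAL_QUERY, CHECKED_EVENTS, UNCHECKED_EVENTS }

    private Stage stage = Stage.INITIAL_QUERY;
    // Keys delivered by the initial query (stage 1).
    private final Set<K> initialKeys = new HashSet<>();
    // Listener keys that arrived while the initial query was still running.
    private final List<K> stashed = new ArrayList<>();

    /** Stage 1: remember the key of every initial query result. */
    void onInitialResult(K key) {
        initialKeys.add(key);
    }

    /** End of stage 1: returns the stashed listener keys that were not in the initial results. */
    List<K> initialQueryDone() {
        stage = Stage.CHECKED_EVENTS;
        List<K> fresh = new ArrayList<>();
        for (K key : stashed) {
            if (!initialKeys.contains(key)) {
                fresh.add(key);
            }
        }
        stashed.clear();
        return fresh;
    }

    /** Returns true if the listener entry should be processed right away. */
    boolean onListenerEvent(K key) {
        switch (stage) {
            case INITIAL_QUERY:
                stashed.add(key); // must wait until the initial query is over
                return false;
            case CHECKED_EVENTS:
                return !initialKeys.contains(key); // stage 2: drop duplicates
            default:
                return true; // stage 3: no checks needed
        }
    }

    /** Stage 2 to 3: needs an external "no more duplicates" signal (assumption). */
    void noMoreDuplicates() {
        stage = Stage.UNCHECKED_EVENTS;
        initialKeys.clear(); // the key set from stage 1 can be released
    }
}
```

Both memory costs are visible in the sketch: `stashed` grows during stage 1, and `initialKeys` cannot be cleared until something calls `noMoreDuplicates()`.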
> > > The problem is that I have no way to say that I can move from stage 2 to 3. Another problem is that we need to stash listener entries while still processing initial query results, causing excessive memory pressure on our client.
> > >
> > > In my case, values are immutable - I never change them, I just add new entry for newer versions. Does it mean that I won't have any duplicates between the initial query and listener entries when using continuous queries on caches supporting MVCC?
> > >
> > > After reading the related thread (http://apache-ignite-developers.2346864.n4.nabble.com/Continuous-queries-and-MVCC-td33972.html) I'm now concerned about the ordering. My case assumes that there are groups of entries which belong to a business aggregate object and I would like to make sure that if I commit two records in two serial transactions then I have notifications in the same order. Those entries will have different keys so based on what you said ("we'd better to leave things as is and guarantee only per-key ordering"), it would seem that the order is not guaranteed. But do you think it would be possible to guarantee order when those entries share the same affinity key and they belong to the same partition?
> > >
> > > Piotr
> > >
> > > On Fri, 14 Dec 2018, 19:31 Denis Magda <dma...@apache.org> wrote:
> > >
> > > > Vladimir,
> > > >
> > > > Thanks for referring to the MVCC and Continuous Queries discussion, I knew I had seen us discussing a solution to the duplication problem. Let me copy and paste it in here for others:
> > > >
> > > > > 2) *Initial query*. We implemented it so that user can get some initial data snapshot and then start receiving events. Without MVCC we have no guarantees of visibility. E.g.
> > > > > if key is updated from V1 to V2, it is possible to see V2 in initial query and in event. With MVCC it is now technically possible to query data on certain snapshot and then receive only events that happened after this snapshot, so that we never see V2 twice. Do you think this feature will be interesting for our users?
> > > >
> > > > Am I right that this would be a generic solution - whether you use Scan or SQL query as an initial one? Have we planned it for the transactional SQL GA or is it out of scope for now?
> > > >
> > > > --
> > > > Denis
> > > >
> > > > On Thu, Dec 13, 2018 at 12:40 PM Vladimir Ozerov <voze...@gridgain.com> wrote:
> > > >
> > > > > [1] http://apache-ignite-developers.2346864.n4.nabble.com/Continuous-queries-and-MVCC-td33972.html
> > > > >
> > > > > On Thu, Dec 13, 2018 at 11:38 PM Vladimir Ozerov <voze...@gridgain.com> wrote:
> > > > >
> > > > > > Denis,
> > > > > >
> > > > > > Not really. They are used to ensure that ordering of notifications is consistent with ordering of updates, so that when a key K is updated to V1, then V2, then V3, you never observe V1 -> V3 -> V2. It also solves the duplicate notification problem in case of node failures, when the same update is delivered twice.
> > > > > >
> > > > > > However, partition counters are unable to solve the duplicates problem in general. Essentially, the question is how to get a consistent view of some data plus all notifications which happened afterwards. There are only two ways to achieve this - either lock entries during initial query, or take a kind of consistent data snapshot.
> > > > > > The former was never implemented in Ignite - our Scan and SQL queries do not use locking. The latter is achievable in theory with MVCC. I raised that question earlier [1] (see p.2), and we came to the conclusion that it might be a good feature for the product. It is not implemented that way for MVCC now, but most probably it is not extraordinarily difficult to implement.
> > > > > >
> > > > > > Vladimir.
> > > > > >
> > > > > > [1] http://apache-ignite-developers.2346864.n4.nabble.com/Continuous-queries-and-MVCC-td33972.html#a33998
> > > > > >
> > > > > > On Thu, Dec 13, 2018 at 11:17 PM Denis Magda <dma...@apache.org> wrote:
> > > > > >
> > > > > > > Vladimir,
> > > > > > >
> > > > > > > The partition counter is supposed to be used internally to solve the duplication issue. Does it sound like the right approach then?
> > > > > > >
> > > > > > > What would be an approach for SQL queries? Not sure the partition counter is applicable.
> > > > > > >
> > > > > > > --
> > > > > > > Denis
> > > > > > >
> > > > > > > On Thu, Dec 13, 2018 at 11:16 AM Vladimir Ozerov <voze...@gridgain.com> wrote:
> > > > > > >
> > > > > > > > Partition counter is an internal implementation detail, which has no sensible meaning to end users. It should not be exposed through public API.
> > > > > > > >
> > > > > > > > On Thu, Dec 13, 2018 at 10:14 PM Denis Magda <dma...@apache.org> wrote:
> > > > > > > >
> > > > > > > > > Hello Piotr,
> > > > > > > > >
> > > > > > > > > That's a known problem and I thought a JIRA ticket already existed. However, I failed to locate it. The ticket for the improvement should be created as a result of this conversation.
> > > > > > > > >
> > > > > > > > > Speaking of an initial query type, I would differentiate between ScanQueries and SqlQueries. For the former, it sounds reasonable to apply the partitionCounter logic. As for the latter, Vladimir Ozerov, will it be addressed as part of MVCC/Transactional SQL activities?
> > > > > > > > >
> > > > > > > > > Btw, Piotr, what's your initial query type?
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Denis
> > > > > > > > >
> > > > > > > > > On Thu, Dec 13, 2018 at 3:28 AM Piotr Romański <piotr.roman...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Hi, as suggested by Ilya here:
> > > > > > > > > > http://apache-ignite-users.70518.x6.nabble.com/Continuous-queries-and-duplicates-td25314.html
> > > > > > > > > > I'm resending it to the developers list.
> > > > > > > > > >
> > > > > > > > > > From that thread we know that there might be duplicates between initial query results and listener entries received as part of continuous query. That means that users need to manually dedupe data.
> > > > > > > > > >
> > > > > > > > > > In my opinion the manual deduplication in some use cases may lead to possible memory problems on the client side. In order to remove duplicated notifications which we are receiving in the local listener, we need to keep all initial query results in memory (or at least their unique ids). Unfortunately, there is no way (is there?)
> > > > > > > > > > to find a point in time when we can be sure that no dups will arrive anymore. That would mean that we need to keep that data indefinitely and use it every time a new notification arrives. In case of multiple continuous queries run from a single JVM, this might eventually become a memory or performance problem. I can see the following possible improvements to Ignite:
> > > > > > > > > >
> > > > > > > > > > 1. The deduplication between initial query and incoming notification could be done fully in Ignite. As far as I know there is already the updateCounter and partition id for all the objects so it could be used internally.
> > > > > > > > > >
> > > > > > > > > > 2. Add a guarantee that notifications arriving in the local listener after query() method returns are not duplicates. This kind of functionality would require a specific synchronization inside Ignite. It would also mean that the query() method cannot return before all potential duplicates are processed by a local listener, which looks wrong.
> > > > > > > > > >
> > > > > > > > > > 3. Notify users that starting from a given notification they can be sure they will not receive any duplicates anymore. This could be an additional boolean flag in the CacheQueryEntryEvent.
> > > > > > > > > >
> > > > > > > > > > 4. CacheQueryEntryEvent already exposes the partitionUpdateCounter.
> > > > > > > > > > Unfortunately we don't have this information for initial query results. If we had, a client could manually deduplicate notifications and get rid of initial query results for a given partition after newer notifications arrive. Also it would be very convenient to expose partition id as well but now we can figure it out using the affinity service. The assumption here is that notifications are ordered by partitionUpdateCounter (is it true?).
> > > > > > > > > >
> > > > > > > > > > Please correct me if I'm missing anything.
> > > > > > > > > >
> > > > > > > > > > What do you think?
> > > > > > > > > >
> > > > > > > > > > Piotr
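Option 4 from the original message above could look roughly like the following client-side sketch. It rests on two assumptions taken from that message, neither of which is confirmed in this thread: initial query results would carry a partition id and update counter (they do not today), and notifications within a partition arrive ordered by partitionUpdateCounter. The class `PartitionCounterDeduper` and its methods are hypothetical, and all initial results are assumed to be recorded before any notification is filtered.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical deduplicator keyed by (partition, update counter). */
final class PartitionCounterDeduper {
    /** Counters seen in the initial query for one partition. */
    private static final class Seen {
        final Set<Long> counters = new HashSet<>();
        long max = Long.MIN_VALUE;
    }

    private final Map<Integer, Seen> byPartition = new HashMap<>();

    /** Records an initial query result; a counter on initial results is an assumption. */
    void onInitialResult(int partition, long updateCounter) {
        Seen seen = byPartition.computeIfAbsent(partition, p -> new Seen());
        seen.counters.add(updateCounter);
        seen.max = Math.max(seen.max, updateCounter);
    }

    /** Returns true if the notification is new and should be processed. */
    boolean accept(int partition, long updateCounter) {
        Seen seen = byPartition.get(partition);
        if (seen == null) {
            return true; // nothing from the initial query is tracked for this partition
        }
        if (updateCounter > seen.max) {
            // Newer than everything in the initial results: under the ordering
            // assumption no duplicate can follow, so the state can be dropped.
            byPartition.remove(partition);
            return true;
        }
        return !seen.counters.contains(updateCounter);
    }

    /** Number of partitions that still hold initial query state. */
    int trackedPartitions() {
        return byPartition.size();
    }
}
```

The exact counter set is kept rather than only the maximum, because without a consistent snapshot the initial query may contain update N while missing an earlier update M < N.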