Guys, FYI: partition counters are already part of the public API. The following method exposes this information: CacheQueryEntryEvent#getPartitionUpdateCounter() <https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/cache/query/CacheQueryEntryEvent.html#getPartitionUpdateCounter-->

I also think that this kind of information shouldn't be accessible to users, but I don't see how to prevent the duplication problem with it either.
Denis

Thu, Dec 13, 2018 at 23:40, Vladimir Ozerov <voze...@gridgain.com>:

> [1]
> http://apache-ignite-developers.2346864.n4.nabble.com/Continuous-queries-and-MVCC-td33972.html
>
> On Thu, Dec 13, 2018 at 11:38 PM Vladimir Ozerov <voze...@gridgain.com>
> wrote:
>
> > Denis,
> >
> > Not really. They are used to ensure that the ordering of notifications is
> > consistent with the ordering of updates, so that when a key K is updated to
> > V1, then V2, then V3, you never observe V1 -> V3 -> V2. It also solves the
> > duplicate notification problem in case of node failures, when the same
> > update is delivered twice.
> >
> > However, partition counters are unable to solve the duplicates problem in
> > general. Essentially, the question is how to get a consistent view of some
> > data plus all notifications which happened afterwards. There are only two
> > ways to achieve this: either lock entries during the initial query, or take
> > a kind of consistent data snapshot. The former was never implemented in
> > Ignite; our Scan and SQL queries do not use locking. The latter is
> > achievable in theory with MVCC. I raised that question earlier [1] (see
> > p.2), and we came to the conclusion that it might be a good feature for the
> > product. It is not implemented that way for MVCC now, but most probably is
> > not extraordinarily difficult to implement.
> >
> > Vladimir.
> >
> > [1]
> > http://apache-ignite-developers.2346864.n4.nabble.com/Continuous-queries-and-MVCC-td33972.html#a33998
> >
> > On Thu, Dec 13, 2018 at 11:17 PM Denis Magda <dma...@apache.org> wrote:
> >
> >> Vladimir,
> >>
> >> The partition counter is supposed to be used internally to solve the
> >> duplication issue. Does it sound like the right approach then?
> >>
> >> What would be an approach for SQL queries? Not sure the partition counter
> >> is applicable.
> >>
> >> --
> >> Denis
> >>
> >> On Thu, Dec 13, 2018 at 11:16 AM Vladimir Ozerov <voze...@gridgain.com>
> >> wrote:
> >>
> >> > The partition counter is an internal implementation detail, which has no
> >> > sensible meaning to end users. It should not be exposed through the
> >> > public API.
> >> >
> >> > On Thu, Dec 13, 2018 at 10:14 PM Denis Magda <dma...@apache.org> wrote:
> >> >
> >> > > Hello Piotr,
> >> > >
> >> > > That's a known problem and I thought a JIRA ticket already existed.
> >> > > However, I failed to locate it. A ticket for the improvement should be
> >> > > created as a result of this conversation.
> >> > >
> >> > > Speaking of the initial query type, I would differentiate between
> >> > > ScanQueries and SqlQueries. For the former, it sounds reasonable to
> >> > > apply the partitionCounter logic. As for the latter, Vladimir Ozerov,
> >> > > will it be addressed as part of the MVCC/Transactional SQL activities?
> >> > >
> >> > > Btw, Piotr, what's your initial query type?
> >> > >
> >> > > --
> >> > > Denis
> >> > >
> >> > > On Thu, Dec 13, 2018 at 3:28 AM Piotr Romański <piotr.roman...@gmail.com>
> >> > > wrote:
> >> > >
> >> > > > Hi, as suggested by Ilya here:
> >> > > >
> >> > > > http://apache-ignite-users.70518.x6.nabble.com/Continuous-queries-and-duplicates-td25314.html
> >> > > >
> >> > > > I'm resending it to the developers list.
> >> > > >
> >> > > > From that thread we know that there might be duplicates between the
> >> > > > initial query results and the listener entries received as part of a
> >> > > > continuous query. That means that users need to manually dedupe data.
> >> > > >
> >> > > > In my opinion, manual deduplication may in some use cases lead to
> >> > > > memory problems on the client side.
> >> > > > In order to remove duplicated notifications which we are receiving in
> >> > > > the local listener, we need to keep all initial query results in memory
> >> > > > (or at least their unique ids). Unfortunately, there is no way (is
> >> > > > there?) to find a point in time when we can be sure that no more dups
> >> > > > will arrive. That would mean that we need to keep that data indefinitely
> >> > > > and use it every time a new notification arrives. In case of multiple
> >> > > > continuous queries run from a single JVM, this might eventually become a
> >> > > > memory or performance problem. I can see the following possible
> >> > > > improvements to Ignite:
> >> > > >
> >> > > > 1. The deduplication between the initial query and incoming
> >> > > > notifications could be done fully in Ignite. As far as I know, there is
> >> > > > already an updateCounter and a partition id for all the objects, so they
> >> > > > could be used internally.
> >> > > >
> >> > > > 2. Add a guarantee that notifications arriving in the local listener
> >> > > > after the query() method returns are not duplicates. This kind of
> >> > > > functionality would require specific synchronization inside Ignite. It
> >> > > > would also mean that the query() method cannot return before all
> >> > > > potential duplicates are processed by a local listener, which looks
> >> > > > wrong.
> >> > > >
> >> > > > 3. Notify users that starting from a given notification they can be sure
> >> > > > they will not receive any more duplicates. This could be an additional
> >> > > > boolean flag in the CacheQueryEntryEvent.
> >> > > >
> >> > > > 4. CacheQueryEntryEvent already exposes the partitionUpdateCounter.
> >> > > > Unfortunately, we don't have this information for initial query results.
> >> > > > If we had it, a client could manually deduplicate notifications and get
> >> > > > rid of initial query results for a given partition after newer
> >> > > > notifications arrive. Also, it would be very convenient to expose the
> >> > > > partition id as well, but for now we can figure it out using the
> >> > > > affinity service. The assumption here is that notifications are ordered
> >> > > > by partitionUpdateCounter (is it true?).
> >> > > >
> >> > > > Please correct me if I'm missing anything.
> >> > > >
> >> > > > What do you think?
> >> > > >
> >> > > > Piotr
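
For illustration only, the client-side deduplication described in option 4 of the original message could be sketched roughly as below. This is a minimal sketch, not Ignite code: the class PartitionDeduplicator and its methods baseline() and accept() are hypothetical names. It assumes that update counters only grow within a partition (the thread asks about this but does not confirm it) and that a baseline counter per partition is available for the initial query results, which, per the thread, Ignite does not provide today. In a real local listener the counter would presumably come from CacheQueryEntryEvent#getPartitionUpdateCounter() and the partition id from the affinity service.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical client-side helper (not part of Ignite). It drops notifications
 * whose update counter is not greater than the highest counter already seen
 * for the same partition. Assumes counters only grow within a partition.
 */
final class PartitionDeduplicator {
    /** Highest update counter observed so far, keyed by partition id. */
    private final Map<Integer, Long> highestSeen = new HashMap<>();

    /**
     * Records a baseline counter for a partition, e.g. one obtained together
     * with the initial query results (an assumption, not an existing feature).
     */
    void baseline(int partition, long counter) {
        highestSeen.merge(partition, counter, Math::max);
    }

    /** Returns true if the notification is new and should be processed. */
    boolean accept(int partition, long counter) {
        Long seen = highestSeen.get(partition);
        if (seen != null && counter <= seen) {
            return false; // Duplicate or stale notification: skip it.
        }
        highestSeen.put(partition, counter);
        return true;
    }

    public static void main(String[] args) {
        PartitionDeduplicator dedup = new PartitionDeduplicator();
        dedup.baseline(0, 10L);                   // initial query covered counter 10
        System.out.println(dedup.accept(0, 10L)); // already covered by the baseline
        System.out.println(dedup.accept(0, 11L)); // newer update
        System.out.println(dedup.accept(0, 11L)); // same update delivered twice
        System.out.println(dedup.accept(1, 1L));  // first event for another partition
    }
}
```

With such a scheme the client keeps one long per partition instead of every initial query result, which is the memory saving the original message is after. It does not address the general consistency problem raised later in the thread.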