Re: Continuous queries and duplicates

Vladimir Ozerov Thu, 13 Dec 2018 12:50:10 -0800

Denis,

Not really. They are used to ensure that the ordering of notifications is
consistent with the ordering of updates, so that when a key K is updated to
V1, then V2, then V3, you never observe V1 -> V3 -> V2. They also solve the
duplicate notification problem in case of node failures, when the same
update is delivered twice.
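
To make the idea concrete, here is a minimal sketch of how a per-partition
counter can give both in-order delivery and duplicate suppression. It is an
illustration only, not Ignite's internal code: the class name, the method
name, and the assumption that counters start at 1 and grow by 1 per update
within a partition are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative sketch only; NOT Ignite's internal implementation. */
public class PartitionOrderingSketch {
    // Next counter expected per partition (assumption: counters start at 1).
    private final Map<Integer, Long> nextExpected = new HashMap<>();

    // Updates that arrived ahead of a gap, per partition, sorted by counter.
    private final Map<Integer, TreeMap<Long, String>> pending = new HashMap<>();

    /**
     * Accepts one received update and returns the values that become
     * deliverable to the listener, in counter order. Updates whose counter
     * was already delivered are dropped as duplicates.
     */
    public List<String> onReceive(int partition, long counter, String value) {
        List<String> ready = new ArrayList<>();
        long next = nextExpected.getOrDefault(partition, 1L);

        if (counter < next)
            return ready; // Already delivered earlier: duplicate, ignore.

        TreeMap<Long, String> buf =
            pending.computeIfAbsent(partition, p -> new TreeMap<>());
        buf.put(counter, value); // A re-sent buffered update overwrites itself.

        // Flush the contiguous run that starts at the expected counter.
        while (!buf.isEmpty() && buf.firstKey() == next) {
            ready.add(buf.pollFirstEntry().getValue());
            next++;
        }

        nextExpected.put(partition, next);
        return ready;
    }

    public static void main(String[] args) {
        PartitionOrderingSketch s = new PartitionOrderingSketch();
        System.out.println(s.onReceive(0, 1, "V1")); // delivered at once
        System.out.println(s.onReceive(0, 3, "V3")); // held back: 2 is missing
        System.out.println(s.onReceive(0, 2, "V2")); // releases V2, then V3
        System.out.println(s.onReceive(0, 2, "V2")); // redelivery is dropped
    }
}
```

Note that this only orders and deduplicates the notification stream itself.
It says nothing about entries that were already returned by an initial query.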


However, partition counters are unable to solve the duplicates problem in
general. Essentially, the question is how to get a consistent view of some
data plus all notifications which happened afterwards. There are only two
ways to achieve this: either lock entries during the initial query, or take
a kind of consistent data snapshot. The former was never implemented in
Ignite: our Scan and SQL queries do not use locking. The latter is
achievable in theory with MVCC. I raised that question earlier [1] (see
p.2), and we came to the conclusion that it might be a good feature for the
product. It is not implemented that way for MVCC now, but most probably it
is not extraordinarily difficult to implement.

Vladimir.

[1]
http://apache-ignite-developers.2346864.n4.nabble.com/Continuous-queries-and-MVCC-td33972.html#a33998

On Thu, Dec 13, 2018 at 11:17 PM Denis Magda <[email protected]> wrote:

> Vladimir,
>
> The partition counter is supposed to be used internally to solve the
> duplication issue. Does it sound like a right approach then?
>
> What would be an approach for SQL queries? Not sure the partition counter
> is applicable.
>
> --
> Denis
>
> On Thu, Dec 13, 2018 at 11:16 AM Vladimir Ozerov <[email protected]>
> wrote:
>
> > Partition counter is an internal implementation detail, which has no
> > sensible meaning to end users. It should not be exposed through public API.
> >
> > On Thu, Dec 13, 2018 at 10:14 PM Denis Magda <[email protected]> wrote:
> >
> > > Hello Piotr,
> > >
> > > That's a known problem and I thought a JIRA ticket already exists.
> > > However, I failed to locate it. The ticket for the improvement should be
> > > created as a result of this conversation.
> > >
> > > Speaking of the initial query type, I would differentiate between
> > > ScanQueries and SqlQueries. For the former, it sounds reasonable to apply
> > > the partitionCounter logic. As for the latter, Vladimir Ozerov, will it be
> > > addressed as part of MVCC/Transactional SQL activities?
> > >
> > > Btw, Piotr what's your initial query type?
> > >
> > > --
> > > Denis
> > >
> > > On Thu, Dec 13, 2018 at 3:28 AM Piotr Romański <[email protected]>
> > > wrote:
> > >
> > > > Hi, as suggested by Ilya here:
> > > >
> > > > http://apache-ignite-users.70518.x6.nabble.com/Continuous-queries-and-duplicates-td25314.html
> > > > I'm resending it to the developers list.
> > > >
> > > > From that thread we know that there might be duplicates between the
> > > > initial query results and listener entries received as part of a
> > > > continuous query. That means that users need to manually dedupe data.
> > > >
> > > > In my opinion, manual deduplication may lead to memory problems on the
> > > > client side in some use cases. In order to remove duplicated
> > > > notifications which we are receiving in the local listener, we need to
> > > > keep all initial query results in memory (or at least their unique ids).
> > > > Unfortunately, there is no way (is there?) to find a point in time when
> > > > we can be sure that no dups will arrive anymore. That would mean that we
> > > > need to keep that data indefinitely and use it every time a new
> > > > notification arrives. In case of multiple continuous queries run from a
> > > > single JVM, this might eventually become a memory or performance
> > > > problem. I can see the following possible improvements to Ignite:
> > > >
> > > > 1. The deduplication between the initial query and incoming
> > > > notifications could be done fully in Ignite. As far as I know there is
> > > > already the updateCounter and partition id for all the objects so they
> > > > could be used internally.
> > > >
> > > > 2. Add a guarantee that notifications arriving in the local listener
> > > > after the query() method returns are not duplicates. This kind of
> > > > functionality would require specific synchronization inside Ignite. It
> > > > would also mean that the query() method cannot return before all
> > > > potential duplicates are processed by a local listener, which looks
> > > > wrong.
> > > >
> > > > 3. Notify users that starting from a given notification they can be
> > > > sure they will not receive any duplicates anymore. This could be an
> > > > additional boolean flag in the CacheQueryEntryEvent.
> > > >
> > > > 4. CacheQueryEntryEvent already exposes the partitionUpdateCounter.
> > > > Unfortunately we don't have this information for initial query results.
> > > > If we had, a client could manually deduplicate notifications and get rid
> > > > of initial query results for a given partition after newer notifications
> > > > arrive. Also it would be very convenient to expose the partition id as
> > > > well, but for now we can figure it out using the affinity service. The
> > > > assumption here is that notifications are ordered by
> > > > partitionUpdateCounter (is it true?).
> > > >
> > > > Please correct me if I'm missing anything.
> > > >
> > > > What do you think?
> > > >
> > > > Piotr
