Re: SQL on PARTITIONED vs REPLICATED cache

Sergi Vladykin Wed, 12 Apr 2017 23:35:16 -0700

Yes, it is a correct explanation. I've created the issue

https://issues.apache.org/jira/browse/IGNITE-4955


Sergi

2017-04-13 0:25 GMT+03:00 Dmitriy Setrakyan <dsetrak...@apache.org>:

> Got it, Denis. I think you are right.
>
> On Wed, Apr 12, 2017 at 2:20 PM, Denis Magda <dma...@apache.org> wrote:
>
> > Dmitriy,
> >
> > No, I think that Sergi supposed a type of cache which reference is used
> > for a query execution. In my example
> >
> > >> 2. Error-prone scenario - *replicatedCache*.query(“SELECT * FROM
> > >> partitionedCache … JOIN replicatedCache …”);
> >
> > *replicatedCache* reference is used for the query execution and, as I
> > understand, this causes the issue.
> >
> > Sergi, please clarify.
> >
> > —
> > Denis
> >
> > > On Apr 12, 2017, at 1:51 PM, Dmitriy Setrakyan <dsetrak...@apache.org>
> > wrote:
> > >
> > > Denis, I think that you meant selecting from replicated cache first as
> an
> > > invalid scenario, but provided the wrong example. Here is the correct
> > > example for the invalid query:
> > >
> > > SELECT * FROM replicatedCache … JOIN partitionedCache …”
> > >
> > > I do agree, we should make the change, as long as we keep the flag to
> > > enable the old behavior.
> > >
> > > D.
> > >
> > > On Wed, Apr 12, 2017 at 12:50 PM, Denis Magda <dma...@apache.org>
> wrote:
> > >
> > >> Sergi,
> > >>
> > >> As far as I understand you’re considering an example below:
> > >>
> > >> IgniteCache partitioneCache = ...;
> > >> IgniteCache replicatedCache = …;
> > >>
> > >> 1. Valid scenario - *partitionedCache*.query(“SELECT * FROM
> > >> partitionedCache … JOIN replicatedCache …”);
> > >> 2. Error-prone scenario - *replicatedCache*.query(“SELECT * FROM
> > >> partitionedCache … JOIN replicatedCache …”);
> > >>
> > >> Do you mean 2. as the issue? If it’s so then can’t we just detect on
> our
> > >> own that all the caches are replicated and execute a query more
> optimal?
> > >> This should omit necessity to add isReplicatedOnly()?
> > >>
> > >> —
> > >> Denis
> > >>
> > >>> On Apr 12, 2017, at 7:07 AM, Andrey Mashenkov <
> > >> andrey.mashen...@gmail.com> wrote:
> > >>>
> > >>> Yes, it's reasonable.
> > >>>
> > >>> On Wed, Apr 12, 2017 at 3:23 PM, Sergi Vladykin <
> > >> sergi.vlady...@gmail.com>
> > >>> wrote:
> > >>>
> > >>>> Good point, but I'm not sure. The difference is that on client node
> > you
> > >>>> should not be able to enable isLocal, while isReplicatedOnly is
> > >> perfectly
> > >>>> valid. What do you think?
> > >>>>
> > >>>> Sergi
> > >>>>
> > >>>> 2017-04-12 15:18 GMT+03:00 Andrey Mashenkov <
> > andrey.mashen...@gmail.com
> > >>> :
> > >>>>
> > >>>>> Sergi,
> > >>>>>
> > >>>>> Got it.
> > >>>>>
> > >>>>> Does query execution way and results will be same for
> > isReplicatedOnly
> > >>>> flag
> > >>>>> and for isLocal flag turned on?
> > >>>>> If my understanding is correct, we will get same results and there
> is
> > >> no
> > >>>>> need to introduce a new flag.
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> On Wed, Apr 12, 2017 at 2:54 PM, Sergi Vladykin <
> > >>>> sergi.vlady...@gmail.com>
> > >>>>> wrote:
> > >>>>>
> > >>>>>> Ok, let it be an exception. I'm just saying that the thing does
> not
> > >>>> work
> > >>>>>> now.
> > >>>>>>
> > >>>>>> Sergi
> > >>>>>>
> > >>>>>> 2017-04-12 14:50 GMT+03:00 Andrey Mashenkov <
> > >>>> andrey.mashen...@gmail.com
> > >>>>>> :
> > >>>>>>
> > >>>>>>> Sergi,
> > >>>>>>>
> > >>>>>>> I wounder how it is possible?
> > >>>>>>>
> > >>>>>>> Looks like it is impossible to run query on replicated cache, but
> > >>>>> select
> > >>>>>>> data from a
> > >>>>>>> partitioned table. It will result with IlleagalStateException on
> > >>>> stable
> > >>>>>>> topology or
> > >>>>>>> IgniteCacheException on unstable topology.
> > >>>>>>> See ReduceQueryExecutor.stableDataNodes() and
> > >>>>>>> replicatedUnstableDataNodes()
> > >>>>>>> methods.
> > >>>>>>>
> > >>>>>>> BTW, IlleagalStateException with no message is confusing.
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On Wed, Apr 12, 2017 at 2:36 PM, Sergi Vladykin <
> > >>>>>> sergi.vlady...@gmail.com>
> > >>>>>>> wrote:
> > >>>>>>>
> > >>>>>>>> Andrey,
> > >>>>>>>>
> > >>>>>>>> Because if you run query on replicated cache, but select data
> from
> > >>>> a
> > >>>>>>>> partitioned table, you will get only a part of the result.
> > >>>>>>>>
> > >>>>>>>> Igor,
> > >>>>>>>>
> > >>>>>>>> You are mostly right, but
> > >>>>>>>>
> > >>>>>>>> 1. Performance characteristics may change.
> > >>>>>>>> 2. Ignite SQL processing pipeline may not support all the stuff
> in
> > >>>> H2
> > >>>>>> SQL
> > >>>>>>>> and fail in some case where it worked previously.
> > >>>>>>>>
> > >>>>>>>> Because of this the change may affect existing applications and
> I
> > >>>>> want
> > >>>>>> to
> > >>>>>>>> have it in 2.0 to make it legal.
> > >>>>>>>>
> > >>>>>>>> Sergi
> > >>>>>>>>
> > >>>>>>>> 2017-04-12 14:10 GMT+03:00 Igor Sapego <isap...@gridgain.com>:
> > >>>>>>>>
> > >>>>>>>>> Also, is it really a breaking change if the results are wrong?
> > >>>>>>>>> To me it looks more like a bugfix, i.e. you can't break
> something
> > >>>>>>>>> that does not work properly.
> > >>>>>>>>>
> > >>>>>>>>> Best Regards,
> > >>>>>>>>> Igor
> > >>>>>>>>>
> > >>>>>>>>> On Wed, Apr 12, 2017 at 2:04 PM, Andrey Mashenkov <
> > >>>>>>>>> andrey.mashen...@gmail.com> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> Sergi,
> > >>>>>>>>>>
> > >>>>>>>>>> How can query to replicated cache leads to to wrong results?
> > >>>>>>>>>> Is it due to we can read backup entries?
> > >>>>>>>>>>
> > >>>>>>>>>> On Wed, Apr 12, 2017 at 12:31 PM, Sergi Vladykin <
> > >>>>>>>>> sergi.vlady...@gmail.com
> > >>>>>>>>>>>
> > >>>>>>>>>> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>>> Guys,
> > >>>>>>>>>>>
> > >>>>>>>>>>> I want to introduce another breaking change for 2.0.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Currently SQL is being processed differently when we call
> > >>>>> method
> > >>>>>>>>> `query`
> > >>>>>>>>>> on
> > >>>>>>>>>>> partitioned cache and on replicated: on replicated cache we
> > >>>> do
> > >>>>>> not
> > >>>>>>> do
> > >>>>>>>>> any
> > >>>>>>>>>>> extra processing and execute the query as is on current node.
> > >>>>>>>>>>>
> > >>>>>>>>>>> This behavior historically existed for performance reasons.
> > >>>> But
> > >>>>>> it
> > >>>>>>> is
> > >>>>>>>>> not
> > >>>>>>>>>>> obvious and leads to wrong query results. This issue becomes
> > >>>>> even
> > >>>>>>>> more
> > >>>>>>>>>>> creepy with JDBC and ODBC drivers.
> > >>>>>>>>>>>
> > >>>>>>>>>>> In 2.0 I want to execute all the SQL queries the same way
> > >>>>> through
> > >>>>>>> the
> > >>>>>>>>>> whole
> > >>>>>>>>>>> processing pipeline to guaranty the correct result
> > >>>>> irrespectively
> > >>>>>>> to
> > >>>>>>>>> the
> > >>>>>>>>>>> cache that was the query originator.
> > >>>>>>>>>>>
> > >>>>>>>>>>> To be able to have the old behavior (skip all the
> > >>>> preprocessing
> > >>>>>> and
> > >>>>>>>> run
> > >>>>>>>>>>> query on current node) add a flag isReplicatedOnly() on
> > >>>>> SqlQuery
> > >>>>>>> and
> > >>>>>>>>>>> SqlFieldsQuery. It will be disabled by default and if one
> > >>>> knows
> > >>>>>>> that
> > >>>>>>>>> the
> > >>>>>>>>>>> only replicated tables participate in a query, then he can
> > >>>>> enable
> > >>>>>>> it
> > >>>>>>>>> for
> > >>>>>>>>>>> better performance.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Sergi
> > >>>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> --
> > >>>>>>>>>> Best regards,
> > >>>>>>>>>> Andrey V. Mashenkov
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> --
> > >>>>>>> Best regards,
> > >>>>>>> Andrey V. Mashenkov
> > >>>>>>>
> > >>>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> --
> > >>>>> Best regards,
> > >>>>> Andrey V. Mashenkov
> > >>>>>
> > >>>>
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>> Best regards,
> > >>> Andrey V. Mashenkov
> > >>
> > >>
> >
> >
>

Re: SQL on PARTITIONED vs REPLICATED cache

Reply via email to