Re: SQL on PARTITIONED vs REPLICATED cache

Dmitriy Setrakyan Wed, 12 Apr 2017 14:26:41 -0700

Got it, Denis. I think you are right.

On Wed, Apr 12, 2017 at 2:20 PM, Denis Magda <dma...@apache.org> wrote:


> Dmitriy,
>
> No, I think that Sergi supposed a type of cache which reference is used
> for a query execution. In my example
>
> >> 2. Error-prone scenario - *replicatedCache*.query(“SELECT * FROM
> >> partitionedCache … JOIN replicatedCache …”);
>
> *replicatedCache* reference is used for the query execution and, as I
> understand, this causes the issue.
>
> Sergi, please clarify.
>
> —
> Denis
>
> > On Apr 12, 2017, at 1:51 PM, Dmitriy Setrakyan <dsetrak...@apache.org>
> wrote:
> >
> > Denis, I think that you meant selecting from replicated cache first as an
> > invalid scenario, but provided the wrong example. Here is the correct
> > example for the invalid query:
> >
> > SELECT * FROM replicatedCache … JOIN partitionedCache …”
> >
> > I do agree, we should make the change, as long as we keep the flag to
> > enable the old behavior.
> >
> > D.
> >
> > On Wed, Apr 12, 2017 at 12:50 PM, Denis Magda <dma...@apache.org> wrote:
> >
> >> Sergi,
> >>
> >> As far as I understand you’re considering an example below:
> >>
> >> IgniteCache partitioneCache = ...;
> >> IgniteCache replicatedCache = …;
> >>
> >> 1. Valid scenario - *partitionedCache*.query(“SELECT * FROM
> >> partitionedCache … JOIN replicatedCache …”);
> >> 2. Error-prone scenario - *replicatedCache*.query(“SELECT * FROM
> >> partitionedCache … JOIN replicatedCache …”);
> >>
> >> Do you mean 2. as the issue? If it’s so then can’t we just detect on our
> >> own that all the caches are replicated and execute a query more optimal?
> >> This should omit necessity to add isReplicatedOnly()?
> >>
> >> —
> >> Denis
> >>
> >>> On Apr 12, 2017, at 7:07 AM, Andrey Mashenkov <
> >> andrey.mashen...@gmail.com> wrote:
> >>>
> >>> Yes, it's reasonable.
> >>>
> >>> On Wed, Apr 12, 2017 at 3:23 PM, Sergi Vladykin <
> >> sergi.vlady...@gmail.com>
> >>> wrote:
> >>>
> >>>> Good point, but I'm not sure. The difference is that on client node
> you
> >>>> should not be able to enable isLocal, while isReplicatedOnly is
> >> perfectly
> >>>> valid. What do you think?
> >>>>
> >>>> Sergi
> >>>>
> >>>> 2017-04-12 15:18 GMT+03:00 Andrey Mashenkov <
> andrey.mashen...@gmail.com
> >>> :
> >>>>
> >>>>> Sergi,
> >>>>>
> >>>>> Got it.
> >>>>>
> >>>>> Does query execution way and results will be same for
> isReplicatedOnly
> >>>> flag
> >>>>> and for isLocal flag turned on?
> >>>>> If my understanding is correct, we will get same results and there is
> >> no
> >>>>> need to introduce a new flag.
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Wed, Apr 12, 2017 at 2:54 PM, Sergi Vladykin <
> >>>> sergi.vlady...@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Ok, let it be an exception. I'm just saying that the thing does not
> >>>> work
> >>>>>> now.
> >>>>>>
> >>>>>> Sergi
> >>>>>>
> >>>>>> 2017-04-12 14:50 GMT+03:00 Andrey Mashenkov <
> >>>> andrey.mashen...@gmail.com
> >>>>>> :
> >>>>>>
> >>>>>>> Sergi,
> >>>>>>>
> >>>>>>> I wounder how it is possible?
> >>>>>>>
> >>>>>>> Looks like it is impossible to run query on replicated cache, but
> >>>>> select
> >>>>>>> data from a
> >>>>>>> partitioned table. It will result with IlleagalStateException on
> >>>> stable
> >>>>>>> topology or
> >>>>>>> IgniteCacheException on unstable topology.
> >>>>>>> See ReduceQueryExecutor.stableDataNodes() and
> >>>>>>> replicatedUnstableDataNodes()
> >>>>>>> methods.
> >>>>>>>
> >>>>>>> BTW, IlleagalStateException with no message is confusing.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On Wed, Apr 12, 2017 at 2:36 PM, Sergi Vladykin <
> >>>>>> sergi.vlady...@gmail.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Andrey,
> >>>>>>>>
> >>>>>>>> Because if you run query on replicated cache, but select data from
> >>>> a
> >>>>>>>> partitioned table, you will get only a part of the result.
> >>>>>>>>
> >>>>>>>> Igor,
> >>>>>>>>
> >>>>>>>> You are mostly right, but
> >>>>>>>>
> >>>>>>>> 1. Performance characteristics may change.
> >>>>>>>> 2. Ignite SQL processing pipeline may not support all the stuff in
> >>>> H2
> >>>>>> SQL
> >>>>>>>> and fail in some case where it worked previously.
> >>>>>>>>
> >>>>>>>> Because of this the change may affect existing applications and I
> >>>>> want
> >>>>>> to
> >>>>>>>> have it in 2.0 to make it legal.
> >>>>>>>>
> >>>>>>>> Sergi
> >>>>>>>>
> >>>>>>>> 2017-04-12 14:10 GMT+03:00 Igor Sapego <isap...@gridgain.com>:
> >>>>>>>>
> >>>>>>>>> Also, is it really a breaking change if the results are wrong?
> >>>>>>>>> To me it looks more like a bugfix, i.e. you can't break something
> >>>>>>>>> that does not work properly.
> >>>>>>>>>
> >>>>>>>>> Best Regards,
> >>>>>>>>> Igor
> >>>>>>>>>
> >>>>>>>>> On Wed, Apr 12, 2017 at 2:04 PM, Andrey Mashenkov <
> >>>>>>>>> andrey.mashen...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>> Sergi,
> >>>>>>>>>>
> >>>>>>>>>> How can query to replicated cache leads to to wrong results?
> >>>>>>>>>> Is it due to we can read backup entries?
> >>>>>>>>>>
> >>>>>>>>>> On Wed, Apr 12, 2017 at 12:31 PM, Sergi Vladykin <
> >>>>>>>>> sergi.vlady...@gmail.com
> >>>>>>>>>>>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Guys,
> >>>>>>>>>>>
> >>>>>>>>>>> I want to introduce another breaking change for 2.0.
> >>>>>>>>>>>
> >>>>>>>>>>> Currently SQL is being processed differently when we call
> >>>>> method
> >>>>>>>>> `query`
> >>>>>>>>>> on
> >>>>>>>>>>> partitioned cache and on replicated: on replicated cache we
> >>>> do
> >>>>>> not
> >>>>>>> do
> >>>>>>>>> any
> >>>>>>>>>>> extra processing and execute the query as is on current node.
> >>>>>>>>>>>
> >>>>>>>>>>> This behavior historically existed for performance reasons.
> >>>> But
> >>>>>> it
> >>>>>>> is
> >>>>>>>>> not
> >>>>>>>>>>> obvious and leads to wrong query results. This issue becomes
> >>>>> even
> >>>>>>>> more
> >>>>>>>>>>> creepy with JDBC and ODBC drivers.
> >>>>>>>>>>>
> >>>>>>>>>>> In 2.0 I want to execute all the SQL queries the same way
> >>>>> through
> >>>>>>> the
> >>>>>>>>>> whole
> >>>>>>>>>>> processing pipeline to guaranty the correct result
> >>>>> irrespectively
> >>>>>>> to
> >>>>>>>>> the
> >>>>>>>>>>> cache that was the query originator.
> >>>>>>>>>>>
> >>>>>>>>>>> To be able to have the old behavior (skip all the
> >>>> preprocessing
> >>>>>> and
> >>>>>>>> run
> >>>>>>>>>>> query on current node) add a flag isReplicatedOnly() on
> >>>>> SqlQuery
> >>>>>>> and
> >>>>>>>>>>> SqlFieldsQuery. It will be disabled by default and if one
> >>>> knows
> >>>>>>> that
> >>>>>>>>> the
> >>>>>>>>>>> only replicated tables participate in a query, then he can
> >>>>> enable
> >>>>>>> it
> >>>>>>>>> for
> >>>>>>>>>>> better performance.
> >>>>>>>>>>>
> >>>>>>>>>>> Sergi
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> --
> >>>>>>>>>> Best regards,
> >>>>>>>>>> Andrey V. Mashenkov
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> Best regards,
> >>>>>>> Andrey V. Mashenkov
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Best regards,
> >>>>> Andrey V. Mashenkov
> >>>>>
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Best regards,
> >>> Andrey V. Mashenkov
> >>
> >>
>
>

Re: SQL on PARTITIONED vs REPLICATED cache

Reply via email to