Denis,

To make things clearer, we need a concrete example of Compute + LocalSQL, and we need to reason about it to figure out whether a "smart" SQL engine can deliver the same (or better) results.
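As a starting point for such reasoning, here is a toy model of the routing decision discussed in the quoted messages below: a query bound to a set of partitions runs locally when the executing node owns all of them, and otherwise either fails or falls back to distributed execution. This is only a sketch in Python, not Ignite code; `plan_query`, `owners`, `on_miss`, and the exception name are hypothetical, and partition ownership is assumed to be stable for the duration of the task.

```python
from enum import Enum


class Plan(Enum):
    LOCAL = "local"              # no map-reduce phase, read local partitions only
    DISTRIBUTED = "distributed"  # regular distributed execution


class NonLocalPartitions(Exception):
    """Raised when a partition-bound query cannot be served by a single node."""


def plan_query(node, partitions, owners, on_miss="fallback"):
    """Decide how to run a query bound to `partitions` when started on `node`.

    owners:  dict mapping partition id -> id of the node that owns it
             (assumed stable, e.g. because the task pins the partitions).
    on_miss: "fallback" to go distributed, "raise" to ask the caller to remap.
    """
    missing = sorted(p for p in partitions if owners.get(p) != node)
    if not missing:
        return Plan.LOCAL
    if on_miss == "raise":
        raise NonLocalPartitions(f"partitions {missing} are not on node {node!r}")
    return Plan.DISTRIBUTED


if __name__ == "__main__":
    owners = {0: "A", 1: "A", 2: "B"}
    print(plan_query("A", {0, 1}, owners))  # all partitions are on node A
    print(plan_query("A", {0, 2}, owners))  # partition 2 lives on node B
```

In this model the 'local' flag carries no extra information: the partition set plus the ownership map is enough to pick the plan, which is the open question for the real engine.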
Fri, 8 Nov 2019 at 01:48, Denis Magda <dma...@apache.org>:
>
> Folks,
>
> See our compute tasks as an advanced version of stored procedures that let
> the users code logic of various complexity with Java, .NET or C++ (and
> not with PL/SQL). The logic can use a combination of APIs (key-value, SQL,
> etc.) to access data both locally and remotely while being executed on
> server nodes. The logic can make N key-value requests or run M SQL queries.
>
> We kept supporting local SQL queries exactly for such scenarios (for our
> version of stored procedures) to ensure the distributed map-reduce phase is
> canceled if all the data is local. And affinityCalls were improved one day
> to pin the partitions.
>
> If the new engine is smart enough to understand that all the partitions are
> available locally during the affinityRun execution then it's totally fine
> to remove the 'local' flag. Otherwise, we need to instruct the engine
> manually that a distributed phase is redundant via the 'local' flag or by
> other means.
>
> Does it make things clearer?
>
> -
> Denis
>
>
> On Thu, Nov 7, 2019 at 3:53 AM Ivan Pavlukhin <vololo...@gmail.com> wrote:
>
> > Stephen,
> >
> > In my understanding, we need to do a better job of understanding the
> > use cases of Compute + LocalSQL ourselves.
> >
> > Ideally, a smart optimizer should do the best job of query deployment.
> >
> > Thu, 7 Nov 2019 at 13:04, Stephen Darlington
> > <stephen.darling...@gridgain.com>:
> > >
> > > I made a (bad) assumption that this would also affect queries against
> > > partitions. If “setLocal()” goes away but “setPartitions()” remains I’m
> > > happy.
> > >
> > > What I would say is that the “broadcast / local” method is one I see
> > > fairly often. Do we need to do a better job educating people about the
> > > “correct” way?
> > >
> > > Regards,
> > > Stephen
> > >
> > > > On 7 Nov 2019, at 08:30, Alexey Goncharuk <alexey.goncha...@gmail.com>
> > > > wrote:
> > > >
> > > > Denis, Stephen,
> > > >
> > > > Running a local query in a broadcast closure won't work on changing
> > > > topology. We specifically added an affinityCall method to the compute
> > > > API in order to pin a partition to prevent its moving and eviction
> > > > throughout the task execution. Therefore, the query inside an
> > > > affinityCall is always executed against some partitions (otherwise the
> > > > query may give incorrect results when the topology changes).
> > > >
> > > > I support Igor's question and think that the 'local' flag for the query
> > > > should be deprecated and eventually removed. A 'local' query can always
> > > > be expressed as a query against a set of partitions. If those
> > > > partitions are located on the same node - good, we get fast and correct
> > > > results. If not - we may either raise an exception and ask the user to
> > > > remap the query, or fall back to a distributed query execution.
> > > >
> > > > Given that the Calcite prototype is in its early stages, it's likely
> > > > its first version will be available in 3.x, and it's a good chance to
> > > > get rid of wrong API pieces.
> > > >
> > > > --AG
> > > >
> > > > Mon, 4 Nov 2019 at 14:02, Stephen Darlington
> > > > <stephen.darling...@gridgain.com>:
> > > >
> > > >> A common use case is where you want to work on many rows of data
> > > >> across the grid. You’d broadcast a closure, running the same code on
> > > >> every node with just the local data. SQL doesn’t work in isolation;
> > > >> it’s often used as a filter for future computations.
> > > >>
> > > >> Regards,
> > > >> Stephen
> > > >>
> > > >>> On 1 Nov 2019, at 17:53, Ivan Pavlukhin <vololo...@gmail.com> wrote:
> > > >>>
> > > >>> Denis,
> > > >>>
> > > >>> I am mostly concerned about gathering use cases.
> > > >>> It would be great to critically assess such cases to identify why
> > > >>> they cannot be solved by using distributed SQL. Also, it sounds
> > > >>> similar to some kind of "hints", but very limited and with all the
> > > >>> drawbacks of hints (impossibility to use the full strength of CBO).
> > > >>> We can provide better "hints" support with the new engine as well.
> > > >>>
> > > >>> Fri, 1 Nov 2019 at 20:14, Denis Magda <dma...@apache.org>:
> > > >>>>
> > > >>>> Ivan,
> > > >>>>
> > > >>>> I was involved in a couple of such use cases personally, so that's
> > > >>>> not my imagination ;) Even more, as far as I remember, the primary
> > > >>>> reason why we improved our affinityRuns, ensuring no partition is
> > > >>>> purged from a node until a task is completed, is that many users
> > > >>>> were running local SQL from compute tasks and needed a guarantee
> > > >>>> that SQL will always return a correct result set.
> > > >>>>
> > > >>>> -
> > > >>>> Denis
> > > >>>>
> > > >>>>
> > > >>>> On Fri, Nov 1, 2019 at 10:01 AM Ivan Pavlukhin <vololo...@gmail.com>
> > > >>>> wrote:
> > > >>>>
> > > >>>>> Denis,
> > > >>>>>
> > > >>>>> It would be nice to see real use cases of the affinity call + local
> > > >>>>> SQL combination. Generally, the new engine will be able to infer
> > > >>>>> collocation, resulting in the same collocated execution
> > > >>>>> automatically.
> > > >>>>>
> > > >>>>> Fri, 1 Nov 2019 at 19:11, Denis Magda <dma...@apache.org>:
> > > >>>>>>
> > > >>>>>> Hi Igor,
> > > >>>>>>
> > > >>>>>> The local queries feature is broadly used together with
> > > >>>>>> affinity-based compute tasks:
> > > >>>>>>
> > > >>>>>> https://apacheignite.readme.io/docs/collocate-compute-and-data#section-affinity-call-and-run-methods
> > > >>>>>>
> > > >>>>>> The use case is as follows.
> > > >>>>>> The user knows that all data needed for the computation is
> > > >>>>>> collocated, and SQL is used as an advanced API for data retrieval
> > > >>>>>> from the computation code. The affinity task ensures that
> > > >>>>>> partitions won't be discarded from the node(s) if the topology
> > > >>>>>> changes during the task execution and, thus, it's safe to run SQL
> > > >>>>>> locally, skipping distributed phases.
> > > >>>>>>
> > > >>>>>> The combination of affinity compute tasks with local SQL is a real
> > > >>>>>> and valuable use case, and this is what we need to support with
> > > >>>>>> Calcite. Do you see any challenges?
> > > >>>>>>
> > > >>>>>> -
> > > >>>>>> Denis
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> On Fri, Nov 1, 2019 at 8:46 AM Roman Kondakov
> > > >>>>>> <kondako...@mail.ru.invalid> wrote:
> > > >>>>>>
> > > >>>>>>> Hi Igor!
> > > >>>>>>>
> > > >>>>>>> IMO we need to maintain backward compatibility between the old
> > > >>>>>>> and new query engines as much as possible. And therefore we
> > > >>>>>>> shouldn't change the behavior of local queries.
> > > >>>>>>>
> > > >>>>>>> So, for local queries Calcite's planner shouldn't consider the
> > > >>>>>>> distribution trait at all.
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> --
> > > >>>>>>> Kind Regards
> > > >>>>>>> Roman Kondakov
> > > >>>>>>>
> > > >>>>>>> On 01.11.2019 17:07, Seliverstov Igor wrote:
> > > >>>>>>>> Hi Igniters,
> > > >>>>>>>>
> > > >>>>>>>> While working on the new generation of Ignite SQL, I faced a
> > > >>>>>>>> question: «Do we need local queries at all and, if so, what
> > > >>>>>>>> semantics should they have?».
> > > >>>>>>>>
> > > >>>>>>>> The current planning flow consists of the following steps:
> > > >>>>>>>>
> > > >>>>>>>> 1) Parsing SQL to AST
> > > >>>>>>>> 2) Validating AST (against Schema)
> > > >>>>>>>> 3) Optimizing (Building execution graph)
> > > >>>>>>>> 4) Splitting (into query fragments which execute on target nodes)
> > > >>>>>>>> 5) Mapping (query fragments to nodes/partitions)
> > > >>>>>>>>
> > > >>>>>>>> At the last step we check that all Fragment sources (a table or
> > > >>>>>>>> result) have the same distribution (in other words, all sources
> > > >>>>>>>> have to be co-located).
> > > >>>>>>>>
> > > >>>>>>>> Planner and Splitter guarantee that all caches in a Fragment are
> > > >>>>>>>> co-located; an Exchange is produced otherwise. But if we force
> > > >>>>>>>> local execution we cannot produce Exchanges, which means we may
> > > >>>>>>>> face two non-co-located caches inside a single query fragment
> > > >>>>>>>> (the result of local query planning is a single query fragment).
> > > >>>>>>>> So, we cannot pass the check.
> > > >>>>>>>>
> > > >>>>>>>> Should we throw an exception, omit the check for local query
> > > >>>>>>>> planning, or prohibit local queries altogether?
> > > >>>>>>>>
> > > >>>>>>>> Your thoughts?
> > > >>>>>>>>
> > > >>>>>>>> Regards,
> > > >>>>>>>> Igor
> > > >>>>>>>
> > > >>>>>
> > > >>>>> --
> > > >>>>> Best regards,
> > > >>>>> Ivan Pavlukhin
> > > >>>>>
> > > >>>
> > > >>> --
> > > >>> Best regards,
> > > >>> Ivan Pavlukhin
> > > >>
> > > >
> >
> > --
> > Best regards,
> > Ivan Pavlukhin
>

--
Best regards,
Ivan Pavlukhin
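
The mapping-step problem from Igor's original message (the innermost quote above) can be illustrated with a toy model. This is a sketch under stated assumptions, not the Calcite-based planner: `Source`, `split`, `check_colocated`, `map_query`, and the `skip_check_for_local` switch are hypothetical names, and a "distribution" is reduced to a plain string label. A distributed plan puts sources with different distributions into separate fragments (standing in for an Exchange between them), while a local plan is forced into a single fragment and may fail the co-location check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    name: str
    distribution: str  # hypothetical label for how the cache is partitioned


class NotColocated(Exception):
    """Raised when one fragment mixes sources with different distributions."""


def split(sources, local=False):
    """Group sources into fragments.

    Distributed: one fragment per distribution (an Exchange would sit between).
    Local: a single fragment, because no Exchange may be produced.
    """
    if local:
        return [list(sources)]
    groups = {}
    for source in sources:
        groups.setdefault(source.distribution, []).append(source)
    return list(groups.values())


def check_colocated(fragment):
    """The mapping-step check: all sources of a fragment share a distribution."""
    distributions = {source.distribution for source in fragment}
    if len(distributions) > 1:
        raise NotColocated(f"fragment mixes distributions {sorted(distributions)}")


def map_query(sources, local=False, skip_check_for_local=False):
    """Split, then check each fragment unless the check is waived for local plans."""
    fragments = split(sources, local)
    for fragment in fragments:
        if local and skip_check_for_local:
            continue  # option "omit the check for local query planning"
        check_colocated(fragment)  # option "throw an exception"
    return fragments
```

With two sources of different distributions, `map_query(sources)` yields two fragments, `map_query(sources, local=True)` raises `NotColocated`, and `map_query(sources, local=True, skip_check_for_local=True)` returns one fragment whose result may be incomplete on a node that does not hold both caches' data.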