Re: New SQL execution engine

Roman Kondakov Mon, 18 Nov 2019 02:04:53 -0800

Hi, Steve

This behavior is actually not a bug, although that is not obvious. I'll try to explain.

When query parallelism = N is turned on, each cache is divided into N parts from the SQL point of view. Every SQL query is executed independently over each part, and the results are then merged together during the reducer step.

This is exactly the same as distributed query execution, where instead of a single node with query parallelism = N we have N nodes with query parallelism = 1. The SQL query is executed over each partition of the data on all nodes, and the results are then merged on the reducer.

As we can see, query parallelism is equivalent to distributed query execution. When we do joins over distributed tables, we need to think about the collocation of data [1]. If the data is not collocated, each part can only join against the rows it holds itself, so we get a wrong result. This happens silently, which is not good, IMO.

I reworked your example a bit to impose collocation on the join key, and now the join returns the correct result [2].
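
To make the effect concrete, here is a toy model in plain Python. It is not Ignite code and does not reproduce Ignite's actual partitioning; the tables, the names (split, join, parallel_join) and the modulo-based placement are all made up for illustration. It only mimics the scheme described above: split each table into N parts, join part by part, merge the results.

```python
# Toy model of "query parallelism = N": every table is split into N parts,
# the join runs independently inside each part, and a reducer merges the rows.
# All names and the modulo-based placement are illustrative assumptions.

N = 2

# (person_id, company_id, name)
persons = [(1, 11, "Ann"), (2, 10, "Bob"), (3, 10, "Cid"), (4, 11, "Dee")]
# (company_id, name)
companies = [(10, "Acme"), (11, "Globex")]


def split(rows, n, key):
    """Place each row into part number key(row) % n."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[key(row) % n].append(row)
    return parts


def join(person_rows, company_rows):
    """Inner join on person.company_id = company.id over the given rows only."""
    names = dict(company_rows)
    return [(name, names[co]) for _, co, name in person_rows if co in names]


def parallel_join(person_parts, company_parts):
    """Join part i with part i only, then merge (the reducer step)."""
    merged = []
    for p_part, c_part in zip(person_parts, company_parts):
        merged.extend(join(p_part, c_part))
    return sorted(merged)


# Reference result: a single join over the whole data set.
expected = sorted(join(persons, companies))

company_parts = split(companies, N, key=lambda c: c[0])

# Not collocated: persons are placed by their own id, companies by company id.
non_collocated = parallel_join(split(persons, N, key=lambda p: p[0]), company_parts)

# Collocated: persons are placed by the join key (company_id).
collocated = parallel_join(split(persons, N, key=lambda p: p[1]), company_parts)

print("expected      :", expected)
print("non-collocated:", non_collocated)
print("collocated    :", collocated)
```

In this model the non-collocated run silently drops every person whose company ended up in a different part, while placing persons by the join key gives the same rows as the single join.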

The current approach to configuration and query execution looks very uncomfortable and should be completely redesigned in the new engine.


[1] https://apacheignite-sql.readme.io/docs/distributed-joins

[2] https://github.com/hostettler/igniteParallelQueries/pull/1


--
Kind Regards
Roman Kondakov

On 16.11.2019 12:50, steve.hostett...@gmail.com wrote:

Actually, I am now wondering whether this is just a bug that I should
record as such, since the behavior differs with and without parallelism
and there is no warning during execution or in the API.

Any thoughts?



--
Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
