I think there might be an another approach; by making it possible to define safetiness of an operator - going back to the first example and considering:
SELECT e.name FROM emp e INNER JOIN dep d ON e.deptno = d.deptno WHERE (10 / e.empno) = 1 and the pushed down version: SELECT e.name FROM (select * from emp WHERE (10 / empno)=1) e INNER JOIN dep d ON e.deptno = d.deptno There are systems which handle divison_by_zero as a shortcut to NULL; for those the above two queries are conveniently equivalent. Limiting optimizations by stopping because of unsafety might hurt some system's performance - I think an alternate approach could be to: * mark operators regarding their safety * optionally provide a way to declare the safety conversion process; for DIVISION this could be: (case e.empno when 0 then NULL else e.empno end) * during push down the condition with unsafe elements should also be kept at the top level to ensure that we don't loose correctness so the pushed down version would be: --- SELECT e.name FROM (select * from emp WHERE (10 / (case empno when 0 then NULL else empno end) ) = 1 ) e INNER JOIN dep d ON e.deptno = d.deptno WHERE (10 / e.empno) = 1 This approch could possibly also benefit from optimizations based constraints/etc more easily: * say if empno is known to be >0 at compilation time * the case could be removed * get back to the original expression * finally remove the identical conditional from the top level cheers, Zoltan On 10/7/22 18:07, Stamatis Zampetakis wrote:
I just logged https://issues.apache.org/jira/browse/CALCITE-5315 but I am not planning to work on this sometime soon. Best, Stamatis On Fri, Feb 18, 2022 at 10:56 PM Stamatis Zampetakis <zabe...@gmail.com> wrote:Thank you all for the feedback. I will do a small research about the proper term to use for functions that might throw / never throw and log a JIRA to move the discussion further. @Viliam : Note that the problem you mention, although similar, is not exactly the same as the one I brought up now. The reordering of predicates in the WHERE clause has been brought up quite a few times in the dev list and our stance [1] is that since the standard leaves this decision to the implementor people should not rely on this (or put the appropriate guards). The case here is a bit different, at least the way I read the standard, cause it defines the following: "If all optional clauses are omitted, then the result of the <table expression> is the same as the result of the <from clause>. Otherwise, each specified clause is applied to the result of the previously specified clause and the result of the <table expression> is the result of the application of the last specified clause." and one of the optional clauses mentioned in the previous paragraph is the <where clause>. There seems to be a clearly defined order between the <from clause>, which includes inner joins, and the <where clause>. Best, Stamatis [1] https://lists.apache.org/thread/mq44cnrohz19hh10btms126vbcoxl50w On Fri, Feb 18, 2022 at 9:58 AM Viliam Durina <vil...@hazelcast.com> wrote:I have observed this issue years ago in well-known databases. My case was much simpler: data recordType:int value:text ----- ------ 0 1 1 a SELECT * FROM data WHERE recordType='1' AND CAST(value AS INT)<10 SQL is declarative, and unlike procedural languages, it doesn't prescribe short-circuit evaluation of the WHERE clause, or any specific evaluation order. If it was prescribed, the query would be perfectly safe. But prescribing the evaluation order would rule out many optimizations, or make them much harder, such as this half-null-half-error value. For example, reordering additions might or might not lead to overflow: TINYINTs `100 + 100 - 90`, evaluated in this order, overflow, but `100 - 90 + 100` don't - imagine each value comes from a different table and we reorder the joins. Perhaps result of TINYINT addition can be SMALLINT, but what if they are BIGINTs? My understanding was that any expression must not fail in any plausible execution plan. Therefore if I parse the column `value`, it must succeed for every possible scanned row. In my specific case I ended up implementing a custom TO_NUMBER function that returns null on parsing error, and that null never appeared in the result because of the other condition. Viliam On Thu, 17 Feb 2022 at 20:08, Julian Hyde <jhyde.apa...@gmail.com> wrote:You should definitely log a bug, even if there are no plans to fix it.Itis a violation of the standard, and therefore it is a bug, andtherefore weshould document that it exists. Can you do some research on the right terminology. You use the term ‘unsafe’. I think the PL community uses the term ’strict’ forexpressionsthat must be evaluated (and therefore if they throw, the query is suretoabort). We have other related concepts floating around in Calcite -dynamicfunctions, deterministic operators - and standardizing terminology iskeyif we are to keep the menagerie in order. There might be a way to have our cake and eat it too. We could push down the division, and if there is a division-by-zero we return a specialvaluesuch as NULL. If the special value is not eliminated by the join thenit ispromoted to a throw. The “cake” here is the performance benefit ofpushingdown a filter (that may never or rarely throw); the “eating it” issafetyon the occasion that the filter does throw. Even if that strategy doesn’t work, maybe we could have a flag that says whether to push down conditions that might throw. JulianOn Feb 17, 2022, at 8:07 AM, Scott Reynolds <sdrreyno...@gmail.com>wrote:Is it feasible to prevent the filter push down in cases like this (detectpotential division by zero) ? What other exceptions that could happen? Should it only push down some filters and preserve the complex filterabovethe join? Regarding a Jira ticket, generating an exception when it shouldproducevalid results is gotta be a Jira case IMHO. On Thu, Feb 17, 2022, 7:17 AM Stamatis Zampetakis <zabe...@gmail.com>wrote:Hi all, Till today, I believed that pushing a filter below an inner join isnotstrictly related to the operators (equals, plus, minus, cast,division)present in the filter. However, the query below will return some results if the filter isexecutedafter the join or it will fail with an exception if the filter ispushedbelow the join. EMP [EMPNO, NAME, DEPTNO] 0 | Alex | 0 10 | Bob | 1 DEP [DEPTNO] 1 SELECT e.name FROM emp e INNER JOIN dep d ON e.deptno = d.deptno WHERE (10 / e.empno) = 1 It seems that SQL standard (Section 7.4 general rule 1) mandates thatWHEREshould be applied to the result of FROM so in the case above pushing filters below the join seems to violate the standard. Currently, in Calcite we are going to push the division, cast, and basically any kind of operator below the join. Many well-knowndatabasese.g., Postgres, do that as well. Should we log it as a bug and try to do something about it or let itbeasit is? Best, Stamatis-- This message contains confidential information and is intended only for the individuals named. If you are not the named addressee you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system. E-mail transmission cannot be guaranteed to be secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. The sender therefore does not accept liability for any errors or omissions in the contents of this message, which arise as a result of e-mail transmission. If verification is required, please request a hard-copy version. -Hazelcast
OpenPGP_signature
Description: OpenPGP digital signature