I have observed this issue years ago in well-known databases. My case was
much simpler:

data
recordType:int  value:text
-----           ------
0               1
1               a

SELECT *
FROM data
WHERE recordType='1' AND CAST(value AS INT)<10


SQL is declarative, and unlike procedural languages, it doesn't prescribe
short-circuit evaluation of the WHERE clause, or any specific evaluation
order. If it was prescribed, the query would be perfectly safe. But
prescribing the evaluation order would rule out many optimizations, or make
them much harder, such as this half-null-half-error value.

For example, reordering additions might or might not lead to overflow:
TINYINTs `100 + 100 - 90`, evaluated in this order, overflow, but `100 - 90
+ 100` don't - imagine each value comes from a different table and we
reorder the joins. Perhaps result of TINYINT addition can be SMALLINT, but
what if they are BIGINTs?

My understanding was that any expression must not fail in any plausible
execution plan. Therefore if I parse the column `value`, it must succeed
for every possible scanned row. In my specific case I ended up implementing
a custom TO_NUMBER function that returns null on parsing error, and that
null never appeared in the result because of the other condition.

Viliam

On Thu, 17 Feb 2022 at 20:08, Julian Hyde <jhyde.apa...@gmail.com> wrote:

> You should definitely log a bug, even if there are no plans to fix it. It
> is a violation of the standard, and therefore it is a bug, and therefore we
> should document that it exists.
>
> Can you do some research on the right terminology. You use the term
> ‘unsafe’. I think the PL community uses the term ’strict’ for expressions
> that must be evaluated (and therefore if they throw, the query is sure to
> abort). We have other related concepts floating around in Calcite - dynamic
> functions, deterministic operators - and standardizing terminology is key
> if we are to keep the menagerie in order.
>
> There might be a way to have our cake and eat it too. We could push down
> the division, and if there is a division-by-zero we return a special value
> such as NULL. If the special value is not eliminated by the join then it is
> promoted to a throw. The “cake” here is the performance benefit of pushing
> down a filter (that may never or rarely throw); the “eating it” is safety
> on the occasion that the filter does throw.
>
> Even if that strategy doesn’t work, maybe we could have a flag that says
> whether to push down conditions that might throw.
>
> Julian
>
>
> > On Feb 17, 2022, at 8:07 AM, Scott Reynolds <sdrreyno...@gmail.com>
> wrote:
> >
> > Is it feasible to prevent the filter push down in cases like this (
> detect
> > potential division by zero) ? What other exceptions that could happen?
> >
> > Should it only push down some filters and preserve the complex filter
> above
> > the join?
> >
> > Regarding a Jira ticket, generating an exception when it should produce
> > valid results is gotta be a Jira case IMHO.
> >
> >
> >
> >
> > On Thu, Feb 17, 2022, 7:17 AM Stamatis Zampetakis <zabe...@gmail.com>
> wrote:
> >
> >> Hi all,
> >>
> >> Till today, I believed that pushing a filter below an inner join is not
> >> strictly related to the operators (equals, plus, minus, cast, division)
> >> present in the filter.
> >>
> >> However, the query below will return some results if the filter is
> executed
> >> after the join or it will fail with an exception if the filter is pushed
> >> below the join.
> >>
> >> EMP [EMPNO, NAME, DEPTNO]
> >> 0   | Alex | 0
> >> 10 | Bob | 1
> >>
> >> DEP [DEPTNO]
> >> 1
> >>
> >> SELECT e.name
> >> FROM emp e
> >> INNER JOIN dep d ON e.deptno = d.deptno
> >> WHERE (10 / e.empno) = 1
> >>
> >> It seems that SQL standard (Section 7.4 general rule 1) mandates that
> WHERE
> >> should be applied to the result of FROM so in the case above pushing
> >> filters below the join seems to violate the standard.
> >>
> >> Currently, in Calcite we are going to push the division, cast, and
> >> basically any kind of operator below the join. Many well-known databases
> >> e.g., Postgres, do that as well.
> >>
> >> Should we log it as a bug and try to do something about it or let it be
> as
> >> it is?
> >>
> >> Best,
> >> Stamatis
> >>
>
>

-- 
This message contains confidential information and is intended only for the 
individuals named. If you are not the named addressee you should not 
disseminate, distribute or copy this e-mail. Please notify the sender 
immediately by e-mail if you have received this e-mail by mistake and 
delete this e-mail from your system. E-mail transmission cannot be 
guaranteed to be secure or error-free as information could be intercepted, 
corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. 
The sender therefore does not accept liability for any errors or omissions 
in the contents of this message, which arise as a result of e-mail 
transmission. If verification is required, please request a hard-copy 
version. -Hazelcast

Reply via email to