The goal is simply to agree on a set of well-defined principles for how we should behave. If we don’t like the implications that arise, we can always have another vote. A democracy cannot bind itself, so I have never understood this fear of making a decision.
A database also has a thousand toggles. If we absolutely need to, we can introduce one more. We should be doing this upfront a great deal more often; doing it retrospectively sucks, but in my opinion that is a bad reason to bind ourselves to whatever happened to make it in.

Do we define the principles of our current behaviour anywhere? I couldn’t find them.

> On 21 Nov 2018, at 21:08, Sylvain Lebresne <lebre...@gmail.com> wrote:
>
> On Tue, Nov 20, 2018 at 5:02 PM Benedict Elliott Smith <bened...@apache.org>
> wrote:
>
>> FWIW, my meaning of arithmetic in this context extends to any features we
>> have already released (such as aggregates, and perhaps other built-in
>> functions) that operate on the same domain. We should be consistent, after
>> all.
>>
>> Whether or not we need to revisit any existing functionality we can figure
>> out after the fact, once we have agreed what our behaviour should be.
>
> I'm not sure I correctly understand the process being suggested, but I don't
> particularly like or agree with what I understand. What I understand is a
> suggestion to vote on being ANSI SQL 92 compliant, with no real evaluation
> of what that entails (at least I haven't seen one), and that this vote, if
> passed, would imply we'd then make any backward-incompatible change
> necessary to achieve compliance ("my meaning of arithmetic in this context
> extends to any features we have already released" and "Whether or not we
> need to revisit any existing functionality we can figure out after the
> fact, once we have agreed what our behaviour should be").
>
> This might make sense for a new product, but at our stage it seems backward
> to me. I think we owe it to our users to first make the effort of
> identifying what "inconsistencies" our existing arithmetic has[1], and
> _then_ consider what options we have to fix those, with their pros and cons
> (including how badly they break backward compatibility). And if _then_
> becoming ANSI SQL 92 compliant proves not to be disruptive (or at least
> acceptably so), then sure, that's great.
>
> [1]: one possibly efficient way to do that could be to compare our
> arithmetic to ANSI SQL 92's. Not all differences found would imply
> inconsistencies or wrongness in our arithmetic, but it should still be
> helpful. And I guess my whole point is that we should do that analysis
> first, and then maybe decide that being ANSI SQL 92 compliant is a
> reasonable option; not decide first and live with the consequences no
> matter what they are.
>
> --
> Sylvain
>
>
>> I will make this more explicit for the vote, but just to clarify the
>> intention, so that we are all discussing the same thing.
>>
>>
>>> On 20 Nov 2018, at 14:18, Ariel Weisberg <adwei...@fastmail.fm> wrote:
>>>
>>> Hi,
>>>
>>> +1
>>>
>>> This is a public API, so we will be much better off if we get it right
>>> the first time.
>>>
>>> Ariel
>>>
>>>> On Nov 16, 2018, at 10:36 AM, Jonathan Haddad <j...@jonhaddad.com> wrote:
>>>>
>>>> Sounds good to me.
>>>>
>>>> On Fri, Nov 16, 2018 at 5:09 AM Benedict Elliott Smith
>>>> <bened...@apache.org> wrote:
>>>>
>>>>> So, this thread somewhat petered out.
>>>>>
>>>>> There are still a number of unresolved issues, but to make progress I
>>>>> wonder if it would first be helpful to have a vote on ensuring we are
>>>>> ANSI SQL 92 compliant for our arithmetic? This seems like a sensible
>>>>> baseline, since we will hopefully minimise surprise to operators this
>>>>> way.
>>>>>
>>>>> If people largely agree, I will call a vote, and we can pick up a
>>>>> couple of more focused discussions afterwards on how we interpret the
>>>>> leeway it gives.
>>>>>
>>>>>
>>>>>> On 12 Oct 2018, at 18:10, Ariel Weisberg <ar...@weisberg.ws> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> From reading the spec: precision is always implementation-defined.
>>>>>> The spec specifies scale in several cases, but never precision, for
>>>>>> any type or operation (addition/subtraction, multiplication,
>>>>>> division).
>>>>>>
>>>>>> So we don't implement anything remotely approaching precision and
>>>>>> scale in CQL when it comes to numbers, I think? So we aren't going
>>>>>> to follow the spec for scale. We are already pretty far down that
>>>>>> road, so I would leave it alone.
>>>>>>
>>>>>> I don't think the spec is asking for the most approximate type. It's
>>>>>> just saying the result is approximate, and the precision is
>>>>>> implementation-defined. We could return either float or double. I
>>>>>> think if one of the operands is a double we should return a double,
>>>>>> because clearly the schema thought a double was required to
>>>>>> represent that number. I would also be in favor of returning a
>>>>>> double all the time, so that people can expect a consistent type
>>>>>> from expressions involving approximate numbers.
>>>>>>
>>>>>> I am a big fan of widening for arithmetic expressions in a database,
>>>>>> to avoid having to error on overflow. You can go to the trouble of
>>>>>> widening only the minimum amount, but I think it's simpler if we
>>>>>> always widen to bigint and double. This would be something the spec
>>>>>> allows.
>>>>>>
>>>>>> Definitely, if we can make overflow not occur, we should, and the
>>>>>> spec allows that. We should also not return different types for the
>>>>>> same operand types just to work around overflow if we detect we need
>>>>>> more precision.
>>>>>>
>>>>>> Ariel
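For concreteness, the always-widen behaviour Ariel describes would look roughly like this (a sketch of the proposal only, against a hypothetical table; these are not released semantics):

    -- hypothetical table for illustration
    CREATE TABLE ops (k int PRIMARY KEY, t tinyint, s smallint, i int, f float);

    -- Integer-only expressions always widen to bigint, so the result type
    -- cannot overflow even with both operands at their maximum values:
    SELECT t + s FROM ops;  -- bigint under the proposal, not tinyint or smallint
    SELECT i * i FROM ops;  -- bigint under the proposal

    -- Any approximate operand widens the result to double:
    SELECT f + i FROM ops;  -- double under the proposal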
>>>>>>> On Fri, Oct 12, 2018, at 12:45 PM, Benedict Elliott Smith wrote:
>>>>>>> If it’s in the SQL spec, I’m fairly convinced. Thanks for digging
>>>>>>> this out (and Mike for getting some empirical examples).
>>>>>>>
>>>>>>> We still have to decide on the approximate data type to return;
>>>>>>> right now we have float+bigint=double, but float+int=float. I think
>>>>>>> this is fairly inconsistent: either the approximate type should
>>>>>>> always win, or we should always upgrade to double for mixed
>>>>>>> operands.
>>>>>>>
>>>>>>> The quoted spec also suggests that decimal+float=float and
>>>>>>> decimal+double=double, whereas we currently have
>>>>>>> decimal+float=decimal and decimal+double=decimal.
>>>>>>>
>>>>>>> If we’re going to go with an approximate operand implying an
>>>>>>> approximate result, I think we should do it consistently (and
>>>>>>> consistently with the SQL 92 spec), and have the type of the
>>>>>>> approximate operand always be the return type.
>>>>>>>
>>>>>>> This would still leave a decision for float+double, though. The
>>>>>>> behaviour most consistent with the above would be to always return
>>>>>>> the most approximate type (i.e. float), but that seems to me fairly
>>>>>>> unexpected for the user.
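To make the inconsistency concrete (a sketch against a hypothetical table; the result types listed are the current behaviours as described above, and the SQL 92 expectations as quoted below):

    -- hypothetical table: i int, b bigint, f float, d double, dec decimal
    SELECT f + i   FROM t;  -- currently float:   the approximate operand wins
    SELECT f + b   FROM t;  -- currently double:  widened past the approximate operand
    SELECT dec + f FROM t;  -- currently decimal: the exact operand wins,
                            -- where SQL 92 suggests float
    SELECT dec + d FROM t;  -- currently decimal, where SQL 92 suggests double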
>>>>>>>> On 12 Oct 2018, at 17:23, Ariel Weisberg <ar...@weisberg.ws> wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I agree with what's been said about expectations regarding
>>>>>>>> expressions involving floating point numbers. I think that if one
>>>>>>>> of the inputs is approximate, then the result should be
>>>>>>>> approximate.
>>>>>>>>
>>>>>>>> One thing we could look at for inspiration is the SQL spec, though
>>>>>>>> not necessarily to follow it dogmatically.
>>>>>>>>
>>>>>>>> From the SQL 92 spec regarding assignment
>>>>>>>> (http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt,
>>>>>>>> section 4.6):
>>>>>>>> "
>>>>>>>>      Values of the data types NUMERIC, DECIMAL, INTEGER, SMALLINT,
>>>>>>>>      FLOAT, REAL, and DOUBLE PRECISION are numbers and are all
>>>>>>>>      mutually comparable and mutually assignable. If an assignment
>>>>>>>>      would result in a loss of the most significant digits, an
>>>>>>>>      exception condition is raised. If least significant digits
>>>>>>>>      are lost, implementation-defined rounding or truncating
>>>>>>>>      occurs with no exception condition being raised. The rules
>>>>>>>>      for arithmetic are generally governed by Subclause 6.12,
>>>>>>>>      "<numeric value expression>".
>>>>>>>> "
>>>>>>>>
>>>>>>>> Section 6.12, numeric value expressions:
>>>>>>>> "
>>>>>>>>      1) If the data type of both operands of a dyadic arithmetic
>>>>>>>>         operator is exact numeric, then the data type of the
>>>>>>>>         result is exact numeric, with precision and scale
>>>>>>>>         determined as follows:
>>>>>>>>      ...
>>>>>>>>      2) If the data type of either operand of a dyadic arithmetic
>>>>>>>>         operator is approximate numeric, then the data type of the
>>>>>>>>         result is approximate numeric. The precision of the result
>>>>>>>>         is implementation-defined.
>>>>>>>> "
>>>>>>>>
>>>>>>>> And this makes sense to me. I think we should only return an exact
>>>>>>>> result if both of the inputs are exact.
>>>>>>>>
>>>>>>>> I think we might want to look closely at the SQL spec, and
>>>>>>>> especially at when the spec requires an error to be generated.
>>>>>>>> Those are sometimes in the spec to prevent subtle paths to wrong
>>>>>>>> answers. Any time we deviate from the spec, we should be asking
>>>>>>>> why it is in the spec and why we are deviating.
>>>>>>>>
>>>>>>>> Another issue, besides overflow handling, is how we determine
>>>>>>>> precision and scale for expressions involving two exact types.
>>>>>>>>
>>>>>>>> Ariel
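Applied to concrete operand pairs, the two rules from 6.12 work out roughly as follows (an illustrative sketch, in the FROM-less SELECT style of the Postgres examples below; the standard CAST syntax is used to force an approximate operand):

    -- Rule 1: both operands exact numeric, so the result is exact:
    SELECT 2 + 3;                              -- integer + integer -> exact
    SELECT 2 + 3.5;                            -- integer + decimal -> exact
    -- Rule 2: either operand approximate, so the result is approximate,
    -- with implementation-defined precision:
    SELECT 2 + CAST(3.5 AS double precision);  -- -> approximate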
>>>>>>>>> On Fri, Oct 12, 2018, at 11:51 AM, Michael Burman wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I'm not sure I would prefer the Postgres way of doing things,
>>>>>>>>> which is returning just about any type depending on the order of
>>>>>>>>> operands, especially considering its docs mention that
>>>>>>>>> numeric/decimal is slow, and note multiple times that floating
>>>>>>>>> point types are inexact. So, doing some math with Postgres
>>>>>>>>> (9.6.5):
>>>>>>>>>
>>>>>>>>> SELECT 2147483647::bigint*1.0::double precision returns double
>>>>>>>>> precision 2147483647
>>>>>>>>> SELECT 2147483647::bigint*1.0 returns numeric 2147483647.0
>>>>>>>>> SELECT 2147483647::bigint*1.0::real returns double
>>>>>>>>> SELECT 2147483647::double precision*1::bigint returns double
>>>>>>>>> 2147483647
>>>>>>>>> SELECT 2147483647::double precision*1.0::bigint returns double
>>>>>>>>> 2147483647
>>>>>>>>>
>>>>>>>>> With + and - we can get the same mixture of returned types. There
>>>>>>>>> is no difference in those calculations, just some casting. To me,
>>>>>>>>> floating-point math implies inexactness and error, and whoever
>>>>>>>>> mixes two different types should understand that. If one didn't
>>>>>>>>> want an exact numeric type, why would the server return one? The
>>>>>>>>> floating point value itself could already be wrong before the
>>>>>>>>> calculation; claiming we do it losslessly is just wrong.
>>>>>>>>>
>>>>>>>>> Fun with 2.65:
>>>>>>>>>
>>>>>>>>> SELECT 2.65::real * 1::int returns double 2.65000009536743
>>>>>>>>> SELECT 2.65::double precision * 1::int returns double 2.65
>>>>>>>>>
>>>>>>>>> SELECT round(2.65) returns numeric 3
>>>>>>>>> SELECT round(2.65::double precision) returns double 3
>>>>>>>>>
>>>>>>>>> SELECT 2.65 * 1 returns numeric 2.65
>>>>>>>>> SELECT 2.65 * 1::bigint returns numeric 2.65
>>>>>>>>> SELECT 2.65 * 1.0 returns numeric 2.650
>>>>>>>>> SELECT 2.65 * 1.0::double precision returns double 2.65
>>>>>>>>>
>>>>>>>>> SELECT round(2.65) * 1 returns numeric 3
>>>>>>>>> SELECT round(2.65) * round(1) returns double 3
>>>>>>>>>
>>>>>>>>> So, as we're going to have silly values in any case, why pretend
>>>>>>>>> otherwise? Also, exact calculations are slow if we crunch a large
>>>>>>>>> amount of numbers. I guess I drifted towards Postgres'
>>>>>>>>> implementation here, but I wish it weren't used as the benchmark
>>>>>>>>> in this case. And most importantly, I would definitely want the
>>>>>>>>> exact same type returned each time I do a calculation.
>>>>>>>>>
>>>>>>>>> - Micke
>>>>>>>>>
>>>>>>>>> On Fri, Oct 12, 2018 at 4:29 PM Benedict Elliott Smith
>>>>>>>>> <bened...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> As far as I can tell we reached a relatively strong consensus
>>>>>>>>>> that we should implement lossless casts by default? Does anyone
>>>>>>>>>> have anything more to add?
>>>>>>>>>>
>>>>>>>>>> Looking at the emails, everyone who participated and expressed a
>>>>>>>>>> preference was in favour of the “Postgres approach” of upcasting
>>>>>>>>>> to decimal for mixed float/int operands?
>>>>>>>>>>
>>>>>>>>>> I’d like to get a clear-cut decision on this, so we know what
>>>>>>>>>> we’re doing for 4.0. Then hopefully we can move on to a
>>>>>>>>>> collective decision on Ariel’s concerns about overflow, which I
>>>>>>>>>> think are also pressing, particularly for tinyint and smallint.
>>>>>>>>>> This does also impact implicit casts for mixed integer type
>>>>>>>>>> operations, but an approach for these will probably fall out of
>>>>>>>>>> any decision on overflow.
>>>>>>>>>>
>>>>>>>>>>> On 3 Oct 2018, at 11:38, Murukesh Mohanan
>>>>>>>>>>> <murukesh.moha...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> I think you're conflating two things here. There's the loss
>>>>>>>>>>> resulting from using some operators, and the loss involved in
>>>>>>>>>>> casting. Dividing an integer by another integer to obtain an
>>>>>>>>>>> integer result can result in loss, but there's no implicit
>>>>>>>>>>> casting there and no loss due to casting. Casting an integer to
>>>>>>>>>>> a float can also result in loss. So dividing an integer by a
>>>>>>>>>>> float, for example, with an implicit cast, has an additional
>>>>>>>>>>> avenue for loss: the implicit cast of the operands so that
>>>>>>>>>>> they're of the same type. I believe this discussion so far has
>>>>>>>>>>> been about the latter, not the loss from the operations
>>>>>>>>>>> themselves.
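The two kinds of loss being distinguished here can be shown separately (a Postgres-style sketch in the spirit of the examples above; the printed results are what integer division and IEEE 754 single precision imply, and are worth double-checking):

    -- Loss from the operator itself, with no casting involved:
    SELECT 7 / 2;                     -- integer division returns 3; the fraction
                                      -- is simply discarded

    -- Loss from a cast alone, before any arithmetic happens:
    SELECT 2147483647::real::bigint;  -- returns 2147483648, because 2^31 - 1 is
                                      -- not representable as a 32-bit float, so
                                      -- the cast rounds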
>>>>>>>>>>> On Wed, 3 Oct 2018 at 18:35 Benjamin Lerer
>>>>>>>>>>> <benjamin.le...@datastax.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> I would like to try to clarify things a bit, to help people
>>>>>>>>>>>> understand the true complexity of the problem.
>>>>>>>>>>>>
>>>>>>>>>>>> The *float* and *double* types are inexact numeric types, and
>>>>>>>>>>>> not only at the operation level.
>>>>>>>>>>>>
>>>>>>>>>>>> If you insert 676543.21 into a *float* column and then read it
>>>>>>>>>>>> back, you will find that the value has been truncated to
>>>>>>>>>>>> 676543.2.
>>>>>>>>>>>>
>>>>>>>>>>>> If you want accuracy, the only way is to avoid those inexact
>>>>>>>>>>>> types. Using *decimals* during operations will mitigate the
>>>>>>>>>>>> problem but will not remove it.
>>>>>>>>>>>>
>>>>>>>>>>>> I do not recall PostgreSQL behaving as described. If I am not
>>>>>>>>>>>> mistaken, in PostgreSQL *SELECT 3/2* will return *1*, which is
>>>>>>>>>>>> similar to what MS SQL Server and Oracle do. So all those
>>>>>>>>>>>> databases will lose precision if you are not careful.
>>>>>>>>>>>>
>>>>>>>>>>>> If you truly need precision, you can have it by using exact
>>>>>>>>>>>> numeric types for your data types. Of course, this has a cost
>>>>>>>>>>>> in performance, memory and disk usage.
>>>>>>>>>>>>
>>>>>>>>>>>> The advantage of the current approach is that it gives you the
>>>>>>>>>>>> choice. It is up to you to decide what you need for your
>>>>>>>>>>>> application. It is also in line with the way CQL behaves
>>>>>>>>>>>> everywhere else.
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>>
>>>>>>>>>>> Muru
>>>>
>>>> --
>>>> Jon Haddad
>>>> http://www.rustyrazorblade.com
>>>> twitter: rustyrazorblade
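Benjamin's *float* truncation point can be reproduced directly (a CQL-style sketch; the table and column names are illustrative, and the read-back values follow from 32-bit IEEE 754 rounding as he describes):

    CREATE TABLE prices (k int PRIMARY KEY, f float, d decimal);
    INSERT INTO prices (k, f, d) VALUES (1, 676543.21, 676543.21);

    SELECT f, d FROM prices WHERE k = 1;
    -- f: 676543.2   (the nearest 32-bit float; the .21 was lost at insert time)
    -- d: 676543.21  (decimal stores the value exactly)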