Re: Implicit Casts for Arithmetic Operators

Ariel Weisberg Tue, 20 Nov 2018 07:13:04 -0800

Hi,

+1


This is a public API so we will be much better off if we get it right the first 
time.

Ariel

> On Nov 16, 2018, at 10:36 AM, Jonathan Haddad <j...@jonhaddad.com> wrote:
> 
> Sounds good to me.
> 
> On Fri, Nov 16, 2018 at 5:09 AM Benedict Elliott Smith <bened...@apache.org>
> wrote:
> 
>> So, this thread somewhat petered out.
>> 
>> There are still a number of unresolved issues, but to make progress I
>> wonder if it would first be helpful to have a vote on ensuring we are ANSI
>> SQL 92 compliant for our arithmetic?  This seems like a sensible baseline,
>> since we will hopefully minimise surprise to operators this way.
>> 
>> If people largely agree, I will call a vote, and we can pick up a couple
>> of more focused discussions afterwards on how we interpret the leeway it
>> gives.
>> 
>> 
>>> On 12 Oct 2018, at 18:10, Ariel Weisberg <ar...@weisberg.ws> wrote:
>>> 
>>> Hi,
>>> 
>>> From reading the spec. Precision is always implementation defined. The
>> spec specifies scale in several cases, but never precision for any type or
>> operation (addition/subtraction, multiplication, division).
>>> 
>>> So we don't implement anything remotely approaching precision and scale
>> in CQL when it comes to numbers I think? So we aren't going to follow the
>> spec for scale. We are already pretty far down that road so I would leave
>> it alone.
>>> 
>>> I don't think the spec is asking for the most approximate type. It's
>> just saying the result is approximate, and the precision is implementation
>> defined. We could return either float or double. I think if one of the
>> operands is a double we should return a double because clearly the schema
>> thought a double was required to represent that number. I would also be in
>> favor of returning a double all the time so that people can expect a
>> consistent type from expressions involving approximate numbers.
>>> 
>>> I am a big fan of widening for arithmetic expressions in a database to
>> avoid having to error on overflow. You can go to the trouble of only
>> widening the minimum amount, but I think it's simpler if we always widen to
>> bigint and double. This would be something the spec allows.
>>> 
>>> Definitely if we can make overflow not occur we should and the spec
>> allows that. We should also not return different types for the same operand
>> types just to work around overflow if we detect we need more precision.
>>> 
>>> Ariel
>>>> On Fri, Oct 12, 2018, at 12:45 PM, Benedict Elliott Smith wrote:
>>>> If it’s in the SQL spec, I’m fairly convinced.  Thanks for digging this
>>>> out (and Mike for getting some empirical examples).
>>>> 
>>>> We still have to decide on the approximate data type to return; right
>>>> now, we have float+bigint=double, but float+int=float.  I think this is
>>>> fairly inconsistent, and either the approximate type should always win,
>>>> or we should always upgrade to double for mixed operands.
>>>> 
>>>> The quoted spec also suggests that decimal+float=float, and decimal
>>>> +double=double, whereas we currently have decimal+float=decimal, and
>>>> decimal+double=decimal
>>>> 
>>>> If we’re going to go with an approximate operand implying an
>> approximate
>>>> result, I think we should do it consistently (and consistent with the
>>>> SQL92 spec), and have the type of the approximate operand always be the
>>>> return type.
>>>> 
>>>> This would still leave a decision for float+double, though.  The most
>>>> consistent behaviour with that stated above would be to always take the
>>>> most approximate type to return (i.e. float), but this would seem to me
>>>> to be fairly unexpected for the user.
>>>> 
>>>> 
>>>>> On 12 Oct 2018, at 17:23, Ariel Weisberg <ar...@weisberg.ws> wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> I agree with what's been said about expectations regarding expressions
>> involving floating point numbers. I think that if one of the inputs is
>> approximate then the result should be approximate.
>>>>> 
>>>>> One thing we could look at for inspiration is the SQL spec. Not to
>> follow dogmatically necessarily.
>>>>> 
>>>>> From the SQL 92 spec regarding assignment
>> http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt section 4.6:
>>>>> "
>>>>>       Values of the data types NUMERIC, DECIMAL, INTEGER, SMALLINT,
>>>>>       FLOAT, REAL, and DOUBLE PRECISION are numbers and are all
>> mutually
>>>>>       comparable and mutually assignable. If an assignment would
>> result
>>>>>       in a loss of the most significant digits, an exception condition
>>>>>       is raised. If least significant digits are lost, implementation-
>>>>>       defined rounding or truncating occurs with no exception
>> condition
>>>>>       being raised. The rules for arithmetic are generally governed by
>>>>>       Subclause 6.12, "<numeric value expression>".
>>>>> "
>>>>> 
>>>>> Section 6.12 numeric value expressions:
>>>>> "
>>>>>       1) If the data type of both operands of a dyadic arithmetic
>> opera-
>>>>>          tor is exact numeric, then the data type of the result is
>> exact
>>>>>          numeric, with precision and scale determined as follows:
>>>>> ...
>>>>>       2) If the data type of either operand of a dyadic arithmetic op-
>>>>>          erator is approximate numeric, then the data type of the re-
>>>>>          sult is approximate numeric. The precision of the result is
>>>>>          implementation-defined.
>>>>> "
>>>>> 
>>>>> And this makes sense to me. I think we should only return an exact
>> result if both of the inputs are exact.
>>>>> 
>>>>> I think we might want to look closely at the SQL spec and especially
>> when the spec requires an error to be generated. Those are sometimes in the
>> spec to prevent subtle paths to wrong answers. Any time we deviate from the
>> spec we should be asking why is it in the spec and why are we deviating.
>>>>> 
>>>>> Another issue besides overflow handling is how we determine precision
>> and scale for expressions involving two exact types.
>>>>> 
>>>>> Ariel
>>>>> 
>>>>>> On Fri, Oct 12, 2018, at 11:51 AM, Michael Burman wrote:
>>>>>> Hi,
>>>>>> 
>>>>>> I'm not sure if I would prefer the Postgres way of doing things,
>> which is
>>>>>> returning just about any type depending on the order of operators.
>>>>>> Considering it actually mentions in the docs that using
>> numeric/decimal is
>>>>>> slow and also multiple times that floating points are inexact. So
>> doing
>>>>>> some math with Postgres (9.6.5):
>>>>>> 
>>>>>> SELECT 2147483647::bigint*1.0::double precision returns double
>>>>>> precision 2147483647
>>>>>> SELECT 2147483647::bigint*1.0 returns numeric 2147483647.0
>>>>>> SELECT 2147483647::bigint*1.0::real returns double
>>>>>> SELECT 2147483647::double precision*1::bigint returns double
>> 2147483647
>>>>>> SELECT 2147483647::double precision*1.0::bigint returns double
>> 2147483647
>>>>>> 
>>>>>> With + - we can get the same amount of mixture of returned types.
>> There's
>>>>>> no difference in those calculations, just some casting. To me
>>>>>> floating-point math indicates inexactness and has errors and whoever
>> mixes
>>>>>> up two different types should understand that. If one didn't want
>> exact
>>>>>> numeric type, why would the server return such? The floating point
>> value
>>>>>> itself could be wrong already before the calculation - trying to say
>> we do
>>>>>> it lossless is just wrong.
>>>>>> 
>>>>>> Fun with 2.65:
>>>>>> 
>>>>>> SELECT 2.65::real * 1::int returns double 2.65000009536743
>>>>>> SELECT 2.65::double precision * 1::int returns double 2.65
>>>>>> 
>>>>>> SELECT round(2.65) returns numeric 4
>>>>>> SELECT round(2.65::double precision) returns double 4
>>>>>> 
>>>>>> SELECT 2.65 * 1 returns double 2.65
>>>>>> SELECT 2.65 * 1::bigint returns numeric 2.65
>>>>>> SELECT 2.65 * 1.0 returns numeric 2.650
>>>>>> SELECT 2.65 * 1.0::double precision returns double 2.65
>>>>>> 
>>>>>> SELECT round(2.65) * 1 returns numeric 3
>>>>>> SELECT round(2.65) * round(1) returns double 3
>>>>>> 
>>>>>> So as we're going to have silly values in any case, why pretend
>> something
>>>>>> else? Also, exact calculations are slow if we crunch large amount of
>>>>>> numbers. I guess I slightly deviated towards Postgres' implemention
>> in this
>>>>>> case, but I wish it wasn't used as a benchmark in this case. And most
>>>>>> importantly, I would definitely want the exact same type returned
>> each time
>>>>>> I do a calculation.
>>>>>> 
>>>>>> - Micke
>>>>>> 
>>>>>> On Fri, Oct 12, 2018 at 4:29 PM Benedict Elliott Smith <
>> bened...@apache.org>
>>>>>> wrote:
>>>>>> 
>>>>>>> As far as I can tell we reached a relatively strong consensus that we
>>>>>>> should implement lossless casts by default?  Does anyone have
>> anything more
>>>>>>> to add?
>>>>>>> 
>>>>>>> Looking at the emails, everyone who participated and expressed a
>>>>>>> preference was in favour of the “Postgres approach” of upcasting to
>> decimal
>>>>>>> for mixed float/int operands?
>>>>>>> 
>>>>>>> I’d like to get a clear-cut decision on this, so we know what we’re
>> doing
>>>>>>> for 4.0.  Then hopefully we can move on to a collective decision on
>> Ariel’s
>>>>>>> concerns about overflow, which I think are also pressing -
>> particularly for
>>>>>>> tinyint and smallint.  This does also impact implicit casts for mixed
>>>>>>> integer type operations, but an approach for these will probably
>> fall out
>>>>>>> of any decision on overflow.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> On 3 Oct 2018, at 11:38, Murukesh Mohanan <
>> murukesh.moha...@gmail.com>
>>>>>>> wrote:
>>>>>>>> 
>>>>>>>> I think you're conflating two things here. There's the loss
>> resulting
>>>>>>> from
>>>>>>>> using some operators, and loss involved in casting. Dividing an
>> integer
>>>>>>> by
>>>>>>>> another integer to obtain an integer result can result in loss, but
>>>>>>> there's
>>>>>>>> no implicit casting there and no loss due to casting.  Casting an
>> integer
>>>>>>>> to a float can also result in loss. So dividing an integer by a
>> float,
>>>>>>> for
>>>>>>>> example, with an implicit cast has an additional avenue for loss:
>> the
>>>>>>>> implicit cast for the operands so that they're of the same type. I
>>>>>>> believe
>>>>>>>> this discussion so far has been about the latter, not the loss from
>> the
>>>>>>>> operations themselves.
>>>>>>>> 
>>>>>>>> On Wed, 3 Oct 2018 at 18:35 Benjamin Lerer <
>> benjamin.le...@datastax.com>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Hi,
>>>>>>>>> 
>>>>>>>>> I would like to try to clarify things a bit to help people to
>> understand
>>>>>>>>> the true complexity of the problem.
>>>>>>>>> 
>>>>>>>>> The *float *and *double *types are inexact numeric types. Not only
>> at
>>>>>>> the
>>>>>>>>> operation level.
>>>>>>>>> 
>>>>>>>>> If you insert 676543.21 in a *float* column and then read it, you
>> will
>>>>>>>>> realize that the value has been truncated to 676543.2.
>>>>>>>>> 
>>>>>>>>> If you want accuracy the only way is to avoid those inexact types.
>>>>>>>>> Using *decimals
>>>>>>>>> *during operations will mitigate the problem but will not remove
>> it.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> I do not recall PostgreSQL behaving has described. If I am not
>> mistaken
>>>>>>> in
>>>>>>>>> PostgreSQL *SELECT 3/2* will return *1*. Which is similar to what
>> MS SQL
>>>>>>>>> server and Oracle do. So all thoses databases will lose precision
>> if you
>>>>>>>>> are not carefull.
>>>>>>>>> 
>>>>>>>>> If you truly need precision you can have it by using exact numeric
>> types
>>>>>>>>> for your data types. Of course it has a cost on performance,
>> memory and
>>>>>>>>> disk usage.
>>>>>>>>> 
>>>>>>>>> The advantage of the current approach is that it give you the
>> choice.
>>>>>>> It is
>>>>>>>>> up to you to decide what you need for your application. It is also
>> in
>>>>>>> line
>>>>>>>>> with the way CQL behave everywhere else.
>>>>>>>>> 
>>>>>>>> --
>>>>>>>> 
>>>>>>>> Muru
>>>>>>> 
>>>>>>> 
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>>>>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>>>>>>> 
>>>>>>> 
>>>>> 
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>>>>> 
>>>> 
>>>> 
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>>>> 
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>>> 
>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>> 
>> --
> Jon Haddad
> http://www.rustyrazorblade.com
> twitter: rustyrazorblade


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org

Re: Implicit Casts for Arithmetic Operators

Reply via email to