Sounds good to me.

On Fri, Nov 16, 2018 at 5:09 AM Benedict Elliott Smith <bened...@apache.org> wrote:
> So, this thread somewhat petered out.
>
> There are still a number of unresolved issues, but to make progress I wonder if it would first be helpful to have a vote on ensuring we are ANSI SQL 92 compliant for our arithmetic? This seems like a sensible baseline, since we will hopefully minimise surprise to operators this way.
>
> If people largely agree, I will call a vote, and we can pick up a couple of more focused discussions afterwards on how we interpret the leeway it gives.
>
> On 12 Oct 2018, at 18:10, Ariel Weisberg <ar...@weisberg.ws> wrote:
> >
> > Hi,
> >
> > From reading the spec, precision is always implementation defined. The spec specifies scale in several cases, but never precision for any type or operation (addition/subtraction, multiplication, division).
> >
> > So we don't implement anything remotely approaching precision and scale in CQL when it comes to numbers, I think? So we aren't going to follow the spec for scale. We are already pretty far down that road, so I would leave it alone.
> >
> > I don't think the spec is asking for the most approximate type. It's just saying the result is approximate, and the precision is implementation defined. We could return either float or double. I think if one of the operands is a double we should return a double, because clearly the schema thought a double was required to represent that number. I would also be in favor of returning a double all the time so that people can expect a consistent type from expressions involving approximate numbers.
> >
> > I am a big fan of widening for arithmetic expressions in a database to avoid having to error on overflow. You can go to the trouble of only widening the minimum amount, but I think it's simpler if we always widen to bigint and double. This would be something the spec allows.
> >
> > Definitely, if we can make overflow not occur we should, and the spec allows that. We should also not return different types for the same operand types just to work around overflow if we detect we need more precision.
> >
> > Ariel
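For illustration, a minimal plain-Java sketch (not Cassandra code) of the overflow behaviour that the widening argument above is meant to avoid: same-width int arithmetic wraps around silently, while widening the operands to long preserves the exact result.

    public class WideningDemo
    {
        public static void main(String[] args)
        {
            int a = Integer.MAX_VALUE; // 2147483647, the largest 32-bit int
            int b = 1;

            // Same-width arithmetic silently wraps around on overflow.
            int wrapped = a + b;                // -2147483648

            // Widening both operands to 64 bits keeps the exact result.
            long widened = (long) a + (long) b; // 2147483648

            System.out.println("int  a + b = " + wrapped);
            System.out.println("long a + b = " + widened);
        }
    }

Always widening int-typed operands to bigint (and float to double) in this way trades a wider result type for never having to raise an overflow error, which is the trade-off described above.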
> > On Fri, Oct 12, 2018, at 12:45 PM, Benedict Elliott Smith wrote:
> >>
> >> If it's in the SQL spec, I'm fairly convinced. Thanks for digging this out (and Mike for getting some empirical examples).
> >>
> >> We still have to decide on the approximate data type to return; right now, we have float+bigint=double, but float+int=float. I think this is fairly inconsistent, and either the approximate type should always win, or we should always upgrade to double for mixed operands.
> >>
> >> The quoted spec also suggests that decimal+float=float, and decimal+double=double, whereas we currently have decimal+float=decimal, and decimal+double=decimal.
> >>
> >> If we're going to go with an approximate operand implying an approximate result, I think we should do it consistently (and consistent with the SQL92 spec), and have the type of the approximate operand always be the return type.
> >>
> >> This would still leave a decision for float+double, though. The most consistent behaviour with that stated above would be to always take the most approximate type to return (i.e. float), but this would seem to me to be fairly unexpected for the user.
> >>
> >> On 12 Oct 2018, at 17:23, Ariel Weisberg <ar...@weisberg.ws> wrote:
> >>>
> >>> Hi,
> >>>
> >>> I agree with what's been said about expectations regarding expressions involving floating point numbers. I think that if one of the inputs is approximate then the result should be approximate.
> >>>
> >>> One thing we could look at for inspiration is the SQL spec. Not to follow dogmatically necessarily.
> >>>
> >>> From the SQL 92 spec regarding assignment http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt section 4.6:
> >>> "
> >>> Values of the data types NUMERIC, DECIMAL, INTEGER, SMALLINT, FLOAT, REAL, and DOUBLE PRECISION are numbers and are all mutually comparable and mutually assignable. If an assignment would result in a loss of the most significant digits, an exception condition is raised. If least significant digits are lost, implementation-defined rounding or truncating occurs with no exception condition being raised. The rules for arithmetic are generally governed by Subclause 6.12, "<numeric value expression>".
> >>> "
> >>>
> >>> Section 6.12 numeric value expressions:
> >>> "
> >>> 1) If the data type of both operands of a dyadic arithmetic operator is exact numeric, then the data type of the result is exact numeric, with precision and scale determined as follows:
> >>> ...
> >>> 2) If the data type of either operand of a dyadic arithmetic operator is approximate numeric, then the data type of the result is approximate numeric. The precision of the result is implementation-defined.
> >>> "
> >>>
> >>> And this makes sense to me. I think we should only return an exact result if both of the inputs are exact.
> >>>
> >>> I think we might want to look closely at the SQL spec and especially when the spec requires an error to be generated. Those are sometimes in the spec to prevent subtle paths to wrong answers. Any time we deviate from the spec we should be asking why is it in the spec and why are we deviating.
> >>>
> >>> Another issue besides overflow handling is how we determine precision and scale for expressions involving two exact types.
> >>>
> >>> Ariel
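To make the Subclause 6.12 rule above concrete for CQL's numeric types, here is a hypothetical Java sketch of a result-kind resolver that follows it: exact op exact gives exact, anything involving an approximate operand gives approximate. The enum and method names are purely illustrative and are not Cassandra's actual types or API.

    enum NumericKind { EXACT, APPROXIMATE }

    enum CqlNumericType
    {
        TINYINT(NumericKind.EXACT), SMALLINT(NumericKind.EXACT),
        INT(NumericKind.EXACT), BIGINT(NumericKind.EXACT),
        VARINT(NumericKind.EXACT), DECIMAL(NumericKind.EXACT),
        FLOAT(NumericKind.APPROXIMATE), DOUBLE(NumericKind.APPROXIMATE);

        final NumericKind kind;
        CqlNumericType(NumericKind kind) { this.kind = kind; }
    }

    final class ResultTypeResolver
    {
        // SQL 92 Subclause 6.12: if either operand of a dyadic arithmetic
        // operator is approximate numeric, the result is approximate numeric;
        // otherwise the result is exact numeric.
        static NumericKind resultKind(CqlNumericType left, CqlNumericType right)
        {
            if (left.kind == NumericKind.APPROXIMATE || right.kind == NumericKind.APPROXIMATE)
                return NumericKind.APPROXIMATE;
            return NumericKind.EXACT;
        }

        public static void main(String[] args)
        {
            System.out.println(resultKind(CqlNumericType.INT, CqlNumericType.BIGINT));     // EXACT
            System.out.println(resultKind(CqlNumericType.INT, CqlNumericType.FLOAT));      // APPROXIMATE
            System.out.println(resultKind(CqlNumericType.DECIMAL, CqlNumericType.DOUBLE)); // APPROXIMATE
        }
    }

Note that the rule only fixes whether the result is exact or approximate; which approximate type to return (float or double) and the precision of an exact result are left implementation-defined, which is exactly the leeway being discussed in this thread.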
> >>> On Fri, Oct 12, 2018, at 11:51 AM, Michael Burman wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> I'm not sure if I would prefer the Postgres way of doing things, which is returning just about any type depending on the order of operators. Considering it actually mentions in the docs that using numeric/decimal is slow, and also multiple times that floating points are inexact. So doing some math with Postgres (9.6.5):
> >>>>
> >>>> SELECT 2147483647::bigint*1.0::double precision returns double precision 2147483647
> >>>> SELECT 2147483647::bigint*1.0 returns numeric 2147483647.0
> >>>> SELECT 2147483647::bigint*1.0::real returns double
> >>>> SELECT 2147483647::double precision*1::bigint returns double 2147483647
> >>>> SELECT 2147483647::double precision*1.0::bigint returns double 2147483647
> >>>>
> >>>> With + - we can get the same amount of mixture of returned types. There's no difference in those calculations, just some casting. To me floating-point math indicates inexactness and has errors, and whoever mixes up two different types should understand that. If one didn't want an exact numeric type, why would the server return such? The floating point value itself could be wrong already before the calculation - trying to say we do it lossless is just wrong.
> >>>>
> >>>> Fun with 2.65:
> >>>>
> >>>> SELECT 2.65::real * 1::int returns double 2.65000009536743
> >>>> SELECT 2.65::double precision * 1::int returns double 2.65
> >>>>
> >>>> SELECT round(2.65) returns numeric 4
> >>>> SELECT round(2.65::double precision) returns double 4
> >>>>
> >>>> SELECT 2.65 * 1 returns double 2.65
> >>>> SELECT 2.65 * 1::bigint returns numeric 2.65
> >>>> SELECT 2.65 * 1.0 returns numeric 2.650
> >>>> SELECT 2.65 * 1.0::double precision returns double 2.65
> >>>>
> >>>> SELECT round(2.65) * 1 returns numeric 3
> >>>> SELECT round(2.65) * round(1) returns double 3
> >>>>
> >>>> So as we're going to have silly values in any case, why pretend something else? Also, exact calculations are slow if we crunch large amounts of numbers. I guess I slightly deviated towards Postgres' implementation in this case, but I wish it wasn't used as a benchmark here. And most importantly, I would definitely want the exact same type returned each time I do a calculation.
> >>>>
> >>>> - Micke
> >>>>
> >>>> On Fri, Oct 12, 2018 at 4:29 PM Benedict Elliott Smith <bened...@apache.org> wrote:
> >>>>>
> >>>>> As far as I can tell we reached a relatively strong consensus that we should implement lossless casts by default? Does anyone have anything more to add?
> >>>>>
> >>>>> Looking at the emails, everyone who participated and expressed a preference was in favour of the "Postgres approach" of upcasting to decimal for mixed float/int operands?
> >>>>>
> >>>>> I'd like to get a clear-cut decision on this, so we know what we're doing for 4.0. Then hopefully we can move on to a collective decision on Ariel's concerns about overflow, which I think are also pressing - particularly for tinyint and smallint. This does also impact implicit casts for mixed integer type operations, but an approach for these will probably fall out of any decision on overflow.
> >>>>>
> >>>>> On 3 Oct 2018, at 11:38, Murukesh Mohanan <murukesh.moha...@gmail.com> wrote:
> >>>>>>
> >>>>>> I think you're conflating two things here. There's the loss resulting from using some operators, and loss involved in casting. Dividing an integer by another integer to obtain an integer result can result in loss, but there's no implicit casting there and no loss due to casting. Casting an integer to a float can also result in loss. So dividing an integer by a float, for example, with an implicit cast has an additional avenue for loss: the implicit cast for the operands so that they're of the same type. I believe this discussion so far has been about the latter, not the loss from the operations themselves.
> >>>>>>
> >>>>>> On Wed, 3 Oct 2018 at 18:35 Benjamin Lerer <benjamin.le...@datastax.com> wrote:
> >>>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> I would like to try to clarify things a bit to help people to understand the true complexity of the problem.
> >>>>>>>
> >>>>>>> The *float* and *double* types are inexact numeric types, not only at the operation level.
> >>>>>>>
> >>>>>>> If you insert 676543.21 in a *float* column and then read it, you will realize that the value has been truncated to 676543.2.
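A quick plain-Java check of the truncation described above (CQL's float and double are the same 32-bit and 64-bit IEEE 754 types): 676543.21 has more significant digits than a 32-bit float can hold, so the stored value is the nearest representable one and the last digit is lost on read-back.

    public class FloatPrecisionDemo
    {
        public static void main(String[] args)
        {
            float asFloat = 676543.21f;   // ~7 significant decimal digits available
            double asDouble = 676543.21;  // ~15-16 significant decimal digits available

            System.out.println(asFloat);          // 676543.2    -- trailing digit lost
            System.out.println(asDouble);         // 676543.21
            System.out.println((double) asFloat); // 676543.1875 -- the value the float actually stores
        }
    }

The double only survives here because 676543.21 happens to fit within its precision; with enough digits a double rounds in exactly the same way, which is why both float and double are described above as inexact types.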
> >>>>>>> If you want accuracy, the only way is to avoid those inexact types. Using *decimals* during operations will mitigate the problem but will not remove it.
> >>>>>>>
> >>>>>>> I do not recall PostgreSQL behaving as described. If I am not mistaken, in PostgreSQL *SELECT 3/2* will return *1*, which is similar to what MS SQL Server and Oracle do. So all those databases will lose precision if you are not careful.
> >>>>>>>
> >>>>>>> If you truly need precision you can have it by using exact numeric types for your data types. Of course it has a cost in performance, memory and disk usage.
> >>>>>>>
> >>>>>>> The advantage of the current approach is that it gives you the choice. It is up to you to decide what you need for your application. It is also in line with the way CQL behaves everywhere else.
> >>>>>>
> >>>>>> --
> >>>>>> Muru

--
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade