Sounds good to me.

On Fri, Nov 16, 2018 at 5:09 AM Benedict Elliott Smith <bened...@apache.org> wrote:
> So, this thread somewhat petered out.
>
> There are still a number of unresolved issues, but to make progress I wonder if it would first be helpful to have a vote on ensuring we are ANSI SQL 92 compliant for our arithmetic? This seems like a sensible baseline, since we will hopefully minimise surprise to operators this way.
>
> If people largely agree, I will call a vote, and we can pick up a couple of more focused discussions afterwards on how we interpret the leeway it gives.
>
> On 12 Oct 2018, at 18:10, Ariel Weisberg <ar...@weisberg.ws> wrote:
> >
> > Hi,
> >
> > From reading the spec, precision is always implementation defined. The spec specifies scale in several cases, but never precision for any type or operation (addition/subtraction, multiplication, division).
> >
> > So we don't implement anything remotely approaching precision and scale in CQL when it comes to numbers, I think? So we aren't going to follow the spec for scale. We are already pretty far down that road, so I would leave it alone.
> >
> > I don't think the spec is asking for the most approximate type. It's just saying the result is approximate, and the precision is implementation defined. We could return either float or double. I think if one of the operands is a double we should return a double, because clearly the schema thought a double was required to represent that number. I would also be in favor of returning a double all the time so that people can expect a consistent type from expressions involving approximate numbers.
> >
> > I am a big fan of widening for arithmetic expressions in a database to avoid having to error on overflow. You can go to the trouble of only widening the minimum amount, but I think it's simpler if we always widen to bigint and double. This would be something the spec allows.
> >
> > Definitely, if we can make overflow not occur we should, and the spec allows that. We should also not return different types for the same operand types just to work around overflow if we detect we need more precision.
> >
> > Ariel
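For illustration, a minimal plain-Java sketch (not Cassandra code) of the overflow behaviour that the widening argument above is meant to avoid: same-width int arithmetic wraps around silently, while widening the operands to long preserves the exact result.

    public class WideningDemo
    {
        public static void main(String[] args)
        {
            int a = Integer.MAX_VALUE; // 2147483647, the largest 32-bit int
            int b = 1;

            // Same-width arithmetic silently wraps around on overflow.
            int wrapped = a + b;                // -2147483648

            // Widening both operands to 64 bits keeps the exact result.
            long widened = (long) a + (long) b; // 2147483648

            System.out.println("int  a + b = " + wrapped);
            System.out.println("long a + b = " + widened);
        }
    }

Always widening int-typed operands to bigint (and float to double) in this way trades a wider result type for never having to raise an overflow error, which is the trade-off described above.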
> > On Fri, Oct 12, 2018, at 12:45 PM, Benedict Elliott Smith wrote:
> >>
> >> If it's in the SQL spec, I'm fairly convinced. Thanks for digging this out (and Mike for getting some empirical examples).
> >>
> >> We still have to decide on the approximate data type to return; right now, we have float+bigint=double, but float+int=float. I think this is fairly inconsistent, and either the approximate type should always win, or we should always upgrade to double for mixed operands.
> >>
> >> The quoted spec also suggests that decimal+float=float, and decimal+double=double, whereas we currently have decimal+float=decimal, and decimal+double=decimal.
> >>
> >> If we're going to go with an approximate operand implying an approximate result, I think we should do it consistently (and consistent with the SQL92 spec), and have the type of the approximate operand always be the return type.
> >>
> >> This would still leave a decision for float+double, though. The most consistent behaviour with that stated above would be to always take the most approximate type to return (i.e. float), but this would seem to me to be fairly unexpected for the user.
> >>
> >> On 12 Oct 2018, at 17:23, Ariel Weisberg <ar...@weisberg.ws> wrote:
> >>>
> >>> Hi,
> >>>
> >>> I agree with what's been said about expectations regarding expressions involving floating point numbers. I think that if one of the inputs is approximate then the result should be approximate.
> >>>
> >>> One thing we could look at for inspiration is the SQL spec. Not to follow dogmatically necessarily.
> >>>
> >>> From the SQL 92 spec regarding assignment http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt section 4.6:
> >>> "
> >>> Values of the data types NUMERIC, DECIMAL, INTEGER, SMALLINT, FLOAT, REAL, and DOUBLE PRECISION are numbers and are all mutually comparable and mutually assignable. If an assignment would result in a loss of the most significant digits, an exception condition is raised. If least significant digits are lost, implementation-defined rounding or truncating occurs with no exception condition being raised. The rules for arithmetic are generally governed by Subclause 6.12, "<numeric value expression>".
> >>> "
> >>>
> >>> Section 6.12 numeric value expressions:
> >>> "
> >>> 1) If the data type of both operands of a dyadic arithmetic operator is exact numeric, then the data type of the result is exact numeric, with precision and scale determined as follows:
> >>> ...
> >>> 2) If the data type of either operand of a dyadic arithmetic operator is approximate numeric, then the data type of the result is approximate numeric. The precision of the result is implementation-defined.
> >>> "
> >>>
> >>> And this makes sense to me. I think we should only return an exact result if both of the inputs are exact.
> >>>
> >>> I think we might want to look closely at the SQL spec and especially when the spec requires an error to be generated. Those are sometimes in the spec to prevent subtle paths to wrong answers. Any time we deviate from the spec we should be asking why is it in the spec and why are we deviating.
> >>>
> >>> Another issue besides overflow handling is how we determine precision and scale for expressions involving two exact types.
> >>>
> >>> Ariel
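To make the Subclause 6.12 rule above concrete for CQL's numeric types, here is a hypothetical Java sketch of a result-kind resolver that follows it: exact op exact gives exact, anything involving an approximate operand gives approximate. The enum and method names are purely illustrative and are not Cassandra's actual types or API.

    enum NumericKind { EXACT, APPROXIMATE }

    enum CqlNumericType
    {
        TINYINT(NumericKind.EXACT), SMALLINT(NumericKind.EXACT),
        INT(NumericKind.EXACT), BIGINT(NumericKind.EXACT),
        VARINT(NumericKind.EXACT), DECIMAL(NumericKind.EXACT),
        FLOAT(NumericKind.APPROXIMATE), DOUBLE(NumericKind.APPROXIMATE);

        final NumericKind kind;
        CqlNumericType(NumericKind kind) { this.kind = kind; }
    }

    final class ResultTypeResolver
    {
        // SQL 92 Subclause 6.12: if either operand of a dyadic arithmetic
        // operator is approximate numeric, the result is approximate numeric;
        // otherwise the result is exact numeric.
        static NumericKind resultKind(CqlNumericType left, CqlNumericType right)
        {
            if (left.kind == NumericKind.APPROXIMATE || right.kind == NumericKind.APPROXIMATE)
                return NumericKind.APPROXIMATE;
            return NumericKind.EXACT;
        }

        public static void main(String[] args)
        {
            System.out.println(resultKind(CqlNumericType.INT, CqlNumericType.BIGINT));     // EXACT
            System.out.println(resultKind(CqlNumericType.INT, CqlNumericType.FLOAT));      // APPROXIMATE
            System.out.println(resultKind(CqlNumericType.DECIMAL, CqlNumericType.DOUBLE)); // APPROXIMATE
        }
    }

Note that the rule only fixes whether the result is exact or approximate; which approximate type to return (float or double) and the precision of an exact result are left implementation-defined, which is exactly the leeway being discussed in this thread.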
> >>> On Fri, Oct 12, 2018, at 11:51 AM, Michael Burman wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> I'm not sure if I would prefer the Postgres way of doing things, which is returning just about any type depending on the order of operators. Considering it actually mentions in the docs that using numeric/decimal is slow, and also multiple times that floating points are inexact. So doing some math with Postgres (9.6.5):
> >>>>
> >>>> SELECT 2147483647::bigint*1.0::double precision returns double precision 2147483647
> >>>> SELECT 2147483647::bigint*1.0 returns numeric 2147483647.0
> >>>> SELECT 2147483647::bigint*1.0::real returns double
> >>>> SELECT 2147483647::double precision*1::bigint returns double 2147483647
> >>>> SELECT 2147483647::double precision*1.0::bigint returns double 2147483647
> >>>>
> >>>> With + - we can get the same amount of mixture of returned types. There's no difference in those calculations, just some casting. To me floating-point math indicates inexactness and has errors, and whoever mixes up two different types should understand that. If one didn't want an exact numeric type, why would the server return such? The floating point value itself could be wrong already before the calculation - trying to say we do it lossless is just wrong.
> >>>>
> >>>> Fun with 2.65:
> >>>>
> >>>> SELECT 2.65::real * 1::int returns double 2.65000009536743
> >>>> SELECT 2.65::double precision * 1::int returns double 2.65
> >>>>
> >>>> SELECT round(2.65) returns numeric 4
> >>>> SELECT round(2.65::double precision) returns double 4
> >>>>
> >>>> SELECT 2.65 * 1 returns double 2.65
> >>>> SELECT 2.65 * 1::bigint returns numeric 2.65
> >>>> SELECT 2.65 * 1.0 returns numeric 2.650
> >>>> SELECT 2.65 * 1.0::double precision returns double 2.65
> >>>>
> >>>> SELECT round(2.65) * 1 returns numeric 3
> >>>> SELECT round(2.65) * round(1) returns double 3
> >>>>
> >>>> So as we're going to have silly values in any case, why pretend something else? Also, exact calculations are slow if we crunch large amounts of numbers. I guess I slightly deviated towards Postgres' implementation in this case, but I wish it wasn't used as a benchmark here. And most importantly, I would definitely want the exact same type returned each time I do a calculation.
> >>>>
> >>>> - Micke
> >>>>
> >>>> On Fri, Oct 12, 2018 at 4:29 PM Benedict Elliott Smith <bened...@apache.org> wrote:
> >>>>>
> >>>>> As far as I can tell we reached a relatively strong consensus that we should implement lossless casts by default? Does anyone have anything more to add?
> >>>>>
> >>>>> Looking at the emails, everyone who participated and expressed a preference was in favour of the "Postgres approach" of upcasting to decimal for mixed float/int operands?
> >>>>>
> >>>>> I'd like to get a clear-cut decision on this, so we know what we're doing for 4.0. Then hopefully we can move on to a collective decision on Ariel's concerns about overflow, which I think are also pressing - particularly for tinyint and smallint. This does also impact implicit casts for mixed integer type operations, but an approach for these will probably fall out of any decision on overflow.
> >>>>>
> >>>>> On 3 Oct 2018, at 11:38, Murukesh Mohanan <murukesh.moha...@gmail.com> wrote:
> >>>>>>
> >>>>>> I think you're conflating two things here. There's the loss resulting from using some operators, and loss involved in casting. Dividing an integer by another integer to obtain an integer result can result in loss, but there's no implicit casting there and no loss due to casting. Casting an integer to a float can also result in loss. So dividing an integer by a float, for example, with an implicit cast has an additional avenue for loss: the implicit cast for the operands so that they're of the same type. I believe this discussion so far has been about the latter, not the loss from the operations themselves.
> >>>>>>
> >>>>>> On Wed, 3 Oct 2018 at 18:35 Benjamin Lerer <benjamin.le...@datastax.com> wrote:
> >>>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> I would like to try to clarify things a bit to help people to understand the true complexity of the problem.
> >>>>>>>
> >>>>>>> The *float* and *double* types are inexact numeric types, not only at the operation level.
> >>>>>>>
> >>>>>>> If you insert 676543.21 in a *float* column and then read it, you will realize that the value has been truncated to 676543.2.
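A quick plain-Java check of the truncation described above (CQL's float and double are the same 32-bit and 64-bit IEEE 754 types): 676543.21 has more significant digits than a 32-bit float can hold, so the stored value is the nearest representable one and the last digit is lost on read-back.

    public class FloatPrecisionDemo
    {
        public static void main(String[] args)
        {
            float asFloat = 676543.21f;   // ~7 significant decimal digits available
            double asDouble = 676543.21;  // ~15-16 significant decimal digits available

            System.out.println(asFloat);          // 676543.2    -- trailing digit lost
            System.out.println(asDouble);         // 676543.21
            System.out.println((double) asFloat); // 676543.1875 -- the value the float actually stores
        }
    }

The double only survives here because 676543.21 happens to fit within its precision; with enough digits a double rounds in exactly the same way, which is why both float and double are described above as inexact types.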
> >>>>>>> If you want accuracy, the only way is to avoid those inexact types. Using *decimals* during operations will mitigate the problem but will not remove it.
> >>>>>>>
> >>>>>>> I do not recall PostgreSQL behaving as described. If I am not mistaken, in PostgreSQL *SELECT 3/2* will return *1*, which is similar to what MS SQL Server and Oracle do. So all those databases will lose precision if you are not careful.
> >>>>>>>
> >>>>>>> If you truly need precision you can have it by using exact numeric types for your data types. Of course it has a cost in performance, memory and disk usage.
> >>>>>>>
> >>>>>>> The advantage of the current approach is that it gives you the choice. It is up to you to decide what you need for your application. It is also in line with the way CQL behaves everywhere else.
> >>>>>>
> >>>>>> --
> >>>>>> Muru

--
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade