The goal is simply to agree on a set of well-defined principles for how we should behave. If we don’t like the implications that arise, we can always have another vote. A democracy cannot bind itself, so I have never understood this fear of making a decision.
A database also has a thousand toggles. If we absolutely need to, we can introduce one more. We should be doing this upfront a great deal more often; doing it retrospectively sucks, but in my opinion that is a bad reason to bind ourselves to whatever happened to make it in.

Do we define the principles of our current behaviour anywhere? I couldn’t find them.

> On 21 Nov 2018, at 21:08, Sylvain Lebresne <lebre...@gmail.com> wrote:
>
> On Tue, Nov 20, 2018 at 5:02 PM Benedict Elliott Smith <bened...@apache.org>
> wrote:
>
>> FWIW, my meaning of arithmetic in this context extends to any features we
>> have already released (such as aggregates, and perhaps other built-in
>> functions) that operate on the same domain. We should be consistent, after
>> all.
>>
>> Whether or not we need to revisit any existing functionality we can figure
>> out after the fact, once we have agreed what our behaviour should be.
>
> I'm not sure I correctly understand the process being suggested, but I don't
> particularly like or agree with what I understand. What I understand is a
> suggestion to vote on being ANSI SQL 92 compliant, with no real evaluation
> of what that entails (at least I haven't seen one), and that this vote, if
> passed, would imply we'd then make any backward-incompatible change
> necessary to achieve compliance ("my meaning of arithmetic in this context
> extends to any features we have already released" and "Whether or not we
> need to revisit any existing functionality we can figure out after the
> fact, once we have agreed what our behaviour should be").
>
> This might make sense for a new product, but at our stage it seems backward
> to me. I think we owe it to our users to first make the effort of
> identifying what "inconsistencies" our existing arithmetic has[1], and
> _then_ consider what options we have to fix those, with their pros and cons
> (including how badly they break backward compatibility). And if _then_
> becoming ANSI SQL 92 compliant proves not to be disruptive (or at least
> acceptably so), then sure, that's great.
>
> [1]: one possibly efficient way to do that could be to compare our
> arithmetic to ANSI SQL 92's. Not all differences found would imply
> inconsistencies or wrongness in our arithmetic, but it should still be
> helpful. And I guess my whole point is that we should do that analysis
> first, and then maybe decide that being ANSI SQL 92 compliant is a
> reasonable option; not decide first and live with the consequences no
> matter what they are.
>
> --
> Sylvain
>
>
>> I will make this more explicit for the vote, but just to clarify the
>> intention, so that we are all discussing the same thing.
>>
>>
>>> On 20 Nov 2018, at 14:18, Ariel Weisberg <adwei...@fastmail.fm> wrote:
>>>
>>> Hi,
>>>
>>> +1
>>>
>>> This is a public API, so we will be much better off if we get it right
>>> the first time.
>>>
>>> Ariel
>>>
>>>> On Nov 16, 2018, at 10:36 AM, Jonathan Haddad <j...@jonhaddad.com> wrote:
>>>>
>>>> Sounds good to me.
>>>>
>>>> On Fri, Nov 16, 2018 at 5:09 AM Benedict Elliott Smith
>>>> <bened...@apache.org> wrote:
>>>>
>>>>> So, this thread somewhat petered out.
>>>>>
>>>>> There are still a number of unresolved issues, but to make progress I
>>>>> wonder if it would first be helpful to have a vote on ensuring we are
>>>>> ANSI SQL 92 compliant for our arithmetic? This seems like a sensible
>>>>> baseline, since we will hopefully minimise surprise to operators this
>>>>> way.
>>>>>
>>>>> If people largely agree, I will call a vote, and we can pick up a
>>>>> couple of more focused discussions afterwards on how we interpret the
>>>>> leeway it gives.
>>>>>
>>>>>
>>>>>> On 12 Oct 2018, at 18:10, Ariel Weisberg <ar...@weisberg.ws> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> From reading the spec: precision is always implementation-defined.
>>>>>> The spec specifies scale in several cases, but never precision, for
>>>>>> any type or operation (addition/subtraction, multiplication,
>>>>>> division).
>>>>>>
>>>>>> So we don't implement anything remotely approaching precision and
>>>>>> scale in CQL when it comes to numbers, I think? So we aren't going
>>>>>> to follow the spec for scale. We are already pretty far down that
>>>>>> road, so I would leave it alone.
>>>>>>
>>>>>> I don't think the spec is asking for the most approximate type. It's
>>>>>> just saying the result is approximate, and the precision is
>>>>>> implementation-defined. We could return either float or double. I
>>>>>> think if one of the operands is a double we should return a double,
>>>>>> because clearly the schema thought a double was required to
>>>>>> represent that number. I would also be in favor of returning a
>>>>>> double all the time, so that people can expect a consistent type
>>>>>> from expressions involving approximate numbers.
>>>>>>
>>>>>> I am a big fan of widening for arithmetic expressions in a database,
>>>>>> to avoid having to error on overflow. You can go to the trouble of
>>>>>> widening only the minimum amount, but I think it's simpler if we
>>>>>> always widen to bigint and double. This would be something the spec
>>>>>> allows.
>>>>>>
>>>>>> Definitely, if we can make overflow not occur, we should, and the
>>>>>> spec allows that. We should also not return different types for the
>>>>>> same operand types just to work around overflow if we detect we need
>>>>>> more precision.
>>>>>>
>>>>>> Ariel
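For concreteness, the always-widen behaviour Ariel describes would look roughly like this (a sketch of the proposal only, against a hypothetical table; these are not released semantics):

    -- hypothetical table for illustration
    CREATE TABLE ops (k int PRIMARY KEY, t tinyint, s smallint, i int, f float);

    -- Integer-only expressions always widen to bigint, so the result type
    -- cannot overflow even with both operands at their maximum values:
    SELECT t + s FROM ops;  -- bigint under the proposal, not tinyint or smallint
    SELECT i * i FROM ops;  -- bigint under the proposal

    -- Any approximate operand widens the result to double:
    SELECT f + i FROM ops;  -- double under the proposal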
>>>>>>> On Fri, Oct 12, 2018, at 12:45 PM, Benedict Elliott Smith wrote:
>>>>>>> If it’s in the SQL spec, I’m fairly convinced. Thanks for digging
>>>>>>> this out (and Mike for getting some empirical examples).
>>>>>>>
>>>>>>> We still have to decide on the approximate data type to return;
>>>>>>> right now we have float+bigint=double, but float+int=float. I think
>>>>>>> this is fairly inconsistent: either the approximate type should
>>>>>>> always win, or we should always upgrade to double for mixed
>>>>>>> operands.
>>>>>>>
>>>>>>> The quoted spec also suggests that decimal+float=float and
>>>>>>> decimal+double=double, whereas we currently have
>>>>>>> decimal+float=decimal and decimal+double=decimal.
>>>>>>>
>>>>>>> If we’re going to go with an approximate operand implying an
>>>>>>> approximate result, I think we should do it consistently (and
>>>>>>> consistently with the SQL 92 spec), and have the type of the
>>>>>>> approximate operand always be the return type.
>>>>>>>
>>>>>>> This would still leave a decision for float+double, though. The
>>>>>>> behaviour most consistent with the above would be to always return
>>>>>>> the most approximate type (i.e. float), but that seems to me fairly
>>>>>>> unexpected for the user.
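To make the inconsistency concrete (a sketch against a hypothetical table; the result types listed are the current behaviours as described above, and the SQL 92 expectations as quoted below):

    -- hypothetical table: i int, b bigint, f float, d double, dec decimal
    SELECT f + i   FROM t;  -- currently float:   the approximate operand wins
    SELECT f + b   FROM t;  -- currently double:  widened past the approximate operand
    SELECT dec + f FROM t;  -- currently decimal: the exact operand wins,
                            -- where SQL 92 suggests float
    SELECT dec + d FROM t;  -- currently decimal, where SQL 92 suggests double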
>>>>>>>> On 12 Oct 2018, at 17:23, Ariel Weisberg <ar...@weisberg.ws> wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I agree with what's been said about expectations regarding
>>>>>>>> expressions involving floating point numbers. I think that if one
>>>>>>>> of the inputs is approximate, then the result should be
>>>>>>>> approximate.
>>>>>>>>
>>>>>>>> One thing we could look at for inspiration is the SQL spec, though
>>>>>>>> not necessarily to follow it dogmatically.
>>>>>>>>
>>>>>>>> From the SQL 92 spec regarding assignment
>>>>>>>> (http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt,
>>>>>>>> section 4.6):
>>>>>>>> "
>>>>>>>>      Values of the data types NUMERIC, DECIMAL, INTEGER, SMALLINT,
>>>>>>>>      FLOAT, REAL, and DOUBLE PRECISION are numbers and are all
>>>>>>>>      mutually comparable and mutually assignable. If an assignment
>>>>>>>>      would result in a loss of the most significant digits, an
>>>>>>>>      exception condition is raised. If least significant digits
>>>>>>>>      are lost, implementation-defined rounding or truncating
>>>>>>>>      occurs with no exception condition being raised. The rules
>>>>>>>>      for arithmetic are generally governed by Subclause 6.12,
>>>>>>>>      "<numeric value expression>".
>>>>>>>> "
>>>>>>>>
>>>>>>>> Section 6.12, numeric value expressions:
>>>>>>>> "
>>>>>>>>      1) If the data type of both operands of a dyadic arithmetic
>>>>>>>>         operator is exact numeric, then the data type of the
>>>>>>>>         result is exact numeric, with precision and scale
>>>>>>>>         determined as follows:
>>>>>>>>      ...
>>>>>>>>      2) If the data type of either operand of a dyadic arithmetic
>>>>>>>>         operator is approximate numeric, then the data type of the
>>>>>>>>         result is approximate numeric. The precision of the result
>>>>>>>>         is implementation-defined.
>>>>>>>> "
>>>>>>>>
>>>>>>>> And this makes sense to me. I think we should only return an exact
>>>>>>>> result if both of the inputs are exact.
>>>>>>>>
>>>>>>>> I think we might want to look closely at the SQL spec, and
>>>>>>>> especially at when the spec requires an error to be generated.
>>>>>>>> Those are sometimes in the spec to prevent subtle paths to wrong
>>>>>>>> answers. Any time we deviate from the spec, we should be asking
>>>>>>>> why it is in the spec and why we are deviating.
>>>>>>>>
>>>>>>>> Another issue, besides overflow handling, is how we determine
>>>>>>>> precision and scale for expressions involving two exact types.
>>>>>>>>
>>>>>>>> Ariel
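Applied to concrete operand pairs, the two rules from 6.12 work out roughly as follows (an illustrative sketch, in the FROM-less SELECT style of the Postgres examples below; the standard CAST syntax is used to force an approximate operand):

    -- Rule 1: both operands exact numeric, so the result is exact:
    SELECT 2 + 3;                              -- integer + integer -> exact
    SELECT 2 + 3.5;                            -- integer + decimal -> exact
    -- Rule 2: either operand approximate, so the result is approximate,
    -- with implementation-defined precision:
    SELECT 2 + CAST(3.5 AS double precision);  -- -> approximate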
>>>>>>>>> On Fri, Oct 12, 2018, at 11:51 AM, Michael Burman wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I'm not sure I would prefer the Postgres way of doing things,
>>>>>>>>> which is returning just about any type depending on the order of
>>>>>>>>> operands, especially considering its docs mention that
>>>>>>>>> numeric/decimal is slow, and note multiple times that floating
>>>>>>>>> point types are inexact. So, doing some math with Postgres
>>>>>>>>> (9.6.5):
>>>>>>>>>
>>>>>>>>> SELECT 2147483647::bigint*1.0::double precision returns double
>>>>>>>>> precision 2147483647
>>>>>>>>> SELECT 2147483647::bigint*1.0 returns numeric 2147483647.0
>>>>>>>>> SELECT 2147483647::bigint*1.0::real returns double
>>>>>>>>> SELECT 2147483647::double precision*1::bigint returns double
>>>>>>>>> 2147483647
>>>>>>>>> SELECT 2147483647::double precision*1.0::bigint returns double
>>>>>>>>> 2147483647
>>>>>>>>>
>>>>>>>>> With + and - we can get the same mixture of returned types. There
>>>>>>>>> is no difference in those calculations, just some casting. To me,
>>>>>>>>> floating-point math implies inexactness and error, and whoever
>>>>>>>>> mixes two different types should understand that. If one didn't
>>>>>>>>> want an exact numeric type, why would the server return one? The
>>>>>>>>> floating point value itself could already be wrong before the
>>>>>>>>> calculation; claiming we do it losslessly is just wrong.
>>>>>>>>>
>>>>>>>>> Fun with 2.65:
>>>>>>>>>
>>>>>>>>> SELECT 2.65::real * 1::int returns double 2.65000009536743
>>>>>>>>> SELECT 2.65::double precision * 1::int returns double 2.65
>>>>>>>>>
>>>>>>>>> SELECT round(2.65) returns numeric 3
>>>>>>>>> SELECT round(2.65::double precision) returns double 3
>>>>>>>>>
>>>>>>>>> SELECT 2.65 * 1 returns numeric 2.65
>>>>>>>>> SELECT 2.65 * 1::bigint returns numeric 2.65
>>>>>>>>> SELECT 2.65 * 1.0 returns numeric 2.650
>>>>>>>>> SELECT 2.65 * 1.0::double precision returns double 2.65
>>>>>>>>>
>>>>>>>>> SELECT round(2.65) * 1 returns numeric 3
>>>>>>>>> SELECT round(2.65) * round(1) returns double 3
>>>>>>>>>
>>>>>>>>> So, as we're going to have silly values in any case, why pretend
>>>>>>>>> otherwise? Also, exact calculations are slow if we crunch a large
>>>>>>>>> amount of numbers. I guess I drifted towards Postgres'
>>>>>>>>> implementation here, but I wish it weren't used as the benchmark
>>>>>>>>> in this case. And most importantly, I would definitely want the
>>>>>>>>> exact same type returned each time I do a calculation.
>>>>>>>>>
>>>>>>>>> - Micke
>>>>>>>>>
>>>>>>>>> On Fri, Oct 12, 2018 at 4:29 PM Benedict Elliott Smith
>>>>>>>>> <bened...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> As far as I can tell we reached a relatively strong consensus
>>>>>>>>>> that we should implement lossless casts by default? Does anyone
>>>>>>>>>> have anything more to add?
>>>>>>>>>>
>>>>>>>>>> Looking at the emails, everyone who participated and expressed a
>>>>>>>>>> preference was in favour of the “Postgres approach” of upcasting
>>>>>>>>>> to decimal for mixed float/int operands?
>>>>>>>>>>
>>>>>>>>>> I’d like to get a clear-cut decision on this, so we know what
>>>>>>>>>> we’re doing for 4.0. Then hopefully we can move on to a
>>>>>>>>>> collective decision on Ariel’s concerns about overflow, which I
>>>>>>>>>> think are also pressing, particularly for tinyint and smallint.
>>>>>>>>>> This does also impact implicit casts for mixed integer type
>>>>>>>>>> operations, but an approach for these will probably fall out of
>>>>>>>>>> any decision on overflow.
>>>>>>>>>>
>>>>>>>>>>> On 3 Oct 2018, at 11:38, Murukesh Mohanan
>>>>>>>>>>> <murukesh.moha...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> I think you're conflating two things here. There's the loss
>>>>>>>>>>> resulting from using some operators, and the loss involved in
>>>>>>>>>>> casting. Dividing an integer by another integer to obtain an
>>>>>>>>>>> integer result can result in loss, but there's no implicit
>>>>>>>>>>> casting there and no loss due to casting. Casting an integer to
>>>>>>>>>>> a float can also result in loss. So dividing an integer by a
>>>>>>>>>>> float, for example, with an implicit cast, has an additional
>>>>>>>>>>> avenue for loss: the implicit cast of the operands so that
>>>>>>>>>>> they're of the same type. I believe this discussion so far has
>>>>>>>>>>> been about the latter, not the loss from the operations
>>>>>>>>>>> themselves.
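The two kinds of loss being distinguished here can be shown separately (a Postgres-style sketch in the spirit of the examples above; the printed results are what integer division and IEEE 754 single precision imply, and are worth double-checking):

    -- Loss from the operator itself, with no casting involved:
    SELECT 7 / 2;                     -- integer division returns 3; the fraction
                                      -- is simply discarded

    -- Loss from a cast alone, before any arithmetic happens:
    SELECT 2147483647::real::bigint;  -- returns 2147483648, because 2^31 - 1 is
                                      -- not representable as a 32-bit float, so
                                      -- the cast rounds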
>>>>>>>>>>> On Wed, 3 Oct 2018 at 18:35 Benjamin Lerer
>>>>>>>>>>> <benjamin.le...@datastax.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> I would like to try to clarify things a bit, to help people
>>>>>>>>>>>> understand the true complexity of the problem.
>>>>>>>>>>>>
>>>>>>>>>>>> The *float* and *double* types are inexact numeric types, and
>>>>>>>>>>>> not only at the operation level.
>>>>>>>>>>>>
>>>>>>>>>>>> If you insert 676543.21 into a *float* column and then read it
>>>>>>>>>>>> back, you will find that the value has been truncated to
>>>>>>>>>>>> 676543.2.
>>>>>>>>>>>>
>>>>>>>>>>>> If you want accuracy, the only way is to avoid those inexact
>>>>>>>>>>>> types. Using *decimals* during operations will mitigate the
>>>>>>>>>>>> problem but will not remove it.
>>>>>>>>>>>>
>>>>>>>>>>>> I do not recall PostgreSQL behaving as described. If I am not
>>>>>>>>>>>> mistaken, in PostgreSQL *SELECT 3/2* will return *1*, which is
>>>>>>>>>>>> similar to what MS SQL Server and Oracle do. So all those
>>>>>>>>>>>> databases will lose precision if you are not careful.
>>>>>>>>>>>>
>>>>>>>>>>>> If you truly need precision, you can have it by using exact
>>>>>>>>>>>> numeric types for your data types. Of course, this has a cost
>>>>>>>>>>>> in performance, memory and disk usage.
>>>>>>>>>>>>
>>>>>>>>>>>> The advantage of the current approach is that it gives you the
>>>>>>>>>>>> choice. It is up to you to decide what you need for your
>>>>>>>>>>>> application. It is also in line with the way CQL behaves
>>>>>>>>>>>> everywhere else.
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>>
>>>>>>>>>>> Muru
>>>>
>>>> --
>>>> Jon Haddad
>>>> http://www.rustyrazorblade.com
>>>> twitter: rustyrazorblade
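Benjamin's *float* truncation point can be reproduced directly (a CQL-style sketch; the table and column names are illustrative, and the read-back values follow from 32-bit IEEE 754 rounding as he describes):

    CREATE TABLE prices (k int PRIMARY KEY, f float, d decimal);
    INSERT INTO prices (k, f, d) VALUES (1, 676543.21, 676543.21);

    SELECT f, d FROM prices WHERE k = 1;
    -- f: 676543.2   (the nearest 32-bit float; the .21 was lost at insert time)
    -- d: 676543.21  (decimal stores the value exactly)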