Thanks for starting this discussion. I’m definitely in the lossless camp. I’m curious about the performance impact of choosing lossless vs. lossy.
Dinesh

> On Oct 2, 2018, at 10:54 AM, Benedict Elliott Smith <bened...@apache.org> wrote:
>
> I agree, in broad strokes at least. Interested to hear others’ positions.
>
>> On 2 Oct 2018, at 16:44, Ariel Weisberg <ar...@weisberg.ws> wrote:
>>
>> Hi,
>>
>> I think overflow and the role of widening conversions are pretty linked so
>> I'll continue to inject that into this discussion. Also overflow is much
>> worse since most applications won't be impacted by a loss of precision when
>> an expression involves an int and float, but will care quite a bit if they
>> get some nonsense wrapped number in an integer-only expression.
>>
>> For VoltDB in practice we didn't run into issues with applications not
>> making progress due to exceptions with real data due to the widening
>> conversions. The ranges of double and long are pretty big and that hides
>> wrap around/infinity.
>>
>> I think the proposal of having all operations return a decimal is attractive
>> in that these expressions always result in a consistent type. Two pain
>> points might be whether client languages have decimal support and whether
>> there is a performance issue. The nice thing about always returning decimal
>> is we can sidestep the issue of overflow.
>>
>> I would start with seeing if that's acceptable, and if it isn't then look at
>> other approaches like returning a variety of types, such as int + int
>> returning a bigint or int + float returning a double.
>>
>> If we take an approach that allows overflow the ideal end state IMO would be
>> to get all users to run Cassandra in a way that overflow results in an error
>> even in the context of aggregation. The road to get there is tricky, but
>> maybe start by having it as an opt-in tunable in cassandra.yaml. I don't
>> know how/when we could ever change that as a default and it's unfortunate
>> having an option like this that 99% won't know they should flip.
>>
>> It seems like having the default throw on overflow is not as bad as it
>> sounds if you do the widening conversions since most people won't run into
>> them. The change in the column types of result sets actually sounds worse
>> if we want to also improve aggregations. Many applications won't notice if
>> the client library abstracts that away, but I think there are still cases
>> where people would notice the type changing.
>>
>> Ariel
>>
>>> On Tue, Oct 2, 2018, at 11:09 AM, Benedict Elliott Smith wrote:
>>> This (overflow) is an excellent point, but this also affects
>>> aggregations which were introduced a long time ago. They already
>>> inherit Java semantics for all of the relevant types (silent wrap
>>> around). We probably want to be consistent, meaning either changing
>>> aggregations (which incurs a cost for changing API) or continuing the
>>> Java semantics here.
>>>
>>> This is why having these discussions explicitly in the community before
>>> a release is so critical, in my view. It’s very easy for these semantic
>>> changes to go unnoticed on a JIRA, and then ossify.
>>>
>>>
>>>> On 2 Oct 2018, at 15:48, Ariel Weisberg <ar...@weisberg.ws> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I think we should decide based on what is least surprising as you mention,
>>>> but isn't overridden by some other concern.
>>>>
>>>> It seems to me the priorities are
>>>>
>>>> * Correctness
>>>> * Performance
>>>> * User visible complexity
>>>> * Developer visible complexity
>>>>
>>>> Defaulting to silent implicit data loss is not ideal from a correctness
>>>> standpoint.
>>>>
>>>> Doing something better like using wider types doesn't seem like a
>>>> performance issue.
>>>>
>>>> From a user standpoint doing something less lossy doesn't look more
>>>> complex as long as it's consistent, and documented and doesn't change from
>>>> version to version.
>>>>
>>>> There is some developer complexity, but this is a public API and we only
>>>> get one shot at this.
>>>>
>>>> I wonder about how overflow is handled as well. In VoltDB I think we threw
>>>> on overflow and tended to just do widening conversions to make that less
>>>> common. We didn't imitate another database (as far as I know); we just went
>>>> with what was least likely to silently corrupt data.
>>>> https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L2213
>>>> https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L3764
>>>>
>>>> Ariel
>>>>
>>>>> On Tue, Oct 2, 2018, at 7:30 AM, Benedict Elliott Smith wrote:
>>>>> ç introduced arithmetic operators, and alongside these
>>>>> came implicit casts for their operands. There is a semantic decision to
>>>>> be made, and I think the project would do well to explicitly raise this
>>>>> kind of question for wider input before release, since the project is
>>>>> bound by them forever more.
>>>>>
>>>>> In this case, the choice is between lossy and lossless casts for
>>>>> operations involving integers and floating point numbers. In essence,
>>>>> should:
>>>>>
>>>>> (1) float + int = float, double + bigint = double; or
>>>>> (2) float + int = double, double + bigint = decimal; or
>>>>> (3) float + int = decimal, double + bigint = decimal
>>>>>
>>>>> Option 1 performs a lossy implicit cast from int -> float, or bigint ->
>>>>> double. Simply casting between these types changes the value. This is
>>>>> what MS SQL Server does.
>>>>> Options 2 and 3 cast without loss of precision, and 3 (or thereabouts)
>>>>> is what PostgreSQL does.
>>>>>
>>>>> The question I’m interested in is not just which is the right decision,
>>>>> but how the right decision should be arrived at. My view is that we
>>>>> should primarily aim for least surprise to the user, but I’m keen to
>>>>> hear from others.
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
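
For anyone following along, the two concerns raised in the thread above (lossy implicit casts under option 1, and silent integer wrap-around versus throwing or widening) can be reproduced in plain Java. This is only a rough sketch: the class and method names are made up for illustration, nothing here is Cassandra code, and BigDecimal is merely standing in for the CQL decimal type.

```java
import java.math.BigDecimal;

// Hypothetical demo class; none of these names come from Cassandra.
class CastSemantics {

    // Option 1 style (float + int = float): the int operand is implicitly
    // converted to float, whose 24-bit significand cannot hold every int.
    static float addAsFloat(int i, float f) {
        return i + f;
    }

    // Option 1 style (double + bigint = double): the long operand is
    // implicitly converted to double, whose 53-bit significand cannot
    // hold every long.
    static double addAsDoubleLossy(long l, double d) {
        return l + d;
    }

    // Option 2 style (float + int = double): both operands convert to
    // double without loss. The sum itself can still round.
    static double addAsDouble(int i, float f) {
        return (double) i + (double) f;
    }

    // Option 2/3 style (double + bigint = decimal): both operands convert
    // exactly. Assumption: d is finite, since new BigDecimal(double)
    // throws NumberFormatException for NaN and infinities.
    static BigDecimal addAsDecimal(long l, double d) {
        return BigDecimal.valueOf(l).add(new BigDecimal(d));
    }

    // Java's default integer semantics: silent wrap-around on overflow.
    static int addWrapping(int a, int b) {
        return a + b;
    }

    // Throwing on overflow: Math.addExact raises ArithmeticException.
    static int addChecked(int a, int b) {
        return Math.addExact(a, b);
    }

    // Widening (int + int = bigint): two ints can never overflow a long.
    static long addWidened(int a, int b) {
        return (long) a + b;
    }

    public static void main(String[] args) {
        int i = 16_777_217;               // 2^24 + 1, not representable as float
        long l = 9_007_199_254_740_993L;  // 2^53 + 1, not representable as double

        System.out.println("float + int as float:      " + addAsFloat(i, 0.0f));
        System.out.println("float + int as double:     " + addAsDouble(i, 0.0f));
        System.out.println("double + bigint as double: " + addAsDoubleLossy(l, 0.0));
        System.out.println("double + bigint as decimal: " + addAsDecimal(l, 0.0));

        System.out.println("wrapping: " + addWrapping(Integer.MAX_VALUE, 1));
        System.out.println("widened:  " + addWidened(Integer.MAX_VALUE, 1));
        try {
            addChecked(Integer.MAX_VALUE, 1);
        } catch (ArithmeticException e) {
            System.out.println("checked:  threw " + e.getMessage());
        }
    }
}
```

The lossy variants silently return a neighbouring value, which is the "simply casting between these types changes the value" behaviour described for option 1. The checked and widened additions correspond to the throw-on-overflow and widening approaches discussed for VoltDB. Whether the decimal path is fast enough is exactly the open performance question, and this sketch does not try to answer it.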