Re: Implicit Casts for Arithmetic Operators

Ariel Weisberg Tue, 02 Oct 2018 08:45:43 -0700

Hi,

I think overflow and the role of widening conversions are closely linked, so
I'll continue to inject that into this discussion. Overflow is also much worse:
most applications won't be impacted by a loss of precision when an expression
involves an int and a float, but they will care quite a bit if they get some
nonsense wrapped number in an integer-only expression.
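
To make the difference concrete, here is a minimal Java sketch of the two
failure modes. The class and method names are made up for illustration; this is
plain Java arithmetic, not Cassandra code.

```java
final class WrapVersusRounding {
    // int + int wraps silently on overflow (two's complement arithmetic).
    static int wrappingAdd(int a, int b) {
        return a + b;
    }

    // int + float promotes the int to float. A float has a 24-bit significand,
    // so ints above 2^24 may be rounded to a nearby representable value.
    static float lossyAdd(int a, float b) {
        return a + b;
    }

    public static void main(String[] args) {
        // Off by roughly 2^32: prints -2147483648
        System.out.println(wrappingAdd(Integer.MAX_VALUE, 1));
        // Off by one: 16777217 becomes 16777216, prints 1.6777216E7
        System.out.println(lossyAdd(16_777_217, 0.0f));
    }
}
```

The rounded float is close to the true answer, while the wrapped int is not
even the right sign.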


For VoltDB, in practice we didn't run into issues with applications failing to
make progress because of exceptions on real data, thanks to the widening
conversions. The ranges of double and long are pretty big, and that hides
wraparound/infinity.

I think the proposal of having all operations return a decimal is attractive in
that these expressions always result in a consistent type. Two pain points
might be whether client languages have decimal support and whether there is a
performance issue. The nice thing about always returning decimal is we can
sidestep the issue of overflow.
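
As a rough sketch of why decimal sidesteps overflow, assuming the server would
evaluate with something like java.math.BigDecimal (the helper names below are
hypothetical, not an existing Cassandra API):

```java
import java.math.BigDecimal;

final class DecimalResult {
    // bigint + bigint evaluated as an arbitrary-precision decimal: no wrap.
    static BigDecimal add(long a, long b) {
        return BigDecimal.valueOf(a).add(BigDecimal.valueOf(b));
    }

    // double + bigint: new BigDecimal(double) converts the exact binary value.
    // It throws NumberFormatException for NaN and infinities, which would
    // need separate handling.
    static BigDecimal add(double a, long b) {
        return new BigDecimal(a).add(BigDecimal.valueOf(b));
    }

    public static void main(String[] args) {
        // One past Long.MAX_VALUE: prints 9223372036854775808
        System.out.println(add(Long.MAX_VALUE, 1L));
        System.out.println(add(0.5, 1L)); // prints 1.5
    }
}
```

The cost is that every result is a heap-allocated object rather than a
primitive, which is where the performance question comes from.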

I would start with seeing if that's acceptable, and if it isn't, then look at
other approaches like returning a variety of types, such as returning a bigint
for int + int or a double for int + float.
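
A sketch of what that second approach could look like in Java terms, with
hypothetical method names. The point is only that the result type is one step
wider than the operands:

```java
final class WideningAdd {
    // int + int -> bigint: the sum of two 32-bit values always fits in 64 bits,
    // so this can never overflow.
    static long addInts(int a, int b) {
        return (long) a + b;
    }

    // int + float -> double: every int and every float converts to double
    // exactly, so no precision is lost on the operands. The sum itself can
    // still be rounded to the nearest double.
    static double addIntFloat(int a, float b) {
        return (double) a + (double) b;
    }

    public static void main(String[] args) {
        System.out.println(addInts(Integer.MAX_VALUE, 1)); // prints 2147483648
        System.out.println(addIntFloat(16_777_217, 0.5f)); // prints 1.67772175E7
    }
}
```

Note that this only pushes the problem up one level: bigint + bigint has no
wider primitive type to return.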

If we take an approach that allows overflow, the ideal end state IMO would be to
get all users to run Cassandra in a way that overflow results in an error, even
in the context of aggregation. The road to get there is tricky, but maybe start
by having it as an opt-in tunable in cassandra.yaml. I don't know how/when we
could ever change that to be the default, and it's unfortunate to have an option
like this that 99% of users won't know they should flip.
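
For illustration, an opt-in check in an aggregate could be as small as the
sketch below. The failOnOverflow flag is hypothetical and stands in for
whatever the tunable would be called; it is not an existing cassandra.yaml
option.

```java
final class CheckedSum {
    // Sums bigint values. With failOnOverflow set, Math.addExact throws
    // ArithmeticException instead of wrapping; otherwise plain Java
    // semantics (silent wraparound) apply.
    static long sum(long[] values, boolean failOnOverflow) {
        long total = 0;
        for (long v : values) {
            total = failOnOverflow ? Math.addExact(total, v) : total + v;
        }
        return total;
    }

    public static void main(String[] args) {
        long[] values = {Long.MAX_VALUE, 1L};
        // Wraps silently: prints -9223372036854775808
        System.out.println(sum(values, false));
        try {
            sum(values, true);
        } catch (ArithmeticException e) {
            System.out.println("overflow rejected");
        }
    }
}
```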

It seems like having the default throw on overflow is not as bad as it sounds
if you do the widening conversions, since most people won't run into overflow.
The change in the column types of result sets actually sounds worse if we want
to also improve aggregations. Many applications won't notice if the client
library abstracts that away, but I think there are still cases where people
would notice the type changing.

Ariel

On Tue, Oct 2, 2018, at 11:09 AM, Benedict Elliott Smith wrote:
> This (overflow) is an excellent point, but this also affects 
> aggregations which were introduced a long time ago.  They already 
> inherit Java semantics for all of the relevant types (silent wrap 
> around).  We probably want to be consistent, meaning either changing 
> aggregations (which incurs a cost for changing API) or continuing the 
> java semantics here.
> 
> This is why having these discussions explicitly in the community before 
> a release is so critical, in my view.  It’s very easy for these semantic 
> changes to go unnoticed on a JIRA, and then ossify.
> 
> 
> > On 2 Oct 2018, at 15:48, Ariel Weisberg <ar...@weisberg.ws> wrote:
> > 
> > Hi,
> > 
> > I think we should decide based on what is least surprising, as you mention,
> > provided it isn't overridden by some other concern.
> > 
> > It seems to me the priorities are
> > 
> > * Correctness
> > * Performance
> > * User visible complexity
> > * Developer visible complexity
> > 
> > Defaulting to silent implicit data loss is not ideal from a correctness 
> > standpoint.
> > 
> > Doing something better like using wider types doesn't seem like a 
> > performance issue.
> > 
> > From a user standpoint, doing something less lossy doesn't look more complex
> > as long as it's consistent, documented, and doesn't change from version
> > to version.
> > 
> > There is some developer complexity, but this is a public API and we only 
> > get one shot at this. 
> > 
> > I wonder about how overflow is handled as well. In VoltDB I think we threw
> > on overflow and tended to just do widening conversions to make that less
> > common. We didn't imitate another database (as far as I know); we just went
> > with what was least likely to silently corrupt data.
> > https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L2213
> > https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L3764
> > 
> > Ariel
> > 
> > On Tue, Oct 2, 2018, at 7:30 AM, Benedict Elliott Smith wrote:
> >> [...] introduced arithmetic operators, and alongside these
> >> came implicit casts for their operands.  There is a semantic decision to 
> >> be made, and I think the project would do well to explicitly raise this 
> >> kind of question for wider input before release, since the project is 
> >> bound by them forever more.
> >> 
> >> In this case, the choice is between lossy and lossless casts for 
> >> operations involving integers and floating point numbers.  In essence, 
> >> should:
> >> 
> >> (1) float + int = float, double + bigint = double; or
> >> (2) float + int = double, double + bigint = decimal; or
> >> (3) float + int = decimal, double + bigint = decimal
> >> 
> >> Option 1 performs a lossy implicit cast from int -> float, or bigint -> 
> >> double.  Simply casting between these types changes the value.  This is 
> >> what MS SQL Server does.
> >> Options 2 and 3 cast without loss of precision, and 3 (or thereabouts) 
> >> is what PostgreSQL does.
> >> 
> >> The question I’m interested in is not just which is the right decision, 
> >> but how the right decision should be arrived at.  My view is that we 
> >> should primarily aim for least surprise to the user, but I’m keen to 
> >> hear from others.
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org 
> >> <mailto:dev-unsubscr...@cassandra.apache.org>
> >> For additional commands, e-mail: dev-h...@cassandra.apache.org 
> >> <mailto:dev-h...@cassandra.apache.org>
> >> 
> > 
