Re: Implicit Casts for Arithmetic Operators

Benedict Elliott Smith Tue, 02 Oct 2018 10:55:26 -0700

I agree, in broad strokes at least.  Interested to hear others’ positions.




> On 2 Oct 2018, at 16:44, Ariel Weisberg <ar...@weisberg.ws> wrote:
> 
> Hi,
> 
> I think overflow and the role of widening conversions are pretty linked so 
> I'll continue to inject that into this discussion. Also overflow is much 
> worse since most applications won't be impacted by a loss of precision when 
> an expression involves an int and float, but will care quite a bit if they 
> get some nonsense wrapped number in an integer only expression.
> 
> For VoltDB in practice we didn't run into issues with applications not making 
> progress due to exceptions with real data due to the widening conversions. 
> The range of double and long are pretty big and that hides wrap 
> around/infinity. 
> 
> I think the proposal of having all operations return a decimal is attractive 
> in that these expressions always result in a consistent type. Two pain points 
> might be whether client languages have decimal support and whether there is a 
> performance issue? The nice thing about always returning decimal is we can 
> sidestep the issue of overflow.
> 
> I would start with seeing if that's acceptable, and if it isn't then look at 
> other approaches like returning a variety of types such when doing int + int 
> return a bigint or int + float return a double.
> 
> If we take an approach that allows overflow the ideal end state IMO would be 
> to get all users to run Cassandra in way that overflow results in an error 
> even in the context of aggregation. The road to get there is tricky, but 
> maybe start by having it as an opt in tunable in cassandra.yaml. I don't know 
> how/when we could ever change that as a default and it's unfortunate having 
> an option like this that 99% won't know they should flip.
> 
> It seems like having the default throw on overflow is not as bad as it sounds 
> if you do the widening conversions since most people won't run into them. The 
> change in the column types of results sets actually sounds worse if we want 
> to also improve aggregrations. Many applications won't notice if the client 
> library abstracts that away, but I think there are still cases where people 
> would notice the type changing.
> 
> Ariel
> 
>> On Tue, Oct 2, 2018, at 11:09 AM, Benedict Elliott Smith wrote:
>> This (overflow) is an excellent point, but this also affects 
>> aggregations which were introduced a long time ago.  They already 
>> inherit Java semantics for all of the relevant types (silent wrap 
>> around).  We probably want to be consistent, meaning either changing 
>> aggregations (which incurs a cost for changing API) or continuing the 
>> java semantics here.
>> 
>> This is why having these discussions explicitly in the community before 
>> a release is so critical, in my view.  It’s very easy for these semantic 
>> changes to go unnoticed on a JIRA, and then ossify.
>> 
>> 
>>> On 2 Oct 2018, at 15:48, Ariel Weisberg <ar...@weisberg.ws> wrote:
>>> 
>>> Hi,
>>> 
>>> I think we should decide based on what is least surprising as you mention, 
>>> but isn't overridden by some other concern.
>>> 
>>> It seems to me the priorities are
>>> 
>>> * Correctness
>>> * Performance
>>> * User visible complexity
>>> * Developer visible complexity
>>> 
>>> Defaulting to silent implicit data loss is not ideal from a correctness 
>>> standpoint.
>>> 
>>> Doing something better like using wider types doesn't seem like a 
>>> performance issue.
>>> 
>>> From a user standpoint doing something less lossy doesn't look more complex 
>>> as long as it's consistent, and documented and doesn't change from version 
>>> to version.
>>> 
>>> There is some developer complexity, but this is a public API and we only 
>>> get one shot at this. 
>>> 
>>> I wonder about how overflow is handled as well. In VoltDB I think we threw 
>>> on overflow and tended to just do widening conversions to make that less 
>>> common. We didn't imitate another database (as far as I know) we just went 
>>> with what least likely to silently corrupt data.
>>> https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L2213 
>>> <https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L2213>
>>> https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L3764 
>>> <https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L3764>
>>> 
>>> Ariel
>>> 
>>>> On Tue, Oct 2, 2018, at 7:30 AM, Benedict Elliott Smith wrote:
>>>> ç introduced arithmetic operators, and alongside these 
>>>> came implicit casts for their operands.  There is a semantic decision to 
>>>> be made, and I think the project would do well to explicitly raise this 
>>>> kind of question for wider input before release, since the project is 
>>>> bound by them forever more.
>>>> 
>>>> In this case, the choice is between lossy and lossless casts for 
>>>> operations involving integers and floating point numbers.  In essence, 
>>>> should:
>>>> 
>>>> (1) float + int = float, double + bigint = double; or
>>>> (2) float + int = double, double + bigint = decimal; or
>>>> (3) float + int = decimal, double + bigint = decimal
>>>> 
>>>> Option 1 performs a lossy implicit cast from int -> float, or bigint -> 
>>>> double.  Simply casting between these types changes the value.  This is 
>>>> what MS SQL Server does.
>>>> Options 2 and 3 cast without loss of precision, and 3 (or thereabouts) 
>>>> is what PostgreSQL does.
>>>> 
>>>> The question I’m interested in is not just which is the right decision, 
>>>> but how the right decision should be arrived at.  My view is that we 
>>>> should primarily aim for least surprise to the user, but I’m keen to 
>>>> hear from others.
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org 
>>>> <mailto:dev-unsubscr...@cassandra.apache.org>
>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org 
>>>> <mailto:dev-h...@cassandra.apache.org>
>>>> 
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org 
>>> <mailto:dev-unsubscr...@cassandra.apache.org>
>>> For additional commands, e-mail: dev-h...@cassandra.apache.org 
>>> <mailto:dev-h...@cassandra.apache.org>
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org

Re: Implicit Casts for Arithmetic Operators

Reply via email to