Thanks for starting this discussion. I’m definitely in the lossless camp. I’m 
curious of the performance impact of choosing lossless vs lossy.

Dinesh

> On Oct 2, 2018, at 10:54 AM, Benedict Elliott Smith <bened...@apache.org> 
> wrote:
> 
> I agree, in broad strokes at least.  Interested to hear others’ positions.
> 
> 
> 
>> On 2 Oct 2018, at 16:44, Ariel Weisberg <ar...@weisberg.ws> wrote:
>> 
>> Hi,
>> 
>> I think overflow and the role of widening conversions are pretty linked so 
>> I'll continue to inject that into this discussion. Also overflow is much 
>> worse since most applications won't be impacted by a loss of precision when 
>> an expression involves an int and float, but will care quite a bit if they 
>> get some nonsense wrapped number in an integer only expression.
>> 
>> For VoltDB in practice we didn't run into issues with applications not 
>> making progress due to exceptions with real data due to the widening 
>> conversions. The range of double and long are pretty big and that hides wrap 
>> around/infinity. 
>> 
>> I think the proposal of having all operations return a decimal is attractive 
>> in that these expressions always result in a consistent type. Two pain 
>> points might be whether client languages have decimal support and whether 
>> there is a performance issue? The nice thing about always returning decimal 
>> is we can sidestep the issue of overflow.
>> 
>> I would start with seeing if that's acceptable, and if it isn't then look at 
>> other approaches like returning a variety of types such when doing int + int 
>> return a bigint or int + float return a double.
>> 
>> If we take an approach that allows overflow the ideal end state IMO would be 
>> to get all users to run Cassandra in way that overflow results in an error 
>> even in the context of aggregation. The road to get there is tricky, but 
>> maybe start by having it as an opt in tunable in cassandra.yaml. I don't 
>> know how/when we could ever change that as a default and it's unfortunate 
>> having an option like this that 99% won't know they should flip.
>> 
>> It seems like having the default throw on overflow is not as bad as it 
>> sounds if you do the widening conversions since most people won't run into 
>> them. The change in the column types of results sets actually sounds worse 
>> if we want to also improve aggregrations. Many applications won't notice if 
>> the client library abstracts that away, but I think there are still cases 
>> where people would notice the type changing.
>> 
>> Ariel
>> 
>>> On Tue, Oct 2, 2018, at 11:09 AM, Benedict Elliott Smith wrote:
>>> This (overflow) is an excellent point, but this also affects 
>>> aggregations which were introduced a long time ago.  They already 
>>> inherit Java semantics for all of the relevant types (silent wrap 
>>> around).  We probably want to be consistent, meaning either changing 
>>> aggregations (which incurs a cost for changing API) or continuing the 
>>> java semantics here.
>>> 
>>> This is why having these discussions explicitly in the community before 
>>> a release is so critical, in my view.  It’s very easy for these semantic 
>>> changes to go unnoticed on a JIRA, and then ossify.
>>> 
>>> 
>>>> On 2 Oct 2018, at 15:48, Ariel Weisberg <ar...@weisberg.ws> wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> I think we should decide based on what is least surprising as you mention, 
>>>> but isn't overridden by some other concern.
>>>> 
>>>> It seems to me the priorities are
>>>> 
>>>> * Correctness
>>>> * Performance
>>>> * User visible complexity
>>>> * Developer visible complexity
>>>> 
>>>> Defaulting to silent implicit data loss is not ideal from a correctness 
>>>> standpoint.
>>>> 
>>>> Doing something better like using wider types doesn't seem like a 
>>>> performance issue.
>>>> 
>>>> From a user standpoint doing something less lossy doesn't look more 
>>>> complex as long as it's consistent, and documented and doesn't change from 
>>>> version to version.
>>>> 
>>>> There is some developer complexity, but this is a public API and we only 
>>>> get one shot at this. 
>>>> 
>>>> I wonder about how overflow is handled as well. In VoltDB I think we threw 
>>>> on overflow and tended to just do widening conversions to make that less 
>>>> common. We didn't imitate another database (as far as I know) we just went 
>>>> with what least likely to silently corrupt data.
>>>> https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L2213
>>>>  
>>>> <https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L2213>
>>>> https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L3764
>>>>  
>>>> <https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L3764>
>>>> 
>>>> Ariel
>>>> 
>>>>> On Tue, Oct 2, 2018, at 7:30 AM, Benedict Elliott Smith wrote:
>>>>> ç introduced arithmetic operators, and alongside these 
>>>>> came implicit casts for their operands.  There is a semantic decision to 
>>>>> be made, and I think the project would do well to explicitly raise this 
>>>>> kind of question for wider input before release, since the project is 
>>>>> bound by them forever more.
>>>>> 
>>>>> In this case, the choice is between lossy and lossless casts for 
>>>>> operations involving integers and floating point numbers.  In essence, 
>>>>> should:
>>>>> 
>>>>> (1) float + int = float, double + bigint = double; or
>>>>> (2) float + int = double, double + bigint = decimal; or
>>>>> (3) float + int = decimal, double + bigint = decimal
>>>>> 
>>>>> Option 1 performs a lossy implicit cast from int -> float, or bigint -> 
>>>>> double.  Simply casting between these types changes the value.  This is 
>>>>> what MS SQL Server does.
>>>>> Options 2 and 3 cast without loss of precision, and 3 (or thereabouts) 
>>>>> is what PostgreSQL does.
>>>>> 
>>>>> The question I’m interested in is not just which is the right decision, 
>>>>> but how the right decision should be arrived at.  My view is that we 
>>>>> should primarily aim for least surprise to the user, but I’m keen to 
>>>>> hear from others.
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org 
>>>>> <mailto:dev-unsubscr...@cassandra.apache.org>
>>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org 
>>>>> <mailto:dev-h...@cassandra.apache.org>
>>>>> 
>>>> 
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org 
>>>> <mailto:dev-unsubscr...@cassandra.apache.org>
>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org 
>>>> <mailto:dev-h...@cassandra.apache.org>
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org

Reply via email to