Re: [DISCUSS] Vector type and empty value

J. D. Jordan Tue, 19 Sep 2023 15:34:32 -0700

When does empty mean null?  My understanding was that empty is a valid value 
for the types that support it, separate from null (aka a tombstone). Do we have 
types where writing an empty value creates a tombstone?


I agree with David that my preference would be for only blob and string-like 
types to support empty. It’s too late for the existing types, but we should 
hold to this going forward. Which is what I think the idea was in 
https://issues.apache.org/jira/browse/CASSANDRA-8951 as well?  That it was sad 
the existing numerics were emptiable, but too late to change, and we could 
correct it for newer types.
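
To make the distinction above concrete, here is a minimal Java sketch. It is 
not Cassandra code: it treats a serialized value as a ByteBuffer, uses a null 
reference to stand in for an absent value (a tombstone, in Cassandra terms), 
and a zero-length buffer to stand in for "empty". The class name EmptyVsNull 
and the method describe are hypothetical, introduced only for illustration.

```java
import java.nio.ByteBuffer;

public class EmptyVsNull {
    // Hypothetical helper: classifies a serialized value as null, empty, or present.
    public static String describe(ByteBuffer value) {
        if (value == null) {
            return "null";                       // no value at all
        }
        if (!value.hasRemaining()) {
            return "empty";                      // a value of zero bytes
        }
        return "value(" + value.remaining() + " bytes)";
    }

    public static void main(String[] args) {
        System.out.println(describe(null));                                     // null
        System.out.println(describe(ByteBuffer.allocate(0)));                   // empty
        System.out.println(describe(ByteBuffer.wrap(new byte[] {0, 0, 0, 1}))); // value(4 bytes)
    }
}
```

Whether a given type should accept the "empty" case at all is exactly the 
question under discussion in this thread.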

> On Sep 19, 2023, at 12:12 PM, David Capwell <[email protected]> wrote:
> 
> 
>> 
>> When we introduced TINYINT and SMALLINT (CASSANDRA-8951) we started making 
>> types non-emptiable. This approach makes more sense to me as having to deal 
>> with empty values is error-prone in my opinion.
> 
> I agree it’s confusing, and in the patch I found that different code paths 
> didn’t handle things correctly as we have some types (most) that support 
> empty bytes, and some that do not…. Empty also has different meaning in 
> different code paths; for most it means “null”, and for some other types it 
> means “empty”…. To try to make things more clear I added 
> org.apache.cassandra.db.marshal.AbstractType#isNull(V, 
> org.apache.cassandra.db.marshal.ValueAccessor<V>) to the type system so each 
> type can define if empty is null or not.
> 
>> I also think that it would be good to standardize on one approach to avoid 
>> confusion.
> 
> I agree, but also don’t feel it’s a perfect one-size-fits-all thing…. Let’s 
> say I have a “blob” type and I write an empty byte… what does this mean?  
> What does it mean for "text" type?  The fact I get back a null in both those 
> cases was very confusing to me… I do feel that some types should support 
> empty, and the common code of empty == null I think is very brittle 
> (blob/text was not correct in different places due to this...)… so I am cool 
> with removing that relationship, but don’t think we should have a rule 
> blocking empty for all current / future types as it sometimes does make 
> sense.
> 
>> empty vector (I presume) for the vector type?
> 
> Empty vectors (vector[0]) are blocked at the type level, the smallest vector 
> is vector[1]
> 
>> as types that can never be null
> 
> One pro here is that “null” is cheaper (in some regards) than delete (though 
> we can never purge), but having 2 similar behaviors (write null, do a delete) 
> at the type level is a bit confusing… Right now I am allowed to do the 
> following (the below isn’t valid CQL, it’s a hybrid of CQL + Java code…)
> 
> CREATE TABLE fluffykittens (pk int primary key, cuteness int);
> INSERT INTO fluffykittens (pk, cuteness) VALUES (0, new byte[0]);
> 
> CREATE TABLE typesarehard (pk1 int, pk2 int, cuteness int, PRIMARY KEY ((pk1, 
> pk2)));
> INSERT INTO typesarehard (pk1, pk2, cuteness) VALUES (new byte[0], new 
> byte[0], new byte[0]); -- valid, as the partition key is not empty: it’s a 
> composite of 2 empty values, this is the same as new byte[2]
> 
> The first time I ever found out that empty bytes were valid was when a user 
> was trying to abuse this in collections (also the fact collections support 
> null in some cases and not others is fun…)…. It was blowing up in random 
> places… good times!
> 
> I am personally not in favor of allowing empty bytes (other than for blob / 
> text as that is actually valid for the domain), but similar types having 
> different semantics is, I feel, more problematic...
> 
>>> On Sep 19, 2023, at 8:56 AM, Josh McKenzie <[email protected]> wrote:
>>> 
>>> I am strongly in favour of permitting the table definition forbidding nulls 
>>> - and perhaps even defaulting to this behaviour. But I don’t think we 
>>> should have types that are inherently incapable of being null.
>> I'm with Benedict. Seems like this could help prevent whatever "nulls in 
>> primary key columns" problems Aleksey was alluding to on those tickets back 
>> in the day that pushed us towards making the new types non-emptiable as well 
>> (i.e. primary keys are non-null in table definition).
>> 
>> Furthering Alex' question, having a default value for unset fields in any 
>> non-collection context seems... quite surprising to me in a database. I 
>> could see the argument for making container / collection types non-nullable, 
>> maybe, but that just keeps us in a potential straddle case (some types 
>> nullable, some not).
>> 
>>> On Tue, Sep 19, 2023, at 8:22 AM, Benedict wrote:
>>> 
>>> If I understand this suggestion correctly it is a whole can of worms, as 
>>> types that can never be null prevent us ever supporting outer joins that 
>>> return these types.
>>> 
>>> I am strongly in favour of permitting the table definition forbidding nulls 
>>> - and perhaps even defaulting to this behaviour. But I don’t think we 
>>> should have types that are inherently incapable of being null. I also 
>>> certainly don’t think we should have bifurcated our behaviour between types 
>>> like this.
>>> 
>>> 
>>> 
>>>> On 19 Sep 2023, at 11:54, Alex Petrov <[email protected]> wrote:
>>>> 
>>>> To make sure I understand this right; does that mean there will be a 
>>>> default value for unset fields? Like 0 for numerical values, and an empty 
>>>> vector (I presume) for the vector type?
>>>> 
>>>> On Fri, Sep 15, 2023, at 11:46 AM, Benjamin Lerer wrote:
>>>>> Hi everybody,
>>>>> 
>>>>> I noticed that the new Vector type accepts empty ByteBuffer values as an 
>>>>> input representing null.
>>>>> When we introduced TINYINT and SMALLINT (CASSANDRA-8951) we started making 
>>>>> types non-emptiable. This approach makes more sense to me as having to 
>>>>> deal with empty values is error-prone in my opinion.
>>>>> I also think that it would be good to standardize on one approach to 
>>>>> avoid confusion.
>>>>> 
>>>>> Should we make the Vector type non-emptiable and stick to it for the new 
>>>>> types?
>>>>> 
>>>>> I would like to hear your opinions.
> 
>
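
The per-type hook David mentions in the quoted message 
(AbstractType#isNull(V, ValueAccessor<V>)) can be pictured with the standalone 
Java sketch below. It does not use Cassandra's classes or reproduce the actual 
patch: SimpleType, IntLikeType and BlobLikeType are hypothetical stand-ins, and 
which type treats empty bytes as null is an assumption that simply mirrors the 
behaviour described in this thread.

```java
import java.nio.ByteBuffer;

public class PerTypeNull {
    // Hypothetical stand-in for a type in the type system.
    public interface SimpleType {
        // Default rule: only an absent value counts as null.
        default boolean isNull(ByteBuffer value) {
            return value == null;
        }
    }

    // Assumed "emptiable numeric" behaviour: zero bytes are read as null.
    public static final class IntLikeType implements SimpleType {
        @Override
        public boolean isNull(ByteBuffer value) {
            return value == null || !value.hasRemaining();
        }
    }

    // Assumed blob/text behaviour: zero bytes are a legitimate value.
    public static final class BlobLikeType implements SimpleType {}

    public static void main(String[] args) {
        ByteBuffer empty = ByteBuffer.allocate(0);
        System.out.println(new IntLikeType().isNull(empty));   // true
        System.out.println(new BlobLikeType().isNull(empty));  // false
    }
}
```

Putting the decision on the type, rather than in shared code that assumes 
empty == null, is the design point David raises: each type states its own rule 
in one place.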
