Re: [DISCUSS] Vector type and empty value

David Capwell Tue, 19 Sep 2023 10:12:25 -0700

> When we introduced TINYINT and SMALLINT (CASSANDRA-895) we started making 
> types non-emptiable. This approach makes more sense to me as having to deal 
> with empty values is error-prone in my opinion.

I agree it’s confusing. In the patch I found that different code paths didn’t 
handle things correctly, because some types (most of them) support empty bytes 
and some do not. Empty also has different meanings in different code paths: 
for most types it means “null”, while for some others it means “empty”. To make 
this clearer I added 
org.apache.cassandra.db.marshal.AbstractType#isNull(V, 
org.apache.cassandra.db.marshal.ValueAccessor<V>) to the type system, so each 
type can define whether empty is null or not.
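A minimal sketch of the idea, assuming a simplified type hierarchy: the class names `SketchType`, `LegacyInt32Type` and `BytesValueType` are hypothetical, and values are modelled as `byte[]` (with a Java `null` reference for an absent value) instead of Cassandra's generic `V` plus `ValueAccessor<V>`. It only illustrates letting each type decide whether empty bytes mean null; it is not Cassandra's implementation.

```java
// Hypothetical, simplified stand-in for a per-type "is this value null?" hook.
// Real Cassandra uses AbstractType#isNull(V, ValueAccessor<V>); here values are byte[].
abstract class SketchType {
    /** Returns true if the serialized value should be treated as null. */
    abstract boolean isNull(byte[] value);
}

// Legacy-style fixed-width type: empty bytes are interpreted as null.
final class LegacyInt32Type extends SketchType {
    @Override
    boolean isNull(byte[] value) {
        return value == null || value.length == 0;
    }
}

// Blob/text-style type: empty bytes are a real, non-null value
// (an empty blob or an empty string); only a missing value is null.
final class BytesValueType extends SketchType {
    @Override
    boolean isNull(byte[] value) {
        return value == null;
    }
}
```

In this sketch, code that used to check `value.length == 0` directly asks the type instead, so an empty blob is no longer read back as null while the legacy int behaviour is unchanged.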

> I also think that it would be good to standardize on one approach to avoid 
> confusion.

I agree, but I also don’t feel it’s a perfect one-size-fits-all thing. Let’s say 
I have a “blob” type and I write empty bytes: what does this mean? What does it 
mean for the "text" type? The fact that I get back a null in both of those cases 
was very confusing to me. I do feel that some types should support empty, and I 
think the common convention of empty == null is very brittle (blob/text were 
handled incorrectly in several places because of this). So I am cool with 
removing that relationship, but I don’t think we should have a rule blocking 
empty for all current / future types, as it sometimes does make sense.

> empty vector (I presume) for the vector type?

Empty vectors (vector[0]) are blocked at the type level; the smallest vector is 
vector[1].
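As a rough illustration of what blocking vector[0] and rejecting empty input could look like, here is a hypothetical `FloatVectorSketch` that assumes fixed-size float32 elements (4 bytes each). It is not Cassandra's `VectorType`; it only shows why a fixed-dimension type has no natural meaning for empty bytes: in this sketch a valid value always has exactly `dimension * 4` bytes.

```java
// Hypothetical fixed-dimension vector of float32 elements (not Cassandra's VectorType).
final class FloatVectorSketch {
    private final int dimension;

    FloatVectorSketch(int dimension) {
        // Block vector[0] at the type level: the smallest allowed vector is vector[1].
        if (dimension < 1) {
            throw new IllegalArgumentException("vector dimension must be >= 1, got " + dimension);
        }
        this.dimension = dimension;
    }

    int dimension() {
        return dimension;
    }

    // Non-emptiable validation: an empty buffer is rejected instead of being read as null,
    // because every valid value has exactly dimension * Float.BYTES bytes.
    // Assumes absent (null) values are handled before validate() is called.
    void validate(byte[] value) {
        int expected = dimension * Float.BYTES;
        if (value.length != expected) {
            throw new IllegalArgumentException(
                "expected " + expected + " bytes for vector[" + dimension + "], got " + value.length);
        }
    }
}
```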

>  as types that can never be null

One pro here is that “null” is cheaper (in some regards) than a delete (though 
it can never be purged), but having 2 similar behaviors (write null, do a 
delete) at the type level is a bit confusing. Right now I am allowed to do the 
following (the below isn’t valid CQL; it's a hybrid of CQL + Java code):

CREATE TABLE fluffykittens (pk int primary key, cuteness int);
INSERT INTO fluffykittens (pk, cuteness) VALUES (0, new byte[0]);

CREATE TABLE typesarehard (pk1 int, pk2 int, cuteness int, PRIMARY KEY ((pk1, pk2)));
INSERT INTO typesarehard (pk1, pk2, cuteness) VALUES (new byte[0], new byte[0], new byte[0]);
-- valid, as the partition key is not empty: it is a composite of 2 empty
-- values, which is the same as new byte[2]
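To see why a composite partition key can be non-empty even though each component is empty, here is a sketch using a hypothetical encoding (`CompositeSketch` is not a Cassandra class) in which every component is written as a 2-byte big-endian length followed by its bytes. Cassandra's actual composite serialization differs in its details, so the byte count here need not match the new byte[2] mentioned above; the point is only that the per-component framing alone makes the key non-empty.

```java
import java.nio.ByteBuffer;

// Hypothetical composite-key encoding (not Cassandra's exact CompositeType format):
// each component is written as a 2-byte big-endian length followed by its bytes.
final class CompositeSketch {
    static byte[] compose(byte[]... components) {
        int size = 0;
        for (byte[] c : components) {
            size += Short.BYTES + c.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(size); // ByteBuffer is big-endian by default
        for (byte[] c : components) {
            buf.putShort((short) c.length); // sketch assumes each component is < 64 KiB
            buf.put(c);
        }
        return buf.array();
    }
}
```

Two empty `int` components therefore produce a 4-byte key in this sketch, which would pass a "partition key must not be empty" check even though neither component carries a value.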

The first time I ever found out that empty bytes were valid was when a user was 
trying to abuse this in collections (also, the fact that collections support 
null in some cases and not others is fun…). It was blowing up in random 
places… good times!

I am personally not in favor of allowing empty bytes (other than for blob / 
text, as that is actually valid for the domain), but I feel that having similar 
types with different semantics is more problematic...

> On Sep 19, 2023, at 8:56 AM, Josh McKenzie <jmcken...@apache.org> wrote:
> 
>> I am strongly in favour of permitting the table definition forbidding nulls 
>> - and perhaps even defaulting to this behaviour. But I don’t think we should 
>> have types that are inherently incapable of being null.
> I'm with Benedict. Seems like this could help prevent whatever "nulls in 
> primary key columns" problems Aleksey was alluding to on those tickets back 
> in the day that pushed us towards making the new types non-emptiable as well 
> (i.e. primary keys are non-null in table definition).
> 
> Furthering Alex's question, having a default value for unset fields in any 
> non-collection context seems... quite surprising to me in a database. I could 
> see the argument for making container / collection types non-nullable, maybe, 
> but that just keeps us in a potential straddle case (some types nullable, 
> some not).
> 
> On Tue, Sep 19, 2023, at 8:22 AM, Benedict wrote:
>> 
>> If I understand this suggestion correctly it is a whole can of worms, as 
>> types that can never be null prevent us ever supporting outer joins that 
>> return these types.
>> 
>> I am strongly in favour of permitting the table definition forbidding nulls 
>> - and perhaps even defaulting to this behaviour. But I don’t think we should 
>> have types that are inherently incapable of being null. I also certainly 
>> don’t think we should have bifurcated our behaviour between types like this.
>> 
>>> On 19 Sep 2023, at 11:54, Alex Petrov <al...@coffeenco.de> wrote:
>>> 
>>> To make sure I understand this right; does that mean there will be a 
>>> default value for unset fields? Like 0 for numerical values, and an empty 
>>> vector (I presume) for the vector type?
>>> 
>>> On Fri, Sep 15, 2023, at 11:46 AM, Benjamin Lerer wrote:
>>>> Hi everybody,
>>>> 
>>>> I noticed that the new Vector type accepts empty ByteBuffer values as an 
>>>> input representing null.
>>>> When we introduced TINYINT and SMALLINT (CASSANDRA-895) we started making 
>>>> types non-emptiable. This approach makes more sense to me as having to 
>>>> deal with empty values is error-prone in my opinion.
>>>> I also think that it would be good to standardize on one approach to avoid 
>>>> confusion.
>>>> 
>>>> Should we make the Vector type non-emptiable and stick to it for the new 
>>>> types?
>>>> 
>>>> I'd like to hear your opinions.
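Benedict's alternative, forbidding nulls in the table definition rather than making types inherently non-nullable, could look roughly like the sketch below. `ColumnSketch`, `TableSketch` and the `notNull` flag are hypothetical names, not existing Cassandra schema APIs. Nullability lives on the column, so the type itself stays nullable (and could, for example, still be returned as null by a future outer join), while writes to a `notNull` column reject a missing or null value.

```java
import java.util.List;
import java.util.Map;

// Hypothetical column definition: nullability is a property of the column,
// not of the column's type.
final class ColumnSketch {
    final String name;
    final boolean notNull;

    ColumnSketch(String name, boolean notNull) {
        this.name = name;
        this.notNull = notNull;
    }
}

// Hypothetical table definition that enforces NOT NULL columns on write.
final class TableSketch {
    private final List<ColumnSketch> columns;

    TableSketch(List<ColumnSketch> columns) {
        this.columns = columns;
    }

    // Throws if a NOT NULL column is missing from the row or explicitly set to null.
    void validateWrite(Map<String, byte[]> row) {
        for (ColumnSketch c : columns) {
            if (c.notNull && row.get(c.name) == null) {
                throw new IllegalArgumentException("column " + c.name + " is NOT NULL");
            }
        }
    }
}
```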
