Re: [DISCUSS] Vector type and empty value

Benedict Wed, 20 Sep 2023 08:02:08 -0700

Yes, if this is what was meant by empty I agree. It’s nonsensical for most 
types. Apologies for any confusion.


I don’t think we can readily migrate old types away from this however, without 
breaking backwards compatibility. We can only prevent its use in the CQL layer 
where support isn’t required. My understanding was that we had at least tried 
to do this for all non-thrift schemas, but perhaps we did not do so thoroughly 
and now may have some CQL legacy support requirements as well.

> On 20 Sep 2023, at 15:30, Aleksey Yeshchenko <alek...@apple.com> wrote:
> 
> Allowing zero-length byte arrays for most old types is just a legacy from 
> Darker Days. It’s a distinct concern from columns being nullable or not.
> 
> There are a couple types where this makes sense: strings and blobs. All else 
> should not allow this except for backward compatibility reasons. So, not for 
> new types.
> 
>>> On 20 Sep 2023, at 00:08, David Capwell <dcapw...@apple.com> wrote:
>>> 
>>> When does empty mean null?
>> 
>> 
>> Most types are this way
>> 
>> @Test
>> public void nullExample()
>> {
>> createTable("CREATE TABLE %s (pk int primary key, cuteness int)");
>> execute("INSERT INTO %s (pk, cuteness) VALUES (0, ?)", ByteBuffer.wrap(new 
>> byte[0]));
>> Row result = execute("SELECT * FROM %s WHERE pk=0").one();
>> if (result.has("cuteness")) System.out.println("Cuteness score: " + 
>> result.getInt("cuteness"));
>> else System.out.println("Cuteness score is undefined");
>> }
>> 
>> 
>> This test will NPE in getInt as the returned BB is seen as “null” for int32 
>> type, you can make it “safer” by changing to the following
>> 
>> if (result.has("cuteness")) System.out.println("Cuteness score: " + 
>> Int32Type.instance.compose(result.getBlob("cuteness")));
>> 
>> Now we get the log "Cuteness score: null”
>> 
>> What’s even better (just found this out) is that client isn’t consistent or 
>> correct in these cases!
>> 
>> com.datastax.driver.core.Row result = executeNet(ProtocolVersion.CURRENT, 
>> "SELECT * FROM %s WHERE pk=0").one();
>> if (result.getBytesUnsafe("cuteness") != null) System.out.println("Cuteness 
>> score: " + result.getInt("cuteness"));
>> else System.out.println("Cuteness score is undefined”);
>> 
>> This prints "Cuteness score: 0”
>> 
>> So for Cassandra we think the value is “null” but java driver thinks it’s 0?
>> 
>>> Do we have types where writing an empty value creates a tombstone?
>> 
>> Empty does not generate a tombstone for any type, but empty has a similar 
>> user experience as we return null in both cases (but just found out that the 
>> drivers may not be consistent with this…)
>> 
>>>> On Sep 19, 2023, at 3:33 PM, J. D. Jordan <jeremiah.jor...@gmail.com> 
>>>> wrote:
>>> 
>>> When does empty mean null?  My understanding was that empty is a valid 
>>> value for the types that support it, separate from null (aka a tombstone). 
>>> Do we have types where writing an empty value creates a tombstone?
>>> 
>>> I agree with David that my preference would be for only blob and string 
>>> like types to support empty. It’s too late for the existing types, but we 
>>> should hold to this going forward. Which is what I think the idea was in 
>>> https://issues.apache.org/jira/browse/CASSANDRA-8951 as well?  That it was 
>>> sad the existing numerics were emptiable, but too late to change, and we 
>>> could correct it for newer types.
>>> 
>>>> On Sep 19, 2023, at 12:12 PM, David Capwell <dcapw...@apple.com> wrote:
>>>> 
>>>> 
>>>>> 
>>>>> When we introduced TINYINT and SMALLINT (CASSANDRA-8951) we started 
>>>>> making types non -emptiable. This approach makes more sense to me as 
>>>>> having to deal with empty value is error prone in my opinion.
>>>> 
>>>> I agree it’s confusing, and in the patch I found that different code paths 
>>>> didn’t handle things correctly as we have some times (most) that support 
>>>> empty bytes, and some that do not…. Empty also has different meaning in 
>>>> different code paths; for most it means “null”, and for some other types 
>>>> it means “empty”…. To try to make things more clear I added 
>>>> org.apache.cassandra.db.marshal.AbstractType#isNull(V, 
>>>> org.apache.cassandra.db.marshal.ValueAccessor<V>) to the type system so 
>>>> each type can define if empty is null or not.
>>>> 
>>>>> I also think that it would be good to standardize on one approach to 
>>>>> avoid confusion.
>>>> 
>>>> I agree, but also don’t feel it’s a perfect one-size-fits-all thing…. 
>>>> Let’s say I have a “blob” type and I write an empty byte… what does this 
>>>> mean?  What does it mean for "text" type?  The fact I get back a null in 
>>>> both those cases was very confusing to me… I do feel that some types 
>>>> should support empty, and the common code of empty == null I think is very 
>>>> brittle (blob/text was not correct in different places due to this...)… so 
>>>> I am cool with removing that relationship, but don’t think we should have 
>>>> a rule blocking empty for all current / future types as it some times does 
>>>> make sense.
>>>> 
>>>>> empty vector (I presume) for the vector type?
>>>> 
>>>> Empty vectors (vector[0]) are blocked at the type level, the smallest 
>>>> vector is vector[1]
>>>> 
>>>>> as types that can never be null
>>>> 
>>>> One pro here is that “null” is cheaper (in some regards) than delete 
>>>> (though we can never purge), but having 2 similar behaviors (write null, 
>>>> do a delete) at the type level is a bit confusing… Right now I am allowed 
>>>> to do the following (the below isn’t valid CQL, its a hybrid of CQL + Java 
>>>> code…)
>>>> 
>>>> CREATE TABLE fluffykittens (pk int primary key, cuteness int);
>>>> INSERT INTO fluffykittens (pk, cuteness) VALUES (0, new byte[0])
>>>> 
>>>> CREATE TABLE typesarehard (pk1 int, pk2 int, cuteness int, PRIMARY KEY 
>>>> ((pk1, pk2));
>>>> INSERT INTO typesarehard (pk1, pk2, cuteness) VALUES (new byte[0], new 
>>>> byte[0], new byte[0]) — valid as the partition key is not empty as its a 
>>>> composite of 2 empty values, this is the same as new byte[2]
>>>> 
>>>> The first time I ever found out that empty bytes was valid was when a user 
>>>> was trying to abuse this in collections (also the fact collections support 
>>>> null in some cases and not others is fun…)…. It was blowing up in random 
>>>> places… good times!
>>>> 
>>>> I am personally not in favor of allowing empty bytes (other than for blob 
>>>> / text as that is actually valid for the domain), but having similar types 
>>>> having different semantics I feel is more problematic...
>>>> 
>>>>> On Sep 19, 2023, at 8:56 AM, Josh McKenzie <jmcken...@apache.org> wrote:
>>>>> 
>>>>>> I am strongly in favour of permitting the table definition forbidding 
>>>>>> nulls - and perhaps even defaulting to this behaviour. But I don’t think 
>>>>>> we should have types that are inherently incapable of being null.
>>>>> I'm with Benedict. Seems like this could help prevent whatever "nulls in 
>>>>> primary key columns" problems Aleksey was alluding to on those tickets 
>>>>> back in the day that pushed us towards making the new types non-emptiable 
>>>>> as well (i.e. primary keys are non-null in table definition).
>>>>> 
>>>>> Furthering Alex' question, having a default value for unset fields in any 
>>>>> non-collection context seems... quite surprising to me in a database. I 
>>>>> could see the argument for making container / collection types 
>>>>> non-nullable, maybe, but that just keeps us in a potential straddle case 
>>>>> (some types nullable, some not).
>>>>> 
>>>>> On Tue, Sep 19, 2023, at 8:22 AM, Benedict wrote:
>>>>>> 
>>>>>> If I understand this suggestion correctly it is a whole can of worms, as 
>>>>>> types that can never be null prevent us ever supporting outer joins that 
>>>>>> return these types.
>>>>>> 
>>>>>> I am strongly in favour of permitting the table definition forbidding 
>>>>>> nulls - and perhaps even defaulting to this behaviour. But I don’t think 
>>>>>> we should have types that are inherently incapable of being null. I also 
>>>>>> certainly don’t think we should have bifurcated our behaviour between 
>>>>>> types like this.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On 19 Sep 2023, at 11:54, Alex Petrov <al...@coffeenco.de> wrote:
>>>>>>> 
>>>>>>> To make sure I understand this right; does that mean there will be a 
>>>>>>> default value for unset fields? Like 0 for numerical values, and an 
>>>>>>> empty vector (I presume) for the vector type?
>>>>>>> 
>>>>>>> On Fri, Sep 15, 2023, at 11:46 AM, Benjamin Lerer wrote:
>>>>>>>> Hi everybody,
>>>>>>>> 
>>>>>>>> I noticed that the new Vector type accepts empty ByteBuffer values as 
>>>>>>>> an input representing null.
>>>>>>>> When we introduced TINYINT and SMALLINT (CASSANDRA-895) we started 
>>>>>>>> making types non -emptiable. This approach makes more sense to me as 
>>>>>>>> having to deal with empty value is error prone in my opinion.
>>>>>>>> I also think that it would be good to standardize on one approach to 
>>>>>>>> avoid confusion.
>>>>>>>> 
>>>>>>>> Should we make the Vector type non-emptiable and stick to it for the 
>>>>>>>> new types?
>>>>>>>> 
>>>>>>>> I like to hear your opinion.
>>>> 
>>>> 
>> 
>

Re: [DISCUSS] Vector type and empty value

Reply via email to