Re: [DISCUSS] Vector type and empty value

Andrés de la Peña Fri, 22 Sep 2023 06:03:32 -0700

I have just created CASSANDRA-18876 for this. I'll post a patch very soon.

On Wed, 20 Sept 2023 at 19:41, David Capwell <dcapw...@apple.com> wrote:


> I don’t think we can readily migrate old types away from this however,
> without breaking backwards compatibility.
>
>
> Given that the Java driver behaves differently from the server, I wouldn’t be
> shocked to see that other drivers also have their own custom behaviors… so
> it’s not clear how to migrate unless we actually had a user-facing standard
> per type… If all drivers use a “default value” and are consistent, I do think
> we could migrate, but we would need to live with this until at least 6.0+
>
> We can only prevent its use in the CQL layer where support isn’t required.
>
>
> +1
>
> On Sep 20, 2023, at 7:38 AM, Benedict <bened...@apache.org> wrote:
>
> Yes, if this is what was meant by empty I agree. It’s nonsensical for most
> types. Apologies for any confusion.
>
> I don’t think we can readily migrate old types away from this however,
> without breaking backwards compatibility. We can only prevent its use in
> the CQL layer where support isn’t required. My understanding was that we
> had at least tried to do this for all non-Thrift schemas, but perhaps we
> did not do so thoroughly and now may have some CQL legacy support
> requirements as well.
>
> On 20 Sep 2023, at 15:30, Aleksey Yeshchenko <alek...@apple.com> wrote:
>
> Allowing zero-length byte arrays for most old types is just a legacy from
> Darker Days. It’s a distinct concern from columns being nullable or not.
>
> There are a couple types where this makes sense: strings and blobs. All
> else should not allow this except for backward compatibility reasons. So,
> not for new types.
>
> On 20 Sep 2023, at 00:08, David Capwell <dcapw...@apple.com> wrote:
>
> When does empty mean null?
>
>
>
> Most types are this way
>
> @Test
> public void nullExample()
> {
>     createTable("CREATE TABLE %s (pk int primary key, cuteness int)");
>     // Bind an empty (zero-length) value for the int column
>     execute("INSERT INTO %s (pk, cuteness) VALUES (0, ?)", ByteBuffer.wrap(new byte[0]));
>     Row result = execute("SELECT * FROM %s WHERE pk=0").one();
>     if (result.has("cuteness"))
>         System.out.println("Cuteness score: " + result.getInt("cuteness"));
>     else
>         System.out.println("Cuteness score is undefined");
> }
>
>
> This test will NPE in getInt because the returned ByteBuffer is treated as
> “null” by the int32 type. You can make it “safer” by changing it to the
> following:
>
> if (result.has("cuteness"))
>     System.out.println("Cuteness score: " +
>                        Int32Type.instance.compose(result.getBlob("cuteness")));
>
> Now we get the log “Cuteness score: null”.
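To make the mechanism above concrete without depending on Cassandra, here is a self-contained Java sketch. It is a hypothetical model, not Cassandra's Int32Type: composeInt32 maps an empty buffer to null (as the thread describes for the int32 type), and getInt shows that unboxing that null into a primitive int is what throws the NullPointerException. Both method names are invented for illustration.

```java
import java.nio.ByteBuffer;

// Hypothetical model of the behaviour described above; not Cassandra's Int32Type.
final class EmptyInt32Demo {
    // Decode a 4-byte big-endian int, treating an empty buffer as "null".
    static Integer composeInt32(ByteBuffer bytes) {
        if (bytes == null || !bytes.hasRemaining()) {
            return null; // an empty value surfaces as null
        }
        return bytes.getInt(bytes.position()); // absolute read; buffer position untouched
    }

    // Mirrors a getInt-style accessor that unboxes the composed value.
    static int getInt(ByteBuffer bytes) {
        return composeInt32(bytes); // auto-unboxing: NPE when composeInt32 returns null
    }

    public static void main(String[] args) {
        ByteBuffer empty = ByteBuffer.wrap(new byte[0]);
        System.out.println("compose: " + composeInt32(empty)); // prints "compose: null"
        try {
            getInt(empty);
        } catch (NullPointerException e) {
            System.out.println("getInt threw NPE");
        }
    }
}
```

The "safer" variant in the thread corresponds to calling composeInt32 directly and concatenating the boxed result, which is why it logs null instead of throwing.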
>
> What’s even better (just found this out) is that the client isn’t consistent
> or correct in these cases!
>
> com.datastax.driver.core.Row result = executeNet(ProtocolVersion.CURRENT,
>                                                  "SELECT * FROM %s WHERE pk=0").one();
> if (result.getBytesUnsafe("cuteness") != null)
>     System.out.println("Cuteness score: " + result.getInt("cuteness"));
> else
>     System.out.println("Cuteness score is undefined");
>
> This prints “Cuteness score: 0”.
>
> So on the Cassandra side we think the value is “null”, but the Java driver
> thinks it’s 0?
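The discrepancy can be reproduced in miniature. The sketch below is a hypothetical side-by-side model, not code from either project: serverDecode follows the server behaviour described above (empty means null) and driverDecode follows what the thread observed from the Java driver (empty decodes as 0). Both names are invented for illustration.

```java
import java.nio.ByteBuffer;

// Hypothetical model of the two decoding conventions observed above.
final class EmptyDecodingMismatch {
    // Server-style: an empty buffer is reported as "no value".
    static Integer serverDecode(ByteBuffer bytes) {
        if (bytes == null || !bytes.hasRemaining()) {
            return null;
        }
        return Integer.valueOf(bytes.getInt(bytes.position()));
    }

    // Driver-style (as observed): an empty buffer decodes to the primitive default, 0.
    static int driverDecode(ByteBuffer bytes) {
        if (bytes == null || !bytes.hasRemaining()) {
            return 0;
        }
        return bytes.getInt(bytes.position());
    }

    public static void main(String[] args) {
        ByteBuffer empty = ByteBuffer.wrap(new byte[0]);
        System.out.println("server: " + serverDecode(empty)); // prints "server: null"
        System.out.println("driver: " + driverDecode(empty)); // prints "driver: 0"
    }
}
```

The same bytes on the wire yield two different user-visible answers, which is the inconsistency being questioned.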
>
> Do we have types where writing an empty value creates a tombstone?
>
>
> Empty does not generate a tombstone for any type, but empty has a similar
> user experience, as we return null in both cases (though I just found out
> that the drivers may not be consistent with this…)
>
> On Sep 19, 2023, at 3:33 PM, J. D. Jordan <jeremiah.jor...@gmail.com> wrote:
>
>
> When does empty mean null?  My understanding was that empty is a valid
> value for the types that support it, separate from null (aka a tombstone).
> Do we have types where writing an empty value creates a tombstone?
>
> I agree with David that my preference would be for only blob and string-like
> types to support empty. It’s too late for the existing types, but we
> should hold to this going forward. Which is what I think the idea was in
> https://issues.apache.org/jira/browse/CASSANDRA-8951 as well?  That it
> was sad the existing numerics were emptiable, but too late to change, and
> we could correct it for newer types.
>
> On Sep 19, 2023, at 12:12 PM, David Capwell <dcapw...@apple.com> wrote:
>
>
>
>
> When we introduced TINYINT and SMALLINT (CASSANDRA-8951) we started making
> types non-emptiable. This approach makes more sense to me, as having to
> deal with empty values is error-prone in my opinion.
>
>
> I agree it’s confusing, and in the patch I found that different code paths
> didn’t handle things correctly, as we have some types (most) that support
> empty bytes and some that do not… Empty also has a different meaning in
> different code paths; for most types it means “null”, and for some other
> types it means “empty”… To try to make things clearer I added
> org.apache.cassandra.db.marshal.AbstractType#isNull(V,
> org.apache.cassandra.db.marshal.ValueAccessor<V>) to the type system so
> each type can define whether empty is null or not.
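The idea of letting each type decide what an empty value means can be sketched roughly as follows. This is a heavily simplified, hypothetical model: the real AbstractType#isNull(V, ValueAccessor<V>) is generic over the value representation, whereas SimpleType, isNull(ByteBuffer), IntLikeType and TextLikeType below are invented names working on plain ByteBuffers.

```java
import java.nio.ByteBuffer;

// Hypothetical, simplified stand-in for a type hierarchy with a per-type isNull hook.
interface SimpleType {
    // Default for most legacy types: an empty value is treated as null.
    default boolean isNull(ByteBuffer value) {
        return value == null || !value.hasRemaining();
    }
}

// An int-like type keeps the default: empty means null.
final class IntLikeType implements SimpleType {}

// A text/blob-like type: empty is a legitimate value (e.g. ""), only a missing buffer is null.
final class TextLikeType implements SimpleType {
    @Override
    public boolean isNull(ByteBuffer value) {
        return value == null;
    }
}

final class IsNullDemo {
    public static void main(String[] args) {
        ByteBuffer empty = ByteBuffer.allocate(0);
        System.out.println("int-like: " + new IntLikeType().isNull(empty));   // prints "int-like: true"
        System.out.println("text-like: " + new TextLikeType().isNull(empty)); // prints "text-like: false"
    }
}
```

With a hook like this, code paths that previously hard-coded empty == null can ask the type instead, which addresses the brittleness described for blob/text.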
>
> I also think that it would be good to standardize on one approach to avoid
> confusion.
>
>
> I agree, but I also don’t feel it’s a perfect one-size-fits-all thing…
> Let’s say I have a “blob” type and I write an empty byte array… what does
> this mean? What does it mean for the "text" type? The fact that I get back a
> null in both of those cases was very confusing to me… I do feel that some
> types should support empty, and I think the common code of empty == null is
> very brittle (blob/text was not correct in different places due to this...)…
> so I am cool with removing that relationship, but I don’t think we should
> have a rule blocking empty for all current / future types, as it sometimes
> does make sense.
>
> empty vector (I presume) for the vector type?
>
>
> Empty vectors (vector[0]) are blocked at the type level; the smallest
> vector is vector[1].
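A rough sketch of what a non-emptiable, fixed-dimension float vector check could look like is below. It assumes, for illustration only, that a float vector of dimension n is serialized as n consecutive 4-byte floats. VectorCheck, checkDimension and validate are hypothetical names, not Cassandra's VectorType API.

```java
import java.nio.ByteBuffer;

// Hypothetical validator for a fixed-dimension float vector that never accepts empty bytes.
final class VectorCheck {
    static final int FLOAT_BYTES = Float.BYTES; // 4 bytes per element (assumed layout)

    // Mirrors the rule above: vector[0] is rejected when the type is declared.
    static void checkDimension(int dimension) {
        if (dimension < 1) {
            throw new IllegalArgumentException("vector dimension must be >= 1, got " + dimension);
        }
    }

    // Because dimension >= 1, an empty buffer can never match the expected size,
    // so "empty" is rejected instead of being accepted as a stand-in for null.
    static void validate(ByteBuffer value, int dimension) {
        checkDimension(dimension);
        int expected = dimension * FLOAT_BYTES;
        if (value.remaining() != expected) {
            throw new IllegalArgumentException("expected " + expected + " bytes for dimension "
                    + dimension + ", got " + value.remaining());
        }
    }

    public static void main(String[] args) {
        validate(ByteBuffer.allocate(12), 3); // a 3-float vector passes
        try {
            validate(ByteBuffer.allocate(0), 3);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // prints "expected 12 bytes for dimension 3, got 0"
        }
    }
}
```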
>
> as types that can never be null
>
>
> One pro here is that “null” is cheaper (in some regards) than a delete
> (though we can never purge it), but having 2 similar behaviors (write null,
> do a delete) at the type level is a bit confusing… Right now I am allowed to
> do the following (the below isn’t valid CQL, it’s a hybrid of CQL + Java
> code…)
>
> CREATE TABLE fluffykittens (pk int primary key, cuteness int);
> INSERT INTO fluffykittens (pk, cuteness) VALUES (0, new byte[0]);
>
> CREATE TABLE typesarehard (pk1 int, pk2 int, cuteness int, PRIMARY KEY ((pk1, pk2)));
> INSERT INTO typesarehard (pk1, pk2, cuteness) VALUES (new byte[0], new byte[0], new byte[0]);
> -- valid because the partition key is not empty: it is a composite of 2
> -- empty values, which is the same as new byte[2]
>
> The first time I ever found out that empty bytes were valid was when a user
> was trying to abuse this in collections (also, the fact that collections
> support null in some cases and not others is fun…)… It was blowing up in
> random places… good times!
>
> I am personally not in favor of allowing empty bytes (other than for blob
> / text, as that is actually valid for the domain), but I feel that similar
> types having different semantics is more problematic...
>
> On Sep 19, 2023, at 8:56 AM, Josh McKenzie <jmcken...@apache.org> wrote:
>
> I am strongly in favour of permitting the table definition forbidding
> nulls - and perhaps even defaulting to this behaviour. But I don’t think we
> should have types that are inherently incapable of being null.
>
> I'm with Benedict. It seems like this could help prevent whatever "nulls in
> primary key columns" problems Aleksey was alluding to on those tickets back
> in the day that pushed us towards making the new types non-emptiable as
> well (i.e. primary keys are non-null in table definition).
>
> Following up on Alex's question, having a default value for unset fields in any
> non-collection context seems... quite surprising to me in a database. I
> could see the argument for making container / collection types
> non-nullable, maybe, but that just keeps us in a potential straddle case
> (some types nullable, some not).
>
> On Tue, Sep 19, 2023, at 8:22 AM, Benedict wrote:
>
>
> If I understand this suggestion correctly, it is a whole can of worms, as
> types that can never be null prevent us ever supporting outer joins that
> return these types.
>
> I am strongly in favour of permitting the table definition forbidding
> nulls - and perhaps even defaulting to this behaviour. But I don’t think we
> should have types that are inherently incapable of being null. I also
> certainly don’t think we should bifurcate our behaviour between types
> like this.
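A minimal sketch of the distinction Benedict draws, under stated assumptions: nullability lives on the column definition rather than on the type. The same int type can therefore still produce null elsewhere, for example in a future outer join result, while writes into a column declared as not accepting nulls are rejected. ColumnDef, nullable and checkWrite are hypothetical names for a feature under discussion, not existing Cassandra API.

```java
// Hypothetical column definition: nullability is a property of the column, not of its type.
final class ColumnDef {
    final String name;
    final String type;      // e.g. "int"; the type itself can still represent null
    final boolean nullable; // the table definition decides whether null may be written

    ColumnDef(String name, String type, boolean nullable) {
        this.name = name;
        this.type = type;
        this.nullable = nullable;
    }

    // Reject a null write only when this column was declared as not accepting nulls.
    void checkWrite(Object value) {
        if (value == null && !nullable) {
            throw new IllegalArgumentException("column " + name + " does not accept null");
        }
    }

    public static void main(String[] args) {
        ColumnDef optional = new ColumnDef("cuteness", "int", true);
        ColumnDef required = new ColumnDef("cuteness", "int", false);
        optional.checkWrite(null); // allowed: the table permits null here
        try {
            required.checkWrite(null);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // prints "column cuteness does not accept null"
        }
    }
}
```

Keeping the constraint at the table level avoids the bifurcation Benedict objects to: every type behaves the same way, and only the schema decides where null is forbidden.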
>
>
>
> On 19 Sep 2023, at 11:54, Alex Petrov <al...@coffeenco.de> wrote:
>
> To make sure I understand this right; does that mean there will be a
> default value for unset fields? Like 0 for numerical values, and an empty
> vector (I presume) for the vector type?
>
> On Fri, Sep 15, 2023, at 11:46 AM, Benjamin Lerer wrote:
>
> Hi everybody,
>
> I noticed that the new Vector type accepts empty ByteBuffer values as an
> input representing null.
> When we introduced TINYINT and SMALLINT (CASSANDRA-8951) we started making
> types non-emptiable. This approach makes more sense to me, as having to
> deal with empty values is error-prone in my opinion.
> I also think that it would be good to standardize on one approach to avoid
> confusion.
>
> Should we make the Vector type non-emptiable and stick to it for the new
> types?
>
> I'd like to hear your opinions.