Thanks for the reply!

> AFAIK this EMPTY stuff goes back to thrift days.

This is what I was told, but the expected semantics are not clear so my goal is 
to help flesh things out.

> We let people insert these zero length values back then, so we have to 
> support those zero length values existing for ever :/.

We allow this for some types but not all.  I think where I am coming from is 
write != select, so if we say empty = “no value” or “null” then why does select 
treat it as a value?  Is this the expected behavior?

> maybe we should be converting such values into a tombstone / NULL

Tombstones can be purged, where as empty can’t, so should it?

> Everything besides SAI, including the table based 2i and ALLOW FILTERING, 
> treat EMPTY as a distant value which can be inserted and queried on.  We have 
> supported it in the past, so we should continue to support it into the 
> future, even if it is painful to do.

I guess where I come from here is what semantics do we expect.

So lets say v0 is empty bytes int

SELECT CAST(v0 AS text) 

Is this null or empty bytes?  In our project this is null

SELECT JSON v0

Is this null or empty bytes?  In our project this is null

SELECT avg(v0) …

Is this null or empty bytes?  In our project this is null

So in most places you touch empty bytes we treat it as null, but only in 
filtering do we not.

> On Feb 11, 2025, at 11:27 AM, Jeremiah Jordan <jeremiah.jor...@gmail.com> 
> wrote:
> 
> AFAIK this EMPTY stuff goes back to thrift days.  We let people insert these 
> zero length values back then, so we have to support those zero length values 
> existing for ever :/.
> 
> How useful is such a distinction?  I don’t know.  Is anybody actually doing 
> this?  Well Andres brought up 
> https://issues.apache.org/jira/browse/CASSANDRA-20313 as a problem because we 
> had an end user create an SAI index on a column which contained EMPTY values 
> in it.  So people are inserting these into the database.  Would they expect 
> to be able to query by EMPTY?  I do not know.
> 
> This is the first I have heard of the “isEmptyValueMeaningless” setting.  The 
> meaning of EMPTY to me has always been the same for an Integer or a String, 
> “this column has a value of no value” vs NULL which means "this column is not 
> set/has no value”.  If we truly want to follow the spirit of that setting, 
> then maybe we should be converting such values into a tombstone / NULL up 
> front when deserializing them, rather than storing the EMPTY byte buffer in 
> the DB?
> 
> Anyway, I am kind of rambling here.  I am of two minds.
> I can see that this does seem like a silly distinction to have for some 
> types, so maybe we should just decide that in a CQL world, EMPTY means NULL 
> for some types, and actually just make that a tombstone.  Maybe 6.0 would be 
> a good major version change to make such a “breaking” behavior change in.
> 
> I can also see the “don’t screw up the legacy apps” use case.  Everything 
> besides SAI, including the table based 2i and ALLOW FILTERING, treat EMPTY as 
> a distant value which can be inserted and queried on.  We have supported it 
> in the past, so we should continue to support it into the future, even if it 
> is painful to do.
> 
> Flip a coin and I can argue either side.  So I would love to hear others 
> thoughts to convince me one way to the other.
> 
> -Jeremiah
> 
> 
> 
> On Feb 11, 2025 at 12:55:35 PM, Caleb Rackliffe <calebrackli...@gmail.com 
> <mailto:calebrackli...@gmail.com>> wrote:
>> The case where allowsEmpty == true AND is meaningless == true is especially 
>> confusing. If I could design this from scratch, I would reject writes and 
>> filtering on EMPTY values for int and the other types where meaningless == 
>> true. (In other words, if we allow EMPTY, it is meaningful and queryable. If 
>> we don't, it isn't.) That avoids problems that can't have anything other 
>> than an arbitrary solution, like what we do with < and > for EMPTY for int. 
>> When we add IS [NOT] NULL support, that would preferably NOT match EMPTY 
>> values for the types where empty means something, like strings. For 
>> everything else, EMPTY could be equivalent to null and match IS NULL.
>> 
>> The only real way to make SAI compatible with the current behavior is to add 
>> something like a special postings list to its data structures that 
>> corresponds to the rows where the indexed column value is EMPTY.
>> 
>> On Tue, Feb 11, 2025 at 12:21 PM David Capwell <dcapw...@apple.com 
>> <mailto:dcapw...@apple.com>> wrote:
>>> Bringing this discussion to dev@ rather than Slack as we try to figure out 
>>> CASSANDRA-20313 and CASSANDRA-19461.
>>> 
>>> In the type system, we have 2 different (but related) methods:
>>> 
>>> AbstractType#allowsEmpty                        - if the user gives empty 
>>> bytes (new byte[0]) will the type reject it
>>> AbstractType#isEmptyValueMeaningless  - if the user gives empty bytes, 
>>> should this be handled like null?
>>> 
>>> In practice, there are 2 cases that matter:
>>> 
>>> allowsEmpty = true AND is meaningless = false - stuff like text and bytes
>>> allowsEmpty = true AND is meaningless = true  - many types, example "int"
>>> 
>>> What this means is that users are able to use empty bytes when writing to 
>>> these types, but this leads to complexity in the filter path, and is 
>>> something we are trying to flesh out the “correct” semantics for SAI.
>>> 
>>> Simple example:
>>> 
>>> {code}
>>> @Test
>>> public void test() throws IOException
>>> {
>>>     try (Cluster cluster = Cluster.build(1).start())
>>>     {
>>>         init(cluster);
>>>         cluster.schemaChange(withKeyspace("CREATE TABLE %s.tbl (pk int 
>>> primary key, v int)"));
>>>         IInvokableInstance node = cluster.get(1);
>>>         for (int i = 0; i < 10; i++)
>>>             node.executeInternal(withKeyspace("INSERT INTO %s.tbl (pk, v) 
>>> VALUES (?, ?)"), i, ByteBufferUtil.EMPTY_BYTE_BUFFER);
>>> 
>>>         var qr = node.executeInternalWithResult(withKeyspace("SELECT * FROM 
>>> %s.tbl WHERE v=? ALLOW FILTERING"), ByteBufferUtil.EMPTY_BYTE_BUFFER);
>>>         StringBuilder sb = new StringBuilder();
>>>         sb.append(qr.names());
>>>         while (qr.hasNext())
>>>         {
>>>             var next = qr.next();
>>>             sb.append('\n').append(next);
>>>         }
>>>         System.out.println(sb);
>>>     }
>>> }
>>> {code}
>>> 
>>> “Should” this return 10 rows or 0?  In this case, the type is int, and int 
>>> defines empty as meaningless, which means it should act as a null; yet this 
>>> query returns 10 rows, which violates CQL as foo = null == false.
>>> 
>>> Right now there really isn’t a way to query for NULL (CASSANDRA-10715 is 
>>> still open), but if we did add such a thing we would also need to figure 
>>> out the semantics with regard to these cases.

Reply via email to