+1 to the flow of:

1: ORDER BY?

2: Oh. Yeah. That *does* make sense.

;)

(sending from fastmail in the hopes the image doesn't get stripped. Thanks ASF 
smtp server...)

~Josh

On Wed, May 24, 2023, at 1:00 AM, Jeremiah D Jordan wrote:
> At first I wasn’t sure about using ORDER BY, but the more I think about what 
> is actually going on, I think it does make sense.
> 
> This also matches up with some ideas that have been floating around about 
> being able to ORDER BY a sorted SAI index.
> 
> -Jeremiah
> 
>> On May 22, 2023, at 2:28 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>> 
>> Hi all,
>> 
>> I have a branch of vector search based on cep-7-sai at 
>> https://github.com/datastax/cassandra/tree/cep-vsearch. Compared to the 
>> original POC branch, this one is based on the SAI code that will be mainline 
>> soon, and handles distributed scatter/gather.  Updates and deletes to vector 
>> values are still not supported.
>> 
>> I also put together a demo that uses this branch to provide context to 
>> OpenAI’s GPT, available here: https://github.com/jbellis/cassgpt.
>> 
>> Here is the query that gets executed:
>> 
>>     SELECT id, start, end, text 
>>     FROM {self.keyspace}.{self.table} 
>>     WHERE embedding ANN OF %s 
>>     LIMIT %s
>> 
>> The more I used the proposed `ANN OF` syntax, the less I liked it.  This is 
>> because we don’t want an actual boolean predicate; we just want to order 
>> results.  Put another way, `ANN OF` will include all rows of the table given 
>> a high enough `LIMIT`, and that makes it a bad fit for expression processing 
>> that expects to be able to filter out rows before it starts LIMIT-ing.  And 
>> in fact the code to support executing the query looks suspiciously like what 
>> you’d want for `ORDER BY`.
>> 
>> I propose that we adopt `ORDER BY` syntax, supporting it for vector indexes 
>> first and eventually for all SAI indexes.  So this query would become
>> 
>>     SELECT id, start, end, text 
>>     FROM {self.keyspace}.{self.table} 
>>     ORDER BY embedding ANN OF %s 
>>     LIMIT %s
>> 
>> And it would compose with other SAI indexes with syntax like
>> 
>>     SELECT id, start, end, text 
>>     FROM {self.keyspace}.{self.table} 
>>     WHERE publish_date > %s
>>     ORDER BY embedding ANN OF %s 
>>     LIMIT %s
>> 
>> Related work:
>> 
>> This is similar to the approach used by pgvector, except they invented the 
>> symbolic operator `<->` that has the same semantics as `ANN OF`.  I am okay 
>> with adopting their operator, but I think ANN OF is more readable.
>> 
>> --
>> Jonathan Ellis
>> co-founder, http://www.datastax.com
>> @spyced
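
To make the ordering-versus-filtering point in the quoted proposal concrete, here is a rough brute-force sketch in Python. It is not how the branch works (the real thing goes through the SAI vector index and is approximate). The helper names, the sample rows, and the choice of cosine similarity are all assumptions made up for illustration; the thread does not say which similarity function is used.

```python
import math


def cosine_similarity(a, b):
    # Assumed metric for this sketch; the proposal does not name one.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def order_by_ann(rows, query, limit, predicate=None):
    """Hypothetical exact stand-in for
    [WHERE predicate] ORDER BY embedding ANN OF query LIMIT limit."""
    # WHERE: an ordinary boolean predicate removes rows first.
    candidates = [r for r in rows if predicate is None or predicate(r)]
    # ORDER BY: similarity only ranks rows; it never removes any.
    ranked = sorted(
        candidates,
        key=lambda r: cosine_similarity(r["embedding"], query),
        reverse=True,
    )
    # LIMIT: truncate the ranked list.
    return ranked[:limit]


# Made-up sample data, not taken from the demo.
rows = [
    {"id": 1, "publish_date": 2019, "embedding": [1.0, 0.0]},
    {"id": 2, "publish_date": 2021, "embedding": [0.8, 0.6]},
    {"id": 3, "publish_date": 2023, "embedding": [0.0, 1.0]},
]
query = [1.0, 0.0]

top2 = [r["id"] for r in order_by_ann(rows, query, limit=2)]
everything = [r["id"] for r in order_by_ann(rows, query, limit=100)]
recent = [
    r["id"]
    for r in order_by_ann(
        rows, query, limit=2, predicate=lambda r: r["publish_date"] > 2020
    )
]
```

With a `limit` larger than the table, `everything` contains every row, which is the reason `ANN OF` behaves like a sort key and not like a filter. The `recent` call mirrors the composed form from the proposal: the `publish_date` predicate narrows the candidates, and the similarity ranking then orders whatever is left.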
