Hello all

Sorry to interrupt the discussion, but there is an official announcement from
Microsoft about Azure Cosmos DB for MongoDB vCore supporting vector search:

https://devblogs.microsoft.com/cosmosdb/introducing-vector-search-in-azure-cosmos-db-for-mongodb-vcore/

Looks like Jonathan is spot on about this feature; it's quite a hot topic
at the moment!

Regards

Duy Hai DOAN

On Wed, May 24, 2023 at 2:44 PM Josh McKenzie <jmcken...@fastmail.com>
wrote:

> +1 to the flow of:
>
> 1: ORDER BY?
>
> 2:  Oh. Yeah. That *does* make sense.
>
> ;)
>
> (sending from fastmail in the hopes the image doesn't get stripped. Thanks
> ASF smtp server...)
>
> ~Josh
>
> On Wed, May 24, 2023, at 1:00 AM, Jeremiah D Jordan wrote:
>
> At first I wasn’t sure about using ORDER BY, but the more I think about
> what is actually going on, I think it does make sense.
>
> This also matches up with some ideas that have been floating around about
> being able to ORDER BY a sorted SAI index.
>
> -Jeremiah
>
> On May 22, 2023, at 2:28 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>
> Hi all,
>
> I have a branch of vector search based on cep-7-sai at
> https://github.com/datastax/cassandra/tree/cep-vsearch. Compared to the
> original POC branch, this one is based on the SAI code that will be
> mainline soon, and handles distributed scatter/gather.  Updates and deletes
> to vector values are still not supported.
>
> I also put together a demo that uses this branch to provide context to
> OpenAI’s GPT, available here: https://github.com/jbellis/cassgpt.
>
> Here is the query that gets executed:
>
>     SELECT id, start, end, text
>     FROM {self.keyspace}.{self.table}
>     WHERE embedding ANN OF %s
>     LIMIT %s
>
> The more I used the proposed `ANN OF` syntax, the less I liked it.  This
> is because we don’t want an actual boolean predicate; we just want to order
> results.  Put another way, `ANN OF` will include all rows of the table
> given a high enough `LIMIT`, and that makes it a bad fit for expression
> processing that expects to be able to filter out rows before it starts
> LIMIT-ing.  And in fact the code to support executing the query looks
> suspiciously like what you’d want for `ORDER BY`.
>
> I propose that we adopt `ORDER BY` syntax, supporting it for vector
> indexes first and eventually for all SAI indexes.  So this query would
> become
>
>     SELECT id, start, end, text
>     FROM {self.keyspace}.{self.table}
>     ORDER BY embedding ANN OF %s
>     LIMIT %s
>
> And it would compose with other SAI indexes with syntax like
>
>     SELECT id, start, end, text
>     FROM {self.keyspace}.{self.table}
>     WHERE publish_date > %s
>     ORDER BY embedding ANN OF %s
>     LIMIT %s
>
> Related work:
>
> This is similar to the approach used by pgvector, except they invented the
> symbolic operator `<->` that has the same semantics as `ANN OF`.  I am okay
> with adopting their operator, but I think ANN OF is more readable.
>
> --
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced
>
>
>
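
To make the "ordering, not filtering" point from Jonathan's mail concrete, here is a minimal Python sketch. It is only an illustration of the semantics, not code from the cep-vsearch branch: it uses exact brute-force Euclidean distance instead of an approximate index, and the helper `order_by_ann`, the sample rows, and the integer `publish_date` values are all made up for the example.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def order_by_ann(rows, query, limit, where=None):
    """Hypothetical stand-in for:
        [WHERE ...] ORDER BY embedding ANN OF <query> LIMIT <limit>
    Exact brute force here; a real vector index would be approximate.
    """
    # The optional WHERE predicate is the only step that removes rows.
    candidates = [r for r in rows if where is None or where(r)]
    # "ANN OF" only ranks the remaining rows by distance to the query vector.
    candidates.sort(key=lambda r: distance(r["embedding"], query))
    return candidates[:limit]


# Made-up sample data (assumption: publish_date reduced to a year).
rows = [
    {"id": 1, "publish_date": 2019, "embedding": [0.0, 0.0]},
    {"id": 2, "publish_date": 2021, "embedding": [1.0, 1.0]},
    {"id": 3, "publish_date": 2022, "embedding": [5.0, 5.0]},
    {"id": 4, "publish_date": 2023, "embedding": [0.9, 1.2]},
]
query = [1.0, 1.0]

# Plain nearest-neighbour ordering with a small LIMIT.
top2 = [r["id"] for r in order_by_ann(rows, query, limit=2)]

# With a LIMIT larger than the table, every row comes back: nothing is filtered.
everything = [r["id"] for r in order_by_ann(rows, query, limit=100)]

# Composed with an ordinary predicate, as in the WHERE publish_date > %s example.
recent = [
    r["id"]
    for r in order_by_ann(
        rows, query, limit=2, where=lambda r: r["publish_date"] > 2021
    )
]
```

In this toy model the predicate runs before the ranking, which is the behaviour that expression processing expects from WHERE and which the ranking step on its own never provides.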
