Hello all,

Sorry to disturb the discussion, but there is an official announcement from Microsoft about CosmosDB supporting Vector Search:

https://devblogs.microsoft.com/cosmosdb/introducing-vector-search-in-azure-cosmos-db-for-mongodb-vcore/

Looks like Jonathan is spot on about this feature; it's quite a hot topic at the moment!

Regards,

Duy Hai DOAN

On Wed, May 24, 2023 at 2:44 PM Josh McKenzie <jmcken...@fastmail.com> wrote:

> +1 to the flow of:
>
> 1: ORDER BY?
>
> 2: Oh. Yeah. That *does* make sense.
>
> ;)
>
> (sending from fastmail in the hopes the image doesn't get stripped. Thanks
> ASF smtp server...)
>
> ~Josh
>
> On Wed, May 24, 2023, at 1:00 AM, Jeremiah D Jordan wrote:
>
> At first I wasn’t sure about using ORDER BY, but the more I think about
> what is actually going on, I think it does make sense.
>
> This also matches up with some ideas that have been floating around about
> being able to ORDER BY a sorted SAI index.
>
> -Jeremiah
>
> On May 22, 2023, at 2:28 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>
> Hi all,
>
> I have a branch of vector search based on cep-7-sai at
> <https://github.com/datastax/cassandra/tree/cep-vsearch>. Compared to the
> original POC branch, this one is based on the SAI code that will be
> mainline soon, and handles distributed scatter/gather. Updates and deletes
> to vector values are still not supported.
>
> I also put together a demo that uses this branch to provide context to
> OpenAI’s GPT, available here: <https://github.com/jbellis/cassgpt>.
>
> Here is the query that gets executed:
>
> SELECT id, start, end, text
> FROM {self.keyspace}.{self.table}
> WHERE embedding ANN OF %s
> LIMIT %s
>
> The more I used the proposed `ANN OF` syntax, the less I liked it. This
> is because we don’t want an actual boolean predicate; we just want to order
> results. Put another way, `ANN OF` will include all rows of the table
> given a high enough `LIMIT`, and that makes it a bad fit for expression
> processing that expects to be able to filter out rows before it starts
> LIMIT-ing. And in fact the code to support executing the query looks
> suspiciously like what you’d want for `ORDER BY`.
>
> I propose that we adopt `ORDER BY` syntax, supporting it for vector
> indexes first and eventually for all SAI indexes. So this query would
> become
>
> SELECT id, start, end, text
> FROM {self.keyspace}.{self.table}
> ORDER BY embedding ANN OF %s
> LIMIT %s
>
> And it would compose with other SAI indexes with syntax like
>
> SELECT id, start, end, text
> FROM {self.keyspace}.{self.table}
> WHERE publish_date > %s
> ORDER BY embedding ANN OF %s
> LIMIT %s
>
> Related work:
>
> This is similar to the approach used by pgvector, except they invented the
> symbolic operator `<->` that has the same semantics as `ANN OF`. I am okay
> with adopting their operator, but I think ANN OF is more readable.
>
> --
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced
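To make Jonathan's point concrete, that `ANN OF` behaves like an ordering rather than a boolean predicate, here is a minimal in-memory sketch. It is not how the cep-vsearch branch works: it is an exact brute-force sort standing in for an approximate index, cosine similarity is an assumed metric, and `order_by_ann`, `rows`, and the sample data are hypothetical names invented for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def order_by_ann(rows, query, limit, where=None):
    """Hypothetical stand-in for `[WHERE ...] ORDER BY embedding ANN OF query LIMIT limit`.

    The optional `where` predicate filters rows first; the vector
    comparison never removes a row, it only ranks what is left.
    """
    candidates = [r for r in rows if where is None or where(r)]
    ranked = sorted(
        candidates,
        key=lambda r: cosine_similarity(r["embedding"], query),
        reverse=True,  # most similar first
    )
    return ranked[:limit]


# Made-up sample data, for illustration only.
rows = [
    {"id": 1, "publish_date": 2021, "embedding": [1.0, 0.0]},
    {"id": 2, "publish_date": 2023, "embedding": [0.8, 0.6]},
    {"id": 3, "publish_date": 2022, "embedding": [0.0, 1.0]},
]
query = [1.0, 0.0]

# With a high enough limit, every row comes back: nothing was filtered out.
all_ids = [r["id"] for r in order_by_ann(rows, query, limit=10)]

# Composing with a real predicate: filter first, then rank, then limit.
recent_ids = [
    r["id"]
    for r in order_by_ann(rows, query, limit=1, where=lambda r: r["publish_date"] > 2021)
]
```

In this sketch `all_ids` contains every row in the table, only reordered, which is the behaviour that makes `WHERE embedding ANN OF %s` an awkward fit, while `recent_ids` shows the proposed composition where `WHERE` does the filtering and the vector comparison only ranks.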