Hi all,

Following the recent discussion threads, I would like to propose CEP-30 to add Approximate Nearest Neighbor (ANN) Vector Search via Storage-Attached Indexes (SAI) to Apache Cassandra.

The primary goal of this proposal is to implement ANN vector search capabilities, making Cassandra more useful to AI developers and organizations managing large datasets that can benefit from fast similarity search. The implementation will leverage Lucene's Hierarchical Navigable Small World (HNSW) library and introduce a new CQL data type for vector embeddings, a new SAI index for ANN search functionality, and a new CQL operator for performing ANN search queries.

We are targeting the 5.0 release for this feature, in conjunction with the release of SAI. The proposed changes will maintain compatibility with existing Cassandra functionality and compose well with the already-approved SAI features.

Please find the full CEP document here:
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-30%3A+Approximate+Nearest+Neighbor%28ANN%29+Vector+Search+via+Storage-Attached+Indexes

--
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced