branch: externals/vecdb
commit 7362e46ae356eb2d07f26a0b225673a3d031feff
Author: Andrew Hyatt <ahy...@gmail.com>
Commit: Andrew Hyatt <ahy...@gmail.com>
    Remove references to "embed" in the README
---
 README.org | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)
diff --git a/README.org b/README.org
index 579befe6d3..7c010e6410 100644
--- a/README.org
+++ b/README.org
@@ -1,30 +1,30 @@
 #+TITLE: vecdb: Vector Search Library for Emacs
 * Introduction
-The =vecdb= package provides an interface to a vector database, where vectors are embeddings representing pieces of text. These databases enable "semantic search", which is a powerful way to search over meaning. This kind of search needs specialized storage and retrieval.
+The =vecdb= package provides an interface to a vector database, where vectors are vecdbdings representing pieces of text. These databases enable "semantic search", which is a powerful way to search over meaning. This kind of search needs specialized storage and retrieval.
 This package doesn't provide end-user functionality on its own; it is designed to be used in other packages that need semantic search.
-The package does not provide embeddings, that can be done with the [[https://github.com/ahyatt/llm][llm]] package, or any source of embeddings.
+The package does not provide vecdbdings, that can be done with the [[https://github.com/ahyatt/llm][llm]] package, or any source of vecdbdings.
 * Configuring the collection
-There are two concepts that together define a collection database of embeddings: the /provider/, and the /collection/. The provider is what kind of backend we are using, right now either =chroma=, or =qdrant=. This is a struct defined by the exact provider you want to use.
+There are two concepts that together define a collection database of vecdbdings: the /provider/, and the /collection/. The provider is what kind of backend we are using, right now either =chroma=, or =qdrant=. This is a struct defined by the exact provider you want to use.
-The collection is, for that provider, what exact database is getting used, with each collection having its own separate data. Collections must be created before being used. The collection is defined by the struct ~vecdb-collection~ which has a ~name~ (used to identify the collection), ~vector-size~, and ~payload-fields~. The ~vector-size~ will be based on the size of the embedding vector from your provider. 1536 is what Open AI uses. ~payload-fields~ is an alist of fields and their [...]
+The collection is, for that provider, what exact database is getting used, with each collection having its own separate data. Collections must be created before being used. The collection is defined by the struct ~vecdb-collection~ which has a ~name~ (used to identify the collection), ~vector-size~, and ~payload-fields~. The ~vector-size~ will be based on the size of the vecdbding vector from your provider. 1536 is what Open AI uses. ~payload-fields~ is an alist of fields and their [...]
 An example, putting it all together, is:
 #+begin_src emacs-lisp
-(defvar my-embed-provider (make-embed-qdrant-provider :api-key my-qdrant-api-key :url my-qdrant-url))
-(defvar my-embed-collection (make-vecdb-collection :name "my test collection" :vector-size 1536 :payload-fields (('my-id . 'string))))
+(defvar my-vecdb-provider (make-vecdb-provider :api-key my-qdrant-api-key :url my-qdrant-url))
+(defvar my-vecdb-collection (make-vecdb-collection :name "my test collection" :vector-size 1536 :payload-fields (('my-id . 'string))))
 #+end_src
-The provider will be supplied by the end-user, specifying how they want things stored, and any data necessary for that storage and retrieval to function. The collection is typically partially supplied by the application, with the possible exception of embedding size, which may be dependent on the exact embedding provider they are using.
+The provider will be supplied by the end-user, specifying how they want things stored, and any data necessary for that storage and retrieval to function. The collection is typically partially supplied by the application, with the possible exception of vecdbding size, which may be dependent on the exact vecdbding provider they are using.
 Collections must be created before they can be used with ~vecdb-create~, and ~vecdb-exists~ can return whether the collection exists.
 #+begin_src emacs-lisp
-(unless (vecdb-exists my-embed-provider my-embed-collection)
-  (vecdb-create my-embed-provider my-embed-collection))
+(unless (vecdb-exists my-vecdb-provider my-vecdb-collection)
+  (vecdb-create my-vecdb-provider my-vecdb-collection))
 #+end_src
 They can also be deleted with ~vecdb-delete~.
@@ -36,7 +36,7 @@ or replaces it, based on the =id= of the item.
 Here's an example of adding or replacing one item:
 #+begin_src emacs-lisp
-(vecdb-upsert-items my-embed-provider my-embed-collection
+(vecdb-upsert-items my-vecdb-provider my-vecdb-collection
                     (list (make-vecdb-item
                            :id "example-id"
                            :vector [0.1 0.2 0.3 0.4]
@@ -50,7 +50,7 @@ IDs used in =vecdb= *must* be =uint64= values. If you have another ID you need
 Querying the database can be done with ~vecdb-search-by-vector~, passing it a vector and optionally a number of results to return (10 is the default).
 #+begin_src emacs-lisp
-(vecdb-search-by-vector my-embed-provider my-embed-collection [0.3 0.1 0.5 -0.9] 20)
+(vecdb-search-by-vector my-vecdb-provider my-vecdb-collection [0.3 0.1 0.5 -0.9] 20)
 #+end_src
 This will return the specifies number of =vecdb-item= structs, with the payloads they were stored with.
@@ -61,7 +61,7 @@ This will return the specifies number of =vecdb-item= structs, with the payloads
 A qdrant provider is defined like:
 #+begin_src emacs-lisp
-(defvar my-embed-provider (make-embed-qdrant-provider :api-key my-qdrant-api-key :url my-qdrant-url))
+(defvar my-vecdb-provider (make-vecdb-qdrant-provider :api-key my-qdrant-api-key :url my-qdrant-url))
 #+end_src
 Substitute =my-qdrant-api-key= with your key, and =my-qdrant-url= is the URL of the server that is used to serve your data. This will be unique to your collection in the cloud, or a local URL for docker.
@@ -73,13 +73,13 @@ If running locally, before use, you must run =chroma run= to start the server.
 The chroma provider has two additional divisions of data above the collection, and these are specified in the provider itself: the /tenant/ and the /database/. These will both default to ="default"=, but can be specifed.
 Because the chroma provider is local, my default, no configuration is needed:
 #+begin_src emacs-lisp
-(defvar my-chroma-provider (make-chroma-provider))
+(defvar my-chroma-provider (make-vecdb-chroma-provider))
 #+end_src
 However, the full set of options, here demonstrating the equivalent settings to the defaults are:
 #+begin_src emacs-lisp
-(defvar my-chroma-provider (make-chroma-provider
+(defvar my-chroma-provider (make-vecdb-chroma-provider
                             :binary "chroma"
                             :url "http://localhost:8000"
                             :tenant "default"