Hi Albert,

On top of the very good answers already in the thread, here are my answers inline:

*1. Can we do text search and vector similarity?*
Lucene can do vector similarity, and you can achieve the same with Solr, with some caveats. Direct, full support is still a work in progress; here are some resources for you:

*London Information Retrieval Meetup*
We discussed the topic a few months ago at the London Information Retrieval Meetup:
https://www.slideshare.net/SeaseLtd/interactive-questions-and-answers-london-information-retrieval-meetup
https://www.youtube.com/watch?v=BIILaSb4aRY&t=259s

*Blogs*
I started a series of blog posts on the topic; so far only the introduction is out:
https://sease.io/2021/07/artificial-intelligence-applied-to-search-introduction.html
By the end of the summer I am planning to write the Lucene, Solr and Elasticsearch episode.

*Training*
We are also hosting a related training in October. I'll take the chance to link it in case you find it useful:
https://sease.io/training/artificial-intelligence-in-search-training
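To make "vector similarity" in point 1 concrete: at its core it means comparing a query embedding with document embeddings and ranking by a similarity score. Below is a minimal sketch, assuming cosine similarity and an exhaustive scan over a handful of hypothetical toy vectors. It is not how Lucene or Solr implement it; at billions of vectors you need an approximate nearest-neighbour index (Lucene's work in this area is, as far as I know, based on an HNSW graph) rather than scoring every document.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors of the same dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat it as dissimilar
    return dot / (norm_a * norm_b)


def exact_top_k(
    query: List[float], docs: Dict[str, List[float]], k: int
) -> List[Tuple[str, float]]:
    """Score every stored vector against the query and keep the k best (exhaustive, no index)."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical 3-dimensional toy "embeddings"; real ones would be e.g. 128-d
docs = {
    "doc1": [1.0, 0.0, 0.0],
    "doc2": [0.7, 0.7, 0.0],
    "doc3": [0.0, 0.0, 1.0],
}
for doc_id, score in exact_top_k([1.0, 0.1, 0.0], docs, k=2):
    print(doc_id, round(score, 3))
```

The exhaustive scan is the simplest correct baseline; it is also what an approximate index trades a little recall against to avoid.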

*2. Can we filter by metadata?*
Yes, much as in Elasticsearch: a query (scored) plus filter queries (unscored). It's a big topic, though; take a look at the standard query parser to get an idea:
https://solr.apache.org/guide/8_9/the-standard-query-parser.html

*3. How about index/memory consumption? 1st tier needs around 4000M embedding vectors (128 fp32) + metadata stored in memory*
There's no quick, silver-bullet answer for this: you need to get much deeper into the project and then build a prototype and a benchmarking infrastructure that can give you the answers.

*4. Can we execute models in the DB itself? (not outside SOLr). We have per-user models, and we need a way of executing TensorFlow models on the database to prevent moving data outside of the DB*
The closest you can get is the Learning To Rank integration. Apache Solr supports linear models, tree-based models, and neural-network-based models. You need to train your model, export it in the supported JSON format, and then use it:
https://solr.apache.org/guide/8_9/learning-to-rank.html
We have written many blog posts on the topic:
https://sease.io/category/learning-to-rank
https://sease.io/2016/10/apache-solr-learning-to-rank-better-part-4.html
We also have a dedicated training:
https://sease.io/training/learning-to-rank-training

*5. Subsecond queries*
You are generally well under a second, even when integrating complex learning-to-rank models. The more complex your matching and ranking algorithms, the slower your queries, but in general Apache Solr is very fast and you shouldn't have problems.

*6. Real-time indexing (or near real-time) of new data*
Since soft commits arrived (many years ago), Apache Solr has been quite good at this.
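Going back to point 2, here is a minimal sketch of the query/filter-query split as request parameters for Solr's `/select` handler. `q` drives the score, while each `fq` only narrows the result set and is cached on its own in the filterCache. The base URL, the `products` collection and the field names are hypothetical, and filter values are assumed to be already escaped.

```python
from typing import Dict
from urllib.parse import urlencode


def build_select_url(base_url: str, collection: str, user_query: str,
                     filters: Dict[str, str], rows: int = 10) -> str:
    """Build a Solr /select URL: q is scored, each fq only restricts the candidate set."""
    params = [
        ("q", user_query),        # main query: drives the relevance score
        ("defType", "lucene"),    # standard query parser
        ("rows", str(rows)),
    ]
    # One fq per metadata constraint: it filters without contributing to the
    # score, and Solr caches each fq separately in the filterCache
    for field, value in filters.items():
        params.append(("fq", f"{field}:{value}"))  # value assumed already escaped
    return f"{base_url}/solr/{collection}/select?{urlencode(params)}"


url = build_select_url(
    "http://localhost:8983", "products",
    user_query="title:laptop",
    filters={"category": "electronics", "in_stock": "true"},
)
print(url)
```

Keeping each metadata constraint in its own `fq` (rather than AND-ing everything into `q`) lets frequently reused filters hit the cache independently.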
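For point 3, a back-of-the-envelope floor helps frame the benchmarking work. Assuming "4000M" means 4 billion vectors of 128 fp32 components (4 bytes each), the raw vectors alone come to about 2 TB. This sketch deliberately ignores metadata, any nearest-neighbour index structures, replicas, and JVM/OS overhead, so real consumption will be higher.

```python
def raw_vector_bytes(num_vectors: int, dims: int, bytes_per_value: int = 4) -> int:
    """Bytes for the raw vectors only (fp32 = 4 bytes per component), no index overhead."""
    return num_vectors * dims * bytes_per_value


total = raw_vector_bytes(num_vectors=4_000_000_000, dims=128)
print(total)                            # 2048000000000
print(total / 10**12, "TB")             # 2.048 TB
print(round(total / 2**40, 2), "TiB")   # 1.86 TiB
```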
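For point 4, "export it in the supported JSON format" means producing a model definition Solr's LTR module can load. Below is a minimal sketch for the simplest case, a linear model, plus what such a model computes per document (a weighted sum of feature values). The model name, feature names and weights are hypothetical; the features would have to be defined in a Solr feature store first, and missing feature values are simply treated as 0.0 here.

```python
import json
from typing import Dict


def linear_model_json(name: str, weights: Dict[str, float]) -> str:
    """Serialise a linear model in the shape used by Solr's LTR LinearModel."""
    model = {
        "class": "org.apache.solr.ltr.model.LinearModel",
        "name": name,
        # every feature listed here must already exist in a Solr feature store
        "features": [{"name": feature} for feature in weights],
        "params": {"weights": dict(weights)},
    }
    return json.dumps(model, indent=2)


def linear_score(weights: Dict[str, float], feature_values: Dict[str, float]) -> float:
    """What a linear model computes per document: a weighted sum of its feature values."""
    # Missing features are treated as 0.0 in this sketch
    return sum(w * feature_values.get(f, 0.0) for f, w in weights.items())


# Hypothetical weights, e.g. exported from an offline training job
weights = {"originalScore": 0.5, "titleMatch": 1.0}
print(linear_model_json("myUserModel", weights))
print(linear_score(weights, {"originalScore": 2.0, "titleMatch": 1.0}))  # 2.0
```

The resulting JSON would be uploaded to the collection's `/schema/model-store` endpoint and applied at query time as a re-ranking query, e.g. `rq={!ltr model=myUserModel reRankDocs=100}`. A TensorFlow or PyTorch model is not executed as-is: it would have to be expressed in one of the model formats Solr supports.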
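For point 6, one way to get near-real-time visibility per request is the `commitWithin` update parameter. As far as I know, the default `solrconfig.xml` executes `commitWithin` as a soft commit; the alternative is a server-side `autoSoftCommit` with a `maxTime` in the update handler configuration. This sketch only prepares the request with Python's standard library and does not send it; the URL, collection and document fields are hypothetical.

```python
import json
import urllib.request
from typing import Dict, List
from urllib.parse import urlencode


def build_update_request(base_url: str, collection: str, docs: List[Dict],
                         commit_within_ms: int = 1000) -> urllib.request.Request:
    """Prepare (without sending) a JSON update asking Solr to commit within commit_within_ms."""
    query = urlencode({"commitWithin": str(commit_within_ms)})
    return urllib.request.Request(
        f"{base_url}/solr/{collection}/update?{query}",
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_update_request(
    "http://localhost:8983", "products",
    [{"id": "42", "title": "new laptop"}],
)
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would actually send it to a running Solr
```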
https://solr.apache.org/guide/8_9/updatehandlers-in-solrconfig.html
https://lucidworks.com/post/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

*7. Easily scalable*
You have this covered:
https://solr.apache.org/guide/8_9/solrcloud.html

Good luck!
--------------------------
Alessandro Benedetti
Apache Lucene/Solr Committer
Director, R&D Software Engineer, Search Consultant

www.sease.io


On Fri, 13 Aug 2021 at 17:33, Jan Høydahl <jan....@cominvent.com> wrote:

> I know you are in the Solr forum here, but I'll take the chance of
> mentioning the new kid on the block wrt open source search engines, namely
> Vespa. Since your use case seems to be highly geared towards
> personalization, it may be worth checking it out as they seem to push
> Tensors and personalized results as key differentiator. It is not Lucene
> based and may be quite different from what you already know with ES and
> Solr, and to be honest I have never tested it, nor am I affiliated in any
> way. Here's the link: https://vespa.ai/
>
> Jan
>
> > On 13 Aug 2021, at 16:26, Albert Dfm <alberich...@gmail.com> wrote:
> >
> > For example, for relevance ranking the usual approach is to execute a
> > machine learned model, e.g. using xgboost, or lightgbm. Tensorflow and
> > pytorch are other frameworks to build machine learning models.
> > While xgboost and lightgbm are ensembles of decision trees, tensorflow and
> > pytorch are mainly related to neural networks.
> >
> > Elasticsearch allows to execute xgboost models for example for relevance
> > ranking.
> > The question could be applied similarly to SOLr: can we use pytorch or
> > tensorflow at relevance ranking phase?
> >
> > On Fri, Aug 13, 2021 at 4:18 PM Shawn Heisey <apa...@elyograg.org> wrote:
> >
> >> On 8/13/2021 7:59 AM, Albert Dfm wrote:
> >>> Regarding executing models (question number 4), let me explain this a bit
> >>> better:
> >>> Can SOLr run custom tensorflow/pytorch models? This is not a feature in
> >>> lucene, it is something on top of it.
> >>
> >> With that info, I am even less familiar with what you're doing than I
> >> was before. I have no idea what either of those things are. Google
> >> wasn't helpful ... I probably would have to spend a week or two
> >> researching to even have a minimal understanding. I was able to tell
> >> that it's probably related to machine learning, but that's all. I have
> >> zero experience in that arena.
> >>
> >> It's unlikely that Solr has any direct support for those software
> >> programs, but if they can build queries that Solr understands, you could
> >> probably get something going.
> >>
> >> Thanks,
> >> Shawn