Thanks a lot for the very detailed answers and for your time. I have a lot to read now! It's so good to have such impressive community support, thank you so much!!

On Mon, Aug 16, 2021 at 12:32 PM Alessandro Benedetti <a.benede...@sease.io> wrote:

> Hi Albert,
> on top of the very good answers already in the thread, in line:
>
> *1. Can we do text search and vector similarity?*
> Lucene can do vector similarity, and you can achieve the same with Solr,
> with some caveats.
> Direct and full support is still a work in progress; here are some
> resources for you:
>
> *London Information Retrieval Meetup*
> We discussed the topic a few months ago at the London Information
> Retrieval Meetup:
> https://www.slideshare.net/SeaseLtd/interactive-questions-and-answers-london-information-retrieval-meetup
> https://www.youtube.com/watch?v=BIILaSb4aRY&t=259s
>
> *Blogs*
> I started a series of blog posts on the topic, so far only the intro:
> https://sease.io/2021/07/artificial-intelligence-applied-to-search-introduction.html
> By the end of the summer I am planning to write the Lucene, Solr
> and Elasticsearch episode.
>
> *Training*
> We are also hosting a related training in October; I take the chance to
> link it in case you find it useful:
> https://sease.io/training/artificial-intelligence-in-search-training
>
> *2. Can we filter by metadata?*
> Yes, much as in Elasticsearch, with the query (scored) and the filter
> query (un-scored).
> It's a big topic though; take a look at the standard query parser to get
> an idea:
> https://solr.apache.org/guide/8_9/the-standard-query-parser.html
>
> *3. How about index/memory consumption? 1st tier needs around
> 4000M embeddings vector (128 fp32) + metadata stored in memory*
> There is no quick silver-bullet answer for this: you need to be much
> deeper into the project and then build a prototype and a benchmarking
> infrastructure that can give you the answers.
>
> *4. Can we execute models in the DB itself? (not outside Solr). We have
> per-user models, and we need a way of executing TensorFlow models on the
> database to prevent moving data outside of the DB*
> The closest you get is the Learning To Rank integration.
> Apache Solr supports linear models, tree-based models, and
> neural-network-based models.
> You need to train your model, export it in the supported JSON format and
> then use it:
> https://solr.apache.org/guide/8_9/learning-to-rank.html
> We have written many blog posts on the topic:
> https://sease.io/category/learning-to-rank
> https://sease.io/2016/10/apache-solr-learning-to-rank-better-part-4.html
> We also have a dedicated training:
> https://sease.io/training/learning-to-rank-training
>
> *5. Subsecond queries*
> You are generally well under a second, even when integrating complex
> Learning To Rank ranking models.
> The more complex your matching and ranking algorithm, the slower the
> query (but in general Apache Solr is super fast and you shouldn't have
> problems).
>
> *6. Real-time indexing (or near real-time) of new data*
> Since soft commits arrived many years ago, Apache Solr has been quite
> good at this.
> https://solr.apache.org/guide/8_9/updatehandlers-in-solrconfig.html
> https://lucidworks.com/post/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>
> *7. Easily scalable*
> You have this covered:
> https://solr.apache.org/guide/8_9/solrcloud.html
>
> Good Luck!
>
> --------------------------
> Alessandro Benedetti
> Apache Lucene/Solr Committer
> Director, R&D Software Engineer, Search Consultant
>
> www.sease.io
>
>
> On Fri, 13 Aug 2021 at 17:33, Jan Høydahl <jan....@cominvent.com> wrote:
>
> > I know you are in the Solr forum here, but I'll take the chance of
> > mentioning the new kid on the block wrt open source search engines,
> > namely Vespa. Since your use case seems to be highly geared towards
> > personalization, it may be worth checking it out, as they seem to push
> > Tensors and personalized results as a key differentiator. It is not
> > Lucene based and may be quite different from what you already know
> > with ES and Solr, and to be honest I have never tested it, nor am I
> > affiliated in any way. Here's the link: https://vespa.ai/
> >
> > Jan
> >
> > > On 13 Aug 2021 at 16:26, Albert Dfm <alberich...@gmail.com> wrote:
> > >
> > > For example, for relevance ranking the usual approach is to execute a
> > > machine-learned model, e.g. using xgboost or lightgbm. Tensorflow and
> > > pytorch are other frameworks to build machine learning models.
> > > While xgboost and lightgbm are ensembles of decision trees, tensorflow
> > > and pytorch are mainly related to neural networks.
> > >
> > > Elasticsearch allows executing xgboost models, for example for
> > > relevance ranking.
> > > The question could be applied similarly to Solr: can we use pytorch or
> > > tensorflow at the relevance ranking phase?
> > >
> > >
> > > On Fri, Aug 13, 2021 at 4:18 PM Shawn Heisey <apa...@elyograg.org>
> > > wrote:
> > >
> > >> On 8/13/2021 7:59 AM, Albert Dfm wrote:
> > >>> Regarding executing models (question number 4), let me explain this
> > >>> a bit better:
> > >>> Can Solr run custom tensorflow/pytorch models? This is not a feature
> > >>> in Lucene, it is something on top of it.
> > >>
> > >> With that info, I am even less familiar with what you're doing than I
> > >> was before. I have no idea what either of those things are. Google
> > >> wasn't helpful ... I probably would have to spend a week or two
> > >> researching to even have a minimal understanding. I was able to tell
> > >> that it's probably related to machine learning, but that's all. I
> > >> have zero experience in that arena.
> > >>
> > >> It's unlikely that Solr has any direct support for those software
> > >> programs, but if they can build queries that Solr understands, you
> > >> could probably get something going.
> > >>
> > >> Thanks,
> > >> Shawn
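
A note on point 2 in the quoted reply (scored query vs. un-scored filter query): below is a minimal sketch of how the two might be combined in one request URL. Only the `q` and `fq` request parameters and the `/select` handler are Solr's own; the host, the collection name `products`, and the field names `title`, `category` and `price` are hypothetical and only there for illustration. The sketch builds the URL and does not send it.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_select_url(base_url, collection, text_query, filters, rows=10):
    """Build a Solr /select URL with a scored query (q) and un-scored filters (fq)."""
    params = [("q", text_query), ("rows", str(rows))]
    # Each fq narrows the result set without contributing to the relevance score.
    params.extend(("fq", f) for f in filters)
    return f"{base_url.rstrip('/')}/{collection}/select?{urlencode(params)}"


url = build_select_url(
    "http://localhost:8983/solr",  # assumption: a local Solr instance
    "products",                    # hypothetical collection name
    "title:laptop",                # scored text query on a hypothetical field
    ["category:electronics", "price:[100 TO 500]"],  # hypothetical metadata filters
)
print(url)
```

Passing each metadata restriction as its own `fq` keeps it out of the scoring, which is the distinction the reply draws between the query and the filter query.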