Re: Considering SOLR as our new infra

Albert Dfm Tue, 17 Aug 2021 00:13:38 -0700

Thanks a lot for the very detailed answers and for your time.
I have a lot to read now!
It's so good to have such impressive community support, thank
you so much!!



On Mon, Aug 16, 2021 at 12:32 PM Alessandro Benedetti <a.benede...@sease.io>
wrote:

> Hi Albert,
> on top of the very good answers already in the thread, some comments inline:
>
> *1. Can we do text search and vector similarity?*
> Lucene can do vector similarity, and you can achieve the same with Solr, with
> some caveats.
> Direct and full support is still a work in progress. Here are some
> resources for you:
> *London Information Retrieval Meetup*
> We discussed the topic a few months ago at the London Information Retrieval
> Meetup:
>
> https://www.slideshare.net/SeaseLtd/interactive-questions-and-answers-london-information-retrieval-meetup
> https://www.youtube.com/watch?v=BIILaSb4aRY&t=259s
> *Blogs*
> I started a series of blogs on the topic, so far only the intro:
>
> https://sease.io/2021/07/artificial-intelligence-applied-to-search-introduction.html
> By the end of the summer I am planning to write the Lucene, Solr
> and Elasticsearch episode.
> *Training*
> We are also hosting a related training in October, I take the chance to
> link it in case you find it useful:
> https://sease.io/training/artificial-intelligence-in-search-training
>
> *2. Can we filter by metadata?*
> Yes, much like Elasticsearch, with a query (scored) and a filter
> query (un-scored).
> It's a big topic though; take a look at the standard query parser to get
> an idea:
> https://solr.apache.org/guide/8_9/the-standard-query-parser.html
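
To make the scored/un-scored split concrete, here is a minimal sketch of how the two parameters sit side by side in a request: `q` carries the scored text query and each `fq` carries an un-scored metadata filter. The host, the collection name `products` and the field names are hypothetical, and the snippet only builds the URL without contacting a server.

```python
from urllib.parse import urlencode


def build_select_url(base_url, collection, text_query, filters):
    """Build a Solr /select URL with one scored query and N un-scored filters."""
    # "q" contributes to the relevance score of each document.
    params = [("q", text_query)]
    # Each "fq" only restricts the result set; it does not affect scoring.
    params += [("fq", f) for f in filters]
    return f"{base_url}/{collection}/select?{urlencode(params)}"


# Hypothetical collection and fields, for illustration only.
url = build_select_url(
    "http://localhost:8983/solr",
    "products",
    "title:laptop",
    ["category:electronics", "in_stock:true"],
)
print(url)
```

Keeping metadata restrictions in `fq` rather than folding them into `q` also lets Solr cache each filter independently of the main query.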
>
>
> *3. How about index/memory consumption? 1st tier needs around
> 4000M embedding vectors (128 fp32) + metadata stored in memory*
> There is no quick silver-bullet answer for this: you need to be much deeper
> in the project, and then build a prototype and benchmarking infrastructure
> that can give you the answers.
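
As a starting point before any benchmark, the raw size of the vectors alone can be worked out with simple arithmetic. This sketch assumes "4000M" means 4,000 million vectors of 128 fp32 values (4 bytes each); it covers only the raw vector payload and says nothing about metadata, index structures or JVM overhead, which is exactly what a prototype would have to measure.

```python
# Assumption: "4000M" is read as 4,000 million vectors.
NUM_VECTORS = 4_000_000_000
DIMENSIONS = 128
BYTES_PER_FP32 = 4

# Raw payload of the vectors only, with no index or metadata overhead.
raw_bytes = NUM_VECTORS * DIMENSIONS * BYTES_PER_FP32

print(f"{raw_bytes} bytes")
print(f"{raw_bytes / 10**12:.3f} TB (decimal)")
print(f"{raw_bytes / 2**40:.3f} TiB (binary)")
```

That is roughly 2 TB of raw floats, so keeping the whole first tier in memory implies spreading it across many nodes.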
>
>
>
> *4. Can we execute models in the DB itself? (not outside SOLr). We
> have per-user models, and we need a way of executing TensorFlow models on
> the database to prevent moving data outside of the DB*
> The closest you can get is the Learning To Rank integration.
> Apache Solr supports linear models, tree-based models, and neural network
> based models.
> You need to train your model, export it in the supported JSON format and
> then use it:
> https://solr.apache.org/guide/8_9/learning-to-rank.html
> We have written many blogs on the topic:
> https://sease.io/category/learning-to-rank
> https://sease.io/2016/10/apache-solr-learning-to-rank-better-part-4.html
> We also have a dedicated training:
> https://sease.io/training/learning-to-rank-training
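
To give an idea of what "export it in the supported JSON format" can look like, here is a minimal sketch for the simplest case, a linear model. The model name, feature names and weights are hypothetical; the features themselves would have to be defined in the Solr feature store separately, and a TensorFlow or PyTorch model would need to be converted to one of the supported model classes first.

```python
import json


def linear_ltr_model(name, weights):
    """Describe a linear Learning To Rank model as a Solr-style JSON document."""
    return {
        "class": "org.apache.solr.ltr.model.LinearModel",
        "name": name,
        # Every feature listed here must already exist in the feature store.
        "features": [{"name": feature} for feature in weights],
        "params": {"weights": dict(weights)},
    }


# Hypothetical model name, features and weights, for illustration only.
model = linear_ltr_model(
    "perUserLinearModel",
    {"originalScore": 0.7, "recencyBoost": 0.3},
)
model_json = json.dumps(model, indent=2)

# Rerank parameter that asks Solr to re-score the top 100 hits with the model.
rerank_param = "{!ltr model=%s reRankDocs=100}" % model["name"]

print(model_json)
print(rerank_param)
```

The JSON is what gets uploaded to the model store, and `rerank_param` is passed as the `rq` parameter at query time.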
>
> *5. Subsecond queries*
> Queries generally run well under a second, even when integrating complex
> Learning To Rank models.
> The more complex your matching and ranking algorithm, the slower the query
> (but in general Apache Solr is super fast and you shouldn't have problems).
>
> *6. Real-time indexing (or near real-time) of new data*
> Since soft commits were introduced (many years ago), Apache Solr has been
> quite good at this.
> https://solr.apache.org/guide/8_9/updatehandlers-in-solrconfig.html
>
> https://lucidworks.com/post/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
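
One way to lean on this from the client side is the `commitWithin` parameter on update requests, which asks Solr to make the documents searchable within a given number of milliseconds. The sketch below only builds the request object and does not send it; the host, the collection name `events` and the document fields are hypothetical, and the right interval depends on your own latency and throughput needs.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_update_request(base_url, collection, docs, commit_within_ms):
    """Build (but do not send) a JSON update request with a commitWithin hint."""
    query = urlencode({"commitWithin": commit_within_ms})
    url = f"{base_url}/{collection}/update?{query}"
    body = json.dumps(docs).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical collection and document, for illustration only.
request = build_update_request(
    "http://localhost:8983/solr",
    "events",
    [{"id": "doc-1", "title": "hello"}],
    commit_within_ms=1000,
)
print(request.get_method(), request.full_url)
```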
>
> *7. Easily scalable*
> You have this covered:
> https://solr.apache.org/guide/8_9/solrcloud.html
>
> Good Luck!
>
> --------------------------
> Alessandro Benedetti
> Apache Lucene/Solr Committer
> Director, R&D Software Engineer, Search Consultant
>
> www.sease.io
>
>
> On Fri, 13 Aug 2021 at 17:33, Jan Høydahl <jan....@cominvent.com> wrote:
>
> > I know you are in the Solr forum here, but I'll take the chance of
> > mentioning the new kid on the block wrt open source search engines, namely
> > Vespa. Since your use case seems to be highly geared towards
> > personalization, it may be worth checking it out as they seem to push
> > Tensors and personalized results as a key differentiator. It is not Lucene
> > based and may be quite different from what you already know with ES and
> > Solr, and to be honest I have never tested it, nor am I affiliated in any
> > way. Here's the link: https://vespa.ai/
> >
> > Jan
> >
> > > On 13 Aug 2021, at 16:26, Albert Dfm <alberich...@gmail.com> wrote:
> > >
> > > For example, for relevance ranking the usual approach is to execute a
> > > machine-learned model, e.g. using xgboost or lightgbm. Tensorflow and
> > > pytorch are other frameworks for building machine learning models.
> > > While xgboost and lightgbm are ensembles of decision trees, tensorflow
> > > and pytorch are mainly related to neural networks.
> > >
> > > Elasticsearch allows executing xgboost models, for example, for
> > > relevance ranking.
> > > The question could be applied similarly to SOLr: can we use pytorch or
> > > tensorflow at the relevance ranking phase?
> > >
> > >
> > >
> > > On Fri, Aug 13, 2021 at 4:18 PM Shawn Heisey <apa...@elyograg.org> wrote:
> > >
> > >> On 8/13/2021 7:59 AM, Albert Dfm wrote:
> > >>> Regarding executing models (question number 4), let me explain this a
> > >>> bit better:
> > >>> Can SOLr run custom tensorflow/pytorch models? This is not a feature
> > >>> in lucene, it is something on top of it.
> > >>
> > >> With that info, I am even less familiar with what you're doing than I
> > >> was before.  I have no idea what either of those things are.  Google
> > >> wasn't helpful ... I probably would have to spend a week or two
> > >> researching to even have a minimal understanding.  I was able to tell
> > >> that it's probably related to machine learning, but that's all.  I
> > >> have zero experience in that arena.
> > >>
> > >> It's unlikely that Solr has any direct support for those software
> > >> programs, but if they can build queries that Solr understands, you
> > >> could probably get something going.
> > >>
> > >> Thanks,
> > >> Shawn
> > >>
> > >>
> >
> >
>
