Considering SOLR as our new infra

2021-08-13 Thread Albert Dfm
Hello List! I'm new to the list, and this is my first message. We got to know about Solr, and we are very excited about using it to replace our current Elasticsearch infra. Currently, our main issue is regarding data and model size running on each machine. *Our setup:* 1. We use the following search arch:

UnifiedHighlighter BreakIterator

2021-08-13 Thread Clive Lewis
Hello! *Problem:* I have a multivalued field that stores paragraphs of text (1 paragraph = 1 value). The position gap between values is 5000. Right now I use the FastVectorHighlighter, and it works as expected for queries like "Big Bang Theory"~5000 (because of the 5000 slop it searches only inside of one v
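To make the slop/gap reasoning above concrete, here is a simplified Python model (not Lucene's code) of how positions might be laid out in a multivalued field and how a sloppy phrase could match. It assumes Lucene-style behavior: the first token sits at position 0, each following token adds 1, `gap` extra positions are added between values, and a sloppy phrase matches when the spread of (term position minus its offset in the phrase) is at most the slop. Tokenization is naive whitespace splitting, and the field values are made up.

```python
from itertools import product


def index_positions(values, gap):
    """Assign token positions to a multivalued field, adding `gap`
    extra positions between consecutive values (simplified model)."""
    positions = {}
    pos = -1  # the first token of the first value lands on position 0
    for i, value in enumerate(values):
        if i > 0:
            pos += gap  # positionIncrementGap between values
        for term in value.lower().split():  # naive whitespace tokenizer
            pos += 1
            positions.setdefault(term, []).append(pos)
    return positions


def sloppy_phrase_match(positions, phrase, slop):
    """Brute-force check: is there an assignment of positions to the phrase
    terms whose spread (after subtracting each term's offset) fits in slop?"""
    terms = phrase.lower().split()
    candidates = [positions.get(t, []) for t in terms]
    if not all(candidates):
        return False  # some phrase term does not occur at all
    for combo in product(*candidates):
        if len(set(combo)) < len(combo):
            continue  # a single token cannot satisfy two phrase terms
        shifted = [p - offset for offset, p in enumerate(combo)]
        if max(shifted) - min(shifted) <= slop:
            return True
    return False


# "big" ends the first value, "bang theory" starts the second one.
doc = index_positions(["a big", "bang theory b"], gap=5000)
print(sloppy_phrase_match(doc, "Big Bang Theory", 5000))  # True
print(sloppy_phrase_match(doc, "Big Bang Theory", 4999))  # False
```

Under these assumptions, a phrase that straddles the boundary between two values needs a slop of at least the gap, so keeping the slop strictly below positionIncrementGap (for example ~4999 with a gap of 5000) is what confines matches to a single value in this model.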

Re: Delete using Streaming Expressions

2021-08-13 Thread Jan Høydahl
Please see https://solr.apache.org/guide/8_9/stream-decorator-reference.html#delete Jan > On 12 Aug 2021 at 21:05, mtn search wrote: > > Hello, > > I have heard that there can be issues when using the Solr delete by query > approach ( *:*) for large sets of > documents. That it may block othe
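For reference, a minimal Python sketch of the delete decorator from the linked page: a `search` over the `/export` handler streams the matching ids into `delete`, which removes them in batches. The collection name `collection1`, the query `old_data:true` and the localhost URL are placeholders; `/export` assumes the `fl` and `sort` fields (here `id`) have docValues, and a query containing double quotes would need escaping. The request is only built, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_delete_expr(collection, query, batch_size=500):
    """Wrap an /export search in the delete() decorator so matching ids are
    streamed out and deleted in batches (placeholder names)."""
    return (
        f"delete({collection}, batchSize={batch_size}, "
        f'search({collection}, q="{query}", fl="id", sort="id asc", qt="/export"))'
    )


def build_stream_request(base_url, collection, expr):
    """Prepare (but do not send) a form-encoded POST to the /stream handler."""
    body = urlencode({"expr": expr}).encode("utf-8")
    return Request(
        f"{base_url}/solr/{collection}/stream",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


expr = build_delete_expr("collection1", "old_data:true")
req = build_stream_request("http://localhost:8983", "collection1", expr)
print(req.get_method(), req.full_url)
print(expr)
```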

Re: Considering SOLR as our new infra

2021-08-13 Thread Shawn Heisey
On 8/13/2021 2:25 AM, Albert Dfm wrote: We got to know about Solr, and we are very excited about using it to replace our current Elasticsearch infra. Currently, our main issue is regarding data and model size running on each machine. *Our setup:* 1. We use the following search arch: 1st tier, the fast

Re: ConcurrentUpdateSolrClient stall prevention bug in Solr 8.4+

2021-08-13 Thread Reej M
Hi Team, Have any of you found a solution for the "Task Queue processing has stalled for 20077 ms with 0 remaining elements to process" error? We are using Solr 8.8.2, and we randomly get this error while indexing. Do we need to tune solr.autoCommit.maxTime? For a few cores we have it set to 1
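One way to change the hard autoCommit interval without editing solrconfig.xml is the Config API's `set-property` command; properties set this way are stored in the config overlay and take precedence over solrconfig.xml. The sketch below only builds that request; the core name `mycore` and the 15000 ms value are illustrative, and nothing in this thread establishes that autoCommit tuning cures the stall warning.

```python
import json
from urllib.request import Request


def build_autocommit_request(base_url, collection, max_time_ms, open_searcher=False):
    """Prepare (but do not send) a Config API call that overrides the hard
    autoCommit settings for one core/collection (illustrative values)."""
    command = {
        "set-property": {
            "updateHandler.autoCommit.maxTime": max_time_ms,
            "updateHandler.autoCommit.openSearcher": open_searcher,
        }
    }
    return Request(
        f"{base_url}/solr/{collection}/config",
        data=json.dumps(command).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


req = build_autocommit_request("http://localhost:8983", "mycore", 15000)
print(req.full_url)
print(req.data.decode("utf-8"))
```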

Re: ConcurrentUpdateSolrClient stall prevention bug in Solr 8.4+

2021-08-13 Thread Shawn Heisey
On 8/13/2021 7:36 AM, Reej M wrote: Have any of you found a solution for the "Task Queue processing has stalled for 20077 ms with 0 remaining elements to process" error? We are using Solr 8.8.2, and we randomly get this error while indexing. Do we need to tune solr.autoCommit.maxTime? F

Re: Considering SOLR as our new infra

2021-08-13 Thread Albert Dfm
Thanks a lot, Shawn, for the very detailed reply; very informative and much appreciated! I will check the link about performance problems. Regarding executing models (question number 4), let me explain this a bit better: can Solr run custom TensorFlow/PyTorch models? This is not a feature in Lucene; it is something on top of it.

Re: Considering SOLR as our new infra

2021-08-13 Thread Shawn Heisey
On 8/13/2021 7:59 AM, Albert Dfm wrote: Regarding executing models (question number 4), let me explain this a bit better: can Solr run custom TensorFlow/PyTorch models? This is not a feature in Lucene; it is something on top of it. With that info, I am even less familiar with what you're doing

Re: Considering SOLR as our new infra

2021-08-13 Thread Albert Dfm
For example, for relevance ranking the usual approach is to execute a machine-learned model, e.g. one built with XGBoost or LightGBM. TensorFlow and PyTorch are other frameworks for building machine learning models. While XGBoost and LightGBM build ensembles of decision trees, TensorFlow and PyTorch are mainly
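XGBoost and LightGBM models are additive ensembles of regression trees: each tree routes a document's feature vector to a leaf value, and the weighted leaf values are summed. Below is a minimal Python sketch of that scoring, with the tree layout modeled on the nested `feature`/`threshold`/`left`/`right` JSON that Solr's LTR `MultipleAdditiveTreesModel` accepts. The feature names, thresholds and weights are made up, and sending a document left when its feature is less than or equal to the threshold is an assumption of this sketch.

```python
def score_tree(node, features):
    """Walk one regression tree: leaves carry a value, inner nodes a split."""
    if "value" in node:
        return float(node["value"])
    # Assumption of this sketch: go left when the feature is <= threshold.
    if features[node["feature"]] <= float(node["threshold"]):
        return score_tree(node["left"], features)
    return score_tree(node["right"], features)


def score_ensemble(trees, features):
    """Additive ensemble: weighted sum of the individual tree outputs."""
    return sum(float(t["weight"]) * score_tree(t["root"], features) for t in trees)


# Made-up ensemble of two trees over two hypothetical features.
trees = [
    {
        "weight": "1",
        "root": {
            "feature": "userTextTitleMatch",
            "threshold": "0.5",
            "left": {"value": "-100"},
            "right": {
                "feature": "originalScore",
                "threshold": "10.0",
                "left": {"value": "50"},
                "right": {"value": "75"},
            },
        },
    },
    {
        "weight": "2",
        "root": {
            "feature": "originalScore",
            "threshold": "5.0",
            "left": {"value": "0"},
            "right": {"value": "10"},
        },
    },
]

print(score_ensemble(trees, {"userTextTitleMatch": 1.0, "originalScore": 12.0}))  # 95.0
```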

Re: Considering SOLR as our new infra

2021-08-13 Thread Shawn Heisey
On 8/13/2021 8:26 AM, Albert Dfm wrote: The question could be applied similarly to Solr: can we use PyTorch or TensorFlow in the relevance ranking phase? I have no idea. I have never touched that functionality. Those terms are not mentioned in the docs: https://solr.apache.org/guide/8_9/learni
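For the re-ranking phase, Solr's Learning To Rank module is invoked through the `rq` parameter with the `ltr` query parser. A minimal Python sketch that builds such /select parameters, assuming the LTR contrib is enabled and a model named `myModel` (hypothetical) has already been uploaded; the collection `techproducts`, the query `ipod` and the re-rank depth of 100 are placeholders.

```python
from urllib.parse import urlencode


def build_ltr_params(user_query, model_name, rerank_docs=100):
    """/select parameters: a normal first-pass query, then an LTR re-rank
    of the top `rerank_docs` hits with the named (hypothetical) model."""
    return {
        "q": user_query,
        "rq": f"{{!ltr model={model_name} reRankDocs={rerank_docs}}}",
        "fl": "id,score,[features]",  # also return the feature values used
    }


params = build_ltr_params("ipod", "myModel")
print("http://localhost:8983/solr/techproducts/select?" + urlencode(params))
```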

Re: Considering SOLR as our new infra

2021-08-13 Thread Walter Underwood
PyTorch and TensorFlow are both written in Python, and both Solr and Elasticsearch are written in Java, so that seems like an obvious “no” for executing them internally. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Aug 13, 2021, at 7:26 AM, Albert

Re: Considering SOLR as our new infra

2021-08-13 Thread Jörn Franke
You probably need to write a plugin for this; both can also be used from within Java. Some of the models in e.g. TensorFlow Ranking, such as SVM, may be directly usable in Solr without a plugin. > On 13.08.2021 at 16:33, Shawn Heisey wrote: > > On 8/13/2021 8:26 AM, Albert Dfm wrote: >> The ques

Re: Considering SOLR as our new infra

2021-08-13 Thread Jörn Franke
TensorFlow and PyTorch have Java bindings. However, this is not really needed either: if the trained model weights are exported to JSON, which I see as possible at least for TensorFlow Ranking, then they can be used out of the box, e.g. SVM and lambda exist in both TensorFlow Ranking and Solr. XGBoost cou
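Following the JSON-export idea above, a minimal Python sketch that wraps exported linear weights (for example from a linear SVM) in the layout of Solr LTR's `LinearModel`, plus the dot product such a model computes. The model name and feature names are hypothetical, the features must already exist in Solr's feature store under the same names, and a missing feature is treated as 0.0 here only for simplicity.

```python
import json


def to_solr_linear_model(name, weights):
    """Wrap exported linear weights (feature -> weight) in the JSON layout
    of Solr LTR's LinearModel (hypothetical model/feature names)."""
    return {
        "class": "org.apache.solr.ltr.model.LinearModel",
        "name": name,
        "features": [{"name": feature} for feature in weights],
        "params": {"weights": dict(weights)},
    }


def linear_score(weights, features):
    """What a linear model computes: the dot product of weights and features.
    Missing features count as 0.0 in this sketch."""
    return sum(w * features.get(name, 0.0) for name, w in weights.items())


weights = {"userTextTitleMatch": 1.0, "originalScore": 0.5, "isBook": 0.1}
model = to_solr_linear_model("mySvmModel", weights)
print(json.dumps(model, indent=2))
print(linear_score(weights, {"userTextTitleMatch": 1.0, "originalScore": 4.0}))  # 3.0
```

The resulting JSON would then be uploaded to the collection's model store (`/solr/<collection>/schema/model-store`). Whether a given TensorFlow Ranking export maps onto this layout directly is not verified here.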

Re: Considering SOLR as our new infra

2021-08-13 Thread Stephen Green
That said, you could export the models to the ONNX format and then use the ONNX Runtime Java API to run them in Java. On Fri, Aug 13, 2021 at 11:11 AM Walter Underwood wrote: > pytorch and tensorflow are both written in Python and both Solr and > Elasticsearch > are written in Java, so t

Re: Considering SOLR as our new infra

2021-08-13 Thread Jan Høydahl
I know you are in the Solr forum here, but I'll take the chance to mention the new kid on the block among open source search engines, namely Vespa. Since your use case seems highly geared towards personalization, it may be worth checking out, as they seem to push tensors and personalize

Re: Delete using Streaming Expressions

2021-08-13 Thread mtn search
Thanks Jan, exactly what I was looking for. Matthew On Fri, Aug 13, 2021 at 3:35 AM Jan Høydahl wrote: > Please see > https://solr.apache.org/guide/8_9/stream-decorator-reference.html#delete > > Jan > > > On 12 Aug 2021 at 21:05, mtn search wrote: > > > > Hello, > > > > I have heard that there

Re: Time Routed Alias

2021-08-13 Thread Matt Kuiper
Thanks David, this test link is helpful. @David @Gus: from your viewpoint, do you see TRAs as an accepted/proven technique within SolrCloud? My small POC works great. I would like to hear whether others are using TRAs successfully in production deployments at scale. Thanks, Matt On Wed, Aug 11, 2021 a
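For anyone trying a similar POC, a Time Routed Alias is created with the Collections API's CREATEALIAS action and its `router.*` and `create-collection.*` parameters. A minimal Python sketch that builds the request URL; the alias name `timedata`, the date field `evt_dt`, the config set `myConfig`, the daily interval and the localhost URL are hypothetical choices, and `router.field` must be a date field.

```python
from urllib.parse import urlencode


def build_tra_params(alias, time_field, config_name, start="NOW/DAY",
                     interval="+1DAY", num_shards=1):
    """Collections API parameters for a Time Routed Alias (hypothetical names)."""
    return {
        "action": "CREATEALIAS",
        "name": alias,
        "router.name": "time",
        "router.field": time_field,  # date field that decides the target collection
        "router.start": start,
        "router.interval": interval,  # one new collection per interval
        "create-collection.collection.configName": config_name,
        "create-collection.numShards": num_shards,
    }


params = build_tra_params("timedata", "evt_dt", "myConfig")
print("http://localhost:8983/solr/admin/collections?" + urlencode(params))
```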