Dedup across shards

2024-09-24 Thread Dan Rosher
Hello Everyone, We have 3 shards, with skus linked to merchants. We don't currently, but could co-locate skus for a specific merchant on the same shard with document routing, and then dedup similar skus for the same merchant. But similar skus that should be deduped can appear for different merchant

Email alerts with streaming expressions

2021-09-06 Thread Dan Rosher
Hi, I was wondering if anyone had tried email alerts with streaming expressions, and what their experience was if attempting this with say 12 million emails / day? Traditionally this might have been done with a database cursor iterator daily. I was thinking if something like the following pseudoc

Re: Email alerts with streaming expressions

2021-09-07 Thread Dan Rosher
he.org/jira/browse/LUCENE-8766, which was originally > > Luwak - at my previous company Flax we helped build several large-scale > > monitoring systems with this https://github.com/flaxsearch/luwak . It's > > not officially surfaced in Solr yet although my colleague Scott Stults

Re: Email alerts with streaming expressions

2021-09-08 Thread Dan Rosher
r layer and then the code that > > uses this to generate alerts - and Solcolator and > > https://github.com/o19s/solr-monitor are two examples of how to do the > > first part, which you can build on. The facility to do a reverse search > > is not built into Solr - yet, unlike E

NewRelic useragent and Solr memory leaks

2021-10-13 Thread Dan Rosher
Hi. We use newrelic to gather stats via their solr-jmx instrumentation implementation. I've noticed a memory leak with their implementation that allows metrics to be registered, but not deregistered (as MetricsManager does in Solr). I was wondering if anyone else uses newrelic and has noticed perf

Re: NewRelic useragent and Solr memory leaks

2021-10-13 Thread Dan Rosher
Is there a public link to the ticket? > > > > On 10/13/21, 11:39 AM, "Dan Rosher" wrote: > > > > Hi. > > > > We use newrelic to gather stats via their solr-jmx instrumentation > > implementation. I've noticed a me

Multi polygon spatial search - isochrones

2021-11-12 Thread Dan Rosher
Hi, We're looking at implementing commutability / reachability search for our users with isochrones, e.g. https://en.wikipedia.org/wiki/Isochrone_map. There are a number of open and commercial services which we are looking at. Some of these, in particular with public transport, return multiple poly

Searcher and autoSoftCommits + softCommit

2021-11-23 Thread Dan Rosher
Hi, It seems to me that <openSearcher>false</openSearcher> is not being honored, or does a softCommit always happen on an autoCommit? Cache reloads seem to coincide with solr.autoCommit.maxTime. We have the following solrconfig snippet: <maxTime>${solr.autoCommit.maxTime:15000}</maxTime> <openSearcher>false</openSearcher> ${solr.autoSo

Re: Searcher and autoSoftCommits + softCommit

2021-11-24 Thread Dan Rosher
Hi Shawn, You were spot on, commitWithin was being set on each commit. I was able to verify by temporarily turning on debug logging for DirectUpdateHandler2. Thanks for your help. Kind regards, Dan On Tue, 23 Nov 2021 at 16:49, Shawn Heisey wrote: > On 11/23/21 8:31 AM, Dan Rosher wr

Re: Searcher and autoSoftCommits + softCommit

2021-11-26 Thread Dan Rosher
Hi Andy, in the Solr UI you can go to /solr/#/~logging/level, then scroll down to -update-DirectUpdateHandler2. Click on 'null' or whatever the current setting is and select DEBUG. Do remember to turn it off if you're on a live system though, back to what the setting was before, otherwise you might fill up

De-Duplication using DocBasedVersionConstraintsProcessorFactory

2021-11-29 Thread Dan Rosher
Hi, We have documents from multiple sources, which might have duplicates from different sources. We might identify a duplicate document which shares say md5(title,short_desc,location), although a more up to date doc might come AFTER an older one (order not guaranteed) added to solr. One thought

Re: De-Duplication using DocBasedVersionConstraintsProcessorFactory

2021-11-29 Thread Dan Rosher
example id=97d88afe14f66e8d7f54986c3e8a and version=1638206899 with schema.xml and solrconfig.xml is the same as before. Cheers, Dan On Mon, 29 Nov 2021 at 17:36, Dan Rosher wrote: > Hi, > > We have documents from multiple sources, which might have duplicates from > different

commitWithin UpdateProcessor for existing docs

2021-12-14 Thread Dan Rosher
Hi, We have a requirement to update existing docs live within say 30s. New docs updated depending on solrconfig autoSoftCommit. As our dev team are finding this difficult to implement within our middleware, I was thinking of writing an update processor as a 'post-processor' to add commitWithin (u

Prometheus solr 7.2.1

2022-02-23 Thread Dan Rosher
Hi, At our organisation we're still on solr 7.2.1. We'd like to use prometheus, just wondering if anyone had knowledge that the 7.3 prometheus contrib will work with solr 7.2.1? Also we're thinking of having it work on a separate docker container to the solr docker containers, connecting to the

solr relatedness weirdness on json facet function

2022-04-05 Thread Dan Rosher
Hi, If I run a facet on relatedness on a qf field (examples below) which has stopword removal, I get stopwords in the json facet? Anyone know why, and if this can be avoided? Many thanks, Dan = Details Solr 7.7.2 http://localhost:8983/solr/collection/select? q=my query& defTyp

Re: solr relatedness weirdness on json facet function

2022-04-06 Thread Dan Rosher
hing that > should determine that is the field and fieldType config for the terms facet > "field" property -- i.e., "description". Can you share that information, > including index-time analysis chain config? > > On Tue, Apr 5, 2022 at 8:

Re: solr relatedness weirdness on json facet function

2022-04-07 Thread Dan Rosher
probably best to avoid unless you have a specific need to do this (which > you may well indeed have!). > > Also, index-time WordDelimiterGraphFilter configured to both "split" and > "catenate" tokens can yield subtly strange results in phrase queries, if > that ma

Partial updates with Update Stream Decorator

2022-06-29 Thread Dan Rosher
Hi, Is it possible to do partial/atomic or in-place updates with Update Streaming expression Decorator? The following simply overwrites. update(collection1, select( search(collection1, q=*:*, qt="/export", fl="id,a_s,a_i,a_f,s_multi,i

Re: Partial updates with Update Stream Decorator

2022-07-06 Thread Dan Rosher
: "2644126" }, { "cand_id": { "add-distinct": 7408 }, "id": "2658316" }, ... Cheers, Dan On Wed, 29 Jun 2022 at 15:27, Dan Rosher wrote: > Hi, > > Is it possible to do partial/atomic or in-place updates with Update > Streaming ex

Re: Partial updates with Update Stream Decorator

2022-07-07 Thread Dan Rosher
2:12, Joel Bernstein wrote: > That would be quite tricky to create with existing functions. Did you find > a way to inject a tuple into a field? > > Joel Bernstein > http://joelsolr.blogspot.com/ > > > On Wed, Jul 6, 2022 at 12:10 PM Dan Rosher wrote: > > > Answering my

Re: Partial updates with Update Stream Decorator

2022-07-12 Thread Dan Rosher
I've just added SOLR-16287 if you want to have a look and try it out. On Thu, 7 Jul 2022 at 13:18, Eric Pugh wrote: > Something worth sharing? > > > On Jul 7, 2022, at 3:59 AM, Dan Rosher wrote: > > > > No unfortunately I had to make my own streaming function to

Re: ExternalFileField2, massively scalable external file fields

2022-07-28 Thread Dan Rosher
Out of interest, did you also look into in-place updates, so not having to re-index the whole document, just one field, or were the conditions too restrictive? On Thu, 28 Jul 2022 at 14:38, Gael Jourdan-Weil < gael.jourdan-w...@kelkoogroup.com> wrote: > Definitely something that can benefit the c

EdgeNGram question and query parsing

2024-09-03 Thread Dan Rosher
Hi All, I have an EdgeNGram question. If I have a ft like so > > > words="lang/stopwords_en.txt" /> > > maxGramSize="20" /> > > > > And a query like this: q={!edismax+qf=name_ngram+q.op=AND}baseball+bat with debug on I get:

Re: Logging Client IP

2025-01-07 Thread Dan Rosher
Looks like the jetty request log contains an IP address so no need for subclassing, just logging twice though. On Tue, 7 Jan 2025 at 10:33, Dan Rosher wrote: > Hi All, > > Is there a standard way to add the client IP address to the Solr log > output. > > I recall subclassing

Logging Client IP

2025-01-07 Thread Dan Rosher
Hi All, Is there a standard way to add the client IP address to the Solr log output? I recall subclassing SolrDispatchFilter before and placing the IP address into the MDC hash, but was wondering if there is now a built-in process? Cheers Dan

Re: How add custom providerClass for CurrencyFieldType with package manager

2025-06-18 Thread Dan Rosher
returned, and then pkg:class is not a valid class. The other way is to use in solrconfig instead of package manager and can then use existing CurrencyFieldType On Mon, 16 Jun 2025 at 16:37, Dan Rosher wrote: > Hi All, Should package manager enable a custom providerClass for > CurrencyFie

How add custom providerClass for CurrencyFieldType with package manager

2025-06-16 Thread Dan Rosher
Hi All, Should package manager enable a custom providerClass for CurrencyFieldType? I've added the custom jar with package manager like so: # upload jar with sig curl --data-binary @./lib/solr-mypkg-1.0-SNAPSHOT.jar -X PUT 'http://localhost:8983/api/cluster/files/mypkg/1.0/mypkg.jar?sig=...' # reg