Thanks Eric, Charlie and Yuval for all the feedback and suggestions.

Eric: Yes, I thought the monitoring might be a bit of a pain, especially with
millions of them. I'll have to check out the topic code, but I wondered if
I could look at the checkpoint collections for uniqueIds that haven't been
updated for a 'while', which might suggest the daemon had stopped or died,
rather than checking each daemon individually?
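A minimal sketch of that check, assuming each checkpoint document carries a last-updated timestamp (stock topic checkpoints may not store one, so it might need to be added). StaleCheckpoints and findStale are hypothetical names, not Solr APIs:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper: given when each alert's checkpoint was last written,
 * report the alert ids whose daemon may have stopped.
 * Assumes a per-checkpoint "last updated" timestamp is available.
 */
final class StaleCheckpoints {

    private StaleCheckpoints() {}

    /**
     * @param lastUpdated checkpoint id -> time its checkpoint document was last written
     * @param now         the current time
     * @param maxAge      how long a checkpoint may go unwritten before it counts as stale
     * @return stale checkpoint ids, sorted for stable output
     */
    static List<String> findStale(Map<String, Instant> lastUpdated, Instant now, Duration maxAge) {
        Instant cutoff = now.minus(maxAge);
        List<String> stale = new ArrayList<>();
        for (Map.Entry<String, Instant> e : lastUpdated.entrySet()) {
            // Anything last written before the cutoff is a candidate "dead" daemon.
            if (e.getValue().isBefore(cutoff)) {
                stale.add(e.getKey());
            }
        }
        Collections.sort(stale);
        return stale;
    }
}
```

For example, with a 10 minute limit, checkpoints last written one and two hours ago would be flagged, while one written 30 seconds ago would not. This only works if each daemon rewrites its checkpoint on every run, even when nothing new matched; otherwise a quiet alert looks the same as a dead daemon.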

I was also wondering whether it's possible, or would be a useful enhancement,
to look at the replica index version (as opposed to _version_) in the topic
streaming expression, so that queries are skipped when the replica index
version is the same as what we might store in the checkpoint collection? For
collections that update infrequently I think this might be useful.
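Roughly, the skip could look like the sketch below, assuming the current index version of each shard can be read before a run and stored alongside the checkpoint. VersionGate and its methods are hypothetical; nothing like this exists in the topic expression today:

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: remember the index version seen per shard at an alert's
 * last run, and skip re-running its topic query when nothing has changed since.
 */
final class VersionGate {

    // alert id -> (shard name -> index version observed at the last run)
    private final Map<String, Map<String, Long>> lastSeen = new HashMap<>();

    /** True if this alert has never run, or any shard's version differs from the recorded one. */
    boolean shouldRun(String alertId, Map<String, Long> currentVersions) {
        Map<String, Long> previous = lastSeen.get(alertId);
        return previous == null || !previous.equals(currentVersions);
    }

    /** Record the versions the alert was run against; call only after a successful run. */
    void recordRun(String alertId, Map<String, Long> currentVersions) {
        lastSeen.put(alertId, new HashMap<>(currentVersions));
    }
}
```

Comparing per shard means a change in any shard triggers a run. The gate only tells whether the index changed at all, not whether the alert's query has new matches, so it mostly pays off on the infrequently updated collections mentioned above.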

Charlie: It was for email alerts, so a user stores a query for collection
docs to match against, and then the system emails matches to the user. Do
you think solr-monitor can be used for this purpose?
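Whatever produces the matches (topic daemons, solr-monitor or an update processor), 12 million alerts a day probably means batching matches into one email per user. A small sketch, assuming each match arrives as a hypothetical (alertId, docId) pair and that a lookup from alert to owner's email address exists; AlertDigests and Match are illustrative names only:

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical sketch: turn raw (alert, document) matches into one digest per user. */
final class AlertDigests {

    private AlertDigests() {}

    /** One match emitted by whatever runs the stored queries. */
    static final class Match {
        final String alertId;
        final String docId;

        Match(String alertId, String docId) {
            this.alertId = alertId;
            this.docId = docId;
        }
    }

    /**
     * @param owners  alert id -> email address of the user who saved that query
     * @param matches matches in arrival order
     * @return email address -> distinct matching doc ids, in first-seen order
     */
    static Map<String, Set<String>> group(Map<String, String> owners, List<Match> matches) {
        Map<String, Set<String>> digests = new TreeMap<>();
        for (Match m : matches) {
            String email = owners.get(m.alertId);
            if (email == null) {
                continue; // alert was deleted after the match was produced
            }
            // LinkedHashSet drops a doc matched by several of the same user's alerts.
            digests.computeIfAbsent(email, k -> new LinkedHashSet<>()).add(m.docId);
        }
        return digests;
    }
}
```

A document matched by two of the same user's alerts then appears only once in that user's email.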

Yuval: I like the idea of using the UpdateProcessor; at least there's no
need for daemons or for monitoring them. But would this scale to millions
of email queries?
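Whether this scales to millions of stored queries largely depends on not evaluating every query against every incoming document. Lucene Monitor (formerly Luwak) tackles that with a presearcher that indexes the stored queries by their terms and only runs queries that share a term with the document. The toy sketch below illustrates that candidate-filtering idea under a deliberately simplified, hypothetical query model (a query is a set of terms that must all appear); it is not the Lucene Monitor API:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Toy reverse index of stored queries. Each query is simplified to a set of
 * required terms (hypothetical model); no update or removal support.
 */
final class QueryIndex {

    private final Map<String, Set<String>> queries = new HashMap<>();       // query id -> required terms
    private final Map<String, Set<String>> queriesByTerm = new HashMap<>(); // term -> query ids

    void add(String queryId, Set<String> requiredTerms) {
        if (requiredTerms.isEmpty()) {
            throw new IllegalArgumentException("query must have at least one term");
        }
        queries.put(queryId, new HashSet<>(requiredTerms));
        // Indexing under every required term gives a superset of the true candidates.
        for (String term : requiredTerms) {
            queriesByTerm.computeIfAbsent(term, k -> new HashSet<>()).add(queryId);
        }
    }

    /** Returns ids of stored queries whose required terms all occur in the document. */
    Set<String> match(Set<String> docTerms) {
        // 1. Candidate selection: only queries sharing at least one term with the doc.
        Set<String> candidates = new HashSet<>();
        for (String term : docTerms) {
            candidates.addAll(queriesByTerm.getOrDefault(term, Set.of()));
        }
        // 2. Verification: run the full check only on those candidates.
        Set<String> matched = new TreeSet<>();
        for (String id : candidates) {
            if (docTerms.containsAll(queries.get(id))) {
                matched.add(id);
            }
        }
        return matched;
    }
}
```

The saving comes from step 1: a document only pays for the queries it could plausibly match, rather than for every stored query.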

Many thanks again to all.

Kind regards,
Dan




On Mon, 6 Sept 2021 at 18:47, Yuval Paz <yuval.p...@mail.huji.ac.il> wrote:

> My team and I are building on this Solcolator:
> https://github.com/SOLR4189/solcolator
>
> Currently the processor is built for Solr 6.5.1. We are working on updating
> our Solr, and I hope to release a complete version of our Solcolator as
> open source then (it will be for version 8.6.x).
>
> It works as an update processor: either as the last element of the chain,
> replacing the usual processor that indexes the document, or as the
> second-to-last processor in the chain, which also allows monitoring atomic
> updates (which is relatively costly).
>
> By making it an update processor we don't rely on the streaming daemon,
> which we found unsatisfying, as we wish to allow users to define their own
> monitors over the index.
>
> On Mon, Sep 6, 2021, 8:25 PM Charlie Hull <ch...@opensourceconnections.com>
> wrote:
>
> > Are you trying to monitor a stream of emails for certain patterns? If so,
> > you might look at the Lucene Monitor
> >
> > https://lucene.apache.org/core/8_2_0/monitor/index.html?overview-summary.html
> > https://issues.apache.org/jira/browse/LUCENE-8766, which was originally
> > Luwak - at my previous company Flax we helped build several large-scale
> > monitoring systems with this https://github.com/flaxsearch/luwak . It's
> > not officially surfaced in Solr yet, although my colleague Scott Stults
> > has been working on some ideas: https://github.com/o19s/solr-monitor
> >
> > best
> > Charlie
> >
> > On 06/09/2021 14:32, Dan Rosher wrote:
> > > Hi,
> > >
> > > I was wondering if anyone had tried email alerts with streaming
> > > expressions, and what their experience was when attempting this with,
> > > say, 12 million emails/day? Traditionally this might have been done by
> > > iterating over a database cursor daily.
> > >
> > > I was thinking of something like the following pseudocode expression,
> > > with 'kafka' as a custom push expression:
> > >
> > > daemon(id="alertId",
> > >         runInterval="1000",
> > >         kafka(
> > >          kafka_topic,
> > >          alertId,
> > >          topic(email_alerts,
> > >            doc_collection,
> > >            q="email query",
> > >            fl="id, title, abstract",
> > >            id="alertId",
> > >            initialCheckpoint=0)
> > >          )
> > > )
> > >
> > > If you have done something like this, where would you typically run the
> > > daemon? On replicas kept separate from those serving web queries?
> > >
> > > Many thanks in advance for any advice / suggestions,
> > >
> > > Dan
> > >
> >
> > --
> > Charlie Hull - Managing Consultant at OpenSource Connections Limited
> > <www.o19s.com>
> > Founding member of The Search Network <https://thesearchnetwork.com/>
> > and co-author of Searching the Enterprise
> > <https://opensourceconnections.com/about-us/books-resources/>
> >
>
