Hi all,

My team and I are trying to match documents against a set of queries. Both the documents and the queries need to be updated in near real time (on the order of seconds), and the system needs to scale to potentially millions of queries. Lucene Monitor greatly optimizes this kind of alerting logic but, as far as we know, it won't handle scaling on its own, since it isn't wrapped by a runtime that takes care of distribution.

Reading through this thread, https://lists.apache.org/list.html?dev@find_parent=true, pointed me at https://github.com/SOLR4189/solcolator. Overall, this project looks promising, but I'm concerned that it was contributed by an anonymous user and lacks basic maintenance; at the very least, it needs dependency upgrades.

My second concern is that it ingests _documents_ via AddUpdateCommand and then writes the output to a configurable sink. For my team's use case, it would be better to invert the flow: process documents in the SearchHandler (so that the output can be written to SolrQueryResponse) and add the _queries_ via AddUpdateCommand, to take better advantage of Solr's sharding. This second concern is weaker than the first, as I am still thinking about how to implement the "solcolator" within our constraints. Perhaps the document-as-query approach has its own drawbacks that I am not seeing.

In either case, I have two questions:

1. Has anyone here used solcolator, and with what effect? I would be grateful to learn specifics about the scale of your solution.
2. What do you think about forking or riffing off this project and adding a document-as-query control flow?

As it stands now, we are designing a system that coordinates Lucene Monitor by interacting directly with ZooKeeper, and it feels like we are reinventing some of Solr's capabilities.

Any feedback would be greatly appreciated.

Many thanks,
Luke
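
P.S. To make the document-as-query flow concrete, here is a toy sketch in plain Java. It is only an illustration under simplifying assumptions: ToyPercolator, addQuery and match are hypothetical names, not Solr, solcolator or Lucene Monitor APIs, a "query" is reduced to a set of terms that must all appear in the document, and the "shards" are in-memory maps. Queries arrive on the update path and are routed to a shard by id; a document arrives on the search path, is fanned out to every shard, and the matching query ids are merged into the response.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

// Toy illustration only: none of these types are Solr or Lucene Monitor APIs.
class ToyPercolator {
    // Each "shard" holds a slice of the registered queries: query id -> required terms.
    private final List<Map<String, Set<String>>> shards = new ArrayList<>();

    ToyPercolator(int numShards) {
        for (int i = 0; i < numShards; i++) {
            shards.add(new ConcurrentHashMap<>());
        }
    }

    // "Update path": route a query to one shard by hashing its id.
    void addQuery(String queryId, Set<String> requiredTerms) {
        Set<String> normalized = new HashSet<>();
        for (String term : requiredTerms) {
            normalized.add(term.toLowerCase(Locale.ROOT));
        }
        int shard = Math.floorMod(queryId.hashCode(), shards.size());
        shards.get(shard).put(queryId, normalized);
    }

    // "Search path": the document is the request. Fan it out to every shard
    // and merge the ids of all queries whose terms are all present.
    Set<String> match(String documentText) {
        Set<String> docTerms = new HashSet<>();
        for (String token : documentText.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                docTerms.add(token);
            }
        }
        Set<String> matched = new TreeSet<>();
        for (Map<String, Set<String>> shard : shards) {
            for (Map.Entry<String, Set<String>> entry : shard.entrySet()) {
                if (docTerms.containsAll(entry.getValue())) {
                    matched.add(entry.getKey());
                }
            }
        }
        return matched;
    }
}
```

In the design I have in mind, each shard would run Lucene Monitor over its slice of the queries instead of the naive loop above, and Solr would own the routing, replication and fan-out that this sketch fakes with a list of maps.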