On 4/16/2022 8:25 PM, Jay Scott wrote:
I hope that the search software lets me search for combinations
of words; I've been assuming that's built in.

Yes, most likely Solr will handle this need.
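
As a rough illustration of a word-combination search, here is a minimal Python sketch that builds a request URL for Solr's standard select handler. The host and port, the core name "docs", and the field name "text" are assumptions made up for the example, not details of any real setup.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_select_url(base_url, core, field, words, require_all=True):
    """Build a Solr /select URL that matches a combination of words in one field."""
    operator = " AND " if require_all else " OR "
    # Quote each word so it is treated literally; escape backslashes and quotes.
    quoted = ['"{}"'.format(w.replace("\\", "\\\\").replace('"', '\\"')) for w in words]
    params = {
        "q": "{}:({})".format(field, operator.join(quoted)),
        "rows": 10,
        "wt": "json",
    }
    return "{}/solr/{}/select?{}".format(base_url.rstrip("/"), core, urlencode(params))


# Hypothetical local install, core, and field names.
url = build_select_url("http://localhost:8983", "docs", "text", ["apache", "crawler"])

# Decode the query parameter again to see what Solr would receive.
decoded_q = parse_qs(urlsplit(url).query)["q"][0]
print(decoded_q)  # text:("apache" AND "crawler")
```

Passing require_all=False joins the words with OR instead, which matches documents containing any of them.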

I want to do all of this locally -- not use the cloud or
anything like that.  mnogoSearch worked okay for me, but
it's dead, and I'd like to move on to something modern.
Apache nutch is a web crawler -- setting up a web server
solely for the purpose of specifying what files I want
indexed seems -- artificial.  I guess I could do that
but, golly....  Seems like there ought to be something more
direct.

As I understand it, Nutch doesn't actually do search.  It's really good at crawling a website and gathering all the data it contains, but it relies on other software for searching what it has gathered. We hear from a lot of people who have Solr handle the indexing for Nutch.

Solr was suggested as a way to do this.  Do I want something
else?

That's a tough question to answer with certainty.  In general, Solr probably meets the needs of just about any kind of searching you want to do, but sometimes people manage to find cases where Solr isn't the right solution.

Based on what little information is here about your needs, I'm going to cautiously say Solr is probably a good fit.  To be sure that answer is correct, we will need more information.  Exactly what information we will need is not completely straightforward. If you start with some high level information about the data you want to search, then we will know what questions to ask next.

The first thing to nail down ... what do you want to get as the result of a search?  Do you want Solr to provide ALL of the information in the result grid, or is it enough for Solr to return some kind of unique ID that your software can then look up in another system to provide detail to the user?  That is the start of defining a "document" for Solr.

In one large system that I designed, a Solr document was basically a row in a database table.  The table had 160 million rows ... the entire table file in MySQL was over a terabyte.  Solr actually did have a lot of information stored for each of those documents, so a search result grid displayed to the user was populated entirely from Solr.  If the user then clicked on one of those results, the database would be consulted for full details, using the unique identifier in the search results.
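
The split between what the search index stores and what the database keeps can be sketched roughly as follows. This is only an illustration under stated assumptions: sqlite3 stands in for the real database, and the table name "items", its columns, and the GRID_FIELDS list are all hypothetical. No Solr calls are made; the dict returned by row_to_solr_doc just shows the kind of fields one might choose to store in a Solr document.

```python
import sqlite3

# Hypothetical subset of columns stored in the index to populate the result grid.
GRID_FIELDS = ("id", "title", "author")


def row_to_solr_doc(row):
    """Pick the grid fields out of a database row to form an index document."""
    return {name: row[name] for name in GRID_FIELDS}


def fetch_details(conn, doc_id):
    """Look up the full record by the unique ID carried in a search result."""
    row = conn.execute("SELECT * FROM items WHERE id = ?", (doc_id,)).fetchone()
    return dict(row) if row is not None else None


# In-memory database standing in for the real one.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE items (id TEXT PRIMARY KEY, title TEXT, author TEXT, body TEXT)")
conn.execute(
    "INSERT INTO items VALUES (?, ?, ?, ?)",
    ("doc-1", "Trip notes", "Jay", "Full text that only the database keeps"),
)

# Indexing time: one row becomes one document with only the grid fields.
doc = row_to_solr_doc(conn.execute("SELECT * FROM items").fetchone())

# Click-through time: the unique ID from the search result drives the lookup.
details = fetch_details(conn, doc["id"])
```

If everything the user needs fits in the grid fields, the second lookup is unnecessary and the index alone can serve the results.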

Thanks,
Shawn
