On 4/16/2022 8:25 PM, Jay Scott wrote:
> I hope that the search software lets me search for combinations of words; I've been assuming that's built in.
Yes, most likely Solr will handle this need.
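As a rough illustration of a "combination of words" search, here is a minimal sketch that builds a query URL using the standard query parser's `+` (must match) and `-` (must not match) prefixes. The host, the core name `docs`, and the field name `content` are assumptions for the example, not anything known about your setup, and the words are not escaped for special characters.

```python
from urllib.parse import urlencode

# Assumed local Solr address and a hypothetical core named "docs".
SOLR_SELECT = "http://localhost:8983/solr/docs/select"


def build_query_url(required_words, excluded_words=(), field="content"):
    """Build a select URL requiring every word in required_words and
    rejecting documents that contain any word in excluded_words.

    "field" is a hypothetical field name; words are used as-is, so
    characters special to the query parser would need escaping first.
    """
    clauses = [f"+{field}:{word}" for word in required_words]
    clauses += [f"-{field}:{word}" for word in excluded_words]
    params = {"q": " ".join(clauses), "wt": "json"}
    return f"{SOLR_SELECT}?{urlencode(params)}"


if __name__ == "__main__":
    # Documents must contain both "apache" and "solr", but not "nutch".
    print(build_query_url(["apache", "solr"], ["nutch"]))
```

The same combination can also be written with the `AND`, `OR`, and `NOT` operators; the prefix form is used here only because it is easy to generate.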
> I want to do all of this locally -- not use the cloud or anything like that. mnogoSearch worked okay for me, but it's dead, and I'd like to move on to something modern. Apache Nutch is a web crawler -- setting up a web server solely for the purpose of specifying what files I want indexed seems -- artificial. I guess I could do that but, golly.... Seems like there ought to be something more direct.
As I understand it, Nutch doesn't actually do search. It's really good at crawling a website and gathering all the data it contains, but it relies on other software to search what it has gathered. We hear from a lot of people who have Solr handle indexing for Nutch.
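If a crawler feels like too much machinery, one more direct route is to walk a local directory yourself and send the file contents to Solr's JSON update handler. What follows is only a sketch under assumptions: the core name `docs`, the field names `id` and `content`, and the choice to index only plain-text files are all made up for illustration, and the schema would need matching fields.

```python
import json
import os
import urllib.request

# Assumed local Solr address and a hypothetical core named "docs".
SOLR_UPDATE = "http://localhost:8983/solr/docs/update?commit=true"


def build_docs(root, extensions=(".txt",)):
    """Walk "root" and return one dict per matching file.

    The file path doubles as the unique id; "content" is a
    hypothetical field holding the file's text.
    """
    docs = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.lower().endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                docs.append({"id": path, "content": handle.read()})
    return docs


def post_docs(docs, url=SOLR_UPDATE):
    """POST the documents as a JSON array and return the HTTP status."""
    request = urllib.request.Request(
        url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    # Requires a running Solr with a matching core and schema.
    print(post_docs(build_docs(".")))
```

Reading every file into memory and committing on every request is fine for a small collection, but a large one would want batching.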
> Solr was suggested as a way to do this. Do I want something else?
That's a tough question to answer with any certainty. In general, Solr probably meets the needs of just about any kind of searching you want to do, but sometimes people manage to find use cases where Solr isn't the right solution.
Based on what little information is here about your needs, I'm going to cautiously say Solr is probably a good fit. To be sure that answer is correct, we will need more information. Exactly what information we will need is not completely straightforward. If you start with some high-level information about the data you want to search, then we will know what questions to ask next.
The first thing to nail down: what do you want to get as the result of a search? Do you want Solr to provide ALL of the information in the result grid, or is it enough for Solr to return some kind of unique ID that your software can then look up in another system to provide detail to the user? That is the start of defining a "document" for Solr.
In one large system that I designed, a Solr document was basically a row in a database table. The table had 160 million rows ... the entire table file in MySQL was over a terabyte. Solr actually did have a lot of information stored for each of those documents, so a search result grid displayed to the user was populated entirely from Solr. If the user then clicked on one of those results, the database would be consulted for full details, using the unique identifier in the search results.
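To make that two-stage pattern concrete, here is a minimal sketch: Solr returns a few stored fields for the result grid, and the unique ID from a chosen result is used to fetch the full record from a database. Everything named here is hypothetical (the core `docs`, the fields `id`, `title`, and `summary`, and the table `records`), and `sqlite3` stands in for MySQL only so the example is self-contained.

```python
import sqlite3
from urllib.parse import urlencode

# Assumed local Solr address and a hypothetical core named "docs".
SOLR_SELECT = "http://localhost:8983/solr/docs/select"


def build_grid_query(user_query, fields=("id", "title", "summary"), rows=20):
    """Build a select URL that asks Solr for only the stored fields
    needed to draw the result grid (the "fl" parameter)."""
    params = {"q": user_query, "fl": ",".join(fields), "rows": rows, "wt": "json"}
    return f"{SOLR_SELECT}?{urlencode(params)}"


def grid_rows(solr_response):
    """Pull the list of documents out of a parsed Solr JSON response."""
    return solr_response["response"]["docs"]


def fetch_details(connection, doc_id):
    """Look up the full record for one search result by its unique ID.

    "records" is a hypothetical table; returns None if the ID is unknown.
    """
    cursor = connection.execute(
        "SELECT id, title, body FROM records WHERE id = ?", (doc_id,)
    )
    return cursor.fetchone()


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE records (id TEXT PRIMARY KEY, title TEXT, body TEXT)")
    db.execute("INSERT INTO records VALUES ('42', 'Example', 'Full text of the record')")

    # A hand-written response in the shape Solr returns for wt=json.
    canned = {"response": {"numFound": 1, "start": 0,
                           "docs": [{"id": "42", "title": "Example"}]}}
    for row in grid_rows(canned):
        print(row["title"], fetch_details(db, row["id"]))
```

The design choice is how much to store in Solr: storing the grid fields avoids a database round trip per search, while the database remains the source of full detail.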
Thanks, Shawn