On Sat, Apr 16, 2022 at 9:27 PM Shawn Heisey <apa...@elyograg.org> wrote:
> On 4/16/2022 8:25 PM, Jay Scott wrote:
> > I hope that the search software lets me search for combinations
> > of words; I've been assuming that's built in.
>
> Yes, most likely Solr will handle this need.
>
> > I want to do all of this locally -- not use the cloud or
> > anything like that. mnogoSearch worked okay for me, but
> > it's dead, and I'd like to move on to something modern.
> > Apache nutch is a web crawler -- setting up a web server
> > solely for the purpose of specifying what files I want
> > indexed seems -- artificial. I guess I could do that
> > but, golly.... Seems like there ought to be something more
> > direct.
>
> As I understand it, Nutch doesn't actually do search. It's really good
> at crawling a website and gathering all the data it contains, but relies
> on other software for searching what it has gathered. We hear from a lot
> of people that are having Solr handle indexing for Nutch.
>
> > Solr was suggested as a way to do this. Do I want something
> > else?
>
> That's a tough question to answer and be sure the answer is right. In
> general, Solr probably meets the needs of just about any kind of
> searching you want to do, but sometimes people manage to find things
> where Solr isn't the right solution.
>
> Based on what little information is here about your needs, I'm going to
> cautiously say Solr is probably a good fit. To be sure that answer is
> correct, we will need more information. Exactly what information we
> will need is not completely straightforward. If you start with some high
> level information about the data you want to search, then we will know
> what questions to ask next.
>
> The first thing to nail down ... what do you want to get as the result
> of a search? Do you want Solr to provide ALL of the information in the
> result grid, or is it enough for Solr to return some kind of unique ID
> that your software can then look up in another system to provide detail
> to the user? That is the start of defining a "document" for Solr. In
> one large system that I designed, a Solr document was basically a row in
> a database table. The table had 160 million rows ... the entire table
> file in MySQL was over a terabyte. Solr actually did have a lot of
> information stored for each of those documents, so a search result grid
> displayed to the user was populated entirely from Solr. If the user
> then clicked on one of those results, the database would be consulted
> for full details, using the unique identifier in the search results.
>
> Thanks,
> Shawn
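
To make the two-step pattern described above concrete (the search index
returns a unique ID plus a few stored fields, and full details are looked
up elsewhere by that ID), here is a minimal sketch. It does not use Solr
or MySQL at all: ToyIndex is a hypothetical in-memory stand-in for the
search engine, an in-memory SQLite table stands in for the database, and
every name in it (build_database, ToyIndex, fetch_details, the items
table and its columns) is an assumption made for illustration only.

```python
import sqlite3
from collections import defaultdict


def build_database(rows):
    """Create an in-memory SQLite table standing in for the system of record."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT, body TEXT)"
    )
    conn.executemany("INSERT INTO items VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


class ToyIndex:
    """Toy inverted index; a hypothetical stand-in, not the Solr API."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> set of document IDs
        self.stored = {}                  # ID -> fields kept for the result grid

    def add(self, doc_id, title, body):
        # Store only what the result grid needs: the unique ID and the title.
        self.stored[doc_id] = {"id": doc_id, "title": title}
        # Index every word of title and body so both are searchable.
        for word in (title + " " + body).lower().split():
            self.postings[word].add(doc_id)

    def search(self, query):
        """Return stored fields for documents containing ALL query words."""
        words = query.lower().split()
        if not words:
            return []
        ids = set.intersection(*(self.postings.get(w, set()) for w in words))
        return [self.stored[i] for i in sorted(ids)]


def fetch_details(conn, doc_id):
    """Second step: look up the full row by the ID from a search result."""
    cur = conn.execute(
        "SELECT id, title, body FROM items WHERE id = ?", (doc_id,)
    )
    return cur.fetchone()  # None if the ID is unknown


if __name__ == "__main__":
    rows = [
        (1, "Solr notes", "indexing local files with solr"),
        (2, "Nutch notes", "crawling a web site"),
        (3, "Search tips", "local files and web search"),
    ]
    conn = build_database(rows)
    index = ToyIndex()
    for doc_id, title, body in rows:
        index.add(doc_id, title, body)

    hits = index.search("local files")   # result grid comes from the index
    for hit in hits:
        print(hit["id"], hit["title"])
    print(fetch_details(conn, hits[0]["id"]))  # detail comes from the database
```

The design question is how much to put in the stored fields. Storing
only the ID keeps the index small but means every result row needs a
database lookup; storing the fields the grid displays, as in the large
system described above, lets the grid be filled from the index alone.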