On Sat, Apr 16, 2022 at 9:27 PM Shawn Heisey <apa...@elyograg.org> wrote:

> On 4/16/2022 8:25 PM, Jay Scott wrote:
> > I hope that the search software lets me search for combinations
> > of words; I've been assuming that's built in.
>
> Yes, most likely Solr will handle this need.
>
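To make "combinations of words" concrete: the standard query parser in Solr supports boolean operators, quoted phrases, and proximity searches. Below is a rough Python sketch that only builds /select request URLs; nothing in it talks to a server. The core name "mydocs" and the field name "text" are hypothetical placeholders, and the sketch assumes Solr is running locally on its default port, 8983.

```python
from urllib.parse import urlencode

# Assumptions for illustration only: Solr on its default local port (8983),
# a hypothetical core named "mydocs", and a hypothetical field "text"
# holding the indexed file contents.
SOLR_SELECT = "http://localhost:8983/solr/mydocs/select"


def solr_url(query: str, rows: int = 10) -> str:
    """Build a /select URL for a query written in standard query parser syntax."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return SOLR_SELECT + "?" + urlencode(params)


# Both words must appear somewhere in the document.
all_words = solr_url("text:(apache AND solr)")
# The words must appear next to each other, in this order.
phrase = solr_url('text:"apache solr"')
# The words must appear close together (phrase query with a slop of 5).
near = solr_url('text:"apache solr"~5')
# Either word may match, but documents mentioning "nutch" are excluded.
either_not = solr_url("text:(apache OR solr) -text:nutch")

for url in (all_words, phrase, near, either_not):
    print(url)
```

Opening any of these URLs in a browser or with curl against a running core should return JSON results. The exact field names depend on the schema you end up defining.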
> > I want to do all of this locally -- not use the cloud or
> > anything like that.  mnogoSearch worked okay for me, but
> > it's dead, and I'd like to move on to something modern.
> > Apache Nutch is a web crawler -- setting up a web server
> > solely for the purpose of specifying what files I want
> > indexed seems -- artificial.  I guess I could do that
> > but, golly....  Seems like there ought to be something more
> > direct.
>
> As I understand it, Nutch doesn't actually do search.  It's really good
> at crawling a website and gathering all the data it contains, but relies
> on other software for searching what it has gathered.  We hear from a lot
> of people who have Solr handle the indexing for Nutch.
>
> > Solr was suggested as a way to do this.  Do I want something
> > else?
>
> That's a tough question to answer with confidence.  In
> general, Solr probably meets the needs of just about any kind of
> searching you want to do, but people do sometimes find use cases
> where Solr isn't the right solution.
>
> Based on what little information is here about your needs, I'm going to
> cautiously say Solr is probably a good fit.  To be sure that answer is
> correct, we will need more information.  Exactly what information we
> will need is not completely straightforward. If you start with some high
> level information about the data you want to search, then we will know
> what questions to ask next.
>
> The first thing to nail down ... what do you want to get as the result
> of a search?  Do you want Solr to provide ALL of the information in the
> result grid, or is it enough for Solr to return some kind of unique ID
> that your software can then look up in another system to provide detail
> to the user?  That is the start of defining a "document" for Solr.  In
> one large system that I designed, a Solr document was basically a row in
> a database table.  The table had 160 million rows ... the entire table
> file in MySQL was over a terabyte.  Solr actually did have a lot of
> information stored for each of those documents, so a search result grid
> displayed to the user was populated entirely from Solr.  If the user
> then clicked on one of those results, the database would be consulted
> for full details, using the unique identifier in the search results.
>
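Here is a rough Python sketch of the pattern described above. The result grid is filled entirely from fields stored in Solr, and a click on a result uses the unique ID to fetch the full record from the database. The Solr response is a hard-coded sample in the standard /select JSON shape, not real output. An in-memory SQLite table stands in for the MySQL table from the original post. The field names (id, title, author, body) and the sample rows are hypothetical.

```python
import json
import sqlite3

# Hypothetical Solr JSON response in the standard /select shape
# (response -> numFound/start/docs). In practice this would come from a
# request such as /select?q=...&fl=id,title,author, where fl limits the
# returned fields to the ones stored for the result grid.
SAMPLE_RESPONSE = json.loads("""
{"response": {"numFound": 2, "start": 0, "docs": [
  {"id": "1001", "title": "Quarterly report", "author": "alice"},
  {"id": "1002", "title": "Meeting notes", "author": "bob"}
]}}
""")


def result_grid(solr_json: dict) -> list[tuple[str, str, str]]:
    """Populate the result grid entirely from stored Solr fields."""
    return [(d["id"], d["title"], d["author"]) for d in solr_json["response"]["docs"]]


def full_details(conn: sqlite3.Connection, doc_id: str):
    """When the user clicks a result, fetch the full row by its unique ID."""
    return conn.execute(
        "SELECT id, title, author, body FROM docs WHERE id = ?", (doc_id,)
    ).fetchone()


# In-memory stand-in for the system of record (MySQL in the original post).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id TEXT PRIMARY KEY, title TEXT, author TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO docs VALUES (?, ?, ?, ?)",
    [
        ("1001", "Quarterly report", "alice", "Full text of the quarterly report."),
        ("1002", "Meeting notes", "bob", "Full text of the meeting notes."),
    ],
)

grid = result_grid(SAMPLE_RESPONSE)
clicked_id = grid[1][0]  # the user clicks the second result
print(full_details(conn, clicked_id))
```

The trade-off is how much to store in Solr. Storing more fields lets the grid render without touching the database. Storing only the ID keeps the index smaller but requires a lookup for every row shown.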
> Thanks,
> Shawn
>
>
