So to sum up, indexing happens when the data is stored, right?
Much appreciated.

On Sun, Jan 29, 2023 at 5:59 PM, Gus Heck <gus.h...@gmail.com> wrote:

> Definitely all up front. The entire premise of search is that we do as much
> work at index time as possible so that queries are fast. More importantly,
> the whole point of the search is to discover what documents the user might
> want. If you don't index everything from the start you would need a process
> like:
>
> 1. Determine which docs the user wants
> 2. index them.
> 3. query the index.
>
> But once you've done step 1 you can already just send those results to the
> user and skip the rest! So with search you index everything you think any
> user might want, storing the location of each document at the same time
> (in a field). When you run a search, the result contains the IDs of the
> documents that seem relevant and the locations you stored at index time
> (often URLs). Then you show that list of URLs to the user and they click
> on one (the classic 10 blue links as you see on Google). There are more
> complicated scenarios, and ways to make the display more useful for the
> user for sure, but that's the basic idea.
>
> As for a size limit, it depends. Most of the limits come from the
> underlying hardware, and they also depend on which metric you are measuring
> (doc count or size on disk), how much hardware you can afford, and what type
> of documents you are indexing. Lucene has a technical limitation of MAX_INT
> documents (about 2.1 billion) per physical index, but Solr allows you to
> query across multiple physical Lucene indexes, so that's not a problem. I
> had a client working with very small documents that indexed 450 billion of
> them and another with full multi-page documents that had over a billion. If
> you think you might have anything like those levels, there's some
> significant work in setting up systems that large, and you may want to hire
> a consultant to avoid painful and costly missteps. (Hardware on Amazon for
> systems of that size costs many hundreds of thousands or more annually.)
>
> -Gus
>
> On Sun, Jan 29, 2023 at 10:19 AM marc nicole <mk1853...@gmail.com> wrote:
>
> > Hello - I want to know whether it is common practice to index all the
> > datasets from the start, or whether indexing should be performed when the
> > data is being queried?
> > Also, is there a size limit on the data to index into Solr?
> > Thanks.
> >
>
>
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
>

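The flow Gus describes in the quoted reply (index everything up front, store each document's location in a field, then get IDs and locations back at query time) can be illustrated with a rough sketch. This is a toy in-memory inverted index in plain Python, not Solr's or Lucene's actual implementation or API; the `ToyIndex` class, its method names, and the example.com URLs are all hypothetical and exist only for illustration.

```python
from collections import defaultdict


class ToyIndex:
    """Toy inverted index (hypothetical; not how Solr/Lucene is implemented)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of doc ids
        self.stored = {}                  # doc id -> stored location (URL)

    def add(self, doc_id, text, url):
        # Index time: tokenize once and remember where the document lives.
        for term in text.lower().split():
            self.postings[term].add(doc_id)
        self.stored[doc_id] = url

    def search(self, query):
        # Query time: intersect posting sets; document text is never re-read.
        terms = query.lower().split()
        if not terms:
            return []
        ids = set.intersection(*(self.postings.get(t, set()) for t in terms))
        # Return the id plus the location stored at index time.
        return [(doc_id, self.stored[doc_id]) for doc_id in sorted(ids)]


# All work happens up front, before any user query arrives.
index = ToyIndex()
index.add("doc1", "Solr indexes documents up front", "http://example.com/doc1")
index.add("doc2", "Lucene stores documents in segments", "http://example.com/doc2")
index.add("doc3", "Queries are fast at search time", "http://example.com/doc3")

for doc_id, url in index.search("documents"):
    print(doc_id, url)
```

The query only touches the precomputed term-to-document map, which is why deferring indexing until query time would defeat the purpose: you would already need to know which documents match.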