So to sum up, indexing happens at data storage time, right? Much appreciated.

On Sun, Jan 29, 2023 at 5:59 PM, Gus Heck <gus.h...@gmail.com> wrote:

> Definitely all up front. The entire premise of search is that we do as much
> work at index time as possible so that queries are fast. More importantly,
> the whole point of the search is to discover what documents the user might
> want. If you don't index everything from the start you would need a process
> like:
>
>    1. Determine which docs the user wants.
>    2. Index them.
>    3. Query the index.
>
> But once you've done step 1 you can already just send those results to the
> user and skip the rest! So with search you index everything you think any
> user might want, storing the location to find the document at the same time
> (in a field). When you do your search, the result contains the IDs of the
> documents that seem relevant and the location you stored at index time
> (often a URL). Then you show that list of URLs to the user and they click
> on one (the classic 10 blue links as you see on Google). There are more
> complicated scenarios, and ways to make the display more useful for the
> user for sure, but that's the basic idea.
>
> As for size limit, it depends. Most of the limits are derived from the
> underlying hardware, and on what metric you are measuring (doc count or
> size on disk), how much hardware you can afford and what type of documents
> you are indexing. Lucene has a technical limitation of MAX_INT documents
> per physical index, but Solr allows you to query across multiple physical
> Lucene indexes so that's not a problem. I had a client working with very
> small documents that indexed 450 billion of them and another with full
> multi-page documents that had over a billion. If you think you might have
> anything like those levels, there's some significant work in setting up
> systems that large, and you may want to hire a consultant to avoid
> painful and costly mis-steps.
> (Hardware on Amazon for systems of that size
> costs many hundreds of thousands or more annually)
>
> -Gus
>
> On Sun, Jan 29, 2023 at 10:19 AM marc nicole <mk1853...@gmail.com> wrote:
>
> > Hello - I want to know whether it is common practice to index all the
> > datasets from the start, or whether indexing should be performed when
> > the data is being queried?
> > Also, is there a size limit on the data to index into Solr?
> > Thanks.
> >
>
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
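
For readers of the archive, here is a minimal sketch of the flow Gus describes: do the work once at index time, store the document's location in a field, and have the query return IDs plus those stored locations. It is a toy in-memory inverted index in Python, not Solr's or Lucene's API. The names TinyIndex, min_indexes, and MAX_DOCS_PER_INDEX are hypothetical, and the cap is assumed to be 2**31 - 1 from the "MAX_INT" figure in the thread (the exact limit in a given Lucene version may differ slightly).

```python
from collections import defaultdict

# Assumption: "MAX_INT" in the thread is taken to mean 2**31 - 1 documents.
MAX_DOCS_PER_INDEX = 2**31 - 1


class TinyIndex:
    """Toy in-memory inverted index; a stand-in for Solr, not its API."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of docs containing it
        self.locations = {}               # doc id -> location stored at index time

    def add(self, doc_id, text, url):
        # Index time: tokenize once and store the location alongside the id.
        self.locations[doc_id] = url
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, query):
        # Query time: only lookups; return (id, location) for docs matching all terms.
        terms = query.lower().split()
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return [(doc_id, self.locations[doc_id]) for doc_id in sorted(hits)]


def min_indexes(doc_count, per_index_cap=MAX_DOCS_PER_INDEX):
    # Smallest number of physical indexes needed to hold doc_count documents
    # (integer ceiling division, so no floating-point rounding is involved).
    return -(-doc_count // per_index_cap)


if __name__ == "__main__":
    index = TinyIndex()
    index.add("a", "Solr indexes documents up front", "https://example.com/a")
    index.add("b", "Lucene stores documents on disk", "https://example.com/b")
    print(index.search("documents solr"))  # [('a', 'https://example.com/a')]
    print(min_indexes(450_000_000_000))    # 210
```

Under that assumed cap, the 450 billion document case mentioned above would need at least 210 physical indexes on document count alone; real shard counts are normally driven by hardware and query load well before that limit is reached.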