Hi, I would like to add some extra context here, so that it helps in getting better suggestions.
*Our goal:* To improve our search system, either by optimizing indexing or by improving Solr response times.

*Current approach while indexing at our end:*
Even when a single field of a document changes, we send the entire document for indexing. (~2 crore, i.e. about 20 million, docs are reindexed on a daily basis.)
Solr version: v9.6.1

*To optimize indexing:*

1. POC on external file field [which stores frequently changed fields in an external file and loads it after each commit, instead of indexing into Solr for each change]
https://solr.apache.org/guide/solr/latest/indexing-guide/external-files-processes.html
Observations:
a. Works only with numeric fields.
b. The community also suggested not going with this, as it is an old feature.
So I dropped this.

2. POC on in-place updates (which index only the fields that changed, not the entire document)
https://solr.apache.org/guide/solr/latest/indexing-guide/partial-document-updates.html#in-place-updates
Observations:
a. Works only with single-valued fields.
b. Looks promising for indexing optimization, but is not suitable for our schema (as we have many multivalued fields).
So we dropped this as well.

We then moved on to alternatives that are expected to help in optimizing response times.

*To improve Solr response time:*

Nested documents POC:
https://solr.apache.org/guide/solr/latest/indexing-guide/indexing-nested-documents.html

*wrt this statement:*
"In terms of performance, *indexing the relationships between documents usually yields much faster queries* than an equivalent "query time join", since the relationships are already stored in the index and do not need to be computed"

But here we found that the complete block is reindexed even when a single child document changes.

So we would like to know more about this feature:
1. Is this complete block reindexing heavy compared with traditional indexing? [As we reindex a large number of documents per day, i.e. ~2 crore]
2. What can we expect from the nested document feature in terms of performance (wrt the tradeoff between indexing and querying)?
3. If it is not suitable, is there any other alternative that we can work on?

*Thanks & Regards,*
*Uday Kumar*

On Mon, Mar 3, 2025 at 7:17 PM Uday Kumar <uday.p...@indiamart.com> wrote:

> Also in place updates happen on very specific conditions, have you checked
> you satisfy them before even attempting to see some sort of impact on your
> use case?
> Yes we considered those specifications, here, we didnt mean to say
> it's not impactful in itself. but with our project & schema
>
> *Thanks & Regards,*
> *Uday Kumar*
> *Product Search Tech*
>
>
> On Fri, Feb 28, 2025 at 6:06 PM Alessandro Benedetti <
> benedetti.ale...@gmail.com> wrote:
>
>> What is your problem? Rather than asking about a solution you attempted is
>> usually better to start from the problem.
>>
>> You talk about grouping, have you considered field collapsing?
>>
>> According to my experience going with nested documents rarely justify the
>> performance and functional overhead both at indexing and query time.
>>
>> But sometimes you need them.
>>
>> Also in place updates happen on very specific conditions, have you checked
>> you satisfy them before even attempting to see some sort of impact on your
>> use case?
>>
>> Cheers
>>
>> On Fri, 28 Feb 2025, 08:30 Uday Kumar, <uday.p...@indiamart.com.invalid>
>> wrote:
>>
>> > Does this mean it will not be impactful in performance to use Nested
>> > Indexing in production with such an indexing rate?
>> >
>> > We have tried POC on inplace updates and found its not impactful either wrt
>> > our project, so we would not be using this in combination too
>> >
>> > *Thanks & Regards,*
>> > *Uday Kumar*
>> > *Product Search Tech*
>> >
>> >
>> > On Thu, Feb 27, 2025 at 12:31 PM Mikhail Khludnev <m...@apache.org> wrote:
>> >
>> > > Changing one child rewrites the whole block period.
>> > > However in-place updating child docValues is promising in theory,
>> > > although I don't know how it works in practice.
>> > >
>> > > On Thu, Feb 27, 2025 at 8:05 AM Uday Kumar <uday.p...@indiamart.com
>> > > .invalid>
>> > > wrote:
>> > >
>> > > > Hi all,
>> > > > We are doing a POC on indexing nested documents in expectation of
>> > > > reducing grouping overhead while querying time.
>> > > >
>> > > > On Prod Indexing, we are using the traditional approach of reindexing the
>> > > > entire document if there is any change in any of the fields. [we reindex
>> > > > ~2cr documents per day, FYI]
>> > > > Solr Version: v9.6.1
>> > > >
>> > > > But I have come across a caution in solr documentation: *DOC
>> > > > <https://solr.apache.org/guide/solr/latest/indexing-guide/indexing-nested-documents.html#:~:text=By%20way%20of%20examples%3A%20nested,%2F%20colors)%20and%20supporting%20documentation%20(>*,
>> > > > where it says: *Solr must internally reindex an entire nested document tree
>> > > > if there are updates to it.*
>> > > > Which means If a root or parent has 1000 child documents, even with a
>> > > > change in single document in any one of the fields, entire nested childs
>> > > > are reindexed, which is not good enough.
>> > > >
>> > > > This made us rethink of performance gains that we will have, if nested
>> > > > documents are used in production.
>> > > >
>> > > > If that's the case, pls let us know if there are any other solutions which
>> > > > would help us in performance gains.
>> > > >
>> > > > *Note:*
>> > > > We have already done POC on external file fields and In-Place updates where
>> > > > we found they are not impactful for our project.
>> > > >
>> > > > *Thanks & Regards,*
>> > > > *Uday Kumar*
>> > > >
>> > >
>> > >
>> > > --
>> > > Sincerely yours
>> > > Mikhail Khludnev
>> > >
>> >
>>
>
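
For reference, the "very specific conditions" for in-place updates mentioned in this thread can be sketched in Python. As far as I understand the partial document updates page linked above, Solr applies an update in place only when every updated field is a non-indexed, non-stored, single-valued numeric docValues field; otherwise it falls back to a regular atomic update, which reindexes the whole document internally. This is only a rough sketch under stated assumptions: the field names (`price`, `colors`), the field type names, the schema excerpt, and the helper functions are all hypothetical, and nothing is sent to Solr.

```python
import json

# Assumed numeric field type names; real names depend on the schema in use.
NUMERIC_TYPES = {"pint", "plong", "pfloat", "pdouble"}

# Hypothetical schema excerpt, for illustration only.
SCHEMA = {
    "price": {"type": "pfloat", "indexed": False, "stored": False,
              "docValues": True, "multiValued": False},
    "colors": {"type": "string", "indexed": True, "stored": True,
               "docValues": True, "multiValued": True},
}


def is_in_place_eligible(field):
    """True for a non-indexed, non-stored, single-valued numeric docValues field."""
    return (
        field["type"] in NUMERIC_TYPES
        and not field["indexed"]
        and not field["stored"]
        and field["docValues"]
        and not field["multiValued"]
    )


def update_mode(changes, schema=SCHEMA):
    """Return 'in-place' only if there are changes and every changed field is eligible."""
    if changes and all(is_in_place_eligible(schema[name]) for name in changes):
        return "in-place"
    return "atomic"


def build_update(doc_id, changes):
    """Build the JSON body of a partial update using the 'set' modifier."""
    doc = {"id": doc_id}
    for name, value in changes.items():
        doc[name] = {"set": value}
    return json.dumps([doc])


if __name__ == "__main__":
    for changes in ({"price": 10.5}, {"price": 10.5, "colors": ["red"]}):
        print(update_mode(changes), build_update("p1", changes))
```

With a schema like ours, where most of the frequently changing fields are multivalued, almost every update would be classified as "atomic" here, which matches what we saw in the POC. The JSON body is what would be posted to the collection's update handler.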