Hi Uday,

Your email is a perfect example of the XY problem (https://en.m.wikipedia.org/wiki/XY_problem).
For both indexing and query time, you need to explain your problems and use cases rather than your attempted solutions. Then we'll be able to give some recommendations.

On Wed, 5 Mar 2025, 06:39 Uday Kumar, <uday.p...@indiamart.com.invalid> wrote:
> Hi,
> I would like to give some extra context here, so that it helps in getting better suggestions.
>
> *Our goal: to improve our search system, either by optimizing indexing or by improving Solr response times.*
>
> *Current approach while indexing at our end:*
> Even when a single field of a document changes, we send the entire document for indexing (~2 crore, i.e. ~20 million, docs are reindexed daily).
> Solr version: 9.6.1
>
> *To optimize indexing:*
> 1. POC on external file fields (frequently changed fields are stored in an external file that is reloaded after each commit, instead of indexing each change into Solr):
> https://solr.apache.org/guide/solr/latest/indexing-guide/external-files-processes.html
> Observations:
> a. Works only with numeric fields.
> b. The community also advised against it, as it is an old feature, so I dropped it.
>
> 2. POC on in-place updates (which index only the changed fields rather than the entire document):
> https://solr.apache.org/guide/solr/latest/indexing-guide/partial-document-updates.html#in-place-updates
> Observations:
> a. Works only with single-valued fields.
> b. Looks promising for indexing optimization, but is not suitable for our schema (we have many multivalued fields), so we dropped it.
>
> Then we moved on to alternatives expected to help with response times.
>
> *To improve Solr response time:* Nested Documents POC:
> https://solr.apache.org/guide/solr/latest/indexing-guide/indexing-nested-documents.html
> *Regarding this statement:*
> "In terms of performance, *indexing the relationships between documents usually yields much faster queries* than an equivalent "query time join", since the relationships are already stored in the index and do not need to be computed"
>
> But we found that the complete block is reindexed even when a single child document changes.
> So we would like to know more about this feature:
> 1. Is complete block reindexing heavy compared with traditional indexing? [We reindex many documents per day, i.e. ~2 crore]
> 2. What can we expect from the nested document feature in terms of performance (the trade-off between indexing and querying)?
> 3. If it is not a good fit, is there any other alternative we can work on?
>
> *Thanks & Regards,*
> *Uday Kumar*
>
>
> On Mon, Mar 3, 2025 at 7:17 PM Uday Kumar <uday.p...@indiamart.com> wrote:
>
> > > Also, in-place updates happen only under very specific conditions; have you checked that you satisfy them before even attempting to measure their impact on your use case?
> > Yes, we considered those conditions. We didn't mean that in-place updates are not impactful in themselves, only that they are not impactful for our project and schema.
> >
> > *Thanks & Regards,*
> > *Uday Kumar*
> > *Product Search Tech*
> >
> >
> > On Fri, Feb 28, 2025 at 6:06 PM Alessandro Benedetti <benedetti.ale...@gmail.com> wrote:
> >
> >> What is your problem? Rather than asking about a solution you attempted, it is usually better to start from the problem.
> >>
> >> You talk about grouping; have you considered field collapsing?
> >>
> >> In my experience, going with nested documents rarely justifies the performance and functional overhead, both at indexing and at query time.
> >>
> >> But sometimes you need them.
> >>
> >> Also, in-place updates happen only under very specific conditions; have you checked that you satisfy them before even attempting to measure their impact on your use case?
> >>
> >> Cheers
> >>
> >> On Fri, 28 Feb 2025, 08:30 Uday Kumar, <uday.p...@indiamart.com.invalid> wrote:
> >>
> >> > Does this mean that using nested indexing in production will not pay off in performance at such an indexing rate?
> >> >
> >> > We have tried a POC on in-place updates and found it is not impactful for our project either, so we would not be using it in combination.
> >> >
> >> > *Thanks & Regards,*
> >> > *Uday Kumar*
> >> > *Product Search Tech*
> >> >
> >> >
> >> > On Thu, Feb 27, 2025 at 12:31 PM Mikhail Khludnev <m...@apache.org> wrote:
> >> >
> >> > > Changing one child rewrites the whole block, period.
> >> > > However, in-place updating of child docValues is promising in theory, although I don't know how it works in practice.
> >> > >
> >> > > On Thu, Feb 27, 2025 at 8:05 AM Uday Kumar <uday.p...@indiamart.com.invalid> wrote:
> >> > >
> >> > > > Hi all,
> >> > > > We are doing a POC on indexing nested documents, expecting it to reduce grouping overhead at query time.
> >> > > >
> >> > > > In production indexing, we use the traditional approach of reindexing the entire document if any of its fields changes. [We reindex ~2 crore documents per day, FYI]
> >> > > > Solr version: 9.6.1
> >> > > >
> >> > > > But I have come across a caution in the Solr documentation (*DOC* <https://solr.apache.org/guide/solr/latest/indexing-guide/indexing-nested-documents.html#:~:text=By%20way%20of%20examples%3A%20nested,%2F%20colors)%20and%20supporting%20documentation%20(>), where it says: *Solr must internally reindex an entire nested document tree if there are updates to it.*
> >> > > > This means that if a root or parent has 1000 child documents, a change to any one field of a single child causes all the nested children to be reindexed, which is not good enough for us.
> >> > > >
> >> > > > This made us rethink the performance gains we would get from using nested documents in production.
> >> > > >
> >> > > > If that's the case, please let us know whether there are any other solutions that would help us gain performance.
> >> > > >
> >> > > > *Note:*
> >> > > > We have already done POCs on external file fields and in-place updates and found they are not impactful for our project.
> >> > > >
> >> > > > *Thanks & Regards,*
> >> > > > *Uday Kumar*
> >> > > >
> >> > >
> >> > >
> >> > > --
> >> > > Sincerely yours
> >> > > Mikhail Khludnev