Hi Uday,
Your email is a perfect example of the XY problem:
https://en.m.wikipedia.org/wiki/XY_problem.

For both indexing and query time, you need to explain your problems and use
cases rather than your attempted solutions.


Then we'll be able to give some recommendations.


On Wed, 5 Mar 2025, 06:39 Uday Kumar, <uday.p...@indiamart.com.invalid>
wrote:

> Hi,
> I would like to give some extra context here, to help us get better
> suggestions.
>
>
> *Our goal: to improve our search system, either by optimizing indexing or by
> improving Solr response times.*
>
> *Current approach while indexing at our end:*
> Even when only a single field of a document changes, we send the entire
> document for indexing (~2 crore, i.e. ~20 million, documents are reindexed
> daily).
> Solr version: 9.6.1
>
> *To Optimize Indexing:*
> 1. POC on external file fields (frequently changing field values are stored
> in an external file that is loaded after each commit, instead of being
> indexed into Solr on every change):
>
> https://solr.apache.org/guide/solr/latest/indexing-guide/external-files-processes.html
> Observations:
> a. It works only with numeric fields.
> b. The community also advised against it, as it is an old feature.
> So I dropped this.
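For reference, ExternalFileField reads its values from a plain-text file named `external_<fieldname>` in the index data directory, one `key=value` line per document, where the key identifies the document and the value is parsed as a float; that is why the feature only fits numeric fields. A minimal Python sketch of producing such a file, assuming a hypothetical field `popularity` and hypothetical document keys `doc1`/`doc2`:

```python
import tempfile
from pathlib import Path
from typing import Mapping


def write_external_file(data_dir: Path, field: str, values: Mapping[str, float]) -> Path:
    """Write an ExternalFileField-style file: one key=value line per document.

    The file name follows the external_<fieldname> convention. `field` and the
    keys in `values` are hypothetical examples, not taken from the thread.
    """
    path = data_dir / f"external_{field}"
    # Values are written as floats; sorting keys just makes the file easy to diff.
    lines = [f"{key}={float(value)}" for key, value in sorted(values.items())]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


if __name__ == "__main__":
    # Write to a throwaway directory instead of a real Solr data dir.
    with tempfile.TemporaryDirectory() as tmp:
        path = write_external_file(Path(tmp), "popularity", {"doc2": 3, "doc1": 1.5})
        print(path.read_text(encoding="utf-8"))
```

Because the whole file is reloaded rather than merged, every refresh rewrites all values for that field, which is a different cost profile from per-document updates.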
>
> 2. POC on in-place updates (which reindex only the changed fields rather
> than the entire document):
>
> https://solr.apache.org/guide/solr/latest/indexing-guide/partial-document-updates.html#in-place-updates
> Observations:
> a. Works only with single-valued fields.
> b. Looks promising for indexing optimization but is not suitable for our
> schema (many of our fields are multivalued), so we dropped it.
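The in-place conditions are strict. Per the Solr reference guide, the updated field must be a non-indexed, non-stored, single-valued numeric docValues field; `_version_` must also be a non-indexed, non-stored, single-valued docValues field; any copyField targets of the updated field must meet the same numeric docValues constraints; and only the `set` and `inc` operations qualify. A minimal Python sketch of checking a field definition against the per-field conditions and building a JSON atomic-update body, assuming a hypothetical `FieldDef` record, a uniqueKey named `id`, and the point field type names from Solr's default configset (the `_version_` and copyField checks are left out):

```python
from dataclasses import dataclass

# Assumption: the numeric point types as named in Solr's default configset.
NUMERIC_TYPES = {"pint", "plong", "pfloat", "pdouble"}


@dataclass
class FieldDef:
    """Hypothetical summary of a schema field definition."""
    name: str
    type: str
    indexed: bool
    stored: bool
    multi_valued: bool
    doc_values: bool


def in_place_eligible(field: FieldDef) -> bool:
    """True if the per-field in-place update conditions hold."""
    return (
        field.type in NUMERIC_TYPES
        and not field.indexed
        and not field.stored
        and not field.multi_valued
        and field.doc_values
    )


def in_place_payload(doc_id: str, field: FieldDef, value: float) -> dict:
    """Build an atomic-update body using the `set` operation.

    Raises if the field is not eligible, since Solr would then fall back to a
    regular atomic update that rewrites the whole document.
    """
    if not in_place_eligible(field):
        raise ValueError(f"{field.name} is not eligible for in-place updates")
    return {"id": doc_id, field.name: {"set": value}}


if __name__ == "__main__":
    price = FieldDef("price_d", "pdouble", indexed=False, stored=False,
                     multi_valued=False, doc_values=True)
    tags = FieldDef("tags", "string", indexed=True, stored=True,
                    multi_valued=True, doc_values=True)
    print(in_place_eligible(price), in_place_eligible(tags))  # True False
    print(in_place_payload("doc1", price, 99.0))
```

A multivalued field like the hypothetical `tags` fails the check on two counts (not numeric, multivalued), which matches the observation above.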
>
>
> We then moved on to alternatives expected to help optimize response
> times.
>
> *To improve Solr response time:* nested documents POC:
>
> https://solr.apache.org/guide/solr/latest/indexing-guide/indexing-nested-documents.html
> *Regarding this statement:*
> "In terms of performance, *indexing the relationships between documents
> usually yields much faster queries* than an equivalent "query time join",
> since the relationships are already stored in the index and do not need to
> be computed."
>
> But we found that the complete block is reindexed even when a single
> child document changes.
> So we would like to know more about this feature:
> 1. Is reindexing the complete block heavy compared with traditional
> indexing? (We reindex a large number of documents per day, i.e. ~2 crore.)
> 2. What can we expect from nested documents in terms of performance (the
> tradeoff between indexing and querying)?
> 3. If nested documents are not suitable, is there any other alternative we
> can work on?
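To make question 1 concrete: with nested documents, a change to any one child means rewriting the whole block (the parent plus all of its children). A back-of-the-envelope Python sketch comparing documents rewritten per batch of changes, assuming two-level blocks and hypothetical block sizes and change counts (none of these numbers come from the thread):

```python
from typing import Mapping


def docs_rewritten_flat(changed: Mapping[str, int]) -> int:
    """Flat schema: each changed document is reindexed on its own."""
    return sum(changed.values())


def docs_rewritten_nested(changed: Mapping[str, int], block_sizes: Mapping[str, int]) -> int:
    """Nested schema: any change inside a block rewrites the whole block.

    `changed` maps a parent ID to the number of changed children in its block;
    `block_sizes` maps a parent ID to its number of children. The parent itself
    counts as one more document in the block.
    """
    return sum(block_sizes[parent] + 1 for parent, n in changed.items() if n > 0)


if __name__ == "__main__":
    # Hypothetical: one block with 1000 children, one with 10.
    block_sizes = {"p1": 1000, "p2": 10}
    changed = {"p1": 1, "p2": 3}
    print(docs_rewritten_flat(changed))                 # 4
    print(docs_rewritten_nested(changed, block_sizes))  # 1012
```

The write amplification scales with the average block size times the number of blocks touched per day, so the daily reindex volume alone does not answer the question; the distribution of changes across blocks matters.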
>
> *Thanks & Regards,*
> *Uday Kumar*
>
>
> On Mon, Mar 3, 2025 at 7:17 PM Uday Kumar <uday.p...@indiamart.com> wrote:
>
> > > Also, in-place updates only happen under very specific conditions; have
> > > you checked that you satisfy them before even attempting to see some
> > > sort of impact on your use case?
> >
> > Yes, we considered those conditions. We didn't mean that in-place updates
> > are not impactful in themselves, only that they are not impactful for our
> > project and schema.
> >
> > *Thanks & Regards,*
> > *Uday Kumar*
> > *Product Search Tech*
> >
> >
> > On Fri, Feb 28, 2025 at 6:06 PM Alessandro Benedetti <benedetti.ale...@gmail.com> wrote:
> >
> >> What is your problem? Rather than asking about a solution you attempted,
> >> it is usually better to start from the problem.
> >>
> >> You mention grouping; have you considered field collapsing?
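For illustration, a minimal Python sketch contrasting the request parameters for result grouping (`group=true`, `group.field`) and for field collapsing (a `{!collapse field=...}` filter query, optionally with `expand=true` to return the other group members); the query text `shoes` and the field name `product_group_id` are hypothetical:

```python
from urllib.parse import parse_qs, urlencode


def grouping_params(q: str, field: str, rows: int = 10) -> str:
    """Result grouping: Solr groups the results by the values of `field`."""
    return urlencode({"q": q, "rows": rows, "group": "true", "group.field": field})


def collapsing_params(q: str, field: str, rows: int = 10, expand: bool = False) -> str:
    """Field collapsing: keep one head document per `field` value via a filter query."""
    params = {"q": q, "rows": rows, "fq": "{!collapse field=%s}" % field}
    if expand:
        # Ask the expand component for the other documents of each collapsed group.
        params["expand"] = "true"
    return urlencode(params)


if __name__ == "__main__":
    print(grouping_params("shoes", "product_group_id"))
    print(parse_qs(collapsing_params("shoes", "product_group_id", expand=True)))
```

The Solr guide describes the collapsing query parser as a post filter that tends to outperform result grouping when the number of distinct groups is high; it requires a single-valued collapse field.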
> >>
> >> In my experience, going with nested documents rarely justifies the
> >> performance and functional overhead at both indexing and query time.
> >>
> >> But sometimes you need them.
> >>
> >> Also, in-place updates only happen under very specific conditions; have
> >> you checked that you satisfy them before even attempting to see some sort
> >> of impact on your use case?
> >>
> >> Cheers
> >>
> >> On Fri, 28 Feb 2025, 08:30 Uday Kumar, <uday.p...@indiamart.com.invalid>
> >> wrote:
> >>
> >> > Does this mean that nested indexing will not give us a performance
> >> > benefit in production at such an indexing rate?
> >> >
> >> > We did a POC on in-place updates and found them not impactful for our
> >> > project either, so we will not be using them in combination with nested
> >> > documents.
> >> >
> >> > *Thanks & Regards,*
> >> > *Uday Kumar*
> >> > *Product Search Tech*
> >> >
> >> >
> >> > On Thu, Feb 27, 2025 at 12:31 PM Mikhail Khludnev <m...@apache.org> wrote:
> >> >
> >> > > Changing one child rewrites the whole block, period.
> >> > > However, in-place updates of child docValues are promising in theory,
> >> > > although I don't know how they work in practice.
> >> > >
> >> > > On Thu, Feb 27, 2025 at 8:05 AM Uday Kumar <uday.p...@indiamart.com.invalid>
> >> > > wrote:
> >> > >
> >> > > > Hi all,
> >> > > > We are doing a POC on indexing nested documents, expecting it to
> >> > > > reduce grouping overhead at query time.
> >> > > >
> >> > > > For production indexing, we use the traditional approach of
> >> > > > reindexing the entire document whenever any of its fields changes
> >> > > > (FYI, we reindex ~2 crore documents per day).
> >> > > > Solr version: 9.6.1
> >> > > >
> >> > > > But I came across a caution in the Solr documentation
> >> > > > (https://solr.apache.org/guide/solr/latest/indexing-guide/indexing-nested-documents.html),
> >> > > > which says: *Solr must internally reindex an entire nested document
> >> > > > tree if there are updates to it.*
> >> > > > This means that if a root or parent has 1000 child documents, a
> >> > > > change to any one field of a single document causes all of the nested
> >> > > > children to be reindexed, which is not good enough.
> >> > > >
> >> > > > This made us reconsider the performance gains we would get from
> >> > > > using nested documents in production.
> >> > > >
> >> > > > If that's the case, please let us know if there are any other
> >> > > > solutions that would help us gain performance.
> >> > > >
> >> > > > *Note:*
> >> > > > We have already done POCs on external file fields and in-place
> >> > > > updates, and found they are not impactful for our project.
> >> > > >
> >> > > > *Thanks & Regards,*
> >> > > > *Uday Kumar*
> >> > > >
> >> > >
> >> > >
> >> > > --
> >> > > Sincerely yours
> >> > > Mikhail Khludnev
> >> > >
> >> >
> >>
> >
>
