Hi all,
Our index holds supplier data together with each supplier's products, which we
display on the front end in response to search requests.



*Example: For a supplier with id 678, we have 2 products in our index*
*(product-id is unique)*
*document1:*
{
product-id: 123
product-price: 2000rs
product-name: Jute bags

*supplier-id: 678*
*company-name: BagFactoryLimited*
}

*document2:*
{
product-id: 863
product-price: 4500rs
product-name: trolley bags

*supplier-id: 678*
*company-name: BagFactoryLimited*
}



*As you can see above, each document in our index contains product
details (product-id, product-price, product-name) as well as supplier
details (supplier-id, company-name).*

*Problem 1: (while indexing)*
Whenever a supplier-specific field changes, we re-index all of that
supplier's products, even though the supplier data is identical across all
of them.
*FYI*
We re-index ~5 Cr (~50 million) documents per day.


*We would like to know whether there is a better way to organize this, so
that we can avoid indexing redundant data.*
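
To make the fan-out concrete, here is a minimal Python sketch of what a single supplier change costs with the current denormalized layout. It uses only the two example documents above; the in-memory list, the helper name docs_to_reindex, and the new company name "BagFactory Ltd" are hypothetical and are not part of our actual pipeline.

```python
# In-memory stand-in for the index (hypothetical); fields follow the example documents.
index = [
    {"product-id": 123, "product-price": "2000rs", "product-name": "Jute bags",
     "supplier-id": 678, "company-name": "BagFactoryLimited"},
    {"product-id": 863, "product-price": "4500rs", "product-name": "trolley bags",
     "supplier-id": 678, "company-name": "BagFactoryLimited"},
]


def docs_to_reindex(index, supplier_id, changed_fields):
    """Return a full replacement document for every product of the supplier.

    Because supplier fields are copied into each product document, one
    supplier-level change touches all of that supplier's products.
    """
    return [
        {**doc, **changed_fields}
        for doc in index
        if doc["supplier-id"] == supplier_id
    ]


# Hypothetical change: the supplier renames the company.
updates = docs_to_reindex(index, 678, {"company-name": "BagFactory Ltd"})
print(len(updates), "documents re-sent for one supplier-level change")
```

The product fields in each re-sent document are unchanged; only the copied supplier field differs, which is the redundancy we would like to avoid.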

*Problem 2: (while querying)*
Now, when our current index is queried, we display only the single most
relevant product per supplier, even if the query matches more than one of
that supplier's documents.

For this we use a *collapse query* on the supplier-id field (as the index
has no notion of a relationship between documents), which is resource
intensive.
*Ex:*
fq={!collapse field=supplier-id}
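
For context, a minimal Python sketch of how such a request's parameters could be assembled. The helper name build_collapse_params, the search term "bags", and the rows value are assumptions for illustration; only the fq value is taken from the example above.

```python
from urllib.parse import urlencode


def build_collapse_params(user_query, collapse_field="supplier-id", rows=10):
    """Build Solr request parameters that keep one document per supplier."""
    return {
        "q": user_query,
        # Collapse filter: keeps a single document per distinct collapse_field value.
        "fq": "{!collapse field=%s}" % collapse_field,
        "rows": rows,
    }


# Hypothetical search term; the encoded string would be appended to the select URL.
params = build_collapse_params("bags")
query_string = urlencode(params)
print(query_string)
```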

*FYI*
We serve ~25 Lakh (~2.5 million) queries per day.

*We would like to know if there is a better way to organize the index, so
that we can avoid such resource-intensive queries and thereby improve
search response times.*

*Our Solr Infra Stats: FYI*
*Version:* v9.6.1
*No. of nodes:* 8
*No. of shards:* 62
*Heap per node:* 12G
*RAM per node:* 50G
*No. of cpu cores per node:* 16
*Count of docs:* ~20 Cr (~200 million)
*Size of Index:* ~250G
*Routing used:* implicit
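
For a rough sense of scale, a small Python sketch that derives per-shard and per-second figures from the numbers in this mail. It assumes documents and index size are spread evenly across shards and nodes (which may not hold with implicit routing), that ~250G is the total across all shards, and that load is uniform over the day; these are averages, not measurements.

```python
CRORE = 10_000_000   # 1 Cr = 10 million
LAKH = 100_000       # 1 Lakh = 100 thousand
SECONDS_PER_DAY = 24 * 60 * 60

nodes = 8
shards = 62
total_docs = 20 * CRORE
index_size_gb = 250
ram_per_node_gb = 50
heap_per_node_gb = 12
reindexed_per_day = 5 * CRORE
queries_per_day = 25 * LAKH

# Averages under the even-distribution assumption stated above.
docs_per_shard = total_docs / shards
index_gb_per_shard = index_size_gb / shards
index_gb_per_node = index_size_gb / nodes
ram_outside_heap_gb = ram_per_node_gb - heap_per_node_gb

# Average daily load expressed per second.
reindex_per_sec = reindexed_per_day / SECONDS_PER_DAY
queries_per_sec = queries_per_day / SECONDS_PER_DAY

print(f"docs per shard:       {docs_per_shard:,.0f}")
print(f"index per shard (GB): {index_gb_per_shard:.2f}")
print(f"index per node (GB):  {index_gb_per_node:.2f}")
print(f"RAM outside heap (GB): {ram_outside_heap_gb}")
print(f"re-indexed docs/sec:  {reindex_per_sec:.0f}")
print(f"queries/sec:          {queries_per_sec:.1f}")
```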

*Thanks & Regards,*
*Uday Kumar*
*Product Search Tech*
