Hi all,

In our index, we store supplier data together with each supplier's products, which we display on the front end in response to search requests.
*Example:* For the supplier with id 678, we have 2 products in our index (product-id is unique).

*document1:*
{
  product-id: 123
  product-price: 2000rs
  product-name: Jute bags
  supplier-id: 678
  company-name: BagFactoryLimited
}

*document2:*
{
  product-id: 863
  product-price: 4500rs
  product-name: trolley bags
  supplier-id: 678
  company-name: BagFactoryLimited
}

As you can see above, each document in our index contains product details (product-id, product-price, product-name) as well as supplier details (supplier-id, company-name).

*Problem 1 (while indexing):*
Whenever a supplier-specific field changes, we re-index all of that supplier's products, even though the supplier data is identical across all of them.

*FYI:* We re-index ~5 Cr (50 million) documents per day.

We would like to know whether there is a better way to organize this, so that we avoid indexing redundant data.

*Problem 2 (while querying):*
When our current index is queried, we display only the single most relevant product per supplier, even when the query matches more than one of that supplier's documents. For this we use a *collapse query* on the supplier-id field, since the index does not encode any relationship between the documents. This is resource intensive.

*Ex:* fq={!collapse field=supplier-id}

*FYI:* We serve ~25 lakh (2.5 million) queries per day.

We would like to know whether there is a better way to organize the index, so that we can avoid such resource-intensive queries and improve search response times.

*Our Solr infra stats (FYI):*
*Version:* v9.6.1
*No. of nodes:* 8
*No. of shards:* 62
*Heap per node:* 12G
*RAM per node:* 50G
*No. of CPU cores per node:* 16
*Count of docs:* ~20 Cr (200 million)
*Size of index:* ~250G
*Routing used:* implicit

Thanks & Regards,
Uday Kumar
Product Search Tech