Has anyone solved the following problem, or does anyone have good suggestions?
Each document is assigned to one or more category nodes in a hierarchy. For example:

Document1: /Computer/Desktop
Document2: /Computer/Notebook; /Salesforce/ExtremePortable
Document3: /Computer/Server
......

For each search operation, we present not only the list of matching documents but also the list of categories containing those documents, along with the number of matching documents in each:

/Computer/Desktop(30)
/Computer/Notebook(12)
/Computer/Accessories(51)

This is really useful because it can "guide" the user in refining the search criteria and quickly reduce the size of the result set.

I know we can do this by brute force: go through the entire result set, retrieve the category field for each document, and aggregate the counts. That is not scalable, though, if the number of documents to go through is high. It can create performance issues under load if each execution thread holds on to the index reader for too long (because of the number of documents it has to go through).

Is there any API or approach we can leverage at search time? Is there anything we can do at indexing time? Or is there any technology we need to integrate, like those used for data warehousing?

Any comments or pointers will be greatly appreciated.

Thanks,
Ching-pei
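
P.S. To make the brute-force approach concrete, here is a minimal Python sketch of what I mean. The `hits` structure and the `count_categories` name are made up for illustration and are not any search library's API. A real implementation would read the category field of each hit through the index reader instead of from an in-memory list.

```python
from collections import Counter


def count_categories(hits):
    """Brute-force category counting: visit every hit and tally its category paths.

    `hits` is a hypothetical iterable of (doc_id, categories) pairs, where
    `categories` is the list of category paths stored with that document.
    """
    counts = Counter()
    for _doc_id, categories in hits:
        # set() so a document that lists the same path twice is counted once
        for path in set(categories):
            counts[path] += 1
    return counts


# Made-up sample hits mirroring the documents described above
hits = [
    ("Document1", ["/Computer/Desktop"]),
    ("Document2", ["/Computer/Notebook", "/Salesforce/ExtremePortable"]),
    ("Document3", ["/Computer/Server"]),
]

# Print each category with its hit count, e.g. "/Computer/Desktop(1)"
for path, n in sorted(count_categories(hits).items()):
    print(f"{path}({n})")
```

The work is linear in the number of hits, and every hit's category field has to be loaded, which is exactly the cost I am hoping to avoid.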