Hi there,

I am running an application comprising approx. 20 million documents. To make these documents searchable, I decided to give SOLR a try and feed meta-information about my documents into SOLR using a Python script. This all works fine. My question is therefore less a technical one than a structural/strategic one.

In my document collection, documents can have parent documents, and a document may have not just one parent but several. Each document is uniquely identified by an ID value. Child documents refer to their parents through a multi-value field 'parent' that holds the parents' ID values.
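For illustration, a few documents in this shape might look as follows when serialized for Solr's JSON update format, where a multi-valued field is written as a JSON array. This is a minimal sketch: the IDs are hypothetical, and it assumes `parent` is declared `multiValued` in the schema.

```python
import json

# Hypothetical sample data: "D" has two parents ("B" and "C"),
# which both descend from the root "A". The root simply has no 'parent' field.
docs = [
    {"id": "A"},
    {"id": "B", "parent": ["A"]},
    {"id": "C", "parent": ["A"]},
    {"id": "D", "parent": ["B", "C"]},
]

# Solr's JSON update handler accepts a list of documents;
# multi-valued fields are plain JSON arrays.
payload = json.dumps(docs)
print(payload)
```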

I am interested in the paths that lead from leaf documents (documents that have parents but no children) back to the root document. My idea was to add every ancestor of a child document to this multi-value parent field: not only the immediate parents, but also grandparents, great-grandparents, and so on.
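Expanding the field to all ancestors amounts to computing, for each document, the transitive closure of its parent links. A minimal sketch, assuming the immediate parents are available in memory as a dict `parents` (a hypothetical name) before indexing: a breadth-first walk with a visited set copes with documents reachable via several parents, and also stops if the data happens to contain a cycle.

```python
from collections import deque


def all_ancestors(doc_id, parents):
    """Return every ancestor (parents, grandparents, ...) of doc_id.

    `parents` maps a document ID to the list of its immediate parent IDs.
    The `seen` set ensures a shared ancestor is visited only once.
    """
    seen = set()
    queue = deque(parents.get(doc_id, []))
    while queue:
        p = queue.popleft()
        if p in seen:
            continue
        seen.add(p)
        queue.extend(parents.get(p, []))
    return seen


# Hypothetical sample: "D" reaches root "A" via both "B" and "C".
parents = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
expanded = {doc: sorted(all_ancestors(doc, parents)) for doc in parents}
print(expanded["D"])  # ['A', 'B', 'C']
```

Note that the flattened list records which documents are ancestors, but no longer which path each one was reached through.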

Later, I want to be able to pick any document X from my 20 million documents and efficiently determine the set of documents of which X is a (direct or indirect) parent, i.e. all documents that contain X's ID somewhere in their parent field.
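With the expanded field in place, this lookup is a single query on `parent`. The sketch below only builds the request URL; the base URL and core name `docs` are hypothetical, and it assumes `parent` is an unanalyzed string field, so Solr's `{!term}` query parser can match the raw ID without Lucene-syntax escaping. An in-memory equivalent is included to make the intended result explicit.

```python
from urllib.parse import urlencode


def descendants_query_url(base_url, doc_id, rows=1000):
    """Build a Solr /select URL returning the IDs of all documents
    whose multi-valued 'parent' field contains doc_id."""
    params = {
        "q": "{!term f=parent}" + doc_id,  # exact match on the raw field value
        "fl": "id",                        # only return the document IDs
        "rows": rows,                      # Solr returns 10 rows by default
        "wt": "json",
    }
    return base_url.rstrip("/") + "/select?" + urlencode(params)


def descendants_in_memory(doc_id, expanded):
    """Same question answered locally: which documents list doc_id as an ancestor."""
    return {d for d, ancestors in expanded.items() if doc_id in ancestors}


# Hypothetical expanded data, as produced by the ancestor expansion.
expanded = {"A": [], "B": ["A"], "C": ["A"], "D": ["A", "B", "C"]}
url = descendants_query_url("http://localhost:8983/solr/docs", "A")
print(url)
print(sorted(descendants_in_memory("A", expanded)))  # ['B', 'C', 'D']
```

For very large result sets, paging (e.g. with Solr's `cursorMark`) is preferable to a single large `rows` value.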

* Is my strategy of structuring my documents by means of the multi-value property sensible?
* Does SOLR provide better methods (that I'm not aware of) to achieve the same?
* Will this perform well, or is my structuring method likely to keel over at some stage as the number of documents keeps growing?

Best,
Marc
