Hi there,
I am running an application comprising approx. 20 million documents. To
make these documents searchable, I decided to give SOLR a try and feed
meta-information about my documents into SOLR using a Python script.
This all works fine. My question is therefore less a technical one and
more a structural/strategic one.
In my document collection, a document can have parent documents, and not
just one but potentially several. Each document is uniquely identified by
an ID value. Child documents refer to their parents through a multi-valued
field 'parent', which holds the parents' ID values.
I am interested in the paths that lead from leaf documents (documents
that have parents but no children of their own) back to the root
document. My idea is to add every ancestor of a document (not only the
immediate parents, but also grandparents, great-grandparents and so on)
to this multi-valued 'parent' field.
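
To make the idea concrete, here is a minimal sketch (not my actual
feeding script) of how the full ancestor list could be computed before
indexing. The function name all_ancestors, the direct_parents mapping
and the sample IDs are made up for illustration; it assumes the
immediate parents of every document are available in memory as a dict.

```python
def all_ancestors(doc_id, direct_parents):
    """Return the IDs of every ancestor of doc_id (parents, grandparents, ...).

    direct_parents maps a document ID to the list of its immediate parent IDs.
    A document may have several parents, so the 'seen' set makes sure each
    ancestor is visited only once (and guards against accidental cycles).
    """
    seen = set()
    stack = list(direct_parents.get(doc_id, []))
    while stack:
        pid = stack.pop()
        if pid in seen:
            continue
        seen.add(pid)
        stack.extend(direct_parents.get(pid, []))
    return seen


# Hypothetical sample data: 'leaf' has two parents that share one root.
direct_parents = {
    "root": [],
    "a": ["root"],
    "b": ["root"],
    "leaf": ["a", "b"],
}

# The document as it would be fed to the index, with all ancestors
# stored in the multi-valued 'parent' field.
solr_doc = {
    "id": "leaf",
    "parent": sorted(all_ancestors("leaf", direct_parents)),
}
```

The cost of this approach is that the 'parent' field of a document grows
with the number of its ancestors, not just with the number of its
immediate parents.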
Later, I want to be able to pick any document X from my 20 million
documents and efficiently determine the set of documents of which X is
an ancestor, i.e., all documents that have X's ID somewhere in their
'parent' field.
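
The lookup I have in mind would then be a single filter on the 'parent'
field. The following is only a sketch of how such a request URL could be
built: the host, port, core name 'docs' and the helper name
descendants_query_url are assumptions, and only the standard q, fq, fl,
rows and wt request parameters are used.

```python
from urllib.parse import urlencode


def descendants_query_url(doc_id,
                          base_url="http://localhost:8983/solr/docs/select",
                          rows=100):
    """Build a select URL returning the IDs of all documents whose
    multi-valued 'parent' field contains doc_id.

    base_url is a hypothetical location; adjust it to the real installation.
    """
    # Quote the ID as a phrase and escape backslashes and double quotes
    # so that special characters in the ID do not break the query syntax.
    escaped = doc_id.replace("\\", "\\\\").replace('"', '\\"')
    params = {
        "q": "*:*",                      # match everything ...
        "fq": f'parent:"{escaped}"',     # ... then filter on the parent field
        "fl": "id",                      # only return the document IDs
        "rows": rows,
        "wt": "json",
    }
    return base_url + "?" + urlencode(params)


url = descendants_query_url("X42")
```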
* Is my strategy of structuring my documents by means of the multi-value
property sensible?
* Does SOLR provide better methods (that I'm not aware of) to achieve
the same?
* Will this perform properly? Or is my structuring method likely to keel
over at some stage if the number of documents keeps growing?
Best,
Marc