Hi there,

I saw that Solr has a dedicated graph query parser. If I understand things correctly, that parser could compute the document paths at query time. Instead of storing all ancestor references in each child document, storing only the immediate parents should then be sufficient.
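To make the idea concrete, here is a minimal sketch of how such a query could be put together from Python. It assumes the schema described in my original posting (a unique `id` field and a multi-valued `parent` field that holds only the immediate parents); the function name is made up for illustration, and I have not benchmarked any of this. As far as I understand the parser, it reads the `to` field of the documents matched so far and then looks for documents whose `from` field contains those values, so `from=parent to=id` should walk from a document down to its descendants.

```python
def descendants_graph_params(doc_id, max_depth=-1, include_start=False):
    """Build Solr request parameters for a graph traversal from doc_id downwards.

    Assumptions: 'id' is the unique key and 'parent' holds immediate parent IDs.
    max_depth=-1 means no depth limit; include_start controls returnRoot.
    Note: doc_id is not escaped here, so it must not contain double quotes.
    """
    return_root = "true" if include_start else "false"
    local_params = (
        f"{{!graph from=parent to=id maxDepth={max_depth} returnRoot={return_root}}}"
    )
    return {"q": f'{local_params}id:"{doc_id}"', "fl": "id"}


if __name__ == "__main__":
    # These parameters could be sent to a collection's /select handler.
    print(descendants_graph_params("X", max_depth=20))
```

Swapping the two fields (`from=id to=parent`) should give the opposite direction, i.e. all ancestors of a document up to the root.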

Can someone enlighten me about the time such a path computation would take? Considering up to 20 levels of hierarchy (wild guess), would it be beneficial to use the graph query parser over some application layer logic as outlined in my initial posting below? Can anyone share experiences with the graph query parser?

Best,
Marc

On 22.11.2024 10:54, ufuk yılmaz wrote:
I don’t like multivalued fields much because they don’t play nice with
docValues, which enable many cool features I like about Solr. They also
can’t be matched by position (find docs that have this value in the 3rd
position). But don’t take this as a suggestion not to use them; they
have their uses.

Check
https://solr.apache.org/guide/solr/latest/indexing-guide/tokenizers.html#path-hierarchy-tokenizer
as an alternative to see if it can help with your case

-ufuk yilmaz



—

On Nov 21, 2024, at 22:13, Marc <sedsh...@busy-byte.org> wrote:

Hi there,

I am running an application comprising approx. 20 million documents. To make these documents searchable, I decided to give Solr a try and feed meta-information about my documents into Solr using a Python script. This all works fine. My question is therefore less a technical one than a structural/strategic one.

In my document collection, documents can have parent documents, and not just one but potentially several. Each document is identified uniquely by an ID value. Child documents refer to their parents using a multi-value field 'parent' which holds the parents' ID values.

I am interested in the paths that lead from leaf documents (documents that only have parents, but no further children) back to the root document. My idea was to add every ancestor of a child document (not only the immediate parents, but also grandparents, great-grandparents and so on) to this multi-value parent property.

Later, I want to be able to pick any document X from my 20 million documents and efficiently determine the set of documents of which X is an ancestor, i.e., all documents that have X's ID somewhere in their parent field.
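As a rough sketch of what I have in mind on the indexing side: before sending a document to Solr, the Python script would expand its immediate parents into the full set of ancestors and store that set in the 'parent' field. The helper names and the in-memory `parents_of` mapping below are only illustrative assumptions, not what my script actually looks like.

```python
from collections import deque


def all_ancestors(doc_id, parents_of):
    """Return the set of all ancestor IDs of doc_id.

    parents_of maps a document ID to a list of its immediate parent IDs
    (hypothetical in-memory structure). Breadth-first walk; the 'seen' set
    copes with several paths leading to the same ancestor.
    """
    seen = set()
    queue = deque(parents_of.get(doc_id, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        queue.extend(parents_of.get(current, []))
    return seen


def descendants_query(doc_id):
    """Plain field query: every document listing doc_id in its 'parent' field.

    Assumes doc_id contains no double quotes (no escaping is done here).
    """
    return {"q": f'parent:"{doc_id}"', "fl": "id"}


if __name__ == "__main__":
    parents_of = {"leaf": ["a", "b"], "a": ["root"], "b": ["root"], "root": []}
    print(sorted(all_ancestors("leaf", parents_of)))
    print(descendants_query("root"))
```

With the ancestors flattened like this, finding everything below X is a single term lookup, at the cost of larger documents and of re-indexing descendants whenever the hierarchy changes.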

* Is my strategy of structuring my documents by means of the multi-value property sensible?
* Does Solr provide better methods (that I'm not aware of) to achieve the same?
* Will this perform properly? Or is my structuring method likely to keel over at some stage if the number of documents keeps growing?

Best,
Marc
