Hi there,
I saw that SOLR has a dedicated graph query parser. If I understand
things correctly, that parser could compute the document paths at query
time. Instead of storing all ancestor references in each child
document, storing only the immediate parents should be sufficient with
this parser.
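
To make the idea concrete, here is a minimal sketch of how such a query could be built from Python. It assumes the field names 'id' and 'parent' from my initial posting; the helper name descendants_query is hypothetical, and I have not tested the parameter choices against my data:

```python
def descendants_query(doc_id, max_depth=-1):
    """Build Solr request parameters asking the graph query parser for
    all documents that reach doc_id by following 'parent' references.

    Assumption: each document stores only its immediate parents in the
    multi-valued field 'parent' and its own ID in the field 'id'.
    """
    # Traversal step: collect the 'id' values of the current frontier
    # (to=id), then match documents whose 'parent' field contains one of
    # them (from=parent). returnRoot=false leaves the start document out;
    # maxDepth=-1 means no depth limit.
    local_params = (
        "{!graph from=parent to=id returnRoot=false maxDepth=%d}" % max_depth
    )
    # Note: doc_id is inserted as-is; IDs containing quotes need escaping.
    return {"q": local_params + 'id:"%s"' % doc_id, "fl": "id"}


params = descendants_query("X", max_depth=20)
print(params["q"])
```

The resulting dictionary could be sent as request parameters to the collection's /select handler. Swapping the from and to fields should walk in the opposite direction, from a leaf towards the root.
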
Can someone enlighten me about the time such a path computation would
take? Considering up to 20 levels of hierarchy (wild guess), would it be
beneficial to use the graph query parser over some application layer
logic as outlined in my initial posting below? Can anyone share
experiences with the graph query parser?
Best,
Marc
On 22.11.2024 at 10:54, ufuk yılmaz wrote:
I don’t like multivalued fields much because they don’t play nicely with
docValues, which enable many cool features I like about Solr. They also
can’t be matched by position (e.g. find docs that have this value in the
3rd position). But don’t take this as a suggestion not to use them; they
have their uses.
Check
https://solr.apache.org/guide/solr/latest/indexing-guide/tokenizers.html#path-hierarchy-tokenizer
as an alternative to see if it can help with your case.
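
As a rough illustration of what that tokenizer produces, here is a small Python emulation (not the tokenizer itself). It assumes '/' as the delimiter and a hypothetical field in which each document stores its path from the root; the function name path_prefixes is made up for this sketch:

```python
def path_prefixes(path, delimiter="/"):
    """Emulate the tokens a path hierarchy tokenizer would emit:
    one token per prefix of the path, from the root downwards."""
    # Drop empty segments caused by leading or trailing delimiters.
    parts = [p for p in path.split(delimiter) if p]
    return [
        delimiter + delimiter.join(parts[: i + 1])
        for i in range(len(parts))
    ]


print(path_prefixes("/root/a/b"))  # ['/root', '/root/a', '/root/a/b']
```

Searching such a field for "/root/a" would then match every document at or below that node. A document with several parents would need one stored path per route to the root.
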
-ufuk yilmaz
—
On Nov 21, 2024, at 22:13, Marc <sedsh...@busy-byte.org> wrote:
Hi there,
I am running an application comprising approx. 20 million documents.
To make these documents searchable, I decided to give SOLR a try and
feed meta-information about my documents into SOLR using a Python
script. This all works fine. My question is therefore less a technical
one and more a structural/strategic one.
In my document collection, documents can have parent-documents, and in
particular not only one parent document, but potentially several
parent documents. Each document is identified uniquely by an ID value.
Child documents refer to their parents using a multi-value field
'parent' which holds the parent's ID values.
I am interested in the paths that lead from leaf-documents (documents
that only have parents, but no further children) back to the root
document. My idea was to add every ancestor of a child document
(not only the immediate parents, but also grandparent and
great-great-... grandparent documents) to this multi-value parent
property.
Later, I want to be able to pick any document X from my 20 million
documents and efficiently determine the set of documents of which X is
a parent. I.e., all documents that have an entry of X's ID somewhere
in their parent field.
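
A minimal sketch of how the full ancestor set could be computed at indexing time, assuming the immediate-parent relations are available as a Python dictionary (the name all_ancestors and the sample data are hypothetical):

```python
def all_ancestors(doc_id, parents):
    """Return the IDs of all ancestors of doc_id.

    parents maps a document ID to the list of its immediate parent IDs.
    Documents may have several parents, so ancestors that are reachable
    via more than one route are visited only once.
    """
    seen = set()
    stack = list(parents.get(doc_id, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, []))
    return seen


# Hypothetical sample: 'leaf' has two parents that share one root.
parents = {"leaf": ["a", "b"], "a": ["root"], "b": ["root"], "root": []}
print(sorted(all_ancestors("leaf", parents)))  # ['a', 'b', 'root']
```

With the result stored in the 'parent' field, finding all documents below X would be a plain term query on that field for X's ID.
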
* Is my strategy of structuring my documents by means of the
multi-value property sensible?
* Does SOLR provide better methods (that I'm not aware of) to achieve
the same?
* Will this perform properly? Or is my structuring method likely to
keel over at some stage if the number of documents keeps growing?
Best,
Marc