Re: Document path representation as multivalued fields: sensible?

Marc Mon, 25 Nov 2024 05:21:24 -0800

Hi there,

I saw that Solr has a dedicated graph query parser. If I understand things correctly, that parser could compute the document paths at query time. Instead of storing all parent node references in each child element, storing only the immediate parents should be sufficient with this parser.

Can someone enlighten me about the time such a path computation would take? Considering up to 20 levels of hierarchy (wild guess), would it be beneficial to use the graph query parser over some application-layer logic as outlined in my initial posting below? Can anyone share experiences with the graph query parser?
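
For reference, a minimal sketch of what such a query could look like when built from Python. It assumes the unique key field is called `id` and the multi-value field holding only the immediate parents is called `parent`; the helper name, the `fl`/`rows` values, and the example ID "X" are hypothetical. `from`, `to`, `maxDepth`, and `returnRoot` are local parameters of the graph query parser. The traversal starts at the document matching the root query, collects its `to` values, and follows them to documents whose `from` field contains one of those values, so `from=parent to=id` should walk downwards to all descendants.

```python
from urllib.parse import urlencode


def graph_query_params(doc_id, from_field="parent", to_field="id",
                       max_depth=None, return_root=False):
    """Build request parameters for a Solr graph traversal starting at doc_id.

    Assumption: children store their immediate parents' IDs in `parent`,
    and `id` is the unique key. Swapping from_field and to_field would
    walk upwards (towards the ancestors) instead.
    """
    local_params = [
        f"from={from_field}",
        f"to={to_field}",
        f"returnRoot={'true' if return_root else 'false'}",
    ]
    if max_depth is not None:
        # Without maxDepth the traversal is unbounded.
        local_params.append(f"maxDepth={max_depth}")
    query = "{!graph " + " ".join(local_params) + "}" + f'id:"{doc_id}"'
    return {"q": query, "fl": "id", "rows": 100}


# All descendants of document "X", at most 20 levels deep.
params = graph_query_params("X", max_depth=20)
query_string = urlencode(params)  # append to the core's /select URL
```

Note that the reference guide lists a limitation for this parser: it works on a single-node installation or on a SolrCloud collection with exactly one shard, which may matter with 20 million documents.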


Best,
Marc

On 22.11.2024 10:54, ufuk yılmaz wrote:

I don’t like multivalued fields much because they don’t play nicely with
docValues, which enable many cool features I like about Solr. They also
can’t be matched by position (e.g. find docs that have this value in the
3rd position). But don’t take this as a suggestion not to use them; they
have their uses.

Check
https://solr.apache.org/guide/solr/latest/indexing-guide/tokenizers.html#path-hierarchy-tokenizer
as an alternative to see if it can help with your case.
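
To illustrate what that tokenizer does, here is a rough Python emulation (not Solr code) of the tokens it emits for a path string: one token per prefix of the path. The function name and the example path are made up for illustration.

```python
def path_hierarchy_tokens(path, delimiter="/"):
    """Emulate the prefix tokens a path hierarchy tokenizer produces."""
    parts = path.split(delimiter)
    tokens = []
    for end in range(1, len(parts) + 1):
        token = delimiter.join(parts[:end])
        if token:  # skip the empty prefix before a leading delimiter
            tokens.append(token)
    return tokens


tokens = path_hierarchy_tokens("/root/a/leaf")
# tokens == ["/root", "/root/a", "/root/a/leaf"]
```

A query for "/root/a" would then match every document whose path passes through that node. A document with several parents has one path per route to the root, so it would presumably still need several path values.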

-ufuk yilmaz



On Nov 21, 2024, at 22:13, Marc <sedsh...@busy-byte.org> wrote:

Hi there,

I am running an application comprising approx. 20 million documents. To make these documents searchable, I decided to give Solr a try and feed meta-information about my documents into Solr using a Python script. This all works fine. My question is less a technical one than a structural/strategic one.

In my document collection, documents can have parent documents, and in particular not only one parent document, but potentially several. Each document is identified uniquely by an ID value. Child documents refer to their parents using a multi-value field 'parent' which holds the parents' ID values.

I am interested in the paths that lead from leaf documents (documents that only have parents, but no further children) back to the root document. My idea was to add every ancestor of a child document (also those further away than the immediate parents, so grandparents, great-grandparents, and so on) to this multi-value parent property.

Later, I want to be able to pick any document X from my 20 million documents and efficiently determine the set of documents of which X is a parent (direct or indirect), i.e., all documents that have X's ID somewhere in their parent field.

* Is my strategy of structuring my documents by means of the multi-value property sensible?
* Does Solr provide better methods (that I'm not aware of) to achieve the same?
* Will this perform properly? Or is my structuring method likely to keel over at some stage if the number of documents keeps growing?
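
The strategy described above (storing every ancestor in the multi-value field) could be computed at indexing time roughly as follows. This is a minimal in-memory sketch, assuming the parent relation has no cycles and that the immediate-parent lists fit into a Python dict; the function name and the sample IDs are hypothetical.

```python
def ancestor_closure(parents):
    """Map each document ID to the set of all its ancestors.

    parents: dict mapping a document ID to a list of its immediate
    parent IDs (the root maps to an empty list).
    """
    closure = {}

    def visit(doc_id):
        if doc_id in closure:
            return closure[doc_id]  # already computed (shared ancestors)
        result = set()
        closure[doc_id] = result
        for parent_id in parents.get(doc_id, ()):
            result.add(parent_id)
            result |= visit(parent_id)  # depth is bounded by the hierarchy depth
        return result

    for doc_id in parents:
        visit(doc_id)
    return closure


# Tiny example: "leaf" has two parents, both children of "root".
parents = {"root": [], "a": ["root"], "b": ["root"], "leaf": ["a", "b"]}
closure = ancestor_closure(parents)

# Value for the multi-value parent field of "leaf":
leaf_parents = sorted(closure["leaf"])
# All documents of which "root" is a direct or indirect parent:
below_root = sorted(d for d, anc in closure.items() if "root" in anc)
```

In Solr itself, the last lookup would simply be a term query on the parent field for X's ID.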
Best,
Marc
