[
https://issues.apache.org/jira/browse/SOLR-12298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469375#comment-16469375
]
David Smiley commented on SOLR-12298:
-------------------------------------
Quoting [~hossman] here inline (hoping for his input):
{quote}Are you suggesting we model child documents as objects
(SolrInputDocuments i guess?) in a special field?
{quote}
Yes. Not as a special field, although _anonymous_ children (those with no
label, i.e. no named relationship) could use the _childDocuments_ key, which is
consistent with the existing use of this label.
{quote}... what if i put child documents in multiple fields? would that signify
the different types of child?
{quote}
Yes indeed. This is largely the point of this approach, since the current
anonymous relationship loses the semantics of how a child relates to its parent.
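To make the shape concrete, here is a minimal sketch that uses plain Java maps as a stand-in for SolrInputDocument. The class, the helper methods, and the field names "comments" and "attachments" are illustrative assumptions, not Solr APIs; only the _childDocuments_ key comes from existing usage.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for SolrInputDocument: an insertion-ordered map of
// field name -> value, where a value may be a list of child documents.
final class LabeledChildrenSketch {

    // Key for children with no named relationship, as in existing usage.
    static final String ANONYMOUS = "_childDocuments_";

    static Map<String, Object> doc(String id) {
        Map<String, Object> d = new LinkedHashMap<>();
        d.put("id", id);
        return d;
    }

    // Appends a child under the given relationship label (the field name).
    @SuppressWarnings("unchecked")
    static void addChild(Map<String, Object> parent, String label,
                         Map<String, Object> child) {
        List<Map<String, Object>> kids = (List<Map<String, Object>>)
            parent.computeIfAbsent(label, k -> new ArrayList<Map<String, Object>>());
        kids.add(child);
    }

    // A post with two kinds of labeled children plus one anonymous child.
    static Map<String, Object> example() {
        Map<String, Object> post = doc("post-1");
        addChild(post, "comments", doc("comment-1"));
        addChild(post, "comments", doc("comment-2"));
        addChild(post, "attachments", doc("file-1"));
        addChild(post, ANONYMOUS, doc("misc-1")); // no named relationship
        return post;
    }

    public static void main(String[] args) {
        System.out.println(example());
    }
}
```

The field name itself carries the relationship, so nothing extra has to be stored on the child at this stage.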
{quote}how would solr model that in the (lucene) Documents when giving them to
the IndexWriter?
{quote}
In this issue, Moshe has proposed a labeled path field, e.g. "post.comment".
This path would be added in a URP, or perhaps it would be done by
{{AddUpdateCommand.flatten/recUnwrap}} right when the URP chain is done.
{quote}How would solr know how to order the children in from multiple
fields/lists when creating the block?
{quote}
Ah, I think that's a non-issue as they are indexed in the order given
(notwithstanding the hierarchy flattening with parent last). If you meant how
might the order be reconstituted later at retrieval time, then we can rely on
the docID order since they are kept in order and never broken up.
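A rough sketch of the flattening described above, assuming documents are plain maps and that the label is written to a field named "_path_" (a hypothetical name). This is not the actual {{AddUpdateCommand.flatten/recUnwrap}} code; it only illustrates children being emitted in the order given, with each parent after its descendants and a dotted path such as "post.comment" attached.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FlattenSketch {

    static final String PATH_FIELD = "_path_"; // hypothetical field name

    static Map<String, Object> doc(String id) {
        Map<String, Object> d = new LinkedHashMap<>();
        d.put("id", id);
        return d;
    }

    // Flattens a nested document into one block: descendants first, root last.
    static List<Map<String, Object>> flatten(Map<String, Object> root, String rootLabel) {
        List<Map<String, Object>> out = new ArrayList<>();
        unwrap(root, rootLabel, out);
        return out;
    }

    private static void unwrap(Map<String, Object> doc, String path,
                               List<Map<String, Object>> out) {
        Map<String, Object> flat = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            Object v = e.getValue();
            if (isChildList(v)) {
                // The field name becomes the next path segment.
                for (Object child : (List<?>) v) {
                    @SuppressWarnings("unchecked")
                    Map<String, Object> c = (Map<String, Object>) child;
                    unwrap(c, path + "." + e.getKey(), out);
                }
            } else {
                flat.put(e.getKey(), v);
            }
        }
        flat.put(PATH_FIELD, path);
        out.add(flat); // the parent goes after all of its descendants
    }

    private static boolean isChildList(Object v) {
        if (!(v instanceof List)) return false;
        List<?> list = (List<?>) v;
        return !list.isEmpty() && list.stream().allMatch(x -> x instanceof Map);
    }

    // post -> comment c1, comment c2 -> reply r1
    static List<Map<String, Object>> example() {
        Map<String, Object> reply = doc("r1");
        Map<String, Object> c1 = doc("c1");
        Map<String, Object> c2 = doc("c2");
        c2.put("reply", new ArrayList<Object>(List.of(reply)));
        Map<String, Object> post = doc("p1");
        post.put("comment", new ArrayList<Object>(List.of(c1, c2)));
        return flatten(post, "post");
    }

    public static void main(String[] args) {
        example().forEach(System.out::println);
    }
}
```

Because the block is written in this order and never broken up, reading it back in docID order yields the children in their original sequence.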
{quote}Wouldn't the "type of child" information be better living in the child
documents itself? (particularly since that "type" information needs to be in
the child documents anyway so that the filter query for a BJQ can be specified.)
{quote}
_Ultimately_ it does in the generated Lucene Document.
{quote}It also seems like it would require code that wants to know what
children exist in a document to do a lot of work to find that out (need to
iterate every field in the SolrInputDocument and do reflection to see if they
are child-documents or not)
{quote}
I looked at this; it's AddSchemaFieldsUpdateProcessorFactory and
AddUpdateCommand.flatten/recUnwrap. I'm not concerned about the former as it's
for schema-guessing; only the latter. Perhaps this is no big deal, since the
work is bounded by the number of distinct field names in the average document?
Also if the schema contained special "ChildDoc" fields or some-such, then the
schema could guide these code paths to know which field names to look up in the
incoming document.
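The two options can be sketched side by side, again with plain maps standing in for SolrInputDocument. The class and method names and the idea of passing the declared "ChildDoc" field names as a collection are assumptions for illustration; no such schema field type exists in what is described here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ChildFieldLookupSketch {

    // A value counts as "child documents" if it is a document (map) or a
    // non-empty collection whose first element is a document.
    static boolean looksLikeChildValue(Object v) {
        if (v instanceof Map) return true;
        if (v instanceof Collection) {
            Collection<?> c = (Collection<?>) v;
            return !c.isEmpty() && c.iterator().next() instanceof Map;
        }
        return false;
    }

    // Option 1: scan every field; cost grows with the number of field names.
    static List<String> scanForChildFields(Map<String, Object> doc) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            if (looksLikeChildValue(e.getValue())) names.add(e.getKey());
        }
        return names;
    }

    // Option 2: let the schema name the child fields and look up only those.
    static List<String> lookupDeclaredChildFields(Map<String, Object> doc,
                                                  Collection<String> declared) {
        List<String> names = new ArrayList<>();
        for (String name : declared) {
            if (doc.containsKey(name)) names.add(name);
        }
        return names;
    }

    static Map<String, Object> exampleDoc() {
        Map<String, Object> author = new LinkedHashMap<>();
        author.put("id", "a1");
        Map<String, Object> comment = new LinkedHashMap<>();
        comment.put("id", "c1");
        Map<String, Object> post = new LinkedHashMap<>();
        post.put("id", "p1");
        post.put("title", "Hello");
        post.put("tags", Arrays.asList("a", "b")); // ordinary multivalued field
        post.put("comment", Arrays.asList(comment));
        post.put("author", author);
        return post;
    }

    public static void main(String[] args) {
        Map<String, Object> post = exampleDoc();
        System.out.println(scanForChildFields(post));
        System.out.println(lookupDeclaredChildFields(
            post, Arrays.asList("author", "comment", "attachment")));
    }
}
```

The scan touches each field once and inspects at most one element per value, which is why the cost tracks the number of distinct field names rather than the number of values.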
{quote}Another concern off the top of my head is that a lot of existing code
(including any custom update processors people might have) would assume those
child documents are multivalued field values and would probably break – hence
a new method on SolrInputDocument seems wiser (code that doesn't know about it
may not do what you want, but at least it won't break it)
{quote}
Fixable on a case-by-case basis. If this is worse than I imagine it is, then
what URP would be the worst offender?
In summary, the current approach doesn't retain the semantic information of
relationships, and I believe removing SolrInputDocument._childDocuments will
result in something _simpler_. It also allows a cleaner separation between the
format-specific input (JSON vs XML vs ...) and logic that should be ignorant of
that.
The next-best alternative I can think of that doesn't disturb
SolrInputDocument._childDocuments would be if hypothetically SolrInputDocument
had overloaded addChildDocument to accept a relationship string. And the impl
would add the child document along with mutating it to have the fields Moshe
has spoken of. But this seems trappy to me since some methods would do this
and the existing ones wouldn't, and so the format loader would need to be
careful to always use one or the other.
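A sketch of that hypothetical overload, and of why it is trappy. SketchInputDocument is an invented stand-in, not SolrInputDocument, and "_path_" is an assumed name for the label field:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Invented stand-in for SolrInputDocument, for illustration only.
final class SketchInputDocument {

    static final String PATH_FIELD = "_path_"; // hypothetical field name

    final Map<String, Object> fields = new LinkedHashMap<>();
    final List<SketchInputDocument> childDocuments = new ArrayList<>();

    SketchInputDocument(String id) {
        fields.put("id", id);
    }

    // Existing-style method: the child is added with no relationship info.
    void addChildDocument(SketchInputDocument child) {
        childDocuments.add(child);
    }

    // Hypothetical overload: mutates the child to record the relationship.
    void addChildDocument(String relationship, SketchInputDocument child) {
        child.fields.put(PATH_FIELD, relationship);
        childDocuments.add(child);
    }

    // Ids of children that were added without a relationship label.
    List<String> unlabeledChildIds() {
        List<String> ids = new ArrayList<>();
        for (SketchInputDocument c : childDocuments) {
            if (!c.fields.containsKey(PATH_FIELD)) ids.add((String) c.fields.get("id"));
        }
        return ids;
    }

    public static void main(String[] args) {
        SketchInputDocument post = new SketchInputDocument("p1");
        post.addChildDocument("comment", new SketchInputDocument("c1"));
        post.addChildDocument(new SketchInputDocument("c2")); // label silently lost
        System.out.println(post.unlabeledChildIds());
    }
}
```

Both calls compile and both children end up in the same list, so a loader that mixes the two methods produces a block where only some children carry the label.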
> Index Full nested document Hierarchy For Queries (umbrella issue)
> -----------------------------------------------------------------
>
> Key: SOLR-12298
> URL: https://issues.apache.org/jira/browse/SOLR-12298
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: mosh
> Priority: Major
>
> Solr ought to have the ability to index deeply nested objects, while storing
> the original document hierarchy.
> Currently the client has to index the child document's full path and level
> to manually reconstruct the original document structure, since the children
> are flattened and returned in the reserved "__childDocuments__" key.
> Ideally you could index a nested document, having Solr transparently add the
> required fields while providing a document transformer to rebuild the
> original document's hierarchy.
>
> This issue is an umbrella issue for the particular tasks that will make it
> all happen – either subtasks or issue linking.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)