[
https://issues.apache.org/jira/browse/LUCENE-3112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034711#comment-13034711
]
Simon Willnauer commented on LUCENE-3112:
-----------------------------------------
bq. Initial patch.
Nice, simple idea! I like the refactoring into pre/postUpdate; it looks much
cleaner. However, I think you should push the document iteration etc. into
DWPT so that the delTerm is applied only once, which makes the update truly
atomic. I also wonder if we should allow multiple delTerms, e.g.
Tuple<DelTerm, Document>; otherwise you would be bound to one delTerm per
"collection", but what if you want to remove only one of the
"sub-documents"? If we had those tuples, we would really want to push the
iteration into DWPT so that a final finishDocument(Term[] terms) call pushes
all the terms into a single DeleteItem.
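A minimal sketch of the suggestion above. It uses hypothetical stand-in types (Term, Doc, TermDocPair, DeleteItem, ToyPerThreadWriter), not Lucene's real DocumentsWriterPerThread API. In the sketch, the per-thread writer iterates over the whole block itself and collects one optional delete term per sub-document. It then publishes all of those terms as one DeleteItem through a single finishDocument(Term[]) call.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins, NOT Lucene's classes: just enough structure to
// show one DeleteItem being published per block of documents.
record Term(String field, String text) {}
record Doc(String id) {}
record DeleteItem(List<Term> terms) {}
record TermDocPair(Term delTerm, Doc doc) {} // delTerm == null means "no delete" (assumption)

class ToyPerThreadWriter {
    final List<Doc> buffered = new ArrayList<>();
    final List<DeleteItem> deleteQueue = new ArrayList<>();

    // The block is iterated inside the per-thread writer, so delete terms
    // are collected here instead of being applied once per document.
    void updateDocuments(Iterable<TermDocPair> block) {
        List<Term> pending = new ArrayList<>();
        for (TermDocPair p : block) {
            buffered.add(p.doc());
            if (p.delTerm() != null) {
                pending.add(p.delTerm());
            }
        }
        finishDocument(pending.toArray(new Term[0]));
    }

    // A single DeleteItem carries every delete term for the whole block.
    void finishDocument(Term[] terms) {
        if (terms.length > 0) {
            deleteQueue.add(new DeleteItem(List.of(terms)));
        }
    }
}
```

The null-means-no-delete convention is an assumption of this sketch. Only the overall shape (iterate inside the writer, one DeleteItem per block) reflects the comment.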
> Add IW.add/updateDocuments to support nested documents
> ------------------------------------------------------
>
> Key: LUCENE-3112
> URL: https://issues.apache.org/jira/browse/LUCENE-3112
> Project: Lucene - Java
> Issue Type: Improvement
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Priority: Minor
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-3112.patch
>
>
> I think nested documents (LUCENE-2454) is a very compelling addition
> to Lucene. It's also a popular (many votes) issue.
> Beyond supporting nested document querying, which is already an
> incredible addition since it preserves the relational model when
> indexing normalized content (e.g., DB tables, XML docs), LUCENE-2454
> should also enable speedups in the grouping implementation when you
> group by a nested field.
> For the same reason, it can also enable a very fast post-group facet
> counting impl (LUCENE-3097) when you want to
> count(distinct(nestedField)), instead of unique documents, as your
> "identifier". I expect many apps that use faceting need this ability
> (to count(distinct(nestedField)) rather than distinct(docID)).
> To support these use cases, I believe the only core change needed is
> the ability to atomically add or update multiple documents, which you
> cannot do today since in between add/updateDocument calls a flush (eg
> due to commit or getReader()) could occur.
> This new API (addDocuments(Iterable<Document>), updateDocuments(Term
> delTerm, Iterable<Document>)) would also further guarantee that the
> documents are assigned sequential docIDs in the order the iterator
> provided them, and that the docIDs all reside in one segment.
> Segment merging never splits segments apart, so this invariant would
> hold even as merges/optimizes take place.
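The following is a toy model of the proposed invariants. It assumes a drastically simplified writer: ToyBlockWriter, mergeAll and blockStart are hypothetical names, and docs are plain strings rather than Document objects. In the model, a segment is an ordered list and a doc's docID is its position in that list. A flush cannot land inside a block because addDocuments and flush take the same lock. A block also stays at consecutive docIDs after a merge, because merging concatenates whole segments.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model only (hypothetical, not Lucene's IndexWriter): a segment is an
// ordered list of docs, and a doc's docID is its index in that list.
class ToyBlockWriter {
    private final List<String> buffer = new ArrayList<>();
    private final List<List<String>> segments = new ArrayList<>();

    synchronized void addDocument(String doc) {
        buffer.add(doc);
    }

    // The whole block is buffered while holding the same lock flush() uses,
    // so a flush can happen before or after the block but never inside it.
    synchronized void addDocuments(Iterable<String> docs) {
        for (String d : docs) {
            buffer.add(d);
        }
    }

    // Moves everything buffered so far into one new segment.
    synchronized void flush() {
        if (!buffer.isEmpty()) {
            segments.add(new ArrayList<>(buffer));
            buffer.clear();
        }
    }

    // Merges all segments by concatenating them in order. No segment is
    // split, so a contiguous block stays contiguous; its docIDs are only
    // shifted by the sizes of the segments in front of it.
    synchronized void mergeAll() {
        List<String> merged = new ArrayList<>();
        for (List<String> seg : segments) {
            merged.addAll(seg);
        }
        segments.clear();
        if (!merged.isEmpty()) {
            segments.add(merged);
        }
    }

    // Returns an immutable snapshot of the current segments.
    synchronized List<List<String>> segments() {
        List<List<String>> copy = new ArrayList<>();
        for (List<String> seg : segments) {
            copy.add(List.copyOf(seg));
        }
        return copy;
    }

    // First docID of the block if it occupies consecutive docIDs, in the
    // given order, within this segment; -1 otherwise.
    static int blockStart(List<String> segment, List<String> block) {
        return Collections.indexOfSubList(segment, block);
    }
}
```

Lucene's real IndexWriter flushes per-thread buffers and merges segments through codecs. This model therefore mirrors only the ordering argument, not the implementation.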