mikemccand commented on code in PR #13486:
URL: https://github.com/apache/lucene/pull/13486#discussion_r1640752212
##########
lucene/core/src/java/org/apache/lucene/index/IndexWriter.java:
##########
@@ -1838,6 +1838,20 @@ public long updateDocument(Term term, Iterable<? extends IndexableField> doc) th
term == null ? null : DocumentsWriterDeleteQueue.newNode(term),
List.of(doc));
}
+ /**
+ * Similar to {@link #updateDocuments(Term, Iterable)}, but only apply deletion once for all
+ * flushed segments. This is useful for unique fields like ES's _id.
+ *
+ * @lucene.experimental
+ */
+ // TODO: If it is unnecessary to validate the unique constraint, we can add an isUnique setting to
+ // Term.
+ public long updateDocument(Term term, boolean isUnique, Iterable<? extends IndexableField> doc)
Review Comment:
Hmm, this makes me nervous -- we are relying on the application to properly
claim `isUnique`, but the application may get it wrong. Though I suppose the
worst case, if the application gets it wrong, is that some documents fail to get
deleted (only the first occurrence will be), not, for example, index corruption.
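
To make that worst case concrete, here is a toy model (hypothetical, not Lucene's `IndexWriter` or its delete queue): each "segment" is just a list of `_id` values, and a delete-by-term either visits every segment or, when the caller claims the term is unique, stops after the first segment that contains it. The class and method names, and the assumption that "apply deletion once" means "stop at the first segment with a hit", are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical toy model of "apply the delete once": NOT Lucene's IndexWriter.
 * Each segment is a mutable list of id values for its documents.
 */
final class DeleteOnceModel {

  /**
   * Deletes documents whose id equals {@code id}. If {@code assumeUnique} is
   * true, stops after the first segment that had a match (the optimization);
   * otherwise visits every segment. Returns the number of documents deleted.
   */
  static int deleteByTerm(List<List<String>> segments, String id, boolean assumeUnique) {
    int deleted = 0;
    for (List<String> segment : segments) {
      int before = segment.size();
      segment.removeIf(id::equals); // delete every match within this segment
      int hits = before - segment.size();
      deleted += hits;
      if (assumeUnique && hits > 0) {
        break; // trusting the "unique" claim: skip the remaining segments
      }
    }
    return deleted;
  }

  /** Three segments where id "a" (wrongly claimed unique) appears in segments 0 and 2. */
  static List<List<String>> sampleSegments() {
    List<List<String>> segments = new ArrayList<>();
    segments.add(new ArrayList<>(List.of("a", "b")));
    segments.add(new ArrayList<>(List.of("c")));
    segments.add(new ArrayList<>(List.of("a", "d")));
    return segments;
  }
}
```

With a false `isUnique` claim, `deleteByTerm(sampleSegments(), "a", true)` deletes only one document and the copy in segment 2 survives: stale documents remain searchable, but nothing in the index structure is damaged.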
Could Lucene maybe track internally that a field is actually unique, and then
apply this optimization automatically and always correctly? We may have to
tighten the optimization to "isUnique and isNonNull (every doc has a value for the
field)", which the OpenSearch/Elasticsearch `_id` field would meet?
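
One possible reading of "track internally" is an exact check at indexing time. The sketch below is a hypothetical per-segment tracker (not a Lucene API; `PrimaryKeyTracker` and `onDocument` are invented names) that stays "primary key" only while every added document has exactly one value for the field and no value repeats. It is exact, but it pays for a `HashSet` of every value seen in the segment.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical per-segment tracker (not a Lucene API): reports whether every
 * document added so far had exactly one value for the field ("non-null",
 * single-valued) and no value was repeated ("unique").
 */
final class PrimaryKeyTracker {
  private final Set<String> seen = new HashSet<>();
  private boolean primaryKey = true;

  /** Called once per added document with all of its values for the tracked field. */
  void onDocument(List<String> valuesForField) {
    if (!primaryKey) {
      return; // already disqualified; no need to keep collecting values
    }
    if (valuesForField.size() != 1) {
      primaryKey = false; // field missing (null) or multi-valued
      seen.clear();       // release memory once the answer is known
      return;
    }
    if (!seen.add(valuesForField.get(0))) {
      primaryKey = false; // value already used by an earlier document
      seen.clear();
    }
  }

  boolean isPrimaryKey() {
    return primaryKey;
  }
}
```

Note that in this sketch an update that re-adds the same `_id` within one in-memory segment looks like a duplicate and disqualifies the segment, which is the same deletes/updates complication discussed below.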
If not for deletes/updates, we could check whether the number of unique terms
in the field equals its `totalTermFreq`. Or maybe we'd instead track that "this
field always has exactly one value", so that neither freqs nor positions would
have to be indexed for this optimization to apply? Then, if the number of unique
terms is >= `numDocs` and every doc has exactly one term in this field, it is a
"primary key"?
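
A cheaper, statistics-only variant can be sketched as a pure function over per-segment numbers. The helper below is hypothetical (not a Lucene API); its inputs are meant to correspond to Lucene's `Terms.size()` (which may return -1 when unknown), `Terms.getSumDocFreq()`, `Terms.getDocCount()` and `LeafReader.maxDoc()`, but that wiring is an assumption. It uses `sumDocFreq` rather than `totalTermFreq`, so it would only need docs-only postings, and it compares against `maxDoc` rather than `numDocs`, because a deleted document's terms remain in the segment's terms dictionary (and in these statistics) until the segment is merged.

```java
/**
 * Hypothetical helper (not a Lucene API): decides from per-segment field
 * statistics whether the field behaves like a primary key in that segment,
 * i.e. every document has exactly one distinct value and no value is shared.
 */
final class PrimaryKeyStats {

  /**
   * @param uniqueTerms number of distinct terms in the field (like Terms.size(); -1 if unknown)
   * @param sumDocFreq  sum of docFreq over all terms (like Terms.getSumDocFreq())
   * @param docCount    docs with at least one term in the field (like Terms.getDocCount())
   * @param maxDoc      all docs in the segment, including deleted ones (like LeafReader.maxDoc())
   */
  static boolean looksLikePrimaryKey(long uniqueTerms, long sumDocFreq, int docCount, int maxDoc) {
    if (uniqueTerms < 0) {
      return false; // statistic not available: be conservative
    }
    boolean nonNull = docCount == maxDoc;          // every doc has a value
    boolean singleValued = sumDocFreq == docCount; // each doc has exactly one distinct term
    boolean unique = uniqueTerms == sumDocFreq;    // each term occurs in exactly one doc
    return nonNull && singleValued && unique;
  }
}
```

The check is conservative: an in-segment update leaves the old and new versions sharing one term (for example `uniqueTerms=2, sumDocFreq=3, docCount=3, maxDoc=3`), so it returns false even though the live documents are unique. These are also per-segment statistics, so on their own they say nothing about a value repeating across different segments.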
--
This is an automated message from the Apache Git Service.