easyice opened a new issue, #12983:
URL: https://github.com/apache/lucene/issues/12983
### Description
I'm using Elasticsearch in an advertising analysis system where some indices
receive heavy update traffic. In general, we would disable soft deletes because
of their performance cost. However, this feature must be enabled in CCR
scenarios. Sometimes the `SoftDeletesRetentionMergePolicy#numDeletesToMerge`
method takes quite a long time to execute, which blocks many write
threads, like this:
<details>
<summary>Merging thread</summary>
```
"elasticsearch[fdbd:xxx::30_9201][write][T#123]" #498 daemon prio=5 os_prio=0 cpu=62471737.46ms elapsed=5866392.54s tid=0x00007fb330149000 nid=0x10f555 runnable [0x00007fafe83c0000]
   java.lang.Thread.State: RUNNABLE
    at org.apache.lucene.codecs.lucene80.IndexedDISI.advance(IndexedDISI.java:384)
    at org.apache.lucene.codecs.lucene80.IndexedDISI.nextDoc(IndexedDISI.java:459)
    at org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$SparseNumericDocValues.nextDoc(Lucene80DocValuesProducer.java:496)
    at org.apache.lucene.search.ConjunctionDISI.nextDoc(ConjunctionDISI.java:245)
    at org.apache.lucene.index.SoftDeletesRetentionMergePolicy.numDeletesToMerge(SoftDeletesRetentionMergePolicy.java:145)
    at org.apache.lucene.index.FilterMergePolicy.numDeletesToMerge(FilterMergePolicy.java:104)
    at org.elasticsearch.index.engine.CachedSoftDeletesCountMergePolicy.lambda$numDeletesToMerge$0(CachedSoftDeletesCountMergePolicy.java:87)
    at org.elasticsearch.index.engine.CachedSoftDeletesCountMergePolicy$$Lambda$4159/0x00007fb0e2a5a960.load(Unknown Source)
    at org.elasticsearch.common.cache.Cache.computeIfAbsent(Cache.java:433)
    at org.elasticsearch.index.engine.CachedSoftDeletesCountMergePolicy.numDeletesToMerge(CachedSoftDeletesCountMergePolicy.java:87)
    at org.apache.lucene.index.FilterMergePolicy.numDeletesToMerge(FilterMergePolicy.java:104)
    at org.apache.lucene.index.FilterMergePolicy.numDeletesToMerge(FilterMergePolicy.java:104)
    at org.apache.lucene.index.FilterMergePolicy.numDeletesToMerge(FilterMergePolicy.java:104)
    at org.apache.lucene.index.PendingDeletes.numDeletesToMerge(PendingDeletes.java:235)
    at org.apache.lucene.index.PendingSoftDeletes.numDeletesToMerge(PendingSoftDeletes.java:177)
    at org.apache.lucene.index.ReadersAndUpdates.numDeletesToMerge(ReadersAndUpdates.java:235)
    - locked <0x00007fc04eda8b20> (a org.apache.lucene.index.ReadersAndUpdates)
    at org.apache.lucene.index.IndexWriter.numDeletesToMerge(IndexWriter.java:5225)
    at org.apache.lucene.index.MergePolicy.size(MergePolicy.java:559)
    at org.apache.lucene.index.TieredMergePolicy.getSortedBySegmentSize(TieredMergePolicy.java:294)
    at org.apache.lucene.index.TieredMergePolicy.findMerges(TieredMergePolicy.java:323)
    at org.apache.lucene.index.FilterMergePolicy.findMerges(FilterMergePolicy.java:46)
    at org.apache.lucene.index.OneMergeWrappingMergePolicy.findMerges(OneMergeWrappingMergePolicy.java:47)
    at org.apache.lucene.index.OneMergeWrappingMergePolicy.findMerges(OneMergeWrappingMergePolicy.java:47)
    at org.apache.lucene.index.FilterMergePolicy.findMerges(FilterMergePolicy.java:46)
    at org.apache.lucene.index.OneMergeWrappingMergePolicy.findMerges(OneMergeWrappingMergePolicy.java:47)
    at org.apache.lucene.index.FilterMergePolicy.findMerges(FilterMergePolicy.java:46)
    at org.apache.lucene.index.FilterMergePolicy.findMerges(FilterMergePolicy.java:46)
    at org.apache.lucene.index.IndexWriter.updatePendingMerges(IndexWriter.java:2194)
    - locked <0x00007fc04c208750> (a org.apache.lucene.index.IndexWriter)
    at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2157)
```
</details>
<details>
<summary>Other write thread</summary>
```
"elasticsearch[fdbd:xxx::30_9201][write][T#29]" #403 daemon prio=5 os_prio=0 cpu=62619000.88ms elapsed=5866397.19s tid=0x00007fb3ec115800 nid=0x10f35a waiting for monitor entry [0x00007fb0e16c4000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.lucene.index.IndexWriter.getNextMerge(IndexWriter.java:2225)
    - waiting to lock <0x00007fc04c208750> (a org.apache.lucene.index.IndexWriter)
    at org.apache.lucene.index.ConcurrentMergeScheduler.merge(ConcurrentMergeScheduler.java:529)
    - locked <0x00007fc04c1f88b8> (a org.elasticsearch.index.engine.InternalEngine$EngineMergeScheduler)
    at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2158)
    at org.apache.lucene.index.IndexWriter.processEvents(IndexWriter.java:5136)
    at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1597)
    at org.apache.lucene.index.IndexWriter.softUpdateDocument(IndexWriter.java:1654)
```
</details>
I'm wondering whether it is possible to speed up
`SoftDeletesRetentionMergePolicy#numDeletesToMerge`. Currently, we execute a
search to collect the docs that must be retained despite being soft-deleted,
then remove them from `liveDocs`. If most of the documents in the index have
been soft-deleted (deleted by the user, or retained because a CCR follower is
lagging), perhaps we could instead collect the docs that do not need to be
retained (by adding a `reverseRetentionQuerySupplier`?). That count is exactly
the number of deletes a merge would claim.
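
To make the idea concrete, here is a minimal sketch of the two counting strategies. It is a simplified model and not Lucene code: `java.util.BitSet` stands in for `liveDocs` and for the query results, and the class and method names (`NumDeletesSketch`, `countViaRetained`, `countViaNotRetained`) are hypothetical. It also assumes the reverse query matches exactly the complement of the retention query.

```java
import java.util.BitSet;

public class NumDeletesSketch {

    // Model of the current approach: start from the total number of deleted
    // docs and subtract every deleted doc matched by the retention query.
    // Cost grows with the number of retained docs.
    static int countViaRetained(BitSet liveDocs, BitSet retained, int maxDoc) {
        int numDeleted = maxDoc - liveDocs.cardinality();
        for (int doc = retained.nextSetBit(0); doc >= 0; doc = retained.nextSetBit(doc + 1)) {
            if (!liveDocs.get(doc)) {
                numDeleted--;
            }
        }
        return numDeleted;
    }

    // Model of the proposed approach: iterate the docs matched by the
    // (hypothetical) reverse retention query and count the deleted ones.
    // Cost grows with the number of docs that are NOT retained.
    static int countViaNotRetained(BitSet liveDocs, BitSet notRetained) {
        int numDeleted = 0;
        for (int doc = notRetained.nextSetBit(0); doc >= 0; doc = notRetained.nextSetBit(doc + 1)) {
            if (!liveDocs.get(doc)) {
                numDeleted++;
            }
        }
        return numDeleted;
    }

    public static void main(String[] args) {
        int maxDoc = 10;
        BitSet liveDocs = new BitSet(maxDoc);
        liveDocs.set(0, 2);                 // docs 0 and 1 are live, 2..9 are deleted
        BitSet retained = new BitSet(maxDoc);
        retained.set(2, 9);                 // docs 2..8 must be retained
        BitSet notRetained = (BitSet) retained.clone();
        notRetained.flip(0, maxDoc);        // complement: docs 0, 1 and 9

        System.out.println(countViaRetained(liveDocs, retained, maxDoc));
        System.out.println(countViaNotRetained(liveDocs, notRetained));
    }
}
```

Both methods return the same count, but in this example the first visits seven docs and the second only three. The gap widens as the share of retained soft-deleted docs grows, which is the situation described above.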