bruno-roustant commented on PR #2021: URL: https://github.com/apache/solr/pull/2021#issuecomment-1768027856
Here is my understanding.

VersionBucket
It serves two purposes.
First, it is a lock object used to make operations on a doc ID atomic. It is a bucket because it locks operations for all the doc IDs whose hash falls into this bucket.
Second, it keeps the highest version, which is the max of the versions of the docs in the bucket. This is an optimization that applies only when the leader forwards the update request to another replica (there could be an option not to store this long if only the leader updates and never forwards). When adding a doc with version v, we first compare v with the highest version vh of the bucket. If v > vh, then we know the doc version is ordered, without having to look in the transaction log or index for the precise indexed doc version.

Why 65536 VersionBucket?
For both VersionBucket goals, the more buckets there are, the more precise the locking and the highest-version optimization become. The drawback is the memory usage. In SOLR-XXX, the number of buckets was studied. If there are not enough buckets, update threads end up blocked waiting on the same lock when an update operation takes a long time. With a large number of buckets, the probability of updates locking the same bucket within the duration of a long operation is low enough not to impact performance.

Can we manage buckets dynamically?
For the locking aspect, yes, but more synchronization is required if we support bucket removal. If we only support lazy creation, we only need to synchronize when creating a bucket.
For the highest-version optimization, only partially: we cannot remove the highest value if it differs from the common "seed" value.
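To make the two roles and the lazy-creation idea concrete, here is a minimal Java sketch. It is not the actual Solr code: the class VersionBucketsSketch, its methods and the seed parameter are hypothetical names chosen for illustration. It also holds the bucket lock only for the version comparison, whereas the lock described above would be held for the whole update operation.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch of hash-striped version buckets; not Solr's real classes. */
final class VersionBucketsSketch {

  /** One bucket: a lock plus the highest version seen for doc IDs hashing here. */
  static final class Bucket {
    final ReentrantLock lock = new ReentrantLock();
    long highest; // guarded by lock

    Bucket(long seed) {
      this.highest = seed;
    }
  }

  private final AtomicReferenceArray<Bucket> buckets;
  private final long seed;

  VersionBucketsSketch(int numBuckets, long seed) {
    this.buckets = new AtomicReferenceArray<>(numBuckets);
    this.seed = seed;
  }

  /** Lazy creation: a CAS is needed only when the slot is still empty. */
  Bucket bucketFor(String docId) {
    int slot = Math.floorMod(docId.hashCode(), buckets.length());
    Bucket b = buckets.get(slot);
    if (b == null) {
      // If another thread wins the race, its bucket is the one we read back.
      buckets.compareAndSet(slot, null, new Bucket(seed));
      b = buckets.get(slot);
    }
    return b;
  }

  /**
   * Returns true when v > vh, so the version is known to be ordered without
   * any lookup, and records v as the new highest. Returns false when the
   * caller would have to check the transaction log or index instead.
   */
  boolean isKnownOrdered(String docId, long v) {
    Bucket b = bucketFor(docId);
    b.lock.lock();
    try {
      if (v > b.highest) {
        b.highest = v;
        return true;
      }
      return false;
    } finally {
      b.lock.unlock();
    }
  }

  /** Number of buckets actually allocated so far. */
  int createdBuckets() {
    int count = 0;
    for (int i = 0; i < buckets.length(); i++) {
      if (buckets.get(i) != null) {
        count++;
      }
    }
    return count;
  }
}
```

With this shape, memory grows only with the slots that are actually touched, and no bucket is ever removed, so the only extra synchronization is the CAS at creation time.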
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org