benwtrent opened a new pull request, #13090:
URL: https://github.com/apache/lucene/pull/13090
The initial release of scalar quantization would periodically create a
humongous allocation, which can put unwarranted pressure on the GC and on
heap usage as a whole.
This commit adjusts that by allocating a float array of only `20 * dimensions`
and averaging the quantiles discovered batch by batch (a sketch follows the
list below).
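For a sense of scale (the dimension count here is illustrative, not taken from this PR): with 1024-dimensional vectors, the batch buffer is `20 * 1024 * 4` bytes ≈ 80 KB, far below G1's humongous-object threshold (half a region, where regions are 1-32 MB), whereas buffering even a modest sample of a 500k-vector segment in a single array runs to tens or hundreds of megabytes.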
Why does this work?
- Quantiles based on confidence intervals are (generally) unbiased, so
averaging the per-batch estimates gives statistically good results
- The selector algorithm scales linearly, so the total cost is about the same
- We need to process more than `1` vector at a time to keep extreme
confidence intervals from interacting badly with edge cases
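Here is a minimal sketch of the batching idea, assuming a plain `float[][]` input; the class and method names, and the full sort used for selection, are illustrative stand-ins rather than the actual Lucene internals:

```
import java.util.Arrays;

final class BatchedQuantileSketch {

  /** Number of vectors buffered per batch, per the description above. */
  static final int BATCH_SIZE = 20;

  /**
   * Estimates the lower/upper quantiles of all vector components for the
   * given confidence interval, buffering only BATCH_SIZE vectors at a time
   * and averaging the per-batch quantiles. Assumes at least one input vector.
   */
  static float[] estimateQuantiles(float[][] vectors, int dims, double confidenceInterval) {
    // The only sizable allocation: 20 * dims floats, independent of vector count.
    float[] buffer = new float[BATCH_SIZE * dims];
    double lowerSum = 0, upperSum = 0;
    int batches = 0;
    for (int i = 0; i < vectors.length; ) {
      int count = Math.min(BATCH_SIZE, vectors.length - i);
      for (int j = 0; j < count; j++) {
        System.arraycopy(vectors[i + j], 0, buffer, j * dims, dims);
      }
      i += count;
      // Select this batch's quantiles. A real implementation would use a
      // linear-time selection algorithm; a full sort stands in for brevity.
      float[] slice = Arrays.copyOf(buffer, count * dims);
      Arrays.sort(slice);
      double tail = (1.0 - confidenceInterval) / 2.0;
      lowerSum += slice[(int) (tail * (slice.length - 1))];
      upperSum += slice[(int) ((1.0 - tail) * (slice.length - 1))];
      batches++;
    }
    // Per-batch quantile estimates are (roughly) unbiased, so their mean is
    // a statistically reasonable estimate of the global quantiles.
    return new float[] {(float) (lowerSum / batches), (float) (upperSum / batches)};
  }
}
```

Because selection is linear in the batch size, summing over `ceil(n / 20)` batches costs about the same as one pass over all `n` vectors, which is what the second bullet above relies on.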
I benchmarked this over 500k and 100k vectors.
500k vectors
candidate
```
Force merge done in: 691533 ms
0.817 0.04 500000 0 16 250 2343 596410 1.00 post-filter
```
baseline
```
Force merge done in: 685618 ms
0.818 0.04 500000 0 16 250 2346 582242 1.00 post-filter
```
100k vectors
candidate
```
0.855 0.03 100000 0 16 250 2207 144173 1.00 post-filter
```
baseline
```
0.858 0.03 100000 0 16 250 2205 141578 1.00 post-filter
```
There does seem to be a slight increase in merge time (these are
single-threaded numbers) and a slight change in recall.
But to me, these seem acceptable given that we are no longer allocating a
ginormous array.