Hi all,
some of the information above was incorrect. This is what happens:
- the source "vec" files are indeed read twice, but for a different reason:
once to calculate the checksum and once to copy the live vectors to the
"vec_temp" file.
- the "vec_temp" file is then closed for writing and opened for reading.
I can confirm the temp file isn't renamed; instead, its contents are copied a second time.
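To make the sequence above concrete, here is a minimal, language-neutral sketch in Python. It is not Lucene code: the fixed-width float32 layout, the `DIM` constant, and the function names `write_vectors`, `checksum`, and `merge_vectors` are all hypothetical, and I'm assuming the "second copy" goes from the temp file into the final vector file of the merged segment.

```python
import os
import shutil
import struct
import zlib

DIM = 4              # hypothetical vector dimension
VEC_BYTES = DIM * 4  # each vector stored as DIM little-endian float32 values


def write_vectors(path, vectors):
    """Write vectors to a flat file, one fixed-width record per vector."""
    with open(path, "wb") as f:
        for v in vectors:
            f.write(struct.pack("<%df" % DIM, *v))


def checksum(path):
    """First read of a source file: compute a CRC32 over all its bytes."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            crc = zlib.crc32(chunk, crc)
    return crc


def merge_vectors(sources, temp_path, final_path):
    """sources is a list of (path, live_ordinals) pairs.

    Returns the checksums of the source files, in order.
    """
    # Read 1: checksum every source file.
    checksums = [checksum(path) for path, _ in sources]

    # Read 2: copy only the live vectors into the temp file.
    with open(temp_path, "wb") as tmp:
        for path, live in sources:
            with open(path, "rb") as src:
                ordinal = 0
                for raw in iter(lambda: src.read(VEC_BYTES), b""):
                    if ordinal in live:
                        tmp.write(raw)
                    ordinal += 1

    # The temp file is now closed for writing. Reopen it for reading and
    # copy it (assumption: into the final vector file). It is not renamed.
    with open(temp_path, "rb") as tmp, open(final_path, "wb") as out:
        shutil.copyfileobj(tmp, out)
        # A real merge would also read from tmp here to build the graph.

    os.remove(temp_path)  # the temp file is deleted afterwards
    return checksums
```

In this sketch every live vector is written twice (once to the temp file, once to the final file) and every source file is read twice, which is the I/O cost being discussed.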
I'm on vacation next week.
On Fri, Jun 27, 2025 at 21:24, Michael Sokolov wrote:
Right! Thanks for the pointer. It does seem like there is room for
improvement then, maybe Viliam wants to tackle it?
On Fri, Jun 27, 2025 at 12:57 PM Adrien Grand wrote:
Mike, I believe that the answer to your question is in this PR review
comment: https://github.com/apache/lucene/pull/601#discussion_r783711025.
Merging is currently implemented by looping over fields once, and merging
them. Writing the vec file first would require merging flat vectors for all
fiel
Without this temp file we would need to load the entire set of vectors
for the new merged segment into RAM in order to support building an
HNSW graph from it. This way we can read the vectors off the disk in
the same way we would do during normal searches. I'm not sure, but I
think the temp file s
Hi all,
I noticed that during merging in an index that contains vector fields, the
new segment contains a temporary file with ".vec_temp_N.tmp" extension,
which contains all the vectors being merged. This file is used to search
for neighbors for the new HNSW graph. It is later deleted, and the seg