Re: Temporary vector file during merging

2025-07-22 Thread Viliam Ďurina
Hi all, some of the information above was incorrect. This is what happens:
- the source "vec" files are indeed read twice, but for a different reason: once to calculate the checksum and once to copy the live vectors to the "vec_temp" file.
- the "vec.tmp" file is then closed for writing and opened
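
To illustrate the two reads described in this message, here is a minimal Python sketch, not Lucene's actual (Java) implementation. It assumes a made-up file layout of fixed-size little-endian float32 vectors stored back to back; the names `checksum_pass`, `copy_live_vectors`, `demo`, and the file names are hypothetical and chosen only for this example.

```python
import os
import struct
import tempfile
import zlib


def checksum_pass(path, chunk_size=1 << 16):
    """First read: stream the whole file only to compute a CRC32."""
    crc = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            crc = zlib.crc32(chunk, crc)
    return crc


def copy_live_vectors(src_path, dst, dim, live_docs):
    """Second read: append only the vectors of live documents to dst."""
    vector_bytes = dim * 4  # assumed layout: dim float32 values per vector
    copied = 0
    with open(src_path, "rb") as src:
        for is_live in live_docs:
            raw = src.read(vector_bytes)
            if is_live:
                dst.write(raw)
                copied += 1
    return copied


def demo():
    # Three 2-dimensional vectors; the middle document is marked deleted.
    vectors = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]
    live_docs = [True, False, True]
    with tempfile.TemporaryDirectory() as tmp:
        src_path = os.path.join(tmp, "segment.vec")          # hypothetical name
        tmp_path = os.path.join(tmp, "merged.vec_temp.tmp")  # hypothetical name
        with open(src_path, "wb") as f:
            for v in vectors:
                f.write(struct.pack("<2f", *v))
        crc = checksum_pass(src_path)
        with open(tmp_path, "wb") as dst:
            copied = copy_live_vectors(src_path, dst, dim=2, live_docs=live_docs)
        with open(tmp_path, "rb") as f:
            merged = list(struct.iter_unpack("<2f", f.read()))
    return crc, copied, merged
```

Calling `demo()` reads the source file twice: once in `checksum_pass` and once in `copy_live_vectors`, which leaves only the two live vectors in the temporary file.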

Re: Temporary vector file during merging

2025-06-27 Thread Viliam Ďurina
I can confirm the temp file isn't renamed, but it's copied a second time. I'm on vacation next week. On Fri, Jun 27, 2025, 21:24 Michael Sokolov wrote: > Right! Thanks for the pointer. It does seem like there is room for > improvement then, maybe Viliam wants to tackle it? > > On Fri, Jun 27,

Re: Temporary vector file during merging

2025-06-27 Thread Michael Sokolov
Right! Thanks for the pointer. It does seem like there is room for improvement then, maybe Viliam wants to tackle it? On Fri, Jun 27, 2025 at 12:57 PM Adrien Grand wrote: > > Mike, I believe that the answer to your question is in this PR review > comment: https://github.com/apache/lucene/pull/601

Re: Temporary vector file during merging

2025-06-27 Thread Adrien Grand
Mike, I believe that the answer to your question is in this PR review comment: https://github.com/apache/lucene/pull/601#discussion_r783711025. Merging is currently implemented by looping over fields once, and merging them. Writing the vec file first would require merging flat vectors for all fiel

Re: Temporary vector file during merging

2025-06-27 Thread Michael Sokolov
Without this temp file we would need to load the entire set of vectors for the new merged segment into RAM in order to support building an HNSW graph from it. This way we can read the vectors off the disk in the same way we would do during normal searches. I'm not sure, but I think the temp file s
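
The trade-off described here, reading vectors from disk on demand instead of holding the whole merged set in RAM, can be sketched in Python as follows. This is an illustration under assumptions, not Lucene code: the class `OnDiskVectors`, the helpers `squared_distance`, `nearest`, and `demo`, and the flat float32 file layout are all hypothetical.

```python
import os
import struct
import tempfile


class OnDiskVectors:
    """Random access to fixed-size float32 vectors stored back to back in a file."""

    def __init__(self, path, dim):
        self._file = open(path, "rb")
        self._dim = dim
        self._vector_bytes = dim * 4
        self.size = os.path.getsize(path) // self._vector_bytes

    def vector(self, ordinal):
        # Seek to the vector's offset and read just that one vector.
        self._file.seek(ordinal * self._vector_bytes)
        raw = self._file.read(self._vector_bytes)
        return struct.unpack(f"<{self._dim}f", raw)

    def close(self):
        self._file.close()


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vectors, query, candidates):
    """Return the candidate ordinal closest to query, reading one vector at a time."""
    return min(candidates, key=lambda o: squared_distance(vectors.vector(o), query))


def demo():
    data = [(0.0, 0.0), (1.0, 1.0), (5.0, 6.0), (10.0, 10.0)]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "vectors.tmp")  # hypothetical temp file name
        with open(path, "wb") as f:
            for v in data:
                f.write(struct.pack("<2f", *v))
        vectors = OnDiskVectors(path, dim=2)
        try:
            best = nearest(vectors, (4.9, 6.1), range(vectors.size))
            return vectors.size, best, vectors.vector(3)
        finally:
            vectors.close()
```

Only one vector is held in memory per distance computation, which is the property a graph builder needs when the merged vector set is too large to load at once. A real graph build would visit far fewer candidates than the full scan used in `demo()`.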

Temporary vector file during merging

2025-06-26 Thread Viliam Ďurina
Hi all, I noticed that during merging in an index that contains vector fields, the new segment contains a temporary file with ".vec_temp_N.tmp" extension, which contains all the vectors being merged. This file is used to search for neighbors for the new HNSW graph. It is later deleted, and the seg