Hi all,

I noticed that when segments are merged in an index that contains vector
fields, the new segment temporarily contains a file with a ".vec_temp_N.tmp"
extension that holds all the vectors being merged. This file is used to
search for neighbors while building the new HNSW graph. It is later deleted,
and the segment ends up with a ".vec" file containing the same vectors. So
the vectors are written twice, and extra disk space is needed temporarily.
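
To make the pattern concrete, here is a minimal sketch in plain Java of what
I understand to be happening. It is not Lucene's code: the class, the method
names and the file names are made up for illustration, and the exact order of
the steps inside Lucene may differ.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

public class TempVectorCopySketch {

    /**
     * Writes the vectors to a temp file, copies them to the final file,
     * deletes the temp file, and returns the peak number of bytes that
     * both files occupied on disk at the same time.
     */
    static long mergeWithTempFile(Path dir, float[][] vectors) throws IOException {
        Path temp = dir.resolve("field.vec_temp.tmp"); // hypothetical file name
        Path vec = dir.resolve("field.vec");           // hypothetical file name

        // First copy: all merged vectors go to the temp file.
        writeVectors(temp, vectors);

        // Second copy: the same bytes are written to the final file.
        Files.copy(temp, vec, StandardCopyOption.REPLACE_EXISTING);

        // (Graph construction would read the vectors back from the temp file here.)

        // Both copies exist at this point, so this is the peak usage.
        long peak = Files.size(temp) + Files.size(vec);

        // The temp file is removed once it is no longer needed.
        Files.delete(temp);
        return peak;
    }

    /** Writes each vector as raw big-endian floats, one after another. */
    static void writeVectors(Path file, float[][] vectors) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            for (float[] v : vectors) {
                ByteBuffer buf = ByteBuffer.allocate(v.length * Float.BYTES);
                buf.asFloatBuffer().put(v); // fills buf; buf's own position stays at 0
                while (buf.hasRemaining()) {
                    ch.write(buf);
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("vec-sketch");
        long peak = mergeWithTempFile(dir, new float[1000][128]);
        long finalSize = Files.size(dir.resolve("field.vec"));
        System.out.println("peak bytes = " + peak + ", final bytes = " + finalSize);
    }
}
```

In this sketch the peak usage is twice the size of the final file, which is
the overhead I am asking about.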

In my index, the ".vec" file is 98% of the index size, and the index is many
GB. Is the temp file really necessary? Couldn't Lucene read the ".vec" file
directly? I checked the code around it: one temp file is created per field,
and it is probably deleted before the next field is started. Still, it is a
second copy of the vectors, so the temp file seems unnecessary.

Is there a specific need for the temp file? If not, I might try to open a PR
that removes it.

Viliam
