Hi all,

I noticed that when merging an index that contains vector fields, the new segment includes a temporary file with the extension ".vec_temp_N.tmp" that holds all the vectors being merged. This file is used to search for neighbors while building the new HNSW graph. It is deleted later, and the segment ends up with a ".vec" file containing the same vectors. So the vectors are written twice, and extra disk space is needed temporarily.
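To put a rough number on the extra space, here is a back-of-the-envelope sketch. It is not Lucene code. It assumes unquantized float32 vectors at 4 bytes per component, ignores per-file headers and metadata, and assumes that only one temp file exists at a time (which is what I believe the code does, but I have not verified it). The function names and the example figures are made up for illustration.

```python
def raw_vector_bytes(num_vectors, dimensions, bytes_per_component=4):
    """Approximate raw size of one field's vectors (assumes float32 by default)."""
    return num_vectors * dimensions * bytes_per_component


def merge_space_estimate(fields):
    """Estimate (final .vec bytes, largest temp file bytes) for a merged segment.

    `fields` is a list of (num_vectors, dimensions) pairs, one per vector field.
    Assumption: each field's temp file is deleted before the next field starts,
    so the extra space needed at any moment is at most the largest field.
    """
    sizes = [raw_vector_bytes(n, d) for n, d in fields]
    final_vec_bytes = sum(sizes)
    largest_temp_bytes = max(sizes, default=0)
    return final_vec_bytes, largest_temp_bytes


if __name__ == "__main__":
    # Hypothetical index: a single field with 10 million 768-dimensional vectors.
    final_vec, temp = merge_space_estimate([(10_000_000, 768)])
    print(f".vec file: about {final_vec / 1e9:.2f} GB")
    print(f"temp file: up to about {temp / 1e9:.2f} GB extra during the merge")
```

With a single large vector field, the temporary copy is roughly as big as the final ".vec" file itself, which is why it matters in my case.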
In my index, the ".vec" file accounts for 98% of the index size, and the index is many GB. Is the temp file really necessary? Couldn't Lucene query the ".vec" file directly? I looked at the surrounding code: one temp file is created per field, and it is probably deleted before the next field is started. Still, it is a second copy of the vectors, so the temp file seems unnecessary. Is there a specific need for it? I might try to do a PR that removes the need for it.

Viliam