Mike, I believe that the answer to your question is in this PR review
comment: https://github.com/apache/lucene/pull/601#discussion_r783711025.

Merging is currently implemented as a single loop over fields, merging each
field in turn. Writing the .vec file first would require merging the flat
vectors for all fields, and then doing a second pass over all fields to build
their HNSW graphs. This sounds doable, but we never got to it.
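
To make the difference in ordering concrete, here is a minimal sketch in
Java. It is not Lucene code: the class, the method names and the field names
are all hypothetical, and each "step" is only a string recording the order in
which work would happen under the two approaches.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; none of these names are Lucene APIs. */
public class MergeOrderSketch {

    /** Single pass: each field is fully merged before moving on to the next field. */
    public static List<String> singlePass(List<String> fields) {
        List<String> steps = new ArrayList<>();
        for (String field : fields) {
            steps.add("merge flat vectors: " + field);
            steps.add("build HNSW graph: " + field);
        }
        return steps;
    }

    /** Two passes: flat vectors for all fields first, then a second loop for the graphs. */
    public static List<String> twoPass(List<String> fields) {
        List<String> steps = new ArrayList<>();
        for (String field : fields) {
            steps.add("merge flat vectors: " + field);
        }
        for (String field : fields) {
            steps.add("build HNSW graph: " + field);
        }
        return steps;
    }

    public static void main(String[] args) {
        // Field names are made up for the example.
        List<String> fields = List.of("title_vec", "body_vec");
        singlePass(fields).forEach(System.out::println);
        System.out.println("---");
        twoPass(fields).forEach(System.out::println);
    }
}
```

In the two-pass ordering, all flat vectors are written before any graph is
built, which is the precondition for building graphs from the final .vec file
instead of a per-field temp file.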



On Fri, Jun 27, 2025 at 2:19 PM Michael Sokolov <msoko...@gmail.com> wrote:

> Without this temp file we would need to load the entire set of vectors
> for the new merged segment into RAM in order to support building an
> HNSW graph from it. This way we can read the vectors off the disk in
> the same way we would do during normal searches.  I'm not sure, but I
> think the temp file simply gets renamed into the new segment and
> doesn't have to be physically copied a second time.  It would be good
> to confirm that.
>
> On Thu, Jun 26, 2025 at 4:52 PM Viliam Ďurina <viliam.dur...@gmail.com> wrote:
> >
> > Hi all,
> >
> > I noticed that during merging in an index that contains vector fields, the
> > new segment contains a temporary file with ".vec_temp_N.tmp" extension,
> > which contains all the vectors being merged. This file is used to search
> > for neighbors for the new HNSW graph. It is later deleted, and the segment
> > will contain a ".vec" file with the same vectors. So vectors are copied two
> > times and more space is temporarily needed on disk.
> >
> > In my index, the ".vec" file is 98% of the index size and the index is many
> > GB. Is it really necessary to have the temp file? Couldn't Lucene query the
> > "vec" file directly? I checked the code around it, one temp file is created
> > per field and the temp file is probably deleted before starting the next
> > field, but still, there is another copy of the vector, so the temp file
> > seems unnecessary.
> >
> > Is there some specific need for the temp file? I might try to do a PR
> > removing the need for it.
> >
> > Viliam

-- 
Adrien
