Re: [PR] For hnsw merger, do not pop from empty heap [lucene]

via GitHub Fri, 28 Mar 2025 08:33:36 -0700


tveasey commented on PR #14420:
URL: https://github.com/apache/lucene/pull/14420#issuecomment-2761706401


   I discussed this a bit with Mayya.
   
   So the underlying issue is how we account for the gain for degree 1 
vertices. We say that the gain contribution from any vertex is at least 2, i.e. 
$\max\left(2, \frac{|N_s(v)|}{4}\right)$. This forces us to insert degree 1 
vertices which was intentional. The problem is we can get at most +1 
contribution from them to $g_{tot}$. Mayya tried it and when we correct the 
gain to $\min\left(\max\left(2, \frac{|N_s(v)|}{4}\right), |N_s(v)|\right)$ we 
terminate before exhausting the heap, as expected.
   
   Arguably this is slightly more conceptually correct. However, it represents 
a change in behaviour and invalidates the testing Mayya's done. The current 
behaviour increases the relative size of $J$ for graphs with many degree one 
vertices. This seems conceptually ok. In the case the heap is exhausted we've 
just picked $J = V$ the set of all vertices and reverted to the original merge 
procedure which is clearly harmless. However, this is going to be an extreme 
edge case since we normally only have a small fraction of degree one vertices 
and we'll nearly always terminate on $g_{tot}<g_{exit}$.
   
   Therefore, my vote is to just go with Ben's fix and leave the $g_{tot}$ 
calculation as is.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] For hnsw merger, do not pop from empty heap [lucene]

Reply via email to