Morning.
I noticed a condition that chooses between the sparse and dense formats here:
https://github.com/apache/lucene/blob/6053e1e31378378f6d310a05ea6d7dcdfc45f48b/lucene/core/src/java/org/apache/lucene/codecs/lucene95/OffHeapByteVectorValues.java#L108
Perhaps it can meet your performance requirements.
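If I read the linked code correctly, "sparse" and "dense" there describe how many documents have a vector, not how many dimensions are nonzero. Dense means every document has a vector, so a document's vector ordinal is simply its doc id. Sparse means only some documents do, so a doc-to-ordinal mapping is needed. The plain-Python sketch below only illustrates that distinction. The class names, the `load` function and its selection rule are hypothetical simplifications, not Lucene's implementation.

```python
from bisect import bisect_left
from typing import List, Optional, Sequence


class DenseVectorValues:
    """Every document has a vector, so a document's vector ordinal is its doc id."""

    def __init__(self, vectors: Sequence[Sequence[float]]):
        self.vectors = vectors

    def ord_for_doc(self, doc: int) -> Optional[int]:
        return doc if 0 <= doc < len(self.vectors) else None


class SparseVectorValues:
    """Only some documents have a vector; a sorted doc id list maps doc -> ordinal."""

    def __init__(self, doc_ids: Sequence[int], vectors: Sequence[Sequence[float]]):
        self.doc_ids: List[int] = list(doc_ids)  # ascending, one entry per stored vector
        self.vectors = vectors

    def ord_for_doc(self, doc: int) -> Optional[int]:
        # Binary search for the doc id; its position in the list is the ordinal.
        i = bisect_left(self.doc_ids, doc)
        return i if i < len(self.doc_ids) and self.doc_ids[i] == doc else None


def load(max_doc: int, docs_with_vectors: Sequence[int], vectors: Sequence[Sequence[float]]):
    # Hypothetical rule: use the dense layout only when every doc has a vector.
    if len(docs_with_vectors) == max_doc:
        return DenseVectorValues(vectors)
    return SparseVectorValues(docs_with_vectors, vectors)
```

The dense case needs no lookup structure at all, which is presumably why it is worth special-casing. Note that neither layout changes how many dimensions each vector has.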
Hi,
Thanks for the answer!
I think this is similar to my initial implementation, where I built the
query as follows (PyLucene):
import lucene
import torch
lucene.initVM()
from org.apache.lucene.search import BooleanQuery

def build_query(query):
    builder = BooleanQuery.Builder()
    for term in torch.nonzero(query):
        field_name = to_field_name(term.item())
        value = query[
Another way is to use postings: you can represent each dimension as a
term (`dim0`, `dim1`, etc.) and index those that occur in a document.
To encode a value for a dimension, you can either provide a custom term
frequency or index the term multiple times. Then, when searching, you
can form a BooleanQu
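The postings idea can be sketched in plain Python (no Lucene calls) to show why lookup stays cheap. Each dimension becomes a term, its quantized weight becomes the term frequency, and a query only touches the postings lists of its own nonzero dimensions. The names `SCALE`, `quantize`, `build_index` and `search`, the dot-product-style scoring, and the restriction to positive weights are all illustrative assumptions. A real Lucene setup would score through its similarity rather than through this loop.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

SCALE = 100  # hypothetical quantization factor: float weight -> integer term frequency


def quantize(value: float) -> int:
    # Term frequencies must be positive integers, so round and clamp to at least 1.
    return max(1, round(value * SCALE))


def build_index(docs: Dict[int, Dict[int, float]]) -> Dict[str, List[Tuple[int, int]]]:
    """Map each term 'dimN' to a postings list of (doc_id, freq), sorted by doc_id."""
    postings: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for doc_id in sorted(docs):
        for dim, value in docs[doc_id].items():
            if value > 0:  # only positive weights can be expressed as a frequency
                postings[f"dim{dim}"].append((doc_id, quantize(value)))
    return postings


def search(postings, query: Dict[int, float], k: int = 10) -> List[Tuple[int, float]]:
    # Like a disjunction of SHOULD clauses: only the postings of the query's terms
    # are visited, and each matching doc accumulates query_weight * weight.
    scores: Dict[int, float] = defaultdict(float)
    for dim, q_weight in query.items():
        for doc_id, freq in postings.get(f"dim{dim}", []):
            scores[doc_id] += q_weight * freq / SCALE
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:k]
```

For example, `search(build_index({0: {1: 0.5, 7: 1.2}, 1: {7: 0.3}}), {7: 1.0, 1: 2.0})` ranks doc 0 (score about 2.2) above doc 1 (about 0.3). Finding each term's postings is a dictionary lookup here. Lucene's terms dictionary is a different structure, but it plays the same role.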
Hi,
Thanks for the reply.
I haven't tried that.
However, I don't fully understand how, in this case, the inverted index would
be built so that searching by term is efficient (an O(1) lookup per term,
using the term as a key).
Mon, 2 Dec 2024 at 21:55, Patrick Zhai:
> Hi, have you tried to encode the sparse v
Hi, have you tried encoding the sparse vector yourself using
BinaryDocValuesField? One way I can think of is to encode it as (size,
index_array, value_array) per doc.
Intuitively, I feel this should be more efficient than one dimension
per field if your dimensionality is high enough.
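Here is a sketch of the (size, index_array, value_array) layout as raw bytes, using only Python's `struct` module. In Lucene, such bytes would be what goes into a `BinaryDocValuesField` per document. The little-endian int32/float32 encoding and the helper names `encode_sparse`, `decode_sparse` and `sparse_dot` are assumptions for illustration.

```python
import struct
from typing import Dict, List, Tuple


def encode_sparse(vector: Dict[int, float]) -> bytes:
    """Pack a sparse vector as (size, index_array, value_array), little-endian."""
    items = sorted(vector.items())  # sorted indices make decoding and merging easier
    indices = [i for i, _ in items]
    values = [v for _, v in items]
    n = len(items)
    # 'i' = int32 for size and indices, 'f' = float32 for values, no padding with '<'.
    return struct.pack(f"<i{n}i{n}f", n, *indices, *values)


def decode_sparse(data: bytes) -> Tuple[List[int], List[float]]:
    (n,) = struct.unpack_from("<i", data, 0)
    indices = list(struct.unpack_from(f"<{n}i", data, 4))
    values = list(struct.unpack_from(f"<{n}f", data, 4 + 4 * n))
    return indices, values


def sparse_dot(query: Dict[int, float], data: bytes) -> float:
    # Score one stored document against a query held as {dimension: weight}.
    indices, values = decode_sparse(data)
    return sum(query.get(i, 0.0) * v for i, v in zip(indices, values))
```

One design consequence: doc values are a per-document (forward) store, so this layout suits scoring documents that are already being visited, not finding candidate documents by dimension. Finding candidates by dimension is what the postings approach provides.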
Patrick
On
Hi!
I need to index sparse vectors, but as I understand it,
KnnFloatVectorField is designed for dense vectors.
Therefore, it seems that this approach will not work.
Sun, 1 Dec 2024 at 18:36, Mikhail Khludnev:
> Hi,
> May it look like KnnFloatVectorField(... DOT_PRODUCT)
> and KnnFloatVect
With random vectors, `HnswUtil.components` returns ~76k components on
level 0 (in `HnswGraphBuilder`, line 445, in Lucene 10.0). With the first 100k
vectors of SIFT1M, it finds 5 components. I don't know why that happens; I
don't understand the algorithm well enough. I might look into it later, but
fo
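A "component" here is, in the usual graph sense, a group of nodes connected to each other but not to the rest of the graph. A count close to 1 would mean almost every node is reachable from every other through graph edges. Below is a generic way to count components on a level-0 adjacency list, sketched in plain Python with union-find. It treats edges as undirected, which is an assumption. `HnswUtil.components` may define connectivity differently, for example by following only outgoing edges, so this is not its implementation.

```python
from typing import Dict, List


def count_components(neighbors: Dict[int, List[int]]) -> int:
    """Count connected components of a graph given as adjacency lists.

    Edges are treated as undirected, and nodes that appear only as neighbors
    are included as well. Uses union-find with path halving.
    """
    nodes = set(neighbors)
    for nbrs in neighbors.values():
        nodes.update(nbrs)
    parent = {node: node for node in nodes}

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for node, nbrs in neighbors.items():
        for nbr in nbrs:
            root_a, root_b = find(node), find(nbr)
            if root_a != root_b:
                parent[root_a] = root_b  # merge the two components
    return len({find(node) for node in nodes})
```

For example, the adjacency lists `{0: [1], 1: [0], 2: [3], 3: [], 4: []}` contain three components: {0, 1}, {2, 3} and {4}.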