s
> to segments - you could apply this to an existing index. But again,
> this is not really intended for use in a production on-line index that
> receives updates.
>
> On Fri, Oct 15, 2021 at 1:27 PM Alex K wrote:
> >
> > Thanks Adrien. This makes me think I might not be
only indexes the data while index
> sorting requires doc values.
>
> On Fri, Oct 15, 2021 at 6:40 PM Alex K wrote:
>
> > Hi all,
> >
> > Could someone point me to an example of using the
> > IndexWriterConfig.setIndexSort for a field containing binary values?
>
Hi all,
Could someone point me to an example of using the
IndexWriterConfig.setIndexSort for a field containing binary values?
To be specific, the fields are constructed using the Field(String name,
byte[] value, IndexableFieldType type) constructor, and I'd like to try
using the java.util.Arrays
.de/sites/berlinbuzzwords.de/files/2021-06/The%20future%20of%20Lucene%27s%20MMapDirectory.pdf>,
and his great post about MMapDirectory from a few years ago
<https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html>.
Definitely recommended for others.
Thanks,
Alex
On Mon, Jul 5,
ene to run
> a single query over so many indexes.
>
> Uwe
>
> -
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> https://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> > -Original Message-
> > From: Alex K
> > Sent: Monday, July 5, 2021 4:04 AM
ID is a
> typical use
> > case for an inverted index. If you still need to store it as DocValues
> field, just
> > add it with both types.
> >
> > Uwe
> >
> > -
> > Uwe Schindler
> > Achterdiek 19, D-28357 Bremen
> > https://www.thetaphi.de
> >
Hi all,
I'm trying to figure out if there is a way to control the number of
segments in an index without explicitly calling forceMerge.
My use-case looks like this: I need to index a static dataset of ~1
billion documents. I know the exact number of docs before indexing starts.
I know the VM wher
Hi all,
I am curious if there is anything in Lucene that resembles a covering index
(from the relational database world) as an alternative to DocValues for
commonly-accessed values?
Consider the following use-case: I'm indexing docs in a Lucene index. Each
doc has some terms, which are not stored
as possible before flushing.
>
> -Mike
>
> On Wed, May 26, 2021 at 9:43 AM Michael Wechner
> wrote:
> >
> > Hi Alex
> >
> > Thank you very much for your feedback and the various insights!
> >
> > Am 26.05.21 um 04:41 schrieb Alex K:
> > >
NN search algorithms, and we have
> >> been working to make sure the VectorFormat API (might still get
> >> renamed due to confusion with other kinds of vectors existing in
> >> Lucene) can support alternative KNN implementations.
> >>
> >> On Wed, M
A couple of additions were recently merged into Lucene but not yet
released:
- A first-class vector codec
- An implementation of HNSW for approximate nearest neighbor search
They are however available in the snapshot releases. I started on a small
project to get the HNSW implementation into the
ow
> > > and ImpactsSource#getImpacts (
> > >
> > >
> >
> https://lucene.apache.org/core/8_6_0/core/org/apache/lucene/index/ImpactsSource.html
> > > ).
> > >
> > > You can look at ImpactsDISI to see how this metadat
> and ImpactsSource#getImpacts (
>
> https://lucene.apache.org/core/8_6_0/core/org/apache/lucene/index/ImpactsSource.html
> ).
>
> You can look at ImpactsDISI to see how this metadata is turned into score
> upper bounds in practice, which are in turn used to skip i
Hi all,
There was some fairly recent work in Lucene to introduce Block-Max WAND
Scoring (
https://cs.uwaterloo.ca/~jimmylin/publications/Grand_etal_ECIR2020_preprint.pdf
, https://issues.apache.org/jira/browse/LUCENE-8135).
I've been working on a use-case where I need very efficient top-k scoring
ching 10s to 100s of terms? It seems the bottleneck is in the
PostingsFormat implementation. Perhaps there is a PostingsFormat better
suited for this use case?
Thanks,
Alex
On Fri, Jul 24, 2020 at 7:59 AM Alex K wrote:
> Thanks Ali. I don't think that will work in this case, since
FWIW, I agree with Michael: this is not a simple problem and there's been a
lot of effort in Elasticsearch and Solr to solve it in a robust way. If you
can't use ES/Solr, I believe there are some posts on the ES blog about how
they write/delete/merge shards (Lucene indices).
On Tue, Sep 1, 2020 at
Hi,
Also have a look here:
https://issues.apache.org/jira/plugins/servlet/mobile#issue/LUCENE-9378
Seems it might be related.
- Alex
On Sun, Jul 26, 2020, 23:31 Trejkaz wrote:
> Hi all.
>
> I've been tracking down slow seeking performance in TermsEnum after
> updating to Lucene 8.5.1.
>
> On 8
up in
> Lucene is, but I've previously used https://github.com/npgall/cqengine for
> similar stuff. It provided really good performance, especially if you're
> just counting things.
>
> On Fri, Jul 24, 2020 at 6:55 AM Alex K wrote:
>
> > Hi all,
> >
>
Hi all,
I am working on a query that takes a set of terms, finds all documents
containing at least one of those terms, computes a subset of candidate docs
with the most matching terms, and applies a user-provided scoring function
to each of the candidate docs.
Simple example of the query:
- query
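The flow described in this message (match any term, keep the docs with the most matching terms, then apply a user-provided scoring function) can be sketched roughly as below. This is plain Java over an in-memory map of term to doc ids, not Lucene API: the class name `CandidateScoring`, the methods `topCandidates` and `rescore`, and the tie-breaking by doc id are all assumptions made for illustration, not the author's implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.IntToDoubleFunction;

// Hypothetical illustration only: postings are a plain map, not a Lucene index.
public class CandidateScoring {

    // Count how many query terms each doc contains and keep the docs with the
    // highest counts (ties broken by ascending doc id, an arbitrary choice).
    public static List<Integer> topCandidates(
            Map<String, int[]> postings, Set<String> queryTerms, int numCandidates) {
        Map<Integer, Integer> matchCounts = new HashMap<>();
        for (String term : queryTerms) {
            int[] docs = postings.get(term);
            if (docs == null) {
                continue; // term does not occur in the index
            }
            for (int doc : docs) {
                matchCounts.merge(doc, 1, Integer::sum);
            }
        }
        List<Integer> docIds = new ArrayList<>(matchCounts.keySet());
        docIds.sort((a, b) -> {
            int byCount = Integer.compare(matchCounts.get(b), matchCounts.get(a));
            return byCount != 0 ? byCount : Integer.compare(a, b);
        });
        return new ArrayList<>(docIds.subList(0, Math.min(numCandidates, docIds.size())));
    }

    // Apply the user-provided scoring function to the candidates only and
    // return the k best doc ids, highest score first.
    public static List<Integer> rescore(
            List<Integer> candidates, IntToDoubleFunction scorer, int k) {
        List<Integer> ranked = new ArrayList<>(candidates);
        ranked.sort((a, b) -> {
            int byScore = Double.compare(scorer.applyAsDouble(b), scorer.applyAsDouble(a));
            return byScore != 0 ? byScore : Integer.compare(a, b);
        });
        return new ArrayList<>(ranked.subList(0, Math.min(k, ranked.size())));
    }

    public static void main(String[] args) {
        Map<String, int[]> postings = new HashMap<>();
        postings.put("a", new int[] {0, 1, 2});
        postings.put("b", new int[] {1, 2});
        postings.put("c", new int[] {2, 3});

        List<Integer> candidates = topCandidates(postings, Set.of("a", "b", "c"), 2);
        List<Integer> best = rescore(candidates, doc -> doc == 1 ? 10.0 : 1.0, 1);
        System.out.println("candidates=" + candidates + " best=" + best);
    }
}
```

The sketch splits the work into a cheap counting pass and a second pass that runs the scoring function on the shortlisted docs only, which is the shape the message describes. How this maps onto Lucene queries and collectors is left open here.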
Hi Mikhail,
I'm not sure about the state of ANN in Lucene proper. Very interested to
see the response from others.
I've been doing some work on ANN for an Elasticsearch plugin:
http://elastiknn.klibisz.com/
I think it's possible to extract my custom queries and modeling code so
that it's elasticse
d
> [3] : https://arxiv.org/abs/1910.10208
>
>
>
>
>
> On Wed, 24 Jun 2020 at 19:47, Alex K wrote:
>
> > Hi Toke. Indeed a nice coincidence. It's an interesting and fun problem
> > space!
> >
> > My implementation isn't specific to any pa
On Wed, Jun 24, 2020 at 8:44 AM Toke Eskildsen wrote:
> On Tue, 2020-06-23 at 09:50 -0400, Alex K wrote:
> > I'm working on an Elasticsearch plugin (using Lucene internally) that
> > allows users to index numerical vectors and run exact and approximate
> > k-nearest
of the speed...
>
> On Tue, Jun 23, 2020 at 8:52 PM Alex K wrote:
> >
> > The TermInSetQuery is definitely faster. Unfortunately it doesn't seem
> to
> > return the number of terms that matched in a given document. Rather it
> just
> > returns the boost v
n 23, 2020 at 3:17 PM Alex K wrote:
> Hi Michael,
> Thanks for the quick response!
>
> I will look into the TermInSetQuery.
>
> My usage of "heap" might've been confusing.
> I'm using a FunctionScoreQuery from Elasticsearch.
> This gets instantiated with
e there really two heaps? Do you override the standard
> collector?
>
> On Tue, Jun 23, 2020 at 9:51 AM Alex K wrote:
> >
> > Hello all,
> >
> > I'm working on an Elasticsearch plugin (using Lucene internally) that
> > allows users to index numerical vectors
Hello all,
I'm working on an Elasticsearch plugin (using Lucene internally) that
allows users to index numerical vectors and run exact and approximate
k-nearest-neighbors similarity queries.
I'd like to get some feedback about my usage of BooleanQueries and
TermQueries, and see if there are any op
26 matches