Hi all,
I'm trying to build an (Elastic) suggester that uses contexts in
completion queries to implement authorization for these suggestions.
Basically, I only want suggestions from the contexts where the user has
rights.
(not sure if this is the best way, suggestions (no pun intended) welcome)
What
rror. Perhaps I'm misunderstanding the way the sampling is done
> and that latter case cannot happen.
>
> Marc
>
>
> On Tue, Oct 8, 2024 at 1:27 PM Rob Audenaerde
> wrote:
>
> > Hi Marc,
> >
> > I worked extensively on an application that leveraged fac
Hi Marc,
I worked extensively on an application that leveraged facet counts in the
Lucene 8 series (and also aggregation by leveraging the facet fields,
albeit with a custom implementation) for document sets with over 100M
documents. We settled for random sampling if the number of hits was greater
th
>
>
>
> On Fri, Jan 22, 2021 at 4:48 PM Rob Audenaerde
> wrote:
>
> > Hi Martynas
> >
> > How did you measure that?
> >
> > I ask, because writing a good benchmark is not an easy task, since there
> > are so many factors (class loading times, J
> > duration
> > ratio - 8.7.0 is 3 times slower. I think the ratio will be similar when
> > retrieving any number of documents.
> >
> > On Fri, Jan 22, 2021 at 1:39 PM Rob Audenaerde
> > wrote:
> >
> > > Hi Martynas,
> > >
> > > In
le.com/drive/folders/1ufVZXzkugBAFnuy8HLAY6mbPWzjknrfE
>
> IndexGenerator - creates a dummy index.
> IndexReader - retrieves documents - duration time with 7.5.0 version is
> ~2s, while ~6s with 8.7.0
>
> Regards,
> Martynas
>
>
> On Thu, Jan 21, 2021 at 8:21 PM Rob Au
There is no attachment in the previous email that I can see? Maybe you can
post it online?
On Thu, Jan 21, 2021 at 4:54 PM Martynas L wrote:
> Hello,
>
> Are there any comments on this issue?
> If there is no workaround, we will be forced to rollback to the 7.5.0
> version.
>
> Best regards,
> M
use DocValuesFieldExistsQuery.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Fri, Nov 13, 2020 at 7:56 AM Rob Audenaerde
> wrote:
>
>> Hi all,
>>
>> We have implemented some security on our index by adding a field
>> 'groups_al
Hi all,
We have implemented some security on our index by adding a field
'groups_allowed' to documents and wrapping a boolean MUST query around the
original query; it checks whether one of the given user groups matches at
least one groups_allowed value.
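The matching rule described above can be sketched in plain Java, without any Lucene classes. This is only an illustration of the rule "at least one user group appears in groups_allowed"; the class and method names are hypothetical, and the treatment of an empty groups_allowed set (here: matches nobody) is an assumption, since the original message describes its own handling of empty fields separately.

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical helper, not part of Lucene: models the access rule only.
final class GroupAccessCheck {
    // Returns true when at least one of the user's groups is also present
    // in the document's groups_allowed values.
    // Assumption: an empty groupsAllowed set matches no user at all.
    static boolean isVisible(Set<String> userGroups, Set<String> groupsAllowed) {
        return !Collections.disjoint(userGroups, groupsAllowed);
    }
}
```

In an index, the same rule would typically be expressed as a filter clause combined with the original query using Occur.MUST, as the message says; the sketch only pins down which documents that clause should accept.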
We chose to leave the groups_allowed field empty when t
tQuery.
>
> Also beware that IndexSearcher#count will look at index statistics if your
> queries have a single term, which would no longer work if you use this
> query as a filter for another query.
>
> On Tue, Oct 13, 2020 at 12:51 PM Rob Audenaerde
> wrote:
>
> > I reduced
AM Rob Audenaerde
wrote:
> Hello Adrien,
>
> Thanks for the swift reply. I'll add the details:
>
> Lucene version: 8.6.2
>
> The restrictionQuery is indeed a conjunction; it allows a document to
> be a hit if the 'roles' field is empty as well. It
the number
> of clauses is less than 16, so I would not expect major performance
> differences between a TermInSetQuery over less than 16 terms and a
> BooleanQuery wrapped in a ConstantScoreQuery.
>
> On Tue, Oct 13, 2020 at 11:35 AM Rob Audenaerde
> wrote:
>
> > Hello
Hello,
I'm benchmarking an application which implements security on Lucene by
adding a multi-valued field "roles". If the user has one of these roles, he
can find the document.
I implemented this as a Boolean AND query: I added the original query and the
restriction with Occur.MUST.
I'm having some
documents and/or field names/contents with extreme
sizes, so we can delete those from the index without needing to re-index
all data.
What would be the best approach for this?
Thanks,
Rob Audenaerde
something?
Thanks in advance.
Rob Audenaerde
Your query can be seen as an inner join:
select t0.* from employee t0 inner join employee t1 on t0.dept_no =
t1.dept_no where t1.email='a...@email.com'
Maybe JoinUtil can help you.
http://lucene.apache.org/core/7_0_0/join/org/apache/lucene/search/join/JoinUtil.html?is-external=true
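To make the join semantics concrete, here is a minimal sketch in plain Java of the two-step lookup the SQL above describes: first collect the dept_no values of the rows matching the email, then return every row whose dept_no is in that set. It is not JoinUtil's API; the Employee class, the method name, and any sample values used with it are hypothetical. Unlike a raw SQL inner join, it returns each employee at most once.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, illustration-only row type; not a Lucene class.
final class Employee {
    final String name;
    final String email;
    final int deptNo;

    Employee(String name, String email, int deptNo) {
        this.name = name;
        this.email = email;
        this.deptNo = deptNo;
    }
}

final class SameDepartmentJoin {
    // Step 1 ("from" side, t1): collect dept_no of every row matching the email.
    // Step 2 ("to" side, t0): keep every row whose dept_no is in that set.
    static List<Employee> colleaguesOf(List<Employee> all, String email) {
        Set<Integer> deptNos = new HashSet<>();
        for (Employee e : all) {
            if (e.email.equals(email)) {
                deptNos.add(e.deptNo);
            }
        }
        List<Employee> result = new ArrayList<>();
        for (Employee e : all) {
            if (deptNos.contains(e.deptNo)) {
                result.add(e);
            }
        }
        return result;
    }
}
```

A query-time join in an index follows the same shape: one pass to gather the join keys from the "from" documents, then a second query that matches the "to" documents carrying any of those keys.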
On Tue, Apr
should consider moving to a
> time-based policy? eg. commit every 10 minutes?
>
> On Wed, Jan 31, 2018 at 10:25, Rob Audenaerde
> wrote:
>
> > Hi all,
> >
> > We ran the benchmarks (6.6 vs 7.1) with IW info stream and (as attachment
> > cannot be too large)
>> so they commit very seldom. If the system crashes, the changes are replayed
>> from the translog since the last commit.
>>
>> Uwe
>>
>> -
>> Uwe Schindler
>> Achterdiek 19, D-28357 Bremen
>> http://www.thetaphi.de
>> eMail: u...@thetaphi.de
>>
we
>
> -
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> > -Original Message-
> > From: Rob Audenaerde [mailto:rob.audenae...@gmail.com]
> > Sent: Monday, January 29, 2018 11:29 AM
> > To
> > where indexing time goes?
> >
> > If you can run with a profiler, this might also give useful information.
> >
> > On Thu, Jan 18, 2018 at 11:23, Rob Audenaerde
> > wrote:
> >
> >> Hi all,
> >>
> >> We recently upgra
increase in indexing time is to be expected as a
result of the sparse docvalues change?
Kind regards,
Rob Audenaerde
file and readding them.
>
> Is there an update method, and does it perform better than remove then add? I
> was simply removing modified files from the index (which doesn't seem to
> take long), and re-adding them.
>
> On Tue, May 9, 2017 at 9:33 AM Rob Audenaerde
> wrote:
>
Do you update each entire document? (vs updating numeric docvalues?)
That is implemented as 'delete and add' so I guess that will be slower than
clean sheet indexing. Not sure if it is 3x slower, that seems a bit much?
On Tue, May 9, 2017 at 3:24 PM, Kudrettin Güleryüz
wrote:
> Hi,
>
> For a 5.
> literal, i.e. it's case-sensitive but you can send terms.prefix=jo and
> case things properly on the app side.
>
> Best,
> Erick
>
> On Wed, Apr 12, 2017 at 6:33 AM, Rob Audenaerde
> wrote:
> > I have a Lucene (6.4.2) index with about 2-5M documents, and each
>
uthor / John Doe'
'Author / Joan Deville'
...
Are there built-in options to create such an autocomplete? Or do I have to
build it myself?
I prefer not to do a search on all the matching documents and collect
facets for those, because that is not very fast.
Any hints?
Thanks in advan
on
> too, e.g. Kafka, so that your application doesn't need to keep track
> of which docs were not yet committed.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Wed, Nov 30, 2016 at 8:50 AM, Rob Audenaerde
> wrote:
> > Hi all,
> >
Hi all,
Currently we call commit() many times on our index (about 5M docs, with
some 10,000-100,000 modifications during the day). Commits typically get
more expensive as the index grows, taking up to several seconds,
so we want to reduce the number of calls.
(Historically, we had Lucene com
Whoops! You are correct! Sorry 'bout that.
On Fri, Oct 28, 2016 at 1:26 PM, Alan Woodward wrote:
> Hi Rob, I think you posted this to the wrong mailing list?
>
> Alan Woodward
> www.flax.co.uk
>
>
> > On 28 Oct 2016, at 12:13, Rob Audenaerde
> wrote:
> >
Hi all,
I have a DataTable which, in onConfigure(), sets a selected item. I want
another (detail) panel, outside of this component, to react to that
selection, i.e. set its visibility and render details of the selected item.
What I see is that the onConfigure() of the detail component is called
B
Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> > -Original Message-
> > From: Rob Audenaerde [mailto:rob.audenae...@gmail.com]
> > Sent: Thursday, June 30, 2016 12:00 PM
> > To: java-user@luce
Hi all,
For increasing the speed of some of my application tests, I want to
re-use/copy a pre-populated RAMDirectory over and over.
I'm on Lucene 6.0.1
It seems a RAMDirectory can be a copy of an FSDirectory, but not of another
RAMDirectory. Also, RAMDirectory is not Cloneable.
What would be the
Hi Gimantha,
You don't need to store the aggregates and don't need to retrieve
Documents. The aggregates are calculated during collection using the
BinaryDocValues from the facet module. What I do is store the
values in the facets using AssociationFacetFields (for example
FloatAssocia
Hi Simona,
In addition to Erick's questions:
Are you talking about *search* time or facet-collection time? And how many
results are in your result set?
I have some experience with collecting facets from large result sets; these
are typically slow (as they have to retrieve all the relevant facet
Hi Sandeep,
How many threads do you use to do the indexing? The benchmarks of Lucene
are done on >20 threads IIRC.
-Rob
On Tue, Feb 23, 2016 at 8:01 AM, sandeep das wrote:
> Hi,
>
> I've implemented a tool using lucene-5.2.0 to index my CSV files. The tool
> is reading data from CSV files(resi
n't happen? Are you sure?
> >
> > I'll look at the 6.6 GB infoStream to see what it says about the ref
> counts.
> >
> > Did you fix the issue in your app where you're not closing all opened
> > NRT readers?
> >
> > Mike McCandless
> >
ava.com/view_bug.do?bug_id=4724038
> >
> > http://mail-archives.apache.org/mod_mbox/lucene-dev/201509.mbox/%3c55f0461a.2070...@gmail.com%3E
> >
> > hth
> > -will
> >
> >
> >
> > > On Nov 13, 2015, at 11:23 AM, Rob Audenaerde
> >
> Hi Rob,
>
> A couple more things:
>
> Can you print the value of MMapDirectory.UNMAP_SUPPORTED?
>
> Also, can you try your test using NIOFSDirectory instead? Curious if
> that changes things...
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
wrong here.
>
> Can you set the (public, static) boolean
> IndexFileDeleter.VERBOSE_REF_COUNTS to true, and then re-generate this
> log? This causes IW to log the ref count of each file it's tracking
> ...
>
> I'll also add a bit more verbosity to IW when NRT readers
EF_COUNTS to true, and then re-generate this
> log? This causes IW to log the ref count of each file it's tracking
> ...
>
> I'll also add a bit more verbosity to IW when NRT readers are opened
> and closed, for 5.4.0.
>
> Mike McCandless
>
> http://blog.mikem
Hi all,
I'm still debugging the growing index size. I think closing index readers
might help (work in progress), but I can't really see them holding on to
files (at least, using lsof). Restarting the application sheds some light:
I see logging on files that are no longer referenced.
What I see i
rom
> the Searcher we get the Reader. After the query you call
> searcherManager.release(searcher). The SearcherManager takes care of the
> rest.
>
> Regards,
>
> Jürgen.
>
>
> On 10 Nov 2015 at 13:27, Rob Audenaerde wrote:
>
>> Hi Jürgen, Michael
>>
>>
i Rob,
>
> we had a similar problem. In our case we had open index readers, that
> blocked the index from merging its segments and thus deleting the marked
> segments.
>
> Regards,
>
> Jürgen.
>
>
> On 6 Nov 2015 at 08:59, Rob Audenaerde wrote:
>
>> Hi wil
On Fri, Nov 6, 2015 at 11:29 AM, Michael McCandless <
luc...@mikemccandless.com> wrote:
> It's also important to call IndexWriter.commit (as well as open new NRT
> readers) periodically or after doing a large set of updates, as that
> lets Lucene remove any old segments referenced by the prior commit
> p
> lets Lucene remove any old segments referenced by the prior commit
> point.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Fri, Nov 6, 2015 at 2:59 AM, Rob Audenaerde
> wrote:
> > Hi will, others
> >
> > Thanks for your reply,
> >
> There are some configuration/runtime activities you don't mention. And
> you make the testing process sound like a mirror of production? (Including
> configuration?)
>
>
> -will
>
>
> On 11/5/15 7:33 AM, Rob Audenaerde wrote:
>
>> Hi all,
>>
>>
Hi all,
I'm currently investigating an issue we have with our index. It keeps
getting bigger, and I don't get why.
Here is our use case:
We index a database of about 4 million records, spread over a few hundred
tables. The data consists of a mix of text, dates, numbers etc. We also add
all these
Hi all,
I was wondering about the number of threads to use for indexing.
There is a setting: getMaxThreadStates() in the IndexWriterConfig that
determines how many threads can write to the index simultaneously.
The luceneutil Indexer.java (that is used for the nightly benchmarks),
seems to use
You can write a custom (facet) collector to do this. I have done something
similar; I'll describe my approach:
For all the values that need grouping or aggregating, I have added a
FacetField (an AssociationFacetField, so I can store the value alongside
the ordinal). The main search stays the same
Hi all,
I'm doing some analytics with a custom Collector on a fairly large number
of search results (roughly 100,000: all the hits returned by a query). I need
to retrieve them by a query (so using search), but I don't need any scoring,
nor do I need to keep the documents in any order.
When profiling the appl
Hi all,
I'm building an application in which users can add arbitrary documents, and
all fields will be added as facets as well. This allows users to browse
their documents by their own defined facets easily.
However, when the number of documents gets very large, I switch to
random-sampled facets
Hi Jamie,
What is included in the 5 minutes?
Just the call to the searcher?
searcher.search(...)?
Can you show a bit more of the code you use?
On Tue, Jun 3, 2014 at 11:32 AM, Jamie wrote:
> Vitaly
>
> Thanks for the contribution. Unfortunately, we cannot use Lucene's
> pagination function
compile expressions, but the methods should take only double values. So I
>> think it should be some sort of binding, but I'm not sure yet how to do it.
>> Perhaps it should be a name like max_fieldName, which you add a custom
>> Expression to as a binding ... I will try to lo
e multi-valued numeric field, and given that NDV
> is single valued, we went w/ BDV.
>
> If I misunderstood the scenario, I'd appreciate if you clarify it :)
>
> Shai
>
>
> On Wed, Apr 23, 2014 at 5:49 PM, Rob Audenaerde wrote:
>
> > Hi Shai, all
x is that it lets you look up documents very
> > quickly based on *precomputed* values.
> >
> > -Mike
> >
> >
> > On 04/23/2014 06:56 AM, Rob Audenaerde wrote:
> >
> >> Hi all,
> >>
> >> I'm looking for a way to use multi-values
x is that it lets you look up documents very
> quickly based on *precomputed* values.
>
> -Mike
>
>
>
> On 04/23/2014 06:56 AM, Rob Audenaerde wrote:
>
>> Hi all,
>>
>> I'm looking for a way to use multi-values in a filter.
>>
>> I wa
Hi all,
I'm looking for a way to use multi-values in a filter.
I want to be able to search on sum(field)=100, where field has these values in
one document:
field=60
field=40
In this case 'field' is a LongField. I examined the code in the FieldCache,
but that seems to focus on single-valued fields o
Hi all,
I have an issue using the near real-time search in the taxonomy. I could
really use some advice on how to debug or proceed with this issue.
The issue is as follows:
I index 100k documents, with about 40 fields each. For each field, I also
add a FacetField (the issue arises both with FacetField as
Fl