On 21/12/2016 at 13:27, David Causse wrote:
But given that some effort has been made to separate sub-scorers
from "top-level" scorers (see
https://issues.apache.org/jira/browse/LUCENE-5487), would it make sense
now to make BulkScorers aware of some time constraints?
Looking a
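One way to sketch the idea (not an existing Lucene class; DeadlineBulkScorer and deadlineNanos are made-up names for the example) is to wrap an existing BulkScorer, score in fixed-size windows, and check a wall-clock deadline between windows through the score(collector, acceptDocs, min, max) entry point of recent Lucene versions:

import java.io.IOException;
import org.apache.lucene.search.BulkScorer;
import org.apache.lucene.search.LeafCollector;
import org.apache.lucene.util.Bits;

// Hypothetical time-aware wrapper: scores in small windows and checks a
// wall-clock deadline between windows instead of after the whole segment.
public class DeadlineBulkScorer extends BulkScorer {
    private static final int WINDOW_SIZE = 2048; // docs scored between deadline checks
    private final BulkScorer in;
    private final long deadlineNanos; // absolute System.nanoTime() deadline

    public DeadlineBulkScorer(BulkScorer in, long deadlineNanos) {
        this.in = in;
        this.deadlineNanos = deadlineNanos;
    }

    @Override
    public int score(LeafCollector collector, Bits acceptDocs, int min, int max) throws IOException {
        int doc = min;
        while (doc < max) {
            if (System.nanoTime() - deadlineNanos > 0) {
                // application-defined reaction; throwing here aborts the search
                throw new RuntimeException("query time budget exceeded");
            }
            int windowMax = (int) Math.min((long) doc + WINDOW_SIZE, max);
            doc = in.score(collector, acceptDocs, doc, windowMax);
        }
        return doc;
    }

    @Override
    public long cost() {
        return in.cost();
    }
}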
Hi,
This subject has been discussed in the past, but I don't think any
real solution has been implemented yet.
Here is a small test case to illustrate the problem:
https://github.com/nomoa/lucene-solr/commit/2f025b18899038c8606da64c2cf9f4e1f643607f#diff-65ae49ceb38e45a3fc05115be5e61a2dR387
T
afraid of some cases where the
payload instance or data could be buffered and then overwritten by
me while building the next token.
Thanks for your help.
--
David Causse
Spotter
http://www.spotter.com/
updateDocument(Query query, Document doc)
Hi,
as updateDocument(Term t, Document d) is just a delete + add, you can
use:
IndexWriter.deleteDocuments(Query query);
IndexWriter.addDocument(Document d);
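A self-contained version of that delete-then-add, for reference (note that unlike updateDocument(Term, Document), the two steps here are not atomic):

import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.search.Query;

class QueryUpdater {
    // delete everything the query matches, then add the replacement document
    static void updateByQuery(IndexWriter writer, Query query, Document doc) throws IOException {
        writer.deleteDocuments(query);
        writer.addDocument(doc);
    }
}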
Regards.
--
David Causse
Spotter
http://www.spotter.com/
add your two indexed fields in the same Document object.
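A minimal sketch of what that looks like with the current TextField API (the field names "title" and "body" are made up for the example):

import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;

class TwoFieldDoc {
    static void addBoth(IndexWriter writer, String title, String body) throws IOException {
        Document doc = new Document();                             // one Document...
        doc.add(new TextField("title", title, Field.Store.YES));   // ...holding both
        doc.add(new TextField("body", body, Field.Store.NO));      //    indexed fields
        writer.addDocument(doc);
    }
}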
Regards.
--
David Causse
Spotter
http://www.spotter.com/
use multiple analyzers at index time, you'll
have to use multiple analyzers at query time (the tricky part of the
process).
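One common way to keep the two sides in sync is PerFieldAnalyzerWrapper (package locations below are the Lucene 4+ ones; field names are hypothetical): build the wrapper once and hand the same instance to both the IndexWriterConfig and the query parser.

import java.util.HashMap;
import java.util.Map;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.core.KeywordAnalyzer;
import org.apache.lucene.analysis.miscellaneous.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

class Analyzers {
    static Analyzer build() {
        Map<String, Analyzer> perField = new HashMap<>();
        perField.put("id", new KeywordAnalyzer());    // exact matching for identifiers
        perField.put("body", new StandardAnalyzer()); // full-text analysis for the body
        // the default analyzer handles every other field
        return new PerFieldAnalyzerWrapper(new StandardAnalyzer(), perField);
    }
    // pass the returned instance to new IndexWriterConfig(analyzer) at index time
    // and to the query parser at query time
}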
Regards.
--
David Causse
Spotter
http://www.spotter.com/
Collector that does the
whole job on a doc-by-doc basis instead of collecting and saving all docs
in a Collection.
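A bare-bones sketch of such a collector against the current SimpleCollector API (Lucene 8+; older versions use needsScores()/setNextReader instead), with the per-document handling left as a placeholder:

import java.io.IOException;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.ScoreMode;
import org.apache.lucene.search.SimpleCollector;

// Handles each hit as it is found instead of accumulating all matching
// docs in a Collection first.
public class StreamingCollector extends SimpleCollector {
    private int docBase;

    @Override
    protected void doSetNextReader(LeafReaderContext context) throws IOException {
        docBase = context.docBase; // remember the segment offset
    }

    @Override
    public void collect(int doc) throws IOException {
        handle(docBase + doc); // process the hit immediately, doc by doc
    }

    @Override
    public ScoreMode scoreMode() {
        return ScoreMode.COMPLETE_NO_SCORES; // scores not needed in this sketch
    }

    private void handle(int globalDocId) {
        // application-specific per-document work goes here
    }
}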
--
David Causse
Spotter
http://www.spotter.com/
-
short-lived thread
(mostly due to not-so-smart IW usage; the new NRT Reader helps in this
regard).
A good idea would be a MergeScheduler implementation that accepts an
application-controlled thread pool, some sort of
ExecutorServiceMergeScheduler.
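A rough sketch of that idea against the current MergeScheduler/MergeSource API (Lucene 9; older releases expose merge(IndexWriter, ...) instead), with the pool supplied and owned by the application and deliberately simplistic error handling:

import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.ExecutorService;
import org.apache.lucene.index.MergePolicy;
import org.apache.lucene.index.MergeScheduler;
import org.apache.lucene.index.MergeTrigger;

public class ExecutorServiceMergeScheduler extends MergeScheduler {
    private final ExecutorService pool; // application-controlled thread pool

    public ExecutorServiceMergeScheduler(ExecutorService pool) {
        this.pool = pool;
    }

    @Override
    public void merge(MergeSource mergeSource, MergeTrigger trigger) throws IOException {
        MergePolicy.OneMerge merge;
        while ((merge = mergeSource.getNextMerge()) != null) {
            final MergePolicy.OneMerge one = merge;
            pool.submit(() -> {
                try {
                    mergeSource.merge(one); // run the merge on a pooled thread
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    @Override
    public void close() {
        // the application decides when to shut the pool down; nothing owned here
    }
}

Whether merges should block indexing threads when the pool is saturated is a policy the application would still have to decide.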
Regards.
--
David
ot something like this:
// go to the doc with skipTo(int internalId) or next()
// iterate over the positions of the current term
for (int i = 0; i < currentTermPos.freq(); i++) {
    int p = currentTermPos.nextPosition();
    payloadBuffer = currentTermPos.getPayload(payloadBuffer, 0);
    ...
}
--
I'm looking for alternative ways to skin this cat.
>
> Herb
--
David Causse
Spotter
http://www.spotter.com/
with
IW.getReader() overriding the old NRT reader reference with no care...
So I'll take extra care of my NRT reader instances and pool them myself.
Sorry for the noise.
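A sketch of pooling an NRT reader along those lines, written against the current DirectoryReader API (the old IW.getReader() call maps to DirectoryReader.open(writer)); real code should reference-count readers, which is what SearcherManager handles for you:

import java.io.IOException;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;

public class NrtReaderHolder {
    private DirectoryReader current;

    public NrtReaderHolder(IndexWriter writer) throws IOException {
        current = DirectoryReader.open(writer); // initial NRT reader
    }

    public synchronized DirectoryReader refresh() throws IOException {
        DirectoryReader newer = DirectoryReader.openIfChanged(current);
        if (newer != null) {
            current.close(); // unsafe if another thread is still searching on it; see SearcherManager
            current = newer;
        }
        return current;
    }
}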
On Mon, Apr 12, 2010 at 12:46:02PM +0200, David Causse wrote:
> Hi,
>
> I found a bug in my application, there was
398bde30b9/indexes/FR/main/_27.cfs
(deleted)
--
David Causse
Spotter
http://www.spotter.com/
--
David Causse
Spotter
http://www.spotter.com/
--
David Causse
Spotter
http://www.spotter.com/
On Tue, Oct 06, 2009 at 07:51:44PM +0200, Karl Wettin wrote:
>
> On 6 Oct 2009, at 18:54, David Causse wrote:
>
> David, your timing couldn't be better. Just the other day I proposed
> that we deprecate InstantiatedIndexWriter. The sum of the reasons for
> this is that I'
Hi,
Karl prefers to answer on the ML, so here is some information he asked for
about how we use InstantiatedIndex.
- Forwarded message from David Causse -
Date: Tue, 6 Oct 2009 15:45:57 +0200
From: David Causse
To: Karl Wettin
Subject: Re: InstatiatedIndex questions
Hi,
sorry for the delay
- Optimize duration : 0ms
4009 [main] DEBUG spotter - next/exportForSort/export
(MATCHES_WITH_OFFSET) average : 139/62 011/287 332 ns, total 6 125 691,
nb (tot/exp) 14/14
4010 [main] DEBUG spotter - Total time spent (14 result(s)) : 7ms
--
David Causse
Spotter
http://www.spotter.com
On Thu, Sep 03, 2009 at 03:07:18PM +0200, Jukka Zitting wrote:
> Hi,
>
> On Wed, Sep 2, 2009 at 2:40 PM, David Causse wrote:
> > If I use Tika for parsing HTML code and inject the parsed String into a Lucene
> > analyzer, what about the offset information for KWIC and return
tive array of Tika-parsed string offsets vs.
actual offsets and use a sort of token filter to rectify
OffsetAttribute?
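A sketch of such a filter (OffsetMap is a made-up interface standing in for whatever parsed-offset-to-source-offset mapping the application keeps):

import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;

// Rewrites token offsets from "parsed text" space back to "original HTML" space.
public final class OffsetRectifyingFilter extends TokenFilter {
    private final OffsetAttribute offsetAtt = addAttribute(OffsetAttribute.class);
    private final OffsetMap map; // hypothetical: parsed offset -> original offset

    public OffsetRectifyingFilter(TokenStream input, OffsetMap map) {
        super(input);
        this.map = map;
    }

    @Override
    public boolean incrementToken() throws IOException {
        if (!input.incrementToken()) {
            return false;
        }
        // translate offsets recorded against the Tika-parsed string
        // into offsets valid for the original HTML source
        offsetAtt.setOffset(map.toOriginal(offsetAtt.startOffset()),
                            map.toOriginal(offsetAtt.endOffset()));
        return true;
    }

    /** Hypothetical mapping interface. */
    public interface OffsetMap {
        int toOriginal(int parsedOffset);
    }
}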
--
David Causse
Spotter
http://www.spotter.com/
Hi,
We noticed this behaviour too, so we do it like this:
Map result = new HashMap();
TermEnum all;
if (matcher.fullScan()) {
    all = reader.terms(new Term(field));
} else {
    all = reader.terms(new Term(field, matcher.prefix()));
}
if (all == null) return result;
Term t;
do {
    t = a
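For completeness, the full loop looks roughly like this against the old (pre-4.0) TermEnum API the snippet uses; using docFreq() as the map value is just an assumption for the example:

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;

class TermScanner {
    static Map<String, Integer> scan(IndexReader reader, String field, String prefix) throws IOException {
        Map<String, Integer> result = new HashMap<String, Integer>();
        TermEnum all = reader.terms(new Term(field, prefix == null ? "" : prefix));
        try {
            do {
                Term t = all.term();
                if (t == null || !t.field().equals(field)) break;          // ran past the field
                if (prefix != null && !t.text().startsWith(prefix)) break; // ran past the prefix range
                result.put(t.text(), all.docFreq());
            } while (all.next());
        } finally {
            all.close();
        }
        return result;
    }
}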
Hi,
Searcher and IndexReader use internal caches; when your searcher is
created, the first query is slow because Lucene fills those caches.
We reuse searcher and reader instances whenever possible.
I've heard on this list that it's also a solution to launch warm-up
queries just after the reader/sear
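Something like this (current DirectoryReader API; the warm-up query here is just a MatchAllDocsQuery, whereas a real warm-up would use queries representative of the application):

import java.io.IOException;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.store.Directory;

class SearcherHolder {
    static IndexSearcher openAndWarm(Directory dir) throws IOException {
        DirectoryReader reader = DirectoryReader.open(dir);
        IndexSearcher searcher = new IndexSearcher(reader);
        searcher.search(new MatchAllDocsQuery(), 10); // first query is slow: caches get filled
        return searcher; // keep this instance and share it across requests
    }
}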
Hi,
After adding fields, those fields are analyzed, and this is the step you
are looking for.
The payloads are stored on each Token, so you need your own Analyzer to
do so.
Just use reusableToken.setPayload(myPayLoad) somewhere; look at already
existing analyzers.
In our case we use TokenStream
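On newer Lucene versions the per-token payload lives in a PayloadAttribute rather than on the reusable Token; a minimal filter that attaches a payload to every token looks like this (a constant payload here, purely for illustration):

import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.PayloadAttribute;
import org.apache.lucene.util.BytesRef;

// Attaches the same payload to every token; per-token payloads would be
// computed in incrementToken() instead.
public final class ConstantPayloadFilter extends TokenFilter {
    private final PayloadAttribute payloadAtt = addAttribute(PayloadAttribute.class);
    private final BytesRef payload;

    public ConstantPayloadFilter(TokenStream input, byte[] payloadBytes) {
        super(input);
        this.payload = new BytesRef(payloadBytes);
    }

    @Override
    public boolean incrementToken() throws IOException {
        if (!input.incrementToken()) {
            return false;
        }
        payloadAtt.setPayload(payload); // stored alongside this token's posting
        return true;
    }
}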
common
words".
http://www.google.com/support/bin/answer.py?hl=en&answer=981
Hope that answers your questions.
Regards,
Aleks
On Thu, 27 Nov 2008 14:34:00 +0100, David Causse <[EMAIL PROTECTED]>
wrote:
Hi,
Look at this google query :
http://www.google.fr/search?q=%22HOW+at+at
Hi,
Look at this Google query:
http://www.google.fr/search?q=%22HOW+at+at+of+a+A+a%22
What do you think about that concerning stop words?
Does Google have no stop words?
David.
just want to
solve the specific problem: reset all pre-tokenized streams before
they are tokenized in InstantiatedIndexWriter#addDocument and make
TermVectorOffsetInfo implement Serializable.
Karl
On Wed, Nov 19, 2008 at 11:00 AM, David Causse <[EMAIL PROTECTED]> wrote:
Hi,
Here are
Hi,
Here are some differences I noticed between InstantiatedIndex and
RAMDirectory:
- RAMDirectory seems to do a reset on the TokenStreams the first time,
which permits initialising some objects before streaming starts;
InstantiatedIndex does not.
- I can serialize a RAMDirectory but I cannot