Re: TimeLimitingCollector accuracy

2016-12-22 Thread David Causse
On 21/12/2016 at 13:27, David Causse wrote: But given that some effort has been made to separate sub scorers from "top-level" scorers (see https://issues.apache.org/jira/browse/LUCENE-5487), would it make sense now to make BulkScorers aware of some time constraints? Looking a

TimeLimitingCollector accuracy

2016-12-21 Thread David Causse
Hi, This subject has been discussed in the past but I don't think that any real solution has been implemented yet. Here is a small test case to illustrate the problem: https://github.com/nomoa/lucene-solr/commit/2f025b18899038c8606da64c2cf9f4e1f643607f#diff-65ae49ceb38e45a3fc05115be5e61a2dR387 T

Re: what is the offsets and payload in DocsAndPositionsEnum for ??

2012-11-27 Thread David Causse
unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands

Payload instance and byte buffer instance re-use

2011-12-14 Thread David Causse
afraid of some cases where payload instance or data could be buffered and then overwritten by myself while building the next token. Thanks for your help. -- David Causse Spotter http://www.spotter.com/

Re: Update Document based on Query instead of Term

2011-04-13 Thread David Causse
ateDocument(Query query, Document doc) Hi, as updateDocument(Term t, Document d) is just a delete + add, you can use: IndexWriter.deleteDocuments(Query query); IndexWriter.addDocument(Document d); Regards. -- David Causse Spotter http://www.spotter.com/

Re: using 2 different Analyzer for indexing ?

2011-03-25 Thread David Causse
d your two indexed fields in the same Document object. Regards. -- David Causse Spotter http://www.spotter.com/

Re: Wanted: a directory of quick-and-(not too)dirty analyzers for multi-language RDF.

2011-03-22 Thread David Causse
use multiple analyzers at index time you'll have to use multiple analyzers at query time (tricky part of the process). Regards. -- David Causse Spotter http://www.spotter.com/

Re: FieldSelector with Lucene 2.3.2

2011-03-17 Thread David Causse
lector that do the whole job on a doc by doc basis and not collecting and saving all docs in a Collection. -- David Causse Spotter http://www.spotter.com/

Re: Is ConcurrentMergeScheduler useful for multiple running IndexWriter's?

2011-03-11 Thread David Causse
short-lived thread (mostly due to a not-so-smart IW usage; the new NRT Reader helps here). A good idea would be a MergeScheduler implementation that accepts an application-controlled thread pool, some sort of ExecutorServiceMergeScheduler. Regards. -- David
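The idea of a merge scheduler that runs on an application-controlled thread pool can be sketched without Lucene as follows. This is only a minimal, stdlib-only illustration: `PooledMergeScheduler`, `schedule` and `awaitPending` are hypothetical names, not part of Lucene, and each merge is modelled as a plain `Runnable`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch (not a Lucene class): merges are submitted to a pool
// owned by the application instead of each writer spawning its own threads.
class PooledMergeScheduler {
    private final ExecutorService pool;                  // owned by the caller
    private final List<Future<?>> pending = new ArrayList<>();

    PooledMergeScheduler(ExecutorService pool) {
        this.pool = pool;
    }

    // Queue one merge; the pool decides when and on which thread it runs.
    synchronized void schedule(Runnable merge) {
        pending.add(pool.submit(merge));
    }

    // Block until every merge queued so far has finished.
    synchronized void awaitPending() {
        for (Future<?> f : pending) {
            try {
                f.get();
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
        pending.clear();
    }

    // Demo helper: two schedulers share one 2-thread pool and run fake merges.
    static int runDemo(int count) {
        ExecutorService shared = Executors.newFixedThreadPool(2);
        AtomicInteger done = new AtomicInteger();
        PooledMergeScheduler a = new PooledMergeScheduler(shared);
        PooledMergeScheduler b = new PooledMergeScheduler(shared);
        for (int i = 0; i < count; i++) {
            (i % 2 == 0 ? a : b).schedule(done::incrementAndGet);
        }
        a.awaitPending();
        b.awaitPending();
        shared.shutdown();
        return done.get();
    }

    public static void main(String[] args) {
        System.out.println("merges completed: " + runDemo(6));
    }
}
```

Because the pool is passed in, several writers can share a bounded number of long-lived threads, which is the point of the suggestion; wiring such a class into a real MergeScheduler is left open here.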

Re: determining the type of a term - retrieving a payload

2010-10-15 Thread David Causse
ot something like this:
    // go to the doc with skipTo(int internalId) or next()
    // iterate over positions
    for (int i = 0; i < currentTermPos.freq(); i++) {
        int p = currentTermPos.nextPosition();
        payloadBuffer = currentTermPos.getPayload(payloadBuffer, 0);
        ...
    }
--

Re: How to get the tokens for a given document

2010-04-12 Thread David Causse
'm looking for alternative ways to skin this cat. > > Herb -- David Causse Spotter http://www.spotter.com/

Re: Too many open files

2010-04-12 Thread David Causse
with IW.getReader() overriding the old NRT reader reference with no care... So I'll take extra care of my NRT reader instances and pool them myself. Sorry for the noise. On Mon, Apr 12, 2010 at 12:46:02PM +0200, David Causse wrote: > Hi, > > I found a bug in my application, there was
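The leak described above, where a new reader replaces the old reference without the old one being closed, can be illustrated with a small stdlib-only sketch. `ReaderHolder` and `FakeReader` are hypothetical stand-ins, not Lucene classes; the counter plays the role of open file handles.

```java
import java.io.Closeable;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: a holder that replaces its reader must close the
// previous one, otherwise the old reader's file handles stay open.
class ReaderHolder {
    // Counts readers that are currently open, standing in for file handles.
    static final AtomicInteger OPEN = new AtomicInteger();

    // Stand-in for a near-real-time reader; not a Lucene class.
    static final class FakeReader implements Closeable {
        private boolean closed;

        FakeReader() {
            OPEN.incrementAndGet();
        }

        @Override
        public synchronized void close() {
            if (!closed) {
                closed = true;
                OPEN.decrementAndGet();
            }
        }
    }

    private FakeReader current;

    // Install a fresh reader and release the one it replaces.
    synchronized void refresh() {
        FakeReader old = current;
        current = new FakeReader();
        if (old != null) {
            old.close();
        }
    }

    // Returns how many readers are still open after the given number of refreshes.
    static int openAfter(int refreshes) {
        OPEN.set(0);
        ReaderHolder holder = new ReaderHolder();
        for (int i = 0; i < refreshes; i++) {
            holder.refresh();
        }
        return OPEN.get();
    }

    public static void main(String[] args) {
        System.out.println("open readers after 100 refreshes: " + openAfter(100));
    }
}
```

Without the `old.close()` call the counter would grow by one per refresh. A real pool would also have to wait for searches still running on the old reader before closing it, which this sketch does not attempt.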

Too many open files

2010-04-12 Thread David Causse
398bde30b9/indexes/FR/main/_27.cfs (deleted) -- David Causse Spotter http://www.spotter.com/

Re: index reader for multiple indexes

2009-12-09 Thread David Causse
> View this message in context: > http://www.nabble.com/index-reader-for-multiple-indexes-tp25716741p25726159.html > Sent from the Lucene - Java Users mailing list archive at Nabble.com. > -- David Causse Spotter http://www.spotter.com/

Re: Getting left and right offsets of term search results

2009-10-09 Thread David Causse
-- David Causse Spotter http://www.spotter.com/

Re: InstantiatedIndex questions

2009-10-08 Thread David Causse
On Tue, Oct 06, 2009 at 07:51:44PM +0200, Karl Wettin wrote: > > 6 okt 2009 kl. 18.54 skrev David Causse: > > David, your timing couldn't be better. Just the other day I proposed > that we deprecate InstantiatedIndexWriter. The sum of the reasons to > this is that I&

Forwarded: InstantiatedIndex questions

2009-10-06 Thread David Causse
Hi, Karl prefers to answer on the mailing list, so here is some information he asked for on how we use InstantiatedIndex. - Forwarded message from David Causse - Date: Tue, 6 Oct 2009 15:45:57 +0200 From: David Causse To: Karl Wettin Subject: Re: InstatiatedIndex questions Hi, sorry for the delay

InstantiatedIndex feedback

2009-10-05 Thread David Causse
- Optimize duration : 0ms 4009 [main] DEBUG spotter - next/exportForSort/export (MATCHES_WITH_OFFSET) average : 139/62 011/287 332 ns, total 6 125 691, nb (tot/exp) 14/14 4010 [main] DEBUG spotter - Total time spent (14 result(s)) : 7ms -- David Causse Spotter http://www.spotter.com

Re: Use of tika for parsing, offsets questions

2009-09-04 Thread David Causse
On Thu, Sep 03, 2009 at 03:07:18PM +0200, Jukka Zitting wrote: > Hi, > > On Wed, Sep 2, 2009 at 2:40 PM, David Causse wrote: > > If I use tika for parsing HTML code and inject parsed String to a lucene > > analyzer. What about the offset information for KWIC and return

Use of tika for parsing, offsets questions

2009-09-02 Thread David Causse
tive array of tika parsed string offsets vs actual offsets and use a sort of token filter to rectify OffsetAttribute? -- David Causse Spotter http://www.spotter.com/
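One way to picture the offset table asked about here is the sketch below. It is an assumption about how such a table could look, not Tika or Lucene API: `OffsetCorrector` is a hypothetical class that stores, for positions in the parsed text, how many characters of markup were dropped before them, and uses that to map token offsets back to the original input.

```java
import java.util.Arrays;

// Hypothetical sketch: map offsets in the parsed (markup-free) text back to
// offsets in the original input, using a sorted table of cumulative shifts.
class OffsetCorrector {
    private final int[] parsedStarts; // parsed-text offsets where the shift changes, ascending
    private final int[] shifts;       // characters removed from the original before that offset

    OffsetCorrector(int[] parsedStarts, int[] shifts) {
        this.parsedStarts = parsedStarts.clone();
        this.shifts = shifts.clone();
    }

    // Map one character position of the parsed text to the original text.
    int correct(int parsedOffset) {
        int i = Arrays.binarySearch(parsedStarts, parsedOffset);
        if (i < 0) {
            i = -i - 2; // last entry whose start is <= parsedOffset
        }
        return i < 0 ? parsedOffset : parsedOffset + shifts[i];
    }

    // Recover the original slice for a non-empty token [start, end) of the parsed text.
    static String originalSlice(String original, OffsetCorrector c, int start, int end) {
        int from = c.correct(start);
        int to = c.correct(end - 1) + 1; // map the last character, then step past it
        return original.substring(from, to);
    }

    // Demo: the parsed text "Hi all" comes from the markup below.
    static String demo(int start, int end) {
        String original = "<b>Hi</b> all";
        OffsetCorrector c = new OffsetCorrector(new int[] {0, 2}, new int[] {3, 7});
        return originalSlice(original, c, start, end);
    }

    public static void main(String[] args) {
        System.out.println(demo(0, 2)); // token "Hi" in the parsed text
        System.out.println(demo(3, 6)); // token "all" in the parsed text
    }
}
```

The end offset is derived from the last character of the token rather than from the exclusive end position, since the exclusive end may already sit behind a block of removed markup. A token filter could apply the same correction to each token's start and end offsets.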

Re: IndexReader.Terms - internals

2009-05-11 Thread David Causse
Hi, We noticed this behaviour too, so we do it like this: Map result = new HashMap(); TermEnum all; if(matcher.fullScan()) { all = reader.terms(new Term(field)); } else { all = reader.terms(new Term(field, matcher.prefix())); } if(all == null) return result; Term t; do { t = a

Re: First request for search is taking longer time and subequent requests are very fast

2009-03-23 Thread David Causse
Hi, Searcher and IndexReader use an internal cache; when your searcher is created, the first query is slow because Lucene fills its cache. We re-use searcher and reader instances whenever possible. I've heard on this list that it's also a solution to launch warmup queries just after reader/sear

Re: Payload Question

2008-12-15 Thread David Causse
Hi, After fields are added, they are analyzed, and this is the step you are looking for. The payloads are stored on each Token, so you need your own Analyzer to set them. Just call reusableToken.setPayload(myPayLoad) somewhere; look at the existing analyzers for examples. In our case we use TokenStream

Re: [OT] About stopwords

2008-11-27 Thread David Causse
common words". http://www.google.com/support/bin/answer.py?hl=en&answer=981 Hope that answers your questions. Regards, Aleks On Thu, 27 Nov 2008 14:34:00 +0100, David Causse <[EMAIL PROTECTED]> wrote: Hi, Look at this google query : http://www.google.fr/search?q=%22HOW+at+at

[OT] About stopwords

2008-11-27 Thread David Causse
Hi, Look at this Google query: http://www.google.fr/search?q=%22HOW+at+at+of+a+A+a%22 What do you think about that concerning stop words? Does Google have no stop words? David.

Re: InstatiatedIndex questions

2008-11-19 Thread David Causse
st want to solve the specific problem: reset all pre-tokenized streams before they are tokenized in InstantiatedIndexWriter#addDocument and make TermVectorOffsetInfo implement Serializable. karl On Wed, Nov 19, 2008 at 11:00 AM, David Causse <[EMAIL PROTECTED]> wrote: Hi, Here are

InstatiatedIndex questions

2008-11-19 Thread David Causse
Hi, Here are some differences I noticed between InstantiatedIndex and RAMDirectory: - RAMDirectory seems to do a reset on token streams the first time, which allows some objects to be initialised before streaming starts; InstantiatedIndex does not. - I can serialize a RAMDirectory but I cannot