Re: NIO2 Directory implementations

2013-03-17 Thread Devon H. O'Dell
2013/3/17 Michael McCandless : > Hi Michael (directly CC'd this time...), > > Maybe you're not subscribed to the list? Your first email got some > responses, eg: > > http://lucene.markmail.org/thread/lrv7miivzmjm3ml5 Indeed, he's not, I didn't auto-subscribe him when putting his message throu

Re: lucene 4 index

2013-02-28 Thread Devon H. O'Dell
2013/2/28 ash nix : > Hi, > > Can anyone please send me document on lucene 4 index format? > Want to know internals of index. It is part of the Lucene documentation. http://lucene.apache.org/core/4_1_0/core/org/apache/lucene/codecs/lucene41/package-summary.html#package_description --dho > -- >

Re: java-user-subscribe

2012-12-17 Thread Devon H. O'Dell
2012/12/17 dokondr : > java-user-subscribe Sorry, I let this message through forgetting that the allow / accept addresses just send the message and don't actually subscribe the user. If you would like to subscribe to the list, please send an email to java-user-subscr...@lucene.apache.org. --dho

Re: Efficient string lookup using Lucene

2012-08-25 Thread Devon H. O'Dell
It seems worth mentioning, in partial response to this thread's topics, that (almost) regardless of index strategy, Lucene performance hinges on the number of matched documents per query, not the total number of docs in the index. There are other mitigating factors (disk type, RAM size, etc.), but worst case performance analy

Re: IndexReader#reopen() on externally changed index

2011-10-16 Thread Devon H. O'Dell
In my experience, reopen will find all changes on an index, whether it was modified by the same process or not. If you're replicating over a network, you might need some barrier / lock around the reopen call to make sure the replicated index is complete. Obviously with something as fickle as a netw

Re: How to ignore apostrophes in indexes and queries?

2011-09-12 Thread Devon H. O'Dell
One way to do this is to create an Analyzer and Tokenizer that are used on both the index and search side. In the tokenStream method, you return a new normalizing tokenizer; in the Tokenizer, you override the normalize method to ignore apostrophes. --dho 2011/9/12 SBS : > In our situation we need it
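The key point in the reply above is that the same normalization has to run at index time and at query time. The sketch below illustrates only that idea in plain Java; it is not a Lucene Analyzer or Tokenizer, and the class and method names (`ApostropheNormalizer`, `strip`) are hypothetical. It assumes that "ignoring" apostrophes means dropping both the ASCII apostrophe and the typographic one (U+2019).

```java
// Hypothetical helper, not part of Lucene: shows the normalization step only.
class ApostropheNormalizer {

    /** Returns the text with ASCII (') and typographic (U+2019) apostrophes removed. */
    static String strip(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c != '\'' && c != '\u2019') {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Apply the same function to the indexed text and to the query text,
        // so that both sides produce identical terms.
        String indexed = strip("O'Dell");
        String query = strip("ODell");
        System.out.println(indexed.equals(query));
    }
}
```

In a real setup this logic would live inside the tokenizer used by the shared Analyzer, as the reply describes, rather than in a standalone helper.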

Re: No subsearcher in Lucene 3.3?

2011-08-30 Thread Devon H. O'Dell
2011/8/30 Joe MA : > When searching a single collection, no problem.  But if I want to search the > two collections at the same time, I need to know which collection the hit > came from so I can retrieve the base_path from the database.  These > base_paths can be different.  As mentioned, this w

Re: No subsearcher in Lucene 3.3?

2011-08-29 Thread Devon H. O'Dell
2011/8/29 Uwe Schindler : > Why do you need to know the subreader? If you want to get the document's > stored fields, use the MultiReader. > > If you really want to know the subreader, use this: > http://lucene.apache.org/java/3_3_0/api/core/org/apache/lucene/util/ReaderUtil.html#subReader(int, >

Re: No subsearcher in Lucene 3.3?

2011-08-29 Thread Devon H. O'Dell
2011/8/29 Joseph MarkAnthony : > Greetings, >    In the past (Lucene version 2.x) I successfully used > MultiSearcher.subsearcher() to identify the searchable within a MultiSearcher > to which a hit belonged. > > In moving to Lucene 3.3, MultiSearcher is now deprecated, and I am trying to > crea

Re: Thread locking while merging (ConcurrentMergeScheduler issue?)

2011-08-03 Thread Devon H. O'Dell
For what it's worth, I've seen this happen too (using the stock Lucene 3.3 Java APIs), but it requires me to index many millions of documents, and doesn't start being a really big problem until the indexes get to be closer to 250GB in size. When they reach around 1TB, it will take around an hour fo

Re: Short circuiting Collector

2011-07-20 Thread Devon H. O'Dell
and perhaps my hackish solution will work for you (if you're not already doing this). But indeed on searches returning several million records, it's kind of silly to keep spinning. Kind regards, Devon H. O'Dell > Thanks. > > - Chris >

Re: stop the search

2011-05-22 Thread Devon H. O'Dell
I have my own collector, but implemented this functionality by running the search in a thread pool and terminating the FutureTask running the job if it took longer than some configurable amount of time. That seemed to do the trick for me. (In my case, the IndexReader is explicitly opened readonly,
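The approach described above (run the search on a pool thread, give up after a configurable time) can be sketched with `java.util.concurrent` alone. This is a minimal illustration, not the author's code: `TimedSearch`, `runWithTimeout`, and the `fallback` parameter are hypothetical names, and the `Callable` stands in for whatever actually performs the search. Note that `Future.cancel(true)` only interrupts the worker thread; the task stops early only if it responds to interruption.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: runs a task on a pool and abandons it after a timeout.
class TimedSearch {

    /** Returns the task's result, or fallback if it does not finish within timeoutMillis. */
    static <T> T runWithTimeout(ExecutorService pool, Callable<T> task,
                                long timeoutMillis, T fallback) {
        Future<T> future = pool.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // request interruption of the worker thread
            return fallback;
        } catch (InterruptedException e) {
            future.cancel(true);
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return fallback;
        } catch (ExecutionException e) {
            throw new IllegalStateException("search task failed", e.getCause());
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // The lambda is a placeholder for the real search call.
            Integer hits = runWithTimeout(pool, () -> 42, 1000, -1);
            System.out.println(hits);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Returning a fallback value keeps the caller simple; a caller that needs to distinguish a timeout from an empty result could throw or return an `Optional` instead.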

Re: org.apache.lucene.store.AlreadyClosedException: this IndexReader is closed

2011-04-01 Thread Devon H. O'Dell
2011/4/1 Yogesh Dabhi : > Hi > > Concurrently 5 user access same lucene directory for searching document > > That time I got below exception > > org.apache.lucene.store.AlreadyClosedException: this IndexReader is > closed > > is there a way to handle such error Use a ReentrantReadWriteLock aro
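The reply above is cut off, so the following is one possible reading rather than the author's exact advice: searches hold the read lock of a `java.util.concurrent.locks.ReentrantReadWriteLock`, and whatever replaces or closes the shared reader holds the write lock, so a reader is never closed while a search is still using it. `GuardedResource`, `read`, and `swap` are hypothetical names, and the generic `R` stands in for the shared reader or searcher.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Function;

// Hypothetical wrapper: many concurrent readers, exclusive replacement.
class GuardedResource<R> {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private R resource;

    GuardedResource(R initial) {
        this.resource = initial;
    }

    /** Runs action against the current resource while holding the read lock. */
    <T> T read(Function<R, T> action) {
        lock.readLock().lock();
        try {
            return action.apply(resource);
        } finally {
            lock.readLock().unlock();
        }
    }

    /**
     * Installs replacement under the write lock and returns the previous
     * resource. Once this returns, no read() call is still using the old
     * resource, so the caller can close it.
     */
    R swap(R replacement) {
        lock.writeLock().lock();
        try {
            R old = resource;
            resource = replacement;
            return old;
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        GuardedResource<String> guarded = new GuardedResource<>("reader-1");
        System.out.println(guarded.read(String::length));
        String old = guarded.swap("reader-2"); // the caller would close `old` here
        System.out.println(old);
    }
}
```

The write lock is acquired only after all in-flight `read` calls have released the read lock, which is what makes closing the returned resource safe in this sketch.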

Re: a faster way to addDocument and get the ID just added?

2011-03-30 Thread Devon H. O'Dell
2011/3/30 Simon Willnauer : > On Wed, Mar 30, 2011 at 8:14 AM, Li Li wrote: >> merge will also change docid >> all segments' docId begin with 0 > > for all released version this is not true. Before trunk (and I think > its in 3.1 also) merge only merged continuous segments so the actual > per-segm

Re: Should I use MultiSearcher?

2011-03-24 Thread Devon H. O'Dell
2011/3/24 Uwe Schindler : > Don't use MultiSearcher. Instead create a MultiReader around the separate > IndexReaders for each index and pass that MultiReader to a conventional > IndexSearcher as IndexReader. MultiSearcher is very buggy. Could you elaborate on this point at all, Uwe? I'm using Para

Re: ParallelMultisearcher

2011-03-17 Thread Devon H. O'Dell
2011/3/17 Ganesh : > Is this bug https://issues.apache.org/jira/browse/LUCENE-2249 got fixed in > 3.0.3? The linked ticket shows that it was fixed in 3.0.3. --dho > Regards > Ganesh > > - Original Message - > From: "Ganesh" > To: > Sent: Thursday, March 17, 2011 7:03 PM > Subject: Re:

Re: Detecting duplicates

2011-03-05 Thread Devon H. O'Dell
There is a DuplicateFilter class in contrib that works pretty well. 2011/3/5 Grant Ingersoll : > See http://wiki.apache.org/solr/Deduplication.  Should be fairly easy to pull > out if you are doing just Lucene. > > On Mar 5, 2011, at 1:49 AM, Mark wrote: > >> Is there a way one could detect dupli