RE: Question to the writer of MultiPassIndexSplitter

2010-08-05 Thread Burton-West, Tom
The work on MultiPassIndexSplitter is being done by Andrzej Bialecki, the creator of Luke. See http://lucene-eurocon.org/sessions-track1-day1.html#3 http://lucene-eurocon.org/slides/Munching-&-crunching-Lucene-index-post-processing-and-applications_Andrzej-Bialecki.pdf The slides say "SinglePas

RE: Question to the writer of MultiPassIndexSplitter

2010-08-05 Thread Christopher Condit
> > > I heard work is being done on re-writing MultiPassIndexSplitter so it > > > will be a single pass and work quicker. > > Because that was so slow I just wrote a utility class to create a list of N > > IndexWriters and round robin documents to them as the index is created. > > Then we use a Pa
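
The round-robin idea described in this message can be sketched without any Lucene dependency. This is only an illustration, not the poster's utility class: `RoundRobinDistributor` and the `Consumer` sinks are hypothetical names, with each sink standing in for whatever adds a document to one of the N IndexWriters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: hands each incoming document to the next sink in turn.
// In a real setup each sink would wrap one IndexWriter (an assumption, not shown here).
final class RoundRobinDistributor<T> {
    private final List<Consumer<T>> sinks;
    private int next = 0;

    RoundRobinDistributor(List<Consumer<T>> sinks) {
        if (sinks.isEmpty()) {
            throw new IllegalArgumentException("need at least one sink");
        }
        // Defensive copy so later changes to the caller's list do not affect us.
        this.sinks = new ArrayList<>(sinks);
    }

    // Sends the document to the current sink, then advances to the next one,
    // wrapping around after the last sink.
    void add(T doc) {
        sinks.get(next).accept(doc);
        next = (next + 1) % sinks.size();
    }
}
```

With N sinks, the resulting sub-indexes differ in size by at most one document, so no separate splitting pass over a finished index is needed.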

Re: Boost and ordering based on most recently updated

2010-08-05 Thread Felipe Lobo
You have to check whether Lucene's normalization is rounding different boosts to the same value. 2010/8/4 jayendra patil > you can probably try using the sort parameter with the primary sort being > on > score and the secondary sort being on the recent update date. > e.g. sort=score desc, rec

Re: Get fields from a Query object

2010-08-05 Thread Anuj Shah
Apologies Erick, missed your question. I'm on version 3.0 On Thu, Aug 5, 2010 at 11:52 AM, Anuj Shah wrote: > Having delved a bit more into the code it looks like every MultiTermQuery > descendant fails to implement the extractTerms method. This does make sense, > as it is not possible to list

Re: Get fields from a Query object

2010-08-05 Thread Anuj Shah
Having delved a bit more into the code it looks like every MultiTermQuery descendant fails to implement the extractTerms method. This does make sense, as it is not possible to list every term that satisfies a wildcard query. I also notice that most Query classes including the MultiTermQuery's, hav

Re: will load fdx into memory make search faster?

2010-08-05 Thread Michael McCandless
I see. If you made deeper mods to Lucene you could hold this index in a packed ints array (trunk only) and save some RAM. Mike On Thu, Aug 5, 2010 at 5:53 AM, Li Li wrote: > 100 docs per query. Because we want to do collapse and rerank search result. > > 2010/8/5 Michael McCandless : >> This se
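
To make the RAM saving concrete: a plain long[] spends 64 bits per pointer, while offsets into a file of about 50GB (the fdt size mentioned in this thread) fit in 36 bits. The class below is a standalone sketch of bit packing under that assumption. It is not Lucene's packed ints code, and `PackedLongArray` is a hypothetical name.

```java
// Hypothetical fixed-width bit-packed array: stores `size` values of
// `bitsPerValue` bits each, back to back, inside a long[].
final class PackedLongArray {
    private final long[] blocks;
    private final int bitsPerValue;
    private final long mask;

    PackedLongArray(int size, int bitsPerValue) {
        if (size < 0 || bitsPerValue < 1 || bitsPerValue > 63) {
            throw new IllegalArgumentException("bad size or bitsPerValue");
        }
        this.bitsPerValue = bitsPerValue;
        this.mask = (1L << bitsPerValue) - 1;
        long totalBits = (long) size * bitsPerValue;
        // One spare block so a value that straddles two blocks never runs off the end.
        this.blocks = new long[(int) ((totalBits + 63) / 64) + 1];
    }

    void set(int index, long value) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);   // which 64-bit block the value starts in
        int shift = (int) (bitPos & 63);    // bit offset inside that block
        long v = value & mask;
        blocks[block] = (blocks[block] & ~(mask << shift)) | (v << shift);
        int spill = shift + bitsPerValue - 64; // bits that overflow into the next block
        if (spill > 0) {
            long highMask = (1L << spill) - 1;
            blocks[block + 1] = (blocks[block + 1] & ~highMask) | (v >>> (64 - shift));
        }
    }

    long get(int index) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long v = blocks[block] >>> shift;
        if (shift + bitsPerValue > 64) {
            // Pull the overflowed high bits back in from the next block.
            v |= blocks[block + 1] << (64 - shift);
        }
        return v & mask;
    }
}
```

At 36 bits per value the array takes a little over half the memory of an unpacked long[], at the cost of a few shifts and masks per lookup.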

Re: will load fdx into memory make search faster?

2010-08-05 Thread Li Li
100 docs per query, because we want to collapse and rerank the search results. 2010/8/5 Michael McCandless : > This seems like a good idea. > > One simple way to do it is to use FileSwitchDirectory, and host *.fdx > in RAMDirectory.  You can't use compound file format though. > > Though, how many do

Re: how to adjust buffer size of reading file?

2010-08-05 Thread Michael McCandless
Which Directory impl do you use now? EG MMapDir does no buffering in javaland when reading. You can make your own Directory impl, subclassing the one you use now, and call BufferedIndexInput.setBufferSize on every (or only certain?) IndexInput returned from openInput? Also, the low level API for o

Re: will load fdx into memory make search faster?

2010-08-05 Thread Michael McCandless
This seems like a good idea. One simple way to do it is to use FileSwitchDirectory, and host *.fdx in RAMDirectory. You can't use compound file format though. Though, how many docs are you "typically" retrieving per search? Mike On Thu, Aug 5, 2010 at 3:37 AM, Li Li wrote: > hi all >    we an

will load fdx into memory make search faster?

2010-08-05 Thread Li Li
Hi all, we analyzed Lucene's system calls and found that the fdx file is always read when we get field values. In my application the fdt is about 50GB and the fdx is about 120MB. I think it may be beneficial to load the fdx into memory, just like the tii. Has anyone else tried this?
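
For a sense of what holding the fdx in memory would involve, here is a rough sketch. It assumes the fdx is a flat run of 8-byte big-endian pointers into the fdt, one per document, possibly preceded by a small header. That layout is an assumption and should be checked against the Lucene version in use. `InMemoryFdx` and its `headerBytes` parameter are hypothetical names. Under this assumption a 120MB fdx holds roughly 15 million pointers.

```java
import java.nio.ByteBuffer;

// Hypothetical in-memory copy of a stored-fields index.
// Assumed layout: `headerBytes` of header, then one 8-byte big-endian
// fdt offset per document.
final class InMemoryFdx {
    private final long[] pointers;

    InMemoryFdx(byte[] fdxBytes, int headerBytes) {
        ByteBuffer buf = ByteBuffer.wrap(fdxBytes); // ByteBuffer is big-endian by default
        buf.position(headerBytes);
        pointers = new long[buf.remaining() / 8];
        for (int i = 0; i < pointers.length; i++) {
            pointers[i] = buf.getLong();
        }
    }

    // Offset in the fdt where the stored fields of this document start.
    long fdtOffset(int docId) {
        return pointers[docId];
    }

    int docCount() {
        return pointers.length;
    }
}
```

Each lookup then becomes an array read instead of a seek and read on the fdx file. The fdt itself still has to be read from disk.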