Re: NGramTokenizer stops working after about 1000 terms

2010-01-03 Thread Otis Gospodnetic
This actually rings a bell for me... have a look at Lucene's JIRA, I think this was reported as a bug once and perhaps has been fixed. Note that Lucene in Action 2 has a case study that talks about searching source code. You may find that study interesting. Otis -- Sematext -- http://sematext
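For reference, a minimal sketch of driving the tokenizer directly, assuming the contrib NGramTokenizer and the Lucene 2.9/3.0 attribute API; the input string and gram sizes are arbitrary, and the exact JIRA issue covering the truncation should be confirmed in the tracker:

    import java.io.StringReader;
    import org.apache.lucene.analysis.ngram.NGramTokenizer;
    import org.apache.lucene.analysis.tokenattributes.TermAttribute;

    public class NGramDemo {
        public static void main(String[] args) throws Exception {
            // Break the input into 2- and 3-character grams.
            NGramTokenizer tokenizer =
                new NGramTokenizer(new StringReader("public static void"), 2, 3);
            TermAttribute term = tokenizer.addAttribute(TermAttribute.class);
            while (tokenizer.incrementToken()) {
                System.out.println(term.term());
            }
            tokenizer.close();
        }
    }

Feeding it input much longer than the roughly 1000 terms mentioned in the subject is an easy way to check whether the installed version still truncates.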

Re: Copy and augment an indexed Document

2010-01-03 Thread Karl Wettin
On 31 Dec 2009, at 02:19, Erick Erickson wrote: It is possible to reconstruct a document from the terms, but it's a lossy process. Luke does this (you can see from the UI, and the code is available). There's no utility that I know of to make this easy. http://svn.apache.org/repos/asf/lucene/java/
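A rough sketch of the approach Luke takes, assuming a field indexed with positions and the Lucene 3.0 TermEnum/TermPositions API; the class and method names here are illustrative, not an existing utility:

    import java.util.TreeMap;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermEnum;
    import org.apache.lucene.index.TermPositions;

    public class ReconstructField {
        // Rebuild an approximation of one document's field text from the inverted index.
        public static String reconstruct(IndexReader reader, int docId, String field)
                throws Exception {
            TreeMap<Integer, String> byPosition = new TreeMap<Integer, String>();
            TermEnum terms = reader.terms(new Term(field, ""));
            try {
                do {
                    Term t = terms.term();
                    if (t == null || !t.field().equals(field)) break;
                    TermPositions tp = reader.termPositions(t);
                    if (tp.skipTo(docId) && tp.doc() == docId) {
                        int freq = tp.freq();
                        for (int i = 0; i < freq; i++) {
                            byPosition.put(tp.nextPosition(), t.text());
                        }
                    }
                    tp.close();
                } while (terms.next());
            } finally {
                terms.close();
            }
            // Stopwords, stemming, casing and punctuation are already gone,
            // so this is only an approximation of the original text.
            StringBuilder sb = new StringBuilder();
            for (String word : byPosition.values()) {
                sb.append(word).append(' ');
            }
            return sb.toString().trim();
        }
    }

Terms that share a position (position increment of 0, e.g. synonyms) overwrite each other in this sketch, which is part of why the process is lossy.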

Re: NumericRangeQuery performance with 1/2 billion documents in the index

2010-01-03 Thread Yonik Seeley
On Sun, Jan 3, 2010 at 10:42 AM, Karl Wettin wrote: > > On 3 Jan 2010, at 16:32, Yonik Seeley wrote: > >> Perhaps this is just a huge index, and not enough of it can be cached in >> RAM. >> Adding additional clauses to a boolean query incrementally destroys >> locality. >> >> 104GB of index and 4GB of

Re: NumericRangeQuery performance with 1/2 billion documents in the index

2010-01-03 Thread Karl Wettin
On 3 Jan 2010, at 16:32, Yonik Seeley wrote: Perhaps this is just a huge index, and not enough of it can be cached in RAM. Adding additional clauses to a boolean query incrementally destroys locality. 104GB of index and 4GB of RAM means you're going to be hitting the disk constantly. You need

Re: NumericRangeQuery performance with 1/2 billion documents in the index

2010-01-03 Thread Yonik Seeley
Perhaps this is just a huge index, and not enough of it can be cached in RAM. Adding additional clauses to a boolean query incrementally destroys locality. 104GB of index and 4GB of RAM means you're going to be hitting the disk constantly. You need more hardware - if your requirements are low (
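To make the locality point concrete, this is roughly the query shape under discussion, with hypothetical field names and values; every additional MUST clause walks another posting list on top of the range's terms:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause.Occur;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.NumericRangeQuery;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;

    public class RangePlusClauses {
        public static Query build() {
            // The numeric range on its own.
            Query range = NumericRangeQuery.newLongRange(
                "timestamp", 1230768000000L, 1262304000000L, true, true);

            // Each extra required clause reads from additional parts of the
            // index, so the combined query has worse locality than the range alone.
            BooleanQuery bq = new BooleanQuery();
            bq.add(range, Occur.MUST);
            bq.add(new TermQuery(new Term("type", "event")), Occur.MUST);
            return bq;
        }
    }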

Re: about optimize() question, looking forward to hearing from you soon! Thank you in advance!

2010-01-03 Thread Karl Wettin
On 3 Jan 2010, at 13:33, luocanrao wrote: 1. If the readers do not call re-open, will they see the segment files from after the merge or from before the merge once optimize() is done? 2. When are the old segment files on disk removed? If the old segment files are removed as soon as optimize() is done, how can the read

about optimize() question, looking forward to hearing from you soon! Thank you in advance!

2010-01-03 Thread luocanrao
If some but not all readers re-open while an optimize is underway, this will cause > 2X temporary space to be consumed as those new readers will then hold open the partially optimized segments at that time. It is best not to re-open readers while optimize is running. My question is: 1. i
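A minimal sketch of the reader side, assuming the 3.0 IndexReader.reopen() API: a reader that never re-opens keeps its point-in-time view of the pre-merge segments (and keeps their files open); only after re-opening does it see the merged index.

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;

    public class OptimizeThenReopen {
        public static IndexReader optimizeAndRefresh(IndexWriter writer, IndexReader reader)
                throws Exception {
            writer.optimize();   // merge all segments down to one
            writer.commit();     // make the merged segment visible to new readers

            // The existing reader still searches the old segments until re-opened.
            IndexReader newReader = reader.reopen();
            if (newReader != reader) {
                reader.close();  // release the old, pre-merge segment files
            }
            return newReader;
        }
    }

The old files on disk can only be deleted once no open reader still references them, which is why readers held open across an optimize keep the extra space pinned.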

RE: NumericRangeQuery performance with 1/2 billion documents in the index

2010-01-03 Thread Uwe Schindler
Hi Kumanan, Just for completeness: have you tried out how long the NRQ takes without the BooleanQuery? If it is also fast, then there is indeed a problem with the BQ. Do you measure the time the search method needs to, e.g., return the top n matching docs? Or do you iterate over all results? Uwe
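A hedged sketch of one way to take that measurement, timing only searcher.search(query, n) for the top n hits; run it once with the NumericRangeQuery alone and once with the full BooleanQuery and compare (the helper name is made up):

    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TopDocs;

    public class QueryTiming {
        // Time how long it takes to collect the top n hits for a query.
        public static long timeTopN(IndexSearcher searcher, Query query, int n)
                throws Exception {
            long start = System.currentTimeMillis();
            TopDocs hits = searcher.search(query, n);   // top n only, not every match
            long elapsed = System.currentTimeMillis() - start;
            System.out.println(hits.totalHits + " hits in " + elapsed + " ms");
            return elapsed;
        }
    }

If the range query alone comes back quickly but the BooleanQuery does not, that points at the added clauses rather than the NRQ itself.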