TermScorer default buffer size

2009-01-06 Thread John Wang
Hi: The default buffer size (for docid, score, etc.) is 32 in TermScorer. We have a large index in which some terms have very dense doc sets. By increasing the buffer size we see very dramatic performance improvements. With our index (may not be typical), here are some numbers with buffer

Re: ORs and Ranks

2009-01-06 Thread Walt Stoneburner
Erick, Thanks for taking a moment to address my question. I suspect the confusion expressed in the answer was from a slight transcription error that added additional punctuation. In your reply, the query was expressed using fields (note the extra colons, which change the query m

Re: java.io.IOException: read past EOF non-corrupt index

2009-01-06 Thread Erick Erickson
I guess my first question, based on your statement that you ran checkindex from a different machine, would be whether you have the same version of Lucene installed on both machines. And how did you get your index where it is now? Did you optimize it in place or did you optimize it somewhere else and

java.io.IOException: read past EOF non-corrupt index

2009-01-06 Thread 1world1love
Greetings all. I have an index that I have optimized and when I try to open the index I get this: java.io.IOException: read past EOF at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java) at org.apache.lucene.index.CompoundFileReader$CSIndexInput.readInte

Re: FastSSFuzzy for faster fuzzy queries in Lucene

2009-01-06 Thread Glen Newton
- Fast Similarity Search in Large Dictionaries. http://fastss.csg.uzh.ch/ - Paper: Fast Similarity Search in Large Dictionaries. http://fastss.csg.uzh.ch/ifi-2007.02.pdf - FastSimilarSearch.java http://fastss.csg.uzh.ch/FastSimilarSearch.java - Paper: Fast Similarity Search in Peer-to-Peer Networks

fastssfuzzy code

2009-01-06 Thread Robert Muir
https://issues.apache.org/jira/browse/LUCENE-1513 sorry for the mess... it's pretty much untested since i'm doing some pretty weird stuff with the fastss algorithm. i hooked it to commons-lang levenshtein for simplicity... i'm pretty busy at work so if someone wants to improve it that would be helpful

Re: FastSSFuzzy for faster fuzzy queries in Lucene

2009-01-06 Thread Robert Muir
hi, yes, the results that come back from the lucene index i verify at runtime before expanding the query. i considered trying to store delete positions as payloads or something but fastssWC is good enough for me. i'll see about posting my code today. On Tue, Jan 6, 2009 at 4:52 AM, Thomas Bocek