Re: Splitting the index

2006-09-28 Thread Rob Young
On Wednesday 27 September 2006 18:51, Erik Hatcher wrote:
> Lots of possible issues, but we need more information to troubleshoot this properly.
> How big is your index, number of documents?
CDs 137,390
DVDs 41,049
Games 3,360
Books 648,941
Total 830,740
> total fi

Splitting the index

2006-09-27 Thread Rob Young
Hi, I'm using Lucene to search a product database (CDs, DVDs, games and now books). Recently that index has grown to over a million items since books were added. I have been performance testing our search server and the throughput of requests has dropped significantly. Profiling the server i

Re: Searching across spaces

2006-05-11 Thread Rob Young
That sounds like just what I'm looking for. Do you know if this is covered in Lucene in Action, or where I can find more information about it? Eric Isakson wrote: You might consider using overlapping bi-gram tokenization with stripped out whitespace and a PhraseQuery. So your tokenized conten
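The overlapping bi-gram idea Eric describes can be sketched without any Lucene classes. This is only an illustration, assuming the tokenizer strips all whitespace and then emits every pair of adjacent characters. The class and method names are hypothetical, and a real implementation would sit inside a custom Lucene tokenizer.

```java
import java.util.ArrayList;
import java.util.List;

public class BigramSketch {
    // Hypothetical helper: strip whitespace, then emit overlapping character pairs.
    public static List<String> bigrams(String text) {
        String stripped = text.replaceAll("\\s+", "");
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + 2 <= stripped.length(); i++) {
            grams.add(stripped.substring(i, i + 2));
        }
        return grams;
    }

    public static void main(String[] args) {
        // Both spellings produce the same token sequence.
        System.out.println(bigrams("play station"));
        System.out.println(bigrams("playstation"));
    }
}
```

Because both inputs yield the same sequence of tokens, a phrase query built from the query's bi-grams can match content regardless of where the spaces fall.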

Re: restart interrupted index

2006-03-21 Thread Rob Young
Paulo Silveira wrote: Chris, I really would like only these extra files, but I have the same problem here. If I interrupt my IndexWriter with a kill signal, most of the time I will be left with a lock file AND corrupted index files (the searcher will throw some IllegalStateExceptions after the

Memory Issues

2006-01-17 Thread Rob Young
Hi, I've developed a service which accepts search requests over the network, runs them with Lucene and pumps out results. I have noticed that if I use RAMDirectory the memory usage is much higher than expected, and it grows as the service is left running. The Lucene index is 34 MB but when

List of removed stop words?

2005-10-31 Thread Rob Young
Hi, Is there an easy way to list the stop words that were removed from a string? I'm using the standard analyzer on users' search strings and I would like to let them know when stop words have been removed (à la Google). Any ideas? Cheers Rob
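One possible approach, sketched here without Lucene, is to split the raw search string yourself and report the words that appear in the stop list. The sketch assumes a simple lowercase whitespace split, which is cruder than the standard analyzer's tokenization, and the stop list shown is a small illustrative set, not the analyzer's actual list. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class RemovedStopWords {
    // Illustrative stop list only; substitute the list your analyzer is configured with.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "of", "the", "to"));

    // Returns the words of the query that are in the stop list, in query order.
    public static List<String> removed(String query) {
        List<String> hits = new ArrayList<>();
        for (String word : query.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (STOP_WORDS.contains(word)) {
                hits.add(word);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(removed("The Lord of the Rings"));
    }
}
```

The returned list can be shown to the user next to the results, much like a "these words were ignored" notice.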

Re: StandardTokenizer throws extra exceptions

2005-10-31 Thread Rob Young
Roxana Angheluta wrote: I had the same problem. I solved it by manually editing the file ParseException.java every time I modify the .jj file: import java.io.*; public class ParseException extends IOException { It's not the most elegant way to do it, I'm also interested in a more scalable

StandardTokenizer throws extra exceptions

2005-10-31 Thread Rob Young
Hi, I'm trying to create another, slightly changed, version of StandardAnalyzer. I've copied out the source, edited the .jj file and re-built the StandardTokenizer class. The problem I am facing is, when I have all this in Eclipse it's telling me that the ParseException is not compatible wi

Re: Usage RAMDirectory

2005-10-28 Thread Rob Young
How important is it that the search index be absolutely up to date? I read from a RAMDirectory based index but the actual index is in a FSDirectory. The way I managed it was to have the RAMDirectory periodically (every two hours) reloaded. My data doesn't have to be completely up to date so this wor
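The periodic reload described above might look roughly like the following. This is a generic sketch rather than the code from the original setup: the `ReloadingHolder` class is hypothetical, and the `loader` stands in for whatever builds a fresh in-memory copy, for instance one that copies the on-disk index into a new RAMDirectory.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

public class ReloadingHolder<T> {
    private final Supplier<T> loader;
    private final AtomicReference<T> current;

    public ReloadingHolder(Supplier<T> loader) {
        this.loader = loader;
        this.current = new AtomicReference<>(loader.get()); // initial load
    }

    // Readers always see a fully loaded copy.
    public T get() {
        return current.get();
    }

    // Build the new copy first, then swap it in atomically.
    public void reload() {
        current.set(loader.get());
    }

    // Schedule reloads at a fixed interval on a daemon thread.
    public ScheduledExecutorService startReloading(long period, TimeUnit unit) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "index-reloader");
            t.setDaemon(true);
            return t;
        });
        scheduler.scheduleAtFixedRate(() -> {
            try {
                reload();
            } catch (RuntimeException e) {
                // Keep serving the previous copy if a reload fails.
            }
        }, period, period, unit);
        return scheduler;
    }
}
```

Calling `startReloading(2, TimeUnit.HOURS)` would match the two-hour interval mentioned above. Searches keep using the old copy until the new one is completely built, so a slow reload never blocks readers.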

Better analysis of hyphenated words

2005-10-27 Thread Rob Young
Hi, I'm using StandardAnalyzer during indexing and I have noticed that it splits hyphenated words in two, ditching the hyphen. This is messing up some of my search results. I would like to keep using StandardAnalyzer because it's very good on the whole, however I would like to add an extra te

Re: Funny results with Fuzzy

2005-10-25 Thread Rob Young
Rob Young wrote: mark harwood wrote: I'd be more inclined to guess that kylie->klyie falls below the 0.5f similarity threshold you pass. Try printing out the results of fuzzyQuery.rewrite(indexReader).toString(); This will rewrite the fuzzyQuery to a BooleanQuery which explicitly l

Re: Funny results with Fuzzy

2005-10-25 Thread Rob Young
mark harwood wrote: I'd be more inclined to guess that kylie->klyie falls below the 0.5f similarity threshold you pass. Try printing out the results of fuzzyQuery.rewrite(indexReader).toString(); This will rewrite the fuzzyQuery to a BooleanQuery which explicitly lists the TermQuery objects that
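A guess like this can also be checked by hand with a small edit-distance calculation. The sketch below is not Lucene's code. It assumes that similarity is computed as 1 minus the Levenshtein distance divided by the length of the shorter term, which may differ from what a given Lucene version does, so the rewrite() output remains the authoritative check. The class and method names are hypothetical.

```java
public class FuzzySimilaritySketch {
    // Classic Levenshtein distance: insert, delete and substitute each cost 1.
    public static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // ASSUMPTION: similarity = 1 - distance / length of the shorter term.
    public static float similarity(String a, String b) {
        int shorter = Math.min(a.length(), b.length());
        if (shorter == 0) {
            return a.length() == b.length() ? 1.0f : 0.0f;
        }
        return 1.0f - (float) editDistance(a, b) / shorter;
    }

    public static void main(String[] args) {
        System.out.println(editDistance("kylie", "klyie"));
        System.out.println(similarity("kylie", "klyie"));
    }
}
```

Plain Levenshtein distance counts a swapped pair of letters as two edits, so transposition typos such as klyie score lower than their appearance suggests.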

Re: Funny results with Fuzzy

2005-10-25 Thread Rob Young
Rob Young wrote: mark harwood wrote: It comes down to your choice of analyzer. Don't forget your "all" field is broken down into discrete terms by your choice of analyzer. Most often, you will want to use the same analyzer at query-time with the query parser to make sure t

Re: Funny results with Fuzzy

2005-10-25 Thread Rob Young
mark harwood wrote: It comes down to your choice of analyzer. Don't forget your "all" field is broken down into discrete terms by your choice of analyzer. Most often, you will want to use the same analyzer at query-time with the query parser to make sure the user's input matches the stored doc

Re: Funny results with Fuzzy

2005-10-25 Thread Rob Young
Rob Young wrote: Erik Hatcher wrote: On 25 Oct 2005, at 07:35, Rob Young wrote: Try setting the QueryParser.setFuzzyPrefixLength to 1. That would be a great start. How would I implement that if I'm using FuzzyQuery rather than QueryParser? Use the FuzzyQuery constructor that

Re: Funny results with Fuzzy

2005-10-25 Thread Rob Young
Erik Hatcher wrote: On 25 Oct 2005, at 07:35, Rob Young wrote: Try setting the QueryParser.setFuzzyPrefixLength to 1. That would be a great start. How would I implement that if I'm using FuzzyQuery rather than QueryParser? Use the FuzzyQuery constructor that sets this

Re: Funny results with Fuzzy

2005-10-25 Thread Rob Young
Try setting the QueryParser.setFuzzyPrefixLength to 1. That would be a great start. How would I implement that if I'm using FuzzyQuery rather than QueryParser? Cheers Rob

Using analyzers with term queries

2005-10-25 Thread Rob Young
Hi, I am using TermQuery objects (and FuzzyQuery objects) on the searching side and I would like to keep doing so. However, I would like to use the MetaphoneReplacementAnalyzer (from Lucene in Action) when indexing. How can I allow for this in searching if I'm using TermQuery? Thanks Rob

Funny results with Fuzzy

2005-10-25 Thread Rob Young
Hi, I've just set up a system with Lucene to search our product database. I want to have fuzzy searching to help the many seemingly illiterate users I have. I'm just testing this out and the results are proving a little funny. If I search for the term klyie (hoping for kylie to be almost exclusi

Re: Search on all fields in a document

2005-10-20 Thread Rob Young
Chris Hostetter wrote: There is some advice on this in the FAQ... http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-300f0756fdaa71f522c96a868351f716573f2d7 Is there no way to say "search on all searchable fields in every document in the index" without explicitly providing a list of "all s

Re: Join Me

2005-10-20 Thread Rob Young
Ooops, wrong address ... that was supposed to be to [EMAIL PROTECTED] Dan Quaroni wrote: And together we will rule the galaxy as father and son? -Original Message- From: Rob Young [mailto:[EMAIL PROTECTED] Sent: Thursday, October 20, 2005 2:22 PM To: java-user@lucene.apache.org

Join Me

2005-10-20 Thread Rob Young

Search on all fields in a document

2005-10-20 Thread Rob Young
Hi, If I have a document with a number of fields in it, is there a way to say that I want to search for a term across all those fields without stating the terms explicitly? Detail: I have multiple different types of products (cds, dvds, games, books etc.) each different product has differen

Command Line index browser for 1.4

2005-10-20 Thread Rob Young
Hi, Where can I find a command line index browser like lucli for Lucene 1.4? I tried to use lucli but it's using the 1.3 library and it isn't working. Thanks Rob