Re: Scoring by number of terms in field

2006-01-09 Thread Eric Jain
Paul Elschot wrote: For example, a query for "europe" should rank: 1. title:"Europe" 2. title:"History of Europe" 3. title:"Travel in Europe, Middle East and Africa" 4. subtitle:"Fairy Tales from Europe" Perhaps with this query (assuming the default implicit OR): title:europe subtitle:europe^

RE: Basic question on opening 2 IndexWriters on same Directory - I do not get IOException ...

2006-01-09 Thread Koji Sekiguchi
Hello Dick, You didn't get an IOException when obtaining the second writer because you used the IndexWriter(String,Analyzer,boolean) constructor. Try the IndexWriter(Directory,Analyzer,boolean) constructor instead. To do so, add the following code to your program: import org.apache.lucene.stor

Re: Lock obtain timed out + IndexSearcher

2006-01-09 Thread Yonik Seeley
Lock files aren't contained in the index directory, but in the standard temp directory. Remove the file referenced in the exception: C:\DOCUME~1\harini\LOCALS~1\Temp\lucene-1b92bc48efc5c13ac4ef4ad9fd17c158-commit.lock -Yonik On 1/9/06, Harini Raghavan <[EMAIL PROTECTED]> wrote: > Hi All, > > All

Re: highlighting phrases

2006-01-09 Thread Erik Hatcher
On Jan 9, 2006, at 1:16 PM, Harini Raghavan wrote: I am using the highlighter package to highlight my search results. The query I am passing to the Highlighter is: +(Content:"Apple Computer" Content:"Apple Comp") +(Title:"Apple Computer" Title:"Apple Comp") But the Highlighter is highlighting

Re: Scoring by number of terms in field

2006-01-09 Thread Erik Hatcher
Sorry for the quick reply, but yes you can accomplish this by tweaking a custom Similarity implementation (or DefaultSimilarity subclass). Check out IndexSearcher.explain on a query and a document and then tinker. Erik On Jan 9, 2006, at 4:34 AM, Eric Jain wrote: Lucene seems to
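
A minimal sketch of the length normalization this reply alludes to, written in plain Java with no Lucene dependency. It assumes the field-length factor is 1/sqrt(numTerms), which is the formula DefaultSimilarity.lengthNorm uses. The class name LengthNormSketch and the helper countTerms are hypothetical, and counting whitespace-separated tokens is only a stand-in for whatever the analyzer would actually produce (a real analyzer may drop stop words such as "of"). Lucene also stores norms in a single byte, so real scores are coarser than the values computed here.

```java
public class LengthNormSketch {

    // Same formula as Lucene's DefaultSimilarity.lengthNorm: 1 / sqrt(numTerms).
    // Fewer terms in the field give a larger factor, hence a higher score.
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Hypothetical helper: counts whitespace-separated tokens as a rough
    // approximation of the number of terms an analyzer would index.
    static int countTerms(String fieldValue) {
        String trimmed = fieldValue.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    public static void main(String[] args) {
        // Titles taken from the example in the original question.
        String[] titles = {
            "Europe",
            "History of Europe",
            "Travel in Europe, Middle East and Africa"
        };
        for (String title : titles) {
            int n = countTerms(title);
            System.out.printf("%-42s terms=%d norm=%.3f%n", title, n, lengthNorm(n));
        }
    }
}
```

A custom Similarity would override lengthNorm to make this factor steeper or flatter; IndexSearcher.explain then shows how the factor contributes to each document's score.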

Re: Lock obtain timed out + IndexSearcher

2006-01-09 Thread Otis Gospodnetic
Probably a stale lock - remove it. Otis - Original Message From: Harini Raghavan <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Mon 09 Jan 2006 01:36:53 PM EST Subject: Lock obtain timed out + IndexSearcher Hi All, All of a sudden I have started getting LockTimeOut exception

Re: Scoring by number of terms in field

2006-01-09 Thread Paul Elschot
On Monday 09 January 2006 10:34, Eric Jain wrote: > Lucene seems to prefer matches in shorter documents. Is it possible to > influence the scoring mechanism to have matches in shorter fields score > higher instead? A query is always in at least one field of a document. > > For example, a query

Lock obtain timed out + IndexSearcher

2006-01-09 Thread Harini Raghavan
Hi All, All of a sudden I have started getting LockTimeOut exception while searching the index. There is no write.lock file in the index directory, so why should this issue come while searching? I tried to delete the index directory and restarted the server, but still no luck. What could be w

highlighting phrases

2006-01-09 Thread Harini Raghavan
Hi All, I am using the highlighter package to highlight my search results. The query I am passing to the Highlighter is: +(Content:"Apple Computer" Content:"Apple Comp") +(Title:"Apple Computer" Title:"Apple Comp") But the Highlighter is highlighting even occurrences of terms 'Computer'/'Comp'.

Re: top n words within a results set?

2006-01-09 Thread Chris Brown
Okay great! Thanks for the quick response and pointing me in the right direction. I'll go get out my Lucene in Action book ;) and learn all about term vectors. - Original Message - From: "Grant Ingersoll" <[EMAIL PROTECTED]> To: Sent: Monday, January 09, 2006 12:34 PM Subject: Re: to

Re: top n words within a results set?

2006-01-09 Thread Grant Ingersoll
You could use term vectors to accomplish this. Get your hits for the website, then load the term vector for the field containing the keywords and add up the frequencies Chris Brown wrote: Hello, Is it possible to retrieve the top 'n' most often appearing words within a search criteria? I'v
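
A rough sketch of the "add up the frequencies" step, in plain Java with no Lucene dependency. Each Map<String,Integer> below is a stand-in for the term vector of one hit (term to frequency in the keyword field); loading the real term vectors from the index is not shown. The class name TopTermsSketch and the method topTerms are hypothetical, and breaking ties alphabetically is an arbitrary choice made here so the result is deterministic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TopTermsSketch {

    // Sums term frequencies across the per-document maps and returns the
    // n most frequent terms, highest total first (ties broken by term).
    static List<Map.Entry<String, Integer>> topTerms(List<Map<String, Integer>> termVectors, int n) {
        Map<String, Integer> totals = new HashMap<>();
        for (Map<String, Integer> vector : termVectors) {
            for (Map.Entry<String, Integer> e : vector.entrySet()) {
                totals.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(totals.entrySet());
        sorted.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }

    public static void main(String[] args) {
        // Illustrative data only: keyword frequencies for two matching documents.
        Map<String, Integer> doc1 = new HashMap<>();
        doc1.put("beach", 2);
        doc1.put("sunset", 1);
        Map<String, Integer> doc2 = new HashMap<>();
        doc2.put("beach", 1);
        doc2.put("dog", 1);

        for (Map.Entry<String, Integer> e : topTerms(Arrays.asList(doc1, doc2), 2)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

For large result sets, loading a term vector per hit can be slow, so limiting the aggregation to the first few hundred hits may be a reasonable trade-off.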

Re: Finding similar documents

2006-01-09 Thread Grant Ingersoll
I believe there is a MoreLikeThis class floating around somewhere (I think it is in the contrib/similarity package). The Lucene book also has a good example, and I have some examples at http://www.cnlp.org/apachecon2005 that demonstrate using term vectors to do this Klaus wrote: Hi, is th

top n words within a results set?

2006-01-09 Thread Chris Brown
Hello, Is it possible to retrieve the top 'n' most often appearing words within a search criteria? I've seen the High Frequency Terms code in the sandbox but it works across the whole index. To put this question into context: We're developing a website that hosts a user's photo website. Searches

Finding similar documents

2006-01-09 Thread Klaus
Hi, is there a built-in method for finding documents similar to a given document? Thx, Klaus

Re: Deleting a Document

2006-01-09 Thread Harini Raghavan
Hi Koji, Thanks for the suggestion. It worked when I closed the reader before refreshing the IndexSearcher instance. Harini Koji Sekiguchi wrote: Hi Harini, I meant you close the reader first, then get a new searcher. regards, Koji -Original Message- From: Harini Raghavan [m

RE: WordNet alternatives

2006-01-09 Thread Yilmazel, Sibel
We currently use AltaVista. It uses its own dictionary to expand queries. For Lucene, we would like to experiment with different dictionaries to see if the precision improves. WordNet is one alternative, that we know of, with Lucene and I was wondering if there are any other dictionaries for Luc

RE: online incremental indexing

2006-01-09 Thread Vanlerberghe, Luc
You can open a new instance *before* closing the previous one. If you have queries that occur often, you can 'warm up' the new instance before starting to use it. Just make sure you don't close an IndexSearcher instance that is in use by Hits instances... Luc -Original Message- From: zzz

Scoring by number of terms in field

2006-01-09 Thread Eric Jain
Lucene seems to prefer matches in shorter documents. Is it possible to influence the scoring mechanism to have matches in shorter fields score higher instead? For example, a query for "europe" should rank: 1. title:"Europe" 2. title:"History of Europe" 3. title:"Travel in Europe, Middle East a

Re: how to forbid prefetching found Documents?

2006-01-09 Thread Leos Literak
Yonik Seeley wrote: > On 1/7/06, Leos Literak <[EMAIL PROTECTED]> wrote: > >>Yonik, I want to display 120th. up to 150th. document >>in Hits. Do you mean that Hits does not contain id >>of all relevant documents? > > > Correct, it does not. The first time Hits is returned to you, it will > inte