recall/precision with lucene

2008-02-08 Thread Panos Konstantinidis
Hello, I am a new Lucene user. I am trying to calculate the recall/precision of a query and I was wondering if Lucene provides an easy way to do it. Currently I have a number of documents that match a given query. Then I am doing a search and I am getting back all the Hits. I then divide the numbe
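Precision is the share of retrieved documents that are relevant, and recall is the share of relevant documents that were retrieved. The sketch below computes both from two sets of document IDs. It assumes the relevance judgments come from outside Lucene; the class name PrecisionRecall and the empty-set conventions are illustrative choices, not part of any Lucene API.

```java
import java.util.HashSet;
import java.util.Set;

// Minimal precision/recall sketch. Assumes you already know which document IDs
// are relevant for the query (e.g. hand-made judgments) and which IDs the search
// returned; Lucene itself is not involved here.
class PrecisionRecall {

    // Fraction of retrieved documents that are relevant.
    static double precision(Set<Integer> relevant, Set<Integer> retrieved) {
        if (retrieved.isEmpty()) {
            return 0.0; // illustrative convention: no results -> precision 0
        }
        return (double) intersectionSize(relevant, retrieved) / retrieved.size();
    }

    // Fraction of relevant documents that were retrieved.
    static double recall(Set<Integer> relevant, Set<Integer> retrieved) {
        if (relevant.isEmpty()) {
            return 0.0; // illustrative convention: nothing relevant -> recall 0
        }
        return (double) intersectionSize(relevant, retrieved) / relevant.size();
    }

    // Number of IDs present in both sets.
    private static int intersectionSize(Set<Integer> a, Set<Integer> b) {
        Set<Integer> common = new HashSet<>(a);
        common.retainAll(b);
        return common.size();
    }

    public static void main(String[] args) {
        Set<Integer> relevant = Set.of(1, 2, 3, 4);
        Set<Integer> retrieved = Set.of(2, 3, 5);
        System.out.println("precision = " + precision(relevant, retrieved)); // 2/3
        System.out.println("recall    = " + recall(relevant, retrieved));    // 1/2
    }
}
```

With relevant {1, 2, 3, 4} and retrieved {2, 3, 5}, two of the three hits are relevant (precision 2/3) and two of the four relevant documents were found (recall 1/2).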

Re: Faceting with payloads

2008-02-08 Thread Matt Ronge
On Feb 8, 2008, at 11:17 AM, Karl Wettin wrote: On 6 Feb 2008, at 23:10, Matt Ronge wrote: I may index the token "house" maybe found in different places with different types. If the user query contains house, I want to report the number of instances of the token house of type A, type B and so

Re: IndexWriter: setRAMBufferSizeMB

2008-02-08 Thread Michael McCandless
It's complicated. In 2.3, you can use IW.flush to write docs to disk. But that method will be deprecated in 2.4 and replaced with commit. Or, you can close. If the application (JVM) dies or is killed, the index will be fine but won't have any un-flushed buffered docs. If the machine dies (os cras

RE: IndexWriter: setRAMBufferSizeMB

2008-02-08 Thread spring
OK, so there is nothing in 2.3 besides IndexWriter.close to ensure that the docs are written to disk and that the index will survive an application / machine death? > -Original Message- > From: Michael McCandless [mailto:[EMAIL PROTECTED] > Sent: Friday, February 8, 2008 19:34 > To: java

Re: IndexWriter: setRAMBufferSizeMB

2008-02-08 Thread Michael McCandless
Well ... every time the RAM buffer is full, a new segment is flushed to the Directory, but that is not necessarily a "commit" in that an IndexReader would see the new segment, nor that the segment would survive if the machine suddenly crashed. You shouldn't rely on when specifically IndexWriter m

Re: MultiFieldQueryParser question

2008-02-08 Thread Chris Hostetter
: Subject: MultiFieldQueryParser question : Date: Fri, 8 Feb 2008 14:53:37 - : Message-ID: : <[EMAIL PROTECTED]> : In-Reply-To: <[EMAIL PROTECTED]> http://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please

Re: Faceting with payloads

2008-02-08 Thread Grant Ingersoll
On Feb 8, 2008, at 12:17 PM, Karl Wettin wrote: On 6 Feb 2008, at 23:10, Matt Ronge wrote: I may index the token "house" maybe found in different places with different types. If the user query contains house, I want to report the number of instances of the token house of type A, type B and so

RE: MultiFieldQueryParser question

2008-02-08 Thread Mitchell, Erica
Solved it. For those who were wondering where I went wrong: I've built up an ArrayList while adding my attributes to a Lucene doc and updated my multiquerySearchParser to contain all these attribute names as follows: Object[] attributeNamesArray = (Object[]) attrList.toArray(); String[
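Erica's fix collects the attribute names in a list and hands them to the parser. In plain Java, a List<String> converts most directly to a String[] with toArray(new String[0]); the no-argument toArray() returns an Object[], and casting that array to String[] fails at runtime. The FieldNames helper, its duplicate check, and the field names used below are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: collect field (attribute) names while building documents,
// then produce the String[] that a multi-field query parser expects.
class FieldNames {

    private final List<String> names = new ArrayList<>();

    // Record a field name once, skipping duplicates so each field is searched once.
    void add(String fieldName) {
        if (!names.contains(fieldName)) {
            names.add(fieldName);
        }
    }

    // toArray(new String[0]) returns a real String[]; a plain toArray() returns
    // Object[], which cannot be cast to String[].
    String[] toFieldArray() {
        return names.toArray(new String[0]);
    }

    public static void main(String[] args) {
        FieldNames fields = new FieldNames();
        fields.add("name");     // hypothetical field names
        fields.add("identity");
        fields.add("name");     // duplicate, ignored
        System.out.println(String.join(",", fields.toFieldArray())); // name,identity
    }
}
```

The resulting array is the kind of value the String[] fields argument of Lucene's MultiFieldQueryParser constructor takes; that call is left out so the sketch compiles without Lucene on the classpath.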

Re: Faceting with payloads

2008-02-08 Thread Karl Wettin
On 6 Feb 2008, at 23:10, Matt Ronge wrote: I may index the token "house" maybe found in different places with different types. If the user query contains house, I want to report the number of instances of the token house of type A, type B and so on. Should I be using payloads for this? If so
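The counting Matt describes amounts to grouping the occurrences of one term by a type label. The sketch below shows only that aggregation in plain Java. TypedToken and countByType are hypothetical names, and how the type label is stored and read back from an index (payloads, token types, or separate fields) is the open question of this thread and is not shown.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the aggregation only: given tokens tagged with a type,
// count how often one term occurs with each type. No Lucene API is used.
class TypeFacetCounter {

    // Hypothetical token representation: term text plus its type label.
    static final class TypedToken {
        final String term;
        final String type;

        TypedToken(String term, String type) {
            this.term = term;
            this.type = type;
        }
    }

    // Returns type -> number of occurrences of `term` with that type.
    static Map<String, Integer> countByType(List<TypedToken> tokens, String term) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted keys for stable output
        for (TypedToken t : tokens) {
            if (t.term.equals(term)) {
                counts.merge(t.type, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<TypedToken> tokens = List.of(
                new TypedToken("house", "A"),
                new TypedToken("garden", "A"),
                new TypedToken("house", "B"),
                new TypedToken("house", "A"));
        System.out.println(countByType(tokens, "house")); // {A=2, B=1}
    }
}
```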

Re: Distributed Indexes

2008-02-08 Thread Ruslan Sivak
The app does other things than search the index. I'm basically using ColdFusion for the website and have four instances running on two servers for load balancing. Each app does the searches, and the search times are small, the index is small, but it takes a long time to fully create the index

IndexWriter: setRAMBufferSizeMB

2008-02-08 Thread spring
Hi, if I understand this property correctly, every time the RAM buffer is full it gets automatically written to disk, something like a commit in a database. Thus if my application dies, all docs in the buffer get lost. Right? If so, is there any event/callback etc. which informs my application that

large term vectors

2008-02-08 Thread marc.dumontier
Hi, I have a large index which is around 275GB. As I search different parts of the index, the memory footprint grows with large byte arrays being stored. They never seem to get unloaded or GC'ed. Is there any way to control this behavior so that I can periodically unload cached information?

RE: Which analyzer

2008-02-08 Thread spring
OK, I will try it. Thank you. > -Original Message- > From: Erick Erickson [mailto:[EMAIL PROTECTED] > Sent: Friday, February 8, 2008 14:25 > To: java-user@lucene.apache.org > Subject: Re: Which analyzer > > WhitespaceAnalyzer should do the trick. Give it a try... > > My point was that

Re: Performance guarantees and index format

2008-02-08 Thread Doron Cohen
I was once involved in modifying a search index implementation (not Lucene) to write posting lists so that they can be traversed (only) in reverse order. Docids were preserved but you got higher IDs first. This was a non-trivial code change. Now the suggestion to (optionally) order merged segments
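Doron describes posting lists read in reverse: document IDs stay the same, but higher IDs come back first. The toy sketch below shows only that traversal order over an in-memory array of ascending IDs. ReversePostings is a hypothetical name; an on-disk index that delta-encodes its postings in ascending order would need considerably more work than this.

```java
import java.util.NoSuchElementException;

// Toy illustration: iterate an ascending posting list back to front.
// Doc IDs are unchanged; only the order in which they are returned differs.
class ReversePostings {

    private final int[] docIds; // ascending doc IDs
    private int pos;            // index of the next ID to return

    ReversePostings(int[] ascendingDocIds) {
        this.docIds = ascendingDocIds.clone();
        this.pos = docIds.length - 1;
    }

    boolean hasNext() {
        return pos >= 0;
    }

    int next() {
        if (pos < 0) {
            throw new NoSuchElementException();
        }
        return docIds[pos--];
    }

    public static void main(String[] args) {
        ReversePostings postings = new ReversePostings(new int[] {2, 5, 9});
        StringBuilder out = new StringBuilder();
        while (postings.hasNext()) {
            out.append(postings.next()).append(' ');
        }
        System.out.println(out.toString().trim()); // 9 5 2
    }
}
```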

MultiFieldQueryParser question

2008-02-08 Thread Mitchell, Erica
Hi, I'm trying to build up an index of fields to represent an org.eclipse.emf.ecore.EObject, so I'm adding these fields to my Lucene doc: doc.add(new Field(NAME, cls.getName(), Field.Store.YES, Field.Index.TOKENIZED)); doc.add(new Field(IDENTITY, obj.eGet(cls.getEIDAttribute()).toString(), Fiel

Re: Which analyzer

2008-02-08 Thread Erick Erickson
WhitespaceAnalyzer should do the trick. Give it a try... My point was that RangeQuerys wouldn't work very well, but since you're not trying to do that, WhitespaceAnalyzer should handle your case. Erick On Feb 8, 2008 4:40 AM, <[EMAIL PROTECTED]> wrote: > Hello, > > let's say the document contain
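WhitespaceAnalyzer splits text only at whitespace, so characters such as '.' and ',' stay inside tokens. The sketch below reproduces that splitting rule in plain Java to show which tokens the example document would yield; WhitespaceSplit is a hypothetical class for illustration, not Lucene's own tokenizer.

```java
import java.util.ArrayList;
import java.util.List;

// Approximation of whitespace tokenization: a token is a maximal run of
// non-whitespace characters, so punctuation stays inside the token.
class WhitespaceSplit {

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                // Whitespace ends the current token, if any.
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString()); // trailing token
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("01.02.1999 and 152,45")); // [01.02.1999, and, 152,45]
    }
}
```

Under this rule 01.02.1999 and 152,45 are indexed as whole tokens, so a term query for exactly those strings can match them, while a query for just 1999 or 152 looks for a different token.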

Re: Lucene syntax query matched against a string content

2008-02-08 Thread Erick Erickson
You might want to check out MemoryIndex before rejecting putting a single doc in memory and searching against it. It's quite fast, although whether it'll work in your situation only measurement will tell. It's in contrib as I remember. Erick On Feb 7, 2008 11:48 PM, Nilesh Bansal <[EMAIL PROTECTE

RE: Searching the backlog of mailing list for lucene java users

2008-02-08 Thread Steven A Rowe
Hi Erica, Another good place to look is at the FAQ: Steve On 02/08/2008 at 8:10 AM, Grant Ingersoll wrote: > http://wiki.apache.org/lucene-java/MailingListArchives has a variety > of options (although the readlist one is not listed) > > On Feb

Re: Searching the backlog of mailing list for lucene java users

2008-02-08 Thread Grant Ingersoll
http://wiki.apache.org/lucene-java/MailingListArchives has a variety of options (although the readlist one is not listed) On Feb 8, 2008, at 6:31 AM, Mitchell, Erica wrote: Hi, I've found this link to trawl through the backlog of questions and answers from this mailing list. I'm new to luc

Searching the backlog of mailing list for lucene java users

2008-02-08 Thread Mitchell, Erica
Hi, I've found this link to trawl through the backlog of questions and answers from this mailing list. I'm new to Lucene and don't want to send questions that have already been answered in the list. Is there another link to search previous entries rather than clicking the Prev and Next links lo

RE: Which analyzer

2008-02-08 Thread spring
Hello, let's say the document contains 01.02.1999 and 152,45. Then I want to search for: 01.02.1999 AND 152,45 01.02.1999 152,45 1999 152 Thank you. > -Original Message- > From: Erick Erickson [mailto:[EMAIL PROTECTED] > Sent: Friday, February 8, 2008 00:20 > To: java-user@lucene.apa

Re: Lucene syntax query matched against a string content

2008-02-08 Thread Paul Elschot
Without using a RAMDirectory index it would be necessary to implement all Scorers used by the query directly on top of the token stream that normally goes into the index. This is possible, but Lucene is not designed to do this, so it won't be easy. But especially for more preparsed queries against a
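To make Paul's point concrete: matching without an index means evaluating the query directly against the tokens of the one string. The sketch handles only the simplest case, a conjunction of plain terms, and uses an assumed tokenization (lowercase, split on anything that is not a letter or digit). SingleDocMatcher is a hypothetical name; phrases, wildcards, ranges and scoring are exactly the parts that make reimplementing Lucene's Scorers hard.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Deliberately tiny sketch: match a conjunction of plain terms against one
// string without building an index. The tokenization is an assumption and
// would have to mirror the analyzer used for the real queries.
class SingleDocMatcher {

    // Lowercase the text and split on runs of non-letter, non-digit characters.
    static Set<String> tokens(String text) {
        Set<String> result = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) { // a leading delimiter yields an empty first piece
                result.add(t);
            }
        }
        return result;
    }

    // True if every required term occurs in the content.
    static boolean matchesAll(String content, List<String> requiredTerms) {
        Set<String> contentTokens = tokens(content);
        for (String term : requiredTerms) {
            if (!contentTokens.contains(term.toLowerCase(Locale.ROOT))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String content = "Lucene is a search library.";
        System.out.println(matchesAll(content, List.of("search", "Lucene"))); // true
        System.out.println(matchesAll(content, List.of("search", "index")));  // false
    }
}
```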