Re: efficiently finding all terms used on a particular field within Documents matching a query

2005-11-09 Thread Chris Hostetter
: For example I would like to find the set of terms used within a particular : date range, where all Documents have a date field on them. What I've done : to date is simply perform a query to find all Documents that match the : date range query, then iterate over each one and construct a Set of al

Re: Ask about method QueryParser.parser

2005-11-09 Thread Hai Do Thanh
Thanks for your reply :) I have already debugged the input string s. As I said before, s is a string which is sent by the client through the "doPost()" method of the servlet. At first, I thought that the analyzer was the cause of the problem and that it lowercases all letters. However, then, I have also kno

Re: Sorting: string vs int

2005-11-09 Thread Yonik Seeley
The FieldCache (which is used for sorting), uses arrays of size maxDoc() to cache field values. String sorting will involve caching a String[] (or StringIndex) and int sorting will involve caching an int[]. Unique string values are shared in the array, but the String values plus the String[] will
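A rough way to compare the two cases is to count bytes per array slot. The sketch below is only a back-of-the-envelope estimate: the class name, the 4-byte reference size, and the per-String cost are assumptions made for illustration, not figures taken from Lucene or from this thread, and array and object headers are ignored.

```java
// Back-of-the-envelope estimate only. REF_BYTES and the per-String size
// passed by the caller are assumptions, not numbers from Lucene itself.
final class SortCacheEstimate {
    static final long INT_BYTES = 4; // one int per document
    static final long REF_BYTES = 4; // assumed size of one object reference

    /** Bytes for an int[] with one slot per document (size maxDoc). */
    static long intSortBytes(long maxDoc) {
        return maxDoc * INT_BYTES;
    }

    /** Bytes for a String[] of size maxDoc plus the shared unique String values. */
    static long stringSortBytes(long maxDoc, long uniqueTerms, long assumedBytesPerString) {
        return maxDoc * REF_BYTES + uniqueTerms * assumedBytesPerString;
    }

    public static void main(String[] args) {
        long maxDoc = 1_000_000L;
        System.out.println("int sort:    " + intSortBytes(maxDoc) + " bytes");
        System.out.println("string sort: " + stringSortBytes(maxDoc, 50_000L, 40L) + " bytes");
    }
}
```

Under these assumptions the array itself costs about the same in both cases, and the extra cost of string sorting comes from the unique String values.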

Re: Search Help

2005-11-09 Thread Erik Hatcher
On 9 Nov 2005, at 19:54, [EMAIL PROTECTED] wrote: Is there a way to limit the number of hits I want returned? Sometimes I just want one document. Is there an issue with just accessing hits.doc(0) in this case? Erik

Sorting: string vs int

2005-11-09 Thread Monsur Hossain
Hi all. I have a question about sorting. Lucene in Action says: "For numeric types, each field being sorted for each document in the index requires that four bytes be cached. For String types, each unique term is also cached for each document." I want to make sure I'm understanding this correct

Re: Memory Usage

2005-11-09 Thread Marvin Humphrey
On Nov 9, 2005, at 4:48 PM, Daniel Noll wrote: My question is: is this 1/128 figure set in stone, or can it be changed without major consequences? You want indexInterval. Here's an excerpt from the docs in TermInfosWriter. // TODO: the default values for these two parameters // shoul
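To get a feel for what changing the 1/128 figure would mean, the following sketch estimates how many terms would be held in memory for a given interval. It assumes that every indexInterval-th term is loaded, which is a simplification; the class and method names are hypothetical and not part of Lucene.

```java
// Hypothetical helper, not a Lucene API: assumes one in every
// indexInterval terms is kept in memory by the reader.
final class TermIndexEstimate {
    /** Approximate number of terms held in memory (ceiling of totalTerms / indexInterval). */
    static long termsInMemory(long totalTerms, int indexInterval) {
        if (indexInterval <= 0) {
            throw new IllegalArgumentException("indexInterval must be positive");
        }
        return (totalTerms + indexInterval - 1) / indexInterval;
    }

    public static void main(String[] args) {
        long totalTerms = 10_000_000L;
        System.out.println("interval 128:  " + termsInMemory(totalTerms, 128));
        System.out.println("interval 1024: " + termsInMemory(totalTerms, 1024));
    }
}
```

A larger interval keeps fewer terms in memory; the likely trade-off is a longer scan to locate any given term.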

Search Help

2005-11-09 Thread Daniel . Clark
Is there a way to limit the number of hits I want returned? Sometimes I just want one document. ~ Daniel Clark

Memory Usage

2005-11-09 Thread Daniel Noll
Hi. What is the expected memory usage of Lucene these days? I dug up an old email [1] from 2001 which gave the following summary of memory usage: An IndexReader requires: one byte per field per document in index (norms) one open file per file in index 1/128 of the Terms in the index a T

efficiently finding all terms used on a particular field within Documents matching a query

2005-11-09 Thread Matt Magoffin
I've used the example posted at http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-a801793d7479264e29157d92440199b35266dc18 to find all terms used in a complete index, but was wondering if there is an efficient way to find all terms used within only a set of Documents matching a query? For examp

Encountered "" using queryparser in XSP

2005-11-09 Thread Tricia Williams
Hi All, I'm using an html form to send a query to an xsp which uses lucene to search and then returns the results as xml. Perhaps some one has experienced the problem that I'm currently experiencing. When the query is parsed org.apache.lucene.queryParser.ParseException is thrown stating that

Re: going from Document -> IndexReader's docid

2005-11-09 Thread tlittell
Ahh, thank you very much. That's exactly what I needed, I just didn't see that in the API. cheers, Todd > The question is, how did you get that Document? If you got it from > Hits, you can get the document id from Hits.id(hit_num). > > Erik > > > > On 9 Nov 2005, at 11:13, Yonik Seeley w

Re: A lot of short documents, optimal query?

2005-11-09 Thread Chris Hostetter
: ( : +( : (+raimonds +marschan) : (+raimonds +marschol) : (+raimonds +marschel) : (+raimonds +marschalfr) : (+raimonds +marschalek) : (+raimonds +marscha) : ... : ) : +(ZIPS:22* ZIPS:21* ZIPS:20* ZIPS:23* ZIPS:245* : ZIPS:246* ZIPS:247* ZIPS:240* ZIPS:241* ZIPS:242* : ZIPS:243* ZIPS:254* ZIPS:253

Re: going from Document -> IndexReader's docid

2005-11-09 Thread Erik Hatcher
The question is, how did you get that Document? If you got it from Hits, you can get the document id from Hits.id(hit_num). Erik On 9 Nov 2005, at 11:13, Yonik Seeley wrote: There really isn't a generic way... you have to search for the document. If you have a unique id field in yo

A lot of short documents, optimal query?

2005-11-09 Thread eks dev
Hi all, Can somebody please suggest a way/ways to optimize the execution time of the query below (or to use some of the Trunk BooleanScorers)... I am probably missing something obvious. Use Case: Here I have names of people with query expansion for individual tokens (not using Fuzzy Query) that should be fou

Re: going from Document -> IndexReader's docid

2005-11-09 Thread Yonik Seeley
There really isn't a generic way... you have to search for the document. If you have a unique id field in your document, you can find the document id quickly via IndexReader.termDocs(term) -Yonik Now hiring -- http://forms.cnet.com/slink?231706 On 11/9/05, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wr

going from Document -> IndexReader's docid

2005-11-09 Thread tlittell
If I have a Document object (doc), and I also have an IndexReader open, how can I find out IndexReader's docid corresponding to (doc)? IndexReader has a map from docid -> Document, but I don't see the reverse. thanks in advance, Todd

Re: RangeQuery over many indexed documents seems to be buggy

2005-11-09 Thread Yonik Seeley
The limited number of terms in a range query should hopefully be addressed before Lucene 1.9 comes out. I'd give you a reference to the bug, but JIRA seems like it's currently down. Search for ConstantScoreRangeQuery if interested. -Yonik Now hiring -- http://forms.cnet.com/slink?231706 ---

Re: RangeQuery over many indexed documents seems to be buggy

2005-11-09 Thread Joachim Rösener
On Wednesday, 09.11.2005, at 08:53 -0500, Erik Hatcher wrote: > On 9 Nov 2005, at 08:43, Joachim Rösener wrote: [...] > > Can you explain, maybe fix this? > > ah, the lure of young women ;) What else?! :-) > Is it perhaps you're getting an exception and eating it somewhere > along the way?

Re: RangeQuery over many indexed documents seems to be buggy

2005-11-09 Thread Erik Hatcher
On 9 Nov 2005, at 08:43, Joachim Rösener wrote: "sex:female AND birthday:[19800101 TO 19810101]" This gives the following results: 1980-1981: found 424 women. 1981-1982: found 329 women. 1982-1983: found 237 women. 1983-1984: found 232 women. 1984-1985: found 175 women. To prove that it works, a
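The range in the query above relies on birthdays being indexed as yyyyMMdd strings, which sort lexicographically in the same order as the dates themselves. The sketch below illustrates that property with plain Java string comparison; it does not use Lucene, and the class and method names are hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Illustration only: shows why yyyyMMdd terms can be range-compared as strings.
final class DateTermRange {
    /** Formats a date as a yyyyMMdd term, e.g. 1980-06-15 becomes "19800615". */
    static String toTerm(LocalDate date) {
        return date.format(DateTimeFormatter.BASIC_ISO_DATE);
    }

    /** Inclusive lexicographic range check, mirroring the [lower TO upper] form. */
    static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        String born1980 = toTerm(LocalDate.of(1980, 6, 15));
        String born1965 = toTerm(LocalDate.of(1965, 3, 2));
        System.out.println(born1980 + " in range: " + inRange(born1980, "19800101", "19810101"));
        System.out.println(born1965 + " in range: " + inRange(born1965, "19800101", "19810101"));
    }
}
```

Because the terms are fixed-width and zero-padded, dates before 1970 compare correctly as well, which sidesteps the "time too early" limitation mentioned in the original post.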

RangeQuery over many indexed documents seems to be buggy

2005-11-09 Thread Joachim Rösener
Hello, I am currently developing a singles dating service with lucene-1.4.3 as search engine. Due to the limitation that ages < 1970-01-01 cannot be indexed with Field.Keyword(String name, Date value) (produces a RuntimeException ("time too early")), the age indexing is done via Field.Text(String

Re: Ask about method QueryParser.parser

2005-11-09 Thread Karl Øie
Sounds very strange, have you debugged the input string s? Where does it come from? Karl On 9. nov. 2005, at 05.00, Hai Do Thanh wrote: Dear all, I really appreciate your work on Lucene. It is apparently a helpful API for my project on indexed Document searching. On the whole, it works prop