Re: A lot of short documents, optimal query?

2005-11-10 Thread eks dev
Thanks Hoss, I've looked into it and you were absolutely right, it could not be simpler. Two quick questions on the same topic (for my personal education): - What is the purpose of the hashCode and equals methods in XxxFilter? (this is a question about actual usage in Lucene, not java elementary

Document as parameter?

2005-11-10 Thread bib_lucene bib
Hi all, I use the following code to display search results: LuceneHitHighlighter highlighter = new LuceneHitHighlighter(queryStr, "snippet", "body"); for (int i = 0; i < hits.size(); i++) { Document doc = (Document) hits.get(i); highlighter

Re: efficiently finding all terms used on a particular field within Documents matching a query

2005-11-10 Thread Chris Hostetter
: Thank you, I had thought a BitSet was appropriate here somehow, I'll work : on this approach. Paul's suggestion is actually a lot simpler, and I suspect it might be faster -- but it does require that you index with TermVectors. If that's something you're already doing, then you should definit
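A rough sketch of the TermVector route, assuming it works by reading the term vector of each matching document and taking the union of their terms. This uses only java.util, not the Lucene API: the map from document number to terms is a hypothetical stand-in for stored term vectors, and the class and method names are made up for illustration. The work grows with the number of matching documents rather than with the number of distinct terms in the field.

```java
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical, stdlib-only illustration: per-document term vectors are
// modelled as a map from document number to the terms of one field.
class TermVectorUnion {

    // Collect every distinct term that occurs in the field of any matching doc.
    static TreeSet<String> termsInMatches(BitSet matches,
                                          Map<Integer, List<String>> termVectors) {
        TreeSet<String> result = new TreeSet<>();
        // Visit only the documents whose bit is set.
        for (int doc = matches.nextSetBit(0); doc >= 0; doc = matches.nextSetBit(doc + 1)) {
            List<String> terms = termVectors.get(doc);
            if (terms != null) {
                result.addAll(terms);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> vectors = Map.of(
            0, List.of("apple", "pear"),
            1, List.of("plum"),
            2, List.of("apple", "fig"));
        BitSet matches = new BitSet();
        matches.set(0);
        matches.set(2);
        System.out.println(termsInMatches(matches, vectors)); // [apple, fig, pear]
    }
}
```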

Re: efficiently finding all terms used on a particular field within Documents matching a query

2005-11-10 Thread Matt Magoffin
> : I'm wondering if there is a more efficient way to accomplish this? > > I believe there is -- provided the terms are indexed. > > 1) Get yourself a BitSet representing the Documents you are interested in > (you mentioned having a date range, you can either use a RangeFilter and > call the bits meth
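One way to use such a BitSet (the rest of the recipe is cut off here, so this is an assumption) is to walk every term of the field in sorted order and keep a term as soon as one of its postings hits a set bit. The sketch below models the field's inverted index as a hypothetical `TreeMap` from term to sorted document numbers; it is plain Java, not the Lucene API, and the names are illustrative only.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, stdlib-only model of an inverted index for one field:
// each term maps to the sorted document numbers (postings) that contain it.
class TermsInBitSet {

    static List<String> termsUsedBy(BitSet wanted, TreeMap<String, int[]> postings) {
        List<String> used = new ArrayList<>();
        // Walk the terms in sorted order, as a term enumeration would.
        for (Map.Entry<String, int[]> e : postings.entrySet()) {
            for (int doc : e.getValue()) {
                if (wanted.get(doc)) {
                    // One matching document is enough; skip the rest of the postings.
                    used.add(e.getKey());
                    break;
                }
            }
        }
        return used;
    }

    public static void main(String[] args) {
        TreeMap<String, int[]> postings = new TreeMap<>();
        postings.put("apple", new int[] {0, 2});
        postings.put("fig", new int[] {2});
        postings.put("plum", new int[] {1});
        BitSet wanted = new BitSet();
        wanted.set(2);
        System.out.println(termsUsedBy(wanted, postings)); // [apple, fig]
    }
}
```

The early `break` is the main saving: a common term is accepted after its first hit instead of having its whole postings list scanned.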

Re: Sorting: string vs int

2005-11-10 Thread Chris Hostetter
: I guess it would be nice to have some way of telling the searcher (and : the fieldcache) whether the actual string values are needed or not... : it could save a lot of memory when there are a lot of unique terms. you're talking about something like LUCENE-457, right? ... but make it optional so

Re: Search Help

2005-11-10 Thread Chris Hostetter
: That's what I'm doing now, but I was thinking that if I limit the number of : results I get back, I can save query time. Maybe I'm wrong? one thing that does slightly bug me about the way the Hits class works is that the constructor (which is called by Searcher.search(Query)) calls getMore
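On whether asking for fewer results saves time: a generic way to collect only the best n hits is a bounded min-heap. This is a stdlib sketch of that technique, not Lucene's Hits implementation; the array of per-document scores is a hypothetical input. Every document is still visited and scored, so a small n bounds the memory and the final sort rather than the scoring work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: keep only the n best-scoring documents while
// scanning, instead of materialising every hit.
class TopN {

    static List<Integer> topDocs(float[] scores, int n) {
        // Min-heap ordered by score, so the weakest kept hit is at the head.
        PriorityQueue<Integer> heap = new PriorityQueue<>(
            (a, b) -> Float.compare(scores[a], scores[b]));
        for (int doc = 0; doc < scores.length; doc++) {
            if (heap.size() < n) {
                heap.add(doc);
            } else if (n > 0 && scores[doc] > scores[heap.peek()]) {
                // Evict the weakest kept hit to make room for a better one.
                heap.poll();
                heap.add(doc);
            }
        }
        List<Integer> result = new ArrayList<>(heap);
        // Return best score first.
        result.sort((a, b) -> Float.compare(scores[b], scores[a]));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(topDocs(new float[] {0.1f, 0.9f, 0.5f, 0.7f}, 2)); // [1, 3]
    }
}
```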

RE: Sorting: string vs int

2005-11-10 Thread Monsur Hossain
Ah, I got it. retArray is an array of ints; in order to return the string value, it needs the mterms array to do the mapping. Thanks, Yonik! Monsur > -Original Message- > From: Yonik Seeley [mailto:[EMAIL PROTECTED] > Sent: Thursday, November 10, 2005 1:33 PM > To: java-user@lucen
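Monsur's point (sorting works on an int array, and the string array is only needed to map an int back to its text) can be illustrated with a plain-Java sketch. The arrays below are hypothetical stand-ins for a StringIndex-style pair, and reserving slot 0 for "no term" is an assumption of this sketch. If callers never needed the string values, the `lookup` array could in principle be dropped, which is the memory saving discussed elsewhere in this thread.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sketch of a StringIndex-style sort: order[doc] is the rank of
// the document's term, and lookup[rank] is the term text itself.
class SortByOrd {

    static Integer[] sortDocs(int[] order) {
        Integer[] docs = new Integer[order.length];
        for (int i = 0; i < docs.length; i++) docs[i] = i;
        // Comparing two ints is all the sort needs; no strings are touched.
        Arrays.sort(docs, Comparator.comparingInt(d -> order[d]));
        return docs;
    }

    public static void main(String[] args) {
        String[] lookup = {null, "apple", "fig", "plum"}; // slot 0: doc has no term
        int[] order = {3, 1, 2, 1};                        // one entry per document
        for (int doc : sortDocs(order)) {
            // lookup is only needed here, to turn the rank back into a string.
            System.out.println(doc + " -> " + lookup[order[doc]]);
        }
    }
}
```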

Re: Sorting: string vs int

2005-11-10 Thread Yonik Seeley
Here is a snippet of the current StringIndex class:

    public static class StringIndex {
      /** All the term values, in natural order. */
      public final String[] lookup;

      /** For each document, an index into the lookup array. */
      public final int[] order;
    }

The order field is used for sor
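To make the lookup/order relationship concrete, here is a hypothetical stdlib re-creation that builds both arrays from one string value per document. It sorts the distinct values in memory rather than reading terms from the index as Lucene does, and reserving slot 0 for documents without a value is an assumption of this sketch; the class name is made up.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Hypothetical re-creation of the StringIndex idea: lookup holds each
// distinct term once, in natural order; order maps each document to a slot.
class StringIndexSketch {
    final String[] lookup;
    final int[] order;

    StringIndexSketch(String[] valuePerDoc) {
        TreeSet<String> distinct = new TreeSet<>();
        for (String v : valuePerDoc) {
            if (v != null) distinct.add(v);
        }
        // Slot 0 is reserved for documents without a value in this sketch.
        lookup = new String[distinct.size() + 1];
        int slot = 1;
        for (String term : distinct) lookup[slot++] = term;
        order = new int[valuePerDoc.length];
        for (int doc = 0; doc < valuePerDoc.length; doc++) {
            String v = valuePerDoc[doc];
            // Binary search over the sorted slots 1..n finds the term's rank.
            order[doc] = (v == null) ? 0 : Arrays.binarySearch(lookup, 1, lookup.length, v);
        }
    }

    public static void main(String[] args) {
        StringIndexSketch idx = new StringIndexSketch(new String[] {"plum", "apple", null, "apple"});
        System.out.println(Arrays.toString(idx.lookup)); // [null, apple, plum]
        System.out.println(Arrays.toString(idx.order));  // [2, 1, 0, 1]
    }
}
```

Each distinct string is stored once, while the per-document cost is a single int.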

Re: Input File Format

2005-11-10 Thread Erik Hatcher
You must be using the demo program that comes with Lucene. That is merely an example, and a barely decent one at that. Have a look under the covers of that code or the code that ships with Lucene in Action at http://www.lucenebook.com You can slice and dice "documents" in whatever granula

RE: Sorting: string vs int

2005-11-10 Thread Monsur Hossain
Thanks Yonik, it makes sense now. So getStringIndex indexes every sorted string field in the retArray (one per document), and then each unique string term in the mterms array. What is the purpose of the mterms array? Thanks, Monsur > -Original Message- > From: Yonik Seeley [mailto:[

Input File Format

2005-11-10 Thread Satyanarayana Ashwin
Hello, I am new to Lucene. I was trying to use Lucene with TREC-6 Data. The problem is that each input file provided by TREC has multiple documents (some files contain over 200 documents) tagged by DOCID. The results Lucene returns for a query are a list of files, not documents. Q1) Is there a wa
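A common way to get per-record results is to split each file into its records before indexing and add one Lucene Document per record, with the record id in its own field. The sketch below covers only the splitting step, in plain Java. It assumes each record is wrapped in `<DOC>...</DOC>` and that the id tag name is passed in (for example `DOCNO` or `DOCID`); both tag conventions are assumptions to check against the actual files, and the class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical splitter: the <DOC> wrapper and the id tag name are
// assumptions about the input format; adjust them to match the real files.
class TrecSplitter {
    private static final Pattern DOC =
        Pattern.compile("<DOC>(.*?)</DOC>", Pattern.DOTALL);

    // Returns id -> raw record text, preserving the order in the file.
    static Map<String, String> split(String fileContents, String idTag) {
        Pattern idPattern = Pattern.compile(
            "<" + Pattern.quote(idTag) + ">\\s*(.*?)\\s*</" + Pattern.quote(idTag) + ">",
            Pattern.DOTALL);
        Map<String, String> records = new LinkedHashMap<>();
        Matcher doc = DOC.matcher(fileContents);
        while (doc.find()) {
            String body = doc.group(1);
            Matcher id = idPattern.matcher(body);
            if (id.find()) {
                records.put(id.group(1), body);
            }
        }
        return records;
    }

    public static void main(String[] args) {
        String file = "<DOC><DOCNO> doc-1 </DOCNO><TEXT>first</TEXT></DOC>\n"
                    + "<DOC><DOCNO> doc-2 </DOCNO><TEXT>second</TEXT></DOC>";
        // Each entry would become its own Lucene Document, with the id stored
        // in a field so search results identify records rather than files.
        System.out.println(split(file, "DOCNO").keySet()); // [doc-1, doc-2]
    }
}
```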

Re: Search Help

2005-11-10 Thread Erik Hatcher
On 10 Nov 2005, at 08:01, [EMAIL PROTECTED] wrote: That's what I'm doing now, but I was thinking that if I limit the number of results I get back, I can save query time. Maybe I'm wrong? Out of curiosity - what kind of query are you issuing and what kind of response times are you seeing?

Re: Search Help

2005-11-10 Thread Daniel . Clark
That's what I'm doing now, but I was thinking that if I limit the number of results I get back, I can save query time. Maybe I'm wrong? ~ Daniel Clark

Re: Ask about method QueryParser.parser

2005-11-10 Thread Karl Øie
If you are using Tomcat, it defaults to ISO-8859-1 as the character encoding, I think; try to recode your input to UTF-8 before feeding it to Lucene. s = new String(s.getBytes("ISO-8859-1"), "UTF-8"); Karl On 10. nov. 2005, at 04.10, Hai Do Thanh wrote: Thanks for your reply :) I have already debugged the input s
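The recode step only works if the raw bytes are recovered with the same charset the container used to decode them (assumed here to be ISO-8859-1). A no-argument getBytes() uses the platform default charset instead, which may differ. The sketch below, plain Java with a made-up class name, simulates UTF-8 input being decoded as ISO-8859-1 and then repairs it.

```java
import java.nio.charset.StandardCharsets;

// Sketch of repairing text that was UTF-8 on the wire but was decoded as
// ISO-8859-1 by the container.
class Recode {

    static String latin1MisreadAsUtf8(String s) {
        // ISO-8859-1 maps each char 0-255 back to exactly one byte, so this
        // recovers the original bytes; they are then decoded as UTF-8.
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "Ti\u1ebfng Vi\u1ec7t"; // Vietnamese sample text
        // Simulate the container decoding UTF-8 bytes with the wrong charset.
        String garbled = new String(original.getBytes(StandardCharsets.UTF_8),
                                    StandardCharsets.ISO_8859_1);
        System.out.println(latin1MisreadAsUtf8(garbled).equals(original)); // true
    }
}
```

Applying the repair to text that was already decoded correctly will corrupt it, so it belongs at the single point where the container's decoding is known to be wrong.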

Re: efficiently finding all terms used on a particular field within Documents matching a query

2005-11-10 Thread Paul Elschot
On Thursday 10 November 2005 08:12, Chris Hostetter wrote: > > : For example I would like to find the set of terms used within a particular > : date range, where all Documents have a date field on them. What I've done > : to date is simply perform a query to find all Documents that match the > : d