Re: relevancy "buckets" and secondary searching

2007-02-05 Thread Erick Erickson
Well, I'm glad to see that it's not just me. What I'd tentatively thought about was using a TopDocs (search(Query, Filter, Num) form) to get me the top Num documents (by relevance), then doing the "bucketing" and sorting myself at that point. But your suggestion made me look at FieldSortedHitQueue

RE: An arguable bug in Lucene 1.9.1

2007-02-05 Thread Lee_Gary
I am seeing this issue as well with the exact same stack trace using spanQueries. Does anyone know if this has been fixed for versions after 1.9.1? Thanks Gary -----Original Message----- From: Erick Erickson [mailto:[EMAIL PROTECTED] Sent: Tuesday, May 23, 2006 07:23 AM To: java-user@lucene.apa

Re: relevancy "buckets" and secondary searching

2007-02-05 Thread Peter Keegan
Hi Erick, The timing of your posting is ironic because I'm currently working on the same issue. Here's a solution that I'm going to try: Use a HitCollector with a PriorityQueue to sort all hits by raw Lucene score, ignoring the secondary sort field. After the search, re-sort just the hits from
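The first half of that idea (keep only the best hits by raw score, ignoring any secondary field) can be sketched roughly as below. This is a minimal, hedged sketch and not the poster's code: it uses java.util.PriorityQueue rather than Lucene's own queue class, and TopScoreCollector and ScoredDoc are hypothetical names. The collect(int, float) method only mirrors the shape of a HitCollector callback; wiring it into an actual search is left out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical stand-in for a HitCollector that keeps the top N hits by raw score. */
final class TopScoreCollector {
    /** A document id paired with its raw score. */
    static final class ScoredDoc {
        final int doc;
        final float score;
        ScoredDoc(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    private final int capacity;
    // Min-heap: the weakest retained hit sits on top so it can be evicted cheaply.
    private final PriorityQueue<ScoredDoc> heap =
            new PriorityQueue<>(Comparator.comparingDouble((ScoredDoc d) -> d.score));

    TopScoreCollector(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Called once per matching document; the secondary sort field is ignored here. */
    void collect(int doc, float score) {
        if (heap.size() < capacity) {
            heap.add(new ScoredDoc(doc, score));
        } else if (score > heap.peek().score) {
            heap.poll();                       // drop the current weakest hit
            heap.add(new ScoredDoc(doc, score));
        }
    }

    /** Returns the retained hits, highest raw score first. */
    List<ScoredDoc> topDocs() {
        List<ScoredDoc> out = new ArrayList<>(heap);
        out.sort(Comparator.comparingDouble((ScoredDoc d) -> d.score).reversed());
        return out;
    }
}
```

Any re-sort by a secondary field would then operate on the small list returned by topDocs() rather than on the full result set.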

relevancy "buckets" and secondary searching

2007-02-05 Thread Erick Erickson
Am I missing anything obvious here and/or what would folks suggest... Conceptually, I want to normalize the scores of my documents during a search BUT BEFORE SORTING into 5 discrete values, say 0.1, 0.3, 0.5, 0.7, 0.9 and apply a secondary sort when two documents have the same score. Applying the
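To make the question concrete, here is a minimal sketch of the bucketing idea done outside Lucene, after the hits have been fetched. It assumes scores are already normalized to the range [0, 1] and that the secondary key is a plain string; BucketedSort, Hit, and the secondary field are hypothetical names used only for illustration, not Lucene API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper: sort by score bucket first, then by a secondary key. */
final class BucketedSort {
    /** A hit with a normalized score and an assumed string-valued secondary key. */
    static final class Hit {
        final float score;
        final String secondary;
        Hit(float score, String secondary) {
            this.score = score;
            this.secondary = secondary;
        }
    }

    /**
     * Maps a score in [0, 1] to a bucket index 0..4. The five buckets correspond
     * to the representative values 0.1, 0.3, 0.5, 0.7 and 0.9 from the question.
     */
    static int bucket(float score) {
        int b = (int) (score * 5);
        return Math.max(0, Math.min(4, b));   // clamp so that 1.0 lands in the top bucket
    }

    /** Returns a copy ordered by bucket (best first), ties broken by the secondary key. */
    static List<Hit> sort(List<Hit> hits) {
        List<Hit> out = new ArrayList<>(hits);
        out.sort(Comparator.comparingInt((Hit h) -> bucket(h.score)).reversed()
                .thenComparing(h -> h.secondary));
        return out;
    }
}
```

With this ordering, two hits scoring 0.91 and 0.85 fall into the same top bucket, so their relative order is decided by the secondary key alone.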

Re : Re: Re : Re: Re : Re: Problem with a search engine

2007-02-05 Thread Xavier To
Thanks ! I thought that would be the case too, but it's not. "2003" is just stored in the "contents" field as everything else. The only field indexed is the "contents" field. Since only the "contents" field is indexed, everything that is searched should be found. The number problem does restri

Re: Re : Re: Re : Re: Problem with a search engine

2007-02-05 Thread Chiradeep Vittal
Perhaps the numbers (dates?) are being indexed in a separate field? Lucene will only search the default field with the queries you have shown. If, for instance, the year was being stored in the "year" field, then your query should be: report AND year:2003 HTH - Original Message From: Xav

Re : Re: Re : Re: Problem with a search engine

2007-02-05 Thread Xavier To
Thanks for your help! Wow, I never expected that many replies. Cool! I did try to print out the query before and after it gets processed by QueryParser and let's say my query is "2003", before and after it will be "2003". If I put "report 2003" the query will be, before and after getting into t

Re: Re : Re: Problem with a search engine

2007-02-05 Thread Erick Erickson
Have you tried looking at the actual query submitted with Query.toString()? That might give you an insight into what is actually being submitted to Lucene and a place to start. Also be aware that with QueryParser, the default operator is OR, which can produce unexpected results if you assume AND. Best

Re : Re: Problem with a search engine

2007-02-05 Thread Xavier To
Thanks for your help. As I stated before, the numbers, whether pure or not, are indexed, for I can search them with Luke. But supposing what you're saying was the case, the search for "10-year" should return 4 items (according to the number of occurrences found by Luke). Problem is that the numbe

Re: Problem with a search engine

2007-02-05 Thread Mark Miller
My bad...looking at a modified StandardAnalyzer instead of the correct one. Belay last. On 2/5/07, Mark Miller <[EMAIL PROTECTED]> wrote: StandardAnalyzer does not index pure numbers. It will index alphanumeric tokens and numbers that are connected with one of: "_"|"-"|"/"|"."|"," If you wish t

Re: Problem with a search engine

2007-02-05 Thread Mark Miller
StandardAnalyzer does not index pure numbers. It will index alphanumeric tokens and numbers that are connected with one of: "_"|"-"|"/"|"."|"," If you wish to index pure numbers you might want to add another regex to StandardAnalyzer that recognizes a series of digits - don't forget to add the new

Re: Problem with a search engine

2007-02-05 Thread Xavier To
Thanks for taking time to answer me. The problem is that I'm not allowed to post code due to a confidentiality contract that I was required to sign. I'll try to see if I can get a special permission to post code since I'm wasting so much time trying to find the answer to this. I tried looking f

Re: Building lucene index using 100 Gb Mobile HardDisk

2007-02-05 Thread John Haxby
maureen tanuwidjaja wrote: Oh, is it? I didn't know about that... so does it mean I can't use this Mobile HDD? Damien McCarthy <[EMAIL PROTECTED]> wrote: FAT32 imposes a lower file size limit than NTFS. Attempts to create files greater than 4 GB on FAT32 will throw the error you are seeing. No
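For reference, the FAT32 ceiling is 2^32 - 1 bytes per file, one byte short of 4 GiB. A minimal sketch of a size check follows; Fat32Limit and fits are hypothetical names, and the check says nothing about how large any particular index file actually grows.

```java
/** Hypothetical helper for checking a file size against the FAT32 per-file limit. */
final class Fat32Limit {
    /** Largest file FAT32 can hold: 2^32 - 1 bytes. */
    static final long MAX_FILE_BYTES = (1L << 32) - 1;

    /** True if a file of the given size in bytes can be stored on a FAT32 volume. */
    static boolean fits(long sizeBytes) {
        return sizeBytes >= 0 && sizeBytes <= MAX_FILE_BYTES;
    }
}
```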