Re: Fine Tuning Lucene implementation

Dmitry Tue, 24 Jul 2007 23:01:41 -0700

Askar,
Why do you need to add +id:<idWeCareAbout>?
thanks,
dt,

----- Original Message -----
From: "Askar Zaidi" <[EMAIL PROTECTED]>
To: <java-user@lucene.apache.org>; <[EMAIL PROTECTED]>
Sent: Wednesday, July 25, 2007 12:39 AM
Subject: Re: Fine Tuning Lucene implementation

Hey Hira,

Thanks so much for the reply. Much appreciated.

Quote:

Would it be possible to just include a query clause?
  - i.e., instead of just contents:<userQuery>, also add +id:<idWeCareAbout>

How can I do that?

I see my query as:

+contents:harvard +contents:business +contents:review

where the search phrase was: harvard business review

Now, how can I add +id:<idWeCareAbout>?

This would give me exactly the one document I am looking for, for that id,
so I wouldn't have to iterate through the hits.
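A minimal sketch of one way to add the clause, under stated assumptions: instead of building the clause by hand after parsing, the id restriction can be appended to the user's query string before it is handed to QueryParser. It assumes the id is indexed as a single untokenized term in the field `item` (the field the search() code further down reads back with `doc.get("item")`), so Hira's `+id:<idWeCareAbout>` becomes `+item:<id>` here. `IdClause` and `withIdClause` are hypothetical names. Wrapping the user's text in parentheses keeps any OR the user typed from loosening the required id clause.

```java
// Hypothetical helper: restrict a user's free-text query to a single item.
// ASSUMPTION: the item id is indexed, untokenized, in a field named "item".
public final class IdClause {

    private IdClause() {}

    /**
     * Returns a QueryParser-syntax string that requires both the user's
     * query (parsed against the default field, e.g. "contents") and an
     * exact match on the item id.
     */
    public static String withIdClause(String userQuery, int id) {
        // Parentheses group the user's own terms and operators together,
        // so something like "a OR b" cannot bypass the required id clause.
        return "+(" + userQuery.trim() + ") +item:" + id;
    }

    public static void main(String[] args) {
        System.out.println(withIdClause("harvard business review", 42));
        // prints: +(harvard business review) +item:42
    }
}
```

Parsed by the same QueryParser (default field contents, default operator AND), this should come out roughly as +(+contents:harvard +contents:business +contents:review) +item:42, so at most the one wanted document is returned. One caveat: the extra required clause also contributes to the score, so the value will not be identical to the score of the contents query alone. Passing the id restriction as a Filter to Searcher.search(Query, Filter) (for example a QueryFilter wrapping a TermQuery on item) narrows the results without adding to the score.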

thanks,

Askar



On 7/24/07, N. Hira <[EMAIL PROTECTED]> wrote:


I'm no expert on this (so please take these comments in that context),
but two things seem weird to me:

1.  Iterating over each hit is an expensive proposition. I've often
seen people recommend a HitCollector instead.

2.  It seems that doBodySearch() is essentially saying: do this search
and return the score pertinent to this ID (using an exhaustive loop).
Would it be possible to just include a query clause?
    - i.e., instead of just contents:<userQuery>, also add +id:<idWeCareAbout>
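Taken together, the two points above amount to: let the searcher push each matching (document number, score) pair to a callback, and map document numbers to item ids through a precomputed array instead of loading a stored Document for every hit, which is typically the costly part of hits.doc(i). In Lucene 2.x the callback would be a HitCollector subclass (its method is collect(int doc, float score)), and the array could plausibly come from FieldCache.DEFAULT.getInts(reader, "item"), assuming item is indexed as one untokenized integer term per document. The sketch below is a plain-Java stand-in with hypothetical names (`ScoreCollector`, `itemByDoc`) so it runs without Lucene; it is not Lucene's own class.

```java
// Plain-Java stand-in for a Lucene 2.x HitCollector, which exposes
// collect(int doc, float score). Names here are hypothetical.
public final class ScoreCollector {

    private final int[] itemByDoc;  // itemByDoc[docNumber] == that document's item id
    private final int wantedItem;
    private float score = 0.0f;     // stays 0 when no document matched
    private boolean found = false;

    public ScoreCollector(int[] itemByDoc, int wantedItem) {
        this.itemByDoc = itemByDoc;
        this.wantedItem = wantedItem;
    }

    // Called once per matching document. Only an array lookup happens here;
    // no stored Document is loaded.
    public void collect(int doc, float docScore) {
        if (!found && itemByDoc[doc] == wantedItem) {
            score = docScore;
            found = true;
        }
    }

    public boolean found() { return found; }

    public float score() { return score; }

    public static void main(String[] args) {
        int[] itemByDoc = {101, 205, 42, 318};        // doc number -> item id
        int[] matchingDocs = {0, 2, 3};               // docs the query matched
        float[] matchScores = {0.31f, 0.87f, 0.12f};  // their scores

        ScoreCollector c = new ScoreCollector(itemByDoc, 42);
        // In Lucene the searcher would drive this loop and call collect().
        for (int i = 0; i < matchingDocs.length; i++) {
            c.collect(matchingDocs[i], matchScores[i]);
        }
        System.out.println(c.found() + " " + c.score()); // prints: true 0.87
    }
}
```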

In general, though, I think your algorithm is inefficient (if I
understand it correctly): if I want to search for one term among 3 in
a "collection" of 300 documents (as defined by some external attribute),
I will wind up executing 300 x 3 searches, and for each search that is
executed, I will iterate over every Hit, even if I've already found the
one that I "care about".
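If the per-item loop is kept for now, the smallest change to the quoted search() loop is to stop as soon as the wanted id is found, rather than walking every remaining hit. This plain-Java sketch uses parallel arrays as hypothetical stand-ins for hits.doc(i).get("item") and hits.score(i), and assumes item ids are unique, so the first match is the only one.

```java
// Early-exit version of the per-hit loop. The arrays are in hit (rank) order
// and stand in for hits.doc(i).get("item") and hits.score(i).
public final class FirstMatch {

    private FirstMatch() {}

    /** Returns the score of the first hit with the wanted item id, or NaN if none. */
    public static float scoreFor(int[] itemIds, float[] scores, int wantedId) {
        for (int i = 0; i < itemIds.length; i++) {
            if (itemIds[i] == wantedId) {
                // ASSUMPTION: ids are unique, so later hits cannot change the answer.
                return scores[i];
            }
        }
        return Float.NaN; // explicit "not found" instead of a stale field value
    }

    public static void main(String[] args) {
        int[] itemIds = {7, 42, 9};
        float[] scores = {0.9f, 0.5f, 0.1f};
        System.out.println(scoreFor(itemIds, scores, 42)); // prints: 0.5
        System.out.println(scoreFor(itemIds, scores, 3));  // prints: NaN
    }
}
```

This only trims the work inside each search; the example above still runs one search per (document, term) pair, i.e. 300 x 3 = 900 searches.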

What would break if you:
1.  Included "creator" in the Lucene index (or filtered out the Hits
using a BitSet or something like it)
2.  Executed 1 search
3.  Collected the results of the first N Hits (where N is some
reasonable limit, like 100 or 500)
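The three steps above can be sketched in plain Java: one pass over the results of a single search, a creator filter, and a bounded min-heap that keeps only the N best scores without sorting every hit. The `Hit` record and its `creator` field are hypothetical; in Lucene itself, the filtering and ranking would more likely happen inside the engine through a search variant that takes a Filter and a hit count n and returns TopDocs, so this only illustrates the logic.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Set;

public final class TopNByCreator {

    /** Hypothetical per-document result of the single search. */
    public static final class Hit {
        public final int itemId;
        public final String creator;
        public final float score;

        public Hit(int itemId, String creator, float score) {
            this.itemId = itemId;
            this.creator = creator;
            this.score = score;
        }
    }

    private TopNByCreator() {}

    /** Keeps hits whose creator is wanted, then returns the n best, highest score first. */
    public static List<Hit> topN(List<Hit> hits, Set<String> creators, int n) {
        // Min-heap ordered by score: the head is always the weakest kept hit.
        PriorityQueue<Hit> heap =
                new PriorityQueue<>(Comparator.comparingDouble((Hit h) -> h.score));
        for (Hit h : hits) {
            if (!creators.contains(h.creator)) {
                continue;            // step 1: the creator filter
            }
            heap.offer(h);
            if (heap.size() > n) {
                heap.poll();         // step 3: drop the lowest score beyond the limit N
            }
        }
        List<Hit> result = new ArrayList<>(heap);
        result.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return result;
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(      // step 2: results of one search
                new Hit(1, "alice", 0.40f),
                new Hit(2, "bob", 0.95f),
                new Hit(3, "alice", 0.70f),
                new Hit(4, "alice", 0.10f));
        for (Hit h : topN(hits, Set.of("alice"), 2)) {
            System.out.println(h.itemId + " " + h.score); // prints "3 0.7", then "1 0.4"
        }
    }
}
```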

-h


On Tue, 2007-07-24 at 20:14 -0400, Askar Zaidi wrote:

> Sure.
>
>     public float doBodySearch(Searcher searcher, String query, int id) {
>         try {
>             score = search(searcher, query, id);
>         }
>         catch (IOException io) {}
>         catch (ParseException pe) {}
>
>         return score;
>     }
>
>     private float search(Searcher searcher, String queryString, int id)
>             throws ParseException, IOException {
>
>         // Build a Query object
>         QueryParser queryParser = new QueryParser("contents", new KeywordAnalyzer());
>         queryParser.setDefaultOperator(QueryParser.Operator.AND);
>         Query query = queryParser.parse(queryString);
>
>         // Search for the query
>         Hits hits = searcher.search(query);
>         Document doc = null;
>
>         // Examine the Hits object to see if there were any matches
>         int hitCount = hits.length();
>         for (int i = 0; i < hitCount; i++) {
>             doc = hits.doc(i);
>             String str = doc.get("item");
>             int tmp = Integer.parseInt(str);
>             if (tmp == id)
>                 score = hits.score(i);
>         }
>
>         return score;
>     }
>
> I really need to optimize doBodySearch(...), as this takes the most time.
>
> thanks guys,
> Askar
>
>
> On 7/24/07, N. Hira <[EMAIL PROTECTED]> wrote:
>
>         Could you show us the relevant source from doBodySearch()?
>
>         -h
>
>         On Tue, 2007-07-24 at 19:58 -0400, Askar Zaidi wrote:
>         > I ran some tests, and it seems that the slowness comes from the
>         > Lucene calls in doBodySearch: if I remove that call, Lucene gives
>         > me results in 5 seconds; otherwise it takes about 50 seconds.
>         >
>         > But I need to do the body search, and that field contains lots
>         > of text. The field is <contents>. How can I optimize that?
>         >
>         > thanks,
>         > Askar
>         >
>         >


