Askar,
why do you need to add +id:<idWeCareAbout>?
thanks,
dt,
www.ejinz.com
search engine news forms
----- Original Message -----
From: "Askar Zaidi" <[EMAIL PROTECTED]>
To: <java-user@lucene.apache.org>; <[EMAIL PROTECTED]>
Sent: Wednesday, July 25, 2007 12:39 AM
Subject: Re: Fine Tuning Lucene implementation
Hey Hira ,
Thanks so much for the reply. Much appreciate it.
Quote:
Would it be possible to just include a query clause?
- i.e., instead of just contents:<userQuery>, also add
+id:<idWeCareAbout>
How can I do that ?
I see my query as :
+contents:harvard +contents:business +contents:review
where the search phrase was: harvard business review
Now how can I add +id:<idWeCareAbout> ??
This would give me that one exact document I am looking for , for that id.
I
don't have to iterate through hits.
thanks,
Askar
On 7/24/07, N. Hira <[EMAIL PROTECTED]> wrote:
I'm no expert on this (so please accept the comments in that context)
but 2 things seem weird to me:
1. Iterating over each hit is an expensive proposition. I've often
seen people recommending a HitCollector.
2. It seems that doBodySearch() is essentially saying, do this search
and return the score pertinent to this ID (using an exhaustive loop).
Would it be possible to just include a query clause?
- i.e., instead of just contents:<userQuery>, also add
+id:<idWeCareAbout>
In general though, I think your algorithm seems inefficient (if I
understand it correctly):-- if I want to search for one term among 3 in
a "collection" of 300 documents (as defined by some external attribute),
I will wind up executing 300 x 3 searches, and for each search that is
executed, I will iterate over every Hit, even if I've already found the
one that I "care about".
What would break if you:
1. Included "creator" in the Lucene index (or, filtered out the Hits
using a BitSet or something like it)
2. Executed 1 search
3. Collected the results of the first N Hits (where N is some
reasonable limit, like 100 or 500)
-h
On Tue, 2007-07-24 at 20:14 -0400, Askar Zaidi wrote:
> Sure.
>
> public float doBodySearch(Searcher searcher,String query, int id){
>
> try{
> score = search(searcher, query,id);
> }
> catch(IOException io){}
> catch(ParseException pe){}
>
> return score;
>
> }
>
> private float search(Searcher searcher, String queryString, int id)
> throws ParseException, IOException {
>
> // Build a Query object
>
> QueryParser queryParser = new QueryParser("contents", new
> KeywordAnalyzer());
>
> queryParser.setDefaultOperator(QueryParser.Operator.AND);
>
> Query query = queryParser.parse(queryString);
>
> // Search for the query
>
> Hits hits = searcher.search(query);
> Document doc = null;
>
> // Examine the Hits object to see if there were any matches
> int hitCount = hits.length();
>
> for(int i=0;i<hitCount;i++){
> doc = hits.doc(i);
> String str = doc.get("item");
> int tmp = Integer.parseInt(str);
> if(tmp==id)
> score = hits.score(i);
> }
>
> return score;
> }
>
> I really need to optimize doBodySearch(...) as this takes the most
> time.
>
> thanks guys,
> Askar
>
>
> On 7/24/07, N. Hira <[EMAIL PROTECTED]> wrote:
>
> Could you show us the relevant source from doBodySearch()?
>
> -h
>
> On Tue, 2007-07-24 at 19:58 -0400, Askar Zaidi wrote:
> > I ran some tests and it seems that the slowness is from
> Lucene calls when I
> > do "doBodySearch", if I remove that call, Lucene gives me
> results in 5
> > seconds. otherwise it takes about 50 seconds.
> >
> > But I need to do Body search and that field contains lots of
> text. The field
> > is <contents>. How can I optimize that ?
> >
> > thanks,
> > Askar
> >
> >
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]