A lot of short documents, optimal query?

eks dev Wed, 09 Nov 2005 09:34:30 -0800

Hi all,

Can somebody please suggest a way/ways on how to
optimize execution times this query below (or to use
some of Trunk BooleanScorers)... Probably I do not see
obvious.


Use Case:
Here I have names of people with query expansion for
individual tokens (not using Fuzzy Query) that should
be found in ceartin zip codes.
The collection contains approx 30Mio rather short
documents with only two fields, name (2-4 tokens) and
ZIP code.

My tougts were to index ony 2 char prefix to get rid
of the prefix part of the query. Sounds plausible?

Also worth of mentioning, I would like to get all hits
(currently using HitCollector) and score is not
needed, any way to avoid scoring (would that help at
all?)

Befor adding ZIPS:12* part of the query, Lucene worked
like a charm, a lot under 1 second on 25Mio
collection! Now it jumped into 10 second range.

Trunk is ok for me.

Thanks a lot! 
eks dev

(
+(
(+raimonds +marschan) 
(+raimonds +marschol) 
(+raimonds +marschel) 
(+raimonds +marschalfr) 
(+raimonds +marschalek) 
(+raimonds +marscha) 
...
) 
+(ZIPS:22* ZIPS:21* ZIPS:20* ZIPS:23* ZIPS:245*
ZIPS:246* ZIPS:247* ZIPS:240* ZIPS:241* ZIPS:242*
ZIPS:243* ZIPS:254* ZIPS:253* ZIPS:255* ZIPS:256*
ZIPS:257* ZIPS:295* ZIPS:296* ZIPS:273* ZIPS:274*
ZIPS:275* ZIPS:276* ZIPS:192* ZIPS:190*)
)


                
___________________________________________________________ 
To help you stay safe and secure online, we've developed the all new Yahoo! 
Security Centre. http://uk.security.yahoo.com

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

A lot of short documents, optimal query?

Reply via email to