Sami,
You're on to the right approach seeking something other than
FuzzyQuery. FuzzyQuery is rarely generally useful and there are
other ways to achieve the same sort of thing (soundex, metaphone) in
an efficient manner.
If you could share some details about these properties and how you
need to query them I'm sure the community could offer suggestions on
an efficient and clean implementation. Without details, its not
possible to (easily) know how recommend a specific technique.
Erik
On May 30, 2006, at 11:12 AM, Sami Dalouche wrote:
Hi,
I have 2 million documents, with a name property. (~15 to 20
characters).
Fuzzy searching against this property takes around 3 seconds, which is
way too much for what I plan to do, so I am considering the possible
optimizations. I can add a property to each of the documents, that
could
partition the document space into 400 spaces. Each space would then be
limited to 5000 documents, which should be small enough to make the
fuzzy search faster.
However, my question is : how do I take advantage of this additional
property ? Using a traditional RDBMS, I would add an index on the
field,
but on Lucene, I'm not sure of how to proceed. Would filters be the
way
to go ?
(http://lucene.apache.org/java/docs/api/org/apache/lucene/search/
Filter.html)
Could a Caching Wrapperfilter help even more ?
(http://lucene.apache.org/java/docs/api/org/apache/lucene/search/
CachingWrapperFilter.html)
Additionnally, the additional property is an id, so can I store it
as a
number so that it is faster (I guess) than string comparison ?
Thanks a lot,
Sami Dalouche
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]