On Sunday 25 November 2007 11:54:15 markharw00d wrote:
> For "fuzzy" you're going to pay one way or another.

But which one is the cheapest? :)

> You can use ngram analyzers on indexed content and queries which will
> add IO costs ("files" becomes "fi","fil", "file","il","ile","iles" in
> both your query and index) or you can use some form of query-time edit
> distance comparison on "files" and pay the CPU costs. You can use

The ngram approach is probably more powerful regarding the quality of the 
search result (remembering the "Al Jazeera" example in LIA); however, it will 
blow up the index size tremendously.
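
To put a rough number on that blow-up, here is a minimal sketch in plain Python. It is not Lucene's ngram tokenizer; the function name char_ngrams and the 2..4 gram sizes are my own assumptions, picked so the output lines up with the "files" example quoted above.

```python
def char_ngrams(term, min_n=2, max_n=4):
    """Return every character n-gram of term with min_n <= n <= max_n."""
    grams = []
    for start in range(len(term)):
        for n in range(min_n, max_n + 1):
            # Only keep grams that fit entirely inside the term.
            if start + n <= len(term):
                grams.append(term[start:start + n])
    return grams


grams = char_ngrams("files")
print(grams)
print(len(grams), "index terms instead of 1")
```

Under these assumptions a single five-letter term turns into nine terms, and the same expansion has to be applied to every query term as well.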

Can somebody estimate whether it's more promising to keep the index as small 
as possible and hold it completely in a RAMDirectory (or MMapDirectory, 
LUCENE-1035, whatever)? What is it that takes the time in 
FuzzyQuery.rewrite(): IO or CPU? Or both?

> WordNet and "files" becomes "registers". You can examine large volumes
> of user queries and look at what is the most likely interpretation. You
> can use Soundex and then if you're lucky files==philes but there's no
> room for error and they either match or they don't - there is no measure
> of similarity.
>
> There's no free lunch here.
>
> Timo Nentwig wrote:
> > On Saturday 24 November 2007 18:28:48 markharw00d wrote:
> >> term. You can limit the number of edit distance comparisons conducted by
> >> setting the minimum prefix length. This is a property of the QueryParser
> >
> > Well, javadoc: "prefixLength - length of common (non-fuzzy) prefix". So,
> > this is some kind of "wildcard fuzzy" but not real fuzzy anymore.
> >
> > I understand the optimization but right now I can hardly imagine a
> > reasonable use case. Who cares whether the Levenshtein distance is at the
> > beginning, middle or end of a word? E.g. when searching fuzzy for "philes" I
> > want to find "files"...
> >
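
The trade-off in the quoted exchange can be sketched roughly as follows. This is plain Python, not Lucene's FuzzyQuery code: the helper names, the tiny term list and the max_edits cut-off are all assumptions for illustration (as far as I know FuzzyQuery scores by a similarity threshold rather than a raw edit count). The point is only that a required prefix reduces the number of edit distance comparisons but also drops "files" for the query "philes".

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # delete ca
                            curr[j - 1] + 1,       # insert cb
                            prev[j - 1] + cost))   # substitute or match
        prev = curr
    return prev[-1]


def fuzzy_candidates(query, terms, max_edits, prefix_length=0):
    """Return (matching terms, number of edit distance comparisons made)."""
    prefix = query[:prefix_length]
    hits = []
    compared = 0
    for term in terms:
        # Terms that do not share the required prefix are skipped cheaply.
        if not term.startswith(prefix):
            continue
        compared += 1
        if levenshtein(query, term) <= max_edits:
            hits.append(term)
    return hits, compared


terms = ["files", "filed", "philes", "phones"]  # hypothetical index terms
print(fuzzy_candidates("philes", terms, max_edits=2, prefix_length=0))
print(fuzzy_candidates("philes", terms, max_edits=2, prefix_length=2))
```

With no prefix all four terms are compared and "files" is found; with a prefix length of 2 only the two "ph" terms are compared and "files" is never considered.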
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]