Re: Tokenization and PrefixQuery

Yann-Erwan Perio Fri, 14 Feb 2014 04:13:01 -0800

On Fri, Feb 14, 2014 at 12:33 PM, Michael McCandless
<luc...@mikemccandless.com> wrote:


> This is similar to PathHierarchyTokenizer, I think.

Ah, yes, very much. I'll check it out and see if I can make something
of it. I am not sure to what extent it'll be reusable though, as my
tokenizer also sets payloads (the next coming "path part" is set on
the current token as a payload, so as to provide a perspective of
what's coming ahead, at search time).

>> Regarding the PrefixQuery: it seems that it stops matching documents
>> when the length of the searched string exceeds a certain length. Is
>> that the expected behavior, an if so, can I / should I manage this
>> length?
>
> That should not be the case: it should match all terms with that
> prefix regardless of the term's length.  Try to boil it down to a
> small test case?

I guess I've been too shallow with my testing, then :( Well, I'll dig
deeper, and if I find something wrong with Lucene, I'll post a small
test case demonstrating the issue - but so far, the errors were always
on my side.

> I think your approach is a typical one (adding more terms to the index
> so you get TermQuery instead of MoreCostlyQuery).  E.g.,
> ShingleFilter, CommonGrams are examples of the same general idea.
> Another example is AnalyingInfixSuggester, which does the same thing
> you are doing under-the-hood but one byte at a time (i.e. all term
> prefixes up to a certain depth), and it also makes its analysis depth
> controllable.  Maybe expose it to your users as a very expert tunable?

This is what I have done, letting the clients of the framework specify
the analysis depth through their configuration file.

Thanks a lot for your feedback, it's very appreciated.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Re: Tokenization and PrefixQuery

Reply via email to