On Fri, Feb 14, 2014 at 12:33 PM, Michael McCandless
<luc...@mikemccandless.com> wrote:

> This is similar to PathHierarchyTokenizer, I think.

Ah, yes, very much. I'll check it out and see if I can make something
of it. I'm not sure to what extent it will be reusable, though, as my
tokenizer also sets payloads: the next "path part" is set on the
current token as a payload, so as to provide a perspective of what's
coming ahead at search time (a rough sketch of that idea is at the
end of this mail).

>> Regarding the PrefixQuery: it seems that it stops matching documents
>> when the length of the searched string exceeds a certain length. Is
>> that the expected behavior, and if so, can I / should I manage this
>> length?
>
> That should not be the case: it should match all terms with that
> prefix regardless of the term's length. Try to boil it down to a
> small test case?

I guess I've been too shallow with my testing, then :( Well, I'll dig
deeper, and if I find something wrong with Lucene, I'll post a small
test case demonstrating the issue, probably along the lines of the
second sketch below - but so far, the errors have always been on my
side.

> I think your approach is a typical one (adding more terms to the index
> so you get TermQuery instead of MoreCostlyQuery). E.g.,
> ShingleFilter, CommonGrams are examples of the same general idea.
> Another example is AnalyzingInfixSuggester, which does the same thing
> you are doing under-the-hood but one byte at a time (i.e. all term
> prefixes up to a certain depth), and it also makes its analysis depth
> controllable. Maybe expose it to your users as a very expert tunable?

This is what I have done, letting the clients of the framework specify
the analysis depth through their configuration file (third sketch
below).

Thanks a lot for your feedback, it's very appreciated.
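
Here is the payload sketch I mentioned above. It is untested and only
meant to show the shape of the thing: a one-token-lookahead filter
that attaches the next token's term as the payload of the current
one. The class name is made up, and I'm assuming the Lucene 4.x
attribute APIs:

import java.io.IOException;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.PayloadAttribute;
import org.apache.lucene.util.AttributeSource;
import org.apache.lucene.util.BytesRef;

// Hypothetical filter: stores the next token's term as the payload of
// the current token, so each indexed "path part" carries a hint of
// what comes after it.
public final class NextPartPayloadFilter extends TokenFilter {

  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private final PayloadAttribute payloadAtt = addAttribute(PayloadAttribute.class);

  private AttributeSource.State pending; // buffered previous token
  private boolean exhausted;

  public NextPartPayloadFilter(TokenStream input) {
    super(input);
  }

  @Override
  public boolean incrementToken() throws IOException {
    while (true) {
      if (exhausted) {
        if (pending == null) {
          return false;
        }
        // emit the last buffered token with no payload: nothing follows it
        restoreState(pending);
        pending = null;
        payloadAtt.setPayload(null);
        return true;
      }
      if (!input.incrementToken()) {
        exhausted = true;
        continue;
      }
      if (pending == null) {
        // first token: buffer it and read ahead
        pending = captureState();
        continue;
      }
      // the token just read is the "next part"; attach it as the
      // payload of the buffered (previous) token, then emit that one
      BytesRef next = new BytesRef(termAtt.toString());
      AttributeSource.State current = captureState();
      restoreState(pending);
      payloadAtt.setPayload(next);
      pending = current;
      return true;
    }
  }

  @Override
  public void reset() throws IOException {
    super.reset();
    pending = null;
    exhausted = false;
  }
}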
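
And this is the kind of standalone test I would post if PrefixQuery
really did stop matching past some length (again untested; Lucene
4.x APIs assumed, the field name "path" is made up). Indexing one
very long term and searching with a prefix just short of it should
print hits=1:

import org.apache.lucene.analysis.core.KeywordAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.PrefixQuery;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class PrefixQueryLengthTest {
  public static void main(String[] args) throws Exception {
    RAMDirectory dir = new RAMDirectory();
    IndexWriterConfig iwc =
        new IndexWriterConfig(Version.LUCENE_46, new KeywordAnalyzer());
    IndexWriter w = new IndexWriter(dir, iwc);

    // index a single document with one very long, untokenized term
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < 100; i++) sb.append("/part").append(i);
    Document doc = new Document();
    doc.add(new StringField("path", sb.toString(), Field.Store.NO));
    w.addDocument(doc);
    w.close();

    IndexSearcher searcher = new IndexSearcher(DirectoryReader.open(dir));
    // a prefix far longer than any plausible built-in limit
    String prefix = sb.substring(0, sb.length() - 10);
    int hits = searcher.search(new PrefixQuery(new Term("path", prefix)), 10).totalHits;
    System.out.println("hits=" + hits); // expected: 1
  }
}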
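
Finally, the depth-from-configuration idea boils down to something
like this. PrefixDepthAnalyzer is a hypothetical name, and I use
EdgeNGramTokenFilter here as a stand-in for my own prefix expansion;
the depth would come from the client's configuration file:

import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.KeywordTokenizer;
import org.apache.lucene.analysis.ngram.EdgeNGramTokenFilter;
import org.apache.lucene.util.Version;

// Hypothetical analyzer: indexes all prefixes of the value up to a
// configurable depth, so prefix searches up to that depth become
// cheap TermQuerys instead of multi-term PrefixQuerys.
public class PrefixDepthAnalyzer extends Analyzer {
  private final int depth; // e.g. read from the client's config file

  public PrefixDepthAnalyzer(int depth) {
    this.depth = depth;
  }

  @Override
  protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
    Tokenizer source = new KeywordTokenizer(reader);
    return new TokenStreamComponents(source,
        new EdgeNGramTokenFilter(Version.LUCENE_46, source, 1, depth));
  }
}

With depth = 4, a value like "frobnicate" is indexed as f, fr, fro
and frob, so any prefix search up to four characters is a plain
TermQuery; anything longer falls back to a regular PrefixQuery.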
> This is similar to PathHierarchyTokenizer, I think. Ah, yes, very much. I'll check it out and see if I can make something of it. I am not sure to what extent it'll be reusable though, as my tokenizer also sets payloads (the next coming "path part" is set on the current token as a payload, so as to provide a perspective of what's coming ahead, at search time). >> Regarding the PrefixQuery: it seems that it stops matching documents >> when the length of the searched string exceeds a certain length. Is >> that the expected behavior, an if so, can I / should I manage this >> length? > > That should not be the case: it should match all terms with that > prefix regardless of the term's length. Try to boil it down to a > small test case? I guess I've been too shallow with my testing, then :( Well, I'll dig deeper, and if I find something wrong with Lucene, I'll post a small test case demonstrating the issue - but so far, the errors were always on my side. > I think your approach is a typical one (adding more terms to the index > so you get TermQuery instead of MoreCostlyQuery). E.g., > ShingleFilter, CommonGrams are examples of the same general idea. > Another example is AnalyingInfixSuggester, which does the same thing > you are doing under-the-hood but one byte at a time (i.e. all term > prefixes up to a certain depth), and it also makes its analysis depth > controllable. Maybe expose it to your users as a very expert tunable? This is what I have done, letting the clients of the framework specify the analysis depth through their configuration file. Thanks a lot for your feedback, it's very appreciated. --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org