On 04 Mar 2014, at 12:09, Guillaume Smet <guillaume.s...@gmail.com> wrote:
> On Tue, Mar 4, 2014 at 11:09 AM, Emmanuel Bernard
> <emman...@hibernate.org> wrote:
>> I would like to separate the notion of autosuggestion from the wildcard
>> problem. To me they are separate and I would love Hibernate Search to
>> offer an autosuggest and spell checker API.
>
> AFAICS from the changelog of each version, autosuggest is still a vast
> work in progress in Lucene/Solr.

So? :)

>> Back to wildcard. If we have an analyzer stack that separates normalizer
>> filters from filters generating additional tokens (see my email [AND]), then
>> it is a piece of cake to apply the right filters, raise an exception if
>> someone tries to wildcard on n-grams, and simply ignore the synonym filter.
>
> In theory, yes.
>
> But for the tokenization, we use WhitespaceTokenizer and
> WordDelimiterFilter, which generates new tokens (for example, depending
> on the options you use, you can index wi-fi as wi and fi, wi-fi and
> wifi).

OK, that poses a problem for the wildcard if wi and fi are separated. But I don't think it's an issue for the AND case, as we would get the expected query:

- hotel AND wi-fi
- hotel AND wi AND fi

And to be fair, how do you plan to make wildcards and wi-fi work together in Lucene (is any solution available)? The solution I can think of is to index the property with an analyzer stack that does not split such words into two tokens.

> The problem with this particular filter is also that we put it after the
> ASCIIFoldingFilter, because we want the input to be as clean as
> possible, but before the LowerCaseFilter, as WordDelimiterFilter can do
> its magic on case changes too.
>
> If you separate the normalizer from the tokenizer, I don't think it's going to
> be easy to order them adequately.

_______________________________________________
hibernate-dev mailing list
hibernate-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/hibernate-dev
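P.S. To make the wi-fi example above concrete: here is a small self-contained sketch (plain Java, not Lucene's actual WordDelimiterFilter) of the token expansion Guillaume describes, i.e. with word-part generation and catenation enabled, a hyphenated token such as wi-fi yields the parts wi and fi plus the catenated wifi. The class and method names are made up for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: mimics the effect of WordDelimiterFilter's
// GENERATE_WORD_PARTS + CATENATE_WORDS options on a hyphenated token.
public class WordDelimiterSketch {
    static List<String> expand(String token) {
        List<String> out = new ArrayList<>();
        String[] parts = token.split("-");
        if (parts.length > 1) {
            // word parts: "wi-fi" -> "wi", "fi"
            for (String p : parts) {
                out.add(p.toLowerCase());
            }
            // catenated form: "wi-fi" -> "wifi"
            out.add(String.join("", parts).toLowerCase());
        } else {
            out.add(token.toLowerCase());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(expand("wi-fi")); // [wi, fi, wifi]
        System.out.println(expand("hotel")); // [hotel]
    }
}
```

This also shows why a wildcard query like wi-f* cannot match: the index only contains wi, fi and wifi, never the original wi-fi term, unless the field is indexed with a second analyzer stack that skips this filter.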