It looks like an interesting idea, especially as it keeps the simple use
case simple (i.e. simply not defining a queryAnalyzer).

Can you explain to me why you would need a different analyzer for a
wildcard query? My brain is still tanning on the beach.

Brainstorming here, we could do the following:

@AnalyzerDef.target

enum AnalyzerTarget { ALL, INDEXING, QUERY, WILDCARD }

So you could define the same @AnalyzerDef.name several times provided
that they did not share the same targets.

But that would also change the API for the dynamic analyzer I suppose.
It also does not cover the @Analyzer.impl usage.
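
To make the target idea a bit more concrete, here is a rough sketch in
plain Java of the "same name, non-overlapping targets" rule. Everything
in it is hypothetical: AnalyzerDefRegistry, register() and lookup() are
names made up for illustration and are not part of Hibernate Search, and
treating ALL as shorthand for the three concrete targets is an assumption
on my side.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical enum, as proposed above; not an existing Hibernate Search type.
enum AnalyzerTarget { ALL, INDEXING, QUERY, WILDCARD }

// Hypothetical registry: one analyzer name may be registered several times,
// provided the registrations do not share a target.
class AnalyzerDefRegistry {
    // analyzer name -> (concrete target -> label of the definition claiming it)
    private final Map<String, Map<AnalyzerTarget, String>> byName = new HashMap<>();

    // Assumption: ALL stands for every concrete target.
    static Set<AnalyzerTarget> expand(Set<AnalyzerTarget> targets) {
        if (targets.contains(AnalyzerTarget.ALL)) {
            return EnumSet.of(AnalyzerTarget.INDEXING, AnalyzerTarget.QUERY,
                    AnalyzerTarget.WILDCARD);
        }
        Set<AnalyzerTarget> result = EnumSet.noneOf(AnalyzerTarget.class);
        result.addAll(targets);
        return result;
    }

    // Registers a definition, rejecting it if any of its targets is already taken.
    void register(String name, String definition, Set<AnalyzerTarget> targets) {
        Map<AnalyzerTarget, String> claimed =
                byName.computeIfAbsent(name, k -> new EnumMap<>(AnalyzerTarget.class));
        Set<AnalyzerTarget> expanded = expand(targets);
        for (AnalyzerTarget target : expanded) {
            if (claimed.containsKey(target)) {
                throw new IllegalStateException("Analyzer '" + name
                        + "' is already defined for target " + target);
            }
        }
        for (AnalyzerTarget target : expanded) {
            claimed.put(target, definition);
        }
    }

    // Returns the definition label for a concrete target, or null if none is registered.
    String lookup(String name, AnalyzerTarget target) {
        Map<AnalyzerTarget, String> claimed = byName.get(name);
        return claimed == null ? null : claimed.get(target);
    }
}

class AnalyzerTargetSketch {
    public static void main(String[] args) {
        AnalyzerDefRegistry registry = new AnalyzerDefRegistry();
        registry.register("text", "indexing definition",
                EnumSet.of(AnalyzerTarget.INDEXING));
        registry.register("text", "query definition",
                EnumSet.of(AnalyzerTarget.QUERY, AnalyzerTarget.WILDCARD));
        System.out.println(registry.lookup("text", AnalyzerTarget.WILDCARD));
    }
}
```

In this sketch a definition with no explicit target would have to default
to ALL to keep the simple use case simple, and a later definition of the
same name for any target would then be rejected.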

On Tue 2013-08-13 10:13, Guillaume Smet wrote:
> Hi,
> 
> Note: this is just a prospective idea I'd like to discuss. Even if
> it's a good idea, it's definitely 5.0 material.
> 
> Those who have used Solr and are familiar with the Solr schema have
> already seen the ability to use different analyzers for indexing and
> querying.
> 
> It's usually useful when you use analyzers which return several
> tokens for a given token: the QueryParser usually can't build the
> correct query with these analyzers.
> 
> To take an example from my current work on HSEARCH-917 (soon to come
> \o/), I have the following case. From i-pod, the analyzer builds ipod,
> i pod and i-pod. ipod and i-pod aren't the issue here, but the fact that
> i pod spans two tokens makes the QueryParser build an incorrect query
> (even if I use the Lucene 4.4 version, which is a little bit smarter
> about these cases and at least makes the i-pod ipod case work
> correctly).
> 
> The fact is that if the analyzer used at indexing has correctly
> indexed all the tokens, I don't need to expand the terms at querying
> and it should be sufficient to use a simple analyzer to lowercase the
> string and remove the accents.
> 
> Solr introduced this feature a long time ago (it was already there in
> the good old times of 1.3) and I'm wondering if we shouldn't introduce
> it in Hibernate Search too.
> 
> As for the implementation, I was thinking about adding a queryAnalyzer
> attribute to the @Field annotation. I was also wondering if we
> shouldn't add the ability to define an Analyzer for wildcard queries
> (Lucene recently introduced an AnalyzingQueryParser to do something
> like that).
> 
> And maybe, in this case, it would be a good idea to centralize the
> configuration with types as it's done in Solr? Usually, the three
> analyzer definitions would come together.
> 
> As for my particular needs, most of my full text fields would be
> analyzed like this:
> 
> indexing:
>       @AnalyzerDef(name = HibernateSearchAnalyzer.TEXT,
>               tokenizer = @TokenizerDef(factory = WhitespaceTokenizerFactory.class),
>               filters = {
>                       @TokenFilterDef(factory = ASCIIFoldingFilterFactory.class),
>                       @TokenFilterDef(factory = WordDelimiterFilterFactory.class, params = {
>                               @org.hibernate.search.annotations.Parameter(name = "generateWordParts", value = "1"),
>                               @org.hibernate.search.annotations.Parameter(name = "generateNumberParts", value = "1"),
>                               @org.hibernate.search.annotations.Parameter(name = "catenateWords", value = "1"),
>                               @org.hibernate.search.annotations.Parameter(name = "catenateNumbers", value = "0"),
>                               @org.hibernate.search.annotations.Parameter(name = "catenateAll", value = "0"),
>                               @org.hibernate.search.annotations.Parameter(name = "splitOnCaseChange", value = "0"),
>                               @org.hibernate.search.annotations.Parameter(name = "splitOnNumerics", value = "0"),
>                               @org.hibernate.search.annotations.Parameter(name = "preserveOriginal", value = "1")
>                       }),
>                       @TokenFilterDef(factory = LowerCaseFilterFactory.class)
>               }
>       ),
> querying:
>       @AnalyzerDef(name = HibernateSearchAnalyzer.TEXT,
>               tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
>               filters = {
>                       @TokenFilterDef(factory = ASCIIFoldingFilterFactory.class),
>                       @TokenFilterDef(factory = LowerCaseFilterFactory.class)
>               }
>       ),
> wildcard:
>       @AnalyzerDef(name = HibernateSearchAnalyzer.TEXT,
>               tokenizer = @TokenizerDef(factory = WhitespaceTokenizerFactory.class),
>               filters = {
>                       @TokenFilterDef(factory = ASCIIFoldingFilterFactory.class),
>                       @TokenFilterDef(factory = LowerCaseFilterFactory.class)
>               }
>       ),
> 
> I could contribute time to work on this if we can agree on the way to
> pursue this idea.
> 
> Thanks for your feedback.
> 
> -- 
> Guillaume
> _______________________________________________
> hibernate-dev mailing list
> hibernate-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/hibernate-dev