> I don't disagree, I'm merely aiming to have in future Analyzer(s)
> defined in a non-Lucene specific way, possibly allowing controlled
> exceptions.
> When changing the definitions API we'll be able to reconsider if we
> want Analyzer definitions to be scoped per index like Elasticsearch
> does.

Actually, for HSEARCH-2534, it would be enough if analyzer definitions were
scoped by indexing service (Lucene/ES). But sure, scoping them per index
would be a good solution, if we wait for 6.0.

> Sure that wouldn't allow to map on Hibernate Search an existing ES
> cluster which uses conflicting names on different indexes, but reverse
> engineering of existing ES clusters isn't our focus at this time;
> people badly needing it can change their names to saner choices (as
> I'd argue that name reuse for different things wouldn't be a sane
> configuration, probably it won't be common either).

I agree with you on that. To be honest I didn't even think of such an issue,
since currently the analyzer definitions are scoped globally.

> is there something which prevents
> us to refine this decision and rather generate ES definitions out of
> all known Analyzer definitions, rather than just the ones being
> referred?

Well, yes, that was in my first message; see below:

> First we'd have to have a different namespace for each indexing service,
> but I've already implemented that much.
> Second, some analyzer definitions are only valid for one indexing service,
> and not for the other.
> For instance, analyzer definitions using ElasticsearchTokenFilterFactory
> are specific to Elasticsearch. And Analyzer definitions using the
> WhitespaceTokenizerFactory with the "rule" parameter are only valid with
> embedded Lucene. And so on. To sum up, I'm not sure we can do
> something smart.

What prevents us from generating ES definitions out of all known analyzer
definitions is that there may be definitions that *cannot* be translated to
ES, simply because they are supposed to be used only with Lucene. I guess we
could say "let's try to generate ES definitions, and if it fails just ignore
it and log a warning", but it seems a bit unsafe...

> Let's keep in mind that we're only able "translate" a very limited set
> of well-known Analyzer definitions [...]

For translations it's true, but any ES analyzer definition can be expressed
with Hibernate Search by using Elasticsearch*Factory. In fact, it's the
recommended approach. See
https://docs.jboss.org/hibernate/search/5.6/reference/en-US/html_single/#_custom_analyzers .

> In short, I think what matters most now is not how to define such
> analyzers as there are viable (better?) alternatives, but we need to
> make sure one can run a query with the right query-time overrides,
> especially be able to refer to an Analyzer which has been manually
> defined on ES but is possibly not known to us. (As discussed
> previously with the exception of More-Like-This Queries which will
> have to wait).

We have already discussed this many times, but once again: users will not be
able to define their analyzers manually on ES starting from ES 5.0, for
various reasons. So that's clearly not a long-term solution. It's "viable"
for now, but since it's not future-proof it's certainly not better.

As for the short term, if I understand correctly, what you're proposing is
that users don't add an @AnalyzerDef for query-only analyzers, and that we
allow using unknown analyzers in queries? I guess we could do that, but that
basically amounts to solution 3, "don't do it". Which is fine as long as we
plan to fix it later. Also note we'd still have to explain to users that
query-only analyzer definitions are not supported with Elasticsearch.

Yoann Rodière <yo...@hibernate.org>
Hibernate NoORM Team

On 5 January 2017 at 15:06, Sanne Grinovero <sa...@hibernate.org> wrote:

> On 5 January 2017 at 13:06, Yoann Rodiere <yo...@hibernate.org> wrote:
> >> I'm wondering how you'd all feel about the third solution:
> >> 3. don't do it.
> >> This depends of course how far it is blocking in practice.
> >
> > "Don't do it until 6.0" would be acceptable, I guess, since it's still just
> > a technical preview.
> > Though we would introduce a limitation that would only
> > be our fault (since Elasticsearch supports query-time analyzers) and that
> > would not exist with the Lucene integration.
> >
> > "Don't do it ever" seems really bad. As we've already discussed at length
> > (multiple times), not being able to define analyzers from Hibernate Search
> > would be a real pain for users, especially in Elasticsearch 5. That's true
> > for indexing analyzers, and that's also true for querying-only analyzers.
> > I wouldn't say that query-only analyzers are widespread, but they're at
> > least useful, and I'm sure there are problems that can only be solved by
> > using a different analyzer when querying than when indexing...
>
> I don't disagree, I'm merely aiming to have in future Analyzer(s)
> defined in a non-Lucene specific way, possibly allowing controlled
> exceptions.
> When changing the definitions API we'll be able to reconsider if we
> want Analyzer definitions to be scoped per index like Elasticsearch
> does.
>
> But since today the Analyzer map is "global" (as in one map per
> SearchIntegrator), I don't see why we can't treat them consistently on
> both technologies and consider them global on one ES as well, i.e.
> we'd copy all definitions to each ES index definition.
> Sure that wouldn't allow to map on Hibernate Search an existing ES
> cluster which uses conflicting names on different indexes, but reverse
> engineering of existing ES clusters isn't our focus at this time;
> people badly needing it can change their names to saner choices (as
> I'd argue that name reuse for different things wouldn't be a sane
> configuration, probably it won't be common either).
>
> >> I guess that I'm missing why you'd want to force people to express
> >> that a specific Analyzer is meant to be used only at query time
> >> differently than one which is used at indexing time.
> >> If there is need to clearly make such discrimination then this should
> >> be made very clear to our users too, so I'd prefer if we could avoid
> >> introducing new concepts for people to learn.. unless there's strong
> >> need of course.
> >
> > Analyzer definitions are interpreted as either Lucene analyzers (to be
> > instantiated) or Elasticsearch analyzers (to be pushed to the ES index
> > settings) based on where they are referenced (using @Analyzer).
> > When I say an analyzer definition is query-only, it means there is an
> > @AnalyzerDef but there isn't any @Analyzer referencing it. So
> > Hibernate Search wouldn't know how to interpret it (ES or Lucene).
> > Currently, the default for those definitions is to interpret them as Lucene
> > analyzers, which leads to HSEARCH-2534: we can't have Elasticsearch
> > query-only analyzers.
>
> Ok, I understand the status quo, but is there something which prevents
> us to refine this decision and rather generate ES definitions out of
> all known Analyzer definitions, rather than just the ones being
> referred?
>
> Let's keep in mind that we're only able "translate" a very limited set
> of well-known Analyzer definitions so - while it's cool to help
> migrations were we can - our primary focus is to make sure that people
> can use any custom Analyzer configuration which they have defined
> "manually" on ES.
>
> In short, I think what matters most now is not how to define such
> analyzers as there are viable (better?) alternatives, but we need to
> make sure one can run a query with the right query-time overrides,
> especially be able to refer to an Analyzer which has been manually
> defined on ES but is possibly not known to us. (As discussed
> previously with the exception of More-Like-This Queries which will
> have to wait).
>
> Thanks,
> Sanne
>
> >
> > Maybe with this piece of information, my original message makes more sense?
> > I.e.:
> >
> > Solution 1, interpret those definitions as both Lucene and Elasticsearch
> > analyzer (there are problems with that, see my first message)
> > Solution 2, make users "reference" those definitions using a new
> > @QueryAnalyzer annotation.
> >
> >> Maybe I'm
> >> missing something, but couldn't a user simply use an additional
> >> @AnalyzerDef, so that the analyzer definition is associated to a name,
> >> and use that?
> >
> > As mentioned above, an @AnalyzerDef that is not referenced is considered as
> > a Lucene analyzer, so it's not pushed to Elasticsearch and it can't be used
> > when querying Elasticsearch.
> > The only workaround I see would be to add a dummy, always-empty field like
> > that:
> >
> > @Transient
> > @Field(name = "__dummy", analyzer = @Analyzer(definition =
> >         "myQueryOnlyAnalyzer"))
> > public String getMyQueryOnlyAnalyzerDummyField() {
> >     return null;
> > }
> >
> > Which means there will be a useless field in the schema just to make
> > Hibernate Search happy.
> >
> >> Is this issue relating to a specific user request?
> >
> > No, it's just a feature that is available for Lucene but not for
> > Elasticsearch.
> >
> >
> > Yoann Rodière <yo...@hibernate.org>
> > Hibernate NoORM Team
> >
> > On 5 January 2017 at 13:04, Sanne Grinovero <sa...@hibernate.org> wrote:
> >>
> >> Hello,
> >>
> >> I'm wondering how you'd all feel about the third solution:
> >>
> >> 3. don't do it.
> >>
> >> This depends of course how far it is blocking in practice. Maybe I'm
> >> missing something, but couldn't a user simply use an additional
> >> @AnalyzerDef, so that the analyzer definition is associated to a name,
> >> and use that?
> >>
> >> I guess that I'm missing why you'd want to force people to express
> >> that a specific Analyzer is meant to be used only at query time
> >> differently than one which is used at indexing time.
> >> If there is need to clearly make such discrimination then this should
> >> be made very clear to our users too, so I'd prefer if we could avoid
> >> introducing new concepts for people to learn.. unless there's strong
> >> need of course.
> >>
> >> Is this issue relating to a specific user request?
> >>
> >> Thanks,
> >> Sanne
> >>
> >>
> >>
> >> On 4 January 2017 at 16:00, Yoann Rodiere <yo...@hibernate.org> wrote:
> >> > Hello team,
> >> >
> >> > I'm currently working on HSEARCH-2534, "Query-only analyzer definitions
> >> > are never added to the index settings with Elasticsearch".
> >> > This issue is about using analyzers only when querying with
> >> > Elasticsearch.
> >> > It is already possible with Lucene, but not in Elasticsearch, because we
> >> > assume that any analyzer definition that is not referenced by a
> >> > @Analyzer annotation is a Lucene analyzer [1].
> >> >
> >> > To be precise, the exact place where query-only analyzers are used is in
> >> > EntityContext.overridesForField [2], and the overrides are leveraged
> >> > even with Elasticsearch, for instance in
> >> > ConnectedMultiFieldsTermQueryBuilder [3].
> >> >
> >> > I can see two solutions to the issue:
> >> >
> >> >    1. Make all analyzer definitions available for all indexing services.
> >> >    2. Allow users to define, for each entity, which analyzer definitions
> >> >    will be necessary when querying, even though the definitions are not
> >> >    used when indexing.
> >> >
> >> > Solution 1 seems quite hard to implement correctly.
> >> > First we'd have to have a different namespace for each indexing service,
> >> > but I've already implemented that much.
> >> > Second, some analyzer definitions are only valid for one indexing
> >> > service, and not for the other.
> >> > For instance, analyzer definitions using ElasticsearchTokenFilterFactory
> >> > are specific to Elasticsearch.
> >> > And Analyzer definitions using
> >> > the WhitespaceTokenizerFactory with the "rule" parameter are only valid
> >> > with embedded Lucene. And so on. To sum up, I'm not sure we can do
> >> > something smart.
> >> >
> >> > Solution 2 is easier to implement, but requires to add a bit of API: the
> >> > way for users to declare that a given analyzer definition is to be
> >> > available when querying a given entity. I would add type-level
> >> > @QueryAnalyzer(definition = "foo") and @QueryAnalyzers annotation.
> >> >
> >> > I know nobody wants to add new annotations in a minor, but right now
> >> > that seems to be the only workable solution.
> >> >
> >> > What do you think?
> >> >
> >> > [1]
> >> > https://github.com/hibernate/hibernate-search/blob/1847bd222128395056cdf6e7cfb601ceed5e40c3/engine/src/main/java/org/hibernate/search/engine/impl/ConfigContext.java#L277
> >> > [2]
> >> > https://github.com/hibernate/hibernate-search/blob/1847bd222128395056cdf6e7cfb601ceed5e40c3/engine/src/main/java/org/hibernate/search/query/dsl/EntityContext.java#L14
> >> > [3]
> >> > https://github.com/hibernate/hibernate-search/blob/1847bd222128395056cdf6e7cfb601ceed5e40c3/engine/src/main/java/org/hibernate/search/query/dsl/impl/ConnectedMultiFieldsTermQueryBuilder.java#L222
> >> >
> >> >
> >> > Yoann Rodière <yo...@hibernate.org>
> >> > Hibernate NoORM Team
> >> > _______________________________________________
> >> > hibernate-dev mailing list
> >> > hibernate-dev@lists.jboss.org
> >> > https://lists.jboss.org/mailman/listinfo/hibernate-dev

_______________________________________________
hibernate-dev mailing list
hibernate-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/hibernate-dev
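
For readers following the thread, the interpretation rule behind HSEARCH-2534 can be illustrated with a minimal, self-contained sketch. This is not Hibernate Search code: the class `AnalyzerScopeSketch`, the `Backend` enum, the `interpret` method and the analyzer name `indexAnalyzer` are all hypothetical names invented for this illustration. It only models the behavior described above: a definition is attributed to the indexing service that references it, an unreferenced definition defaults to Lucene, and an explicit query-time reference (in the spirit of the proposed @QueryAnalyzer) would be one possible way to attribute it to Elasticsearch.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical model of the rule discussed in the thread; not the real engine code.
public class AnalyzerScopeSketch {

    /** The two indexing services discussed in the thread. */
    public enum Backend { LUCENE, ELASTICSEARCH }

    /**
     * Decides which indexing service(s) each named definition belongs to.
     * indexingReferences models @Analyzer references found on indexed fields;
     * queryReferences models the proposed (hypothetical) query-only declarations.
     */
    public static Map<String, Set<Backend>> interpret(
            List<String> definitions,
            Map<String, Set<Backend>> indexingReferences,
            Map<String, Set<Backend>> queryReferences) {
        Map<String, Set<Backend>> result = new TreeMap<>();
        for (String name : definitions) {
            Set<Backend> backends = EnumSet.noneOf(Backend.class);
            backends.addAll(indexingReferences.getOrDefault(name, Set.of()));
            backends.addAll(queryReferences.getOrDefault(name, Set.of()));
            if (backends.isEmpty()) {
                // Assumed default, as described in the thread: an unreferenced
                // definition is treated as a Lucene analyzer.
                backends.add(Backend.LUCENE);
            }
            result.put(name, backends);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> definitions = List.of("indexAnalyzer", "myQueryOnlyAnalyzer");
        Map<String, Set<Backend>> indexing =
                Map.of("indexAnalyzer", EnumSet.of(Backend.ELASTICSEARCH));

        // Without any query-time declaration, the query-only definition falls back to Lucene.
        System.out.println(interpret(definitions, indexing, Map.of()));

        // With an explicit query-time declaration, it is attributed to Elasticsearch.
        Map<String, Set<Backend>> query =
                Map.of("myQueryOnlyAnalyzer", EnumSet.of(Backend.ELASTICSEARCH));
        System.out.println(interpret(definitions, indexing, query));
    }
}
```

The sketch deliberately ignores the harder part raised in the thread, namely that some definitions are only valid for one indexing service, so it says nothing about whether solution 1 is feasible.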