In the context of implementing Elasticsearch support for Hibernate Search, there's a recurring need to transform the domain model to the "Document" representation using a strategy which depends on the storage choice, i.e. Lucene vs Elasticsearch.
For example, Guillaume, working on HSEARCH-2067, needs to associate the entity's document builder with a FieldBridge choice which depends on whether the output document will be indexed in ES rather than Lucene. The choice of FieldBridge implementation affects the DocumentBuilder bound to each type; this implies that we're "tainting" the DocumentBuilder for all instances of a type.

The abstraction of "IndexManager" is meant to initialize and manage an *index*, but remember that there's no guarantee that a single type is bound to a single index (and so to a single IndexManager).

 - We have the case of a single type being spread out on multiple indexes, using Sharding.
 - We also have the opposite: multiple different types sharing an index.
 - Subtypes of indexed types can opt to be indexed in a different index.
 - All of the above can be mixed freely, as there's a clear distinction between type (identified by a Class) and index (identified by a String).

[I'm not stating that the above facts are necessarily all required, just that they are currently supported. So we could in theory discuss taking away some of this flexibility now, but implementing such restrictions would need to wait for version 6.0.]

When a Query is run on a type A, we're transparently running the query on all indexes or shards containing A, and also on its indexed subtypes living in different indexes. We're also transparently filtering out incompatible types, if any of these sub-indexes are shared with other types. We also allow running a FullTextQuery on multiple, unrelated types, and the same rules apply.

To perform such a Query on multiple indexes, the trick currently used with Lucene-based backends is the usage of MultiReaders: we wrap multiple indexes and present them as one index reader to the query engine. It's a "unified view" on which the query is performed.
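To make the type vs. index distinction concrete, here is a minimal sketch in plain Java of how the set of indexes to search could be resolved for a queried type. Everything in it is an assumption made for illustration: TypeIndexRegistry, its methods and the Book/Novel/Author types are hypothetical names, not existing Hibernate Search API, and the real engine does considerably more than this.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical domain types, used only to illustrate the rules above.
class Book {}
class Novel extends Book {}
class Author {}

// Hypothetical registry (not Hibernate Search API): records which index
// names, shards included, each indexed type is written to.
final class TypeIndexRegistry {
    private final Map<Class<?>, Set<String>> indexesByType = new LinkedHashMap<>();

    void bind(Class<?> type, String... indexNames) {
        indexesByType.computeIfAbsent(type, t -> new TreeSet<>())
                .addAll(Arrays.asList(indexNames));
    }

    // Indexes to search when querying 'queried': its own indexes and shards,
    // plus the indexes of any indexed subtype.
    Set<String> indexesFor(Class<?> queried) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<Class<?>, Set<String>> e : indexesByType.entrySet()) {
            if (queried.isAssignableFrom(e.getKey())) {
                result.addAll(e.getValue());
            }
        }
        return result;
    }

    // Unrelated types that share one of those indexes; their documents
    // have to be filtered out of the results.
    Set<Class<?>> typesToFilterOut(Class<?> queried) {
        Set<String> targets = indexesFor(queried);
        Set<Class<?>> others = new LinkedHashSet<>();
        for (Map.Entry<Class<?>, Set<String>> e : indexesByType.entrySet()) {
            boolean unrelated = !queried.isAssignableFrom(e.getKey());
            if (unrelated && !Collections.disjoint(targets, e.getValue())) {
                others.add(e.getKey());
            }
        }
        return others;
    }
}

final class RegistryDemo {
    public static void main(String[] args) {
        TypeIndexRegistry registry = new TypeIndexRegistry();
        registry.bind(Book.class, "books.0", "books.1"); // one type, two shards
        registry.bind(Novel.class, "fiction");           // subtype in its own index
        registry.bind(Author.class, "fiction");          // unrelated type sharing it
        System.out.println(registry.indexesFor(Book.class));
        System.out.println(registry.typesToFilterOut(Book.class));
    }
}
```

In this sketch a query on Book has to cover both shards plus the index of its subtype, and Author documents need filtering because they share that index with Novel.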
For obvious reasons we cannot wrap a MultiReader across both Lucene indexes and Elasticsearch's query capabilities (or maybe we could eventually, but that's a whole lot of R&D to be done for questionable usefulness).

So, we need to introduce a new concept: something like "index families", to properly abstract the boundaries, as clearly some indexes work better together with indexes of the same kind than with indexes of another kind. Stuff indexed in embedded Lucene would belong to a family A, stuff in the Elasticsearch cluster would be family B, and I guess one might have a secondary, independent Elasticsearch cluster which would need to be in a different family C, or eventually a Solr cluster in yet another separate family.

Such an "index family" would give us:

 - a place where the connection settings and connection pools are handled for Elasticsearch
 - clear boundaries about which types can be queried "as one": only the types in the same family. Subtypes might be allowed a different index, but it must live in the same family. Same for Sharding.
 - a reasonable place to query for which "kind of storage" is being used for a specific type
 - An Analyzer might exist only within a family (defined on one ES cluster, not on the other)
 - We have a long-standing issue with Similarity: you can only have one in a group of indexes, but the group concept is undefined (and only loosely validatable)
 - An "index family" could have a type, and therefore define what kind of FieldBridge(s) need to be generated

I'm not saying that this is all blocking for 5.6. My proposal is to see if we agree on such a design as a longer term objective (set some foundation in 5.7, finalize for 6).
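As a rough illustration of what such a family could look like to internal components, here is a minimal sketch in plain Java. All names (StorageKind, IndexFamily, FamilyRegistry and their methods) are hypothetical and invented for this example; none of them exist in Hibernate Search, and the sketch covers only the query boundary and the "kind of storage" lookup, not connections, Analyzers or Similarity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical: the kind of storage backing a family.
enum StorageKind { LUCENE_EMBEDDED, ELASTICSEARCH }

// Hypothetical: a named group of indexes living in the same storage.
final class IndexFamily {
    final String name;
    final StorageKind kind;
    private final Set<String> indexNames = new TreeSet<>();

    IndexFamily(String name, StorageKind kind) {
        this.name = name;
        this.kind = kind;
    }

    IndexFamily addIndex(String indexName) {
        indexNames.add(indexName);
        return this;
    }

    boolean owns(String indexName) {
        return indexNames.contains(indexName);
    }
}

// Hypothetical: answers "which family does this type belong to?" and
// rejects anything that would cross a family boundary.
final class FamilyRegistry {
    private final List<IndexFamily> families = new ArrayList<>();
    private final Map<Class<?>, IndexFamily> familyByType = new HashMap<>();

    void register(IndexFamily family) {
        families.add(family);
    }

    // Binds a type to an index; a type spread over several indexes
    // (e.g. shards) must keep all of them in one family.
    void bind(Class<?> type, String indexName) {
        IndexFamily family = families.stream()
                .filter(f -> f.owns(indexName))
                .findFirst()
                .orElseThrow(() -> new IllegalArgumentException("Unknown index: " + indexName));
        IndexFamily previous = familyByType.putIfAbsent(type, family);
        if (previous != null && previous != family) {
            throw new IllegalStateException(type.getName() + " would span two families");
        }
    }

    IndexFamily familyOf(Class<?> type) {
        IndexFamily family = familyByType.get(type);
        if (family == null) {
            throw new IllegalArgumentException("Not an indexed type: " + type.getName());
        }
        return family;
    }

    // Types can be queried "as one" only when they share a family.
    IndexFamily commonFamily(Class<?>... types) {
        if (types.length == 0) {
            throw new IllegalArgumentException("No types given");
        }
        IndexFamily common = familyOf(types[0]);
        for (Class<?> type : types) {
            if (familyOf(type) != common) {
                throw new IllegalArgumentException("Types belong to different families");
            }
        }
        return common;
    }
}
```

A component needing to pick a FieldBridge could then ask something like registry.familyOf(type).kind rather than inspecting individual IndexManagers; whether that is the right shape for the metadata API is exactly what's open for discussion.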
For 5.6 I'd be happy enough to essentially document that there's only one family allowed, which allows us to cut some corners like:

 - single set of Analyzers to validate
 - know that the Search instance is using ES exclusively, or Lucene exclusively
 - know that all IndexManagers are connected to the same set of ES nodes (if using ES)

So not much changing. I just hope this helps in shaping our internals with an eye on the next step, and makes sure that the listed limitations, which we've been accepting already, can be clearly documented.

It would be great to already have the basics for index families in place, for example to define the proper API to read metadata for a type (like Guillaume is needing), and to clean up some things, such as making the Similarity definition clearly associated with such a thing.

Naming: index family? index groups?

Not sure if there's a need to add anything to the configuration properties; for now it could simply reflect our interpretation of the existing configuration, yet expose useful and clean metadata to the internal components which need it.

Thanks for any comments!
Sanne
_______________________________________________
hibernate-dev mailing list
hibernate-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/hibernate-dev