Hi all,

among the various plans for Hibernate Search 6, one of the reasons we did the Elasticsearch integration early, as an experimental feature, was to get a clearer picture of what's going to be needed in terms of internal cleanup.
Our DocumentBuilder is ancient, and several new features have been added since it was a well-designed, simple piece of code. So, while we have already discussed several wishes, I have now started a document to try to get all our thoughts to converge. For convenience, I'm pasting the current content below.

 - https://docs.google.com/document/d/1JwKanIRHVTw1LvCdLGyY6EKuyvvQn6gvlkmPGqDjFxw/edit?usp=sharing

I'm not giving comment permissions to the world; anyone who's interested, please answer here or drop me a note. I'm happy to give comment permissions to well-intentioned people.

N.B. The document will very likely evolve beyond this email; as it is now, it's an initial brain dump. For example, I haven't thought about the ES capability of nesting structures yet.

Thanks,
Sanne

==== Pasting from document =====

DocumentBuilder and FieldBridge requirements for Hibernate Search 6.0

* Never import Lucene types; ideally make Lucene a dependency of the Lucene backend only.
  * In a modular world, don’t expect end user code to be able to load Lucene class definitions.
* Efficient lookup “field name” -> field mappings and its indexing options; not least:
  * Cardinality {always one, optional one, one-many, zero-many}
    * Needed for validation of queries, e.g. query for null can use an “exists” query only in some of these cases, vs needing a null token.
  * projectable alone vs part of multiple fields relating to a single property (allow projection of Two-Way bridges using multiple fields)
  * Might need “group name”.“sub field name” for groups and index time joins
* IndexedEmbedded
  * “depth” and navigational graph to be pre-computed: tree of valid fields and options to be known in advance.
  * Navigating into a relation must deal with possibly navigating into subclasses of the relation type: http://stackoverflow.com/questions/39516355/indexing-a-interface-in-hibernate-search
* Immutable, threadsafe, easy to inspect/walk mapping tree
  * Built and validated at bootstrap of the IndexManager
    * can’t be updated after that
  * Field names and custom FieldType not to be allocated at runtime
  * Efficient to validate Queries
* Allow efficient production of an Entity instance into:
  * Elasticsearch “document”
  * Lucene “document”
  * An efficient to serialize “document”
    * If it gets easy enough, make our own simple serialization?
  * Extensible to other backends e.g. Apache Solr in the future (a Walkable SPI)
* Pretty printed text to dump the “schema” we’re using from a given domain model
* Validations and comparisons
  * Allow to validate compatibility with an Elasticsearch schema
  * Allow to validate compatibility with a Lucene schema
* Walking tree to map to ORM loading strategies
  * allow to predict which paths we’ll need to initialize (database load) for efficient batch loading (graph initialization)
  * Allow for accurate Dirty-checking to skip indexing operations
  * Allow generation of better MassIndexer queries (fetch join some of the relations?)
* ID handling: specific care
  * ad-hoc encoders for ID
  * stricter validation (e.g. cardinality, DocValues, Two-Way fieldbridges)
  * Support multi-term IDs (composite keys, @IdClass)
  * Have different “index id strategies” to have them apply different logic, i.e. “delete by term” and “update by term” only apply on single-term IDs.
  * ID handling strategy might need to take into account if the index is shared among types.
* Decoupling from Java “Class” as entity-type identifiers
* Sharding:
  * Allow reuse of the same schema for indexes using the same
  * Allow reuse of some elements for indexes sharing such elements
* Properties / Field relations
  * Handle one property -> multiple Fields as a bidirectional relation.
  * Disallow one index field being target of different properties and/or bridges?
* Representation of “Join points” and Groups:
  * allow future production of Lucene documents with index-time join (write in groups)
  * allow efficient Query validation for both index-time and query-time join options
* Composable
  * @ClassBridge, @Field annotations to both contribute to field definitions
  * a @ClassBridge of an @IndexedEmbedded to both contribute to the embedded field definitions
  * Include type-bound user custom Bridges (see BridgeProvider) in the compositions
  * Both @ClassBridge and custom Bridges need to trigger on polymorphic relations as well
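
To make a few of the points above more concrete (immutable mapping built once at bootstrap, “field name” -> mapping lookup, cardinality-driven query validation, schema dump, no Lucene imports), here is a rough sketch of what such a structure could look like. Everything in it is hypothetical: `MappingSketch`, `FieldMapping`, `IndexMapping` and the field names are made up for illustration and are not existing Hibernate Search types. The rule deciding when a null query can use an “exists” query is also only an assumption to show where such a check could live; the real rule still needs to be worked out.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: none of these types exist in Hibernate Search. */
final class MappingSketch {

    /** The four cardinalities listed in the requirements. */
    enum Cardinality { ALWAYS_ONE, OPTIONAL_ONE, ONE_MANY, ZERO_MANY }

    /** Immutable metadata for one index field; no Lucene types involved. */
    static final class FieldMapping {
        final String name;
        final Cardinality cardinality;
        final boolean projectable;

        FieldMapping(String name, Cardinality cardinality, boolean projectable) {
            this.name = name;
            this.cardinality = cardinality;
            this.projectable = projectable;
        }

        /**
         * ASSUMPTION for illustration: only an optional single value can map
         * "is null" onto a missing-field ("exists") check; every other
         * cardinality is treated as needing a null token.
         */
        boolean nullQueryCanUseExists() {
            return cardinality == Cardinality.OPTIONAL_ONE;
        }
    }

    /** Immutable "field name" -> mapping lookup, built once at bootstrap. */
    static final class IndexMapping {
        private final Map<String, FieldMapping> fields;

        IndexMapping(Map<String, FieldMapping> source) {
            // Defensive copy, then wrap: later changes to 'source' are not seen.
            this.fields = Collections.unmodifiableMap(new LinkedHashMap<>(source));
        }

        /** Looks up a field, rejecting names that are not part of the schema. */
        FieldMapping field(String name) {
            FieldMapping mapping = fields.get(name);
            if (mapping == null) {
                throw new IllegalArgumentException("Unknown field: " + name);
            }
            return mapping;
        }

        /** Pretty-prints the schema, one field per line, in insertion order. */
        String dumpSchema() {
            StringBuilder sb = new StringBuilder();
            for (FieldMapping f : fields.values()) {
                sb.append(f.name).append(" [").append(f.cardinality)
                  .append(f.projectable ? ", projectable" : "").append("]\n");
            }
            return sb.toString();
        }
    }

    /** Builds a small example mapping with made-up field names. */
    static IndexMapping example() {
        Map<String, FieldMapping> fields = new LinkedHashMap<>();
        fields.put("title", new FieldMapping("title", Cardinality.ALWAYS_ONE, true));
        fields.put("subtitle", new FieldMapping("subtitle", Cardinality.OPTIONAL_ONE, false));
        fields.put("tags", new FieldMapping("tags", Cardinality.ZERO_MANY, false));
        return new IndexMapping(fields);
    }

    public static void main(String[] args) {
        IndexMapping mapping = example();
        System.out.print(mapping.dumpSchema());
        System.out.println(mapping.field("subtitle").nullQueryCanUseExists());
    }
}
```

The point of the sketch is only that query validation and the schema dump can both be answered from the same read-only structure, which any backend could walk without seeing Lucene classes.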