This conversation is starting to get a bit complex, so I'll try to organize my answers:
# Applying the same solution to HV and HSearch

@Emmanuel: right, I didn't see you were also talking about HV. I was only considering the HSearch case. I think I agree with you both: HV and HSearch are a bit different, and we certainly cannot share all of the code. Some principles could probably be shared, such as the abstraction over accessing the input type with Emmanuel's "StructureTraverser". But the traversal algorithms are probably very different. And in fact, these traversals are at the core of each project's purpose, so it may not be a good idea to try to make them "more similar".

# The requirements for HSearch

@Emmanuel: we didn't take many notes, but we did draw a diagram of the target architecture: https://drive.google.com/a/redhat.com/file/d/0B_z-zSf_hJiZamJkZFBlNG5CeDQ/view?usp=sharing
When you shared your recordings/pictures, I asked for write permission on the shared folder so I could add the diagram, but you probably haven't had time yet.

If I remember correctly, the main requirements were:

- Separate the source data traversal from the actual output format.
  - This will help when implementing different indexing services (Elasticsearch, Solr): we don't want to assume anything about the target format.
- Make the implementation of JGroups/JMS as simple as possible.
  - In this case, we don't really want to build documents; we just want to transform the entity into a serializable object, and reduce the information to transmit over the network to a minimum.
  - Ideally, we'd just want to "record" the output of the traversal, transmit this recording to the master node, and let the master node replay it to build a document. This would have the added benefit of not requiring any knowledge of the underlying technology (Lucene/ES/Solr) on the client side.
- Requirements on the "mapping tree" (I'm not absolutely sure about these; Sanne may want to clarify):
  - "depth" and the navigational graph are to be pre-computed: the tree of valid fields and options must be known in advance.
  - The mapping tree should be immutable, thread-safe, and easy to inspect/walk.
- And on my end (I think Sanne shared this concern, but I may be wrong): query metadata as little as possible at runtime.

# More info on my snippet

@Gunnar: you asked for some client code, but I'm not sure it'll be very enlightening. The only client-facing interface (as far as document building goes) is EntityDocumentConverter. So, the parts of the application that need to convert an entity to a document will do something like this:

    EntityDocumentConverter<E, D> converter = indexManager.getEntityDocumentConverter();
    D document = converter.convert( entity );
    indexManager.performOperation( new AddOperation( document ) );

The idea behind this was to make the runtime code as simple as possible, and move the complexity to the bootstrapping. Basically, when you call converter.convert, it delegates to ValueProcessors, which extract information from the entity and inject it into the DocumentBuilder. What is extracted, and how to extract it, is completely up to the ValueProcessor.

This means that, when bootstrapping, a tree of ValueProcessors is built according to the metadata. For instance, when a @Field is encountered, we build an appropriate ValueProcessor (potentially nesting multiple processors if we want to keep concerns separate: one for extracting the property's value, one for transforming this value using a bridge). When an @IndexedEmbedded is encountered, we build a different ValueProcessor. And so on.
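To make the division of responsibilities concrete, here is a minimal sketch of the two contracts involved. The names (ValueProcessor, DocumentBuilder, nest, addField) are the ones from my snippet, but the signatures below are illustrative guesses; the gist has the authoritative versions.

    import java.util.List;

    /** Abstracts over the output format (Lucene document, Elasticsearch JSON, ...). */
    public interface DocumentBuilder {
        /** Adds a field value in the current nesting context. */
        void addField(String name, Object value);
        /** Opens a nested context (e.g. for an @IndexedEmbedded) and returns its builder. */
        DocumentBuilder nest(String name);
    }

    /** Extracts information from the source and injects it into the DocumentBuilder. */
    public interface ValueProcessor {
        void process(Object source, DocumentBuilder documentBuilder);
    }

    /** Delegates to the processors built from the metadata at bootstrap time. */
    public class CompositeProcessor implements ValueProcessor {
        private final List<ValueProcessor> delegates;

        public CompositeProcessor(List<ValueProcessor> delegates) {
            this.delegates = delegates;
        }

        @Override
        public void process(Object source, DocumentBuilder documentBuilder) {
            for ( ValueProcessor delegate : delegates ) {
                delegate.process( source, documentBuilder );
            }
        }
    }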
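Incidentally, this is also how I'd address the JGroups/JMS requirement above: a DocumentBuilder implementation that merely records the calls it receives would let a slave node serialize the recording, and the master node replay it against the real, backend-specific builder. A rough sketch, assuming field values are serializable once extracted; RecordingDocumentBuilder and its Event class are hypothetical names, not part of my snippet.

    import java.io.Serializable;
    import java.util.ArrayList;
    import java.util.List;

    /** Hypothetical: records nest/addField calls so they can be replayed on the master node. */
    public class RecordingDocumentBuilder implements DocumentBuilder {

        /** One recorded call; the path identifies the nesting context it applies to. */
        public static class Event implements Serializable {
            final List<String> path;
            final String fieldName;
            final Serializable value;

            Event(List<String> path, String fieldName, Serializable value) {
                this.path = path;
                this.fieldName = fieldName;
                this.value = value;
            }
        }

        private final List<String> path;
        private final List<Event> events;

        public RecordingDocumentBuilder() {
            this( new ArrayList<>(), new ArrayList<>() );
        }

        private RecordingDocumentBuilder(List<String> path, List<Event> events) {
            this.path = path;
            this.events = events;
        }

        @Override
        public void addField(String name, Object value) {
            // Assumes the value is already serializable, i.e. bridges ran on the slave side
            events.add( new Event( new ArrayList<>( path ), name, (Serializable) value ) );
        }

        @Override
        public DocumentBuilder nest(String name) {
            // The nested builder shares the event list, so all calls end up in one recording
            List<String> nestedPath = new ArrayList<>( path );
            nestedPath.add( name );
            return new RecordingDocumentBuilder( nestedPath, events );
        }

        /** The serializable payload to transmit over JGroups/JMS. */
        public List<Event> getRecording() {
            return events;
        }

        /** Replays a recording against the real, backend-specific builder on the master node. */
        public static void replay(List<Event> recording, DocumentBuilder target) {
            for ( Event event : recording ) {
                DocumentBuilder current = target;
                for ( String name : event.path ) {
                    // Simplification: assumes nest() with the same name reopens the same context
                    current = current.nest( name );
                }
                current.addField( event.fieldName, event.value );
            }
        }
    }

The point being that the payload contains no Lucene/ES/Solr types at all; only the master node needs to know about the actual backend.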
Here is an (admittedly very simple) example of what it'd look like in the metadata processor:

    List<ValueProcessor> collectedProcessors = new ArrayList<>();
    for ( XProperty property : properties ) {
        Field fieldAnnotation = property.getAnnotation( Field.class );
        if ( fieldAnnotation != null ) {
            ValueProcessor fieldBridgeProcessor =
                    createFieldBridgeProcessor( property.getType(), fieldAnnotation );
            // The value of the property will be passed to the fieldBridgeProcessor at runtime
            ValueProcessor propertyProcessor = new JavaPropertyProcessor( property, fieldBridgeProcessor );
            collectedProcessors.add( propertyProcessor );
        }
    }
    ValueProcessor rootProcessor = new CompositeProcessor( collectedProcessors );
    return new EntityDocumentConverter( rootProcessor, indexManagerType.getDocumentBuilder() );

The actual code will obviously be more complex, first because we need to support many more features than just @Field, but also because the createFieldBridgeProcessor() method needs to somehow build backend-specific metadata based on the nature of the field. But I think the snippet captures the spirit.

# Summary

Thinking about it a little, there's a difference of focus between our solutions:

1. Emmanuel's solution focuses on abstracting over the input data format (thanks to StructureTraverser), assuming the traversal algorithm will be re-implemented for each output type.
2. My solution focuses on abstracting over the output data format (thanks to DocumentBuilder), assuming the traversal algorithm will be re-implemented for each input type using different ValueProcessors.
3. Gunnar's solution seems to focus on abstracting over the output data format, reimplementing the traversal algorithm for each input type using a different TreeTraversalSequence.

Solutions 1 and 2 are, in my opinion, compatible. We could have very generic ValueProcessors that would use a StructureTraverser to extract data and a DocumentBuilder to inject it into a document. I'm not sure it is necessary, because I expect metadata to be defined differently depending on the input type, and hence the traversal algorithms to differ slightly, but I think we could do it.

About solution 3: TreeTraversalSequence seems to implement the traversal algorithm, while TreeTraversalEventConsumer abstracts over the output format and TreeTraversalEvent abstracts over the information being transferred. I think the general principles are more or less equivalent to solution 2. The main differences are:

- How the context around the data being transferred is propagated. In solution 2, we pass the context progressively by making calls to the DocumentBuilder (documentBuilder.nest(...), documentBuilder.addField(...)). In solution 3, the context is explicitly modeled as a TreeTraversalEvent.
- How metadata is looked up. In solution 2, the metadata is baked into the objects implementing the traversal algorithm, so there is no lookup to speak of at runtime. In solution 3, there is a metadata lookup for each node in the tree. Maybe there's a performance concern here, but I don't know enough about this to give a definitive answer. In the end it's probably more a matter of taste.

Yoann Rodière <yo...@hibernate.org>
Hibernate NoORM Team

On 7 February 2017 at 11:17, Gunnar Morling <gun...@hibernate.org> wrote:
> Emmanuel,
>
> In your PoC, how would a complete tree-like structure be traversed?
> It's not clear to me who is driving StructureTraverser, i.e. which
> component will call processSubstructureInContainer() et al. when
> traversing an entire tree.
>
> @Yoann, maybe you can add a usage example similar to Emmanuel's? You
> have a lot of framework code, but I'm not sure about how it'd be used.
>
> For Hibernate Search, the traversal pattern I implemented for the
> ScenicView PoC may be of interest. Its general idea is to represent a
> tree traversal as a sequence of events which a traverser
> implementation receives and can act on, e.g. to create a corresponding
> de-normalized structure, Lucene document etc. The retrieval of values
> and associated objects happens lazily as the traverser
> ("TreeTraversalEventConsumer" in my lingo) pulls events from the
> sequence, similar to what some XML parsers do.
>
> The main contract can be found at [1]. There are two event sequence
> implementations, one based on Hibernate's meta-model [2] and one for
> java.util.Map [3]. An example event consumer implementation which
> creates MongoDB documents can be found at [4].
>
> As said, I think it'd fit nicely for Hibernate Search; for HV I'm not
> so sure, the reason being that the order of traversal may vary,
> depending on the defined validation groups and sequences. Sometimes we
> need to go "depth first". I've been contemplating employing an
> event-like approach as described above for HV, but it may look
> different than the one used for HSEARCH.
>
> --Gunnar
>
> [1] https://github.com/gunnarmorling/scenicview-mvp/blob/master/core/src/main/java/org/hibernate/scenicview/spi/backend/model/TreeTraversalSequence.java
> [2] https://github.com/gunnarmorling/scenicview-mvp/blob/master/core/src/main/java/org/hibernate/scenicview/internal/model/EntityStateBasedTreeTraversalSequence.java
> [3] https://github.com/gunnarmorling/scenicview-mvp/blob/master/core/src/test/java/org/hibernate/scenicview/test/traversal/MapTreeTraversalSequence.java
> [4] https://github.com/gunnarmorling/scenicview-mvp/blob/master/mongodb/src/main/java/org/hibernate/scenicview/mongodb/internal/MongoDbDenormalizationBackend.java#L91..L128
>
>
> 2017-02-06 16:49 GMT+01:00 Emmanuel Bernard <emman...@hibernate.org>:
> > Your prototype is very Hibernate Search-tainted. I wonder how or whether
> > we want it reusable across Hibernate Validator, Search and possibly more.
> >
> > Have you captured somewhere the discussion about the new document
> > builder, so I could get a better grip of what's at stake?
> > Would this reversal of logic also be embraced by Hibernate Validator?
> > There are runtime decisions made in HV during traversal that make me
> > doubt that it would be as pertinent.
> >
> >
> >> On 30 Jan 2017, at 11:21, Yoann Rodiere <yrodi...@redhat.com> wrote:
> >>
> >> Hi,
> >>
> >> I did the same this weekend, and adapted your work to match the bigger
> >> picture of what we discussed on Friday.
> >> Basically, the "StructureTraverser" is now called "ValueProcessor",
> >> because it's not responsible for exposing the internals of a structure
> >> anymore, but only for processing a structure according to previously
> >> defined metadata, passing the output to the "DocumentContext". I think
> >> it's the second option you suggested. It makes sense in my opinion,
> >> since metadata will be defined differently for different source types
> >> (POJO, JSON, ...).
> >> This design allows in particular what Sanne suggested: when
> >> bootstrapping, we can build some kind of "walker" (a composition of
> >> "ValueProcessors") from the metadata, and avoid metadata lookups at runtime.
> >>
> >> The snippet is there: https://gist.github.com/yrodiere/9ff8fe8a8c7f59c1a051b36db20fbd4d
> >>
> >> I'm sure it'll have to be refined to address additional constraints,
> >> but in its current state it seems to address all of our requirements.
> >>
> >> Yoann Rodière <yrodi...@redhat.com>
> >> Software Engineer
> >> Red Hat / Hibernate NoORM Team
> >>
> >> On 27 January 2017 at 18:23, Emmanuel Bernard <emman...@hibernate.org> wrote:
> >> I took the flight home to play with free-form, and specifically with how
> >> we would retrieve data from the free-form structure.
> >> By free-form I mean non-POJO, but they will have a schema (not expressed here).
> >>
> >> https://github.com/emmanuelbernard/hibernate-search/commit/0bd3fbab137bdad81bfa5b9934063792a050f537
> >>
> >> And in particular:
> >> https://github.com/emmanuelbernard/hibernate-search/blob/freeform/freeform/src/main/java/org/hibernate/freeform/StructureTraverser.java
> >> https://github.com/emmanuelbernard/hibernate-search/blob/freeform/freeform/src/main/java/org/hibernate/freeform/pojo/impl/PojoStructureTraverser.java
> >>
> >> It probably does not compile, I could not make the build work.
> >>
> >> I figured it was important to dump this raw thinking because it will
> >> influence and will be influenced by the redesign of the DocumentBuilder
> >> of Hibernate Search.
> >>
> >> There are several options for traversing a free-form structure:
> >> - Expose the traversing API as a holder to navigate all properties per
> >> structure and sub-structure. This is what the prototype shows. Cached
> >> data needs to be accessed via a hashmap get or other lookup. Metadata
> >> and the traversed structure will be navigated in parallel.
> >> - Expose a structure that is specialized to a single property or
> >> container-unwrapping aspect. These structures would be spread across
> >> and embedded in the metadata.
> >>
> >> Another angle:
> >> - create a traversable object per payload to carry it (sharing metadata
> >> info per type)
> >> - have a stateless traversable object that is provided the payload for
> >> each access
> >>
> >> The former seems better as it does not create a traversable object per
> >> object navigated.
> >> The latter is better for payloads that need parsing or are better at
> >> sequential access, since state could be cached.
> >>
> >> We need to discuss that and know where DocumentBuilder is going in order
> >> to properly design this API.
> >>
> >> Emmanuel

_______________________________________________
hibernate-dev mailing list
hibernate-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/hibernate-dev