Hi Hardy, great proposal for the meta-data API. I've added some comments inline.
--Gunnar

2013/5/30 Hardy Ferentschik <ha...@hibernate.org>

> Gee, that's an email ;-)
> Before getting too much into it I think it would be useful to talk about
> what I am actually doing.
> I am trying to expose a meta data API for Search which allows users to
> determine which entities are indexed and which fields are available for
> each entity. I am trying to take a similar approach to Bean Validation,
> where all metadata is exposed via descriptors. The entry point into the
> API is the SearchFactory. I am basically thinking about something like
> this (feedback welcome):
>
> /**
>  * Top level descriptor of the metadata API. Giving access to the indexing
>  * information for a single entity.
>  *
>  * @author Hardy Ferentschik
>  */
> public interface IndexedEntityDescriptor {

I find the name "IndexedEntityDescriptor" in conjunction with isIndexed() potentially returning "false" a bit irritating. Maybe just EntityDescriptor? Or SearchableEntityDescriptor?

>     /**
>      * @return Returns {@code true} if the entity for this descriptor
>      * is indexed, {@code false} otherwise
>      */
>     boolean isIndexed();

Maybe return an enum if this can potentially be more than a simple yes/no? I don't know how likely that is, but an enum would leave room for the API to evolve.

>     /**
>      * @return Returns the class boost value, 1 being the default.
>      */
>     float getClassBoost();
>
>     /**
>      * @return Returns the names of the indexes instances of the
>      * entity are indexed into. Generally this will be just one index,
>      * however, when sharding is applied multiple indexes per entity
>      * can be used.
>      */
>     Set<String> getIndexNames();

Would something like Set<IndexDescriptor> getIndexes() make sense?

>     /**
>      * @return Returns a set of {@code FieldDescriptor}s for the
>      * indexed fields of the entity.
>      */
>     // TODO does this include the id field descriptor or should that
>     // be a separate descriptor?
>

At least for my case I think it would be easier if this contained all field descriptors so I can handle them uniformly. Maybe FieldDescriptor#isId() or, if there are more id-specific things, something like this could be added:

    if ( fieldDescriptor.getType() == DescriptorType.ID ) {
        fieldDescriptor.as( IdDescriptor.class ).somethingIdSpecific();
    }

>     // TODO should OBJECT_CLASS be considered?
>     Set<FieldDescriptor> getIndexedFields();

Could you also add

    FieldDescriptor getIndexedField(String fieldName);

> }
>
> /**
>  * Metadata related to a single indexed field.
>  *
>  * @author Hardy Ferentschik
>  */
> public interface FieldDescriptor {
>     /**
>      * Returns the Lucene {@code Document} field name for this indexed
>      * property.
>      *
>      * @return Returns the field name for this index property
>      */
>     String getFieldName();

I'd call it just "getName()", not repeating the type's name.

>
>     /**
>      * @return Returns an {@code Analyze} enum instance defining the
>      * type of analyzing applied to this field.
>      */
>     Analyze getAnalyzeType();
>
>     /**
>      * @return Returns a {@code Store} enum instance defining whether
>      * the index value is stored in the index itself.
>      */
>     Store getStoreType();
>
>     /**
>      * @return Returns a {@code TermVector} enum instance defining
>      * whether and how term vectors are stored for this field
>      */
>     TermVector getTermVectorType();
>
>
>     /**
>      * @return Returns a {@code Norms} enum instance defining whether
>      * and how norms are stored for this field
>      */
>     Norms getNormType();
>
>     /**
>      * @return Returns the boost value for this field. 1 being the
>      * default value.
>      */
>     float getBoost();
>
>     /**
>      * @return Returns the string used to index {@code null} values.
>      * {@code null} in case null values are not indexed.
>      */
>     String nullIndexedAs();
>
>     /**
>      * @return Returns the field bridge instance used to convert the
>      * property value into a string based field value
>      */
>     FieldBridge getFieldBridge();
>
>     /**
>      * @return Returns the analyzer used for this field, {@code null}
>      * if the field is not analyzed
>      */
>     Analyzer getAnalyzer();
> }
>
> On top of this I am planning to add (addressing HSEARCH-903):
>
> public interface FieldNameReportingBridge {
>     Iterable<String> getGeneratedFieldNames(String baseFieldName);
> }

Wouldn't a Set be better? Returning Iterable makes it harder for users (e.g. no contains()) and also hides set vs. list semantics.

> The latter I need to allow custom bridges to report which fields they add.
> Most of the information I need to implement all this is in
> AbstractDocumentBuilder.PropertiesMetadata. The plan so far was to
> extract the information from there and, while working on this, make
> PropertiesMetadata a proper object (instead of the parallel arrays thingy).

+1

> Maybe some other minor refactorings along the way. I was not going to
> touch the processing of annotations for now. As discussed, there we
> would need yet another level of abstraction (similar to EntitySource in
> ORM or BeanConfiguration in HV). Something which can be populated by
> either annotation processing (be it Jandex or reflection) or by the
> programmatic API. Different story though.
>
> From what I can tell I don't need a Visitor pattern for what I have
> planned to do so far. If you think I am on the wrong track let me know
> and let me see the light.
>
> One thing I was wondering about after your email, however, was whether
> the API needs to provide information on which field/getter/class is
> responsible for creating a given Lucene Document Field. Do we have a
> use case for that?
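
Coming back to FieldNameReportingBridge: below is a minimal sketch of what a Set-returning variant could look like, only to illustrate the contains() point. CoordinatesBridge and the ".latitude"/".longitude" suffixes are made up for this example, and a real bridge would presumably also implement FieldBridge, which is left out here to keep the snippet self-contained.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Variant of the proposed interface that returns a Set instead of an Iterable.
interface FieldNameReportingBridge {
    Set<String> getGeneratedFieldNames(String baseFieldName);
}

// Hypothetical custom bridge that writes one value into two index fields.
// (Assumption for illustration only; not part of the proposal.)
class CoordinatesBridge implements FieldNameReportingBridge {
    @Override
    public Set<String> getGeneratedFieldNames(String baseFieldName) {
        // LinkedHashSet keeps a stable iteration order while still being a Set.
        Set<String> names = new LinkedHashSet<>();
        names.add(baseFieldName + ".latitude");
        names.add(baseFieldName + ".longitude");
        // Read-only view, in line with the metadata being a read-only structure.
        return Collections.unmodifiableSet(names);
    }
}

class BridgeDemo {
    public static void main(String[] args) {
        FieldNameReportingBridge bridge = new CoordinatesBridge();
        Set<String> names = bridge.getGeneratedFieldNames("location");
        // Membership checks work directly, which an Iterable would not offer.
        System.out.println(names.contains("location.latitude"));
        System.out.println(names.contains("location.altitude"));
    }
}
```

With a Set, a caller validating a query field name can ask contains() directly instead of iterating, and the return type itself says that the reported names are unique.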
>
>
>
> On 29 Jan 2013, at 6:39 PM, Sanne Grinovero <sa...@hibernate.org> wrote:
>
> > We're starting a series of refactorings in Hibernate Search to improve
> > how we handle the entity mapping to the index; to summarize goals:
> >
> > 1# Expose the Metadata as API
> >
> > We need to expose it because:
> > a - OGM needs to be able to read this metadata to produce appropriate
> > queries
>
> @gunnar, does the API above address your needs?

Yes, I think so, as far as I'm aware at the moment.

>
> > Personally I think we end up needing this just as an SPI: that might
> > be good for cases {a,b}, and I have an alternative proposal for {c}
> > described below.
>
> -1 why SPI. I think this is a very general purpose API useful for any
> users. For example, you could imagine building an auto-suggesting query
> field which makes suggestions on which fields you can search on (a
> little like the Jira queries). In this case you could get the available
> fields via this API. Just to mention one use case.
>
> > However we expose it, I think we agree this should be a read-only
> > structure built as a second phase after the model is consumed from
> > (annotations / programmatic API / jandex / auto-generated by OGM).
>
> +1
>
> > It
> > would also be good to keep it "minimal" in terms of memory cost, so to
> > either:
> > - drop references to the source structure
> > - not holding on it at all, building the Metadata on demand (!)
> > (Assuming we can build it from a more obscure internal representation
> > I'll describe next).
>
> Given that I am going to build it from required runtime information it
> could for sure be lazily loaded. However, right now I think I will just
> go for the straightforward approach.
>
> > 3# MutableSearchFactory
> >
> > Let's not forget we also have a MutableSearchFactory to maintain: new
> > entities could be added at any time so if we drop the original
> > metadata we need to be able to build a new (read-only) one from the
> > current state.
>
> Good point
>
> > Things we wanted but were too hard to do so far:
> > - Separate annotation reading from Document building. Separate
> > validity checks too.
>
> +1 See above. I want to address this in another issue. We will need
> another intermediate model for that. With this in place we can remove
> commons-annotations and easily consume a Jandex index as well.
>
> > - It checks for JPA @Id using reflection as it might not be available
> > -> pluggable?
>
> Not sure what you mean here. That's just a very specific JPA/ORM based
> use case.
>
> > - LuceneOptionsImpl are built at runtime each time we need one ->
> > reuse them, coupling them to their field
>
> +1
>
> > - We need a reliable way to track which field names are created, and
> > from which bridge they are originating (including custom bridges:
> > HSEARCH-904)
>
> See above and the FieldNameReportingBridge I am suggesting
>
> > == Solution ? ==
> >
> > Now let's assume that we can build this as a recursive structure which
> > accepts a generic visitor. …
>
> That's where you lose me. I think I am a little like Emmanuel here.
> Where does a Visitor pattern help here?
>
> --Hardy
>
>
> _______________________________________________
> hibernate-dev mailing list
> hibernate-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/hibernate-dev