A quick update on some more exploration on this: it turns out that sorting
on a NumericField which also uses an "indexNullAs" token causes the
UninvertingReader approach to throw an exception.

My two conclusions:
 - we need to move away from supporting those tokens in NumericField,
   especially as a stricter schema is coming in next-gen Lucene
 - it's yet another reason to clearly separate fields meant for searching
   from fields meant for sorting
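Here is a deliberately simplified model, in plain Java with no Lucene dependency, of why a null placeholder can break uninverting. It is not how UninvertingReader is implemented. It assumes numeric terms are stored as decimal strings (Lucene actually uses prefix-coded numeric terms), and `UninvertSketch`, `uninvertAsLong` and `NULL_TOKEN` are hypothetical names. The shape of the problem is the same: uninverting decodes every term of the field as a number to rebuild one value per document, so a string token mixed into a numeric field cannot be decoded.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Simplified, hypothetical model of uninverting a numeric field. The
 * "inverted index" maps each term to the documents containing it, and
 * uninverting walks those terms to rebuild one value per document (roughly
 * what a FieldCache holds). Terms are plain decimal strings here, unlike
 * Lucene's prefix-coded numeric terms.
 */
public class UninvertSketch {

    // Hypothetical placeholder standing in for an "indexNullAs" token.
    static final String NULL_TOKEN = "_null_";

    static long[] uninvertAsLong(Map<String, List<Integer>> postings, int maxDoc) {
        long[] values = new long[maxDoc];
        for (Map.Entry<String, List<Integer>> entry : postings.entrySet()) {
            // Every term of a numeric field is assumed to decode as a number;
            // a string placeholder breaks that assumption and parseLong throws.
            long value = Long.parseLong(entry.getKey());
            for (int doc : entry.getValue()) {
                values[doc] = value;
            }
        }
        return values;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> postings = new TreeMap<>();
        postings.put("42", Arrays.asList(0, 2));
        postings.put("7", Arrays.asList(1));
        long[] values = uninvertAsLong(postings, 3);
        System.out.println(Arrays.toString(values)); // [42, 7, 42]

        // Document 3 had a null property, indexed with the placeholder token.
        postings.put(NULL_TOKEN, Arrays.asList(3));
        try {
            uninvertAsLong(postings, 4);
        } catch (NumberFormatException e) {
            System.out.println("uninverting failed: " + e.getMessage());
        }
    }
}
```

A dedicated sort field that never receives the placeholder token avoids the problem entirely, which is the separation argued for above.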
-- Sanne

On 5 August 2015 at 10:41, Gunnar Morling <gun...@hibernate.org> wrote:
> Hi,
>
> I think a great solution for that would be to leverage "annotation
> composition" as done e.g. in CDI and Bean Validation.
>
> There would be an annotation @DocValuesField which will cause the creation
> of a DocValues field and which would expose the attributes required for its
> configuration:
>
>     @DocValuesField(name="foo", ... )
>     private String bar;
>
> Besides using @DocValuesField itself directly, it would be usable as a
> meta-annotation on "doc value annotation types":
>
>     @DocValuesField
>     public @interface SortField {
>
>         @OverridesAttribute(name="name")
>         String name();
>     }
>
> And then its usage:
>
>     @SortField
>     private String bar;
>
> Of course such doc value annotation types (I think @SortField, @Facet and
> @DocumentId could be modelled as that) would only expose those degrees of
> freedom needed for a specific use case.
>
> Most users would only use these more abstract, easy-to-use, specific-purpose
> annotations. But others could use @DocValuesField directly to create custom
> doc values, or they could even create their own, domain-specific doc value
> annotation type.
>
> Cheers,
>
> --Gunnar
>
> 2015-08-04 18:00 GMT+02:00 Sanne Grinovero <sa...@hibernate.org>:
>>
>> Hi Guillaume,
>> thanks! Great input. Some comments inline:
>>
>> On 4 August 2015 at 15:11, Guillaume Smet <guillaume.s...@gmail.com>
>> wrote:
>> > Hi Sanne,
>> >
>> > On Wed, Jul 29, 2015 at 1:26 PM, Sanne Grinovero <sa...@hibernate.org>
>> > wrote:
>> >>
>> >> I'm not sure if this should be extending the @Field annotation as
>> >> there are special restrictions implied in terms of analysis: are we
>> >> going to enforce a specific type of tokenizer, or simply take the
>> >> analysis option away?
>> >
>> > You can't take the analysis option away: it's often used to normalize
>> > sorting on strings (lowercase, remove accents, remove special characters
>> > and so on).
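The normalization Guillaume describes, combined with the single-token requirement raised in the reply, can be sketched without Lucene. This is a hypothetical illustration: `SortKeySketch`, `normalize` and `checkSingleToken` are made-up names, and an "analyzer" is modelled as a plain `Function<String, List<String>>`. In Lucene itself such a chain would more likely be assembled from analysis components (for example a keyword tokenizer followed by lowercasing and ASCII-folding filters). The wrapper shows one possible shape of a "well-behaved" check: it rejects any analysis that does not yield exactly one token.

```java
import java.text.Normalizer;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

public class SortKeySketch {

    /** Hypothetical sort-key normalizer: strip accents, lowercase, drop special characters. */
    static String normalize(String input) {
        // Decompose accented characters into a base letter plus a combining mark
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // Remove the combining marks produced by the decomposition
        String noAccents = decomposed.replaceAll("\\p{M}", "");
        // Lowercase with a fixed locale, then keep only letters, digits and spaces
        String cleaned = noAccents.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}\\p{N} ]", "");
        // Trim and collapse repeated spaces so equivalent inputs give equal keys
        return cleaned.trim().replaceAll(" +", " ");
    }

    /** Wraps an "analyzer" and rejects any value that does not yield exactly one token. */
    static Function<String, List<String>> checkSingleToken(Function<String, List<String>> analyzer) {
        return value -> {
            List<String> tokens = analyzer.apply(value);
            if (tokens.size() != 1) {
                throw new IllegalStateException(
                        "sort analysis must yield exactly one token, got " + tokens.size() + " for: " + value);
            }
            return tokens;
        };
    }

    public static void main(String[] args) {
        // Keyword-style analysis: the whole normalized value is a single token.
        Function<String, List<String>> keyword = v -> Arrays.asList(normalize(v));
        // Whitespace tokenization: fine for search, unsuitable as a sort key.
        Function<String, List<String>> whitespace = v -> Arrays.asList(normalize(v).split(" "));

        String title = "  \u00C9mile   Zola! "; // capital E with acute accent
        System.out.println(checkSingleToken(keyword).apply(title)); // [emile zola]
        try {
            checkSingleToken(whitespace).apply(title);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Such a check only catches misbehaving analysis at indexing time, value by value; it cannot prove upfront that an arbitrary analyzer is safe, which matches the uncertainty expressed in the reply.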
>>
>> Right, we used this same example in a recent meeting we had on this
>> subject.
>> So that's what makes it tricky: we want to allow analysis, but while
>> Lucene needs a strong guarantee that the analyzed value will be unique
>> (a single token per document), we can't really verify that unless we
>> take away the freedom to use any analyzer.
>> An alternative would be to wrap the Analyzer to monitor it and verify
>> that it is "well-behaved", but I'm not sure whether that's doable, or
>> whether the performance overhead would be negligible. I guess we'll just
>> put it into users' hands to make a sensible choice... not that we've done
>> better so far on this aspect.
>>
>> > FWIW, we use specific fields for sorting each time we need to sort on a
>> > string, as we don't want to tokenize the string (but not for numerics
>> > and dates). Maybe @SortFields/@SortField annotations would be in order
>> > (I don't like Sortable as I don't think it's a good idea to use these
>> > fields for search).
>>
>> I like that name proposal, and +1 to not encouraging people to try to
>> reuse the same field for sorting and indexing.
>>
>> The next action for us is to verify the performance impact of the
>> current approach, which is based on the UninvertingReader from
>> lucene-misc. Gunnar pointed out that uninverting and loading into a
>> FieldCache is not very different from what Lucene has been doing so
>> far, so that might be a good strategy to allow migrating to Lucene 5
>> incrementally, and to provide an incremental improvement in this area
>> rather than requiring the new mapping.
>>
>> I'll soon merge this approach, and as usual I'm lacking real-world
>> applications to benchmark, so if you're interested in helping with that,
>> that would be awesome; we just need to know that the new code won't be
>> significantly slower than the Lucene 4 based strategies for sorted
>> queries.
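Combining Guillaume's @SortField name with Gunnar's annotation-composition idea quoted above, here is a plain-Java sketch of how a mapping engine might resolve the effective doc values field name of a property. All annotation types are hypothetical. `OverridesAttribute` is a local stand-in, not Bean Validation's `javax.validation.OverridesAttribute`, which also names the constraint being overridden. `SortField.name()` gets a default so that a bare `@SortField` works as in Gunnar's usage example. Falling back to the Java field name when no name is given is also an assumption.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Method;

public class ComposedAnnotationSketch {

    /** Hypothetical base annotation: each use would map to one doc values field. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.FIELD, ElementType.ANNOTATION_TYPE})
    @interface DocValuesField {
        String name() default "";
    }

    /** Hypothetical local stand-in, NOT Bean Validation's OverridesAttribute. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface OverridesAttribute {
        String name();
    }

    /** Hypothetical purpose-specific annotation composed from @DocValuesField. */
    @DocValuesField
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface SortField {
        @OverridesAttribute(name = "name")
        String name() default "";
    }

    static class Book {
        @DocValuesField(name = "foo")
        String bar;

        @SortField(name = "title_sort")
        String title;

        String notMapped;
    }

    /** Effective doc values name of a field, or null if it is not mapped (first match wins). */
    static String resolveDocValuesName(Field field) {
        DocValuesField direct = field.getAnnotation(DocValuesField.class);
        if (direct != null) {
            return direct.name().isEmpty() ? field.getName() : direct.name();
        }
        for (Annotation annotation : field.getAnnotations()) {
            Class<? extends Annotation> type = annotation.annotationType();
            DocValuesField meta = type.getAnnotation(DocValuesField.class);
            if (meta == null) {
                continue; // not a composed doc values annotation
            }
            String name = meta.name();
            // Copy the value of any member marked as overriding DocValuesField.name
            for (Method member : type.getDeclaredMethods()) {
                OverridesAttribute override = member.getAnnotation(OverridesAttribute.class);
                if (override != null && "name".equals(override.name())) {
                    try {
                        name = (String) member.invoke(annotation);
                    } catch (ReflectiveOperationException e) {
                        throw new IllegalStateException(e);
                    }
                }
            }
            // Assumption: fall back to the Java field name when no name is given
            return name.isEmpty() ? field.getName() : name;
        }
        return null;
    }

    public static void main(String[] args) {
        for (Field field : Book.class.getDeclaredFields()) {
            System.out.println(field.getName() + " -> " + resolveDocValuesName(field));
        }
    }
}
```

For the sample `Book` class this resolves `bar` to `foo`, `title` to `title_sort` and `notMapped` to null; the printing order follows `getDeclaredFields()`, which is not guaranteed. A fuller version would also need to handle several composed annotations on one property and overridden attributes other than `name`.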
>>
>> Thanks,
>> Sanne
>> _______________________________________________
>> hibernate-dev mailing list
>> hibernate-dev@lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/hibernate-dev