> With these new types, backwards compatibility is a non-issue. So unless
> someone makes a strong case for needing these as String in the index,
> what about we drop some complexity?

ElasticSearch uses Strings for transferring dates in JSON structures (see
https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-date-format.html).
So for that backend we'll need String-mapping field bridges (and we'd even
have to ignore, override or flag as an error the user's setting for numeric
mapping).

2015-08-10 12:37 GMT+02:00 Sanne Grinovero <sa...@hibernate.org>:

> On 10 August 2015 at 11:04, Hardy Ferentschik <ha...@hibernate.org> wrote:
>> Hi,
>>
>> sorry, I am late to the game, but here are some more thoughts on this.
>>
>> I think the consensus so far is that
>>
>> # Date/time types which represent an instant in time are treated as usual.
>> They can be string encoded (per default yyyyMMddHHmmssSSS) or numerically,
>> in which case the numeric long value equals the epoch time of the
>> represented date.
>
> Correct, that's the consensus so far. I'd like to challenge one more
> detail though:
> does it still make sense to allow string-encoded?
>
> I think not; we did allow it primarily because a long time ago that
> was the only way, then it became one of the options - but still the
> default - and more recently it became the non-default way.
>
> With these new types, backwards compatibility is a non-issue. So unless
> someone makes a strong case for needing these as String in the index,
> what about we drop some complexity?
>
> Remember:
> - Hibernate Search is not an Objects/index mapper so we're not aiming
>   at creating any index schema possible, we're aiming at taking
>   advantage of the index for practical purposes ("I want it to be a
>   string in the index" is not a valid argument - use your own
>   fieldbridge in that case)
> - With Projections we have to re-transform things back into their
>   original Java type, so how we encode things in the index is irrelevant
>   from a semantics point of view; I think the only valid challenge would
>   need to come from a performance or storage space perspective, and in
>   both cases I'm pretty sure the numeric encoding would win.
>
>> # Date/time types which do not represent an instant in time can also be
>> encoded as string or number, but in the latter case the numeric
>> representation is given by interpreting the string representation as
>> a number.
>>
>> So far so good. There are a couple more things to think about.
>>
>> # Query time gets interesting and I think we need to improve the DSL in
>> unison with adding support for these new types. Check out this example
>> from DSLTest [1]
>>
>>     query = monthQb
>>         .range()
>>         .onField( "estimatedCreation" )
>>         .ignoreFieldBridge()
>>         .andField( "justfortest" )
>>         .ignoreFieldBridge().ignoreAnalyzer()
>>         .from( DateTools.round( from, DateTools.Resolution.MINUTE ) )
>>         .to( DateTools.round( to, DateTools.Resolution.MINUTE ) )
>>         .excludeLimit()
>>         .createQuery();
>>
>> If a date is numerically encoded you need to specify numbers for the from
>> and to values. ATM, we recommend using the Lucene specific DateTools to
>> get the numeric representation. With the support of the new date types
>> things will get confusing for the user. How does one "create" the numeric
>> representation of a LocalDate (and how does one know what it looks like
>> in the first place and how it differs from the epoch time)?
> > Great point, we should accept the user's domain type exclusively and > take the conversion burden from the user; especially since we know the > correct conversion strategy. > >> We have been discussing before whether Hibernate Search needs to offer its >> own version of DateTools. >> I think it would be time to do so and include helpers for the new date/time >> types. This also reduces the exposure >> to Lucene specific types. > > +1 to encapsulate it, but I don't expect people to need it at all in > the above case? But good for other more advanced needs. > >> >> Even better though would be, if we would be able to support directly the use >> of date types in the from and to clauses. >> It would be the responsibility of the DSL to round the specified types to >> the appropriate level based on the field's >> configuration/metadata. Even in this scenario though a Search specific >> DateTools might be necessary for the cases >> where the date specified in to/from needs to be rounded differently than the >> field itself. > > +1 > >> Last but not least, the documentation needs to be updated. At the moment, >> the docs are silent about all the complexity >> around dates. With the support of the new types, the docs needs to be more >> explicit and describe the subtleties at play. > > +1 created HSEARCH-1958 > > Thanks, > Sanne > > >> >> --Hardy >> >> >> On Wed, Aug 05, 2015 at 05:40:16PM +0100, Sanne Grinovero wrote: >>> On 5 August 2015 at 17:22, Davide D'Alto <dav...@hibernate.org> wrote: >>> >> Proposal: use numeric but still - rather than taking the milliseconds >>> >> from epoch, take the resulting number from YYYYMMDD ? >>> > >>> > I don't think I understand what you mean with "the resulting number from >>> > YYYYMMDD". >>> > Wouldn't be similar to get the number of days from epoch? >>> >>> No because epoch is a specific moment *with a timezone*. 
>>> If you take a calendar date "here", and take the moment in time which
>>> represents your beginning of the calendar date, the distance from epoch
>>> is not a whole number and you'd have to apply rounding which is
>>> timezone specific.
>>>
>>> By simply encoding the number in the above format, you'd encode today
>>> as the number "20150805".
>>> That's a whole number which avoids the timezone relativity and can be
>>> efficiently encoded in numeric form, and provides the expected sorting
>>> properties.
>>>
>>> >
>>> > But basically, you are saying that I can use different numeric
>>> > encodings for different types, aren't you?
>>>
>>> Yes, you definitely need different encodings depending on the type and
>>> the used options.
>>>
>>> > So, for example:
>>> >
>>> > java.util.Date, java.util.Calendar and java.time.Instant,
>>> > java.time.LocalDateTime will use number of milliseconds from epoch
>>> > java.time.LocalDate: number of days from epoch
>>>
>>> Except this one ^ I agree with the others.
>>>
>>> > java.time.LocalTime: number of nanos in a day
>>>
>>> Conceptually, yes.. but we don't have "nanoseconds" as an option of
>>> org.hibernate.search.annotations.Resolution. Should we add it?
>>> We would not be able to apply that Resolution on old fashioned
>>> Date/Calendar, so that would need a warning or even an exception when
>>> applied to old style value types.
>>>
>>> >> Ok that works but why write all those zeros in the index, when you can
>>> >> just write the date. I realize storage is cheap, but still we need to
>>> >> be careful as the index size affects performance ;-)
>>> >
>>> > I don't think we need to store the 0s.
>>> > If I know the type of the field I already know that the time is 0.
>>>
>>> Exactly
>>>
>>> > Am I missing something?
>>>
>>> I probably just misunderstood your proposal, since previously you
>>> mentioned: "I would just consider a LocalDate the same as a
>>> LocalDateTime with time 00:00:000 (UTC time zone)".
>>> If you have to write the days only you don't need to convert to a time >>> first. >>> This misunderstanding might be related with the fact that you were >>> planning to encode as distance from epoch.. see my first comment on >>> this same email. >>> Since you don't want to look at distance from epoch for this case, the >>> time component really is irrelevant and LocalDate has all the >>> information you need.. simpler ;) >>> >>> Sanne >>> >>> >>> > >>> > >>> > On Wed, Aug 5, 2015 at 5:00 PM, Sanne Grinovero <sa...@hibernate.org> >>> > wrote: >>> > >>> >> On 5 August 2015 at 16:27, Gunnar Morling <gun...@hibernate.org> wrote: >>> >> >> as I'd like us to consider not >>> >> > applying DateBridge on the new types as it doesn't seem to add much >>> >> > practical value. >>> >> > >>> >> > Ok, that may make sense for types such as LocalDate. But there are >>> >> > types >>> >> in >>> >> > the new API which - unlike LocalDate - do describe an exact instant on >>> >> the >>> >> > time line (e.g. ZonedDateTime, Instant). For those IMO it makes sense >>> >> > for >>> >> > sure to support both encodings, NUMERIC and STRING (similar to >>> >> Date/Calendar >>> >> > so far) and thus apply @DateBridge. >>> >> >>> >> +1 >>> >> >>> >> > Question is whether/how to index/persist TZ information, for Calendar >>> >> > it >>> >> > seems not been persisted in the index so far? >>> >> >>> >> It's encoding the Calendar's time as distance from epoch, which is a >>> >> neutral encoding so you don't need the TZ. >>> >> >>> >> For the old style Date/Calendar types we always assumed the value was >>> >> a point-in-time, unless explicitly opting in for an alternative >>> >> encoding. >>> >> For example for the "birthday use case" a reasonable setting would >>> >> have been String encoding with resolution=DAY, although passing in a >>> >> Date instance having the right value (as in right timezone) would have >>> >> been user's responsibility.. 
we simply take the long it's storing and >>> >> index that with the requested resolution. >>> >> >>> >> Sanne >>> >> >>> >> > >>> >> > >>> >> > 2015-08-05 17:10 GMT+02:00 Sanne Grinovero <sa...@hibernate.org>: >>> >> >> >>> >> >> Inline: >>> >> >> >>> >> >> On 5 August 2015 at 15:42, Davide D'Alto <dav...@hibernate.org> wrote: >>> >> >> > If a user select a resolution that does not make much sense we can >>> >> log a >>> >> >> > warning. >>> >> >> >>> >> >> +1 And update the javadoc to mention that some resolution values don't >>> >> >> apply >>> >> >> >>> >> >> > But I think this might make sense: >>> >> >> > >>> >> >> > @DateBridge(resolution=MONTH) >>> >> >> > LocalDate birthday; >>> >> >> >>> >> >> Ok but how often do you think that will be used? >>> >> >> Sorry playing devil's advocate here, as I'd like us to consider not >>> >> >> applying DateBridge on the new types as it doesn't seem to add much >>> >> >> practical value. >>> >> >> >>> >> >> I agree it's worth a shot, but while going ahead keep in mind that >>> >> >> maybe simplifying that is the more elegant solution. >>> >> >> >>> >> >> > On Wed, Aug 5, 2015 at 3:37 PM, Davide D'Alto <dav...@hibernate.org> >>> >> >> > wrote: >>> >> >> > >>> >> >> >> > What would you do though in case of the following: >>> >> >> >> > >>> >> >> >> > @DateBridge >>> >> >> >> > LocalDate myDate; >>> >> >> >> > >>> >> >> >> > encoding() defaults to NUMERIC, so would you a) raise an error, >>> >> >> >> > or >>> >> b) >>> >> >> >> ignore encoding() for LocalDate and friends? Both seem not right to >>> >> me. >>> >> >> >> I >>> >> >> >> think there is nothing wrong with using NUMERIC encoding per-se for >>> >> >> >> these >>> >> >> >> types. We may recommend STRING but if NUMERIC really is what a user >>> >> >> >> wants I >>> >> >> >> would let them do so. 
>>> >> >> >>> >> >> I'm all for letting the users have the last word, but this is one of >>> >> >> those cases in which you don't know if they explicitly want that or >>> >> >> simply went with the defaults. >>> >> >> >>> >> >> Not a big problem as of course the important thing of defaults is that >>> >> >> "they work" but I'd really prefer the default to try be the most >>> >> >> appropriate encoding, which is not numeric in this case. >>> >> >> >>> >> >> Proposal: use numeric but still - rather than taking the milliseconds >>> >> >> from epoch, take the resulting number from YYYYMMDD ? It might even be >>> >> >> the most efficient encoding, as you don't have the drawback of >>> >> >> clustering which we would have with a numeric encoding working on the >>> >> >> individual fields, and doesn't have the bloat of string encoding. >>> >> >> >>> >> >> >> >>> >> >> >> +1 >>> >> >> >> >>> >> >> >> > What do you suggest we do if a user maps the following? >>> >> >> >> >>> >> >> >> > @DateBridge(resolution=MILLISECOND) >>> >> >> >> > LocalDate birthday; >>> >> >> >> >>> >> >> >> >>> >> >> >> Nothing really, >>> >> >> >> I would just consider a LocalDate the same as a LocalDateTime with >>> >> time >>> >> >> >> 00:00:000 (UTC time zone) >>> >> >> >>> >> >> Ok that works but why write all those zeros in the index, when you can >>> >> >> just write the date. 
>>> >> >> I realize storage is cheap, but still we need to
>>> >> >> be careful as the index size affects performance ;-)
>>> >> >>
>>> >> >> Sanne
>>> >> >>
>>> >> >> >>
>>> >> >> >> It is equivalent to:
>>> >> >> >>     ZonedDateTime dateTime = date.atStartOfDay( ZoneOffset.UTC );
>>> >> >> >>
>>> >> >> >> On Wed, Aug 5, 2015 at 3:24 PM, Gunnar Morling <gun...@hibernate.org>
>>> >> >> >> wrote:
>>> >> >> >>
>>> >> >> >>> 2015-08-05 12:41 GMT+02:00 Sanne Grinovero <sa...@hibernate.org>:
>>> >> >> >>>
>>> >> >> >>>> Our current implementation converts Date into the long "distance
>>> >> >> >>>> from epoch" to allow correct range queries, treating each Date as
>>> >> >> >>>> an instant in time - allowing a universal sorting strategy. But a
>>> >> >> >>>> LocalDate is not an instant-in-time.
>>> >> >> >>>>
>>> >> >> >>>> A LocalDate is intentionally oblivious of the timezone; as the
>>> >> >> >>>> javadoc states, it's useful for birthdays, i.e. symbolic
>>> >> >> >>>> occurrences and potentially legal matters which don't fit into a
>>> >> >> >>>> universal sorting model but rather with the local political scene -
>>> >> >> >>>> we would need the combo {LocalDate, ZoneId} provided to be able to
>>> >> >> >>>> allow sorting across different LocalDate - or simply assume that
>>> >> >> >>>> they are all referring to the same Zone.
>>> >> >> >>>>
>>> >> >> >>>
>>> >> >> >>> Right, I had the latter in mind and would use UTC for that
>>> >> >> >>> purpose.
>>> >> >> >>> >>> >> >> >>>> >>> >> >> >>>> I think that if the user is using a LocalDate type, he's >>> >> >> >>>> implicitly >>> >> >> >>>> hinting that the timezone is not relevant for the practical use >>> >> >> >>>> (possibly even wrong); the most faithful representation would be >>> >> the >>> >> >> >>>> string form in ISO standard format or to encode the >>> >> >> >>>> day,month,year >>> >> as >>> >> >> >>>> independent fields? This last detail depends on how it would be >>> >> more >>> >> >> >>>> efficient to store & query; probably the String format YYYYMMDD >>> >> would >>> >> >> >>>> be the most efficient internal representation to allow also >>> >> >> >>>> correct >>> >> >> >>>> sorting. >>> >> >> >>>> >>> >> >> >>>> I wouldn't use NumericField(s) in this case, as they are more >>> >> >> >>>> effective only with larger ranges, while MM and DD are very >>> >> >> >>>> short; >>> >> >> >>>> not >>> >> >> >>>> sure if it's worth splitting the year as a NumericField either, >>> >> >> >>>> as >>> >> >> >>>> the >>> >> >> >>>> values will likely be strongly clustered in the same range of >>> >> "recent >>> >> >> >>>> years" - although that might depend on the application but it >>> >> doesn't >>> >> >> >>>> seem worth the complexity, so I'd index & store as a String >>> >> YYYYMMDD. >>> >> >> >>>> >>> >> >> >>> >>> >> >> >>> Agreed that this makes most sense, given the "symbolic" nature of >>> >> >> >>> LocalDate. >>> >> >> >>> >>> >> >> >>> What would you do though in case of the following: >>> >> >> >>> >>> >> >> >>> @DateBridge >>> >> >> >>> LocalDate myDate; >>> >> >> >>> >>> >> >> >>> encoding() defaults to NUMERIC, so would you a) raise an error, or >>> >> b) >>> >> >> >>> ignore encoding() for LocalDate and friends? Both seem not right >>> >> >> >>> to >>> >> >> >>> me. I >>> >> >> >>> think there is nothing wrong with using NUMERIC encoding per-se >>> >> >> >>> for >>> >> >> >>> these >>> >> >> >>> types. 
We may recommend STRING but if NUMERIC really is what a >>> >> >> >>> user >>> >> >> >>> wants I >>> >> >> >>> would let them do so. >>> >> >> >>> >>> >> >> >>>> >>> >> >> >>>> -- Sanne >>> >> >> >>>> >>> >> >> >>>> >>> >> >> >>>> On 5 August 2015 at 11:10, Gunnar Morling <gun...@hibernate.org> >>> >> >> >>>> wrote: >>> >> >> >>>> > Hi, >>> >> >> >>>> > >>> >> >> >>>> > What's the motivation for using a different representation in >>> >> that >>> >> >> >>>> case? >>> >> >> >>>> > >>> >> >> >>>> > For the sake of consistency, I'd use milli seconds since >>> >> 1970-01-01 >>> >> >> >>>> across >>> >> >> >>>> > the board. Otherwise it'll be more difficult to compare fields >>> >> >> >>>> > created >>> >> >> >>>> from >>> >> >> >>>> > properties of different date types. >>> >> >> >>>> > >>> >> >> >>>> > --Gunnar >>> >> >> >>>> > >>> >> >> >>>> > >>> >> >> >>>> > 2015-08-04 19:49 GMT+02:00 Davide D'Alto >>> >> >> >>>> > <dav...@hibernate.org>: >>> >> >> >>>> > >>> >> >> >>>> >> Hi, >>> >> >> >>>> >> I started to work on the creation of the bridges for the >>> >> >> >>>> >> classes >>> >> >> >>>> >> in >>> >> >> >>>> the >>> >> >> >>>> >> java.time package. >>> >> >> >>>> >> >>> >> >> >>>> >> I was wondering if we want to convert the values to long using >>> >> the >>> >> >> >>>> existing >>> >> >> >>>> >> approach we have now for java.util.Date. >>> >> >> >>>> >> >>> >> >> >>>> >> In Hibernate Search a java.util.Date is converted into a long >>> >> that >>> >> >> >>>> >> represents the number of milliseconds since January 1, 1970, >>> >> >> >>>> >> 00:00:00 >>> >> >> >>>> GMT >>> >> >> >>>> >> using getTime(). 
>>> >> >> >>>> >>
>>> >> >> >>>> >> The same value can be obtained from a java.time.LocalDate via:
>>> >> >> >>>> >>
>>> >> >> >>>> >>     long epochMilli = date.atStartOfDay( ZoneOffset.UTC ).toInstant().toEpochMilli();
>>> >> >> >>>> >>
>>> >> >> >>>> >> LocalDate has a method that returns the same value expressed in number of
>>> >> >> >>>> >> days:
>>> >> >> >>>> >>
>>> >> >> >>>> >>     long epochDay = date.toEpochDay();
>>> >> >> >>>> >>
>>> >> >> >>>> >> I would use the second approach.
>>> >> >> >>>> >>
>>> >> >> >>>> >> Davide
>>> >> >> >>>> >> _______________________________________________
>>> >> >> >>>> >> hibernate-dev mailing list
>>> >> >> >>>> >> hibernate-dev@lists.jboss.org
>>> >> >> >>>> >> https://lists.jboss.org/mailman/listinfo/hibernate-dev
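
For reference, the three numeric LocalDate encodings compared in this thread (milliseconds from epoch at UTC midnight, days from epoch, and the plain YYYYMMDD number) can be sketched in plain Java roughly as follows. This is only an illustration: the class DateEncodings and its method names are hypothetical and not part of Hibernate Search, and the YYYYMMDD helpers assume years in the range 0 to 9999.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;

// Hypothetical helper class, for illustration only (not Hibernate Search API).
public class DateEncodings {

    // Milliseconds from epoch for the start of the given day, pinned to UTC.
    public static long toEpochMilliUtc(LocalDate date) {
        return date.atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
    }

    // Number of days from epoch; no time zone is involved.
    public static long toEpochDay(LocalDate date) {
        return date.toEpochDay();
    }

    // The calendar date read as a number, e.g. 2015-08-05 becomes 20150805.
    // Assumes a year between 0 and 9999.
    public static long toYyyyMmDd(LocalDate date) {
        return date.getYear() * 10_000L
                + date.getMonthValue() * 100L
                + date.getDayOfMonth();
    }

    // Inverse of toYyyyMmDd, e.g. for turning a projection back into a LocalDate.
    public static LocalDate fromYyyyMmDd(long encoded) {
        int year = (int) (encoded / 10_000);
        int month = (int) (encoded / 100 % 100);
        int day = (int) (encoded % 100);
        return LocalDate.of(year, month, day);
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2015, 8, 5);
        System.out.println("epoch millis (UTC midnight): " + toEpochMilliUtc(date));
        System.out.println("epoch day:                   " + toEpochDay(date));
        System.out.println("YYYYMMDD:                    " + toYyyyMmDd(date));
    }
}
```

All three encodings sort in calendar order, so range queries behave the same way; the difference is that only the first one needs a time zone to be chosen, while the last one stays readable when inspecting the index.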