[hibernate-dev] HSEARCH-2358 "fields" attribute in Elasticsearch search results is being ignored
Hi,

I wanted to start a discussion about this issue. It's about stored field retrieval.

When searching, Elasticsearch can return field values in two different ways:

* through the "_source" attribute [1], which basically provides a copy-paste of the JSON that was submitted when indexing
* or through the "fields" attribute [2], which only works for stored fields and provides the actual value that Elasticsearch stored

The main difference really boils down to formatting. With the "_source" attribute, there's no formatting involved: you get exactly what was originally submitted. With the "fields" attribute, the value is formatted according to the first format in the mapping's format list [3].

The thing is, Elasticsearch allows admins to set multiple formats for a given field. This won't change the output format, but will allow using any one of these formats when submitting information. Since these "extra" formats probably aren't understood by Hibernate Search, this means that using the "_source" attribute to retrieve field values becomes unreliable as soon as someone else adds/changes documents in Elasticsearch...

So we have two solutions:

1. Either we only use the "fields" attribute to retrieve field values, and we force users to have the output format set to something HSearch will understand, but allow extra input formats.
2. Or we use the "_source" attribute to retrieve field values, and then we force both output and input format on users, and do not allow extra formats.

I'd be in favor of 1, which seems more rational to me. It only has one downside: if we go on with this approach, Calendar values (and ZonedDateTime, ZonedTime, etc.) will have to be stored as String, not as Date, since Elasticsearch doesn't store the timezone, just the UTC timestamp. We're currently working around this by inspecting the "_source", which contains the original timezone (since it's just the raw, originally submitted JSON).

What do you think?
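To make the formatting difference concrete, here is a minimal Python sketch that simulates it. It does not call Elasticsearch: the `ACCEPTED_FORMATS` list, the `simulate_hit` helper, and the sample values are all hypothetical, and the Python `strptime` patterns merely stand in for a mapping format list with one output format and one "extra" input format.

```python
from datetime import datetime

# Hypothetical stand-in for a mapping format list with two entries:
# every entry is accepted as input, only the first one is used for output.
ACCEPTED_FORMATS = ["%Y-%m-%d", "%d/%m/%Y"]


def parse_with_any_format(raw):
    """Parse raw with the first accepted format that matches."""
    for fmt in ACCEPTED_FORMATS:
        try:
            return datetime.strptime(raw, fmt)
        except ValueError:
            continue
    raise ValueError(f"no accepted format matches {raw!r}")


def simulate_hit(raw):
    """Return what each retrieval mode would expose for one submitted value."""
    parsed = parse_with_any_format(raw)
    return {
        # "_source": verbatim copy of what was submitted
        "_source": raw,
        # "fields": stored value re-rendered with the first format in the list
        "fields": parsed.strftime(ACCEPTED_FORMATS[0]),
    }


hit_from_hsearch = simulate_hit("2016-09-28")  # submitted in the first format
hit_from_other = simulate_hit("28/09/2016")    # submitted in an "extra" format

print(hit_from_hsearch)
print(hit_from_other)
```

In this simulation both hits expose the same "fields" value, while the "_source" value of the second hit stays in the extra format, which is the case a client that only understands the first format could not parse.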
[1] https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-source-field.html
[2] https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-fields.html
[3] https://www.elastic.co/guide/en/elasticsearch/reference/2.4/mapping-date-format.html#custom-date-formats

Yoann Rodière
Hibernate NoORM Team
___
hibernate-dev mailing list
hibernate-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/hibernate-dev
Re: [hibernate-dev] HSEARCH-2358 "fields" attribute in Elasticsearch search results is being ignored
Hi Yoann,

On Wed, Sep 28, 2016 at 2:56 PM, Yoann Rodiere wrote:

> I'd be in favor of 1, which seems more rational to me. It only has one
> downside: if we go on with this approach, Calendar values (and
> ZonedDateTime, ZonedTime, etc.) will have to be stored as String, not as
> Date, since Elasticsearch doesn't store the timezone, just the UTC
> timestamp. We're currently working this around by inspecting the "_source",
> which contains the original timezone (since it's just the raw, originally
> submitted JSON).
>
> What do you think?

I'm not sure you completely understood the consequences of storing dates as strings.

You won't be able to use these sorts of features:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-datehistogram-aggregation.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-daterange-aggregation.html
which are used very often when dealing with dates.

I don't think storing dates as strings is a viable alternative.

IMHO, the choice is between:
- using _source as we currently do it. I'm not sure allowing people to directly inject data into Elasticsearch and bypass Hibernate Search is something we can support in the long run so I think it would be acceptable if we document that we don't expect people to index documents directly (or at least that they should carefully follow the HS indexing format - which looks like an acceptable thing).
- using fields and be aware that we will get back UTC values from projections on these fields

--
Guillaume
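For readers unfamiliar with the aggregations linked above, here is a rough Python sketch of what a date histogram request body could look like. The field name `date` and the aggregation name `documents_per_month` are hypothetical, and the `interval` parameter name is an assumption based on the Elasticsearch versions current at the time of this thread. The request only builds the JSON; nothing is sent to a server.

```python
import json

# Hypothetical field and aggregation names; "interval" is the parameter name
# assumed here for the Elasticsearch version discussed in this thread.
search_body = {
    "size": 0,  # only the aggregation buckets are of interest, not the hits
    "aggs": {
        "documents_per_month": {
            "date_histogram": {"field": "date", "interval": "month"}
        }
    },
}

print(json.dumps(search_body, indent=2))
```

The aggregation targets a field mapped as a date, which is why mapping the same values as strings would rule it out.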
Re: [hibernate-dev] HSEARCH-2358 "fields" attribute in Elasticsearch search results is being ignored
On 28 September 2016 at 15:23, Guillaume Smet wrote:

> You won't be able to use these sorts of features:
> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-datehistogram-aggregation.html
> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-daterange-aggregation.html
> which are used very often when dealing with dates.
>
> I don't think storing dates as strings is a viable alternative.

Right. I didn't know about these.

> IMHO, the choice is between:
> - using _source as we currently do it. I'm not sure allowing people to
> directly inject data into Elasticsearch and bypass Hibernate Search is
> something we can support in the long run so I think it would be acceptable
> if we document that we don't expect people to index documents directly (or
> at least that they should carefully follow the HS indexing format - which
> looks like an acceptable thing).
> - using fields and be aware that we will get back UTC values from
> projections on these fields

... and the latter is a no-go for ZonedDate et al., since the point of those classes is to preserve timezone/offset.

Maybe we could just use "_source" when we really need to, but I doubt there's an elegant way to do this, so I guess we'd better not.

Anyway, it seems we're down to only one acceptable solution... Unless anyone has another view on all this, I'll index ZonedDate/etc. as dates and use "_source" for value retrieval, and I'll close HSEARCH-2358 as "Won't fix".

Thanks for the insight!

Yoann Rodière
Hibernate NoORM Team
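The timezone loss discussed in this thread can be illustrated with a small Python sketch. It is only an analogy for the Java types involved: the sample date, the +02:00 offset, and the variable names are made up, and the epoch-millisecond round trip stands in for a stored date value that keeps only the UTC timestamp.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical value with a +02:00 offset, as an application might hold it.
original = datetime(2016, 9, 28, 14, 56, tzinfo=timezone(timedelta(hours=2)))

# What "_source" would contain: the submitted string, offset included.
source_value = original.isoformat()

# What a stored date keeps in this analogy: only the UTC instant.
epoch_millis = int(original.timestamp() * 1000)

# Reading the stored value back can only yield a UTC date-time.
from_stored = datetime.fromtimestamp(epoch_millis / 1000, tz=timezone.utc)

# Reading the submitted string back recovers the original offset.
from_source = datetime.fromisoformat(source_value)

print(source_value)
print(from_stored.isoformat())
print(from_stored == original, from_stored.utcoffset(), from_source.utcoffset())
```

Both values denote the same instant, but only the one parsed from the submitted string still carries the +02:00 offset.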