I think anything handled outside Lucene is wrong, as Lucene, in this case, is the only engine that can properly address all query details: order, filtering, ...
So, imo, the only way to do this is by having multiple documents, one per cartesian product element. How to go from there, is a big TODO. :-)

-Ales

On Jan 3, 2013, at 3:17 PM, Emmanuel Bernard <emman...@hibernate.org> wrote:

> I don't think it's as simple as you imply. If we go that route the
> engine atop Hibernate Search would need to know that a given field is
> multivaluable (it could be a serialized List otherwise) and look into
> each entry in the Object[] to "cartesianize it".
>
> It's doable but it seems that it would be easier if Hibernate Search
> does the work. But I don't see that as being the default value.
> Depending on the situation you want:
>
> - an array entry with a List in your Object[] representing a row
> - or you want n entries in your List<Object[]> with duplicated values in
>   the Object[] except for the multivalued element
>
> Is the current behavior good for anything? We certainly did not design
> it with multivalue fields stored in mind.
>
> So we might want to allow for setting a given value globally with ways
> to override that per association.
>
> Thoughts?
>
> Emmanuel
>
> On Thu 2013-01-03 14:42, Sanne Grinovero wrote:
>> Hi Marko,
>> this is expected by our typical users, as you only have multiple field
>> values on tokenized fields, and you won't project these; occasionally
>> someone uses the _addFieldToDocument_ multiple times to give the
>> illusion of merging multiple strings to be tokenized in the same
>> stream, or occasionally even if you are applying an analyzer to the
>> field you just know for sure the output element is single, so we don't
>> enforce it.
>>
>> Projection on the other hand can't be applied on all fields, it really
>> is expected on Stored fields only - and typically one stored the field
>> only once.
>>
>> We can discuss how to improve this for your use case but I'd like to
>> better understand what you're needing:
>> I don't think you would need to change EntityInfo to List<EntityInfo>
>> : it still represents a *single* Document which matched your search
>> criteria, it looks like what you need is that one of the projected
>> fields is actually a multivalued element; but this would still be an
>> element of the same and only EntityInfo.
>>
>> This implies that, since the return type of a projection is Object[],
>> there is no need to break any API to implement such a feature: one of
>> those Object elements could be a Set or an array.
>>
>> Also consider there is no way to recover the multiple value in the
>> same order; it might seem order is maintained at a first glance but
>> during index reorganization (merges, optimisations) this is not
>> guaranteed; I'd think carefully before relying on multi-valued field
>> encodings as you're entering an out-of-scope usage, but if all you
>> need is return multiple strings that should be doable.
>>
>> Sanne
>>
>> On 3 January 2013 14:11, Marko Lukša <marko.lu...@gmail.com> wrote:
>>> Hi,
>>>
>>> we've found the following problem regarding projection queries when
>>> dealing with documents containing multiple fields with the same name.
>>>
>>> Let's say we add field "foo" with two different values to the same document:
>>>
>>> luceneOptions.addFieldToDocument("foo", "aaa", document);
>>> luceneOptions.addFieldToDocument("foo", "bbb", document);
>>>
>>> If we now do a projection query on field "foo", one would expect the
>>> resultset to contain exactly two results ({"aaa"} and {"bbb"}), but
>>> HSearch returns only a single result (the property value of the result
>>> is either "aaa" or "bbb", because Document.getFieldable("foo"), which is
>>> called in o.h.search.engine.impl.DocumentBuilderHelper, returns the
>>> first field that matches the given name).
>>>
>>> DocumentExtractor.extract() returns a single EntityInfo, but in order
>>> for it to properly handle projections as described in the previous
>>> paragraph, it should really be modified to return List<EntityInfo>.
>>>
>>> This sounds pretty reasonable when the query is projecting only a single
>>> field. When projecting multiple multi-valued fields, the resultset
>>> should actually return a cartesian product.
>>>
>>> This is one way of doing it. The other way of doing it is if we
>>> consider multiple fields with the same name as a single multi-valued
>>> field. When projecting such fields, the resultset would contain the same
>>> number of results as there are matching documents, with the projected
>>> value being a collection of all the values stored in the field.
>>>
>>> Actually, in CapeDwarf we need the cartesian product, as this is the way
>>> Google AppEngine does it.
>>>
>>> What do you guys think?
>>>
>>> Marko
>>> _______________________________________________
>>> hibernate-dev mailing list
>>> hibernate-dev@lists.jboss.org
>>> https://lists.jboss.org/mailman/listinfo/hibernate-dev
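
For reference, the two result shapes discussed in this thread (one row per document with a collection-valued element, versus one row per cartesian product element) can be sketched in plain Java. This is only an illustration and assumes the stored values of one matching document are already available as a map from field name to values. The class ProjectionShapes and its methods collectionRow and cartesianRows are hypothetical names, not part of Hibernate Search or Lucene, and the sketch does not address the ordering and filtering concerns raised above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not Hibernate Search or Lucene API.
final class ProjectionShapes {

    // Shape 1: a single row per document. Each projected element is the
    // list of all values stored under that field name.
    static Object[] collectionRow(Map<String, List<String>> storedValues,
                                  List<String> projectedFields) {
        Object[] row = new Object[projectedFields.size()];
        for (int i = 0; i < row.length; i++) {
            List<String> values = storedValues.getOrDefault(
                    projectedFields.get(i), Collections.<String>emptyList());
            row[i] = new ArrayList<String>(values);
        }
        return row;
    }

    // Shape 2: one row per element of the cartesian product of the
    // projected fields' values.
    static List<Object[]> cartesianRows(Map<String, List<String>> storedValues,
                                        List<String> projectedFields) {
        List<Object[]> rows = new ArrayList<>();
        rows.add(new Object[projectedFields.size()]);
        for (int i = 0; i < projectedFields.size(); i++) {
            List<String> values = storedValues.get(projectedFields.get(i));
            if (values == null || values.isEmpty()) {
                // Assumption: a missing field projects as null instead of
                // removing the document from the result.
                values = Collections.<String>singletonList(null);
            }
            List<Object[]> expanded = new ArrayList<>();
            for (Object[] partial : rows) {
                for (String value : values) {
                    Object[] copy = partial.clone();
                    copy[i] = value;
                    expanded.add(copy);
                }
            }
            rows = expanded;
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<String, List<String>> doc = new LinkedHashMap<>();
        doc.put("foo", Arrays.asList("aaa", "bbb"));
        doc.put("bar", Arrays.asList("x", "y"));
        List<String> fields = Arrays.asList("foo", "bar");

        System.out.println(Arrays.toString(collectionRow(doc, fields)));
        for (Object[] row : cartesianRows(doc, fields)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Note that the value order within each field comes from the input lists here; as Sanne points out, the index itself does not guarantee that order, so callers should not rely on it.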