Thanks Marc for this feedback. That's a very long multi-proposal email :) We should probably try to split it into several pieces. Let me reply inline.

On Thu 2012-11-22 20:10, Marc Schipperheyn wrote:
> Network
> @OneToMany
> @IndexedEmbedded(includePaths={"id"})
> List<User> users;
>
> When a new User is added to the Network, all the existing Users have to be
> read from the database to recreate the Lucene document.
>
> Another headache example is when a stored property that is used for
> selection purposes changes
>
> LinkedInGroup
> @Field(index=Index.YES)
> boolean hidden;
>
> @OneToMany
> @ContainedIn
> List<LinkedInGroupPost> posts;
> @OneToMany
> @IndexedEmbedded(includePaths={"id"})
> Set<Tag> tags
> LinkedInGroupPost
>
> @ManyToOne
> @IndexedEmbedded(includePaths={"id","hidden"})
> Group group;
> Assuming there can be hundreds of thousands of Posts, a change of hidden to
> true would trigger a read of all those records.

One approach that might work much better in this case is to use filters rather than indexing hidden and using it as a restriction in the query. I imagine hidden is not selective enough, which makes it a poor fit for an inverted index.

> * In the Network example, the includePaths only contains the id. Looking
> at my own work, I often find that IndexedEmbedded references just stores
> the id and I believe we should think about optimizing this use case. In
> that case an optimized read from the database could be executed that just
> reads that value instead of initializing the entire entity.

I forgot what we said around includePaths and class-level bridges, but that looks like a good idea. We might be able to look at the paths and check whether any of them contains an association. If not, we could use a projection to query only the meaningful data. That's not at all how Hibernate Search works today, so I imagine it could be significant work, but it does not look impossible. Can you open a JIRA issue for this?
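
To make the idea a bit more concrete, here is a rough sketch in plain Java of the check I have in mind. Nothing here is Hibernate Search API: `ProjectionReadSketch`, `canUseProjection`, `projectionQuery` and the `associationPaths` set are names made up for illustration, and the generated HQL string only assumes the Network/users mapping from your example. A real implementation would derive the association information from the ORM metadata.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not Hibernate Search code: decide whether the paths of an
// @IndexedEmbedded can be loaded with a scalar projection instead of full entities.
final class ProjectionReadSketch {

    // Returns true when no include path goes through an association.
    // associationPaths is an assumed input listing the association-typed paths
    // of the embedded entity (for example "group").
    static boolean canUseProjection(List<String> includePaths, Set<String> associationPaths) {
        for (String path : includePaths) {
            String prefix = "";
            for (String segment : path.split("\\.")) {
                prefix = prefix.isEmpty() ? segment : prefix + "." + segment;
                if (associationPaths.contains(prefix)) {
                    return false; // this path would need an entity or a join of its own
                }
            }
        }
        return true;
    }

    // Builds an HQL-style projection over the embedded collection of one owner.
    static String projectionQuery(String ownerEntity, String collection, List<String> includePaths) {
        StringBuilder select = new StringBuilder();
        for (String path : includePaths) {
            if (select.length() > 0) {
                select.append(", ");
            }
            select.append("e.").append(path);
        }
        return "select " + select + " from " + ownerEntity
                + " o join o." + collection + " e where o.id = :ownerId";
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList("id");
        if (canUseProjection(paths, Collections.<String>emptySet())) {
            System.out.println(projectionQuery("Network", "users", paths));
        }
        // A path crossing an association falls back to the regular entity load.
        Set<String> associations = new HashSet<String>(Arrays.asList("group"));
        System.out.println(canUseProjection(Arrays.asList("id", "group.hidden"), associations));
    }
}
```

For the Network example this would read only the user ids for one Network instead of initializing every User. Whether bridges can be applied to the projected values is a separate question.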
>
> This kind of "projection read" could be an optional setting even when
> includePaths contains non identifier values, assuming the developer knows
> which limitations this might entail (e.g. no FieldBridges, no Hibernate

Why do you say no FieldBridge?

> * Lucene Document update support is at an alpha stage right now
> LUCENE-3837. This effort could be supported by the Hibernate team or
> implemented at the earliest viable moment.

We are keeping an eye on it. Lucene 4 is a major departure from Lucene 3, so the conversion won't be easy and, worse, won't be fully transparent for Hibernate Search users, unfortunately.

> * A kind of JoinFilter is conceivable where the join filter would be able
> to exclude results based on selection results from another index.
> E.g. one queries the LinkedInGroupPost but the JoinFilter reads
> group.idreferences from the Group index (just reading the ones needed
> and storing them during the read) and excludes LinkedInGroupPosts based
> on the value of "hidden". I wonder if this approach could be patterned
> or documented.

I am pretty sure filters are what you are looking for:
http://docs.jboss.org/hibernate/search/4.2/reference/en-US/html_single/#query-filter

>
> * The documentation could contain some suggestions for dealing with the
> issue of cascading initialization and how to deal with this in a smart way.

Sure, let's identify what we consider smart and update the doc. Can you create a JIRA issue for that?

> * In the tests I have done, saving a LinkedInPostGroup where the
> indexedEmbedded attributes (id,hidden) are *not* dirty, all the posts are
> reinitialized anyway. The reason for this is that with a Set<Tag> the set
> elements are deleted and reinserted on each save even when they haven't
> changed.

Hum, I believe that's true for bag semantics, but I'm surprised it's true for a Set. Besides, from what you are saying, you neither add nor remove elements from the Set; you just change some non-id value.

> It looks like Hibernate Search is not optimized to deal with this
> "semi-dirty" situation (Hibernate ORM treats a field as dirty when it
> really isn't). Nothing really changed in the relevant values for the
> document but because Hibernate needs to reinsert the set, it thinks so. I
> wonder if this use case can or should be optimized. If not, documentation
> should warn against using Sets.

Can you create a minimal test case and open a JIRA issue / pull request? This needs to be investigated.

>
> * When a document is recreated because one attribute is changed leading to
> all sorts of cascading database reads I often wonder: why? The reason is
> that the Index segments cannot be recreated for the indexed attributes. So
> we need to read them again. But what if those attributes are actually
> Stored in the original document and not dirty? Why not just read these
> values straight from the document with a single read instead of executing
> a slew of database reads?

That might be true in some situations, but FieldBridges are not guaranteed to be non-destructive in the data they store, so we cannot necessarily generalize that. We could explore this idea in a prototype. Again, can you open a JIRA issue?

Emmanuel

_______________________________________________
hibernate-dev mailing list
hibernate-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/hibernate-dev