Hi Hardy,

could you have a look at the following two commits while I work on a test, as you suggested? (Documentation will follow, depending on which one you like best.)
In the first commit I just add the missing method, and I don't think it's bad. Actually, the name is fine, and while I'm sure you might have some ideas on the javadoc, I think it's relatively clear what it all means:
https://github.com/Sanne/hibernate-search/commit/9a1a542e551784565e6536c2ea56f1a9cb29e535

In the following commit, which requires the previous one, I'm introducing a new interface, AdvancedShardIdentifierProvider. As much as I don't like having too many SPIs, I think we agree that this one addresses power users only. No configuration changes are needed, and the documented example stays fine as it is: the user simply has the option of implementing the more advanced interface, and we'll pick it up from there with a simple "instanceof" check.
https://github.com/Sanne/hibernate-search/commit/0f22a594075ae7364d8daf7218a4bb8656ad6aca

I'll work on a test and a docs update as soon as you let me know which approach you prefer.

Addressing some of your comments below:

On 7 October 2013 18:26, Hardy Ferentschik <ha...@hibernate.org> wrote:
>
> On 7 Jan 2013, at 5:03 PM, Sanne Grinovero <sa...@hibernate.org> wrote:
>
>> I've tried hard to find an agreement on this, but it seems we're
>> wasting time without making progress.
>> I'm not happy in ignoring a strong recommendation from any of you,
>> very hard choice :-(
>
> In the end it is your call. I tried to give arguments for my position, but we seem
> to have general disagreement on how to develop/evolve an interface.
>
> If you want to have a specific method for deletion I recommend:
>
> String getShardIdentifierForAddition(Class<?> entityType, Serializable id, String idAsString, Document document);
> String getShardIdentifierDeletion(Class<?> entityType, Serializable id, String idAsString);
>
> So I would re-add the suffixes 'ForAddition' and 'ForDeletion'. Also I'd change the return type of
> getShardIdentifierDeletion. The return type will be _Set<String>_.
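To make the opt-in idea concrete, here is a minimal Java sketch of how an optional extension interface can be detected with a plain instanceof check. The interface names mirror this thread, but the signatures are simplified assumptions (for instance, the Lucene Document parameter is dropped), and HashProvider, AdvancedHashProvider and shardsForDeletion are hypothetical helpers, not the code in the linked commits.

```java
import java.io.Serializable;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, simplified stand-ins for the SPI discussed in this thread;
// the real Hibernate Search signatures differ.
public class OptionalSpiSketch {

    interface ShardIdentifierProvider {
        // Shard that should receive an added or updated document.
        String getShardIdentifier(Class<?> entityType, Serializable id, String idAsString);

        // Every shard managed by this provider.
        Set<String> getAllShardIdentifiers();
    }

    // Optional power-user extension: which shards may hold the entity to delete.
    interface AdvancedShardIdentifierProvider extends ShardIdentifierProvider {
        Set<String> getShardIdentifiersForDeletion(Class<?> entityType, Serializable id, String idAsString);
    }

    // Engine side: no configuration change, just an instanceof check.
    static Set<String> shardsForDeletion(ShardIdentifierProvider provider,
            Class<?> entityType, Serializable id, String idAsString) {
        if (provider instanceof AdvancedShardIdentifierProvider) {
            return ((AdvancedShardIdentifierProvider) provider)
                    .getShardIdentifiersForDeletion(entityType, id, idAsString);
        }
        // A basic provider gives no hint, so the delete goes to every shard.
        return provider.getAllShardIdentifiers();
    }

    // Basic provider (hypothetical): shard derived from a hash of the id.
    static class HashProvider implements ShardIdentifierProvider {
        static final int SHARDS = 3;

        @Override
        public String getShardIdentifier(Class<?> entityType, Serializable id, String idAsString) {
            return "shard-" + Math.floorMod(idAsString.hashCode(), SHARDS);
        }

        @Override
        public Set<String> getAllShardIdentifiers() {
            Set<String> all = new TreeSet<>();
            for (int i = 0; i < SHARDS; i++) {
                all.add("shard-" + i);
            }
            return all;
        }
    }

    // Same routing, but it also tells the engine where a delete must go.
    static class AdvancedHashProvider extends HashProvider implements AdvancedShardIdentifierProvider {
        @Override
        public Set<String> getShardIdentifiersForDeletion(Class<?> entityType, Serializable id, String idAsString) {
            // The shard depends only on the id, so the owning shard is known.
            return Set.of(getShardIdentifier(entityType, id, idAsString));
        }
    }

    public static void main(String[] args) {
        System.out.println("basic:    " + shardsForDeletion(new HashProvider(), Object.class, 42L, "42"));
        System.out.println("advanced: " + shardsForDeletion(new AdvancedHashProvider(), Object.class, 42L, "42"));
    }
}
```

The design point is that existing implementations keep compiling and simply get the broadcast behaviour; only users who opt into the extra interface get targeted deletes.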
'ForAddition' is misleading, as it's not used just for additions; I think we discussed this already.

>
> A test would be nice as well. Maybe if we see an actual example coded out we would have a better ground
> for discussion.

I'll make one, but I hope you won't be too severe: a fully fledged example would take a lot of time. I hope to find one that at least conveys the intuition, even if it's not the full JMS routing example I mentioned earlier, which IMHO is the strongest argument.

> Also, what are the concerns here? Performance, because I target all shards for deletion, or security, aka
> a deletion is sent to a shard which potentially belongs to a different customer.

Right, performance is a strong point, but when dealing with multitenancy you might also have legal requirements: DoSing a different tenant might be a violation of terms.

>
> What is the actual performance gain between the two different scenarios? A
> factor of 2, 4, 10?

It depends: the benefit is obviously proportional to how many shards you have and how often you delete :-) Remember that with NRT we can do writes relatively quickly, but a delete will always require a disk sync. A disk sync is a very strong barrier, of course, so we're more likely in the area of 3 to 4 orders of magnitude for a delete vs. a write. We would of course still have a delete, but on fewer indexes.

You might then think that the cost simply scales with the number of shards, but consider also that the above cost is not actually paid during the delete flush, but at query time: the query will trigger a pre-execution flush. So there are cases in which I might be sending deletes to index A and running queries on index B, with queries on B being *much* faster because no sync happens there. On top of this, you have to account for FieldCaches being invalidated because the index is dirty, or FullTextFilters which need to be recomputed unnecessarily. It can all pile up, with these effects interacting with each other, making quite an ugly difference.
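The argument that the delete cost surfaces at query time can be illustrated with a toy model. This is a hypothetical simulation, not Hibernate Search or Lucene code: the shard names A to D, the dirty flag and the sync counter are assumptions, it counts syncs rather than measuring time, and FieldCache or filter invalidation is not modelled.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Toy model (not Hibernate Search internals): a pending delete marks a shard
// dirty, and the next query on that shard pays for the disk sync first.
public class DeleteCostModel {

    static final class Shard {
        boolean dirty; // pending deletes not yet synced to disk
        int syncs;     // number of expensive disk syncs performed

        void delete() {
            dirty = true;
        }

        void query() {
            if (dirty) { // pre-execution flush: the delete cost surfaces here
                syncs++;
                dirty = false;
            }
        }
    }

    // Sends one delete to the given shards, then runs one query on shard "B".
    // Returns the total number of syncs triggered across all shards.
    static int run(Set<String> deleteTargets) {
        Map<String, Shard> shards = new LinkedHashMap<>();
        for (String name : new String[] {"A", "B", "C", "D"}) {
            shards.put(name, new Shard());
        }
        for (String target : deleteTargets) {
            shards.get(target).delete();
        }
        shards.get("B").query();
        int total = 0;
        for (Shard s : shards.values()) {
            total += s.syncs;
        }
        return total;
    }

    public static void main(String[] args) {
        // Broadcast: the entity lives on A, but every shard receives the delete.
        System.out.println("broadcast syncs: " + run(Set.of("A", "B", "C", "D")));
        // Targeted: only A receives the delete, so the query on B stays cheap.
        System.out.println("targeted syncs:  " + run(Set.of("A")));
    }
}
```

In the broadcast case the query on B is forced to sync even though B never held the deleted entity; with a targeted delete the sync on A is deferred until someone actually queries A.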
Granted, I'd have to write a very unrealistic test to highlight it, so let's keep it theoretical, but I think you can easily appreciate how it can make a significant difference in complex real-world applications.

Cool idea: considering all this, I guess an interesting use case is a strategy which always returns an empty set for deletions. To keep good performance during the day, some people disable our event listeners and rebuild the index overnight with the MassIndexer. I guess a nice trade-off would be to just skip delete work on the index: Hibernate Search won't return matches for entities that no longer exist anyway, and the garbage would be cleaned up by the overnight MassIndexer run, but at least you would still have updates applied in real time. It might need some validation, but considering the IO cost of deletions (and the indirect cost on filters and caches), I would seriously have considered such an approach as a user.

--Sanne
_______________________________________________
hibernate-dev mailing list
hibernate-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/hibernate-dev