What does Iterable<String> give you over String[]?
On Mon 2013-09-23 23:04, Sanne Grinovero wrote: > Correct me if I'm wrong, but trying to synthesize this discussion I > think that we're fundamentally agreeing that dynamic sharding is a > "better replacement" for static sharding. > Still, let's keep in mind that this needs to be a backwards compatible > patch, so we're not looking for something disruptive of the current > static sharding feature. If we end up agreeing that the better API > needs disruption, we should still make an incremental change available > in this version - as experimental - so that people can play with it, > then we're free to reset the API as we wish in 5.0 but at least > including the newly acquired experience. > > We then had some brainstorming on IRC which concluded that it would be > probably more user friendly to have the __Strategy to: > a) not return arrays but simple collections > b) avoid the confusion on the two too similar methods on the proposal > for ShardIdentifierProvider (next paragraph) > c) not apply the sharding logic based on the Document (fields) but on > the entity (the user type triggering the index event) > > As a reminder, the proposal for ShardIdentifierProvider is - omitting > initialize - : > > /** > * Returns the shard identifier upon addition. > */ > String getShardIdentifier(Class<?> entity, Serializable id, String > idInString, Document document); > > /** > * Returns the set of shard identifiers upon deletion. > */ > String[] getShardIdentifiers(Class<?> entity, Serializable id, > String idInString); > > /** > * Returns the set of shard identifiers for a query. > */ > String[] getShardIdentifiersForQuery(FullTextFilterImplementor[] > fullTextFilters); > > /** > * Returns the list of all known shard identifiers. > * The list can vary between calls. > */ > String[] getAllShardIdentifiers(); > > #a > Ok we can consider that but let's see how the code turns out. > Iterable<String> perhaps? > > #b > we explored converging the two methods into the essential one: > > String getShardIdentifier(Class<?> entity, Serializable id, String > idInString); > > but that seems very poor in terms of flexibility, it doesn't even > allow access to the basic properties of the indexed entity. So that > degenerated in the proposal #c, but before moving to #c I'd venture > that these methods aren't that bad, they just need good documentation. > Also, even if we move the focus from the Document to the Entity, we > still don't have the fully loaded entity during a delete operation, so > we would still need the second method returning multiple indexes. > > #c > drawbacks first: > = to expose the entity directly has probably some risk (the user > making changes to it), but that's no different to what the user can > already do from a FieldBridge / ClassBridge. > = won't be able to consider the output of FieldBridge / ClassBridge > instances as you won't have the Document > = for deletion you still don't have the entity > > We could list some conceptually interesting advantages here, but I'd > like to shut down this feature for the time being because it's > significantly different than the goal of providing Dynamic Sharding as > a feature. I'm not saying that it doesn't have merit: its probably > worth exploring for 5+ but it rather seems like an _additional_ level > of sharding that we might want to add in future as an alternative to > the one focusing on the Document approach. > > So assuming that sounds reasonable, and that we're on the same page > for a#, let's focus on b#: the duality of the methods for add/remove > during sharding. > My position is that it worked well so far on static sharding, and that > the proposal is quite consistent with it so wouldn't be much of a pain > for people to adapt the new model. Let's try polishing the method > names (and use your imagination for a well written javadoc): > > String getShardIdFromDocument(Class<?> entity, Serializable id, > String idInString, Document document); > > Iterable<String> getShardIdsFromId(Class<?> entity, Serializable > id, String idInString); > > or maybe to highlight what's fundamentally different: > > String getShardIdFromContext(Class<?> entity, Serializable id, > String idInString, Document document); > > Iterable<String> getShardIdsFromReducedContext(Class<?> entity, > Serializable id, String idInString); > > > I especially like the javadoc! well done. > WFYT? > > > Bonus dilemma: should we stay away from String and define some > "IndexIdentifier" interface ? Note that while design wise it might > look good, this is on a very hot path so while I'd like proposing such > an SPI it would be great if we could find a way in which this would > not require to allocate objects at runtime just to replace a mere > string with a safer type. > > Sanne > > > > On 23 September 2013 13:58, Hardy Ferentschik <ha...@hibernate.org> wrote: > > > > On 23 Jan 2013, at 1:55 PM, Sanne Grinovero <sa...@hibernate.org> wrote: > > > >>> Or I set a custom sharding strategy which does not care about the number > >>> of shards? > >> > >> I think that's far fetched. The NBR_OF_SHARDS option defines the size > >> of the array of indexes passed to the IndexShardingStrategy so it's > >> hard to ignore. Sure it's possible, we could throw a > >> log.userIsAnIdiotException() but someone might not see the humor :-). > >> Worst case it degenerates in a case similar to your example of > >> "NotShardedStrategy and nbr_of_shards >1", or the user would notice > >> with an ArrayOutOfBounds.. usercode problem. > > > > I think I am thinking about a custom dynamic sharding strategy in this case. > > > >>> IMO the important factor is to set the right sharding strategy and > >>> nbr_of_shards should just be a (optional) parameter to the sharding > >>> strategy. > >> > >> Note that so far we don't expect users to explicitly set the > >> NotShardedStrategy: it's simply a consequence of not having set any > >> option; if the user sets only the number of shards but omits picking a > >> specific strategy, we automatically assume he's going for the > >> IdHashShardingStrategy > > > > I get that, but as I said, I see it as a bit of misguided "ease of use", > > since it can introduce > > problems (see above) and is not consistent. > > > >> As soon as a different IndexShardingStrategy is chosen, then I think > >> it's quite self-explanatory that setting NBR_OF_SHARDS is quite > >> useful: the user will have coded an explicit IndexShardingStrategy and > >> consequentially have a clear idea of how many shards he wants, at > >> least for the static sharding so far. > > > > Right, for static sharding maybe, but we are going to make this dynamic now. > > > >>> With dynamic sharding things get more complicated. Right now you > >>> configure dynamic sharding by setting 'nbr_of_shards' to the literal > >>> 'dynamic'. This selects under the hood the > >>> right IndexShardingStrategy (DynamicShardingStrategy). I find it > >>> misleading on multiple levels. First 'dynamic' is not a number and > >>> secondly I want to configure a strategy > >>> not the number of shards. It is also inconsistent with how we > >>> select/configure other pluggable components in Search. For that reason I > >>> suggest: > >>> > >>> - The type of sharding is configured via setting > >>> hibernate.search.[indexName].sharding_strategy. 'nbr_of_shards' is a > >>> parameter which gets passed to the strategy and which > >>> might get ignored depending on the sharding implementation. > >>> Implementations are free to check the property and e.g. print out a > >>> warning if the settings does not apply to them > >> > >> Conceptually it sounds nice. > >> I see two downsides: > >> - it pushes complexity to the IndexShardingStrategy implementor (the > >> user) as he needs to parse it and somehow he needs to request those > >> indexes from the SearchFactory to be built. Pushing both these > >> responsibilities to the end user in exchange for a one-liner in the > >> configuration file seems like an odd choice? I would agree if it was > >> us to write the code, but I really expect most people to plug their > >> own strategy as IdHashShardingStrategy isn't very useful in a real > >> world app. > > > > I think that if the API to create an index manager is simple it basically > > unifies static > > and dynamic sharding. > > > >> - today we pre-initialize the indexes (IndexManagers) before they are > >> passed to the IndexShardingStrategy # initialize method. We would need > >> to pass instead some lifecycle-controlling objects which allows the > >> user to trigger index initialization. Again I essentially agree but > >> that sounds much like dynamic sharding? > > > > With the right API both would be possible in a simple to understand way. > > > >> I don't think we can change these in the scope of 4.4 as it affects > >> current API. Shall we take this inconsistency point as > >> yet-another-reason to migrate to Dynamic Sharding? While the new > >> feature matures, I suspect it could completely replace the static one. > > > > I think so as well, in which case we need to make sure that we get the API > > right. A > > new/updated initialise contract might be exactly what we need instead of > > yet another patch. > > See also my email regarding ShardIdentifierProvider. > > > >>> - We introduce short names for the provided sharding strategies - 'none', > >>> 'id-hash', 'dynamic'. This will avoid the need to reference concrete > >>> implementation classes > >> > >> -1 : as I reminded above, I don't expect id-hash to be of practical > >> use, people want to plug their own strategies which implies we need > >> the concrete implementation classes. I'd rather see the > >> IdHashShardingStrategy as a concrete example we're providing (not just > >> an example, I guess someone might find it useful in production, I just > >> think it's a minority of the IndexShardingStrategy users). > > > > IdHashShardingStrategy is in use right now, at least if you enable sharding > > without any other specific > > IndexShardingStrategy implementation. Providing a short name of it is > > inline with configuration options > > like 'ram' or 'filesystem' for directory provides. We could use 'default' > > to hide the fact which impl > > we are using. This way we could even replace the impl in case we find a > > better one. > > > >>> - For dynamic sharding we have the additional sub-property > >>> 'shard_identity_provider' which specifies the ShardIdentifierProvider > >>> (new contract needed for dynamic sharding). > >>> This property is only relevant for dynamic sharding and will be handled > >>> in the same way as 'nbr_of_shards' > >> > >> To recap today we have > >> hibernate.search.[indexName].sharding_strategy = [implementation > >> of IndexShardingStrategy] > >> > >> Would it not be nice if I could either specify an implementation of > >> IndexShardingStrategy or a ShardIdentifierProvider ? > >> hibernate.search.[indexName].sharding_strategy = [implementation > >> of IndexShardingStrategy | ShardIdentifierProvider] > > > > hmm, I have not thought about it this way. So far I was more thinking along > > the > > lines of removing ShardIdentifierProvider. But you are proposing to keep it > > and maybe in the long run remove IndexShardingStrategy? > > > >> - being the same property keeps it clear that you can either specify > >> one OR the other. > > > > Except that we are talking about to different interfaces. Hardly good > > practice to offer this type of > > confguration. > > > > --Hardy > _______________________________________________ > hibernate-dev mailing list > hibernate-dev@lists.jboss.org > https://lists.jboss.org/mailman/listinfo/hibernate-dev _______________________________________________ hibernate-dev mailing list hibernate-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/hibernate-dev