Hi,

I am currently working on HSEARCH-472 [1], dynamic sharding. The work is built
on Emmanuel's prototype, and you can find the current code on my fork [2].

Right now I am wondering how to configure (dynamic) sharding. Here is how
things worked prior to dynamic sharding. Basically, there are two properties
driving the shard configuration:

- hibernate.search.[indexName].sharding_strategy
- hibernate.search.[indexName].sharding_strategy.nbr_of_shards

The first property determines the implementation class of IndexShardingStrategy 
and the second the number of shards to create. So far we had two implementations
of IndexShardingStrategy, namely NotShardedStrategy and IdHashShardingStrategy. 
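
To illustrate what "shard depending on the number of shards" means for the
id-hash case, here is a rough Java sketch. It assumes that the string form of
the entity id is hashed and reduced modulo nbr_of_shards; IdHashRouting and
shardFor are hypothetical names, and this is not the actual source of
IdHashShardingStrategy.

```java
/**
 * Hypothetical sketch of id-hash sharding (assumption: hash of the id's
 * string form, modulo the number of shards). Not Hibernate Search source.
 */
class IdHashRouting {

    /** Picks a shard index in [0, nbrOfShards) for the given id. */
    static int shardFor(String idInString, int nbrOfShards) {
        if (nbrOfShards < 1) {
            throw new IllegalArgumentException("nbr_of_shards must be >= 1");
        }
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(idInString.hashCode(), nbrOfShards);
    }

    public static void main(String[] args) {
        for (String id : new String[] {"1", "2", "42", "book-7"}) {
            System.out.println(id + " -> shard " + shardFor(id, 4));
        }
    }
}
```

In this sketch a single shard sends every id to shard 0, i.e. it degenerates
to the unsharded case.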

To configure sharding, it was enough to set nbr_of_shards to a value > 1. This
would automatically select IdHashShardingStrategy and shard according to the
configured number of shards. The idea was to make it simple for the user and
only require a single configuration change to enable sharding.
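
A rough Java model of the selection rules just described, to make the old
behaviour concrete. It is a simplified, hypothetical sketch: only the two
property names come from the configuration above; LegacyShardingSelection and
selectStrategy are made-up names, the strategy is returned as a plain string
instead of being instantiated, and treating a missing nbr_of_shards as one
shard is an assumption.

```java
import java.util.Properties;

/** Hypothetical model of the pre-dynamic-sharding selection rules. */
class LegacyShardingSelection {

    static final String PREFIX = "hibernate.search.";

    /** Returns the name of the strategy that would be used for the index. */
    static String selectStrategy(Properties props, String indexName) {
        String strategyKey = PREFIX + indexName + ".sharding_strategy";
        String shardsKey = strategyKey + ".nbr_of_shards";

        String explicitStrategy = props.getProperty(strategyKey);
        if (explicitStrategy != null) {
            // An explicit strategy wins; nbr_of_shards is not looked at here.
            return explicitStrategy;
        }
        // Assumption: an absent nbr_of_shards means a single shard.
        int nbrOfShards = Integer.parseInt(props.getProperty(shardsKey, "1").trim());
        // Implicit rule: more than one shard switches on id-hash sharding.
        return nbrOfShards > 1 ? "IdHashShardingStrategy" : "NotShardedStrategy";
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("hibernate.search.Book.sharding_strategy.nbr_of_shards", "4");
        System.out.println(selectStrategy(props, "Book"));
    }
}
```

Note that once sharding_strategy is set explicitly, this logic never consults
nbr_of_shards, so a combination like NotShardedStrategy with four shards passes
without any complaint.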

However, it creates inconsistencies. For example, what if I select
NotShardedStrategy and set nbr_of_shards > 1? Or I set a custom sharding
strategy which does not care about the number of shards?
IMO the important factor is to set the right sharding strategy, and
nbr_of_shards should just be an (optional) parameter to the sharding strategy.

With dynamic sharding, things get more complicated. Right now you configure
dynamic sharding by setting 'nbr_of_shards' to the literal 'dynamic'. Under the
hood, this selects the right IndexShardingStrategy (DynamicShardingStrategy). I
find this misleading on multiple levels. First, 'dynamic' is not a number, and
second, I want to configure a strategy, not the number of shards. It is also
inconsistent with how we select/configure other pluggable components in Search.
For that reason I suggest:

- The type of sharding is configured via
  hibernate.search.[indexName].sharding_strategy. 'nbr_of_shards' is a
  parameter which gets passed to the strategy and which might get ignored,
  depending on the sharding implementation. Implementations are free to check
  the property and, e.g., print out a warning if the setting does not apply
  to them.
- We introduce short names for the provided sharding strategies: 'none',
  'id-hash' and 'dynamic'. This will avoid the need to reference concrete
  implementation classes.
- For dynamic sharding we have the additional sub-property
  'shard_identity_provider', which specifies the ShardIdentifierProvider (a new
  contract needed for dynamic sharding). This property is only relevant for
  dynamic sharding and will be handled in the same way as 'nbr_of_shards'.
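
A minimal Java sketch of the proposal, under several assumptions:
ShardingStrategy is a simplified stand-in for IndexShardingStrategy (not its
real signature), the full key for the provider is assumed to be
hibernate.search.[indexName].sharding_strategy.shard_identity_provider by
analogy with nbr_of_shards, 'none' is assumed as the default, and loading
custom strategies by class name is left out. Each strategy receives its own
sub-properties and decides whether they apply.

```java
import java.util.Map;
import java.util.Properties;
import java.util.function.Supplier;

/** Hypothetical sketch of the proposed configuration; not Hibernate Search API. */
class ProposedShardingConfig {

    /** Simplified stand-in for IndexShardingStrategy. */
    interface ShardingStrategy {
        void initialize(Properties strategyProperties);
        String describe();
    }

    static class NoSharding implements ShardingStrategy {
        public void initialize(Properties p) {
            // nbr_of_shards makes no sense here, so only warn about it.
            if (p.getProperty("nbr_of_shards") != null) {
                System.err.println("WARN: nbr_of_shards is ignored by 'none'");
            }
        }
        public String describe() { return "none"; }
    }

    static class IdHashSharding implements ShardingStrategy {
        private int shards;
        public void initialize(Properties p) {
            // Assumption: default to a single shard when the parameter is absent.
            shards = Integer.parseInt(p.getProperty("nbr_of_shards", "1").trim());
        }
        public String describe() { return "id-hash(" + shards + ")"; }
    }

    static class DynamicSharding implements ShardingStrategy {
        private String provider;
        public void initialize(Properties p) {
            provider = p.getProperty("shard_identity_provider");
            if (provider == null) {
                throw new IllegalArgumentException("'dynamic' requires shard_identity_provider");
            }
        }
        public String describe() { return "dynamic(" + provider + ")"; }
    }

    /** Short names for the built-in strategies. */
    static final Map<String, Supplier<ShardingStrategy>> SHORT_NAMES = Map.of(
            "none", NoSharding::new,
            "id-hash", IdHashSharding::new,
            "dynamic", DynamicSharding::new);

    static ShardingStrategy create(Properties config, String indexName) {
        String key = "hibernate.search." + indexName + ".sharding_strategy";
        String name = config.getProperty(key, "none"); // assumed default
        Supplier<ShardingStrategy> factory = SHORT_NAMES.get(name);
        if (factory == null) {
            // Class-name lookup for custom strategies is omitted in this sketch.
            throw new IllegalArgumentException("Unknown sharding strategy: " + name);
        }
        // Pass only the strategy's sub-properties, with the prefix stripped.
        Properties strategyProps = new Properties();
        String prefix = key + ".";
        for (String k : config.stringPropertyNames()) {
            if (k.startsWith(prefix)) {
                strategyProps.setProperty(k.substring(prefix.length()), config.getProperty(k));
            }
        }
        ShardingStrategy strategy = factory.get();
        strategy.initialize(strategyProps);
        return strategy;
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty("hibernate.search.Book.sharding_strategy", "id-hash");
        config.setProperty("hibernate.search.Book.sharding_strategy.nbr_of_shards", "4");
        System.out.println(create(config, "Book").describe());
    }
}
```

Keeping the short-name table separate from the strategies means a new built-in
strategy needs only one more entry, while custom implementations would still be
referenced by class name.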

Thoughts?

--Hardy


[1] https://hibernate.atlassian.net/browse/HSEARCH-472
[2] https://github.com/hferentschik/hibernate-search/compare/HSEARCH-472
_______________________________________________
hibernate-dev mailing list
hibernate-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/hibernate-dev