Got it, makes sense.

On Fri, Jan 25, 2019 at 11:06 AM Anton Kalashnikov <kaa....@yandex.ru> wrote:
> Vladimir, thanks for your notes. Both of them look good, but I have
> two different thoughts about them.
>
> I think I agree about enabling only one of manual/auto adjustment. It is
> simpler than the current solution, and as an extra feature we can allow
> users to force the task to execute (if they don't want to wait until the
> timeout expires).
> But about the second one, I am not sure that one parameter instead of two
> would be more convenient. For example: if a user changes the timeout and
> then disables auto-adjust, anyone who later wants to re-enable it has to
> know what the timeout value was before auto-adjust was disabled. I think
> the "negative value" pattern is a good choice for parameters that are
> always in use, like a connection timeout (e.g. -1 means wait forever),
> but in our case we want to disable the whole functionality rather than
> change a parameter value.
>
> --
> Best regards,
> Anton Kalashnikov
>
>
> 24.01.2019, 22:03, "Vladimir Ozerov" <voze...@gridgain.com>:
> > Hi Anton,
> >
> > This is a great feature, but I am a bit confused about automatically
> > disabling the feature during manual baseline adjustment. This may lead
> > to unpleasant situations when a user enabled auto-adjustment, then
> > re-adjusted it manually somehow (e.g. from some previously created
> > script) so that auto-adjustment disabling went unnoticed, then added
> > more nodes hoping that auto-baseline is still active, etc.
> >
> > Instead, I would rather make manual and auto adjustment mutually
> > exclusive - the baseline cannot be adjusted manually when auto mode is
> > set, and vice versa. If an exception is thrown in those cases,
> > administrators will always know the current behavior of the system.
> >
> > As for configuration, wouldn’t it be enough to have a single long
> > value as opposed to Boolean + long? Say, 0 - immediate auto adjustment,
> > negative - disabled, positive - auto adjustment after timeout.
> >
> > Thoughts?
> >
> > Thu, Jan 24, 2019 at 18:33, Anton Kalashnikov <kaa....@yandex.ru>:
> >
> >> Hello, Igniters!
> >>
> >> Work on Phase II of IEP-4 (Baseline topology) [1] has started. I want
> >> to start a discussion of the implementation of "Baseline auto-adjust"
> >> [2].
> >>
> >> The "Baseline auto-adjust" feature implements a mechanism that
> >> automatically adjusts the baseline to match the current topology after
> >> a node join/left event occurs. It is required because when a node
> >> leaves the grid and nobody changes the baseline manually, it can lead
> >> to lost data (if more nodes leave the grid, depending on the backup
> >> factor), but permanent tracking of the grid is not always
> >> possible/desirable. It looks like in many cases auto-adjusting the
> >> baseline after some timeout is very helpful.
> >>
> >> Distributed metastore [3] (it is already done):
> >>
> >> First of all, we need the ability to store configuration data
> >> consistently and cluster-wide. Ignite doesn't have any specific API for
> >> such configurations, and we don't want to have many similar
> >> implementations of the same feature in our code. After some thought, it
> >> was proposed to implement it as a kind of distributed metastorage that
> >> gives the ability to store any data in it.
> >> The first implementation is based on the existing local metastorage API
> >> for persistent clusters (in-memory clusters will store data in memory).
> >> Write/remove operations use Discovery SPI to send updates to the
> >> cluster; it guarantees the order of updates and the fact that all
> >> existing (alive) nodes have handled the update message. As a way to
> >> find out which node has the latest data, there is a "version" value of
> >> the distributed metastorage, which is basically <number of all updates,
> >> hash of updates>. The update history back to some point in the past is
> >> stored along with the data, so when an outdated node connects to the
> >> cluster it will receive all the missing data and apply it locally.
> >> If there's not enough history stored or the joining node is clean,
> >> then it'll receive a snapshot of the distributed metastorage so there
> >> won't be inconsistencies.
> >>
> >> Baseline auto-adjust:
> >>
> >> Main scenario:
> >>     - There is a grid whose baseline is equal to the current topology
> >>     - A new node joins the grid or some node leaves (fails)
> >>     - The new mechanism detects this event and adds a task for
> >>       changing the baseline to a queue with the configured timeout
> >>     - If a new event happens before the baseline is changed, the task
> >>       is removed from the queue and a new task is added
> >>     - When the timeout expires, the task tries to set a new baseline
> >>       corresponding to the current topology
> >>
> >> First of all we need to add two parameters [4]:
> >>     - baselineAutoAdjustEnabled - enable/disable the "Baseline
> >>       auto-adjust" feature.
> >>     - baselineAutoAdjustTimeout - timeout after which the baseline
> >>       should be changed.
> >>
> >> These parameters are cluster-wide and can be changed at runtime because
> >> they are based on the "Distributed metastore". Initially these
> >> parameters are initialized from the corresponding parameters
> >> (initBaselineAutoAdjustEnabled, initBaselineAutoAdjustTimeout) in
> >> "Ignite Configuration". An init value is valid only until the parameter
> >> is first changed; after the value is changed, it is stored in the
> >> "Distributed metastore".
> >>
> >> Restrictions:
> >>     - This mechanism handles events only on an active grid
> >>     - If baselineNodes != gridNodes on activation, this feature will
> >>       be disabled
> >>     - If lost partitions are detected, this feature will be disabled
> >>     - If the baseline is adjusted manually with baselineNodes !=
> >>       gridNodes, this feature will be disabled
> >>
> >> You can find a draft implementation here [5]. Feel free to ask for
> >> more details and make suggestions.
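A rough sketch of the main scenario quoted above, in case it helps the discussion. This is not the code from the PR: the class name BaselineAutoAdjustSketch, the currentTopology supplier and the setBaseline callback are made up for illustration, node IDs are plain strings, and the restrictions (active grid only, lost partitions, manual adjustment) are left out. It only shows the "newer event replaces the queued task" behaviour.

```java
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical sketch, not Ignite code: delays a baseline change until the topology is quiet. */
class BaselineAutoAdjustSketch implements AutoCloseable {
    /** Single daemon timer thread, so a forgotten close() cannot keep the JVM alive. */
    private final ScheduledExecutorService timer =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "baseline-auto-adjust-sketch");
            t.setDaemon(true);
            return t;
        });

    private final Supplier<Set<String>> currentTopology; // assumed source of current node IDs
    private final Consumer<Set<String>> setBaseline;     // assumed baseline setter

    private boolean enabled;            // stands in for baselineAutoAdjustEnabled
    private long timeoutMs;             // stands in for baselineAutoAdjustTimeout
    private ScheduledFuture<?> pending; // at most one queued task

    BaselineAutoAdjustSketch(boolean enabled, long timeoutMs,
        Supplier<Set<String>> currentTopology, Consumer<Set<String>> setBaseline) {
        this.enabled = enabled;
        this.timeoutMs = timeoutMs;
        this.currentTopology = currentTopology;
        this.setBaseline = setBaseline;
    }

    /** Runtime reconfiguration; affects tasks scheduled by later events. */
    synchronized void configure(boolean enabled, long timeoutMs) {
        this.enabled = enabled;
        this.timeoutMs = timeoutMs;
    }

    /** Call on every node join/left event. */
    synchronized void onTopologyEvent() {
        // A newer event removes the task queued by the previous one.
        if (pending != null)
            pending.cancel(false);

        pending = null;

        if (!enabled)
            return;

        // When the timeout expires, set the baseline to whatever the topology is at that moment.
        pending = timer.schedule(
            () -> setBaseline.accept(currentTopology.get()),
            timeoutMs, TimeUnit.MILLISECONDS);
    }

    @Override public void close() {
        timer.shutdownNow();
    }
}
```

With two events arriving inside the timeout window, only the second one results in a baseline change, and it uses the topology as seen at expiry time.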
> >>
> >> [1] https://cwiki.apache.org/confluence/display/IGNITE/IEP-4+Baseline+topology+for+caches
> >> [2] https://issues.apache.org/jira/browse/IGNITE-8571
> >> [3] https://issues.apache.org/jira/browse/IGNITE-10640
> >> [4] https://issues.apache.org/jira/browse/IGNITE-8573
> >> [5] https://github.com/apache/ignite/pull/5907
> >>
> >> --
> >> Best regards,
> >> Anton Kalashnikov
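On the single long versus Boolean + long question raised earlier in this thread, a small sketch may make the trade-off concrete. Everything here is hypothetical (the AutoAdjustSettings class, its method names, and the choice of -1 and 0 as placeholder values are assumptions for illustration, not anything from the PR). It maps the two-parameter form onto the proposed single-long form (negative = disabled, 0 = immediate, positive = timeout) and back.

```java
/** Hypothetical sketch comparing the two configuration encodings discussed in the thread. */
final class AutoAdjustSettings {
    final boolean enabled; // stands in for baselineAutoAdjustEnabled
    final long timeoutMs;  // stands in for baselineAutoAdjustTimeout

    AutoAdjustSettings(boolean enabled, long timeoutMs) {
        if (timeoutMs < 0)
            throw new IllegalArgumentException("timeoutMs must be >= 0: " + timeoutMs);

        this.enabled = enabled;
        this.timeoutMs = timeoutMs;
    }

    /** Two-parameter form: toggling the flag keeps the timeout untouched. */
    AutoAdjustSettings withEnabled(boolean enabled) {
        return new AutoAdjustSettings(enabled, timeoutMs);
    }

    /** Single-long form: negative = disabled, 0 = immediate, positive = timeout. */
    long toSingleLong() {
        return enabled ? timeoutMs : -1L; // -1 is an arbitrary choice of negative value
    }

    /** Decoding a disabled value cannot recover the old timeout; 0 is used as a placeholder. */
    static AutoAdjustSettings fromSingleLong(long value) {
        return value < 0
            ? new AutoAdjustSettings(false, 0L)
            : new AutoAdjustSettings(true, value);
    }
}
```

In this sketch, disabling and re-enabling through withEnabled keeps the previous timeout, while a round trip through the single-long form while disabled does not. Whether that matters in practice is exactly the point under discussion.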