Re: Service grid redesign

Vyacheslav Daradur Wed, 04 Apr 2018 03:10:26 -0700

Denis, thanks for the link.

I looked through the task, and I think I understand the point of your redesign now.


Do you have a clear plan or IEP for the whole redesign?

I'm interested in this component and I'd like to take part in the development.



On Mon, Apr 2, 2018 at 2:55 PM, Denis Mekhanikov <[email protected]> wrote:
> Vyacheslav,
>
> The service deployment design based on the replicated utility cache has proven
> to be unstable and deadlock-prone.
> You can find a list of the related JIRA issues in my previous message.
>
> The intention behind the redesign is similar to that of the binary metadata
> redesign, which was done in the following ticket: IGNITE-4157
> <https://issues.apache.org/jira/browse/IGNITE-4157>
> This change in the service deployment procedure will eliminate the need for
> another internal replicated cache and make service deployment more reliable
> on an unstable topology.
>
> Denis
>
> On Tue, Mar 27, 2018 at 23:21, Vyacheslav Daradur <[email protected]> wrote:
>
>> Hi, Denis Mekhanikov!
>>
>> As far as I know, Ignite services are based on IgniteCache, so we have
>> all of its features available. We can use listeners or continuous queries
>> for deployment synchronization.
>>
>> Why do you want to use the discovery layer for that?
>>
>> One more thing: we can use a baseline approach for services. That means
>> *IgniteService.deploy()* returns a ready-to-work service after
>> deployment on the baseline nodes and deploys it to other nodes on demand,
>> for example when the load on the deployed service becomes high.
>>
>> About versioning, maybe it makes sense to extend the public API:
>> IgniteServices.service(name, *version*)?
>>
>> On the first deployment, we can compute the service's hashcode (just as
>> an example) and store it. On a new deployment request for a service with
>> an existing name, we compute the new service's hashcode and compare the
>> two. If the hashcodes differ, we deploy the new service as a service
>> with a different version.
>>
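To make the hashcode comparison above concrete, here is a minimal sketch in plain Java. Everything in it is an assumption made for illustration: `ServiceVersionRegistry` and `register` are hypothetical names, not part of the Ignite API, and the fingerprint is simply `Arrays.hashCode` over whatever bytes represent the service (for example, its serialized form).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical registry for illustration only; not part of the Ignite API. */
final class ServiceVersionRegistry {
    /** Service name -> fingerprints of the versions seen so far, in deployment order. */
    private final Map<String, List<Integer>> fingerprints = new HashMap<>();

    /**
     * Records a deployment request and returns the 1-based version number.
     * A request whose bytes match an already known version returns that
     * version; otherwise a new version is appended.
     */
    synchronized int register(String name, byte[] serviceBytes) {
        // Assumption: a plain hashcode is used as the fingerprint. It can
        // collide, so a real implementation would likely want a stronger digest.
        Integer fingerprint = Arrays.hashCode(serviceBytes);

        List<Integer> known = fingerprints.computeIfAbsent(name, k -> new ArrayList<>());

        int idx = known.indexOf(fingerprint);
        if (idx >= 0)
            return idx + 1; // Same code as a known version.

        known.add(fingerprint);
        return known.size(); // Different code under an existing name: new version.
    }
}
```

Calling `register` twice with the same name and the same bytes yields the same version, while different bytes under the same name yield the next version number.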
>>
>> On Fri, Mar 23, 2018 at 10:03 PM, Denis Magda <[email protected]> wrote:
>> > Denis,
>> >
>> > Thanks for the extensive analysis. There is vast room for optimization
>> > on the service grid side.
>> >
>> > Yakov, Sam, Alex G.,
>> >
>> > How do you like the idea of using the discovery protocol to exchange
>> > service grid system messages? Any pitfalls?
>> >
>> >
>> > --
>> > Denis
>> >
>> >
>> > On Fri, Mar 23, 2018 at 8:01 AM, Denis Mekhanikov <[email protected]> wrote:
>> >
>> >> Igniters,
>> >>
>> >> I'd like to start a discussion on Ignite service grid redesign.
>> >> We have a number of problems in our current architecture that have to be
>> >> addressed.
>> >>
>> >> Here are the most severe ones:
>> >>
>> >> One of them is the lack of a guarantee that a service is successfully
>> >> deployed and ready for work by the time the *IgniteService.deploy*()*
>> >> methods return.
>> >> Furthermore, if an exception is thrown from the *Service.init()* method,
>> >> the deploying side is not able to receive it, or even to learn that the
>> >> service is in an unusable state.
>> >> So you may end up in a situation where you deployed a service without
>> >> receiving any errors, then called one of the service's methods, and hung
>> >> indefinitely on this invocation.
>> >> JIRA ticket: https://issues.apache.org/jira/browse/IGNITE-3392
>> >>
>> >> Another problem is locking during service deployment on an unstable
>> >> topology.
>> >> This issue is caused by missing updates in continuous query listeners on
>> >> the internal cache.
>> >> It is hard to reproduce, but it does happen sometimes. We shouldn't allow
>> >> deployment methods to hang without reporting anything.
>> >> JIRA ticket: https://issues.apache.org/jira/browse/IGNITE-6259
>> >>
>> >> I think we should change the deployment procedure to make it more
>> >> reliable.
>> >> Moving from operating on the internal replicated service cache to sending
>> >> custom discovery events seems to be a good idea.
>> >> Service deployment may trigger a discovery event that makes the chosen
>> >> nodes deploy the service, and the same event will notify other nodes
>> >> about the deployed service instances.
>> >> This will eliminate the need for distributed transactions on the internal
>> >> replicated system cache and make the service deployment protocol more
>> >> transparent.
>> >>
>> >> There are a few points that should be taken into account, though.
>> >>
>> >> First of all, we can't wait for services to be deployed and initialised
>> >> in the discovery thread.
>> >> So we need to make the notification about the service deployment result
>> >> asynchronous, presumably over the communication protocol.
>> >> I can think of a procedure similar to the current exchange protocol, where
>> >> service deployment is initiated by an initial discovery message,
>> >> followed by asynchronous notifications from the hosting servers over
>> >> communication. Finally, one more discovery message will notify all
>> >> nodes about the service deployment result and the location of the deployed
>> >> service instances. The coordinator will be responsible for collecting the
>> >> deployment results in this scheme.
>> >>
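The coordinator's role in the scheme above can be sketched in plain Java as follows. This is only an illustration under stated assumptions: `DeploymentResultCollector`, `onResult`, `onNodeLeft`, and `future` are hypothetical names, not Ignite classes or methods, and the discovery and communication layers are reduced to plain method calls. The collector completes once every chosen node has either reported a result or left the topology, which is the point at which the final discovery message could be sent.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;

/** Hypothetical coordinator-side helper for illustration; not part of the Ignite API. */
final class DeploymentResultCollector {
    /** Nodes that were chosen to host the service and have not reported yet. */
    private final Set<UUID> pending;

    /** Node ID -> error message, for nodes where deployment failed. */
    private final Map<UUID, String> errors = new HashMap<>();

    /** Completed with the collected errors once no node is pending. */
    private final CompletableFuture<Map<UUID, String>> done = new CompletableFuture<>();

    DeploymentResultCollector(Set<UUID> chosenNodes) {
        pending = new HashSet<>(chosenNodes);
        completeIfReady(); // Nothing to wait for if no nodes were chosen.
    }

    /** Called when a hosting node reports its result; error is null on success. */
    synchronized void onResult(UUID nodeId, String error) {
        if (!pending.remove(nodeId))
            return; // Duplicate or unexpected report: ignore it.

        if (error != null)
            errors.put(nodeId, error);

        completeIfReady();
    }

    /** Called when a chosen node leaves the topology before reporting. */
    synchronized void onNodeLeft(UUID nodeId) {
        onResult(nodeId, "node left the topology before reporting a result");
    }

    /** Future that the caller can use to trigger the final notification. */
    CompletableFuture<Map<UUID, String>> future() {
        return done;
    }

    private void completeIfReady() {
        if (pending.isEmpty())
            done.complete(new HashMap<>(errors));
    }
}
```

An empty map from `future()` means every chosen node deployed the service successfully. A non-empty map carries the per-node failures that the deploying side currently has no way to see.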
>> >> Another problem is failover when some nodes fail during deployment or
>> >> afterwards.
>> >> The following cases should be handled:
>> >>
>> >>    1. coordinator failure during deployment;
>> >>    2. failure, during deployment, of nodes that were chosen to host the
>> >>    service;
>> >>    3. failure, after deployment, of nodes that host deployed services.
>> >>
>> >> The first case may be resolved either by continuing the deployment with a
>> >> new coordinator or by cancelling it.
>> >> The second case will require another node to be chosen and notified.
>> >> Maybe another discovery message will be needed.
>> >> The third case will require redeployment, so the coordinator should track
>> >> topology changes and redeploy failed services.
>> >>
>> >> Another good improvement would be service versioning. This matter was
>> >> already discussed in another thread:
>> >>
>> >> http://apache-ignite-developers.2346864.n4.nabble.com/Service-versioning-td20858.html
>> >> Let's resume this discussion and state the final decision here.
>> >> This feature is closely connected to peer class loading, which currently
>> >> does not work for services.
>> >> So service versioning should be implemented along with peer class loading.
>> >> JIRA ticket for versioning:
>> >> https://issues.apache.org/jira/browse/IGNITE-6069
>> >> Peer class loading: https://issues.apache.org/jira/browse/IGNITE-975
>> >>
>> >> Please share your thoughts. Constructive criticism is highly appreciated.
>> >>
>> >> Denis
>> >>
>>
>>
>>
>> --
>> Best Regards, Vyacheslav D.
>>



-- 
Best Regards, Vyacheslav D.
