Denis, thanks for the link. I looked through the task and I think I understand your redesign point now.
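If I read the proposal quoted below correctly, the coordinator would have to track, for each deployment, which of the chosen nodes have reported back, and react when nodes leave. Here is a minimal sketch of that bookkeeping, just to check my understanding. `DeploymentTracker` and all of its methods are hypothetical names of mine, not existing Ignite APIs, and the discovery and communication messages themselves are left out.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical coordinator-side state for one service deployment (not an Ignite API). */
final class DeploymentTracker {
    enum State { IN_PROGRESS, DEPLOYED, FAILED }

    /** Chosen nodes that have not reported a result yet. */
    private final Set<String> pending;
    /** Nodes that reported a successful init. */
    private final Set<String> deployed = new HashSet<>();
    /** Node id to error text for failed inits or lost nodes. */
    private final Map<String, String> errors = new HashMap<>();

    /** Assumed to be created when the initial discovery message is processed. */
    DeploymentTracker(Set<String> chosenNodeIds) {
        this.pending = new HashSet<>(chosenNodeIds);
    }

    /** A hosting node reports its result asynchronously; a null error means success. */
    void onResult(String nodeId, String initError) {
        if (!pending.remove(nodeId))
            return; // Stale or duplicate report, ignore it.
        if (initError == null)
            deployed.add(nodeId);
        else
            errors.put(nodeId, initError);
    }

    /**
     * A node left the topology, either before reporting or after deployment.
     * A null replacement means no other node could be chosen.
     */
    void onNodeLeft(String nodeId, String replacementNodeId) {
        // Non-short-circuit OR on purpose: the node must be removed from both sets.
        boolean involved = pending.remove(nodeId) | deployed.remove(nodeId);
        if (!involved)
            return;
        if (replacementNodeId != null)
            pending.add(replacementNodeId); // Wait for the replacement's report.
        else
            errors.put(nodeId, "node left and no replacement was available");
    }

    /** The state that the final discovery message would announce to all nodes. */
    State state() {
        if (!pending.isEmpty())
            return State.IN_PROGRESS;
        return errors.isEmpty() ? State.DEPLOYED : State.FAILED;
    }

    Set<String> deployedNodes() {
        return new HashSet<>(deployed);
    }

    Map<String, String> errors() {
        return new HashMap<>(errors);
    }
}
```

Coordinator failure is not covered here: a new coordinator would need to rebuild this state or cancel the deployment, and I don't know which of the two you prefer.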
Do you have a clear plan or IEP for the whole redesign? I'm interested in this component and I'd like to take part in the development.

On Mon, Apr 2, 2018 at 2:55 PM, Denis Mekhanikov <dmekhani...@gmail.com> wrote:
> Vyacheslav,
>
> Service deployment design, based on replicated utility cache has proven to
> be unstable and deadlock-prone.
> You can find a list of JIRA issues, connected to it, in my previous letter.
>
> The intention behind it is similar to the binary metadata redesign, that
> happened in the following ticket: IGNITE-4157
> <https://issues.apache.org/jira/browse/IGNITE-4157>
> This change in service deployment procedure will eliminate need for another
> internal replicated cache
> and make service deployment more reliable on unstable topology.
>
> Denis
>
> Tue, Mar 27, 2018 at 23:21, Vyacheslav Daradur <daradu...@gmail.com>:
>
>> Hi, Denis Mekhanikov!
>>
>> As far as I know, Ignite services are based on IgniteCache and we have
>> all its features. We can use listeners or continuous queries for
>> deployment synchronizations.
>>
>> Why do you want to use the discovery layer for that?
>>
>> One more thing: we can use baseline approach for services, that means
>> *IgniteService.deploy()* returns ready to work service after
>> deployment on baseline nodes and deploy to other nodes on demand, for
>> example when deployed service's loading will be high.
>>
>> About versioning, maybe there is sense to extend public API:
>> IgniteServices.service(name, *version*)?
>>
>> At first deployment, we can compute service's hashcode (just for an
>> example) and store it, after new deployment request for services with
>> an existing name we will compute new service's hashcode and compare
>> them; if they have different hashcodes, then we will deploy new service
>> as service with a different version.
>>
>>
>> On Fri, Mar 23, 2018 at 10:03 PM, Denis Magda <dma...@apache.org> wrote:
>> > Denis,
>> >
>> > Thanks for the extensive analysis.
>> > There is a vast room for optimizations
>> > on the service grid side.
>> >
>> > Yakov, Sam, Alex G.,
>> >
>> > How do you like the idea of the usage of discovery protocol for the
>> > service grid system messages exchange? Any pitfalls?
>> >
>> >
>> > --
>> > Denis
>> >
>> >
>> > On Fri, Mar 23, 2018 at 8:01 AM, Denis Mekhanikov <dmekhani...@gmail.com>
>> > wrote:
>> >
>> >> Igniters,
>> >>
>> >> I'd like to start a discussion on Ignite service grid redesign.
>> >> We have a number of problems in our current architecture, that have to
>> >> be addressed.
>> >>
>> >> Here are the most severe ones:
>> >>
>> >> One of them is lack of guarantee, that service is successfully deployed
>> >> and ready for work by the time, when *IgniteService.deploy*()* methods
>> >> return.
>> >> Furthermore, if an exception is thrown from *Service.init()* method,
>> >> then the deploying side is not able to receive it, or even understand,
>> >> that service is in unusable state.
>> >> So, you may end up in such situation, when you deployed a service
>> >> without receiving any errors, then called a service's method, and hung
>> >> indefinitely on this invocation.
>> >> JIRA ticket: https://issues.apache.org/jira/browse/IGNITE-3392
>> >>
>> >> Another problem is locking during service deployment on unstable
>> >> topology.
>> >> This issue is caused by missing updates in continuous query listeners on
>> >> the internal cache.
>> >> It is hard to reproduce, but it happens sometimes. We shouldn't allow
>> >> such possibility, that deployment methods hang without saying anything.
>> >> JIRA ticket: https://issues.apache.org/jira/browse/IGNITE-6259
>> >>
>> >> I think, we should change the deployment procedure to make it more
>> >> reliable.
>> >> Moving from operating over internal replicated service cache to sending
>> >> custom discovery events seems to be a good idea.
>> >> Service deployment may trigger a discovery event, that will make chosen
>> >> nodes deploy the service, and the same event will notify other nodes
>> >> about the deployed service instances.
>> >> It will eliminate the need for distributed transactions on the internal
>> >> replicated system cache, and make the service deployment protocol more
>> >> transparent.
>> >>
>> >> There are a few points, that should be taken into account though.
>> >>
>> >> First of all, we can't wait for services to be deployed and initialised
>> >> in the discovery thread.
>> >> So, we need to make notification about service deployment result
>> >> asynchronous, presumably over communication protocol.
>> >> I can think of a procedure similar to the current exchange protocol,
>> >> when service deployment is initialised with an initial discovery message,
>> >> followed by asynchronous notifications from the hosting servers over
>> >> communication. And finally, one more discovery message will notify all
>> >> nodes about the service deployment result and location of the deployed
>> >> service instances. Coordinator will be responsible for collecting of the
>> >> deployment results in this scheme.
>> >>
>> >> Another problem is failover in case, when some nodes fail during
>> >> deployment or further work.
>> >> The following cases should be handled:
>> >>
>> >>    1. coordinator failure during deployment;
>> >>    2. failure of nodes, that were chosen to host the service, during
>> >>    deployment;
>> >>    3. failure of nodes, that contain deployed services, after the
>> >>    deployment.
>> >>
>> >> The first case may be resolved by either continuation of deployment
>> >> with a new coordinator, or by cancelling it.
>> >> The second case will require another node to be chosen and notified.
>> >> Maybe another discovery message will be needed.
>> >> The third case will require redeployment, so coordinator should track
>> >> topology changes and redeploy failed services.
>> >>
>> >> Another good improvement would be service versioning. This matter was
>> >> already discussed in another thread:
>> >> http://apache-ignite-developers.2346864.n4.nabble.com/Service-versioning-td20858.html
>> >> Let's resume this discussion and state the final decision here.
>> >> This feature is closely connected to peer class loading, which is not
>> >> working for services currently.
>> >> So, service versioning should be implemented along with peer class
>> >> loading.
>> >> JIRA ticket for versioning:
>> >> https://issues.apache.org/jira/browse/IGNITE-6069
>> >> Peer class loading: https://issues.apache.org/jira/browse/IGNITE-975
>> >>
>> >> Please share your thoughts. Constructive criticism is highly
>> >> appreciated.
>> >>
>> >> Denis
>> >>
>> >>
>>
>> --
>> Best Regards, Vyacheslav D.
>>

--
Best Regards, Vyacheslav D.
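
P.S. About the hashcode idea from my earlier message quoted above: a minimal sketch of how comparing hashes could assign versions. It is only an illustration under my own assumptions. `ServiceVersionRegistry` is a hypothetical class, not an Ignite API, and I swapped the plain hashcode for a SHA-256 digest over the service's bytes (for example its class bytes or serialized form) to make accidental collisions unlikely.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical registry that maps service content to a version number (not an Ignite API). */
final class ServiceVersionRegistry {
    /** Service name to the digests seen so far, in order of first deployment. */
    private final Map<String, List<byte[]>> digestsByName = new HashMap<>();

    /**
     * Returns the 1-based version for the given content. Content already seen
     * under this name keeps its version; new content gets the next number.
     */
    int register(String serviceName, byte[] serviceBytes) {
        byte[] digest = sha256(serviceBytes);
        List<byte[]> known = digestsByName.computeIfAbsent(serviceName, k -> new ArrayList<>());

        for (int i = 0; i < known.size(); i++) {
            if (Arrays.equals(known.get(i), digest))
                return i + 1; // Same content as an earlier deployment.
        }

        known.add(digest);
        return known.size(); // Different content: deploy as a new version.
    }

    private static byte[] sha256(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(data);
        }
        catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }
}
```

The returned number is what something like IgniteServices.service(name, *version*) could accept, if we decide to extend the public API that way.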