Sorry, it was me who renamed the IEP to "Oil Change in Service Grid". I was
writing this email after the renaming. I like that title more because it's
fun and highlights what we intend to do: clean our service grid engine and
power it up with new "liquid" (a new communication and deployment approach
that wasn't available before).

Denis


> This message contains serialized service instance and its configuration.
> It is delivered to the coordinator node first, that calculates the service
> deployment assignments and adds this information to the message.


I would consider using a NodeFilter first to decide where a service can
potentially be deployed. Otherwise, we would require service classes to be
on every node (since every node might become a coordinator), which is not
desirable.
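
Roughly what I have in mind, using the existing
ServiceConfiguration.setNodeFilter() API (the "serviceRole" attribute and
the MyService class are invented for the example):

    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.services.Service;
    import org.apache.ignite.services.ServiceConfiguration;
    import org.apache.ignite.services.ServiceContext;

    public class NodeFilterExample {
        /** Hypothetical service, only here to make the example complete. */
        static class MyService implements Service {
            @Override public void init(ServiceContext ctx) { /* prepare resources */ }
            @Override public void execute(ServiceContext ctx) { /* service body */ }
            @Override public void cancel(ServiceContext ctx) { /* release resources */ }
        }

        public static void main(String[] args) {
            Ignite ignite = Ignition.start();

            ServiceConfiguration cfg = new ServiceConfiguration();

            cfg.setName("myService");
            cfg.setService(new MyService());
            cfg.setTotalCount(2);
            // Only nodes that declare the (made-up) "serviceRole" user attribute
            // are deployment candidates, so only they need the service classes.
            cfg.setNodeFilter(node -> "services".equals(node.attribute("serviceRole")));

            ignite.services().deploy(cfg);
        }
    }

Note that the filter itself travels with the configuration, so its class
still has to be resolvable on the nodes that evaluate it, but that is a much
smaller footprint than the full service implementation.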


As for peer class loading, I would back Dmitriy up here. Let's at least
not focus on this task for now. We should first design service versioning
the right way, and then support it.

--
Denis



On Wed, Apr 4, 2018 at 12:20 PM, Dmitriy Setrakyan <dsetrak...@apache.org>
wrote:

> Here is the correct link:
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-17%3A+Oil+Change+in+Service+Grid
>
> I have looked at the tickets there, and I believe that we should not
> support peer-deployment for services. It is very hard and I do not think we
> should even try.
>
> I am proposing closing this ticket as Won't Fix -
> https://issues.apache.org/jira/browse/IGNITE-975
>
> D.
>
> On Wed, Apr 4, 2018 at 5:39 AM, Denis Mekhanikov <dmekhani...@gmail.com>
> wrote:
>
> > Vyacheslav,
> >
> > I've just posted my first draft of the IEP:
> > https://cwiki.apache.org/confluence/display/IGNITE/IEP-17%3A+Service+grid+improvements
> > It's not finished yet, but you can get the idea from it.
> > If you have any thoughts, please let me know and I'll add them to the
> > IEP.
> >
> > Denis
> >
> > On Wed, Apr 4, 2018 at 13:09, Vyacheslav Daradur <daradu...@gmail.com>
> > wrote:
> >
> > > Denis, thanks for the link.
> > >
> > > I looked through the task, and I think I understand your redesign
> > > point now.
> > >
> > > Do you have a clear plan or IEP for the whole redesign?
> > >
> > > I'm interested in this component and I'd like to take part in the
> > > development.
> > >
> > >
> > >
> > > On Mon, Apr 2, 2018 at 2:55 PM, Denis Mekhanikov <dmekhani...@gmail.com>
> > > wrote:
> > > > Vyacheslav,
> > > >
> > > > The service deployment design based on the replicated utility cache
> > > > has proven to be unstable and deadlock-prone.
> > > > You can find a list of JIRA issues related to it in my previous letter.
> > > >
> > > > The intention behind it is similar to the binary metadata redesign that
> > > > happened in the following ticket: IGNITE-4157
> > > > <https://issues.apache.org/jira/browse/IGNITE-4157>
> > > > This change in the service deployment procedure will eliminate the need
> > > > for another internal replicated cache and make service deployment more
> > > > reliable on unstable topology.
> > > >
> > > > Denis
> > > >
> > > On Tue, Mar 27, 2018 at 23:21, Vyacheslav Daradur <daradu...@gmail.com>
> > > wrote:
> > > >
> > > >> Hi, Denis Mekhanikov!
> > > >>
> > > >> As far as I know, Ignite services are based on IgniteCache, so we have
> > > >> all of its features. We can use listeners or continuous queries for
> > > >> deployment synchronization.
> > > >>
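> > > >> For example, something like this (a rough sketch, assuming an
> > > >> already-started Ignite node; the "serviceStates" cache name is made up,
> > > >> since the real utility cache is internal):
> > > >>
> > > >>     import javax.cache.event.CacheEntryEvent;
> > > >>     import org.apache.ignite.IgniteCache;
> > > >>     import org.apache.ignite.cache.query.ContinuousQuery;
> > > >>
> > > >>     IgniteCache<String, String> cache = ignite.getOrCreateCache("serviceStates");
> > > >>
> > > >>     ContinuousQuery<String, String> qry = new ContinuousQuery<>();
> > > >>
> > > >>     // React locally to every deployment-state change.
> > > >>     qry.setLocalListener(evts -> {
> > > >>         for (CacheEntryEvent<? extends String, ? extends String> e : evts)
> > > >>             System.out.println("Service " + e.getKey() + " -> " + e.getValue());
> > > >>     });
> > > >>
> > > >>     cache.query(qry); // Keep the returned cursor open to stay subscribed.
> > > >>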
> > > >> Why do you want to use the discovery layer for that?
> > > >>
> > > >> One more thing: we could use a baseline approach for services. That
> > > >> means *IgniteService.deploy()* returns a ready-to-work service after
> > > >> deployment on the baseline nodes, and we deploy to other nodes on
> > > >> demand, for example when the deployed service's load gets high.
> > > >>
> > > >> About versioning: maybe it makes sense to extend the public API with
> > > >> IgniteServices.service(name, *version*)?
> > > >>
> > > >> At first deployment, we could compute the service's hash code (just as
> > > >> an example) and store it. On a new deployment request for a service
> > > >> with an existing name, we would compute the new service's hash code and
> > > >> compare the two; if they differ, we deploy the new service as a service
> > > >> with a different version.
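> > > >>
> > > >> A rough sketch of that check (all names are made up; this is not an
> > > >> existing Ignite API):
> > > >>
> > > >>     import java.util.Arrays;
> > > >>     import java.util.Map;
> > > >>     import java.util.concurrent.ConcurrentHashMap;
> > > >>
> > > >>     class ServiceVersionRegistry {
> > > >>         /** Service name -> hash of the latest deployed serialized form. */
> > > >>         private final Map<String, Integer> deployedHashes = new ConcurrentHashMap<>();
> > > >>
> > > >>         /** True if these bytes differ from the last deployment under this name. */
> > > >>         boolean isNewVersion(String name, byte[] serializedService) {
> > > >>             int newHash = Arrays.hashCode(serializedService);
> > > >>             Integer prev = deployedHashes.put(name, newHash);
> > > >>             return prev == null || prev != newHash;
> > > >>         }
> > > >>     }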
> > > >>
> > > >>
> > > >> On Fri, Mar 23, 2018 at 10:03 PM, Denis Magda <dma...@apache.org>
> > > >> wrote:
> > > >> > Denis,
> > > >> >
> > > >> > Thanks for the extensive analysis. There is vast room for
> > > >> > optimization on the service grid side.
> > > >> >
> > > >> > Yakov, Sam, Alex G.,
> > > >> >
> > > >> > What do you think of using the discovery protocol for the service
> > > >> > grid system message exchange? Any pitfalls?
> > > >> >
> > > >> >
> > > >> > --
> > > >> > Denis
> > > >> >
> > > >> >
> > > >> > On Fri, Mar 23, 2018 at 8:01 AM, Denis Mekhanikov
> > > >> > <dmekhani...@gmail.com> wrote:
> > > >> >
> > > >> >> Igniters,
> > > >> >>
> > > >> >> I'd like to start a discussion on Ignite service grid redesign.
> > > >> >> We have a number of problems in our current architecture that have
> > > >> >> to be addressed.
> > > >> >>
> > > >> >> Here are the most severe ones:
> > > >> >>
> > > >> >> The first is the lack of a guarantee that a service is successfully
> > > >> >> deployed and ready for work by the time the *IgniteService.deploy()*
> > > >> >> methods return.
> > > >> >> Furthermore, if an exception is thrown from the *Service.init()*
> > > >> >> method, the deploying side cannot receive it, or even learn that the
> > > >> >> service is in an unusable state.
> > > >> >> So you may end up in a situation where you deployed a service without
> > > >> >> receiving any errors, then called one of its methods and hung
> > > >> >> indefinitely on that invocation.
> > > >> >> JIRA ticket: https://issues.apache.org/jira/browse/IGNITE-3392
> > > >> >>
> > > >> >> Another problem is locking during service deployment on an unstable
> > > >> >> topology.
> > > >> >> This issue is caused by missed updates in continuous query listeners
> > > >> >> on the internal cache.
> > > >> >> It is hard to reproduce, but it happens sometimes. We shouldn't allow
> > > >> >> any possibility of deployment methods hanging without saying anything.
> > > >> >> JIRA ticket: https://issues.apache.org/jira/browse/IGNITE-6259
> > > >> >>
> > > >> >> I think we should change the deployment procedure to make it more
> > > >> >> reliable.
> > > >> >> Moving from operating over the internal replicated service cache to
> > > >> >> sending custom discovery events seems like a good idea.
> > > >> >> Service deployment would trigger a discovery event that makes the
> > > >> >> chosen nodes deploy the service, and the same event would notify the
> > > >> >> other nodes about the deployed service instances.
> > > >> >> This will eliminate the need for distributed transactions on the
> > > >> >> internal replicated system cache and make the service deployment
> > > >> >> protocol more transparent.
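> > > >> >>
> > > >> >> To illustrate, the deployment request could carry roughly this data
> > > >> >> (a hypothetical outline only, not an existing class; the real thing
> > > >> >> would be built on Ignite's internal custom discovery message
> > > >> >> machinery):
> > > >> >>
> > > >> >>     import java.io.Serializable;
> > > >> >>     import java.util.Map;
> > > >> >>     import java.util.UUID;
> > > >> >>
> > > >> >>     /** Travels around the ring when a service deployment is requested. */
> > > >> >>     class ServiceDeploymentMessage implements Serializable {
> > > >> >>         String name;         // Service name.
> > > >> >>         byte[] serviceBytes; // Serialized service instance and configuration.
> > > >> >>         // Filled in by the coordinator: node ID -> instance count.
> > > >> >>         Map<UUID, Integer> assignments;
> > > >> >>     }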
> > > >> >>
> > > >> >> There are a few points that should be taken into account, though.
> > > >> >>
> > > >> >> First of all, we can't wait for services to be deployed and
> > > >> >> initialised in the discovery thread.
> > > >> >> So we need to make the notification about the service deployment
> > > >> >> result asynchronous, presumably over the communication protocol.
> > > >> >> I can think of a procedure similar to the current exchange protocol:
> > > >> >> service deployment is initiated with an initial discovery message,
> > > >> >> followed by asynchronous notifications from the hosting servers over
> > > >> >> communication. Finally, one more discovery message notifies all nodes
> > > >> >> about the service deployment result and the location of the deployed
> > > >> >> service instances. The coordinator will be responsible for collecting
> > > >> >> the deployment results in this scheme.
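> > > >> >>
> > > >> >> In pseudo-Java, the three phases could look like this (illustrative
> > > >> >> names only, not existing Ignite APIs):
> > > >> >>
> > > >> >>     /** Proposed deployment flow, mirroring the exchange protocol. */
> > > >> >>     enum ServiceDeploymentPhase {
> > > >> >>         /** 1. Discovery: request with assignments travels around the
> > > >> >>          *  ring; assigned nodes start Service.init() asynchronously. */
> > > >> >>         INITIAL_DISCOVERY_MESSAGE,
> > > >> >>         /** 2. Communication: each hosting node sends its init() result
> > > >> >>          *  (success or the thrown exception) to the coordinator. */
> > > >> >>         ASYNC_RESULT_NOTIFICATION,
> > > >> >>         /** 3. Discovery: coordinator broadcasts collected results and
> > > >> >>          *  service locations; deploy() futures complete here. */
> > > >> >>         FINAL_DISCOVERY_MESSAGE
> > > >> >>     }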
> > > >> >>
> > > >> >> Another problem is failover when some nodes fail during deployment
> > > >> >> or during further work.
> > > >> >> The following cases should be handled:
> > > >> >>
> > > >> >>    1. coordinator failure during deployment;
> > > >> >>    2. failure of nodes that were chosen to host the service, during
> > > >> >>    deployment;
> > > >> >>    3. failure of nodes that contain deployed services, after the
> > > >> >>    deployment.
> > > >> >>
> > > >> >> The first case may be resolved either by continuing the deployment
> > > >> >> with a new coordinator or by cancelling it.
> > > >> >> The second case will require another node to be chosen and notified;
> > > >> >> maybe another discovery message will be needed.
> > > >> >> The third case will require redeployment, so the coordinator should
> > > >> >> track topology changes and redeploy failed services.
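> > > >> >>
> > > >> >> For the third case, the coordinator-side tracking could be based on
> > > >> >> the public events API (a sketch; redeployServicesFrom() is a made-up
> > > >> >> hook, and the node-left/failed event types must be enabled via
> > > >> >> IgniteConfiguration.setIncludeEventTypes()):
> > > >> >>
> > > >> >>     import java.util.UUID;
> > > >> >>     import org.apache.ignite.Ignite;
> > > >> >>     import org.apache.ignite.events.DiscoveryEvent;
> > > >> >>     import org.apache.ignite.events.EventType;
> > > >> >>
> > > >> >>     class TopologyWatcher {
> > > >> >>         /** Subscribes to node departures on the coordinator. */
> > > >> >>         static void watch(Ignite ignite) {
> > > >> >>             ignite.events().localListen(evt -> {
> > > >> >>                 redeployServicesFrom(((DiscoveryEvent)evt).eventNode().id());
> > > >> >>
> > > >> >>                 return true; // Keep listening.
> > > >> >>             }, EventType.EVT_NODE_LEFT, EventType.EVT_NODE_FAILED);
> > > >> >>         }
> > > >> >>
> > > >> >>         /** Hypothetical hook: recalculate assignments for services
> > > >> >>          *  that were hosted on the departed node. */
> > > >> >>         static void redeployServicesFrom(UUID nodeId) { /* ... */ }
> > > >> >>     }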
> > > >> >>
> > > >> >> Another good improvement would be service versioning. This matter was
> > > >> >> already discussed in another thread:
> > > >> >> http://apache-ignite-developers.2346864.n4.nabble.com/Service-versioning-td20858.html
> > > >> >> Let's resume this discussion and state the final decision here.
> > > >> >> This feature is closely connected to peer class loading, which
> > > >> >> currently does not work for services.
> > > >> >> So service versioning should be implemented along with peer class
> > > >> >> loading.
> > > >> >> JIRA ticket for versioning:
> > > >> >> https://issues.apache.org/jira/browse/IGNITE-6069
> > > >> >> Peer class loading: https://issues.apache.org/jira/browse/IGNITE-975
> > > >> >>
> > > >> >> Please share your thoughts. Constructive criticism is highly
> > > >> >> appreciated.
> > > >> >>
> > > >> >> Denis
> > > >> >>
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >> Best Regards, Vyacheslav D.
> > > >>
> > >
> > >
> > >
> > > --
> > > Best Regards, Vyacheslav D.
> > >
> >
>
