Agree with Val. I think all users would expect that a service is restarted
upon a node or cluster restart. Let's make sure we preserve this behavior.

D.

On Fri, Aug 24, 2018 at 4:17 PM, Valentin Kulichenko <valentin.kuliche...@gmail.com> wrote:

> Guys,
>
> I believe we should preserve the behavior that we have now. What happens to
> services if we restart a persistent cluster running 2.6? Are services
> recreated or not? If YES, we should make sure the same happens after the
> redesign. It would be even better if we preserved compatibility, i.e.
> allowed a seamless upgrade from an older version that uses the system cache
> to a newer version that uses disco messages for service deployment. If NO,
> it's much easier and we can leave it as is for now. However, eventually it
> would be great to have an option to persist services and redeploy them
> after a cluster restart.
>
> -Val
>
> On Fri, Aug 24, 2018 at 2:51 PM Dmitriy Pavlov <dpavlov....@gmail.com> wrote:
>
> > Denis M. & Val, please share your vision on this topic.
> >
> > > On Fri, Aug 24, 2018 at 15:52, Vyacheslav Daradur <daradu...@gmail.com> wrote:
> >
> > > Nick, Anton, thank you for stepping in.
> > >
> > > AFAIK, an Ignite cluster can move its state to a new version of Ignite
> > > only via persistence.
> > >
> > > Since Ignite v2.3, persistence is configured per memory region, and the
> > > system memory region is not persistent; this means the system
> > > (utility) cache will not be recovered on cluster restart.
> > >
> > > Here is a ticket which describes the same issue:
> > > https://issues.apache.org/jira/browse/IGNITE-6629
> > >
> > > > BTW, does the proposed solution guarantee that services will be
> > > > redeployed after each cluster restart, since now we're not using the
> > > > cache?
> > >
> > > No, only services described in IgniteConfiguration will be deployed at
> > > node startup, just as now.
> > >
> > > Am I wrong about something?
> > > On Thu, Aug 23, 2018 at 5:59 PM Anton Vinogradov <a...@apache.org> wrote:
> > > >
> > > > Vyacheslav.
> > > >
> > > > It looks like we are able to restart all services on grid startup from
> > > > the old definitions (inside the cache) in case persistence is turned
> > > > on. I see no problem providing such an automated migration path.
> > > > Also, we can test it using the compatibility framework.
> > > >
> > > > BTW, does the proposed solution guarantee that services will be
> > > > redeployed after each cluster restart, since now we're not using the
> > > > cache?
> > > >
> > > > On Thu, Aug 23, 2018 at 15:21, Nikolay Izhikov <nizhi...@apache.org> wrote:
> > > >
> > > > > Hello, Vyacheslav.
> > > > >
> > > > > Thanks for sharing your design.
> > > > >
> > > > > > I have a question about services migration from AI 2.6 to a new
> > > > > > solution
> > > > >
> > > > > Can you describe the consequences of not having a migration solution?
> > > > > What will happen on the user side?
> > > > >
> > > > >
> > > > > On Thu, 23/08/2018 at 14:44 +0300, Vyacheslav Daradur wrote:
> > > > > > Hi, Igniters!
> > > > > >
> > > > > > I’m working on the Service Grid redesign tasks, and the design
> > > > > > seems to be finished.
> > > > > >
> > > > > > The main goal of the Service Grid redesign is to provide missing
> > > > > > guarantees:
> > > > > > - Synchronous service deployment/undeployment;
> > > > > > - Failover on coordinator change;
> > > > > > - Propagation of deployment errors across the cluster;
> > > > > > - Introduction of a deployment failures policy;
> > > > > > - Prevention of deployment initiators hanging during deployment;
> > > > > > - etc.
> > > > > >
> > > > > > I’d like to ask the community for their thoughts on the proposed
> > > > > > design to be sure that all important things have been considered.
> > > > > >
> > > > > > Also, I have a question about services migration from AI 2.6 to the
> > > > > > new solution. It’s very hard to provide tools for user migration
> > > > > > because of the significant changes. We don’t use the utility cache
> > > > > > anymore. Should we spend time on this?
> > > > > >
> > > > > > I’ve prepared a definition of the new Service Grid design; it’s
> > > > > > described below:
> > > > > >
> > > > > > *OVERVIEW*
> > > > > >
> > > > > > All nodes (servers and clients) are able to host services, but
> > > > > > client nodes are excluded from service deployment by default. The
> > > > > > only way to deploy a service on client nodes is to specify a node
> > > > > > filter in ServiceConfiguration.
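[Editor's note] The default client exclusion described above can be sketched with a small model. Everything here is illustrative: the Node type, the eligible helper, and the default-policy logic are hypothetical stand-ins, not Ignite's actual internals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import java.util.function.Predicate;

// Hypothetical model of node filtering for service deployment:
// without an explicit filter, only server nodes are eligible;
// a node filter in the configuration can opt client nodes in.
class NodeFilterSketch {
    record Node(UUID id, boolean client) {}

    static List<Node> eligible(List<Node> topology, Predicate<Node> filter) {
        List<Node> res = new ArrayList<>();
        for (Node n : topology) {
            // Default policy: exclude clients unless a filter says otherwise.
            if (filter == null ? !n.client() : filter.test(n))
                res.add(n);
        }
        return res;
    }
}
```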
> > > > > >
> > > > > > All deployed services are identified internally by a “serviceId”
> > > > > > (IgniteUuid). This gives us a base for such features as hot
> > > > > > redeployment and service versioning. It’s important to have the
> > > > > > ability to identify and manage services with the same name but
> > > > > > different versions.
> > > > > >
> > > > > > All actions on a service’s state change are processed according to
> > > > > > a unified flow:
> > > > > > 1) The initiator sends over disco-spi a request to change the
> > > > > > service state [deploy, undeploy], a
> > > > > > DynamicServicesChangeRequestBatchMessage, which is stored by all
> > > > > > server nodes in their own queues so that, if the coordinator fails,
> > > > > > it can be processed at the new coordinator;
> > > > > > 2) The coordinator calculates assignments, defines actions in a new
> > > > > > message, ServicesAssignmentsRequestMessage, and sends it over
> > > > > > disco-spi to be processed by all nodes;
> > > > > > 3) Each node applies the actions, builds a single map message,
> > > > > > ServicesSingleMapMessage, which contains the service ids and the
> > > > > > number of instances deployed on this single node, and sends the
> > > > > > message over comm-spi to the coordinator (p2p);
> > > > > > 4) Once the coordinator receives all single map messages, it builds
> > > > > > a ServicesFullMapMessage, which contains the service deployments
> > > > > > across the cluster, and sends the message over disco-spi to be
> > > > > > processed by all nodes.
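[Editor's note] Steps 3 and 4 of the flow amount to the coordinator folding per-node single maps into one cluster-wide full map. A minimal sketch under that reading (the FullMapBuilder name is hypothetical, and plain UUIDs stand in for IgniteUuid; this is not the actual implementation):

```java
import java.util.*;

// Editor's sketch of steps 3-4: the coordinator receives one single map
// per node (nodeId -> serviceId -> deployed count) and folds them into
// the full map (serviceId -> nodeId -> deployed count) that is then
// broadcast to all nodes over disco-spi.
class FullMapBuilder {
    static Map<UUID, Map<UUID, Integer>> build(Map<UUID, Map<UUID, Integer>> singleMaps) {
        Map<UUID, Map<UUID, Integer>> full = new HashMap<>();
        for (Map.Entry<UUID, Map<UUID, Integer>> byNode : singleMaps.entrySet())
            for (Map.Entry<UUID, Integer> bySvc : byNode.getValue().entrySet())
                full.computeIfAbsent(bySvc.getKey(), k -> new HashMap<>())
                    .put(byNode.getKey(), bySvc.getValue());
        return full;
    }
}
```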
> > > > > >
> > > > > > *MESSAGES*
> > > > > >
> > > > > > class DynamicServicesChangeRequestBatchMessage {
> > > > > >     Collection<DynamicServiceChangeRequest> reqs;
> > > > > > }
> > > > > >
> > > > > > class DynamicServiceChangeRequest {
> > > > > >     IgniteUuid srvcId; // Unique service id (generated on deploy,
> > > > > > existing id used on undeploy)
> > > > > >     ServiceConfiguration cfg; // Empty in case of undeploy
> > > > > >     byte flags; // Change type flags [deploy, undeploy, etc.]
> > > > > > }
> > > > > >
> > > > > > class ServicesAssignmentsRequestMessage {
> > > > > >     ServicesDeploymentExchangeId exchId;
> > > > > >     Map<IgniteUuid, Map<UUID, Integer>> srvcsToDeploy; // Deploy
> > > > > > and reassign
> > > > > >     Collection<IgniteUuid> srvcsToUndeploy;
> > > > > > }
> > > > > >
> > > > > > class ServicesSingleMapMessage {
> > > > > >     ServicesDeploymentExchangeId exchId;
> > > > > >     Map<IgniteUuid, ServiceSingleDeploymentsResults> results;
> > > > > > }
> > > > > >
> > > > > > class ServiceSingleDeploymentsResults {
> > > > > >     int cnt; // Deployed instances count, 0 in case of undeploy
> > > > > >     Collection<byte[]> errors; // Serialized exceptions to avoid
> > > > > > issues at spi-level
> > > > > > }
> > > > > >
> > > > > > class ServicesFullMapMessage {
> > > > > >     ServicesDeploymentExchangeId exchId;
> > > > > >     Collection<ServiceFullDeploymentsResults> results;
> > > > > > }
> > > > > >
> > > > > > class ServiceFullDeploymentsResults {
> > > > > >     IgniteUuid srvcId;
> > > > > >     Map<UUID, ServiceSingleDeploymentsResults> results; // Per node
> > > > > > }
> > > > > >
> > > > > > class ServicesDeploymentExchangeId {
> > > > > >     UUID nodeId; // Initiating, joined, or failed node id
> > > > > >     int evtType; // EVT_NODE_[JOIN/LEFT/FAILED],
> > > > > > EVT_DISCOVERY_CUSTOM_EVT
> > > > > >     AffinityTopologyVersion topVer;
> > > > > >     IgniteUuid reqId; // Unique id of the custom discovery message
> > > > > > }
> > > > > >
> > > > > > *COORDINATOR CHANGE*
> > > > > >
> > > > > > All server nodes handle requests for service state changes and put
> > > > > > them into the deployment queue, but only the coordinator processes
> > > > > > them. If the coordinator leaves or fails, they will be processed on
> > > > > > the new coordinator.
> > > > > >
> > > > > > *TOPOLOGY CHANGE*
> > > > > >
> > > > > > Each topology change (NODE_JOIN/LEFT/FAILED event) triggers a
> > > > > > service state deployment task. Assignments will be recalculated
> > > > > > and applied for each deployed service.
> > > > > >
> > > > > > *CLUSTER ACTIVATION/DEACTIVATION*
> > > > > >
> > > > > > - On deactivation:
> > > > > >     * local services are undeployed;
> > > > > >     * requests are not handled (including deployment /
> > > > > > undeployment);
> > > > > > - On activation:
> > > > > >     * local services are redeployed;
> > > > > >     * requests are handled as usual;
> > > > > >
> > > > > > *RELATED LINKS*
> > > > > >
> > > > > > https://cwiki.apache.org/confluence/display/IGNITE/IEP-17%3A+Oil+Change+in+Service+Grid
> > > > > >
> > > > > > http://apache-ignite-developers.2346864.n4.nabble.com/Service-grid-redesign-td28521.html
> > > > > >
> > > > > >
> > >
> > >
> > >
> > > --
> > > Best Regards, Vyacheslav D.
> > >
> >
>
