Service Grid new design overview

Vyacheslav Daradur Thu, 23 Aug 2018 04:45:01 -0700

Hi, Igniters!

I’m working on the Service Grid redesign tasks, and the design seems to
be finished.


The main goal of the Service Grid redesign is to provide the guarantees
that are currently missing:
- Synchronous service deployment/undeployment;
- Failover on coordinator change;
- Propagation of deployment errors across the cluster;
- Introduction of a deployment failure policy;
- Prevention of deployment initiator hangs during deployment;
- etc.

I’d like to ask the community for their thoughts on the proposed design,
to be sure that all important things have been considered.

Also, I have a question about migrating services from Apache Ignite 2.6
to the new solution. It’s very hard to provide migration tools for
users because of the significant changes: for example, we don’t use the
utility cache anymore. Should we spend time on this?

I’ve prepared a description of the new Service Grid design; it is given
below:

*OVERVIEW*

All nodes (servers and clients) are able to host services, but client
nodes are excluded from service deployment by default. The only way to
deploy a service on client nodes is to specify a node filter in
ServiceConfiguration.
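
To illustrate the default described above, here is a minimal sketch in
plain Java. Node and NodeFilterSketch are hypothetical stand-ins invented
for this example; they are not Ignite classes, and the real filter would
be the one set in ServiceConfiguration.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical stand-in for a cluster node (not an Ignite class).
final class Node {
    final String id;
    final boolean client;

    Node(String id, boolean client) {
        this.id = id;
        this.client = client;
    }
}

class NodeFilterSketch {
    // Default behaviour from the text: client nodes are excluded.
    static final Predicate<Node> SERVERS_ONLY = n -> !n.client;

    // Returns ids of nodes eligible to host a service.
    // A null filter means "no filter specified", so the default applies.
    static List<String> eligible(List<Node> topology, Predicate<Node> filter) {
        Predicate<Node> f = filter != null ? filter : SERVERS_ONLY;

        return topology.stream()
            .filter(f)
            .map(n -> n.id)
            .collect(Collectors.toList());
    }
}
```

With no filter only server nodes are returned; passing an explicit filter
that accepts client nodes is the only way to get them into the result.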

All deployed services are identified internally by “serviceId”
(IgniteUuid). This gives us a base for features such as hot
redeployment and service versioning. It’s important to be able to
identify and manage services that have the same name but different
versions.

All service state changes are processed according to a unified flow:
1) The initiator sends a state change request [deploy, undeploy] over
disco-spi as a DynamicServicesChangeRequestBatchMessage. Every server
node stores it in its own queue, so that a new coordinator can process
it if the current coordinator fails;
2) The coordinator calculates assignments, defines the actions in a new
ServicesAssignmentsRequestMessage and sends it over disco-spi to be
processed by all nodes;
3) Each node applies the actions, builds a ServicesSingleMapMessage
containing the service ids and the number of instances deployed on that
node, and sends it over comm-spi to the coordinator (p2p);
4) Once the coordinator has received all single map messages, it builds
a ServicesFullMapMessage containing the service deployments across the
cluster and sends it over disco-spi to be processed by all nodes.
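
Steps 3 and 4 of the flow can be sketched roughly as follows. This is an
illustration only: ExchangeFlowSketch and its methods are hypothetical
names, plain UUIDs stand in for both service ids and node ids, and no
messaging is modelled, just the data that the single and full maps carry.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

class ExchangeFlowSketch {
    // Step 3: a node derives its single map from the assignments
    // (service id -> node id -> instance count) sent by the coordinator.
    // A node without an entry for a service reports 0 instances.
    static Map<UUID, Integer> singleMap(UUID nodeId, Map<UUID, Map<UUID, Integer>> assignments) {
        Map<UUID, Integer> res = new HashMap<>();

        assignments.forEach((srvcId, perNode) ->
            res.put(srvcId, perNode.getOrDefault(nodeId, 0)));

        return res;
    }

    // Step 4: the coordinator merges the single maps (node id -> single map)
    // into a full map (service id -> node id -> instance count).
    static Map<UUID, Map<UUID, Integer>> fullMap(Map<UUID, Map<UUID, Integer>> singleMaps) {
        Map<UUID, Map<UUID, Integer>> res = new HashMap<>();

        singleMaps.forEach((nodeId, single) ->
            single.forEach((srvcId, cnt) ->
                res.computeIfAbsent(srvcId, k -> new HashMap<>()).put(nodeId, cnt)));

        return res;
    }
}
```

In the proposed design the coordinator waits for a single map from every
node before building the full map; the sketch assumes that all single maps
are already available.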

*MESSAGES*

class DynamicServicesChangeRequestBatchMessage {
    Collection<DynamicServiceChangeRequest> reqs;
}

class DynamicServiceChangeRequest {
    // Unique service id (generated on deploy, existing id used on undeploy)
    IgniteUuid srvcId;
    ServiceConfiguration cfg; // Empty in case of undeploy
    byte flags; // Change type flags [deploy, undeploy, etc.]
}

class ServicesAssignmentsRequestMessage {
    ServicesDeploymentExchangeId exchId;
    Map<IgniteUuid, Map<UUID, Integer>> srvcsToDeploy; // Deploy and reassign
    Collection<IgniteUuid> srvcsToUndeploy;
}

class ServicesSingleMapMessage {
    ServicesDeploymentExchangeId exchId;
    Map<IgniteUuid, ServiceSingleDeploymentsResults> results;
}

class ServiceSingleDeploymentsResults {
    int cnt; // Deployed instances count, 0 in case of undeploy
    // Serialized exceptions, to avoid issues at spi-level
    Collection<byte[]> errors;
}

class ServicesFullMapMessage {
    ServicesDeploymentExchangeId exchId;
    Collection<ServiceFullDeploymentsResults> results;
}

class ServiceFullDeploymentsResults {
    IgniteUuid srvcId;
    Map<UUID, ServiceSingleDeploymentsResults> results; // Per node
}

class ServicesDeploymentExchangeId {
    UUID nodeId; // Id of the initiating, joined, left or failed node
    int evtType; // EVT_NODE_[JOIN/LEFT/FAILED], EVT_DISCOVERY_CUSTOM_EVT
    AffinityTopologyVersion topVer;
    IgniteUuid reqId; // Unique id of custom discovery message
}
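
The errors field above carries exceptions as byte arrays. A minimal sketch
of such a round trip is shown below. It uses plain JDK serialization as an
assumption made for the example; the actual marshaller used by the
implementation may differ, and ErrorBytesSketch is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

class ErrorBytesSketch {
    // Serializes a deployment error so that the message only holds bytes.
    static byte[] toBytes(Throwable err) {
        try (ByteArrayOutputStream bos = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(err);
            out.flush(); // Push buffered data into bos before copying it.

            return bos.toByteArray();
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Restores the error on the receiving side, e.g. on the initiator.
    static Throwable fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (Throwable)in.readObject();
        }
        catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Failed to restore error", e);
        }
    }
}
```

Keeping the errors as bytes means that a node which cannot load the
exception class can still pass the message on; only the node that wants to
inspect the error has to deserialize it.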

*COORDINATOR CHANGE*

All server nodes handle service state change requests and put them into
a deployment queue, but only the coordinator processes them. If the
coordinator leaves or fails, they will be processed on the new
coordinator.
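
A rough sketch of this queueing behaviour, under stated assumptions:
NodeQueueSketch is a hypothetical class, requests are reduced to string
ids, and the rule that every node drops a request once its exchange
completes is an assumption of the sketch, not something stated above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

class NodeQueueSketch {
    // Request ids that have been received but not yet completed.
    private final Deque<String> pending = new ArrayDeque<>();

    boolean coordinator;

    // Every server node records the request, coordinator or not.
    void onRequest(String reqId) {
        pending.addLast(reqId);
    }

    // Assumed cleanup: a completed request is removed on every node.
    void onCompleted(String reqId) {
        pending.remove(reqId);
    }

    // Only the coordinator acts on the queue; other nodes just keep it.
    List<String> requestsToProcess() {
        if (!coordinator)
            return Collections.emptyList();

        return new ArrayList<>(pending);
    }
}
```

When the coordinator fails, the node that becomes the new coordinator
already holds the unfinished requests in its own queue and can continue
from there without asking the initiators to resend anything.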

*TOPOLOGY CHANGE*

Each topology change (NODE_JOIN/LEFT/FAILED event) triggers a service
state deployment task. Assignments will be recalculated and applied
for each deployed service.
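
The recalculation can be pictured with the sketch below. The even,
in-order distribution is only an assumed policy chosen to keep the example
short; the real assignment function is not described in this message, and
TopologyChangeSketch is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TopologyChangeSketch {
    // Recalculates assignments for every deployed service on the given
    // topology. Input: service name -> total number of instances.
    // Output: service name -> node id -> number of instances.
    static Map<String, Map<String, Integer>> recalculate(
        Map<String, Integer> totalBySrvc,
        List<String> nodes
    ) {
        Map<String, Map<String, Integer>> res = new LinkedHashMap<>();

        for (Map.Entry<String, Integer> e : totalBySrvc.entrySet()) {
            Map<String, Integer> perNode = new LinkedHashMap<>();

            // Assumed policy: hand out instances one by one in node order.
            for (int i = 0; i < e.getValue() && !nodes.isEmpty(); i++)
                perNode.merge(nodes.get(i % nodes.size()), 1, Integer::sum);

            res.put(e.getKey(), perNode);
        }

        return res;
    }
}
```

Calling recalculate again with the node list as it looks after a join,
leave or failure gives the new assignments, which would then be applied by
the nodes.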

*CLUSTER ACTIVATION/DEACTIVATION*

- On deactivation:
    * local services are undeployed;
    * requests are not handled (including deployment / undeployment);
- On activation:
    * local services are redeployed;
    * requests are handled as usual;
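
The rules above can be modelled as a small state machine. This is a
sketch with hypothetical names (ActivationSketch and its methods), and it
assumes that a request arriving while the cluster is inactive is rejected,
which is only one possible reading of "not handled".

```java
import java.util.LinkedHashSet;
import java.util.Set;

class ActivationSketch {
    private boolean active = true;

    // Services known to this node, kept across deactivation.
    private final Set<String> known = new LinkedHashSet<>();

    // Services currently running on this node.
    private final Set<String> deployed = new LinkedHashSet<>();

    // Returns false if the request is not handled (cluster is inactive).
    boolean deploy(String name) {
        if (!active)
            return false;

        known.add(name);
        deployed.add(name);

        return true;
    }

    // On deactivation local services are undeployed.
    void deactivate() {
        active = false;
        deployed.clear();
    }

    // On activation local services are redeployed.
    void activate() {
        active = true;
        deployed.addAll(known);
    }

    Set<String> deployed() {
        return deployed;
    }
}
```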

*RELATED LINKS*

https://cwiki.apache.org/confluence/display/IGNITE/IEP-17%3A+Oil+Change+in+Service+Grid
http://apache-ignite-developers.2346864.n4.nabble.com/Service-grid-redesign-td28521.html


-- 
Best Regards, Vyacheslav D.
