Best practices on how to approach data centre aware affinity

Courtney Robinson Wed, 04 Aug 2021 23:41:01 -0700

Hi all,
Our growth with Ignite continues, and as we enter the next phase we need to
support multi-cluster deployments for our platform.
We deploy Ignite and the rest of our stack in Kubernetes, and we're in the
early stages of designing what a multi-region deployment should look like.
We are 90% SQL-based when using Ignite; the other 10% is Ignite
messaging, queues and compute.


We have thousands of tables like this one:

CREATE TABLE IF NOT EXISTS Person (
  id int,
  city_id int,
  name varchar,
  company_id varchar,
  PRIMARY KEY (id, city_id)) WITH "template=...";

Most tables use a template that looks like this:

partitioned,backups=2,data_region=hypi,cache_group=hypi,write_synchronization_mode=primary_sync,affinity_key=instance_id,atomicity=ATOMIC,cache_name=Person,key_type=PersonKey,value_type=PersonValue
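To make the affinity_key=instance_id part of that template concrete, here is a plain-Java sketch of the principle: only the affinity key field feeds the key-to-partition mapping, so rows sharing an instance_id land in the same partition whatever their id and city_id. The PersonKey record and its instanceId field are hypothetical (the Person DDL above has no instance_id column), and the hash is a simplification, not Ignite's exact partition function. 1024 is the default partition count of RendezvousAffinityFunction.

```java
import java.util.Objects;

public class AffinityKeyDemo {
    // Default partition count of Ignite's RendezvousAffinityFunction.
    static final int PARTITIONS = 1024;

    // Hypothetical stand-in for the generated PersonKey; instanceId plays the affinity_key role.
    record PersonKey(int id, int cityId, String instanceId) {}

    // Simplified key -> partition mapping that hashes ONLY the affinity key field.
    // Ignite's real hashing differs in detail; this only illustrates the principle.
    static int partition(PersonKey key) {
        return Math.floorMod(Objects.hashCode(key.instanceId()), PARTITIONS);
    }

    public static void main(String[] args) {
        PersonKey a = new PersonKey(1, 10, "inst-42");
        PersonKey b = new PersonKey(2, 99, "inst-42");
        // Different primary keys, same instance_id -> same partition, i.e. co-located.
        System.out.println(partition(a) == partition(b)); // true
    }
}
```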

I'm aware of affinity co-location
(https://ignite.apache.org/docs/latest/data-modeling/affinity-collocation),
and in the past, when we used the key-value APIs more than SQL, we also used
a custom affinity function to control placement.

What I don't know is how best to do this with SQL-defined caches.
We will have at least 3 Kubernetes clusters, each in a different data
centre, say EU_WEST, EU_EAST and CAN0.

Previously we provided environment variables for our custom affinity
function to use, and we're thinking of providing the data centre name
the same way.
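A minimal sketch of reading the data centre name from the environment, assuming a hypothetical variable named HYPI_DATA_CENTRE and the three DC names above. One design assumption worth flagging: an affinity function computes assignments for every node, not just the local one, so each node's DC generally needs to be visible cluster-wide (for example as a node user attribute set through IgniteConfiguration.setUserAttributes) rather than only in the local environment.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class DataCentreConfig {
    // The data centres named in this thread.
    static final Set<String> KNOWN_DCS = Set.of("EU_WEST", "EU_EAST", "CAN0");

    // Hypothetical variable name; substitute whatever the pods actually set.
    static final String DC_ENV_VAR = "HYPI_DATA_CENTRE";

    // Resolve this node's DC, failing fast on a missing or unknown value.
    static String resolveDataCentre(Map<String, String> env) {
        String raw = env.get(DC_ENV_VAR);
        if (raw == null || raw.isBlank())
            throw new IllegalStateException(DC_ENV_VAR + " is not set");
        String dc = raw.trim().toUpperCase(Locale.ROOT);
        if (!KNOWN_DCS.contains(dc))
            throw new IllegalStateException("Unknown data centre: " + dc);
        return dc;
    }

    public static void main(String[] args) {
        // In a pod this would be resolveDataCentre(System.getenv()).
        System.out.println(resolveDataCentre(Map.of(DC_ENV_VAR, "eu_west"))); // EU_WEST
    }
}
```

Failing at startup on an unknown value is deliberate: a node that silently reports the wrong DC would undermine every placement decision made from it.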

Every table has a primary plus 2 backups, so we want the primary in
one DC and each backup in a different DC.
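The placement rule above can be sketched in plain Java with rendezvous (highest-random-weight) hashing: rank all nodes per partition by a deterministic score, take the top node as primary, then walk down the ranking and accept a node as a backup only if its DC does not already hold a copy. This is an illustration, not Ignite's AffinityFunction API; the Node record, the node IDs and the splitmix64-style score are assumptions. In a real custom affinity function, this logic would sit inside assignPartitions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DcAwareAssignment {
    // Hypothetical node shape: an ID plus the DC it runs in.
    record Node(String id, String dc) {}

    // Deterministic per-(partition, node) score using a splitmix64-style mixer (an assumption).
    static long score(int partition, String nodeId) {
        long h = nodeId.hashCode() * 0x9E3779B97F4A7C15L + partition;
        h ^= h >>> 30; h *= 0xBF58476D1CE4E5B9L;
        h ^= h >>> 27; h *= 0x94D049BB133111EBL;
        return h ^ (h >>> 31);
    }

    // Returns [primary, backup1, backup2, ...] for one partition, every copy in a distinct DC.
    static List<Node> assign(int partition, List<Node> nodes, int backups) {
        List<Node> ranked = new ArrayList<>(nodes);
        // Rendezvous ordering: highest score first.
        ranked.sort(Comparator.comparingLong((Node n) -> score(partition, n.id())).reversed());

        List<Node> owners = new ArrayList<>();
        Set<String> usedDcs = new HashSet<>();
        for (Node n : ranked) {
            if (owners.size() == backups + 1) break;
            // Skip any node whose DC already holds a copy of this partition.
            if (usedDcs.add(n.dc())) owners.add(n);
        }
        // With fewer DCs than copies, fewer backups are assigned (no two copies share a DC).
        return owners;
    }

    public static void main(String[] args) {
        List<Node> nodes = List.of(
            new Node("n1", "EU_WEST"), new Node("n2", "EU_WEST"),
            new Node("n3", "EU_EAST"), new Node("n4", "EU_EAST"),
            new Node("n5", "CAN0"), new Node("n6", "CAN0"));
        for (int p = 0; p < 4; p++)
            System.out.println("partition " + p + " -> " + assign(p, nodes, 2));
    }
}
```

If a DC is unavailable, this variant assigns fewer backups rather than placing two copies in the same DC; whether that trade-off is acceptable is a design decision rather than something the sketch settles.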

We couldn't find any syntax in the SQL template that enables
specifying a custom affinity function.
The instance_id column we currently use as the affinity key has no common
prefix or anything else that associates it with a DC.
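Because instance_id carries no DC hint, the constraint has to be expressed over nodes rather than keys. Ignite's RendezvousAffinityFunction has a hook of this shape, setAffinityBackupFilter, which takes a predicate over a candidate backup node and the list of nodes already assigned to the partition (primary first). The sketch below mimics only that predicate shape, using java.util.function.BiPredicate and a hypothetical Node record in place of Ignite's ClusterNode.

```java
import java.util.List;
import java.util.function.BiPredicate;

public class DcBackupFilter {
    // Hypothetical stand-in for a cluster node carrying its DC.
    record Node(String id, String dc) {}

    // Accept a candidate backup only if no already-assigned copy (primary first) shares its DC.
    static final BiPredicate<Node, List<Node>> DISTINCT_DC =
        (candidate, assigned) -> assigned.stream().noneMatch(n -> n.dc().equals(candidate.dc()));

    public static void main(String[] args) {
        List<Node> assigned = List.of(new Node("n1", "EU_WEST"));
        System.out.println(DISTINCT_DC.test(new Node("n2", "EU_WEST"), assigned)); // false
        System.out.println(DISTINCT_DC.test(new Node("n3", "CAN0"), assigned));    // true
    }
}
```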

We're thinking of getting the cache for each table and then setting its
affinity function to replace the default RendezvousAffinityFunction, the way
we did before we switched to SQL.
Something like this:

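import org.apache.ignite.cache.affinity.AffinityFunction;
import org.apache.ignite.configuration.CacheConfiguration;

// Fetch the SQL-created cache's configuration and swap in a custom, DC-aware affinity function
repo.ctx.ignite.cache("Person").getConfiguration(CacheConfiguration.class)
    .setAffinity(new AffinityFunction() {
        ...
    });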
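For context on the default being replaced: rendezvous hashing gives each (partition, node) pair a deterministic score and makes the highest-scoring node the owner. A useful property, shown in this plain-Java sketch, is that adding a node only moves partitions onto the new node; no partition shuffles between existing nodes. The score function and node names are assumptions, and this is not Ignite's exact implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class RendezvousDemo {
    // Deterministic per-(partition, node) score using a splitmix64-style mixer (an assumption).
    static long score(int partition, String nodeId) {
        long h = nodeId.hashCode() * 0x9E3779B97F4A7C15L + partition;
        h ^= h >>> 30; h *= 0xBF58476D1CE4E5B9L;
        h ^= h >>> 27; h *= 0x94D049BB133111EBL;
        return h ^ (h >>> 31);
    }

    // Rendezvous owner: the node with the highest score for this partition.
    static String owner(int partition, List<String> nodes) {
        return nodes.stream()
            .max(Comparator.comparingLong((String n) -> score(partition, n)))
            .orElseThrow();
    }

    public static void main(String[] args) {
        List<String> before = List.of("n1", "n2", "n3");
        List<String> after = new ArrayList<>(before);
        after.add("n4");

        int moved = 0, movedBetweenOldNodes = 0;
        for (int p = 0; p < 1024; p++) {
            String o1 = owner(p, before), o2 = owner(p, after);
            if (!o1.equals(o2)) {
                moved++;
                if (!o2.equals("n4")) movedBetweenOldNodes++;
            }
        }
        // movedBetweenOldNodes is always 0: existing nodes' scores don't change when n4 joins.
        System.out.println("moved=" + moved + ", movedBetweenOldNodes=" + movedBetweenOldNodes);
    }
}
```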


There are a few things unclear about this:

   1. Is this the right approach?
   2. How do we handle existing data? Changing the affinity function will
   cause Ignite to be unable to find existing data, right?
   3. How would you recommend implementing the affinity function to be
   aware of the data centre?
   4. Are there any other caveats we need to be thinking about?
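Question 2 can be made concrete with a plain-Java sketch: compute the owner of every partition under an "old" function and a "new" one and count the partitions whose owner differs. Both functions here are hypothetical stand-ins (modulo placement versus rendezvous placement), so the numbers say nothing precise about Ignite's real functions; the point is only that every partition whose owner changes is one whose data is no longer where the new function expects it.

```java
import java.util.Comparator;
import java.util.List;

public class AffinitySwitchImpact {
    static final int PARTITIONS = 1024;

    // Hypothetical "old" function: round-robin modulo placement.
    static String oldOwner(int partition, List<String> nodes) {
        return nodes.get(partition % nodes.size());
    }

    // Hypothetical "new" function: rendezvous (highest-random-weight) placement.
    static String newOwner(int partition, List<String> nodes) {
        return nodes.stream()
            .max(Comparator.comparingLong((String n) -> score(partition, n)))
            .orElseThrow();
    }

    // splitmix64-style mixer giving a deterministic score per (partition, node).
    static long score(int partition, String nodeId) {
        long h = nodeId.hashCode() * 0x9E3779B97F4A7C15L + partition;
        h ^= h >>> 30; h *= 0xBF58476D1CE4E5B9L;
        h ^= h >>> 27; h *= 0x94D049BB133111EBL;
        return h ^ (h >>> 31);
    }

    // Partitions whose owner differs between the two functions.
    static int misplacedPartitions(List<String> nodes) {
        int misplaced = 0;
        for (int p = 0; p < PARTITIONS; p++)
            if (!oldOwner(p, nodes).equals(newOwner(p, nodes))) misplaced++;
        return misplaced;
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("eu-west-0", "eu-east-0", "can0-0");
        System.out.println(misplacedPartitions(nodes) + " of " + PARTITIONS + " partitions would be misplaced");
    }
}
```

With three nodes and two unrelated functions, roughly two thirds of the partitions would be expected to change owner, since the new owner matches the old one only about a third of the time.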

There is a lot of existing data, and we want to avoid a full copy/move
to new tables if possible; that would be very difficult in
production.

Regards,
Courtney Robinson
Founder and CEO, Hypi
Tel: ++44 208 123 2413 (GMT+0)
https://hypi.io
