I would like to understand your solution more deeply. I hope my questions are
interesting not only to me:
- What about primary/backup node data consistency? I found that Cassandra
  uses eventually consistent backups [1], so some backup data could be
  missing from a snapshot. If I apply a snapshot, would Ignite detect this
  and rebalance data to the backup nodes?
- I can't quite picture how persistence rebalancing works, but according to
  [2] it uses WAL logs. A snapshot doesn't contain WAL data, correct? Did
  you analyze alternative snapshot solutions based on the WAL?

[1] https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/operations/opsAboutSnapshots.html
[2] https://cwiki.apache.org/confluence/display/IGNITE/Persistent+Store+Architecture#PersistentStoreArchitecture-Rebalancing

Wed, 8 Apr 2020 at 18:22, Maxim Muzafarov <mmu...@apache.org>:

> Andrey,
>
> Thanks for your questions. I've also clarified some details on the
> IEP-43 [1] page according to them.
>
> > Does the snapshot contain only primary data, backup partitions, or
> > both?
>
> A snapshot contains a full copy of the persistence data on each local
> node. This means all primary partitions, backup partitions and the SQL
> index file available on the local node are copied to the snapshot.
>
> > Could I create a snapshot from an m-node cluster and apply it to an
> > n-node cluster (n<>m)?
>
> Currently, the restore procedure is fully manual, but in general it is
> possible to restore on a different topology. There are a few options
> here:
> - m == n, the easiest and fastest way;
> - m < n, the cluster will start and rebalancing will happen (see
>   testClusterSnapshotWithRebalancing in the PR). If SQL indexes exist,
>   it may take quite a long time to complete;
> - m > n, the hardest case. For instance, if backups > 1 you can start
>   the cluster and remove nodes one by one from the baseline. I think
>   this case should be covered by additional recovery scripts which will
>   be developed later.
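For illustration, a minimal sketch of the fully manual restore mentioned
above. It assumes, as noted later in the thread, that the snapshot preserves
the node's persistence directory layout, so a node can be started with its
work directory pointed at the snapshot root; the path below is hypothetical
and nothing here is part of the proposed API.

    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.configuration.DataRegionConfiguration;
    import org.apache.ignite.configuration.DataStorageConfiguration;
    import org.apache.ignite.configuration.IgniteConfiguration;

    public class RestoreFromSnapshotSketch {
        public static void main(String[] args) {
            // Point the node's work directory at the snapshot copy
            // (hypothetical path) instead of the original work directory.
            IgniteConfiguration cfg = new IgniteConfiguration()
                .setWorkDirectory("/opt/ignite/snapshots/mySnapshot")
                .setDataStorageConfiguration(new DataStorageConfiguration()
                    .setDefaultDataRegionConfiguration(new DataRegionConfiguration()
                        .setPersistenceEnabled(true)));

            try (Ignite ignite = Ignition.start(cfg)) {
                // Activate the cluster so the restored persistent data is readable.
                ignite.cluster().active(true);
            }
        }
    }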
> > Should a data node have extra space in the persistent store to create
> > a snapshot? Or, from another point of view, would the size of the
> > temporary file be equal to the size of all data on the cluster node?
>
> If a cluster has no load, you only need enough free space to store the
> snapshot, which is almost equal to the size of the node's `db`
> directory.
>
> If a cluster is under load, it needs some extra space to store
> intermediate snapshot results. The amount of such space depends on how
> fast the cache partition files are copied to the snapshot directory
> (i.e. on how slow the disks are). The maximum size of the temporary
> file for each partition equals the size of the corresponding partition
> file, so in the worst case you need 3x extra disk space. But according
> to my measurements, assuming an SSD is used and each partition is
> 300 MB, a cluster under high load will require no more than 1-3% extra.
>
> > What is the resulting snapshot, a single file or a collection of
> > files (one for every data node)?
>
> Check the example of the snapshot directory structure on the IEP-43
> page [1]; this is how a completed snapshot will look.
>
> [1] https://cwiki.apache.org/confluence/display/IGNITE/IEP-43%3A+Cluster+snapshots#IEP-43:Clustersnapshots-Restoresnapshot(manually)
>
> On Wed, 8 Apr 2020 at 17:18, Andrey Dolmatov <itdolma...@gmail.com> wrote:
> >
> > Hi, Maxim!
> > It is a very useful feature, great job!
> >
> > But could you explain some aspects to me?
> >
> > - Does the snapshot contain only primary data, backup partitions, or
> >   both?
> > - Could I create a snapshot from an m-node cluster and apply it to an
> >   n-node cluster (n<>m)?
> > - Should a data node have extra space in the persistent store to
> >   create a snapshot? Or, from another point of view, would the size
> >   of the temporary file be equal to the size of all data on the
> >   cluster node?
> > - What is the resulting snapshot, a single file or a collection of
> >   files (one for every data node)?
> >
> > I apologize for my questions, but I'm really interested in this
> > feature.
> >
> > Tue, 7 Apr 2020 at 22:10, Maxim Muzafarov <maxmu...@gmail.com>:
> > >
> > > Igniters,
> > >
> > > I'd like to get back to the discussion of a snapshot operation for
> > > Apache Ignite persistent cache groups, and I propose my changes
> > > below. I have prepared everything so that the discussion is as
> > > meaningful and specific as possible:
> > >
> > > - IEP-43: Cluster snapshot [1]
> > > - The Jira task IGNITE-11073 [2]
> > > - PR with the described changes, Patch Available [4]
> > >
> > > The changes are ready for review.
> > >
> > > Here are a few implementation details and my thoughts:
> > >
> > > 1. Snapshot restore is assumed to be manual at the first step. The
> > > process will be described on our documentation pages, but it is
> > > possible to start a node right from the snapshot directory since
> > > the directory structure is preserved (check
> > > `testConsistentClusterSnapshotUnderLoad` in the PR). We also have
> > > some options for how the restore process could look:
> > > - fully manual snapshot restore (will be documented)
> > > - ansible or shell scripts for restore
> > > - Java API for restore (I doubt we should go this way).
> > >
> > > 2. The snapshot `create` procedure creates a snapshot of all
> > > persistent caches available on the cluster (see limitations [1]).
> > >
> > > 3. The snapshot `create` procedure is available through the Java
> > > API and JMX (control.sh support may be implemented later).
> > >
> > > Java API:
> > > IgniteFuture<Void> fut = ignite.snapshot()
> > >     .createSnapshot(name);
> > >
> > > JMX:
> > > SnapshotMXBean mxBean = getMBean(ignite.name());
> > > mxBean.createSnapshot(name);
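For illustration, a sketch of how the proposed Java API above might be used
end to end. The snapshot name is arbitrary and error handling is elided, so
treat this as an example of the quoted signatures rather than final code.

    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.lang.IgniteFuture;

    public class CreateSnapshotSketch {
        public static void main(String[] args) {
            // Obtain a handle to a node already running in this JVM.
            Ignite ignite = Ignition.ignite();

            // Trigger a cluster-wide snapshot of all persistent caches.
            IgniteFuture<Void> fut = ignite.snapshot()
                .createSnapshot("snapshot_20200408");

            // Completes once every node has stored its local snapshot part.
            fut.get();
        }
    }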
> > > 4. The DistributedProcess [3] is used to perform the cluster-wide
> > > snapshot procedure, so we've avoided a lot of boilerplate code
> > > here.
> > >
> > > 5. The design document [1] also contains an internal API for
> > > creating a consistent local snapshot of the requested cache groups
> > > and transferring it to another node using the FileTransmission
> > > protocol [6]. This is one of the parts of IEP-28 [5] (cluster
> > > rebalancing via partition files) and an important part for
> > > understanding the whole design.
> > >
> > > Java API:
> > > public IgniteInternalFuture<Void> createRemoteSnapshot(
> > >     UUID rmtNodeId,
> > >     Map<Integer, Set<Integer>> parts,
> > >     BiConsumer<File, GroupPartitionId> partConsumer);
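A sketch of how the internal API above might be invoked, based only on the
quoted signature. `snapshotMgr` is a placeholder for the internal component
that exposes the method in the PR; the cache name, partition ids and the
`CU.cacheId(...)` helper are illustrative assumptions.

    // Request partitions 0-2 of "cache0" from a remote node and handle each
    // partition file as it arrives over the FileTransmission protocol.
    Map<Integer, Set<Integer>> parts = new HashMap<>();
    parts.put(CU.cacheId("cache0"), new HashSet<>(Arrays.asList(0, 1, 2)));

    IgniteInternalFuture<Void> fut = snapshotMgr.createRemoteSnapshot(
        rmtNodeId,  // UUID of the node that owns the requested partitions
        parts,      // cache group id -> partition ids to copy
        (file, grpPartId) ->
            System.out.println("Received " + grpPartId + " -> " + file));

    fut.get(); // resolves when all requested partition files are received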
> > > Please share your thoughts and take a look at my changes [4].
> > >
> > > [1] https://cwiki.apache.org/confluence/display/IGNITE/IEP-43%3A+Cluster+snapshots
> > > [2] https://issues.apache.org/jira/browse/IGNITE-11073
> > > [3] https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/util/distributed/DistributedProcess.java#L49
> > > [4] https://github.com/apache/ignite/pull/7607
> > > [5] https://cwiki.apache.org/confluence/display/IGNITE/IEP-28%3A+Cluster+peer-2-peer+balancing#IEP-28:Clusterpeer-2-peerbalancing-Filetransferbetweennodes
> > > [6] https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/managers/communication/TransmissionHandler.java#L42
> > >
> > > On Thu, 28 Feb 2019 at 14:43, Dmitriy Pavlov <dpav...@apache.org> wrote:
> > > >
> > > > Hi Maxim,
> > > >
> > > > I agree with Denis, and I have just one concern here.
> > > >
> > > > Apache Ignite has quite a long story (it started even before
> > > > Apache), and by now it has a huge number of features. Some of
> > > > these features:
> > > > - are developed and well known by community members,
> > > > - some were contributed a long time ago and nobody develops them,
> > > > - and, in some rare cases, nobody in the community knows how they
> > > >   work or how to change them.
> > > >
> > > > Such features may attract users, but a bug in one of them may ruin
> > > > the impression of the product. Even worse, nobody can help to
> > > > solve it, and only the users themselves may be encouraged to
> > > > contribute a fix.
> > > >
> > > > My concern here is that such a big feature should have a number of
> > > > interested contributors who can support it in case others lose
> > > > interest. I will be happy if 3-5 members come and say: yes, I will
> > > > do a review / I will help with further changes.
> > > >
> > > > Just to be clear, I'm not against it, and I'll never cast -1 on
> > > > it, but it would be more comfortable to develop this feature
> > > > knowing that this work will not be useless.
> > > >
> > > > Sincerely,
> > > > Dmitriy Pavlov
> > > >
> > > > Wed, 27 Feb 2019 at 23:36, Denis Magda <dma...@apache.org>:
> > > >
> > > > > Maxim,
> > > > >
> > > > > GridGain has this exact feature available for Ignite native
> > > > > persistence deployments. It's not as easy as it might seem from
> > > > > the enablement perspective; it took us many years, involving
> > > > > many engineers, to make it production-ready. If the rest of the
> > > > > community wants to create something similar and available in
> > > > > open source, then please take this estimate into consideration.
> > > > >
> > > > > -
> > > > > Denis
> > > > >
> > > > > On Wed, Feb 27, 2019 at 8:53 AM Maxim Muzafarov <maxmu...@gmail.com> wrote:
> > > > >
> > > > > > Igniters,
> > > > > >
> > > > > > Some of the stores with which Apache Ignite is often compared
> > > > > > have a feature called snapshots [1] [2]. This feature provides
> > > > > > an eventually consistent view of the stored data for different
> > > > > > purposes (e.g. moving data between environments, saving a
> > > > > > backup of data for a later restore procedure, and so on).
> > > > > > Apache Ignite has all the opportunities and machinery to
> > > > > > provide cache and/or data region snapshots out of the box, but
> > > > > > still doesn't have them.
> > > > > >
> > > > > > This issue derives from IEP-28 [5], on which I'm currently
> > > > > > working (it is partially described in section [6]). I would
> > > > > > like to solve this issue too and make Apache Ignite more
> > > > > > attractive to use in a production environment. I haven't
> > > > > > investigated in-memory caches yet, but for caches with
> > > > > > persistence enabled we can do it without any performance
> > > > > > impact on cache operations (some additional IO operations are
> > > > > > needed to copy cache data to the backup store; a copy-on-write
> > > > > > technique is used here). We just need to use our DiscoverySpi,
> > > > > > PME and the Checkpointer process the right way.
> > > > > >
> > > > > > For the first step, we can store all backup data locally on
> > > > > > each cache affinity node. For instance, for a snapshot process
> > > > > > with id `snapshotId`, the `backup\snapshotId\cache0` folder
> > > > > > will be created on each local node and all `cache0` partitions
> > > > > > will be stored there. In the future, we can teach nodes to
> > > > > > upload snapshotted partitions to a single remote node or to
> > > > > > the cloud.
> > > > > >
> > > > > > --
> > > > > >
> > > > > > High-level process overview
> > > > > >
> > > > > > A new snapshot process is managed via DiscoverySpi and
> > > > > > CommunicationSpi messages.
> > > > > >
> > > > > > 1. The initiator sends a request to the cluster
> > > > > >    (DiscoveryMessage).
> > > > > > 2. When a node receives the message, it initiates PME.
> > > > > > 3. The node begins the checkpoint process (holding the write
> > > > > >    lock for a short time).
> > > > > > 4. The node starts to track any write attempts to the
> > > > > >    partitions being snapshotted and places copies of the
> > > > > >    original pages in a temp file.
> > > > > > 5. The node merges each partition file with the corresponding
> > > > > >    delta.
> > > > > > 6. When a node finishes the backup process, it sends an ack
> > > > > >    message with the saved partitions to the initiator (or an
> > > > > >    error response).
> > > > > > 7. When all ack messages are received, the backup is finished.
> > > > > >
> > > > > > The only problem here is that if the request message arrives
> > > > > > at a particular node while a checkpoint is running, PME will
> > > > > > be blocked until the checkpoint ends. This is not good, but
> > > > > > hopefully it will be fixed in [4].
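A schematic sketch of the copy-on-write tracking in step 4 above; the class
and method names are invented for illustration and are not taken from
Ignite's code.

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;

    /**
     * Per-partition delta: keeps the pre-images of pages overwritten while
     * the partition file is being copied into the snapshot.
     */
    class PartitionSnapshotDelta {
        private final Set<Long> copiedPageIds = ConcurrentHashMap.newKeySet();
        private final FileChannel deltaFile; // temp file with original pages

        PartitionSnapshotDelta(FileChannel deltaFile) {
            this.deltaFile = deltaFile;
        }

        /** Invoked before a page of the snapshotted partition is overwritten. */
        void beforePageWrite(long pageId, ByteBuffer originalPage) throws IOException {
            // Only the first pre-image of a page matters for the snapshot,
            // so copy each page at most once.
            if (copiedPageIds.add(pageId))
                deltaFile.write(originalPage);
        }
    }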
> > > > > > --
> > > > > >
> > > > > > Probable API
> > > > > >
> > > > > > From the cache perspective:
> > > > > >
> > > > > > IgniteFuture<IgniteSnapshot> snapshotFut =
> > > > > >     ignite.cache("default")
> > > > > >         .snapshotter()
> > > > > >         .create("mySnapshotId");
> > > > > >
> > > > > > IgniteSnapshot cacheSnapshot = snapshotFut.get();
> > > > > >
> > > > > > IgniteCache<K, V> copiedCache =
> > > > > >     ignite.createCache("CopyCache")
> > > > > >         .withConfiguration(defaultCache.getConfiguration())
> > > > > >         .loadFromSnapshot(cacheSnapshot.id());
> > > > > >
> > > > > > From the command line perspective:
> > > > > >
> > > > > > control.sh --snapshot take cache0,cache1,cache2
> > > > > >
> > > > > > --
> > > > > >
> > > > > > WDYT?
> > > > > > Will it be a useful feature for Apache Ignite?
> > > > > >
> > > > > > [1] https://geode.apache.org/docs/guide/10/managing/cache_snapshots/chapter_overview.html
> > > > > > [2] https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsBackupTakesSnapshot.html
> > > > > > [3] http://apache-ignite-developers.2346864.n4.nabble.com/Data-Snapshots-in-Ignite-td4183.html
> > > > > > [4] https://issues.apache.org/jira/browse/IGNITE-10508
> > > > > > [5] https://cwiki.apache.org/confluence/display/IGNITE/IEP-28%3A+Cluster+peer-2-peer+balancing
> > > > > > [6] https://cwiki.apache.org/confluence/display/IGNITE/IEP-28%3A+Cluster+peer-2-peer+balancing#IEP-28:Clusterpeer-2-peerbalancing-Checkpointer