Re: Production deployment requirements for memory backend storage

Charlie Voiselle Wed, 12 Apr 2017 12:01:32 -0700

Neeraj:

Thanks for your interest in Riak. I will copy your questions into this email for 
reference and answer them inline.

1. What is the “platform_data_dir” used for when memory is used as storage 
backend? Is it only needed for active anti-entropy and cluster metadata? Do I 
need to persist this data i.e. if a node goes down and restarts in this 
configuration, is persistence of data in “platform_data_dir” required.

As you have pointed out, the platform_data_dir contains more than just the 
actual data stored in the cluster. There are three folders that must be 
persisted for a node to remain a member of a cluster and to not create issues 
with the sizes of the vector clocks internal to the objects. They are:

ring - The binary files that describe the cluster and the vnode ownership 
mappings. Deleting this folder will cause the node to start up and create a new 
default ring. This default ring will allocate 100% of the partitions to that 
node. This is non-fatal and is resolved by rejoining the node to the cluster. 
This extra work can be avoided by persisting the ring file properly.

cluster_meta - This folder contains the properties for bucket types and typed 
custom buckets.

kv_vnode - This folder contains generated actor-ids for each Riak vnode. The 
routine loss of this directory will cause orphaned vnode actor-ids to 
potentially accumulate in objects’ vclocks.
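
As a rough illustration of the list above, here is a minimal Python sketch 
that reports which of the three folders are missing under a given 
platform_data_dir, for example before restarting a node. It is not part of 
Riak; the function name is hypothetical and you supply the path yourself.

```python
import os

# Subfolders of platform_data_dir that the list above says must be persisted.
REQUIRED_DIRS = ("ring", "cluster_meta", "kv_vnode")


def missing_persistent_dirs(platform_data_dir):
    """Return the required subfolders that do not exist under platform_data_dir."""
    return [
        name
        for name in REQUIRED_DIRS
        if not os.path.isdir(os.path.join(platform_data_dir, name))
    ]
```

Calling missing_persistent_dirs() with your node's platform_data_dir returns 
an empty list when all three folders are in place.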

Active anti-entropy is a process to prevent bit-rot in long-lived data. Since 
your questions concern ephemeral data, we would recommend that it be disabled: 
creating and maintaining the trees carries overhead that makes no sense for 
ephemeral data.
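
One way to confirm that a node is set up this way is to check its riak.conf 
before it starts. The Python sketch below is only an illustration: the setting 
names (storage_backend, anti_entropy) and the value "passive" for disabling 
AAE reflect my understanding of riak.conf in Riak KV 2.x and should be checked 
against the documentation for your version. The function names are 
hypothetical.

```python
def parse_conf(text):
    """Parse 'key = value' lines, ignoring blank lines and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "=" in line:
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip()
    return settings


# Assumed settings for an ephemeral, memory-only node with AAE disabled.
EXPECTED = {"storage_backend": "memory", "anti_entropy": "passive"}


def conf_mismatches(text):
    """Return {setting: actual value or None} for settings that differ from EXPECTED."""
    settings = parse_conf(text)
    return {
        key: settings.get(key)
        for key, wanted in EXPECTED.items()
        if settings.get(key) != wanted
    }
```

An empty result from conf_mismatches() means both settings match the assumed 
values; a setting that is absent from the file is reported with the value None.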

2. What is the minimum memory requirement of an empty Riak node in this 
configuration?

On a sample Riak KV 2.2.3 node that I brought up, the beam.smp process was 
using 1.5 GB of RAM with an empty memory backend and AAE disabled.

3. What is the minimum disk and CPU requirement of a Riak node in this 
configuration?

There are a few variables that dictate how much actual disk throughput you will 
use in a Riak cluster that only uses the memory backend: logging overhead, ring 
changes, and cluster metadata changes.

Logging throughput is determined by the general health of the cluster and is 
minimal in clusters that are well-behaved. The logfiles themselves have 
configurable size caps and rotation counts (by default 5 logs capped 
at 50 MB for each file). There are some other logfiles that are not managed by 
lager and they can grow beyond these expected limits. If you are building nodes 
optimized for storage, you will want to monitor the size of this folder and 
trim it as appropriate.
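
A minimal Python sketch of that kind of monitoring is below. It is not a Riak 
tool: the function names are hypothetical, and the log directory and byte 
budget are values you would choose yourself (for instance, somewhat above the 
5 x 50 MB = 250 MB that the default rotation allows for one lager-managed log).

```python
import os


def dir_size_bytes(path):
    """Sum the sizes of all regular files under path, recursively."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                # The file was rotated or removed while we were walking.
                pass
    return total


def over_budget(path, budget_bytes):
    """Return True when the folder's total size exceeds budget_bytes."""
    return dir_size_bytes(path) > budget_bytes
```

Run periodically (from cron, say), over_budget() gives a simple signal that 
the unmanaged logfiles need trimming.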

The ring is a data structure that is used to hold information about the 
cluster’s membership, the node capabilities, MDC replication configuration, and 
the legacy custom bucket metadata. In stable clusters that are using no custom 
buckets the impact of writes to the ring is negligible; however, there are 
certain antipatterns involving the creation of a large number of buckets with 
custom properties in the “default” bucket type that will bloat the ring file 
and result in a large amount of ring gossip.

Finally, Riak bucket types and their properties, as well as the custom bucket 
properties of typed buckets, are stored in cluster metadata. This backend is a 
dets-based store that uses hashtree comparisons to maintain consistency across 
members of the cluster. The storage it needs depends on how much metadata you 
create within your cluster and how quickly you create it.

There is more generically-applicable information about [cluster capacity 
planning] 
<http://docs.basho.com/riak/kv/2.2.3/setup/planning/cluster-capacity/> in the 
Riak KV documentation.

Thanks again for your interest,

Charlie Voiselle
Sr. Product Manager, Riak KV/Clients
Basho Technologies
@angrycub

[cluster capacity planning] - 
http://docs.basho.com/riak/kv/2.2.3/setup/planning/cluster-capacity/

> On Apr 10, 2017, at 3:30 PM, Neeraj Poddar <n.pod...@f5.com> wrote:
> 
> Hello,
>  
> I wanted to understand the production requirements for using Riak as a 
> non-persistent ephemeral data store. In particular the following questions 
> relate to using Riak with “memory” configured as storage backend:
>  
> 1.       What is the “platform_data_dir” used for when memory is used as 
> storage backend? Is it only needed for active anti-entropy and cluster 
> metadata? Do I need to persist this data i.e. if a node goes down and 
> restarts in this configuration, is persistence of data in “platform_data_dir” 
> required.
> 2.       What is the minimum memory requirement of an empty Riak node in this 
> configuration?
> 3.       What is the minimum disk and CPU requirement of a Riak node in this 
> configuration?
>  
> -- 
> Regards,
> Neeraj Poddar
>  
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com <mailto:riak-users@lists.basho.com>
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com 
> <http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com>
