Data loss may occur during node restart when using LevelDB tiered storage
on a low-write-volume cluster.

Overview

This issue is limited to customers using LevelDB tiered storage. There is a
known defect in the LevelDB tiered storage subsystem: if less than 60MB of
data has been written to a vnode before Riak is restarted, that data will
be unavailable for reads after the restart. The data is written to a
recovery log in an incorrect location, and the log is not found on restart.
Once more than 60MB of data has been written to the vnode, no data will be
lost upon restart.

Description

When using LevelDB tiered storage as the backend for Riak, LevelDB creates
the first (and only the first) recovery log per vnode, 0000xxxx.log, in an
incorrect location. LevelDB reads this first recovery log only if the Riak
server restarts before the log's contents have been committed to permanent
storage. LevelDB obsoletes a recovery file once it has written the newly
arrived data to a long-term storage file (an .sst table file). All
subsequent recovery files are created in the location expected by LevelDB's
startup procedures. The data loss is therefore limited to the contents of
this first recovery log, and occurs only if LevelDB has not yet rewritten
that data to long-term storage.

The only symptom of this issue is that the data within the first recovery
log is unavailable for reads after a restart. However, Riak has several
resiliency features that reduce the likelihood of a read failure. Riak
defaults to a replication factor of n_val = 3, which means that Riak writes
each piece of data to 3 different locations. All 3 locations must therefore
restart within the same short period for data loss to occur. Otherwise,
Riak will automatically repair individual nodes with missing data from the
other nodes via its read-repair and/or AAE features.

Affected Users

This issue will affect you if ALL of these conditions are true:

- You are using the LevelDB backend, AND
- LevelDB is configured to use tiered storage (leveldb.tiered settings in
riak.conf), AND
- All Riak nodes responsible for the n_val copies of the data are restarted
before LevelDB rewrites the initial recovery log into a permanent .sst
table file (that is, before 60MB of data has been written per vnode).
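
The three conditions above can be expressed as a small check. The following
is only a sketch under stated assumptions: the storage_backend key, the
treatment of "leveldb.tiered = off" as disabled, and the reading of 60MB as
60 * 1024 * 1024 bytes are assumptions rather than details from this
advisory. The per-replica byte counts are hypothetical inputs that an
operator would have to estimate, and multi-backend configurations are not
handled.

```python
# "60MB" is taken from the advisory; binary megabytes are an assumption.
FIRST_LOG_THRESHOLD_BYTES = 60 * 1024 * 1024


def parse_riak_conf(text):
    """Parse 'key = value' lines from riak.conf text, ignoring '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def uses_tiered_leveldb(settings):
    """Conditions 1 and 2: LevelDB backend with tiered storage configured."""
    # Assumption: the backend key is 'storage_backend' and a missing or
    # 'off' value for 'leveldb.tiered' means tiered storage is disabled.
    return (
        settings.get("storage_backend") == "leveldb"
        and settings.get("leveldb.tiered", "off") != "off"
    )


def data_at_risk(settings, replicas):
    """Condition 3: every one of the n_val replicas restarted early.

    `replicas` is a hypothetical list of (bytes_written_to_vnode, restarted)
    pairs, one per copy of the data.
    """
    return (
        uses_tiered_leveldb(settings)
        and bool(replicas)
        and all(
            restarted and written < FIRST_LOG_THRESHOLD_BYTES
            for written, restarted in replicas
        )
    )
```

If even one replica has not restarted, or has already written more than the
threshold, the check returns False, which mirrors the point made above that
read-repair and AAE can restore the data from the surviving copy.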

Mitigation Strategy

This issue can be mitigated in an existing Riak installation by creating a
symbolic link between the incorrect log location and the fast tier
directory. The same steps apply to a fresh install with tiered storage,
provided they are carried out before Riak is started for the first time.

To mitigate the issue, follow these steps:

- Identify where the first LevelDB files will be written: this is the
directory named by leveldb.data_root
- Move the existing leveldb.data_root directory out of the way
- Identify where the fast tiered storage directory is:
{leveldb.tiered.path.fast}/{leveldb.data_root}
- Create a symbolic link from the fast path data_root to the standard
data_root.
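
The steps above can be sketched in Python as follows. This is an
illustration under assumptions, not the procedure from the linked
advisory: the function names and the ".moved" suffix are hypothetical, the
link is assumed to live at the standard data_root and point at the fast
tier directory (so that a log written to the incorrect location lands where
LevelDB looks on startup), and the fast tier directory is created if it
does not already exist.

```python
from pathlib import Path


def fast_tier_dir(fast_path, data_root):
    """Return {leveldb.tiered.path.fast}/{leveldb.data_root}."""
    # Strip the leading "/" so an absolute data_root is appended to
    # fast_path instead of replacing it.
    return Path(fast_path) / data_root.lstrip("/")


def link_data_root_to_fast_tier(data_root, fast_path, backup_suffix=".moved"):
    """Move data_root aside and replace it with a symlink to the fast tier.

    `backup_suffix` is a hypothetical name for the moved-aside directory.
    """
    root = Path(data_root)
    target = fast_tier_dir(fast_path, data_root)
    if root.is_symlink():
        return target  # already linked; nothing to do
    if root.exists():
        # Move the existing data_root directory out of the way.
        root.rename(root.with_name(root.name + backup_suffix))
    # Assumption: create the fast tier directory if it is missing.
    target.mkdir(parents=True, exist_ok=True)
    root.parent.mkdir(parents=True, exist_ok=True)
    # The link is created at data_root and points at the fast tier directory.
    root.symlink_to(target, target_is_directory=True)
    return target
```

For example, link_data_root_to_fast_tier("/var/lib/riak/leveldb",
"/mnt/fast") would link to /mnt/fast/var/lib/riak/leveldb; both paths are
placeholders. The sketch leaves the moved-aside directory untouched, and it
would presumably be run only while Riak is stopped.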

Step-by-step instructions can be found here:
http://docs.basho.com/riak/latest/community/product-advisories/leveldbrestart/#Mitigation-Strategy
