Hi Dikang,

Cool stuff. 2 questions. Based on your presentation at ngcc, it seems like 
rocks db stores things in byte order. Does this mean that you have code that 
makes each of the existing types byte comparable, or is clustering order 
implementation dependent? Also, I don't see anything in the draft api that 
seems to support splitting the data set into arbitrary categories (ie repaired 
and unrepaired data living in the same token range). Is support for incremental 
repair planned for v1?

Thanks,

Blake


On October 4, 2017 at 1:28:01 PM, Dikang Gu (dikan...@gmail.com) wrote:

Hello C* developers: 

In my previous email 
(https://www.mail-archive.com/dev@cassandra.apache.org/msg11024.html), I 
presented that Instagram was kicking off a project to make C*'s storage engine 
to be pluggable, as other modern databases, like mysql, mongoDB etc, so that 
users will be able to choose most suitable storage engine for different work 
load, or to use different features. In addition to that, a pluggable storage 
engine architecture will improve the modularity of the system, help to increase 
the testability and reliability of Cassandra.

After months of development and testing, we'd like to share the work we have 
done, including the first(draft) version of the C* storage engine API, and the 
first version of the RocksDB based storage engine.

​


For the C* storage engine API, here is the draft version we proposed, 
https://docs.google.com/document/d/1PxYm9oXW2jJtSDiZ-SR9O20jud_0jnA-mW7ttp2dVmk/edit.
 It contains the APIs for read/write requests, streaming, and table management. 
The storage engine related functionalities, like data encoding/decoding format, 
on-disk data read/write, compaction, etc, will be taken care by the storage 
engine implementation.

Each storage engine is a class with each instance of the class is stored in the 
Keyspace instance. So all the column families within a keyspace will share one 
storage engine instance.

Once a storage engine instance is created, Cassandra sever issues commands to 
the engine instance to performance data storage and retrieval tasks such as 
opening a column family, managing column families and streaming.

How to config storage engine for different keyspaces? It's still open for 
discussion. One proposal is that we can add the storage engine option in the 
create keyspace cql command, and potentially we can overwrite the option per C* 
node in its config file.

Under that API, we implemented a new storage engine, based on RocksDB, called 
RocksEngine. In long term, we want to support most of C* existing features in 
RocksEngine, and we want to build it in a progressive manner. For the first 
version of the RocksDBEngine, we support following features:
Most of non-nested data types
Table schema
Point query
Range query
Mutations
Timestamp
TTL
Deletions/Cell tombstones
Streaming
We do not supported following features in first version yet:
Multi-partition query
Nested data types
Counters
Range tombstone
Materialized views
Secondary indexes
SASI
Repair
At this moment, we've implemented the V1 features, and deployed it to our 
shadow cluster. Using shadowing traffic of our production use cases, we saw ~3X 
P99 read latency drop, compared to our C* 2.2 prod clusters. Here are some 
detailed metrics: 
https://docs.google.com/document/d/1DojHPteDPSphO0_N2meZ3zkmqlidRwwe_cJpsXLcp10.

So if you need the features in existing storage engine, please keep using the 
existing storage engine. If you want to have a more predictable and lower read 
latency, also the features supported by RocksEngine are enough for your use 
cases, then RocksEngine could be a fit for you.

The work is 1% finished, and we want to work together with community to make it 
happen. We presented the work in NGCC last week, and also pushed the beta 
version of the pluggable storage engine to Instagram github Cassandra repo, 
rocks_3.0 branch (https://github.com/Instagram/cassandra/tree/rocks_3.0), which 
is based on C* 3.0.12, please feel free to play with it! You can download it 
and follow the instructions 
(https://github.com/Instagram/cassandra/blob/rocks_3.0/StorageEngine.md) to try 
it out in your test environment, your feedback will be very valuable to us.

Thanks
Dikang.

Reply via email to