[ovs-dev] RFC: OVN database options

Ben Pfaff Wed, 09 Mar 2016 23:12:03 -0800

Requirements
============

OVN uses two databases, the "northbound" and "southbound" databases,
in a somewhat idiosyncratic manner.  Each client of one of these
databases maintains an in-memory replica of the database (or some
subset of it), and the server sends each client updates to its replica
as they are committed.  Thus, at any given time, a client has a
consistent snapshot of the database, although the snapshot might be
stale if the database has changed but the updates have not yet made it
from the server to the client.
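
To make the usage model concrete, here is a minimal sketch of a client
replica.  The update format (a dict of changed rows per table, with
None marking a deleted row) and the table and column names are
assumptions made up for illustration; this is not the OVSDB wire
protocol.

```python
# Hypothetical sketch: the update format is an assumption for
# illustration, not the OVSDB wire protocol.

class Replica:
    """In-memory copy of (a subset of) a database, keyed by table and row ID."""

    def __init__(self):
        self.tables = {}   # table name -> {row id -> {column -> value}}
        self.version = 0   # number of committed transactions applied so far

    def apply(self, update):
        """Apply every change from one committed transaction.

        `update` maps table name -> {row id -> new row dict, or None if
        the row was deleted}.  Readers only look at the replica between
        calls, so they never see half of a transaction.
        """
        for table, rows in update.items():
            current = self.tables.setdefault(table, {})
            for row_id, row in rows.items():
                if row is None:
                    current.pop(row_id, None)
                else:
                    current[row_id] = dict(row)
        self.version += 1


replica = Replica()
replica.apply({"Logical_Port": {"lp1": {"name": "vm1", "up": False}}})
replica.apply({"Logical_Port": {"lp1": {"name": "vm1", "up": True},
                                "lp2": {"name": "vm2", "up": False}}})
replica.apply({"Logical_Port": {"lp2": None}})
```

A replica that has applied only the first two updates is consistent
but old; it catches up when the third update arrives.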


Beyond supporting this usage model, the basic requirements for the OVN
use case are:

    - Size: 20 MB to 100 MB of data (estimated database size to hold
      data for our target scale of 1,000 hypervisors and 20,000
      logical ports).

    - Scale: The northbound database has only a single-digit number of
      clients.  Each hypervisor is a client to the southbound
      database, so about 1,000 clients for our target scale of 1,000
      hypervisors.

    - Performance: Hundreds of transactions per second.  (Because of
      the usage model described above, all transactions are write
      transactions; clients read from their local replicas.)

    - Transactions: Clients expect atomic, consistent, isolated
      transactions.

      Durability is not essential, because the clients will reissue
      lost transactions (up to and including completely refilling an
      empty database, although this can be slow).

    - High availability: If the database server goes down, then this
      freezes the OVN configuration.  This is OK briefly for running
      clients--the existing configuration continues to work, it just
      can't be updated--but it prevents new clients or clients that
      restart from using OVN at all.

      For the same reason that durability is not essential, it is
      acceptable if an occasional fail-over between database servers
      loses a few transactions, though of course it's best to minimize
      the probability and the amount of data lost.

    - Open source.  Some "open source" databases only provide high
      availability and transactions as proprietary extensions; that's
      undesirable.
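
The relaxed durability requirement rests on clients being able to
reissue what was lost.  One way a client might do that, sketched under
the assumption that it keeps its desired state as a simple mapping
from row key to value (the function name and data are hypothetical),
is to diff the desired state against whatever the database holds after
a fail-over:

```python
def missing_writes(desired, actual):
    """Return the writes needed to bring `actual` in line with `desired`.

    Both arguments map a row key to a row value.  The result maps each
    key to the value to write, or to None for rows that should be
    deleted.
    """
    writes = {}
    for key, value in desired.items():
        if key not in actual or actual[key] != value:
            writes[key] = value
    for key in actual:
        if key not in desired:
            writes[key] = None
    return writes


desired = {"lp1": "vm1", "lp2": "vm2", "lp3": "vm3"}

# After a fail-over that lost the last two transactions:
after_failover = {"lp1": "vm1", "lp2": "stale"}
to_reissue = missing_writes(desired, after_failover)

# Worst case: the database came back empty and must be refilled.
refill = missing_writes(desired, {})
```

In the worst case the diff is the entire desired state, which is the
slow "refill an empty database" scenario mentioned above.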

Desirable features:

    - C client, since OVN is written in C; otherwise, we'll likely
      have to write one.  (We've had suggestions that OVN should be
      written in another language, such as Java, but we have not
      decided to change the language yet.)

    - Python client, since OVS includes tools written in Python.

    - Table structured.  We could layer tables on top of a key-value
      store if necessary.

    - Schema support, with referential integrity constraints.  We find
      this helpful for increasing our confidence in the system.  This
      is something that we could leave out or layer on top.

    - Network protocol.  Some databases are just designed for local
      access.  If such a database were otherwise just right, we could
      wrap it for distributed use.  The analysis below mostly ignores
      databases that are local-only or in which remote access appears
      to be an afterthought.
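
As a rough idea of what layering referential integrity on top might
involve, here is a minimal sketch of a checker that a client library
could run before committing.  The table names, the "switch" column,
and the shape of the `references` mapping are all hypothetical.

```python
def check_references(tables, references):
    """Return a list of dangling references.

    `tables` maps table name -> {row id -> {column -> value}}.
    `references` maps (table, column) -> the name of the table in which
    the column's value must identify an existing row.
    """
    errors = []
    for (table, column), target in references.items():
        for row_id, row in tables.get(table, {}).items():
            ref = row.get(column)
            if ref is not None and ref not in tables.get(target, {}):
                errors.append((table, row_id, column, ref))
    return errors


tables = {
    "Logical_Switch": {"ls1": {"name": "net1"}},
    "Logical_Port": {
        "lp1": {"name": "vm1", "switch": "ls1"},
        "lp2": {"name": "vm2", "switch": "ls9"},   # no such switch
    },
}
references = {("Logical_Port", "switch"): "Logical_Switch"}
dangling = check_references(tables, references)
```

A client-side check like this only helps if it runs inside the same
transaction as the write; otherwise two clients can each pass the
check and still leave a dangling reference between them.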


Options
=======

Each entry has the columns listed below.  In general, all-caps answers
(other than in the ACID column) are problematic for the OVN use case.
A "?" marks an answer that is unknown or uncertain.

    - Database: The database being evaluated.

    - txn: "yes" if the database supports transactions across
      arbitrary data, "NO" if its transactions are limited to a single
      data item, such as a single key-value pair, or perhaps even more
      limited.

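    - ACID: The transactional properties (atomicity, consistency,
      isolation, durability) that the database supports, within the
      transactions that the database supports.  (Thus, a database
      whose transactions cover only a single data item can be listed
      as ACID, but this is only for those limited transactions.)  A
      "-" in place of a letter means that the database does not
      provide that property.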

    - consist: The distributed consistency model that the database
      supports, one of "strong" for strong or linearizable
      consistency, "tunable" for consistency that can be tuned to be
      strong or linearizable or weaker, or "EVNTUAL" for eventual
      consistency.

    - trk: "yes" if the database can automatically report data changes
      to clients, "NO" if the database requires clients to poll for
      changes.

    - HA: "yes" if the database can be configured for high
      availability, so that loss of a single node does not stop
      database activity, "NO" otherwise.

    - OS: "yes" if the database is open source or free (libre)
      software, "NO" if it is proprietary.  When a database has open
      source and proprietary editions, this is "yes" and only the
      features in the open source edition are credited in other
      columns.

    - C: "yes" if the database has a C (not C++) client library, "NO"
      otherwise.

    - Py: "yes" if the database has a Python client library, "NO"
      otherwise.

    - format: The database's data model.  "sql", "db", "table", and
      "multi" all indicate that OVN could directly use the data model;
      "KV" or "JSON" indicates that OVN's data model would have to be
      overlaid on it.
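
For the "KV" case, overlaying OVN's data model could look roughly like
the sketch below.  A plain dict stands in for the key-value store, and
the "<table>/<row id>" key scheme and JSON row encoding are
assumptions, not any particular database's convention.

```python
import json


class TableOverKV:
    """Rows of named tables stored in a flat key-value mapping.

    Keys have the form "<table>/<row id>"; values are JSON-encoded rows.
    """

    def __init__(self, store=None):
        self.store = {} if store is None else store

    def put(self, table, row_id, row):
        self.store[f"{table}/{row_id}"] = json.dumps(row, sort_keys=True)

    def get(self, table, row_id):
        return json.loads(self.store[f"{table}/{row_id}"])

    def rows(self, table):
        """Return {row id -> row} for every row in `table` (a prefix scan)."""
        prefix = table + "/"
        return {key[len(prefix):]: json.loads(value)
                for key, value in self.store.items()
                if key.startswith(prefix)}


db = TableOverKV()
db.put("Logical_Port", "lp1", {"name": "vm1"})
db.put("Logical_Port", "lp2", {"name": "vm2"})
db.put("Logical_Switch", "ls1", {"name": "net1"})
ports = db.rows("Logical_Port")
```

With one key per row, a transaction that touches several rows touches
several keys, which is why the "txn" column matters so much for the
key-value candidates.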

Database        txn  ACID  consist   trk   HA   OS    C   Py  format
--------------  ---  ----  -------  ----  ---  ---  ---  ---  ------
ActorDB         yes  ACID   strong    NO  yes  yes  yes  yes     sql
Aerospike       yes  ACID   strong    NO  yes  yes  yes  yes   db/KV
Cassandra        NO  -C-D  tunable    NO  yes  yes   NO  yes   table
Cockroach DB    yes  ACID   strong    NO  yes  yes    ?    ?     sql
Couchbase        NO  ????     ????    NO  yes  NO?  yes  yes    JSON
CrateIO          NO  ????  EVNTUAL    NO  yes  yes   NO  yes     sql
etcd             NO  ACID   strong  yes?  yes  yes  yes  yes      KV
Gigaspaces XAP  yes  ACID   strong   yes  yes   NO   NO   NO   multi
HBase            NO  ACID   strong    NO  yes  yes   NO  yes   table
Hyperdex        yes  ACID   strong    NO  yes   NO  yes  yes      KV
Hypertable       NO  ????     ????    NO  yes  yes   NO  yes   table
MongoDB          NO  ACID   strong    ??  yes  yes  yes  yes    JSON
RAMCloud        yes  ????   strong    NO  yes  yes   NO  yes      KV
Redis           yes  -C?D     ????    NO  yes  yes  yes  yes      KV
Riak             NO  ---D  EVNTUAL    NO  yes  yes  yes  yes      KV
Scalaris        yes  ACI-   strong    NO  yes  yes   NO  yes      KV
ScyllaDB         NO  -C-D  tunable    NO  yes  yes   NO  yes   table
Voldemort        NO  ????  EVNTUAL    NO  yes  yes   NO  yes      KV
Zookeeper       yes  AC-D   strong   yes  yes  yes  yes  yes      KV

OVSDB           yes  ACID   strong   yes   NO  yes  yes  yes   table
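
For anyone who wants to slice the table differently, here is a small
sketch that parses the rows above and lists, for each database whose
"trk" answer starts with "yes", the columns with an all-caps answer.
The helper names are made up, and the ACID column is skipped because
"ACID" is all caps but is the good answer there.

```python
TABLE = """
ActorDB        yes  ACID   strong   NO  yes  yes  yes  yes     sql
Aerospike      yes  ACID   strong   NO  yes  yes  yes  yes   db/KV
Cassandra       NO  -C-D  tunable   NO  yes  yes   NO  yes   table
Cockroach DB   yes  ACID   strong   NO  yes  yes   ?    ?      sql
Couchbase       NO  ????     ????   NO  yes  NO?  yes  yes    JSON
CrateIO         NO  ????  EVNTUAL   NO  yes  yes   NO  yes     sql
etcd            NO  ACID   strong  yes? yes  yes  yes  yes      KV
Gigaspaces XAP yes  ACID   strong  yes  yes   NO   NO   NO   multi
HBase           NO  ACID   strong   NO  yes  yes   NO  yes   table
Hyperdex       yes  ACID   strong   NO  yes   NO  yes  yes      KV
Hypertable      NO  ????     ????   NO  yes  yes   NO  yes   table
MongoDB         NO  ACID   strong   ??  yes  yes  yes  yes    JSON
RAMCloud       yes  ????   strong   NO  yes  yes   NO  yes      KV
Redis          yes  -C?D     ????   NO  yes  yes  yes  yes      KV
Riak            NO  ---D  EVNTUAL   NO  yes  yes  yes  yes      KV
Scalaris       yes  ACI-   strong   NO  yes  yes   NO  yes      KV
ScyllaDB        NO  -C-D  tunable   NO  yes  yes   NO  yes   table
Voldemort       NO  ????  EVNTUAL   NO  yes  yes   NO  yes      KV
Zookeeper      yes  AC-D   strong  yes  yes  yes  yes  yes      KV
OVSDB          yes  ACID   strong  yes   NO  yes  yes  yes   table
"""

COLUMNS = ["txn", "ACID", "consist", "trk", "HA", "OS", "C", "Py", "format"]


def parse(table):
    """Return {database name -> {column -> answer}}."""
    rows = {}
    for line in table.strip().splitlines():
        # Split from the right so names containing spaces stay intact.
        name, *answers = line.rsplit(None, len(COLUMNS))
        rows[name.strip()] = dict(zip(COLUMNS, answers))
    return rows


def all_caps(answer):
    letters = [c for c in answer if c.isalpha()]
    return bool(letters) and all(c.isupper() for c in letters)


def problems(row):
    """Columns (other than ACID) whose answer is all caps."""
    return [col for col in COLUMNS if col != "ACID" and all_caps(row[col])]


rows = parse(TABLE)
candidates = {name: problems(row)
              for name, row in rows.items()
              if row["trk"].startswith("yes")}
```

Here `candidates` maps each change-tracking database to its
problematic columns.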


Analysis
========

The most troublesome part of the OVN use case is the idiosyncratic use
of the database to maintain state, immediately distributing changes to
all of the clients.  As the "trk" column above shows, most databases
don't support this mode of operation.  Possibly this means that OVN is
misusing the concept of a database and should be redesigned not to use
a database; if so, that's a bigger discussion.

Assuming that we wish to retain this requirement, then only the
following databases appear to support the feature to an acceptable
extent:

    - etcd.  etcd appears to allow clients to receive a notification
      when keys change.  A client might be able to bootstrap
      monitoring of entire tables on top of this feature.  Perhaps
      this would require registering for notification separately on
      all of the keys that would be used to simulate a table on top of
      the etcd key-value store; if so, that would probably be
      unreasonable.  Assuming that is not a problem or can be
      overcome, it would also be necessary to make sure that the new
      values of all of the modified keys could be obtained in a way
      such that the client's view reflects a consistent snapshot of
      the database contents.

    - Gigaspaces XAP.  Not open source.

    - Zookeeper.  The issues here are similar to those for etcd.
      Also, Zookeeper transactions don't seem to be isolated.

    - OVSDB.  If we choose to use OVSDB, we'll have to add
      high-availability support.  Also, the table doesn't mention
      scaling, since it's hard to compare objectively, but the OVSDB
      server currently doesn't scale well to the 1,000 clients
      required for the southbound database.  Andy has started working
      on that.
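
To illustrate the consistent-snapshot concern raised for etcd and
Zookeeper, here is a minimal sketch of how a client might turn per-key
notifications into whole-transaction steps.  It assumes that every
notification carries a revision number shared by all keys changed in
the same transaction and that notifications arrive in revision order.
Whether a given store can provide that is exactly the open question,
and the event format here is hypothetical.

```python
def snapshots(events):
    """Yield (revision, view) once per revision, never mid-revision.

    `events` is an iterable of (revision, key, value) tuples in
    nondecreasing revision order; a value of None means the key was
    deleted.  All events with the same revision are assumed to come
    from one transaction.
    """
    view = {}
    current = None
    for revision, key, value in events:
        if current is not None and revision != current:
            # The previous revision is complete; publish a copy of it.
            yield current, dict(view)
        current = revision
        if value is None:
            view.pop(key, None)
        else:
            view[key] = value
    if current is not None:
        yield current, dict(view)


events = [
    (1, "Logical_Port/lp1", "vm1"),
    (2, "Logical_Port/lp1", "vm1-renamed"),
    (2, "Logical_Port/lp2", "vm2"),
    (3, "Logical_Port/lp2", None),
]
seen = list(snapshots(events))
```

Note that on a live stream this approach cannot tell that a revision
is complete until a later revision arrives or the stream ends, so a
real client would also want some end-of-transaction marker from the
server.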


Recommendation
==============

I'm intentionally not offering a recommendation, because I want to start
a discussion.
