Requirements
============

OVN uses two databases, the "northbound" and "southbound" databases, in
a somewhat idiosyncratic manner.  Each client of one of these databases
maintains an in-memory replica of the database (or some subset of it),
and the server sends updates to this replica as they are committed.
Thus, at any given time, a client has a consistent snapshot of the
database, although it might be out of date if the database has changed
but the updates have not yet made it from the server to the client.

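As a rough illustration of this usage model, here is a minimal Python
sketch.  It is not OVSDB code: the names Server, Replica, commit(), and
updates_since() are hypothetical, and a real server would push updates
over the network rather than being asked for them.  The sketch only
shows why a replica that applies whole committed updates in commit
order always holds some consistent, if possibly old, snapshot.

```python
from dataclasses import dataclass, field


def apply_changes(rows, changes):
    """Apply one transaction's row changes; a value of None deletes the row."""
    for key, value in changes.items():
        if value is None:
            rows.pop(key, None)
        else:
            rows[key] = value


class Server:
    """Hypothetical database server that logs every committed transaction."""

    def __init__(self):
        self.version = 0
        self.rows = {}
        self.log = []  # (version, changes) pairs, in commit order

    def commit(self, changes):
        # All changes of a transaction become visible together.
        self.version += 1
        apply_changes(self.rows, changes)
        self.log.append((self.version, dict(changes)))

    def updates_since(self, version):
        # Updates that a client at `version` has not seen yet.
        return [u for u in self.log if u[0] > version]


@dataclass
class Replica:
    """In-memory copy of the database held by one client."""
    version: int = 0
    rows: dict = field(default_factory=dict)

    def apply(self, update):
        # Applying a whole update moves the replica from one consistent
        # snapshot to the next; it never exposes half a transaction.
        version, changes = update
        apply_changes(self.rows, changes)
        self.version = version


server = Server()
replica = Replica()
server.commit({"lport1": "up"})
server.commit({"lport1": None, "lport2": "up"})

# Until updates arrive, the replica is consistent but old (still empty here).
stale_rows = dict(replica.rows)

for update in server.updates_since(replica.version):
    replica.apply(update)
```

After the loop, the replica has caught up with the server.  Reads go to
replica.rows, which is why all transactions sent to the server are
writes.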
Beyond supporting this usage model, the basic requirements for the OVN
use case are:

- Size: 20 MB to 100 MB of data (estimated database size to hold data
  for our target scale of 1,000 hypervisors and 20,000 logical ports).

- Scale: The northbound database has only a single-digit number of
  clients.  Each hypervisor is a client to the southbound database, so
  about 1,000 clients for our target scale of 1,000 hypervisors.

- Performance: Hundreds of transactions per second.  (Because of the
  usage model described above, all transactions are write
  transactions; clients read from their local replicas.)

- Transactions: Clients expect atomic, consistent, isolated
  transactions.  Durability is not essential, because the clients will
  reissue lost transactions (up to and including completely refilling
  an empty database, although this can be slow).

- High availability: If the database server goes down, then this
  freezes the OVN configuration.  This is OK briefly for running
  clients--the existing configuration continues to work, it just can't
  be updated--but it prevents new clients or clients that restart from
  using OVN at all.  For the same reason that durability is not
  essential, it is acceptable if an occasional fail-over between
  database servers loses a few transactions, though of course it's
  best to minimize the probability and the amount of data lost.

- Open source.  Some "open source" databases only provide high
  availability and transactions as proprietary extensions; that's
  undesirable.

Desirable features:

- C client, since OVN is written in C; otherwise, we'll likely have to
  write one.  (We've had suggestions that OVN should be written in
  another language, such as Java, but we have not decided to change
  the language yet.)

- Python client, since OVS includes tools written in Python.

- Table structured.  We could layer tables on top of a key-value store
  if necessary.

- Schema support, with referential integrity constraints.
  We find this helpful for increasing our confidence in the system.
  This is something that we could leave out or layer on top.

- Network protocol.  Some databases are just designed for local
  access.  If such a database were otherwise just right, we could wrap
  it for distributed use.  The analysis below mostly ignores databases
  that are local-only or in which remote access appears to be an
  afterthought.

Options
=======

Each entry has the columns listed below.  In general, all-caps answers
are problematic for the OVN use case.

- Database: The database being evaluated.

- txn: "yes" if the database supports transactions across arbitrary
  data, "NO" if its transactions are limited to a single data item,
  such as a single key-value pair, or perhaps even more limited.

- ACID: The transactional properties that the database supports,
  within the transactions that the database supports.  (Thus, a
  database whose transactions cover only a single data item can be
  listed as ACID, but this is only for those limited transactions.)

- consist: The distributed consistency model that the database
  supports, one of "strong" for strong or linearizable consistency,
  "tunable" for consistency that can be tuned to be strong or
  linearizable or weaker, or "EVNTUAL" for eventual consistency.

- trk: "yes" if the database can automatically report data changes to
  clients, "NO" if the database requires clients to poll for changes.

- HA: "yes" if the database can be configured for high availability,
  so that loss of a single node does not stop database activity, "NO"
  otherwise.

- OS: "yes" if the database is open source or free (libre) software,
  "NO" if it is proprietary.  When a database has open source and
  proprietary editions, this is "yes" and only the features in the
  open source edition are credited in other columns.

- C: "yes" if the database has a C (not C++) client library, "NO"
  otherwise.

- Py: "yes" if the database has a Python client library, "NO"
  otherwise.

- format: The database's data model.  "sql", "db", "table", and
  "multi" all indicate that OVN could directly use the data model;
  "KV" or "JSON" indicates that OVN's data model would have to be
  overlaid on it.

Database       txn ACID consist trk  HA  OS  C   Py  format
-------------- --- ---- ------- ---- --- --- --- --- ------
ActorDB        yes ACID strong  NO   yes yes yes yes sql
Aerospike      yes ACID strong  NO   yes yes yes yes db/KV
Cassandra      NO  -C-D tunable NO   yes yes NO  yes table
Cockroach DB   yes ACID strong  NO   yes yes ?   ?   sql
Couchbase      NO  ???? ????    NO   yes NO? yes yes JSON
CrateIO        NO  ???? EVNTUAL NO   yes yes NO  yes sql
etcd           NO  ACID strong  yes? yes yes yes yes KV
Gigaspaces XAP yes ACID strong  yes  yes NO  NO  NO  multi
HBase          NO  ACID strong  NO   yes yes NO  yes table
Hyperdex       yes ACID strong  NO   yes NO  yes yes KV
Hypertable     NO  ???? ????    NO   yes yes NO  yes table
MongoDB        NO  ACID strong  ??   yes yes yes yes JSON
RAMCloud       yes ???? strong  NO   yes yes NO  yes KV
Redis          yes -C?D ????    NO   yes yes yes yes KV
Riak           NO  ---D EVNTUAL NO   yes yes yes yes KV
Scalaris       yes ACI- strong  NO   yes yes NO  yes KV
ScyllaDB       NO  -C-D tunable NO   yes yes NO  yes table
Voldemort      NO  ???? EVNTUAL NO   yes yes NO  yes KV
Zookeeper      yes AC-D strong  yes  yes yes yes yes KV
OVSDB          yes ACID strong  yes  NO  yes yes yes table

Analysis
========

The most troublesome part of the OVN use case is the idiosyncratic use
of the database to maintain state, immediately distributing changes to
all of the clients.  As the "trk" column above shows, most databases
don't support this mode of operation.  Possibly this means that OVN is
misusing the concept of a database and should be redesigned not to use
a database; if so, that's a bigger discussion.

Assuming that we wish to retain this requirement, then only the
following databases appear to support the feature to an acceptable
extent:

- etcd.  etcd appears to allow clients to receive a notification when
  keys change.  A client might be able to bootstrap monitoring of
  entire tables on top of this feature.
  Perhaps this would require registering for notification separately
  on all of the keys that would be used to simulate a table on top of
  the etcd key-value store; if so, that would probably be
  unreasonable.  Assuming that is not a problem or can be overcome, it
  would also be necessary to make sure that the new values of all of
  the modified keys could be obtained in a way such that the client's
  view reflects a consistent snapshot of the database contents.

- Gigaspaces XAP.  Not open source.

- Zookeeper.  The issues here are similar to those for etcd.  Also,
  Zookeeper transactions don't seem to be isolated.

- OVSDB.  If we choose to use OVSDB, we'll have to add
  high-availability support.  Also, the table doesn't mention scaling,
  since it's hard to compare objectively, but the OVSDB server
  currently doesn't scale well to the 1,000 clients required for the
  southbound database, although Andy has started working on that.

Recommendation
==============

I'm intentionally not offering a recommendation, because I want to
start a discussion.