On Fri, Mar 11, 2016 at 12:55 AM, Dan Mihai Dumitriu <dm...@cornell.edu> wrote:
> Great writeup Ben.
>
> The NB DB does need HA and ACID transactions, but it has few clients, so
> it's probably not a very hard problem - could even use BDB with log
> shipping -
> http://www.oracle.com/technetwork/database/database-technologies/berkeleydb/overview/index-085366.html
> .
>
> However, one more potential requirement for the NB DB is secondary
> indices, because the NB clients may expect to query the NB models in
> various ways that weren't considered a priori. I bring this up because in
> the OpenStack context the NB DB could be used to store the Neutron data
> model entirely, thus obviating the need for the Neutron DB, and eliminating
> the "syncing problem" between Neutron and the NB DB. I could see the same
> applying in the context of containers.
>

My colleague Ivan pointed out that ZK could be used for the NB DB. I
think that could be a reasonable choice actually.

> Regarding the SB DB, as Liran pointed out, it doesn't necessarily need
> durable persistence. It would be possible to make the whole thing work with
> an in-memory SB DB. (I am waiting for you to start shooting holes in this
> hypothesis, but I'm reasonably confident those holes can be filled.) That
> said, it does need to be replicated for HA - luckily the replication of an
> in-memory data structure is easier and more performant than that of a
> durably persistent data structure. In order to support efficient syncing
> with clients (ovn-controller agents) the in-memory replication should be a
> form of log shipping, so that clients that disconnect from one SB DB
> instance and reconnect to a different SB DB instance can do a resync
> without a full table download. Is this premature optimization?
>
> On Thu, Mar 10, 2016 at 4:11 PM, Ben Pfaff <b...@ovn.org> wrote:
>
>> Requirements
>> ============
>>
>> OVN uses two databases, the "northbound" and "southbound" databases,
>> in a somewhat idiosyncratic manner.
>> Each client of one of these databases maintains an in-memory replica
>> of the database (or some subset of it), and the server sends it
>> updates to this replica as they are committed. Thus, at any given
>> time, a client has a consistent snapshot of the database, although it
>> might be old if the database has changed but the updates have not yet
>> made it from the server to the client.
>>
>> Beyond supporting this usage model, the basic requirements for the OVN
>> use case are:
>>
>>   - Size: 20 MB to 100 MB of data (estimated database size to hold
>>     data for our target scale of 1,000 hypervisors and 20,000
>>     logical ports).
>>
>>   - Scale: The northbound database has only a single-digit number of
>>     clients. Each hypervisor is a client to the southbound
>>     database, so about 1,000 clients for our target scale of 1,000
>>     hypervisors.
>>
>>   - Performance: Hundreds of transactions per second. (Because of
>>     the usage model described above, all transactions are write
>>     transactions; clients read from their local replicas.)
>>
>>   - Transactions: Clients expect atomic, consistent, isolated
>>     transactions.
>>
>>     Durability is not essential, because the clients will reissue
>>     lost transactions (up to and including completely refilling an
>>     empty database, although this can be slow).
>>
>>   - High availability: If the database server goes down, then this
>>     freezes the OVN configuration. This is OK briefly for running
>>     clients--the existing configuration continues to work, it just
>>     can't be updated--but it prevents new clients or clients that
>>     restart from using OVN at all.
>>
>>     For the same reason that durability is not essential, it is
>>     acceptable if an occasional fail-over between database servers
>>     loses a few transactions, though of course it's best to minimize
>>     the probability and the amount of data lost.
>>
>>   - Open source.
>>     Some "open source" databases only provide high availability and
>>     transactions as proprietary extensions; that's undesirable.
>>
>> Desirable features:
>>
>>   - C client, since OVN is written in C; otherwise, we'll likely
>>     have to write one. (We've had suggestions that OVN should be
>>     written in another language, such as Java, but we have not
>>     decided to change the language yet.)
>>
>>   - Python client, since OVS includes tools written in Python.
>>
>>   - Table structured. We could layer tables on top of a key-value
>>     store if necessary.
>>
>>   - Schema support, with referential integrity constraints. We find
>>     this helpful for increasing our confidence in the system. This
>>     is something that we could leave out or layer on top.
>>
>>   - Network protocol. Some databases are just designed for local
>>     access. If such a database were otherwise just right, we could
>>     wrap it for distributed use. The analysis below mostly ignores
>>     databases that are local-only or in which remote access appears
>>     to be an afterthought.
>>
>>
>> Options
>> =======
>>
>> Each entry has the columns listed below. In general, all-caps answers
>> are problematic for the OVN use case.
>>
>>   - Database: The database being evaluated.
>>
>>   - txn: "yes" if the database supports transactions across
>>     arbitrary data, "NO" if its transactions are limited to a single
>>     data item, such as a single key-value pair, or perhaps even more
>>     limited.
>>
>>   - ACID: The transactional properties that the database supports,
>>     within the transactions that the database supports. (Thus, a
>>     database whose transactions cover only a single data item can be
>>     listed as ACID, but this is only for those limited
>>     transactions.)
>>
>>   - consist: The distributed consistency model that the database
>>     supports, one of "strong" for strong or linearizable
>>     consistency, "tunable" for consistency that can be tuned to be
>>     strong or linearizable or weaker, or "EVNTUAL" for eventual
>>     consistency.
>>
>>   - trk: "yes" if the database can automatically report data changes
>>     to clients, "NO" if the database requires clients to poll for
>>     changes.
>>
>>   - HA: "yes" if the database can be configured for high
>>     availability, so that loss of a single node does not stop
>>     database activity, "NO" otherwise.
>>
>>   - OS: "yes" if the database is open source or free (libre)
>>     software, "NO" if it is proprietary. When a database has open
>>     source and proprietary editions, this is "yes" and only the
>>     features in the open source edition are credited in other
>>     columns.
>>
>>   - C: "yes" if the database has a C (not C++) client library, "NO"
>>     otherwise.
>>
>>   - Python: "yes" if the database has a Python client library, "NO"
>>     otherwise.
>>
>>   - format: The database's data model. "sql", "db", "table",
>>     "multi" all indicate that OVN could directly use the data model,
>>     "KV" or "JSON" that OVN's data model would have to be overlaid
>>     on it.
>>
>> Database       txn ACID consist trk  HA  OS  C   Py  format
>> -------------- --- ---- ------- ---- --- --- --- --- ------
>> ActorDB        yes ACID strong  NO   yes yes yes yes sql
>> Aerospike      yes ACID strong  NO   yes yes yes yes db/KV
>> Cassandra      NO  -C-D tunable NO   yes yes NO  yes table
>> Cockroach DB   yes ACID strong  NO   yes yes ?   ?   sql
>> Couchbase      NO  ???? ????    NO   yes NO? yes yes JSON
>> CrateIO        NO  ???? EVNTUAL NO   yes yes NO  yes sql
>> etcd           NO  ACID strong  yes? yes yes yes yes KV
>> Gigaspaces XAP yes ACID strong  yes  yes NO  NO  NO  multi
>> HBase          NO  ACID strong  NO   yes yes NO  yes table
>> Hyperdex       yes ACID strong  NO   yes NO  yes yes KV
>> Hypertable     NO  ???? ????    NO   yes yes NO  yes table
>> MongoDB        NO  ACID strong  ??   yes yes yes yes JSON
>> RAMCloud       yes ???? strong  NO   yes yes NO  yes KV
>> Redis          yes -C?D ????    NO   yes yes yes yes KV
>> Riak           NO  ---D EVNTUAL NO   yes yes yes yes KV
>> Scalaris       yes ACI- strong  NO   yes yes NO  yes KV
>> ScyllaDB       NO  -C-D tunable NO   yes yes NO  yes table
>> Voldemort      NO  ???? EVNTUAL NO   yes yes NO  yes KV
>> Zookeeper      yes AC-D strong  yes  yes yes yes yes KV
>>
>> OVSDB          yes ACID strong  yes  NO  yes yes yes table
>>
>>
>> Analysis
>> ========
>>
>> The most troublesome part of the OVN use case is the idiosyncratic use
>> of the database to maintain state, immediately distributing changes to
>> all of the clients. As the "trk" column above shows, most databases
>> don't support this mode of operation. Possibly this means that OVN is
>> misusing the concept of a database and should be redesigned not to use
>> a database; if so, that's a bigger discussion.
>>
>> Assuming that we wish to retain this requirement, then only the
>> following databases appear to support the feature to an acceptable
>> extent:
>>
>>   - etcd. etcd appears to allow clients to receive a notification
>>     when keys change. A client might be able to bootstrap
>>     monitoring of entire tables on top of this feature. Perhaps
>>     this would require registering for notification separately on
>>     all of the keys that would be used to simulate a table on top of
>>     the etcd key-value store; if so, that would probably be
>>     unreasonable. Assuming that is not a problem or can be
>>     overcome, it would also be necessary to make sure that the new
>>     values of all of the modified keys could be obtained in a way
>>     such that the client's view reflects a consistent snapshot of
>>     the database contents.
>>
>>   - Gigaspaces XAP. Not open source.
>>
>>   - Zookeeper. The issues here are similar to those for etcd.
>>     Also, Zookeeper transactions don't seem to be isolated.
>>
>>   - OVSDB. If we choose to use OVSDB, we'll have to add
>>     high-availability support. Also, the table doesn't mention
>>     scaling, since it's hard to compare objectively, but the OVSDB
>>     server currently doesn't scale well to the 1000 clients required
>>     for the southbound database, although Andy has started working
>>     on that.
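
The consistent-snapshot concern raised for etcd and Zookeeper, together
with the log-shipping resync suggested earlier in the thread, can be
illustrated with a small sketch. Everything in it is an assumption made
for illustration: ToyServer, Replica, changes_since, and replay_from are
hypothetical names, not the API of etcd, Zookeeper, or OVSDB, and the
"Port/p1" key layout is just one possible way to simulate a table on a
key-value store. The server logs each committed transaction as a single
entry with a revision number, and a client applies only whole entries,
so its replica always matches some committed state. On reconnect the
client asks for the entries after its last revision rather than
downloading the full table.

```python
from typing import Dict, List, Optional, Tuple

# One committed transaction: its revision and every key it wrote.
# A value of None means the key was deleted.
Txn = Tuple[int, Dict[str, Optional[str]]]


class ToyServer:
    """Hypothetical in-memory store, for illustration only."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.log: List[Txn] = []  # committed transactions, oldest first
        self.revision = 0

    def commit(self, writes: Dict[str, Optional[str]]) -> int:
        """Apply all writes together and log them as one entry."""
        self.revision += 1
        for key, value in writes.items():
            if value is None:
                self.data.pop(key, None)
            else:
                self.data[key] = value
        self.log.append((self.revision, dict(writes)))
        return self.revision

    def changes_since(self, revision: int, prefix: str) -> List[Txn]:
        """Return logged transactions after `revision`, limited to `prefix`."""
        return [
            (rev, {k: v for k, v in writes.items() if k.startswith(prefix)})
            for rev, writes in self.log
            if rev > revision
        ]

    def replay_from(self, other: "ToyServer") -> None:
        """Log shipping: apply the entries of `other` that we lack."""
        for _rev, writes in other.log[len(self.log):]:
            self.commit(writes)


class Replica:
    """Client-side copy of one simulated table (all keys under a prefix)."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix
        self.rows: Dict[str, str] = {}
        self.revision = 0

    def sync(self, server: ToyServer) -> int:
        """Apply whole transactions in order; return how many were applied."""
        batches = server.changes_since(self.revision, self.prefix)
        for rev, writes in batches:
            for key, value in writes.items():
                if value is None:
                    self.rows.pop(key, None)
                else:
                    self.rows[key] = value
            self.revision = rev
        return len(batches)


primary = ToyServer()
standby = ToyServer()
client = Replica("Port/")

primary.commit({"Port/p1": "up", "Port/p2": "down"})
client.sync(primary)

# The client is disconnected while two more transactions commit.
primary.commit({"Port/p2": "up", "Chassis/c1": "hv1"})
primary.commit({"Port/p1": None})
standby.replay_from(primary)

# Reconnecting to the standby transfers only the two missed transactions.
applied = client.sync(standby)
```

In this sketch, watching a whole table reduces to one prefix filter over
the log instead of one registration per key. Whether a given key-value
store offers an equivalent would still need to be checked.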
>>
>>
>> Recommendation
>> ==============
>>
>> I'm intentionally not offering a recommendation, because I want to start
>> a discussion.
>> _______________________________________________
>> dev mailing list
>> dev@openvswitch.org
>> http://openvswitch.org/mailman/listinfo/dev
>>