It was confusing to have a database called "ovn", since that is also the name of the project. Since we already have "ovn-nb", rename "ovn" to "ovn-sb".
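For code that uses the generated IDL, the visible effect is that the prefix changes from "ovnrec_" to "sbrec_". The sketch below is illustrative only and is not part of the patch. It applies the two assignments from ovn/ovn-sb-idl.ann to an abbreviated stand-in for the schema (table bodies omitted) and derives the C struct names. The helper c_struct_names() is hypothetical, and it assumes the generator names each struct as the prefix plus the lowercased table name, which matches identifiers such as "sbrec_bindings" used in ovn-nbd.c.

```python
import json

# Abbreviated stand-in for ovn/ovn-sb.ovsschema: table bodies are omitted,
# only the database name and the table names from the patch are kept.
SCHEMA_TEXT = """
{"name": "OVN_Southbound",
 "tables": {"Chassis": {}, "Encap": {}, "Gateway": {},
            "Pipeline": {}, "Bindings": {}},
 "version": "1.0.0"}
"""

s = json.loads(SCHEMA_TEXT)

# The two assignments made by ovn/ovn-sb-idl.ann in this patch.
s["idlPrefix"] = "sbrec_"
s["idlHeader"] = "\"ovn/ovn-sb-idl.h\""


def c_struct_names(schema):
    """Hypothetical helper: map each table to its expected C struct name,
    assuming the name is idlPrefix + lowercased table name."""
    return {table: schema["idlPrefix"] + table.lower()
            for table in schema["tables"]}


names = c_struct_names(s)
for table in sorted(names):
    print(table, "->", "struct " + names[table])
```

Under that assumption, callers that previously used "struct ovnrec_bindings" now use "struct sbrec_bindings", and likewise for the other tables.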
Signed-off-by: Justin Pettit <jpet...@nicira.com> --- ovn/.gitignore | 12 +- ovn/TODO | 22 +- ovn/automake.mk | 66 +++--- ovn/ovn-architecture.7.xml | 90 ++++---- ovn/ovn-controller.8.in | 4 +- ovn/ovn-idl.ann | 9 - ovn/ovn-nb.xml | 43 ++-- ovn/ovn-nbd.c | 83 ++++---- ovn/ovn-sb-idl.ann | 9 + ovn/ovn-sb.ovsschema | 60 +++++ ovn/ovn-sb.xml | 547 ++++++++++++++++++++++++++++++++++++++++++++ ovn/ovn.ovsschema | 60 ----- ovn/ovn.xml | 546 ------------------------------------------- 13 files changed, 781 insertions(+), 770 deletions(-) delete mode 100644 ovn/ovn-idl.ann create mode 100644 ovn/ovn-sb-idl.ann create mode 100644 ovn/ovn-sb.ovsschema create mode 100644 ovn/ovn-sb.xml delete mode 100644 ovn/ovn.ovsschema delete mode 100644 ovn/ovn.xml diff --git a/ovn/.gitignore b/ovn/.gitignore index e354a82..9ddcbb8 100644 --- a/ovn/.gitignore +++ b/ovn/.gitignore @@ -1,17 +1,17 @@ -/ovn.5 -/ovn.gv -/ovn.pic /ovn-architecture.7 /ovn-controller.8 -/ovn-idl.c -/ovn-idl.h -/ovn-idl.ovsidl /ovn-nb.5 /ovn-nb.gv /ovn-nb.pic /ovn-nb-idl.c /ovn-nb-idl.h /ovn-nb-idl.ovsidl +/ovn-sb.5 +/ovn-sb.gv +/ovn-sb.pic +/ovn-sb-idl.c +/ovn-sb-idl.h +/ovn-sb-idl.ovsidl /ovn-nbctl /ovn-nbctl.8 /ovn-nbd diff --git a/ovn/TODO b/ovn/TODO index 43a867c..7bd89df 100644 --- a/ovn/TODO +++ b/ovn/TODO @@ -159,7 +159,7 @@ We can probably use the same default as ovs-vsctl. -*** Location of OVN database. +*** Location of OVN Southbound database. Probably no useful default. @@ -183,16 +183,16 @@ Initially, the simplest way to do this is probably to write straight C code to do a full translation of the entire OVN_Northbound database into the format for the Pipeline table in - the OVN database. As scale increases, this will probably be too - inefficient since a small change in OVN_Northbound requires a full - recomputation. At that point, we probably want to adopt a more - systematic approach, such as something akin to the "nlog" system - used in NVP (see Koponen et al. 
"Network Virtualization in - Multi-tenant Datacenters", NSDI 2014). + the OVN Southbound database. As scale increases, this will probably + be too inefficient since a small change in OVN_Northbound requires a + full recomputation. At that point, we probably want to adopt a more + systematic approach, such as something akin to the "nlog" system used + in NVP (see Koponen et al. "Network Virtualization in Multi-tenant + Datacenters", NSDI 2014). ** Push logical datapath flows to Pipeline table. -** Monitor OVN database Bindings table. +** Monitor OVN Southbound database Bindings table. Sync rows in the OVN Bindings table to the "up" column in the OVN_Northbound database. @@ -208,9 +208,9 @@ ** Scaling number of connections. In typical use today a given ovsdb-server has only a single-digit - number of simultaneous connections. The OVN database will have a - connection from every hypervisor. This use case needs testing and - probably coding work. Here are some possible improvements. + number of simultaneous connections. The OVN Southbound database will + have a connection from every hypervisor. This use case needs testing + and probably coding work. Here are some possible improvements. *** Reducing amount of data sent to clients. diff --git a/ovn/automake.mk b/ovn/automake.mk index 426e547..180352e 100644 --- a/ovn/automake.mk +++ b/ovn/automake.mk @@ -1,35 +1,35 @@ -# OVN schema and IDL -EXTRA_DIST += ovn/ovn.ovsschema -pkgdata_DATA += ovn/ovn.ovsschema +# OVN southbound schema and IDL +EXTRA_DIST += ovn/ovn-sb.ovsschema +pkgdata_DATA += ovn/ovn-sb.ovsschema -# OVN E-R diagram +# OVN southbound E-R diagram # # If "python" or "dot" is not available, then we do not add graphical diagram # to the documentation. 
if HAVE_PYTHON if HAVE_DOT -ovn/ovn.gv: ovsdb/ovsdb-dot.in ovn/ovn.ovsschema - $(AM_V_GEN)$(OVSDB_DOT) --no-arrows $(srcdir)/ovn/ovn.ovsschema > $@ -ovn/ovn.pic: ovn/ovn.gv ovsdb/dot2pic - $(AM_V_GEN)(dot -T plain < ovn/ovn.gv | $(PERL) $(srcdir)/ovsdb/dot2pic -f 3) > $@.tmp && \ +ovn/ovn-sb.gv: ovsdb/ovsdb-dot.in ovn/ovn-sb.ovsschema + $(AM_V_GEN)$(OVSDB_DOT) --no-arrows $(srcdir)/ovn/ovn-sb.ovsschema > $@ +ovn/ovn-sb.pic: ovn/ovn-sb.gv ovsdb/dot2pic + $(AM_V_GEN)(dot -T plain < ovn/ovn-sb.gv | $(PERL) $(srcdir)/ovsdb/dot2pic -f 3) > $@.tmp && \ mv $@.tmp $@ -OVN_PIC = ovn/ovn.pic -OVN_DOT_DIAGRAM_ARG = --er-diagram=$(OVN_PIC) -DISTCLEANFILES += ovn/ovn.gv ovn/ovn.pic +OVN_SB_PIC = ovn/ovn-sb.pic +OVN_SB_DOT_DIAGRAM_ARG = --er-diagram=$(OVN_SB_PIC) +DISTCLEANFILES += ovn/ovn-sb.gv ovn/ovn-sb.pic endif endif -# OVN schema documentation -EXTRA_DIST += ovn/ovn.xml -DISTCLEANFILES += ovn/ovn.5 -man_MANS += ovn/ovn.5 -ovn/ovn.5: \ - ovsdb/ovsdb-doc ovn/ovn.xml ovn/ovn.ovsschema $(OVN_PIC) +# OVN southbound schema documentation +EXTRA_DIST += ovn/ovn-sb.xml +DISTCLEANFILES += ovn/ovn-sb.5 +man_MANS += ovn/ovn-sb.5 +ovn/ovn-sb.5: \ + ovsdb/ovsdb-doc ovn/ovn-sb.xml ovn/ovn-sb.ovsschema $(OVN_SB_PIC) $(AM_V_GEN)$(OVSDB_DOC) \ - $(OVN_DOT_DIAGRAM_ARG) \ + $(OVN_SB_DOT_DIAGRAM_ARG) \ --version=$(VERSION) \ - $(srcdir)/ovn/ovn.ovsschema \ - $(srcdir)/ovn/ovn.xml > $@.tmp && \ + $(srcdir)/ovn/ovn-sb.ovsschema \ + $(srcdir)/ovn/ovn-sb.xml > $@.tmp && \ mv $@.tmp $@ # OVN northbound schema and IDL @@ -78,19 +78,19 @@ EXTRA_DIST += \ ovn/TODO \ ovn/CONTAINERS.OpenStack.md -# ovn IDL +# ovn-sb IDL OVSIDL_BUILT += \ - $(srcdir)/ovn/ovn-idl.c \ - $(srcdir)/ovn/ovn-idl.h \ - $(srcdir)/ovn/ovn.ovsidl -EXTRA_DIST += $(srcdir)/ovn/ovn-idl.ann -OVN_IDL_FILES = \ - $(srcdir)/ovn/ovn.ovsschema \ - $(srcdir)/ovn/ovn-idl.ann -$(srcdir)/ovn/ovn-idl.ovsidl: $(OVN_IDL_FILES) - $(AM_V_GEN)$(OVSDB_IDLC) annotate $(OVN_IDL_FILES) > $@.tmp && \ + $(srcdir)/ovn/ovn-sb-idl.c \ + 
$(srcdir)/ovn/ovn-sb-idl.h \ + $(srcdir)/ovn/ovn-sb.ovsidl +EXTRA_DIST += $(srcdir)/ovn/ovn-sb-idl.ann +OVN_SB_IDL_FILES = \ + $(srcdir)/ovn/ovn-sb.ovsschema \ + $(srcdir)/ovn/ovn-sb-idl.ann +$(srcdir)/ovn/ovn-sb-idl.ovsidl: $(OVN_SB_IDL_FILES) + $(AM_V_GEN)$(OVSDB_IDLC) annotate $(OVN_SB_IDL_FILES) > $@.tmp && \ mv $@.tmp $@ -CLEANFILES += ovn/ovn-idl.c ovn/ovn-idl.h +CLEANFILES += ovn/ovn-sb-idl.c ovn/ovn-sb-idl.h # ovn-nb IDL OVSIDL_BUILT += \ @@ -113,8 +113,8 @@ ovn_libovn_la_LDFLAGS = \ -Wl,--version-script=$(top_builddir)/ovn/libovn.sym \ $(AM_LDFLAGS) ovn_libovn_la_SOURCES = \ - ovn/ovn-idl.c \ - ovn/ovn-idl.h \ + ovn/ovn-sb-idl.c \ + ovn/ovn-sb-idl.h \ ovn/ovn-nb-idl.c \ ovn/ovn-nb-idl.h diff --git a/ovn/ovn-architecture.7.xml b/ovn/ovn-architecture.7.xml index 035527f..57e3042 100644 --- a/ovn/ovn-architecture.7.xml +++ b/ovn/ovn-architecture.7.xml @@ -109,21 +109,21 @@ <li> <code>ovn-nbd</code>(8) connects to the OVN Northbound Database above it - and the OVN Database below it. It translates the logical network - configuration in terms of conventional network concepts, taken from the - OVN Northbound Database, into logical datapath flows in the OVN Database - below it. + and the OVN Southbound Database below it. It translates the + logical network configuration in terms of conventional network + concepts, taken from the OVN Northbound Database, into logical + datapath flows in the OVN Southbound Database below it. </li> <li> <p> - The <dfn>OVN Database</dfn> is the center of the system. Its clients - are <code>ovn-nbd</code>(8) above it and <code>ovn-controller</code>(8) - on every transport node below it. + The <dfn>OVN Southbound Database</dfn> is the center of the system. + Its clients are <code>ovn-nbd</code>(8) above it and + <code>ovn-controller</code>(8) on every transport node below it. 
</p> <p> - The OVN Database contains three kinds of data: <dfn>Physical + The OVN Southbound Database contains three kinds of data: <dfn>Physical Network</dfn> (PN) tables that specify how to reach hypervisor and other nodes, <dfn>Logical Network</dfn> (LN) tables that describe the logical network in terms of ``logical datapath flows,'' and @@ -134,9 +134,10 @@ </p> <p> - OVN Database performance must scale with the number of transport nodes. - This will likely require some work on <code>ovsdb-server</code>(1) as - we encounter bottlenecks. Clustering for availability may be needed. + OVN Southbound Database performance must scale with the number of + transport nodes. This will likely require some work on + <code>ovsdb-server</code>(1) as we encounter bottlenecks. + Clustering for availability may be needed. </p> </li> </ul> @@ -148,13 +149,14 @@ <ul> <li> <code>ovn-controller</code>(8) is OVN's agent on each hypervisor and - software gateway. Northbound, it connects to the OVN Database to learn - about OVN configuration and status and to populate the PN table and the - <code>Chassis</code> column in <code>Bindings</code> table with the - hypervisor's status. Southbound, it connects to - <code>ovs-vswitchd</code>(8) as an OpenFlow controller, for control over - network traffic, and to the local <code>ovsdb-server</code>(1) to allow - it to monitor and control Open vSwitch configuration. + software gateway. Northbound, it connects to the OVN Southbound + Database to learn about OVN configuration and status and to + populate the PN table and the <code>Chassis</code> column in + <code>Bindings</code> table with the hypervisor's status. + Southbound, it connects to <code>ovs-vswitchd</code>(8) as an + OpenFlow controller, for control over network traffic, and to the + local <code>ovsdb-server</code>(1) to allow it to monitor and + control Open vSwitch configuration. 
</li> <li> @@ -180,14 +182,14 @@ +-----------|-----------+ | | - +------+ - |OVN DB| - +------+ + +-------------------+ + | OVN Southbound DB | + +-------------------+ | | +------------------+------------------+ | | | - HV 1 | | HV n | + HV 1 | | HV n | +---------------|---------------+ . +---------------|---------------+ | | | . | | | | ovn-controller | . | ovn-controller | @@ -267,7 +269,7 @@ <p> The steps in this example refer often to details of the OVN and OVN - Northbound database schemas. Please see <code>ovn</code>(5) and + Northbound database schemas. Please see <code>ovn-sb</code>(5) and <code>ovn-nb</code>(5), respectively, for the full story on these databases. </p> @@ -290,14 +292,15 @@ </li> <li> - <code>ovn-nbd</code> receives the OVN Northbound database update. In - turn, it makes the corresponding updates to the OVN database, by adding - rows to the OVN database <code>Pipeline</code> table to reflect the new - port, e.g. add a flow to recognize that packets destined to the new - port's MAC address should be delivered to it, and update the flow that - delivers broadcast and multicast packets to include the new port. It - also creates a record in the <code>Bindings</code> table and populates - all its columns except the column that identifies the + <code>ovn-nbd</code> receives the OVN Northbound database update. + In turn, it makes the corresponding updates to the OVN Southbound + database, by adding rows to the OVN Southbound database + <code>Pipeline</code> table to reflect the new port, e.g. add a + flow to recognize that packets destined to the new port's MAC + address should be delivered to it, and update the flow that + delivers broadcast and multicast packets to include the new port. + It also creates a record in the <code>Bindings</code> table and + populates all its columns except the column that identifies the <code>chassis</code>. 
</li> @@ -386,9 +389,10 @@ <li> <code>ovn-nbd</code> receives the OVN Northbound update and in turn - updates the OVN database accordingly, by removing or updating the - rows from the OVN database <code>Pipeline</code> table and - <code>Bindings</code> table that were related to the now-destroyed VIF. + updates the OVN Southbound database accordingly, by removing or + updating the rows from the OVN Southbound database + <code>Pipeline</code> table and <code>Bindings</code> table that + were related to the now-destroyed VIF. </li> <li> @@ -482,10 +486,11 @@ <li> <code>ovn-nbd</code> receives the OVN Northbound database update. In - turn, it makes the corresponding updates to the OVN database, by adding - rows to the OVN database's <code>Pipeline</code> table to reflect the new - port and also by creating a new row in the <code>Bindings</code> table - and populating all its columns except the column that identifies the + turn, it makes the corresponding updates to the OVN Southbound + database, by adding rows to the OVN Southbound database's + <code>Pipeline</code> table to reflect the new port and also by + creating a new row in the <code>Bindings</code> table and + populating all its columns except the column that identifies the <code>chassis</code>. </li> @@ -521,10 +526,11 @@ <li> <code>ovn-nbd</code> receives the OVN Northbound update and in turn - updates the OVN database accordingly, by removing or updating the - rows from the OVN database <code>Pipeline</code> table that were related - to the now-destroyed CIF. It also deletes the row in the - <code>Bindings</code> table for that CIF. + updates the OVN Southbound database accordingly, by removing or + updating the rows from the OVN Southbound database + <code>Pipeline</code> table that were related to the now-destroyed + CIF. It also deletes the row in the <code>Bindings</code> table + for that CIF. 
</li> <li> diff --git a/ovn/ovn-controller.8.in b/ovn/ovn-controller.8.in index 59fcb59..22c18e8 100644 --- a/ovn/ovn-controller.8.in +++ b/ovn/ovn-controller.8.in @@ -15,8 +15,8 @@ ovn\-controller \- OVN local controller . .SH DESCRIPTION \fBovn\-controller\fR is the local controller daemon for OVN, the Open -Virtual Network. It connects northbound to the OVN database (see -\fBovn\fR(5)) over the OVSDB protocol, and southbound to the Open +Virtual Network. It connects up to the OVN Southbound database (see +\fBovn\-sb\fR(5)) over the OVSDB protocol, and down to the Open vSwitch database (see \fBovs-vswitchd.conf.db\fR(5)) over the OVSDB protocol and to \fBovs\-vswitchd\fR(8) via OpenFlow. Each hypervisor and software gateway in an OVN deployment runs its own independent diff --git a/ovn/ovn-idl.ann b/ovn/ovn-idl.ann deleted file mode 100644 index 800b9eb..0000000 --- a/ovn/ovn-idl.ann +++ /dev/null @@ -1,9 +0,0 @@ -# -*- python -*- - -# This code, when invoked by "ovsdb-idlc annotate" (by the build -# process), annotates vswitch.ovsschema with additional data that give -# the ovsdb-idl engine information about the types involved, so that -# it can generate more programmer-friendly data structures. - -s["idlPrefix"] = "ovnrec_" -s["idlHeader"] = "\"ovn/ovn-idl.h\"" diff --git a/ovn/ovn-nb.xml b/ovn/ovn-nb.xml index 84b9ed5..14ee117 100644 --- a/ovn/ovn-nb.xml +++ b/ovn/ovn-nb.xml @@ -5,7 +5,7 @@ (CMS), such as OpenStack, running above it. The CMS produces almost all of the contents of the database. The <code>ovn-nbd</code> program monitors the database contents, transforms it, and stores it into the <ref - db="OVN"/> database. + db="OVN_Southbound"/> database. </p> <p> @@ -117,13 +117,13 @@ <column name="up"> This column is populated by <code>ovn-nbd</code>, rather than by the CMS - plugin as is most of this database. 
When a logical port is bound to a - physical location in the OVN database <ref db="OVN" table="Bindings"/> - table, <code>ovn-nbd</code> sets this column to <code>true</code>; - otherwise, or if the port becomes unbound later, it sets it to - <code>false</code>. This allows the CMS to wait for a VM's - (or container's) networking to become active before it allows the - VM (or container) to start. + plugin as is most of this database. When a logical port is bound + to a physical location in the OVN Southbound database <ref + db="OVN_Southbound" table="Bindings"/> table, <code>ovn-nbd</code> + sets this column to <code>true</code>; otherwise, or if the port + becomes unbound later, it sets it to <code>false</code>. This + allows the CMS to wait for a VM's (or container's) networking to + become active before it allows the VM (or container) to start. </column> <column name="macs"> @@ -144,11 +144,12 @@ </p> <p> - Exact syntax is TBD. One could simply use comma- or space-separated L2 - and L3 addresses in each set member, or replace this by a subset of the - general-purpose expression language used for the <ref column="match" - table="Pipeline" db="OVN"/> column in the OVN database's <ref - table="Pipeline" db="OVN"/> table. + Exact syntax is TBD. One could simply use comma- or + space-separated L2 and L3 addresses in each set member, or + replace this by a subset of the general-purpose expression + language used for the <ref column="match" table="Pipeline" + db="OVN_Southbound"/> column in the OVN Southbound database's + <ref table="Pipeline" db="OVN_Southbound"/> table. </p> </column> @@ -184,13 +185,15 @@ </column> <column name="match"> - The packets that the ACL should match, in the same expression language - used for the <ref column="match" table="Pipeline" db="OVN"/> column in - the OVN database's <ref table="Pipeline" db="OVN"/> table. 
Match - <code>inport</code> and <code>outport</code> against names of logical - ports within <ref column="lswitch"/> to implement ingress and egress ACLs, - respectively. In logical switches connected to logical routers, the - special port name <code>ROUTER</code> refers to the logical router port. + The packets that the ACL should match, in the same expression + language used for the <ref column="match" table="Pipeline" + db="OVN_Southbound"/> column in the OVN Southbound database's <ref + table="Pipeline" db="OVN_Southbound"/> table. Match + <code>inport</code> and <code>outport</code> against names of + logical ports within <ref column="lswitch"/> to implement ingress + and egress ACLs, respectively. In logical switches connected to + logical routers, the special port name <code>ROUTER</code> refers + to the logical router port. </column> <column name="action"> diff --git a/ovn/ovn-nbd.c b/ovn/ovn-nbd.c index 637d8cf..a8cbc9c 100644 --- a/ovn/ovn-nbd.c +++ b/ovn/ovn-nbd.c @@ -24,7 +24,7 @@ #include "fatal-signal.h" #include "hash.h" #include "hmap.h" -#include "ovn/ovn-idl.h" +#include "ovn/ovn-sb-idl.h" #include "ovn/ovn-nb-idl.h" #include "poll-loop.h" #include "stream.h" @@ -37,13 +37,13 @@ VLOG_DEFINE_THIS_MODULE(ovn_nbd); struct nbd_context { struct ovsdb_idl *ovnnb_idl; - struct ovsdb_idl *ovn_idl; + struct ovsdb_idl *ovnsb_idl; struct ovsdb_idl_txn *ovnnb_txn; struct ovsdb_idl_txn *ovn_txn; }; static const char *ovnnb_db; -static const char *ovn_db; +static const char *ovnsb_db; static const char *default_db(void); @@ -57,7 +57,7 @@ usage: %s [OPTIONS]\n\ Options:\n\ --ovnnb-db=DATABASE connect to ovn-nb database at DATABASE\n\ (default: %s)\n\ - --ovn-db=DATABASE connect to ovn database at DATABASE\n\ + --ovnsb-db=DATABASE connect to ovn-sb database at DATABASE\n\ (default: %s)\n\ -h, --help display this help message\n\ -o, --options list available options\n\ @@ -116,19 +116,20 @@ macs_equal(char **binding_macs_, size_t b_n_macs, /* * When a change has 
occurred in the OVN_Northbound database, we go through and - * make sure that the contents of the Bindings table in the OVN database are up - * to date with the logical ports defined in the OVN_Northbound database. + * make sure that the contents of the Bindings table in the OVN_Southbound + * database are up to date with the logical ports defined in the + * OVN_Northbound database. */ static void set_bindings(struct nbd_context *ctx) { struct hmap bindings_hmap; - const struct ovnrec_bindings *binding; + const struct sbrec_bindings *binding; const struct nbrec_logical_port *lport; struct binding_hash_node { struct hmap_node node; - const struct ovnrec_bindings *binding; + const struct sbrec_bindings *binding; } *hash_node, *hash_node_next; /* @@ -143,7 +144,7 @@ set_bindings(struct nbd_context *ctx) */ hmap_init(&bindings_hmap); - OVNREC_BINDINGS_FOR_EACH(binding, ctx->ovn_idl) { + SBREC_BINDINGS_FOR_EACH(binding, ctx->ovnsb_idl) { struct binding_hash_node *hash_node = xzalloc(sizeof *hash_node); hash_node->binding = binding; @@ -171,22 +172,22 @@ set_bindings(struct nbd_context *ctx) if (!macs_equal(binding->mac, binding->n_mac, lport->macs, lport->n_macs)) { - ovnrec_bindings_set_mac(binding, + sbrec_bindings_set_mac(binding, (const char **) lport->macs, lport->n_macs); } } else { /* There is no binding for this logical port, so create one. 
*/ - binding = ovnrec_bindings_insert(ctx->ovn_txn); - ovnrec_bindings_set_logical_port(binding, lport->name); - ovnrec_bindings_set_mac(binding, + binding = sbrec_bindings_insert(ctx->ovn_txn); + sbrec_bindings_set_logical_port(binding, lport->name); + sbrec_bindings_set_mac(binding, (const char **) lport->macs, lport->n_macs); } } HMAP_FOR_EACH_SAFE(hash_node, hash_node_next, node, &bindings_hmap) { hmap_remove(&bindings_hmap, &hash_node->node); - ovnrec_bindings_delete(hash_node->binding); + sbrec_bindings_delete(hash_node->binding); free(hash_node); } hmap_destroy(&bindings_hmap); @@ -206,13 +207,13 @@ ovnnb_db_changed(struct nbd_context *ctx) * set the corresponding logical port as 'up' in the northbound DB. */ static void -ovn_db_changed(struct nbd_context *ctx) +ovnsb_db_changed(struct nbd_context *ctx) { - const struct ovnrec_bindings *bindings; + const struct sbrec_bindings *bindings; VLOG_DBG("Recalculating port up states for ovn-nb db."); - OVNREC_BINDINGS_FOR_EACH(bindings, ctx->ovn_idl) { + SBREC_BINDINGS_FOR_EACH(bindings, ctx->ovnsb_idl) { const struct nbrec_logical_port *lport; struct uuid lport_uuid; @@ -257,7 +258,7 @@ parse_options(int argc OVS_UNUSED, char *argv[] OVS_UNUSED) VLOG_OPTION_ENUMS, }; static const struct option long_options[] = { - {"ovn-db", required_argument, NULL, 'd'}, + {"ovnsb-db", required_argument, NULL, 'd'}, {"ovnnb-db", required_argument, NULL, 'D'}, {"help", no_argument, NULL, 'h'}, {"options", no_argument, NULL, 'o'}, @@ -283,7 +284,7 @@ parse_options(int argc OVS_UNUSED, char *argv[] OVS_UNUSED) STREAM_SSL_OPTION_HANDLERS; case 'd': - ovn_db = optarg; + ovnsb_db = optarg; break; case 'D': @@ -307,8 +308,8 @@ parse_options(int argc OVS_UNUSED, char *argv[] OVS_UNUSED) } } - if (!ovn_db) { - ovn_db = default_db(); + if (!ovnsb_db) { + ovnsb_db = default_db(); } if (!ovnnb_db) { @@ -322,7 +323,7 @@ int main(int argc, char *argv[]) { extern struct vlog_module VLM_reconnect; - struct ovsdb_idl *ovnnb_idl, *ovn_idl; + struct 
ovsdb_idl *ovnnb_idl, *ovnsb_idl; unsigned int ovnnb_seqno, ovn_seqno; int res = EXIT_SUCCESS; struct nbd_context ctx = { @@ -340,7 +341,7 @@ main(int argc, char *argv[]) daemonize(); nbrec_init(); - ovnrec_init(); + sbrec_init(); /* We want to detect all changes to the ovn-nb db. */ ctx.ovnnb_idl = ovnnb_idl = ovsdb_idl_create(ovnnb_db, @@ -348,12 +349,12 @@ main(int argc, char *argv[]) /* There is only a small subset of changes to the ovn db that ovn-nbd has to * care about, so we'll enable monitoring those directly. */ - ctx.ovn_idl = ovn_idl = ovsdb_idl_create(ovn_db, - &ovnrec_idl_class, false, true); - ovsdb_idl_add_table(ovn_idl, &ovnrec_table_bindings); - ovsdb_idl_add_column(ovn_idl, &ovnrec_bindings_col_logical_port); - ovsdb_idl_add_column(ovn_idl, &ovnrec_bindings_col_chassis); - ovsdb_idl_add_column(ovn_idl, &ovnrec_bindings_col_mac); + ctx.ovnsb_idl = ovnsb_idl = ovsdb_idl_create(ovnsb_db, + &sbrec_idl_class, false, true); + ovsdb_idl_add_table(ovnsb_idl, &sbrec_table_bindings); + ovsdb_idl_add_column(ovnsb_idl, &sbrec_bindings_col_logical_port); + ovsdb_idl_add_column(ovnsb_idl, &sbrec_bindings_col_chassis); + ovsdb_idl_add_column(ovnsb_idl, &sbrec_bindings_col_mac); /* * The loop here just runs the IDL in a loop waiting for the seqno to @@ -368,10 +369,10 @@ main(int argc, char *argv[]) */ ovnnb_seqno = ovsdb_idl_get_seqno(ovnnb_idl); - ovn_seqno = ovsdb_idl_get_seqno(ovn_idl); + ovn_seqno = ovsdb_idl_get_seqno(ovnsb_idl); for (;;) { ovsdb_idl_run(ovnnb_idl); - ovsdb_idl_run(ovn_idl); + ovsdb_idl_run(ovnsb_idl); if (!ovsdb_idl_is_alive(ovnnb_idl)) { int retval = ovsdb_idl_get_last_error(ovnnb_idl); @@ -381,10 +382,10 @@ main(int argc, char *argv[]) break; } - if (!ovsdb_idl_is_alive(ovn_idl)) { - int retval = ovsdb_idl_get_last_error(ovn_idl); + if (!ovsdb_idl_is_alive(ovnsb_idl)) { + int retval = ovsdb_idl_get_last_error(ovnsb_idl); VLOG_ERR("%s: database connection failed (%s)", - ovn_db, ovs_retval_to_string(retval)); + ovnsb_db, 
ovs_retval_to_string(retval)); res = EXIT_FAILURE; break; } @@ -394,8 +395,8 @@ main(int argc, char *argv[]) ovnnb_changes_pending = true; } - if (ovn_seqno != ovsdb_idl_get_seqno(ovn_idl)) { - ovn_seqno = ovsdb_idl_get_seqno(ovn_idl); + if (ovn_seqno != ovsdb_idl_get_seqno(ovnsb_idl)) { + ovn_seqno = ovsdb_idl_get_seqno(ovnsb_idl); ovn_changes_pending = true; } @@ -413,7 +414,7 @@ main(int argc, char *argv[]) * The OVN-nb db contents have changed, so create a transaction for * updating the OVN DB. */ - ctx.ovn_txn = ovsdb_idl_txn_create(ctx.ovn_idl); + ctx.ovn_txn = ovsdb_idl_txn_create(ctx.ovnsb_idl); ovnnb_db_changed(&ctx); ovnnb_changes_pending = false; } @@ -424,7 +425,7 @@ main(int argc, char *argv[]) * updating the northbound DB. */ ctx.ovnnb_txn = ovsdb_idl_txn_create(ctx.ovnnb_idl); - ovn_db_changed(&ctx); + ovnsb_db_changed(&ctx); ovn_changes_pending = false; } @@ -471,9 +472,9 @@ main(int argc, char *argv[]) } if (ovnnb_seqno == ovsdb_idl_get_seqno(ovnnb_idl) && - ovn_seqno == ovsdb_idl_get_seqno(ovn_idl)) { + ovn_seqno == ovsdb_idl_get_seqno(ovnsb_idl)) { ovsdb_idl_wait(ovnnb_idl); - ovsdb_idl_wait(ovn_idl); + ovsdb_idl_wait(ovnsb_idl); if (ctx.ovnnb_txn) { ovsdb_idl_txn_wait(ctx.ovnnb_txn); } @@ -484,7 +485,7 @@ main(int argc, char *argv[]) } } - ovsdb_idl_destroy(ovn_idl); + ovsdb_idl_destroy(ovnsb_idl); ovsdb_idl_destroy(ovnnb_idl); exit(res); diff --git a/ovn/ovn-sb-idl.ann b/ovn/ovn-sb-idl.ann new file mode 100644 index 0000000..1efef5c --- /dev/null +++ b/ovn/ovn-sb-idl.ann @@ -0,0 +1,9 @@ +# -*- python -*- + +# This code, when invoked by "ovsdb-idlc annotate" (by the build +# process), annotates ovn-sb.ovsschema with additional data that give +# the ovsdb-idl engine information about the types involved, so that +# it can generate more programmer-friendly data structures.
+ +s["idlPrefix"] = "sbrec_" +s["idlHeader"] = "\"ovn/ovn-sb-idl.h\"" diff --git a/ovn/ovn-sb.ovsschema b/ovn/ovn-sb.ovsschema new file mode 100644 index 0000000..98662b8 --- /dev/null +++ b/ovn/ovn-sb.ovsschema @@ -0,0 +1,60 @@ +{ + "name": "OVN_Southbound", + "tables": { + "Chassis": { + "columns": { + "name": {"type": "string"}, + "encaps": {"type": {"key": {"type": "uuid", + "refTable": "Encap"}, + "min": 1, "max": "unlimited"}}, + "gateway_ports": {"type": {"key": "string", + "value": {"type": "uuid", + "refTable": "Gateway", + "refType": "strong"}, + "min": 0, + "max": "unlimited"}}}, + "isRoot": true, + "indexes": [["name"]]}, + "Encap": { + "columns": { + "type": {"type": "string"}, + "options": {"type": {"key": "string", + "value": "string", + "min": 0, + "max": "unlimited"}}, + "ip": {"type": "string"}}}, + "Gateway": { + "columns": {"attached_port": {"type": "string"}, + "vlan_map": {"type": {"key": {"type": "integer", + "minInteger": 0, + "maxInteger": 4095}, + "value": {"type": "string"}, + "min": 0, + "max": "unlimited"}}}}, + "Pipeline": { + "columns": { + "table_id": {"type": {"key": {"type": "integer", + "minInteger": 0, + "maxInteger": 127}}}, + "priority": {"type": {"key": {"type": "integer", + "minInteger": 0, + "maxInteger": 65535}}}, + "match": {"type": "string"}, + "actions": {"type": "string"}}, + "isRoot": true}, + "Bindings": { + "columns": { + "logical_port": {"type": "string"}, + "parent_port": {"type": {"key": "string", "min": 0, "max": 1}}, + "tag": { + "type": {"key": {"type": "integer", + "minInteger": 0, + "maxInteger": 4095}, + "min": 0, "max": 1}}, + "chassis": {"type": "string"}, + "mac": {"type": {"key": "string", + "min": 0, + "max": "unlimited"}}}, + "indexes": [["logical_port"]], + "isRoot": true}}, + "version": "1.0.0"} diff --git a/ovn/ovn-sb.xml b/ovn/ovn-sb.xml new file mode 100644 index 0000000..e26dbda --- /dev/null +++ b/ovn/ovn-sb.xml @@ -0,0 +1,547 @@ +<?xml version="1.0" encoding="utf-8"?> +<database name="ovn-sb" 
title="OVN Southbound Database"> + <p> + This database holds logical and physical configuration and state for the + Open Virtual Network (OVN) system to support virtual network abstraction. + For an introduction to OVN, please see <code>ovn-architecture</code>(7). + </p> + + <p> + The OVN Southbound database sits at the center of the OVN + architecture. It is the one component that speaks both southbound + directly to all the hypervisors and gateways, via + <code>ovn-controller</code>, and northbound to the Cloud Management + System, via <code>ovn-nbd</code>: + </p> + + <h2>Database Structure</h2> + + <p> + The OVN Southbound database contains three classes of data with + different properties, as described in the sections below. + </p> + + <h3>Physical Network (PN) data</h3> + + <p> + PN tables contain information about the chassis nodes in the system. This + contains all the information necessary to wire the overlay, such as IP + addresses, supported tunnel types, and security keys. + </p> + + <p> + The amount of PN data is small (O(n) in the number of chassis) and it + changes infrequently, so it can be replicated to every chassis. + </p> + + <p> + The <ref table="Chassis"/> and <ref table="Gateway"/> tables comprise the + PN tables. + </p> + + <h3>Logical Network (LN) data</h3> + + <p> + LN tables contain the topology of logical switches and routers, ACLs, + firewall rules, and everything needed to describe how packets traverse a + logical network, represented as logical datapath flows (see Logical + Datapath Flows, below). + </p> + + <p> + LN data may be large (O(n) in the number of logical ports, ACL rules, + etc.). Thus, to improve scaling, each chassis should receive only data + related to logical networks in which that chassis participates. Past + experience shows that in the presence of large logical networks, even + finer-grained partitioning of data, e.g. 
designing logical flows so that + only the chassis hosting a logical port needs related flows, pays off + scale-wise. (This is not necessary initially but it is worth bearing in + mind in the design.) + </p> + + <p> + The LN is a slave of the cloud management system running northbound of OVN. + That CMS determines the entire OVN logical configuration and therefore the + LN's content at any given time is a deterministic function of the CMS's + configuration, although that happens indirectly via the OVN Northbound DB + and <code>ovn-nbd</code>. + </p> + + <p> + LN data is likely to change more quickly than PN data. This is especially + true in a container environment where VMs are created and destroyed (and + therefore added to and deleted from logical switches) quickly. + </p> + + <p> + The <ref table="Pipeline"/> table is currently the only LN table. + </p> + + <h3>Bindings data</h3> + + <p> + The Bindings tables contain the current placement of logical components + (such as VMs and VIFs) onto chassis and the bindings between logical ports + and MACs. + </p> + + <p> + Bindings change frequently, at least every time a VM powers up or down + or migrates, and especially quickly in a container environment. The + amount of data per VM (or VIF) is small. + </p> + + <p> + Each chassis is authoritative about the VMs and VIFs that it hosts at any + given time and can efficiently flood that state to a central location, so + the consistency needs are minimal. + </p> + + <p> + The <ref table="Bindings"/> table is currently the only Bindings table. + </p> + + <table name="Chassis" title="Physical Network Hypervisor and Gateway Information"> + <p> + Each row in this table represents a hypervisor or gateway (a chassis) in + the physical network (PN). Each chassis, via + <code>ovn-controller</code>, adds and updates its own row, and keeps a + copy of the remaining rows to determine how to reach other hypervisors. 
+ </p> + + <p> + When a chassis shuts down gracefully, it should remove its own row. + (This is not critical because resources hosted on the chassis are equally + unreachable regardless of whether the row is present.) If a chassis + shuts down permanently without removing its row, some kind of manual or + automatic cleanup is eventually needed; we can devise a process for that + as necessary. + </p> + + <column name="name"> + A chassis name, taken from <ref key="system-id" table="Open_vSwitch" + column="external_ids" db="Open_vSwitch"/> in the Open_vSwitch + database's <ref table="Open_vSwitch" db="Open_vSwitch"/> table. OVN does + not prescribe a particular format for chassis names. + </column> + + <group title="Encapsulation Configuration"> + <p> + OVN uses encapsulation to transmit logical dataplane packets + between chassis. + </p> + + <column name="encaps"> + Points to supported encapsulation configurations to transmit + logical dataplane packets to this chassis. Each entry is a <ref + table="Encap"/> record that describes the configuration. + </column> + </group> + + <group title="Gateway Configuration"> + <p> + A <dfn>gateway</dfn> is a chassis that forwards traffic between a + logical network and a physical VLAN. Gateways are typically dedicated + nodes that do not host VMs. + </p> + + <column name="gateway_ports"> + Maps from the name of a gateway port, which is typically a physical + port (e.g. <code>eth1</code>) or an Open vSwitch patch port, to a <ref + table="Gateway"/> record that describes the details of the gatewaying + function. + </column> + </group> + </table> + + <table name="Encap" title="Encapsulation Types"> + <p> + The <ref column="encaps" table="Chassis"/> column in the <ref + table="Chassis"/> table refers to rows in this table to identify + how OVN may transmit logical dataplane packets to this chassis. 
+ Each chassis, via <code>ovn-controller</code>(8), adds and updates + its own rows and keeps a copy of the remaining rows to determine + how to reach other chassis. + </p> + + <column name="type"> + The encapsulation to use to transmit packets to this chassis. + Examples include <code>geneve</code>, <code>vxlan</code>, and + <code>stt</code>. + </column> + + <column name="options"> + Options for configuring the encapsulation, e.g. IPsec parameters when + IPsec support is introduced. No options are currently defined. + </column> + + <column name="ip"> + The IPv4 address of the encapsulation tunnel endpoint. + </column> + </table> + + <table name="Gateway" title="Physical Network Gateway Ports"> + <p> + The <ref column="gateway_ports" table="Chassis"/> column in the <ref + table="Chassis"/> table refers to rows in this table to connect a chassis + port to a gateway function. Each row in this table describes the logical + networks to which a gateway port is attached. Each chassis, via + <code>ovn-controller</code>(8), adds and updates its own rows, if any + (since most chassis are not gateways), and keeps a copy of the remaining + rows to determine how to reach other chassis. + </p> + + <column name="vlan_map"> + Maps from a VLAN ID to a logical port name. Thus, each named logical + port corresponds to one VLAN on the gateway port. + </column> + + <column name="attached_port"> + The name of the gateway port in the chassis's Open vSwitch integration + bridge. + </column> + </table> + + <table name="Pipeline" title="Logical Network Pipeline"> + <p> + Each row in this table represents one logical flow. The cloud management + system, via its OVN integration, populates this table with logical flows + that implement the L2 and L3 topology specified in the CMS configuration. + Each hypervisor, via <code>ovn-controller</code>, translates the logical + flows into OpenFlow flows specific to its hypervisor and installs them + into Open vSwitch. 
+ </p> + + <p> + Logical flows are expressed in an OVN-specific format, described here. A + logical datapath flow is much like an OpenFlow flow, except that the + flows are written in terms of logical ports and logical datapaths instead + of physical ports and physical datapaths. Translation between logical + and physical flows helps to ensure isolation between logical datapaths. + (The logical flow abstraction also allows the CMS to do less work, since + it does not have to separately compute and push out physical + flows to each chassis.) + </p> + + <p> + The default action when no flow matches is to drop packets. + </p> + + <column name="table_id"> + The stage in the logical pipeline, analogous to an OpenFlow table number. + </column> + + <column name="priority"> + The flow's priority. Flows with numerically higher priority take + precedence over those with lower. If two logical datapath flows with the + same priority both match, then the one actually applied to the packet is + undefined. + </column> + + <column name="match"> + <p> + A matching expression. OVN provides a superset of OpenFlow matching + capabilities, using a syntax similar to Boolean expressions in a + programming language. + </p> + + <p> + Matching expressions have two important kinds of primary expression: + <dfn>fields</dfn> and <dfn>constants</dfn>. A field names a piece of + data or metadata. The supported fields are: + </p> + + <ul> + <li> + <code>metadata</code> <code>reg0</code> ... <code>reg7</code> + <code>xreg0</code> ... 
<code>xreg3</code> + </li> + <li><code>inport</code> <code>outport</code> <code>queue</code></li> + <li><code>eth.src</code> <code>eth.dst</code> <code>eth.type</code></li> + <li><code>vlan.tci</code> <code>vlan.vid</code> <code>vlan.pcp</code> <code>vlan.present</code></li> + <li><code>ip.proto</code> <code>ip.dscp</code> <code>ip.ecn</code> <code>ip.ttl</code> <code>ip.frag</code></li> + <li><code>ip4.src</code> <code>ip4.dst</code></li> + <li><code>ip6.src</code> <code>ip6.dst</code> <code>ip6.label</code></li> + <li><code>arp.op</code> <code>arp.spa</code> <code>arp.tpa</code> <code>arp.sha</code> <code>arp.tha</code></li> + <li><code>tcp.src</code> <code>tcp.dst</code> <code>tcp.flags</code></li> + <li><code>udp.src</code> <code>udp.dst</code></li> + <li><code>sctp.src</code> <code>sctp.dst</code></li> + <li><code>icmp4.type</code> <code>icmp4.code</code></li> + <li><code>icmp6.type</code> <code>icmp6.code</code></li> + <li><code>nd.target</code> <code>nd.sll</code> <code>nd.tll</code></li> + </ul> + + <p> + Subfields may be addressed using a <code>[]</code> suffix, + e.g. <code>tcp.src[0..7]</code> refers to the low 8 bits of the TCP + source port. A subfield may be used in any context a field is allowed. + </p> + + <p> + Some fields have prerequisites. OVN implicitly adds clauses to satisfy + these. For example, <code>arp.op == 1</code> is equivalent to + <code>eth.type == 0x0806 && arp.op == 1</code>, and + <code>tcp.src == 80</code> is equivalent to <code>(eth.type == 0x0800 + || eth.type == 0x86dd) && ip.proto == 6 && tcp.src == + 80</code>. + </p> + + <p> + Most fields have integer values. Integer constants may be expressed in + several forms: decimal integers, hexadecimal integers prefixed by + <code>0x</code>, dotted-quad IPv4 addresses, IPv6 addresses in their + standard forms, and as Ethernet addresses as colon-separated hex + digits. 
A constant in any of these forms may be followed by a slash + and a second constant (the mask) in the same form, to form a masked + constant. IPv4 and IPv6 masks may be given as integers, to express + CIDR prefixes. + </p> + + <p> + The <code>inport</code> and <code>outport</code> fields have string + values. The useful values are <ref column="logical_port"/> names from + the <ref column="Bindings"/> and <ref column="Gateway"/> tables. + </p> + + <p> + The available operators, from highest to lowest precedence, are: + </p> + + <ul> + <li><code>()</code></li> + <li><code>== != < <= > >= in not in</code></li> + <li><code>!</code></li> + <li><code>&&</code></li> + <li><code>||</code></li> + </ul> + + <p> + The <code>()</code> operator is used for grouping. + </p> + + <p> + The equality operator <code>==</code> is the most important operator. + Its operands must be a field and an optionally masked constant, in + either order. The <code>==</code> operator yields true when the + field's value equals the constant's value for all the bits included in + the mask. The <code>==</code> operator translates simply and naturally + to OpenFlow. + </p> + + <p> + The inequality operator <code>!=</code> yields the inverse of + <code>==</code> but its syntax and use are the same. Implementation of + the inequality operator is expensive. + </p> + + <p> + The relational operators are <, <=, >, and >=. Their + operands must be a field and a constant, in either order; the constant + must not be masked. These operators are most commonly useful for L4 + ports, e.g. <code>tcp.src < 1024</code>. Implementation of the + relational operators is expensive. + </p> + + <p> + The set membership operator <code>in</code>, with syntax + ``<code><var>field</var> in { <var>constant1</var>, + <var>constant2</var>,</code> ... <code>}</code>'', is syntactic sugar + for ``<code>(<var>field</var> == <var>constant1</var> || + <var>field</var> == <var>constant2</var> || </code>...<code>)</code>''. 
+ Conversely, ``<code><var>field</var> not in { <var>constant1</var>, + <var>constant2</var>, </code>...<code> }</code>'' is syntactic sugar + for ``<code>(<var>field</var> != <var>constant1</var> && + <var>field</var> != <var>constant2</var> && + </code>...<code>)</code>''. + </p> + + <p> + The unary prefix operator <code>!</code> yields its operand's inverse. + </p> + + <p> + The logical AND operator <code>&&</code> yields true only if + both of its operands are true. + </p> + + <p> + The logical OR operator <code>||</code> yields true if at least one of + its operands is true. + </p> + + <p> + Finally, the keywords <code>true</code> and <code>false</code> may also + be used in matching expressions. <code>true</code> is useful by itself + as a catch-all expression that matches every packet. + </p> + + <p> + (The above is pretty ambitious. It probably makes sense to initially + implement only a subset of this specification. The full specification + is written out mainly to get an idea of what a fully general matching + expression language could include.) + </p> + </column> + + <column name="actions"> + <p> + Below, a <var>value</var> is either a <var>constant</var> or a + <var>field</var>. 
The following actions seem most likely to be useful: + </p> + + <dl> + <dt><code>drop;</code></dt> + <dd>syntactic sugar for no actions</dd> + + <dt><code>output(<var>value</var>);</code></dt> + <dd>output to port</dd> + + <dt><code>broadcast;</code></dt> + <dd>output to every logical port except ingress port</dd> + + <dt><code>resubmit;</code></dt> + <dd>execute next logical datapath table as subroutine</dd> + + <dt><code>set(<var>field</var>=<var>value</var>);</code></dt> + <dd>set data or metadata field, or copy between fields</dd> + </dl> + + <p> + Following are not well thought out: + </p> + + <dl> + <dt><code>learn</code></dt> + + <dt><code>conntrack</code></dt> + + <dt><code>with(<var>field</var>=<var>value</var>) { <var>action</var>, </code>...<code> }</code></dt> + <dd>execute <var>actions</var> with temporary changes to <var>fields</var></dd> + + <dt><code>dec_ttl { <var>action</var>, </code>...<code> } { <var>action</var>; </code>...<code>}</code></dt> + <dd> + decrement TTL; execute first set of actions if + successful, second set if TTL decrement fails + </dd> + + <dt><code>icmp_reply { <var>action</var>, </code>...<code> }</code></dt> + <dd>generate ICMP reply from packet, execute <var>action</var>s</dd> + + <dt><code>arp { <var>action</var>, </code>...<code> }</code></dt> + <dd>generate ARP from packet, execute <var>action</var>s</dd> + </dl> + + <p> + Other actions can be added as needed + (e.g. <code>push_vlan</code>, <code>pop_vlan</code>, + <code>push_mpls</code>, <code>pop_mpls</code>). + </p> + + <p> + Some of the OVN actions do not map directly to OpenFlow actions, e.g.: + </p> + + <ul> + <li> + <code>with</code>: Implemented as <code>stack_push; + set(</code>...<code>); <var>actions</var>; stack_pop</code>. + </li> + + <li> + <code>dec_ttl</code>: Implemented as <code>dec_ttl</code> followed + by the successful actions. The failure case has to be implemented by + ovn-controller interpreting packet-ins. 
It might be difficult to + identify the particular place in the processing pipeline in + <code>ovn-controller</code>; maybe some restrictions will be + necessary. + </li> + + <li> + <code>icmp_reply</code>: Implemented by sending the packet to + <code>ovn-controller</code>, which generates the ICMP reply and sends + the packet back to <code>ovs-vswitchd</code>. + </li> + </ul> + </column> + </table> + + <table name="Bindings" title="Physical-Logical Bindings"> + <p> + Each row in this table identifies the physical location of a logical + port. + </p> + + <p> + For every <code>Logical_Port</code> record in the <code>OVN_Northbound</code> + database, <code>ovn-nbd</code> creates a record in this table. + <code>ovn-nbd</code> populates and maintains every column except + the <code>chassis</code> column, which it leaves empty in new records. + </p> + + <p> + <code>ovn-controller</code> populates the <code>chassis</code> column + for the records that identify the logical ports that are located on its + hypervisor, which <code>ovn-controller</code> in turn finds out by + monitoring the local hypervisor's Open_vSwitch database, which + identifies logical ports via the conventions described in + <code>IntegrationGuide.md</code>. + </p> + + <p> + When a chassis shuts down gracefully, it should clean up the + <code>chassis</code> column that it previously had populated. + (This is not critical because resources hosted on the chassis are equally + unreachable regardless of whether their rows are present.) To handle the + case where a VM is shut down abruptly on one chassis, then brought up + again on a different one, <code>ovn-controller</code> must overwrite the + <code>chassis</code> column with new information. + </p> + + <column name="logical_port"> + A logical port, taken from <ref table="Logical_Port" column="name" + db="OVN_Northbound"/> in the OVN_Northbound database's + <ref table="Logical_Port" db="OVN_Northbound"/> table. 
OVN does not + prescribe a particular format for the logical port ID. + </column> + + <column name="parent_port"> + For containers created inside a VM, this is taken from + <ref table="Logical_Port" column="parent_name" db="OVN_Northbound"/> + in the OVN_Northbound database's <ref table="Logical_Port" + db="OVN_Northbound"/> table. It is left empty if + <ref column="logical_port"/> belongs to a VM or a container created + in the hypervisor. + </column> + + <column name="tag"> + When <ref column="logical_port"/> identifies the interface of a container + spawned inside a VM, this column identifies the VLAN tag in + the network traffic associated with that container's network interface. + It is left empty if <ref column="logical_port"/> belongs to a VM or a + container created in the hypervisor. + </column> + + <column name="chassis"> + The physical location of the logical port. To successfully identify a + chassis, this column must match the <ref table="Chassis" column="name"/> + column in some row in the <ref table="Chassis"/> table. This is + populated by <code>ovn-controller</code>. + </column> + + <column name="mac"> + <p> + The Ethernet address or addresses used as a source address on the + logical port, each in the form + <var>xx</var>:<var>xx</var>:<var>xx</var>:<var>xx</var>:<var>xx</var>:<var>xx</var>. + The string <code>unknown</code> is also allowed to indicate that the + logical port has an unknown set of (additional) source addresses. + </p> + + <p> + A VM interface would ordinarily have a single Ethernet address. A + gateway port might initially only have <code>unknown</code>, and then + add MAC addresses to the set as it learns new source addresses. 
+ </p> + </column> + </table> +</database> diff --git a/ovn/ovn.ovsschema b/ovn/ovn.ovsschema deleted file mode 100644 index 6bc4f94..0000000 --- a/ovn/ovn.ovsschema +++ /dev/null @@ -1,60 +0,0 @@ -{ - "name": "OVN", - "tables": { - "Chassis": { - "columns": { - "name": {"type": "string"}, - "encaps": {"type": {"key": {"type": "uuid", - "refTable": "Encap"}, - "min": 1, "max": "unlimited"}}, - "gateway_ports": {"type": {"key": "string", - "value": {"type": "uuid", - "refTable": "Gateway", - "refType": "strong"}, - "min": 0, - "max": "unlimited"}}}, - "isRoot": true, - "indexes": [["name"]]}, - "Encap": { - "columns": { - "type": {"type": "string"}, - "options": {"type": {"key": "string", - "value": "string", - "min": 0, - "max": "unlimited"}}, - "ip": {"type": "string"}}}, - "Gateway": { - "columns": {"attached_port": {"type": "string"}, - "vlan_map": {"type": {"key": {"type": "integer", - "minInteger": 0, - "maxInteger": 4095}, - "value": {"type": "string"}, - "min": 0, - "max": "unlimited"}}}}, - "Pipeline": { - "columns": { - "table_id": {"type": {"key": {"type": "integer", - "minInteger": 0, - "maxInteger": 127}}}, - "priority": {"type": {"key": {"type": "integer", - "minInteger": 0, - "maxInteger": 65535}}}, - "match": {"type": "string"}, - "actions": {"type": "string"}}, - "isRoot": true}, - "Bindings": { - "columns": { - "logical_port": {"type": "string"}, - "parent_port": {"type": {"key": "string", "min": 0, "max": 1}}, - "tag": { - "type": {"key": {"type": "integer", - "minInteger": 0, - "maxInteger": 4095}, - "min": 0, "max": 1}}, - "chassis": {"type": "string"}, - "mac": {"type": {"key": "string", - "min": 0, - "max": "unlimited"}}}, - "indexes": [["logical_port"]], - "isRoot": true}}, - "version": "1.0.0"} diff --git a/ovn/ovn.xml b/ovn/ovn.xml deleted file mode 100644 index 0906272..0000000 --- a/ovn/ovn.xml +++ /dev/null @@ -1,546 +0,0 @@ -<?xml version="1.0" encoding="utf-8"?> -<database name="ovn" title="OVN Database"> - <p> - This database holds 
logical and physical configuration and state for the - Open Virtual Network (OVN) system to support virtual network abstraction. - For an introduction to OVN, please see <code>ovn-architecture</code>(7). - </p> - - <p> - The OVN database sits at the center of the OVN architecture. It is the one - component that speaks both southbound directly to all the hypervisors and - gateways, via <code>ovn-controller</code>, and northbound to the Cloud - Management System, via <code>ovn-nbd</code>: - </p> - - <h2>Database Structure</h2> - - <p> - The OVN database contains three classes of data with different properties, - as described in the sections below. - </p> - - <h3>Physical Network (PN) data</h3> - - <p> - PN tables contain information about the chassis nodes in the system. This - contains all the information necessary to wire the overlay, such as IP - addresses, supported tunnel types, and security keys. - </p> - - <p> - The amount of PN data is small (O(n) in the number of chassis) and it - changes infrequently, so it can be replicated to every chassis. - </p> - - <p> - The <ref table="Chassis"/> and <ref table="Gateway"/> tables comprise the - PN tables. - </p> - - <h3>Logical Network (LN) data</h3> - - <p> - LN tables contain the topology of logical switches and routers, ACLs, - firewall rules, and everything needed to describe how packets traverse a - logical network, represented as logical datapath flows (see Logical - Datapath Flows, below). - </p> - - <p> - LN data may be large (O(n) in the number of logical ports, ACL rules, - etc.). Thus, to improve scaling, each chassis should receive only data - related to logical networks in which that chassis participates. Past - experience shows that in the presence of large logical networks, even - finer-grained partitioning of data, e.g. designing logical flows so that - only the chassis hosting a logical port needs related flows, pays off - scale-wise. 
(This is not necessary initially but it is worth bearing in - mind in the design.) - </p> - - <p> - The LN is a slave of the cloud management system running northbound of OVN. - That CMS determines the entire OVN logical configuration and therefore the - LN's content at any given time is a deterministic function of the CMS's - configuration, although that happens indirectly via the OVN Northbound DB - and <code>ovn-nbd</code>. - </p> - - <p> - LN data is likely to change more quickly than PN data. This is especially - true in a container environment where VMs are created and destroyed (and - therefore added to and deleted from logical switches) quickly. - </p> - - <p> - The <ref table="Pipeline"/> table is currently the only LN table. - </p> - - <h3>Bindings data</h3> - - <p> - The Bindings tables contain the current placement of logical components - (such as VMs and VIFs) onto chassis and the bindings between logical ports - and MACs. - </p> - - <p> - Bindings change frequently, at least every time a VM powers up or down - or migrates, and especially quickly in a container environment. The - amount of data per VM (or VIF) is small. - </p> - - <p> - Each chassis is authoritative about the VMs and VIFs that it hosts at any - given time and can efficiently flood that state to a central location, so - the consistency needs are minimal. - </p> - - <p> - The <ref table="Bindings"/> table is currently the only Bindings table. - </p> - - <table name="Chassis" title="Physical Network Hypervisor and Gateway Information"> - <p> - Each row in this table represents a hypervisor or gateway (a chassis) in - the physical network (PN). Each chassis, via - <code>ovn-controller</code>, adds and updates its own row, and keeps a - copy of the remaining rows to determine how to reach other hypervisors. - </p> - - <p> - When a chassis shuts down gracefully, it should remove its own row. 
- (This is not critical because resources hosted on the chassis are equally - unreachable regardless of whether the row is present.) If a chassis - shuts down permanently without removing its row, some kind of manual or - automatic cleanup is eventually needed; we can devise a process for that - as necessary. - </p> - - <column name="name"> - A chassis name, taken from <ref key="system-id" table="Open_vSwitch" - column="external_ids" db="Open_vSwitch"/> in the Open_vSwitch - database's <ref table="Open_vSwitch" db="Open_vSwitch"/> table. OVN does - not prescribe a particular format for chassis names. - </column> - - <group title="Encapsulation Configuration"> - <p> - OVN uses encapsulation to transmit logical dataplane packets - between chassis. - </p> - - <column name="encaps"> - Points to supported encapsulation configurations to transmit - logical dataplane packets to this chassis. Each entry is a <ref - table="Encap"/> record that describes the configuration. - </column> - </group> - - <group title="Gateway Configuration"> - <p> - A <dfn>gateway</dfn> is a chassis that forwards traffic between a - logical network and a physical VLAN. Gateways are typically dedicated - nodes that do not host VMs. - </p> - - <column name="gateway_ports"> - Maps from the name of a gateway port, which is typically a physical - port (e.g. <code>eth1</code>) or an Open vSwitch patch port, to a <ref - table="Gateway"/> record that describes the details of the gatewaying - function. - </column> - </group> - </table> - - <table name="Encap" title="Encapsulation Types"> - <p> - The <ref column="encaps" table="Chassis"/> column in the <ref - table="Chassis"/> table refers to rows in this table to identify - how OVN may transmit logical dataplane packets to this chassis. - Each chassis, via <code>ovn-controller</code>(8), adds and updates - its own rows and keeps a copy of the remaining rows to determine - how to reach other chassis. 
- </p> - - <column name="type"> - The encapsulation to use to transmit packets to this chassis. - Examples include <code>geneve</code>, <code>vxlan</code>, and - <code>stt</code>. - </column> - - <column name="options"> - Options for configuring the encapsulation, e.g. IPsec parameters when - IPsec support is introduced. No options are currently defined. - </column> - - <column name="ip"> - The IPv4 address of the encapsulation tunnel endpoint. - </column> - </table> - - <table name="Gateway" title="Physical Network Gateway Ports"> - <p> - The <ref column="gateway_ports" table="Chassis"/> column in the <ref - table="Chassis"/> table refers to rows in this table to connect a chassis - port to a gateway function. Each row in this table describes the logical - networks to which a gateway port is attached. Each chassis, via - <code>ovn-controller</code>(8), adds and updates its own rows, if any - (since most chassis are not gateways), and keeps a copy of the remaining - rows to determine how to reach other chassis. - </p> - - <column name="vlan_map"> - Maps from a VLAN ID to a logical port name. Thus, each named logical - port corresponds to one VLAN on the gateway port. - </column> - - <column name="attached_port"> - The name of the gateway port in the chassis's Open vSwitch integration - bridge. - </column> - </table> - - <table name="Pipeline" title="Logical Network Pipeline"> - <p> - Each row in this table represents one logical flow. The cloud management - system, via its OVN integration, populates this table with logical flows - that implement the L2 and L3 topology specified in the CMS configuration. - Each hypervisor, via <code>ovn-controller</code>, translates the logical - flows into OpenFlow flows specific to its hypervisor and installs them - into Open vSwitch. - </p> - - <p> - Logical flows are expressed in an OVN-specific format, described here. 
A - logical datapath flow is much like an OpenFlow flow, except that the - flows are written in terms of logical ports and logical datapaths instead - of physical ports and physical datapaths. Translation between logical - and physical flows helps to ensure isolation between logical datapaths. - (The logical flow abstraction also allows the CMS to do less work, since - it does not have to separately compute and push out physical physical - flows to each chassis.) - </p> - - <p> - The default action when no flow matches is to drop packets. - </p> - - <column name="table_id"> - The stage in the logical pipeline, analogous to an OpenFlow table number. - </column> - - <column name="priority"> - The flow's priority. Flows with numerically higher priority take - precedence over those with lower. If two logical datapath flows with the - same priority both match, then the one actually applied to the packet is - undefined. - </column> - - <column name="match"> - <p> - A matching expression. OVN provides a superset of OpenFlow matching - capabilities, using a syntax similar to Boolean expressions in a - programming language. - </p> - - <p> - Matching expressions have two important kinds of primary expression: - <dfn>fields</dfn> and <dfn>constants</dfn>. A field names a piece of - data or metadata. The supported fields are: - </p> - - <ul> - <li> - <code>metadata</code> <code>reg0</code> ... <code>reg7</code> - <code>xreg0</code> ... 
<code>xreg3</code> - </li> - <li><code>inport</code> <code>outport</code> <code>queue</code></li> - <li><code>eth.src</code> <code>eth.dst</code> <code>eth.type</code></li> - <li><code>vlan.tci</code> <code>vlan.vid</code> <code>vlan.pcp</code> <code>vlan.present</code></li> - <li><code>ip.proto</code> <code>ip.dscp</code> <code>ip.ecn</code> <code>ip.ttl</code> <code>ip.frag</code></li> - <li><code>ip4.src</code> <code>ip4.dst</code></li> - <li><code>ip6.src</code> <code>ip6.dst</code> <code>ip6.label</code></li> - <li><code>arp.op</code> <code>arp.spa</code> <code>arp.tpa</code> <code>arp.sha</code> <code>arp.tha</code></li> - <li><code>tcp.src</code> <code>tcp.dst</code> <code>tcp.flags</code></li> - <li><code>udp.src</code> <code>udp.dst</code></li> - <li><code>sctp.src</code> <code>sctp.dst</code></li> - <li><code>icmp4.type</code> <code>icmp4.code</code></li> - <li><code>icmp6.type</code> <code>icmp6.code</code></li> - <li><code>nd.target</code> <code>nd.sll</code> <code>nd.tll</code></li> - </ul> - - <p> - Subfields may be addressed using a <code>[]</code> suffix, - e.g. <code>tcp.src[0..7]</code> refers to the low 8 bits of the TCP - source port. A subfield may be used in any context a field is allowed. - </p> - - <p> - Some fields have prerequisites. OVN implicitly adds clauses to satisfy - these. For example, <code>arp.op == 1</code> is equivalent to - <code>eth.type == 0x0806 && arp.op == 1</code>, and - <code>tcp.src == 80</code> is equivalent to <code>(eth.type == 0x0800 - || eth.type == 0x86dd) && ip.proto == 6 && tcp.src == - 80</code>. - </p> - - <p> - Most fields have integer values. Integer constants may be expressed in - several forms: decimal integers, hexadecimal integers prefixed by - <code>0x</code>, dotted-quad IPv4 addresses, IPv6 addresses in their - standard forms, and as Ethernet addresses as colon-separated hex - digits. 
A constant in any of these forms may be followed by a slash - and a second constant (the mask) in the same form, to form a masked - constant. IPv4 and IPv6 masks may be given as integers, to express - CIDR prefixes. - </p> - - <p> - The <code>inport</code> and <code>outport</code> fields have string - values. The useful values are <ref column="logical_port"/> names from - the <ref column="Bindings"/> and <ref column="Gateway"/> table. - </p> - - <p> - The available operators, from highest to lowest precedence, are: - </p> - - <ul> - <li><code>()</code></li> - <li><code>== != < <= > >= in not in</code></li> - <li><code>!</code></li> - <li><code>&&</code></li> - <li><code>||</code></li> - </ul> - - <p> - The <code>()</code> operator is used for grouping. - </p> - - <p> - The equality operator <code>==</code> is the most important operator. - Its operands must be a field and an optionally masked constant, in - either order. The <code>==</code> operator yields true when the - field's value equals the constant's value for all the bits included in - the mask. The <code>==</code> operator translates simply and naturally - to OpenFlow. - </p> - - <p> - The inequality operator <code>!=</code> yields the inverse of - <code>==</code> but its syntax and use are the same. Implementation of - the inequality operator is expensive. - </p> - - <p> - The relational operators are <, <=, >, and >=. Their - operands must be a field and a constant, in either order; the constant - must not be masked. These operators are most commonly useful for L4 - ports, e.g. <code>tcp.src < 1024</code>. Implementation of the - relational operators is expensive. - </p> - - <p> - The set membership operator <code>in</code>, with syntax - ``<code><var>field</var> in { <var>constant1</var>, - <var>constant2</var>,</code> ... <code>}</code>'', is syntactic sugar - for ``<code>(<var>field</var> == <var>constant1</var> || - <var>field</var> == <var>constant2</var> || </code>...<code>)</code>. 
- Conversely, ``<code><var>field</var> not in { <var>constant1</var>, - <var>constant2</var>, </code>...<code> }</code>'' is syntactic sugar - for ``<code>(<var>field</var> != <var>constant1</var> && - <var>field</var> != <var>constant2</var> && - </code>...<code>)</code>''. - </p> - - <p> - The unary prefix operator <code>!</code> yields its operand's inverse. - </p> - - <p> - The logical AND operator <code>&&</code> yields true only if - both of its operands are true. - </p> - - <p> - The logical OR operator <code>||</code> yields true if at least one of - its operands is true. - </p> - - <p> - Finally, the keywords <code>true</code> and <code>false</code> may also - be used in matching expressions. <code>true</code> is useful by itself - as a catch-all expression that matches every packet. - </p> - - <p> - (The above is pretty ambitious. It probably makes sense to initially - implement only a subset of this specification. The full specification - is written out mainly to get an idea of what a fully general matching - expression language could include.) - </p> - </column> - - <column name="actions"> - <p> - Below, a <var>value</var> is either a <var>constant</var> or a - <var>field</var>. 
-        The following actions seem most likely to be useful:
-      </p>
-
-      <dl>
-        <dt><code>drop;</code></dt>
-        <dd>syntactic sugar for no actions</dd>
-
-        <dt><code>output(<var>value</var>);</code></dt>
-        <dd>output to port</dd>
-
-        <dt><code>broadcast;</code></dt>
-        <dd>output to every logical port except ingress port</dd>
-
-        <dt><code>resubmit;</code></dt>
-        <dd>execute next logical datapath table as subroutine</dd>
-
-        <dt><code>set(<var>field</var>=<var>value</var>);</code></dt>
-        <dd>set data or metadata field, or copy between fields</dd>
-      </dl>
-
-      <p>
-        Following are not well thought out:
-      </p>
-
-      <dl>
-        <dt><code>learn</code></dt>
-
-        <dt><code>conntrack</code></dt>
-
-        <dt><code>with(<var>field</var>=<var>value</var>) { <var>action</var>, </code>...<code> }</code></dt>
-        <dd>execute <var>actions</var> with temporary changes to <var>fields</var></dd>
-
-        <dt><code>dec_ttl { <var>action</var>, </code>...<code> } { <var>action</var>; </code>...<code>}</code></dt>
-        <dd>
-          decrement TTL; execute first set of actions if
-          successful, second set if TTL decrement fails
-        </dd>
-
-        <dt><code>icmp_reply { <var>action</var>, </code>...<code> }</code></dt>
-        <dd>generate ICMP reply from packet, execute <var>action</var>s</dd>
-
-        <dt><code>arp { <var>action</var>, </code>...<code> }</code></dt>
-        <dd>generate ARP from packet, execute <var>action</var>s</dd>
-      </dl>
-
-      <p>
-        Other actions can be added as needed
-        (e.g. <code>push_vlan</code>, <code>pop_vlan</code>,
-        <code>push_mpls</code>, <code>pop_mpls</code>).
-      </p>
-
-      <p>
-        Some of the OVN actions do not map directly to OpenFlow actions, e.g.:
-      </p>
-
-      <ul>
-        <li>
-          <code>with</code>: Implemented as <code>stack_push;
-          set(</code>...<code>); <var>actions</var>; stack_pop</code>.
-        </li>
-
-        <li>
-          <code>dec_ttl</code>: Implemented as <code>dec_ttl</code> followed
-          by the successful actions. The failure case has to be implemented by
-          ovn-controller interpreting packet-ins.
-          It might be difficult to
-          identify the particular place in the processing pipeline in
-          <code>ovn-controller</code>; maybe some restrictions will be
-          necessary.
-        </li>
-
-        <li>
-          <code>icmp_reply</code>: Implemented by sending the packet to
-          <code>ovn-controller</code>, which generates the ICMP reply and sends
-          the packet back to <code>ovs-vswitchd</code>.
-        </li>
-      </ul>
-    </column>
-  </table>
-
-  <table name="Bindings" title="Physical-Logical Bindings">
-    <p>
-      Each row in this table identifies the physical location of a logical
-      port.
-    </p>
-
-    <p>
-      For every <code>Logical_Port</code> record in <code>OVN_Northbound</code>
-      database, <code>ovn-nbd</code> creates a record in this table.
-      <code>ovn-nbd</code> populates and maintains every column except
-      the <code>chassis</code> column, which it leaves empty in new records.
-    </p>
-
-    <p>
-      <code>ovn-controller</code> populates the <code>chassis</code> column
-      for the records that identify the logical ports that are located on its
-      hypervisor, which <code>ovn-controller</code> in turn finds out by
-      monitoring the local hypervisor's Open_vSwitch database, which
-      identifies logical ports via the conventions described in
-      <code>IntegrationGuide.md</code>.
-    </p>
-
-    <p>
-      When a chassis shuts down gracefully, it should cleanup the
-      <code>chassis</code> column that it previously had populated.
-      (This is not critical because resources hosted on the chassis are equally
-      unreachable regardless of whether their rows are present.) To handle the
-      case where a VM is shut down abruptly on one chassis, then brought up
-      again on a different one, <code>ovn-controller</code> must overwrite the
-      <code>chassis</code> column with new information.
-    </p>
-
-    <column name="logical_port">
-      A logical port, taken from <ref table="Logical_Port" column="name"
-      db="OVN_Northbound"/> in the OVN_Northbound database's
-      <ref table="Logical_Port" db="OVN_Northbound"/> table.
-      OVN does not
-      prescribe a particular format for the logical port ID.
-    </column>
-
-    <column name="parent_port">
-      For containers created inside a VM, this is taken from
-      <ref table="Logical_Port" column="parent_name" db="OVN_Northbound"/>
-      in the OVN_Northbound database's <ref table="Logical_Port"
-      db="OVN_Northbound"/> table. It is left empty if
-      <ref column="logical_port"/> belongs to a VM or a container created
-      in the hypervisor.
-    </column>
-
-    <column name="tag">
-      When <ref column="logical_port"/> identifies the interface of a container
-      spawned inside a VM, this column identifies the VLAN tag in
-      the network traffic associated with that container's network interface.
-      It is left empty if <ref column="logical_port"/> belongs to a VM or a
-      container created in the hypervisor.
-    </column>
-
-    <column name="chassis">
-      The physical location of the logical port. To successfully identify a
-      chassis, this column must match the <ref table="Chassis" column="name"/>
-      column in some row in the <ref table="Chassis"/> table. This is
-      populated by <code>ovn-controller</code>.
-    </column>
-
-    <column name="mac">
-      <p>
-        The Ethernet address or addresses used as a source address on the
-        logical port, each in the form
-        <var>xx</var>:<var>xx</var>:<var>xx</var>:<var>xx</var>:<var>xx</var>:<var>xx</var>.
-        The string <code>unknown</code> is also allowed to indicate that the
-        logical port has an unknown set of (additional) source addresses.
-      </p>
-
-      <p>
-        A VM interface would ordinarily have a single Ethernet address. A
-        gateway port might initially only have <code>unknown</code>, and then
-        add MAC addresses to the set as it learns new source addresses.
-      </p>
-    </column>
-  </table>
-</database>
-- 
1.7.5.4
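
Note on the matching-expression text that this patch moves from ovn.xml to ovn-sb.xml: the semantics described there for masked `==`, integer CIDR masks, and the `in` / `not in` sugar can be illustrated with a small sketch. This is only an illustration under stated assumptions, not code from the patch or from OVN. The helper names (`masked_eq`, `cidr_mask`, `expand_in`, `expand_not_in`) are hypothetical, and fields are modelled as plain integers or strings.

```python
from typing import Sequence


def masked_eq(field_value: int, constant: int, mask: int) -> bool:
    """Masked ==: true when the field equals the constant on every bit in the mask."""
    return (field_value & mask) == (constant & mask)


def cidr_mask(prefix_len: int, width: int = 32) -> int:
    """Expand an integer prefix length (e.g. 24) into a bit mask of `width` bits."""
    if not 0 <= prefix_len <= width:
        raise ValueError("prefix length out of range")
    return ((1 << prefix_len) - 1) << (width - prefix_len)


def expand_in(field: str, constants: Sequence[str]) -> str:
    """Desugar `field in { c1, c2, ... }` into a chain of == joined by ||."""
    return "(" + " || ".join(f"{field} == {c}" for c in constants) + ")"


def expand_not_in(field: str, constants: Sequence[str]) -> str:
    """Desugar `field not in { c1, c2, ... }` into a chain of != joined by &&."""
    return "(" + " && ".join(f"{field} != {c}" for c in constants) + ")"


if __name__ == "__main__":
    # 192.168.1.5 compared against 192.168.1.0/24, addresses written as integers.
    print(masked_eq(0xC0A80105, 0xC0A80100, cidr_mask(24)))
    print(expand_in("tcp.src", ["80", "443"]))
    print(expand_not_in("tcp.src", ["80", "443"]))
```

The string expansion mirrors the "syntactic sugar" wording in the schema documentation. How a real implementation would translate the expanded form to OpenFlow is not covered here.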