A performance issue that has always bothered me:

OVSDB has a set data type that matches up with Python's set data type (an
unordered collection of unique items). The in-tree Python library
represents this set type as a list. Not only that, but every time you
access a set-type column on a Row (which goes through Row.__getattr__()),
it loops through the values, adds them to a new Python set (presumably to
remove duplicates)...and then returns them as a sorted list. Every single
time the attribute is accessed [1].
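
To make the cost concrete, here is a minimal sketch of that access pattern.
It is not the in-tree code from [1]: FakeRow, its _data dict, and the
conversions counter are hypothetical stand-ins. The only point is that the
dedupe-and-sort work is repeated on each attribute access.

```python
class FakeRow:
    """Hypothetical stand-in for a Row; not the real in-tree class."""

    def __init__(self, **columns):
        # Keep column values out of normal attribute lookup so that
        # __getattr__ is invoked whenever a column is accessed.
        self.__dict__["_data"] = columns
        self.__dict__["conversions"] = 0

    def __getattr__(self, name):
        try:
            values = self._data[name]
        except KeyError:
            raise AttributeError(name)
        # Rebuild a set and a sorted list on every single access.
        self.__dict__["conversions"] += 1
        return sorted(set(values))


row = FakeRow(ports=["p3", "p1", "p2", "p1"])
for _ in range(3):
    ports = row.ports  # each access repeats the O(n log n) work

print(ports, row.conversions)
```

With thousands of entries in the column, a loop that touches row.ports
repeatedly pays for that conversion on every iteration.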

Some of these sets can be quite huge. In OpenStack Neutron, for example, we
have a default Port Group that every port is added to. That is many
thousands of ports.

Now, it would be very simple to just return a set here and users would get
the benefits of both less overhead on attribute access AND the ability to
do O(1) lookups on these sets. Things like "find port groups that have this
port" etc. would be *much* cheaper. The problem is that this breaks the
API. You can no longer do things like Port_Group.ports[0], since set
objects are unordered and do not have __getitem__(); operations like
append() don't exist, etc. This will also break tons of tests, because they
tend to rely on the order of objects for simple string matching. The latter
issue is probably pretty easy to fix by just sorting the results in the
tests themselves.
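
The trade-off can be seen with plain Python objects. This is only an
illustration using made-up port names, not anything taken from the library
or from the Neutron tests:

```python
ports_as_list = ["port-c", "port-a", "port-b"]
ports_as_set = set(ports_as_list)

# Membership: linear scan on the list, O(1) average hash lookup on the set.
assert "port-b" in ports_as_list
assert "port-b" in ports_as_set

# Indexing works on a list but raises TypeError on a set.
first = ports_as_list[0]
try:
    ports_as_set[0]
    indexing_works = True
except TypeError:
    indexing_works = False

# Sets have add() rather than append().
has_append = hasattr(ports_as_set, "append")

# A test that compares output as a string can sort first for a stable order.
rendered = " ".join(sorted(ports_as_set))
print(indexing_works, has_append, rendered)
```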

It's probably possible to create a wrapper type that is backed by a set but
looks enough like a list to not break things, but that's also pretty ugly.
So I guess my question is, "what do we think about breaking the API at some
point to fix this?" It's pretty terrible behavior, but it's also annoying
when APIs change.
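
For the sake of discussion, a rough sketch of what such a wrapper might
look like. SortedSetView is a hypothetical name, nothing like it exists in
the library today, and it only covers read access: lookups go to a
frozenset, and the sorted list is built lazily, at most once per object.

```python
from collections.abc import Sequence


class SortedSetView(Sequence):
    """Hypothetical read-only wrapper: set lookups, list-style indexing."""

    def __init__(self, values):
        self._set = frozenset(values)
        self._sorted = None  # built lazily, at most once

    def _as_list(self):
        if self._sorted is None:
            self._sorted = sorted(self._set)
        return self._sorted

    def __contains__(self, item):
        return item in self._set  # O(1) average, no list scan

    def __len__(self):
        return len(self._set)

    def __getitem__(self, index):
        return self._as_list()[index]

    def __iter__(self):
        return iter(self._as_list())

    def __eq__(self, other):
        # Compare equal to a plain list with the same sorted contents.
        if isinstance(other, (list, SortedSetView)):
            return list(self) == list(other)
        return NotImplemented


ports = SortedSetView(["p2", "p1", "p2"])
print(ports[0], "p2" in ports, len(ports), ports == ["p1", "p2"])
```

Mutating calls such as append() are deliberately left out; code that
relies on them would still break, which is part of why this feels ugly.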

Terry

[1]
https://github.com/openvswitch/ovs/blob/d70688a7291edb432fd66b9230a92842fcfd3607/python/ovs/db/data.py#L498-L504
_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
