Hi, guys. Gosh, this turned out to be long. Sorry about that.
I'm adding some tests for the DB api, and I've stumbled across something that we should probably discuss. First of all, most (if not all) of the various *_create methods we have, return quite amputated objects. Any attempt to access related objects will fail with the much too familiar: DetachedInstanceError: Parent instance <Whatever at 0x4f5c8d0> is not bound to a Session; lazy load operation of attribute 'other_things' cannot proceed Also, with the SQLAlchemy driver, this test would pass: network = db.network_create(ctxt, {}) network = db.network_get(ctxt, network['id']) instance = db.instance_create(ctxt, {}) self.assertEquals(len(network['virtual_interfaces']), 0) db.virtual_interface_create(ctxt, {'network_id': network['id'], 'instance_id': instance['id']} self.assertEquals(len(network['virtual_interfaces']), 0) network = db.network_get(ctxt, network['id']) self.assertEquals(len(network['virtual_interfaces']), 1) I create a network, pull it out again (as per my comment above), verify that it has no virtual_interfaces related to it, create a virtual interface in this network, and check the network's virtual_interfaces key and finds that it still has length 0. Reloading the network now reveals the new virtual interface. SQLAlchemy does support looking these things up on the fly. In fact, AFAIK, this is its default behaviour. We just override it with joinedload options, because we don't use scoped sessions. My fake db driver looks stuff like this up on the fly (so the assertEquals after the virtual_interface_create will fail with that db driver). So my question is this: Should this be a) looked up on the fly, b) looked up on first key access and then cached, c) looked up when the parent object is loaded and then never again, d) or up to the driver author? Or should we do away with this stuff altogether? I.e. no more looking up related objects by way of __getitem__ lookups, and instead only allow lookups through db methods. So, instead of network['virtual_interfaces'], you'd always do db.virtual_interfaces_get_by_network(ctxt, network['id']). Let's call this option e). I'm pretty undecided myself. If we go with option e) it becomes clear to consumers of the DB api when they're pulling out fresh stuff from the DB and when they're reusing potentially old results. Explicit is better than implicit, but it'll take quite a bit of work to change this. If we go with one of options a) through d), my order of preference is (from most to least preferred): a), d), c), b). There's value in having a right way and a wrong way to do this. If it's either-or, it makes testing (as in validation) more awkward. I'd say it's always possible to do on-the-fly lookups. Overriding __getitem__ and fetching fresh results is pretty simple, and, as mentioned earlier, I believe this is SQLAlchemy's default behaviour (somebody please correct me if I'm wrong). Forcing an arbitrary ORM to replicate the behaviour of b) and c) could be incredibly awkward, and c) is also complicated because there might be reference loops involved. Also, reviewing correct use of something where the need for reloads depends on previous use of your db objects (which might itself be conditional or happen in called methods) sounds like no fun at all. With d) it's pretty straightforward: "Do you want to be sure to have fresh responses? Then reload the object. Otherwise, behaviour is undefined." It's straightforward to explain and review. Option e) is also easy to explain and do reviews for, btw. DB objects with options a) through d) will have trouble making it through a serializer. After dict serialisation, the object isn't going to change, unresolved related objects will not be resolved, and prepopulating everything prior to serialisation is out of the question due to the potential for loops and very big sets of related objects (and their related objects and their related objects's related objects, etc.). I think it would be valuable to not have to think a whole lot about whether a particular db-like object is a "real" db object or whether it came in as a JSON object or over the message bus or whatever. Only option e) will give us that, AFAICS. It seems I've talked myself into preferring option e). It's too much work to do on my own, though, and it's going to be disruptive, so we need to do it real soon. I think it'll be worth it, though. -- Soren Hansen | http://linux2go.dk/ Ubuntu Developer | http://www.ubuntu.com/ OpenStack Developer | http://www.openstack.org/ _______________________________________________ Mailing list: https://launchpad.net/~openstack Post to : openstack@lists.launchpad.net Unsubscribe : https://launchpad.net/~openstack More help : https://help.launchpad.net/ListHelp