Excerpts from Mike Bayer's message of 2015-08-13 11:03:32 +0800:
>
> On 8/12/15 10:29 PM, Clint Byrum wrote:
> > Excerpts from Dan Smith's message of 2015-08-12 23:12:23 +0800:
> >>> If OTOH we are referring to the width of the columns and the join is
> >>> such that you're going to get the same A identity over and over again,
> >>> if you join A and B you get a "wide" row with all of A and B with a very
> >>> large amount of redundant data sent over the wire again and again (note
> >>> that the database drivers available to us in Python always send all rows
> >>> and columns over the wire unconditionally, whether or not we fetch them
> >>> in application code).
> >> Yep, it was this. N instances times M rows of metadata each. If you pull
> >> 100 instances and they each have 30 rows of system metadata, that's a
> >> lot of data, and most of it is the instance being repeated 30 times for
> >> each metadata row. When we first released code doing this, a prominent
> >> host immediately raised the red flag because their DB traffic shot
> >> through the roof.
> >>
> > In the past I've taken a different approach to problematic one to
> > many relationships and have made the metadata a binary JSON blob.
> > Is there some reason that won't work? Of course, this type of thing
> > can run into concurrency issues on update, but these can be handled by
> > SELECT..FOR UPDATE + intelligent retry on deadlock. Since the metadata
> > is nearly always queried as a whole, this seems like a valid approach
> > that would keep DB traffic low but also ease the burden of reassembling
> > the collection in nova-api.
>
> JSON blobs have the disadvantages that you are piggybacking an entirely
> different storage model on top of the relational one, losing all the
> features you might like about the relational model like rich datatypes
> (I understand our JSON decoders trip up on plain datetimes?), insert
> defaults, nullability constraints, a fixed, predefined schema that can
> be altered in a controlled, all-or-nothing way, efficient storage
> characteristics, and of course reasonable querying capabilities. They
> are useful IMO only for small sections of data that are amenable to
> ad-hoc changes in schema like simple bags of key-value pairs containing
> miscellaneous features.
>

Agreed on all points! And metadata for instances is exactly that: a
simple bag of key/value strings that is almost always queried and
delivered as a whole.

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
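
For a rough sense of the numbers in this thread, here is a minimal sketch
using the figures Dan quoted (100 instances, 30 metadata rows each). It is
not nova's schema or code: the table and column names (instances,
instance_metadata, metadata_blob, meta_key, meta_value) are made up for
illustration, and it uses an in-memory SQLite database and a plain-text
JSON column purely so it runs anywhere. It only compares how many rows
come back from the join versus from a one-blob-per-instance layout, and
does not attempt the SELECT..FOR UPDATE + retry handling mentioned above.

```python
import json
import sqlite3

N_INSTANCES = 100  # figures quoted in the thread
M_METADATA = 30


def build_db():
    """Create a toy schema holding the same metadata in both layouts."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE instances (
            id INTEGER PRIMARY KEY,
            hostname TEXT NOT NULL,
            metadata_blob TEXT NOT NULL  -- hypothetical JSON column
        );
        CREATE TABLE instance_metadata (
            instance_id INTEGER NOT NULL REFERENCES instances(id),
            meta_key TEXT NOT NULL,
            meta_value TEXT NOT NULL
        );
        """
    )
    for i in range(N_INSTANCES):
        meta = {f"key{j}": f"value{j}" for j in range(M_METADATA)}
        conn.execute(
            "INSERT INTO instances VALUES (?, ?, ?)",
            (i, f"host-{i}", json.dumps(meta)),
        )
        conn.executemany(
            "INSERT INTO instance_metadata VALUES (?, ?, ?)",
            [(i, k, v) for k, v in meta.items()],
        )
    return conn


def joined_rows(conn):
    """One-to-many join: instance columns repeat once per metadata row."""
    return conn.execute(
        "SELECT i.id, i.hostname, m.meta_key, m.meta_value "
        "FROM instances AS i "
        "JOIN instance_metadata AS m ON m.instance_id = i.id"
    ).fetchall()


def blob_rows(conn):
    """Blob layout: one row per instance, metadata decoded client-side."""
    return conn.execute(
        "SELECT id, hostname, metadata_blob FROM instances"
    ).fetchall()


conn = build_db()
joined = joined_rows(conn)
blobs = blob_rows(conn)

print(len(joined))  # N_INSTANCES * M_METADATA rows over the wire
print(len(blobs))   # N_INSTANCES rows over the wire

# The whole bag of key/value strings for one instance arrives in one piece.
first_metadata = json.loads(blobs[0][2])
print(len(first_metadata))
```

With these figures the join returns 3000 rows, each repeating the instance
columns, while the blob layout returns 100. The cost of the blob layout is
the one Mike describes: the database can no longer type, constrain, or
query the individual keys.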