On 8/12/15 1:49 PM, Sachin Manpathak wrote:
Thanks, this feedback was helpful.
Perhaps my paraphrasing was misleading. I am not running OpenStack at scale in order to see how much the DB can sustain. My observation was that the host running nova services saturates on CPU much earlier than the DB does.
You absolutely *want* a single host to be saturated *way* before the database is; the database here is a single vertical service intended to serve hundreds or thousands of horizontally scaled clients simultaneously. A single request at a time should not even be a blip in the database's view of things.



Joins could be one of the reasons. I also observed that background tasks like instance creation and resource/stats updates contend with get queries. In addition to caching optimizations, prioritizing tasks in nova could help.

Is there a nova API to fetch a list of instances without metadata? Until I find a good way to profile OpenStack code, changing the queries can be a good experiment IMO.


On Wed, Aug 12, 2015 at 8:12 AM, Dan Smith <d...@danplanet.com> wrote:

    > If OTOH we are referring to the width of the columns and the join is
    > such that you're going to get the same A identity over and over again,
    > if you join A and B you get a "wide" row with all of A and B with a very
    > large amount of redundant data sent over the wire again and again (note
    > that the database drivers available to us in Python always send all rows
    > and columns over the wire unconditionally, whether or not we fetch them
    > in application code).

    Yep, it was this. N instances times M rows of metadata each. If you pull
    100 instances and they each have 30 rows of system metadata, that's a
    lot of data, and most of it is the instance being repeated 30 times for
    each metadata row. When we first released code doing this, a prominent
    host immediately raised the red flag because their DB traffic shot
    through the roof.
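
    To put rough numbers on the N-times-M effect described above, here is a
    back-of-the-envelope sketch. The row sizes (2000 bytes per instance row,
    100 bytes per metadata row) are made-up assumptions for illustration, not
    measurements from nova.

```python
def joined_bytes(n_instances, m_metadata, instance_bytes, metadata_bytes):
    """Payload of a single joined query: every metadata row also carries
    a full copy of its instance row."""
    return n_instances * m_metadata * (instance_bytes + metadata_bytes)


def split_bytes(n_instances, m_metadata, instance_bytes, metadata_bytes):
    """Payload of two queries: each instance row is sent once, then the
    metadata rows are sent on their own."""
    return (n_instances * instance_bytes
            + n_instances * m_metadata * metadata_bytes)


if __name__ == "__main__":
    # Hypothetical sizes: 100 instances, 30 metadata rows each.
    joined = joined_bytes(100, 30, 2000, 100)
    split = split_bytes(100, 30, 2000, 100)
    print(f"joined: {joined} bytes, split: {split} bytes, "
          f"ratio: {joined / split:.1f}x")
```

    Under these assumed sizes the joined result is dominated by the repeated
    instance columns, which matches the traffic spike described above.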

    > In this case you *do* want to do the join in
    > Python to some extent, though you use the database to deliver the
    > simplest information possible to work with first; you get the full row
    > for all of the A entries, then a second query for all of B plus A's
    > primary key that can be quickly matched to that of A.

    This is what we're doing. Fetch the list of instances that match the
    filters, then for the ones that were returned, get their metadata.
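
    The two-query pattern described above can be sketched roughly as follows.
    This is only an illustration using the standard-library sqlite3 module;
    the table and column names (instances, system_metadata, meta_key,
    meta_value) and both function names are hypothetical and are not nova's
    actual schema or code.

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE instances (id INTEGER PRIMARY KEY, host TEXT, name TEXT);
        CREATE TABLE system_metadata (
            instance_id INTEGER, meta_key TEXT, meta_value TEXT);
    """)
    conn.executemany(
        "INSERT INTO instances VALUES (?, ?, ?)",
        [(1, "compute1", "vm-a"), (2, "compute1", "vm-b"),
         (3, "compute2", "vm-c")])
    conn.executemany(
        "INSERT INTO system_metadata VALUES (?, ?, ?)",
        [(i, k, v) for i in (1, 2, 3)
         for k, v in (("image_os", "linux"), ("flavor", "small"))])
    return conn


def fetch_instances_with_metadata(conn, host):
    """Fetch matching instances, then their metadata, and join in Python."""
    # Query 1: one full row per instance that matches the filter.
    instances = {}
    for inst_id, inst_host, name in conn.execute(
            "SELECT id, host, name FROM instances WHERE host = ? ORDER BY id",
            (host,)):
        instances[inst_id] = {
            "id": inst_id, "host": inst_host, "name": name, "metadata": {}}
    if not instances:
        return []

    # Query 2: metadata rows plus only the instance primary key.
    # A real implementation would need to batch very long IN lists.
    placeholders = ",".join("?" * len(instances))
    rows = conn.execute(
        "SELECT instance_id, meta_key, meta_value FROM system_metadata "
        f"WHERE instance_id IN ({placeholders})",
        list(instances))

    # Match each metadata row back to its instance by primary key.
    for instance_id, key, value in rows:
        instances[instance_id]["metadata"][key] = value
    return list(instances.values())


if __name__ == "__main__":
    for inst in fetch_instances_with_metadata(make_demo_db(), "compute1"):
        print(inst)
```

    Each instance row crosses the wire once, and the dictionary lookup on the
    primary key does the matching that the SQL join would otherwise do.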

    --Dan

    __________________________________________________________________________
    OpenStack Development Mailing List (not for usage questions)
    Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
    http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



