Cool, thanks! 

On Feb 16, 2011, at 9:30 AM, Ewan Mellor wrote:

> Dom0 CPU has to serve:
> 
> 1. All I/O for all domains, including inspection / routing of network packets 
> at the bridge.
> 2. All device emulation for non-PV domains, which is particularly expensive 
> and unavoidable during Windows boot.
> 3. Display emulation for all HVM domains, which is moderately expensive.
> 4. VNC service, any time a customer wants to see the console, which is 
> moderately expensive.
> 5. Performance metrics sampling and aggregation.
> 6. Control-plane operations.
> 
> Each individual thing isn't huge, but you've only got one domain 0 with four 
> VCPUs, and it has to serve tens of domUs.  It starts to add up.  If you put 
> something CPU-intensive in there too, such as a gunzip or some crypto, then 
> you can find yourself with 30 customer VMs trying to funnel I/O through one 
> or two dom0 CPUs, and we simply run out.
> 
> The majority of the cost of I/O is the copy of the payload.  We're doing work 
> at the moment to move that cost into the domU, so that it's accounted to the 
> domU CPU time, not dom0.  This will improve fairness between customer VMs, 
> because if the cost of I/O is inside dom0, you can't ensure fairness between 
> I/O-intensive customer VMs vs CPU-intensive ones.
> 
> Yes, the hypervisor does generally schedule dom0 and domUs similarly.  
> There's actually a boost given to dom0, so if it has a VCPU ready to run, it 
> will be scheduled over a domU, but other than that they're basically the 
> same.  The main problem is that there's only one dom0 and lots of domUs.
> 
> Cheers,
> 
> Ewan.
> 
> 
>> -----Original Message-----
>> From: Chris Behrens [mailto:chris.behr...@rackspace.com]
>> Sent: 16 February 2011 17:13
>> To: Ewan Mellor
>> Cc: Chris Behrens; Rick Harris; openstack-xenapi@lists.launchpad.net
>> Subject: Re: [Openstack-xenapi] Glance Plugin/DomU access to SR?
>> 
>> Ewan,
>> 
>> Can you explain why you say dom0 CPU is a scarce resource?  I agree that,
>> for a lot of reasons, work like this should be done in a domU, but I'm just
>> curious.  My thoughts would have been that it's not so scarce.  I know
>> there are things like the disk drivers running in the dom0 kernel doing
>> disk I/O, but I'd think that'd not be much CPU usage.
>> It'd be mostly I/O wait.  And I wouldn't think network receive in dom0
>> vs domU would cause much of a difference overall.  I thought the
>> hypervisor scheduled dom0 and domUs similarly.  Am I wrong?
>> 
>> The only thing I can think of is that, when running HVM VMs, qemu can be
>> using a lot of CPU.
>> 
>> - Chris
>> 
>> 
>> 
>> On Feb 16, 2011, at 7:12 AM, Ewan Mellor wrote:
>> 
>>> Just for summary, the advantages of having the streaming inside a
>> domU are:
>>> 
>>> 1.       You move the network receive and the image decompression /
>> decryption (if you're using that) off dom0's CPU and onto the domU's.
>> Dom0 CPU is a scarce resource, even in the new release of XenServer
>> with 4 CPUs in domain 0.  This avoids hurting customer workloads by
>> contending inside domain 0.
>>> 2.       You can easily apply network and CPU QoS to the operations
>> above (see the sketch after this list).  This also avoids hurting customer
>> workloads, by simply capping the maximum amount of work that the OpenStack
>> domU can do.
>>> 3.       You can use Python 2.6 for OpenStack, even though XenServer
>> dom0 is stuck on CentOS 5.5 (Python 2.4).
>>> 4.       You get a minor security improvement, because you get to
>> keep a network-facing service out of domain 0.
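>>
>> As a rough, untested sketch of point 2 (the host address, password and VM
>> name label below are placeholders), the standard XenAPI Python bindings
>> let you cap the OpenStack domU's CPU share and rate-limit its VIF:
>>
>>   import XenAPI
>>
>>   session = XenAPI.Session("http://xenserver-host")        # placeholder
>>   session.xenapi.login_with_password("root", "password")   # placeholder
>>
>>   vm = session.xenapi.VM.get_by_name_label("openstack-domU")[0]
>>
>>   # Credit-scheduler parameters: a low weight and a hard cap (percent of
>>   # one physical CPU) keep image work from starving customer VMs.
>>   session.xenapi.VM.set_VCPUs_params(vm, {"weight": "128", "cap": "50"})
>>
>>   # VIF rate limit; xapi's "kbps" here is kilobytes per second.
>>   for vif in session.xenapi.VM.get_VIFs(vm):
>>       session.xenapi.VIF.set_qos_algorithm_type(vif, "ratelimit")
>>       session.xenapi.VIF.set_qos_algorithm_params(vif, {"kbps": "51200"})
>>
>>   session.xenapi.session.logout()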
>>> 
>>> So, this is all fine if you're streaming direct to disk, but as you
>> say, if you want to stream VHD files you have a problem, because the
>> VHD needs to go into a filesystem mounted in domain 0.  It's not
>> possible to write from a domU into a dom0-owned filesystem, without
>> some trickery.  Here are the options as I see them:
>>> 
>>> Option A: Stream in two stages, one from Glance to domU, then from
>> domU to dom0.  The stream from domU to dom0 could just be a really
>> simple network put, and would just fit on the end of the current
>> pipeline.  You lose a bit of dom0 CPU, because of the incoming stream,
>> and it's less efficient overall, because of the two hops.  Its primary
>> advantage is that you can do most of the work inside the domU still, so
>> if you are intending to decompress and/or decrypt locally, then this
>> would likely be a win.
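>>
>> The domU-to-dom0 put really can be tiny.  A rough sketch of the sending
>> side in Python 2 (the receiver, its address and the URL are hypothetical;
>> it would just write the request body into the SR):
>>
>>   import httplib, os
>>
>>   def put_to_dom0(path, vdi_uuid, host="169.254.0.1", port=8080):
>>       # host/port/URL are whatever small receiver you run in dom0.
>>       length = os.path.getsize(path)
>>       conn = httplib.HTTPConnection(host, port)
>>       conn.putrequest("PUT", "/vhd/%s" % vdi_uuid)
>>       conn.putheader("Content-Length", str(length))
>>       conn.endheaders()
>>       src = open(path, "rb")
>>       chunk = src.read(4 * 1024 * 1024)
>>       while chunk:
>>           conn.send(chunk)     # stream in 4 MB pieces, never the whole file in RAM
>>           chunk = src.read(4 * 1024 * 1024)
>>       src.close()
>>       return conn.getresponse().status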
>>> 
>>> Option B: Stream from Glance directly into dom0.  This would be a
>> xapi plugin acting as a Glance client.  This is the simplest solution,
>> but loses all the benefits above.   I think it's the one that you're
>> suggesting below.  This leaves you with similar performance problems to
>> the ones that you suffer today on your existing architecture.  The
>> advantage here is simplicity, and it's certainly worth considering.
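>>
>> For reference, a dom0 plugin is just a Python 2.4 script dropped into
>> /etc/xapi.d/plugins that calls XenAPIPlugin.dispatch, and the domU invokes
>> it with host.call_plugin.  A skeletal, purely illustrative Glance-download
>> plugin might look like this:
>>
>>   #!/usr/bin/env python
>>   # /etc/xapi.d/plugins/glance  (illustrative, not the existing plugin)
>>   import httplib, os
>>   import XenAPIPlugin
>>
>>   def download_vhd(session, args):
>>       # args arrive as a dict of strings from host.call_plugin
>>       image_id = args["image_id"]
>>       sr_path = args["sr_path"]        # e.g. /var/run/sr-mount/<sr-uuid>
>>       conn = httplib.HTTPConnection(args["glance_host"],
>>                                     int(args["glance_port"]))
>>       conn.request("GET", "/images/%s" % image_id)   # Glance image-data URL
>>       resp = conn.getresponse()
>>       out = open(os.path.join(sr_path, "%s.vhd" % image_id), "wb")
>>       chunk = resp.read(4 * 1024 * 1024)
>>       while chunk:
>>           out.write(chunk)
>>           chunk = resp.read(4 * 1024 * 1024)
>>       out.close()
>>       return "ok"
>>
>>   if __name__ == "__main__":
>>       XenAPIPlugin.dispatch({"download_vhd": download_vhd})
>>
>> The domU side would then call session.xenapi.host.call_plugin(host_ref,
>> 'glance', 'download_vhd', args) and get the return string back.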
>>> 
>>> Option C: Run an NFS server in domain 0, and mount that inside the
>> domU.  You can then write direct to dom0's filesystem from the domU.
>> This sounds plausible, but I don't think I'd recommend it.  The load
>> on dom0 of doing this is probably no better than Options A or B, which
>> would mean that the complexity wasn't worth it.
>>> 
>>> Option D: Unpack the VHD file inside the domU, and write it through
>> the PV path.  This is probably the option that you haven't considered
>> yet.  The same VHD parsing code that we use in domain 0 is also
>> available in an easily consumable form (called libvhdio).  This can be
>> used to take a VHD file from the network and parse it, so that you can
>> write the allocated blocks directly to the VDI.  This would have all
>> the advantages above, but it adds yet another moving part to the
>> pipeline.  Also, this is going to be pretty simple if you're just using
>> VHDs as a way to handle sparseness.  If you're expecting to stream a
>> whole tree of snapshots as multiple files, and then expect all the
>> relationships between the files to get wired up correctly, then this is
>> not the solution you're looking for.  It's technically doable, but it's
>> very fiddly.
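>>
>> To make this concrete, here is a very rough sketch of the idea without
>> libvhdio: walk a single dynamic VHD's Block Allocation Table and write
>> only the allocated blocks straight to the PV-attached device (no parent
>> chains, no per-sector bitmaps; offsets are per the published VHD spec):
>>
>>   import struct
>>
>>   def copy_vhd_to_device(vhd_path, dev_path):
>>       vhd = open(vhd_path, "rb")
>>       dev = open(dev_path, "r+b")          # e.g. /dev/xvdb from VBD.plug
>>
>>       footer = vhd.read(512)               # dynamic VHDs start with a footer copy
>>       assert footer[0:8] == "conectix"
>>       dyn_offset = struct.unpack(">Q", footer[16:24])[0]
>>
>>       vhd.seek(dyn_offset)
>>       dyn = vhd.read(1024)                 # dynamic disk header
>>       assert dyn[0:8] == "cxsparse"
>>       bat_offset = struct.unpack(">Q", dyn[16:24])[0]
>>       bat_entries = struct.unpack(">I", dyn[28:32])[0]
>>       block_size = struct.unpack(">I", dyn[32:36])[0]    # usually 2 MB
>>
>>       vhd.seek(bat_offset)
>>       bat = struct.unpack(">%dI" % bat_entries, vhd.read(4 * bat_entries))
>>
>>       # Each allocated block is preceded by a sector-padded bitmap.
>>       # (Ignores the possibility of a short final block.)
>>       bitmap_sectors = (block_size / 512 / 8 + 511) / 512
>>       for i, entry in enumerate(bat):
>>           if entry == 0xFFFFFFFF:
>>               continue                     # unallocated: leave it sparse
>>           vhd.seek((entry + bitmap_sectors) * 512)
>>           dev.seek(i * block_size)
>>           dev.write(vhd.read(block_size))
>>
>>       dev.close()
>>       vhd.close()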
>>> 
>>> So, in summary:
>>> 
>>> Option A: Two hops.  Ideal if you're worried about the cost of
>> decompressing / decrypting on the host.
>>> Option B: Direct to dom0.  Ideal if you want the simplest solution.
>>> Option D: Parse the VHD.  Probably best performance.  Fiddly
>> development work required.  Not a good idea if you want to work with
>> trees of VHDs.
>>> 
>>> Where do you think you stand?  I can advise in more detail about the
>> implementation, if you have a particular option that you prefer.
>>> 
>>> Cheers.
>>> 
>>> Ewan.
>>> 
>>> 
>>> From: openstack-xenapi-
>> bounces+ewan.mellor=citrix....@lists.launchpad.net [mailto:openstack-
>> xenapi-bounces+ewan.mellor=citrix....@lists.launchpad.net] On Behalf Of
>> Rick Harris
>>> Sent: 11 February 2011 22:13
>>> To: openstack-xenapi@lists.launchpad.net
>>> Subject: [Openstack-xenapi] Glance Plugin/DomU access to SR?
>>> 
>>> We recently moved to running the compute-worker within a domU
>> instance.
>>> 
>>> We could make this move because domU can access VBDs in dom0-space by
>>> performing a VBD.plug.
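>>
>> Roughly, with the XenAPI Python bindings, that looks like the following
>> (the credentials, the name label and the VDI UUID are placeholders for
>> however the worker finds its own VM record and the target VDI):
>>
>>   import XenAPI
>>
>>   session = XenAPI.Session("http://xenserver-host")        # placeholder
>>   session.xenapi.login_with_password("root", "password")   # placeholder
>>
>>   this_vm = session.xenapi.VM.get_by_name_label("compute-domU")[0]
>>   vdi = session.xenapi.VDI.get_by_uuid("<uuid-of-the-vdi>")
>>
>>   vbd = session.xenapi.VBD.create({
>>       "VM": this_vm,
>>       "VDI": vdi,
>>       "userdevice": session.xenapi.VM.get_allowed_VBD_devices(this_vm)[0],
>>       "bootable": False,
>>       "mode": "RW",
>>       "type": "Disk",
>>       "unpluggable": True,
>>       "empty": False,
>>       "other_config": {},
>>       "qos_algorithm_type": "",
>>       "qos_algorithm_params": {},
>>   })
>>   session.xenapi.VBD.plug(vbd)   # the VDI now appears as /dev/xvdX in the domU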
>>> 
>>> The problem is that we'd like to deal with whole VHDs rather
>> than kernel,
>>> ramdisk, and partitioning (the impetus of the unified-images BP).
>>> 
>>> So, for snapshots we stream the base copy VHD held in the SR into
>> Glance,
>>> and, likewise, for restores, we stream the snapshot VHD from Glance
>> into the SR, rescan, and
>>> then spin up the instance.
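>>
>> The rescan step is just SR.scan.  Assuming a file-based SR, the streamed
>> file needs to land as <vdi-uuid>.vhd in the SR mount before the scan will
>> pick it up.  A small helper, for illustration:
>>
>>   def rescan_and_lookup(session, sr_uuid, vdi_uuid):
>>       # After the VHD is in place, SR.scan makes xapi (re)create the VDI
>>       # record so the instance can be attached to it and booted.
>>       sr = session.xenapi.SR.get_by_uuid(sr_uuid)
>>       session.xenapi.SR.scan(sr)
>>       return session.xenapi.VDI.get_by_uuid(vdi_uuid)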
>>> 
>>> The problem is: now that we're running the compute-worker in domU,
>> how can we
>>> access the SR?  Is there a way we can map it into domU space (a la
>> VBD.plug)?
>>> 
>>> The way we solved this for snapshots was by using the Glance plugin
>> and
>>> performing these operations in dom0.
>>> 
>>> So, my questions are:
>>> 
>>> 1. Are SR operations something we need to use the Glance plugin for?
>>> 
>>> 2. If we must use a dom0 plugin for this method of restore, does it
>> make sense to just do
>>> everything image related in the plugin?
>>> 
>>> -Rick
>>> 


_______________________________________________
Mailing list: https://launchpad.net/~openstack-xenapi
Post to     : openstack-xenapi@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack-xenapi
More help   : https://help.launchpad.net/ListHelp
