If I may: we've detected very poor performance executing snapshots. We think it's due to XenServer's API; I don't know how or why, but the API is very slow and runs one task at a time (if it does any parallelization, it's almost nothing).

Do you know if there's a way to improve I/O rates on the XS side? Thanks.
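For what it's worth, one way to sanity-check the "one task at a time" behavior is to watch xapi's task queue while a snapshot backup is running. A minimal sketch using the XenAPI Python bindings follows; the pool-master address and credentials are placeholders you would substitute for your own:

    import XenAPI

    # Connect to the pool master (placeholder URL and credentials).
    session = XenAPI.Session("https://xenserver-pool-master")
    session.xenapi.login_with_password("root", "password")
    try:
        # Dump every task xapi currently tracks; if snapshot/copy work is being
        # serialized, only one such task will show as pending at any moment.
        for task in session.xenapi.task.get_all_records().values():
            print("%-40s %-10s %5.1f%%" % (task["name_label"],
                                           task["status"],
                                           task["progress"] * 100))
    finally:
        session.xenapi.session.logout()

Running this in a loop while kicking off two snapshots would show whether the second one sits idle until the first finishes.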
On Mon, Dec 3, 2012 at 8:07 PM, Matthew Hartmann <mhartm...@tls.net> wrote:
> Thank you, Anthony! :)
>
> Cheers,
>
> Matthew
>
>
> Matthew Hartmann
> Systems Administrator | V: 812.378.4100 x 850 | E: mhartm...@tls.net
> TLS.NET, Inc.
> http://www.tls.net
>
>
> -----Original Message-----
> From: Anthony Xu [mailto:xuefei...@citrix.com]
> Sent: Monday, December 03, 2012 1:59 PM
> To: 'Cloudstack Developers'; cloudstack-us...@incubator.apache.org
> Subject: RE: XenServer & VM Snapshots
>
> CS 3.0.2 is too old a version.
>
> I'm pretty sure mount & copy happen on the same host in 3.0.4 and 3.0.5. If mount & copy can end up on different hosts, this issue is very likely to happen. I haven't heard of this issue from QA or users.
>
> I just checked the vmopsSnapshot plug-in for XenServer, at /etc/xapi.d/plugins, which mounts secondary storage just before sparse-dd.
>
> I recommend you upgrade to a newer version.
>
> If you still see the issue, please post the related management server log and /var/log/SMlog from the XenServer host.
>
>
> Anthony
>
>
> > -----Original Message-----
> > From: Matthew Hartmann [mailto:mhartm...@tls.net]
> > Sent: Monday, December 03, 2012 10:31 AM
> > To: cloudstack-us...@incubator.apache.org
> > Cc: 'Cloudstack Developers'
> > Subject: RE: XenServer & VM Snapshots
> >
> > Anthony:
> >
> > Thank you for the prompt and informative reply.
> >
> > > I'm pretty sure mount and copy are using the same XenServer host.
> >
> > The behavior I have witnessed with CS 3.0.2 is that it doesn't always do the mount & copy on the same host. Out of the 12 tests I've performed, only once was the mount & copy performed on the same host that the VM was running on.
> >
> > > I think the issue is the backup takes a long time because the data volume is big and the network rate is low.
> > > You can increase "BackupSnapshotWait" in the global configuration table to let the backup operation finish.
> >
> > I increased this in global settings from the default of 9 hours to 16 hours. The snapshot still doesn't complete in time; on average it copies about ~460G before it times out. I'm pretty confident the network rate isn't the bottleneck, as ISOs and imported VHDs install quickly. We have the Secondary Storage server set as the only internal site allowed to host files; I upload my ISO or VHD to the Secondary Storage server and install using the SSVM, which completes in a very timely manner. With a 1Gb network link, 1TB should copy in roughly 2 hours (if the link is saturated by the copy process); I've only found snapshotting (template creation appears to work flawlessly) to take an insanely long time to complete.
> >
> > Is there anything else I can do to increase performance, or logs I should check?
> >
> > Cheers,
> >
> > Matthew
> >
> >
> > Matthew Hartmann
> > Systems Administrator | V: 812.378.4100 x 850 | E: mhartm...@tls.net
> > TLS.NET, Inc.
> > http://www.tls.net
> >
> >
> > -----Original Message-----
> > From: Anthony Xu [mailto:xuefei...@citrix.com]
> > Sent: Monday, December 03, 2012 12:31 PM
> > To: Cloudstack Users
> > Cc: Cloudstack Developers
> > Subject: RE: XenServer & VM Snapshots
> >
> > Hi Matthew,
> >
> > Your analysis is correct except for the following:
> >
> > > I must mention that the Compute Node that ran sparse_dd and the one that mounted Secondary Storage are not always the same.
> > > It appears the Management Server is simply round-robining through the list of Compute Nodes and using the first one that is available.
> >
> > I'm pretty sure mount and copy are using the same XenServer host.
> >
> > I think the issue is the backup takes a long time because the data volume is big and the network rate is low. You can increase "BackupSnapshotWait" in the global configuration table to let the backup operation finish.
> >
> > Since CS takes advantage of XenServer's VHD image format, it uses VHD to do snapshot and clone, which requires the snapshot to be backed up through a XenServer host. The ideal solution for this issue might be to leverage the storage array's snapshot and clone functionality; then the snapshot backup is executed by the storage host, which relieves some of this limitation. Currently CS doesn't support this, but it should not be hard to support once Edison finishes the storage framework change; it should be just another storage plug-in. When CS uses the storage server's snapshot and clone functions, CS needs to consider the storage server's limits on the number of snapshots and volumes.
> >
> >
> > Anthony
> >
> >
> > From: Matthew Hartmann [mailto:mhartm...@tls.net]
> > Sent: Monday, December 03, 2012 9:08 AM
> > To: Cloudstack Users
> > Cc: Cloudstack Developers
> > Subject: XenServer & VM Snapshots
> >
> > Hello! I'm hoping someone can help me troubleshoot the following issue:
> >
> > I have a client who has a 960G data volume which contains their VM's Exchange Data Store. When starting a snapshot, I found that a process titled "sparse_dd" is started on one of my Compute Nodes. The output of "sparse_dd" is then sent through another Compute Node's xapi before being placed into the "snapshot store" on Secondary Storage. It appears that this is part of the bottleneck, as all of our systems are connected via gigabit links and it should not take 15+ hours to create a snapshot. The following is the behavior I have analyzed from within my environment:
> >
> > 1) Snapshot is started (either manually or on a schedule).
> > 2) Compute Node 1 "processes the snapshot" by exposing the VDI, from which "sparse_dd" then creates a "thin provisioned" snapshot.
> > 3) The output of sparse_dd is delivered over HTTP to xapi on Compute Node 2, where the Management Server mounted Secondary Storage.
> > 4) Compute Node 2 (receiving the snapshot via xapi) stores the snapshot in the Secondary Storage mount point.
> >
> > Based on this behavior, I have devised the following logic that I believe CloudStack is utilizing:
> >
> > 1) CloudStack creates a "snapshot VDI" via the XenServer Pool Master's API.
> > 2) CloudStack finds a Compute Node that can mount Secondary Storage.
> > 3) CloudStack finds a Compute Node that can run "sparse_dd".
> > 4) CloudStack uses the available Compute Node to output the VDI to xapi on the Compute Node that mounted Secondary Storage.
> >
> > I must mention that the Compute Node that ran sparse_dd and the one that mounted Secondary Storage are not always the same. It appears the Management Server is simply round-robining through the list of Compute Nodes and using the first one that is available.
> >
> > Does anyone have any input on the issue I'm having, or on my analysis of how CloudStack/XenServer snapshots operate?
> >
> > Thanks!
> >
> > Cheers,
> >
> > Matthew
> >
> > Matthew Hartmann
> > Systems Administrator | V: 812.378.4100 x 850 | E: mhartm...@tls.net
> > http://www.tls.net
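For reference on the mechanism discussed above: the vmopsSnapshot plug-in Anthony mentions lives in /etc/xapi.d/plugins on each XenServer host, and the management server drives it through xapi's generic plug-in call. A minimal sketch of what such a call looks like from the XenAPI Python bindings is below; the host address, credentials, function name, and argument names are illustrative placeholders, not the plug-in's actual interface:

    import XenAPI

    # Placeholder pool-master address and credentials.
    session = XenAPI.Session("https://xenserver-pool-master")
    session.xenapi.login_with_password("root", "password")
    try:
        # Pick the host that will do the secondary-storage mount and sparse_dd work.
        host_ref = session.xenapi.host.get_all()[0]

        # xapi executes /etc/xapi.d/plugins/vmopsSnapshot on that host and returns
        # its output as a string. "backupSnapshot" and the argument names below
        # are illustrative placeholders, not the plug-in's real signature.
        result = session.xenapi.host.call_plugin(
            host_ref, "vmopsSnapshot", "backupSnapshot",
            {"secondaryStorageMountPath": "nfs-server:/export/secondary",
             "snapshotUuid": "<snapshot-vdi-uuid>"})
        print(result)
    finally:
        session.xenapi.session.logout()

The point relevant to this thread is that whichever host receives the plug-in call is where secondary storage gets mounted, which is why the choice of host matters in the behavior Matthew describes.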
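A quick back-of-the-envelope check on the figures quoted in this thread (~460G copied within the 16-hour BackupSnapshotWait window, a 960G volume, gigabit links) shows the size of the gap; the 85% link-efficiency factor is an assumed round number for protocol overhead:

    # Figures quoted in the thread above (decimal units, rough numbers).
    observed_gb    = 460     # ~460G copied before the snapshot timed out
    observed_hours = 16      # BackupSnapshotWait raised from 9 to 16 hours
    volume_gb      = 960     # the client's Exchange data volume
    link_mb_s      = 125     # raw 1 Gb/s link = 125 MB/s before overhead
    efficiency     = 0.85    # assumed fraction of the link usable for payload

    observed_mb_s = observed_gb * 1000 / (observed_hours * 3600.0)
    usable_mb_s   = link_mb_s * efficiency
    hours_at_link = volume_gb * 1000 / (usable_mb_s * 3600)

    print("observed snapshot copy rate : %.1f MB/s" % observed_mb_s)   # ~8 MB/s
    print("usable gigabit link rate    : %.0f MB/s" % usable_mb_s)     # ~106 MB/s
    print("960G at link rate           : %.1f hours" % hours_at_link)  # ~2.5 hours

An order-of-magnitude gap like that (roughly 8 MB/s observed against 100+ MB/s available) is consistent with Matthew's point that the network is not the bottleneck and that the slowdown lies in the snapshot path itself.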