VM snapshot is a nice feature to have, but I think it has the same issue as volume snapshot once the VM snapshot is backed up to secondary storage.

There is a lot of room to improve vdi-copy; right now the slowness is not caused by coalesce. vdi-copy currently goes through all the layers of blktap2, and the context switches consume most of the CPU cycles. If vdi-copy could work on the VHD chain directly, it would improve a lot. If you compare the performance of vdi-copy and "vhd-util coalesce" on the same VHD chain, you'll see how much faster it can be.
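For anyone who wants to reproduce the comparison above, a rough timing harness along these lines can be run in dom0. It is only a sketch: the UUIDs and the VHD path are placeholders you would fill in, and "vhd-util coalesce" rewrites the chain it operates on, so point it only at a disposable copy of the chain.

    # Rough timing harness for the vdi-copy vs. vhd-util comparison above.
    # Run in dom0. The UUIDs and VHD path are placeholders.
    import subprocess
    import time

    VDI_UUID = "<vdi-uuid>"            # placeholder
    DEST_SR_UUID = "<sr-uuid>"         # placeholder
    VHD_LEAF = "/var/run/sr-mount/<sr-uuid>/<vdi-uuid>.vhd"  # placeholder

    def timed(label, cmd):
        start = time.time()
        subprocess.check_call(cmd)
        print("%s: %.1f seconds" % (label, time.time() - start))

    # Full copy through the blktap2 datapath.
    timed("xe vdi-copy",
          ["xe", "vdi-copy", "uuid=" + VDI_UUID, "sr-uuid=" + DEST_SR_UUID])

    # Working on the VHD chain directly, with no blktap2 in the path.
    # WARNING: coalesce modifies the chain; use a throwaway copy only.
    timed("vhd-util coalesce", ["vhd-util", "coalesce", "-n", VHD_LEAF])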
Anthony

> -----Original Message-----
> From: Mice Xia [mailto:mice_...@tcloudcomputing.com]
> Sent: Monday, December 03, 2012 11:18 AM
> To: cloudstack-dev@incubator.apache.org
> Subject: Re: XenServer & VM Snapshots
>
> Anthony,
>
> This is one of the reasons I'm working on VM snapshots on PS (primary
> storage) instead of volume snapshots.
>
> I don't think it's easy to improve vdi-copy, considering it needs to
> coalesce incremental snapshots and verify the result.
>
> Mice
>
> -----Original Message-----
> From: Anthony Xu [mailto:xuefei...@citrix.com]
> Sent: 2012-12-4 (Tuesday) 3:08
> To: cloudstack-dev@incubator.apache.org
> Subject: RE: XenServer & VM Snapshots
>
> You are right, vdi-copy is slow. We have reported this to the XenServer
> team and they are working on it, but no timeline or road map has been
> provided so far.
>
> Anthony
>
> > -----Original Message-----
> > From: Mice Xia [mailto:mice_...@tcloudcomputing.com]
> > Sent: Monday, December 03, 2012 11:05 AM
> > To: cloudstack-dev@incubator.apache.org
> > Subject: Re: XenServer & VM Snapshots
> >
> > It is slow to take a volume snapshot if your volume is huge. The
> > reason is that vdi-copy, which is used to back the snapshot up to SS
> > (secondary storage), has a performance problem.
> >
> > You can't speed up a full snapshot much. You could try increasing
> > dom0 memory, or adjust the ratio between full and incremental
> > snapshots to reduce the number of full snapshots.
> >
> > Mice
> >
> > -----Original Message-----
> > From: Matthew Hartmann [mailto:mhartm...@tls.net]
> > Sent: 2012-12-4 (Tuesday) 2:31
> > To: cloudstack-us...@incubator.apache.org
> > Cc: 'Cloudstack Developers'
> > Subject: RE: XenServer & VM Snapshots
> >
> > Anthony:
> >
> > Thank you for the prompt and informative reply.
> >
> > > I'm pretty sure mount and copy are using the same XenServer host.
> >
> > The behavior I have witnessed with CS 3.0.2 is that it doesn't always
> > do the mount & copy on the same host. Out of the 12 tests I've
> > performed, only once were the mount & copy performed on the same host
> > the VM was running on.
> >
> > > I think the issue is the backup takes a long time because the data
> > > volume is big and the network rate is low.
> > > You can increase "BackupSnapshotWait" in the global configuration
> > > table to let the backup operation finish.
> >
> > I increased this in global settings from the default of 9 hours to 16
> > hours. The snapshot still doesn't complete in time; on average it
> > copies about ~460G before it times out. I'm pretty confident the
> > network rate isn't the bottleneck, as ISOs and imported VHDs install
> > quickly. We have the Secondary Storage server set as the only internal
> > site allowed to host files; I upload my ISO or VHD to the Secondary
> > Storage server and install using the SSVM, which completes in a very
> > timely manner. With a 1Gb network link, 1TB should copy in roughly 2
> > hours (if the link is saturated by the copy process); I've only found
> > snapshotting (template creation appears to work flawlessly) to take an
> > insanely long time to complete.
> >
> > Is there anything else I can do to increase performance, or logs I
> > should check?
> >
> > Cheers,
> >
> > Matthew
> >
> > Matthew Hartmann
> > Systems Administrator | V: 812.378.4100 x 850 | E: mhartm...@tls.net
> >
> > TLS.NET, Inc.
> > http://www.tls.net
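Matthew's back-of-the-envelope figure checks out, and the numbers he reports make the gap stark: roughly 460 GB in 16 hours is only about 8 MB/s, a small fraction of gigabit capacity. A quick check, assuming the nominal 1 Gb/s = 125 MB/s raw ceiling before protocol overhead:

    # Sanity check on the transfer estimates quoted above.
    link_bytes_per_sec = 1e9 / 8               # 1 Gb/s link = 125 MB/s raw
    volume_bytes = 1e12                        # 1 TB volume

    ideal_hours = volume_bytes / link_bytes_per_sec / 3600
    print("Saturated 1Gb link: %.1f hours" % ideal_hours)      # ~2.2 hours

    # Observed: ~460 GB copied before the 16-hour timeout fired.
    observed_mb_per_sec = 460e9 / (16 * 3600) / 1e6
    print("Observed rate: %.1f MB/s" % observed_mb_per_sec)    # ~8.0 MB/s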
> > -----Original Message-----
> > From: Anthony Xu [mailto:xuefei...@citrix.com]
> > Sent: Monday, December 03, 2012 12:31 PM
> > To: Cloudstack Users
> > Cc: Cloudstack Developers
> > Subject: RE: XenServer & VM Snapshots
> >
> > Hi Matthew,
> >
> > Your analysis is correct except for the following:
> >
> > > I must mention that the same Compute Node that ran sparse_dd or
> > > mounted Secondary Storage is not always the same. It appears the
> > > Management Server is simply round-robining through the list of
> > > Compute Nodes and using the first one that is available.
> >
> > I'm pretty sure the mount and the copy are using the same XenServer
> > host.
> >
> > I think the issue is that the backup takes a long time because the
> > data volume is big and the network rate is low.
> > You can increase "BackupSnapshotWait" in the global configuration
> > table to let the backup operation finish.
> >
> > Since CS takes advantage of the XenServer VHD image format and uses
> > VHD to do snapshot and clone, it requires snapshots to be backed up
> > through a XenServer host.
> > The ideal solution for this issue might be to leverage the storage
> > server's snapshot and clone functionality. Then the snapshot backup
> > would be executed by the storage host, relieving some of the
> > limitations.
> > Currently CS doesn't support this, but it should not be hard to
> > support after Edison finishes the storage framework change; it should
> > be just another storage plug-in.
> > When CS uses the storage server's snapshot and clone functions, CS
> > needs to consider the storage server's limits on the number of
> > snapshots and the number of volumes.
> >
> > Anthony
> >
> > -----Original Message-----
> > From: Matthew Hartmann [mailto:mhartm...@tls.net]
> > Sent: Monday, December 03, 2012 9:08 AM
> > To: Cloudstack Users
> > Cc: Cloudstack Developers
> > Subject: XenServer & VM Snapshots
> >
> > Hello! I'm hoping someone can help me troubleshoot the following
> > issue:
> >
> > I have a client with a 960G data volume which contains their VM's
> > Exchange Data Store. When starting a snapshot, I found that a process
> > titled "sparse_dd" is started on one of my Compute Nodes, and that
> > this process then sends its output through another Compute Node's
> > xapi before placing it into the "snapshot store" on Secondary
> > Storage. This appears to be part of the bottleneck, as all of our
> > systems are connected via gigabit links and it should not take 15+
> > hours to create a snapshot. The following is the behavior I have
> > observed in my environment:
> >
> > 1) Snapshot is started (either manually or on schedule).
> >
> > 2) Compute Node 1 "processes the snapshot" by exposing the VDI, from
> > which "sparse_dd" then creates a "thin provisioned" snapshot.
> >
> > 3) The output of sparse_dd is delivered over HTTP to xapi on Compute
> > Node 2, where the Management Server mounted Secondary Storage.
> >
> > 4) Compute Node 2 (receiving the snapshot via xapi) stores the
> > snapshot in the Secondary Storage mount point.
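To make the host-to-host hop in steps 3 and 4 concrete, the sketch below shows the shape of such a transfer: one host's xapi streams the VDI out over HTTP and a second host's xapi receives it. This is illustrative only; the real copy is driven by sparse_dd inside dom0. The /export_raw_vdi and /import_raw_vdi handlers and their session_id/vdi parameters come from xapi's raw-VDI HTTP interface, while the hostnames, session refs, and UUIDs are placeholders.

    # Illustrative sketch of the two-host copy path in steps 3 and 4:
    # stream a VDI out of Compute Node 1's xapi and into Compute Node 2's.
    # The real path uses sparse_dd in dom0; all values below are placeholders.
    import requests

    SRC = "https://compute-node-1"
    DST = "https://compute-node-2"
    SESSION_1 = "OpaqueRef:..."        # xapi session refs (placeholders)
    SESSION_2 = "OpaqueRef:..."
    SRC_VDI = "<snapshot-vdi-uuid>"
    DST_VDI = "<destination-vdi-uuid>"

    # Stream the export so a 960G volume never has to sit in memory.
    export = requests.get("%s/export_raw_vdi" % SRC,
                          params={"session_id": SESSION_1, "vdi": SRC_VDI},
                          stream=True, verify=False)

    requests.put("%s/import_raw_vdi" % DST,
                 params={"session_id": SESSION_2, "vdi": DST_VDI},
                 data=export.iter_content(chunk_size=4 << 20),  # 4 MB chunks
                 verify=False)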
> > Based on this behavior, I have devised the following logic that I
> > believe CloudStack is using:
> >
> > 1) CloudStack creates a "snapshot VDI" via the XenServer Pool
> > Master's API.
> >
> > 2) CloudStack finds a Compute Node that can mount Secondary Storage.
> >
> > 3) CloudStack finds a Compute Node that can run "sparse_dd".
> >
> > 4) CloudStack uses the available Compute Node to output the VDI to
> > xapi on the Compute Node that mounted Secondary Storage.
> >
> > I must mention that the Compute Node that ran sparse_dd and the one
> > that mounted Secondary Storage are not always the same. It appears
> > the Management Server is simply round-robining through the list of
> > Compute Nodes and using the first one that is available.
> >
> > Does anyone have any input on the issue I'm having, or on my analysis
> > of how CloudStack/XenServer snapshots operate?
> >
> > Thanks!
> >
> > Cheers,
> >
> > Matthew
> >
> > Matthew Hartmann
> > Systems Administrator | V: 812.378.4100 x 850 | E: mhartm...@tls.net
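Anthony's suggestion upthread that array-side snapshots could be "just another storage plug-in" is worth sketching. CloudStack's storage framework is Java, so the Python below is only an illustration of the shape such a driver might take, and every class and method name in it is hypothetical. The point is that snapshot creation and backup are delegated to the storage array, so no XenServer host has to stream the bits, and that the plug-in must respect the array's snapshot and volume limits, the caveat Anthony raises.

    # Hypothetical sketch (not CloudStack's actual interface) of a driver
    # that delegates snapshot work to the storage array itself.
    class ArraySnapshotDriver(object):
        def __init__(self, array_client, max_snapshots_per_volume):
            self.array = array_client
            # Arrays cap snapshot counts, so the plug-in tracks usage.
            self.max_snapshots_per_volume = max_snapshots_per_volume

        def take_snapshot(self, volume_id):
            if self.array.snapshot_count(volume_id) >= self.max_snapshots_per_volume:
                raise RuntimeError("array snapshot limit hit for %s" % volume_id)
            # Array-side snapshot: near-instant, no data crosses the network.
            return self.array.create_snapshot(volume_id)

        def backup_snapshot(self, snapshot_id, secondary_store_url):
            # The storage host pushes the snapshot to secondary storage
            # directly, bypassing dom0, blktap2, and vdi-copy entirely.
            return self.array.replicate(snapshot_id, secondary_store_url)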