Yes, snapshots are supposed to be part of the template copy on primary storage.
On 6/6/19, 9:24 AM, "Yiping Zhang" <[email protected]> wrote:
The NFS volume definitely allows root mount and has RW permissions, as we
already see the volume mounted and the template staged on primary storage. The
volume is mounted as an NFS3 datastore in vSphere.
Volume snapshots are enabled; I can ask to have them disabled to see if
it makes any difference. I need to find out more about the NFS version and
qtree mode from our storage admin.
One thing I noticed is that when CloudStack templates are staged onto
primary storage, a snapshot is created which does not exist in the original
OVA or on secondary storage. I suppose this is the expected behavior?
Yiping
On 6/6/19, 6:59 AM, "Sergey Levitskiy" <[email protected]> wrote:
The option is 'vol options name_of_volume nosnapdir on'; however, if I
recall correctly, it is supposed to work even with the .snapshot directory visible.
Can you find out all the vol options on your NetApp volume? I would be most
concerned about:
- NFS version - NFS v4 should be disabled
- security qtree mode to be set to UNIX
- allow root mount
I am also wondering whether ACS is able to create the ROOT-XX folder, so you
might want to watch the contents of the datastore while ACS attempts the operations.
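For reference, dumping those settings on the filer might look roughly like this (a sketch
assuming a 7-mode filer and a placeholder volume name "cs_primary"; clustered ONTAP uses
the volume show / vserver nfs show command family instead):

vol options cs_primary        # full option list; check nosnapdir among others
qtree status cs_primary       # security style should show "unix"
options nfs.v4.enable         # should be "off" if only NFSv3 is expected
exportfs                      # the export should grant rw and root access to the ESXi hosts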
On 6/5/19, 11:43 PM, "Paul Angus" <[email protected]> wrote:
Hi Yiping,
do you have snapshots enabled on the NetApp filer? (it used to be
seen as a ".snapshot" subdirectory in each directory)
If so, try disabling snapshots - there used to be a bug where the
.snapshot directory would confuse CloudStack.
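If it helps, a quick way to confirm whether that directory is exposed to the hosts
(the datastore and volume names below are placeholders):

ls -la /vmfs/volumes/<primary-datastore>/                  # look for a ".snapshot" entry
ls -la /vmfs/volumes/<primary-datastore>/<template-dir>/
# on the filer (7-mode syntax), hide it if present:
vol options <volume_name> nosnapdir on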
[email protected]
www.shapeblue.com
Amadeus House, Floral Street, London WC2E 9DPUK
@shapeblue
-----Original Message-----
From: Yiping Zhang <[email protected]>
Sent: 05 June 2019 23:38
To: [email protected]
Subject: Re: Can't start systemVM in a new advanced zone deployment
Hi, Sergey:
I found more logs in vpxa.log (the ESXi hosts are using the UTC time
zone, so I was looking at the wrong time periods earlier). I have uploaded more
logs to pastebin.
From these log entries, it appears that when copying the template to a
VM, it tried to open the destination VMDK file and got a file-not-found error.
In the case where CloudStack attempted to create a system VM, the
destination VMDK file path it is looking for is
"<datastore>/<disk-name>/<disk-name>.vmdk"; see the uploaded log at
https://pastebin.com/aFysZkTy
In the case where I manually created a new VM from a (different) template
in the vCenter UI, the destination VMDK file path it is looking for is
"<datastore>/<VM-NAME>/<VM-NAME>.vmdk"; see the uploaded log at
https://pastebin.com/yHcsD8xB
So I am confused as to how the path for the destination VMDK was
determined, whether by CloudStack or by VMware, and how I ended up with this.
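One more check I can run on the ESXi host, in case it is useful: if I understand the
tool right, vmkfstools can verify whether the staged template's snapshot chain is
consistent (a sketch, using the delta disk path from the management server error):

vmkfstools -e /vmfs/volumes/afc5e946-03bfe3c2/533b6fcf3fa6301aadcc2b168f3f999a/533b6fcf3fa6301aadcc2b168f3f999a-000001.vmdk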
Yiping
On 6/5/19, 12:32 PM, "Sergey Levitskiy" <[email protected]> wrote:
Some operation logs get forwarded to the vCenter log vpxd.log. It
is not straightforward to trace, but VMware will be able to help should you
open a case with them.
On 6/5/19, 11:39 AM, "Yiping Zhang" <[email protected]> wrote:
Hi, Sergey:
During the time period when I had problems cloning the template,
there were only a few unique entries in vmkernel.log, and they were repeated
hundreds or thousands of times across all the CPU cores:
2019-06-02T16:47:00.633Z cpu9:8491061)FSS: 6751: Failed to open file 'hpilo-d0ccb15'; Requested flags 0x5, world: 8491061 [ams-ahs], (Existing flags 0x5, world: 8491029 [ams-main]): Busy
2019-06-02T16:47:49.320Z cpu1:66415)nhpsa: hpsa_vmkScsiCmdDone:6384: Sense data: error code: 0x70, key: 0x5, info:00 00 00 00 , cmdInfo:00 00 00 00 , CmdSN: 0xd5c, worldId: 0x818e8e, Cmd: 0x85, ASC: 0x20, ASCQ: 0x0
2019-06-02T16:47:49.320Z cpu1:66415)ScsiDeviceIO: 2948: Cmd(0x43954115be40) 0x85, CmdSN 0xd5c from world 8490638 to dev "naa.600508b1001c6d77d7dd6a0cc0953df1" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.
The device "naa.600508b1001c6d77d7dd6a0cc0953df1" is the local disk on this host.
Yiping
On 6/5/19, 11:15 AM, "Sergey Levitskiy" <[email protected]> wrote:
This must be specific to that environment. In full clone mode, ACS
simply calls the cloneVMTask of the vSphere API, so basically, until cloning of
that template succeeds when attempted in the vSphere client, it will keep
failing in ACS. Can you post vmkernel.log from your ESX host esx-0001-a-001?
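If you want to exercise the same clone call from a shell rather than the vSphere client,
the govc CLI can drive it (a rough sketch; govc is assumed to be installed, and the
credentials, datastore name and target VM name below are placeholders):

export GOVC_URL='https://user:password@vcenter.example.org/sdk'
export GOVC_INSECURE=1
govc vm.clone -vm 533b6fcf3fa6301aadcc2b168f3f999a -ds <primary-datastore> -on=false test-clone-01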
On 6/5/19, 8:47 AM, "Yiping Zhang" <[email protected]> wrote:
Well, I can always reproduce it in this particular vSphere setup, but in a
different ACS+vSphere environment I don't see this problem.
Yiping
On 6/5/19, 1:00 AM, "Andrija Panic" <[email protected]> wrote:
Yiping,
if you are sure you can reproduce the issue, it would be good to raise a
GitHub issue and provide as much detail as possible.
Andrija
On Wed, 5 Jun 2019 at 05:29, Yiping Zhang <[email protected]> wrote:
> Hi, Sergey:
>
> Thanks for the tip. After setting vmware.create.full.clone=false, I was
> able to create and start system VM instances. However, I feel that the
> underlying problem still exists and I am just working around it instead of
> fixing it, because in my lab CloudStack instance, with the same versions of
> ACS and vSphere, I still have vmware.create.full.clone=true and all is
> working as expected.
>
> I did some reading in the VMware docs regarding full clones vs. linked clones.
> It seems that the best practice is to use full clones for production,
> especially if there are high rates of change to the disks. So eventually
> I need to understand and fix the root cause of this issue. At least for
> now, I am over this hurdle and can move on.
>
> Thanks again,
>
> Yiping
>
> On 6/4/19, 11:13 AM, "Sergey Levitskiy" <[email protected]> wrote:
>
> Everything looks good and consistent, including all references in the VMDK
> and its snapshot. I would try these 2 routes:
> 1. Figure out what the vSphere error actually means from the vmkernel log of
> the ESX host when ACS tries to clone the template. If the same error happens
> while doing it outside of ACS, then a support case with VMware can be an option.
> 2. Try using linked clones. This can be done with this global setting and
> restarting the management server:
> vmware.create.full.clone = false
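> For example, with CloudMonkey (a sketch; the same change can also be made in the
> UI under Global Settings):
>
> update configuration name=vmware.create.full.clone value=false
> # then restart the management server, e.g.:
> systemctl restart cloudstack-management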
>
>
> On 6/4/19, 9:57 AM, "Yiping Zhang" <[email protected]> wrote:
>
> Hi, Sergey:
>
> Thanks for the help. By now, I have dropped and recreated the DB,
> re-deployed this zone multiple times, blown away primary and secondary
> storage (including all contents on them), or just deleted the template itself
> from primary storage, multiple times. Every time I ended up with the same
> error at the same place.
>
> The full management server log, from the point where I seeded the
> systemvmtemplate for VMware, through deploying a new advanced zone, enabling
> the zone to let CS create the system VMs, and finally disabling the zone to
> stop the infinite loop of trying to recreate the failed system VMs, is posted
> at pastebin:
>
> https://pastebin.com/c05wiQ3R
>
> Here are the contents of the relevant files for the template on primary
> storage:
>
> 1) /vmfs/volumes:
>
> ls -l /vmfs/volumes/
> total 2052
> drwxr-xr-x    1 root  root     8 Jan  1  1970 414f6a73-87cd6dac-9585-133ddd409762
> lrwxr-xr-x    1 root  root    17 Jun  4 16:37 42054b8459633172be231d72a52d59d4 -> afc5e946-03bfe3c2   <== this is the NFS datastore for primary storage
> drwxr-xr-x    1 root  root     8 Jan  1  1970 5cd4b46b-fa4fcff0-d2a1-00215a9b31c0
> drwxr-xr-t    1 root  root  1400 Jun  3 22:50 5cd4b471-c2318b91-8fb2-00215a9b31c0
> drwxr-xr-x    1 root  root     8 Jan  1  1970 5cd4b471-da49a95b-bdb6-00215a9b31c0
> drwxr-xr-x    4 root  root  4096 Jun  3 23:38 afc5e946-03bfe3c2
> drwxr-xr-x    1 root  root     8 Jan  1  1970 b70c377c-54a9d28a-6a7b-3f462a475f73
>
> 2) content in template dir on primary storage:
>
> ls -l /vmfs/volumes/42054b8459633172be231d72a52d59d4/533b6fcf3fa6301aadcc2b168f3f999a/
> total 1154596
> -rw-------    1 root  root        8192 Jun  3 23:38 533b6fcf3fa6301aadcc2b168f3f999a-000001-delta.vmdk
> -rw-------    1 root  root         366 Jun  3 23:38 533b6fcf3fa6301aadcc2b168f3f999a-000001.vmdk
> -rw-r--r--    1 root  root         268 Jun  3 23:38 533b6fcf3fa6301aadcc2b168f3f999a-7d5d73de.hlog
> -rw-------    1 root  root        9711 Jun  3 23:38 533b6fcf3fa6301aadcc2b168f3f999a-Snapshot1.vmsn
> -rw-------    1 root  root  2097152000 Jun  3 23:38 533b6fcf3fa6301aadcc2b168f3f999a-flat.vmdk
> -rw-------    1 root  root         518 Jun  3 23:38 533b6fcf3fa6301aadcc2b168f3f999a.vmdk
> -rw-r--r--    1 root  root         471 Jun  3 23:38 533b6fcf3fa6301aadcc2b168f3f999a.vmsd
> -rwxr-xr-x    1 root  root        1402 Jun  3 23:38 533b6fcf3fa6301aadcc2b168f3f999a.vmtx
>
> 3) *.vmdk file content:
>
> cat /vmfs/volumes/42054b8459633172be231d72a52d59d4/533b6fcf3fa6301aadcc2b168f3f999a/533b6fcf3fa6301aadcc2b168f3f999a.vmdk
> # Disk DescriptorFile
> version=1
> encoding="UTF-8"
> CID=ecb01275
> parentCID=ffffffff
> isNativeSnapshot="no"
> createType="vmfs"
>
> # Extent description
> RW 4096000 VMFS "533b6fcf3fa6301aadcc2b168f3f999a-flat.vmdk"
>
> # The Disk Data Base
> #DDB
>
> ddb.adapterType = "lsilogic"
> ddb.geometry.cylinders = "4063"
> ddb.geometry.heads = "16"
> ddb.geometry.sectors = "63"
> ddb.longContentID = "1c60ba48999abde959998f05ecb01275"
> ddb.thinProvisioned = "1"
> ddb.uuid = "60 00 C2 9b 52 6d 98 c4-1f 44 51 ce 1e 70 a9 70"
> ddb.virtualHWVersion = "13"
>
> 4) *-000001.vmdk content:
>
> cat /vmfs/volumes/42054b8459633172be231d72a52d59d4/533b6fcf3fa6301aadcc2b168f3f999a/533b6fcf3fa6301aadcc2b168f3f999a-000001.vmdk
>
> # Disk DescriptorFile
> version=1
> encoding="UTF-8"
> CID=ecb01275
> parentCID=ecb01275
> isNativeSnapshot="no"
> createType="vmfsSparse"
> parentFileNameHint="533b6fcf3fa6301aadcc2b168f3f999a.vmdk"
>
> # Extent description
> RW 4096000 VMFSSPARSE "533b6fcf3fa6301aadcc2b168f3f999a-000001-delta.vmdk"
>
> # The Disk Data Base
> #DDB
>
> ddb.longContentID = "1c60ba48999abde959998f05ecb01275"
>
>
> 5) *.vmtx content:
>
> cat /vmfs/volumes/42054b8459633172be231d72a52d59d4/533b6fcf3fa6301aadcc2b168f3f999a/533b6fcf3fa6301aadcc2b168f3f999a.vmtx
>
> .encoding = "UTF-8"
> config.version = "8"
> virtualHW.version = "8"
> nvram = "533b6fcf3fa6301aadcc2b168f3f999a.nvram"
> pciBridge0.present = "TRUE"
> svga.present = "TRUE"
> pciBridge4.present = "TRUE"
> pciBridge4.virtualDev = "pcieRootPort"
> pciBridge4.functions = "8"
> pciBridge5.present = "TRUE"
> pciBridge5.virtualDev = "pcieRootPort"
> pciBridge5.functions = "8"
> pciBridge6.present = "TRUE"
> pciBridge6.virtualDev = "pcieRootPort"
> pciBridge6.functions = "8"
> pciBridge7.present = "TRUE"
> pciBridge7.virtualDev = "pcieRootPort"
> pciBridge7.functions = "8"
> vmci0.present = "TRUE"
> hpet0.present = "TRUE"
> floppy0.present = "FALSE"
> memSize = "256"
> scsi0.virtualDev = "lsilogic"
> scsi0.present = "TRUE"
> ide0:0.startConnected = "FALSE"
> ide0:0.deviceType = "atapi-cdrom"
> ide0:0.fileName = "CD/DVD drive 0"
> ide0:0.present = "TRUE"
> scsi0:0.deviceType = "scsi-hardDisk"
> scsi0:0.fileName = "533b6fcf3fa6301aadcc2b168f3f999a-000001.vmdk"
> scsi0:0.present = "TRUE"
> displayName = "533b6fcf3fa6301aadcc2b168f3f999a"
> annotation = "systemvmtemplate-4.11.2.0-vmware"
> guestOS = "otherlinux-64"
> toolScripts.afterPowerOn = "TRUE"
> toolScripts.afterResume = "TRUE"
> toolScripts.beforeSuspend = "TRUE"
> toolScripts.beforePowerOff = "TRUE"
> uuid.bios = "42 02 f1 40 33 e8 de e5-1a c5 93 2a c9 12 47 61"
> vc.uuid = "50 02 5b d9 e9 c9 77 86-28 3e 84 00 22 2b eb d3"
> firmware = "bios"
> migrate.hostLog = "533b6fcf3fa6301aadcc2b168f3f999a-7d5d73de.hlog"
>
>
> 6) *.vmsd file content:
>
> cat /vmfs/volumes/42054b8459633172be231d72a52d59d4/533b6fcf3fa6301aadcc2b168f3f999a/533b6fcf3fa6301aadcc2b168f3f999a.vmsd
> .encoding = "UTF-8"
> snapshot.lastUID = "1"
> snapshot.current = "1"
> snapshot0.uid = "1"
> snapshot0.filename = "533b6fcf3fa6301aadcc2b168f3f999a-Snapshot1.vmsn"
> snapshot0.displayName = "cloud.template.base"
> snapshot0.description = "Base snapshot"
> snapshot0.createTimeHigh = "363123"
> snapshot0.createTimeLow = "-679076964"
> snapshot0.numDisks = "1"
> snapshot0.disk0.fileName = "533b6fcf3fa6301aadcc2b168f3f999a.vmdk"
> snapshot0.disk0.node = "scsi0:0"
> snapshot.numSnapshots = "1"
>
> 7) *-Snapshot1.vmsn content:
>
> cat /vmfs/volumes/42054b8459633172be231d72a52d59d4/533b6fcf3fa6301aadcc2b168f3f999a/533b6fcf3fa6301aadcc2b168f3f999a-Snapshot1.vmsn
>
> ҾSnapshot\?%?cfgFilet%t%.encoding = "UTF-8"
> config.version = "8"
> virtualHW.version = "8"
> nvram = "533b6fcf3fa6301aadcc2b168f3f999a.nvram"
> pciBridge0.present = "TRUE"
> svga.present = "TRUE"
> pciBridge4.present = "TRUE"
> pciBridge4.virtualDev = "pcieRootPort"
> pciBridge4.functions = "8"
> pciBridge5.present = "TRUE"
> pciBridge5.virtualDev = "pcieRootPort"
> pciBridge5.functions = "8"
> pciBridge6.present = "TRUE"
> pciBridge6.virtualDev = "pcieRootPort"
> pciBridge6.functions = "8"
> pciBridge7.present = "TRUE"
> pciBridge7.virtualDev = "pcieRootPort"
> pciBridge7.functions = "8"
> vmci0.present = "TRUE"
> hpet0.present = "TRUE"
> floppy0.present = "FALSE"
> memSize = "256"
> scsi0.virtualDev = "lsilogic"
> scsi0.present = "TRUE"
> ide0:0.startConnected = "FALSE"
> ide0:0.deviceType = "atapi-cdrom"
> ide0:0.fileName = "CD/DVD drive 0"
> ide0:0.present = "TRUE"
> scsi0:0.deviceType = "scsi-hardDisk"
> scsi0:0.fileName = "533b6fcf3fa6301aadcc2b168f3f999a.vmdk"
> scsi0:0.present = "TRUE"
> displayName = "533b6fcf3fa6301aadcc2b168f3f999a"
> annotation = "systemvmtemplate-4.11.2.0-vmware"
> guestOS = "otherlinux-64"
> toolScripts.afterPowerOn = "TRUE"
> toolScripts.afterResume = "TRUE"
> toolScripts.beforeSuspend = "TRUE"
> toolScripts.beforePowerOff = "TRUE"
> uuid.bios = "42 02 f1 40 33 e8 de e5-1a c5 93 2a c9 12 47 61"
> vc.uuid = "50 02 5b d9 e9 c9 77 86-28 3e 84 00 22 2b eb d3"
> firmware = "bios"
> migrate.hostLog = "533b6fcf3fa6301aadcc2b168f3f999a-7d5d73de.hlog"
>
>
> ------------
>
> That's all the data on the template VMDK.
>
> Much appreciate your time!
>
> Yiping
>
>
>
> On 6/4/19, 9:29 AM, "Sergey Levitskiy" <[email protected]> wrote:
>
> Have you tried deleting the template from PS and letting ACS recopy
> it again? If the issue is reproducible, we can try to look at what is wrong
> with the VMDK. Please post the contents of 533b6fcf3fa6301aadcc2b168f3f999a.vmdk,
> 533b6fcf3fa6301aadcc2b168f3f999a-000001.vmdk and
> 533b6fcf3fa6301aadcc2b168f3f999a.vmx (or their equivalents after ACS finishes
> copying the template). Also, from one of your ESX hosts, the output of:
> ls -al /vmfs/volumes
> ls -al /vmfs/volumes/*/533b6fcf3fa6301aadcc2b168f3f999a
> (again, their equivalents after ACS finishes copying the template)
>
> Can you also post the management server log, starting from the
> point where you unregister and delete the template from vCenter.
>
> On 6/4/19, 8:37 AM, "Yiping Zhang" <[email protected]> wrote:
>
> I have manually imported the OVA to vCenter and successfully cloned a
> VM instance with it, on the same NFS datastore.
>
>
> On 6/4/19, 8:25 AM, "Sergey Levitskiy" <[email protected]> wrote:
>
> I would suspect the template is corrupted on the secondary storage. You
> can try disabling/enabling the linked clone feature and see if it works the
> other way:
> vmware.create.full.clone = false
>
> Also, the systemVM template might have been generated on a newer version of
> vSphere and not be compatible with ESXi 6.5. What you can do to validate this
> is to manually deploy the OVA that is in secondary storage and try to spin up
> a VM from it directly in vCenter.
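> A command-line way to do that, if it is easier than the vCenter deploy-OVF
> wizard (a sketch assuming the govc CLI is available; the OVA path, datastore
> name and VM name below are placeholders):
>
> govc import.ova -ds <primary-datastore> -name systemvm-test /path/to/systemvmtemplate-4.11.2.0-vmware.ova
> govc vm.power -on systemvm-test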
>
>
>
> On 6/3/19, 5:41 PM, "Yiping Zhang" <[email protected]> wrote:
>
> Hi, list:
>
> I am struggling with deploying a new advanced zone using ACS 4.11.2.0 +
> vSphere 6.5 + NetApp volumes for the primary and secondary storage devices.
> The initial setup of the CS management server, seeding of the systemVM
> template, and the advanced zone deployment all went smoothly.
>
> Once I enabled the zone in the web UI, the systemVM template got
> copied/staged onto the primary storage device. But subsequent VM creations
> from this template fail with these errors:
>
>
> 2019-06-03 18:38:15,764 INFO [c.c.h.v.m.HostMO] (DirectAgent-7:ctx-d01169cb esx-0001-a-001.example.org, job-3/job-29, cmd: CopyCommand) VM 533b6fcf3fa6301aadcc2b168f3f999a not found in host cache
>
> 2019-06-03 18:38:17,017 INFO [c.c.h.v.r.VmwareResource] (DirectAgent-4:ctx-08b54fbd esx-0001-a-001.example.org, job-3/job-29, cmd: CopyCommand) VmwareStorageProcessor and VmwareStorageSubsystemCommandHandler successfully reconfigured
>
> 2019-06-03 18:38:17,128 INFO [c.c.s.r.VmwareStorageProcessor] (DirectAgent-4:ctx-08b54fbd esx-0001-a-001.example.org, job-3/job-29, cmd: CopyCommand) creating full clone from template
>
> 2019-06-03 18:38:17,657 INFO [c.c.h.v.u.VmwareHelper] (DirectAgent-4:ctx-08b54fbd esx-0001-a-001.example.org, job-3/job-29, cmd: CopyCommand) [ignored]failed toi get message for exception: Error caused by file /vmfs/volumes/afc5e946-03bfe3c2/533b6fcf3fa6301aadcc2b168f3f999a/533b6fcf3fa6301aadcc2b168f3f999a-000001.vmdk
>
> 2019-06-03 18:38:17,658 ERROR [c.c.s.r.VmwareStorageProcessor] (DirectAgent-4:ctx-08b54fbd esx-0001-a-001.example.org, job-3/job-29, cmd: CopyCommand) clone volume from base image failed due to Exception: java.lang.RuntimeException
> Message: Error caused by file /vmfs/volumes/afc5e946-03bfe3c2/533b6fcf3fa6301aadcc2b168f3f999a/533b6fcf3fa6301aadcc2b168f3f999a-000001.vmdk
>
>
>
> If I try to create "new VM from template" (533b6fcf3fa6301aadcc2b168f3f999a)
> in the vCenter UI manually, I receive exactly the same error message. The
> name of the VMDK file in the error message is a snapshot of the base disk
> image, but it is not part of the original template OVA on the secondary
> storage. So, in the process of copying the template from secondary to
> primary storage, a snapshot got created and the disk became
> corrupted/unusable.
>
> Much later in the log file, there is another error message, "failed to
> fetch any free public IP address" (for the SSVM, I think). I don't know if
> these two errors are related or if one is the root cause of the other.
>
> The full management server log is uploaded at
> https://pastebin.com/c05wiQ3R
>
> Any help or insight on what went wrong here is much appreciated.
>
> Thanks
>
> Yiping
>
--
Andrija Panić