Re: HA issues

2018-02-19 Thread Simon Weller
Andrija,


We pushed quite a few PRs on the exception and lockup issues related to Ceph in 
the agent.


We have a PR for the deletion issue. See if you have it pulled into your 
release - https://github.com/myENA/cloudstack/pull/9


- Si





From: Andrija Panic 
Sent: Saturday, February 17, 2018 1:49 PM
To: dev
Subject: Re: HA issues

Hi Sean,

(We have two threads interleaving on the libvirt lockd topic.) So, did you
manage to understand what causes the agent disconnect in most cases for
you specifically? Is there any software (CloudStack) root cause,
disregarding e.g. networking issues etc.?

Just our examples, which you probably don't have:

We had a Ceph cluster running with ACS, and any exception in librbd
would crash the JVM and the agent, but this has mostly been fixed.
Now we get e.g. an agent disconnect when ACS tries to delete a volume on Ceph
and for some reason doesn't succeed within 30 minutes (the volume deletion
fails) - then libvirt gets completely stuck (even "virsh list" doesn't work)...
so the agent gets disconnected eventually.
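
For reference, a minimal recovery sketch on an affected host (assuming a
systemd-based distro such as CentOS 7; service names may differ):

# check whether libvirtd still answers at all (bounded so we don't hang too)
timeout 10 virsh list --all || echo "libvirtd appears stuck"
# try a normal restart first; force-kill only as a last resort
systemctl restart libvirtd || { pkill -9 libvirtd; systemctl start libvirtd; }
# restart the agent so it reconnects to the management server
systemctl restart cloudstack-agent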

It would be good to get rid of agent disconnections in general, obviously
:) which is why I'm asking (you are on NFS, so I'd like to hear your
experience here).
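
For context, the lockd-based locking Sean mentions below is typically enabled
roughly like this on a KVM host (a sketch; file locations and the lockspace
setup depend on the distro and on the shared storage in use):

# /etc/libvirt/qemu.conf - make the QEMU driver use the lockd plugin
lock_manager = "lockd"

# /etc/libvirt/qemu-lockd.conf - optional indirect lockspace on shared storage
# (e.g. an NFS-backed directory visible to all hosts)
file_lockspace_dir = "/var/lib/libvirt/lockd/files"

# enable the lock daemon and restart libvirtd so the settings take effect
systemctl enable --now virtlockd
systemctl restart libvirtd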

Thanks

On 16 February 2018 at 21:52, Sean Lair  wrote:

> We were in the same situation as Nux.
>
> In our test environment we hit the issue with VMs not getting fenced and
> coming up on two hosts because of VM HA.   However, we updated some of the
> logic for VM HA and turned on libvirtd's locking mechanism.  Now we are
> working great w/o IPMI.  The locking stops the VMs from starting elsewhere,
> and everything recovers very nicely when the host starts responding again.
>
> We are on 4.9.3 and haven't started testing with 4.11 yet, but it may work
> alongside IPMI just fine - it would just affect the fencing.
> However, we *currently* prefer how we are doing it now, because if the
> agent stops responding, but the host is still up, the VMs continue running
> and no actual downtime is incurred.  Even when VM HA attempts to power on
> the VMs on another host, it just fails the power-up and the VMs continue to
> run on the "agent disconnected" host. The host goes into alarm state and
> our NOC can look into what is wrong with the agent on the host.  If IPMI was
> enabled, it sounds like it would power off the host (fence) and force
> downtime for us even if the VMs were actually running OK - and just the
> agent is unreachable.
>
> I plan on submitting our updates via a pull request at some point.  But I
> can also send the updated code to anyone that wants to do some testing
> before then.
>
> -Original Message-
> From: Marcus [mailto:shadow...@gmail.com]
> Sent: Friday, February 16, 2018 11:27 AM
> To: dev@cloudstack.apache.org
> Subject: Re: HA issues
>
> From your other emails it sounds as though you do not have IPMI
> configured, nor host HA enabled, correct? In this case, the correct thing
> to do is nothing. If CloudStack cannot guarantee the VM state (as is the
> case with an unreachable hypervisor), it should do nothing, for fear of
> causing a split brain and corrupting the VM disk (VM running on two hosts).
>
> Clustering and fencing is a tricky proposition. When CloudStack (or any
> other cluster manager) is not configured to or cannot guarantee state then
> things will simply lock up, in this case your HA VM on your broken
> hypervisor will not run elsewhere. This has been the case for a long time
> with CloudStack, HA would only start a VM after the original hypervisor
> agent came back and reported no VM is running.
>
> The new feature, from what I gather, simply adds the possibility of
> CloudStack being able to reach out and shut down the hypervisor to
> guarantee state. At that point it can start the VM elsewhere. If something
> fails in that process (IPMI unreachable, for example, or bad credentials),
> you're still going to be stuck with a VM not coming back.
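
(As a concrete illustration of that kind of out-of-band fencing, independent of
CloudStack itself, powering a host off via its BMC with ipmitool looks roughly
like this - the address and credentials below are placeholders:)

# check power state, then fence the host by powering it off out-of-band
ipmitool -I lanplus -H 10.0.0.50 -U admin -P secret chassis power status
ipmitool -I lanplus -H 10.0.0.50 -U admin -P secret chassis power off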
>
> It's the nature of the thing. I'd be wary of any HA solution that does not
> reach out and guarantee state via host or storage fencing before starting a
> VM elsewhere, as it will be making assumptions. It's entirely possible a VM
> might be unreachable or unable to access its storage for a short while, a
> new instance of the VM is started elsewhere, and the original VM comes back.
>
> On Wed, Jan 17, 2018 at 9:02 AM Nux!  wrote:
>
> > Hi Rohit,
> >
> > I've reinstalled and tested. Still no go with VM HA.
> >
> > What I did was to kernel panic that particular HV ("echo c >
> > /proc/sysrq-trigger" <- this is a proper way to simulate a crash).
> > What happened next is the HV got marked as "Alert", the VM on it
> > remained marked as "Running", and it was not migrated to another HV.
> > Once the panicked HV had booted back up, the VM rebooted and became
> > available.
> >
> > I'm running on CentOS 7 mgmt + HVs and NFS primary and secondary storage.
> > The VM has an HA-enabled service offering.
> > H

Re: Potential backward incompatibility problem in building SystemVM

2018-02-19 Thread Rohit Yadav
Khosrow,


Your argument about the ability to have a given name used as the final
artifact name is not correct for pre-4.11 versions, as that was only a specific
case/condition for the systemvm template to copy/rename and then use an
existing definition, and it did not apply to the rest of the veewee definitions
that existed in the folder.


Even if the name of the folder were systemvm64template, your build job may still
fail due to the build process and tool/environment changes. Finally, you can
always rename the build artifacts. The issue IMO is with your build job and not
the current build scripts.


The README file already mentions what arguments can be used to build templates,
but it could indeed use some improvements:

https://github.com/apache/cloudstack/blob/master/tools/appliance/README.md


Both your suggestions are okay with me; you may improve the README or send a PR
that modifies the build.sh script to handle exporting appliances to a custom name
(but as a general option, not specific to systemvmtemplate).
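
Until then, the safe pattern is the one above - build with the folder name and
rename the artifact afterwards; roughly like this (the output path and file name
below are illustrative and depend on the build.sh version in use):

cd tools/appliance
./build.sh systemvmtemplate
# rename the exported artifact to whatever the downstream job expects
mv dist/systemvmtemplate*.qcow2.bz2 dist/systemvm64template.qcow2.bz2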


- Rohit






From: Khosrow Moossavi 
Sent: Monday, February 19, 2018 3:07:41 AM
To: dev@cloudstack.apache.org
Subject: Re: Potential backward incompatibility problem in building SystemVM

Daan, Rohit

With the new packer build (4.11+) one cannot pass "blah" as the final
name of the template.
The script will look for a folder called "blah" in the appliance folder, which
does not exist. But before
packer (prior to 4.11) one could pass "blah" as the final name, because
the script would copy
"definition" to a "blah" folder and continue the script.

In our own case, for instance, we needed to change the Jenkins script
because it was failing with
"systemvm64template" as the name on 4.11.0.1.

So I guess my point is that either 1) we need to change the script to still
handle a custom name, or 2) have
this documented somewhere (appliance/README maybe) that the only accepted
name is
"systemvmtemplate".

Minor change either way.



On Sat, Feb 17, 2018 at 2:30 PM, Rohit Yadav 
wrote:

> Khosrow,
>
>
> The name 'systemvmtemplate' refers to the name of the folder; the build.sh
> script now accepts a folder that has the packer definitions, such as the
> built-in one or any other future packer-based templates. The systemvm
> template's file name is never used, for compatibility's sake - one can choose
> whatever name they want and it will be used fine as long as that name is
> correctly configured in the global settings. I don't understand the bit
> about name/compatibility.
>
>
> Historically, we used to have a 32-bit template with its definition defined in
> systemvmtemplate, and then we moved to a 64-bit template with the introduction
> of definitions in the systemvm64template folder; when we did that we added
> that constraint to remove and rename folders that are not needed/useful
> anymore.
>
>
> Wrt building, it's not backward compatible either; the build process has
> been changed from virtualbox+veewee/ruby based to packer+qemu/kvm based, so
> the old scripts/jobs are broken as well.
>
>
> - Rohit
>
> 
>
>
>
> 
> From: Khosrow Moossavi 
> Sent: Friday, February 16, 2018 5:58:59 PM
> To: dev@cloudstack.apache.org
> Subject: Potential backward incompatibility problem in building SystemVM
>
> Hi
>
> I just noticed that the changes [1] in tools/appliance/build.sh may break
> backward compatibility
> of the building process of systemvmtemplate.
>
> In the highlighted (and now removed) line, we used to have a predefined
> name, "systemvm64template",
> and if one still wants to execute "build.sh systemvm64template ..." (or any
> other name they
> want) the build will break (because of the new if condition).
>
> Was it intentional to always have "systemvmtemplate" as the name, or
> should the new if
> condition be fixed? Super simple to fix either way.
>
> if [ "systemvmtemplate" != "${appliance_build_name}" ]; then
>
> instead of:
>
> if [ "${appliance}" != "${appliance_build_name}" ]; then
>
>
> [1]
> https://github.com/apache/cloudstack/commit/3839239a21fc14a64acc18900ae303961036ef91#diff-68ae31f5f30dae8f541e26b8acbd75eeL247
>
> Khosrow Moossavi
> CloudOps
>
> rohit.ya...@shapeblue.com
> www.shapeblue.com
> 53 Chandos Place, Covent Garden, London  WC2N 4HSUK
> @shapeblue
>
>
>
>

rohit.ya...@shapeblue.com 
www.shapeblue.com
53 Chandos Place, Covent Garden, London  WC2N 4HSUK
@shapeblue
  
 



Re: HA issues

2018-02-19 Thread Andrija Panic
Hi Simon,

a big thank you for this, will have our devs check this!

Thanks!

On 19 February 2018 at 09:02, Simon Weller  wrote:

> Andrija,
>
>
> We pushed quite a few PRs on the exception and lockup issues related to
> Ceph in the agent.
>
>
> We have a PR for the deletion issue. See if you have it pulled into your
> release - https://github.com/myENA/cloudstack/pull/9
>
>
> - Si
>
>
>
>
> 
> From: Andrija Panic 
> Sent: Saturday, February 17, 2018 1:49 PM
> To: dev
> Subject: Re: HA issues
>
> Hi Sean,
>
> (we have 2 threads interleaving on the libvirt lockd..) - so, did you
> manage to understand what can cause the Agent Disconnect in most cases, for
> you specifically? Is there any software (CloudStack) root cause
> (disregarding i.e. networking issues etc)
>
> Just our examples, which you should probably not have:
>
> We had CEPH cluster running (with ACS), and there any exception in librbd
> would crash JVM and the agent, but this has been fixed mostly -
> Now get i.e. agent disconnect when ACS try to delete volume on CEPH (and
> for some reason not succeed withing 30 minutes, volume deletion fails) -
> then libvirt get's completety stuck (virsh list even dont work)...so  agent
> get's disconnect eventually.
>
> It would be good to get rid of agent disconnections in general, obviously
> :) so that is why I'm asking (you are on NFS, so would like to see your
> experience here).
>
> Thanks
>
> On 16 February 2018 at 21:52, Sean Lair  wrote:
>
> > We were in the same situation as Nux.
> >
> > In our test environment we hit the issue with VMs not getting fenced and
> > coming up on two hosts because of VM HA.   However, we updated some of
> the
> > logic for VM HA and turned on libvirtd's locking mechanism.  Now we are
> > working great w/o IPMI.  The locking stops the VMs from starting
> elsewhere,
> > and everything recovers very nicely when the host starts responding
> again.
> >
> > We are on 4.9.3 and haven't started testing with 4.11 yet, but it may
> work
> > along-side IPMI just fine - it would just have affect the fencing.
> > However, we *currently* prefer how we are doing it now, because if the
> > agent stops responding, but the host is still up, the VMs continue
> running
> > and no actual downtime is incurred.  Even when VM HA attempts to power on
> > the VMs on another host, it just fails the power-up and the VMs continue
> to
> > run on the "agent disconnected" host. The host goes into alarm state and
> > our NOC can look into what is wrong the agent on the host.  If IPMI was
> > enabled, it sounds like it would power off the host (fence) and force
> > downtime for us even if the VMs were actually running OK - and just the
> > agent is unreachable.
> >
> > I plan on submitting our updates via a pull request at some point.  But I
> > can also send the updated code to anyone that wants to do some testing
> > before then.
> >
> > -Original Message-
> > From: Marcus [mailto:shadow...@gmail.com]
> > Sent: Friday, February 16, 2018 11:27 AM
> > To: dev@cloudstack.apache.org
> > Subject: Re: HA issues
> >
> > From your other emails it sounds as though you do not have IPMI
> > configured, nor host HA enabled, correct? In this case, the correct thing
> > to do is nothing. If CloudStack cannot guarantee the VM state (as is the
> > case with an unreachable hypervisor), it should do nothing, for fear of
> > causing a split brain and corrupting the VM disk (VM running on two
> hosts).
> >
> > Clustering and fencing is a tricky proposition. When CloudStack (or any
> > other cluster manager) is not configured to or cannot guarantee state
> then
> > things will simply lock up, in this case your HA VM on your broken
> > hypervisor will not run elsewhere. This has been the case for a long time
> > with CloudStack, HA would only start a VM after the original hypervisor
> > agent came back and reported no VM is running.
> >
> > The new feature, from what I gather, simply adds the possibility of
> > CloudStack being able to reach out and shut down the hypervisor to
> > guarantee state. At that point it can start the VM elsewhere. If
> something
> > fails in that process (IPMI unreachable, for example, or bad
> credentials),
> > you're still going to be stuck with a VM not coming back.
> >
> > It's the nature of the thing. I'd be wary of any HA solution that does
> not
> > reach out and guarantee state via host or storage fencing before
> starting a
> > VM elsewhere, as it will be making assumptions. Its entirely possible a
> VM
> > might be unreachable or unable to access it storage for a short while, a
> > new instance of the VM is started elsewhere, and the original VM comes
> back.
> >
> > On Wed, Jan 17, 2018 at 9:02 AM Nux!  wrote:
> >
> > > Hi Rohit,
> > >
> > > I've reinstalled and tested. Still no go with VM HA.
> > >
> > > What I did was to kernel panic that particular HV ("echo c >
> > > /proc/sysrq-trigger" <- this is a proper way to simulate a crash).
> > >

Re: HA issues

2018-02-19 Thread Simon Weller
Also these -

https://github.com/myENA/cloudstack/pull/20/commits/1948ce5d24b87433ae9e8f4faebdfc20b56b751a


https://github.com/myENA/cloudstack/pull/12/commits






From: Andrija Panic 
Sent: Monday, February 19, 2018 5:23 AM
To: dev
Subject: Re: HA issues

Hi Simon,

a big thank you for this, will have our devs check this!

Thanks!

On 19 February 2018 at 09:02, Simon Weller  wrote:

> Andrija,
>
>
> We pushed quite a few PRs on the exception and lockup issues related to
> Ceph in the agent.
>
>
> We have a PR for the deletion issue. See if you have it pulled into your
> release - https://github.com/myENA/cloudstack/pull/9
(Pull Request #9, myENA/cloudstack - "context cleanup" by leprechau: cleanup rbd
image and rados context even if exceptions are thrown in the deletePhysicalDisk
routine)



>
>
> - Si
>
>
>
>
> 
> From: Andrija Panic 
> Sent: Saturday, February 17, 2018 1:49 PM
> To: dev
> Subject: Re: HA issues
>
> Hi Sean,
>
> (we have 2 threads interleaving on the libvirt lockd..) - so, did you
> manage to understand what can cause the Agent Disconnect in most cases, for
> you specifically? Is there any software (CloudStack) root cause
> (disregarding i.e. networking issues etc)
>
> Just our examples, which you should probably not have:
>
> We had CEPH cluster running (with ACS), and there any exception in librbd
> would crash JVM and the agent, but this has been fixed mostly -
> Now get i.e. agent disconnect when ACS try to delete volume on CEPH (and
> for some reason not succeed withing 30 minutes, volume deletion fails) -
> then libvirt get's completety stuck (virsh list even dont work)...so  agent
> get's disconnect eventually.
>
> It would be good to get rid of agent disconnections in general, obviously
> :) so that is why I'm asking (you are on NFS, so would like to see your
> experience here).
>
> Thanks
>
> On 16 February 2018 at 21:52, Sean Lair  wrote:
>
> > We were in the same situation as Nux.
> >
> > In our test environment we hit the issue with VMs not getting fenced and
> > coming up on two hosts because of VM HA.   However, we updated some of
> the
> > logic for VM HA and turned on libvirtd's locking mechanism.  Now we are
> > working great w/o IPMI.  The locking stops the VMs from starting
> elsewhere,
> > and everything recovers very nicely when the host starts responding
> again.
> >
> > We are on 4.9.3 and haven't started testing with 4.11 yet, but it may
> work
> > along-side IPMI just fine - it would just have affect the fencing.
> > However, we *currently* prefer how we are doing it now, because if the
> > agent stops responding, but the host is still up, the VMs continue
> running
> > and no actual downtime is incurred.  Even when VM HA attempts to power on
> > the VMs on another host, it just fails the power-up and the VMs continue
> to
> > run on the "agent disconnected" host. The host goes into alarm state and
> > our NOC can look into what is wrong the agent on the host.  If IPMI was
> > enabled, it sounds like it would power off the host (fence) and force
> > downtime for us even if the VMs were actually running OK - and just the
> > agent is unreachable.
> >
> > I plan on submitting our updates via a pull request at some point.  But I
> > can also send the updated code to anyone that wants to do some testing
> > before then.
> >
> > -Original Message-
> > From: Marcus [mailto:shadow...@gmail.com]
> > Sent: Friday, February 16, 2018 11:27 AM
> > To: dev@cloudstack.apache.org
> > Subject: Re: HA issues
> >
> > From your other emails it sounds as though you do not have IPMI
> > configured, nor host HA enabled, correct? In this case, the correct thing
> > to do is nothing. If CloudStack cannot guarantee the VM state (as is the
> > case with an unreachable hypervisor), it should do nothing, for fear of
> > causing a split brain and corrupting the VM disk (VM running on two
> hosts).
> >
> > Clustering and fencing is a tricky proposition. When CloudStack (or any
> > other cluster manager) is not configured to or cannot guarantee state
> then
> > things will simply lock up, in this case your HA VM on your broken
> > hypervisor will not run elsewhere. This has been the case for a long time
> > with CloudStack, HA would only start a VM after the original hypervisor
> > agent came back and reported no VM is running.
> >
> > The new feature, from what I gather, simply adds the possibility of
> > CloudStack being able to reach out and shut down the hypervisor to
> > guarantee state. At that point it can start the VM elsewhere. If
> something
> > fails in that process (IPMI unreachable, for example, or bad
> credentials),
> > you're still going to be stuck with a VM not coming back.
> >
> > It's the nature of the thing. I'd be wary of any HA solution that doe

Re: HA issues

2018-02-19 Thread Andrija Panic
Hi again Simon,

thanks for these - we also had something committed (actually the whole RBD
snap deletion logic on the Ceph side, which was initially missing):
https://github.com/apache/cloudstack/pull/1230/commits and some stuff was
also handled here.

But what we have here is, afaik, a new case, where a customer tries to delete
a large volume on Ceph (4TB in our case, or a bit smaller - it happened a few
times) and this takes time (for whatever reason) - this is during the
actual delete command sent from MGMT to AGENT, so not a "lazy delete" with a
later purge thread. The deletion process itself times out after 30 minutes
(1800 sec - I guess this is the default "wait" global parameter), and after
this libvirt just hangs (kill -9 is the only way to restart libvirtd).
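
If that 1800-second limit really is the global "wait" setting, one stopgap is
simply raising it - a sketch via CloudMonkey (the value is only an example, and
some global settings only apply after a management server restart):

cloudmonkey update configuration name=wait value=7200
systemctl restart cloudstack-management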

e.g.: the delete volume command sent on Feb 14:

2018-02-14 15:20:53,032 INFO  [kvm.storage.LibvirtStorageAdaptor]
(agentRequest-Handler-5:null) Trying to fetch storage pool
8457c284-cf5d-3979-b82e-32ea5efeb97b from libvirt
2018-02-14 15:20:53,032 DEBUG [kvm.resource.LibvirtConnection]
(agentRequest-Handler-5:null) Looking for libvirtd connection at:
qemu:///system
2018-02-14 15:20:53,041 DEBUG [kvm.storage.LibvirtStorageAdaptor]
(agentRequest-Handler-5:null) Succesfully refreshed pool
8457c284-cf5d-3979-b82e-32ea5efeb97b Capacity: 235312757125120 Used:
44773027414768 Available: 99561505730560
2018-02-14 15:20:53,190 INFO  [kvm.storage.LibvirtStorageAdaptor]
(agentRequest-Handler-5:null) Attempting to remove volume
84c12d6f-7536-429a-8994-1b860446b672 from pool
8457c284-cf5d-3979-b82e-32ea5efeb97b
2018-02-14 15:20:53,190 INFO  [kvm.storage.LibvirtStorageAdaptor]
(agentRequest-Handler-5:null) Unprotecting and Removing RBD snapshots of
image cold-storage/84c12d6f-7536-429a-8994-1b860446b672 prior to removing
the image
2018-02-14 15:20:53,202 DEBUG [kvm.storage.LibvirtStorageAdaptor]
(agentRequest-Handler-5:null) Succesfully connected to Ceph cluster at
mon..local:6789
2018-02-14 15:20:53,216 DEBUG [kvm.storage.LibvirtStorageAdaptor]
(agentRequest-Handler-5:null) Fetching list of snapshots of RBD image
cold-storage/84c12d6f-7536-429a-8994-1b860446b672
2018-02-14 15:20:53,224 DEBUG [kvm.storage.LibvirtStorageAdaptor]
(agentRequest-Handler-5:null) Succesfully unprotected and removed any
snapshots of cold-storage/84c12d6f-7536-429a-8994-1b860446b672 Continuing
to remove the RBD image
2018-02-14 15:20:53,228 DEBUG [kvm.storage.LibvirtStorageAdaptor]
(agentRequest-Handler-5:null) Succesfully closed rbd image and destroyed io
context.
2018-02-14 15:20:53,229 DEBUG [kvm.storage.LibvirtStorageAdaptor]
(agentRequest-Handler-5:null) Instructing libvirt to remove volume
84c12d6f-7536-429a-8994-1b860446b672 from pool
8457c284-cf5d-3979-b82e-32ea5efeb97b

then 30 minutes later, timeout.

2018-02-14 15:50:53,030 WARN  [c.c.a.m.AgentAttache]
(catalina-exec-4:ctx-468724bd ctx-b9984210) (logid:d23624d1) Seq
16-3001086201689154455: *Timed out on Seq 16-3001086201689154455*:  { Cmd ,
MgmtId: 90520740254323, via: 16(eq4-c2-2), Ver: v1, Flags: 100011,
[{"org.apache.cloudstack.storage.command.DeleteCommand":{"data":{"org.apache.cloudstack.storage.to.VolumeObjectTO":{"uuid":"84c12d6f-7536-429a-8994-1b860446b672","volumeType":"DATADISK","dataStore":{"org.apache.cloudstack.storage.to.PrimaryDataStoreTO":{"uuid":"8457c284-cf5d-3979-b82e-32ea5efeb97b","id":1,"poolType":"RBD","host":"mon..local","path":"cold-storage","port":6789,"url":"RBD://mon..local/cold-storage/?ROLE=Primary&STOREUUID=8457c284-cf5d-3979-b82e-32ea5efeb97b"}},"name":"PRDRMSSQL01-DATA-DR","size":1073741824000,"path":"84c12d6f-7536-429a-8994-1b860446b672","volumeId":13889,"accountId":722,"format":"RAW","provisioningType":"THIN","id":13889,"hypervisorType":"KVM"}},"wait":0}}]
}

and then 3 minutes later:

We get the agent disconnected (virsh is stuck; even "virsh list" doesn't work).
Nothing special in the libvirt logs...

After this the volume still exists on Ceph, but I believe it is later removed
again via the purge thread in ACS (I don't remember manually deleting them) -
which is actually very interesting - why does it do (or seem to do) an immediate
volume deletion, when it is later removed again (by the purge thread, I assume)?
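
For anyone hitting the same thing, the leftover image can be checked (and, if
really needed, removed) directly on the Ceph side - a sketch with the rbd CLI,
using the pool and image name from the log above:

# is the image still there, and does anything still hold it open?
rbd -p cold-storage ls | grep 84c12d6f-7536-429a-8994-1b860446b672
rbd -p cold-storage status 84c12d6f-7536-429a-8994-1b860446b672
# purge snapshots first, then remove the image (can take a long time on a 4TB volume)
rbd -p cold-storage snap purge 84c12d6f-7536-429a-8994-1b860446b672
rbd -p cold-storage rm 84c12d6f-7536-429a-8994-1b860446b672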

Cheers



On 19 February 2018 at 12:55, Simon Weller  wrote:

> Also these -
>
> https://github.com/myENA/cloudstack/pull/20/commits/
> 1948ce5d24b87433ae9e8f4faebdfc20b56b751a
>
>
> https://github.com/myENA/cloudstack/pull/12/commits
>
>
>
>
>
> 
> From: Andrija Panic 
> Sent: Monday, February 19, 2018 5:23 AM
> To: dev
> Subject: Re: HA issues
>
> Hi Simon,
>
> a big thank you for this, will have our devs check this!
>
> Thanks!
>
> On 19 February 2018 at 09:02, Simon Weller 
> wrote:
>
> > Andrija,
> >
> >
> > We pushed quite a few PRs on the exception and lockup issues related to
> > Ceph in the agent.
> >
> >
> > We have a PR for the deletion issue. See if you have it pulled into your
> > release - https://github.com/myENA/cloudstack/pull/9

Re: Potential backward incompatibility problem in building SystemVM

2018-02-19 Thread Khosrow Moossavi
Fair enough. I modified our Jenkins job to differentiate between versions.
I might open a PR to emphasize this in the README, to prevent further confusion.
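
The version switch in the job boils down to something like this (a sketch, not
the actual Jenkins script; ACS_VERSION is assumed to be whatever version string
the job already has at hand):

# pick the appliance/folder name based on the ACS version being built
case "${ACS_VERSION}" in
  4.11*|4.1[2-9]*) appliance="systemvmtemplate" ;;   # packer-based builds (4.11+)
  *)               appliance="systemvm64template" ;; # veewee-based builds (pre-4.11)
esac
cd tools/appliance && ./build.sh "${appliance}"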

Thanks





On Mon, Feb 19, 2018 at 4:35 AM, Rohit Yadav 
wrote:

> Khosrow,
>
>
> Your argument about the ability to have a given name be used as the final
> artifact name is not correct for prior 4.11 versions, as that only was a
> specific case/condition for systemvm template to copy/rename and then use
> an existing definition, and not with rest of the veewee definitions that
> existed in the folder.
>
>
> Even if the name of the folder was systemvm64template, you build job may
> still fail due to the build process and tool/environment changes. Finally,
> you can always rename the build artifacts. The issue IMO is with your build
> job and not the current build scripts.
>
>
> The README file already mentions what arguments can be used to build
> templates but can indeed get some improvements:
>
> https://github.com/apache/cloudstack/blob/master/tools/appliance/README.md
>
>
> Both your suggestions are okay with me, you may improve the README or send
> a PR that modifies the build.sh script to handle exporting appliances to
> custom name (but as a general option, not specific to systemvmtemplate).
>
>
> - Rohit
>
> 
>
>
>
> 
> From: Khosrow Moossavi 
> Sent: Monday, February 19, 2018 3:07:41 AM
> To: dev@cloudstack.apache.org
> Subject: Re: Potential backward incompatibility problem in building
> SystemVM
>
> Daan, Rohit
>
> With the new packer build (4.11+) one cannot give "blah" to be the final
> name of the template.
> The script will look for a folder called "blah" in appliance folder which
> does not exist. But in before
> packer (prior to 4.11) one can give "blah" to be the final name, because
> the script would copy
> "definition" to "blah" folder and continue the script.
>
> In our own case, for instance, we needed to change the Jenkins script
> because it was failing with
> "systemvm64template" as the name on 4.11.0.1.
>
> So I guess my point is either 1) we need to change the script to still
> handle custom name, 2) have
> this documented somewhere (applicane/README may be) that the only accepted
> name will be
> "systemvmtemplate".
>
> Minor change either way.
>
>
>
> On Sat, Feb 17, 2018 at 2:30 PM, Rohit Yadav 
> wrote:
>
> > Khosrow,
> >
> >
> > The name 'systemvmtemplate' refers to the name of the folder, the
> build.sh
> > script now accepts a folder that has the packer definitions such as the
> > built-in one or any other future packer based templates. The systemvm
> > template's file name is never used for compatibilities sake, one can
> choose
> > whatever name they want and they will be used okay as long as that name
> is
> > correctly configured in the global settings. I don't understand the bit
> > about name/compatbility.
> >
> >
> > Historically, we used to a 32bit template with its definition defined in
> > systemvmtemplate and then we moved to 64-bit template with introduction
> of
> > definitions in systemvm64template folder, and when we did that we added
> > that constraint to remove and rename folders while are not needed/useful
> > anymore.
> >
> >
> > Wrt building it's not backward compatible as well, the build process has
> > been changed from virtualbox+veewee/ruby based to packer+qemu/kvm based
> so
> > the old script/jobs are broken as well.
> >
> >
> > - Rohit
> >
> > 
> >
> >
> >
> > 
> > From: Khosrow Moossavi 
> > Sent: Friday, February 16, 2018 5:58:59 PM
> > To: dev@cloudstack.apache.org
> > Subject: Potential backward incompatibility problem in building SystemVM
> >
> > Hi
> >
> > I just noticed that the changes [1] in tools/applince/build.sh may break
> > backward compatibility
> > of the building process of systemvmtremplate.
> >
> > In the highlighted (and now removed) line, we used to have a predefined
> > name as "systemvm64template"
> > and if one still wants to execute "build.sh systemvm64template ..." (or
> any
> > other name they
> > want) the build will break (becauase of the now new if condition).
> >
> > Was this intentional to always have a new "systemvmtemplate" as the name
> or
> > the new if
> > condition should be fixed? Super simple to fix anyway.
> >
> > if [ "systemvmtemplate" != "${appliance_build_name}" ]; then
> >
> > instead of:
> >
> > if [ "${appliance}" != "${appliance_build_name}" ]; then
> >
> >
> > [1]
> > https://github.com/apache/cloudstack/commit/
> 3839239a21fc14a64acc18900ae303
> > 961036ef91#diff-68ae31f5f30dae8f541e26b8acbd75eeL247
> >
> > Khosrow Moossavi
> > CloudOps
> >
> > rohit.ya...@shapeblue.com
> > www.shapeblue.com
> > 53 Chandos Place, Covent Garden, London  WC2N 4HSUK
> > @shapeblue
> >
> >
> >
> >
>
> rohit.ya...@shapeblue.com
> www.shapeblue.com
> 53 Chandos Place, Covent Garden, London  WC2N 4HSUK
> @shapeblue
>
>