Re: Requested by a presenter at CCC in Dublin today

2015-10-09 Thread Todd Hebert
The plan is to be on 4.5 by the end of this year, but it's looking like we 
won't hit that target, so probably 4.6 early next year.

On 9 Oct 2015 12:11 am, Remi Bergsma  wrote:
Thanks Todd! Nice to find a cloud that has been running since the vmops days :-)

What's the next upgrade? 4.5.2?

Regards, Remi

Sent from my iPhone

> On 08 Oct 2015, at 21:27, Todd Hebert  wrote:
>
> One of the presenters (Remi?) asked me to post this query from the cloud 
> database on our CloudPlatform, which has been in continuous operation since 
> shortly before the name change to cloud.com some years ago (originally 
> stood up in 2010; apparently there was no version table back then).
>
> mysql> select * from version;
> +----+---------+---------------------+----------+
> | id | version | updated             | step     |
> +----+---------+---------------------+----------+
> |  1 | 2.2.1   | 2011-11-09 18:07:49 | Complete |
> |  2 | 2.2.2   | 2011-11-09 18:07:49 | Complete |
> |  3 | 2.2.4   | 2011-11-09 18:07:50 | Complete |
> |  4 | 2.2.5   | 2011-11-09 18:07:50 | Complete |
> |  5 | 2.2.6   | 2011-11-09 18:07:50 | Complete |
> |  6 | 2.2.8   | 2011-11-09 18:07:50 | Complete |
> |  7 | 2.2.9   | 2011-11-09 18:07:50 | Complete |
> |  8 | 2.2.10  | 2011-11-09 18:07:49 | Complete |
> |  9 | 2.2.11  | 2011-11-09 18:07:49 | Complete |
> | 10 | 2.2.12  | 2011-11-09 18:07:49 | Complete |
> | 11 | 2.2.13  | 2013-02-22 21:07:32 | Complete |
> | 12 | 2.2.14  | 2013-02-22 21:07:32 | Complete |
> | 13 | 3.0.0   | 2013-02-23 00:23:41 | Complete |
> | 14 | 3.0.1   | 2013-02-23 00:23:41 | Complete |
> | 15 | 3.0.2   | 2013-02-23 00:23:46 | Complete |
> | 16 | 3.0.3   | 2013-02-23 00:23:46 | Complete |
> | 17 | 3.0.4   | 2013-02-23 00:23:46 | Complete |
> | 18 | 3.0.5   | 2013-02-23 00:23:46 | Complete |
> | 19 | 3.0.6   | 2013-02-23 00:23:46 | Complete |
> | 20 | 3.0.7   | 2014-09-29 22:49:18 | Complete |
> | 21 | 4.1.0   | 2014-12-10 10:01:17 | Complete |
> | 22 | 4.2.0   | 2014-12-10 10:01:17 | Complete |
> | 23 | 4.2.1   | 2014-12-10 10:01:17 | Complete |
> | 24 | 4.3.0   | 2014-12-10 10:01:17 | Complete |
> +----+---------+---------------------+----------+
> 24 rows in set (0.00 sec)
>
>
>
> Todd Hebert
> Hosting engineer
>
> E: theb...@digiweb.ie
> A: College Business & Technology Park, Blanchardstown, Dublin 15, Ireland
> W: http://www.digiweb.ie/



[GitHub] cloudstack pull request: README: useful

2015-10-09 Thread bhaisaab
GitHub user bhaisaab opened a pull request:

https://github.com/apache/cloudstack/pull/917

README: useful



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/cloudstack 4.5-demo-ccceu

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/cloudstack/pull/917.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #917


commit b5667c266fdfe2c054ab7e452c33ad37d47b9dbe
Author: Rohit Yadav 
Date:   2015-10-09T08:31:34Z

README: useful

Signed-off-by: Rohit Yadav 




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] cloudstack pull request: README: useful

2015-10-09 Thread remibergsma
Github user remibergsma commented on the pull request:

https://github.com/apache/cloudstack/pull/917#issuecomment-146796642
  
Nice demo!





UI translation for 4.6

2015-10-09 Thread Sebastien Goasguen
Milamber reminded me that before releasing 4.6 we should include the latest 
translated strings for the UI.
If you care about translation, you might be interested in working on it in the 
coming days.

Here is where we stand:

Portuguese (Brazil)        100%
French (France)            100%

Japanese (Japan)            99%
Hungarian                   98%
Chinese (China)             93%
Dutch (Netherlands)         89%
Russian (Russia)            75%
German (Germany)            71%
Norwegian Bokmål (Norway)   70%
Korean (Korea)              66%
Spanish                     53%

Italian (Italy)             36%
Polish                      21%
Catalan                     13%
Arabic                      12%
Persian                     11%
Romanian (Romania)           9%
Chinese (Taiwan)             7%
Georgian                     5%
Thai (Thailand)              5%
Turkish (Turkey)             2%
Tatar                        2%
Indonesian                   1%

You can see that Japanese, Hungarian, Chinese and Dutch are really close to 
100%:

https://www.transifex.com/ke4qqq/CloudStack_UI/


-sebastien

[GitHub] cloudstack pull request: README: useful

2015-10-09 Thread bhaisaab
Github user bhaisaab closed the pull request at:

https://github.com/apache/cloudstack/pull/917




slow nfs = reboot all hosts (((

2015-10-09 Thread Andrei Mikhailovsky
Hello 

My issue is that whenever my NFS server becomes slow to respond, ACS just bloody 
reboots ALL host servers, not just the ones running VMs with volumes attached 
to the slow NFS server. Recently, I decided to remove some of the old 
snapshots to free up some disk space. I deleted about a dozen snapshots and 
was monitoring the NFS server for progress. At no point did the NFS server 
lose connectivity; it just became a bit slow and under load. By slow I mean 
I was still able to list files on the NFS mount point and the SSH session was 
still working okay. It was just taking a few more seconds to respond when it 
came to NFS file listings, creation, deletion, etc. However, the ACS agent 
rebooted every single host server, killing all running guests and system 
VMs. In my case, I only have two guests with volumes on the NFS server. The 
rest of the VMs are running off RBD storage. Yet all host servers were 
rebooted, even those that were not running guests with NFS volumes. 

Ever since I started using ACS, it has always been pretty dumb at correctly 
determining whether the NFS storage is still alive. I would say it has done the 
maniac "reboot everything" type of behaviour at least 5 times in the past 3 
years. So, in previous versions of ACS I simply modified 
kvmheartbeat.sh and hashed out the line with "reboot", as these reboots were 
just pissing everyone off. 

After upgrading to ACS 4.5.x, that script no longer has a reboot command, and I 
was wondering whether it is still possible to instruct the kvmheartbeat script 
not to reboot the host servers. 

Thanks for your advice. 

Andrei 


Fwd: [CentOS-virt] kvm-qemu-ev in testing

2015-10-09 Thread Nux!
Those running KVM on CentOS might be interested in the below.

Kvm-qemu-ev has a few additions compared to the stock one, such as:

Live Snapshots
Live Storage Migration
Live Snapshot Merge
Block I/O Throttling
CEPH Enablement
OpenvSwitch

More here 
https://rhsummit.files.wordpress.com/2014/04/sarathy_h_0945_red_hat_enterprise_virtualization_hypervisor.pdf

--
Sent from the Delta quadrant using Borg technology!

Nux!
www.nux.ro

----- Forwarded Message -----
> From: "Karanbir Singh" 
> To: "Discussion about the virtualization on CentOS" 
> Sent: Thursday, 8 October, 2015 00:07:46
> Subject: [CentOS-virt] kvm-qemu-ev in testing

> hi,
> 
> kvm-qemu-ev from virt7-kvm-common-release is now signed and available on
> buildlogs.centos.org for testing, the corresponding release file is
> available in the centos/7/extras/ location on buildlogs as well.
> 
> Once we have some testing, we can push and announce via
> mirror.centos.org for wider adoption.
> 
> Regards,
> 
> --
> Karanbir Singh
> +44-207-0999389 | http://www.karan.org/ | twitter.com/kbsingh
> GnuPG Key : http://www.karan.org/publickey.asc
> ___
> CentOS-virt mailing list
> centos-v...@centos.org
> https://lists.centos.org/mailman/listinfo/centos-virt


Re: slow nfs = reboot all hosts (((

2015-10-09 Thread Andrija Panic
I managed this problem the following way:
http://admintweets.com/cloudstack-disable-agent-rebooting-kvm-host/

Cheers
On Oct 9, 2015 10:21 AM, "Andrei Mikhailovsky"  wrote:



Re: slow nfs = reboot all hosts (((

2015-10-09 Thread Andrija Panic
Ah sorry you already use this approach...
On Oct 9, 2015 10:25 AM, "Andrija Panic"  wrote:



[GitHub] cloudstack pull request: CLOUDSTACK-8940: Wrong value is inserted ...

2015-10-09 Thread ke4qqq
Github user ke4qqq commented on the pull request:

https://github.com/apache/cloudstack/pull/916#issuecomment-146813771
  
So several questions: 
1. What versions of ACS does this affect?
2. What tests were failing (if any)?
3. What tests need to be added so we don't have a regression? 




[GitHub] cloudstack pull request: Dockerfile part2

2015-10-09 Thread milamberspace
Github user milamberspace commented on a diff in the pull request:

https://github.com/apache/cloudstack/pull/910#discussion_r41616136
  
--- Diff: tools/docker/Dockerfile.centos6 ---
@@ -23,15 +23,19 @@ LABEL Vendor="Apache.org" License="ApacheV2" Version="4.6.0"
 ENV PKG_URL=http://jenkins.buildacloud.org/job/package-rhel63-master/lastSuccessfulBuild/artifact/dist/rpmbuild/RPMS/x86_64
 
 # install CloudStack
-RUN yum install -y \
+RUN yum install -y nc wget \
 ${PKG_URL}/cloudstack-common-4.6.0-SNAPSHOT.el6.x86_64.rpm \
 ${PKG_URL}/cloudstack-management-4.6.0-SNAPSHOT.el6.x86_64.rpm
 
 RUN cd /etc/cloudstack/management; \
 ln -s tomcat6-nonssl.conf tomcat6.conf; \
 ln -s server-nonssl.xml server.xml; \
-ln -s log4j-cloud.xml log4j.xml
+ln -s log4j-cloud.xml log4j.xml; \
+cd /usr/share/cloudstack-common/scripts/vm/hypervisor/xenserver; \
+wget http://download.cloud.com.s3.amazonaws.com/tools/vhd-util
--- End diff --

Perhaps instead of these last 2 lines:
wget -O /usr/share/cloudstack-common/scripts/vm/hypervisor/xenserver/vhd-util http://download.cloud.com.s3.amazonaws.com/tools/vhd-util
(one line)




Re: slow nfs = reboot all hosts (((

2015-10-09 Thread Wei ZHOU
In 4.5, it has been changed from 'reboot' to 'echo b > /proc/sysrq-trigger'.
You can hash out that line and test it.
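A minimal sketch of that edit in Python, assuming the 4.5 script still contains a literal 'echo b > /proc/sysrq-trigger' line. The script path below is an assumption (check where kvmheartbeat.sh actually lives on your hosts), and hash_out_sysrq_reboot and patch_script are hypothetical helper names. Instead of a bare '#', it prefixes the line with ': #' (a shell no-op followed by a comment), so that if the reboot were the only command in an if/then branch, bash would not reject the branch as empty. The reboot is presumably how a host fences itself when it cannot write its heartbeat, so disabling it may affect HA; try it on one host first.

```python
import re

# Assumed location (hypothetical for your install): verify before running.
DEFAULT_SCRIPT = "/usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/kvmheartbeat.sh"

# An active (not yet disabled) sysrq reboot line, with its leading indentation.
SYSRQ_REBOOT = re.compile(r"^(\s*)(echo\s+b\s*>\s*/proc/sysrq-trigger.*)$")


def hash_out_sysrq_reboot(script_text: str) -> tuple[str, int]:
    """Return (patched_text, number_of_lines_disabled)."""
    out, count = [], 0
    for line in script_text.splitlines(keepends=True):
        body = line.rstrip("\n")
        match = SYSRQ_REBOOT.match(body)
        if match:
            indent, command = match.groups()
            ending = line[len(body):]  # keep the original newline, if any
            # ':' is a shell no-op, so an if/then branch never ends up empty.
            out.append(f"{indent}: # {command}{ending}")
            count += 1
        else:
            out.append(line)
    return "".join(out), count


def patch_script(path: str = DEFAULT_SCRIPT) -> int:
    """Patch the script in place, keeping a .orig backup; return lines changed."""
    with open(path) as f:
        original = f.read()
    patched, count = hash_out_sysrq_reboot(original)
    if count:
        with open(path + ".orig", "w") as f:
            f.write(original)
        with open(path, "w") as f:
            f.write(patched)
    return count
```

Running patch_script() as root rewrites the file in place and leaves a kvmheartbeat.sh.orig copy next to it; running it a second time changes nothing, because an already-disabled line no longer matches.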



2015-10-09 10:21 GMT+01:00 Andrei Mikhailovsky :



[GitHub] cloudstack pull request: CLOUDSTACK-8941: fix NPE when migrate vm ...

2015-10-09 Thread ustcweizhou
GitHub user ustcweizhou opened a pull request:

https://github.com/apache/cloudstack/pull/918

CLOUDSTACK-8941: fix NPE when migrate vm to other zone-wide pools the 
second time

This happens because pod_id is set to NULL the first time I migrate the 
instance to a zone-wide (not cluster-wide) pool.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ustcweizhou/cloudstack NPE-storage-migration

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/cloudstack/pull/918.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #918


commit ee7348b5d79f8c9cc187adbf647574e9c404c5a0
Author: Wei Zhou 
Date:   2015-10-09T11:20:41Z

CLOUDSTACK-8941: fix NPE when migrate vm to other zone-wide pools the 
second time






[GitHub] cloudstack pull request: CLOUDSTACK-7985: assignVM in Advanced zon...

2015-10-09 Thread ustcweizhou
Github user ustcweizhou commented on the pull request:

https://github.com/apache/cloudstack/pull/844#issuecomment-146839857
  
@wilderrodrigues thanks for your testing. I believe this PR will not impact 
the current network/VR/storage code.
It will only change the owner of a VM, and add the VM to a network/security 
group if applicable.




Re: slow nfs = reboot all hosts (((

2015-10-09 Thread Nux!
Hello,

Instead of commenting out 'echo b > /proc/sysrq-trigger', which also disables 
your HA, perhaps there's a way to tweak the timeouts to be more generous with 
lazy NFS servers.

Can you go through the logs and see what is happening before the reboot? I am 
not sure exactly which timeout the script cares about; it's worth investigating.
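To illustrate the kind of tuning suggested here, a minimal, hypothetical sketch (not the logic of kvmheartbeat.sh itself, whose relevant timeout has not been identified in this thread): the host only concludes that its storage is gone after several consecutive failed heartbeat probes spaced out in time, so one slow write does not trigger a reboot. The names probe, max_failures and retry_interval_s are illustrative assumptions; probe stands in for something like "write a timestamp to the heartbeat file within N seconds".

```python
import time
from typing import Callable


def storage_presumed_dead(
    probe: Callable[[], bool],
    max_failures: int = 5,
    retry_interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True only after max_failures consecutive failed probes.

    probe() is a hypothetical check that returns True if a heartbeat write
    completed within its own deadline. Any success returns False at once,
    so storage that is slow but still answering is never declared dead.
    """
    for attempt in range(1, max_failures + 1):
        if probe():
            return False
        if attempt < max_failures:
            sleep(retry_interval_s)  # give a loaded NFS server time to catch up
    return True


if __name__ == "__main__":
    # Simulated probes: two failed writes, then one that succeeds.
    results = iter([False, False, True])
    print(storage_presumed_dead(lambda: next(results), sleep=lambda s: None))  # prints False
```

With these defaults a host would tolerate roughly (max_failures - 1) * retry_interval_s = 40 seconds of failed probes, plus the time spent inside the probes themselves, before fencing. Raising either value makes the check more forgiving of slow storage, at the cost of slower HA recovery when the storage really is gone.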

Lucian


----- Original Message -----
> From: "Andrija Panic" 
> To: dev@cloudstack.apache.org
> Sent: Friday, 9 October, 2015 10:25:05
> Subject: Re: slow nfs = reboot all hosts (((



[GitHub] cloudstack pull request: CLOUDSTACK-7985: assignVM in Advanced zon...

2015-10-09 Thread miguelaferreira
Github user miguelaferreira commented on the pull request:

https://github.com/apache/cloudstack/pull/844#issuecomment-146845866
  
@ustcweizhou may I ask whether you believe the change in this PR will 
not impact the areas you mention because you've tested it (be it manually), or 
because you assume that this change is isolated from those areas of the system?

We have had production issues in the past due to "isolated" changes to 
plugins for hypervisors we don't even use. Hence the urge for testing!




Re: [SUSPECTED SPAM]new features in 4.6 ?

2015-10-09 Thread Dustin Wright
Raja,

Thank you for the suggestion. I will test and report back.

Dustin

Sent from my iPhone

> On Oct 9, 2015, at 1:09 AM, Raja Pullela  wrote:
> 
> Dustin, the changes to support VMware 6.0 are in 4.6, I think. Can you test 
> the latest 4.6 build to see if it addresses your need?
> 
> Raja
> -----Original Message-----
> From: Dustin Wright [mailto:dwri...@untangledtechnology.com]
> Sent: Thursday, October 8, 2015 6:17 PM
> To: dev@cloudstack.apache.org
> Subject: Re: [SUSPECTED SPAM]new features in 4.6 ?
> 
> Any chance VMware vCenter 6 support will be added? We're limited to 5.5 and I 
> need the improved OS X support in 6.
> 
> Thanks for everyone's hard work!
> 
> Dustin
> 
>> On Oct 8, 2015, at 2:56 AM, Raja Pullela  wrote:
>> 
>> The following are listed under 4.6.0 on the cwiki roadmap page:
>> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Roadmap 
>> 1) Improve Object Storage, CloudStack-8640, Wido
>> 2) Snapshot Improvements, CloudStack-8663, Anshul
>> 3) Deploy user Instance from VM Snapshot, CloudStack-8676, Sateesh
>> 4) Netscaler Integration, CloudStack-8672, CloudStack-8673, Rajesh
>> 5) Billing Quota, CloudStack-8592, Abhinandan, Rohit
>> 6) Improve SAML plugin, CloudStack-8457, Rohit
>> 7) LDAP Improvements - Trust AD and Auto Import, CloudStack-8647, Rajani, 
>> Sarath (even though the page shows 4.7, I know the code has been merged to 
>> master).
>> 8) Docker/Containers, No JIRA ticket, Pdion
>> 9) iSCSI and HA support in Hyper-V, CloudStack-8444, Anshul
>> 10) Support for non-US keyboards in Console Proxy, CloudStack-8665, Anshul
>> 11) QA/CI Environment, No JIRA Ticket, Pdion, Bharat
>> 
>> -----Original Message-----
>> From: Sebastien Goasguen [mailto:run...@gmail.com] 
>> Sent: Thursday, October 8, 2015 4:09 AM
>> To: dev@cloudstack.apache.org
>> Subject: [SUSPECTED SPAM]new features in 4.6 ?
>> 
>> Can you guys help me with a few new features in 4.6, or even from the last year?
>> 
>> I lost track but want to mention some of them in tomorrow’s talk
>> 
>> thanks
>> 
>> -sebastien


[GitHub] cloudstack pull request: CLOUDSTACK-8879: Depend in rados-java 0.2...

2015-10-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/cloudstack/pull/889




[GitHub] cloudstack pull request: CLOUDSTACK-8879: Depend in rados-java 0.2...

2015-10-09 Thread remibergsma
Github user remibergsma commented on the pull request:

https://github.com/apache/cloudstack/pull/889#issuecomment-146861091
  
Agree to merge this small bugfix. Thanks for running the lifecycle test to 
be sure.




Build failed in Jenkins: build-master-jdk18 #363

2015-10-09 Thread jenkins
See 

Changes:

[wido] CLOUDSTACK-8879: Depend in rados-java 0.2.0

--
[...truncated 398 lines...]
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.029 sec - in com.cloud.utils.UuidUtilsTest
Running com.cloud.utils.crypto.RSAHelperTest
2015-10-09 13:08:13,522 INFO  [utils.crypt.RSAHelper] (main:) [ignored]error during public key encryption: Unsupported format
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.164 sec - in com.cloud.utils.crypto.RSAHelperTest
Running com.cloud.utils.crypto.EncryptionSecretKeyCheckerTest
2015-10-09 13:08:13,896 DEBUG [utils.crypt.EncryptionSecretKeyChecker] (main:) Encryption Type: null
2015-10-09 13:08:13,897 DEBUG [utils.crypt.EncryptionSecretKeyChecker] (main:) Encryption Type: file
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.089 sec - in com.cloud.utils.crypto.EncryptionSecretKeyCheckerTest
Running com.cloud.utils.PropertiesUtilsTest
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.029 sec - in com.cloud.utils.PropertiesUtilsTest
Running com.cloud.utils.exception.ExceptionUtilTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.001 sec - in com.cloud.utils.exception.ExceptionUtilTest
Running com.cloud.utils.storage.QCOW2UtilsTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.002 sec - in com.cloud.utils.storage.QCOW2UtilsTest
Running com.cloud.utils.encoding.UrlEncoderTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.003 sec - in com.cloud.utils.encoding.UrlEncoderTest
Running com.cloud.utils.UriUtilsTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.06 sec - in com.cloud.utils.UriUtilsTest
Running com.cloud.utils.HttpUtilsTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.054 sec - in com.cloud.utils.HttpUtilsTest
Running com.cloud.utils.backoff.impl.ConstantTimeBackoffTest
2015-10-09 13:08:14,183 INFO  [backoff.impl.ConstantTimeBackoff] (Thread-1:) Thread Thread-1 interrupted while waiting for retry
2015-10-09 13:08:14,286 DEBUG [backoff.impl.ConstantTimeBackoffTest] (main:) thread started
2015-10-09 13:08:14,286 DEBUG [backoff.impl.ConstantTimeBackoffTest] (Thread-2:) before
2015-10-09 13:08:14,389 DEBUG [backoff.impl.ConstantTimeBackoffTest] (main:) testing wakeup
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.317 sec - in com.cloud.utils.backoff.impl.ConstantTimeBackoffTest
Running com.cloud.utils.ProcessUtilTest
2015-10-09 13:08:14,390 INFO  [backoff.impl.ConstantTimeBackoff] (Thread-2:) Thread Thread-2 interrupted while waiting for retry
2015-10-09 13:08:14,390 DEBUG [backoff.impl.ConstantTimeBackoffTest] (Thread-2:) after
2015-10-09 13:08:14,396 DEBUG [cloud.utils.ProcessUtil] (main:) environment.properties could not be opened
2015-10-09 13:08:14,401 DEBUG [cloud.utils.ProcessUtil] (main:) environment.properties could not be opened
2015-10-09 13:08:14,401 DEBUG [cloud.utils.ProcessUtil] (main:) Executing: bash -c ps -p 123456 
2015-10-09 13:08:14,472 DEBUG [cloud.utils.ProcessUtil] (main:) Exit value is 1
2015-10-09 13:08:14,473 DEBUG [cloud.utils.ProcessUtil] (main:)   PID TTY      TIME CMD
2015-10-09 13:08:14,474 DEBUG [cloud.utils.ProcessUtil] (main:) Executing: bash -c echo $PPID 
2015-10-09 13:08:14,480 DEBUG [cloud.utils.ProcessUtil] (main:) Execution is successful.
2015-10-09 13:08:14,509 DEBUG [cloud.utils.ProcessUtil] (main:) environment.properties could not be opened
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.117 sec - in com.cloud.utils.ProcessUtilTest
Running com.cloud.utils.PasswordGeneratorTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.004 sec - in com.cloud.utils.PasswordGeneratorTest
Running com.cloud.utils.rest.HttpUriRequestBuilderTest
Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.054 sec - in com.cloud.utils.rest.HttpUriRequestBuilderTest
Running com.cloud.utils.rest.HttpStatusCodeHelperTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0 sec - in com.cloud.utils.rest.HttpStatusCodeHelperTest
Running com.cloud.utils.rest.RESTServiceConnectorTest
2015-10-09 13:08:14,808 DEBUG [utils.rest.RESTServiceConnector] (main:) Executing retrieve object on /somepath
2015-10-09 13:08:14,809 DEBUG [utils.rest.BasicRestClient] (main:) Executig GET request on https://localhost:443/somepath
2015-10-09 13:08:14,819 DEBUG [utils.rest.RESTServiceConnector] (main:) Executed request: GET /somepath HTTP/1.1 status was HTTP/1.1 200 OK
2015-10-09 13:08:14,820 DEBUG [utils.rest.RESTServiceConnector] (main:) Response entity: [{somethig_not_type : "WrongType"}]
2015-10-09 13:08:14,856 DEBUG [utils.rest.BasicRestClient] (main:) Closing HTTP connection
2015-10-09 13:08:14,876 DEBUG [utils.rest.RESTServiceConnector] (main:) Executing create object on /somepath
2

[GitHub] cloudstack pull request: CLOUDSTACK-8923: Do not send zoneId with ...

2015-10-09 Thread remibergsma
Github user remibergsma commented on the pull request:

https://github.com/apache/cloudstack/pull/911#issuecomment-14684
  
Hi @borisroman any update on this? :-)




Re: Build failed in Jenkins: build-master-jdk18 #363

2015-10-09 Thread Boris Schrijver
It fails because checkVolumeFileForActivityTest fails randomly on JDK 1.8:

org.apache.cloudstack.utils.hypervisor.HypervisorUtilsTest.checkVolumeFileForActivityTest(HypervisorUtilsTest.java:70)
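Because the failure is intermittent, comparing the test summaries of several builds is one way to confirm it. A minimal sketch, assuming the Maven Surefire per-class summary format visible in the log above ("Tests run: ..., Failures: ..., Errors: ..., Skipped: ..., Time elapsed: ... sec - in <class>"). The regex also tolerates extra text between the elapsed time and "- in", such as the "<<< FAILURE!" marker Surefire typically prints for failing classes, and lines are joined first because the archive wraps them. failing_classes and possibly_flaky are hypothetical names.

```python
import re

# Surefire per-class summary, e.g.
# "Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.029 sec - in com.cloud.utils.UuidUtilsTest"
SUMMARY = re.compile(
    r"Tests run: (\d+), Failures: (\d+), Errors: (\d+), Skipped: (\d+), "
    r"Time elapsed: [\d.]+ s(?:ec)?[^-]*-\s*in\s+(\S+)"
)


def failing_classes(log_text: str) -> list[tuple[str, int, int]]:
    """Return (test_class, failures, errors) for every class that did not pass."""
    flat = " ".join(log_text.split())  # undo line wrapping from the mail archive
    found = []
    for m in SUMMARY.finditer(flat):
        failures, errors = int(m.group(2)), int(m.group(3))
        if failures or errors:
            found.append((m.group(5), failures, errors))
    return found


def possibly_flaky(logs: list[str]) -> set[str]:
    """Classes that failed in at least one run but not in every run."""
    failed_per_run = [{cls for cls, _, _ in failing_classes(log)} for log in logs]
    ever_failed = set().union(*failed_per_run)
    return {cls for cls in ever_failed
            if not all(cls in run for run in failed_per_run)}
```

Fed the console output of a few consecutive build-master-jdk18 runs, possibly_flaky should list HypervisorUtilsTest if it only fails some of the time, whereas a class that fails in every run points to a genuine regression instead.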

-- 

Met vriendelijke groet / Kind regards,

Boris Schrijver

PCextreme B.V.

http://www.pcextreme.nl/contact
Tel direct: +31 6 33784542

> 
> On October 9, 2015 at 3:02 PM jenk...@cloudstack.org wrote:
> (Thread-2:) Thread Thread-2 interrupted while waiting for retry
> 2015-10-09 13:08:14,390 DEBUG [backoff.impl.ConstantTimeBackoffTest]
> (Thread-2:) after
> 2015-10-09 13:08:14,396 DEBUG [cloud.utils.ProcessUtil] (main:)
> environment.properties could not be opened
> 2015-10-09 13:08:14,401 DEBUG [cloud.utils.ProcessUtil] (main:)
> environment.properties could not be opened
> 2015-10-09 13:08:14,401 DEBUG [cloud.utils.ProcessUtil] (main:) Executing:
> bash -c ps -p 123456
> 2015-10-09 13:08:14,472 DEBUG [cloud.utils.ProcessUtil] (main:) Exit value
> is 1
> 2015-10-09 13:08:14,473 DEBUG [cloud.utils.ProcessUtil] (main:) PID TTY
> TIME CMD
> 2015-10-09 13:08:14,474 DEBUG [cloud.utils.ProcessUtil] (main:) Executing:
> bash -c echo $PPID
> 2015-10-09 13:08:14,480 DEBUG [cloud.utils.ProcessUtil] (main:) Execution
> is successful.
> 2015-10-09 13:08:14,509 DEBUG [cloud.utils.ProcessUtil] (main:)
> environment.properties could not be opened
> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.117 sec
> - in com.cloud.utils.ProcessUtilTest
> Running com.cloud.utils.PasswordGeneratorTest
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.004 sec
> - in com.cloud.utils.PasswordGeneratorTest
> Running com.cloud.utils.rest.HttpUriRequestBuilderTest
> Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.054 sec
> - in com.cloud.utils.rest.HttpUriRequestBuilderTest
> Running com.cloud.utils.rest.HttpStatusCodeHelperTest
> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0 sec - in
> com.cloud.utils.rest.HttpStatusCodeHelperTest
> Running com.cloud.utils.re

[GitHub] cloudstack pull request: Local cloudstack

2015-10-09 Thread sarathkouk
GitHub user sarathkouk opened a pull request:

https://github.com/apache/cloudstack/pull/919

Local cloudstack

LDAP: Auto Import and Trust AD 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sarathkouk/cloudstack local_cloudstack

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/cloudstack/pull/919.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #919


commit c07764b0811dde3c2976aa958d6f2b90affb918f
Author: sarath 
Date:   2015-10-09T14:23:04Z

test/integration/smoke/AUTOMATIONConfigfiles/
'LDAP: Auto Import and Trust AD' automation

commit 30829073b0e00f7b892799b92b2f35e02bd93114
Author: sarath 
Date:   2015-10-09T14:29:19Z

'LDAP:Auto Import and Trust AD' automation 2






[GitHub] cloudstack pull request: Add all tests in /test/integration/smoke ...

2015-10-09 Thread runseb
GitHub user runseb opened a pull request:

https://github.com/apache/cloudstack/pull/920

Add all tests in /test/integration/smoke to Travis run

This modifies the .travis.yml file to include the missing tests in 
/test/integration/smoke, expanding the coverage of the Travis runs.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/runseb/cloudstack master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/cloudstack/pull/920.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #920


commit 3b526b3d13ddbdbc281fe52d3f0c08ead41a592a
Author: runseb 
Date:   2015-10-09T15:14:08Z

Add all tests in /test/integration/smoke to Travis run






Re: slow nfs = reboot all hosts (((

2015-10-09 Thread Andrei Mikhailovsky
Thanks guys, I'm not sure how I missed that. Probably the coffee hadn't 
kicked in yet ))) 

Anyway, am I right in saying that the host server reboot is now forced 
without stopping the services or unmounting filesystems, potentially leaving 
open and unsynced data, etc.? 

Isn't this rather bad and dangerous to do simply because one of possibly 
many NFS servers is slow/unresponsive? Not only that, does the heartbeat also 
reboot the servers that are not running VMs with NFS volumes? In my case it 
just rebooted every single host server. 

Very worrying indeed. 

Andrei 


- Original Message -

From: "Nux!"  
To: dev@cloudstack.apache.org 
Sent: Friday, 9 October, 2015 12:58:19 PM 
Subject: Re: slow nfs = reboot all hosts ((( 

Hello, 

Instead of commenting out 'echo b > /proc/sysrq-trigger', which also disables 
your HA, perhaps there's a way to tweak the timeouts to be more generous with 
lazy NFS servers. 

Can you go through the logs and see what is happening before the reboot? I am 
not sure exactly which timeout the script cares about; it's worth investigating. 

Lucian 

-- 
Sent from the Delta quadrant using Borg technology! 

Nux! 
www.nux.ro 

- Original Message - 
> From: "Andrija Panic"  
> To: dev@cloudstack.apache.org 
> Sent: Friday, 9 October, 2015 10:25:05 
> Subject: Re: slow nfs = reboot all hosts ((( 

> I managed this problem the following way: 
> http://admintweets.com/cloudstack-disable-agent-rebooting-kvm-host/ 
> 
> Cheers 
> On Oct 9, 2015 10:21 AM, "Andrei Mikhailovsky"  wrote: 
> 
>> Hello 
>> 
>> My issue is that whenever my NFS server becomes slow to respond, ACS just 
>> bloody reboots ALL host servers, not just the ones running VMs with 
>> volumes attached to the slow NFS server. Recently, I decided to remove 
>> some of the old snapshots to free up some disk space. I deleted about a 
>> dozen snapshots and was monitoring the NFS server for progress. At no 
>> point did the NFS server lose connectivity; it just became a bit slow 
>> and under load. By slow I mean I was still able to list files on the NFS 
>> mount point and the SSH session was still working okay. It was just taking 
>> a few more seconds to respond to NFS file listings, creation, deletion, 
>> etc. However, the ACS agent rebooted every single host server, killing 
>> all running guests and system VMs. In my case, I only have two guests 
>> with volumes on the NFS server. The rest of the VMs are running off RBD 
>> storage. Yet all host servers were rebooted, even those that were not 
>> running guests with NFS volumes. 
>> 
>> Ever since I started using ACS, it has always been pretty dumb at 
>> correctly determining whether the NFS storage is still alive. I would say 
>> it has done the maniac reboot-everything type of behaviour at least 5 times 
>> in the past 3 years. So, in previous versions of ACS I just modified 
>> kvmheartbeat.sh and hashed out the line with "reboot", as these reboots 
>> were just pissing everyone off. 
>> 
>> After upgrading to ACS 4.5.x, that script has no reboot command, and I was 
>> wondering: is it still possible to instruct the kvmheartbeat script not 
>> to reboot the host servers? 
>> 
>> Thanks for your advice. 
>> 
>> Andrei 
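
Nux's suggestion of making the timeouts more generous with slow NFS servers can be illustrated with a minimal Python sketch. It is not the logic of kvmheartbeat.sh (whose actual checks and timeouts are not shown in this thread); it assumes a hypothetical probe that writes and fsyncs a small file on the mount, and it recommends fencing only after several consecutive attempts have either failed or exceeded a per-attempt timeout. The names `probe_write`, `should_fence`, `timeout_s`, `max_failures` and `interval_s` are illustrative assumptions.

```python
import concurrent.futures
import os
import tempfile
import time
from typing import Callable


def probe_write(mount_dir: str) -> None:
    """Hypothetical probe: write and fsync a small heartbeat file on the mount."""
    path = os.path.join(mount_dir, "hb-probe")
    with open(path, "w") as f:
        f.write(str(time.time()))
        f.flush()
        os.fsync(f.fileno())


def should_fence(probe: Callable[[], None], timeout_s: float = 10.0,
                 max_failures: int = 5, interval_s: float = 1.0) -> bool:
    """Return True only if `probe` raises or exceeds `timeout_s` on
    `max_failures` consecutive attempts; return False on the first success."""
    failures = 0
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    try:
        while failures < max_failures:
            future = pool.submit(probe)
            try:
                future.result(timeout=timeout_s)
                return False  # storage answered in time: slow is not dead
            except Exception:
                # Timeout or I/O error: count it and retry after a pause.
                failures += 1
                time.sleep(interval_s)
        return True
    finally:
        # A probe blocked in kernel NFS I/O cannot be interrupted; we only
        # stop waiting for it and drop any attempts still queued behind it.
        pool.shutdown(wait=False, cancel_futures=True)


if __name__ == "__main__":
    # Demo against a local temporary directory standing in for an NFS mount.
    with tempfile.TemporaryDirectory() as d:
        print(should_fence(lambda: probe_write(d), timeout_s=5.0))
```

The trade-off is detection speed for tolerance: with the defaults, fencing needs five consecutive bad probes, so a server that is merely slow (as in the snapshot-deletion case above) would not trigger a reboot. `cancel_futures` requires Python 3.9 or later.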



CCCEU15: on site discussions around QA and release of 4.6 report

2015-10-09 Thread Pierre-Luc Dion
Here are quick meeting notes from a round table on the new QA
infrastructure and the testing of the 4.6 release.

State of the QA infrastructure donated by Citrix
===

~ 10 servers in viginial

Questions:
1. How do we use it?
2. How do we control it?

Nested hypervisors are slow.

BVT cycle time: 2 to 3 hours.

- Citrix QA to provide a solution to rebuild the hypervisors after each test run.
- Hardware installation performed; ready to be provisioned.
- Possibility to run QA jobs on vendor sites via Jenkins slaves.
- Need to move the Jenkins jobs running on j.bac.o to jenkins.apache.org.
- Need a server with VirtualBox to build the systemvm template.


ACS 4.6
===

- 4 blockers left.
- Please comment on what you did to test a PR along with your LGTM.
- Most recent PRs do not include tests or do not point out which tests the
  PR addresses.
- It looks like the new PR model solves the past issue where the master build
  was broken most of the time.
- Travis does not run all the tests against the simulator; we will enable all
  of them. This will slow down the Travis jobs but will help quality.
- Need QA volunteers to test 4.6 so we can push for an RC soon.
- Update README.md on how to contribute to CloudStack via PRs.
- Still need to write the release notes for 4.6.
- [NEED VOTE] on closing PRs that get no response to comments after X days;
  the submitter can still reopen the PR.
- Need more unit tests. We have a hard need for unit tests because they help
  trap small function issues that can cause serious damage to running VMs,
  and they speed up and help guarantee the quality of releases.
- [VOTE or DISCUSS needed] New features should be ported via multiple PRs into
  a feature branch, where the community would see the feature evolve. When a
  feature is ready, a merge would be requested. But a fork should be preferred,
  as it also allows collaborative coding on the GitHub fork.
- Lots of discussion around unit tests and Marvin tests.


Re: slow nfs = reboot all hosts (((

2015-10-09 Thread Nux!
Andrei,

Yes, that command will just reboot without flushing anything to disk, like 
cutting the power.
It is done this way because many servers are slow to respond to normal reboot 
commands under load, if they respond at all, and that could lead to corrupted 
data and so on.
The sysrq switch is a much better choice from this point of view.

We really need to look at a proper way of doing HA with KVM.

--
Sent from the Delta quadrant using Borg technology!

Nux!
www.nux.ro

- Original Message -
> From: "Andrei Mikhailovsky" 
> To: dev@cloudstack.apache.org
> Sent: Friday, 9 October, 2015 16:47:46
> Subject: Re: slow nfs = reboot all hosts (((

> Thanks guys, I'm not sure how I missed that. Probably the coffee hadn't
> kicked in yet )))
> 
> Anyway, am I right in saying that the host server reboot is now forced
> without stopping the services or unmounting filesystems, potentially leaving
> open and unsynced data, etc.?
> 
> Isn't this rather bad and dangerous to do simply because one of possibly
> many NFS servers is slow/unresponsive? Not only that, does the heartbeat
> also reboot the servers that are not running VMs with NFS volumes? In my
> case it just rebooted every single host server.
> 
> Very worrying indeed.
> 
> Andrei


Re: slow nfs = reboot all hosts (((

2015-10-09 Thread Andrei Mikhailovsky
I think as much of REISUB as possible should be done when trying to reboot a 
broken server. Doing only the last B step is a bit dangerous, imho. 

Andrei 
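
Andrei's REISUB point can be sketched against the kernel's /proc/sysrq-trigger interface, the same one used by the 'echo b > /proc/sysrq-trigger' line discussed earlier in the thread. This minimal Python sketch is not CloudStack code; `fence_host`, `write_sysrq`, `DEFAULT_STEPS` and `pause_s` are hypothetical names. It sends only S, U and B: R only affects a physical keyboard, and E or I (SIGTERM/SIGKILL to every process except init) would terminate the script itself before it reached the reboot step. It needs root, and whether S and U can actually complete while an NFS mount is hung is not guaranteed.

```python
import time
from typing import Callable, List, Sequence

SYSRQ_TRIGGER = "/proc/sysrq-trigger"

# SysRq keys used here (see the kernel's SysRq documentation):
#   s = emergency sync of all mounted filesystems
#   u = emergency remount of all filesystems read-only
#   b = immediate reboot, without syncing or unmounting
DEFAULT_STEPS = ("s", "u", "b")


def write_sysrq(key: str) -> None:
    """Write one SysRq key to the kernel trigger file (requires root)."""
    with open(SYSRQ_TRIGGER, "w") as f:
        f.write(key)


def fence_host(steps: Sequence[str] = DEFAULT_STEPS, pause_s: float = 2.0,
               writer: Callable[[str], None] = write_sysrq) -> List[str]:
    """Send the SysRq steps in order, pausing between them.

    Returns the keys that were sent (useful with a dry-run writer).
    """
    sent: List[str] = []
    for i, key in enumerate(steps):
        writer(key)
        sent.append(key)
        if i < len(steps) - 1:
            # Sync/remount are queued in the kernel; give them a chance to run.
            time.sleep(pause_s)
    return sent


if __name__ == "__main__":
    # Dry run: print the keys instead of touching /proc/sysrq-trigger.
    print(fence_host(pause_s=0.0, writer=lambda k: print("sysrq:", k)))
```

The injectable `writer` keeps the sequence testable without rebooting anything; the bounded pause is the compromise between Andrei's "sync first" concern and Nux's point that fencing must not wait on a server that may never respond.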
- Original Message -

From: "Nux!"
To: dev@cloudstack.apache.org
Sent: Friday, 9 October, 2015 6:53:43 PM
Subject: Re: slow nfs = reboot all hosts (((

Andrei,

Yes, that command will just reboot without flushing anything to disk, like
cutting the power.
It is done this way because many servers are slow to respond to normal reboot
commands under load, if they respond at all, and that could lead to corrupted
data and so on.
The sysrq switch is a much better choice from this point of view.

We really need to look at a proper way of doing HA with KVM. 

-- 
Sent from the Delta quadrant using Borg technology! 

Nux! 
www.nux.ro 




Re: slow nfs = reboot all hosts (((

2015-10-09 Thread Simon Weller
Andrei,

In a failure scenario you want to get rid of the problematic server as 
quickly as possible. Effectively, this action is fencing the host in question.

Nux brought up a good point earlier in this thread: ultimately we need to 
figure out a much better way of handling KVM failure conditions. The current 
'wait until it comes back up' approach is very much flawed, and it is something 
we've been thinking about internally a lot lately.

In your case, it sounds like you might need to separate your primary and 
secondary NFS storage, to avoid saturating the primary storage and causing a 
situation where the agent believes that the primary NFS is unresponsive.

We've certainly run into situations where the I/O wait was too high on some 
iSCSI-connected hosts and we saw nodes being shot due to slow access times. 
Our approach to fixing that was to reduce the number of VMs running on those 
hosts and to move to higher-speed connectivity between the hosts and our 
storage (e.g. FC, 10Gb Ethernet).

- Si
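
Simon's point about I/O wait can be watched on a host before the heartbeat starts shooting nodes. A minimal Python sketch, assuming a Linux host: it computes the share of CPU time spent in iowait between two samples of the aggregate `cpu` line in /proc/stat. The function names are hypothetical, and the kernel's own /proc documentation warns that iowait is not a reliable measure, so treat the number as a trend indicator rather than a precise value.

```python
import os
import time
from typing import List


def read_cpu_line(path: str = "/proc/stat") -> str:
    """Return the aggregate 'cpu' line from /proc/stat (Linux only)."""
    with open(path) as f:
        for line in f:
            if line.startswith("cpu "):
                return line
    raise ValueError("no aggregate cpu line in " + path)


def parse_cpu_line(line: str) -> List[int]:
    """Parse 'cpu user nice system idle iowait irq softirq steal ...' into ints."""
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("not an aggregate cpu line")
    # Keep user..steal only; guest/guest_nice are already counted in user/nice.
    return [int(x) for x in fields[1:9]]


def iowait_percent(before: str, after: str) -> float:
    """Share of CPU time (in %) spent in iowait between two /proc/stat samples."""
    a, b = parse_cpu_line(before), parse_cpu_line(after)
    deltas = [y - x for x, y in zip(a, b)]
    total = sum(deltas)
    return 100.0 * deltas[4] / total if total > 0 else 0.0  # index 4 = iowait


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    first = read_cpu_line()
    time.sleep(1.0)
    print(f"iowait over the last second: {iowait_percent(first, read_cpu_line()):.1f}%")
```

Sampling this periodically on each host would show whether storage saturation (for example, snapshot cleanup on a shared primary/secondary NFS server) lines up with the heartbeat failures.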


From: Andrei Mikhailovsky 
Sent: Friday, October 9, 2015 5:37 PM
To: dev@cloudstack.apache.org
Subject: Re: slow nfs = reboot all hosts (((

I think as much of REISUB as possible should be done when trying to reboot a 
broken server. Doing only the last B step is a bit dangerous, imho.

Andrei

Re: slow nfs = reboot all hosts (((

2015-10-09 Thread Nux!
Ok, I'm gonna make a bit of noise about this. Hope you guys will chip in so we 
can make some progress re HA in future versions.

--
Sent from the Delta quadrant using Borg technology!

Nux!
www.nux.ro

- Original Message -
> From: "Simon Weller" 
> To: dev@cloudstack.apache.org
> Sent: Friday, 9 October, 2015 23:46:26
> Subject: Re: slow nfs = reboot all hosts (((

> Andrei,
> 
> In a failure scenario you want to get rid of the problematic server as
> quickly as possible. Effectively, this action is fencing the host in question.
> 
> Nux brought up a good point earlier in this thread: ultimately we need to
> figure out a much better way of handling KVM failure conditions. The current
> 'wait until it comes back up' approach is very much flawed, and it is
> something we've been thinking about internally a lot lately.
> 
> In your case, it sounds like you might need to separate your primary and
> secondary NFS storage, to avoid saturating the primary storage and causing a
> situation where the agent believes that the primary NFS is unresponsive.
> 
> We've certainly run into situations where the I/O wait was too high on some
> iSCSI-connected hosts and we saw nodes being shot due to slow access times.
> Our approach to fixing that was to reduce the number of VMs running on those
> hosts and to move to higher-speed connectivity between the hosts and our
> storage (e.g. FC, 10Gb Ethernet).
> 
> - Si

KVM HA is broken, let's fix it

2015-10-09 Thread Nux!
Hello, 

Following a recent thread on the users mailing list, where slow NFS caused a 
mass reboot, I have opened the following issue about improving HA on KVM:
https://issues.apache.org/jira/browse/CLOUDSTACK-8943

I know there are many people around here who use KVM and are interested in a 
more robust way of doing HA.

Please share your ideas, comments and suggestions; let's see what we can come 
up with to make this better.

Regards,
Lucian

--
Sent from the Delta quadrant using Borg technology!

Nux!
www.nux.ro