[GitHub] cloudstack pull request: CLOUDSTACK-8901: PrepareTemplate job thre...

2015-09-28 Thread SudharmaJain
Github user SudharmaJain commented on a diff in the pull request:

https://github.com/apache/cloudstack/pull/880#discussion_r40543530
  
--- Diff: server/src/com/cloud/configuration/Config.java ---
@@ -1999,7 +1999,9 @@
 // StatsCollector
 StatsOutPutGraphiteHost("Advanced", ManagementServer.class, String.class, "stats.output.uri", "", "URI to additionally send StatsCollector statistics to", null),
 
-SSVMPSK("Hidden", ManagementServer.class, String.class, "upload.post.secret.key", "", "PSK with SSVM", null);
+SSVMPSK("Hidden", ManagementServer.class, String.class, "upload.post.secret.key", "", "PSK with SSVM", null),
+
+TemplatePreloaderPoolSize("Advanced", TemplateManager.class, Integer.class, "template.preloader.pool.size", "8", "Size of the TemplateManager threadpool", null);
--- End diff --

@koushik-das, I changed the mechanism to use a configuration setting (template.preloader.pool.size).
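As a rough illustration of what "use a configuration setting" can look like, the Java sketch below sizes a fixed thread pool from the `template.preloader.pool.size` key added in the diff, with its default of `8`. It is only a sketch under assumptions: a plain `Map<String, String>` stands in for CloudStack's configuration store, `TemplatePreloaderPool` is a hypothetical class rather than the PR's code, and falling back to the default for a missing, non-numeric, or non-positive value is an assumed policy that the diff does not specify.

```java
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a Map stands in for the configuration store,
// and TemplatePreloaderPool is not a CloudStack class.
final class TemplatePreloaderPool {
    static final String POOL_SIZE_KEY = "template.preloader.pool.size";
    static final int DEFAULT_POOL_SIZE = 8; // default value from the diff

    // Assumed policy: fall back to the default when the key is missing,
    // not an integer, or not positive.
    static int poolSize(Map<String, String> config) {
        String raw = config.get(POOL_SIZE_KEY);
        if (raw == null) {
            return DEFAULT_POOL_SIZE;
        }
        try {
            int size = Integer.parseInt(raw.trim());
            return size > 0 ? size : DEFAULT_POOL_SIZE;
        } catch (NumberFormatException e) {
            return DEFAULT_POOL_SIZE;
        }
    }

    // Fixed-size pool whose thread count comes from the configuration.
    static ExecutorService create(Map<String, String> config) {
        return Executors.newFixedThreadPool(poolSize(config));
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = create(Map.of(POOL_SIZE_KEY, "4"));
        pool.submit(() -> System.out.println("preparing template on a pool thread"));
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

Reading the size once, when the pool is created, keeps the sketch simple; picking up a changed value at runtime would mean recreating or resizing the pool, which this sketch does not attempt.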


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: Blameless post mortem

2015-09-28 Thread Bharat Kumar
Hi Remi,

What we think we have discovered are all well-known best practices for 
developing code in a community.
I agree that tests need to be run on a new PR, but I wonder why this was 
ignored when merging the VR refactor code. Perhaps we will uncover more 
issues if we investigate this. I believe one of the reasons for this is the 
complexity and incomplete nature of the VR refactor change, and the failure to 
identify the areas that were affected. If we had good documentation, I think we 
could have understood early on which areas were being impacted (areas like VPN, 
iptables, isolated networks, RVR, etc.) and run the relevant tests. The 
documentation will also help us focus on these areas while reviewing and fixing 
subsequent issues. Currently no one knows which areas were affected by the VR 
refactor change; we are seeing issues all over the code. I think this is a 
bigger problem than what we have discussed so far.

I think for now we should stop fixing the VR refactoring bugs until we come up 
with a proper document describing the VR refactoring changes.

I am not suggesting that we should revert the VR refactor code. I am willing to 
work on this and fix the issues; I am only asking if we can get some 
documentation.


Regards,
Bharat.

On 28-Sep-2015, at 4:59 pm, Sebastien Goasguen  wrote:

> 
>> On Sep 28, 2015, at 1:14 PM, Remi Bergsma  
>> wrote:
>> 
>> Hi Bharat,
>> 
>> 
>> There is only one way to prove a feature works: with tests. That’s why I say 
>> actually _running_ the tests we have today on any new PR is the most 
>> important thing. Having no documentation is a problem, I agree, but it is 
>> not more important IMHO. If we had the documentation, we would still have 
>> issues if nobody runs the tests and verifies pull requests. Perfect 
>> documentation does not automatically lead to a perfect implementation. So we 
>> need tests to verify.
>> 
>> If we don’t agree, that is also fine. We need to do both anyway, and I think 
>> we do agree on that.
>> 
> 
> Also we need to move forward. We should have a live chat once 4.6 is out to 
> discuss all issues/problems and iron out the process.
> 
> But reverting the VR refactor is not going to happen. There was ample 
> discussion on the PR when it was submitted, and there was time to review and 
> raise concerns at that time. It went through quite a few reviews, tests, 
> etc. Maybe the documentation is not good, but the time to raise this concern, 
> I am afraid, was six months ago. We can learn from it, but we are not going to 
> revert it; as David mentioned, that would not go cleanly.
> 
> So let’s get back to blockers for 4.6: are there still VR-related issues with 
> master?
> 
> 
> 
> 
>> Regards,
>> Remi
>> 
>> 
>> 
>> 
>> 
>> 
>> On 28/09/15 12:15, "Bharat Kumar"  wrote:
>> 
>>> Hi Remi,
>>> 
>>> I do not agree with the “There is no bigger problem” part of your reply, so I 
>>> had to repeat myself to make it clearer, not because I am not aware of 
>>> what this thread is supposed to do.
>>> 
>>> Regards,
>>> Bharat.
>>> 
>>> On 28-Sep-2015, at 2:51 pm, Remi Bergsma  
>>> wrote:
>>> 
>>>> Hi Bharat,
>>>> 
>>>> I understand your frustrations, but we already agreed on this, so there is no 
>>>> need to repeat it. This thread is supposed to list some improvements and learn 
>>>> from it. Your point has been taken, so let’s move on.
>>>> 
>>>> We need documentation first, then do a change after which all tests should 
>>>> pass. Even better is if we write (missing) tests before changing stuff, so you 
>>>> know they pass before and after the fact.
>>>> 
>>>> When doing reviews, feel free to ask for design documentation if you feel 
>>>> it is needed.
>>>> 
>>>> Regards, Remi
>>>> 
>>>> 
>>>> 
>>>> On 28/09/15 11:02, "Bharat Kumar"  wrote:
>>>> 
>>>>> Hi Remi,
>>>>> 
>>>>> I never intended to say that we should not run tests, but even before 
>>>>> tests we should have proper documentation. My concern was that if a major 
>>>>> change is being introduced, it should be properly documented. All the 
>>>>> issues we are trying to fix are mainly due to the VR refactor. If there 
>>>>> had been proper documentation for this, we could have fixed them in a 
>>>>> better way. Even to add tests we need to understand how a particular 
>>>>> thing works and what data it expects. I think this is where most people 
>>>>> are facing issues while fixing the Python-based code changes. Proper 
>>>>> documentation will help in understanding these in a better way.
>>>>> 
>>>>> Thanks,
>>>>> Bharat.
>>>>> 
>>>>> On 28-Sep-2015, at 1:57 pm, Remi Bergsma  
>>>>> wrote:
>>>>> 
>>>>>> Hi Bharat,
>>>>>> 
>>>>>> There is no bigger problem. We should always run the tests, and if we 
>>>>>> find a case that isn’t currently covered by the tests, we should simply 
>>>>>> add tests for it. There’s no way we’ll get a stable master without them. 
>>>>>> The fact that they may not cover everything is no reason to not rely on 
>>>>>>>

[GitHub] cloudstack pull request: CLOUDSTACK-8815 : Issues with cloudstack-...

2015-09-28 Thread sudhansu7
Github user sudhansu7 commented on the pull request:

https://github.com/apache/cloudstack/pull/799#issuecomment-143724455
  
@remibergsma 

Rebased against the current master and also squashed the 4 commits into one.




Re: Blameless post mortem

2015-09-28 Thread Remi Bergsma
Dude, this is the final friendly email about this. All points have been made in 
previous mails. This has nothing to do with ‘blameless’ and ‘learning’ anymore.

Read Seb’s mail. We will move on now.


Regards, Remi



On 28/09/15 13:54, "Bharat Kumar"  wrote:


Re: Blameless post mortem

2015-09-28 Thread Bharat Kumar
Hi Sebastien,

You are confused; we are talking about the persistent VR config changes. Below 
is the PR related to it.
https://github.com/apache/cloudstack/pull/118

If you look at it, you will notice that it has more than 250 commits and that 
only a few tests were run.

Regards,
Bharat.

On 28-Sep-2015, at 5:24 pm, Bharat Kumar <bharat.ku...@citrix.com> wrote:


Re: Blameless post mortem

2015-09-28 Thread Wilder Rodrigues
Hi Bharat,

Perhaps you haven’t been aware of it, or haven’t been reading all the emails 
that were sent to the list in the past. Why am I saying that? Just based on 
your sentence where you said “i wonder why was this ignored when merging the 
VR refactor code”.

Is there any particular point you want to make that we are not aware of? Remi 
and Sebastien have already shared their thoughts on the importance of 
documentation and tests. So why continue this whole thing? Please help us 
all understand it better.

Concerning the rVPC tests, you can find some reports here:

* http://markmail.org/message/khjw4y6m57pia5pm
* http://markmail.org/message/4yzzew6fu2rrpz2p

If you search MarkMail, you will find more.

-1 for your first suggestion: I will fix the 2 remaining VR issues and also add 
(and execute) tests to cover them.

As for documentation, once the issues are fixed, I will go through the whole 
code and write up its design. I’m in the same situation as you and many 
others: I don’t know the code 100%, but I would rather fix/document it than 
waste energy on these discussions.

Sorry to say, but you are being Captain Obvious with all these emails about 
documentation/bugs when we all already know about it and are working hard to 
get it into better shape.

Cheers,
Wilder

> On 28 Sep 2015, at 13:54, Bharat Kumar  wrote:

[GitHub] cloudstack pull request: Fixed: Error given when creating VPN user...

2015-09-28 Thread kansal
Github user kansal commented on the pull request:

https://github.com/apache/cloudstack/pull/826#issuecomment-143727275
  
@remibergsma Rebased against master. The repro steps are as follows:
1) Create two VMs in new isolated networks, testnet1 and testnet2.
2) Enable VPN in testnet1 and testnet2.
3) Add a VPN user in each network.
4) Notice that the VPN users are added.
5) Stop the VR for testnet2.
6) Add a VPN user in testnet1.
7) Notice that it fails with an error even though the testnet1 VR is 
running.

It fails because currently there is only a check for running VRs: if a VR is 
not running, an error is thrown. Based on the present implementation of the VR 
rules, we should skip STOPPED/STOPPING VRs, as the rules get applied to them 
after they are restarted.

To check that the rules are applied (VPN in this case), SSH into the NOW running 
VR and look at the /usr/share/ppp/chap-secrets file.
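The rule described above (skip STOPPED/STOPPING routers instead of failing) can be sketched in Java as follows. This is a hypothetical illustration, not the code in the PR: `VpnUserApplier`, `Router` and `RouterState` are made-up names, recording a router's name stands in for actually pushing the VPN user rules, and still raising an error for any other non-running state (such as a router that is starting) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; RouterState and Router are illustrative types,
// not CloudStack's virtual router classes.
final class VpnUserApplier {
    enum RouterState { RUNNING, STARTING, STOPPING, STOPPED }

    static final class Router {
        final String name;
        final RouterState state;

        Router(String name, RouterState state) {
            this.name = name;
            this.state = state;
        }
    }

    // Returns the names of the routers the VPN user rules were sent to.
    // Stopped/stopping routers are skipped because, per the PR description,
    // the rules are applied to them when they are restarted.
    static List<String> applyVpnUsers(List<Router> routers) {
        List<String> applied = new ArrayList<>();
        for (Router router : routers) {
            if (router.state == RouterState.STOPPED || router.state == RouterState.STOPPING) {
                continue;
            }
            if (router.state != RouterState.RUNNING) {
                // Assumed behaviour for any other state: still report an error.
                throw new IllegalStateException(
                        "Router " + router.name + " is " + router.state + "; cannot apply VPN users");
            }
            applied.add(router.name); // a real implementation would push the rules here
        }
        return applied;
    }

    public static void main(String[] args) {
        List<Router> routers = List.of(
                new Router("testnet1-vr", RouterState.RUNNING),
                new Router("testnet2-vr", RouterState.STOPPED));
        System.out.println(applyVpnUsers(routers)); // prints [testnet1-vr]
    }
}
```

With the repro above, the running testnet1 VR receives the rules while the stopped testnet2 VR is skipped instead of failing the whole operation.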




[GitHub] cloudstack pull request: CLOUDSTACK-8911: VM start job got stuck i...

2015-09-28 Thread SudharmaJain
GitHub user SudharmaJain opened a pull request:

https://github.com/apache/cloudstack/pull/895

CLOUDSTACK-8911: VM start job got stuck in loop looking for suitable host

The VM instance creation job gets stuck in a loop when the VM requires local 
storage, some hosts have reached the max guest limit, and the remaining hosts do 
not have enough storage available. This happens because the hosts that reached 
the max guest limit were not being added to the avoid list, and hence neither 
was the cluster.

Verified the fix on my local setup.

Repro Steps:
1. Take an environment with a single cluster and 2 hosts.
2. Change the max guest limit for the hypervisor so that one host reaches its 
max guest limit.
3. Change the thresholds so that the other host does not have enough storage. 
If required, create a VM with a sufficiently large disk.
4. Now deploy a VM with local storage.
5. The cluster is never put in the avoid set, and the job keeps looking for a 
suitable host.
6. Once the max guest limit is increased, the VM either deploys or fails if 
there is a lack of storage.
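A minimal sketch of the avoid-set behaviour the fix is after, in Java. `LocalStoragePlanner`, `Host` and `AvoidSet` are hypothetical stand-ins, not CloudStack's deployment planner classes, and the single pass over one cluster is a simplification: a host at its max guest limit goes into the avoid set just like a host without enough local storage, and once no host in the cluster is usable the cluster itself is avoided, which gives a retrying caller a condition to stop on.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch: Host and AvoidSet are illustrative stand-ins,
// not CloudStack's deployment planner classes.
final class LocalStoragePlanner {
    static final class Host {
        final String id;
        final int guests;
        final int maxGuests;
        final boolean hasEnoughLocalStorage;

        Host(String id, int guests, int maxGuests, boolean hasEnoughLocalStorage) {
            this.id = id;
            this.guests = guests;
            this.maxGuests = maxGuests;
            this.hasEnoughLocalStorage = hasEnoughLocalStorage;
        }
    }

    // Hosts and clusters the planner should not try again.
    static final class AvoidSet {
        final Set<String> hosts = new HashSet<>();
        final Set<String> clusters = new HashSet<>();
    }

    // One planning pass over the hosts of a single cluster.
    static Optional<Host> pickHost(String clusterId, List<Host> hosts, AvoidSet avoid) {
        for (Host host : hosts) {
            if (avoid.hosts.contains(host.id)) {
                continue; // already ruled out in an earlier pass
            }
            boolean atGuestLimit = host.guests >= host.maxGuests;
            if (atGuestLimit || !host.hasEnoughLocalStorage) {
                // The behaviour described in the PR: a host at its max guest
                // limit is avoided too, not only a host lacking storage.
                avoid.hosts.add(host.id);
                continue;
            }
            return Optional.of(host);
        }
        // No usable host left, so the whole cluster is avoided.
        avoid.clusters.add(clusterId);
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Host> cluster = List.of(
                new Host("host1", 10, 10, true),  // reached the max guest limit
                new Host("host2", 3, 10, false)); // not enough local storage
        AvoidSet avoid = new AvoidSet();
        Optional<Host> chosen = pickHost("cluster1", cluster, avoid);
        System.out.println(chosen.isPresent() + " " + avoid.clusters); // prints "false [cluster1]"
    }
}
```

In the repro scenario (one host at the guest limit, the other short on storage), the pass returns no host and marks the cluster as avoided instead of looping.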

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/SudharmaJain/cloudstack cs-8911

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/cloudstack/pull/895.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #895


commit 17050a07dbcf1ac1430c6b0dae8e415d9fcd5181
Author: SudharmaJain 
Date:   2015-09-25T12:35:54Z

CLOUDSTACK-8911: VM start job got stuck in loop looking for suitable host






Re: Blameless post mortem

2015-09-28 Thread Bharat Kumar
Dude,

There was nothing friendly about the postmortem you did. It was only partial; 
we should do a complete postmortem and then draw conclusions.

I think postmortems like this are of no use if we do not do them 
completely.

Regards,
Bharat.

On 28-Sep-2015, at 5:39 pm, Remi Bergsma  wrote:

> Dude, this is the final friendly email about his. All points have been made 
> in previous mails. This has nothing to do with ‘blameless’ and ‘learning’ 
> anymore.
> 
> Read Seb’s mail. We will move on now.
> 
> 
> Regards, Remi
> 
> 
> 
> On 28/09/15 13:54, "Bharat Kumar"  wrote:
> 
>> Hi Remi,
>> 
>> Whatever  ever we think  we have discovered are all well known best 
>> practices while developing code in community. 
>> I agree that tests need to be run on a new PR,  but i wonder why was this 
>> ignored when merging the VR refactor code. Perhaps we will uncover some more 
>> issues if we investigate this. I believe 
>> one of the reasons for this is the complexity and incomplete nature of the 
>> vr refactor change and failing to identify the areas which got effected. If 
>> we had a good documentation i think we cloud have understood the areas that 
>> were getting 
>> impacted early on, areas like the vpn ,  iptables, isolated networks rvr   
>> etc  and run the relevant tests. The documentation will also help us focus 
>> on these areas while reviewing  and fixing subsequent issues. Currently no 
>> one knows the areas that got effected 
>> due to the vr refactor change, we are seeing issues all over the code.  I 
>> think this is a bigger problem than what we have discussed so far.
>> 
>> I think presently we should stop fixing all the vr refactoring  bugs until 
>> we come up with a  proper document describing the VR refactoring  changes.
>> 
>> I am not suggesting that we should revert the vr refactor code, I am willing 
>> to work on this and fix the issues,  I am only asking if we can get some 
>> documentation.
>> 
>> 
>> Regards,
>> Bharat.
>> 
>> On 28-Sep-2015, at 4:59 pm, Sebastien Goasguen  wrote:
>> 
>>> 
 On Sep 28, 2015, at 1:14 PM, Remi Bergsma  
 wrote:
 
 Hi Bharat,
 
 
 There is only one way to prove a feature works: with tests. That’s why I 
 say actually _running_ the tests we have today on any new PR, is the most 
 important thing. Having no documentation is a problem, I agree, but it is 
 not more important IMHO. If we had the documentation, we still would have 
 issues if nobody runs the tests and verifies pull requests. Documentation 
 that is perfect does not automatically lead to perfect implementation. So 
 we need tests to verify.
 
 If we don’t agree that is also fine. We need to do both anyway and I think 
 we do agree on that.
 
>>> 
>>> Also we need to move forward. We should have a live chat once 4.6 is out to 
>>> discuss all issues/problems and iron out the process.
>>> 
>>> But reverting the VR refactor is not going to happen. There was ample 
>>> discussions on the PR when it was submitted, there was time to review and 
>>> raise concerns at that time. It went through quite a few reviews, tests 
>>> etc…Maybe the documentation is not good, but the time to raise this concern 
>>> I am afraid was six months ago. We can learn from it, but we are not going 
>>> to revert it, this would not go cleanly as David mentioned.
>>> 
>>> So let’s get back to blockers for 4.6, are there still VR related issues 
>>> with master ?
>>> 
>>> 
>>> 
>>> 
 Regards,
 Remi
 
 
 
 
 
 
 On 28/09/15 12:15, "Bharat Kumar"  wrote:
 
> Hi Remi,
> 
> i do not agree with “There is no bigger problem”  part of your reply. so 
> I had to repeat myself to make it more clear, Not because i am not aware 
> of what this thread is supposed to do.
> 
> Regards,
> Bharat.
> 
> On 28-Sep-2015, at 2:51 pm, Remi Bergsma  
> wrote:
> 
>> Hi Bharat,
>> 
>> I understand your frustrations but we already agreed on this so no need 
>> to repeat. This thread is supposed to list some improvements and learn 
>> from it. Your point has been taken so let’s move on.
>> 
>> We need documentation first, then do a change after which all tests 
>> should pass. Even better is we write (missing) tests before changing 
>> stuff so you know they pass before and after the fact. 
>> 
>> When doing reviews, feel free to ask for design documentation if you 
>> feel it is needed.
>> 
>> Regards, Remi
>> 
>> 
>> 
>> On 28/09/15 11:02, "Bharat Kumar"  wrote:
>> 
>>> Hi Remi,
>>> 
>>> I never intended to say that we should not run tests, but even before 
>>> tests we should have proper documentation. My concern was if a major 
>>> change is being introduced it should be properly documented. All the 
>>> issues which we are trying to fix are majorly due to VR refactor. If 
>>> ther

RE: Blameless post mortem

2015-09-28 Thread Raja Pullela
My 2 cents... I agree with Bharat: this was a critical piece, changing the VR 
scripts from Bash to Python. Most of the 4.6 blockers filed were around 
this.

The FS I see, 
https://cwiki.apache.org/confluence/display/CLOUDSTACK/Refactoring+Redundant+Virtual+Router+Implementation,
last modified on May 19th, 2015, is about the RVR changes. It does not say 
anything about the VR changes. Is there any CWIKI documentation around the VR 
changes?

-----Original Message-----
From: Bharat Kumar [mailto:bharat.ku...@citrix.com] 
Sent: Monday, September 28, 2015 5:24 PM
To: dev@cloudstack.apache.org
Subject: Re: Blameless post mortem

Hi Remi,

Whatever  ever we think  we have discovered are all well known best practices 
while developing code in community. 
I agree that tests need to be run on a new PR,  but i wonder why was this 
ignored when merging the VR refactor code. Perhaps we will uncover some more 
issues if we investigate this. I believe one of the reasons for this is the 
complexity and incomplete nature of the vr refactor change and failing to 
identify the areas which got effected. If we had a good documentation i think 
we cloud have understood the areas that were getting 
impacted early on, areas like the vpn ,  iptables, isolated networks rvr   etc  
and run the relevant tests. The documentation will also help us focus on these 
areas while reviewing  and fixing subsequent issues. Currently no one knows the 
areas that got effected 
due to the vr refactor change, we are seeing issues all over the code.  I think 
this is a bigger problem than what we have discussed so far.

I think presently we should stop fixing all the vr refactoring  bugs until we 
come up with a  proper document describing the VR refactoring  changes.

I am not suggesting that we should revert the vr refactor code, I am willing to 
work on this and fix the issues,  I am only asking if we can get some 
documentation.


Regards,
Bharat.

On 28-Sep-2015, at 4:59 pm, Sebastien Goasguen  wrote:

> 
>> On Sep 28, 2015, at 1:14 PM, Remi Bergsma  
>> wrote:
>> 
>> Hi Bharat,
>> 
>> 
>> There is only one way to prove a feature works: with tests. That's why I say 
>> that actually _running_ the tests we have today on any new PR is the most 
>> important thing. Having no documentation is a problem, I agree, but it is 
>> not more important IMHO. Even if we had the documentation, we would still 
>> have issues if nobody runs the tests and verifies pull requests. Perfect 
>> documentation does not automatically lead to a perfect implementation. So we 
>> need tests to verify.
>> 
>> If we don't agree, that is also fine. We need to do both anyway, and I think 
>> we do agree on that.
>> 
> 
> Also, we need to move forward. We should have a live chat once 4.6 is out to 
> discuss all issues/problems and iron out the process.
> 
> But reverting the VR refactor is not going to happen. There was ample 
> discussion on the PR when it was submitted; there was time to review and 
> raise concerns at that time. It went through quite a few reviews, tests, 
> etc. Maybe the documentation is not good, but the time to raise this concern, 
> I am afraid, was six months ago. We can learn from it, but we are not going to 
> revert it; as David mentioned, that would not go cleanly.
> 
> So let's get back to blockers for 4.6: are there still VR-related issues with 
> master?
> 
> 
> 
> 
>> Regards,
>> Remi
>> 
>> 
>> 
>> 
>> 
>> 
>> On 28/09/15 12:15, "Bharat Kumar"  wrote:
>> 
>>> Hi Remi,
>>> 
>>> I do not agree with the "There is no bigger problem" part of your reply. So I 
>>> had to repeat myself to make it clearer, not because I am not aware of 
>>> what this thread is supposed to do.
>>> 
>>> Regards,
>>> Bharat.
>>> 
>>> On 28-Sep-2015, at 2:51 pm, Remi Bergsma  
>>> wrote:
>>> 
>>>> Hi Bharat,
>>>>
>>>> I understand your frustration, but we already agreed on this, so there is no 
>>>> need to repeat it. This thread is supposed to list some improvements and 
>>>> learn from it. Your point has been taken, so let's move on.
>>>>
>>>> We need documentation first, then make the change, after which all tests 
>>>> should pass. Even better is if we write the (missing) tests before changing 
>>>> stuff, so you know they pass before and after the fact.
>>>>
>>>> When doing reviews, feel free to ask for design documentation if you feel 
>>>> it is needed.
>>>>
>>>> Regards, Remi
>>>>
>>>>
>>>>
>>>> On 28/09/15 11:02, "Bharat Kumar"  wrote:
>>>>
>>>>> Hi Remi,
>>>>> 
>>>>> I never intended to say that we should not run tests, but even 
>>>>> before tests we should have proper documentation. My concern was that if a 
>>>>> major change is being introduced, it should be properly documented. All 
>>>>> the issues which we are trying to fix are mostly due to the VR refactor. If 
>>>>> there were proper documentation for this, we could have fixed this in a 
>>>>> better way. Even to add tests, we need to understand how a particular 
>>>>> thing works and what data it expects. I

Re: Blameless post mortem

2015-09-28 Thread Bharat Kumar
Hi guys,

Anyway, of all the things said and done, I think we all agree that we need some 
documentation on the Python changes.


Regards,
Bharat.

On 28-Sep-2015, at 5:46 pm, Wilder Rodrigues  
wrote:

> Hi Bharat,
> 
> Perhaps you were not aware of, or did not read, all the emails that were sent 
> to the list in the past. Why am I saying that? Just based on your sentence 
> where you said “i wonder why was this ignored when merging the VR refactor code”
> 
> Is there any particular point you want to make that we are not aware of? Remi 
> and Sebastien already expressed their thoughts concerning the importance of 
> documentation and tests. So why continue this whole thing? Please help 
> us all to understand it better.
> 
> Concerning the rVPC tests, you can find some reports here:
> 
> * http://markmail.org/message/khjw4y6m57pia5pm
> * http://markmail.org/message/4yzzew6fu2rrpz2p
> 
> And if you go to markmail you will find more.
> 
> -1 for your 1st suggestion. I will fix the 2 remaining VR issues and also add 
> (and execute) tests to cover them.
> 
> About documentation: once the issues are fixed, I will go through the whole 
> code and write some design documentation for it. I’m in the same situation as 
> you and many others: I don’t know the code 100%, but I would rather 
> fix/document it than waste energy on these discussions.
> 
> Sorry to say, but you are being Captain Obvious with all those emails about 
> documentation/bugs when we all already know about it and are working hard to 
> get it in better shape.
> 
> Cheers,
> Wilder

Re: Blameless post mortem

2015-09-28 Thread Wilder Rodrigues
Only a few tests… 51 tests against a real environment.

At that time Nux also tested it, and we tried to get Paul Angus, Geoff and Rohit 
from ShapeBlue to test it as well. Nux found a couple of issues that were 
reported and fixed (see email below).

When I came back from holidays, 4 weeks ago, a PR containing 360 changed files 
and almost 4000 lines, which was not even compiling, had been merged into 
master. We have fewer than a handful of people executing tests against PRs, so 
few that I could even name who tests and who doesn’t. But hey, that’s a 
blameless email. I’m not trying to justify anything, but that handful of people, 
who actually care about ACS, are getting quite fed up with this whole discussion.

Cheers,
Wilder

===

On 20 Feb 2015, at 10:03, Nux! <n...@li.nux.ro> wrote:

Well, it looks like we were right to test it; we found some problems.

1 - the passwords for instances are not served properly: `wget --header 
"DomU_Request: send_my_password" $router:8080` returns a blank response. I am not 
sure why this happens, though I tried to look around the router.

2 - in addition to the above, in a redundant VPC the SNAT does not work. From 
an instance I can ping the router(s), but not any further than that. SNAT works 
fine in a normal/non-VPC network.
I'll try to look more into it later today.

Have a nice day :)

Lucian

--
Sent from the Delta quadrant using Borg technology!

Nux!
www.nux.ro




On 28 Sep 2015, at 14:13, Bharat Kumar <bharat.ku...@citrix.com> wrote:

Hi Sebastien,

You are confused; we are talking about the persistent VR config changes. Below 
is the PR related to it.
https://github.com/apache/cloudstack/pull/118

If you look at it, you will notice that there are more than 250 commits and 
that only a few tests were run.

Regards,
Bharat.


Re: Blameless post mortem

2015-09-28 Thread Daan Hoogland
On Mon, Sep 28, 2015 at 2:32 PM, Wilder Rodrigues 
<wrodrig...@schubergphilis.com> wrote:

> Only few tests…. 51 tests against a real environment.
>
...
And then a lot of people wrote a lot more.

@Bharat, @Raja,

I hope you don't see design as part of quality assurance. It is not. It is 
only useful for discussion. Discussion of the redundant VR was initiated 
before Budapest last year and was largely ignored in spite of several attempts. 
This has been less than motivating for writing more of it on the list or on the 
design pages. That is not a good excuse to stop writing it, but please stop 
hammering on the point already made. The wiki pages for FSs are not sacred, and 
the VR page is by no means the best example of their shortcomings.

@all,

The RVR changes have been in master for a long time and have worked all that 
time for everyone actually testing them. And the main issue is that t
and now, just before the release, problems are coming out in tests, to our 
surprise. These tests have obviously not been run for more than half a year, 
and that is what we need to address first.

-- 
Daan


Re: Blameless post mortem

2015-09-28 Thread Sebastien Goasguen
Folks, let’s chill for a second here.

Let’s be pragmatic:

First,

- Master became unstable, with lots of issues related to the VPC
- Issues were fixed 
- Let’s go back to the blockers, fix them and release 4.6

Second,

- We have a postmortem from Remi.
- Let’s talk it out, first with the folks that will be in Dublin or in a 
separate thread.
- Then live, with a planned online hangout of sorts
- I believe that for now our focus should be on 4.6

Third,

- We are trying a new process for 4.6, which is to stabilize master and release 
from there
- This is a big change compared to developing on master, which is what we used 
to do.
- We are embracing GitHub PRs
- PRs are the place to ask questions about code, ask for more tests, etc.
- We know we need more tests and more docs; we have known that forever
- The VPC refactor started a long time back. Maybe I am confused, but it seems 
that any VPC-related add-on will have to work with this refactor.

So please, no name calling; this is against ASF policy. We have thick skins, but 
let’s keep it cordial and get our eyes back on the ball, 
i.e. what’s the latest BVT result? What are the blockers, etc.?

-Sebastien

> On Sep 28, 2015, at 2:19 PM, Bharat Kumar  wrote:
> 
> Dude,
> 
> There was nothing friendly about the postmortem you did. It was only partial; 
> we should do a complete postmortem and then draw conclusions.
> 
> I think post-mortems like this are of no use if we do not do them 
> completely.
> 
> Regards,
> Bharat.
> 
> On 28-Sep-2015, at 5:39 pm, Remi Bergsma  wrote:
> 
>> Dude, this is the final friendly email about this. All points have been made 
>> in previous mails. This has nothing to do with ‘blameless’ and ‘learning’ 
>> anymore.
>> 
>> Read Seb’s mail. We will move on now.
>> 
>> 
>> Regards, Remi

Re: Blameless post mortem

2015-09-28 Thread Bharat Kumar
Hi Wilder,

I am not talking about just the VPC networks. There are many other areas getting 
affected because of this, some of them being VPN (not implemented), RVR in 
isolated networks, etc. 
All I am saying is that the design doc will help us understand the complete 
impact of the changes and deal with them accordingly.


Regards,
Bharat.



Re: Blameless post mortem

2015-09-28 Thread Wilder Rodrigues
I agree with the docs stuff; I said that 5 emails ago.

Once things are fixed, I will take the time to understand the code as a whole 
and write the documentation: we will need it for release purposes anyway.

Cheers,
Wilder



[GitHub] cloudstack pull request: CLOUDSTACK-8911: VM start job got stuck i...

2015-09-28 Thread borisroman
Github user borisroman commented on the pull request:

https://github.com/apache/cloudstack/pull/895#issuecomment-143750945
  
Hi @SudharmaJain,

The pull-analysis failed due to a segfault in the Surefire plugin. This is 
probably unrelated, but please force-push your commit to this PR again so 
that the tests run again.





Re: Blameless post mortem

2015-09-28 Thread Koushik Das
I had asked for documentation on the persistent VR changes (PR #118) in the 
context of another discussion, and this is what I got at that time:
http://dev.cloudstack.apache.narkive.com/MH47etbS/discuss-out-of-band-vr-migration-should-we-reboot-vr-or-not#post39

Right now, as I see from the discussion, no one understands the changes fully, 
and there is no documentation explaining the design, code flow, etc. Even if 
someone volunteers to fix an issue, that person may not be sure whether 
something else will break, given the nature of the code and the current status 
of the tests. The existing VR scripts (which may not be as appealing as the new 
.py scripts :) ) were stabilised over multiple releases, and I am sure a lot of 
manual testing was done on them as well. Doing the same for the new changes 
will take at least a similar amount of testing.

Given that the 4.6 release is nearing, one option would be to fix the pending 
issues and release with a disclaimer that VR-related functionality may have 
regressed since the last release (at this point no one knows what other 
potential issues may crop up). The other option would be to postpone 4.6 
indefinitely, fix the documentation and the issues, and go ahead with the 
release once there is consensus on stability.


-Koushik


On 28-Sep-2015, at 6:20 PM, Wilder Rodrigues 
 wrote:

> I agree with the docs stuff, that I said 5 emails ago.
> 
> Once things are fixed, I will take the time to understand the code as a whole 
> and write the documentation: we will need ir for release purposes anyway.
> 
> Cheers,
> Wilder
> 
> 
>> On 28 Sep 2015, at 14:47, Bharat Kumar  wrote:
>> 
>> Hi Wilder,
>> 
>> I am not talking about just the vpc networks. There are many other ares 
>> getting effected because of this, some of them are vpn(not implemented) , 
>> rvr in isolated networks etc. 
>> All i am saying is the design doc will help us understand the complete 
>> impact of the changes and deal with them accordingly.
>> 
>> 
>> Regards,
>> Bharat.
>> 
>> 
>> On 28-Sep-2015, at 6:02 pm, Wilder Rodrigues  
>> wrote:
>> 
>>> Only few tests…. 51 tests against a real environment.
>>> 
>>> At that time Nux also tested it and we tried to get Paul Angus, Geoff and 
>>> Rohit from Shape Blue to test it as well. Nux found a couple of issues that 
>>> were reported and fixed (see email below).
>>> 
>>> When I came back from holidays, 4 weeks ago, a PR containing 360 files 
>>> changed and almost 4000 lines, which was not even compiling, was merged 
>>> onto Master. We have fewer than a handful of people executing tests against 
>>> PRs - so few that I could even name who tests and who doesn’t. But hey, 
>>> that’s a blameless email. I’m not trying to justify anything, but that handful 
>>> of people, who actually care about ACS, are getting quite fed up with this 
>>> whole discussion.
>>> 
>>> Cheers,
>>> Wilder
>>> 
>>> ===
>>> 
>>> On 20 Feb 2015, at 10:03, Nux! mailto:n...@li.nux.ro>> 
>>> wrote:
>>> 
>>> Well, it looks like we were right to test it; we found some problems.
>>> 
>>> 1 - the passwords for instances are not served properly:
>>> `wget --header "DomU_Request: send_my_password" $router:8080` returns a blank
>>> response. I am not sure why this happens, though I tried to look around the router.
>>> 
>>> 2 - in addition to the above, in a redundant VPC the SNAT does not work. 
>>> From an instance I can ping the router(s), but not any further than that. 
>>> SNAT works fine in a normal/non-vpc network.
>>> I'll try to look more into it later today.
>>> 
>>> Have a nice day :)
>>> 
>>> Lucian
>>> 
>>> --
>>> Sent from the Delta quadrant using Borg technology!
>>> 
>>> Nux!
>>> www.nux.ro
>>> 
>>> 
>>> 
>>> 
>>> On 28 Sep 2015, at 14:13, Bharat Kumar 
>>> mailto:bharat.ku...@citrix.com>> wrote:
>>> 
>>> Hi Sebastien,
>>> 
>>> You are confused; we are talking about the persistent VR config changes. Below 
>>> is the PR related to it.
>>> https://github.com/apache/cloudstack/pull/118
>>> 
>>> If you look at it you will notice that there are more than 250 commits and 
>>> that only a few tests were run.
>>> 
>>> Regards,
>>> Bharat.
>>> 
>>> On 28-Sep-2015, at 5:24 pm, Bharat Kumar 
>>> mailto:bharat.ku...@citrix.com>>
>>>  wrote:
>>> 
>>> Hi Remi,
>>> 
>>> Everything we think we have discovered is well-known best practice 
>>> for developing code in a community.
>>> I agree that tests need to be run on a new PR, but I wonder why this was 
>>> ignored when merging the VR refactor code. Perhaps we will uncover some 
>>> more issues if we investigate this. I believe
>>> one of the reasons is the complexity and incomplete nature of the 
>>> VR refactor change, and the failure to identify the areas that were affected. If 
>>> we had good documentation, I think we could have understood the areas that 
>>> were getting impacted early on, areas like the vpn ,  iptables, i
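
Nux's first finding (a blank response from the router's password server) can be re-checked from any host that can reach the router. Below is a minimal Python sketch of the same request as the quoted `wget` command, assuming, as that command does, that the password server answers on port 8080 of the router; the helper names and the `10.1.1.1` address are hypothetical. Note that `urllib` stores the header name as `Domu_request` (it applies `str.capitalize()`); HTTP header names are case-insensitive, so a conforming server should treat it as the same header.

```python
import urllib.request


def build_password_request(router_ip: str) -> urllib.request.Request:
    """Build the same GET request as:
    wget --header "DomU_Request: send_my_password" $router:8080
    (hypothetical helper, not part of CloudStack)
    """
    req = urllib.request.Request(f"http://{router_ip}:8080/")
    # urllib normalises the name to "Domu_request"; header names are case-insensitive.
    req.add_header("DomU_Request", "send_my_password")
    return req


def fetch_password(router_ip: str, timeout: float = 5.0) -> str:
    """Send the request and return the body; an empty string is the reported symptom."""
    req = build_password_request(router_ip)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace").strip()
```

Calling `fetch_password("<router ip>")` from a guest on the affected network and getting `""` back corresponds to the blank response described above.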

Re: Blameless post mortem

2015-09-28 Thread Sebastien Goasguen
Let me try to reply,

> On Sep 28, 2015, at 5:17 PM, Koushik Das  wrote:
> 
> I had asked for the documentation on persistent VR (PR # 118) changes in the 
> context of another discussion and this is what I got at that time.
> http://dev.cloudstack.apache.narkive.com/MH47etbS/discuss-out-of-band-vr-migration-should-we-reboot-vr-or-not#post39
>  
> 

That was back in June. So one has to think that you were happy with the answers 
you got; otherwise you would have kept pushing for documentation and made noise 
about it at that time.

> Right now as I see from the discussion no one understands the changes fully 
> and there is no documentation available explaining the design/code flow etc.

Just like anyone coming to CloudStack from scratch, now and 3 years ago. There 
is limited or terrible documentation about how the system works; people pick it 
up on their own by looking at the code. This is how it’s done in a lot of 
projects, and a lot of software. 

This is not specific to this issue. If you really want my opinion, our 
documentation has always sucked, our API documentation is pathetic and our wiki 
is useless.

> Even if someone volunteers to fix an issue, the person may not be sure whether 
> something else will break, given the nature of the code and the current status 
> of the tests. The existing VR scripts (may not be as appealing as the new .py 
> scripts :) ) were stabilised over multiple releases and I am sure there must 
> have been a lot of manual testing done as well. Doing the same for the new 
> changes will take at least a similar amount of testing.
> 
> Given that the 4.6 release is nearing, one option would be to fix the pending 
> ones and release with a disclaimer that VR-related functionality may have 
> regressed since the last release (at this point no one knows what other 
> potential issues may crop up).

Nope… we don’t. But that is a very general statement: we don’t know what can come 
up due to this change or any other change, mostly because we have always had bad 
test coverage, unit or integration or what have you, and because 3 years in we 
still don’t have a dedicated CI.

> Or the other option would be to postpone 4.6 indefinitely, fix the 
> documentation, fix the issues, and go ahead with the release once there is 
> consensus on stability.

This, in my view, is a traditional software engineering release concept. We don’t 
need this in an open source project like this. We can release; it takes 5 
minutes to cut a release. We should release and fix, release and fix, release 
and fix.

The whole idea behind what we are doing is that we can release every time we find 
and fix a bug. There is no benefit in postponing things.


> 
> 
> -Koushik
> 
> 
> On 28-Sep-2015, at 6:20 PM, Wilder Rodrigues 
> wrote:
> 
>> I agree with the docs stuff; I said that 5 emails ago.
>> 
>> Once things are fixed, I will take the time to understand the code as a 
>> whole and write the documentation: we will need it for release purposes 
>> anyway.
>> 
>> Cheers,
>> Wilder
>> 
>> 
>>> On 28 Sep 2015, at 14:47, Bharat Kumar  wrote:
>>> 
>>> Hi Wilder,
>>> 
>>> I am not talking about just the VPC networks. There are many other areas 
>>> getting affected by this, some of them being VPN (not implemented), 
>>> RVR in isolated networks, etc. 
>>> All I am saying is that the design doc will help us understand the complete 
>>> impact of the changes and deal with them accordingly.
>>> 
>>> 
>>> Regards,
>>> Bharat.
>>> 
>>> 
>>> On 28-Sep-2015, at 6:02 pm, Wilder Rodrigues 
>>>  wrote:
>>> 
 Only few tests…. 51 tests against a real environment.
 
 At that time Nux also tested it and we tried to get Paul Angus, Geoff and 
 Rohit from Shape Blue to test it as well. Nux found a couple of issues 
 that were reported and fixed (see email below).
 
 When I came back from holidays, 4 weeks ago, a PR containing 360 files 
 changed and almost 4000 lines, which was not even compiling, was merged 
 onto Master. We have fewer than a handful of people executing tests against 
 PRs - so few that I could even name who tests and who doesn’t. But hey, 
 that’s a blameless email. I’m not trying to justify anything, but that 
 handful of people, who actually care about ACS, are getting quite fed up 
 with this whole discussion.
 
 Cheers,
 Wilder
 
 ===
 
 On 20 Feb 2015, at 10:03, Nux! mailto:n...@li.nux.ro>> 
 wrote:
 
 Well, it looks like we were right to test it; we found some problems.
 
 1 - the passwords for instances are not served properly:
 `wget --header "DomU_Request: send_my_password" $router:8080` returns a blank
 response. I am not sure why this happens, though I tried to look around the router.
 
 2 - in addition to the above, in a redundant VPC the SNAT does not work. 
 From an instance I can ping the router(s), but not any further than that. 
 SNAT works fine in a norm

Re: Blameless post mortem

2015-09-28 Thread Koushik Das
inline

On 28-Sep-2015, at 9:15 PM, Sebastien Goasguen  wrote:

> Let me try to reply,
> 
>> On Sep 28, 2015, at 5:17 PM, Koushik Das  wrote:
>> 
>> I had asked for the documentation on persistent VR (PR # 118) changes in the 
>> context of another discussion and this is what I got at that time.
>> http://dev.cloudstack.apache.narkive.com/MH47etbS/discuss-out-of-band-vr-migration-should-we-reboot-vr-or-not#post39
>>  
>> 
> 
> That was back in June. So one has to think that you were happy with the 
> answers you got; otherwise you would have kept pushing for documentation and 
> made noise about it at that time.

In the context of that discussion, yes, but not in terms of the overall feature. 
Unfortunately, bugs weren't reported at that time; otherwise I would have made 
enough noise :).


>> Right now as I see from the discussion no one understands the changes fully 
>> and there is no documentation available explaining the design/code flow etc.
> 
> Just like anyone coming to CloudStack from scratch, now and 3 years ago. 
> There is limited or terrible documentation about how the system works; people 
> pick it up on their own by looking at the code. This is how it’s done in a 
> lot of projects, and a lot of software. 
> 
> This is not specific to this issue. If you really want my opinion, our 
> documentation has always sucked, our API documentation is pathetic and our 
> wiki is useless.

Are you saying that since our documentation was pathetic 3 years back, 
we should stick with that? If we applied the same logic to the code base, why care 
about PRs and code reviews at all? I feel we as a community need to improve the 
documentation, starting at least with the new changes. And if the changes impact 
the core pieces of the system, it becomes even more important. If you look at PR 
118 you will realise that the person who submitted the PR is claiming that he is 
not fully aware of the changes. What do you expect me to make of that?

> 
>> Even if someone volunteers to fix an issue, the person may not be sure whether 
>> something else will break, given the nature of the code and the current status 
>> of the tests. The existing VR scripts (may not be as appealing as the new .py 
>> scripts :) ) were stabilised over multiple releases and I am sure there must 
>> have been a lot of manual testing done as well. Doing the same for the new 
>> changes will take at least a similar amount of testing.
>> 
>> Given that the 4.6 release is nearing, one option would be to fix the pending 
>> ones and release with a disclaimer that VR-related functionality may have 
>> regressed since the last release (at this point no one knows what other 
>> potential issues may crop up).
> 
> Nope… we don’t. But that is a very general statement: we don’t know what can 
> come up due to this change or any other change, mostly because we have always 
> had bad test coverage, unit or integration or what have you, and because 3 
> years in we still don’t have a dedicated CI.
> 
>> Or the other option would be to postpone 4.6 indefinitely, fix the 
>> documentation, fix the issues, and go ahead with the release once there is 
>> consensus on stability.
> 
> This, in my view, is a traditional software engineering release concept. We don’t 
> need this in an open source project like this. We can release; it takes 5 
> minutes to cut a release. We should release and fix, release and fix, release 
> and fix.
> 
> The whole idea behind what we are doing is that we can release every time we find 
> and fix a bug. There is no benefit in postponing things.
> 

Theoretically we can cut any number of releases at any frequency. But since 
concerns have been raised about quality, I feel they need to be addressed 
appropriately. Let users make an informed decision on whether to use a 
particular release or not. I am not saying that postponing is the only option.

> 
>> 
>> 
>> -Koushik
>> 
>> 
>> On 28-Sep-2015, at 6:20 PM, Wilder Rodrigues 
>> wrote:
>> 
>>> I agree with the docs stuff; I said that 5 emails ago.
>>> 
>>> Once things are fixed, I will take the time to understand the code as a 
>>> whole and write the documentation: we will need it for release purposes 
>>> anyway.
>>> 
>>> Cheers,
>>> Wilder
>>> 
>>> 
 On 28 Sep 2015, at 14:47, Bharat Kumar  wrote:
 
 Hi Wilder,
 
 I am not talking about just the VPC networks. There are many other areas 
 getting affected by this, some of them being VPN (not implemented), 
 RVR in isolated networks, etc. 
 All I am saying is that the design doc will help us understand the complete 
 impact of the changes and deal with them accordingly.
 
 
 Regards,
 Bharat.
 
 
 On 28-Sep-2015, at 6:02 pm, Wilder Rodrigues 
  wrote:
 
> Only few tests…. 51 tests against a real environment.
> 
> At that time Nux also tested it and we tried to get Paul Angus, Geoff and 
> Rohit from Shape Blue to test it as well. Nux found a couple of issues 
> that were report

Re: Blameless post mortem

2015-09-28 Thread Wilder Rodrigues
Koushik,

Please, say my name! Don’t mention stuff like “the person blah blah blah”, 
please! If you want to start pointing fingers, I can also play this game and 
get references to the PRs which were not tested, pushed straight to master 
(when we agreed not to do so) or got 2 LGTMs without any comments about test 
procedures/reports. In addition, I said I don’t know the code as a whole 
because I was not the one who wrote it. We should not be talking about this.

Now, please pay some attention to the work we have done over the weekend to get 
master stable, after it was broken because untested PRs got through. It has 
nothing to do with lack of documentation, dude! It’s all about changing code 
and testing it. And if the one changing it doesn’t know what is being done, 
it’s a matter of being honest, saying so, and asking for help!

Documentation is also important, as I said before, and I’m glad to help write 
it, for the code I wrote and the code I did not. That’s community work, from my 
point of view at least. But if you or Bharat or Raja or anyone else got pissed 
off with my emails (or anyone else’s) asking for tests, I can’t help with that. 
I’m already working almost 60h per week trying to help the community with many 
bugs, not only VR related, and getting no freaking “thank you” back! And 
believe me, I’m not taking it personally. It’s just very frustrating that there 
are people insisting on this thread for so long. It’s about to become a pissing 
contest!

All the energy spent here today, trying to find people to blame, could have 
been used to fix the blockers!

Btw, that’s my last email on this thread, but I hope that won’t be the last one 
in the community. I still believe in ACS, despite those immature discussions.


Cheers,
Wilder


On 28 Sep 2015, at 18:49, Koushik Das 
mailto:koushik@citrix.com>> wrote:

inline

On 28-Sep-2015, at 9:15 PM, Sebastien Goasguen 
mailto:run...@gmail.com>> wrote:

Let me try to reply,

On Sep 28, 2015, at 5:17 PM, Koushik Das 
mailto:koushik@citrix.com>> wrote:

I had asked for the documentation on persistent VR (PR # 118) changes in the 
context of another discussion and this is what I got at that time.
http://dev.cloudstack.apache.narkive.com/MH47etbS/discuss-out-of-band-vr-migration-should-we-reboot-vr-or-not#post39


That was back in June. So one has to think that you were happy with the answers 
you got; otherwise you would have kept pushing for documentation and made noise 
about it at that time.

In the context of that discussion, yes, but not in terms of the overall feature. 
Unfortunately, bugs weren't reported at that time; otherwise I would have made 
enough noise :).


Right now as I see from the discussion no one understands the changes fully and 
there is no documentation available explaining the design/code flow etc.

Just like anyone coming to CloudStack from scratch, now and 3 years ago. There 
is limited or terrible documentation about how the system works; people pick it 
up on their own by looking at the code. This is how it’s done in a lot of 
projects, and a lot of software.

This is not specific to this issue. If you really want my opinion, our 
documentation has always sucked, our API documentation is pathetic and our wiki 
is useless.

Are you saying that since our documentation was pathetic 3 years back, 
we should stick with that? If we applied the same logic to the code base, why care 
about PRs and code reviews at all? I feel we as a community need to improve the 
documentation, starting at least with the new changes. And if the changes impact 
the core pieces of the system, it becomes even more important. If you look at PR 
118 you will realise that the person who submitted the PR is claiming that he is 
not fully aware of the changes. What do you expect me to make of that?


Even if someone volunteers to fix an issue, the person may not be sure whether 
something else will break, given the nature of the code and the current status of 
the tests. The existing VR scripts (may not be as appealing as the new .py scripts :) ) 
were stabilised over multiple releases and I am sure there must have been a lot 
of manual testing done as well. Doing the same for the new changes will take at 
least a similar amount of testing.

Given that the 4.6 release is nearing, one option would be to fix the pending ones 
and release with a disclaimer that VR-related functionality may have regressed 
since the last release (at this point no one knows what other potential issues 
may crop up).

Nope… we don’t. But that is a very general statement: we don’t know what can come 
up due to this change or any other change, mostly because we have always had bad 
test coverage, unit or integration or what have you, and because 3 years in we 
still don’t have a dedicated CI.

Or the other option would be to postpone 4.6 indefinitely, fix the 
documentation, fix the issues, and go ahead with the release once there is 
consensus on stability.

This in my view is traditional software engineering re

[GitHub] cloudstack pull request: CLOUDSTACK-8908 After copying the templat...

2015-09-28 Thread yvsubhash
GitHub user yvsubhash opened a pull request:

https://github.com/apache/cloudstack/pull/896

CLOUDSTACK-8908 After copying the template charging for that template is 
getting stopped

This happens because the zone id is not part of the query. The zone id has been 
added to the query, and unit tests have also been added.
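
The PR description only states that the zone id was missing from a query. As a hypothetical illustration of why that matters once a template exists in more than one zone (the record type and both lookup functions below are invented for this sketch and are not CloudStack's DAO or usage API), compare a lookup that filters on the template id alone with one that also filters on the zone id:

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class TemplateZoneRecord:
    """Hypothetical per-zone record for a template (not a CloudStack class)."""
    template_id: int
    zone_id: int


def find_by_template(records: Iterable[TemplateZoneRecord],
                     template_id: int) -> Optional[TemplateZoneRecord]:
    # Filters on the template only: after a copy to a second zone, this
    # returns whichever matching record happens to come first.
    return next((r for r in records if r.template_id == template_id), None)


def find_by_template_and_zone(records: Iterable[TemplateZoneRecord],
                              template_id: int,
                              zone_id: int) -> Optional[TemplateZoneRecord]:
    # Including the zone id in the filter selects the record for that zone.
    return next((r for r in records
                 if r.template_id == template_id and r.zone_id == zone_id),
                None)
```

With `records = [TemplateZoneRecord(42, 1), TemplateZoneRecord(42, 2)]`, `find_by_template(records, 42)` returns the zone 1 record even when the caller meant the copy in zone 2, whereas `find_by_template_and_zone(records, 42, 2)` returns the zone 2 record.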

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yvsubhash/cloudstack CLOUDSTACK-8908

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/cloudstack/pull/896.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #896


commit 3f99d42de9560005d6a2996f307f5dd7f45b8b4d
Author: subhash yedugundla 
Date:   2015-09-28T17:35:43Z

CLOUDSTACK-8908 After copying the template charging for that template is 
stopped






Re: Blameless post mortem

2015-09-28 Thread Bharat Kumar
Hi Wilder,

I think you are taking this the wrong way. I am not pissed because people are 
asking for tests. I am pissed with the blame game that is being played in the 
community. I do not understand why we need to cite PRs and try to blame 
others, others who are trying to fix bugs which were not introduced by them in 
the first place. I hope we do not write this type of “blameless” email which 
blames people anyway. This does not help anyone, but in turn causes a lot of pain 
to people who are actively fixing broken stuff. Bugs occur unintentionally, due 
to lack of information or misunderstanding of code, and I am sure we all have 
made our share of contributions to them at some point or other. If we 
follow this example we can do a “Blameless” postmortem of every bug in the ACS 
JIRA. Instead, we should try to minimise them by discussing and 
implementing good processes without resorting to name calling or indirect 
blaming of people. 

Regards,
Bharat.

On 28-Sep-2015, at 11:07 pm, Wilder Rodrigues  
wrote:

> Koushik,
> 
> Please, say my name! Don’t mention stuff like “the person blah blah blah”, 
> please! If you want to start pointing fingers, I can also play this game and 
> get references to the PRs which were not tested, pushed straight to master 
> (when we agreed not to do so) or got 2 LGTMs without any comments about 
> test procedures/reports. In addition, I said I don’t know the code as a 
> whole because I was not the one who wrote it. We should not be talking about 
> this.
> 
> Now, please pay some attention to the work we have done over the weekend to 
> get master stable, after it was broken because untested PRs got through. It 
> has nothing to do with lack of documentation, dude! It’s all about changing 
> code and testing it. And if the one changing it doesn’t know what is being 
> done, it’s a matter of being honest, saying so, and asking for help!
> 
> Documentation is also important, as I said before, and I’m glad to help write 
> it, for the code I wrote and the code I did not. That’s community work, from 
> my point of view at least. But if you or Bharat or Raja or anyone else got 
> pissed off with my emails (or anyone else’s) asking for tests, I can’t help 
> with that. I’m already working almost 60h per week trying to help the 
> community with many bugs, not only VR related, and getting no freaking “thank 
> you” back! And believe me, I’m not taking it personally. It’s just very 
> frustrating that there are people insisting on this thread for so long. It’s 
> about to become a pissing contest!
> 
> All the energy spent here today, trying to find people to blame, could have 
> been used to fix the blockers!
> 
> Btw, that’s my last email on this thread, but I hope that won’t be the last 
> one in the community. I still believe in ACS, despite those immature 
> discussions.
> 
> 
> Cheers,
> Wilder
> 
> 
> On 28 Sep 2015, at 18:49, Koushik Das 
> mailto:koushik@citrix.com>> wrote:
> 
> inline
> 
> On 28-Sep-2015, at 9:15 PM, Sebastien Goasguen 
> mailto:run...@gmail.com>> wrote:
> 
> Let me try to reply,
> 
> On Sep 28, 2015, at 5:17 PM, Koushik Das 
> mailto:koushik@citrix.com>> wrote:
> 
> I had asked for the documentation on persistent VR (PR # 118) changes in the 
> context of another discussion and this is what I got at that time.
> http://dev.cloudstack.apache.narkive.com/MH47etbS/discuss-out-of-band-vr-migration-should-we-reboot-vr-or-not#post39
> 
> 
> That was back in June. So one has to think that you were happy with the 
> answers you got; otherwise you would have kept pushing for documentation and 
> made noise about it at that time.
> 
> In the context of that discussion, yes, but not in terms of the overall feature. 
> Unfortunately, bugs weren't reported at that time; otherwise I would have made 
> enough noise :).
> 
> 
> Right now as I see from the discussion no one understands the changes fully 
> and there is no documentation available explaining the design/code flow etc.
> 
> Just like anyone coming to CloudStack from scratch, now and 3 years ago. 
> There is limited or terrible documentation about how the system works; people 
> pick it up on their own by looking at the code. This is how it’s done in a 
> lot of projects, and a lot of software.
> 
> This is not specific to this issue. If you really want my opinion, our 
> documentation has always sucked, our API documentation is pathetic and our 
> wiki is useless.
> 
> Are you saying that since our documentation was pathetic 3 years back, 
> we should stick with that? If we applied the same logic to the code base, why 
> care about PRs and code reviews at all? I feel we as a community need to improve 
> the documentation, starting at least with the new changes. And if the changes 
> impact the core pieces of the system, it becomes even more important. If you 
> look at PR 118 you will realise that the person who submitted the PR is 
> claiming that he is not fully aware of the changes. What do you expect

[GitHub] cloudstack pull request: CLOUDSTACK-8919: Slow UI response while l...

2015-09-28 Thread nitin-maharana
GitHub user nitin-maharana opened a pull request:

https://github.com/apache/cloudstack/pull/897

CLOUDSTACK-8919: Slow UI response while loading the list of networks in 
network tab.

Instead of searching for each network, it now searches for each zone.
For a basic zone, the security group is shown directly, because 
securitygroupsenabled is true by default.
For an advanced zone, the securitygroupsenabled option of each zone is checked. If 
any of them is true, the security group is shown.
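
The zone-level decision described above can be sketched as follows. This is a minimal Python sketch, not the UI code in the PR: it assumes each zone is a parsed `listZones` response object carrying the `networktype` (`"Basic"` or `"Advanced"`) and `securitygroupsenabled` (JSON boolean) fields, and the function name is hypothetical.

```python
from typing import Any, Iterable, Mapping


def show_security_groups(zones: Iterable[Mapping[str, Any]]) -> bool:
    """Decide whether to show the security group view (hypothetical helper).

    One check per zone instead of one lookup per network.
    """
    for zone in zones:
        if zone.get("networktype") == "Basic":
            # Basic zones have securitygroupsenabled true by default.
            return True
        if zone.get("securitygroupsenabled") is True:
            # Advanced zone with security groups enabled.
            return True
    return False
```

The number of checks now scales with the number of zones rather than the number of networks, and the loop stops at the first zone that qualifies.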

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/nitin-maharana/cloudstack CloudStack-Nitin10

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/cloudstack/pull/897.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #897


commit bee8e87ead67d3d0fd7953f5e97b9327c88db2ba
Author: Nitin Kumar Maharana 
Date:   2015-09-28T18:39:15Z

CLOUDSTACK-8919: Slow UI response while loading the list of networks in 
network tab.

Instead of searching for each network, it now searches for each zone.
For a basic zone, the security group is shown directly, because 
securitygroupsenabled is true by default.
For an advanced zone, the securitygroupsenabled option of each zone is checked. If 
any of them is true, the security group is shown.






[4.6] Master fails to add secondary storage network, deployment fails

2015-09-28 Thread Nux!
Hello,

Am testing 4.6 master with CentOS 6 HVs. 
After installing 4.6 from yum repo at 
http://jenkins.buildacloud.org/view/4.6/job/package-centos6-4.6/ and running 
the initial setup, it fails at the end because "unknwon parameters zoneid" ... :

2015-09-28 19:34:15,072 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] 
(API-Job-Executor-25:ctx-e08dd156 job-27) Executing AsyncJobVO {id:27, userId: 
2, accountId: 2, instanceType: None, instanceId: null, cmd: 
org.apache.cloudstack.api.command.admin.network.CreateStorageNetworkIpRangeCmd, 
cmdInfo: {"response":"json","ctxDetails":"{\"interface 
com.cloud.dc.Pod\":\"b1c7836b-3bae-4d83-b113-b8308cea57ab\"}","cmdEventType":"STORAGE.IP.RANGE.CREATE","ctxUserId":"2","gateway":"192.168.200.67","podid":"b1c7836b-3bae-4d83-b113-b8308cea57ab","zoneid":"d08602b2-2ec6-4fd0-9dbb-5eca2d9b7c63","startip":"192.168.200.200","vlan":"123","httpmethod":"GET","_":"1443465255029","ctxAccountId":"2","ctxStartEventId":"68","netmask":"255.255.255.0","endip":"192.168.200.222"},
 cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: 
null, initMsid: 266785867798693, completeMsid: null, lastUpdated: null, 
lastPolled: null, created: null}
2015-09-28 19:34:15,073 DEBUG [c.c.a.ApiServlet] (catalina-exec-24:ctx-def815dc 
ctx-c4ca8865) ===END===  85.13.192.198 -- GET  
command=createStorageNetworkIpRange&response=json&gateway=192.168.200.67&netmask=255.255.255.0&vlan=123&startip=192.168.200.200&endip=192.168.200.222&zoneid=d08602b2-2ec6-4fd0-9dbb-5eca2d9b7c63&podid=b1c7836b-3bae-4d83-b113-b8308cea57ab&_=1443465255029
2015-09-28 19:34:15,075 WARN  [c.c.a.d.ParamGenericValidationWorker] 
(API-Job-Executor-25:ctx-e08dd156 job-27 ctx-4198e321) Received unknown 
parameters for command createStorageNetworkIpRange. Unknown parameters : zoneid
2015-09-28 19:34:15,122 WARN  [o.a.c.a.c.a.n.CreateStorageNetworkIpRangeCmd] 
(API-Job-Executor-25:ctx-e08dd156 job-27 ctx-4198e321) Create storage network 
IP range failed
com.cloud.utils.exception.CloudRuntimeException: Unable to commit or close the connection.
at com.cloud.utils.db.TransactionLegacy.commit(TransactionLegacy.java:730)
at com.cloud.utils.db.Transaction.execute(Transaction.java:46)
at com.cloud.network.StorageNetworkManagerImpl.createIpRange(StorageNetworkManagerImpl.java:229)
at org.apache.cloudstack.api.command.admin.network.CreateStorageNetworkIpRangeCmd.execute(CreateStorageNetworkIpRangeCmd.java:118)
at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:150)
at com.cloud.api.ApiAsyncJobDispatcher.runJob(ApiAsyncJobDispatcher.java:108)
at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:537)
at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:494)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.sql.SQLException: Connection is closed.
at org.apache.commons.dbcp.PoolingDataSource$PoolGuardConnectionWrapper.checkOpen(PoolingDataSource.java:185)
at org.apache.commons.dbcp.PoolingDataSource$PoolGuardConnectionWrapper.commit(PoolingDataSource.java:210)
at com.cloud.utils.db.TransactionLegacy.commit(TransactionLegacy.java:722)

Is anyone aware of this bug, or should I submit a new one in Jira?

Lucian

--
Sent from the Delta quadrant using Borg technology!

Nux!
www.nux.ro
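
The WARN line from `ParamGenericValidationWorker` in the log above reports `zoneid` as an unknown parameter for `createStorageNetworkIpRange`. Judging by its message, that worker compares the request parameters against those the command accepts. Below is a minimal Python sketch of such a check, run against the query string from the `ApiServlet` log line; both parameter sets are assumptions made for this sketch (based on the public parameter list of `createStorageNetworkIpRange`), not read from the CloudStack source, and the function name is hypothetical. The warning is only logged at WARN level: the error reported as "Create storage network IP range failed" is the `CloudRuntimeException` caused by `java.sql.SQLException: Connection is closed`.

```python
from urllib.parse import parse_qs

# Assumed parameters accepted by createStorageNetworkIpRange (hypothetical
# list for this sketch, not taken from the CloudStack source).
DECLARED = frozenset({"podid", "startip", "endip", "gateway", "netmask", "vlan"})
# Assumed parameters handled by the API layer itself ("_" is the UI's
# cache-busting timestamp).
FRAMEWORK = frozenset({"command", "response", "_"})


def unknown_params(query: str,
                   declared: frozenset = DECLARED,
                   framework: frozenset = FRAMEWORK) -> set:
    """Return request parameters accounted for by neither the command nor the framework."""
    keys = set(parse_qs(query, keep_blank_values=True))
    return keys - declared - framework


# Query string copied from the ApiServlet log line above.
QUERY = ("command=createStorageNetworkIpRange&response=json"
         "&gateway=192.168.200.67&netmask=255.255.255.0&vlan=123"
         "&startip=192.168.200.200&endip=192.168.200.222"
         "&zoneid=d08602b2-2ec6-4fd0-9dbb-5eca2d9b7c63"
         "&podid=b1c7836b-3bae-4d83-b113-b8308cea57ab&_=1443465255029")
```

With the assumed sets, `unknown_params(QUERY)` returns `{'zoneid'}`, matching the logged warning.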


Re: Blameless post mortem

2015-09-28 Thread Sebastien Goasguen
ok so :)

While fixing your broken stuff , he broke some other stuff which you attempted 
to fix but broke other stuff doing it, so he decided to fix your broken stuff 
that was supposed to fix the broken stuff he did to improve old stuff.

Great 0-0, ball in the middle.

Let’s fix some blockers, add some tests and release

And thank you all :)

-sebastien

> On Sep 28, 2015, at 7:54 PM, Bharat Kumar  wrote:
> 
> Hi Wilder,
> 
> I think you are taking this the wrong way. I am not pissed because people 
> are asking for tests. I am pissed with the blame game that is being played 
> in the community. I do not understand why we need to cite PRs and try to 
> blame others, others who are trying to fix bugs which were not introduced by 
> them in the first place. I hope we do not write this type of “blameless” email 
> which blames people anyway. This does not help anyone, but in turn causes a lot 
> of pain to people who are actively fixing broken stuff. Bugs occur 
> unintentionally, due to lack of information or misunderstanding of code, and 
> I am sure we all have made our share of contributions to them at some point 
> or other. If we follow this example we can do a “Blameless” 
> postmortem of every bug in the ACS JIRA. Instead, we should try to 
> minimise them by discussing and implementing good processes without resorting 
> to name calling or indirect blaming of people. 
> 
> Regards,
> Bharat.
> 
> On 28-Sep-2015, at 11:07 pm, Wilder Rodrigues  
> wrote:
> 
>> Koushik,
>> 
>> Please, say my name! Don’t mention stuff like “the person blah blah blah”, 
>> please! If you want to start pointing fingers, I can also play this game and 
>> get references to the PRs which were not tested, pushed straight to master 
>> (when we agreed on not doing so) or got 2 LGTM without any comments about 
>> test procedure/reports. In addition, I said I’m don’t know the code as a 
>> whole because I was not the one writing it. We should not be talking about 
>> this.
>> 
>> Now, please pay some attention to the work we have done over the weekend to 
>> get master stable, after it was broken because not tested PRs got through. 
>> It has nothing to do with lack of documentation, dude! It’s all about 
>> changing code and testing it. And if the one changing it doesn’t know what 
>> is being done, it’s a matter of being honest and saying it, asking for help!
>> 
>> Documentation is also important, as I said before, and I’m glad to help 
>> doing it, for the code I wrote and the one I did not. That’s community work, 
>> from my point of view at least. But if you or Bharat or Raja or anyone else
>> got pissed off with my emails, or anyone else's, asking for tests, I can’t
>> help with that. I’m already working almost 60h per week trying to help the
>> community with many bugs, not only VR related, and getting no freaking
>> “thank you” back! And believe me, I’m not taking it personally. It’s just very
>> frustrating that there are people insisting on this thread for so long. It’s 
>> about to become a pissing contest!
>> 
>> All the energy spent here today, trying to find people to blame, could have 
>> been used to fix the blockers!
>> 
>> Btw, that’s my last email on this thread, but I hope that won’t be the last 
>> one in the community. I still believe in ACS, despite those immature 
>> discussions.
>> 
>> 
>> Cheers,
>> Wilder
>> 
>> 
>> On 28 Sep 2015, at 18:49, Koushik Das 
>> mailto:koushik@citrix.com>> wrote:
>> 
>> inline
>> 
>> On 28-Sep-2015, at 9:15 PM, Sebastien Goasguen 
>> mailto:run...@gmail.com>> wrote:
>> 
>> Let me try to reply,
>> 
>> On Sep 28, 2015, at 5:17 PM, Koushik Das 
>> mailto:koushik@citrix.com>> wrote:
>> 
>> I had asked for the documentation on persistent VR (PR # 118) changes in the 
>> context of another discussion and this is what I got at that time.
>> http://dev.cloudstack.apache.narkive.com/MH47etbS/discuss-out-of-band-vr-migration-should-we-reboot-vr-or-not#post39
>> 
>> 
>> That was back in June. So one has to think that you were happy with the
>> answers you got; otherwise you would have kept pushing for documentation and
>> made noise about it at that time.
>> 
>> In the context of that discussion yes, but not in terms of overall feature. 
>> Unfortunately, bugs weren't reported at that time; otherwise I would have made
>> enough noise :).
>> 
>> 
>> Right now as I see from the discussion no one understands the changes fully 
>> and there is no documentation available explaining the design/code flow etc.
>> 
>> Just like anyone coming to CloudStack from scratch, now and 3 years ago. 
>> There is limited or terrible documentation about how the system works, 
>> people pick it up on their own by looking at the code. This is how it’s done
>> in a lot of projects, and a lot of software.
>> 
>> This is not specific to this issue, if you really want my opinion, our 
>> documentation has always sucked, our api documentation is pathetic and our 
>> wiki is useless.
>> 
>> Are 

CCC Dublin one week away

2015-09-28 Thread Sebastien Goasguen
Hi everyone,

CCC Dublin is next week, on the 8th and 9th.

There is still time to register and tell your friends to come.

http://events.linuxfoundation.org/events/cloudstack-collaboration-conference-europe

There are one or two spots still available on the schedule. If you have something
exciting to talk about, please drop me a note.

-Sebastien

CloudStack 4.3.2 announced?

2015-09-28 Thread Ron Wheeler

I am running 4.5.2 on Centos7.
Thanks to all who posted helpful hints on the ML and on private blogs.

When I click on "Help" I get a page that has a "Latest Announcement",
"Announcing Apache CloudStack 4.3.2", on the left side of the page, and a
"Get CloudStack" notice saying that Apache CloudStack 4.5.2 is out!

Underneath that is an announcement about 4.4.1?

Is the 4.3.2 just a typo?

Ron

--
Ron Wheeler
President
Artifact Software Inc
email: rwhee...@artifact-software.com
skype: ronaldmwheeler
phone: 866-970-2435, ext 102



Contradictory host configuration guidance in docs.

2015-09-28 Thread Ron Wheeler

http://docs.cloudstack.apache.org/en/master/concepts.html#what-is-apache-cloudstack
In "About Clusters" it says:
"The hosts in a cluster all have identical hardware, run the same 
hypervisor, are on the same subnet, and access the same shared primary 
storage."


Later in the "About Hosts" it says
"May have different capacities (different CPU speeds, different amounts 
of RAM, etc.), although the hosts within a cluster must all be homogeneous".


The use of "identical" and "homogeneous" to describe hardware is
confusing.
IMHO you don't have "identical" hardware if you have different "CPU 
Speeds" and "different amounts of RAM" and different "etc."


What exactly should be said about Cluster hosts? Is it sufficient to 
have the same CPU family?

How far can you stray in the same family?
Can a host with an AMD Opteron 6300 be in the same cluster as one based 
on the AMD Opteron 6200?


Is there any way to clarify these 2 sections on the same page and
replace "identical" and "homogeneous" with a clearer description of what
is required?
The cluster comment in the Host section should probably be removed or 
replaced with a reference to About Clusters to avoid having duplicate  
and possibly conflicting information.


Ron





Re: Contradictory host configuration guidance in docs.

2015-09-28 Thread Rafael Weingärtner
If you are thinking of using Xen, this link might provide you with some
clarification about CPU requirements.
http://support.citrix.com/article/CTX123491?_ga=1.929334.668244781.1422273077

As far as I understand, when creating a cluster you should match CPUs. The
amount of memory in each host does not need to match.

On Mon, Sep 28, 2015 at 6:38 PM, Ron Wheeler  wrote:



-- 
Rafael Weingärtner


Re: [4.6] Master fails to add secondary storage network, deployment fails [Unknown parameters : zoneid]

2015-09-28 Thread Nux!
Any ideas?

--
Sent from the Delta quadrant using Borg technology!

Nux!
www.nux.ro

- Original Message -
> From: "Nux!" 
> To: "dev" 
> Sent: Monday, 28 September, 2015 19:39:43
> Subject: [4.6] Master fails to add secondary storage network, deployment fails

> Hello,
> 
> Am testing 4.6 master with CentOS 6 HVs.
> After installing 4.6 from the yum repo at
> http://jenkins.buildacloud.org/view/4.6/job/package-centos6-4.6/ and running
> the initial setup, it fails at the end because of "unknown parameters zoneid":
> 
> 2015-09-28 19:34:15,072 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
> (API-Job-Executor-25:ctx-e08dd156 job-27) Executing AsyncJobVO {id:27, userId:
> 2, accountId: 2, instanceType: None, instanceId: null, cmd:
> org.apache.cloudstack.api.command.admin.network.CreateStorageNetworkIpRangeCmd,
> cmdInfo: {"response":"json","ctxDetails":"{\"interface
> com.cloud.dc.Pod\":\"b1c7836b-3bae-4d83-b113-b8308cea57ab\"}","cmdEventType":"STORAGE.IP.RANGE.CREATE","ctxUserId":"2","gateway":"192.168.200.67","podid":"b1c7836b-3bae-4d83-b113-b8308cea57ab","zoneid":"d08602b2-2ec6-4fd0-9dbb-5eca2d9b7c63","startip":"192.168.200.200","vlan":"123","httpmethod":"GET","_":"1443465255029","ctxAccountId":"2","ctxStartEventId":"68","netmask":"255.255.255.0","endip":"192.168.200.222"},
> cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result:
> null, initMsid: 266785867798693, completeMsid: null, lastUpdated: null,
> lastPolled: null, created: null}
> 2015-09-28 19:34:15,073 DEBUG [c.c.a.ApiServlet] 
> (catalina-exec-24:ctx-def815dc
> ctx-c4ca8865) ===END===  85.13.192.198 -- GET
> command=createStorageNetworkIpRange&response=json&gateway=192.168.200.67&netmask=255.255.255.0&vlan=123&startip=192.168.200.200&endip=192.168.200.222&zoneid=d08602b2-2ec6-4fd0-9dbb-5eca2d9b7c63&podid=b1c7836b-3bae-4d83-b113-b8308cea57ab&_=1443465255029
> 2015-09-28 19:34:15,075 WARN  [c.c.a.d.ParamGenericValidationWorker]
> (API-Job-Executor-25:ctx-e08dd156 job-27 ctx-4198e321) Received unknown
> parameters for command createStorageNetworkIpRange. Unknown parameters : 
> zoneid
> 2015-09-28 19:34:15,122 WARN  [o.a.c.a.c.a.n.CreateStorageNetworkIpRangeCmd]
> (API-Job-Executor-25:ctx-e08dd156 job-27 ctx-4198e321) Create storage network
> IP range failed
> com.cloud.utils.exception.CloudRuntimeException: Unable to commit or close the connection.
>   at com.cloud.utils.db.TransactionLegacy.commit(TransactionLegacy.java:730)
>   at com.cloud.utils.db.Transaction.execute(Transaction.java:46)
>   at com.cloud.network.StorageNetworkManagerImpl.createIpRange(StorageNetworkManagerImpl.java:229)
>   at org.apache.cloudstack.api.command.admin.network.CreateStorageNetworkIpRangeCmd.execute(CreateStorageNetworkIpRangeCmd.java:118)
>   at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:150)
>   at com.cloud.api.ApiAsyncJobDispatcher.runJob(ApiAsyncJobDispatcher.java:108)
>   at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:537)
>   at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
>   at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
>   at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
>   at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
>   at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
>   at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:494)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.sql.SQLException: Connection is closed.
>   at org.apache.commons.dbcp.PoolingDataSource$PoolGuardConnectionWrapper.checkOpen(PoolingDataSource.java:185)
>   at org.apache.commons.dbcp.PoolingDataSource$PoolGuardConnectionWrapper.commit(PoolingDataSource.java:210)
>   at com.cloud.utils.db.TransactionLegacy.commit(TransactionLegacy.java:722)
> 
> Anyone aware of this bug or should I submit a new one in Jira?
> 
> Lucian
> 
> --
> Sent from the Delta quadrant using Borg technology!
> 
> Nux!
> www.nux.ro
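The WARN from ParamGenericValidationWorker says the request carried a `zoneid` that createStorageNetworkIpRange does not declare. As a rough illustration only (this is not CloudStack's validator), here is a minimal Java sketch that compares the GET query string from the ApiServlet log line against a list of accepted parameters. Both the `ACCEPTED` and `IGNORED` sets are assumptions hard-coded for the example. Note that the warning is logged separately from the later CloudRuntimeException ("Connection is closed"); the sketch says nothing about that failure.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class UnknownParamCheck {

    // ASSUMPTION: parameters accepted by createStorageNetworkIpRange, as listed in
    // the public API reference; hard-coded here, not read from CloudStack.
    static final Set<String> ACCEPTED = new HashSet<>(Arrays.asList(
            "podid", "gateway", "netmask", "startip", "endip", "vlan"));

    // ASSUMPTION: request-level parameters a validator would skip: the command
    // name, the response format and the UI's "_" cache-buster.
    static final Set<String> IGNORED = new HashSet<>(Arrays.asList(
            "command", "response", "_"));

    // Splits a raw query string into name -> value pairs (no URL decoding).
    static Map<String, String> parseQuery(String query) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            if (eq < 0) {
                params.put(pair, "");
            } else {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return params;
    }

    // Returns, sorted, every parameter name that is neither accepted nor ignored.
    static Set<String> unknownParams(String query) {
        Set<String> unknown = new TreeSet<>();
        for (String name : parseQuery(query).keySet()) {
            if (!ACCEPTED.contains(name) && !IGNORED.contains(name)) {
                unknown.add(name);
            }
        }
        return unknown;
    }

    public static void main(String[] args) {
        // The query string from the ApiServlet DEBUG line in the log.
        String query = "command=createStorageNetworkIpRange&response=json"
                + "&gateway=192.168.200.67&netmask=255.255.255.0&vlan=123"
                + "&startip=192.168.200.200&endip=192.168.200.222"
                + "&zoneid=d08602b2-2ec6-4fd0-9dbb-5eca2d9b7c63"
                + "&podid=b1c7836b-3bae-4d83-b113-b8308cea57ab&_=1443465255029";
        System.out.println("Unknown parameters: " + unknownParams(query));
    }
}
```

Run against the logged query, only `zoneid` falls outside the assumed sets, which matches the warning text.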


Re: [4.6] Master fails to add secondary storage network, deployment fails [Unknown parameters : zoneid]

2015-09-28 Thread Boris Schrijver
Hi Nux,

The only thing I can say right now is that the Jenkins job hasn't run for the
past week, so maybe it has already been fixed, maybe not. Could you try
packaging it yourself and deploying again? That way you will know whether the
problem still persists. And if it does, please file a Jira ticket!

Best regards,

Boris Schrijver

TEL: +31633784542
MAIL: bo...@pcextreme.nl

> 
> On September 28, 2015 at 11:51 PM Nux!  wrote:
> 
> 
> Any ideas?
> 
> --
> Sent from the Delta quadrant using Borg technology!
> 
> Nux!
> www.nux.ro
> 
> - Original Message -
> > From: "Nux!" 
> > To: "dev" 
> > Sent: Monday, 28 September, 2015 19:39:43
> > Subject: [4.6] Master fails to add secondary storage network, deployment
> > fails
> 
> > Hello,
> >
> > Am testing 4.6 master with CentOS 6 HVs.
> > After installing 4.6 from yum repo at
> > http://jenkins.buildacloud.org/view/4.6/job/package-centos6-4.6/ and
> > running
> > the initial setup, it fails at the end because "unknwon parameters
> > zoneid" ...
> > :

Re: [4.6] Master fails to add secondary storage network, deployment fails [Unknown parameters : zoneid]

2015-09-28 Thread Nux!
Thanks Boris, I'll package this tomorrow and test again.

Lucian


- Original Message -
> From: "boris" 
> To: "dev" 
> Sent: Monday, 28 September, 2015 23:16:35
> Subject: Re: [4.6] Master fails to add secondary storage network, deployment 
> fails [Unknown parameters : zoneid]

> Hi Nux,
> 
> The only thing I can say right now; that Jenkins job didn't run for the past
> week. So maybe it has already been fixed, maybe not. Could you try package it
> yourself and deploy again? That way you know the problem still persists. And 
> if
> it does, please file a Jira ticket!
> 
> Best regards,
> 
> Boris Schrijver
> 
> TEL: +31633784542
> MAIL: bo...@pcextreme.nl
> 
>> 
>> On September 28, 2015 at 11:51 PM Nux!  wrote:
>> 
>> 
>> Any ideas?
>> 
>> --
>> Sent from the Delta quadrant using Borg technology!
>> 
>> Nux!
>> www.nux.ro
>> 
>> - Original Message -
>> > From: "Nux!" 
>> > To: "dev" 
>> > Sent: Monday, 28 September, 2015 19:39:43
>> > Subject: [4.6] Master fails to add secondary storage network, 
>> deployment
>> > fails
>> 
>> > Hello,
>> >
>> > Am testing 4.6 master with CentOS 6 HVs.
>> > After installing 4.6 from yum repo at
>> > http://jenkins.buildacloud.org/view/4.6/job/package-centos6-4.6/ and
>> > running
>> > the initial setup, it fails at the end because "unknwon parameters
>> > zoneid" ...
>> > :

Jenkins - slaves reporting "out of space"

2015-09-28 Thread Raja Pullela
Hi,

Following is the status of machines on http://jenkins.buildacloud.org/


* cca-slave-01 - Dead

* cca-slave-02 - Dead

* cca-slave-03 - Dead

* coohq-slave-01 - Dead

* msaz-slave-01 - Dead

* msaz-slave-02 - Dead

* test-infra-common - Dead

Can someone take a look at freeing up some space on these machines?
Also, is there a list (of people) on CWIKI who have permissions to do this?

best,
Raja

java.io.IOException: No space left on device
	at java.io.FileOutputStream.writeBytes(Native Method)
	at java.io.FileOutputStream.write(FileOutputStream.java:318)
	at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
	at sun.nio.cs.StreamEncoder.implClose(StreamEncoder.java:316)
	at sun.nio.cs.StreamEncoder.close(StreamEncoder.java:149)
	at java.io.OutputStreamWriter.close(OutputStreamWriter.java:233)
	at java.io.BufferedWriter.close(BufferedWriter.java:266)
	at hudson.util.AtomicFileWriter.close(AtomicFileWriter.java:94)
	at hudson.util.AtomicFileWriter.commit(AtomicFileWriter.java:109)
	at hudson.util.TextFile.write(TextFile.java:121)
	at hudson.model.Job.saveNextBuildNumber(Job.java:274)
	at hudson.model.Job.assignBuildNumber(Job.java:332)
	at hudson.model.Run.<init>(Run.java:286)
	at hudson.model.AbstractBuild.<init>(AbstractBuild.java:167)
	at hudson.model.Build.<init>(Build.java:92)
	at hudson.model.FreeStyleBuild.<init>(FreeStyleBuild.java:34)
	at sun.reflect.GeneratedConstructorAccessor9915.newInstance(Unknown Source)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance
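The trace shows Jenkins failing with `java.io.IOException: No space left on device` while saving the next build number, i.e. before any build work starts. Until someone with access cleans up the slaves, a pre-build guard could fail fast with a clearer message. Below is a minimal Java sketch using `java.io.File.getUsableSpace()`; the class name and the 1 GiB threshold are hypothetical, and this is not a Jenkins feature or plugin.

```java
import java.io.File;

class DiskSpaceGuard {

    // HYPOTHETICAL threshold: treat a slave as unusable below 1 GiB free.
    static final long MIN_FREE_BYTES = 1L << 30;

    // True when the partition holding `dir` has at least `minFreeBytes` bytes
    // available to this JVM. File.getUsableSpace() returns 0 when the path does
    // not name a partition, so a missing directory is reported as "not enough".
    static boolean hasEnoughSpace(File dir, long minFreeBytes) {
        return dir.getUsableSpace() >= minFreeBytes;
    }

    public static void main(String[] args) {
        // Directory to check; defaults to the current working directory.
        File workspace = new File(args.length > 0 ? args[0] : ".");
        long usableMiB = workspace.getUsableSpace() / (1024L * 1024L);
        if (hasEnoughSpace(workspace, MIN_FREE_BYTES)) {
            System.out.println("OK: " + usableMiB + " MiB usable at "
                    + workspace.getAbsolutePath());
        } else {
            System.out.println("LOW: " + usableMiB + " MiB usable at "
                    + workspace.getAbsolutePath()
                    + "; free up space before starting a build");
        }
    }
}
```

A check like this only reports the problem; the old workspaces and build records still have to be removed by someone with permissions on the slaves.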



[GitHub] cloudstack pull request: CLOUDSTACK-8856 Primary Storage Used(type...

2015-09-28 Thread bvbharatk
Github user bvbharatk commented on the pull request:

https://github.com/apache/cloudstack/pull/865#issuecomment-143944642
  
The Travis error above is caused by a failure to do the git clone; the
relevant logs are below.


system_info
Build system information
Build language: java
Build image provisioning date and time
Wed Feb  4 18:22:50 UTC 2015
Operating System Details
Distributor ID: Ubuntu
Description:Ubuntu 12.04 LTS
Release:12.04
Codename:   precise
Linux Version
2.6.32-042stab090.5
Cookbooks Version
23bb455 https://github.com/travis-ci/travis-cookbooks/tree/23bb455
GCC version
gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
Copyright (C) 2011 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
LLVM version
clang version 3.4 (tags/RELEASE_34/final)
Target: x86_64-unknown-linux-gnu
Thread model: posix
Pre-installed Ruby versions
ruby-1.9.3-p551
Pre-installed Node.js versions
v0.10.36
Pre-installed Go versions
1.4.1
Redis version
redis-server 2.8.19
riak version
2.0.2
MongoDB version
MongoDB 2.4.12
CouchDB version
couchdb 1.6.1
Neo4j version
1.9.4
Cassandra version
2.0.9
RabbitMQ Version
3.4.3
ElasticSearch version
1.4.0
Installed Sphinx versions
2.0.10
2.1.9
2.2.6
Default Sphinx version
2.2.6
Installed Firefox version
firefox 31.0esr
PhantomJS version
1.9.8
ant -version
Apache Ant(TM) version 1.8.2 compiled on December 3 2011
mvn -version
Apache Maven 3.2.5 (12a6b3acb947671f09b81f49094c53f426d8cea1; 
2014-12-14T17:29:23+00:00)
Maven home: /usr/local/maven
Java version: 1.7.0_76, vendor: Oracle Corporation
Java home: /usr/lib/jvm/java-7-oracle/jre
Default locale: en, platform encoding: UTF-8
OS name: "linux", version: "2.6.32-042stab090.5", arch: "amd64", family: 
"unix"
75.16s$ git clone --depth=50 https://github.com/apache/cloudstack.git 
apache/cloudstack
Cloning into 'apache/cloudstack'...
fatal: unable to access 'https://github.com/apache/cloudstack.git/': 
Couldn't resolve host 'github.com'
The command "eval git clone --depth=50 
https://github.com/apache/cloudstack.git apache/cloudstack" failed. Retrying, 2 
of 3.
Cloning into 'apache/cloudstack'...
fatal: unable to access 'https://github.com/apache/cloudstack.git/': 
Couldn't resolve host 'github.com'
The command "eval git clone --depth=50 
https://github.com/apache/cloudstack.git apache/cloudstack" failed. Retrying, 3 
of 3.
Cloning into 'apache/cloudstack'...
fatal: unable to access 'https://github.com/apache/cloudstack.git/': 
Couldn't resolve host 'github.com'
The command "eval git clone --depth=50 
https://github.com/apache/cloudstack.git apache/cloudstack" failed 3 times.
The command "git clone --depth=50 https://github.com/apache/cloudstack.git 
apache/cloudstack" failed and exited with 128 during .
Your build has been stopped.
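The log shows the clone step being retried up to three times ("Retrying, 2 of 3") before the build gives up, with every attempt failing on the same DNS error ("Couldn't resolve host 'github.com'"). The bounded-retry pattern behind that behaviour can be sketched in Java as follows; `BoundedRetry` and `retry` are hypothetical names, not Travis CI code. Retrying only helps with transient failures: when DNS stays broken for the whole window, as here, every attempt fails and the last error is rethrown.

```java
import java.net.UnknownHostException;
import java.util.concurrent.Callable;

class BoundedRetry {

    // Calls `action` up to `maxAttempts` times. Returns the first successful
    // result; if every attempt throws, rethrows the exception from the last one.
    static <T> T retry(int maxAttempts, Callable<T> action) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;
                System.out.println("Attempt " + attempt + " of " + maxAttempts
                        + " failed: " + e.getMessage());
            }
        }
        // Non-null here: at least one attempt ran and all of them threw.
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Simulated flaky step: fails twice with a DNS-style error, then succeeds.
        int[] calls = {0};
        String result = retry(3, () -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new UnknownHostException("github.com");
            }
            return "cloned";
        });
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```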




CloudStack networking documentation

2015-09-28 Thread Ron Wheeler

http://docs.cloudstack.apache.org/en/master/concepts.html#what-is-apache-cloudstack

In the opening paragraph of "About Physical Networks" it says
"The network corresponds to a NIC on the hypervisor host."

In Basic Zone Network Traffic Types it says that there is only one 
physical network in the zone but later on in the middle of discussing 
the various traffic types there is a note
"We strongly recommend the use of separate NICs for management traffic 
and guest traffic".

There is no reason given for this statement.
No suggestion about what tradeoff is being made if you go with a single 
NIC.

Is it performance or security concerns that prompts this?

It might be helpful to describe what is meant by a single physical 
network with multiple NICs.


The note itself appears to be out of place since it is in the middle of
some definitions rather than in a discussion block.
It also looks like it applies to Advanced Networks but is missing in 
that section.


"CIDR of the pod" is used without any description of what this is and 
how it gets setup.

Might be helpful to add a sentence or two about this. It seems important.
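In general, a CIDR block is written as a network address plus a prefix length (for example `192.168.200.0/24`), and it can be derived from any address in the subnet together with its netmask. A minimal Java sketch of that derivation follows, assuming (the docs do not confirm this) that "the CIDR of the pod" means the subnet described by the pod's gateway and netmask. The class name and the example addresses are illustrative only.

```java
class CidrExample {

    // Parses dotted-quad IPv4 text such as "192.168.200.67" into a 32-bit value
    // held in a long, so it stays unsigned.
    static long toLong(String ip) {
        String[] parts = ip.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("not an IPv4 address: " + ip);
        }
        long value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("octet out of range: " + ip);
            }
            value = (value << 8) | octet;
        }
        return value;
    }

    // Formats a 32-bit value back into dotted-quad text.
    static String toDotted(long value) {
        return ((value >> 24) & 0xFF) + "." + ((value >> 16) & 0xFF) + "."
                + ((value >> 8) & 0xFF) + "." + (value & 0xFF);
    }

    // Derives CIDR notation from any address in the subnet (e.g. the gateway)
    // plus its netmask. Assumes a contiguous netmask, so the prefix length is
    // simply the number of 1 bits in the mask.
    static String cidr(String ip, String netmask) {
        long mask = toLong(netmask);
        int prefixLength = Long.bitCount(mask);
        long network = toLong(ip) & mask;
        return toDotted(network) + "/" + prefixLength;
    }

    // True when `ip` lies inside the CIDR block `block` (e.g. "192.168.200.0/24").
    static boolean contains(String block, String ip) {
        String[] parts = block.split("/");
        int prefixLength = Integer.parseInt(parts[1]);
        long mask = prefixLength == 0
                ? 0L
                : (0xFFFFFFFFL << (32 - prefixLength)) & 0xFFFFFFFFL;
        return (toLong(parts[0]) & mask) == (toLong(ip) & mask);
    }

    public static void main(String[] args) {
        // Example values only: a gateway and netmask such as a pod might be given.
        String block = cidr("192.168.200.67", "255.255.255.0");
        System.out.println(block);
        System.out.println(contains(block, "192.168.200.200"));
        System.out.println(contains(block, "192.168.201.5"));
    }
}
```

A containment check like `contains` is the kind of rule a doc sentence could state plainly: addresses allocated in the pod are expected to fall inside this block.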

"guest virtual router" is another concept that seems important but has 
no definition or discussion.


Ron




[GitHub] cloudstack pull request: CLOUDSTACK-89027 Restart Network fails in...

2015-09-28 Thread bvbharatk
GitHub user bvbharatk opened a pull request:

https://github.com/apache/cloudstack/pull/898

CLOUDSTACK-89027 Restart Network fails in EIP/ELB zone

The restart network was failing when using an external load balancer. The
failure was caused by a NumberFormatException. When
BroadcastDomainType.getValue(guestConfig.getBroadcastUri()) is executed, it
returns the string "untagged". We were trying to parse this as a long, so a
NumberFormatException was thrown.

This happens only when the VLAN URI is vlan://untagged. In other cases, where
there is a number (the VLAN tag) instead of "untagged", this used to succeed.
Although we were converting the value to a long, we were not really using it:
we converted it to a long and then back to a string when creating the
IpAddressTo. So I removed this unnecessary conversion to fix the issue at hand.


I did a manual restart of the network and checked for this number format
exception in an EIP/ELB setup.
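The description above explains why the round trip through `long` fails only for `vlan://untagged`. A minimal Java sketch of the before/after pattern follows, using a hypothetical `vlanValue` helper in place of `BroadcastDomainType.getValue`; it illustrates the described change and is not the actual diff in #898.

```java
class BroadcastUriExample {

    // HYPOTHETICAL stand-in for BroadcastDomainType.getValue(uri): returns what
    // follows "vlan://", e.g. "123" or "untagged". Not CloudStack's implementation.
    static String vlanValue(String broadcastUri) {
        String scheme = "vlan://";
        if (!broadcastUri.startsWith(scheme)) {
            throw new IllegalArgumentException("not a vlan URI: " + broadcastUri);
        }
        return broadcastUri.substring(scheme.length());
    }

    // Pattern described in the PR: parse to long, then convert back to a string.
    // Long.parseLong throws NumberFormatException for "untagged".
    static String viaLong(String broadcastUri) {
        long tag = Long.parseLong(vlanValue(broadcastUri));
        return String.valueOf(tag);
    }

    // Pattern after the fix, as described: keep the value as a string, since it
    // is only ever used as text.
    static String asString(String broadcastUri) {
        return vlanValue(broadcastUri);
    }

    public static void main(String[] args) {
        System.out.println(asString("vlan://123"));
        System.out.println(asString("vlan://untagged"));
        System.out.println(viaLong("vlan://123"));
        try {
            viaLong("vlan://untagged");
        } catch (NumberFormatException e) {
            System.out.println("viaLong failed: " + e.getMessage());
        }
    }
}
```

Both paths agree for numeric tags, so dropping the conversion changes behaviour only in the case that used to throw.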

 


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bvbharatk/cloudstack CLOUDSTACK-8902

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/cloudstack/pull/898.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #898


commit 3c61746dcec5025d10fda0f85c9157f69f58b15d
Author: Bharat Kumar 
Date:   2015-09-23T08:35:38Z

CLOUDSTACK-89027 Restart Network fails in EIP/ELB zone






OVS Plugin documentation issues

2015-09-28 Thread Ron Wheeler
On the page 
http://docs.cloudstack.apache.org/en/master/networking/ovs-plugin.html

"Configuring the OVS Plugin"
starts with
"Prerequisites"
which opens with this sentence
"Before enabling the OVS plugin the hypervisor needs to be install 
OpenvSwitch."

which needs to be fixed.
Does "Before enabling the OVS plugin, OpenvSwitch must be installed on
the hypervisor." express the idea correctly?


"Default, XenServer has already installed OpenvSwitch." might be trying
to say "XenServer has OpenvSwitch installed by default."


Ron




OVS documentation

2015-09-28 Thread Ron Wheeler
http://docs.cloudstack.apache.org/en/master/networking/ovs-plugin.html 
has the following sentence:
"CentOS 6.4 and OpenvSwitch 1.10 are recommended." These seem to be very 
old. Are they really the recommended versions?


Ron




Re: OVS documentation

2015-09-28 Thread Nguyen Anh Tu
Dear Ron,

It's old I believe.

--Tuna

On Tue, Sep 29, 2015 at 1:05 PM, Ron Wheeler  wrote:



Re: Blameless post mortem

2015-09-28 Thread Sebastien Goasguen

> On Sep 28, 2015, at 7:22 AM, Sanjeev N  wrote:
> 
> I have a concern here. Some of us are actively involved in reviewing the
> PRs related to marvin tests (enhancing existing tests/adding new tests). If
> we have to test a PR, it requires an environment to be created with actual
> resources, and this is going to take a lot of time. Some of the tests can run
> on the simulator, but most of the tests require real hardware. The PR
> submitter is already testing and submitting the test results along with the
> PR.

In lots of cases we don’t see those test results. 
We should make sure we do or at a minimum explain what tests we did.

> So is it required for reviewers to test these PRs?
> 

If you LGTM a PR, explain why and what tests you did.
Just “LGTM” is not enough.

> On Sat, Sep 26, 2015 at 1:49 PM, sebgoa  wrote:
> 
>> Remi, thanks for the detailed post-mortem, it's a good read and great
>> learning.
>> I hope everyone reads it.
>> 
>> The one thing to emphasize is that we now have a very visible way to get
>> code into master and we have folks investing time to provide reviews
>> (great); now we need the submitters to do their due diligence and answer
>> all comments in the reviews.
>> 
>> In another project I work on, nothing can be added to the code without
>> unit tests. I think we could go down the route of asking for new
>> integration tests and unit tests for every change; without them, the PR
>> does not get merged. But let's digest your post-mortem and we can discuss
>> after 4.6.0.
>>
>> I see that you reverted one commit that was not made by you; that's great.
>> 
>> Let's focus on the blockers now, everything else can wait.
>> 
>> The big bonus of doing what we are doing is that once 4.6.0 is out, we can
>> merge PRs again (assuming they are properly rebased and tested) and we can
>> release 4.6.1 really quickly after.
>> 
>> -sebastien
>> 
>> On Sep 25, 2015, at 9:51 PM, Remi Bergsma 
>> wrote:
>> 
>>> Hi all,
>>> 
>>> This mail is intended to be blameless. We need to learn something from
>> it. That's why I left out who exactly did what because it’s not relevant.
>> There are multiple examples but it's about the why. Let's learn from this
>> without blaming anyone.
>>> 
>>> We know we need automated testing. We have integration tests, but we are
>> unable to run all of them on every Pull Request we receive. If we had
>> that in place, it'd be much easier to spot errors, regressions and so on.
>> It'd also be more rewarding to write more tests.
>>> 
>>> Unfortunately we're not there yet. So, we need to do something else
>> until we get there. If we do nothing, we know we will have many issues,
>> and a master that breaks on a regular basis is the most frustrating
>> thing. We said we'd use Pull Requests with at least two humans to review
>> and give their OK for a Pull Request, in the form of an LGTM: Looks Good
>> To Me. OK, so the LGTMs are there because we have no automated testing.
>> Keep that in mind. You are supposed to substitute for automated testing
>> until it's there.
>>> 
>>> Since we started doing this, master has become a lot more stable. But
>> every now and then we still have issues. Let's look at how we do manual
>> reviews. Again, this is not to blame anyone. It's to open our eyes and
>> make us realise what we're doing and what results we get out of that.
>>> 
>>> 
>>> Example Pull Request #784:
>>> Title: CLOUDSTACK-8799 fixed the default routes
>>> 
>>> That's nice, it has a Jira id and a short description (as it should be).
>>> 
>>> The first person comes along and makes a comment:
>>> "There was also an issue with VPC VRs" ... "Have you seen this issue?
>> Does your change affect the VPC VR (single/redundant)?"
>>> 
>>> Actually, a good question. Unfortunately, no answer came. After a
>> reminder, the submitter promised to run tests against VPC networks. Great!
>>> 
>>> Both Jenkins builds succeed and Travis is green too. But how much
>> value does this have? They give the impression of automated testing, and
>> although you could argue they do test, it's far from complete. If a build
>> breaks, you know you have an issue. But it doesn’t work the other way
>> around: a green build does not mean there is no issue.
>>> 
>>> Back to our example PR. In the meantime, another commit gets pushed to
>> it: "CLOUDSTACK-8799 fixed for vpc networks." But if you look at the Jira
>> issue, you see it is about redundant virtual routers, the non-VPC ones. So
>> this is vague at best. Still, a reviewer gives an LGTM because the person
>> could create a VPC. That has nothing to do with the problem being fixed
>> in this PR, nor with the comments made earlier. But at least the person
>> said what he did, and we should all do that. What nobody knew back then
>> was that this broke the default route on VPCs.
>>> 
>>> Then something strange happens: the two commits from the PR end up on
>> master as direct commits. With just one LGTM and no verification from the
>> person commenting about the linked issue. This happened on Friday September
>> 11th.
>>> 
>>> That day 21 commits came i

Re: Blameless post mortem

2015-09-28 Thread Bharat Kumar
Hi Remi,

Thank you for the blameless post-mortem.

I think there is a bigger problem here than just the review process and running 
tests. Even if we run the tests, we cannot be sure that everything will work as 
intended. The tests only give some level of confidence; they may not cover all 
the use cases.

I think the problem here is the way major changes to the code base are dealt 
with. For example, the VR refactoring was done without discussing the design 
implications and the amount of change it would bring in. I could not find any 
design document. The VR refactor changed a lot of code and the way the VR used 
to work, and in my opinion it was incomplete: VPN, isolated networks, basic 
networks, iptables rules, RVR in the isolated case, etc. were not implemented. 
Most of us are still in the process of understanding this. Even before reaching 
this state, we had to spend a lot of time fixing the issues mentioned in the 
thread "[Blocker/Critical] VR related Issues".

When a change of this magnitude is being made, we should call out all the 
changes and document them properly. This will help people create better fixes. 
Currently, when we attempt to fix the isolated VR case, it affects the VPC case 
and vice versa. For example, PR 738 fixed it for VPC networks but broke it for 
the isolated case. I believe it is not too late to at least start documenting 
the changes now.

Thanks,
Bharat.

On 28-Sep-2015, at 10:52 am, Sanjeev N  wrote:

> I have a concern here. Some of us are actively involved in reviewing the
> PRs related to Marvin tests (enhancing existing tests/adding new tests). If
> we have to test a PR, it requires an environment with actual resources to
> be created, and this is going to take a lot of time. Some of the tests can
> run on the simulator, but most of them require real hardware. The PR
> submitter is already testing and submitting the test results along with the
> PR. So is it required for reviewers to test these PRs?

[GitHub] cloudstack pull request: CLOUDSTACK-8793 Enable s2s VPN connection...

2015-09-28 Thread wilderrodrigues
Github user wilderrodrigues commented on the pull request:

https://github.com/apache/cloudstack/pull/879#issuecomment-143670869
  
Hi @pdion891 ,

But the PR is going against master, so testing it against 4.5 only and 
giving an LGTM is a bit pointless.

Could you please execute tests against master as well and put the results 
here?

I can also help with testing, but for that to happen @pdube has to add 
some details to his PR and let us know which steps to follow in order to test 
it.

Cheers,
Wilder


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: Blameless post mortem

2015-09-28 Thread Remi Bergsma
Hi Bharat,

There is no bigger problem. We should always run the tests, and if we find a 
case that isn’t currently covered by the tests, we should simply add tests for 
it. There’s no way we’ll get a stable master without them. The fact that they 
may not cover everything is no reason not to rely on them. If a feature is not 
important enough to write a test for, then the feature is probably not 
important anyway. And if it is, then add a test :-)

I do agree with the design documentation requirement for any (major?) change. I 
found some design documentation on the subject you mention, but it should have 
been more detailed.

Regards,
Remi






On 28/09/15 09:58, "Bharat Kumar"  wrote:

>Hi Remi,
>
>Thank you for the blameless post-mortem.
>
>I think there is a bigger problem here than just the review process and 
>running tests. Even if we run the tests, we cannot be sure that everything 
>will work as intended. The tests only give some level of confidence; they 
>may not cover all the use cases.
>
>I think the problem here is the way major changes to the code base are dealt 
>with. For example, the VR refactoring was done without discussing the design 
>implications and the amount of change it would bring in. I could not find any 
>design document. The VR refactor changed a lot of code and the way the VR used 
>to work, and in my opinion it was incomplete: VPN, isolated networks, basic 
>networks, iptables rules, RVR in the isolated case, etc. were not implemented. 
>Most of us are still in the process of understanding this. Even before 
>reaching this state, we had to spend a lot of time fixing the issues mentioned 
>in the thread "[Blocker/Critical] VR related Issues".
>
>When a change of this magnitude is being made, we should call out all the 
>changes and document them properly. This will help people create better 
>fixes. Currently, when we attempt to fix the isolated VR case, it affects the 
>VPC case and vice versa. For example, PR 738 fixed it for VPC networks but 
>broke it for the isolated case. I believe it is not too late to at least 
>start documenting the changes now.
>
>Thanks,
>Bharat.

Re: Blameless post mortem

2015-09-28 Thread Bharat Kumar
Hi Remi,

I never intended to say that we should not run tests, but even before tests we 
should have proper documentation. My concern was that if a major change is being 
introduced, it should be properly documented. All the issues we are trying to 
fix are mostly due to the VR refactor. If there had been proper documentation 
for it, we could have fixed them in a better way. Even to add tests, we need to 
understand how a particular thing works and what data it expects. I think this 
is where most people are facing issues while fixing the Python-based code. 
Proper documentation will help in understanding these things better.

Thanks,
Bharat.

On 28-Sep-2015, at 1:57 pm, Remi Bergsma  wrote:

> Hi Bharat,
> 
> There is no bigger problem. We should always run the tests, and if we find a 
> case that isn’t currently covered by the tests, we should simply add tests for 
> it. There’s no way we’ll get a stable master without them. The fact that they 
> may not cover everything is no reason not to rely on them. If a feature is 
> not important enough to write a test for, then the feature is probably not 
> important anyway. And if it is, then add a test :-)
> 
> I do agree with the design documentation requirement for any (major?) change. 
> I found some design documentation on the subject you mention, but it should 
> have been more detailed.
> 
> Regards,
> Remi

[GitHub] cloudstack pull request: CLOUDSTACK-8848: ensure power state is up...

2015-09-28 Thread remibergsma
Github user remibergsma commented on the pull request:

https://github.com/apache/cloudstack/pull/885#issuecomment-143685748
  
@resmo Thanks! I'll run some tests today.




Re: Blameless post mortem

2015-09-28 Thread Remi Bergsma
Hi Bharat,

I understand your frustration, but we already agreed on this, so there is no 
need to repeat it. This thread is supposed to list some improvements and learn 
from it. Your point has been taken, so let’s move on.

We need documentation first, then the change, after which all tests should 
pass. Even better is to write the (missing) tests before changing stuff, so you 
know they pass both before and after the change.

When doing reviews, feel free to ask for design documentation if you feel it is 
needed.

Regards, Remi



On 28/09/15 11:02, "Bharat Kumar"  wrote:

>Hi Remi,
>
>I never intended to say that we should not run tests, but even before tests 
>we should have proper documentation. My concern was that if a major change is 
>being introduced, it should be properly documented. All the issues we are 
>trying to fix are mostly due to the VR refactor. If there had been proper 
>documentation for it, we could have fixed them in a better way. Even to add 
>tests, we need to understand how a particular thing works and what data it 
>expects. I think this is where most people are facing issues while fixing 
>the Python-based code. Proper documentation will help in understanding these 
>things better.
>
>Thanks,
>Bharat.

[GitHub] cloudstack pull request: [4.6]CLOUDSTACK-8912: Fixed listGuestOsMa...

2015-09-28 Thread remibergsma
Github user remibergsma commented on the pull request:

https://github.com/apache/cloudstack/pull/890#issuecomment-143689040
  
Thanks @borisroman, I'll run some tests today.




[GitHub] cloudstack pull request: CLOUDSTACK-8901: PrepareTemplate job thre...

2015-09-28 Thread koushik-das
Github user koushik-das commented on a diff in the pull request:

https://github.com/apache/cloudstack/pull/880#discussion_r40532523
  
--- Diff: server/src/com/cloud/configuration/Config.java ---
@@ -1999,7 +1999,9 @@
 // StatsCollector
 StatsOutPutGraphiteHost("Advanced", ManagementServer.class, String.class, "stats.output.uri", "", "URI to additionally send StatsCollector statistics to", null),
 
-SSVMPSK("Hidden", ManagementServer.class, String.class, "upload.post.secret.key", "", "PSK with SSVM", null);
+SSVMPSK("Hidden", ManagementServer.class, String.class, "upload.post.secret.key", "", "PSK with SSVM", null),
+
+TemplatePreloaderPoolSize("Advanced", TemplateManager.class, Integer.class, "template.preloader.pool.size", "8", "Size of the TemplateManager threadpool", null);
--- End diff --

Use the mechanism described here to add a new configuration. 
https://cwiki.apache.org/confluence/display/CLOUDSTACK/Configuration.
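For illustration only: the key added in the diff, `template.preloader.pool.size` with default `8`, sizes the TemplateManager thread pool. Below is a minimal Java sketch of how such a setting might be consumed, assuming a plain `Map<String, String>` stands in for the configuration store. The class and method names are hypothetical; this is not the CloudStack configuration mechanism from the wiki page above, only the general pattern of reading a value and falling back to the default.

```java
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: NOT CloudStack's configuration framework API.
public class TemplatePreloaderPoolSketch {
    // Key name and default value taken from the diff above.
    static final String POOL_SIZE_KEY = "template.preloader.pool.size";
    static final int DEFAULT_POOL_SIZE = 8;

    // Returns the configured pool size, or the default when the key is
    // missing, is not an integer, or is not positive.
    static int poolSize(Map<String, String> config) {
        String raw = config.get(POOL_SIZE_KEY);
        if (raw == null) {
            return DEFAULT_POOL_SIZE;
        }
        try {
            int size = Integer.parseInt(raw.trim());
            return size > 0 ? size : DEFAULT_POOL_SIZE;
        } catch (NumberFormatException e) {
            return DEFAULT_POOL_SIZE;
        }
    }

    // Builds a fixed-size pool for template preload jobs.
    static ExecutorService createPreloaderPool(Map<String, String> config) {
        return Executors.newFixedThreadPool(poolSize(config));
    }

    public static void main(String[] args) {
        ExecutorService pool = createPreloaderPool(Map.of(POOL_SIZE_KEY, "4"));
        // Submit a trivial job, then stop accepting new work.
        pool.submit(() -> System.out.println("preloading template"));
        pool.shutdown();
    }
}
```

With an empty map the pool falls back to eight threads. Silently falling back is one design choice; logging or rejecting an invalid value would be another.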




Re: Blameless post mortem

2015-09-28 Thread Bharat Kumar
Hi Remi,

I do not agree with the “There is no bigger problem” part of your reply, so I 
had to repeat myself to make it clearer, not because I am unaware of what this 
thread is supposed to do.
 
Regards,
Bharat.

On 28-Sep-2015, at 2:51 pm, Remi Bergsma  wrote:

> Hi Bharat,
> 
> I understand your frustration, but we already agreed on this, so there is no 
> need to repeat it. This thread is supposed to list some improvements and learn 
> from it. Your point has been taken, so let’s move on.
> 
> We need documentation first, then the change, after which all tests should 
> pass. Even better is to write the (missing) tests before changing stuff, so 
> you know they pass both before and after the change.
> 
> When doing reviews, feel free to ask for design documentation if you feel it 
> is needed.
> 
> Regards, Remi

[GitHub] cloudstack pull request: CLOUDSTACK-8917 : Instance tab takes long...

2015-09-28 Thread sudhansu7
GitHub user sudhansu7 opened a pull request:

https://github.com/apache/cloudstack/pull/894

CLOUDSTACK-8917 : Instance tab takes long time to load with 12K Vms

Modified the SQL used for retrieving the VM count.

In a load-test environment, listVirtualmachine takes 8-11 seconds to load. This 
environment has around 12K active VMs, and the total number of rows is 190K.

The performance bottleneck in the listVirtualmachine command is fetching the count 
and the distinct VMs:
{noformat}
// search vm details by ids
Pair, Integer> uniqueVmPair = 
_userVmJoinDao.searchAndCount(sc, searchFilter);
Integer count = uniqueVmPair.second();
{noformat}
 
This takes 95% of the total time.

To fetch the count and the distinct VMs, we use the following SQL queries.

Query 1:
{noformat}
SELECT DISTINCT(user_vm_view.id) FROM user_vm_view WHERE 
user_vm_view.account_type != 5  AND user_vm_view.display_vm = 1  AND 
user_vm_view.removed IS NULL  ORDER BY user_vm_view.id ASC  LIMIT 0, 20
{noformat}

Query 2:

select count(distinct id) from user_vm_view WHERE user_vm_view.account_type 
!= 5  AND user_vm_view.display_vm = 1  AND user_vm_view.removed IS NULL


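Query 2 is the problematic one.

If we rewrite the query as shown below, it becomes faster: in the MySQL test 
results below, with 14326 active VMs it took 2.08 sec instead of 7.39 sec 
(about 3.5x faster).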

select count(*) from (select distinct id from user_vm_view WHERE 
user_vm_view.account_type != 5  AND user_vm_view.display_vm = 1  AND 
user_vm_view.removed IS NULL) as temp;
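As a logic-level sketch (plain Java over an in-memory list, not MySQL), the original Query 2 and its rewrite compute the same value: filter the view rows, de-duplicate the VM ids, count them. The view can hold several rows per VM (41313 rows for 14326 VMs in the larger test below), which is why de-duplication matters. The class `VmCountSketch`, the `ViewRow` type and its fields are hypothetical simplifications of `user_vm_view`; the sketch says nothing about why the derived-table form runs faster, which depends on MySQL's execution plan.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class VmCountSketch {

    // Hypothetical, simplified row of user_vm_view. The view joins each VM with
    // related tables, so one VM id can appear on several rows.
    static final class ViewRow {
        final long id;
        final int accountType;
        final boolean displayVm;
        final boolean removed;

        ViewRow(long id, int accountType, boolean displayVm, boolean removed) {
            this.id = id;
            this.accountType = accountType;
            this.displayVm = displayVm;
            this.removed = removed;
        }

        // Mirrors: account_type != 5 AND display_vm = 1 AND removed IS NULL
        boolean matches() {
            return accountType != 5 && displayVm && !removed;
        }
    }

    // Analogue of Query 2:
    // SELECT COUNT(DISTINCT id) FROM user_vm_view WHERE ...
    static long countDistinct(List<ViewRow> rows) {
        return rows.stream()
                .filter(ViewRow::matches)
                .mapToLong(r -> r.id)
                .distinct()
                .count();
    }

    // Analogue of the rewrite:
    // SELECT COUNT(*) FROM (SELECT DISTINCT id FROM user_vm_view WHERE ...) AS temp
    static long countOfDistinctSubquery(List<ViewRow> rows) {
        // Step 1: build the distinct ids (the derived table "temp")
        List<Long> temp = rows.stream()
                .filter(ViewRow::matches)
                .map(r -> r.id)
                .distinct()
                .collect(Collectors.toList());
        // Step 2: count the rows of the derived table
        return temp.size();
    }

    // Hypothetical sample data: two matching VMs, one of them on two rows
    static List<ViewRow> sampleRows() {
        return Arrays.asList(
                new ViewRow(1, 0, true, false),
                new ViewRow(1, 0, true, false),  // same VM, second joined row
                new ViewRow(2, 0, true, false),
                new ViewRow(3, 5, true, false),  // excluded: account_type = 5
                new ViewRow(4, 0, false, false), // excluded: display_vm = 0
                new ViewRow(5, 0, true, true));  // excluded: removed is set
    }

    public static void main(String[] args) {
        List<ViewRow> rows = sampleRows();
        System.out.println(countDistinct(rows));           // prints 2
        System.out.println(countOfDistinctSubquery(rows)); // prints 2
    }
}
```

Both methods return 2 for the sample rows. This equivalence is what makes the SQL rewrite safe to apply; whether it is actually faster on a given data set is best checked with EXPLAIN and timings like the ones below.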


MySQL test results:

With 134 active VMs (349 rows in vm_instance):
mysql> select count(*) from vm_instance;
+----------+
| count(*) |
+----------+
|      349 |
+----------+
1 row in set (0.00 sec)
mysql> select count(*) from user_vm_view;
+----------+
| count(*) |
+----------+
|      135 |
+----------+
1 row in set (0.02 sec)
mysql> select count(distinct id) from user_vm_view WHERE 
user_vm_view.account_type != 5  AND user_vm_view.display_vm = 1  AND 
user_vm_view.removed IS NULL;
+--------------------+
| count(distinct id) |
+--------------------+
|                134 |
+--------------------+
1 row in set (0.02 sec)

mysql> select count(*) from (select distinct id from user_vm_view WHERE 
user_vm_view.account_type != 5  AND user_vm_view.display_vm = 1  AND 
user_vm_view.removed IS NULL) as temp;
+----------+
| count(*) |
+----------+
|      134 |
+----------+
1 row in set (0.01 sec)


With 14326 active VMs (195660 rows in vm_instance):

mysql> select count(*) from vm_instance;
+----------+
| count(*) |
+----------+
|   195660 |
+----------+
1 row in set (0.04 sec)
mysql> select count(*) from user_vm_view;
+----------+
| count(*) |
+----------+
|    41313 |
+----------+
1 row in set (4.55 sec)
mysql> select count(distinct id) from user_vm_view WHERE 
user_vm_view.account_type != 5  AND user_vm_view.display_vm = 1  AND 
user_vm_view.removed IS NULL;
+--------------------+
| count(distinct id) |
+--------------------+
|              14326 |
+--------------------+
1 row in set (7.39 sec)

mysql> select count(*) from (select distinct id from user_vm_view WHERE 
user_vm_view.account_type != 5  AND user_vm_view.display_vm = 1  AND 
user_vm_view.removed IS NULL) as temp;
+----------+
| count(*) |
+----------+
|    14326 |
+----------+
1 row in set (2.08 sec)


UI test results:
Before:
![screen shot 2015-09-28 at 2 19 55 
pm](https://cloud.githubusercontent.com/assets/1062642/10133848/66af7c40-65fe-11e5-9ef5-ec6489c0fc06.png)

After:
![screen shot 2015-09-28 at 2 33 38 
pm](https://cloud.githubusercontent.com/assets/1062642/10133852/6f512c9a-65fe-11e5-9ea1-890cf84d02b4.png)





You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sudhansu7/cloudstack CLOUDSTACK-8917

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/cloudstack/pull/894.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #894


commit c28a58a8ff4ddde7b86e151ffee35ad26645e584
Author: Sudhansu 
Date:   2015-09-28T10:54:26Z

CLOUDSTACK-8917 : Instance tab takes long time to load with 12K active VM 
(total vms: 190K)

modified sql that is used for retrieving vm count .






Re: Blameless post mortem

2015-09-28 Thread Remi Bergsma
Hi Bharat,


There is only one way to prove a feature works: with tests. That’s why I say 
that actually _running_ the tests we have today on any new PR is the most 
important thing. Having no documentation is a problem, I agree, but IMHO it is 
not more important. Even if we had the documentation, we would still have 
issues if nobody runs the tests and verifies pull requests. Perfect 
documentation does not automatically lead to a perfect implementation, so we 
need tests to verify.

If we don’t agree, that is also fine. We need to do both anyway, and I think we 
do agree on that.

Regards,
Remi






On 28/09/15 12:15, "Bharat Kumar"  wrote:

>Hi Remi,
>
> i do not agree with “There is no bigger problem”  part of your reply. so I 
> had to repeat myself to make it more clear, Not because i am not aware of 
> what this thread is supposed to do.
> 
>Regards,
>Bharat.
>
>On 28-Sep-2015, at 2:51 pm, Remi Bergsma  wrote:
>
>> Hi Bharat,
>> 
>> I understand your frustrations but we already agreed on this so no need to 
>> repeat. This thread is supposed to list some improvements and learn from it. 
>> Your point has been taken so let’s move on.
>> 
>> We need documentation first, then do a change after which all tests should 
>> pass. Even better is we write (missing) tests before changing stuff so you 
>> know they pass before and after the fact. 
>> 
>> When doing reviews, feel free to ask for design documentation if you feel it 
>> is needed.
>> 
>> Regards, Remi
>> 
>> 
>> 
>> On 28/09/15 11:02, "Bharat Kumar"  wrote:
>> 
>>> Hi Remi,
>>> 
>>> I never intended to say that we should not run tests, but even before tests 
>>> we should have proper documentation. My concern was if a major change is 
>>> being introduced it should be properly documented. All the issues which we 
>>> are trying to fix are majorly due to VR refactor. If there was a proper 
>>> documentation for this we could
>>> have fixed this in a better way.  Even to add tests we need to understand 
>>> how a particular thing works and what data it expects. I think while 
>>> fixing the python based code changes this is where most of the people are 
>>> facing issues. A proper documentation will help in understanding these in a 
>>> better way.
>>> 
>>> Thanks,
>>> Bharat.
>>> 
>>> On 28-Sep-2015, at 1:57 pm, Remi Bergsma  
>>> wrote:
>>> 
 Hi Bharat,
 
 There is no bigger problem. We should always run the tests and if we find 
 a case that isn’t currently covered by the tests we should simply add 
 tests for it. There’s no way we’ll get a stable master without them. The 
 fact that they may not cover everything, is no reason to not rely on them. 
 If a feature is not important enough to write a test for, then the feature 
 is probably not important anyway. And if it is, then add a test :-)
 
 I do agree on the design documentation requirement for any (major?) 
 change. I found some design documentations on the subject you mention, but 
 it should have been more detailed. 
 
 Regards,
 Remi
 
 
 
 
 
 
 On 28/09/15 09:58, "Bharat Kumar"  wrote:
 
> Hi Remi,
> 
> Thank you for the blameless post-mortem.
> 
> I think there is a bigger problem here than just the review process and 
> running tests. Even if we run the tests we cannot be sure that 
> everything will work as intended. The tests will only give some level of 
> confidence. The tests may not cover all the use cases.
> 
> I think the problem here is that the way major changes to the code base 
> are dealt with. For example,  VR refactoring was done without discussing 
> the design implications and the amount of changes it would bring in. I 
> could not find any design document. The vr refactor changed a lot of code 
> and the way VR used to work and in my opinion it was incomplete-vpn, 
> isolated networks, basic networks, iptable rules and rvr in isolated case 
> etc were not implemented. Most of us are still in the process of 
> understanding this. Even before reaching this state we had to spend a lot 
> of time fixing issues mentioned in the thread [Blocker/Critical] VR 
> related Issues.  
> 
> When a change of this magnitude is being made, we should call out all the 
> changes and document them properly. This will help people to create 
> better fixes. Currently when we attempt to fix the isolated vr case it is 
> affecting the vpc and vice versa. For example, pr 738 fixed it for vpc 
> networks but broke it for isolated case. I believe it is not too late to 
> at least start documenting the changes now.
> 
> Thanks,
> Bharat.
> 
> On 28-Sep-2015, at 10:52 am, Sanjeev N  wrote:
> 
>> I have a concern here. Some of us are actively involved in reviewing the
>> PRs related to marvin tests(Enhancing existing tests/Adding new tests). 
>> If
>> we have to

Re: Blameless post mortem

2015-09-28 Thread Sebastien Goasguen

> On Sep 28, 2015, at 1:14 PM, Remi Bergsma  wrote:
> 
> Hi Bharat,
> 
> 
> There is only one way to prove a feature works: with tests. That’s why I say 
> actually _running_ the tests we have today on any new PR, is the most 
> important thing. Having no documentation is a problem, I agree, but it is not 
> more important IMHO. If we had the documentation, we still would have issues 
> if nobody runs the tests and verifies pull requests. Documentation that is 
> perfect does not automatically lead to perfect implementation. So we need 
> tests to verify.
> 
> If we don’t agree that is also fine. We need to do both anyway and I think we 
> do agree on that.
> 

Also, we need to move forward. We should have a live chat once 4.6 is out to 
discuss all issues/problems and iron out the process.

But reverting the VR refactor is not going to happen. There was ample 
discussion on the PR when it was submitted, and there was time to review and 
raise concerns at that time. It went through quite a few reviews, tests, etc. 
Maybe the documentation is not good, but the time to raise this concern, I am 
afraid, was six months ago. We can learn from it, but we are not going to 
revert it; as David mentioned, a revert would not go cleanly.

So let’s get back to the blockers for 4.6: are there still VR-related issues 
with master?




> Regards,
> Remi
> 
> 
> 
> 
> 
> 
> On 28/09/15 12:15, "Bharat Kumar"  wrote:
> 
>> Hi Remi,
>> 
>> i do not agree with “There is no bigger problem”  part of your reply. so I 
>> had to repeat myself to make it more clear, Not because i am not aware of 
>> what this thread is supposed to do.
>> 
>> Regards,
>> Bharat.
>> 
>> On 28-Sep-2015, at 2:51 pm, Remi Bergsma  wrote:
>> 
>>> Hi Bharat,
>>> 
>>> I understand your frustrations but we already agreed on this so no need to 
>>> repeat. This thread is supposed to list some improvements and learn from 
>>> it. Your point has been taken so let’s move on.
>>> 
>>> We need documentation first, then do a change after which all tests should 
>>> pass. Even better is we write (missing) tests before changing stuff so you 
>>> know they pass before and after the fact. 
>>> 
>>> When doing reviews, feel free to ask for design documentation if you feel 
>>> it is needed.
>>> 
>>> Regards, Remi
>>> 
>>> 
>>> 
>>> On 28/09/15 11:02, "Bharat Kumar"  wrote:
>>> 
 Hi Remi,
 
 I never intended to say that we should not run tests, but even before 
 tests we should have proper documentation. My concern was if a major 
 change is being introduced it should be properly documented. All the 
 issues which we are trying to fix are majorly due to VR refactor. If there 
 was a proper documentation for this we could
 have fixed this in a better way.  Even to add tests we need to understand 
 how a particular thing works and what data it expects. I think while 
 fixing the python based code changes this is where most of the people are 
 facing issues. A proper documentation will help in understanding these in 
 a better way.
 
 Thanks,
 Bharat.
 
 On 28-Sep-2015, at 1:57 pm, Remi Bergsma  
 wrote:
 
> Hi Bharat,
> 
> There is no bigger problem. We should always run the tests and if we find 
> a case that isn’t currently covered by the tests we should simply add 
> tests for it. There’s no way we’ll get a stable master without them. The 
> fact that they may not cover everything, is no reason to not rely on 
> them. If a feature is not important enough to write a test for, then the 
> feature is probably not important anyway. And if it is, then add a test 
> :-)
> 
> I do agree on the design documentation requirement for any (major?) 
> change. I found some design documentations on the subject you mention, 
> but it should have been more detailed. 
> 
> Regards,
> Remi
> 
> 
> 
> 
> 
> 
> On 28/09/15 09:58, "Bharat Kumar"  wrote:
> 
>> Hi Remi,
>> 
>> Thank you for the blameless post-mortem.
>> 
>> I think there is a bigger problem here than just the review process and 
>> running tests. Even if we run the tests we cannot be sure that 
>> everything will work as intended. The tests will only give some level of 
>> confidence. The tests may not cover all the use cases.
>> 
>> I think the problem here is that the way major changes to the code base 
>> are dealt with. For example,  VR refactoring was done without discussing 
>> the design implications and the amount of changes it would bring in. I 
>> could not find any design document. The vr refactor changed a lot of 
>> code and the way VR used to work and in my opinion it was 
>> incomplete-vpn, isolated networks, basic networks, iptable rules and rvr 
>> in isolated case etc were not implemented. Most of us are still in the 
>> process of understanding this. Even before reaching this sta

Re: Blameless post mortem

2015-09-28 Thread Sebastien Goasguen

> On Sep 28, 2015, at 1:29 PM, Sebastien Goasguen  wrote:
> 
> 
>> On Sep 28, 2015, at 1:14 PM, Remi Bergsma  
>> wrote:
>> 
>> Hi Bharat,
>> 
>> 
>> There is only one way to prove a feature works: with tests. That’s why I say 
>> actually _running_ the tests we have today on any new PR, is the most 
>> important thing. Having no documentation is a problem, I agree, but it is 
>> not more important IMHO. If we had the documentation, we still would have 
>> issues if nobody runs the tests and verifies pull requests. Documentation 
>> that is perfect does not automatically lead to perfect implementation. So we 
>> need tests to verify.
>> 
>> If we don’t agree that is also fine. We need to do both anyway and I think 
>> we do agree on that.
>> 
> 
> Also we need to move forward. We should have a live chat once 4.6 is out to 
> discuss all issues/problems and iron out the process.
> 
> But reverting the VR refactor is not going to happen. There was ample 
> discussions on the PR when it was submitted, there was time to review and 
> raise concerns at that time. It went through quite a few reviews, tests 
> etc…Maybe the documentation is not good, but the time to raise this concern I 
> am afraid was six months ago. We can learn from it, but we are not going to 
> revert it, this would not go cleanly as David mentioned.
> 
> So let’s get back to blockers for 4.6, are there still 
> 

I will add that the VPC refactor started being discussed in Sept 2014, over one 
year ago. It was one of our first PRs, when we were still setting up the GitHub 
settings.

There is ample description in the following PR, which should trigger lots of 
discussions...

https://github.com/apache/cloudstack/pull/18

The first merge occurred on October 6th, with only Rohit commenting. However, 
this merge went into master before our new process was in place.

https://github.com/apache/cloudstack/pull/19

I am not saying this is perfect, just that this code has been there for almost 
one year.

> 
>> Regards,
>> Remi
>> 
>> 
>> 
>> 
>> 
>> 
>> On 28/09/15 12:15, "Bharat Kumar"  wrote:
>> 
>>> Hi Remi,
>>> 
>>> i do not agree with “There is no bigger problem”  part of your reply. so I 
>>> had to repeat myself to make it more clear, Not because i am not aware of 
>>> what this thread is supposed to do.
>>> 
>>> Regards,
>>> Bharat.
>>> 
>>> On 28-Sep-2015, at 2:51 pm, Remi Bergsma  
>>> wrote:
>>> 
 Hi Bharat,
 
 I understand your frustrations but we already agreed on this so no need to 
 repeat. This thread is supposed to list some improvements and learn from 
 it. Your point has been taken so let’s move on.
 
 We need documentation first, then do a change after which all tests should 
 pass. Even better is we write (missing) tests before changing stuff so you 
 know they pass before and after the fact. 
 
 When doing reviews, feel free to ask for design documentation if you feel 
 it is needed.
 
 Regards, Remi
 
 
 
 On 28/09/15 11:02, "Bharat Kumar"  wrote:
 
> Hi Remi,
> 
> I never intended to say that we should not run tests, but even before 
> tests we should have proper documentation. My concern was if a major 
> change is being introduced it should be properly documented. All the 
> issues which we are trying to fix are majorly due to VR refactor. If 
> there was a proper documentation for this we could
> have fixed this in a better way.  Even to add tests we need to understand 
> how a particular thing works and what data it expects. I think while 
> fixing the python based code changes this is where most of the people are 
> facing issues. A proper documentation will help in understanding these in 
> a better way.
> 
> Thanks,
> Bharat.
> 
> On 28-Sep-2015, at 1:57 pm, Remi Bergsma  
> wrote:
> 
>> Hi Bharat,
>> 
>> There is no bigger problem. We should always run the tests and if we 
>> find a case that isn’t currently covered by the tests we should simply 
>> add tests for it. There’s no way we’ll get a stable master without them. 
>> The fact that they may not cover everything, is no reason to not rely on 
>> them. If a feature is not important enough to write a test for, then the 
>> feature is probably not important anyway. And if it is, then add a test 
>> :-)
>> 
>> I do agree on the design documentation requirement for any (major?) 
>> change. I found some design documentations on the subject you mention, 
>> but it should have been more detailed. 
>> 
>> Regards,
>> Remi
>> 
>> 
>> 
>> 
>> 
>> 
>> On 28/09/15 09:58, "Bharat Kumar"  wrote:
>> 
>>> Hi Remi,
>>> 
>>> Thank you for the blameless post-mortem.
>>> 
>>> I think there is a bigger problem here than just the review process and 
>>> running tests. Even if we run the tests we cannot be sure that every 

Re: Blameless post mortem

2015-09-28 Thread Remi Bergsma
+1

There are two VR-related issues left:
- CLOUDSTACK-8697: Assign VPC Internal LB rule to a VM fails
- CLOUDSTACK-8915: Cannot SSH into VMs deployed Redundant VPC routers

The first one was tested today and still seems to be present. The second one we 
discovered this weekend while testing. It was broken by a recent PR, and we’ll 
try to fix it and prove with tests that everything still works. Wilder and I 
will focus on these issues.

As for the other blockers, I believe Rajani is working on CLOUDSTACK-8808, and 
René has sent a PR for CLOUDSTACK-8848.

Regards, Remi





On 28/09/15 13:29, "Sebastien Goasguen"  wrote:

>
>> On Sep 28, 2015, at 1:14 PM, Remi Bergsma  
>> wrote:
>> 
>> Hi Bharat,
>> 
>> 
>> There is only one way to prove a feature works: with tests. That’s why I say 
>> actually _running_ the tests we have today on any new PR, is the most 
>> important thing. Having no documentation is a problem, I agree, but it is 
>> not more important IMHO. If we had the documentation, we still would have 
>> issues if nobody runs the tests and verifies pull requests. Documentation 
>> that is perfect does not automatically lead to perfect implementation. So we 
>> need tests to verify.
>> 
>> If we don’t agree that is also fine. We need to do both anyway and I think 
>> we do agree on that.
>> 
>
>Also we need to move forward. We should have a live chat once 4.6 is out to 
>discuss all issues/problems and iron out the process.
>
>But reverting the VR refactor is not going to happen. There was ample 
>discussions on the PR when it was submitted, there was time to review and 
>raise concerns at that time. It went through quite a few reviews, tests 
>etc…Maybe the documentation is not good, but the time to raise this concern I 
>am afraid was six months ago. We can learn from it, but we are not going to 
>revert it, this would not go cleanly as David mentioned.
>
>So let’s get back to blockers for 4.6, are there still VR related issues with 
>master ?
>
>
>
>
>> Regards,
>> Remi
>> 
>> 
>> 
>> 
>> 
>> 
>> On 28/09/15 12:15, "Bharat Kumar"  wrote:
>> 
>>> Hi Remi,
>>> 
>>> i do not agree with “There is no bigger problem”  part of your reply. so I 
>>> had to repeat myself to make it more clear, Not because i am not aware of 
>>> what this thread is supposed to do.
>>> 
>>> Regards,
>>> Bharat.
>>> 
>>> On 28-Sep-2015, at 2:51 pm, Remi Bergsma  
>>> wrote:
>>> 
 Hi Bharat,
 
 I understand your frustrations but we already agreed on this so no need to 
 repeat. This thread is supposed to list some improvements and learn from 
 it. Your point has been taken so let’s move on.
 
 We need documentation first, then do a change after which all tests should 
 pass. Even better is we write (missing) tests before changing stuff so you 
 know they pass before and after the fact. 
 
 When doing reviews, feel free to ask for design documentation if you feel 
 it is needed.
 
 Regards, Remi
 
 
 
 On 28/09/15 11:02, "Bharat Kumar"  wrote:
 
> Hi Remi,
> 
> I never intended to say that we should not run tests, but even before 
> tests we should have proper documentation. My concern was if a major 
> change is being introduced it should be properly documented. All the 
> issues which we are trying to fix are majorly due to VR refactor. If 
> there was a proper documentation for this we could
> have fixed this in a better way.  Even to add tests we need to understand 
> how a particular thing works and what data it expects. I think while 
> fixing the python based code changes this is where most of the people are 
> facing issues. A proper documentation will help in understanding these in 
> a better way.
> 
> Thanks,
> Bharat.
> 
> On 28-Sep-2015, at 1:57 pm, Remi Bergsma  
> wrote:
> 
>> Hi Bharat,
>> 
>> There is no bigger problem. We should always run the tests and if we 
>> find a case that isn’t currently covered by the tests we should simply 
>> add tests for it. There’s no way we’ll get a stable master without them. 
>> The fact that they may not cover everything, is no reason to not rely on 
>> them. If a feature is not important enough to write a test for, then the 
>> feature is probably not important anyway. And if it is, then add a test 
>> :-)
>> 
>> I do agree on the design documentation requirement for any (major?) 
>> change. I found some design documentations on the subject you mention, 
>> but it should have been more detailed. 
>> 
>> Regards,
>> Remi
>> 
>> 
>> 
>> 
>> 
>> 
>> On 28/09/15 09:58, "Bharat Kumar"  wrote:
>> 
>>> Hi Remi,
>>> 
>>> Thank you for the blameless post-mortem.
>>> 
>>> I think there is a bigger problem here than just the review process and 
>>> running tests. Even if we run the tests we cannot be sure that every 
>