> On Sep 28, 2015, at 7:22 AM, Sanjeev N <sanj...@apache.org> wrote:
>
> I have a concern here. Some of us are actively involved in reviewing the PRs related to marvin tests (enhancing existing tests / adding new tests). If we have to test a PR, it requires an environment to be created with actual resources, and that is going to take a lot of time. Some of the tests can run on the simulator, but most of the tests require real hardware. The PR submitter is already testing and submitting the test results along with the PR.

In lots of cases we don't see those test results. We should make sure we do, or at a minimum explain what tests were done.
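For the tests that do not need hardware, a quick simulator run is evidence any reviewer can produce cheaply. A rough sketch, assuming a management server is already running against a deployed simulator zone (the config path and test file are just examples; exact flags vary by marvin version):

    # build with the simulator enabled
    mvn -Pdeveloper -Dsimulator clean install

    # run one smoke test against the simulated zone via the marvin nose plugin
    nosetests --with-marvin --marvin-config=setup/dev/advanced.cfg \
        -a tags=advanced,required_hardware=false \
        test/integration/smoke/test_routers.py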
> So is it required that reviewers test these PRs too?

If you LGTM a PR, explain why and what tests you did. Just "LGTM" is not enough.

> On Sat, Sep 26, 2015 at 1:49 PM, sebgoa <run...@gmail.com> wrote:
>
>> Remi, thanks for the detailed post-mortem, it's a good read and great learning. I hope everyone reads it.
>>
>> The one thing to emphasize is that we now have a very visible way to get code into master, and we have folks investing time to provide reviews (great); now we need the submitters to do their due diligence and answer all comments in the reviews.
>>
>> In another project I work on, nothing can be added to the code without unit tests. I think we could go down the same route and require new integration tests and unit tests for any change; if they are missing, the PR does not get merged. But let's digest your post-mortem and we can discuss after 4.6.0.
>>
>> I see that you reverted one commit that was not made by you; that's great.
>>
>> Let's focus on the blockers now, everything else can wait.
>>
>> The big bonus of doing what we are doing is that once 4.6.0 is out, we can merge PRs again (assuming they are properly rebased and tested) and release 4.6.1 really quickly after.
>>
>> -sebastien
>>
>> On Sep 25, 2015, at 9:51 PM, Remi Bergsma <rberg...@schubergphilis.com> wrote:
>>
>>> Hi all,
>>>
>>> This mail is intended to be blameless; we need to learn something from it. That's why I left out who exactly did what, because it's not relevant. There are multiple examples, but it's about the why. Let's learn from this without blaming anyone.
>>>
>>> We know we need automated testing. We have integration tests, but we are unable to run all of them on every Pull Request we receive. If we had that in place, it would be much easier to spot errors, regressions and so on. It would also be more rewarding to write more tests.
>>>
>>> Unfortunately we're not there yet, so until we get there we need to do something else instead. If we do nothing, we know we will keep having many issues, and a master that breaks on a regular basis is one of the most frustrating things there is. We said we'd use Pull Requests with at least two humans reviewing and giving their OK, in the form of an LGTM: Looks Good To Me. The LGTMs are there because we have no automated testing; keep that in mind. As a reviewer, you are standing in for automated testing until it exists.
>>>
>>> Since we started doing this, master has become a lot more stable. But every now and then we still have issues. Let's look at how we do manual reviews. Again, this is not to blame anyone; it's to open our eyes and make us realise what we're doing and what results we get out of it.
>>>
>>> Example Pull Request #784:
>>> Title: CLOUDSTACK-8799 fixed the default routes
>>>
>>> That's nice: it has a Jira id and a short description (as it should).
>>>
>>> The first person comes along and makes a comment: "There was also an issue with VPC VRs" ... "Have you seen this issue? Does your change affect the VPC VR (single/redundant)?"
>>>
>>> Actually a good question. Unfortunately, no answer came. After a reminder, it was promised that tests would be run against VPC networks. Great!
>>>
>>> The Jenkins builds both succeed and Travis is green. But how much value does that have? It gives the impression of automated testing, and although you could argue it is, it's far from complete. If it breaks, you know you have an issue, but it doesn't work the other way around: green does not prove the change works.
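>>> That kind of verification is still manual. For example, the check a reviewer could have done here is to log in to a VPC router and look at its routing table (the link-local address below is made up; port and key path as on a typical KVM host):
>>>
>>>     # from the hypervisor, reach the router VM over its link-local address
>>>     ssh -i /root/.ssh/id_rsa.cloud -p 3922 root@169.254.1.42 "ip route show default"
>>>     # a healthy VPC router prints something like: default via <public gateway> dev eth1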
>>> Back to our example PR. In the meantime, another commit was pushed to it: "CLOUDSTACK-8799 fixed for vpc networks." But if you look at the Jira issue, you see it is about redundant virtual routers, the non-VPC ones. So this is vague at best. Still, a reviewer gave an LGTM because he could create a VPC. That has nothing to do with the problem being fixed in this PR, nor with the comments made earlier. But at least the person said what he did, and we should all do that. What nobody knew back then was that this change broke the default route on VPCs.
>>>
>>> Then something strange happened: the two commits from the PR ended up on master as direct commits, with just one LGTM and no verification from the person who commented about the linked issue. This happened on Friday September 11th.
>>>
>>> That day 21 commits came in, from 7 Pull Requests and unfortunately also from some direct commits. We noticed the direct commits and notified the list (http://cloudstack.markmail.org/message/srmszloyipkxml36). As a lot came in at the same time, it was decided not to revert them. Looking back, we should have done so.
>>>
>>> From this point on, VPCs were broken: they wouldn't get a default route. So no public internet access from VMs in VPC tiers, no working VPNs, etc. This was mentioned on the list on Thursday September 15th, after some chats and debugging over the weekend (http://cloudstack.markmail.org/message/73ulpu4p75ex24tc).
>>>
>>> Here we are: master is broken functionality-wise, and new Pull Requests come in to fix blockers. But we cannot properly test them, because VPCs are broken in master and therefore also in the PRs branched off of it, with or without the PR's change.
>>>
>>> It started to escalate as the days went by.
>>>
>>> I'll leave out the bit about how much this frustrated people, although it's good to know that we do not want to be in this situation again.
>>>
>>> Eventually Wilder and I spent an evening and a day working on a branch where we stacked 7 PRs on top of each other (all VR related), only to find the VPC was still broken. That allowed us to zoom in and discover the default route was missing again. We knew it had worked 3 weeks before, because the same tests that succeeded then were now failing; we had already fixed this in PR #738 on August 25, so we were sure about it.
>>>
>>> After some digging we could trace it back to Pull Request #784. Imagine the feeling of seeing your own comment there, mentioning the previous issue with the default gateways. Fair to say our human review process clearly failed here. Many, many hours were spent on this problem over the past two weeks. Could we have prevented it? I think so, yes.
>>>
>>> This example clearly shows why:
>>>
>>> - we should use Pull Requests
>>> It made the change visible: great!
>>>
>>> - we should do reviews and ask for feedback
>>> We got feedback and questions: also great!
>>>
>>> - we should always respond to feedback and verify it is resolved before merging
>>> We need to improve here. Even with two reviewers saying LGTM, we should still address any open feedback before merging.
>>>
>>> - we should have two humans doing a review
>>> We need to improve here as well. Not one reviewer; we need two. Really.
>>>
>>> - we need to document why we say LGTM
>>> Another improvement. It's nice to say LGTM, but a review of only 4 characters and nothing more is useless. We need to know what was tested and how: test results, screenshots or anything that shows what has been verified. If you only reviewed the code, that's fine too, but at least say so; then the next reviewer can do another type of review to get the complete picture. Remember, you're standing in for automated testing!
>>>
>>> - we should always merge Pull Requests (see the sketch after this list)
>>> We made it easy; merging is the de facto standard, and it has even more benefits. You can trace commits back to their Pull Request (and find all comments and discussion there: saves time, trust me). It also allows for easier reverting of a Pull Request. We'll see even more benefits once 4.7 is there. Although the intention was to merge this Pull Request, it still didn't happen. We should always check before we push; as committers we just need to be sure.
>>>
>>> - we need automated testing!
>>> The sooner the better. It's all about the missing automated testing. After 4.6, we all need to focus on this. It saves a lot of time. And frustration.
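>>> To make the merge point concrete: merging with an explicit merge commit keeps a PR traceable and revertable as one unit (the branch name and SHA below are made up):
>>>
>>>     git checkout master
>>>     # merge the PR branch with a real merge commit, never a fast-forward
>>>     git merge --no-ff pr/784 -m "Merge pull request #784: CLOUDSTACK-8799"
>>>     # if the PR later turns out to be broken, it reverts as a whole
>>>     git revert -m 1 <merge commit SHA>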
>>> We're doing final testing on PR #887 and will merge it soon. From that point on we can look into new issues. Be aware that any PR created after September 10 needs to be rebased on current master once #887 is merged; without that, no serious testing can be done.
>>>
>>> Let's be careful about what we land on master. I'll only be merging Pull Requests that have had proper reviews, with information on what was tested. At least one reviewer needs to actually verify that it works (and show the rest of us); we simply cannot assume it will.
>>>
>>> If we do this, I think we can start resolving the remaining blockers one by one and go into the first RC round. Please help out where you can, so we can make this a success together. Thanks!
>>>
>>> Looking forward to the day we have our automated testing in place ;-)
>>>
>>> Regards,
>>> Remi