On Fri, Mar 4, 2016 at 9:23 AM, Emilien Macchi <emil...@redhat.com> wrote:
> That's not the name of any Summit's talk, it's just an e-mail I wanted
> to write for a long time.
>
> It is an attempt to expose facts or things I've heard a lot, and to bring
> constructive thoughts about why it's challenging to contribute to the
> TripleO project.
Thanks for sharing your thoughts. I struggled a bit with responding, to be honest, as I think the points you call attention to often lead to frustration for both patch authors and reviewers. I think these things are more anecdotal than fact-based. But that doesn't mean it's not a constructive conversation, so thanks for calling out the issues.

At the core, we need a lot more investment in CI. We could use more physical resources and more contributors. Or, we could direct some of the capacity away from new development and put it towards CI improvements instead. That might allow us to cover more features in CI, which I think would have a direct impact on review velocity. There are also some architectural changes proposed (split-stack) that will allow us to scale CI more effectively than we have in the past.

> 1/ "I don't review this patch, we don't have CI coverage."

If a patch is obviously not covered by CI, it would go a long way if the author indicated whether and how they manually tested it. Often, I see a non-trivial patch that our CI does not cover. When I try to manually test the patch, it doesn't work, sometimes in very obvious ways. Over time, this sort of pattern lowers core reviewers' confidence to +2 and approve non-trivial patches that they haven't manually tested themselves.

I know that manual testing doesn't scale. However, right now, our CI doesn't scale to cover every possible combination of features either. So, if we want to keep moving forward at all, some things will have to be manually tested. If patch authors added a comment such as "I tested this with network isolation and it worked as expected, and it doesn't break existing CI", that would go a long way towards giving people the confidence to approve it.

> One thing I've noticed in TripleO is that very few people are involved
> in CI work.
> In my opinion, the CI system is more critical than any feature in a product.
> Developing Software without tests is a bit like http://goo.gl/OlgFRc
> All people - especially core - in the project should be involved in CI
> work. If you are TripleO core and you don't contribute on CI, you might
> ask yourself why.

That's fair, although it's also fair to ask the same of non-cores. It's not just the job of core reviewers to get patches to pass CI. A lot of times I get pinged to review patches with a sense of urgency about them, yet they are sitting with failed CI. And it's not like people are pinging me to help them understand and fix why the patch has failed CI; they just want it reviewed and approved.

> 2/ "I don't review this patch, CI is broken."

Do you mean when CI is generally failing across the board? I can honestly say that when CI is generally failing for whatever reason (infrastructure issue, OpenStack regression, TripleO regression), there are almost always one or two TripleO cores all over that issue. Not everyone needs to be working on the issue at once, but I've never seen TripleO CI completely red without at least one person working on it almost exclusively.

However, if you're saying that people don't review a patch when that specific patch has failed CI, then I think there is a lot of shared responsibility there. It's not just on reviewers to see why something has failed CI, or to try to get it to pass. I'm less likely to review a patch if it has been sitting for several days with a failed CI job on it; the author probably doesn't need it landed that badly if that's the case. Often, there's not even a comment indicating they looked at the failed job to see why. So, yes, I'm less likely to review those patches, and maybe that's not fair.

Just so I'm clear, I'm not saying that I ignore patches with failed CI. I try to help new contributors, or people I might not recognize, get their patches to pass CI. I recognize there is a steep learning curve there.
But when I see patches out there from folks who are capable of at least triaging the failure, and that hasn't been done, I'm certainly guilty of de-prioritizing reviews of those patches. Maybe that's a bad thing.

> Another thing I've noticed in TripleO is that when CI is broken, again,
> very few people are actually working on fixing failures.

This sounds like you're talking about scenarios when all of TripleO CI is failing, in which case I really disagree with your assertion that people aren't rapidly fixing those failures.

> My experience over the last years taught me to stop my daily work when
> CI is broken and fix it asap.
>
> 3/ "I don't review it, because this feature / code is not my area."
>
> My first thought is "Aren't we supposed to be engineers and learn new areas?"
> My second thought is that I think we have a problem with TripleO Heat
> Templates.
> THT, or TripleO Heat Templates, code is 80% Puppet / Hiera. If a
> TripleO core says "I'm not familiar with Puppet", we have a problem here,
> don't we?
> Maybe we should split this repository? Or revisit the list of people who
> can +2 patches on THT.

I'm sure we have some of this behavior, but I don't see it as an issue, to be honest. People should review code they are comfortable reviewing. It would also be great if people spent time getting to know parts of the codebase they aren't as familiar with, but not everyone has time to make that a priority, or they might not choose to work that way, and that's OK. We can't force people to step outside their comfort zone, as that's more of a personal decision.

I don't honestly think we've made poor technology choices that make the code difficult to review. Heat templates and Puppet manifests are well-understood technologies in the deployment and configuration space. Just like anything else, they take time to get familiar with.

As for revisiting who can +2 patches on THT, I think that is a discussion that is always open.
If you have any specifics in mind, I would make those proposals, or perhaps consult with the PTL and existing cores first to get a sense of what the consensus might be, if you're more comfortable going that route.

> 4/ Patches are stalled. Most of the time.
>
> Over the last 12 months, I've pushed a lot of patches in TripleO and one
> thing I've noticed is that if I don't ping people, my patch gets no
> review. And I have to rebase it every week, because the interface
> changed. I got +2, cool! Oh, merge conflict. Rebasing. Waiting for +2
> again... and so on.
>
> I personally spend 20% of my time reviewing code, every day.
> I wrote a blog post about how I do reviews, with Gertty:
> http://my1.fr/blog/reviewing-puppet-openstack-patches/
> I suggest TripleO folks spend more time on reviews, for a few reasons:
>
> * decreasing frustration from contributors
> * accelerating the development process
> * teaching new contributors to work on TripleO, and eventually scaling up
>   the core team. It's a time investment, but worth it.

We definitely have more patches out there than the core team can keep up with.

> In the Puppet team, we have weekly triage sessions and they're pretty helpful.
>
> 5/ Most of the tests are run... manually.
>
> How many times have I heard "I've tested this patch locally, and it does
> not work, so -1".

I don't understand your point here. How would you expect someone to review a patch once they've discovered it doesn't even work? Having to manually test things isn't what most people want to spend their time doing. However, some reviewers choose to manually test patches that they know are not covered by CI. I choose to do that since I know we don't have enough capacity to CI everything, and I want to make sure that non-trivial patches do not break TripleO for everyone. I don't see a problem with that.

If you want to decrease the likelihood your patch gets manually tested, you could add CI coverage for it.
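To sketch one way of doing that: since our CI is driven by Gerrit and Zuul, a throwaway tripleo-ci change can pull in the patch under review via a Depends-On footer in its commit message, so the CI job actually exercises the new code path before anything merges. The commit message below is purely illustrative; the change ID and the scenario it names are made up.

```text
DO NOT MERGE: exercise network isolation for dependent THT patch

Temporary tripleo-ci change that enables the relevant job settings so
CI runs the code path added by the dependent patch.

Depends-On: I0123456789abcdef0123456789abcdef01234567
```

A patch like this gets abandoned once the dependent change has merged; it only exists to produce a CI result reviewers can point to.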
If we don't have capacity to add CI coverage for it, work on finding a different way, such as adding a periodic job. If you don't want to do that, push a temporary patch to tripleo-ci that depends on your patch, exercises the code path, and shows it working.

Some people will still choose to manually test, though, and I see absolutely nothing wrong with that. And if it doesn't work as advertised, I'd expect to see a -1 on the review. In the cases when I have done this, I try to provide all of the details about how it failed for me and what I think the problem is. I think that's a fair and constructive review, and it helps push the patch forward in terms of getting it committed.

> The only test we do in current CI is a ping to an instance. Seriously?
> Most OpenStack CIs (Fuel included) run Tempest, testing APIs and
> real scenarios. And we run a ping.
> That's similar to 1/ but I wanted to raise it too.

I'd love to see us do a Tempest run, with at least the smoketests. After redeploying the testenvs to have 5GB overcloud nodes, this might be possible. We should prove this out before turning it on, probably via some tripleo-ci patches, so that we know how much time it adds and whether it's something we can live with.

--
-- James Slagle
--
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev