On Aug 28, 2014, at 2:16 PM, Sean Dague <s...@dague.net> wrote:

> On 08/28/2014 02:07 PM, Joe Gordon wrote:
>>
>> On Thu, Aug 28, 2014 at 10:17 AM, Sean Dague <s...@dague.net> wrote:
>>
>> On 08/28/2014 12:48 PM, Doug Hellmann wrote:
>>>
>>> On Aug 27, 2014, at 5:56 PM, Sean Dague <s...@dague.net> wrote:
>>>
>>>> On 08/27/2014 05:27 PM, Doug Hellmann wrote:
>>>>>
>>>>> On Aug 27, 2014, at 2:54 PM, Sean Dague <s...@dague.net> wrote:
>>>>>
>>>>>> Note: thread intentionally broken; this is really a different topic.
>>>>>>
>>>>>> On 08/27/2014 02:30 PM, Doug Hellmann wrote:
>>>>>>> On Aug 27, 2014, at 1:30 PM, Chris Dent <chd...@redhat.com> wrote:
>>>>>>>
>>>>>>>> On Wed, 27 Aug 2014, Doug Hellmann wrote:
>>>>>>>>
>>>>>>>>> I have found it immensely helpful, for example, to have a written set of the steps involved in creating a new library, from importing the git repo all the way through to making it available to other projects. Without those instructions, it would have been much harder to split up the work. The team would have had to train each other by word of mouth, and we would have had constant issues with inconsistent approaches triggering different failures. The time we spent building and verifying the instructions has paid off to the extent that we even had one developer not on the core team handle a graduation for us.
>>>>>>>>
>>>>>>>> +many more for the relatively simple act of just writing stuff down
>>>>>>>
>>>>>>> "Write it down." is my theme for Kilo.
>>>>>>
>>>>>> I definitely get the sentiment. "Write it down" is also hard when you are talking about things that change around quite a bit. OpenStack as a whole sees 250-500 changes a week, so the interaction pattern moves around enough that it's really easy to have *very* stale information written down. Stale information is sometimes even more dangerous than no information, as it takes people down very wrong paths.
>>>>>>
>>>>>> I think we break down on communication when we get into a conversation of "I want to learn gate debugging," because I don't quite know what that means, or where the starting point of understanding is. So those intentions are well-meaning, but tend to stall. The reality was that there was no road map for those of us who dove in; it's just understanding how OpenStack holds together as a whole and where some of the high-risk parts are. And a lot of that comes with days of staring at code and logs until patterns emerge.
>>>>>>
>>>>>> Maybe if we can get smaller, more targeted questions, we can help folks better? I'm personally a big fan of answering the targeted questions, because then I also know that the time spent exposing that information was directly useful.
>>>>>>
>>>>>> I'm more than happy to mentor folks. But I just end up finding the "I want to learn" at the generic level something that's hard to grasp onto or turn into action. I'd love to hear more ideas from folks about ways we might do that better.
>>>>>
>>>>> You and a few others have developed an expertise in this important skill. I am so far away from that level of expertise that I don't know the questions to ask.
>>>>> More often than not I start with the console log, find something that looks significant, spend an hour or so tracking it down, and then have someone tell me that it is a red herring and the issue is really some other thing that they figured out very quickly by looking at a file I never got to.
>>>>>
>>>>> I guess what I'm looking for is some help with the patterns. What made you think to look in one log file versus another? Some of these jobs save a zillion little files; which ones are actually useful? What tools are you using to correlate log entries across all of those files? Are you doing it by hand? Is logstash useful for that, or is that more useful for finding multiple occurrences of the same issue?
>>>>>
>>>>> I realize there's not a way to write a how-to that will live forever. Maybe one way to deal with that is to write up the research done on bugs soon after they are solved, and publish that to the mailing list. Even the retrospective view is useful, because we can all learn from it without having to live through it. The mailing list is a fairly ephemeral medium, and something very old in the archives is understood to have a good chance of being out of date, so we don't have to keep adding disclaimers.
>>>>
>>>> Sure. Matt's actually working up a blog post describing the thing he nailed earlier in the week.
>>>
>>> Yes, I appreciate that both of you are responding to my questions. :-)
>>>
>>> I have some more specific questions/comments below. Please take all of this in the spirit of trying to make this process easier by pointing out where I've found it hard, and not just me complaining. I'd like to work on fixing any of these things that can be fixed by writing or reviewing patches early in Kilo.
>>>>
>>>> Here is my off-the-cuff set of guidelines:
>>>>
>>>> #1 - is it a test failure or a setup failure?
>>>>
>>>> This should be pretty easy to figure out. Test failures come at the end of the console log and say that tests failed (after you see a bunch of passing tempest tests).
>>>>
>>>> Always start at *the end* of files and work backwards.
>>>
>>> That's interesting, because in my case I saw a lot of failures after the initial "real" problem. So I usually read the logs like C compiler output: assume the first error is real, and the others might have been caused by that one. Do you work from the bottom up to a point where you don't see any more errors, instead of reading top down?
>>
>> Bottom up to get to problems, then figure out if it's in a subprocess, so the problems could exist for a while. That being said, not all tools do useful things like actually error when they fail (I'm looking at you, yum...), so there are always edge cases here.
>>
>>>> #2 - if it's a test failure, what API call was unsuccessful?
>>>>
>>>> Start with looking at the API logs for the service at the top level, and see if there is a simple traceback at the right timestamp. If not, figure out what that API call was calling out to; again, look at the simple cases, assuming failures will create ERRORs or TRACEs (though they often don't).
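
For illustration only, here is a rough sketch of the kind of scan #1 and #2 describe, assuming the console log and the devstack screen-*.txt service logs have been pulled down locally. The file names, error markers, and timestamp handling below are my assumptions about a typical devstack-gate run, not part of any gate tooling.

#!/usr/bin/env python
# Rough sketch, not gate tooling.  Step 1 (guideline #1): read the console
# log from the end and report the last line that looks like a failure.
# Step 2 (guideline #2): given the timestamp of the failed API call, scan
# the screen-*.txt service logs for ERROR/TRACE lines in the same window.
# All patterns and file layouts here are assumptions.

import glob
import re
import sys

FAILURE = re.compile(r'FAILED|Traceback|\bERROR\b')
SERVICE_ERROR = re.compile(r' (ERROR|TRACE) ')


def last_failure(console_log):
    """Return the last failure-looking line, scanning bottom up."""
    with open(console_log) as f:
        for line in reversed(f.readlines()):
            if FAILURE.search(line):
                return line.rstrip()
    return None


def errors_near(log_dir, timestamp_prefix):
    """Yield (file, line) for ERROR/TRACE entries matching a timestamp prefix.

    timestamp_prefix is something like '2014-08-28 17:04', which narrows
    the search to a one-minute window around the failed request, assuming
    the service logs start each line with an oslo-style timestamp.
    """
    for path in sorted(glob.glob('%s/screen-*.txt' % log_dir)):
        with open(path) as f:
            for line in f:
                if line.startswith(timestamp_prefix) and SERVICE_ERROR.search(line):
                    yield path, line.rstrip()


if __name__ == '__main__':
    # Usage: failure-scan.py <console.log> <logs-dir> [<timestamp-prefix>]
    console, log_dir = sys.argv[1], sys.argv[2]
    print('Last failure in console log: %s' % last_failure(console))
    if len(sys.argv) > 3:
        for path, line in errors_near(log_dir, sys.argv[3]):
            print('%s: %s' % (path, line))
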
>>> In my case, a neutron call failed. Most of the other services seem to have a *-api.log file, but neutron doesn't. It took a little while to find the API-related messages in screen-q-svc.txt (I'm glad I've been around long enough to know it used to be called "quantum"). I get that screen-n-*.txt would collide with nova. Is it necessary to abbreviate those filenames at all?
>>
>> Yeh... service naming could definitely be better, especially with neutron. There are implications for long names in screen, but maybe we just get over it, as we already have too many tabs to fit on one page in the console anymore anyway.
>>
>>>> Hints on the service log order you should go after are in the footer of every log page - http://logs.openstack.org/76/79776/15/gate/gate-tempest-dsvm-full/700ee7e/logs/ (it's included as an Apache footer) for some services. It's been there for about 18 months; I think people are fully blind to it at this point.
>>>
>>> Where would I go to edit that footer to add information about the neutron log files? Is that Apache footer defined in an infra repo?
>>
>> Note the following at the end of the footer output:
>>
>> About this Help
>>
>> This help file is part of the openstack-infra/config project, and can be found at modules/openstack_project/files/logs/help/tempest_logs.html. The file can be updated via the standard OpenStack Gerrit Review process.
>>
>> I took a first whack at trying to add some more information to the footer here: https://review.openstack.org/#/c/117390/
>
> \o/ - you rock joe!
+1!!

Doug

_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev