Thanks a lot for putting these together on the wiki, Beth! It really helps.
Song

-----Original Message-----
From: Flanagan, Elizabeth [mailto:elizabeth.flana...@intel.com]
Sent: Wednesday, January 11, 2012 11:12 AM
To: Liu, Song
Cc: yocto@yoctoproject.org
Subject: Re: [yocto] Yocto SWAT team kickoff

I've done some initial wikification of this:

https://wiki.yoctoproject.org/wiki/Yocto_Build_Failure_Swat_Team#Live_Debugging_Process

Michael Halstead, as the SA, should probably be included on this, as access
rights to the infrastructure should, in most cases, go through him.

-b

On Tue, Jan 10, 2012 at 7:24 AM, Liu, Song <song....@intel.com> wrote:
> Hi all,
>
> We would like to kick off the Yocto SWAT team this week. Please see the
> following for the purpose of the SWAT team and let me know if you have any
> questions or concerns. We welcome any community participation on the SWAT
> team. At the same time, I will work with the team to make sure things get
> started.
>
> Thanks,
> Song
>
> YOCTO SWAT TEAM
>
> GOAL
>
> The Yocto Project SWAT team is assembled mainly to tackle, in a timely
> manner, urgent technical problems that break the build on the master branch
> or major release branches, and thus to maintain the stability of the master
> and release branches. The SWAT team includes volunteers or appointed members
> of the Yocto Project team. Community members can also volunteer to be part of
> the SWAT team.
>
> SCOPE OF RESPONSIBILITY
>
> Whenever a build (nightly build, weekly build, release build) fails, the SWAT
> team is responsible for ensuring the necessary debugging occurs and
> organizing resources to solve the issue and ensure successful builds. If
> resolving the issues requires schedule or resource adjustment, the SWAT team
> should work with program and development management to accommodate the change
> in the overall planning.
>
> MEMBERS:
>
> * Darren Hart (US)
> * Elizabeth Flanagan (US)
> * Paul Eggleton (UK)
> * Jessica Zhang (US)
> * Dexuan Cui (CN)
> * Saul Wold (US)
> * Richard Purdie (UK)
>
> ROTATING CHAIR:
>
> A chairperson role will be rotated among team members each week. The
> Chairperson should monitor the build status for the entire week. Whenever a
> build is broken, the Chairperson should do the necessary debugging and
> organize resources to solve the problems in a timely manner to meet the
> overall project and release schedule. The Chairperson serves as the focal
> point of the SWAT team for external people such as program managers or
> development managers.
>
> ROTATING PROCESS
>
> Each week on a specific day (proposed: Monday), a SWAT team meeting could be
> called at the Chairperson's discretion to discuss current issues and status.
> Either during the meeting or offline, the Chairperson of the previous week
> will identify and pass the role to another person on the team. The program
> manager should be notified at the same time. Usually, this will follow a
> simple round-robin order. If the next person cannot take the role due to a
> tight schedule, vacation, or some other reason, the role will be passed to
> the person after them.
>
> The current Chairperson's full name and email address will be published on
> the project status wiki page:
> https://wiki.yoctoproject.org/wiki/Yocto_Project_v1.2_Status under the
> "Current SWAT team Chairperson" section.
>
> BKM (RICHARD PURDIE)
>
> When looking at a failure, the first question is what the baseline was and
> what changed. If there were recent known good builds, it helps to narrow down
> the number of changes that were likely responsible for the failure. It's also
> useful to note whether the build was from scratch or from existing sstate
> files. You can tell by seeing what "setscene" tasks run in the log.
>
> The primary responsibility is to ensure that any failures are categorized
> correctly and that the right people get to know about them.
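
A rough sketch of the sstate check mentioned in the BKM notes above: scan a
build log for task names and separate the ones ending in "_setscene" from the
rest. The helper name split_tasks and the sample log lines are made up for
illustration; the only assumption about the real log is that setscene task
names carry a "_setscene" suffix.

```python
import re

# Matches task names such as do_compile or do_populate_sysroot_setscene.
TASK_RE = re.compile(r"\bdo_\w+")


def split_tasks(log_lines):
    """Return (setscene_tasks, regular_tasks) as sorted lists of task names."""
    setscene, regular = set(), set()
    for line in log_lines:
        for task in TASK_RE.findall(line):
            if task.endswith("_setscene"):
                setscene.add(task)
            else:
                regular.add(task)
    return sorted(setscene), sorted(regular)


# Hypothetical log lines, not copied from a real autobuilder log.
sample_log = [
    "NOTE: Running setscene task 1 of 2 (example.bb, do_populate_sysroot_setscene)",
    "NOTE: Running task 5 of 9 (example.bb, do_compile)",
]

setscene_tasks, regular_tasks = split_tasks(sample_log)
print("setscene:", setscene_tasks)
print("regular:", regular_tasks)
```

An empty setscene list would suggest a from-scratch build; a long one suggests
the build reused existing sstate files.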
>
> It's important *someone* is then tasked with fixing it. Image failures are
> particularly tricky since it's likely some component of the image that failed
> and the question is then whether that component changed recently, whether it
> was some kind of core functionality at fault and so on.
>
> Ideally we want to get the failure reported to the person who knows something
> about the area and can come up with a fix without it distracting them too
> much.
>
> As a secondary responsibility, it's often helpful to triage the failure.
> This might mean documenting a way to reproduce the failure outside a full
> build and/or documenting how the failure is happening and maybe even
> proposing a fix.
>
> Sometimes failures are difficult to understand and can require direct ssh
> access to the autobuilder so the issue can be debugged passively on the
> system to examine contents of files and so forth. If doing this, ensure you
> don't change anything on the file system, for example by adding files that
> the autobuilder then couldn't delete when it rebuilds.
>
> Rarely, "live" debugging might be needed where you'd su to the pokybuild user
> and run a build manually to see the failure in real time. If doing this,
> ensure you only create files as the pokybuild user and you are careful not to
> generate sstate packages which shouldn't be present or any other bad state
> that might get reused. In general it's recommended not to do "live"
> debugging. This can be escalated to RP/Saul/Beth if needed.
>
> To fulfill the primary responsibility, it's suggested that bugs are opened in
> Bugzilla for each type of failure. This way, appropriate people can be
> brought into the discussion and a specific owner of the failure can be
> assigned. Replying to the build failure with the bug ID and also bringing the
> bug to the attention of anyone you suspect was responsible for the problem
> are also good practices.
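
One way to double-check the file ownership caveat from the ssh and "live"
debugging paragraphs above is to walk the build directory after a session and
list anything not owned by the expected user. This is only a sketch: the
function name files_not_owned_by is hypothetical, and the directory in the
usage comment is a placeholder, not the autobuilder's real layout.

```python
import os


def files_not_owned_by(root, expected_uid):
    """Return sorted paths under root whose owner uid differs from expected_uid."""
    offenders = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat so that symlinks are checked themselves, not their targets.
            if os.lstat(path).st_uid != expected_uid:
                offenders.append(path)
    return sorted(offenders)


# Possible usage on a build host (placeholder path, not run here):
#   import pwd
#   uid = pwd.getpwnam("pokybuild").pw_uid
#   for path in files_not_owned_by("/path/to/build", uid):
#       print(path)
```

Anything this reports would be a file the autobuilder may be unable to delete
on its next rebuild.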
>
> _______________________________________________
> yocto mailing list
> yocto@yoctoproject.org
> https://lists.yoctoproject.org/listinfo/yocto

--
Elizabeth Flanagan
Yocto Project
Build and Release