>> This reminds me; do we have a utility to reference wmesg strings back
>> to the code that sets them, a la TAGS?  Would this be useful?
> No, and yes respectively.

Okay, I've got an early version written.  It's got some fairly
substantial TODO's, and needs a fair bit of cleanup.  I would
appreciate any comments anybody has.

-----cut here-----
#! /usr/bin/perl -w

# Copyright (c) 1999 Joel Ray Holveck.  All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# 1. Redistributions of source code must retain the above copyright
#    notice, this list of conditions and the following disclaimer.
# 2. Redistributions in binary form must reproduce the above copyright
#    notice, this list of conditions and the following disclaimer in the
#    documentation and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
# OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
# HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
# LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
# OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
# SUCH DAMAGE.
#
# $Id$

# NAME
#      wtags - generate wchan tag file
#
# SYNOPSIS
#      wtags [-cegw] [-v] [path]
#
# DESCRIPTION
#      wtags scans a 4.4BSD kernel source tree and creates a database
#      listing all the wchan's which are explicitly specified to
#      tsleep(9) or similar functions.  This is useful for identifying
#      where in the kernel a process may be hung.
#
#      The source tree to be searched may be specified with path, or
#      the current directory is used.  Subdirectories are always
#      scanned, and symbolic links are always followed.
#
#      The options are as follows:
#
#      -c      Generate ctags(1)-compatible output.  Output is appended
#              to the file "tags" in the current directory.  This file
#              is typically used with vi(1).
#
#      -e      Generate etags(1)-compatible output.  Output is appended
#              to the file "TAGS" in the current directory.  This file
#              is typically used with Emacs(1).
#
#      -g      Generate gtags(1)-compatible output.  Output is appended
#              to the file "GSYMS" in the current directory.  This file
#              is typically used with the global(1) tags system by
#              Shigio Yamaguchi, which may be used with vi(1), Emacs(1),
#              or other systems.
#
#      -w      Generate a native file format (described below).  This
#              format is designed to be easily read by humans or machines,
#              but no utilities currently use it.  -w is the default if
#              no other output format is specified.
#
#      -v      Generate warnings for many cases when a possible call to
#              tsleep(9) or a related function is found, but a wchan
#              could not be isolated.  There are normally many of these
#              in correct code; one version of wtags produced 83 such
#              diagnostics on the 4.4BSD-Lite kernel.  See DIAGNOSTICS,
#              below.
#
#      wtags will only recognize string literals for wchan arguments.
#      A function (such as lf_setlock) which uses a string constant
#      instead, or one (such as ttread) which uses one of a few known
#      possibilities selected via :? or another mechanism, may have a
#      comment such as /* WCHAN: lockf */ on the line in question.
#      wtags will then use the indicated channel, and ignore any tsleep
#      call on that line.
#
# FILES
#      /sys    Traditional kernel source location
#      WTAGS   Default output file, used with -w or if no other
#              tag file is specified
#      TAGS    Emacs(1) tags output file, used with -e
#      tags    vi(1) tags file, used with -c
#      GSYMS   global(1) tags file, used with -g
#
# DIAGNOSTICS
#      If the -v option is specified, then whenever tsleep (or another
#      function that uses wchan) is written in the source file, but no
#      wchan can be found, a diagnostic is printed.  These diagnostics
#      do not always properly describe the issue.
#
#      There is one exception to this: if the item appears to be a
#      reference to tsleep from within a comment (using a heuristic),
#      then no diagnostic is printed.
#
# SEE ALSO
#      ps(1), etags(1), ctags(1), global(1), tsleep(9), glimpse(1)
#
# NOTES
#      wtags was designed under FreeBSD.  Any contributions to allow it
#      to work with other OS's would be appreciated.
#
#      wtags is written for Perl 5.  It may break under other versions
#      of Perl.
#
# HISTORY
#      wtags was originally written on 22 February, 1999 for FreeBSD 4.0 by
#      Joel Ray Holveck <jo...@gnu.org>.
#
# BUGS
#      -e and -g are not yet implemented.  Notes on the respective file
#      formats are in the source.
#
#      Presently, only calls to tsleep, asleep, ttysleep, and lockinit are
#      scanned.  A list of other calls to be scanned is in the source.
#
#      Mentions of tsleep in strings are not skipped, and will normally
#      generate a diagnostic (if -v is specified).

# TODO:
#
# 0. Fix multiline functionality.  Try scanning /sys/kern/subr_log.c for
# an example.
#
# 1. Add etags(1) support.  Since I use Emacs so often, this was my original
# goal.  I will describe the file format for Emacs's TAGS here:
# Each file is listed separately.  At the beginning of the file's entries
# is a formfeed, newline, the complete filename, and the number of bytes
# until the next entry, all terminated by a newline.  Example:
#   ^L
#   /usr/local/src/emacs-20.3/src/alloca.c,785
# The bytecount is from after the newline to before the next formfeed
# (or EOF).  Each item in the file is on a separate line.  If the tag has
# a name, then the line contains the search string, a del (\177), the
# tag name, a ^A (\001), the line number, a comma, and the byte number.
# This means that we will probably be using something along the lines of:
#   tsleep^?wait^A1132,29901
# This will make Emacs search for "tsleep" when the match is used.
# (If the item has no name, then the name and the ^A are elided.)
#
# 2. Add gtags(1) support.  Specifically, these should be added to GSYMS.
# I still need to do more research on the GSYMS file format.  It seems
# that each record includes the name of the tag, the filename, and a
# comma-separated list of line numbers where the tag appears.  If a tag
# appears in more than one file, then more than one entry is created.
#
# 3. Add support for more sleep calls.  I need to change out of
# hardcoded regexps first.  As of FreeBSD 4.0-CURRENT (24 Feb 1999), 
# the calls in question include usbd_bulk_transfer (from
# /sys/dev/usb/usbdi_util.c), waitforit (from /sys/i386/isa/matcd/matcd.c),
# wdsleep (from /sys/i386/isa/wd.c and /sys/pc98/pc98/wd.c), wtwait
# (from /sys/i386/isa/wt.c), BPF_SLEEP (from /sys/net/bpf.c), and
# vm_page_sleep and vm_page_asleep (both from /sys/vm/vm_page.c).

use strict;
use Getopt::Std;
use vars qw($opt_w $opt_c $opt_v);

# A hash indexing wchan strings to a list of locations.  The location
# format currently is filename\tlineno, but this may change (probably
# to append ",byteno") when etags support is added.
# Current example:
#    "slock"-> [ "/sys/kern/kern_lock.c\t557" ]
# future example:
#    "slock"-> [ "/sys/kern/kern_lock.c\t557,14263" ]
# it is also likely that etags support will use a completely different
# system.
my %wchan;

# a hash indexing location strings to the function name being used (eg,
# "kern_lock.c\t557"->"tsleep")
my %sleepcall;

# a hash relating sleep calls to the argument offset of the
# wchan string
# FIXME The regexes that scan for these calls should be built
# from this.
my %wchanarg;
$wchanarg{"tsleep"}=2;
$wchanarg{"asleep"}=2;
$wchanarg{"ttysleep"}=3;
$wchanarg{"lockinit"}=2;

# $prog_label is prefixed on all warnings and error messages.  to
# comply with standards, it should be "wtags: ", but leaving it empty
# allows you to use emacs's compile-mode with its output to check the
# source of each warning message against the offending kernel source.
# my $prog_label = "wtags: ";
my $prog_label = "";

sub scanfile {
    my ($file) = @_;
    open FILE, $file or warn "$prog_label$file: can't open: $!; skipped\n";
    while (<FILE>) {
        # FIXME Isn't recognized on subsequent lines
        m,/\*\s*WCHAN:\s*([A-Za-z0-9]+)\s*\*/, && goto EXPLICIT_WCHAN;
        /\b(tsleep|asleep|ttysleep|lockinit)(.*)/ || next;
        my $looks_like_comment;
        my $sleepcall = $1;     # Saved to add to %sleepcall
        my $rest = $2;

        # This matches either /* or a line starting with a * (for
        # boxed comments).
        # FIXME This doesn't work for calls within calls.
        $looks_like_comment = ((m,^\s*/?\*,) ? 1 : 0);
        $_ = $rest;

      GOT_SLEEP:
        # We keep the current arg number in $arg.  If we are in a
        # comment, then set $incomment to 1.  parens is the current
        # parens level.  Inside the tsleep call's parens is level 0.

        my ($curarg, $incomment, $parens, $lines_scanned) = (0, 0, -1, 0);
        while ($curarg < $wchanarg{$sleepcall}) {
            
            # Handle this first so it can next; to itself.
            if ($incomment) {
                if (m,\*/(.*),) {
                    $_ = $1;
                    $incomment = 0;
                } else {
                    if (++$lines_scanned >= 5) {
                        $opt_v and warn "$prog_label$file: $.: tsleep call over 
five lines; skipped\n";
                        goto PARSE_FAILED;
                    }
                    $_ = <FILE>;
                    if (!defined $_) {
                        $opt_v and warn "$prog_label$file: $.: early EOF hit; 
continuing";
                        goto PARSE_FAILED;
                    }
                    next;
                }
            }
        
            # Parse out the following: /*, */, (, ), ',', and 'tsleep'.
            # FIXME Doesn't exclude tokens within strings
            if 
(m:.*?(/\*|\*/|\(|\)|,|\btsleep|\basleep|\bttysleep|\blockinit)(.*)$:) {
                if ($1 eq "/*") {
                    $incomment = $1;
                } elsif ($1 eq "*/") {
                    # End of comment hit
                    goto PARSE_FAILED;
                } elsif ($1 eq "(") {
                    ++$parens;
                } elsif ($1 eq ")") {
                    $parens--;
                    if ($parens < 0) {
                        # Suppress the warning if there is a comment
                        # that includes a reference to tsleep just
                        # like this one does.
                        if (!($looks_like_comment && $curarg == 0)) {
                            $opt_v and warn "$prog_label$file: $.: tsleep call 
with too few args; skipped\n";
                        }
                        goto PARSE_FAILED;
                    }
                } elsif ($1 eq ",") {
                    $parens == 0 and $curarg++;
                } elsif ($1 eq "*/") {
                    # There was a reference to tsleep within a comment.
                    goto PARSE_FAILED;
                } else {        # new tsleep call
                    $opt_v and warn "$prog_label$file: $.: tsleep called within 
tsleep\n";
                    $sleepcall = $1;
                    $_ = $2;
                    goto GOT_SLEEP;
                }
                $_ = $2;
            } else {
                # This is taken out because it probably is not useful.
                # if ($parens == -1) {
                #     # This was probably a reference to tsleep, rather than
                #     # a call.
                #     goto PARSE_FAILED;
                # }
                $_ = <FILE>;    # get the next line
                if (!defined $_) {
                    $opt_v and warn "$prog_label$file: $.: early EOF hit; 
continuing";
                    goto PARSE_FAILED;
                }
                $looks_like_comment = ((m,^\s*/?\*,) ? 1 : 0); # mostly for 
switching to the next tsleep call
                if (++$lines_scanned >= 5) {
                    # This is probably a sign of a misparse, or incomplete code
                    # (say, in an #if 0 or something).
                    $opt_v and warn "$prog_label$file: $.: tsleep call over 
five lines; skipped\n";
                    goto PARSE_FAILED;
                }
            }
        }
        
        # We are now looking at the correct argument.
        # FIXME Handle looking at a comment
        while (/^\s*$/) {
            # End of line was after the comma; skip blank lines.
            $_ = <FILE>;
            if (!defined $_) {
                $opt_v and warn "$prog_label$file: $.: early EOF hit; 
continuing";
                goto PARSE_FAILED;
            }
        }
        if (! m,^\s*"([^\042]*)",) { # Use octal to keep quotes matched for 
emacs
            $opt_v and warn "$prog_label$file: $.: wmesg is not a string 
literal; skipped\n";
            goto PARSE_FAILED;
        }
      EXPLICIT_WCHAN:
        my $wmesg = $1;
        if (defined $wchan{$wmesg}) {
            push @{ $wchan{$wmesg} }, ("$file\t$.");
        } else {
            $wchan{$wmesg} = [ "$file\t$." ];
        }
        $sleepcall{$wmesg} = $sleepcall;
      PARSE_FAILED:
    }
    close FILE;
}

sub scandir {
    my ($wd, $glob) = @_;
    my $file;
    my @flist;
    print "==> $wd\n";
    @flist = glob("$wd/$glob");
    defined @flist or warn "${prog_label}can't scan $wd: $!; skipped\n";
    foreach $file (@flist) {
        if (-d $file) {
            scandir($file,$glob);
        } else {
            $file =~ /.c$/ and scanfile($file);
        }
    }
}

getopts('cvw');
$opt_c or $opt_w = 1;           # -w is the default output format
$opt_w and (open WTAGS, ">WTAGS" or die "${prog_label}can't create WTAGS:$!\n");
$opt_c and (open CTAGS, ">>tags" or die "${prog_label}can't append to 
tags:$!\n");

# Use the output of `pwd` as the directory if nothing was specified on the
# command line.
my $wd = ((defined $ARGV[0])?$ARGV[0]:`pwd`);

$wd =~ s,/?\n?$,,;            # Strip off any terminating slash and/or newline
scandir($wd, "*");
# FIXME This loop should be built and eval'd.
foreach my $wchan (sort keys %wchan) {
    $opt_w && print WTAGS "\f\n$wchan\n";
    foreach my $spot (@{ $wchan{$wchan} }) {
        # Should I use a regex for ctags instead?
        $opt_c && print CTAGS "$wchan\t$spot\n";
        $opt_w && print WTAGS "$spot\n";
    }
}

close CTAGS;
close WTAGS;
__END__
-----cut here-----

-- 
Joel Ray Holveck - jo...@gnu.org
   Fourth law of programming:
   Anything that can go wrong wi
sendmail: segmentation violation - core dumped


To Unsubscribe: send mail to majord...@freebsd.org
with "unsubscribe freebsd-current" in the body of the message

Reply via email to