Peter Jeremy <[EMAIL PROTECTED]> wrote:
> I use id-utils (/usr/ports/devel/id-utils). It builds a single database
> file and has a variety of tools (including e-lisp) to search the database.
>
> Since global(1) was mentioned in this thread, I decided to have a look
> at it. It seems much slower and my sample (samba-2.0.5a) database was
> nearly 20 times larger.
Well, as a longtime user of mkid, mkid2, and mkid3 (the
predecessors to id-utils), here are some comments on the various
packages:
[ Note: in the following, I'm not quite comparing apples and apples.
However, I'm too lazy to do a strict comparison, but this should still
give people a vague idea of each package's performance. Take the
following as you will, with a grain of salt. ]
* As a baseline, let's look at plain grep. First, generate a list of
files to search (this assumes that we don't want to look through all
files, including Makefiles, man pages, etc.):
cd /usr/src
find * -type f | time grep '\.[chsSly][cxp]*$' > /tmp/foo
Now, on my system (-current from Aug. 21, a PII 300MHz w/128MB & a F/W
SCSI disk), this takes around 50 seconds (real time):
xargs grep ptrace < /tmp/foo
Not too bad, but not great, either. Let's try looking for utmp.h:
xargs grep 'utmp\.h' < /tmp/foo
This takes around a minute.
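The two steps above (build a file list, then grep only those files) can be wrapped up roughly as follows. This is only a sketch, assuming a POSIX shell; the function names list_sources and search_sources are made up for illustration, and the extension pattern is the one used above. File names containing whitespace will confuse xargs here.

```shell
#!/bin/sh
# list_sources DIR: print the files under DIR whose names match the
# extension pattern used above (e.g. .c, .h, .s, .S, .l, .y, .cc, .cpp, .cxx).
# Hypothetical helper name; not part of any of the packages discussed.
list_sources() {
    find "$1" -type f | grep '\.[chsSly][cxp]*$'
}

# search_sources PATTERN DIR: grep only the listed source files.
# /dev/null is passed as an extra file so that grep always prints the
# file name, even if xargs hands it just one file.
search_sources() {
    list_sources "$2" | xargs grep -- "$1" /dev/null
}
```

For example, "search_sources ptrace /usr/src" would correspond to the timed search above, minus the intermediate /tmp/foo file.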
Now, let's look at "grep -R":
cd /usr/src
grep -R ptrace . # 2 minutes 42 seconds
grep -R 'utmp\.h' . # 2 minutes 40 seconds
In other words, with grep, you need to limit your searches. Also,
"grep -R" doesn't work very well if you also happen to have glimpse,
global, or mkid/id-utils indices under /usr/src.
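One way around the index-file problem, as a sketch only: it assumes a grep that supports --exclude (newer GNU and BSD greps do; the grep on a given system may not). The function name grep_tree is made up, and the file names listed are the usual default index names for each package, so adjust them if yours differ.

```shell
#!/bin/sh
# grep_tree PATTERN DIR: recursive grep that skips index databases.
# Assumption: grep supports --exclude=GLOB (matched against base names).
# Assumed default index names:
#   ID                           mkid / id-utils
#   GTAGS GRTAGS GSYMS GPATH     global
#   .glimpse_*                   glimpse
grep_tree() {
    grep -R \
        --exclude=ID \
        --exclude=GTAGS --exclude=GRTAGS --exclude=GSYMS --exclude=GPATH \
        --exclude='.glimpse_*' \
        -- "$1" "$2"
}
```

This does nothing for the speed of "grep -R"; it only keeps the binary index files out of the results.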
* Global is OK (does not appear to support C++, though), but generates
HUGE databases (by default). For /usr/src, the databases are around
as large as the total size of the indexed source files (the gtags "-c"
option was not used). Indexing is slow, but searching seems to be
quite fast. In particular, "global -x name" is nice, because it just
returns where "name" is defined, as opposed to a plain grep which can
also return matches on "fooname" and "namebar", as well as where
"name" is used. However, global appears to be optimized for locating
where a function is defined. It appears to be difficult to locate,
for example, where a preprocessor macro is defined; except for "global
-g" (which is often too slow to be usable), I haven't found a way of
getting global to search through .h header files.
On my system, indexing /usr/src took around an hour, and the indices
took up around 240MB+ (this was with "gtags" and not "gtags -c").
This is 20+ times larger than a glimpse or mkid/id-utils database.
It's interesting to note that "global -x -g ptrace" takes around twice
as long to execute (over two minutes), compared to plain grep.
However, "global -x -s ptrace" is very fast (under 1 second).
Searching for ptrace generates two (2) lines of output, in well under
one second:
global -x ptrace
as do these:
global -x -s ptrace
global -x -s uap
Looking for where "utmp.h" is used:
global -x -s utmp.h
This takes more than 2212 seconds (over 36 minutes!), and outputs
nothing. Well, let's try this instead:
global -x -g utmp.h
This works, taking a bit over a minute and a half. However, plain
grep is faster (note that, as global searches through source files
only, you have to compare it to the source-file-only grep, and not
"grep -R").
However, looking for the definition of a preprocessor macro is a
pain. Try looking for KBD_DATA_PORT:
global -x KBD_DATA_PORT
This runs quickly, but displays nothing. Next, try:
global -x -s KBD_DATA_PORT
This runs quickly, and shows where this is used in .c source files.
However, where's the definition? It's not shown.
This works:
global -x -g KBD_DATA_PORT
However, this takes around two minutes to run, which is much slower
than a plain grep.
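For comparison, here is roughly what the "plain grep" approach to finding a macro definition could look like. This is a sketch under assumptions: the helper name find_define is made up, it only looks at .c and .h files, and it is a simple regular-expression match, so it can be fooled by definitions inside comments or by unusual formatting.

```shell
#!/bin/sh
# find_define NAME DIR: show where preprocessor macro NAME is #defined
# in the .c and .h files under DIR (hypothetical helper, not part of global).
# The pattern allows whitespace around '#', and requires that NAME be
# followed by a non-identifier character or end of line, so that
# NAME_SOMETHING_ELSE does not match.
find_define() {
    find "$2" -type f -name '*.[ch]' |
        xargs grep -nE -- \
            "^[[:space:]]*#[[:space:]]*define[[:space:]]+$1([^A-Za-z0-9_]|\$)" \
            /dev/null
}
```

Something like "find_define KBD_DATA_PORT /usr/src/sys" would then print the file name and line number of each matching #define.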
* Glimpse is a general-purpose text indexer which can be used to index
source files. It's basically an intelligent grep, but it works quite
well. Unlike mkid, you can search through comments and non-source
files (like Makefiles, man pages, README's, etc.).
On my system, indexing /usr/src took around 6 minutes (using the "-M
20" option), and the indices took up around 10MB.
On my system, searching for ptrace took 35 seconds, with 505 lines of
output (ChangeLogs, man pages, etc. account for the extra lines):
glimpse -w ptrace
Searching for uap takes around 21 seconds:
glimpse -w uap
Looking for utmp.h:
glimpse -y -w utmp.h
This takes a bit over 45 seconds. However, glimpse searched through
(and displayed hits in) non-source files, like configure,
configure.in, Makefiles, etc.
It is possible to have glimpse exclude certain files and index only
those files you want indexed. However, I don't have the time to
configure and test this. Perhaps someone else will do this.
* Mkid/mkid2/mkid3/id-utils appear to generate the smallest index
databases, and they run quickly. They're great for looking up where a
particular identifier is used (e.g., "gid ptrace", which is an
intelligent grep), but they can't just tell you where something is
defined, and only that place. The place where something is defined is
output along with every place that it's used. You're basically doing
a very intelligent grep. However, grep'ing via gid is *MUCH* faster
than "global -g" (it's like 100X faster); on the other hand, "global
-s" is often comparable to gid.
Mkid and friends can also (supposedly, as I've never tried it) tell
you where a number occurs, in any base. If you know the number 100 is
somewhere in your source code, mkid can show you where it occurs, as
"100" (decimal), "64" (hex), or "144" (octal).
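The base conversions in that example are easy to check from the shell. This is just a sanity check of the arithmetic using the standard printf utility; it says nothing about how mkid itself does the lookup, and the function name show_bases is made up.

```shell
#!/bin/sh
# show_bases N: print N in decimal, hexadecimal, and octal.
show_bases() {
    printf '%d %x %o\n' "$1" "$1" "$1"
}

show_bases 100   # prints: 100 64 144
```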
Only source files are indexed, as mkid & friends only know about
certain languages (C, C++, & assembly being a few). Also, comments
aren't indexed, although gid will display hits in comments (because
the file being grep'd contains a hit in a non-comment line).
However, the "id-utils-3.2" package for -current dumps core when used
to index /usr/src. I don't have the time to track this down.
On my system, indexing /usr/src using mkid3 took a bit over 2 minutes,
and the indices took up around 9.1MB. The index was built using:
find . -type f | grep '\.[chsSly][cxp]*$' | time mkid -
(Note: id-utils is further broken, since it cannot take the list of
files to index from stdin or a file -- this example is for mkid3.)
Both glimpse and global index more files by default (in the case of
glimpse, Makefiles, CVS/Root, CVS/Repository, COPYRIGHT files,
etc. were indexed).
Searching with gid is VERY fast. On my system, searching for ptrace takes under 0.5 sec.:
gid ptrace
Yup, that's under one-half second, with 195 lines of output.
Let's try looking for where "utmp.h" is used:
gid utmp.h
This takes around 2.5 seconds.
***** Bottom line:
For general-purpose use, mkid and friends are best, as long as you
don't need to search through comments or non-source files (Makefiles,
README's, etc.). The database index is reasonably small, the
indexing time is relatively quick, and the search times are often
comparable to or better than those of global. However, mkid and friends
can't just tell you where something is defined; they can only show where
it is defined and used.
If you need to search through comments, or need to search
non-source files, glimpse is good. The index is larger than that of
mkid/id-utils, and the search speed is decent, but not great. For many
searches, it's faster than plain grep, although it can be comparable to
grep in some cases.
I've got mixed feelings about global. On the one hand, you can't
beat it for locating where a function is defined, and it's very good at
showing where a variable is used. However, for best results, you have
to remember to use different options when searching for function
definitions, identifier usage, preprocessor definitions, etc., and you
may still have to resort to doing a full grep because, for some
searches, global is too slow. The indices for global are HUGE, and
indexing takes much longer than with the other packages. I'm surprised that
global is part of the base distribution, instead of being a port.
--
Darryl Okahata
[EMAIL PROTECTED]
DISCLAIMER: this message is the author's personal opinion and does not
constitute the support, opinion, or policy of Hewlett-Packard, or of the
little green men that have been following him all day.