On 3/14/12 1:44 PM, Richard Neill wrote:
> Dear All,
> 
> I don't know for certain if this is a bug per se, but I think
> "compgen -W" is much slower than it "should" be in the case of a large
> (10000+) number of options.
> 
> For example (on a fast i7 2700 CPU), I measure:
> 
> compgen -W "`seq 1 50000`" 1794       #3.83 s
> compgen -W "`seq 1 50000 | grep 1794`"   #0.019 s
> 
> In these examples, I'm using `seq` as a trivial way to generate some data,
> and picking 1794 as a totally arbitrary match.
> 
> In the first example, compgen is doing the filtering, whereas in the 2nd, I
> obtain the same result very much faster with grep.
> 
> If I increase the upper number by a factor of 10, to 500000, these times
> become,  436 s (yes, really, 7 minutes!) and 0.20 s respectively.  This
> suggests that the algorithm used by compgen is O(n^2) whereas the algorithm
> used by grep is 0(1).

You've identified a problem, but your analysis is incomplete.  The search
algorithm isn't the problem -- it's simply a linked list traversal that
does an initial substring match.

The function (pcomplete.c:gen_wordlist_matches()) spends more than 60% of
its time generating that linked list.  It splits the string argument to -W
into words at $IFS characters, but it differs from standard shell word
splitting in that it splits everything (not just the results of expansion),
understands embedded quoted strings, and understands embedded shell
expansions.  Though word splitting itself is costly when you're splitting a
300K character string, this function is slower.

I will look at optimizing that function, but it's always going to take time
to plow through 300K when you have to split it into words.  (There's not
actually any word splitting of consequence happening with your second
example using the pipeline.)

I've attached a script I used to identify the costs associated with the
various parts of the shell word expansions and compgen's behavior.  The
rest I had to build a profiling version of bash to analyze.

Chet
-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
                 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/
echo 1. assignment with command substitution
time S1=`seq 1 50000`
time S2=`seq 1 50000 | grep 1794`
echo

echo 2. execution
time seq 1 50000 > /dev/null
time seq 1 50000 | grep 1794 > /dev/null
echo

echo 3. null builtin with command substitution
time : "`seq 1 50000`"
time : "`seq 1 50000 | grep 1794`"
echo

echo 4. null builtin with word splitting from variable
time : $S1
time : $S2
echo

echo 5. null builtin with command substitution and word splitting
time : `seq 1 50000`
time : `seq 1 50000 | grep 1794`
echo

echo 6. compgen with variable
time compgen -W "$S1" 1794 >/dev/null
time compgen -W "$S2" >/dev/null
echo

echo 7. compgen with command substitution
time compgen -W "`seq 1 50000`" 1794 >/dev/null
time compgen -W "`seq 1 50000 | grep 1794`" >/dev/null
echo

echo 8. compgen with variable, null match
time compgen -W "$S1" >/dev/null
echo

echo 9. compgen with command substitution, null match
time compgen -W "`seq 1 50000`" >/dev/null

Reply via email to