Raymond Hettinger wrote: > > Mark Dickinson wrote: >> I have a simple 192-line Python script that begins with the line: >> >> dummy0 = 47 >> >> The script runs in less than 2.5 seconds. The variable dummy0 is never >> referenced again, directly or indirectly, by the rest of the script. >> >> Here's the surprise: if I remove or comment out this first line, the >> script takes more than 15 seconds to run. So it appears that adding a >> redundant line produces a spectacular six-fold increase in speed! > > Thanks for your post. It is cute, brilliant, and interesting. > > I haven't had time to investigate but can point at the likely cause. > All of the global names are stored in a single hash table. Search time > is dictated by two factors, the sparseness of the hash table and the > distribution of hash values. With respect to sparseness, whenever that > table becomes 2/3 full, it grows by a factor of four and becomes only > 1/6 full (leading to many fewer collisions). With respect to > distribution, it should be noted that string hash values are decidely > non-random and your variable names likely congested consecutive spaces > in a nearly full table (resulting in seven times as many search probes > to find a global value). > > When the extra value was added, it likely resized the table four-fold > and redistributed the hash values into fewer consecutive positions.
If that's the cause, then here's another reason to use long, descriptive names instead of C64-BASIC style a, b, c, i, j... - with long names the chances of hash collisions are pretty low. Or everyone will start optimizing their programs by using long, *random* names ;) -- Benjamin Niemann Email: pink at odahoda dot de WWW: http://www.odahoda.de/ -- http://mail.python.org/mailman/listinfo/python-list