Re: RFC 326 (v1) Symbols, symbols everywhere

Paolo Molaro Mon, 02 Oct 2000 06:37:23 -0700
On 09/27/00 Ken Fox wrote:
> Dave Storrs wrote:
> > It isn't terribly clear to me either
> 
> Well, he does give a couple references that would clear it up.
> X11 Atoms are well documented.
> 
> > saying is that you can qs() a method name, get a "thingie" out, store the
> > thingie in a scalar, and then that scalar becomes a direct portal to the
> > method...somewhat like a coderef, but without a required deref.
> 
> Actually it's more trivial than that. When you "intern" a symbol, you're
> adding a string-to-integer mapping to the compiler's symbol table. Whenever
> the compiler sees the string, it replaces it with the corresponding
> integer. (The type stays "symbol" though; I'm sort of mixing implementation
> and semantics.) Think of it like a compile-time hash for barewords.

Not only that: every time the compiler sees another symbol with the
same string representation, it reuses the symbol it already created,
so no additional memory is used.
A non-trivial program will probably use several packages (or binary
modules that use several packages, e.g. Gtk). Let's look at the DESTROY
method. Currently the string is malloc()ed separately in the symbol
table of every package, which takes 8 bytes for the string ("DESTROY"
plus its terminating NUL) plus the malloc overhead (at least 4 bytes,
probably 8 on 32-bit systems). And that doesn't count the other memory
that could be saved by using hash tables optimized for symbols (i.e.
keyed by integers instead of strings).
Repeat that for all the duplicated method names in a class hierarchy
and you can easily save several KB of memory.
As a bonus, you'll get faster lookups, since an integer compare is
faster than a strcmp().
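
To make the mechanism concrete, here is a minimal sketch in plain Perl of
the interning described above. The intern() and symbol_name() helpers and
the hash-plus-array table are hypothetical illustrations, not a proposed
core API: the first time a string is seen it gets a new integer id, and
every later occurrence reuses that id, so equality tests become integer
comparisons.

```perl
use strict;
use warnings;

# Hypothetical symbol table (illustration only, not a real perl API):
# each distinct string is stored once and identified by a small integer.
my %sym_id;      # string -> id
my @sym_name;    # id -> string

# Return the id for $name, creating a new entry only the first time
# the string is seen; later calls reuse the existing id.
sub intern {
    my ($name) = @_;
    return $sym_id{$name} if exists $sym_id{$name};
    push @sym_name, $name;
    return $sym_id{$name} = $#sym_name;
}

# Map an id back to its string form (e.g. for printing).
sub symbol_name { return $sym_name[ $_[0] ] }

my $d1 = intern('DESTROY');
my $d2 = intern('DESTROY');   # same string: same id, no new entry
my $n  = intern('new');

print "same symbol\n" if $d1 == $d2;       # integer compare, no string compare
print scalar(@sym_name), " entries\n";      # two distinct strings stored
print symbol_name($n), "\n";
```

Run as-is it prints "same symbol", "2 entries" and "new". In the proposal
the table would live inside perl itself; a user-level hash like this only
shows the bookkeeping, not the memory savings.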

As for the possible uses in the language, I should have used a
better example. Let's consider an XML/SGML file with all those
ugly tags and attributes we love (well, no!). An XML parser
loads the file and stores the tag and attribute names as
strings: many tags appear again and again in an XML file,
leading to huge memory consumption. Now, if the
parser could use symbols, the memory for each tag name would
be allocated only once (which is also faster, because
malloc() is called far less often).
When walking the tree in your Perl program, you could then use integer
comparison instead of string comparison.
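
The parser scenario can be sketched the same way. This is a toy, assuming
a hypothetical intern_tag() helper and a regex that only picks out start
tags (it is not a real XML parser): each distinct tag name is stored once,
elements carry integer ids, and the tree walk compares integers.

```perl
use strict;
use warnings;

# Hypothetical tag table: each distinct tag name is stored once,
# and every element refers to it by a small integer id.
my (%tag_id, @tag_name);

sub intern_tag {
    my ($name) = @_;
    unless (exists $tag_id{$name}) {
        push @tag_name, $name;
        $tag_id{$name} = $#tag_name;
    }
    return $tag_id{$name};
}

# Toy "parse": record the id of every start tag in document order.
# Closing tags (</item>) are skipped because '/' is not a letter.
my $xml = '<list><item>a</item><item>b</item><note>c</note><item>d</item></list>';
my @elements = map { intern_tag($_) } $xml =~ /<([A-Za-z][\w.-]*)/g;

# Walking the elements: look up the id for "item" once, then each
# test is an integer comparison instead of a string comparison.
my $item  = intern_tag('item');
my $count = grep { $_ == $item } @elements;

print scalar(@elements), " elements, ", scalar(@tag_name), " distinct tag names\n";
print "$count <item> elements\n";
```

This prints "5 elements, 3 distinct tag names" and "3 <item> elements":
the name "item" is stored once even though it occurs three times.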

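use Benchmark;

# The snippets below are passed as strings and eval'ed inside Benchmark,
# so these must be package (global) variables, not lexicals.
$num1 = 10;
$num2 = 20;
$string1 = 'htmltag';
$string2 = 'htmltag';
$string3 = 'buffy';

timethese(10000000, {
        'number'  => '$num1 == $num2',       # integer comparison
        'stringe' => '$string1 eq $string2', # worst case: equal strings, every byte compared
        'string'  => '$string1 eq $string3', # best case: lengths differ
});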

Gives:
Benchmark: timing 10000000 iterations of number, string, stringe...
    number:  4 wallclock secs ( 3.87 usr +  0.00 sys =  3.87 CPU)
    string:  6 wallclock secs ( 4.28 usr +  0.01 sys =  4.29 CPU)
   stringe:  7 wallclock secs ( 5.97 usr +  0.00 sys =  5.97 CPU)

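Comparing strings took about 11% longer than comparing numbers in the
best case and about 54% longer in the worst case. In the C internals,
the gains are much larger than the roughly 30% average seen here.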

So there are advantages both for internal use and as a language
feature, and the implementation is easy. If no one shows a significant
drawback, it's a deal :-)

The only real problem I see is choosing the single character that
marks a symbol in the language. I suggested ^ or :, but * may work
as well if typeglobs go away.

Thanks,
        lupus

-- 
Paolo Molaro, Open Source Developer, Linuxcare, Inc.
+39.049.8043411 tel, +39.049.8043412 fax
[EMAIL PROTECTED], http://www.linuxcare.com/
Linuxcare. Support for the revolution.