"Brent Dax" <[EMAIL PROTECTED]> writes:

> Dan Sugalski:
> # Huh? No, you misunderstand. Each chunk of the bytecode has a separate 
> # TOC for stuff like this. The full identifier would be 
> # file/chunk/entry, which should be reasonably guaranteed to be unique. 
> # When the compiler's emitting code to reference a piece of binary data 
> # (which is essentially a big binary string constant, but I realize 
> # that having it in separate segments is terribly useful) it can turn 
> # any human-readable identifier into the internal identifier the engine 
> # needs to look up the actual data.
> 
>       DIRECTORY:
>               SEG 1 OFFSET: 324
>               SEG 2 OFFSET: 2496
>               SEG 3 OFFSET: 32482
>               ...
> 
>       SEG 1:
>               TYPE: Line Locations
>               LENGTH: 2070
>               DATA: 101011101001...
> 
> I was thinking in terms of what TYPE: stores; it seems you were thinking
> about how you identify a particular segment.  Yeah, you can probably get
> away with just numbering the segments, although that might slow things
> down a bit when you're looking for a particular type of segment.  (In
> foo.pbc, the line location segment might be 1, but in bar.pbc, it's
> 2.)

Thinking a little about it, the type field is the correct way. This would
allow different __DATA__ segments with the same type. The name of the
segment is a totally different concept.
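
To illustrate the distinction, here is a minimal Python sketch in which name and type are independent fields, so several segments can share one type. The Segment class, the segments_of_type helper and the segment names are all made up for illustration; nothing here is the actual packfile code.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical record: the name identifies one segment,
    # the type says what kind of data it holds.
    name: str
    type: str
    data: bytes


def segments_of_type(segments, seg_type):
    """Return every segment of the given type, in file order."""
    return [seg for seg in segments if seg.type == seg_type]


segments = [
    Segment("lines", "Line Locations", b""),
    Segment("config", "__DATA__", b"a=1"),
    Segment("template", "__DATA__", b"<html>"),
]
data_segments = segments_of_type(segments, "__DATA__")
```

Both __DATA__ segments are found by type, and each is still distinguishable by its name.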

> BTW, my father (a programmer too, although most of his work is with
> database-driven programs) suggested a solution that's half-way between
> string and number: hash the string and use the hash as the number.  With
> a good hashing function (say, MD5 with the four chunks XORed together)
> you'll probably be able to avoid collisions but still have unique
> identifiers.

The storage size is not really an issue. We use 32-bit opcodes
(sometimes even 64-bit), so we could store the name/type fields as
text in the file. The hash would be generated at load time (or the
first time a lookup_by_name is done). This could be sped up if we
dumped the hash to disk, but that would make the hash function part of
the Packfile definition.
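
A rough Python sketch of this idea, assuming the names are stored as text and the hash is the one suggested above (MD5 with its four 32-bit chunks XORed together). The names fold_md5 and SegmentNames are hypothetical, and the byte order used to split the digest is an arbitrary choice of mine.

```python
import hashlib
import struct


def fold_md5(name):
    """Fold the 128-bit MD5 digest of a name into 32 bits by XORing its four words."""
    digest = hashlib.md5(name.encode("utf-8")).digest()
    a, b, c, d = struct.unpack("<4I", digest)  # little-endian is an assumption
    return a ^ b ^ c ^ d


class SegmentNames:
    """Names as read from the file; the hash table is not stored on disk."""

    def __init__(self, names):
        self.names = list(names)
        self._index = None  # built on the first lookup_by_name

    def lookup_by_name(self, name):
        if self._index is None:
            self._index = {}
            for i, stored in enumerate(self.names):
                self._index.setdefault(fold_md5(stored), []).append(i)
        # Compare the real names too, so a hash collision cannot give a wrong hit.
        for i in self._index.get(fold_md5(name), []):
            if self.names[i] == name:
                return i
        return None
```

Since the table is rebuilt from the stored text, the hash function can change without changing the file format.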

As the bytecode itself knows at which index the segment lies, it won't
normally lookup_by_name but rather lookup_by_index. If the directory is
arranged cleverly, with the fixed-size fields first and the
variable-length names and types after them, this can be done without
reading the names.

    DIRECTORY:
        number_of_items
        SEG1:
            size
            offset
            flags
            varlen_pos
        SEG2:
            size
            offset
            flags
            varlen_pos
        ...
        SEG1_varlen:
            name
            type
        SEG2_varlen:
            name
            type
        ...
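
The layout above could be sketched in Python roughly like this. The field widths (32-bit little-endian), the length-prefixed strings, and the function names pack_directory, lookup_by_index and read_name_and_type are my assumptions for illustration, not a fixed format.

```python
import struct

HEADER = struct.Struct("<I")   # number_of_items
ENTRY = struct.Struct("<4I")   # size, offset, flags, varlen_pos


def _pstr(text):
    """Encode a string as a 32-bit length followed by its UTF-8 bytes."""
    raw = text.encode("utf-8")
    return struct.pack("<I", len(raw)) + raw


def pack_directory(segments):
    """segments: list of (size, offset, flags, name, type) tuples."""
    fixed_len = HEADER.size + ENTRY.size * len(segments)
    fixed = [HEADER.pack(len(segments))]
    varlen = b""
    for size, offset, flags, name, seg_type in segments:
        # varlen_pos is counted from the start of the directory.
        fixed.append(ENTRY.pack(size, offset, flags, fixed_len + len(varlen)))
        varlen += _pstr(name) + _pstr(seg_type)
    return b"".join(fixed) + varlen


def lookup_by_index(directory, index):
    """Read one fixed-size entry; the names are never touched."""
    (count,) = HEADER.unpack_from(directory, 0)
    if not 0 <= index < count:
        raise IndexError(index)
    return ENTRY.unpack_from(directory, HEADER.size + ENTRY.size * index)


def read_name_and_type(directory, varlen_pos):
    """Follow varlen_pos to the name and type, only when they are needed."""
    fields = []
    pos = varlen_pos
    for _ in range(2):
        (length,) = struct.unpack_from("<I", directory, pos)
        pos += 4
        fields.append(directory[pos:pos + length].decode("utf-8"))
        pos += length
    return tuple(fields)
```

Because every entry has the same size, lookup_by_index is a single seek to a computed position, and the names are only read when a lookup_by_name needs them.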

bye
b.
-- 
Juergen Boemmels                        [EMAIL PROTECTED]
Fachbereich Physik                      Tel: ++49-(0)631-205-2817
Universitaet Kaiserslautern             Fax: ++49-(0)631-205-3906
PGP Key fingerprint = 9F 56 54 3D 45 C1 32 6F  23 F6 C7 2F 85 93 DD 47
