On Fri, May 24, 2002 at 12:43:00AM +0200, Peter Gibbs wrote:

Reformatted slightly as "X-Mailer: Microsoft Outlook Express 5.50.4133.2400"
seems to like re-wrapping your hardwrapped lines.

> Reading this made me wonder if we should consider cached string
> transcodings, if we don't end up storing strings in a single form
> internally. The worst case is probably string constants, which could be
> transcoded over and over again into the same alternate encoding. As an
> extension of this, the hashing algorithm could be deemed to be another
> encoding i.e. the hashed value of a string could also be cached. Trying to
> decide when a cache entry was no longer needed could be a little bit tricky,
> but it might be worth giving some thought to.

I suspect that string constants could be quite important. Being able to
cache each transcoding as it is needed could speed things up. It would be
good to have all references to a constant still point to one place, so that
the transcoding only has to be done once and every reference benefits.
I don't know if threading screws this idea up (because you'd have to lock
the shared constant every time anything reads it, to stop anyone else making
a new transcoding appear just as you read it).

Nick I-S recoded parts of perl5 so that constants like the bar in $foo{bar}
would be stored as scalars which both point to entries in the shared string
table and contain their pre-computed hash value. He did see a speedup of a
few percent in his heavy OO program (in Tk, I suspect), but the infamous
perlbench can't be goaded into showing any sort of speedup or slowdown on
this or on various of my hash key experiments.

I believe having the scalars use a pointer to a shared hash key (rather than
a private malloc()ed buffer) was the bigger win, as the memcmp on the keys
becomes

    left == right || memcmp(left, right, length) == 0

and hopefully the pointer comparison hits often. However, pre-computed hash
values may have helped a bit in perl5.

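To make the pointer shortcut concrete, here is a minimal sketch in C. The
shared_key struct and the key_eq function are hypothetical names invented for
illustration; they are not perl5's or parrot's actual data structures. The
sketch assumes two keys taken from the same shared table carry the same
buffer pointer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical key record, for illustration only (not perl5's real layout). */
typedef struct {
    const char   *buf;   /* points into a shared string table when shared */
    size_t        len;   /* length of the key in bytes */
    unsigned long hash;  /* pre-computed hash value; not used by key_eq */
} shared_key;

/* Byte-wise equality with the pointer shortcut: if both keys point at the
 * same shared buffer, memcmp is never called. */
static int key_eq(const shared_key *left, const shared_key *right)
{
    if (left->len != right->len)
        return 0;
    return left->buf == right->buf
        || memcmp(left->buf, right->buf, left->len) == 0;
}
```

Two keys that share a buffer compare equal on the pointer test alone; a key
held in a private buffer with the same bytes still compares equal, but only
after the memcmp.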
There are probably several ideas to take out of my ramble:

1: If it can be arranged for parrot to have constants in some shared pool,
   and better still for things to be copy-on-write from it, then C<eq> can
   be accelerated by comparing pointers to things (if one isn't a
   substring).

2: If scalars are able to cache their hash value, then C<eq> for 2 scalars
   with cached hash values can be accelerated by first comparing the hash
   values, and rejecting if they differ. This is only going to work on good
   old fashioned binary comparisons, or if hash values are calculated by
   transcoding and normalising to some form that considers things equivalent
   in the same way that C<eq> should.

3: If it can be arranged for hash values to become cached in scalars (even
   if the transcoding of the string into whatever encoding the hash keys are
   stored in is no longer cached) then that provides a quick reject
   mechanism when looking to see if that string is in a hash: if the cached
   hash value doesn't match the hash value of any key in the target hash,
   then you know it's not in there, and you don't need to transcode the
   string.

I confess I don't have an understanding of how and when the innards of the
parrot string system do transcoding, or how C<eq> and hash keys are going to
deal with Unicode normal forms, so there may be some flaws in the above which
are obvious to anyone who does understand these things. I could also have
stupid mistakes in my reasoning.

Nicholas Clark
-- 
Even better than the real thing: http://nms-cgi.sourceforge.net/