Steve Fink wrote:
<lots of interesting stuff about hashes etc.>

Reading this made me wonder whether we should consider caching string transcodings, if we don't end up storing strings in a single form internally. The worst case is probably string constants, which could be transcoded over and over again into the same alternate encoding. As an extension of this, the hashing algorithm could be treated as just another encoding, i.e. the hashed value of a string could also be cached. Deciding when a cache entry is no longer needed could be a little tricky, but it might be worth giving some thought to.
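
To make the caching idea concrete, here is a rough sketch in C. It is not based on the actual string structures: the CachedString type, its fields and the cs_* functions are all hypothetical, Latin-1 to UTF-8 stands in for "some alternate encoding", and FNV-1a stands in for whatever hash function is really used. The run counters exist only to show that the work is done once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical string header; NOT the real internal layout. */
typedef struct CachedString {
    const unsigned char *data;   /* bytes in the native encoding (Latin-1 here) */
    size_t               len;
    unsigned char       *utf8;   /* cached transcoded copy, NULL if none */
    size_t               utf8_len;
    int                  hash_valid;
    uint32_t             hash;   /* cached hash, meaningful if hash_valid */
    unsigned             transcode_runs;  /* demo counters only */
    unsigned             hash_runs;
} CachedString;

void cs_init(CachedString *s, const unsigned char *data, size_t len) {
    assert(s != NULL);
    s->data = data;
    s->len = len;
    s->utf8 = NULL;
    s->utf8_len = 0;
    s->hash_valid = 0;
    s->hash = 0;
    s->transcode_runs = 0;
    s->hash_runs = 0;
}

/* Return the UTF-8 form, transcoding only on the first request.
 * Returns NULL if memory for the cached copy cannot be allocated. */
const unsigned char *cs_as_utf8(CachedString *s, size_t *out_len) {
    assert(s != NULL && out_len != NULL);
    if (s->utf8 == NULL) {
        /* Worst case: every Latin-1 byte expands to two UTF-8 bytes. */
        unsigned char *buf = malloc(s->len * 2 + 1);
        size_t i, n = 0;
        if (buf == NULL)
            return NULL;
        for (i = 0; i < s->len; i++) {
            unsigned char c = s->data[i];
            if (c < 0x80) {
                buf[n++] = c;
            } else {
                buf[n++] = (unsigned char)(0xC0 | (c >> 6));
                buf[n++] = (unsigned char)(0x80 | (c & 0x3F));
            }
        }
        s->utf8 = buf;
        s->utf8_len = n;
        s->transcode_runs++;
    }
    *out_len = s->utf8_len;
    return s->utf8;
}

/* Treat the hash as "just another encoding": compute once, then reuse. */
uint32_t cs_hash(CachedString *s) {
    assert(s != NULL);
    if (!s->hash_valid) {
        uint32_t h = 2166136261u;          /* FNV-1a offset basis */
        size_t i;
        for (i = 0; i < s->len; i++) {
            h ^= s->data[i];
            h *= 16777619u;                /* FNV-1a prime */
        }
        s->hash = h;
        s->hash_valid = 1;
        s->hash_runs++;
    }
    return s->hash;
}

/* Must be called whenever the string's contents change. */
void cs_invalidate(CachedString *s) {
    assert(s != NULL);
    free(s->utf8);
    s->utf8 = NULL;
    s->utf8_len = 0;
    s->hash_valid = 0;
}
```

Invalidating on modification is the easy half of the problem. The sketch does not address the harder question of when an unmodified string's cached copy should be dropped to reclaim memory.
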
Interestingly, the current implementation of string_compare first transcodes if required, then proceeds to do a codepoint-by-codepoint comparison loop anyway. If a new vtable entry were added to extract and transcode a single codepoint, i.e. extract_unicode or some such, then the up-front transcodings could be removed. This would be best handled using iterators, as the current two vtable calls (decode, then advance) would then become one. Also, if the encodings and chartypes match, a straight memory compare could be used; this is what we would want for opaque data anyway.

As far as the GC goes, it looks like what you really need for the hash structures is immobile memory. This can almost be done at the moment by resorting to system-level allocation (checking the code, I see that BUFFER_sysmem_FLAG memory will not be correctly released at present; this will be fixed). Alternatively, the ability to lock a specific buffer for a short period might be useful; this is somewhat difficult to implement with a copy-collection system, but not impossible.

In the meantime, we can introduce functions in resources.c to handle the blocking/unblocking of collections instead of direct manipulation. An attempt to collect while blocked could then set a flag, and the unblocking would trigger the deferred collection; this should reduce the impact of blocking.

--
Peter Gibbs
EmKel Systems