Steve Fink wrote:
<lots of interesting stuff about hashes etc.>

Reading this made me wonder if we should consider cached string
transcodings, if we don't end up storing strings in a single form
internally. The worst case is probably string constants, which could be
transcoded over and over again into the same alternate encoding. As an
extension of this, the hashing algorithm could be deemed to be another
encoding, i.e. the hashed value of a string could also be cached. Trying to
decide when a cache entry was no longer needed could be a little bit tricky,
but it might be worth giving some thought to.
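
To make the hash-caching idea concrete, here is a minimal sketch in C. It
is only an illustration: cached_str, cached_str_set, cached_str_hash and the
FNV-1a hash are all names and choices assumed for the example, not the
existing string structure or hashing code. The cache is invalidated whenever
the string contents are replaced, which is the simplest possible answer to
the "when is the entry no longer needed" question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical string header, invented for this sketch. */
typedef struct {
    const char *data;
    size_t      len;
    uint32_t    hash;        /* cached hash value */
    int         hash_valid;  /* nonzero once hash has been computed */
} cached_str;

/* Counts real hash computations, so the caching effect is observable. */
static unsigned hash_computations = 0;

/* 32-bit FNV-1a, chosen only because it is short; any hash would do. */
static uint32_t compute_hash(const char *p, size_t len)
{
    uint32_t h = 2166136261u;
    size_t i;
    hash_computations++;
    for (i = 0; i < len; i++) {
        h ^= (unsigned char)p[i];
        h *= 16777619u;
    }
    return h;
}

/* Replacing the contents invalidates the cached hash. */
void cached_str_set(cached_str *s, const char *data, size_t len)
{
    assert(s != NULL);
    s->data = data;
    s->len = len;
    s->hash_valid = 0;
}

/* Return the hash, computing it only on first use after a change. */
uint32_t cached_str_hash(cached_str *s)
{
    assert(s != NULL);
    if (!s->hash_valid) {
        s->hash = compute_hash(s->data, s->len);
        s->hash_valid = 1;
    }
    return s->hash;
}
```

A cached transcoding could follow the same pattern, with a pointer to the
alternate-encoding buffer in place of the hash field, though that raises the
memory-lifetime question that a plain integer avoids.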

Interestingly, the current implementation of string_compare first transcodes
if required, then proceeds to do a codepoint-by-codepoint comparison loop
anyway. If a new vtable entry were added to extract and transcode a single
codepoint, i.e. extract_unicode or some such, then the up-front transcoding
could be removed. This would be best handled using iterators, as the current
two vtable calls (decode, then advance) would then become one. Also, if the
encodings and chartypes match, a straight memory compare could be used -
this is what we would want for opaque data anyway.
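
A rough sketch of what such a comparison might look like follows. Everything
in it is assumed for illustration: the encoding and str structures, the
two toy encodings, and the signature of extract_unicode (which here decodes
one codepoint and advances the position in a single call). It is not the
current string_compare. The memory-compare fast path only gives the same
ordering as the codepoint loop when byte order matches codepoint order, as it
does for the two encodings used here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical encoding vtable with a combined decode-and-advance entry. */
typedef struct encoding {
    uint32_t (*extract_unicode)(const unsigned char *buf, size_t *pos);
} encoding;

/* Hypothetical string: encoding, raw bytes, length in bytes. */
typedef struct {
    const encoding      *enc;
    const unsigned char *buf;
    size_t               len;
} str;

/* Latin-1: one byte per codepoint. */
static uint32_t latin1_extract(const unsigned char *buf, size_t *pos)
{
    return buf[(*pos)++];
}

/* UTF-16 big-endian, BMP only (no surrogates), even byte length assumed. */
static uint32_t utf16be_extract(const unsigned char *buf, size_t *pos)
{
    uint32_t cp = ((uint32_t)buf[*pos] << 8) | buf[*pos + 1];
    *pos += 2;
    return cp;
}

static const encoding latin1  = { latin1_extract };
static const encoding utf16be = { utf16be_extract };

/* Returns -1, 0 or 1. No intermediate transcoded copy is made. */
int str_compare(const str *a, const str *b)
{
    size_t i = 0, j = 0;
    assert(a != NULL && b != NULL);

    if (a->enc == b->enc) {
        /* Same representation: compare the raw bytes directly. */
        size_t n = a->len < b->len ? a->len : b->len;
        int r = memcmp(a->buf, b->buf, n);
        if (r != 0)
            return r < 0 ? -1 : 1;
        return (a->len > b->len) - (a->len < b->len);
    }

    /* Mixed encodings: one vtable call per codepoint per string. */
    while (i < a->len && j < b->len) {
        uint32_t ca = a->enc->extract_unicode(a->buf, &i);
        uint32_t cb = b->enc->extract_unicode(b->buf, &j);
        if (ca != cb)
            return ca < cb ? -1 : 1;
    }
    /* Whichever string still has data left is the greater one. */
    return (i < a->len) - (j < b->len);
}
```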

As far as the GC goes, it looks like what you really need for the hash
structures is immobile memory. This can almost be done at the moment by
resorting to using system-level allocation (checking the code I see that
BUFFER_sysmem_FLAG memory will not be correctly released at present; this
will be fixed). Alternatively, the ability to lock a specific buffer for a
short period might be useful; this is somewhat difficult to implement with a
copy-collection system, but not impossible.
In the meantime, we can introduce functions in resources.c to handle the
blocking/unblocking of collections instead of direct manipulation; then an
attempt to collect while blocked could set a flag, and the unblocking would
trigger the blocked collection - this should reduce the impact of blocking.

--
Peter Gibbs
EmKel Systems
