At 1:51 PM +0200 4/28/02, Peter Gibbs wrote:
>> The data which needs to be stored along with the buffer data can be
>> stored as either a header or a footer. The size of this header needs to be
>> a multiple of 16 (or possibly even 8) bytes, so that the real buffer
>> which follows would be correctly aligned. I'm not sure if this applies for
>> a footer.
>
>> A footer might allow us to tack 5-8 bytes on to the end of an
>> allocation, which might not always go 'over' the 16 byte rounding-up limit
>> we currently have in place. As long as we're not just-below the 16 byte
>> barrier on most of our allocations, this shouldn't waste any more memory.
>
>My current code (see
>http://www.mail-archive.com/perl6-internals@perl.org/msg09196.html) requires
>one flag byte within the buffer itself, provided the buffer is always
>guaranteed to be large enough to hold a pointer, which the current 16-byte
>allocation scheme ensures. To implement substrings requires an additional
>pointer (or offset) in the string header.
>I was assuming that we only want to implement COW for strings, as non-string
>buffers are generally speaking not under our control.

Fair enough. A minimum allocation of one pointer plus a byte isn't
unreasonable.

>I still think that the cheapest implementation for the flag byte is a
>footer, as we don't need to worry about alignment. This is actually a
>zero-cost option as far as memory allocation is concerned, as the current
>allocation scheme always allocates at least one extra byte.
>
>Variable-sized string headers are not really an option; if the overhead of
>an additional pointer is a problem, my first inclination would be to combine
>the chartype and encoding pointers into a single vtable entry.

It wasn't the extra pointer so much as the guaranteed extra 16 bytes per
allocation if we went with a header scheme. (Since we guarantee 16-byte
alignment at the moment, we have to parcel out memory in 16 byte chunks.)

However...

At this point I think we should do COW, and I think I know how to do it
cheaply. What we need is separate allocation routines for String and
Buffer data. String data doesn't need to be aligned, so we don't have to
bother with 16 byte chunks. Going with 2 or 4 byte chunks should be
sufficient for Strings. I'd been conflating string and non-string general
memory and, while that made for a smaller API, it is also rather wasteful
for string data, of which we'll have a lot.

As part of that, I think we could also do with an overhaul of the
allocation system to better handle constants as well. We yank constant
data all over the place for no good reason. Constants are immortal, so if
they're in their own pool there's no reason at all to collect the things.
(Well, not often--if we support module removal at runtime we can
potentially have constants go away, but that shouldn't be at all common.)

So, let's do this:

1) We'll add allocate_string and reallocate_string functions, which the
   strings use. It'll give us COW space at the end of the string data.
2) We'll add in new_*_const_header to match the new_*_header functions, to
   allocate String/Buffer/PMC headers from constant header arenas rather
   than from the default arenas.
3) We'll add in (re)?allocate_const functions to allocate memory from
   constant pools rather than the default collectable pools.
4) We'll add in COW functionality for strings and see what sort of win we
   get. (I'm not sure that we'll win much with general COW, since my gut
   feeling is that most COW strings will have a constant string as their
   source, but I could be wrong here. That'd be OK)

This should decrease the amount of data we copy on GC runs, the number of
headers we trace for DOD runs, and generally tighten up our memory usage.

You up for implementing this, Peter?
-- 
                                        Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                     have teddy bears and even
                                      teddy bears get drunk
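
A rough sketch in C of the size arithmetic behind the proposal above, for illustration only. This is not Parrot's actual allocator: the names round_up, buffer_alloc_size and string_alloc_size and the three constants are hypothetical, the 4-byte string chunk is just one of the two sizes suggested in the message, and the one-byte footer is assumed from Peter's flag byte. The general-buffer case is simplified to plain 16-byte rounding.

```c
#include <stddef.h>

#define BUFFER_ALIGN 16  /* assumed: chunk size for general buffers */
#define STRING_ALIGN 4   /* assumed: "2 or 4 byte chunks" for strings */
#define COW_FOOTER   1   /* assumed: one flag byte after the string data */

/* Round n up to the next multiple of align (align must be a power of two). */
size_t round_up(size_t n, size_t align)
{
    return (n + align - 1) & ~(align - 1);
}

/* Bytes taken from the pool for a general buffer of len bytes. */
size_t buffer_alloc_size(size_t len)
{
    return round_up(len, BUFFER_ALIGN);
}

/* Bytes taken from the pool for len bytes of string data plus the COW
 * footer. The data area is never smaller than one pointer, so the total
 * is at least "one pointer plus a byte". */
size_t string_alloc_size(size_t len)
{
    if (len < sizeof(void *))
        len = sizeof(void *);
    return round_up(len + COW_FOOTER, STRING_ALIGN);
}
```

Under these assumptions a 17-byte string takes 20 bytes from a string pool, where 16-byte rounding would take 32.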