Mike Lambert wrote:
> Rather impressive. Except that it makes me look bad. :)
You have to follow orders - I don't any more! As I said before, I am now
doing this for fun - and when running on a slow machine, speed becomes very
important. I have been very pleased to see that a lot of my code has made it
into the official codebase, and that you have been extending the concepts.

> Your solution, (ignoring the extra cycle counter byte for now), cannot
> handle vtable methods implemented in Parrot. The current system to

I agree that the cycle counter approach is not going to handle all
possibilities - however, it was originally dismissed because:

<< start quote from Dan
Unfortunately I don't. I'm willing to live with a performance degradation in
those spots where it's truly necessary, but only there. The performance hit
has to be localized to those places where infant mortality is actually a
problem. Those spots are currently reasonably rare, and I'm not sure that
the hoops we're building to jump through (and I'm as guilty as anyone) are
worth it.
>> end quote

I'm sure you'll agree that the stack-walking code suffers from the same
problem - we take a performance hit for every DOD run. As Nick suggested,
perhaps we need multiple methodologies. The cycle counter approach really
only has one merit - it is transparent - and that was why I proposed it
originally, because we had multiple developers producing code that had
neonate bugs.

> 2) Currently, we use a linked list of buffer headers for freeing and
> allocating headers. I'm not sure what you mean by saying that they are
> sorted by bufstart? What does this buy over the current system?

I am now keeping a linked list of *allocated* buffer headers, headed up by
the Memory_Block structure, and linked in ascending order by their bufstart
value (this is how they are allocated anyway). This is primarily used by
COW-3 - when a shared header is created, it is inserted into the linked list
in the appropriate place. Then compact_pool runs through the Memory_Block
list, and finds each header in sequence.
If the header is dead (this is one of the things I have not re-implemented
yet) then it is freed and no copy takes place; if the header overlaps the
previous one (i.e. COW) then the header is re-addressed without a copy. This
fixes the problem of a large buffer being kept around only because a small
substring is still alive. Unfortunately, it is very expensive in terms of
memory usage, as it uses a doubly-linked list and a reference back to the
pool (although the latter could probably be eliminated).

> 4) Isn't this really the same thing as item 3? I'm basing my knowledge off
> your old COW patches. Has additional work been done on the string function
> integration since then, or do #3 and #4 both come from those patches?

The logic described above is what I meant by #3, i.e. the GC code is
equipped to handle COW of any buffer type, not only strings. #4 meant that
only for strings is any actual COW functionality implemented in the next
layer.

> I'm not sure how much of the new code you've merged with. Which of the new
> files are you planning to integrate/merge with, and which have you thrown
> out in favor of older versions? I'm specifically referring to any of
> resources/dod/smallobject/headers.c.

I have basically cut-and-pasted my changes into your new files, because
otherwise I was going to be arrested for cruelty to CVS. My last diff
against current CVS produced a 24K patch file - I don't believe that diff is
quite accurate any more, but I have also installed Steve's patch to test
with GC_DEBUG, so I will have to do a bit of work to sort it out. Once I
have a working patch, I will email it to you directly, in case you can find
anything useful in it.

For interest, the other changes in my old version that I am still working on
integrating are:

1) Allocate memory pool in fixed size chunks, even during compaction.
In other words, instead of allocating a new block of memory equal to the
current size used, allocate a chunk (of say 16K or whatever), then allocate
another chunk when that one is full. During compaction, once a chunk has
been emptied, it is returned to a free chunk list, and can then be
reallocated. This reduces the peak memory consumption, and allows the
compaction process to be suspended between chunks. I believe it would also
make generational collection easier, though I haven't tried that yet. The
downside is obviously wasted space at the end of each chunk; but there are
ways to minimise that. Chunk sizes are increased dynamically as memory usage
grows, and to accommodate large allocation requests.

2) Postpone freeing of buffers.

At present, the DOD run first marks everything it can find, then frees what
it couldn't. By updating the cycle counter in the buffer header when a
buffer is marked live, the free_unused_buffers runs can be delayed until we
actually need free buffers in a specific resource (sorry, small object)
pool. In conjunction with the code I referred to above for freeing buffer
headers during a compaction run, this can reduce the frequency of running
free_unused_buffers. This requires that buffer_lives be passed the
interpreter, so it can access the cycle counter.

The other changes I made were to the string subsystem, and I probably won't
do them again because they change too many files:

a) replace chartype and encoding by a single charset entry

This removes one pointer field from the string header, and simplifies all
the string functions, as they are forever checking both fields to determine
if transcoding is required. I do not believe that the two fields are
orthogonal, and therefore the number of charsets would be less than
#chartypes * #encodings.

b) some alterations to the single vtable thus created, in particular the
addition of a find_substring method.
--
Peter Gibbs
EmKel Systems

----- Original Message -----
To: "Peter Gibbs" <[EMAIL PROTECTED]>
Cc: "perl6-internals" <[EMAIL PROTECTED]>
Sent: 16 August 2002 07:36
Subject: Re: [INFO] The first pirate parrot takes to the air

> > For purely academic purposes, I have re-synchronised some of my
> > forbidden code with the latest CVS version of Parrot. All tests pass
> > without gc debug; and with gc_debug plus Steve Fink's patch.
> > Benchmarks follow, on a 166MHz Pentium running linux 2.2.18.
> >
> >                                  Parrot        African Grey
> > life (5000 generations)         172 seconds    81 seconds
> > reverse <core_ops.c >/dev/null  193 seconds   130 seconds
> > hanoi 14 >/dev/null              51 seconds    37 seconds
> >
> > The differences between the two versions are:
> > 1) Use of the interpreter cycle-counter instead of stack walking.
> > 2) Linked lists of buffer headers sorted by bufstart
> > 3) COW-supporting code in GC (for all buffer objects)
> > 4) Implementation of COW for string_copy and string_substr
>
> 1) Yeah, the approach of cycle-counter is a nice one. I also had a similar
> solution involving neonate flag usage, somewhere in the archives. Both
> have *significant* speed advantages versus the current codebase's stack
> walking.
>
> I tried to convince Dan of the merit, but they failed for various reasons:
>
> implement this involves the interpreter recursively calling runops_core to
> handle the vtable method. If you increment cycles on the inner loop, you
> risk pre-collection of stuff on the stack of the vtable method calling
> stuff. If you don't increment cycles, you prevent any of the memory
> allocated inside of this vtable method from ever being collected during
> the method execution...bad stuff when your vtable methods are multiplying
> gigantic matrices or somesuch.
>
> My neonate buffers solution fails only in the presence of longjmp.
>
> Granted, we don't do any of this yet, so these solutions will mop the
> floor with my current stackwalk code, and pass tests to boot.
> But it's the planned introduction of these requirements which is the
> reason for making these solutions 'forbidden'.
>
> One of Nick's solutions was to fall back on stack-walking to handle the
> cases where our faster solutions fail. I can definitely see that working
> with neonate buffers to handle the fixups needed after a longjmp call. But
> it doesn't seem as attractive in the presence of your solution, for which
> it would require stackwalking for all re-entrant runops calls. Do you have
> another solution in mind for handling re-entrant runops calls?
>
> As far as the extra byte in the buffer, I don't mind that one at all.
> There are a lot of restrictions on the GC code in the interest of making
> stuff "lightweight". Unfortunately, GC code takes a significant portion of
> the execution time in any realistic application. Hopefully we can convince
> Dan to allow extra fields in the buffers in the interest of speed, but I
> don't think we can reduce parrot/perl6's feature set in the interest of
> speed...might as well use C if that's what you want. :)
>
> 3) Definitely a good one. I've been trying to merge your original COW
> patch into my code here. Without GC_DEBUG, it fails one test. With
> GC_DEBUG, it fails the traditional set plus that one test. The test case
> is rather large unfortunately, so I haven't been able to narrow down the
> problem further or I'd have committed it.
>
> > Some of the changes I made before the memory management
> > code was totally reorganised have not yet been re-integrated.
> > My last version prior to that reorganisation ran 5000 lives in
> > 61 seconds, and I hope to get back to somewhere close to
> > that again.
>
> Regardless of whether or not it goes in, I'd be interested in seeing a
> patch. I can work on integrating a lot of your non-forbidden code into the
> current codebase.
> Thanks for spending the time to generate these numbers...they're a nice
> eye-opener on what can be done without the current restrictions. Hopefully
> they'll allow us to reconsider each restriction in the context of the
> speed of our GC.
>
> Mike Lambert