Re: GC design
On Mon, May 27, 2002 at 08:41:59AM -0700, Sean O'Rourke wrote:
> But there are two kinds of "available" here: available without asking the
> operating system for more; and available period. If we're in the first
> situation, it seems reasonable to just ask the OS for a new block and keep
> going, noting that collecting soon would be a Good Thing. Our memory
> requirements have probably increased, so we'd be asking for more soon
> anyways, and if they haven't, we'll reclaim a lot of memory on the next
> collection, and only be "wasting" 8 or 16k. We can even try to give some
> back, though the OS probably won't take it. If we're in the second
> situation, then either (1) we've gone far too long without collecting, or
> (2) we're in serious trouble anyways. And even in this situation, we can
> wriggle out of it most of the time by keeping some amount of "emergency"
> memory stashed away.

I don't think (1) is true. For performance reasons, we should let quite a lot of garbage accumulate before triggering a collection (something like 40% of space overhead). So, for instance, if the heap is full and we need to allocate a huge string (10 Mbytes), it is quite likely that performing a collection will free enough memory. And I'm not sure we are ready to always keep 10 Mbytes of "emergency" free memory to be able to handle this case without collection.

[...]

> By the way, neither of string_grow()'s callers
> checks its return value now, which indicates to me that this may be
> error-prone.

Also, it seems that the return value of alloc_new_block is never checked, even when really huge blocks are allocated, as in compact_buffer_pool...

-- Jerome
Re: [netlabs #629] [PATCH] Memory manager/garbage collector - major revision
Okay. I have yet another idea for solving our infant mortality problem, which I think Dan might like. :)

The neonate idea originally was intended to be set on *all* headers returned by the memory system, and they'd be reset by a clear_neonate op. At least, that's how I understood it. A straightforward implementation of the above is about 50% slower than it was before, so I think that rules this option out.

The current code (without this patch) adds neonate wherever it discovers that it is needed, and turns it off when it is done. This was quite efficient, but required the user to constantly think about what functions could cause GC, etc. It was rather error-prone.

If I understood Dan correctly on IRC yesterday, he was proposing that our current approach of handling infant mortality everywhere it can occur is the 'correct' approach. It definitely buys us speed, but as mentioned above, it's somewhat error-prone.

The below is an attempt to convince Dan that in lieu of hardcore GC-everywhere programming, there is a middle ground. I believe we need a middle ground because forcing users to learn the quirks of our GC system makes parrot programming less fun, and raises Parrot's barrier to entry.

As I was working on my revised GC system, I came up with a relaxation of the above that should be easier on programmers, and yet still be fast. It's not revolutionary by any means; it just grabs bits and pieces of different people's solutions.

When you call new_*_header, the neonate flag is automatically turned on for you. As a programmer writing a function, you explicitly turn off the neonate flag when you attach the header to the root set, or let it die on the stack. If you return it, you don't do anything, as it becomes the caller's job to handle.

Neonate guarantees that it won't be collected, avoiding infant mortality. The programmer does not have to explicitly turn it on. Just turn it off.

From a cursory glance over string.c, only string_concat and string_compare create strings which die within the scope of that function, and thus need to be modified.

This approach would complicate many of our string .ops, however. Stuff like "$1 = s" needs to turn off the neonate flag. Perhaps we can encode logic into the ops2c converter to turn off the neonate flag for things that it can detect, or perhaps we can require the user to do it because automated converters are guaranteed to fail.

Core.ops requires a lot of such modifications, however. Things like err, open, readline, print, read, write, clone, set, set_keyed, the various string ops (substr, pack, etc), and savec, all require modification.

I think these guidelines make it easy for non-GC-programmers to write GC-safe code, since they do not need to be aware of what allocates memory, and what does not.

What do people think of this approach?

Mike Lambert
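A minimal sketch of the neonate scheme described above, assuming a toy fixed-size header pool. The names Header, FLAG_NEONATE, FLAG_LIVE, new_header and would_be_collected are hypothetical stand-ins for illustration, not Parrot's actual API; only clear_neonate is named in the message, and its signature here is also an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits; Parrot's real header flags may differ. */
#define FLAG_NEONATE 0x1u  /* freshly allocated, not yet anchored anywhere */
#define FLAG_LIVE    0x2u  /* set by the mark phase when reachable from a root */

typedef struct Header {
    unsigned flags;
} Header;

#define POOL_SIZE 8
static Header pool[POOL_SIZE];
static size_t pool_used = 0;

/* Stand-in for new_*_header: the neonate flag is turned on automatically. */
Header *new_header(void) {
    if (pool_used == POOL_SIZE) return NULL;
    Header *h = &pool[pool_used++];
    h->flags = FLAG_NEONATE;
    return h;
}

/* Called once the header is attached to the root set or allowed to die. */
void clear_neonate(Header *h) {
    assert(h != NULL);
    h->flags &= ~FLAG_NEONATE;
}

/* What a sweep would decide: neonates survive even when unmarked. */
int would_be_collected(const Header *h) {
    return (h->flags & (FLAG_NEONATE | FLAG_LIVE)) == 0;
}
```

In this sketch a header returned by new_header() is safe from collection until clear_neonate() is called on it; after that it survives only if the mark phase sets FLAG_LIVE.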
Re: [netlabs #629] [PATCH] Memory manager/garbage collector - major revision
On Mon, May 27, 2002 at 04:33:07PM -, Peter Gibbs wrote:
> These changes do cause a slight performance degradation, but I believe it is
> worth it for the overall simplification of transparent protection of the
> newborn.
> Performance can only be a secondary goal, after correct behaviour.

What level of performance are we aiming at? It seems to me that memory allocation is already quite slow. So, I find it worrying that this change makes it noticeably slower.

Another point to consider is that if we start to write code that assumes that newborns are not garbage collected, then it will be hard to fix this code later if this turns out to be too costly.

-- Jerome
Re: [netlabs #629] [PATCH] Memory manager/garbage collector - major revision
On Tue, May 28, 2002 at 04:45:06AM -0400, Mike Lambert wrote:
> When you call new_*_header, the neonate
> flag is automatically turned on for you. As a programmer writing a
> function, you explicitly turn off the neonate flag when you attach it to
> the root set, or let it die on the stack. If you return it, you don't do
> anything, as it becomes the caller's job to handle.

Suppose your C code builds a nested data structure. For instance, it creates some strings and adds them to a hash-table. The hash-table is then returned. Should it clear the neonate flag of the strings?

-- Jerome
Re: [netlabs #629] [PATCH] Memory manager/garbage collector - major revision
I propose the following alternate guidelines.

First, the code would look something like this:

    STRING * concat (STRING* a, STRING* b, STRING* c) {
      PARROT_start();
      PARROT_str_params_3(a, b, c);
      PARROT_str_local_2(d, e);

      d = string_concat(a, b);
      e = string_concat(d, c);

      PARROT_return(e);
    }

Then, the rules would be:
  (1) start your functions with PARROT_start
  (2) register all parameters of type STRING * with PARROT_str_params
  (2') register all parameters of type PMC * with PARROT_pmc_params
  (3) declare the local variables of type STRING * with PARROT_str_local
  (3') declare the local variables of type PMC * with PARROT_pmc_local
  (4) use PARROT_return to exit the function
  (5) do not nest function calls
      (for instance, "e = string_concat (string_concat(a, b), c);"
      would be forbidden)

The idea is to explicitly manage a stack of parrot objects, which can be traversed by the GC.

These rules leave a lot of freedom to the garbage collector:
  - it can garbage collect anything which is not rooted;
  - it can move objects (strings and PMCs) around;
  - objects do not need any additional field/flag;
  - exceptions can be implemented using longjmp; if an exception is
    raised, temporarily allocated objects will be properly freed by the GC.

Do you think that these rules would be too error-prone, or too cumbersome?

-- Jerome
RE: [netlabs #629] [PATCH] Memory manager/garbage collector - major revision
Jerome Vouillon:
# I propose the following alternate guidelines.
#
# First, the code would look something like this:
#
#   STRING * concat (STRING* a, STRING* b, STRING* c) {
#     PARROT_start();
#     PARROT_str_params_3(a, b, c);
#     PARROT_str_local_2(d, e);
#
#     d = string_concat(a, b);
#     e = string_concat(d, c);
#
#     PARROT_return(e);
#   }
#
# Then, the rules would be:
#   (1) start your functions with PARROT_start
#   (2) register all parameters of type STRING * with PARROT_str_params
#   (2') register all parameters of type PMC * with PARROT_pmc_params
#   (3) declare the local variables of type STRING * with
#       PARROT_str_local
#   (3') declare the local variables of type PMC * with PARROT_pmc_local

I assume the lack of mention of Buffers is an oversight.

#   (4) use PARROT_return to exit the function
#   (5) do not nest function calls
#       (for instance, "e = string_concat (string_concat(a, b), c);"
#       would be forbidden)

I don't understand the reasoning behind (5). Would you care to elaborate?

# The idea is to explicitly manage a stack of parrot objects,
# which can be traversed by the GC.
#
# This rules let a lot of freedom to the garbage collector:
#   - it can garbage collect anything which is not rooted;
#   - it can move objects (strings and PMCs) around;
#   - objects do not need any additional field/flag;
#   - exception can be implemented using longjmp; if an exception is
#     raised, temporary allocated objects will be properly freed
#     by the GC.

Anything that lets us use longjmp is fine by me. ;^)

# Do you think that these rules would be too error-prone, or
# too cumbersome?

May I suggest an alternate version?

    STRING * concat (STRING* a, STRING* b, STRING* c) {
      Enter_sub();   /* The macros don't really need a Parrot_ prefix, do they? */
      Str_params_3(a, b, c);
      Decl_str(d);   /* These aren't needed anymore, actually... */
      Decl_str(e);

      Return( string_concat( string_concat(a, b), c ) );
    }

--Brent Dax <[EMAIL PROTECTED]>
@roles=map {"Parrot $_"} qw(embedding regexen Configure)

blink: Text blinks (alternates between visible and invisible). Conforming user agents are not required to support this value.
    --The W3C CSS-2 Specification
Re: [netlabs #629] [PATCH] Memory manager/garbage collector - major revision
> STRING * concat (STRING* a, STRING* b, STRING* c) {
>   PARROT_start();
>   PARROT_str_params_3(a, b, c);
>   PARROT_str_local_2(d, e);
>
>   d = string_concat(a, b);
>   e = string_concat(d, c);
>
>   PARROT_return(e);
> }

Yet more ideas. Woohoo! :)

I considered this kind of approach myself, but discarded it due to the ton of extraneous code you have to write to do the simplest of things. :(

I'm not sure if the other people have considered it, discarded it, or are still considering it. As far as the pros/cons...

First, it requires you to write in a pseudo-language to define your local PMC headers and how to return data. I'm sure the macro freaks that have been scarred by perl5 will jump on here and beat you down in a few hours or so. :)

Can you provide an implementation of the macros you described above? I have a few concerns, and I'm not sure whether they are addressed. For example:

PARROT_str_local(d)
I'm assuming it puts a reference to d onto the rooted stack. It would also need to initialize d to NULL to avoid pointing at garbage buffers.

PARROT_str_params_3(a, b, c);
What's the point of this? With rule 5 that prevents function call nesting, you're guaranteed that all your arguments are rooted. I think you can lose either the nesting requirement or the str_params requirement.

PARROT_return(e);
I'm assuming this backs the stack up to the place pointed to by PARROT_start(), right? This means during a longjmp, the stack won't be backed up properly until another PARROT_return() is called, somewhere farther up the chain, right?

Finally, I think Dan has already outlawed longjmp due to problems with threading, but he'll have to elaborate on that. I agree my most recently stated approach is not longjmp safe since it could leave neonate set on certain buffers/pmcs.

Finally, in response to my original post, you asked:
> Suppose your C code builds a nested datastructure. For instance,
> it creates some strings and add them to a hash-table. The hash-table is
> then returned. Should it clear the neonate flag of the strings?

I think I'd have to say...don't do that. Ops and functions shouldn't be building large data structures, imo. Stuff like building large hashes and/or arrays of data should be done in opcode, in perl code, or whatever language is operating on parrot.

If you *really* need to operate on a nested data structure, and you're going to hold it against my proposal, then there are two options.

a) write code like:
  base = newbasepmc      #neonate pmc
  other = newchildpmc    #also neonate
  base->add(other)       #even if collecting/dod'ing, can't collect above two
  done_with_pmc(other)   #un-neonates it, since it's attached to a root (neonate) set
  repeat...

It works, and then you just need to worry about what to do with your 'base' at the end of the function (to un-neonate it or not).

b) make a done_with_children_of_pmc() style function. It piggybacks on the tracing functionality inherent in the DOD code, and searches for a contiguous selection of neonate buffers and pointers emanating from the pmc we pass in, and un-neonates them, leaving the passed-in pmc neonated. Since everything we do in the function is neonate, everything we construct into this base pmc should be contiguously neonate, if that makes sense. Granted, it's a little bit expensive to do the tracing, but you shouldn't need to trace too deep at all, and its time is proportional to the size of the nested data structure you are creating.

Does that help?

Mike Lambert

PS: Oh, and I forgot to mention in my previous proposal the need for neonating pmc headers, and to look into what functions need to un-neonate pmc headers. That should be localized to the vtable methods, which are sort of a mess right now anyway with the transmogrification of vtables and have other GC problems.
Re: [netlabs #629] [PATCH] Memory manager/garbage collector - major revision
On Tue, May 28, 2002 at 04:57:01AM -0700, Brent Dax wrote:
> I assume the lack of mentions of Buffers are an oversight.

Right. It would be great if there were only one kind of parrot object...

> #   (5) do not nest function calls
> #       (for instance, "e = string_concat (string_concat(a, b), c);"
> #       would be forbidden)
>
> I don't understand the reasoning behind (5). Would you care to
> elaborate?

Actually, the example I give is safe. But consider:

    e = string_concat (string_concat(a, b), string_concat(c,d));

Let us assume that the arguments of the functions are evaluated rightmost first. Then, the return value of "string_concat(c,d)" is not "rooted" anywhere when "string_concat(a, b)" is executed. So, it will be wrongly freed by the GC if a collection occurs at this time.

> The macros don't really need a Parrot_ prefix, do they?

Right, you can use whatever name you prefer.

-- Jerome
Re: [netlabs #629] [PATCH] Memory manager/garbage collector - major revision
On Tue, May 28, 2002 at 08:30:52AM -0400, Mike Lambert wrote:
> Can you provide an implementation of the macros you described above? I
> have a few concerns which I'm not sure if they are addressed. For example:

    #define PARROT_start() \
      frame * saved_top = stack_top;

> PARROT_str_local(d)
> I'm assuming it puts a reference to d onto the rooted stack. It would also
> need to initialize d to NULL to avoid pointing at garbage buffers.

    #define PARROT_str_local(d)       \
      STRING * d = NULL;              \
      frame frame_##d;                \
      int dummy_##d = (               \
        (frame_##d.ptr = &d),         \
        (frame_##d.next = stack_top), \
        (stack_top = &frame_##d),     \
        0);

> PARROT_str_params_3(a, b, c);
> What's the point of this? With rule 5 that prevents function call nesting,
> you're guaranteed of all your arguments being rooted. I think you can lose
> either the nesting requirement or the str_params requirement.

Yes, you are right: we don't need this macro if all the arguments are already rooted.

> PARROT_return(e);
> I'm assuming this backs the stack up to the place pointed to by
> PARROT_start(), right?

Right:

    #define PARROT_return(e)   \
      do {                     \
        stack_top = saved_top; \
        return e;              \
      } while (0)

> This means during a longjmp, the stack won't be
> backed up properly until another PARROT_return() is called, somewhere
> farther up the chain, right?

Right, the stack has to be backed up explicitly. But the exception handler can do it immediately.

> Finally, in response to my original post, you asked:
>
> > Suppose your C code builds a nested datastructure. For instance,
> > it creates some strings and add them to a hash-table. The hash-table is
> > then returned. Should it clear the neonate flag of the strings?
>
> I think I'd have to say...don't do that. Ops and functions shouldn't be
> building large data structures, imo. Stuff like buliding large hashes
> and/or arrays of data should be done in opcode, in perl code, or whatever
> language is operating on parrot.

Still, it seems reasonable for a function to return a small data structure, such as a pair of strings.

> If you *really* need to operate on a nested datastructure, and you're
> going to hold it against my proposal, then there are two options.
>
> a) write code like:
>   base = newbasepmc #nenoate pmc
>   other = newchildpmc #also neonate
>   base->add(other) #even if collecting/dod'ing, can't collect above two
>   done_with_pmc(other) #un-neonates it, since it's attached to a root (neonate0 set
>   repeat...
>
> It works, and then you just need to worry about what to do with your
> 'base' at the end of the function (to un-neonate it or not).

This sounds reasonable.

-- Jerome
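The macros in this message can be assembled into a small compilable sketch. The frame struct layout, the placeholder STRING type, and the helpers count_roots and roots_inside_callee are assumptions added for illustration; the thread only gives the macro bodies. The trailing semicolons are left to the call site here.

```c
#include <assert.h>
#include <stddef.h>

typedef struct STRING { int placeholder; } STRING;  /* stand-in for Parrot's STRING */

/* One entry of the explicit root stack: the address of a rooted local. */
typedef struct frame {
    STRING **ptr;
    struct frame *next;
} frame;

static frame *stack_top = NULL;

#define PARROT_start() frame *saved_top = stack_top

#define PARROT_str_local(d)           \
    STRING *d = NULL;                 \
    frame frame_##d;                  \
    int dummy_##d = (                 \
        (frame_##d.ptr = &d),         \
        (frame_##d.next = stack_top), \
        (stack_top = &frame_##d),     \
        0)

#define PARROT_return(e)       \
    do {                       \
        stack_top = saved_top; \
        return (e);            \
    } while (0)

/* What a collector would do to find roots: walk the linked frames. */
size_t count_roots(void) {
    size_t n = 0;
    for (frame *f = stack_top; f != NULL; f = f->next) n++;
    return n;
}

/* Hypothetical function following rules (1), (3) and (4). */
size_t roots_inside_callee(void) {
    PARROT_start();
    PARROT_str_local(d);
    PARROT_str_local(e);
    (void)dummy_d; (void)dummy_e;  /* silence unused-variable warnings */
    assert(stack_top == &frame_e && *frame_d.ptr == NULL);
    size_t n = count_roots();
    PARROT_return(n);
}
```

Because each frame stores the address of the local rather than its value, a moving collector could update d and e in place, which is the property the proposal relies on.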
Re: REGEX structure and Regex implementation
On Sun, 26 May 2002, Steve Fink wrote:
> I implemented it that way once in my private tree. But I ended up
> replacing it with a couple of PerlArrays.
>
> I am now of the opinion that there's currently nothing for a regex PMC
> to do. At compile-time, you know what sort of beast you're matching
> against. If you want to incrementally match an input sequence of some
> sort, then you should probably be using the same continuation or
> coroutine mechanism that regular subs use.

I've done some thinking about it this weekend. Upon reflection, what I should have said was "Regexes should be objects", which is more of a perl6 thing. What you say makes sense: how would the regex ops be called from within the PMC? It would be unnecessary overhead, and it doesn't make much sense...

- D <[EMAIL PROTECTED]>
Re: [netlabs #629] [PATCH] Memory manager/garbage collector - major revision
>#define PARROT_str_local(d)        \
>  STRING * d = NULL;               \
>  frame frame_##d;                 \
>  int dummy_##d = (                \
>    (frame_##d.ptr = &d),          \
>    (frame_##d.next = stack_top),  \
>    (stack_top = &frame_##d),      \
>    0);

This triggers the perl5 macro-itis alarm.

Please don't do this :)

If you need to do things like that, then maybe we need to develop our own function inliner -- there will still be debugging issues with the optimizer, but fewer, if it looks like this:

    STRING * concat (STRING* a, STRING* b, STRING* c) {
      PARROT_str_local_2(d, e);
      foo;
    }

Expands into:

    #line 50 "sourcefile.pmc"
    STRING * concat (STRING* a, STRING* b, STRING* c) {
    #line 50 "macros.h"
      STRING * d = NULL;
      frame frame_d;
      frame_d.ptr = &d;
      frame_d.next = stack_top;
      stack_top = &frame_d;
    #line 52 "sourcefile.pmc"

BUT-- We've got enough complicated preprocessor issues right now - I'm not sure we want to add another one. Defining perl5ish macros will cause too many troubles down the road.

Or... since C99 supports C function inlining (iirc) - we could just rely on a C99 compiler

-R
Re: [netlabs #629] [PATCH] Memory manager/garbage collector - major revision
On Tue, 28 May 2002 12:50:06 +0200 Jerome Vouillon <[EMAIL PROTECTED]> wrote:
>I propose the following alternate guidelines.
>
>  STRING * concat (STRING* a, STRING* b, STRING* c) {
>    PARROT_start();
>    PARROT_str_params_3(a, b, c);
>    PARROT_str_local_2(d, e);
>
>    d = string_concat(a, b);
>    e = string_concat(d, c);
>    PARROT_return(e);
>  }

If you search the archive, you'll find I've already proposed this type of solution in 2 different flavors, and both times the consensus has gone back to setting flags. It seems we are going around in circles.

I also agree it's funny we are worrying about performance when it's not apparent the allocation overhead even has anything to do with the current performance problem.

-Melvin
Re: [COMMIT] Added preprocessor layer to newasm.
On Tue, 28 May 2002 01:19:25 -0400 Jeff <[EMAIL PROTECTED]> wrote:
>newasm now handles constants, macros, and local
>labels within. Here's a

Great work!

>expansion. Also, they don't expand
>recursively. '.constant FOO
>"blah"\n.constant BAR "Hey, .FOO"' won't do what
>you want, sadly.

That's exactly what I want. I don't think the assembler should do any sort of interpolation with string constants at all.

>that's what I'll work on next.

When Simon first committed it, I tested newasm and noticed a 2-3x speedup in assembly speed. Is this still the case? I've been running some tests with executing Cola on the fly, and the (slow) assembly phase is the big bug in the soup right now; however, I've been using the old assembler.

PS: Thanks (Simon and Jeff) for a lot of hard work on newasm.

-Melvin
Re: [netlabs #629] [PATCH] Memory manager/garbage collector - major revision
On Tue, May 28, 2002 at 03:45:58PM +0200, Jerome Vouillon wrote:
> On Tue, May 28, 2002 at 08:30:52AM -0400, Mike Lambert wrote:
> > PARROT_str_params_3(a, b, c);
> > What's the point of this? With rule 5 that prevents function call nesting,
> > you're guaranteed of all your arguments being rooted. I think you can lose
> > either the nesting requirement or the str_params requirement.
>
> Yes, you are right: we don't need this macro if all the arguments are
> already rooted.

Well, actually we need this macro if we want to allow the garbage collector to move objects: then, the values of the arguments may need to be updated.

-- Jerome
Re: [netlabs #629] [PATCH] Memory manager/garbage collector - major revision
On Tue, May 28, 2002 at 07:54:49AM -0700, Robert Spier wrote:
>
> >#define PARROT_str_local(d)        \
> >  STRING * d = NULL;               \
> >  frame frame_##d;                 \
> >  int dummy_##d = (                \
> >    (frame_##d.ptr = &d),          \
> >    (frame_##d.next = stack_top),  \
> >    (stack_top = &frame_##d),      \
> >    0);
>
> This triggers the perl5 macro-iitis alarm.
>
> Please don't do this :)
>
> If you need to do things like that, then maybe we need to develop our
> own function inliner -- there will still be debugging issues with the
> optimizer, but less, if it looks like this:
[...]

What are the debugging issues you mention? Note that this macro will never fail: there is no pointer dereferencing, no memory allocation, ...

-- Jerome
Re: GC design
On Mon, May 27, 2002 at 08:41:59AM -0700, Sean O'Rourke wrote:
> Since our memory-allocating routines return NULL quite a ways back up the
> call chain (all the way through e.g. string_grow()), here's another way to
> do this -- if allocation returns NULL all the way to an op function, it
> can make the things it wants to keep reachable from somewhere, do a
> collection, and retry. By the way, neither of string_grow()'s callers
> checks its return value now, which indicates to me that this may be
> error-prone.

That's an interesting point, actually. What is the right thing to do when we run out of memory?

- Abort immediately.
  This is not very user-friendly.

- Return a special value.
  But then we need to check the return value of almost all functions (including string_compare, for instance).

- Instead of returning a special value, we can set a global variable to signal the error.
  Again, we need to check this variable everywhere. Note that this is the solution adopted by Java.

- Raise an exception using longjmp.
  But then, if we start using locks all over the place like in Java, we are pretty sure to leave the program in an inconsistent state. (It is neither safe to release the locks nor to keep them.)

- Use a small amount of emergency memory to invoke a user-defined handler which can either release some memory or signal the problem to the user. (If the handler returns, we do another collection and we try to reserve some emergency memory again. If this succeeds, then we resume the program. Otherwise, we abort.)
  But how much emergency memory is enough?

-- Jerome
Re: Parrot and Mono / .Net
I believe that the idea is to make things flexible enough that FURTHER changes to Perl, beyond Perl 6, will be easier too.

On Sun, 2002-05-26 at 06:10, Ask Bjoern Hansen wrote:
> [EMAIL PROTECTED] (Sebastian Bergmann) writes:
>
> > Leon Brocard wrote:
> > > Oh, this happens to be a FAQ. The main reason is:
> > >
> > > http://www.parrotcode.org/faq/
> >
> > I know the technical reason for a new VM, but this could've been a new
> > VM for Perl 6 only. What I'd like to know is the motivation to open up
> > the architecture and allow for pluggable parser, compilers, bytecode
> > generators / optimizers, ...
>
> Because if we can support [insert random language here] then we can
> support a very flexible Perl 6 language. Or the other way around.
>
>
> - ask
>
> --
> ask bjoern hansen, http://ask.netcetera.dk/ !try; do();
Re: [netlabs #629] [PATCH] Memory manager/garbage collector - major revision
>What are the debugging issues you mention? Note that this macro will
>never fail: there is no pointer dereferencing, no memory allocation, ...

Never is a bad word to use for anything more complicated than x=1+2. (Which will hopefully get constant folded and optimized away anyway.)

It is impossible to single step (line by line.. statement by statement) through that macro in a debugger. That is a general class of things we want to avoid.

Computers can do weird things sometimes, and if, for example, something is up with the stack, I'm going to want to look at it right before it gets assigned, no matter that it shouldn't have changed in the past 10 lines.

-R
Re: GC design
On Tue, 28 May 2002, Jerome Vouillon wrote:
> That's an interesting point, actually. What is the right thing to do
> when we run out of memory?
> - Abort immediately.
>   This is not very user-friendly.
> - Return a special value.
>   But then we need to check the return value of almost all functions
>   (including string_compare, for instance).

I personally dislike this approach, as it requires a large amount of programming discipline from everyone who works on the project. The current code indicates that if we took this approach, we would spend quite a bit of time squashing bugs from not checking return values in weird places. It probably also hurts common-case performance to litter your code with redundant null-checks, but I don't have any data.

> - Instead of returning a special value, we can set a global variable
>   to signal the error.
>   Again, we need to check this variable everywhere.

This by itself seems worse than the above, since it makes problems even easier to ignore.

>   Note that this is the solution adopted by Java.

Last time I wrote in Java, errors were entirely exception-based. Have things changed because of the locking issues you mention below?

> - Raise an exception using longjmp.
>   But then, if we start using locks all over the place like in Java,
>   we are pretty sure to leave the program in an inconsistent state.

We're currently lock-free, which makes this sound like a good option.

> - Use a small amount of emergency memory to invoke a user-defined
>   handler which can either release some memory or signal the problem
>   to the user.

I worked on some code that had something like this for memory allocation, and found it worked well. If malloc failed, the code would call a handler which free()d a chunk of emergency memory grabbed on startup, and set a global "we're low on memory" flag. Then it would retry the allocation, and exit if it failed. Whenever memory was reclaimed, the system would check to see if it had freed the emergency chunk, and if so, try to reallocate it and clear the flag. The global low-mem flag could then be used to try to reduce overall memory use.

>   But how much emergency memory is enough?

64k? There will certainly be pathologically large allocations for which we can never have enough, but it seems like we could experiment until we found a size that gives us enough time to recover or exit gracefully.

/s
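The emergency-chunk scheme described above can be sketched roughly as follows. This is not the code Sean refers to; every name here (reserve_emergency, alloc_with_reserve, is_low_memory, flaky_alloc, flaky_failures) is hypothetical, the allocator is passed in as a function pointer so that failure can be simulated, and the sketch returns NULL on a second failure instead of exiting.

```c
#include <assert.h>
#include <stdlib.h>

#define EMERGENCY_BYTES (64 * 1024)  /* the 64k figure floated above */

static void *emergency_chunk = NULL;
static int low_memory = 0;

/* Grab (or re-grab) the reserve; clears the low-memory flag on success. */
int reserve_emergency(void) {
    if (emergency_chunk == NULL) emergency_chunk = malloc(EMERGENCY_BYTES);
    if (emergency_chunk != NULL) low_memory = 0;
    return emergency_chunk != NULL;
}

int is_low_memory(void) { return low_memory; }

/* Try raw_alloc; on failure, release the reserve, set the flag, retry once. */
void *alloc_with_reserve(size_t size, void *(*raw_alloc)(size_t)) {
    assert(raw_alloc != NULL);
    void *p = raw_alloc(size);
    if (p == NULL && emergency_chunk != NULL) {
        free(emergency_chunk);
        emergency_chunk = NULL;
        low_memory = 1;
        p = raw_alloc(size);
    }
    return p;  /* NULL here means even the reserve did not help */
}

/* Demo allocator that fails a configurable number of times. */
int flaky_failures = 0;
void *flaky_alloc(size_t size) {
    if (flaky_failures > 0) { flaky_failures--; return NULL; }
    return malloc(size);
}
```

A caller would invoke reserve_emergency() at startup and again after each reclamation pass; while is_low_memory() is true, the program can try to shed memory or shut down gracefully.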
ICU and Parrot
Hello all,

Hopefully I won't get too burned by flames by jumping into the middle of the conversation like this.

I recently stumbled across your list talking about ICU and Unicode. I am not advocating that you should or shouldn't use ICU. Each group has their own requirements. As a person who actively works on the ICU implementation, I thought I should clear up some of your questions and misconceptions about ICU. I also have a question of my own for this mailing list later on.

ICU 2.1 works on MacOS X, and has mostly worked in the past on MacOS 8 and 9 (project files for older Macs are not included). Some companies actively use ICU on MacOS 8 and 9. The list of supported platforms that was quoted on your mailing list recently was old. Please take a look here http://oss.software.ibm.com/cvs/icu/~checkout~/icu/readme.html#HowToBuildSupported for the latest list of supported platforms. ICU does not work on PalmOS or VMS yet; however, ICU does work on Alpha based Linux, and ICU has been ported to the Tru64 (OSF) platform with Compaq's cxx compiler in the past. We do not have the resources to do this porting effort for every platform ourselves, and so we need other people to do the porting for us. As an open source project we always welcome contributions to make ICU work on other platforms.

It is true that parts of ICU use C++. Some parts of ICU are written in C++ with a C wrapper. Some other parts are written in C with a C++ wrapper. It depends on the API being used. Most of the functionality in the common library is written in C, and most of our i18n library uses C++. You can see some of the C/C++ dependencies here http://oss.software.ibm.com/icu/userguide/design.html under API dependencies. The vast majority of people that we encounter do have a C++ compiler, and we only use the most portable subset of C++. C++ features like templates, exceptions, run-time type information, STL and multiple inheritance are NOT used in ICU.

All of our C code is ANSI C89 compliant according to gcc. Since ICU works with some old C compilers, I'm sure that there shouldn't be any concerns about our usage of the C language.

Many questions about ICU can be answered on our icu4c-support list. You can go here http://oss.software.ibm.com/icu/archives/index.html to see how to subscribe to the list.

On a side note, we have been thinking about putting regular expressions into ICU someday (no firm plans yet). Maybe we could do some collaboration with a regular expression engine in ICU. Would this group be interested in such a collaboration?

Thank you for your interest in ICU.

George
GC, exceptions, and stuff
Okay, I've thought things over a bit. Here's what we're going to do to deal with infant mortality, exceptions, and suchlike things.

Important given: We can *not* use setjmp/longjmp. Period. Not an option--not safe with threads. At this point, having considered the alternatives, I wish it were otherwise but it's not. Too bad for us.

So, on to the rules/proclamations/exercise of unreasonable dictatorial power.

1) Functions are only responsible for ensuring liveness of strings/buffers/PMCs up until the point they exit. Strings/PMCs/Buffers that are returned to a caller are the caller's problem.

2) All calls to routines which may fail must check for failure and exit indicating an exception if they do so. They should exit as gracefully as they can. They *may* override the exception if appropriate. (Potentially voiding it, or throwing a different exception)

3) Opcode functions which note that something they called has thrown an exception are responsible for posting an interpreter exception.

4) Everything that can fail *must* be checked. So no code like:

    string_foo(string_foo(), string_foo())

if string_foo can pitch an exception.

5) We're dealing with infant mortality by pushing baby strings on the stack. We'll add in a stack_extend and quickpush routine to pre-extend (guaranteed) the stack and push the potential baby string/PMC/Buffer respectively.

Yes, these will be a pain to deal with. Alas, too bad for us.

-- Dan

--"it's like this"---
Dan Sugalski                even samurai
[EMAIL PROTECTED]           have teddy bears and even
                            teddy bears get drunk
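Rules 2 and 4 above amount to checking every fallible call before using its result, which rules out nested calls. A minimal sketch of that style follows, assuming a toy malloc-backed Str type in place of Parrot's STRING and manual str_free in place of the GC; str_new, str_free, str_concat and concat3 are hypothetical names, and failure is signalled by a NULL return.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct Str {
    size_t len;
    char *data;
} Str;

/* Build a Str from len bytes; returns NULL if allocation fails. */
Str *str_new(const char *bytes, size_t len) {
    assert(bytes != NULL);
    Str *s = malloc(sizeof *s);
    if (s == NULL) return NULL;
    s->data = malloc(len + 1);
    if (s->data == NULL) { free(s); return NULL; }
    memcpy(s->data, bytes, len);
    s->data[len] = '\0';
    s->len = len;
    return s;
}

void str_free(Str *s) {
    if (s != NULL) { free(s->data); free(s); }
}

/* Fallible operation: NULL on bad input or allocation failure. */
Str *str_concat(const Str *a, const Str *b) {
    if (a == NULL || b == NULL) return NULL;
    Str *r = malloc(sizeof *r);
    if (r == NULL) return NULL;
    r->len = a->len + b->len;
    r->data = malloc(r->len + 1);
    if (r->data == NULL) { free(r); return NULL; }
    memcpy(r->data, a->data, a->len);
    memcpy(r->data + a->len, b->data, b->len);
    r->data[r->len] = '\0';
    return r;
}

/* Rule 4 style: one fallible call per statement, each result checked. */
Str *concat3(const Str *a, const Str *b, const Str *c) {
    Str *ab = str_concat(a, b);
    if (ab == NULL) return NULL;   /* exit indicating failure (rule 2) */
    Str *abc = str_concat(ab, c);
    str_free(ab);                  /* the temporary dies here; a GC would reclaim it */
    return abc;                    /* may be NULL; that is the caller's problem */
}
```

The intermediate ab is the kind of "baby string" that rule 5 would push on the stack in a garbage-collected setting; here it is simply freed by hand.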
RE: GC design
Sean O'Rourke: # On Tue, 28 May 2002, Jerome Vouillon wrote: # > That's an interesting point, actually. What is the right # thing to do # > when we run out of memory? # > - Abort immediately. # > This is not very user-friendly. # > - Return a special value. # > But then we need to check the return value of almost all functions # > (including string_compare, for instance). # # I personally dislike this approach, as it requires a large # amount of programming discipline from everyone who works on # the project. The current code indicates that if we took this # approach, we would spend quite a bit of time squashing bugs # from not checking return values in wierd places. It probably # also hurts common-case performance to litter your code with # redundant null-checks as well, but I don't have any data. # # > - Instead of returning a special value, we can set a global variable # > to signal the error. # > Again, we need to check this variable everywhere. # # This by itself seems worse than the above, since it makes # problems even easier to ignore. OTOH, if all C functions start with a boilerplate like: if(interpreter->flags & PARROT_exception_FLAG) { return NULL; } then this allows us to easily do things like: string_concat(interpreter, string_concat(interpreter, a, b), c); as long as we check for the exception immediately after. In fact, that even lets you delay checking for exceptions so you can centralize things--see below. # > Note that this is the solution adopted by Java. # # Last time I wrote in Java, errors were entirely # exception-based. Have things changed because of the locking # issues you mention below? I assume he means within the JVM's internals. # > - Raise an exception using longjmp. # > But then, if we start using locks all over the place like in Java, # > we are pretty sure to leave the program in an inconsistent state. # # We're currently lock-free, which makes this sound like a good option. 
I think we just have to say "if you put a lock on something, make sure to set up an exception handler to unlock it and then rethrow the exception".
I also think that any way you do it we'll have to wrap it in macros--at least for embedders and extenders. If I ever find a few months of pure boredom, I'd like to try to reimplement Parrot in C++, just to see how much easier it would make things like PMCs; if we have appropriately set up macros, we can hide from the user the actual implementation of exceptions:
    #define PARROT_TRY /* nothing */
    #define PARROT_CATCH if(interpreter->flags & PARROT_exception_FLAG)
    ...
    PARROT_TRY {
        d=string_concat(interpreter, string_concat(interpreter, a, b), c);
        ...
    }
    PARROT_CATCH {
        (code)
    }
Just change the macros to:
    #define PARROT_TRY try
    #define PARROT_CATCH catch(void* PARROT_CATCH_this_is_unused)
for C++, or:
    #define PARROT_TRY if(!setjmp())
    #define PARROT_CATCH else
for setjmp/longjmp. (Yes, I know that's not the way setjmp is really used, but you get the idea.) For short stretches of code, I'd imagine that
    PARROT_TRY d=string_concat(interpreter, string_concat(interpreter, a, b), c) PARROT_CATCH Parrot_fputs(interpreter, Parrot_stderr, "We're screwed!");
(assuming that C++'s try and catch can take statements instead of blocks, anyway--although even if they don't, it's just four more characters.)
--Brent Dax <[EMAIL PROTECTED]> @roles=map {"Parrot $_"} qw(embedding regexen Configure) blink: Text blinks (alternates between visible and invisible). Conforming user agents are not required to support this value. --The W3C CSS-2 Specification
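To make the flag-based variant concrete, here is a small self-contained sketch. Everything prefixed Demo/demo_/DEMO_ is hypothetical and merely stands in for the interpreter, string_concat and the PARROT_* macros discussed above; a fixed-size arena plays the role of memory that can run out. It tries to show why the "return NULL at once if the flag is already set" boilerplate makes nested calls safe: when the inner call fails, the outer call never looks at its NULL argument.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_EXCEPTION_FLAG 0x1u

/* Hypothetical interpreter: an exception flag plus a tiny arena that
 * stands in for memory that can be exhausted. */
typedef struct DemoInterp {
    unsigned flags;
    char     arena[64];
    size_t   used;
} DemoInterp;

/* Stand-in for string_concat. Returns NULL and sets the flag on failure. */
static char *demo_concat(DemoInterp *interp, const char *a, const char *b)
{
    size_t la, lb;
    char  *out;

    assert(interp != NULL);
    /* The boilerplate: do nothing if an exception is already pending.
     * This check comes before a or b is touched, so a NULL passed in
     * from a failed inner call is harmless. */
    if (interp->flags & DEMO_EXCEPTION_FLAG)
        return NULL;

    la = strlen(a);
    lb = strlen(b);
    if (la + lb + 1 > sizeof interp->arena - interp->used) {
        interp->flags |= DEMO_EXCEPTION_FLAG;   /* "out of memory" */
        return NULL;
    }
    out = interp->arena + interp->used;
    memcpy(out, a, la);
    memcpy(out + la, b, lb + 1);                /* copies the NUL too */
    interp->used += la + lb + 1;
    return out;
}

/* Flag-based versions of the macros; they expect a variable named interp. */
#define DEMO_TRY   /* nothing */
#define DEMO_CATCH if (interp->flags & DEMO_EXCEPTION_FLAG)

static const char *demo_concat3(DemoInterp *interp, const char *a,
                                const char *b, const char *c)
{
    char *d;

    DEMO_TRY {
        d = demo_concat(interp, demo_concat(interp, a, b), c);
    }
    DEMO_CATCH {
        return "We're screwed!";
    }
    return d;
}
```

The cost of this design is the extra test at the top of every function; the benefit is that a single check after a chain of calls is enough.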
Re: Parrot and Mono / .Net
> > > I know the technical reason for a new VM, but this could've been a new > > > VM for Perl 6 only. What I'd like to know is the motivation to open up > > > the architecture and allow for pluggable parser, compilers, bytecode > > > generators / optimizers, ... > > > > > > I believe that the idea is to make things flexible enough that FURTHER > changes to Perl, beyond Perl 6, will be easier too. And, even better, (and this is stated on the Parrot site) this allows Perl 6 to be written _in Perl_. Some people I've spoken to don't see the significance of this, but it'll be an amazing feat... no longer will anyone be able to talk Perl down ;-) BTW, this is my first post here, so hi! Cheers, Peter Cooper
Re: Perl6 currying
Larry Wall wrote: > > If we're going to make it a method, however, it's possible that "curry" > is the wrong popular name, despite its being the correct technical name. > There's really nothing about the word "curry" that suggests partial > binding to the casual reader. Perhaps we really want something like: > > my &half = &div.prebind(y => 2); > > or: > > my &half = &div.rewrite(y => 2); > > or even: > > my &half = &div.assume(y => 2); > > I think I like that last one the best. Maybe it would read better as > "assuming". But that's getting a bit long for Mr Huffman. Maybe it's > finally time to reach into our bag of English topicalizers and pull out > "with": > > my &half = &div.with(y => 2); "with" reads very nicely, but we already have a perl6 precedent, perhaps... how about reusing "when" as the method name for currying? This may not curry favor with Damian, but I suggest my &half = &div.when(y => 2); would declare the subroutine "half" to be equal to the subroutine "div" when the parameter y is given the value 2. The code and the English both read very nicely IMHO, and virtually identically, and the English version seems even more clear to me when using "when" rather than with "with". Further, there is a nice analogy with the usage of the keyword "when" in a given clause and the usage of the method name "when" to perform currying: both "when"s specify particular values that control the choice of code to be executed. The "given" keyword supplies the 'parameters' for the "given" statement, and the original function declaration supplies the parameters for the original function. > Larry -- Glenn = Remember, 84.3% of all statistics are made up on the spot.
Re: Perl6 currying
On Tue, 28 May 2002, Glenn Linderman wrote: > "with" reads very nicely, but we already have a perl6 precedent, > perhaps... how about reusing "when" as the method name for currying? > This may not curry favor with Damian, but I suggest > > my &half = &div.when(y => 2); > > would declare the subroutine "half" to be equal to the subroutine "div" > when the parameter y is given the value 2. The code and the English > both read very nicely IMHO, and virtually identically, and the English > version seems even more clear to me when using "when" rather than with > "with". > > Further, there is a nice analogy with the usage of the keyword "when" in > a given clause and the usage of the method name "when" to perform > currying: both "when"s specify particular values that control the choice > of code to be executed. The "given" keyword supplies the 'parameters' > for the "given" statement, and the original function declaration > supplies the parameters for the original function. It is precisely that similarity that's going to become confusing. A big problem with Perl 5, as I have seen, is that it takes a lot of effort to learn. Overloading "when" will give learners a really hard time. "Wait, so when I give a variable after it then a block it does that block... but if I give a dot before it it returns a function?" they'd seem to say. Wait, does this have any meaning?: my &half = \div(y => 2) Is backslash even a valid operator for reference anymore? If so, this makes sense to me. Luke
RE: Perl6 currying
Luke Palmer: # Wait, does this have any meaning?: # # my &half = \div(y => 2) Call div() with the named parameter 'y' equal to 2, take a reference to its return value, and store that in &half. # Is backslash even a valid operator for reference anymore? If so, this # makes sense to me. I'm sure it's still there. Otherwise there's no way to take a reference to a scalar. --Brent Dax <[EMAIL PROTECTED]> @roles=map {"Parrot $_"} qw(embedding regexen Configure) blink: Text blinks (alternates between visible and invisible). Conforming user agents are not required to support this value. --The W3C CSS-2 Specification
Re: [COMMIT] Added preprocessor layer to newasm.
[EMAIL PROTECTED] wrote: > > On Tue, 28 May 2002 01:19:25 -0400 Jeff <[EMAIL PROTECTED]> wrote: > > >newasm now handles constants, macros, and local >labels within. Here's a > > Great work! Thanks. > >expansion. Also, they don't expand >recursively. '.constant FOO > >"blah"\n.constant BAR "Hey, .FOO"' won't do what >you want, sadly. > > That's exactly what I want. I don't think > the assembler should do any sort of interpolation > with string constants at all. I wasn't crazy about recursive expansion either. Glad I don't have to work on it now :) > >that's what I'll work on next. > > When Simon first committed it, I tested > newasm and noticed 2-3x speedup on assembly > speed. Is this still the case? I haven't been tracking assembly speed at all. Keep in mind that a perl assembler is only a temporary measure, and it'll be rewritten in C eventually. It's only written in Perl so that we can change features rapidly. If it were in C (as it will be, once changes settle down), the assembler would likely crystallize and problems would get fixed much more slowly, if at all. Also, if you want more speed, then separate the Macro and Assembler classes out of the main file, and pass your code directly to the Macro object instead of writing tests to a file. After the XS object is rewritten in Perl, I'll rewrite the tests to use the newasm syntax, and newasm will become the new standard. > I've been running some tests with executing > Cola on the fly, and the assembly phase (slow) > is the big bug in the soup right now, however > I've been using the old assembler. Well, again, once this is all redone in C that should go away. I might even work on that next, if everyone is comfortable with the current format. > PS: Thanks (Simon and Jeff) for a lot of hard > work on newasm. Thanks on both of our behalves. (Is that a word?) I don't want to cut it over until the XS stuff is redone to my satisfaction, which means that it's rewritten in perl. 
Building the XS extension there is causing no end of confusion in the build process. > -Melvin -- Jeff <[EMAIL PROTECTED]>
RE: GC, exceptions, and stuff
> Okay, I've thought things over a bit. Here's what we're going to do to deal with infant mortality, exceptions, and suchlike things.
>
> Important given: We can *not* use setjmp/longjmp. Period. Not an option--not safe with threads. At this point, having considered the alternatives, I wish it were otherwise but it's not. Too bad for us.
I think this statement is not very accurate. The real problem is that setjmp/longjmp does not work well inside a signal handler.
A thread-package-compatible setjmp/longjmp can easily be implemented in assembly code. It does not require access to any private data structures. Note that the Microsoft Windows "Structured Exception Handler" works well with threads and signals. The assembly code of __try will show you how to do it.
However, a signal-compatible version will be very difficult. It requires access to ucontext, and most thread packages cannot provide a 100% correct ucontext for signals. (The thread package may have the right info, but the ucontext parameter may not have the info.)
My basic suggestion is that if we need convenient and fast C-based exception handling, we can write our own setjmp/longjmp in assembly code. The functionality will be exported as magic macros, such as
    TRY {
        ...
    } CATCH (EBADF) {
        ...
    } CATCH (ENOMEM) {
        ...
    } END;
Hong
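For readers who want to see what such macros could expand to, here is a minimal sketch. It is not the hand-written assembly version proposed above: it swaps in the standard C setjmp/longjmp, so it inherits exactly the thread and signal caveats under discussion. ExcFrame, exc_top, exc_pending, exc_throw and demo are hypothetical names; only the TRY/CATCH/END spelling comes from the message. Two limitations follow from the expansion: a break inside a TRY or CATCH body binds to the hidden switch, and a return from inside a TRY body would leave a stale frame on the chain.

```c
#include <assert.h>
#include <errno.h>
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

/* One handler frame per active TRY block, linked innermost-first. */
typedef struct ExcFrame {
    jmp_buf          env;
    struct ExcFrame *prev;
} ExcFrame;

/* A threaded build would need one chain per thread; a plain static
 * keeps the sketch short. */
static ExcFrame *exc_top = NULL;
static int       exc_pending = 0;   /* code of the exception in flight */

/* Unwind to the innermost TRY. The frame is popped before jumping, so
 * a throw from inside a CATCH body reaches the next outer TRY. */
static void exc_throw(int code)
{
    ExcFrame *frame = exc_top;

    assert(code != 0);              /* 0 means "no exception" to setjmp */
    if (frame == NULL)
        abort();                    /* nobody is listening */
    exc_top = frame->prev;
    exc_pending = code;
    longjmp(frame->env, code);
}

/* setjmp is used directly as the switch expression, one of the few
 * contexts in which the C standard permits it. */
#define TRY \
    do { \
        ExcFrame exc_frame_; \
        exc_frame_.prev = exc_top; \
        exc_top = &exc_frame_; \
        switch (setjmp(exc_frame_.env)) { \
        case 0:

#define CATCH(code) \
            break; \
        case (code):

/* Codes with no matching CATCH are rethrown to the next outer TRY. On
 * the normal path the frame is still on top and gets popped here. */
#define END \
            break; \
        default: \
            exc_throw(exc_pending); \
        } \
        if (exc_top == &exc_frame_) \
            exc_top = exc_frame_.prev; \
    } while (0)

/* Returns the code that was caught, or 0 if nothing was thrown. */
static int demo(int fail_with)
{
    volatile int seen = -1;   /* volatile: read after a possible longjmp */

    TRY {
        if (fail_with != 0)
            exc_throw(fail_with);
        seen = 0;
    } CATCH (EBADF) {
        seen = EBADF;
    } CATCH (ENOMEM) {
        seen = ENOMEM;
    } END;
    return seen;
}
```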
RE: GC, exceptions, and stuff
At 5:47 PM -0700 5/28/02, Hong Zhang wrote: > > Okay, i've thought things over a bit. Here's what we're going to do >> to deal with infant mortality, exceptions, and suchlike things. >> >> Important given: We can *not* use setjmp/longjmp. Period. Not an >> option--not safe with threads. At this point, having considered the >> alternatives, I wish it were otherwise but it's not. Too bad for us. > >I think this statement is not very accurate. The real problem is >setjmp/longjmp does not work well inside signal handler. > >The thread-package-compatible setjmp/longjmp can be easily implemented >using assembly code. It does not require access to any private data >structures. Note that Microsoft Windows "Structured Exception Handler" >works well under thread and signal. The assembly code of __try will >show you how to do it. Yup, and we can use platform-specific exception handling mechanisms as well, if there are any. Except... >However, signal-compatible will be very difficult. It requries access >to ucontext, and most of thread package can not provide 100% correct >ucontext for signal. (The thread package may have the right info, but >the ucontext parameter may not have the info.) You hit this. And we can't universally guarantee that it'll work, either. >My basic suggestion is if we need convenient and fast C-based exception >handling, we can write our own setjmp/longjmp in assembly code. The >functionality will be exported as magic macros. Such as If we're going to do this, and believe me I dearly want to, we're going to be yanking ourselves out a bunch of levels. We'll be setting the setjmp in runops.c just outside the interpreter loop, and yank ourselves way the heck out. It's that multi-level cross-file jumping that I really worry about. -- Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
[netlabs #632] dispatching SIGPIPE, SIGCHLD to proper threa
# New Ticket Created by Rocco Caputo # Please include the string: [netlabs #632] # in the subject line of all future correspondence about this issue. # http://bugs6.perl.org/rt2/Ticket/Display.html?id=632 > 21:37 We'll need to interface with signals somehow. Getting them properly dispatched will be tough given how little info they carry. 21:38 You can fake sigpipe by catching errno in I/O ops. 21:39 Yep. Except in those cases where sigpipe's considered async and gets delivered process wide. 21:40 Well, OK, in those cases too, but you get stray SIGPIPEs for fun. 21:40 CHLD is not so hard if fork() knows which child PID was spawned in which thread. 21:40 async sigpipe seems a little bogus 21:41 Tell that to the Tru64 folks. I trust they had a reason. 21:41 Oh, right, child tracking. Good idea. Could someone send that to bugs6 so it doesn't get forgotten? 21:42 alarm() is going to be fun, though going full async for IO will reduce the number of blocking system calls. -- Rocco Caputo / [EMAIL PROTECTED] / poe.perl.org / poe.sf.net
RE: [COMMIT] Added preprocessor layer to newasm.
Jeff: # I haven't been tracking assembly speed at all. Keep in mind # that a perl assembler is only a temporary measure, and it'll # be rewritten in C eventually. It's only written in Perl so C or PASM (or Perl 6)? The latter might be better. --Brent Dax <[EMAIL PROTECTED]> @roles=map {"Parrot $_"} qw(embedding regexen Configure) blink: Text blinks (alternates between visible and invisible). Conforming user agents are not required to support this value. --The W3C CSS-2 Specification
RE: GC, exceptions, and stuff
> >The thread-package-compatible setjmp/longjmp can be easily implemented using assembly code. It does not require access to any private data structures. Note that Microsoft Windows "Structured Exception Handler" works well under thread and signal. The assembly code of __try will show you how to do it.
>
> Yup, and we can use platform-specific exception handling mechanisms as well, if there are any. Except...
Stack unwinding is very basic; that is why we have setjmp/longjmp. Even though it is CPU specific, it requires only a very small piece of asm code, much less than the JIT. BTW, the JIT needs similar functionality, otherwise it will not be able to handle exceptions very fast. It would be very awkward to check for every null pointer and every function return.
> >However, signal-compatible will be very difficult. It requires access to ucontext, and most of thread package can not provide 100% correct ucontext for signal. (The thread package may have the right info, but the ucontext parameter may not have the info.)
>
> You hit this. And we can't universally guarantee that it'll work, either.
Parrot has to handle signals, such as SIGSEGV. I believe we have to solve this problem whether or not we use setjmp/longjmp for general exception handling. In general, most libc functions do not work well inside a signal handler.
> >My basic suggestion is if we need convenient and fast C-based exception handling, we can write our own setjmp/longjmp in assembly code. The functionality will be exported as magic macros. Such as
>
> If we're going to do this, and believe me I dearly want to, we're going to be yanking ourselves out a bunch of levels. We'll be setting the setjmp in runops.c just outside the interpreter loop, and yank ourselves way the heck out. It's that multi-level cross-file jumping that I really worry about.
The multi-level jump should not be a problem inside the parrot code itself. The GC discipline should have handled the problem already.
1) If the parrot code allocates anything that cannot be handled by the GC, it must set up an exception handler to release it; see this sample:
    void * mem = NULL;
    TRY {
        mem = malloc(sizeof(foo));
    } FINALLY {
        free(mem);
    } END;
2) If the parrot code allocates anything that is finalizable, there is no need to release it. When the object is no longer referenced, the next GC will finalize it. We can still use a TRY block to enforce cleanup in a timely fashion.
However, we cannot use setjmp/longjmp (even a parrot-specific version) to unwind non-parrot frames. If a third-party C application calls Parrot_xxx, Parrot_xxx should catch any exception, translate it into an error code, and return it.
Implementing a parrot-specific version of setjmp/longjmp will be trivial compared to the complexity of JIT and GC. Once we have solved JIT, GC, threading, and signal handling, the problems with setjmp/longjmp should have been solved as well. But if we only want a simple interpreter solution, there is no need to take on this additional complexity.
Hong
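The last two points (release non-GC memory on every exit path, and never let a jump escape into a caller's frames) can be sketched together. This again uses the standard C setjmp/longjmp rather than a parrot-specific one, and every name in it (api_do_work, internal_alloc, internal_throw, current_handler, the API_* codes) is hypothetical; it only illustrates the shape of an entry point that turns an internal exception into a return code.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

enum { API_OK = 0, API_ERR_NOMEM = 1, API_ERR_UNKNOWN = 2 };

/* Innermost active handler; NULL when no entry point is on the stack. */
static jmp_buf *current_handler = NULL;

static void internal_throw(int code)
{
    assert(code != 0);
    if (current_handler == NULL)
        abort();
    longjmp(*current_handler, code);
}

/* Internal helper that reports failure by throwing, not by returning. */
static void *internal_alloc(size_t n, int simulate_failure)
{
    void *p = simulate_failure ? NULL : malloc(n);

    if (p == NULL)
        internal_throw(API_ERR_NOMEM);
    return p;
}

/* Public entry point: third-party callers only ever see a status code. */
int api_do_work(int simulate_failure)
{
    jmp_buf        env;
    jmp_buf       *outer = current_handler;
    void *volatile scratch = NULL;   /* volatile: set between setjmp and longjmp */
    void *volatile block = NULL;
    int            status;

    current_handler = &env;
    switch (setjmp(env)) {
    case 0:
        scratch = malloc(16);        /* not GC-managed, so freed below */
        block = internal_alloc(32, simulate_failure);
        status = API_OK;
        break;
    case API_ERR_NOMEM:
        status = API_ERR_NOMEM;
        break;
    default:
        status = API_ERR_UNKNOWN;
        break;
    }

    /* The FINALLY part: runs on both the normal and the exceptional path. */
    free(block);
    free(scratch);
    current_handler = outer;
    return status;
}
```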
Re: [COMMIT] Added preprocessor layer to newasm.
Brent Dax wrote: > > Jeff: > # I haven't been tracking assembly speed at all. Keep in mind > # that a perl assembler is only a temporary measure, and it'll > # be rewritten in C eventually. It's only written in Perl so > > C or PASM (or Perl 6)? The latter might be better. PASM is tempting, if only for the bootstrap potential... In other news, I've rewritten the test suite, and with the most recent checkin (adds one PMC type I missed and stops processing when it encounters a flying argument), it passes everything but one Regex test. The diffs are rather large, and will be attached to this file. If there are no objections, I might simply spruce up the documentation, add a '.include' statement, and change assemble.pl over to this style, after going through the rest of the languages/ directory. Of course, the people that are responsible for their language could do it themselves, letting me get back to converting the assembler to .pasm or maybe just ..c or .perl... -- Jeff <[EMAIL PROTECTED]> diff -ru parrot_foo/t/op/ifunless.t parrot/t/op/ifunless.t --- parrot_foo/t/op/ifunless.t Tue May 28 21:34:07 2002 +++ parrot/t/op/ifunless.t Fri May 17 23:52:34 2002 @@ -7,18 +7,21 @@ set I1, -2147483648 set I2, 0 +# if_i_ic I0, ONE if I0, ONE branch ERROR print "bad\\n" ONE: print "ok 1\\n" +# if_i_ic I1, TWO if I1, TWO branch ERROR print "bad\\n" TWO: print "ok 2\\n" +# if_i_ic I2, ERROR if I2, ERROR branch THREE print "bad\\n" @@ -100,12 +103,14 @@ set I0, 0 set I1, -2147483648 +# unless_i_ic I0, ONE unless I0, ONE branch ERROR print "bad\\n" ONE: print "ok 1\\n" +# unless_i_ic I1, ERROR unless I1, ERROR branch TWO print "bad\\n" diff -ru parrot_foo/t/op/integer.t parrot/t/op/integer.t --- parrot_foo/t/op/integer.t Tue May 28 21:36:53 2002 +++ parrot/t/op/integer.t Fri May 17 23:52:34 2002 @@ -340,6 +340,7 @@ output_is(<<> OUTPUT output_is(gentest('b', <<'CODE'), <<'OUTPUT', 'A is not B'); - rx_literal P0, "a", ADVANCE + rx_literal P0, "a", $advance CODE no match OUTPUT 
output_is(gentest('a', <<'CODE'), <<'OUTPUT', 'Pattern longer than string'); - rx_literal P0, "aa", ADVANCE + rx_literal P0, "aa", $advance CODE no match OUTPUT output_is(gentest('ba', <<'CODE'), <<'OUTPUT', 'inching through the string'); - rx_literal P0, "a", ADVANCE + rx_literal P0, "a", $advance CODE <> OUTPUT output_is(gentest('a', <<'CODE'), <<'OUTPUT', 'character classes (successful)'); - rx_oneof P0, "aeiou", ADVANCE + rx_oneof P0, "aeiou", $advance CODE <><> OUTPUT output_is(gentest('b', <<'CODE'), <<'OUTPUT', 'character classes (failure)'); - rx_oneof P0, "aeiou", ADVANCE + rx_oneof P0, "aeiou", $advance CODE no match OUTPUT output_is(gentest('a', <<'CODE'), <<'OUTPUT', 'dot (success)'); - rx_dot P0, ADVANCE + rx_dot P0, $advance CODE <><> OUTPUT output_is(gentest('\n', <<'CODE'), <<'OUTPUT', 'dot (failure)'); - rx_dot P0, ADVANCE + rx_dot P0, $advance CODE no match OUTPUT output_is(gentest('aA9_', <<'CODE'), <<'OUTPUT', '\w (success)'); - rx_is_w P0, ADVANCE - rx_is_w P0, ADVANCE - rx_is_w P0, ADVANCE - rx_is_w P0, ADVANCE + rx_is_w P0, $advance + rx_is_w P0, $advance + rx_is_w P0, $advance + rx_is_w P0, $advance CODE <><> OUTPUT output_is(gentest('?', <<'CODE'), <<'OUTPUT', '\w (failure)'); - rx_is_w P0, ADVANCE + rx_is_w P0, $advance CODE no match OUTPUT output_is(gentest('0123456789', <<'CODE'), <<'OUTPUT', '\d (success)'); - rx_is_d P0, ADVANCE - rx_is_d P0, ADVANCE - rx_is_d P0, ADVANCE - rx_is_d P0, ADVANCE - rx_is_d P0, ADVANCE - rx_is_d P0, ADVANCE - rx_is_d P0, ADVANCE - rx_is_d P0, ADVANCE - rx_is_d P0, ADVANCE - rx_is_d P0, ADVANCE + rx_is_d P0, $advance + rx_is_d P0, $advance + rx_is_d P0, $advance + rx_is_d P0, $advance + rx_is_d P0, $advance + rx_is_d P0, $advance + rx_is_d P0, $advance + rx_is_d P0, $advance + rx_is_d P0, $advance + rx_is_d P0, $advance CODE <><0123456789><> OUTPUT output_is(gentest('@?#', <<'CODE'), <<'OUTPUT', '\d (failure)'); - rx_is_d P0, ADVANCE - rx_is_d P0, ADVANCE - rx_is_d P0, ADVANCE +
Re: ICU and Parrot
At 02:42 PM 5/28/2002 -0700, George Rhoten wrote: >Hello all, > >Hopefully I won't get too burned by flames by jumping into the middle of >the conversation like this. Fortunately this list is very low on flammable material. :) Thanks for the helpful info. One of the concerns with using an external library is that it should be in ANSI C if we were to include it with Parrot. I think there are so many unanswered and unfinished issues other than Unicode that no one has had time to really define what using ICU would spell, but it appears that including ICU would spell allowing C++ elsewhere in the project also. I'm not saying C++ is bad, it would make life easier in some areas, but I think it's a Pandora's box that no one wants to open, ...yet. We also don't wish to inherit ICU development if ICU becomes unsupported. However, I agree with your point about collaboration. I rather like the idea of being able to mail the ICU guys a bug report as I go back to hacking my Parrot mess. I'm curious, is it possible to carve out an "ICU-lite" in ANSI-C89? -Melvin
Re: Perl6 currying
We've pretty much settled on &div.prebind(y => 2) as the most informative and least conflictive. Larry
GC Benchmarking Tests
Hey all,
After finding out that life.pasm only does maybe 1KB per collection, and Sean reminding me that there's more to GC than life, I decided to create some pasm files testing specific behaviors.
Attached is what I've been using to test and compare running times for different GC systems. It's given a list of builds of parrot, a list of tests to run, and runs each four times and takes the sum of them as the value for that test. Then it prints out a simple table for comparing the results. It's not really robust or easily workable in a CVS checkout (since it operates on multiple parrot checkouts).
Included are eight tests of certain memory behaviors. They are:
gc_alloc_new.pbc
  allocates more and more memory
  checks collection speed, and the ability to grow the heap
gc_alloc_reuse.pbc
  allocates more memory, but discards the old
  checks collection speed, and the ability to reclaim the heap
gc_header_new.pbc
  allocates more and more headers
  checks DOD speed, and the ability to allocate new headers
gc_header_reuse.pbc
  allocates more headers, but discards the old
  checks DOD speed, and the ability to pick up old headers
gc_waves_headers.pbc
  total headers (contain no data) allocated is wave-like
  no data, so collection is not tested
  tests ability to handle wavelike header usage patterns
gc_waves_sizeable_data.pbc
  buffer data (pointed to by some headers) is wave-like
  a few headers, so some DOD is tested
  mainly tests ability to handle wavelike buffer usage patterns
gc_waves_sizeable_headers.pbc
  total headers (and some memory) allocated is wave-like
  sort of a combination of the previous two
  each header points to some data, so it tests the collector's ability to handle changing header and small-sized memory usage
gc_generations.pbc
  me trying to simulate behavior which should perform exceptionally well under a generational collector, even though we don't have one :)
  each memory allocation lasts either a long time, a medium time, or a short time
Please let me know if there are any other specific behaviors which could use benchmarking to help compare every aspect of our GCs? Real-world programs are too hard to come by. :)
Results of the above test suite on my machine comparing my local GC work and the current parrot GC are coming soon...
Enjoy!
Mike Lambert
PS: If you get bouncing emails from me because my email server is down, I apologize, and I do know about it. My email server is behind cox's firewall which prevents port 25 access. It should be relocated and online again in a few days.
gc_bench.zip Description: gc_bench.zip
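The scoring rule described above (run each test four times, report the sum) is simple enough to sketch. The attached harness itself is not reproduced here; what follows is an in-process C illustration with hypothetical names (Workload, sum_of_runs, demo_workload). It measures CPU time with clock(), which is an assumption of this sketch and not necessarily how the attached script times separate parrot builds.

```c
#include <assert.h>
#include <time.h>

/* A benchmark body; in the real setup this would be one run of a
 * parrot build on one .pbc file. */
typedef void (*Workload)(void);

/* Run the workload `runs` times and return the summed CPU seconds. */
static double sum_of_runs(Workload work, int runs)
{
    double total = 0.0;
    int    i;

    assert(work != NULL && runs > 0);
    for (i = 0; i < runs; i++) {
        clock_t start = clock();

        work();
        total += (double)(clock() - start) / CLOCKS_PER_SEC;
    }
    return total;
}

/* Trivial stand-in workload that also counts how often it was called. */
static int demo_calls = 0;

static void demo_workload(void)
{
    volatile unsigned long sink = 0;
    unsigned long          i;

    demo_calls++;
    for (i = 0; i < 100000UL; i++)
        sink += i;
}
```

Summing several runs instead of timing one smooths out some noise, at the price of hiding run-to-run variance.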
Re: Perl6 currying
Larry Wall wrote: > We've pretty much settled on &div.prebind(y => 2) as the most informative and > least conflictive. and I'll demonstrate it in my next Conway Channel diary entry later today. Damian
[netlabs #634] GC Bench: Linked-list for free header list
# New Ticket Created by Mike Lambert # Please include the string: [netlabs #634] # in the subject line of all future correspondence about this issue. # http://bugs6.perl.org/rt2/Ticket/Display.html?id=634 >
Peter recently submitted a patch to RT that uses a linked-list for free headers. Here are before and after results:
                          before      after
gc_alloc_new              4.155999    4.016
gc_alloc_reuse            16.574      12.648002
gc_generations            4.025       3.975001
gc_header_new             3.686       3.986
gc_header_reuse           5.577999    4.175998
gc_waves_headers          3.815002    3.595999
gc_waves_sizeable_data    8.383002    8.381999
gc_waves_sizeable_hdrs    5.668       5.396999
We win on the header-intensive stuff. Not sure why it would be slower on the gc_header_new tests. My best guess is that we now are touching the contents of the buffer header, which we weren't doing before. And when we allocate a bunch of new headers, we have to explicitly free them all, which involves touching the first pointer of every buffer in that memory, as opposed to one pointer in the Parrot_allocated memory we used before.
IMO, the gc_alloc_reuse and gc_header_reuse benchmarks more than outweigh gc_header_new. The portion of Peter's patch to do just this change is included below.
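Before the patch itself, a standalone sketch of the general technique may help: an intrusive free list stores the "next" pointer in the first word of each free unit, so no separate array of free pointers has to be kept or grown. This is not Peter's code; DemoPool and the demo_pool_* functions are hypothetical names, and the sketch assumes every unit is at least sizeof(void *) bytes and suitably aligned for a pointer. It also makes the cost mentioned above visible: adding a unit writes into the unit itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pool: only the head of the free list is stored. */
typedef struct DemoPool {
    void  *free_list;      /* most recently freed unit, or NULL */
    size_t free_entries;
} DemoPool;

static void demo_pool_init(DemoPool *pool)
{
    pool->free_list = NULL;
    pool->free_entries = 0;
}

/* Push a unit: its first word is overwritten with the old list head. */
static void demo_pool_add_free(DemoPool *pool, void *unit)
{
    assert(unit != NULL);
    *(void **)unit = pool->free_list;
    pool->free_list = unit;
    pool->free_entries++;
}

/* Pop the most recently freed unit, or NULL if the list is empty. A
 * real pool would try a DOD run or allocate more units at that point. */
static void *demo_pool_get_free(DemoPool *pool)
{
    void *unit = pool->free_list;

    if (unit == NULL)
        return NULL;
    pool->free_list = *(void **)unit;
    pool->free_entries--;
    return unit;
}
```

Both operations are constant time and never allocate.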
Mike Lambert Index: resources.c === RCS file: /cvs/public/parrot/resources.c,v retrieving revision 1.60 diff -u -r1.60 resources.c --- resources.c 26 May 2002 20:20:08 - 1.60 +++ resources.c 29 May 2002 07:08:26 - @@ -41,28 +41,15 @@ /* Create a new tracked resource pool */ static struct Resource_Pool * -new_resource_pool(struct Parrot_Interp *interpreter, size_t free_pool_size, +new_resource_pool(struct Parrot_Interp *interpreter, size_t unit_size, size_t units_per_alloc, void (*replenish)(struct Parrot_Interp *, struct Resource_Pool *), struct Memory_Pool *mem_pool) { struct Resource_Pool *pool; -size_t temp_len; pool = mem_sys_allocate(sizeof(struct Resource_Pool)); -temp_len = free_pool_size * sizeof(void *); -if (interpreter->arena_base->buffer_header_pool) { -pool->free_pool_buffer = new_buffer_header(interpreter); -} -else { -pool->free_pool_buffer = mem_sys_allocate(sizeof(Buffer)); -} -pool->free_pool_buffer->bufstart = -mem_allocate(interpreter, &temp_len, - interpreter->arena_base->memory_pool); -pool->free_pool_buffer->buflen = temp_len; -pool->free_pool_buffer->flags = BUFFER_immune_FLAG; -pool->free_pool_size = temp_len / sizeof(void *); +pool->free_list = NULL; pool->free_entries = 0; pool->unit_size = unit_size; pool->units_per_alloc = units_per_alloc; @@ -72,28 +59,6 @@ return pool; } -/* Expand free pool to accomdate at least n additional entries - * Currently, the minimum expansion is 20% of the current size -*/ -static void -expand_free_pool(struct Parrot_Interp *interpreter, - struct Resource_Pool *pool, size_t n) -{ -size_t growth; - -if (pool->free_pool_size - pool->free_entries < n) { -growth = (n - (pool->free_pool_size - pool->free_entries)) * - sizeof(void *); -if (growth < pool->free_pool_buffer->buflen / 5) { -growth = pool->free_pool_buffer->buflen / 5; -} -Parrot_reallocate(interpreter, pool->free_pool_buffer, - pool->free_pool_buffer->buflen + growth); -pool->free_pool_size += (growth / sizeof(void *)); -} -} - - /* Add entry to free 
pool * Requires that any object-specific processing (eg flag setting, statistics) * has already been done by the caller @@ -102,20 +67,8 @@ add_to_free_pool(struct Parrot_Interp *interpreter, struct Resource_Pool *pool, void *to_add) { -void **temp_ptr; - -if (pool->free_pool_size == pool->free_entries) { -expand_free_pool(interpreter, pool, 1); -} - -#ifdef GC_DEBUG -Parrot_go_collect(interpreter); -#endif - -/* Okay, so there's space. Add the header on */ -temp_ptr = pool->free_pool_buffer->bufstart; -temp_ptr += pool->free_entries; -*temp_ptr = to_add; +*(void **)to_add = pool->free_list; +pool->free_list = to_add; pool->free_entries++; } @@ -127,7 +80,7 @@ get_from_free_pool(struct Parrot_Interp *interpreter, struct Resource_Pool *pool) { -void ** ptr; +void *ptr; if (!pool->free_entries) { Parrot_do_dod_run(interpreter); @@ -140,9 +93,10 @@ return NULL; } -ptr = pool->free_pool_buffer->bufstart; -ptr += --pool->free_entries; -return *ptr; +ptr = pool->free_list; +pool->free_list = *(void **)ptr; +pool->free_entries--; +return ptr; } /* We have no more headers on the f