error in rx.ops example
rx_popindex signature is incorrect in the pre-patch example.

--- rx.ops.old  2003-06-06 18:27:00.0 +0200
+++ rx.ops      2003-10-20 23:08:24.0 +0200
@@ -108,7 +108,7 @@
         rx_literal S0, I1, "b", $next
         branch $top
     $backtrack:
-        rx_popindex S0, I1, $advance
+        rx_popindex I1, $advance
     $next:
         rx_oneof S0, I1, "cd", $backtrack
         branch $success

-- 
stef
Re: Object freezing
Dan Sugalski <[EMAIL PROTECTED]> wrote:
> Since this has come up again, ...

[ FYI: I had started implementing this, based on a general traverse vtable with callback functions. Two patches got backed out by Dan after some discussion in PM ]

> ... and it's apparent that the last time around I wasn't sufficiently clear, it's time to go through this again, and for the final time.

I'd be really happy if you could go through my concerns mentioned in the summary in the thread:

  Subject: Re: [RfC] vtable->dump
  Date: Thu, 4 Sep 2003 12:31:08 +0200

> ... (I will beat this thing into the ground by the time we're done)

Sorry for the inconvenience and for being ignorant ...

> PMC *thaw(interpreter, STRING *)

This should IMHO be able to create constant PMCs out of metadata, e.g. for subroutine objects. So there should be some means to tell thaw() to create PMC(s) in the constant_pmc_pool.

> The chill and warm runtime methods take a PMC or a frozen representation of a PMC (respectively) and provide a human readable version of that PMC.

I dunno why chill() is superior to dump() or pretty_print(), but the name doesn't really matter.

> 1) Freezing at the destruction level may *not* use any additional memory for object traversal

What is "Freezing at the destruction level"? Is this somehow related to destruction ordering?

> Note that I do *not* want to have multiple object traversal systems in parrot! We have one for DOD, and proposals have ranged upwards from there. No. That is *not* happening--the chance for error is significant, the side-effects of the error annoying and tough to track down for complex cases (akin to the trouble with tracking down GC issues), and just not necessary. (Perhaps desirable for speed/space reasons, but desirable isn't necessary)

DOD's mark() routine has different requirements than a general traverse() for freeze(), chill(), clone(), and destruction ordering. Using just mark() will have these side effects that you want to avoid.
A general traverse() can be depth first or breadth first; mark() isn't required to have any specific ordering as long as it sets live bits everywhere.

mark() is called constantly in a running interpreter that does nontrivial things. There are shortcuts for scalars, and DOD is highly optimized not to destroy cache coherency. Using mark() also implies backing out my small PMC patches. All the advantages of smaller scalars are gone then.

While freeze() and friends have to pull each PMC into the cache, just setting the live bit on a PMC doesn't. Further: Luke's proposal for speeding up timely destruction puts objects either in front or at the end of the next_for_GC chain. This IMHO implies that mark() is unusable as your general and sole iterator.

> ... This is something that's hidden under a number of layers of API, so regardless of the outcome it doesn't affect the assembly, PMC, or runtime API.

So if it's hidden, I really don't understand why you are insisting on an (IMHO) suboptimal design.

> The thread-safety is an issue,

While no scheme is thread-safe from the user level (e.g. manually sorting an array containing shared PMCs while it gets frozen), your scheme isn't thread-safe at the low level, as the next_for_GC pointer inside the PMC is used as a duplicate marker. But if a user changes shared resources, that's a user problem. We only guarantee atomic updates per PMC (see P6E p. 86f by Dan).

> Dan

Comments addressing all these issues are highly welcome,
leo
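Leo's point that a general traverse() may run depth first or breadth first, while mark() only needs every reachable object flagged, can be made concrete with a toy object graph. This is a minimal sketch, not Parrot code: `Obj`, `traverse_dfs` and `traverse_bfs` are hypothetical, the fixed-size arrays are a simplification, and a per-object `seen` flag stands in for the live bit or a seen hash.

```c
#include <stddef.h>

#define MAX_CHILDREN 4
#define MAX_OBJS     64   /* toy capacity: each object is queued at most once */

/* Hypothetical toy object; not Parrot's PMC layout. */
typedef struct Obj {
    int id;
    int seen;                            /* stands in for a live bit or seen-hash entry */
    size_t nchildren;
    struct Obj *children[MAX_CHILDREN];
} Obj;

/* Depth first via an explicit stack. Objects are flagged when pushed, so each
 * enters the stack at most once. Writes visit order (ids) to out[]; returns count. */
size_t traverse_dfs(Obj *root, int *out)
{
    Obj *stack[MAX_OBJS];
    size_t top = 0, n = 0;

    root->seen = 1;
    stack[top++] = root;
    while (top > 0) {
        Obj *o = stack[--top];
        out[n++] = o->id;
        /* push children in reverse so the first child is popped first */
        for (size_t i = o->nchildren; i > 0; i--) {
            Obj *c = o->children[i - 1];
            if (!c->seen) {
                c->seen = 1;
                stack[top++] = c;
            }
        }
    }
    return n;
}

/* Breadth first via a FIFO queue, with the same flag-on-enqueue rule. */
size_t traverse_bfs(Obj *root, int *out)
{
    Obj *queue[MAX_OBJS];
    size_t head = 0, tail = 0, n = 0;

    root->seen = 1;
    queue[tail++] = root;
    while (head < tail) {
        Obj *o = queue[head++];
        out[n++] = o->id;
        for (size_t i = 0; i < o->nchildren; i++) {
            Obj *c = o->children[i];
            if (!c->seen) {
                c->seen = 1;
                queue[tail++] = c;
            }
        }
    }
    return n;
}
```

For a graph in which objects 2 and 3 both point at object 4, the stack-based walk visits 1, 2, 4, 3 and the queue-based walk visits 1, 2, 3, 4. Both flag the same four objects exactly once, which is all mark() needs; only operations such as freeze() or destruction ordering care which sequence they get.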
Re: error in rx.ops example
Stéphane Payrard <[EMAIL PROTECTED]> wrote:
> rx_popindex signature is incorrect in the pre-patch example.

> -        rx_popindex S0, I1, $advance
> +        rx_popindex I1, $advance

Thanks, changed.
leo
pcc: the parrot C compiler (just a wrapper, don't expect big things :-)
hello,

I was trying to debug the t/src tests, and realized that doing by hand what Parrot::Test::c_output_(is|like) does is not really easy. I wanted to compile the source code embedded in t/src/sprintf.t (the third test, in my case) to see exactly where and how it was failing, and possibly have a chance to debug it.

so, I ended up looking in lib/Parrot/Test.pm and writing a simple script which mimics what that module does (just a wrapper that calls the C compiler and linker with all the settings from Parrot::Config). I've called it pcc, and it should reside in the root parrot directory. the source is merely 17 lines long, so I attach it at the end of this mail.

now I'm able to pull the source code out of t/src/sprintf.t, save it in, let's say, many_printfs.c, and then do:

  $ ./pcc many_printfs.c
  $ ./many_printfs

hope this helps :-)

$ cat pcc
#!/usr/bin/perl -w

use lib 'lib';
use Parrot::Config;

my $libparrot = $PConfig{blib_lib_libparrot_a};
$libparrot =~ s/\$\(A\)/$PConfig{a}/;

my $source_f = $ARGV[0] || die "no source specified\n";
(my $obj_f = $source_f) =~ s/\.c$/$PConfig{o}/ie;
(my $exe_f = $source_f) =~ s/\.c$/$PConfig{exe}/ie;

my $cmd = "$PConfig{cc} $PConfig{ccflags} $PConfig{cc_debug} ".
          " -I./include -c $PConfig{cc_o_out}$obj_f $source_f";
system("$cmd") && die "compile failed with exit code ".($?>>8)."\n";

$cmd = "$PConfig{link} $PConfig{linkflags} $PConfig{ld_debug} $obj_f ".
       "$PConfig{ld_out}$exe_f $libparrot $PConfig{libs}";
system("$cmd") && die "link failed with exit code ".($?>>8)."\n";
__END__

cheers,
Aldo

__END__
$_=q,just perl,,s, , another ,,s,$, hacker,,print;
Re: Taint mode testing and project Phalanx
On Mon, Oct 20, 2003 at 10:27:34PM -0700, Michael G Schwern wrote:
> On Tue, Oct 21, 2003 at 12:24:03AM -0500, Dave Rolsky wrote:
> > Not to mention that it's buggy as hell. For example, in various versions of Perl I've used there have been rather serious bugs in the regex engine when taint mode is on, even when dealing with untainted variables!
>
> I've never hit anything like this. Do you have examples?

http://rt.perl.org/rt2/Ticket/Display.html?id=24248

variations on the theme of

#!perl -T
{
    local $ENV{PATH} = "/bin";
    my $r = "foo";
    $ARGV[0] =~ /($r)/;
    my $c = "echo $1";
    system $c;
}
__END__

http://rt.perl.org/rt2/Ticket/Display.html?id=22270

where I don't agree with any of the explanations (IIRC) and stand by the bug. (But I ran out of time to find a better explanation.)

Nicholas Clark
Re: No more code coverage
On Mon, Oct 20, 2003 at 11:05:38PM +0200, Paul Johnson wrote:
> On Mon, Oct 20, 2003 at 09:34:38PM +0100, Tony Bowden wrote:
> > On Mon, Oct 20, 2003 at 10:16:40PM +0200, Paul Johnson wrote:
> > > I wrote "database" in quotes because currently we are talking about a flat file, written using Data::Dumper and eval'd in. I have considered other options - specifically YAML and Storable. I have found YAML to be even slower and too buggy, and Storable to be less reliable. (I never tracked down specific problems.) This is an area that needs to be addressed.
> >
> > Have you considered SQLite?
>
> Initially I wanted something with few, or better yet no dependencies. I also wanted something that required little or no work when I changed the internal data structures.
>
> I'll compromise on both of these, and especially the latter, for something that is efficient and reliable.
>
> I'll look into SQLite.

I'd caution against rushing in any particular direction without some profiling information to back it up.

Having said that, I'd strongly recommend switching to Storable first. It did have problems, but it's now very robust and far, far faster than Data::Dumper+eval. This small change would yield a big gain.

The next step would be to get some profile information. There's little point in doing that first, as Data::Dumper+eval will dwarf time spent elsewhere.

Tim.

p.s. Could someone suggest a pure-perl module with lots of tests as a suitable testbed for Devel::Cover?
Re: No more code coverage
Tim Bunce wrote:
> p.s. Could someone suggest a pure-perl module with lots of tests as a suitable testbed for Devel::Cover?

http://search.cpan.org/dist/Acme-EyeDrops has 22 test programs, 769 tests and no dependencies.

/-\
Re: No more code coverage
On Tue, Oct 21, 2003 at 10:38:48PM +1000, Andrew Savige wrote:
> Tim Bunce wrote:
> > p.s. Could someone suggest a pure-perl module with lots of tests as a suitable testbed for Devel::Cover?
>
> http://search.cpan.org/dist/Acme-EyeDrops has 22 test programs, 769 tests and no dependencies.

Test-Simple's another good one. 44 programs, 267 tests, no dependencies, backwards compatible to 5.4.0, does some complex stuff (evals, %SIG, tied handles, system()...).

-- 
Michael G Schwern [EMAIL PROTECTED] http://www.pobox.com/~schwern/
Here's some scholarly-ass opinions...
Re: Object freezing
Leopold Toetsch <[EMAIL PROTECTED]> writes:

> > 1) Freezing at the destruction level may *not* use any additional memory for object traversal

This is a really hard problem. In some early experiments with destruction ordering (one of the problems which need iteration) I couldn't get by without allocating new memory or recursing on the stack. It may be that we can get around it with a second pointer, but I'm not sure.

> What is "Freezing at the destruction level"? Is this somehow related to destruction ordering?
>
> > Note that I do *not* want to have multiple object traversal systems in parrot! We have one for DOD, and proposals have ranged upwards from there. No. That is *not* happening--the chance for error is significant, the side-effects of the error annoying and tough to track down for complex cases (akin to the trouble with tracking down GC issues), and just not necessary. (Perhaps desirable for speed/space reasons, but desirable isn't necessary)

I did some benchmarking (to test our hash implementation, but that's a different story). One thing I found out: we are completely dominated by GC. I'm not sure if it was trace_systemareas or the mark method, but don't put any load on mark. mark should be as fast as possible. The other uses of traverse (freeze, dump, destruction ordering, etc.) are all more or less called on user request, so the user needs to know their cost.

One other thing makes mark different: if we ever want to use a copying collector (which is currently not reachable because of the conservative stack walking), the mark routine needs to know about the moving of objects. None of the other traverse routines ever have this problem.

> DOD's mark() routine has different requirements than a general traverse() for freeze(), chill(), clone(), and destruction ordering. Using just mark() will have these side effects that you want to avoid.

My words. mark() is not traverse(), although they do similar things.
> A general traverse() can be depth first or breadth first; mark() isn't required to have any specific ordering as long as it sets live bits everywhere.
>
> mark() is called constantly in a running interpreter that does nontrivial things. There are shortcuts for scalars, and DOD is highly optimized not to destroy cache coherency. Using mark() also implies backing out my small PMC patches. All the advantages of smaller scalars are gone then.

This is just one more point about mark() speed.

> While freeze() and friends have to pull each PMC into the cache, just setting the live bit on a PMC doesn't. Further: Luke's proposal for speeding up timely destruction puts objects either in front or at the end of the next_for_GC chain. This IMHO implies that mark() is unusable as your general and sole iterator.
>
> > ... This is something that's hidden under a number of layers of API, so regardless of the outcome it doesn't affect the assembly, PMC, or runtime API.
>
> So if it's hidden, I really don't understand why you are insisting on an (IMHO) suboptimal design.

We currently have 15 (in words: fifteen) vtable slots for division/remainder, 5 for multiplication, and 24 for bitwise ops. So bloating the vtable is by design, but it's the end of the world if we special-case the most often called function and have 2 (in words: two) walking functions. Sorry, but I think there are other places in the vtable that need some cleanup.

> > The thread-safety is an issue,
>
> While no scheme is thread-safe from the user level (e.g. manually sorting an array containing shared PMCs while it gets frozen), your scheme isn't thread-safe at the low level, as the next_for_GC pointer inside the PMC is used as a duplicate marker. But if a user changes shared resources, that's a user problem. We only guarantee atomic updates per PMC (see P6E p. 86f by Dan).

Thread safety is less of a problem for marking.
It only needs to make sure that other threads don't munge the data it is walking. Write barriers or mutexes might help here. But how do you freeze an object of another thread? That requires freezing the whole thread.

> > Dan
>
> Comments addressing all these issues are highly welcome,
> leo

I think we should address this issue like experimentalists: Create the general traverse function (no, don't call it mark). Implement freeze, dump, and destruction ordering using this function. When all this is running, port the mark function to use this new functionality. Benchmark, and watch the speedup of the brand-new design (or just find out that the slowdown is not bad enough to justify two walking functions). When the benchmarking is done, let's decide if we need only one walk function, and only then remove the mark function.

bye
boe
-- 
Juergen Boemmels [EMAIL PROTECTED]
Fachbereich Physik                Tel: ++49-(0)631-205-2817
Universitaet Kaiserslautern       Fax: ++49-(0)631-205-3906
PGP Key finger
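Juergen's proposed experiment starts from one general walker that takes a callback, with mark reduced to one visitor among several. A minimal C sketch of that shape follows; `Obj`, `traverse`, `mark_visitor`, `dump_visitor` and `DumpBuf` are hypothetical names rather than Parrot's API, and a per-walk generation counter is used as the "already queued" marker so that the live bit stays owned by the mark visitor.

```c
#include <stdio.h>
#include <stddef.h>

#define MAX_CHILDREN 4
#define MAX_OBJS     64   /* toy capacity: each object is pushed at most once */

/* Hypothetical toy object; not Parrot's PMC layout. */
typedef struct Obj {
    int id;
    int live;                /* DOD live bit, set only by the mark visitor */
    unsigned visit_gen;      /* traversal generation that last queued this object */
    size_t nchildren;
    struct Obj *children[MAX_CHILDREN];
} Obj;

/* A visitor is called once per reachable object, with caller-supplied data. */
typedef void (*visit_fn)(Obj *o, void *data);

/* The single generic walker. Bumping the generation counter means nothing
 * counts as queued yet, so no reset pass is needed between walks. */
void traverse(Obj *root, visit_fn visit, void *data)
{
    static unsigned gen = 0;
    Obj *todo[MAX_OBJS];
    size_t top = 0;

    gen++;
    root->visit_gen = gen;
    todo[top++] = root;
    while (top > 0) {
        Obj *o = todo[--top];
        visit(o, data);
        for (size_t i = 0; i < o->nchildren; i++) {
            Obj *c = o->children[i];
            if (c->visit_gen != gen) {
                c->visit_gen = gen;
                todo[top++] = c;
            }
        }
    }
}

/* mark is just one visitor: set the live bit. */
void mark_visitor(Obj *o, void *data)
{
    (void)data;
    o->live = 1;
}

/* A dump visitor: append one line per object to a caller-owned buffer. */
typedef struct {
    char text[256];
    size_t len;
} DumpBuf;

void dump_visitor(Obj *o, void *data)
{
    DumpBuf *d = data;
    size_t room = sizeof d->text - d->len;
    int n = snprintf(d->text + d->len, room, "obj %d\n", o->id);

    if (n < 0 || (size_t)n >= room) {
        d->text[d->len] = '\0';      /* drop a line that does not fit */
        return;
    }
    d->len += (size_t)n;
}
```

Calling `traverse(&root, mark_visitor, NULL)` is the mark case; dump, freeze or destruction ordering would pass a different callback plus a data pointer for their output. Timing that mark case against a dedicated mark loop is the benchmark the experiment calls for.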
Re: [RfC] and [PATCH]: Libraries
Last week I sent this:

> I spent the last day getting parrot running under Borland. The attached patch is what's needed to get linking and make test running on both Windows/Borland and Linux/gcc. I'm not sure if it's ready for inclusion in the tree, but I want some feedback on the approach.

No feedback is not very much. Sure, I could just commit this thing right away, but this patch changes one fundamental thing: there is no longer one single (static) libparrot.

So, comments please.

boe
-- 
Juergen Boemmels [EMAIL PROTECTED]
Fachbereich Physik                Tel: ++49-(0)631-205-2817
Universitaet Kaiserslautern       Fax: ++49-(0)631-205-3906
PGP Key fingerprint = 9F 56 54 3D 45 C1 32 6F 23 F6 C7 2F 85 93 DD 47
Re: Object freezing
On Mon, 20 Oct 2003, Melvin Smith wrote:

> At 04:38 PM 10/20/2003 -0400, Dan Sugalski wrote:
> >The encoding methods for freezing (and corresponding decoding methods for thawing) may be overridden to provide an alternate serialization format. The only requirement of the serialization format is that it starts with a minimally valid piece of XML that encodes the format and version of the serialized format. The rest of the serialization format need not be XML. This is done because the format and version of the serialized data are required in the stream, and making it XML inconveniences nobody and makes the XML folks happy. It's good enough, and not up for discussion.
>
> Does that mean all encodings will start with the "standard" markup header:

Each serialized data stream, yeah. Just once, and probably with a few extra characters in there to make it legit XML.

Definitely not once per object, just once per stream--if you do:

    freeze S3, P5

and P5 happens to have a PMC that points to your top level symbol table, the string in S3 will be darned huge, and have the thing in there exactly once. (Well, unless you've chosen an XML encoding, in which case I expect you've just blown memory... :)

Dan
Re: Object freezing
On Mon, 20 Oct 2003, Gregor N. Purdy wrote:

> > the xml header is only for the top level thing in the serialized tree. if it is nonstandard you have to mark the serialized string so you can call the matching thaw methods. each object in the serialized tree will have to support that method or some code has to be supplied to handle all the freeze/thaw calls made by the tree traverser code. so the xml header is just a way to mark which external class will be used for the freeze/thaw and it will always be called for each object in the tree. you can't mix/match different freeze/thaw techniques in one operation (yes, you could but then you do have to mark each node with its technique which is a lot of overhead and painful in other ways).
>
> I find the notion of an "XML header" a bit confusing, given Dan's statement to the effect that it was a throw to XML folks.
>
> I think anything "XML folks" will be interested in will entail *wrapping* stuff, not *prefixing* it.

Nah, I expect what they'll want is for the entire data stream of serialized objects to be in XML format. Which is fine--they can have that. (It's why I mentioned the serialization routines can be overridden.)

For an XML stream the header might be with the rest of the stream in XML. A YAML stream would start with the rest in YAML, and the binary format as . Or something like that, modulo actual correct XML.

This way we have a single, fixed-format type/version header, which makes the initial identification easier and less error-prone. (Possibly even fit for file and programs of its ilk to note.) The binary format won't care, and the YAML format shouldn't care (as long as the indenting's right), but the XML format would, so it seems to make sense to use the XML stuff for the initial header.

Dan
Re: Object freezing
On Tue, 21 Oct 2003, Leopold Toetsch wrote:

> Dan Sugalski <[EMAIL PROTECTED]> wrote:
> > Since this has come up again, ...
>
> [ FYI: I had started implementing this, based on a general traverse vtable with callback functions. Two patches got backed out by Dan after some discussion in PM ]

Right, because you'd implemented some stuff I'd specifically said we weren't doing, and didn't back it out any of the times I asked...

> > ... and it's apparent that the last time around I wasn't sufficiently clear, it's time to go through this again, and for the final time.
>
> I'd be really happy if you could go through my concerns mentioned in the summary in the thread:

That's why I did this, in part. It's the plan, until declared otherwise.

> > PMC *thaw(interpreter, STRING *)
>
> This should IMHO be able to create constant PMCs out of metadata, e.g. for subroutine objects. So there should be some means to tell thaw() to create PMC(s) in the constant_pmc_pool.

There should be a way to put PMCs in the constant pool in general. I was thinking a constant op would work--something like

    constant Ix, [SP]y

to make the string or PMC Y a constant at slot X in the constant pool.

Passing in the PMC header to be filled in also works, though both fail if you want full PMC trees marked as constants, since thawing out a PMC stream may involve creating multiple PMCs. (In which case we might be better off temporarily switching allocation pools at constant creation time, rather than passing in PMCs.)

> > The chill and warm runtime methods take a PMC or a frozen representation of a PMC (respectively) and provide a human readable version of that PMC.
>
> I dunno why chill() is superior to dump() or pretty_print(), but the name doesn't really matter.

The important thing is that it's not a vtable method. It's a function that belongs in the freeze/thaw API, as it's just an alternate encoding or decoding.
(Arguably it ought not be a separate API entry at all, just another encoding scheme, but that requires transcoding serialization forms, and I'd rather not get into that.)

> > 1) Freezing at the destruction level may *not* use any additional memory for object traversal
>
> What is "Freezing at the destruction level"? Is this somehow related to destruction ordering?

No. There are some valid cases where an object, after having been declared dead by the DOD, wants to serialize itself. Persistent object stores apparently do this, and it makes a certain amount of sense--when the object goes out of scope, the current state is flushed to disk. It puts a number of unpleasant constraints on the core freeze routines. User code can violate them and take the consequences, but we can't.

> > Note that I do *not* want to have multiple object traversal systems in parrot! We have one for DOD, and proposals have ranged upwards from there. No. That is *not* happening--the chance for error is significant, the side-effects of the error annoying and tough to track down for complex cases (akin to the trouble with tracking down GC issues), and just not necessary. (Perhaps desirable for speed/space reasons, but desirable isn't necessary)
>
> DOD's mark() routine has different requirements than a general traverse() for freeze(), chill(), clone(), and destruction ordering. Using just mark() will have these side effects that you want to avoid.

The only thing that mark does that the general traversal doesn't, in the abstract, is flip the object's live flag. Everything else is an optimization of code which we can, if we need, discard.

> A general traverse() can be depth first or breadth first; mark() isn't required to have any specific ordering as long as it sets live bits everywhere.
I'm pretty sure that with a singly linked list we can get a generally properly ordered flattened tree without having to do an insane number of passes across the dead object store. I may be incorrect in this, but I don't think so. For our purposes the live bit can be safely ignored if we end up setting it, though potentially at the cost of another pass over the dead store, which may end up prohibitively expensive. We'll see.

> mark() is called constantly in a running interpreter that does nontrivial things. There are shortcuts for scalars, and DOD is highly optimized not to destroy cache coherency. Using mark() also implies backing out my small PMC patches. All the advantages of smaller scalars are gone then.

All of this stuff for freezing is going to end up killing the small PMC patch anyway, unfortunately, since we're going to have to be able to traverse PMCs in the destruction phase, which means we have to have the means of traversal at hand, as we can't guarantee that we can allocate more PMCs or resize the PMCs' ext data.

> While freeze() and friends have to pull each PMC into the cache, just setting the live bit on a PMC doesn't. Further: Luke's proposal for speeding up timel
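Dan's claims that a singly linked list is enough, and that traversal in the destruction phase must not allocate, can be illustrated with an intrusive queue: the "to visit" list is threaded through a link field inside each object, so walking an arbitrarily large graph needs no memory beyond what the objects already carry. This is a sketch under stated assumptions: `Obj`, `walk_chain` and `reset_chain` are hypothetical, the link field only loosely models next_for_GC, and the self-pointing tail is one possible end marker, not necessarily Parrot's.

```c
#include <stddef.h>

#define MAX_CHILDREN 4

/* Hypothetical object with an embedded link field, loosely modelled on the
 * next_for_GC pointer discussed in this thread; not Parrot's real layout. */
typedef struct Obj {
    int id;
    struct Obj *next_for_GC;   /* NULL = not yet queued */
    size_t nchildren;
    struct Obj *children[MAX_CHILDREN];
} Obj;

typedef void (*visit_fn)(Obj *o, void *data);

/* Breadth-first walk that allocates nothing: the to-visit queue is threaded
 * through the objects' own link fields. The current tail points to itself,
 * so "already queued" is always just next_for_GC != NULL. Precondition: all
 * links are NULL. Returns the number of objects visited; the links stay set
 * until reset_chain() is called. */
size_t walk_chain(Obj *root, visit_fn visit, void *data)
{
    Obj *cur = root, *tail = root;
    size_t n = 0;

    root->next_for_GC = root;               /* queued, and currently the tail */
    for (;;) {
        visit(cur, data);
        n++;
        for (size_t i = 0; i < cur->nchildren; i++) {
            Obj *c = cur->children[i];
            if (c->next_for_GC == NULL) {   /* not queued yet: append at tail */
                tail->next_for_GC = c;
                c->next_for_GC = c;
                tail = c;
            }
        }
        if (cur->next_for_GC == cur)        /* cur is the tail: queue exhausted */
            break;
        cur = cur->next_for_GC;
    }
    return n;
}

/* Clear the links again with one more pass along the chain. */
void reset_chain(Obj *root)
{
    Obj *o = root;
    while (o != NULL) {
        Obj *next = o->next_for_GC;
        o->next_for_GC = NULL;
        o = (next == o) ? NULL : next;
    }
}
```

Because the link field doubles as the "already queued" marker, two walks cannot run over the same objects at the same time, which is the low-level thread-safety objection Leo raises against using next_for_GC as a duplicate marker.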
Re: Object freezing
On Tue, 21 Oct 2003, Elizabeth Mattijsen wrote:

> At 08:21 -0400 10/21/03, Dan Sugalski wrote:
> >> I find the notion of an "XML header" a bit confusing, given Dan's statement to the effect that it was a throw to XML folks.
> >>
> >> I think anything "XML folks" will be interested in will entail *wrapping* stuff, not *prefixing* it.
> >
> >Nah, I expect what they'll want is for the entire data stream of serialized objects to be in XML format. Which is fine--they can have that. (It's why I mentioned the serialization routines can be overridden)
> >
> >For an XML stream the header might be >version=1.0> with the rest of the stream in XML. A YAML stream would start with the rest in YAML, and the binary format as . Or something like that, modulo actual correct XML.
>
> If you want that to be looking like valid XML, it would have to be different:
>
> error: Specification mandate value for attribute parrot
>   ^
> Better in my opinion would be something like:
>
> data yadda yadda yadda

I'm not an XML guy, and I'm making all this up as I go along. If that's better, fine with me. :)

> >This way we have a single, fixed-format type/version header, which makes the initial identification easier and less error-prone. (Possibly even fit for file and programs of its ilk to note) The binary format won't care, and the YAML format shouldn't care (as long as the indenting's right) but the XML format would, so it seems to make sense to use the XML stuff for the initial header.
>
> So are we talking about a header or a wrapper? If it is really a header, it's not XML and then it's pretty useless from an XML point of view.

We're talking about the first thing in a file (or stream, or whatever). I was under the impression that XML files should be entirely composed of valid XML, hence the need for the stream type marker being valid XML.
YAML doesn't care as much, so far as I understand, and for our own internal binary format we can do whatever we want. If that's not true, then we can go for a more compact header.

Note that the serialized stream will be different depending on the encoder chosen. If you have the structure:

    $bar = 1;
    @foo[0] = \$bar;
    @foo[1] = "Baz";

The XML stream serializing @foo might look like:

    PerlArray bar Baz PerlInt 1

Only not inevitably horribly broken, invalid, and poorly done. :)

The YAML form might look like

    PMC: foo
      type: PerlArray
      values:
        pmc: bar
        string: Baz
    PMC: bar
      type: PerlInt
      values:
        integer: 1

Once again, modulo my limited and inevitably incorrect YAML knowledge.

So if the header says it's XML, the whole thing is valid XML, while if it doesn't, the rest of the stream doesn't have to be. (Just enough of the header so that an XML processing program can examine the stream and decide that the valid XML chunk at the beginning says that the rest of the stream's not XML.)

Basically we want some nice, fixed (mostly) thing at the head of the stream that doesn't vary regardless of the way the stream is encoded, and XML seemed to be the most restrictive of the forms I know people will clamor for. (I know, it means the stream can't be valid Lisp-style sexprs, but XML's more widespread :)

Dan
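The fixed header is meant to let one reader identify any stream before choosing a decoder. The concrete header examples in this thread did not survive archiving, so this sketch invents one: the element name `serialized`, the `format`/`version` attributes, `StreamHeader` and `parse_header` are all assumptions. It only checks the prefix and extracts the two attribute values; it is not a general XML parser.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical header: <serialized format="NAME" version="N.N"/> followed by
 * the payload in that format. Element and attribute names are invented. */
typedef struct {
    char format[16];
    char version[16];
    const char *payload;      /* first byte after the header */
} StreamHeader;

/* Copy the value of attr="..." found inside [start, end) into out.
 * Returns 0 on success, -1 if missing or too long. */
static int get_attr(const char *start, const char *end, const char *attr,
                    char *out, size_t outsz)
{
    char key[32];
    snprintf(key, sizeof key, " %s=\"", attr);
    const char *p = strstr(start, key);
    if (p == NULL || p >= end)
        return -1;
    p += strlen(key);
    const char *q = memchr(p, '"', (size_t)(end - p));
    if (q == NULL || (size_t)(q - p) >= outsz)
        return -1;
    memcpy(out, p, (size_t)(q - p));
    out[q - p] = '\0';
    return 0;
}

/* Parse the fixed header. Returns 0 on success, -1 if the stream does not
 * begin with a well-formed header. */
int parse_header(const char *stream, StreamHeader *h)
{
    static const char open[] = "<serialized";
    if (strncmp(stream, open, sizeof open - 1) != 0)
        return -1;
    const char *end = strstr(stream, "/>");
    if (end == NULL)
        return -1;
    if (get_attr(stream, end, "format", h->format, sizeof h->format) != 0 ||
        get_attr(stream, end, "version", h->version, sizeof h->version) != 0)
        return -1;
    h->payload = end + 2;
    return 0;
}
```

A caller would then dispatch on `h.format` to the binary, YAML or XML decoder and hand it `h.payload`. The attribute values are quoted here because XML requires a quoted value for every attribute, which is the validity problem the parser error quoted in this message points at.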
Object instantiation
After thinking about this a bit, it became glaringly obvious that the right way to instantiate an object for class "Foo" is to do:

    new P5, .Foo

Or whatever the constant value assigned to the Foo class upon its creation is. When a class is created, it should be assigned a number, and for most things PMC-only classes or full-on HLL classes should behave identically.

Duh.

One more thing down--now to actually make it work out...

Dan
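The rule here is that creating a class assigns it a number, and `new P5, .Foo` instantiates by that number whether Foo is a PMC-only class or a full HLL class. A hypothetical C sketch of such a registry: `register_class`, `class_type`, `new_object`, `generic_new` and the `CLASS_*` constants are invented for illustration, and numbering user classes right after the built-in types is an assumption (which range they should occupy is exactly the open question in the follow-up).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical built-in type numbers, standing in for Parrot's enum_class_* values. */
enum { CLASS_PerlInt, CLASS_PerlArray, CLASS_BUILTIN_MAX };

#define MAX_CLASSES 32

typedef struct { int type; } Object;            /* toy instance */
typedef Object *(*instantiate_fn)(int type);

typedef struct {
    const char *name;
    instantiate_fn instantiate;
} ClassEntry;

static ClassEntry registry[MAX_CLASSES];
static int next_type = CLASS_BUILTIN_MAX;       /* assumption: user classes follow the built-ins */

/* Default constructor shared by every class in this toy. */
Object *generic_new(int type)
{
    Object *o = malloc(sizeof *o);
    if (o != NULL)
        o->type = type;
    return o;
}

void register_builtins(void)
{
    registry[CLASS_PerlInt]   = (ClassEntry){ "PerlInt",   generic_new };
    registry[CLASS_PerlArray] = (ClassEntry){ "PerlArray", generic_new };
}

/* Creating a class hands out its number exactly once. Returns -1 if full. */
int register_class(const char *name, instantiate_fn fn)
{
    if (next_type >= MAX_CLASSES)
        return -1;
    registry[next_type] = (ClassEntry){ name, fn };
    return next_type++;
}

/* One way an assembler could turn ".Foo" into a constant: look it up by name. */
int class_type(const char *name)
{
    for (int t = 0; t < next_type; t++)
        if (registry[t].name != NULL && strcmp(registry[t].name, name) == 0)
            return t;
    return -1;
}

/* "new P5, .Foo": instantiate purely by number, whatever kind of class it is. */
Object *new_object(int type)
{
    if (type < 0 || type >= next_type || registry[type].instantiate == NULL)
        return NULL;
    return registry[type].instantiate(type);
}
```

Once the name is resolved, the instantiation path no longer distinguishes built-in from user-defined classes; the only difference is where each class's number comes from.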
Re: Object freezing
Dan Sugalski <[EMAIL PROTECTED]> wrote:
> On Tue, 21 Oct 2003, Leopold Toetsch wrote:

[ thaw ]

>> This should IMHO be able to create constant PMCs out of metadata, e.g. for subroutine objects. So there should be some means to tell thaw() to create PMC(s) in the constant_pmc_pool.

> There should be a way to put PMCs in the constant pool in general. I was thinking a constant op would work--something like

>    constant Ix, [SP]y

> to make the string or PMC Y a constant at slot X in the constant pool.

You can append items to the constant table. You can't declare existing items as constant, because you can't change the underlying object pool where the object was allocated. This would change the object's address.

> Passing in the PMC header to be filled in also works, though both fail if you want full PMC trees marked as constants, since thawing out a PMC stream may involve creating multiple PMCs. (In which case we might be better off temporarily switching allocation pools at constant creation time, rather than passing in PMCs.)

These are either serious shortcomings or unneeded workarounds. An extra parameter to the relevant vtables can take care of such special cases.

>> I dunno why chill() is superior to dump() or pretty_print(), but the name doesn't really matter.

> The important thing is that it's not a vtable method.

Ah, that's the difference. How shall the system pretty-print dynamically loaded PMCs then, when only a bytecode stream is available? IMHO only a vtable in the class can perform that job.

>> > 1) Freezing at the destruction level may *not* use any additional memory for object traversal

> It puts a number of unpleasant constraints on the core freeze routines.

Constructing the frozen stream definitely needs memory. I don't see how that differs from the memory consumed by a seen hash. Can you please elaborate a bit more on this?

> The only thing that mark does that the general traversal doesn't, in the abstract, is flip the object's live flag.
> Everything else is an optimization of code which we can, if we need, discard.

Yes, mark() can be written in terms of a general traverse, which gets a vtable function (and a data pointer). mark is basically traverse(mark, 0). But this isn't true the other way round. You can't do freeze based on the mark iterator. How do you pass the desired output format?

>> mark() is called constantly in a running interpreter that does nontrivial things. There are shortcuts for scalars, and DOD is highly optimized not to destroy cache coherency. Using mark() also implies backing out my small PMC patches. All the advantages of smaller scalars are gone then.

> All of this stuff for freezing is going to end up killing the small PMC patch anyway, unfortunately, since we're going to have to be able to traverse PMCs in the destruction phase, which means we have to have the means of traversal at hand, as we can't guarantee that we can allocate more PMCs or resize the PMCs' ext data.

A scalar can't contain or reference other PMCs, so it can't be a potential source of freeze loops. Whether I spit out (PMC: Int, ID=xy, value=5) twice or (PMC: ID=other) doesn't really matter. thaw() can take care of duplicates, if needed. Other PMCs have the next_for_GC pointer. That said, I'm not convinced that we can't have a seen hash.

> YHO, in this case, turns out to not consider all the issues involved.

That might very well be true, yes. So it would be fine if you could fill in the gaps.

> Dan

leo
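The (PMC: Int, ID=xy, value=5) versus (PMC: ID=other) records describe a freeze format in which the first visit to an object emits its full contents under an ID and every later visit emits only a back-reference, which also makes cycles terminate. A minimal sketch under assumptions: the record syntax, `Freezer`, `freeze` and the linear `seen` table (standing in for the seen hash mentioned above) are all hypothetical, and the recursion would not meet the no-extra-memory rule for destruction-time freezing.

```c
#include <stdio.h>
#include <string.h>
#include <stddef.h>

#define MAX_CHILDREN 4
#define MAX_SEEN     64

/* Hypothetical toy object; not Parrot's PMC layout. */
typedef struct Obj {
    const char *type;                  /* e.g. "Int" or "Array" */
    int value;                         /* payload, used for leaf objects only */
    size_t nchildren;
    struct Obj *children[MAX_CHILDREN];
} Obj;

typedef struct {
    const Obj *seen[MAX_SEEN];         /* linear stand-in for a seen hash; index = ID */
    size_t nseen;
    char out[512];
    size_t len;
} Freezer;

static void append(Freezer *f, const char *s)
{
    size_t n = strlen(s);
    if (f->len + n < sizeof f->out) {  /* silently drop text that does not fit */
        memcpy(f->out + f->len, s, n + 1);
        f->len += n;
    }
}

static int seen_id(const Freezer *f, const Obj *o)
{
    for (size_t i = 0; i < f->nseen; i++)
        if (f->seen[i] == o)
            return (int)i;
    return -1;
}

/* First visit: emit the full record under a new ID. Later visits: emit only
 * a back-reference, which is also what makes cycles terminate. */
static void freeze_obj(Freezer *f, const Obj *o)
{
    char rec[64];
    int id = seen_id(f, o);

    if (id >= 0) {
        snprintf(rec, sizeof rec, "(ref %d)", id);
        append(f, rec);
        return;
    }
    if (f->nseen == MAX_SEEN) {
        append(f, "(overflow)");
        return;
    }
    id = (int)f->nseen;
    f->seen[f->nseen++] = o;           /* record before recursing into children */

    if (o->nchildren == 0) {
        snprintf(rec, sizeof rec, "(%s id=%d value=%d)", o->type, id, o->value);
        append(f, rec);
        return;
    }
    snprintf(rec, sizeof rec, "(%s id=%d", o->type, id);
    append(f, rec);
    for (size_t i = 0; i < o->nchildren; i++) {
        append(f, " ");
        freeze_obj(f, o->children[i]);
    }
    append(f, ")");
}

void freeze(const Obj *root, Freezer *f)
{
    f->nseen = 0;
    f->len = 0;
    f->out[0] = '\0';
    freeze_obj(f, root);
}
```

Since scalars cannot form loops, a variant could skip the seen lookup for leaf objects and simply emit them each time, as suggested above; this sketch deduplicates everything for simplicity.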
Re: Object instantiation
Dan Sugalski <[EMAIL PROTECTED]> wrote:
> After thinking about this a bit, it became glaringly obvious that the right way to instantiate an object for class "Foo" is to do:

>    new P5, .Foo

> Or whatever the constant value assigned to the Foo class upon its creation is. When a class is created, it should be assigned a number, and for most things PMC-only classes or full-on HLL classes should behave identically.

Yep. The question does arise: which range are class enums in? Intermixed with the enum_class_ numbers?

And what about:

  typeof S0, P0   <=>   classname S0, P0

(IMHO the HLL compiler can't always know which op to use.) And the classname of objects vs. the classname of classes (the classname PMC is in different array slots).

> Dan

leo
Re: Object freezing
On Tue, 21 Oct 2003, Leopold Toetsch wrote:

> Dan Sugalski <[EMAIL PROTECTED]> wrote:
> > On Tue, 21 Oct 2003, Leopold Toetsch wrote:
>
> [ thaw ]
>
> >> This should IMHO be able to create constant PMCs out of metadata, e.g. for subroutine objects. So there should be some means to tell thaw() to create PMC(s) in the constant_pmc_pool.
>
> > There should be a way to put PMCs in the constant pool in general. I was thinking a constant op would work--something like
> >
> >    constant Ix, [SP]y
> >
> > to make the string or PMC Y a constant at slot X in the constant pool.
>
> You can append items to the constant table. You can't declare existing items as constant, because you can't change the underlying object pool where the object was allocated. This would change the object's address.

The object's address should be irrelevant for the constant table. PMCs are referenced in the opstream by table offset. This offset can be into a PMC pool, or into a pointer table. While the pointer table has an extra level of indirection to it, it adds flexibility and takes some pressure off of the ordering of PMCs for instantiated constants.

> > Passing in the PMC header to be filled in also works, though both fail if you want full PMC trees marked as constants, since thawing out a PMC stream may involve creating multiple PMCs. (In which case we might be better off temporarily switching allocation pools at constant creation time, rather than passing in PMCs.)
>
> These are either serious shortcomings or unneeded workarounds. An extra parameter to the relevant vtables can take care of such special cases.

Not necessarily, no. The number of PMCs that are reconstituted for a set of constant frozen PMCs is indeterminate. If we're instantiating bytecode with constant PMCs in it, it's possible the class that backs those PMCs has changed and things instantiate differently than they might otherwise do.
If we've frozen 20 PMCs, all we can guarantee is that when we thaw them we've got at least 20 PMCs, though we may have more, and the extras arguably should be allocated from the constant PMC arena (though not given slots in the constant table) so we can skip scanning the constant arenas for dead objects needing cleanup. > >> I dunno why chill() is superior to dump() or pretty_print(), but the > >> name doesn't really matter. > > > The important thing is that it's not a vtable method. > > Ah, that's the difference. How shall the system pretty-print dynamically > loaded PMCs then, when only a bytecode stream is available? IMHO only a > vtable in the class can perform that job. If the dynamically loaded PMC class doesn't have a backing Parrot class, you can't, and you get the default, relatively primitive dump. > >> > 1) Freezing at the destruction level may *not* use any additional memory > >> > for object traversal > > > It puts a number of unpleasant constraints on the core freeze routines. > > Constructing the frozen stream definitely needs memory. I don't see the > difference from the memory consumed by a seen hash. Can you please elaborate > a bit more on this? Constructing the frozen stream will need some memory, yes. At the moment all it needs is a chunk of random memory and that's it, so we may well fail because we're out of memory. We may, however, have general pool memory handy. We can't guarantee that we have *any* headers, however, since we can legitimately be called from within the destruct phase of a DOD run, which may have been triggered by an out-of-headers condition. Depending on how we flesh things out, freezing may also not require any additional memory--if we relax the requirement for freezing to allow the output to be a PMC, we may be backed directly to a file or other storage that doesn't involve RAM allocation. > > The only thing that mark does that the general traversal doesn't, in the > > abstract, is flip the object's live flag.
Everything else is an > > optimization of code which we can, if we need, discard. > > Yes, mark() can be written in terms of a general traverse, which gets a > vtable function (and a data pointer). mark is basically traverse(mark, > 0). But this isn't true the other way round. You can't do freeze based > on the mark iterator. How do you pass the desired output format? What does the desired output format have to do with any of this? All marking does is put a PMC on the list of PMCs to be visited if it hasn't already been visited, so we get to that PMC at some point as we walk the visited list. In the context of the DOD sweep it also sets the live flag, but we could, if we chose, skip that and use the presence of a non-NULL value in the mark chain address for a PMC as an indicator of liveness. (Though yes, I realize that this means potentially skipping some of the optimizations, so I'm not proposing it as a requirement for the DOD sweep implementation) > >> mark() is called constantly in a running interpreter and does non-trivial > >> things. There ar
Re: Object instantiation
On Tue, 21 Oct 2003, Leopold Toetsch wrote: > Dan Sugalski <[EMAIL PROTECTED]> wrote: > > After thinking about this a bit, it became glaringly obvious that the > > right way to instantiate an object for class "Foo" is to do: > > > new P5, .Foo > > > Or whatever the constant value assigned to the Foo class upon its creation > > is. When a class is created, it should be assigned a number, and for most > > things PMC-only classes or full-on HLL classes should behave identically. > > Yep. The question does arise: which range are the class enums in? Intermixed > with the enum_class_ numbers? Yes, intermixed. I added support a while back to pass in the class number to a PMC class being initialized for this very reason. The compiled-in PMCs get fixed numbers at the beginning because it's easiest, and things get referenced symbolically from there. > And what about: > > typeof S0, P0 <=> classname S0, P0 > > (IMHO the HLL compiler can't always know which op to use.) At this point they're the same thing, I think. I'll need to think on it a bit. > And the classname of objects vs. the classname of classes (the classname > PMC is in different array slots). Last I knew there was something of a fight over what class a class is in. At the moment I'm going to ignore the heck out of things and let the language folks fight over it some more. Dan
Re: Object instantiation
Try:

    new P0, 'std::array'          # PMC
    new P1, 'Perl::PerlArray'     # PMC (or class)
    new P2, 'Package::SomeClass'  # Class

At compile time the string can be converted to an integer enumerator. -Melvin

Leopold Toetsch <[EMAIL PROTECTED]> 10/21/2003 10:24 AM, To: [EMAIL PROTECTED] (Dan Sugalski), cc: [EMAIL PROTECTED], Subject: Re: Object instantiation
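A toy model of the numbering described in this thread: compiled-in PMC classes are registered first and get the low, fixed numbers, classes created later are appended to the same number space (Dan's "intermixed"), and a compiler can resolve a name such as 'Perl::PerlArray' to its number at compile time, as Melvin suggests. This is a standalone sketch only; class_register(), class_lookup() and the fixed-size table are hypothetical names, not Parrot's actual class registry.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical registry: one numbering space shared by compiled-in
 * PMC classes and classes created at runtime. */
#define MAX_CLASSES 64

static const char *class_names[MAX_CLASSES];
static int class_count = 0;

/* Returns the number assigned to name, or -1 if it is unknown. */
int class_lookup(const char *name)
{
    for (int i = 0; i < class_count; i++)
        if (strcmp(class_names[i], name) == 0)
            return i;
    return -1;
}

/* Assigns the next free number to name (or returns its existing one).
 * Registering the compiled-in classes first gives them the low numbers. */
int class_register(const char *name)
{
    int existing = class_lookup(name);
    if (existing >= 0)
        return existing;
    assert(class_count < MAX_CLASSES);
    class_names[class_count] = name;   /* caller keeps name alive */
    return class_count++;
}
```

A compiler seeing `new P2, 'Package::SomeClass'` would call class_lookup() once and emit the integer, so the runtime never does the string lookup.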
Re: Object freezing
On Tue, 21 Oct 2003, Leopold Toetsch wrote: > Albeit I'm not convinced that we can't have a seen hash. A seen hash most likely would: 1) Kill GC performance, especially in pathological cases. The GC should be quiet and invisible. 2) Cause memory usage to double upon a mark run. -Melvin
A less controversial API addition
While we're fighting^Wdiscussing the freezing system, there's a simpler thing we need to have added in. We need an API entry point that allows C code to invoke a sub/method PMC. This needs to be done both for the embedding API (we'll wrap it) where the embedding app will call in, but also for things like vtable functions where the actual function is parrot bytecode. Calling straight into runops looks a little too simplistic, but this'd be a good place to poke around and see what you can come up with. Dan
Re: Object instantiation
On Oct 21, 2003, at 7:14 AM, Dan Sugalski wrote: After thinking about this a bit, it became glaringly obvious that the right way to instantiate an object for class "Foo" is to do: new P5, .Foo Or whatever the constant value assigned to the Foo class upon its creation is. When a class is created, it should be assigned a number, and for most things PMC-only classes or full-on HLL classes should behave identically. Duh. That makes sense. What I keep wondering is what about things with the semantics of Perl5, in which new objects aren't instantiated directly--already-allocated things later become associated with a class. This doesn't seem quite like a case of morphing, since for instance a Perl array can be blessed into a class, but it's still a Perl array. JEff
Re: Object freezing
Dan Sugalski <[EMAIL PROTECTED]> writes: [...] > > > The chill and warm runtime methods take a PMC or a frozen representation > > > of a PMC (respectively) and provide a human readable version of that PMC. > > > > I dunno why chill() is superior to dump() or pretty_print(), but the > > name doesn't really matter. > > The important thing is that it's not a vtable method. It's a function that > belongs in the freeze/thaw API as it's just an alternate encoding or > decoding. (Arguably it ought not be a separate API entry at all and just > another encoding scheme, but that requires transcoding serialization > forms, and I'd rather not get into that) This is really just a naming problem. Dan wants to call the vtable function freeze and have different encodings for all kinds of dumping/pretty_printing/marking. Leo calls the function traverse and controls it by callbacks. My personal opinion on this naming problem is: traverse describes more generally what the function does. Marking live objects by freezing them in an encoding that returns nothing just sounds plain wrong. Freeze should just be a user of the general traverse function. (And this also means it is not a vtable function.)

    STRING *freeze(PMC *pmc, whatever *encoding)
    {
        return (STRING *)pmc->vtable->traverse(pmc, freeze_callbacks, encoding);
    }

Or the freeze encodings could even be callback sets: freeze_xml, freeze_yaml, freeze_binary, whatever. >>> 1) Freezing at the destruction level may *not* use any additional memory >>> for object traversal >> >> What is "Freezing at the destruction level"? Is this in any way related to >> destruction ordering? > > No. There are some valid cases where an object, after having been declared > dead by the DOD, wants to serialize itself. Persistent object stores > apparently do this, and it makes a certain amount of sense--when the > object goes out of scope the current state is flushed to disk. This is a question of what is allowed at destruction time.
You don't want to allow memory allocation, but allow freezing. That gets hard, because you need to at least allocate the STRING where you want to put your frozen stream. > It puts a number of unpleasant constraints on the core freeze routines. > User code can violate them and take the consequences, but we can't. We can call (hopefully) arbitrary user code in destruction routines. So this argument does not count. >>> Note that I do *not* want to have multiple object traversal systems in >>> parrot! We have one for DOD, and proposals have ranged upwards from there. >>> No. That is *not* happening--the chance for error is significant, the >>> side-effects of the error annoying and tough to track down for complex >>> cases (akin to the trouble with tracking down GC issues), and just not >>> necessary. (Perhaps desirable for speed/space reasons, but desirable >>> isn't necessary) Freeze is just another traversal method. Just calling it freeze instead of traverse does not change this fact. You can limit the power of encodings, but this does not change the fact that you need to walk all children. >> DOD's mark() routine has different requirements than a general >> traverse() for freeze(), chill(), clone(), and destruction ordering. >> Using just mark() will have these side effects that you want to avoid. > > The only thing that mark does that the general traversal doesn't, in the > abstract, is flip the object's live flag. Everything else is an > optimization of code which we can, if we need, discard. mark() may be implemented in the form of a general traverse. Let the profiler decide if a special-purpose mark() or a general traverse is better. >> A general traverse() can be depth first or breadth first, mark() isn't >> required to have any specific ordering as long as it sets live bits >> everywhere.
> > I'm pretty sure that with a singly linked list we can get a generally > properly-ordered flattened tree without having to do an insane number of > passes across the dead object store. I may be incorrect in this, but I > don't think so, and for our purposes the live bit can be safely ignored if > we end up setting it, though potentially with another pass over the dead > store, which may end up prohibitively expensive. We'll see. I'm pretty sure that a singly linked list is not enough. I have done some experiments with this. One pass may be enough, but you need to keep track of the tree traversal and of the partially ordered list. These two things don't play well together. Maybe this can be cut down to two lists, or one list and one bit per PMC. >> mark() is called constantly in a running interpreter and does non-trivial >> things. There are shortcuts for scalars, DOD is highly optimized >> not to destroy cache coherency. Using mark() also implies backing out >> my small PMC patches. All the advantages of smaller scalars are gone >> then. > > All of this stuff for freezing is going to end up killing the small PMC > patch anyway, unfortunately, since we're going
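Juergen's suggestion, freeze as just one client of a general traverse driven by callback sets, can be sketched in a few lines of C. This is a toy model, not Parrot code: ToyPMC, Callbacks, traverse(), mark_callbacks and freeze_callbacks are hypothetical names. The traversal is kept iterative, using an intrusive "next" link in the spirit of next_for_GC, so it never recurses and visits each object once even when the graph contains a cycle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Toy object: an int payload plus up to two child references. */
typedef struct ToyPMC {
    int id;
    int value;
    int live;                    /* set by the mark callback set          */
    int queued;                  /* traversal bookkeeping: already listed */
    struct ToyPMC *children[2];
    struct ToyPMC *next;         /* intrusive worklist link               */
} ToyPMC;

/* A callback set decides what "visiting" an object means. */
typedef struct {
    void (*visit)(ToyPMC *pmc, void *data);
} Callbacks;

/* Iterative traversal: each reachable object is appended to the list
 * once and visited in list (breadth-first) order. No recursion. */
void traverse(ToyPMC *root, const Callbacks *cb, void *data)
{
    ToyPMC *tail = root;
    root->queued = 1;
    root->next = NULL;
    for (ToyPMC *cur = root; cur != NULL; cur = cur->next) {
        cb->visit(cur, data);
        for (int i = 0; i < 2; i++) {
            ToyPMC *kid = cur->children[i];
            if (kid != NULL && !kid->queued) {
                kid->queued = 1;
                kid->next = NULL;
                tail->next = kid;
                tail = kid;
            }
        }
    }
    /* Reset bookkeeping so the next traversal starts clean. */
    for (ToyPMC *cur = root; cur != NULL; cur = cur->next)
        cur->queued = 0;
}

/* mark: the callback only flips the live flag. */
static void mark_visit(ToyPMC *pmc, void *data)
{
    (void)data;
    pmc->live = 1;
}

/* freeze: the callback appends an "id:value;" record to a caller buffer. */
typedef struct { char *buf; size_t len, cap; } Sink;

static void freeze_visit(ToyPMC *pmc, void *data)
{
    Sink *s = data;
    int n = snprintf(s->buf + s->len, s->cap - s->len, "%d:%d;",
                     pmc->id, pmc->value);
    assert(n >= 0 && (size_t)n < s->cap - s->len);
    s->len += (size_t)n;
}

static const Callbacks mark_callbacks = { mark_visit };
static const Callbacks freeze_callbacks = { freeze_visit };
```

Note that this sketch only covers the traversal order; it says nothing about Dan's concern that the freeze path must not allocate during the destruct phase (here the caller supplies the buffer).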
Re: Object freezing
On Tue, 21 Oct 2003, Juergen Boemmels wrote: > Dan Sugalski <[EMAIL PROTECTED]> writes: > > [...] > > > > > The chill and warm runtime methods take a PMC or a frozen representation > > > > of a PMC (respectively) and provide a human readable version of that PMC. > > > > > > I dunno why chill() is superior to dump() or pretty_print(), but the > > > name doesn't really matter. > > > > The important thing is that it's not a vtable method. It's a function that > > belongs in the freeze/thaw API as it's just an alternate encoding or > > decoding. (Arguably it ought not be a separate API entry at all and just > > another encoding scheme, but that requires transcoding serialization > > forms, and I'd rather not get into that) > > This is really just a naming problem. Dan wants to call the > vtable function freeze and have different encodings for all kinds of > dumping/pretty_printing/marking. Leo calls the function traverse and > controls it by callbacks. It's more than just a naming issue (or if it is, then traverse is the wrong name). The traversal must be done externally, since we can't be recursive. Mark puts a PMC on the list of PMCs to be frozen. Freeze dumps the PMC being frozen (and *only* that PMC) to the stream. The freeze routine for a PMC must mark (generally indirectly by calling the "add this pmc to the stream" API function) any PMCs that it needs to be in the stream. The external function that traverses this list of PMCs to be dumped is responsible for making sure there are no duplicates--the easiest way is to do what the DOD sweep does and note that a PMC has already been put on the list and thus not mark it. Mark and freeze are separate, though related by the subsystems that use them. > This is a question of what is allowed at destruction time. You don't > want to allow memory allocation, but allow freezing. That gets hard, > because you need to at least allocate the STRING where you want to put > your frozen stream.
It's more a question of what we require the engine to do, vs what user code is allowed to do. A user program is allowed to write code that can fail at destroy time; however, the infrastructure we provide (including, in this case, freezing--while I don't like it there's no choice) can't fail that way. It's the reason the DOD and GC systems don't allocate memory (or didn't--they shouldn't) when they run. The engine's not allowed to have failure modes in critical sections. Basically the engine may fail because of user code, but user code can't fail because of the engine. It makes some things annoyingly restrictive, but some problems are inherently annoyingly restrictive. > > It puts a number of unpleasant constraints on the core freeze routines. > > User code can violate them and take the consequences, but we can't. > > We can call (hopefully) arbitrary user code in destruction routines. So > this argument does not count. See above. User code can fail, we can't. > >> A general traverse() can be depth first or breadth first, mark() isn't > >> required to have any specific ordering as long as it sets live bits > >> everywhere. > > > > I'm pretty sure that with a singly linked list we can get a generally > > properly-ordered flattened tree without having to do an insane number of > > passes across the dead object store. I may be incorrect in this, but I > > don't think so, and for our purposes the live bit can be safely ignored if > > we end up setting it, though potentially with another pass over the dead > > store, which may end up prohibitively expensive. We'll see. > > I'm pretty sure that a singly linked list is not enough. I have done > some experiments with this. One pass may be enough, but you need to > keep track of the tree traversal and of the partially ordered > list. These two things don't play well together. Maybe this can be cut > down to two lists, or one list and one bit per PMC.
There may be a little more infrastructure--I've not dug out the algorithms books and gone hunting. The common algorithms tend to cheat by just dodging the whole problem. :) > Destruction ordering just enforces that small PMCs can't have > destructors. If they have destructors they must be big, big enough to > construct the ordered list of objects without allocating any memory. They can't have destructors *or* refer to PMCs that may either have a destructor or (indirectly) refer to a PMC that has a destructor. If we have 2 PMCs with destructors they may be connected by a chain of 100 PMCs that don't, but we still need to walk that chain. > If you think about it: The call to the destructors is done after > free_unused_pobjects completed. The memory of the objects without > destructors is already freed. Then we reorder. This can't happen, and it didn't use to happen--if that's how it works now then there's a bug in the DOD system. *All* destructors *must* be called before any headers are collected. > >> While freeze() and friends have to pull each PMC into the cache, just > >> setting the
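The split Dan describes, where an external driver walks a list while a PMC's freeze routine writes only its own data and, for each child, calls an "add this PMC to the stream" function that queues it once and hands back a reference, might look like the following standalone sketch. Obj, FreezeCtx, freeze_add(), freeze_simple() and freeze_all() are hypothetical names, and the "#n{...}" text format is invented purely for illustration. The output buffer is caller-supplied, so the sketch itself does no allocation.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

typedef struct Obj Obj;

/* Hypothetical freeze context: the to-do list plus a caller-owned buffer. */
typedef struct {
    Obj *head, *tail;      /* objects queued for freezing, in stream order */
    int next_id;           /* last stream ID handed out                    */
    char *out;
    size_t len, cap;
} FreezeCtx;

struct Obj {
    int value;
    Obj *ref;                                  /* one child reference or NULL */
    int stream_id;                             /* 0 = not yet queued          */
    Obj *next_to_freeze;                       /* intrusive list link         */
    void (*freeze)(Obj *self, FreezeCtx *ctx); /* per-type freeze routine     */
};

/* Appends formatted text to the caller-owned output buffer. */
static void emit(FreezeCtx *c, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(c->out + c->len, c->cap - c->len, fmt, ap);
    va_end(ap);
    assert(n >= 0 && (size_t)n < c->cap - c->len);
    c->len += (size_t)n;
}

/* "Add this PMC to the stream": queue obj once, return its stream ID. */
int freeze_add(FreezeCtx *c, Obj *obj)
{
    if (obj->stream_id == 0) {
        obj->stream_id = ++c->next_id;
        obj->next_to_freeze = NULL;
        if (c->tail != NULL)
            c->tail->next_to_freeze = obj;
        else
            c->head = obj;
        c->tail = obj;
    }
    return obj->stream_id;
}

/* A type's freeze routine: dumps only its own data. The child is written
 * as a "#n" reference and queued to be dumped later in the stream. */
static void freeze_simple(Obj *self, FreezeCtx *c)
{
    emit(c, "#%d{v=%d", self->stream_id, self->value);
    if (self->ref != NULL)
        emit(c, ",ref=#%d", freeze_add(c, self->ref));
    emit(c, "}");
}

/* External driver: walks the list; no recursion, no duplicates. */
void freeze_all(FreezeCtx *c, Obj *root)
{
    freeze_add(c, root);
    for (Obj *o = c->head; o != NULL; o = o->next_to_freeze)
        o->freeze(o, c);
}
```

With a two-object cycle (a refers to b, b refers back to a) the stream comes out as `#1{v=1,ref=#2}#2{v=2,ref=#1}`: each object is dumped exactly once, and the loop survives as a reference rather than as recursion.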
Re: Object freezing
On Oct 21, 2003, at 5:53 AM, Dan Sugalski wrote: Note that I do *not* want to have multiple object traversal systems in parrot! We have one for DOD, and proposals have ranged upwards from there. No. That is *not* happening--the chance for error is significant, the side-effects of the error annoying and tough to track down for complex cases (akin to the trouble with tracking down GC issues), and just not necessary. (Perhaps desirable for speed/space reasons, but desirable isn't necessary) DOD's mark() routine has different requirements than a general traverse() for freeze(), chill(), clone(), and destruction ordering. Using just mark() will have these side effects that you want to avoid. The only thing that mark does that the general traversal doesn't, in the abstract, is flip the object's live flag. Everything else is an optimization of code which we can, if we need, discard. I don't believe that is quite true. There are a couple of important differences between traversal-for-GC and traversal-for-serialization, which will be a challenge to reconcile in the one-true-traversal: 1) Serialization traversals need to "take note" of logical int and float slots (e.g., as used in perlint.pmc and perlnum.pmc) so that they can be serialized, but for GC you only need to worry about GC-able objects. It's difficult to come up with a reasonable callback which can take either int, float, or PObj arguments. 2) It's reasonable for an object to have a pointer to some sort of cache object, which is not logically part of the object, and shouldn't be serialized along with it. This needs to be traversed for GC purposes, but needs to not be traversed for serialization. (Situations such as this--physical but not logical membership--are the origin of the "mutable" keyword in C++.) 3) Traversal for GC needs to do loop detection, but can just stop going down a particular branch of the object graph once it encounters an object it's seen before.
Serialization traversals would need to have a way, upon encountering an object seen before, to include in the serialization stream an indication that the current object has already been serialized, and enough information to enable deserialization code to go find it and recreate the loop. The only options I see here are either for serialization to involve the allocation of unbounded additional memory, or to expand the PObj structure to include a slot for a UUID which can be used as a back-reference in a stream, or to have serialization break loops (so that deserialized structures never have loops). I'm not 100% convinced that a single approach can't handle both applications, but it's looking as though their requirements are different enough that it may not work well. Three other questions/concerns/comments/issues: 1) I assume that ultimately a user-space iterator would end up calling the traversal code, right? If so, you can't reasonably mandate that only one traversal be in progress at one time. That would be the canonical way to compare two ordered collections--get an iterator for each, and compare element-by-element. 2) I don't see it as a huge problem that serialization code could end up creating additional objects if called from a destroy() method. (Though yes, it would be a problem for GC infrastructure code too.) I say that for two reasons: (a) destroy() methods can really do anything they want, and if that task involves allocating additional memory, that just makes it a risk to perform that task in a destroy() method--it may fail due to out-of-memory conditions. I think that Java design experts tend to argue against doing things like serialization in finalization methods. It sounds elegant, but it's problematic. One reason for this is that you tend to want to serialize structures as a whole, not piece-by-piece as they are garbage-collected.
The second reason it is not always a problem in practice is that (b) a DOD run may be triggered by an out-of-headers condition, but that doesn't mean that an additional chunk of memory for headers can't be allocated. If it can't be, then this is no more problematic than it would be in other user code--think of the case where I have some big tree of objects I want to make some sort of copy of, with the intention of then letting go of the original when I'm done. I'll be freeing up headers at the end of that process, but if I run out of memory part-way through, then I'm just stuck. 3) I assume that not every object is expected to be serializable? For instance, an object representing a filehandle can't really be serialized in a useful way. So I'm not sure of what sort of "fidelity" is required of a generic serialization method--that is, how similar a deserialized structure is guaranteed to be to the original. JEff
Re: A less controversial API addition
Dan Sugalski <[EMAIL PROTECTED]> wrote: > While we're fighting^Wdiscussing the freezing system, there's a simpler > thing we need to have added in. We need an API entry point that allows C > code to invoke a sub/method PMC. What about params? I already thought about that a bit, and when looking at extend.c:Parrot_call(), it seems that this needs another thunk (the reverse of NCI) that sets up the needed registers depending on a function signature. > Dan leo
Re: Object freezing
Dan Sugalski <[EMAIL PROTECTED]> wrote: > On Tue, 21 Oct 2003, Leopold Toetsch wrote: >> You can append items to the constant table. You can't declare existing >> items as constant, because you can't change the underlying object pool, >> where the object was allocated. This would change the object's address. > The object's address should be irrelevant for the constant table. PMCs are > referenced in the opstream by table offset. Only in the opstream, but not when such PMCs are then used, i.e. when a constant Sub PMC is referred to in the global stash. >> Ah, that's the difference. How shall the system pretty-print dynamically >> loaded PMCs then, when only a bytecode stream is available? IMHO only a >> vtable in the class can perform that job. > If the dynamically loaded PMC class doesn't have a backing Parrot class, > you can't, and you get the default, relatively primitive dump. I was thinking of plain PMCs that were loaded to provide some special functionality. Parrot doesn't know anything about these, so it will be unable to pretty-print the opstream. Loaded classes OTOH are based on ParrotClass and should be printable. >> Constructing the frozen stream definitely needs memory. I don't see the >> difference from the memory consumed by a seen hash. Can you please elaborate >> a bit more on this? > Constructing the frozen stream will need some memory, yes. At the moment > all it needs is a chunk of random memory and that's it, so we may well > fail because we're out of memory. By the same argument I can say that (destructor-level) freezing will need *system* memory for the stream plus the hash, so we may well fail. I don't see any difference. The hash doesn't have to be a "fat" PerlHash. If we don't want a hash, one bit in the object's arena flags should be able to serve the same function: this PMC has already been serialized. Anyway, what would freezing at destructor level look like from an HLL POV?
Shortly before, there ought to be a full DOD run (or all possible garbage would be frozen). At that point, the number of still-active PMCs, i.e. those to be serialized, is known (an upper bound is always known). So it should be possible to work around such constraints. > ... We may, however, have general pool > memory handy. We can't guarantee that we have *any* headers, however, > since we can legitimately be called from within the destruct phase of a > DOD run, which may have been triggered by an out-of-headers condition. I really doubt that thawing a program (or some of its data) that died in the middle of some non-trivial operation because it ran out of headers will be of any use. >> A scalar can't contain or reference other PMCs, so it can't be a >> potential source of freeze loops. Whether I now spit out (PMC: Int, ID=xy, >> value=5) twice or (PMC: ID=other) doesn't really matter. thaw() can take >> care of duplicates, if needed. Other PMCs have the next_for_GC pointer. > Thaw can only properly take care of duplicates if the duplicates are > correctly indicated in the serialization stream. Identical end-values are > *not* sufficient to note multiple references to the same PMC. Sorry, I was thinking of PMC IDs, which are the addresses of the frozen PMCs. >> Albeit I'm not convinced that we can't have a seen hash. > It takes an insane amount of memory and requires header allocation. A PerlHash takes more memory, yes. But we just need a hash of PMC addresses, or a bit inside the object's arena. We have several different traverse-like functions:
* mark (DOD): called frequently, should get all possible speed
* freeze (destruction): no speed issues, can't take Parrot resources
* freeze (user): rarely used, can take resources
* destruction ordering: only active objects to be visited
* clone: can take resources; thaw(freeze()), or separate vtable
* dump/pretty-print: no vtable?
* thaw: special class method, is different anyway
The first 2 critical items have diametrically opposed usage patterns. That really doesn't suggest that they should be implemented with the same scheme. > ... We > can't allocate headers, and the memory requirements are extreme. Been > there, done that, it was a bad idea. Consider this arbitrarily and > unconditionally ruled out if you're unwilling to believe the stats that > were previously posted about this. You are speaking of Storable.pm? I'm not aware of any stats regarding that. But I'm not thinking of using a full-fledged hash for such a special case. > Dan leo
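A minimal sketch of the "bit inside the object's arena" alternative to a seen hash, assuming a fixed-size arena whose bitmap is allocated together with its slots, so marking an object as already serialized needs no memory at freeze time. Arena, Header, seen_test_and_set() and seen_clear() are hypothetical stand-ins, not Parrot's actual arena structures.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define ARENA_SLOTS 256

/* Stand-in for a PMC header living in an arena slot. */
typedef struct { int payload; } Header;

/* Hypothetical arena: the "already serialized" bitmap is allocated
 * together with the slots, so freezing needs no extra memory. */
typedef struct {
    Header slots[ARENA_SLOTS];
    unsigned char seen[(ARENA_SLOTS + CHAR_BIT - 1) / CHAR_BIT];
} Arena;

/* Returns 1 if h was already marked, else marks it and returns 0. */
int seen_test_and_set(Arena *a, const Header *h)
{
    assert(h >= a->slots && h < a->slots + ARENA_SLOTS);
    size_t idx = (size_t)(h - a->slots);
    unsigned char bit = (unsigned char)(1u << (idx % CHAR_BIT));
    unsigned char *byte = &a->seen[idx / CHAR_BIT];
    int was_set = (*byte & bit) != 0;
    *byte |= bit;
    return was_set;
}

/* Clears every seen bit, e.g. once a freeze has finished. */
void seen_clear(Arena *a)
{
    memset(a->seen, 0, sizeof a->seen);
}
```

The cost is one bit per slot whether or not a freeze is running, which is the trade-off against a hash that only exists during the freeze.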
Re: Object freezing
Melvin Smith <[EMAIL PROTECTED]> wrote: >> Albeit I'm not convinced that we can't have a seen hash. > A seen hash most likely would: > 1) Kill GC performance especially in pathological cases. The GC > should be quiet and invisible. > 2) Cause memory usage to double upon a mark run. GC isn't involved. A mark() run sets the live bit in the PMC's arena. No hash is needed in either case. I have stated several times that I don't like mixing mark() and the other traverse functions. > -Melvin leo
Re: A less controversial API addition
On Tue, 21 Oct 2003, Leopold Toetsch wrote: > Dan Sugalski <[EMAIL PROTECTED]> wrote: > > While we're fighting^Wdiscussing the freezing system, there's a simpler > > thing we need to have added in. We need an API entry point that allows C > > code to invoke a sub/method PMC. > > What about params? I already thought about that a bit, and when looking > at extend.c:Parrot_call(), it seems that this needs another thunk (the > reverse of NCI) that sets up the needed registers depending on a function > signature. We probably need two API entries. One, a vararg version, that just takes a bunch of PMC pointers (or some sort of (ick) parameter signature), and a second that assumes you've set the registers up properly already. Dan
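A rough sketch of the two entry points Dan proposes: one that assumes the caller has already set up the registers, and a varargs wrapper that takes PMC pointers and loads them before delegating to the first. Everything here is hypothetical (ToyInterp, ToySub, call_sub_prepared(), call_sub_pmcs(), first_arg()), and FIRST_ARG_REG is an arbitrary assumption, not Parrot's calling convention. The "sub" is a plain C function standing in for bytecode.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

#define NUM_PREGS 32
#define FIRST_ARG_REG 5    /* arbitrary choice for this sketch */

typedef struct ToyPMC { int value; } ToyPMC;

/* Toy interpreter state: just a PMC register file and an arg count. */
typedef struct ToyInterp {
    ToyPMC *preg[NUM_PREGS];
    int argc;
} ToyInterp;

/* A "sub": a C function standing in for a bytecode sub/method PMC. */
typedef ToyPMC *(*ToySub)(ToyInterp *interp);

/* Entry point 2: the caller has already set up the registers. */
ToyPMC *call_sub_prepared(ToyInterp *interp, ToySub sub)
{
    return sub(interp);
}

/* Entry point 1: varargs wrapper taking argc PMC pointers, which it
 * loads into consecutive registers before making the call. */
ToyPMC *call_sub_pmcs(ToyInterp *interp, ToySub sub, int argc, ...)
{
    va_list ap;
    assert(argc >= 0 && FIRST_ARG_REG + argc <= NUM_PREGS);
    va_start(ap, argc);
    for (int i = 0; i < argc; i++)
        interp->preg[FIRST_ARG_REG + i] = va_arg(ap, ToyPMC *);
    va_end(ap);
    interp->argc = argc;
    return call_sub_prepared(interp, sub);
}

/* Example sub: returns its first argument (or NULL if there is none). */
ToyPMC *first_arg(ToyInterp *interp)
{
    return interp->argc > 0 ? interp->preg[FIRST_ARG_REG] : NULL;
}
```

Building the varargs version on top of the "registers already set up" version keeps a single call path, which matches the layering Dan describes.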
Re: Object freezing
On Oct 21, 2003, at 6:12 AM, Dan Sugalski wrote: On Tue, 21 Oct 2003, Elizabeth Mattijsen wrote: At 08:21 -0400 10/21/03, Dan Sugalski wrote: I find the notion of an "XML header" a bit confusing, given Dan's statement to the effect that it was a throw to XML folks. I think anything "XML folks" will be interested in will entail *wrapping* stuff, not *prefixing* it. Nah, I expect what they'll want is for the entire data stream of serialized objects to be in XML format. Which is fine--they can have that. (It's why I mentioned the serialization routines can be overridden) For an XML stream the header might be version=1.0> with the rest of the stream in XML. A YAML stream would start with the rest in YAML, and the binary format as . Or something like that, modulo actual correct XML. If you want that to look like valid XML, it would have to be different: error: Specification mandate value for attribute parrot ^ Better in my opinion would be something like: data yadda yadda yadda I'm not an XML guy, and I'm making all this up as I go along. If that's better, fine with me. :) Yeah, you can't put extra things in the " So are we talking about a header or a wrapper? If it is really a header, it's not XML and then it's pretty useless from an XML point of view. We're talking about the first thing in a file (or stream, or whatever). I was under the impression that XML files should be entirely composed of valid XML, hence the need for the stream type marker being valid XML. No, XML _documents_ must be XML, but that doesn't mean that document == file. (For another example where this comes up, consider an XML document transmitted over HTTP. There are headers and other textual things in the stream along with the xml, and it's the HTTP protocol which determines where the document begins and ends, not xml's.)
You can certainly have more than one XML document in a single file, but something needs to decide where an xml document begins and ends, and hand only that data to the xml parser. YAML doesn't care as much, so far as I understand, and for our own internal binary format we can do whatever we want. If that's not true, then we can go for a more compact header. Yes, if you want the whole serialized stream to count as a well-formed xml document, then you can't put arbitrary binary data in the middle. See my previous post for why. Once again, modulo my limited and inevitably incorrect YAML knowledge. So if the header says it's XML the whole thing is valid XML, while if it doesn't the rest of the stream doesn't have to be. (Just enough of the header so that an XML processing program can examine the stream and decide that the valid XML chunk at the beginning says that the rest of the stream's not XML) Most XML parsers aren't expecting to handle this. That is, there's no such thing as a valid half-of-an-xml document, from the perspective of the xml spec, and in many cases you'd have trouble getting a parser to stop before hitting something problematic and blowing up. In other words, you can't rely on an xml parser to process something which starts out looking like xml, but isn't. Basically we want some nice, fixed (mostly) thing at the head of the stream that doesn't vary regardless of the way the stream is encoded, and XML seemed to be the most restrictive of the forms I know people will clamor for. (I know, it means the stream can't be valid Lisp-style sexprs, but XML's more widespread :) Yeah, if you're just needing to tag the stream with a label to indicate the type plus a version number, then xml's on the one hand overkill and on the other hand not necessarily a big help to xml proponents. JEff
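One way to supply the "something needs to decide where an xml document begins and ends" that Jeff mentions, without requiring the whole stream to be XML, is to length-prefix each document, roughly as HTTP's Content-Length header does. The "decimal length, newline, payload" framing below is a made-up format for illustration only; frame_write() and frame_next() are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical framing: "<decimal length>\n" followed by exactly that
 * many payload bytes. Returns the number of bytes written to out. */
size_t frame_write(char *out, size_t cap, const char *doc)
{
    size_t n = strlen(doc);
    int k = snprintf(out, cap, "%zu\n", n);
    assert(k > 0 && (size_t)k < cap && (size_t)k + n <= cap);
    memcpy(out + k, doc, n);   /* overwrites snprintf's NUL */
    return (size_t)k + n;
}

/* Reads one frame starting at *pos and points *doc/*len at its payload,
 * which can then be handed to an XML parser on its own.
 * Returns 1 on success, 0 at end of buffer or on a malformed frame. */
int frame_next(const char *buf, size_t buflen, size_t *pos,
               const char **doc, size_t *len)
{
    size_t p = *pos, n = 0;
    if (p >= buflen)
        return 0;
    while (p < buflen && buf[p] >= '0' && buf[p] <= '9')
        n = n * 10 + (size_t)(buf[p++] - '0');
    if (p >= buflen || buf[p] != '\n' || n > buflen - p - 1)
        return 0;
    *doc = buf + p + 1;
    *len = n;
    *pos = p + 1 + n;
    return 1;
}
```

The reader never looks inside a payload, so a frame can hold XML, YAML or binary data equally well.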
Re: Object freezing
> Yeah, if you're just needing to tag the stream with a label to indicate > the type plus a version number, then xml's on the one hand overkill and > on the other hand not necessarily a big help to xml proponents. So, in a nutshell, throwing an XML format type tag at the beginning buys us nothing regardless of whether it's an XML stream or not? In that case, nuts to that. It's already terribly obvious I'm going to mess it up if I try, so we'll just skip it and move on to the next headache. :) (FWIW, with respect to binary data in the output stream--if an encoded format doesn't allow binary data then the encoder is responsible for changing it to a non-binary format. So for XML and YAML (and any other text encoding format, I expect) that'll likely be a base64 encoding or something) Dan
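For text encodings like XML or YAML, the encoder would have to turn binary chunks into text, and base64 is the usual choice. Below is a plain C encoder using the standard RFC 4648 alphabet with '=' padding. base64_encode() and base64_encoded_size() are illustrative names, not part of any Parrot API.

```c
#include <assert.h>
#include <stddef.h>

static const char b64_alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Buffer size needed for n input bytes, including the terminating NUL. */
size_t base64_encoded_size(size_t n)
{
    return 4 * ((n + 2) / 3) + 1;
}

/* Encodes n bytes from in into out; out must hold base64_encoded_size(n)
 * bytes. Returns the number of characters written (excluding the NUL). */
size_t base64_encode(const unsigned char *in, size_t n, char *out)
{
    size_t o = 0, i = 0;
    for (; i + 2 < n; i += 3) {            /* full 3-byte groups */
        unsigned long v = ((unsigned long)in[i] << 16)
                        | ((unsigned long)in[i + 1] << 8)
                        | in[i + 2];
        out[o++] = b64_alphabet[(v >> 18) & 63];
        out[o++] = b64_alphabet[(v >> 12) & 63];
        out[o++] = b64_alphabet[(v >> 6) & 63];
        out[o++] = b64_alphabet[v & 63];
    }
    if (i < n) {                           /* 1 or 2 trailing bytes */
        unsigned long v = (unsigned long)in[i] << 16;
        if (i + 1 < n)
            v |= (unsigned long)in[i + 1] << 8;
        out[o++] = b64_alphabet[(v >> 18) & 63];
        out[o++] = b64_alphabet[(v >> 12) & 63];
        out[o++] = (i + 1 < n) ? b64_alphabet[(v >> 6) & 63] : '=';
        out[o++] = '=';
    }
    out[o] = '\0';
    return o;
}
```

The output grows by a third, which is the price of keeping an XML or YAML stream free of raw binary.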
Re: Object freezing
On Tue, 21 Oct 2003, Jeff Clites wrote: > I don't believe that is quite true. There are a couple of important > differences between traversal-for-GC and traversal-for-serialization, > which will be a challenge to reconcile in the one-true-traversal: > > 1) Serialization traversals need to "take note" of logical int and > float slots (e.g., as used in perlint.pmc and perlnum.pmc) so that they > can be serialized, but for GC you only need to worry about GC-able > objects. It's difficult to come up with a reasonable callback which can > take either int, float, or PObj arguments. That's not an issue for us. A PMC is responsible for serializing itself, so if it's got a string, float, or int component then it must take responsibility for dumping those components to the serialization stream. Basically PMCs *must* dump themselves out completely, but the engine provides support to defer dumping of PMCs so that we don't get into recursive dumping and blow the stack, as well as to make sure that we properly maintain multiple references to the same PMC. > 2) It's reasonable for an object to have a pointer to some sort of > cache object, which is not logically part of the object, and shouldn't > be serialized along with it. This needs to be traversed for GC > purposes, but needs to not be traversed for serialization. (Situations > such as this--physical but not logical membership--are the origin of > the "mutable" keyword in C++.) That's what custom mark routines are for, though it does argue that we should have a separate mark for freezing. > 3) Traversal for GC needs to do loop detection, but can just stop going > down a particular branch of the object graph once it encounters an > object it's seen before.
Serialization traversals would need to have a > way, upon encountering an object seen before, to include in the > serialization stream an indication that the current object has already > been serialized, and enough information to enable deserialization code > to go find it and recreate the loop. The only options I see here are > either for serialization to involve the allocation of unbounded > additional memory, or to expand the PObj structure to include a slot > for a UUID which can be used as a back-reference in a stream, or to > have serialization break loops (so that deserialized structures never > have loops). The loop breaking needs for freezing are the same as for DOD sweeps, though with freezing we're at an advantage as we know where the tree starts. In all cases (I made sure this was in the example, but it might not have been clear) we only include a marker for child PMCs in the parent PMC's serialized data, and serialize the child PMCs later on in the stream. So if PMC1 has a pointer to PMC2, the stream has PMC1 dumped to it but in the place of PMC2's data is just a marker saying "refer to PMC2 here" and then after the end of PMC1's data in the stream we dump out PMC2's data. > 1) I assume that ultimately a user-space iterator would end up calling > the traversal code, right? If so, you can't reasonably mandate that > only one traversal be in progress at one time. That would be the > canonical way to compare two ordered collections--get an iterator for > each, and compare element-by-element. While it could, I think it's infeasible to use the serialization iterator for normal user-space iteration, if only because the limits that have to be on the serialization iterator for use in restricted circumstances are a bit onerous for general use.
I'm not entirely sure that parrot's going to provide this form of iteration as it stands anyway--it's not necessary for the core language support and while it'd be really useful there's a limit to the number of Big Problems I'm up to solving. (Having said that there may, probably will, be enough introspective capabilities to do this without engine support) > 2) I don't see it as a huge problem that serialization code could end > up creating additional objects if called from a destroy() method. User code may, parrot may not. The reasons are twofold--while parrot will let you shoot yourself in the foot, it provides the gun, not the foot. It should also be possible for carefully written destroy methods to serialize but not eat any headers or memory. (I can see this being the case in some embedded applications or systems) If we make it so freezing is not a guaranteed possibility at destroy time then this can't happen and it lessens the utility of the system some. We can, if we choose, loosen the restriction later if sufficient reason is presented. Can't really tighten it, though, so for now... > 3) I assume that not every object is assumed to be serializable? For > instance, an object representing a filehandle can't really be > serialized in a useful way. So I'm not sure of what sort of "fidelity" > is required of a generic serialization method--that is, how similar a > deserialized structure is guaranteed to be to the original. No fidelity is required at the moment,
Re: Object freezing
On Tue, 21 Oct 2003, Elizabeth Mattijsen wrote: > At 12:53 -0400 10/21/03, Dan Sugalski wrote: > > > Yeah, if you're just needing to tag the stream with a label to indicate > >> the type plus a version number, then xml's on the one hand overkill and > > > on the other hand not necessarily a big help to xml proponents. > >So, in a nutshell, throwing an XML format type tag at the beginning buys > >us nothing regardless of whether it's an XML stream or not? > > Yep. But mainly I think because you'll need to encode binary data to > make it valid XML. That's an overhead you don't want to suffer for those > serializations that don't need it. I had it in mind that the XML parsers were all event driven so they'd read the header and stop until prodded, and wouldn't be prodded on if it wasn't a real parrot XML serialization stream, so binary data wouldn't matter. > If you ask me, you could make do with a simple header line like: > >parrot xml 1.0 >\0 > > basically magic word ('parrot') > followed by a space > followed by the type > followed by a space > followed by version > followed by a CRLF (not sure about this one, but could be nice) > followed by a null byte That works for me, including the crlf. Congrats, you just defined the parrot serialization header tag! :-) > I'm not clear if you would know beforehand how many bytes of data you > would receive. If that is possible to know at all times, then I would > suggest having the length as an extra part of the header. Since we're going to potentially be serializing to an on-the-fly unseekable device (i.e. dumping to a socket), there's no length. > >In that case, nuts to that. It's already terribly obvious I'm going to > >mess it up if I try, so we'll just skip it and move on to the next > >headache. :) > > Which means I'll be going back to lurking mode again... ;-) Waiting to pounce, huh? :) Dan
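The header line Liz proposes ("parrot", space, type, space, version, CRLF, null byte) is simple to produce and check. The following is a sketch under assumptions, not an agreed Parrot API: `header_write`, `header_parse`, and `struct stream_header` are hypothetical, the type and version fields are assumed to contain no spaces or CR characters, and the 16-byte field limits are arbitrary.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical parsed form of: "parrot" SP <type> SP <version> CR LF NUL */
struct stream_header {
    char type[16];
    char version[16];
};

/* Write the header into buf. Returns the bytes used, including the final
 * null byte, or 0 if it does not fit. */
static size_t header_write(char *buf, size_t cap, const char *type,
                           const char *version)
{
    int n = snprintf(buf, cap, "parrot %s %s\r\n", type, version);
    if (n < 0 || (size_t)n + 1 > cap)
        return 0;
    /* snprintf's terminating NUL doubles as the header's null byte. */
    return (size_t)n + 1;
}

/* Parse a header from the first len bytes of buf. Returns the number of
 * bytes consumed (the payload starts there), or 0 if it is malformed. */
static size_t header_parse(const char *buf, size_t len, struct stream_header *h)
{
    static const char magic[] = "parrot ";
    size_t i = sizeof magic - 1, start, n;

    assert(buf != NULL && h != NULL);
    if (len < i || memcmp(buf, magic, i) != 0)
        return 0;
    /* type: everything up to the next space */
    start = i;
    while (i < len && buf[i] != ' ')
        i++;
    n = i - start;
    if (i == len || n == 0 || n >= sizeof h->type)
        return 0;
    memcpy(h->type, buf + start, n);
    h->type[n] = '\0';
    i++;                                   /* skip the separating space */
    /* version: everything up to CR, which must be followed by LF and NUL */
    start = i;
    while (i < len && buf[i] != '\r')
        i++;
    n = i - start;
    if (n == 0 || n >= sizeof h->version)
        return 0;
    if (i + 3 > len || buf[i + 1] != '\n' || buf[i + 2] != '\0')
        return 0;
    memcpy(h->version, buf + start, n);
    h->version[n] = '\0';
    return i + 3;
}
```

For `header_write(buf, sizeof buf, "xml", "1.0")` the header occupies 17 bytes, and `header_parse` returns the same offset, which is where the encoded payload begins.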
Re: Object freezing
On Oct 21, 2003, at 10:41 AM, Elizabeth Mattijsen wrote: At 12:53 -0400 10/21/03, Dan Sugalski wrote: > Yeah, if you're just needing to tag the stream with a label to indicate the type plus a version number, then xml's on the one hand overkill and > on the other hand not necessarily a big help to xml proponents. So, in a nutshell, throwing an XML format type tag at the beginning buys us nothing regardless of whether it's an XML stream or not? Yep. But mainly I think because you'll need to encode binary data to make it valid XML. That's an overhead you don't want to suffer for those serializations that don't need it. If you ask me, you could make do with a simple header line like: parrot xml 1.0 \0 basically magic word ('parrot') followed by a space followed by the type followed by a space followed by version followed by a CRLF (not sure about this one, but could be nice) followed by a null byte Yep, that's the sort of thing that I was thinking, though I'd actually leave the CRLF (or just an LF or CR, whatever), and take out the null byte. My reason for that is that this way, if your serialization format always spits out vanilla ASCII w/o control characters, suitable for consumption by some foreign C program, then the header won't change this. (That's one of the nice features of the tar format--a tar archive of ASCII text files is itself an ASCII text file, if I recall correctly.) It could also be handy to allow additional "comment" text after the version (ignored by the deserialization, restricted to be ASCII w/o any CR or LF), because that would let you put in some human-readable comment to help out people trying to figure out what this file is. Some other formats do this, which is nice. Just another thought. JEff
Old Big problems before New Big problems
*sigh* I'm long overdue for a rant. I'm very happy with the progress Parrot has made, but that is because I took a year off. Otherwise, it would have been like watching a pot waiting for it to boil. However, some things have not changed, like us. We try to tackle too many NEW large problems and spend time arguing about them while there are many things that we needed a year ago and still need today. This is my list, not Dan's, so he may disagree:
1) We should have a complete bytecode spec, with metadata and symbol tables and classes.
2) We should have support for classes and methods in the core.
3) We should have class/method syntax in IMCC (working on this) so people can define these in a notation that makes sense.
4) We should have a complete IO system.
5) We should have a regex core and functional regex compiler. We had started on this but disagreement on approach and developer bandwidth seems to have stopped this.
6) We should have at least ONE semi-stable high level language that allows us to write REAL software on Parrot and use all of the above features.
Number 6 is where we really look bad compared to other VM efforts out there. We have a huge directory of partially working languages but nothing works besides trivial hello world style samples. Until we get there, I won't give a flip about whether we serialize in XML or morse code. There is also tons of academic material available on all the issues of late: garbage collection, cyclic issues, serialization, finalization and destruction. Not to even MENTION compiler development. We should be referencing books and papers in these discussions more than we have been, because these problems have been solved in various ways, many times. Working, bad implementations (straw men as some of us call them) are better than nothing at all. -Melvin
Re: Object freezing
On Oct 21, 2003, at 10:49 AM, Dan Sugalski wrote: On Tue, 21 Oct 2003, Elizabeth Mattijsen wrote: At 12:53 -0400 10/21/03, Dan Sugalski wrote: Yeah, if you're just needing to tag the stream with a label to indicate the type plus a version number, then xml's on the one hand overkill and on the other hand not necessarily a big help to xml proponents. So, in a nutshell, throwing an XML format type tag at the beginning buys us nothing regardless of whether it's an XML stream or not? Yep. But mainly I think because you'll need to encode binary data to make it valid XML. That's an overhead you don't want to suffer for those serializations that don't need it. I had it in mind that the XML parsers were all event driven so they'd read the header and stop until prodded, and wouldn't be prodded on if it wasn't a real parrot XML serialization stream, so binary data wouldn't matter. The event-based parsers (such as expat and other SAX parsers) tend to be push instead of pull, so you hand them your bytes and they invoke your callbacks (as opposed to pull-style in which you'd ask for the next event). Sometimes you can hand them your bytes in chunks (and they'll process what they can and save up the rest to include with your next chunk), so with expat (for example) you could probably do what you wanted, but you'd have to hand it data one byte at a time until your first callback was invoked, then stop. So it could probably be done with some parsers, but it would be an unusual usage, and more overhead than it's worth to just parse out a couple of strings. :) JEff
Re: Object freezing
On Tue, 21 Oct 2003, Elizabeth Mattijsen wrote: > At 13:49 -0400 10/21/03, Dan Sugalski wrote: > >On Tue, 21 Oct 2003, Elizabeth Mattijsen wrote: > Hmmm... maybe as an optimization, something that would fit in 4 or > 8 bytes would be better for the magic string (so a single or double > integer check would be sufficient?). > >prrt (4 byte) > >ParrotDS (8 byte) > > (DS for Data Stream, rather than what you think, Dan ;-) Yeah, that's a better option, I think. (And no, I didn't figure it stood for anything else--that thing in the build file is embarrassing enough :-P) > > > I'm not clear if you would know beforehand how many bytes of data you > > > would receive. If that is possible to know at all times, then I would > > > suggest having the length as an extra part of the header. > >Since we're going to potentially be serializing to an on-the-fly > >unseekable device (i.e. dumping to a socket), there's no length. > > Ok, so how is the encoder to know that no more data will come? Good point. We'll have to have an end-encoding entry in the encoding API, the same way we'll need a begin-encoding entry. The encoding format itself defines the end-of-encoding marker and will know when it's hit the end of the data stream. Dan
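Because the stream may go to an unseekable sink with no length known up front, the format itself has to mark where the data ends. A minimal sketch of that idea, assuming a hypothetical `struct encoder` with begin/item/end hooks and a toy record format (a 1-byte length prefix per record, with a zero-length record as the end marker); none of this is the actual Parrot encoding API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical output sink: a fixed buffer standing in for a socket. */
struct sink { unsigned char *buf; size_t cap, len; };

/* Hypothetical encoder interface: the format decides what "begin" and
 * "end" look like on the wire, not the caller. */
struct encoder {
    void (*begin)(struct sink *);
    void (*item)(struct sink *, const unsigned char *, size_t);
    void (*end)(struct sink *);
};

static void put(struct sink *s, const void *p, size_t n)
{
    assert(s->len + n <= s->cap);
    memcpy(s->buf + s->len, p, n);
    s->len += n;
}

/* Toy format: 8-byte magic, then records of (1-byte length, bytes);
 * a zero length byte is the end-of-encoding marker. */
static void toy_begin(struct sink *s) { put(s, "ParrotDS", 8); }

static void toy_item(struct sink *s, const unsigned char *p, size_t n)
{
    unsigned char len;
    assert(n > 0 && n < 256);      /* length 0 is reserved for the end marker */
    len = (unsigned char)n;
    put(s, &len, 1);
    put(s, p, n);
}

static void toy_end(struct sink *s)
{
    unsigned char zero = 0;
    put(s, &zero, 1);
}

static const struct encoder toy_encoder = { toy_begin, toy_item, toy_end };

/* Reader side: counts records until the end marker, without ever being
 * told the total length. Returns -1 for a malformed or truncated stream. */
static int toy_count_records(const unsigned char *p, size_t avail)
{
    size_t i = 8;
    int count = 0;

    if (avail < 8 || memcmp(p, "ParrotDS", 8) != 0)
        return -1;
    while (i < avail) {
        size_t n = p[i++];
        if (n == 0)
            return count;          /* end-of-encoding marker reached */
        if (i + n > avail)
            return -1;             /* record runs past the available data */
        i += n;
        count++;
    }
    return -1;                     /* data ran out before the end marker */
}
```

Here the "how does the other side know no more data will come?" question is answered entirely by the format: the reader stops at the marker, and a stream cut off before the marker is detectably incomplete.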
Re: Object freezing
At 08:21 -0400 10/21/03, Dan Sugalski wrote: > I find the notion of an "XML header" a bit confusing, given Dan's statement to the effect that it was a throw to XML folks. I think anything "XML folks" will be interested in will entail *wrapping* stuff, not *prefixing* it. Nah, I expect what they'll want is for the entire data stream of serialized objects to be in XML format. Which is fine--they can have that. (It's why I mentioned the serialization routines can be overridden) For an XML stream the header might be with the rest of the stream in XML. A YAML stream would start with the rest in YAML, and the binary format as . Or something like that, modulo actual correct XML. If you want that to look like valid XML, it would have to be different: error: Specification mandate value for attribute parrot ^ Better in my opinion would be something like: data yadda yadda yadda At least this would be a valid stand-alone XML container. And possibly parsers out there can be coerced into leaving the rest of the stream for other processes to be read. This way we have a single, fixed-format type/version header, which makes the initial identification easier and less error-prone. (Possibly even fit for file and programs of its ilk to note) The binary format won't care, and the YAML format shouldn't care (as long as the indenting's right) but the XML format would, so it seems to make sense to use the XML stuff for the initial header. So are we talking about a header or a wrapper? If it is really a header, it's not XML and then it's pretty useless from an XML point of view. Liz
[perl #24260] [PATCH] to build under win32
# New Ticket Created by "Nick Kostirya" # Please include the string: [perl #24260] # in the subject line of all future correspondence about this issue. # http://rt.perl.org/rt2/Ticket/Display.html?id=24260 > [PATCH] to build under win32 1. The MS compiler does not support structs with an empty body. 2. Remove unistd.h. Apply with: patch -p0 < win.patch Tested on WinNT (MSVC 6.0) and Linux with gcc version 2.95.4 20011002 (Debian prerelease) -- attachment 1 -- url: http://rt.perl.org/rt2/attach/66327/49559/636663/win.patch win.patch Description: win.patch
[perl #24261] [PATCH] for t\harness under win32
# New Ticket Created by "Nick Kostirya" # Please include the string: [perl #24261] # in the subject line of all future correspondence about this issue. # http://rt.perl.org/rt2/Ticket/Display.html?id=24261 > WinNT's cmd.exe does not expand t\src\*.t into a list of files. D:\CvsProjects\parrot>nmake test D:\Programs\Perl\bin\perl.exe t\harness t\src\*.t t\src\*t\src\*.t does not exist FAILED--1 test script could be run, alas--no output ever seen NMAKE : fatal error U1077: 'D:\Programs\Perl\bin\perl.exe' : return code '0x2' Stop. Apply the patch in the "t" directory: patch < harness.patch -- attachment 1 -- url: http://rt.perl.org/rt2/attach/66334/49567/4ca3a6/harness.patch harness.patch Description: harness.patch
[perl #24262] [PATCH] for t\harness under win32
# New Ticket Created by "Nick Kostirya" # Please include the string: [perl #24262] # in the subject line of all future correspondence about this issue. # http://rt.perl.org/rt2/Ticket/Display.html?id=24262 > WinNT's cmd.exe does not expand t\src\*.t into a list of files. D:\CvsProjects\parrot>nmake test D:\Programs\Perl\bin\perl.exe t\harness t\src\*.t t\src\*t\src\*.t does not exist FAILED--1 test script could be run, alas--no output ever seen NMAKE : fatal error U1077: 'D:\Programs\Perl\bin\perl.exe' : return code '0x2' Stop. Apply the patch in the "t" directory: patch < harness.patch
Re: Object freezing
At 12:53 -0400 10/21/03, Dan Sugalski wrote: > Yeah, if you're just needing to tag the stream with a label to indicate the type plus a version number, then xml's on the one hand overkill and > on the other hand not necessarily a big help to xml proponents. So, in a nutshell, throwing an XML format type tag at the beginning buys us nothing regardless of whether it's an XML stream or not? Yep. But mainly I think because you'll need to encode binary data to make it valid XML. That's an overhead you don't want to suffer for those serializations that don't need it. If you ask me, you could make do with a simple header line like: parrot xml 1.0 \0 basically magic word ('parrot') followed by a space followed by the type followed by a space followed by version followed by a CRLF (not sure about this one, but could be nice) followed by a null byte I'm not clear if you would know beforehand how many bytes of data you would receive. If that is possible to know at all times, then I would suggest having the length as an extra part of the header. In that case, nuts to that. It's already terribly obvious I'm going to mess it up if I try, so we'll just skip it and move on to the next headache. :) Which means I'll be going back to lurking mode again... ;-) Liz
Re: Object freezing
At 13:49 -0400 10/21/03, Dan Sugalski wrote: On Tue, 21 Oct 2003, Elizabeth Mattijsen wrote: > Yep. But mainly I think because you'll need to encode binary data to > make it valid XML. That's an overhead you don't want to suffer for those > serializations that don't need it. I had it in mind that the XML parsers were all event driven so they'd read the header and stop until prodded, and wouldn't be prodded on if it wasn't a real parrot XML serialization stream, so binary data wouldn't matter. > If you ask me, you could make do with a simple header line like: parrot xml 1.0 \0 basically magic word ('parrot') followed by a space followed by the type followed by a space followed by version followed by a CRLF (not sure about this one, but could be nice) > followed by a null byte That works for me, including the crlf. Congrats, you just defined the parrot serialization header tag! :-) Hmmm... maybe as an optimization, something that would fit in 4 or 8 bytes would be better for the magic string (so a single or double integer check would be sufficient?). prrt (4 byte) ParrotDS (8 byte) (DS for Data Stream, rather than what you think, Dan ;-) > I'm not clear if you would know beforehand how many bytes of data you > would receive. If that is possible to know at all times, then I would > suggest having the length as an extra part of the header. Since we're going to potentially be serializing to an on-the-fly unseekable device (i.e. dumping to a socket), there's no length. Ok, so how is the encoder to know that no more data will come? > >In that case, nuts to that. It's already terribly obvious I'm going to >mess it up if I try, so we'll just skip it and move on to the next > >headache. :) > Which means I'll be going back to lurking mode again... ;-) Waiting to pounce, huh? :) I wish. ;-) For now, I'm more the hidden dragon rather than the crouching tiger... ;-( Liz
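With an 8-byte magic such as `ParrotDS`, recognizing a stream can be a single integer comparison, as Liz suggests. A sketch, assuming the magic sits at offset 0; `has_parrot_magic` and `load_u64` are hypothetical names. Loading both the stream bytes and the expected magic through `memcpy` keeps the comparison independent of host byte order and of the buffer's alignment; the 4-byte `prrt` variant would work the same way with `uint32_t`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Load 8 bytes into a native integer; memcpy avoids unaligned-access traps. */
static uint64_t load_u64(const unsigned char *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Returns 1 if the stream starts with the 8-byte magic "ParrotDS". */
static int has_parrot_magic(const unsigned char *p, size_t avail)
{
    static const unsigned char magic[8] =
        { 'P', 'a', 'r', 'r', 'o', 't', 'D', 'S' };

    assert(p != NULL || avail == 0);
    if (avail < sizeof magic)
        return 0;
    /* Both values are loaded the same way, so host byte order cancels out. */
    return load_u64(p) == load_u64(magic);
}
```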
Re: Object freezing
Dan Sugalski <[EMAIL PROTECTED]> wrote: > On Tue, 21 Oct 2003, Juergen Boemmels wrote: >> You know we already have two versions of pobject_lives lying around. > Then we need to fix that, too. One is with ARENA_DOD_FLAGS, one without. If you are trying to implement your universal mark() for everything, one is obsolete anyway. > Dan leo
Re: Object freezing
Dan Sugalski <[EMAIL PROTECTED]> wrote: > So, in a nutshell, throwing an XML format type tag at the beginning buys > us nothing regardless of whether it's an XML stream or not? Yes. That's what people say :) What about a well known format called PBC. (Parrot bortable^Wbyte code :) It knows about en/decoding basic types. A PMC doesn't need a lot more. > Dan leo
Re: Object freezing
Dan Sugalski <[EMAIL PROTECTED]> wrote: > On Tue, 21 Oct 2003, Jeff Clites wrote: >> 1) Serialization traversals need to "take note" of logical int and >> float slots > That's not an issue for us. A PMC is responsible for serializing itself, > so if it's got a string, float, or int component then it must take > responsibility for dumping those components to the serialization stream. > Basically PMCs *must* dump themselves out completely, but the engine > provides support to defer dumping of PMCs so that we don't get into > recursive dumping That's what my general traversal routine was intended for. A PerlHash may have native datatypes as well as PMCs as data members, plus the STRING keys, which are references. The hash itself and plain data members can get serialized/frozen/dumped or whatever. The callback takes care of the desired action. PMCs inside (especially aggregates of any kind) would get postponed (only the ID or address needs to be serialized). Whether that is then done via the next_for_GC pointer, a seen hash, a bitmap or whatever is debatable, and seems (when destructor-level freezing comes in) not to be too simple. But using mark() for it doesn't meet the goal. It's a different thing. It sets the live bit on objects. > That's what custom mark routines are for, though it does argue that we > should have a separate mark for freezing. Which can be traverse or visit or whatever, but different. >> 3) Traversal for GC needs to do loop detection, It stops when the live bit is already set, or sets the live bit and places aggregates on a todo list. It's by far simpler than freezing. It's an optimization, yes. > The loop breaking needs for freezing are the same as for DOD sweeps, s/are/can be/ > In all cases (I made sure this was in the example, but it might not have > been clear) we only include a marker for child PMCs in the parent PMC's > serialized data, and serialize the child PMCs later on in the stream.
So > if PMC1 has a pointer to PMC2, the stream has PMC1 dumped to it but in the > place of PMC2's data is just a marker saying "refer to PMC2 here" and then > after the end of PMC1's data in the stream we dump out PMC2's data. That's clear. But plain scalars don't have child PMCs. Freeze 'em and be done with them. There is no need to put these on a next_for_GC list, mark() doesn't do it (anymore) and freeze doesn't have to do it. I don't see the small PMCs approach dying because of that. > Dan leo
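Dan's deferred-child scheme (a marker in place of each child PMC, with the child's own data later in the stream) and Leo's point that plain scalars are simply frozen inline can be sketched as a breadth-first freeze. This is an illustration under assumptions, not Parrot's code: `struct pmc`, `freeze_id`, and `next_todo` are hypothetical, with the intrusive queue link standing in for something like the next_for_GC pointer discussed above, so the traversal itself allocates nothing. It also leaves `freeze_id` set afterwards; a real implementation would have to clear it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy PMC: an integer scalar or an aggregate of child PMC pointers.
 * freeze_id and next_todo are hypothetical per-object slots. */
enum kind { K_INT, K_ARRAY };

struct pmc {
    enum kind kind;
    long ival;              /* used when kind == K_INT */
    struct pmc **kids;      /* used when kind == K_ARRAY */
    int nkids;
    int freeze_id;          /* 0 means "not seen yet" */
    struct pmc *next_todo;  /* intrusive FIFO link for deferred PMCs */
};

struct out { char *buf; size_t cap, len; };

/* Append a NUL-terminated string to the output buffer. */
static void emit(struct out *o, const char *s)
{
    size_t n = strlen(s);
    assert(o->len + n < o->cap);
    memcpy(o->buf + o->len, s, n + 1);
    o->len += n;
}

/* Freeze everything reachable from root. A child PMC appears inside its
 * parent only as an "@id" marker; its own record comes later in the stream,
 * so there is no recursion and shared or cyclic references are preserved. */
static void freeze(struct pmc *root, struct out *o)
{
    struct pmc *head = root, *tail = root;
    int next_id = 1;
    char tmp[32];

    root->freeze_id = next_id++;
    root->next_todo = NULL;
    while (head != NULL) {
        struct pmc *p = head;
        int i;

        head = p->next_todo;           /* dequeue */
        if (head == NULL)
            tail = NULL;
        snprintf(tmp, sizeof tmp, "#%d ", p->freeze_id);
        emit(o, tmp);
        if (p->kind == K_INT) {
            /* Plain scalars have no children: freeze inline and move on. */
            snprintf(tmp, sizeof tmp, "int %ld;", p->ival);
            emit(o, tmp);
            continue;
        }
        emit(o, "array");
        for (i = 0; i < p->nkids; i++) {
            struct pmc *k = p->kids[i];
            if (k->freeze_id == 0) {   /* first sighting: assign id, enqueue */
                k->freeze_id = next_id++;
                k->next_todo = NULL;
                if (tail != NULL)
                    tail->next_todo = k;
                else
                    head = k;
                tail = k;
            }
            snprintf(tmp, sizeof tmp, " @%d", k->freeze_id);
            emit(o, tmp);
        }
        emit(o, ";");
    }
}
```

For an array `a` holding `[x, y, x]`, where `x` is the integer 1 and `y` is an array holding `a` again, the output is `#1 array @2 @3 @2;#2 int 1;#3 array @1;`. The repeated `@2` and the back-reference `@1` are what would let a thaw routine rebuild the shared reference and the cycle.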
Re: Old Big problems before New Big problems
On Tue, 21 Oct 2003, Melvin Smith wrote: > This is my list, not Dan's so he may disagree: The list is a valid one, and the complaint is real. We've been taking things out of order in part because it works for me, but I've got a bigger picture than anyone else and that's not necessarily a great way to work. (Better be careful, we'll offer you the pumpkin once Leo's done with it... :) FWIW, the cranking about the freeze mechanisms (though not really the format) is directly applicable to some of the points--it's necessary for PMC constants, which we need in part to finalize the bytecode format. We also need object stuff, though, so it is time to stop grumbling about it and get it done. Dan
Re: [perl #24261] [PATCH] for t\harness under win32
"Nick Kostirya" (via RT) <[EMAIL PROTECTED]> writes: > WinNT's cmd.exe does not expand t\src\*.t into a list of files. > > D:\CvsProjects\parrot>nmake test > D:\Programs\Perl\bin\perl.exe t\harness t\src\*.t > t\src\*t\src\*.t does not exist > FAILED--1 test script could be run, alas--no output ever seen > NMAKE : fatal error U1077: 'D:\Programs\Perl\bin\perl.exe' : return code > '0x2' > Stop. Applied (except for whitespace) an identical patch which I already had in my tree. boe
Class metadata for PIR/assembly files
Here's the scoop: Metadata for classes is simple. In PIR/assembly, they're noted with .things:

  .class Foo
    .is bar
    .is baz
    .does some_thing
    .member x
    .member y
    .member z
  .ssalc

Unless someone tells me that ssalc is horribly obscene in some relatively common language, and we may still if the translation amuses me sufficiently. Keywords are simple for the metadata. .class starts the declaration and takes a single parameter, the name. Class declarations end with .ssalc. Each .is defines a parent class, each .does defines an interface the class supports, and each .member defines a PMC member slot that each object has. If a class is defined in the bytecode, it gets instantiated when the bytecode is created. (It's a constant class, though like any other class it is mutable at runtime, so it's not that constant.) There is no difference between a class created with metadata and one created by executable code piecemeal. Classes, when instantiated, have a backing namespace that's identical to the class name. We will be adding version metadata to the classes, but that's going to be deferred. It's OK for the code that handles PIR and assembly to ignore this for the moment, at least until the metadata segment is better defined. Which will be soon, though I'd rather someone else do the bytecode modification as it's been a long time since I've had my hand in there. This would be a good time to comment on the metadata, as I'm about to go finish defining the ops to create classes dynamically and actually finish the fscking object.c code to do it. Dan
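The metadata block is regular enough that a reader for it fits in a few dozen lines. A sketch under stated assumptions: `parse_class` and `struct class_meta` are hypothetical (this is not IMCC's parser), names are whitespace-separated and at most 31 characters long, and each list is capped at eight entries.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_ITEMS 8
#define NAME_LEN  32

/* Hypothetical in-memory form of one class declaration. */
struct class_meta {
    char name[NAME_LEN];
    char parents[MAX_ITEMS][NAME_LEN];    int nparents;     /* .is     */
    char interfaces[MAX_ITEMS][NAME_LEN]; int ninterfaces;  /* .does   */
    char members[MAX_ITEMS][NAME_LEN];    int nmembers;     /* .member */
};

/* Copy one name into the next free slot of a list; 0 on overflow. */
static int add_name(char list[][NAME_LEN], int *n, const char *s)
{
    if (*n >= MAX_ITEMS || strlen(s) >= NAME_LEN)
        return 0;
    strcpy(list[(*n)++], s);
    return 1;
}

/* Parse ".class Foo .is bar ... .ssalc" from whitespace-separated text.
 * Returns 1 on success, 0 on a malformed or overlong declaration. */
static int parse_class(const char *src, struct class_meta *m)
{
    char kw[NAME_LEN], arg[NAME_LEN];
    int used, ok = 1;

    assert(src != NULL && m != NULL);
    memset(m, 0, sizeof *m);
    if (sscanf(src, " .class %31s%n", m->name, &used) != 1)
        return 0;
    src += used;
    for (;;) {
        if (sscanf(src, " %31s%n", kw, &used) != 1)
            return 0;                   /* input ended before .ssalc */
        src += used;
        if (strcmp(kw, ".ssalc") == 0)
            return ok;
        if (sscanf(src, " %31s%n", arg, &used) != 1)
            return 0;                   /* directive without an argument */
        src += used;
        if (strcmp(kw, ".is") == 0)
            ok &= add_name(m->parents, &m->nparents, arg);
        else if (strcmp(kw, ".does") == 0)
            ok &= add_name(m->interfaces, &m->ninterfaces, arg);
        else if (strcmp(kw, ".member") == 0)
            ok &= add_name(m->members, &m->nmembers, arg);
        else
            return 0;                   /* unknown directive */
    }
}
```

Parsing the `Foo` example yields two parents, one interface, and three member slots, which is the information a loader would need to instantiate the constant class when the bytecode is created.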
Re: Taint mode testing and project Phalanx
On Mon, 20 Oct 2003, Michael G Schwern wrote: > On Tue, Oct 21, 2003 at 12:24:03AM -0500, Dave Rolsky wrote: > > On Mon, 20 Oct 2003, Andrew Savige wrote: > > > I noticed in Test::Tutorial: > > > "Taint mode is a funny thing. It's the globalest of all global features. > > > Once you turn it on it effects all code in your program and all modules > > > used (and all the modules they use). If a single piece of code isn't > > > taint clean, the whole thing explodes. With that in mind, it's very > > > important to ensure your module works under taint mode." > > > > Not to mention that it's buggy as hell. For example, in various versions > > of Perl I've used there have been rather serious bugs in the regex engine > > when taint mode is on, even when dealing with untainted variables! > > I've never hit anything like this. Do you have examples? Well, one example comes from my Params::Validate module, where I have this little bit of XS:

  while (he = hv_iternext(p)) {
      /* This may be related to bug #7387 on bugs.perl.org */
  #if (PERL_VERSION == 5)
      if (! PL_tainting)
  #endif
          SvGETMAGIC(HeVAL(he));

Whee, a random taint related bug. Then there was the time I found that pos() didn't get updated inside s/\G...//gc matches when taint mode was on, for certain versions of Perl working with some strings (but not others). I don't think this bug exists in the current version any more. I could never reproduce this in a concise example, unfortunately. Anyway, my taint mode experience has been that random things break in very weird ways when using it. -dave /*=== House Absolute Consulting www.houseabsolute.com ===*/
Re: Taint mode testing and project Phalanx
On Tue, Oct 21, 2003 at 12:34:44PM -0500, Dave Rolsky wrote: > > Anyway, my taint mode experience has been that random things break in very > weird ways when using it. I'd guess that many extensions don't handle magic properly. Extension authors rarely add the extra logic, even if they know what logic needs to be added. The same possibly applies to more obscure parts of perl. Proof of concept, for anyone that has the time: modify perl with an #ifdef so that all values are tainted but disable the tainted expression exception so that they're harmless. See what tests fail. Tim.
Re: No more code coverage
--- Tim Bunce <[EMAIL PROTECTED]> wrote: > > I'll look into SQLite. > > I'd caution against rushing in any particular direction without some > profiling information to back it up. > > Having said that, I'd strongly recommend switching to Storable first. > It did have problems but it's now very robust and far, far, faster > than Data::Dumper+eval. This small change would yield a big gain. > > The next step would be to get some profile information. There's > little point in doing that first as Data::Dumper+eval will dwarf > time spent elsewhere. It's not performance that's killing Devel::Cover when we run tests. It's that the data structure for the coverage data appears to be built in-memory and it's so huge that I run out of memory (and this is on a machine with a couple of gigs of RAM). If it's not the data structure being built but instead is the conversion to Data::Dumper format, then ignore what I say :) Cheers, Ovid = Silence is Evil http://users.easystreet.com/ovid/philosophy/indexdecency.htm Ovid http://www.perlmonks.org/index.pl?node_id=17000 Web Programming with Perl http://users.easystreet.com/ovid/cgi_course/
Re: [RfC] and [PATCH]: Libraries
On Oct 15, 2003, at 4:52 AM, Juergen Boemmels wrote: I spent the last day getting parrot running under Borland. The attached patch is what's needed to get linking and running make test on both Windows/Borland and Linux/gcc. I'm not sure if it's ready for inclusion in the tree, but I want some feedback on the approach. The main problem is that Borland can't build a single static library (at least I did not find out how) with two files of the same name. But there are some name clashes: intlist.o and classes/intlist.o or stacks.o and languages/imcc/stacks.o. I solved this by separating libparrot into three partial libs: classes/classes.a containing all object-files of classes/ ; languages/imcc/imcc.a containing all object-files of imcc and blib/lib/libparrot.a for all the rest. (These names need cleanup; shouldn't they all go to blib/lib?). classes/ is still built by its own Makefile; this should be integrated in the root-Makefile, but that's another story. Next problem is library interdependence. classes.a depends on libparrot.a and libparrot.a depends on classes.a. This complicates linking a bit. The gnu linker does not revisit previous files so the link line has to contain something like libparrot.a classes.a libparrot.a A new configure variable parrot_libs takes care of this. Since no one else commented, I'll give you my two cents. I think there are 4 other options in addition to your proposal: 1) Find out for sure if Borland has a way to build this into a single library, if you're not 100% certain. Don't know if there are any Borland experts on the list. 2) Build separate libs, but only under Borland. 3) Rename the files with duplicate names so that they don't conflict--this might be worth doing anyway. 4) When building under Borland, make copies of the offending files under different names, and build those.
(e.g., make a copy of languages/imcc/stacks.c called imcc-stacks.c, and build that instead) I'd say that (4) is the cleanest (it only affects the one problematic environment, and you get to keep a single library everywhere), but as I said I think (3) might be useful anyway (just to make it easier to refer to files in conversation). Option (3) is also dead-easy, but we'd lose cvs history for the renamed files. Having multiple libs isn't the end of the world, but it would be a shame to have to do it because of a particular compiler/linker quirk. JEff
Re: Object freezing
On Tue, Oct 21, 2003 at 09:12:27AM -0400, Dan Sugalski wrote: | We're talking about the first thing in a file (or stream, or whatever). I | was under the impression that XML files should be entirely composed of | valid XML, hence the need for the stream type marker being valid XML. YAML | doesn't care as much, so far as I understand, and for our own internal | binary format we can do whatever we want. As for 'autodetecting' XML vs YAML, an earlier version of the YAML spec restricted plain-style mapping keys so that they could not start with the '<' character. In this way, a processor could auto-detect whether the incoming stream was XML or YAML and use the appropriate parser. With the restricted schema described below, a small-footprint XML parser could even be shipped with the core libyaml, allowing us (the YAML team) to handle this XML compatibility requirement, especially with regard to SOAP, the de facto XML object 'serialization' schema. I wrote up a brief 'sketch' of one option for interoperability between XML and YAML, although there are many such options, and soon the yaml-core list will be forced to discuss icky things like this. My thoughts (which are _not_ consensus in the YAML community) are found at http://www.yaml.org/xml.html ; this page gives the usual "invoice" example in XML, and an imperfect XSLT stylesheet for converting XML in this schema to YAML. Clearly more work is needed here; I would very much like to hear your requirements. Kind Regards, Clark P.S. I try to follow this list, but I often miss items, so if you put 'YAML' in the title and cc me on it, it will surely get my attention.
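The autodetection rule described here amounts to looking at the first significant byte of the stream. Below is a minimal C sketch of that check. It assumes the earlier YAML restriction on plain mapping keys is in force, and it ignores byte-order marks and non-ASCII encodings. `sniff_stream` is a hypothetical name.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

typedef enum { STREAM_EMPTY, STREAM_XML, STREAM_YAML } stream_kind;

/* Guess the stream type from its first non-whitespace byte. This relies on the
 * earlier YAML rule that plain mapping keys may not begin with '<', so a leading
 * '<' can only mean XML (a declaration, comment, or root element). */
static stream_kind sniff_stream(const char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)buf[i];
        if (isspace(c))
            continue;
        return c == '<' ? STREAM_XML : STREAM_YAML;
    }
    return STREAM_EMPTY;
}
```

Without that restriction, a YAML key such as `<foo>: bar` would be misread as XML. That is why the rule only works while the spec forbids it.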
Re: Object freezing
On Tue, 21 Oct 2003, Clark C. Evans wrote: > On Tue, Oct 21, 2003 at 09:12:27AM -0400, Dan Sugalski wrote: > | We're talking about the first thing in a file (or stream, or whatever). I > | was under the impression that XML files should be entirely composed of > | valid XML, hence the need for the stream type marker being valid XML. YAML > | doesn't care as much, so far as I understand, and for our own internal > | binary format we can do whatever we want. > > As for 'autodetecting' XML vs YAML We don't have to! Woohoo! :) This is one problem I didn't want to go into, so the encoding would be explicit in the header. Since we've now dodged even the pretense of guaranteed minimally valid anything in the stream header, the point's moot, which is nice. For YAML encoding, like for XML (and the default native encoding, which'll probably be parrot bytecode) everything after the end-of-header will be proper, well-formed whatever. Unless the encoder's messed up, of course. ;) Dan
Re: Object freezing
On Tue, 21 Oct 2003, Clark C. Evans wrote: > If you are going to go this far (including content-length) may I > just suggest using a MIME envelope? This has several advantages: This is a very good idea, but not this time, as it's too easy to get stuck in the endless churn of very good ideas and alternatives. The simple header format, without the null byte (taking it out is a good idea, since we may have the possibility of an all-text file in that case), is the way we're going to go. Maybe for version 2.0, but we've hit the Good Enough point here. Dan
Class creation in bytecode
Okay, since nobody took advantage of the, oh, at least 2 or 3 minutes since the metadata spec post, here's the equivalent for assembly. I'll stub in and commit the stubbed object.ops ops in a bit. We've already got ops to create a class standalone, and to subclass an existing class. We're also going to add the following ops:

  addparent Px, Py
  removeparent Px, Py

To add and remove class Y as a parent of class X.

  addattrib Ix, Py, Sz
  removeattrib Px, [IS]y

To add attribute Z to class Y. X gets the attribute offset. removeattrib removes attribute #y or the attribute named y (depending on whether y is an int or a string) from class X.

To add or remove an implemented interface:

  adddoes Px, Sy
  removedoes Px, Sy

Instantiate, as implemented, is dead. I'm going to nuke it, then use it for instantiating classes via metadata chunks. That's next message. Dan
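To make the semantics of the proposed ops concrete, here is a toy model in C of what addparent/removeparent and addattrib/removeattrib do to a class. It is a hypothetical sketch using fixed-size arrays, not Parrot's object.c. In particular, the sketch assumes that removing an attribute shifts the offsets of later attributes; this is a simplification. adddoes/removedoes would follow the same pattern as the parent list, keyed by interface name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SLOTS 16

/* Hypothetical fixed-size model of a class; not Parrot's real class PMC. */
typedef struct toy_class {
    const char *name;
    struct toy_class *parents[MAX_SLOTS];
    size_t n_parents;
    const char *attribs[MAX_SLOTS];   /* attribute names; array index == attribute offset */
    size_t n_attribs;
} toy_class;

/* addparent Px, Py: make y a parent of x. Returns 0, or -1 if y is already a parent or the list is full. */
static int add_parent(toy_class *x, toy_class *y)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < x->n_parents; i++)
        if (x->parents[i] == y)
            return -1;
    if (x->n_parents == MAX_SLOTS)
        return -1;
    x->parents[x->n_parents++] = y;
    return 0;
}

/* removeparent Px, Py: drop y from x's parent list, keeping the order of the rest. */
static int remove_parent(toy_class *x, toy_class *y)
{
    for (size_t i = 0; i < x->n_parents; i++) {
        if (x->parents[i] == y) {
            memmove(&x->parents[i], &x->parents[i + 1],
                    (x->n_parents - i - 1) * sizeof x->parents[0]);
            x->n_parents--;
            return 0;
        }
    }
    return -1;
}

/* addattrib Ix, Py, Sz: add attribute z to class y; the return value plays the role of Ix (the offset). */
static long add_attrib(toy_class *y, const char *z)
{
    if (y->n_attribs == MAX_SLOTS)
        return -1;
    y->attribs[y->n_attribs] = z;
    return (long)y->n_attribs++;
}

/* removeattrib Px, Iy: remove attribute number y. */
static int remove_attrib_at(toy_class *x, size_t y)
{
    if (y >= x->n_attribs)
        return -1;
    /* Assumption: later attributes slide down one slot, so their offsets change. */
    memmove(&x->attribs[y], &x->attribs[y + 1],
            (x->n_attribs - y - 1) * sizeof x->attribs[0]);
    x->n_attribs--;
    return 0;
}

/* removeattrib Px, Sy: remove the attribute named y. */
static int remove_attrib_named(toy_class *x, const char *y)
{
    for (size_t i = 0; i < x->n_attribs; i++)
        if (strcmp(x->attribs[i], y) == 0)
            return remove_attrib_at(x, i);
    return -1;
}
```

A real implementation would also have to decide what happens to existing objects when an attribute's offset moves. This toy ignores that question entirely.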
Re: Object freezing
At 15:18 -0400 10/21/03, Dan Sugalski wrote: On Tue, 21 Oct 2003, Clark C. Evans wrote: > If you are going to go this far (including content-length) may I > just suggest using a MIME envelope? This has several advantages: This is a very good idea, but not this time, as it's too easy to get stuck in the endless churn of very good ideas and alternatives. I would think the MIME-envelope would have to be part of the data, rather than in the header. Or am I missing anything? It's encode/decoder determined, is it not? The simple header format, without the null byte (taking it out is a good idea, since we may have the possibility of an all-text file in that case), I have no particular feeling about the null byte. It would just be a convenience when debugging as it would allow you to just print the string, as it would be null-delimited. The CRLF (or just the CR or just the LF) could serve as an end of header marker just as well. Liz
Re: Object freezing
On Tue, Oct 21, 2003 at 07:41:08PM +0200, Elizabeth Mattijsen wrote:
| If you ask me, you could easily do with a simple header line like:
|
|   parrot xml 1.0
|   \0
|
| basically magic word ('parrot')
| followed by a space
| followed by the type
| followed by a space
| followed by version
| followed by a CRLF (not sure about this one, but could be nice)
| followed by a null byte
|
| I'm not clear if you would know beforehand how many bytes of data you
| would receive. If that is possible to know at all times, then I would
| suggest having the length as an extra part of the header.

If you are going to go this far (including content-length) may I just suggest using a MIME envelope? This has several advantages:
- there are already readers for the format
- it allows you to specify the 'Content-Type' as, say, binary/parrot or text/yaml or text/xml
- it gives you a place to put 'Content-Length'
- it is extensible, allowing for other headers
- it allows you to include other 'binary' blobs in the same file
Best, Clark
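The proposed header line is simple enough to parse in a few lines. The minimal C sketch below assumes the exact layout quoted above: "parrot", a space, the type, a space, the version, then CRLF. The trailing NUL byte is treated as optional, since Dan decided elsewhere in this thread to drop it. `parse_header` and `stream_header` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    char type[32];     /* e.g. "xml", "yaml" */
    char version[16];  /* e.g. "1.0" */
} stream_header;

/* Parse "parrot <type> <version>\r\n", optionally followed by a NUL byte.
 * Returns the number of header bytes consumed (the payload starts at buf + result),
 * or -1 if the buffer does not start with a well-formed header. */
static long parse_header(const char *buf, size_t len, stream_header *out)
{
    static const char magic[] = "parrot ";
    const size_t mlen = sizeof magic - 1;
    assert(buf != NULL && out != NULL);

    if (len < mlen || memcmp(buf, magic, mlen) != 0)
        return -1;

    /* Find the CRLF that ends the header line. */
    const char *eol = NULL;
    for (size_t i = mlen; i + 1 < len; i++) {
        if (buf[i] == '\r' && buf[i + 1] == '\n') {
            eol = buf + i;
            break;
        }
    }
    if (eol == NULL)
        return -1;

    /* Split "<type> <version>" at the space between them; both must be non-empty. */
    const char *fields = buf + mlen;
    const char *sp = memchr(fields, ' ', (size_t)(eol - fields));
    if (sp == NULL || sp == fields || sp + 1 == eol)
        return -1;
    size_t tlen = (size_t)(sp - fields);
    size_t vlen = (size_t)(eol - sp - 1);
    if (tlen >= sizeof out->type || vlen >= sizeof out->version)
        return -1;
    memcpy(out->type, fields, tlen);
    out->type[tlen] = '\0';
    memcpy(out->version, sp + 1, vlen);
    out->version[vlen] = '\0';

    size_t used = (size_t)(eol - buf) + 2;
    if (used < len && buf[used] == '\0')   /* the proposed, optional NUL terminator */
        used++;
    return (long)used;
}
```

Returning the consumed length, rather than a pointer, keeps the function usable on buffers that are not NUL-terminated, which matters once the payload is binary.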
Re: Object freezing
On Tue, 21 Oct 2003, Clark C. Evans wrote: > Back to the YAML list... sorry for interloping! Ah, you weren't interloping--it is a good idea. You just managed to come in on the other side of Good Enough today. :) Dan
Re: Object freezing
Dan/Elizabeth, Thank you for considering my response; let me rephrase and then I'll go back to my own list (*grins*). On Tue, Oct 21, 2003 at 09:25:48PM +0200, Elizabeth Mattijsen wrote: | At 15:18 -0400 10/21/03, Dan Sugalski wrote: | >On Tue, 21 Oct 2003, Clark C. Evans wrote: | > > If you are going to go this far (including content-length) may I | > > just suggest using a MIME envelope? This has several advantages: | >This is a very good idea, but not this time, as it's too easy to get stuck | >in the endless churn of very good ideas and alternatives. I should have just suggested RFC822 like headers (used by E-Mail and HTTP) and not imply that MIME, with its multi-part and encoding glory need be supported. In other words, the header could simply be:

  Parrot-version: 0.3
  Content-type: binary/parrot    <- or text/yaml, text/xml
  Content-length: 49384
                                 <- blank line
  (binary payload)

This has the advantages of:
(a) satisfies mentioned requirements: version, type, and size
(b) easy to parse, well-known syntax
(c) fits in well with Intranet infrastructure
(d) easy to extend down the road, i.e., more headers can be added

| I would think the MIME-envelope would have to be part of the data, | rather than in the header. Or am I missing anything? It's | encode/decoder determined, is it not? Well, it seemed you were making an 'envelope', and this is exactly what RFC822 is all about. In particular, all of the items you wanted to put in your header could be done easily with RFC822. | >The simple header format, without the null byte (taking it out is a good | >idea, since we may have the possibility of an all-text file in that case), | | I have no particular feeling about the null byte. It would just be a | convenience when debugging as it would allow you to just print the | string, as it would be null-delimited. The CRLF (or just the CR or | just the LF) could serve as an end of header marker just as well.
RFC822 uses a "blank line", that is two adjacent "CRLF" items to mark the end of the header. Back to the YAML list... sorry for interloping! Clark
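The RFC822-style alternative is also easy to consume. The reader takes "Name: value" lines until the blank line, and the payload starts right after it. The C sketch below is hypothetical and only extracts Content-length. It assumes CRLF line endings and no folded (continuation) header lines.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Case-insensitive match of a header name (n bytes) against a known field name;
 * RFC 822 field names are case-insensitive. */
static int name_is(const char *name, size_t n, const char *want)
{
    if (strlen(want) != n)
        return 0;
    for (size_t i = 0; i < n; i++)
        if (tolower((unsigned char)name[i]) != tolower((unsigned char)want[i]))
            return 0;
    return 1;
}

/* Scan "Name: value\r\n" lines up to the blank line that ends the header.
 * On success returns the offset where the payload begins and stores the
 * Content-length value (or -1 if absent) in *content_length. Returns -1 if
 * the terminating blank line is never found. Other headers are skipped. */
static long scan_headers(const char *buf, size_t len, long *content_length)
{
    size_t pos = 0;
    assert(buf != NULL && content_length != NULL);
    *content_length = -1;
    while (pos + 1 < len) {
        const char *line = buf + pos;
        const char *crlf = NULL;
        for (size_t i = pos; i + 1 < len; i++) {
            if (buf[i] == '\r' && buf[i + 1] == '\n') {
                crlf = buf + i;
                break;
            }
        }
        if (crlf == NULL)
            return -1;
        size_t n = (size_t)(crlf - line);
        if (n == 0)                      /* blank line: end of header */
            return (long)(pos + 2);
        const char *colon = memchr(line, ':', n);
        if (colon != NULL && name_is(line, (size_t)(colon - line), "Content-length"))
            *content_length = strtol(colon + 1, NULL, 10);  /* stops at the CR */
        pos += n + 2;
    }
    return -1;
}
```

Unknown headers are skipped rather than rejected. That is what gives this format the "easy to extend down the road" property.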
Re: Object freezing
At 12:56 -0700 10/21/03, Clark C. Evans wrote: On Tue, Oct 21, 2003 at 09:25:48PM +0200, Elizabeth Mattijsen wrote: | At 15:18 -0400 10/21/03, Dan Sugalski wrote: | >On Tue, 21 Oct 2003, Clark C. Evans wrote: | > > If you are going to go this far (including content-length) may I | > > just suggest using a MIME envelope? This has several advantages: | >This is a very good idea, but not this time, as it's too easy to get stuck | >in the endless churn of very good ideas and alternatives. I should have just suggested RFC822 like headers (used by E-Mail and HTTP) and not imply that MIME, with its multi-part and encoding glory need be supported. In other words, the header could simply be:

  Parrot-version: 0.3
  Content-type: binary/parrot    <- or text/yaml, text/xml
  Content-length: 49384
                                 <- blank line
  (binary payload)

But do we always need that? In my idea it would be something like:

  prrt 1.0 yaml    # prrt = magic word, 1.0 = parrot header version, yaml = encode ID
  Parrot-version: 0.3
  Content-type: binary/parrot
  Content-length: 49384
  (binary payload)

The Parrot header line would just be enough to get the right decoder; whatever the decoder does with the rest of the stream is up to the decoder. Another example with XML:

  prrt 1.0 xml    # assume XML
  xmlns:parrot="http://www.parrotcode.org/0.3"    # implies MIME-encoded binary data
  (MIME-encoded binary data)

And another one with an oldie but goodie?

  prrt 1.0 storable    # assume Storable
  (whatever Storable.pm puts in its magic)

Hope this made sense. Liz
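The point here is that the first line only has to select a decoder, and that dispatch can be sketched in C as a lookup table keyed by the encode ID. The decoder functions below are hypothetical placeholders that just report their own name. The "prrt <version> <id>" layout is taken from the examples above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* A decoder receives everything after the "prrt ..." line; what it does with
 * the rest (RFC 822 headers, MIME, Storable magic) is entirely its own business. */
typedef const char *(*decoder_fn)(const char *rest, size_t len);

/* Hypothetical placeholder decoders; real ones would build PMCs. */
static const char *decode_yaml(const char *rest, size_t len) { (void)rest; (void)len; return "yaml"; }
static const char *decode_xml(const char *rest, size_t len) { (void)rest; (void)len; return "xml"; }
static const char *decode_storable(const char *rest, size_t len) { (void)rest; (void)len; return "storable"; }

static const struct { const char *id; decoder_fn fn; } decoders[] = {
    { "yaml", decode_yaml },
    { "xml", decode_xml },
    { "storable", decode_storable },
};

/* Look up the decoder for an encode ID such as "yaml"; NULL if none is registered. */
static decoder_fn find_decoder(const char *id)
{
    assert(id != NULL);
    for (size_t i = 0; i < sizeof decoders / sizeof decoders[0]; i++)
        if (strcmp(decoders[i].id, id) == 0)
            return decoders[i].fn;
    return NULL;
}

/* Parse "prrt <header-version> <encode-id>" and return the matching decoder. */
static decoder_fn decoder_for_line(const char *line)
{
    char version[16], id[32];
    if (sscanf(line, "prrt %15s %31s", version, id) != 2)
        return NULL;
    return find_decoder(id);
}
```

With this split, adding a new encoding means registering one more table entry. The header format itself never changes.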
Re: Class creation in bytecode
All~ Dan Sugalski wrote: To add or remove an implemented interface: adddoes Px, Sy removedoes Px, Sy Instantiate, as implemented, is dead. I'm going to nuke it, then use it for instantiating classes via metadata chunks. That's next message. Just a thought, but (add/remove)interface seems a little more understandable... Matt PS-Dan, what happened to your sig? I rather liked it.
Re: Taint mode testing and project Phalanx
On Tue, Oct 21, 2003 at 12:34:44PM -0500, Dave Rolsky wrote: > Anyway, my taint mode experience has been that random things break in very > weird ways when using it. All the more reason to test with it on. :) -- Michael G Schwern[EMAIL PROTECTED] http://www.pobox.com/~schwern/ Do not try comedy at home! Milk & Cheese are advanced experts! Attempts at comedy can be dangerously unfunny!
Re: Taint mode testing and project Phalanx
Michael G Schwern wrote: > On Tue, Oct 21, 2003 at 12:34:44PM -0500, Dave Rolsky wrote: >> Anyway, my taint mode experience has been that random things break in very >> weird ways when using it. > > All the more reason to test with it on. :) Given the differences in behaviour with taint mode, it seems to me that for a "taint mode test" (i.e. one with -wT in its first line) Test::Harness should run the test twice -- once with taint mode and once without. Though I suppose there might be a case where you want to run the test in taint mode only, so maybe Test::Harness needs some options to control this. /-\
Re: Class metadata for PIR/assembly files
Dan Sugalski wrote: Here's the scoop: Metadata for classes is simple. In PIR/assembly, they're noted with .things:

  .class Foo
  .is bar
  .is baz
  .does some_thing
  .member x
  .member y
  .member z
  .ssalc

Unless someone tells me that ssalc is horribly obscene in some relatively common language, and we may still if the translation amuses me sufficiently. Keywords are simple for the metadata. .class starts the declaration and has a single parameter, the name. Class declarations end with .ssalc. Each .is defines a parent class, each .does defines an interface the class supports, and each .member defines a PMC member slot that each object has. If a class is defined in the bytecode, it gets instantiated when the bytecode is created. (It's a constant class, though like any other class it's mutable at runtime, so it's not that constant.) There is no difference between a class created with metadata and one created piecemeal by executable code. Classes, when instantiated, have a backing namespace that's identical to the class name. We will be adding version metadata to the classes, but that's going to be deferred. It's OK for the code that handles PIR and assembly to ignore this for the moment, at least until the metadata segment is better defined. Which will be soon, though I'd rather someone else do the bytecode modification as it's been a long time since I've had my hand in there. This would be a good time to comment on the metadata, as I'm about to go finish defining the ops to create classes dynamically and actually finish the fscking object.c code to do it. Will there be a way to specify which methods belong to the class in the metadata? Or will method namespaces just have to match class names so that a lookup can be done? -Joe
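Since the metadata is line-oriented, turning a .class ... .ssalc block into a structure is straightforward. Below is a hypothetical C sketch; `class_meta` and `parse_class` are not IMCC code. It assumes one directive per line, names shorter than 32 bytes, and at most eight entries per list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_NAMES 8

/* Hypothetical in-memory form of one .class ... .ssalc block. */
typedef struct {
    char name[32];
    char parents[MAX_NAMES][32];    size_t n_parents;     /* .is     */
    char interfaces[MAX_NAMES][32]; size_t n_interfaces;  /* .does   */
    char members[MAX_NAMES][32];    size_t n_members;     /* .member */
} class_meta;

/* Append word to a fixed-size name list; returns -1 when the list is full. */
static int push_name(char list[][32], size_t *n, const char *word)
{
    if (*n == MAX_NAMES)
        return -1;
    snprintf(list[*n], 32, "%s", word);
    (*n)++;
    return 0;
}

/* Parse one class declaration, one directive per line. Returns 0 once .ssalc
 * closes a named class, -1 on an unknown directive, a full list, or missing .ssalc. */
static int parse_class(const char *const *lines, size_t n, class_meta *out)
{
    char dir[16], arg[32];
    assert(lines != NULL && out != NULL);
    memset(out, 0, sizeof *out);
    for (size_t i = 0; i < n; i++) {
        int got = sscanf(lines[i], " %15s %31s", dir, arg);
        if (got < 1)
            continue;                                   /* blank line */
        if (strcmp(dir, ".ssalc") == 0)
            return out->name[0] != '\0' ? 0 : -1;
        if (got != 2)
            return -1;                                  /* directive without argument */
        int rc = 0;
        if (strcmp(dir, ".class") == 0)
            snprintf(out->name, sizeof out->name, "%s", arg);
        else if (strcmp(dir, ".is") == 0)
            rc = push_name(out->parents, &out->n_parents, arg);
        else if (strcmp(dir, ".does") == 0)
            rc = push_name(out->interfaces, &out->n_interfaces, arg);
        else if (strcmp(dir, ".member") == 0)
            rc = push_name(out->members, &out->n_members, arg);
        else
            return -1;                                  /* e.g. a .method directive */
        if (rc != 0)
            return -1;
    }
    return -1;  /* ran out of input before .ssalc */
}
```

As written, the sketch rejects anything it does not recognize. A `.method` directive, of the kind asked about above, would need its own branch here.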
Re: Class metadata for PIR/assembly files
At 07:44 PM 10/21/2003 -0400, Joseph Ryan wrote: Dan Sugalski wrote: Here's the scoop: Metadata for classes is simple. In PIR/assembly, they're noted with .things:

  .class Foo
  .is bar
  .is baz
  .does some_thing
  .member x
  .member y
  .member z
  .ssalc

Will there be a way to specify which methods belong to the class in the metadata? Or will method namespaces just have to match class names so that a lookup can be done? I was planning a .method directive. I like the feel of separate .field and .method directives. I like supporting 2 variations like C++; however, this is only an intermediate language, so it really doesn't matter.

  .class Foo
  .method InlineMeth
    (code)
  .endmeth
  .method NotInline
  ...
  .endclass

  .method Foo.NotInline
    (code)
  .endmeth

Using out-of-line definitions with inline declarations means the compiler can be single pass and simpler; however, most decent compilers will do a separate semantic pass, so forward declarations are easy. On the other hand, inline method definitions make code emitting a little simpler. I see no real big technical problem with supporting both syntaxes. I think it is more proof-of-concept than anything, since eventually we will pass a syntax tree form to the compiler instead. -Melvin
Re: Class metadata for PIR/assembly files
At 02:55 PM 10/21/2003 -0400, Dan Sugalski wrote: Here's the scoop: Metadata for classes is simple. In PIR/assembly, they're noted with .things:

  .class Foo
  .is bar
  .is baz
  .does some_thing
  .member x
  .member y
  .member z
  .ssalc

Unless someone tells me that ssalc is horribly obscene in some relatively common language, and we may still if the translation amuses me sufficiently. I'm sure ssalc must mean something bad somewhere. Technically nothing is stopping us from using .end for everything, since we are using a LALR parser and don't need fancy error reporting. Classes, when instantiated, have a backing namespace that's identical to the class name. Good. So do we support :: or . for scope resolution? Or both? It's OK for the code that handles PIR and assembly to ignore this for the moment, at least until the metadata segment is better defined. Which will be soon, though I'd rather someone else do the bytecode modification as it's been a long time since I've had my hand in there. Well, we can hide this under PIR. Once PIR is set, we can start by implementing on-the-fly class creation, then change IMCC to emit metadata when the rest is in. That way HL languages don't have to change later. For now we just have IMCC emit newclass, etc. and manually construct the classes. -Melvin
Re: Taint mode testing and project Phalanx
On Tue, 21 Oct 2003, Michael G Schwern wrote: > On Tue, Oct 21, 2003 at 12:34:44PM -0500, Dave Rolsky wrote: > > Anyway, my taint mode experience has been that random things break in very > > weird ways when using it. > > All the more reason to test with it on. :) At this point I've become rather disgusted with it. When taint mode breaks pos(), and as a result your regex-based parser blows up in weird ways, and you spend many, many hours figuring out what exactly is happening, and then can't reduce it to a simple test case, you tend to get a little peeved. Tim's #ifdef idea for testing taint mode seems like a really good idea. Once I know it's well tested in the core, I'll be happy to test my own modules with it. -dave /*=== House Absolute Consulting www.houseabsolute.com ===*/