Re: Auto-install (was autoloaded...)

2001-02-09 Thread Bart Lateur

On Thu, 8 Feb 2001 17:39:01 +0000, Nicholas Clark wrote:

>On Thu, Feb 08, 2001 at 12:26:59PM -0500, Dan Sugalski wrote:
>> (Including Archive::Tar as part of the base perl distribution's not 
>> inappropriate, assuming we can get permission. )

Since it's already part of the "standard distribution" for Win32 from both
ActiveState and IndigoStar (both use it to install packages), I guess that
won't be much of a problem.

>Do we really want to use tar format (over say cpio) as tar rounds files
>up to 512 block boundaries, and has some arbitrary restrictions on filename
>lengths in the headers?

You don't need to use raw .tar. Archive::Tar works perfectly well with
.tar.gz files. Compression effectively takes care of the padding with
nulls, because the nulls get "compressed away". The compression is handled
by the Compress::Zlib module, so if there is any redistribution licensing
problem, I expect it to be there.
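
For instance, unpacking a compressed archive takes nothing beyond those two
modules (a minimal sketch using Archive::Tar's current interface; the
archive name is made up):

    use Archive::Tar;    # needs Compress::Zlib around for the .gz layer

    my $tar = Archive::Tar->new;
    $tar->read('Some-Module-1.00.tar.gz');   # gunzips transparently
    print "$_\n" for $tar->list_files;       # inspect without extracting
    $tar->extract;                           # unpack into the current dir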

-- 
Bart.



Re: Auto-install (was autoloaded...)

2001-02-09 Thread Bart Lateur

On Thu, 8 Feb 2001 17:49:45 -0200, Branden wrote:

>I've never actually used PPM, only read about it in
>the web. I guess their file format is a disguised .tar.gz, right?

It's a combination of an XML file (extension "PPD") that describes the
properties, dependencies and platforms, plus a normal .tar.gz file. The
latter can come straight from CPAN.
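
A stripped-down PPD looks roughly like this (illustrative package name and
paths; the exact DTD details live in PPM::ppd):

    <SOFTPKG NAME="Foo-Bar" VERSION="1,0,0,0">
        <TITLE>Foo-Bar</TITLE>
        <ABSTRACT>Does foo to bar</ABSTRACT>
        <AUTHOR>A. U. Thor</AUTHOR>
        <IMPLEMENTATION>
            <DEPENDENCY NAME="Some-Prereq" />
            <OS NAME="MSWin32" />
            <ARCHITECTURE NAME="MSWin32-x86-multi-thread" />
            <CODEBASE HREF="x86/Foo-Bar.tar.gz" />
        </IMPLEMENTATION>
    </SOFTPKG>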

-- 
Bart.



Re: Auto-install (was autoloaded...)

2001-02-09 Thread Michael G Schwern

On Thu, Feb 08, 2001 at 01:40:52PM -0500, Dan Sugalski wrote:
> >Separated documentation is no documentation.
> 
> At some point things are going to get split out, unless you wedge the docs 
> into the actual program itself. (You were, after all, talking about config 
> files and XS modules, and those can't usefully stay inside the archive)

True, but those things are essential to making the program work, and
their loss will be dearly (and immediately) noted.  Documentation is
the first thing to get ditched in any install process.  I was mostly
thinking perldoc (yes, wedging the docs into the program) *plus* the
normal man page generation and install.  That's how I normally do
things.
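
For concreteness, this is all it takes for the docs to travel inside the
executable (a sketch; perldoc finds the POD whether or not a man page was
ever generated):

    #!/usr/bin/perl -w
    use strict;

    print "doing the real work...\n";

    __END__

    =head1 NAME

    frobnicate - does the real work

    =head1 DESCRIPTION

    This text lives inside the program itself, so C<perldoc frobnicate>
    works even on a machine where the man pages were ditched.

    =cut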


-- 

Michael G. Schwern   <[EMAIL PROTECTED]>http://www.pobox.com/~schwern/
Only mindless violence can raise my spirits now!



Re: Auto-install (was autoloaded...)

2001-02-09 Thread Branden

Clayton Scott wrote:
>  PPM uses files containing a modified form of the Open Software
>  Distribution (OSD) specification for information about software
>  packages. These description files, which are written in Extensible
>  Markup Language (XML) code, are referred to as 'PPD' files.
>  Information about OSD can be found at the W3C web site (at the
>  time of this writing, http://www.w3.org/TR/NOTE-OSD.html). The
>  modifications to OSD used by PPM are documented in PPM::ppd.

I read about OSD. I think it's good to use it for representing the
dependencies. It could be used by the `par' utility to build the archive.
It should probably be included in the archive as well, for checking the
dependencies and platform stuff. The OSD stuff is very `push'-technology
oriented, which would match our definition of `automatic' (the part that
has been described here as `evil' and `insecure'), but using the DTD could
provide interoperability with other tools that support OSD. It's also
Java/Microsoft-oriented, so I'll see what PPM::ppd says about it to make it
usable by Perl.

The problem I see with PPM is that, AFAIK, it handles only single modules;
it can't directly handle bundles of modules plus scripts. Even if that
could be made possible, it probably wouldn't be easy, nor standardized.
(Please correct me if I'm wrong here.)

In http://www.w3.org/TR/NOTE-OSD.html#B they describe standard platform/cpu
names, and we'll definitely need those for checking the target architecture.
Can we standardize upon those, or is something missing? There's an issue
I really don't know about: on the same platform, do different compilers
generate incompatible binaries? Because if that happens (and will still
happen in Perl 6), the platform identification should be os/cpu/compiler.
Perhaps each platform would have a list of `interchangeable' compilers,
i.e. those that produce compatible code, so that modules compiled with one
can be used with a Perl compiled with another.

BTW, I'm working on the PDD. Does anyone have a suggestion for something to add?

Another important issue I don't know about yet: Is there an Archive::Zip
module for Perl? How cross-platform is it? Can we bundle it with Perl
(licensing issues)? Is it stable? Will it give us the support we need
(access to individual files in the archive, and the ability to `fake' them
as real files through tied filehandles)?

- Branden




Re: Auto-install (was autoloaded...)

2001-02-09 Thread Nicholas Clark

On Fri, Feb 09, 2001 at 10:28:49AM -0200, Branden wrote:
> In http://www.w3.org/TR/NOTE-OSD.html#B they describe standard platform/cpu
> names, and we'll definitely need those for checking the target architecture.
> Can we standardize upon those, or is something missing? There's an issue

I take it "Lunix" is Linux.
BSDi isn't FreeBSD, NetBSD or OpenBSD.
Nothing they list seems to be VMS.
Pace are still developing variants of Acorn's RISC OS to run set-top boxes.
As I understood it there were about 39 variants of Unix ever, and they seem
to have 12 listed.

Processors are missing:
pa-risc (nice one, as HP-UX *is* in the OS list)
arm, sh, ia64, s390

and probably some more.
[If I can run 'ls /usr/src/linux/arch/', why can't they? Hmm. The document
dates from 1997. Acorn was trading then.]

The omission of pa-risc makes me wonder about the quality of the rest of
this document.

> I really don't know about: on the same platform, do different compilers
> generate incompatible binaries? Because if that happens (and will still
> happen in Perl 6), the platform identification should be os/cpu/compiler.
> Perhaps each

on perl 5 different configure options generate different binaries.

> Another important issue I don't know about yet: Is there an Archive::Zip
> module for Perl? How cross-platform is it? Can we bundle it with Perl
> (licensing

http://www.cpan.org/modules/by-module/Archive/Archive-Zip-0.11.readme

> issues)? Is it stable? Will it give us the support we need (access to
> individual files in the archive, and the ability to `fake' them as real
> files through tied filehandles)?

I don't know the answers to the other questions. However, I was intending
to write a layer for perl 5.7 that can access zip entries via regular
perl filehandles. (In theory this could let you run scripts direct out of a
zip without actually unpacking them anywhere)

Layers of some form are needed in perl6 to cope with utf8 and other
encodings, so I would expect the perl6 layer functionality can also be
extended to zip manipulations.
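
Until such a layer exists, something close can be faked in user space. A
sketch (assuming Archive::Zip plus perl's open-on-a-scalar-ref; a real
layer would make the filehandle read from the archive directly):

    use Archive::Zip;

    my $zip = Archive::Zip->new('bundle.zip') or die "can't read bundle.zip";
    my $src = $zip->contents('lib/Foo.pm');   # one member, no unpacking
    open my $fh, '<', \$src or die $!;        # treat the scalar as a file
    print while <$fh>;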

Nicholas Clark



Re: assign to magic name-of-function variable instead of "return"

2001-02-09 Thread Branden

I wrote:
> David L. Nicol wrote:
> > sub DirectBubbleSort() {
> >   my ($i,$t) = (-1,0);
> >   while (++$i <= $#$__) {
> > $$__[$i] > $$__[1+$i] and $t++ and @$__[$i,1+$i] = @$__[1+$i,$i];
> >   };
> >   $t and @$__ = DirectBubbleSort;
> > }
> >
> > @SomeList = DirectBubbleSort; # instead of DirectBubbleSort(\@SomeList)
> >
>
> I don't see what the problem with DirectBubbleSort(\@SomeList) is. It's
> rather explicit, and is very clear about what is done. Using a (\@)
> prototype on DirectBubbleSort would be ok too, since its name says right
> what it does and everybody likes some vanilla syntax. But making it
> @SomeList = DirectBubbleSort is the most error-prone thing I see! Nobody
> will get it right on first use...


Changed my mind...

Actually, I really reconsidered it. It really is useful (for something
other than bubble-sorting its return value)! If we prohibit reading the
variable $__ and allow setting it only with return, this would actually be
useful for getting around a current prototype inconsistency.

First, let me say what I believe is an inconsistency in Perl 5's prototypes.

Before prototypes were added, if I wanted to pass an array (as an array,
not a flattened list) to a sub, I had to pass a reference to it:

sub my_sub {
    my $array = shift;
    $array->[0] = 1;
}
my_sub \@my_array;

With prototypes, I could give my_sub a \@ prototype, and could call it
without the reference:

sub my_sub (\@) {
    my $array = shift;
    $array->[0] = 1;
}
my_sub @my_array;

Now suppose I want to return an array by reference. There are actually two
good reasons to do that: one is to return a tied array/hash, the other is
to avoid useless copying of data. Suppose I have:

sub my_sub {
    my @array;
    for (5..7) {
        push @array, $_;
    }
    return @array;
}
@my_array = my_sub;

That would copy the values 5, 6, 7 to the Perl stack and then push them
into @my_array. Now suppose I have:

sub my_sub {
    my @array;
    for (5..5004) {
        push @array, $_;
    }
    return @array;
}
@my_array = my_sub;
$my_first_element = $my_array[0];

That would copy 5000 values to the Perl stack, only to copy them back into
@my_array later. This could be solved with references, just the way they
were used above to pass an array as a parameter:

sub my_sub {
    my @array;
    for (5..5004) {
        push @array, $_;
    }
    return \@array;
}

Now I have two ways of calling it:

$my_array_ref = my_sub;
$my_first_element = $my_array_ref->[0];

This is bad since it breaks all @my_array uses: you would have to replace
$my_array[$x] with $my_array_ref->[$x] everywhere. The other
possibility is:

*my_array = my_sub;
$my_first_element = $my_array[0];

This doesn't break @my_array uses, but it breaks all assignments of
my_sub's result, which now have to go through a glob. This is also bad
because the glob doesn't make clear what kind of thing is being assigned:
a scalar, an array, or a hash.

Contrasting this example with C++, we can clearly see that it's a prototype
issue. In C++, passing the array to the sub by reference, without and with
prototypes, could be seen as:

void myFunction1(Array *byPointer);
void myFunction2(Array &byReference);

Array a;

myFunction1(&a);
myFunction2(a);

As C++ explicitly declares its return type, you can do:

Array *myFunction1();
Array &myFunction2();

Array *a;
Array b;

a = myFunction1();
b = myFunction2();   // this actually calls the copy-assignment operator,
                     // but in Perl it shouldn't be necessary.


What I propose, so that we don't need to say ``*my_array = my_sub'', is to
flag my_sub with a special attribute that would work like a return
prototype. It could even be added to my_sub's prototype, say separated by a
colon from the input prototype:

sub my_sub :would_alias { ...
sub my_sub (:\@){ ...

And then this:

@my_array = my_sub;

would behave (be compiled) like this:

*my_array = my_sub;

my_sub could then be changed to return its values more efficiently,
transparently to its callers. And it would be possible to define functions
that return tied objects that get aliased to the result of the function.

Additionally, the \@ prototype could be used for the return value in the same way:

sub my_sub :alias {
    my @array;
    for (5..5004) {
        push @array, $_;
    }
    return @array;
}

This way, the sub body wouldn't have to change, and it would work both with
and without `:alias', just more efficiently one way than the other.

Comments?





Problems:
Suppose @a and @b are aliases, so that @a = (1, 2, 3) would also make
@b = (1, 2, 3).
What does @a = my_sub do? Are both @a and @b aliased to the result of
my_sub, or only @a?

A little experience shows that this is confusing in Perl 5 ...

Re: Auto-install (was autoloaded...)

2001-02-09 Thread Branden

Nicholas Clark wrote:
> I take it "Lunix" is Linux.
> BSDi isn't FreeBSD, NetBSD or OpenBSD.
> Nothing they list seems to be VMS.
> Pace are still developing variants of Acorn's RISC OS to run set-top boxes.
> As I understood it there were about 39 variants of Unix ever, and they
> seem to have 12 listed.
> [snip]
> The omission of pa-risc makes me wonder about the quality of the rest of
> this document.

Clearly a Microsoft document...

> > I really don't know about: on the same platform, do different compilers
> > generate incompatible binaries? Because if that happens (and will still
> > happen in Perl 6), the platform identification should be os/cpu/compiler.
> > Perhaps each
>
> on perl 5 different configure options generate different binaries.

Can this be standardized somehow? Perl 6 binary-level compatibility for
extensions? In a way that makes distributing binary modules at least
possible? It's already big trouble to consider the various os/cpu
combinations; adding configure options to the mix is not feasible at all!
Compilers I can accept: I don't actually expect gcc and MS VC++ to
generate compatible DLLs, although I think it should be possible...

> I don't know the answers to the other questions. However, I was intending
> to write a layer for perl 5.7 that can access zip entries via regular
> perl filehandles. (In theory this could let you run scripts direct out of
> a zip without actually unpacking them anywhere)

This seems like an issue that won't be used only by `par'; I see it brought
up every now and then. It should probably be encapsulated in a generic
module and bundled with Perl 6's standard distribution.

> Layers of some form are needed in perl6 to cope with utf8 and other
> encodings, so I would expect the perl6 layer functionality can also be
> extended to zip manipulations.
>
> Nicholas Clark

AFAIK, Perl 6 will have a translation layer from bytes to the encoding, and
one of the encodings would be `binary', which would just pass the bytes
through. That's probably what should be used for zip. If you mean the
uncompress-on-the-fly thing, I think it would be handled the way source
filters are today. In Perl 5.7 you can even have a source filter that
decompresses a gzipped script and runs it. I believe zip could be handled
the same way in 5.7. Perl 6 will certainly improve this area very much.
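
A crude sketch of the idea, minus the filter machinery (assumes
Compress::Zlib's memGunzip; a real version would use Filter::Util::Call so
decompression happens as perl reads the source):

    use Compress::Zlib qw(memGunzip);

    my $file = shift @ARGV;
    open my $fh, '<', $file or die "$file: $!";
    binmode $fh;
    my $src = memGunzip(do { local $/; <$fh> });   # slurp, then inflate
    defined $src or die "$file does not look like gzip data";
    eval $src;                                     # run the inflated script
    die $@ if $@;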

- Branden




Re: Auto-install (was autoloaded...)

2001-02-09 Thread Michael G Schwern

On Fri, Feb 09, 2001 at 10:28:49AM -0200, Branden wrote:
> Another important issue I don't know about yet: Is there an Archive::Zip
> module for Perl? How cross-platform is it? Can we bundle it with Perl
> (licensing issues)? Is it stable? Will it give us the support we need
> (access to individual files in the archive, and the ability to `fake' them
> as real files through tied filehandles)?

I'm a bit more worried about Compress::Zlib, which Archive::Zip uses.
There are versions of that for pretty much every OS, including MacOS,
but it's historically been a bit weird.  I remember there being a Very
Deep XS Bug that we couldn't quite dig out a while ago.  Or maybe it
was in Archive::Tar... one of the two.

Anyhow, I'm not worried about them much.


-- 

Michael G. Schwern   <[EMAIL PROTECTED]>http://www.pobox.com/~schwern/
Any sufficiently encapsulated hack is no longer a hack.



Re: JWZ on s/Java/Perl/

2001-02-09 Thread Ken Fox

Branden wrote:
> I actually don't understand how traversing a graph can be faster than
> incrementing/decrementing/testing for zero on a refcount.

There are two main reasons advanced garbage collectors are fast:

 1. Cheap allocations. Most fast collectors have a one or two
instruction malloc. In C it looks like this:

  void *malloc(size) { void *obj = heap; heap += size; return obj; }

It's easier to do alignments in a macro layer above the allocator
so the allocator doesn't have to constantly re-align to address
boundaries. There is basically no difference between the performance
of heap and stack allocations with a good collector.

 2. Work proportional to live data, not total data. This is hard to
believe for a C programmer, but good garbage collectors don't have
to "free" every allocation -- they just have to preserve the live,
or reachable, data. Some researchers have estimated that 90% or
more of all allocated data dies (becomes unreachable) before the
next collection. A ref count system has to work on every object,
but smarter collectors only work on 10% of the objects.

- Ken



Re: Auto-install (was autoloaded...)

2001-02-09 Thread Branden

This is the alpha version of the PDD about archives. I didn't have time to
format it as POD, and probably won't until Monday; I don't even think I'll
have time to check the lists over the weekend. Nevertheless, I'm sending it
in mail-message format for your appreciation. I think it has most of the
information that was discussed here, plus some thoughts I had.

I expect to collect some comments about it before writing the beta version,
which will be in POD format. I'm starting to think this discussion should
move to another list, like -build or -stdlib. Maybe -source-control is
related too. I realise they're not active, but I actually think this is not
a -language issue. Anyway, I'm posting it here; if anyone thinks we should
move it to another list, just name the list and we'll move there.

Hope you like the text. Help correcting spelling and grammar would be much
appreciated.

- Branden

 (cut here) ---



PDD: `par' -- The Perl Archive


1. Introduction

`par' stands for `Perl Archive'. It's a way to provide deployment and
installation of Perl programs/scripts/modules. It helps take care of
module dependencies, which are mostly painful to handle in Perl 5.



2. Motivation

When a programmer writes a Perl script (or module) and wants to share it
with other users, he probably uploads the code to CPAN, as a .tar.gz file
or something like that. Users are supposed to get the script via http or
ftp, untar it, and run it.

So far so good, but the problem begins when the script depends on modules
that aren't included in Perl's standard library and are supposed to be
installed by users. The first headache starts when the user tries to run
the script and gets a message saying Perl can't find module X. He takes the
time to go to CPAN, download X, install it and try the script again, only
to find that the script also needed module Y, or that X actually needed Z.
This process continues until there are no more needed modules, or until the
user gets tired of it and gives up running the script, whichever comes
first.

This is a downside for developers too. To get around it, they try to give
users installation instructions, telling them they should install modules
X, Y and Z to run the script. And since most lay users don't care about
installing modules, they'll probably also give up running the script. The
outcome is that only scripts that depend solely on standard modules achieve
wide acceptance among lay users.




3. Goals


This PDD proposes a standard way for developers to package their Perl code
so that all needed dependencies are included in one single file, and a
standard way for users to install or run the scripts distributed in this
archive form.

The specification of `par' should satisfy the following properties:

a) It should allow distribution of Perl code in platform independent form.

b) It should allow distribution of Perl code and compiled C extensions in
platform dependent form.

c) It should allow distribution of Perl code in both bytecode and source
form, even both mixed in the same archive.

d) It should allow the user to install the contents of the archive. It
should also allow the user to uninstall an archive or upgrade to a new
version of it. It should allow the user to choose how the modules contained
in the archive will be installed in relation to the system modules, and to
override the directories used for installation.

e) It should allow the user to run the scripts and use the modules contained
in the archive without requiring installation.

f) It should allow the inclusion of documentation and resource files
(images, sounds, dbms, text databases, ...) for all scripts and modules
included in the archive, and documentation for the archive itself.

g) The archive file should be stored in a format that can be easily created
and inspected with widely available external tools.

h) It should allow the developer to merge existing archives together with
other files into new archives, so that the developer can package his
scripts with existing binary archives of the modules they depend on.

i) There should be tools that help the developer find the dependencies of
his scripts, and package the scripts along with the modules they depend on
into an archive.

j) The `par' mechanism should be bundled by default in Perl 6.0.0. Adding
it later would cause problems for users of Perl 6.0.0 who don't have it
installed but want to use archives. Not having anything else ready in
6.0.0 isn't really that critical, because with `par' it will be easy to
install whatever module is missing.





4. Proposed Implementation


The zip file format is proposed for packaging the files. It's platform
independent and has no licensing constraints (must check that...). The
reason to choose zip instead of tar/gzip is that with zip one can access a
single file without having to decompress the whole archive.
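
That random access is what makes running without installing practical. For
illustration, a sketch with Archive::Zip (one candidate module; the .par
name and member paths are just this proposal's convention):

    use Archive::Zip;

    my $zip = Archive::Zip->new('MyApp.par')
        or die "MyApp.par is not a readable zip archive";
    print "$_\n" for $zip->memberNames();    # table of contents, cheap
    # pull out exactly one member; nothing else is decompressed
    $zip->extractMember('script/myapp.pl', '/tmp/myapp.pl');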

Garbage collection (was Re: JWZ on s/Java/Perl/)

2001-02-09 Thread Dan Sugalski

At 12:06 PM 2/9/2001 -0500, Ken Fox wrote:
>Branden wrote:
> > I actually don't understand how traversing a graph can be faster than
> > incrementing/decrementing/testing for zero on a refcount.
>
>There are two main reasons advanced garbage collectors are fast:
>
>  1. Cheap allocations. Most fast collectors have a one or two
> instruction malloc. In C it looks like this:
>
>   void *malloc(size) { void *obj = heap; heap += size; return obj; }
>
> It's easier to do alignments in a macro layer above the allocator
> so the allocator doesn't have to constantly re-align to address
> boundaries. There is basically no difference between the performance
> of heap and stack allocations with a good collector.

This is definitely very true. It cuts out the overhead of free as well, 
since you don't have to free any data (perl pays this with realloc a lot, 
since realloc's a malloc, copy, and free). Plus there's no need to mess 
with any sort of 'allocated memory' list, which malloc and free currently 
need to keep so they don't leak memory.

>  2. Work proportional to live data, not total data. This is hard to
> believe for a C programmer, but good garbage collectors don't have
> to "free" every allocation -- they just have to preserve the live,
> or reachable, data. Some researchers have estimated that 90% or
> more of all allocated data dies (becomes unreachable) before the
> next collection. A ref count system has to work on every object,
> but smarter collectors only work on 10% of the objects.

As is this. (Perl can generate a lot of garbage if you're messing around 
with strings and arrays a lot)

Also, one thing people forget is that manipulating reference counts can get 
expensive. It doesn't seem like much--an integer increment or decrement 
here or there. No big deal, right? Well, that cost tends to add up after a 
while. Its paid in lots of tiny little pieces rather than in a few big 
chunks, but the total time taken by it is larger.

It's also possible that by tossing refcounts we can shrink down the size of 
a perl variable structure (though I know it's not that way now) or at least 
move the GC field to the end, where it's less likely to be loaded. Most 
fast processors these days fetch data into cache in 8 or 16 byte chunks, so 
moving the GC field outside of the active chunk area means we won't be 
loading in dead data (okay, it's only resting!) every time we access a 
variable. There's no point in doing this with perl 5, since it's not dead 
data, but with a non-refcount GC scheme it'll be accessed much less.

Finally, all you really need to do is read the last day or so of p5p where 
Alan's trying to plug a batch of perl memory leaks to see how well the 
refcount scheme seems to be working now...

Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: Garbage collection (was Re: JWZ on s/Java/Perl/)

2001-02-09 Thread Branden

Dan Sugalski wrote:
> At 12:06 PM 2/9/2001 -0500, Ken Fox wrote:
> >  2. Work proportional to live data, not total data. This is hard to
> > believe for a C programmer, but good garbage collectors don't have
> > to "free" every allocation -- they just have to preserve the live,
> > or reachable, data. Some researchers have estimated that 90% or
> > more of all allocated data dies (becomes unreachable) before the
> > next collection. A ref count system has to work on every object,
> > but smarter collectors only work on 10% of the objects.
>
> As is this. (Perl can generate a lot of garbage if you're messing around
> with strings and arrays a lot)
>

Let me see if I got that right. If I change the way some objects are used so
that I tend to create other objects instead of reusing the old ones, I'm
actually not degrading GC performance, since its work is proportional to
live data. Right? This increases memory usage, though, right? Would this
cause some thrashing if the excessive memory usage forces a fall-back to
virtual memory? (I guess not, since live data would probably be accessed,
and dead data would probably be discarded somehow before going to virtual
memory, right?)

What are actually the consequences of generating more or less garbage by
reusing/not reusing structures, under this advanced GC model?

> Finally, all you really need to do is read the last day or so of p5p where
> Alan's trying to plug a batch of perl memory leaks to see how well the
> refcount scheme seems to be working now...

Yeah, I know that... But I actually think this is because Perl 5's
implementation of refcounting is quite messy, especially when weakrefs are in
the game.

- Branden




Bulletin: Nicol reads rfc 271; suggests restricted C<bless>

2001-02-09 Thread David L. Nicol


We're having a blizzard in Kansas City.  After breakfasting, changing
the litter box, and hiking to work through the snow, I am finally
looking at Conway's RFC 271.  Guess which one of the earlier
activities it reminds me of.

I agree with Branden that the inheritance semantics is screwy.
I think DesignByContract contract enforcement is best done with
a parameter typing system of some kind (as in C++).  Proper
objects don't get into invalid states (unless you smash them with
hammers, of course), so with polymorphic subs, where the
types of the arguments are part of the name of the function, and
some restrictions on C<bless>

(Did anyone RFC an extension to C<bless> that allows a type to
define a sanity checking function?)

you can't call a subroutine with an invalid argument.



The C<POST> method of defining a thing that is to be done on exiting
the current scope would make things easier rather than more complex
in my opinion, as discussed previously.


If object validity checking were incorporated into C<bless>, the customary
return of an object


bless $retobj
}

would do an automatic sanity check against the requirements of the current
package.  How about having bless call a standard function, if it exists,
before conferring its blessing: package::TEST?

package compliant;
sub TEST{
# will be called at every blessing.
# compliant objects must have true cheese.
$_[0]->cheese() or die "FALSE CHEESE"
}
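
Until bless itself does this, the proposal can be approximated in user
space (hypothetical checked_bless helper; the TEST above is reused as-is):

    package compliant;
    sub cheese { $_[0]->{cheese} }      # accessor for TEST to interrogate

    package main;

    sub checked_bless {
        my ($ref, $package) = @_;
        my $obj = bless $ref, $package;
        if (my $test = $obj->can('TEST')) {
            $obj->$test();              # dies ("FALSE CHEESE") if invalid
        }
        return $obj;
    }

    my $ok = checked_bless({ cheese => 1 }, 'compliant');   # passes TEST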


-- 
  David Nicol 816.235.1187 [EMAIL PROTECTED]
  "I don't care how they do it in New York"




Re: Garbage collection (was Re: JWZ on s/Java/Perl/)

2001-02-09 Thread Dan Sugalski

At 04:09 PM 2/9/2001 -0200, Branden wrote:
>Dan Sugalski wrote:
> > At 12:06 PM 2/9/2001 -0500, Ken Fox wrote:
> > >  2. Work proportional to live data, not total data. This is hard to
> > > believe for a C programmer, but good garbage collectors don't have
> > > to "free" every allocation -- they just have to preserve the live,
> > > or reachable, data. Some researchers have estimated that 90% or
> > > more of all allocated data dies (becomes unreachable) before the
> > > next collection. A ref count system has to work on every object,
> > > but smarter collectors only work on 10% of the objects.
> >
> > As is this. (Perl can generate a lot of garbage if you're messing around
> > with strings and arrays a lot)
> >
>
>Let me see if I got that right. If I change the way some objects are used so
>that I tend to create other objects instead of reusing the old ones, I'm
>actually not degrading GC performance, since its work is proportional to
>live data. Right?

Correct. Whether reuse is a win overall is a separate question. Generally 
not allocating memory is faster than allocating memory, so reuse may mean 
you don't make a trip to the memory allocator at all. (Or fewer trips, at 
least, as perl's likely to do so for you a few times anyway)

>This increases memory usage, though, right? Would this
>cause some thrashing if the excessive memory usage forces a fall-back to
>virtual memory? (I guess not, since live data would probably be accessed,
>and dead data would probably be discarded somehow before going to virtual
>memory, right?).

It depends on whether the old structures are really unused. If they are, 
one of the GC passes will reclaim the space they're taking. Whether this 
happens before you start swapping's a good question, and that depends on 
your application. Most perl programs won't swap regardless, so it's not a 
big deal most of the time. If your program is large enough to swap, then 
it's time to take rather drastic measures, given how expensive swapping is. 
(It's safe to assume that if perl 6's garbage collector causes otherwise 
small programs to swap then it's busted and needs fixing)

>What are actually the consequences of generating more or less garbage by
>reusing/not reusing structures, under this advanced GC model?

The less memory you chew through the faster your code will probably be (or 
at least you'll have less overhead). Reuse is generally faster and less 
resource-intensive than recycling. What's true for tin cans is true for memory.

> > Finally, all you really need to do is read the last day or so of p5p where
> > Alan's trying to plug a batch of perl memory leaks to see how well the
> > refcount scheme seems to be working now...
>
>Yeah, I know that... But I actually think this is because Perl 5's
>implementation of refcounting is quite messy, especially when weakrefs are in
>the game.

Almost all refcounting schemes are messy. That's one of its problems. A 
mark and sweep GC system tends to be less prone to leaks because of program 
bugs, and when it *does* leak, the leaks tend to be large. Plus the code to 
do the GC work is very localized, which tends not to be the case in 
refcounting schemes.

Going to a more advanced garbage collection scheme certainly isn't a 
universal panacea--mark and sweep in perl 6 will *not* bring about world 
peace or anything. It will (hopefully) make our lives easier, though.

Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: Garbage collection (was Re: JWZ on s/Java/Perl/)

2001-02-09 Thread Nicholas Clark

On Fri, Feb 09, 2001 at 01:19:36PM -0500, Dan Sugalski wrote:
> The less memory you chew through the faster your code will probably be (or 
> at least you'll have less overhead). Reuse is generally faster and less 
> resource-intensive than recycling. What's true for tin cans is true for memory.

reduce, reuse, recycle.
The first R might also be important :-)

Nicholas Clark



Re: Bulletin: Nicol reads rfc 271; suggests restricted C<bless>

2001-02-09 Thread Branden


David L. Nicol wrote:
> We're having a blizzard in Kansas City.  After breakfasting, changing
> the litter box, and hiking to work through the snow, I am finally
> looking at Conway's RFC 271.  Guess which one of the earlier
> activities it reminds me of.
>

I'm really in doubt... Breakfasting?  ;-)



> I agree with Branden that the inheritance semantics is screwy.
> I think DesignByContract contract enforcement is best done with
> a parameter typing system of some kind (as in C++).  Proper
> objects don't get into invalid states (unless you smash them with
> hammers, of course), so with polymorphic subs, where the
> types of the arguments are part of the name of the function, and
> some restrictions on C<bless>
>
> (Did anyone RFC an extension to C<bless> that allows a type to
> define a sanity checking function?)
>
> you can't call a subroutine with an invalid argument.
>

I'm actually working on the underlying structures of Perl variables (part
of the big vtable discussion...), which is leading to the underlying
representation of Perl objects. I think this could be used to allow
implementing Design-By-Contract as an extension instead of a built-in of
the language, and to require the programmer to explicitly specify which
classes use DBC. It's actually an -internals thing, but I'll probably have
to bring the discussion to -language too, since it's much affected by it.





>
> The C<POST> method of defining a thing that is to be done on exiting
> the current scope would make things easier rather than more complex
> in my opinion, as discussed previously.
>

In my point of view, the mechanism proposed in RFC 271 is only good for
DBC, mainly because of its inheritance semantics, so I agree with you that
RFC 271 is not the best way to define the thing to be done on exiting the
scope. (Please note I'm not supporting POST either; I think I'd probably go
with `continue', but I really don't want to enter that fight, I'm trying to
be a pacifist :-)



>
> If object validity checking were incorporated into C<bless>, the customary
> return of an object
> [snip]
>

I don't think I quite understood you. Are you thinking `new' is the only one
that would have pre/post to validate arguments? Actually every method call
would potentially have pre/post blocks, by RFC 271. Did I get you wrong?

Hope I helped,

- Branden




Re: JWZ on s/Java/Perl/

2001-02-09 Thread Branden

Ken Fox wrote:
>  2. Work proportional to live data, not total data. This is hard to
> believe for a C programmer, but good garbage collectors don't have
> to "free" every allocation -- they just have to preserve the live,
> or reachable, data. Some researchers have estimated that 90% or
> more of all allocated data dies (becomes unreachable) before the
> next collection. A ref count system has to work on every object,
> but smarter collectors only work on 10% of the objects.

Does this 90/10 ratio mean that memory usage is actually 10 times what it
needs to be? (if it were even _possible_ to pack all the data without
fragmentation problems)

- Branden




Re: JWZ on s/Java/Perl/

2001-02-09 Thread abigail

On Fri, Feb 09, 2001 at 12:06:12PM -0500, Ken Fox wrote:
> 
>  2. Work proportional to live data, not total data. This is hard to
> believe for a C programmer, but good garbage collectors don't have
> to "free" every allocation -- they just have to preserve the live,
> or reachable, data. Some researchers have estimated that 90% or
> more of all allocated data dies (becomes unreachable) before the
> next collection. A ref count system has to work on every object,
> but smarter collectors only work on 10% of the objects.

So, it's more a data preserver than a garbage collector ;-)


Abigail



Re: Auto-install (was autoloaded...)

2001-02-09 Thread Branden


I had time to do some research on the Internet about rpm/jar. The correct
URLs are:
* http://www.rpm.org
* http://java.sun.com/products/jdk/1.1/docs/guide/jar/

I found great utilities at http://www.rpm.org/software.html; we could
probably steal some of them for `par'. I found out that most of them are
written in Perl or are otherwise Perl-related.
* rpm2html / rpmfind: The equivalent of CPAN.pm?
* rpm-cvs: Building an RPM from a CVS source tree.
* AutoRpm: Update a system with rpms from any source.
* rpmlint: A lint for rpm packages.
* Perl-RPM: An attempt at Perl bindings for RPM.
* build-package: Build rpms from source packages.


I put together a comparison table of par, rpm and jar. I would include tgz
here, but I didn't want to break the 70-column barrier, and I think it
would probably be No for most features anyway... I intend to include the
table with the beta PDD.



+----------------------+-------------+-------------+-------------+
|  Comparison table:   |     par     |     rpm     |     jar     |
+----------------------+-------------+-------------+-------------+
| Platform independent |     Yes     |     Yes     |     Yes     |
|                      |             |             |             |
| Available in a wide  |     Yes     |     No      |     Yes     |
| range of platforms   |             | (Win32 +/-, |             |
|                      |             | MacOS, VMS) |             |
|                      |             |             |             |
| Allows platform-     |     Yes     |     Yes     |     No      |
| dependent deployment |             |             |             |
|                      |             |             |             |
| Supports binary,     |     Yes     |     Yes     |     No      |
| source and bytecode  |             |             |  (source?)  |
|                      |             |             |             |
| Installs archive     |     Yes     |     Yes     |     No      |
| automatically        |             |             |  (manually) |
|                      |             |             |             |
| Uninstall and        |     Yes     |     Yes     |     No      |
| upgrade archive      |             |             |             |
|                      |             |             |             |
| Install, uninstall   |     No      |     Yes     |     No      |
| and upgrade scripts  | (possibly)  |             |             |
|                      |             |             |             |
| Run from archive     |     Yes     |     No      |     Yes     |
|                      |             |             |             |
| Resources            |     Yes     |     Yes     |     Yes     |
|                      |             |             |             |
| Documentation        |     Yes     |     Yes     |     No      |
|                      |             |             |             |
| Supports several     |     Yes     |     No      |     Yes     |
| modules per archive  |             |    (yes)    | (packages)  |
|                      |             |             |             |
| Merge many archives  |     Yes     |     No      |     Yes     |
| into one             |             |             |             |
|                      |             |             |             |
| Usable with external |     Yes     |     No      |     Yes     |
| tools (e.g. WinZip)  |             |             |             |
|                      |             |             |             |
| Dependencies of      |     Yes     |     Yes     |     No      |
| the archive          |  (included) |             |             |
|                      |             |             |             |
| Build archive from   |     Yes     |     Yes     |     No      |
| source tree          |             |  (external) |             |
|                      |             |             |             |
| Could be bundled     |     Yes     |  Probably   |  Maybe (if  |
| with Perl 6?         |             |     No      |  we bundle  |
|                      |             |  (too big)  |  a JVM too) |
|                      |             |             |             |
| Signed archives      |     No      |     No      |     Yes     |
+----------------------+-------------+-------------+-------------+


- Branden





Re: Auto-install (was autoloaded...)

2001-02-09 Thread Jarkko Hietaniemi

On Fri, Feb 09, 2001 at 06:17:34PM -0200, Branden wrote:
> 
> I had time to do some research on the Internet about rpm/jar. The correct
> URLs are:
> * http://www.rpm.org
> * http://java.sun.com/products/jdk/1.1/docs/guide/jar/
> 
> I found great utilities at http://www.rpm.org/software.html; we could
> probably steal some of them for `par'. I found out that most of them are
> written in Perl or are otherwise Perl-related.
> * rpm2html / rpmfind: The equivalent of CPAN.pm?
> * rpm-cvs: Building an RPM from a CVS source tree.
> * AutoRpm: Update a system with rpms from any source.
> * rpmlint: A lint for rpm packages.
> * Perl-RPM: An attempt at Perl bindings for RPM.
> * build-package: Build rpms from source packages.

There's also a tool called 'alien':

http://freshmeat.net/projects/alien/

> Alien converts between the rpm, deb, Stampede slp, and Slackware tgz
> file formats. If you want to use a package from another distribution
> than the one you have installed on your system, you can use alien to
> convert it to your preferred package format and install it.

Whatever we do I would much prefer being package format agnostic
instead of tying ourselves too tightly with some single format.

-- 
$jhi++; # http://www.iki.fi/jhi/
# There is this special biologist word we use for 'stable'.
# It is 'dead'. -- Jack Cohen



Re: restricted C<bless>

2001-02-09 Thread David L. Nicol

Branden wrote:

> >
> > If object validity checking were incorporated into C<bless>, the customary
> > return of an object
> > [snip]
> >
> 
> I don't think I quite understood you. Are you thinking `new' is the only one
> that would have pre/post to validate arguments? Actually every method call
> would potentially have pre/post blocks, by RFC 271. Did I get you wrong?


bless is the (only) way to associate a reference with a package.

Currently, if I set up a package that expects all the objects defined by it
to have some representation or other (representation R), there is nothing
stopping me (or someone else who is being rude) from blessing some
noncompliant reference into the package, resulting in the necessity of all
the consistency checking that the pre-handlers would be for (or did I get
that wrong?)

Robust object methods have to do a lot of consistency checking because there
is no guarantee that the reference labeled "duck" isn't really a chicken with
softer feathers glued to it.

What I'm arguing is that stronger typing, enforcing a correlation between
the class label and internal data consistent with expectations, would move
the pre and post handlers from a procedural specification into a more
functional one.

We would, as in C++ or Java, define a new kind of object that embodies the
whole of the contract between the two components exchanging the message.

If the receiving function cannot be invoked unless it is invoked with an
argument of type ExpectedType, and the only way to get an argument into
that type is to have it pass the required test, whatever that is, as
defined within the ExpectedType specification, then every contract gets its
own object type and any pre- or post-processing becomes implied by
conversion wrappers.
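
In Perl 5 terms, the contract object might look like this (a sketch;
ExpectedType and quack are made-up names):

    use Scalar::Util ();    # for blessed()

    package ExpectedType;

    sub new {
        my ($class, $value) = @_;
        # the contract lives in one place: you can't hold an
        # ExpectedType that never passed the test
        die "contract violated: need something that can quack()"
            unless Scalar::Util::blessed($value) && $value->can('quack');
        bless { value => $value }, $class;
    }

    sub value { $_[0]->{value} }

    package main;

    sub receiver {
        my ($arg) = @_;
        die "receiver() wants an ExpectedType"
            unless Scalar::Util::blessed($arg) && $arg->isa('ExpectedType');
        $arg->value->quack;     # no further checking needed here
    }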



Am I making sense?  I worry I am rushing.


-- 
  David Nicol 816.235.1187 [EMAIL PROTECTED]
  "I don't care how they do it in New York"




Re: Auto-install (was autoloaded...)

2001-02-09 Thread Branden

Jarkko Hietaniemi wrote:
> Whatever we do I would much prefer being package format agnostic
> instead of tying ourselves too tightly with some single format.
>

Any ideas on how to do that without breaking requirements?

I actually agree that being able to use tgz is good, and better compression
like bzip2 would also be desirable, but I think we couldn't use a format
that lacks features we want. For example, with tgz it would be complex to
deal with running without extracting, and bzip2 isn't handled by common,
widely available tools.

I think we'd better have one working and established (and stable) approach
than have too much flexibility and end up with some distribution
problems (like `oh! I don't have bzip2 and the developer only supplied a
bzip2 version of the archive', or `oh! I'll have to do zip, tgz, bzip2,
whatever3 versions of the same thing only to satisfy users that use
different formats').

I think we should go for `standard' rather than `flexible' on this one.

Of course we can still change the file format if someone isn't happy about
zip. That's the advantage of discussing it before building it: it's not
set in stone yet for us to regret...

- Branden




more POST recitation

2001-02-09 Thread David L. Nicol

Johan Vromans wrote:

> Would the POST be executed if the open fails? Why? Why not?
> 
>sub readit {
> POST {
> close F;
> }
> open F, "< $f" or die;
> scalar(<F>)
>}

Yes, because the flow of control passed it before the exception.
POST, as I see it, means: push a reference to the code in its block
onto a list of things to do when exiting the current scope, with no
regard to GC.
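
Perl 5 can approximate that today with an object whose destructor fires at
scope exit (a sketch; real POST, as specced, would not lean on DESTROY
timing the way this does):

    package Guard;
    sub new     { my ($class, $code) = @_; bless { code => $code }, $class }
    sub DESTROY { $_[0]->{code}->() }

    package main;

    sub readit {
        my ($f) = @_;
        open F, "< $f" or die;
        my $post = Guard->new(sub { close F });   # fires when $post dies,
        return scalar(<F>);                       # even if we die first
    }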


 
> But I think this is getting ridiculous. $slightly_joking++; I'd
> propose a much nicer and cleaner concept:
> 
> sub readit {
> open F ... ;
> prog1 {
> scalar(<F>);
> close F;
> }
> }
> 
> 'prog1' executes all expressions in the block, and returns the result
> of the first expression. Of course, we also need prog2, prog3, and so
> on ... $slightly_joking--;
> 
> All that POST and such do, is obfuscate the flow of control. I doubt
> that outweighs the small benefits.
> 
> -- Johan

do BEGIN and END obfuscate the flow of control?



   If constructors can have arbitrary names, then why not
   destructors?  Because while a constructor is explicitly
   called, a destructor is not.  Destruction happens
   automatically via Perl's garbage collection (GC) system,
   which is a quick but somewhat lazy reference-based GC
   system.  To know what to call, Perl insists that the
   destructor be named DESTROY.  Perl's notion of the right
   time to call a destructor is not well-defined currently,
   which is why your destructors should not rely on when they
   are called.

   Why is DESTROY in all caps?  Perl on occasion uses purely
   uppercase function names as a convention to indicate that
   the function will be automatically called by Perl in some
   way.  Others that are called implicitly include BEGIN,
   END, AUTOLOAD ...




The twisty-turny things you have to do in order to have cleanup
code right now prevent a subroutine that needs cleanup code from
having multiple exit points, unless the cleanup code is listed
at each one.


With POST, you explicitly declare cleanup code as soon as you see
you will need it, and you don't have to worry about your lock
getting forgotten about or your file remaining open.



# without POST
sub find_first_line_matching_array($\@){
    open F, shift or die "could not open: $!";
    while(<F>){
        foreach $w (@{$_[0]}){
            if (/$w/){
                close F;
                return $_;
}   }   }   }

# with POST
sub find_first_line_matching_array($\@){
    open F, shift or die "could not open: $!";
    POST{close F};
    while(<F>){
        foreach $w (@{$_[0]}){
            return $_ if /$w/;
}   }   }

The above is a very contrived way to have multiple exit points; but
tracking $retval so you can hit your cleanup code is a drag.

Many times I've chosen to use something like C<goto> to hop down to the end
of the sub, rather than repeat my cleanup code before the return, several
times, in inconvenient blocks.

POST would not be intended for flow-of-control; it is intended
to raise the code's s/n ratio by abstracting away the housekeeping.




Re: Auto-install (was autoloaded...)

2001-02-09 Thread Jarkko Hietaniemi

On Fri, Feb 09, 2001 at 06:46:26PM -0200, Branden wrote:
> Jarkko Hietaniemi wrote:
> > Whatever we do I would much prefer being package format agnostic
> > instead of tying ourselves too tightly with some single format.
> >
> 
> Any ideas on how to do that without breaking requirements?

There isn't a software problem another abstraction layer can't fix...
Design and implement a package-format-independent API that has calls
like %files = $package->getfiles(), $package->getfile($file),
$package->gettotalsize(), $package->dependencies(),
$package->install(), $package->uninstall(), and at lower levels
(non-casual-user) calls like guessformat($file).  Yes, this would be
harder than sticking with one format, but so much more flexible.
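
The caller-visible half might be as small as this (a sketch; every name is
a straw man, with a zip driver as the first backend):

    package Perl::Package;

    my %driver = (                      # one driver class per format
        zip => 'Perl::Package::Zip',
        # tgz => 'Perl::Package::Tgz', rpm => ..., and so on
    );

    sub guessformat {
        my ($file) = @_;
        return 'zip' if $file =~ /\.(?:zip|par)$/;     # real code would
        return 'tgz' if $file =~ /\.(?:tgz|tar\.gz)$/; # sniff magic bytes
        return 'unknown';
    }

    sub open {
        my ($class, $file) = @_;
        my $fmt = guessformat($file);
        my $impl = $driver{$fmt} or die "unknown package format '$fmt'";
        return $impl->new($file);       # identical API from here on
    }

    package Perl::Package::Zip;         # the first concrete driver
    use Archive::Zip;

    sub new {
        my ($class, $file) = @_;
        my $zip = Archive::Zip->new($file) or die "can't read $file";
        bless { zip => $zip }, $class;
    }
    sub getfiles { $_[0]{zip}->memberNames }
    sub getfile  { $_[0]{zip}->contents($_[1]) }

Adding bzip2 later would mean writing one more driver, not redesigning `par'.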

An API and its implementation would be very beneficial also in
installing other software than just Perl.

> problems (like `oh! I don't have bzip2 and the developer only supplied a
> bzip2 version of the archive', or `oh! I'll have to do zip, tgz, bzip2,
> whatever3 versions of the same thing only to satisfy users that use
> different formats').

And how is this different from 'oh! I don't have unzip'?

> I think we should go for `standard' rather than `flexible' on this one.

As I said on this I disagree.

I have no problem with in the first implementation round doing just zip:
what I abhor is the idea of hardwiring the zip assumption deeply into
our design.

> Of course we can still change the file format if someone isn't happy about
> zip. That's the advantage of discussing it before building it: it's not
> set in stone yet for us to regret...
> 
> - Branden

-- 
$jhi++; # http://www.iki.fi/jhi/
# There is this special biologist word we use for 'stable'.
# It is 'dead'. -- Jack Cohen



Re: JWZ on s/Java/Perl/

2001-02-09 Thread David L. Nicol

[EMAIL PROTECTED] wrote:

> So, it's more a data preserver than a garbage collector ;-)
> 
> Abigail


I find it odd that perl mallocs each string individually, for instance;
I had thought that it would malloc only massive pieces and do its own
allocation and freeing within them.  It's laziness, of course.


-- 
  David Nicol 816.235.1187 [EMAIL PROTECTED]
  "I don't care how they do it in New York"




Re: Auto-install (was autoloaded...)

2001-02-09 Thread John Porter

Jarkko Hietaniemi wrote:
> 
> There isn't a software problem another abstraction layer can't fix...

"...except the problem of too many layers of abstraction". tchrist

(for those of you who didn't get the reference)


-- 
John Porter




Re: JWZ on s/Java/Perl/

2001-02-09 Thread Dan Sugalski

At 05:29 PM 2/9/2001 -0200, Branden wrote:
>Ken Fox wrote:
> >  2. Work proportional to live data, not total data. This is hard to
> > believe for a C programmer, but good garbage collectors don't have
> > to "free" every allocation -- they just have to preserve the live,
> > or reachable, data. Some researchers have estimated that 90% or
> > more of all allocated data dies (becomes unreachable) before the
> > next collection. A ref count system has to work on every object,
> > but smarter collectors only work on 10% of the objects.
>
>Does this 90/10 ratio mean that memory usage is actually 10 times what it
>needs to be? (if it were even _possible_ to pack all the data without
>fragmentation problems)

No. It means that 90% of the memory allocated between runs of the GC gets 
freed. The memory isn't wasted, by any means.

Most memory is allocated for ephemeral things--objects that come and go, 
temporary buffers, scratch space, and suchlike things. It's normal.

Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: Auto-install (was autoloaded...)

2001-02-09 Thread John Porter

Branden wrote:
> 
> For example, with tgz it would be complex to deal
> with running without extracting, 

What?  tar -z not good enough for you?


> ... than have too much flexibility and end up with some distribution
> problems (like `oh! I don't have bzip2 and the developper only supplied a
> bzip2 version of the archive', or `oh! I'll have to do zip, tgz, bzip2,
> whatever3 versions of the same thing only to satisfy users that use
> different formats').

You way overstate the difficulty of this problem.
And you also way overstate the alleged downside of having a flexible
solution.


-- 
John Porter




Re: Garbage collection (was Re: JWZ on s/Java/Perl/)

2001-02-09 Thread Dan Sugalski

At 06:30 PM 2/9/2001 +, Nicholas Clark wrote:
>On Fri, Feb 09, 2001 at 01:19:36PM -0500, Dan Sugalski wrote:
> > The less memory you chew through the faster your code will probably be (or
> > at least you'll have less overhead). Reuse is generally faster and less
> > resource-intensive than recycling. What's true for tin cans is true for
> > memory.
>
>reduce, reuse, recycle.
>The first R might also be important :-)

Oh, no doubt. Everything's got tradeoffs, the question is always "what's 
most important". In perl's case, it's speed, and memory usage is of 
secondary importance unless it impacts the speed of the program.

Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: Auto-install (was autoloaded...)

2001-02-09 Thread Jarkko Hietaniemi

On Fri, Feb 09, 2001 at 04:05:54PM -0500, John Porter wrote:
> Jarkko Hietaniemi wrote:
> > 
> > There isn't a software problem another abstraction layer can't fix...
> 
> "...except the problem of too many layers of abstraction". tchrist
> 
> (for those of you who didn't get the reference)

Well, I certainly heard the reference before even hearing of Perl or Tom...

-- 
$jhi++; # http://www.iki.fi/jhi/
# There is this special biologist word we use for 'stable'.
# It is 'dead'. -- Jack Cohen



Re: Auto-install (was autoloaded...)

2001-02-09 Thread John Porter

Jarkko Hietaniemi wrote:
> > 
> > (for those of you who didn't get the reference)
> 
> Well, I certainly heard the reference before even hearing of Perl or Tom...

I only ever saw it with his name on it.
So if he didn't coin it, then I think he "appropriated" it...

-- 
John Porter




Re: JWZ on s/Java/Perl/

2001-02-09 Thread Ken Fox

Branden wrote:
> Ken Fox wrote:
> > Some researchers have estimated that 90% or
> > more of all allocated data dies (becomes unreachable) before the
> > next collection. A ref count system has to work on every object,
> > but smarter collectors only work on 10% of the objects.
> 
> Does this 90/10 ratio mean that memory usage is actually 10 times what it
> needs to be? (if it were even _possible_ to pack all the data without
> fragmentation problems)

The general rule is the more space you "waste" the faster the collector
is. If you have memory to spare, then don't run the garbage collector as
often and your program will spend less total time garbage collecting.
In other words, the collection cost per object approaches zero.

If you "need" to go faster, then waste more memory.

If you "need" to use less memory, then go slower and collect more
frequently.

When comparing the memory management efficiency of different approaches,
it's very important to remember all the costs that the approaches have.
C-style malloc has quite a bit of overhead per object and tends to
fragment the heap. Many garbage collectors don't have either of these
problems.

Garbage collectors are very good from an efficiency perspective, but
tend to be unreliable in a mixed language environment and sometimes
impose really nasty usage requirements.

- Ken



Re: Auto-install (was autoloaded...)

2001-02-09 Thread Nicholas Clark

On Fri, Feb 09, 2001 at 04:07:51PM -0500, John Porter wrote:
> Branden wrote:
> > 
> > For example, with tgz it would be complex to deal
> > with running without extracting, 
> 
> What?  tar -z not good enough for you?

I believe that that wognt work ugnless you have gnu tar.

"Get New Utilities" as I've heard it called by the advocates.
(yes, I get irritated when I forget that BSD cp doesn't have a -a flag)
However, in our portability quest "Get New Utilities" is not good
[unless we package TomC's Perl Power Tools and just use those]

Nicholas Clark



Re: Auto-install (was autoloaded...)

2001-02-09 Thread Jarkko Hietaniemi

On Fri, Feb 09, 2001 at 04:09:28PM -0500, John Porter wrote:
> Jarkko Hietaniemi wrote:
> > > 
> > > (for those of you who didn't get the reference)
> > 
> > Well, I certainly heard the reference before even hearing of Perl or Tom...
> 
> I only ever saw it with his name on it.

I believe the first part of it (adding layers) is an age-old
programming adage, the second part (not overdoing it) may well
be Tom's addition.

-- 
$jhi++; # http://www.iki.fi/jhi/
# There is this special biologist word we use for 'stable'.
# It is 'dead'. -- Jack Cohen



Re: Auto-install (was autoloaded...)

2001-02-09 Thread Dan Sugalski

At 11:32 AM 2/9/2001 -0200, Branden wrote:
>Nicholas Clark wrote:
> > > I really don't know about: on the same platform, do different compilers
> > > generate incompatible binaries? Because if that happens (and will still
> > > happen in Perl 6), the platform identification should be
> > > os/cpu/compiler. Perhaps each
> >
> > on perl 5 different configure options generate different binaries.
>
>Can this be standardized somehow? Perl 6 binary-level compatibility for
>extensions? In a way that makes distributing binary modules at least
>possible? It's already big trouble to consider the various os/cpu
>combinations; adding configure options to the mix is not feasible at all!

Perl, generally speaking, should have a mechanism to indicate compatibility 
based on various compilation issues. Given that this sort of thing impacts 
my platform of choice (VMS) more than others, it will.

>Compilers I can accept: I don't actually expect gcc and MS VC++ to
>generate compatible DLLs, although I think it should be possible...

I'm always amazed that people put up with this sort of nonsense. All 
compilers on a platform should generate compatible code. I don't really 
care whether it's COBOL, PL/I, ADA, C, Fortran, or BASIC.


Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: Auto-install (was autoloaded...)

2001-02-09 Thread Dan Sugalski

At 09:42 AM 2/9/2001 +, Michael G Schwern wrote:
>On Thu, Feb 08, 2001 at 01:40:52PM -0500, Dan Sugalski wrote:
> > >Separated documentation is no documentation.
> >
> > At some point things are going to get split out, unless you wedge the docs
> > into the actual program itself. (You were, after all, talking about config
> > files and XS modules, and those can't usefully stay inside the archive)
>
>True, but those things are essential to making the program work, and
>their loss will be dearly (and immediately) noted.  Documentation is
>the first thing to get ditched in any install process.  I was mostly
>thinking perldoc (yes, wedging the docs into the program) *plus* the
>normal man page generation and install.  That's how I normally do
>things.

So, then, do you want the bytecode format to have some sort of text segment 
in it so you can compile data files (and docs are data as much as anything 
else) into the output bytecode file? We can do that, I expect.

Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: JWZ on s/Java/Perl/

2001-02-09 Thread Robin Berjon

At 16:16 09/02/2001 -0500, Ken Fox wrote:
>The general rule is the more space you "waste" the faster the collector
>is. If you have memory to spare, then don't run the garbage collector as
>often and your program will spend less total time garbage collecting.
>In other words, the collection cost per object approaches zero.
>
>If you "need" to go faster, then waste more memory.
>
>If you "need" to use less memory, then go slower and collect more
>frequently.

Which (to me) seems to just beg the question: is this something that
the oft-discussed use {less,more} {memory,speed} pragma could hook into?

-- robin b.
Heisenberg might have been here.




Re: Auto-install (was autoloaded...)

2001-02-09 Thread Nicholas Clark

On Fri, Feb 09, 2001 at 02:53:43PM -0600, Jarkko Hietaniemi wrote:
> On Fri, Feb 09, 2001 at 06:46:26PM -0200, Branden wrote:
> > problems (like `oh! I don't have bzip2 and the developer only supplied a
> > bzip2 version of the archive', or `oh! I'll have to do zip, tgz, bzip2,
> > whatever' -- 3 versions of the same thing only to satisfy users that use
> > different formats').
> 
> And how is this different from 'oh! I don't have unzip'?

Code to do unzip (yes, even including the whole of zlib just like gcc,
xfree86 and several other things I can't remember offhand that irritate
me as I have libz.so) is small enough to go in the perl core if needed.
It's smaller than Encode by a long way :-)
[or is that :-(]
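
For scale, in-memory decompression via Compress::Zlib is only a few
lines (a sketch; the file name is made up):

use Compress::Zlib;

open my $fh, '<', 'payload.gz' or die "open: $!";
binmode $fh;
my $gz = do { local $/; <$fh> };        # slurp the compressed bytes
defined(my $plain = Compress::Zlib::memGunzip($gz))
    or die "not gzip data?";
print length($plain), " bytes after decompression\n";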


> > I think we should go for `standard' rather than `flexible' on this one.
> 
> As I said, on this I disagree.
> 
> I have no problem with doing just zip in the first implementation
> round: what I abhor is the idea of hardwiring the zip assumption deeply
> into our design.

I would agree - hardwiring is to be avoided.

Nicholas Clark



Re: Auto-install (was autoloaded...)

2001-02-09 Thread Jarkko Hietaniemi

On Fri, Feb 09, 2001 at 09:18:55PM +, Nicholas Clark wrote:
> On Fri, Feb 09, 2001 at 04:07:51PM -0500, John Porter wrote:
> > Branden wrote:
> > > 
> > > For example, with tgz it would be complex to deal
> > > with running without extracting, 
> > 
> > What?  tar -z not good enough for you?
> 
> I believe that that wognt work ugnless you have gnu tar.

So very true.  Good people, the world is not Linux, as it was not VAX.
(one may have "gzip" in which case one can "gzip -dc foo.tgz|tar ...")

We must be flexible/robust/friendly.

-- 
$jhi++; # http://www.iki.fi/jhi/
# There is this special biologist word we use for 'stable'.
# It is 'dead'. -- Jack Cohen



Re: Auto-install (was autoloaded...)

2001-02-09 Thread Jarkko Hietaniemi

On Fri, Feb 09, 2001 at 09:22:13PM +, Nicholas Clark wrote:
> On Fri, Feb 09, 2001 at 02:53:43PM -0600, Jarkko Hietaniemi wrote:
> > On Fri, Feb 09, 2001 at 06:46:26PM -0200, Branden wrote:
> > > problems (like `oh! I don't have bzip2 and the developer only supplied a
> > > bzip2 version of the archive', or `oh! I'll have to do zip, tgz, bzip2,
> > > whatever' -- 3 versions of the same thing only to satisfy users that use
> > > different formats').
> > 
> > And how is this different from 'oh! I don't have unzip'?
> 
> Code to do unzip (yes, even including the whole of zlib just like gcc,
> xfree86 and several other things I can't remember offhand that irritate
> me as I have libz.so) is small enough to go in the perl core if needed.

I assume the next logical thing to add would be MD5?

-- 
$jhi++; # http://www.iki.fi/jhi/
# There is this special biologist word we use for 'stable'.
# It is 'dead'. -- Jack Cohen



Re: Auto-install (was autoloaded...)

2001-02-09 Thread Nicholas Clark

On Fri, Feb 09, 2001 at 03:25:42PM -0600, Jarkko Hietaniemi wrote:
> I assume the next logical thing to add would be MD5?

Yes, well before zip (IMHO). It would make CPAN.pm a happy bunny, and should
make the world (feel) more secure.

This is really perl5 territory now, isn't it?
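
For concreteness, the sort of check CPAN.pm could do with Digest::MD5
in the core (a sketch; the archive name and expected digest are made
up, and in practice the digest would come from a signed CHECKSUMS file):

use Digest::MD5;

my $file     = 'Some-Module-1.00.tar.gz';
my $expected = 'd41d8cd98f00b204e9800998ecf8427e';   # placeholder value

open my $fh, '<', $file or die "open $file: $!";
binmode $fh;
my $got = Digest::MD5->new->addfile($fh)->hexdigest;
die "checksum mismatch for $file\n" unless $got eq $expected;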

Nicholas Clark



Re: Garbage collection (was Re: JWZ on s/Java/Perl/)

2001-02-09 Thread Ken Fox

Dan Sugalski wrote:
> At 04:09 PM 2/9/2001 -0200, Branden wrote:
> > If I change the way some objects are used so
> > that I tend to create other objects instead of reusing the old ones, I'm
> > actually not degrading GC performance, since its work is proportional to
> > live data. Right?
> 
> Correct. Whether reuse is a win overall is a separate question.

It's totally dependent upon hardware. From a software big-O type of
analysis, creating new objects is never slower than reusing objects.

The problems come about if (a) memory is low and the OS decides to
page without telling the application to prepare for paging or (b) if all
memory isn't the same speed, e.g. caches are faster than main memory.

> > This increases memory usage, though, right? Would this
> > cause some thrashing if the excessive memory usage causes degrading to
> > virtual memory? ...
> 
> It depends on whether the old structures are really unused. If they are,
> one of the GC passes will reclaim the space they're taking.

It also depends on locality of reference. Semi-space-based collectors
are not bad at preserving locality -- mark-sweep and malloc-like allocators
are terrible.

The weird thing is that a collector can actually *improve* locality by
moving objects "close" to the things they refer to. In perl's case, the
collector could move the underlying value representation close to the PMC
that refers to it. (But we may want to pin a PMC so that foreign code
can keep references to it. Argh.)

> (It's safe to assume that if perl 6's garbage collector causes otherwise
> small programs to swap then it's busted and needs fixing)

If you mean small as in "tight loop" then I agree. If you mean small as
in a "quick one liner" then I'm not sure. The quick one liners run quickly
and speeding memory management up/down by 100% might not even be noticeable.

> The less memory you chew through the faster your code will probably be (or
> at least you'll have less overhead). Reuse is generally faster and less
> resource-intensive than recycling. What's true for tin cans is true for memory.

The electrons are re-used whether you allocate a new object or not... ;)

> Going to a more advanced garbage collection scheme certainly isn't a
> universal panacea--mark and sweep in perl 6 will *not* bring about world
> peace or anything. It will (hopefully) make our lives easier, though.

Mark-sweep doesn't have a cheap allocator or good locality. At this point
in history, I think if we don't go with a more advanced system we're not
learning.

- Ken



Re: JWZ on s/Java/Perl/

2001-02-09 Thread Dan Sugalski

At 10:21 PM 2/9/2001 +0100, Robin Berjon wrote:
>At 16:16 09/02/2001 -0500, Ken Fox wrote:
> >The general rule is the more space you "waste" the faster the collector
> >is. If you have memory to spare, then don't run the garbage collector as
> >often and your program will spend less total time garbage collecting.
> >In other words, the collection cost per object approaches zero.
> >
> >If you "need" to go faster, then waste more memory.
> >
> >If you "need" to use less memory, then go slower and collect more
> >frequently.
>
>Which (to me) seems to just beg the question: is this something that
>the oft-discussed use {less,more} {memory,speed} pragma could hook into?

Sure. Using it to alter the frequency of garbage collection's not an 
inappropriate thing to do.
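
A sketch of what that hook might look like; the Perl6::GC call is
imaginary, and today's `use less' is a no-op pragma:

package less;

sub import {
    my ($class, $what) = @_;
    if (defined $what && $what eq 'memory') {
        # Collect sooner: trade speed for a smaller footprint.
        Perl6::GC::set_threshold(1 * 1024 * 1024);   # imaginary API
    }
}

1;

# User code would then just say:  use less 'memory';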

Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: Auto-install (was autoloaded...)

2001-02-09 Thread abigail

On Fri, Feb 09, 2001 at 09:22:13PM +, Nicholas Clark wrote:
> 
> Code to do unzip (yes, even including the whole of zlib just like gcc,
> xfree86 and several other things I can't remember offhand that irritate
> me as I have libz.so) is small enough to go in the perl core if needed.


Even after reading this twice, I still get the impression you think
both gcc and xfree86 would fit in the perl core snugly.

Oh well, as long as you don't fit emacs in it; there's got to be some
limit.


Abigail



Re: Garbage collection (was Re: JWZ on s/Java/Perl/)

2001-02-09 Thread Dan Sugalski

At 04:53 PM 2/9/2001 -0500, Ken Fox wrote:
>Dan Sugalski wrote:
> > At 04:09 PM 2/9/2001 -0200, Branden wrote:
> > > If I change the way some objects are used so
> > > that I tend to create other objects instead of reusing the old ones, I'm
> > > actually not degrading GC performance, since its work is proportional to
> > > live data. Right?
> >
> > Correct. Whether reuse is a win overall is a separate question.
>
>It's totally dependent upon hardware. From a software big-O type of
>analysis, creating new objects is never slower than reusing objects.

Yeah, but while O(x*n) and O(y*n) are theoretically equivalent (assuming 
x and y are constants), it can make a biggish difference in real life if 
x and y are, say, 1 and 4. Some objects do have non-trivial setup costs. 
(Such as those that open persistent, reusable connections to remote systems...)

Granted in this case we're talking about pure memory allocation issues, as 
that's all that GCs cover, so you probably won't see more than a few 
hundred cycles difference for that.
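
For a feel of the constant factor, a quick Benchmark sketch (numbers
will vary wildly by perl build and platform):

use Benchmark 'timethese';

my %reused;
timethese(500_000, {
    fresh => sub { my %h   = (a => 1, b => 2) },   # allocate per call
    reuse => sub { %reused = (a => 1, b => 2) },   # recycle one hash
});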

>The problems come about if (a) memory is low and the OS decides to
>page without telling the application to prepare for paging or (b) if all
>memory isn't the same speed, e.g. caches are faster than main memory.
>
> > > This increases memory usage, though, right? Would this
> > > cause some thrashing if the excessive memory usage causes degrading to
> > > virtual memory? ...
> >
> > It depends on whether the old structures are really unused. If they are,
> > one of the GC passes will reclaim the space they're taking.
>
>It also depends on locality of reference. Semi-space-based collectors
>are not bad at preserving locality -- mark-sweep and malloc-like allocators
>are terrible.

I'm not sure what scheme we'll be using, but some sort of generational 
semi-space scheme seems likely, if for no other reason than to reduce the 
pauses we'll get when GC fires off.
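
For illustration only (nothing like the real engine), here is a toy
Cheney-style copy in Perl over an explicit object graph. It shows the
two properties under discussion: work is proportional to live data,
and survivors end up packed together in to-space:

use Scalar::Util 'refaddr';

# Objects are hashes with a 'kids' array of references. Everything
# reachable from @$roots is evacuated; garbage is never even visited.
sub evacuate {
    my ($roots) = @_;
    my (%forward, @to_space, @queue);

    my $copy = sub {
        my $old  = shift;
        my $addr = refaddr($old);
        unless ($forward{$addr}) {
            my $new = { payload => $old->{payload}, kids => [] };
            $forward{$addr} = $new;        # install first, so cycles terminate
            push @to_space, $new;
            push @queue, [ $old, $new ];   # scan children breadth-first
        }
        return $forward{$addr};
    };

    my @new_roots = map { $copy->($_) } @$roots;
    while (my $pair = shift @queue) {
        my ($old, $new) = @$pair;
        @{ $new->{kids} } = map { $copy->($_) } @{ $old->{kids} };
    }
    return \@new_roots;   # @to_space is the new heap; old space dies wholesale
}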

>The weird thing is that a collector can actually *improve* locality by
>moving objects "close" to the things they refer to. In perl's case, the
>collector could move the underlying value representation close to the PMC
>that refers to it. (But we may want to pin a PMC so that foreign code
>can keep references to it. Argh.)

PMC structures will be allocated in a different area than the data they 
contain, and they'll also stay at a fixed location. Not great, but short of 
imposing a double-deref penalty (with the corresponding chance of 
programmer error) there's nothing for it. Extension code definitely is a 
pain in this case.

We can still try and make things more local so that, for example, all the 
string data for the scalars in a string array get compacted together.

> > (It's safe to assume that if perl 6's garbage collector causes otherwise
> > small programs to swap then it's busted and needs fixing)
>
>If you mean small as in "tight loop" then I agree. If you mean small as
>in a "quick one liner" then I'm not sure. The quick one liners run quickly
>and speeding memory management up/down by 100% might not even be noticeable.

I was thinking of things that may process a lot of data but in small 
pieces, like the command-line greps and suchlike things. They can take a 
while on 100M files, but that shouldn't be because they've eaten 200M of 
RAM in the process...

> > Going to a more advanced garbage collection scheme certainly isn't a
> > universal panacea--mark and sweep in perl 6 will *not* bring about world
> > peace or anything. It will (hopefully) make our lives easier, though.
>
>Mark-sweep doesn't have a cheap allocator or good locality. At this point
>in history, I think if we don't go with a more advanced system we're not
>learning.

Yeah, I was sloppy in my terminology. Some of the generational garbage 
collectors look rather nice, and whatever we choose, the original mark and 
sweep won't be it, for pause times if nothing else. (If it weren't for the 
pauses, pretty much anything would be better than refcounts...)

Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: more POST recitation

2001-02-09 Thread Ken Fox

"David L. Nicol" wrote:
> # with POST
> sub find_first_line_matching_array($\@){
> open F, shift or die "could not open: $!";
> POST{close F};
> while(<F>){
> foreach $w (@{$_[0]}){
> return $_ if /$w/;
> }   }   }

I'd rather not use POST for resource cleanup at all. Why not
just:

sub find_first_line_matching_array($\@){
    open my $f, shift or die "could not open: $!";
    while(<$f>){
        foreach $w (@{$_[0]}){
            return $_ if /$w/;
        }
    }
}

We already have object destructors invoked when they go out
of scope. Why not push that technique until we reach a situation
where it doesn't work? Do you have something in mind?

- Ken



Re: Auto-install (was autoloaded...)

2001-02-09 Thread Bart Lateur

On Fri, 9 Feb 2001 21:18:55 +, Nicholas Clark wrote:

>> What?  tar -z not good enough for you?
>
>I believe that that wognt work ugnless you have gnu tar.

As a Windows user, I should add:

 * Archive::Tar is part of the Perl 5.6.0 distributions for Win32
(Activestate and IndigoPerl)

 * On a currently typical Pentium (500MHz, 64MB), ungzipping and
untarring a 250KB .tgz archive (the ungzipped tar itself is roughly
1.5MB) takes roughly 1 second. (ONE second!) That is with version 0.072
of Archive::Tar, which does in-memory unarchiving only (see the sketch
below).

Needless to say, compressing is a lot slower.

 * And I'd like the normal module distributions on CPAN to still work.
Those are all .tar.gz.
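
The in-memory sketch referred to above, with a made-up archive name
(the second argument to new() requests decompression via
Compress::Zlib):

use Archive::Tar;

my $tar = Archive::Tar->new('My-Module-0.01.tar.gz', 1);
print "$_\n" for $tar->list_files;   # paths as stored in the archive
$tar->extract;                       # write the files out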

-- 
Bart.



Re: JWZ on s/Java/Perl/

2001-02-09 Thread Bart Lateur

On Fri, 09 Feb 2001 12:06:12 -0500, Ken Fox wrote:

>There are two main reasons advanced garbage collectors are fast:
>
> 1. Cheap allocations. Most fast collectors have a one or two
>instruction malloc. In C it looks like this:
>
>  void *malloc(size) { void *obj = heap; heap += size; return obj; }
>
>It's easier to do alignments in a macro layer above the allocator
>so the allocator doesn't have to constantly re-align to address
>boundaries. There is basically no difference between the performance
>of heap and stack allocations with a good collector.

That is not a garbage collector. That is "drop everything you don't
need, and we'll never use it again." Oh, sure, not doing garbage
collection at all is faster than doing reference counting.

> 2. Work proportional to live data, not total data. This is hard to
>believe for a C programmer, but good garbage collectors don't have
>to "free" every allocation -- they just have to preserve the live,
>or reachable, data. Some researchers have estimated that 90% or
>more of all allocated data dies (becomes unreachable) before the
>next collection. A ref count system has to work on every object,
>but smarter collectors only work on 10% of the objects.

That may work for C, but not for Perl.

sub test {
my($foo, $bar, %baz);
...
return \%baz;
}

You may notice that only PART of the locally malloced memory gets
freed. The memory of %baz may well be in the middle of that pool. You're
making a huge mistake if you simply declare the whole block dead weight.

-- 
Bart.



Re: JWZ on s/Java/Perl/

2001-02-09 Thread Mark Koopman

> On Fri, 09 Feb 2001 12:06:12 -0500, Ken Fox wrote:
> 
> 
> That may work for C, but not for Perl.
> 
>   sub test {
>   my($foo, $bar, %baz);
>   ...
>   return \%baz;
>   }
> 
> You may notice that only PART of the locally malloced memory gets
> freed. The memory of %baz may well be in the middle of that pool. You're
> making a huge mistake if you simply declare the whole block dead weight.
> 
> -- 
>   Bart.

but is this an example of the way people SHOULD code, or simply are ABLE
to code?  Are we considering deprecating this type of style, and forcing
a programmer to, in this case, supply a ref to %baz in the arguments to
this sub?

Mark Koopman
Software Engineer

WebSideStory, Inc

10182 Telesis Court
San Diego CA  92121
858.546.1182.##.318
858.546.0480.fax

perl -e '
eval(lc(join("",
map ({chr}(q(
49877273766940
80827378843973
32767986693280
69827639463932
39883673434341
))=~/../g))))'



Re: JWZ on s/Java/Perl/

2001-02-09 Thread Russ Allbery

Mark Koopman <[EMAIL PROTECTED]> writes:
>> On Fri, 09 Feb 2001 12:06:12 -0500, Ken Fox wrote:

>> That may work for C, but not for Perl.
>> 
>>  sub test {
>>  my($foo, $bar, %baz);
>>  ...
>>  return \%baz;
>>  }

> but is this an example of the way people SHOULD code, or simply are ABLE
> to code this.  are we considering to deprecate this type of bad style,
> and force to a programmer to, in this case, supply a ref to %baz in the
> arguements to this sub?

That's a pretty fundamental aspect of the Perl language; I use that sort
of construct all over the place.  We don't want to turn Perl into C, where
if you want to return anything non-trivial without allocation you have to
pass in somewhere to put it.

-- 
Russ Allbery ([EMAIL PROTECTED])