Re: formencode as .egg in Debian ??

2005-11-22 Thread Sanghyeon Seo
Martin wrote:
> If there is no way to install the package directly into site-packages
> using the provided setup.py, I think setup.py should be
> modified/ignored.

Bob wrote:
> Won't this mean a total re-write of cdbs since it specifically looks for
> setup.py?

Matthias Klose wrote:
> yes, if cdbs doesn't allow that. you don't have to use cdbs. it's not
> a goal to adapt our packaging policies to the way cdbs _currently_
> works.

No, not at all.

DEB_PYTHON_SETUP_CMD := debian/setup.py
Write your own setup.py under debian/.
Done.

Seo Sanghyeon



Re: [Distutils] formencode as .egg in Debian ??

2005-11-22 Thread Ian Bicking

Phillip J. Eby wrote:


>> Note also that in many cases, the package will be a single .egg file,
>> (analogous to a Java .jar file) rather than a directory, and files are
>> preferable to directories in most cases as they make Python import
>> processing faster.




> Yes, it's true, zipfile import processing is faster than normal import
> processing; it is in fact one of the reasons zipfile imports were added to
> Python, because the zip directories are cached.  A zipfile import lookup is
> a single dictionary lookup, whereas a directory import lookup requires
> multiple stat() calls.  For all practical purposes, zipfiles added to
> sys.path are free after the initial directory read operation.


Maybe an easier way to understand this (at least my impression) is that 
zip files are treated as read-only.  Any directory on sys.path gets 
scanned every time a new module is imported.  And you never know if
someone added something, so you do it all over again each time.  A zip 
file is scanned only once.
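
A minimal sketch of the caching described above, assuming a current CPython 3 where zip imports are handled by zipimport.zipimporter and finders are kept in sys.path_importer_cache. The archive and module names are invented for illustration.

```python
import importlib
import os
import sys
import tempfile
import zipfile
import zipimport

# Build a throwaway zip "egg" holding two tiny modules (hypothetical names).
workdir = tempfile.mkdtemp()
egg_path = os.path.join(workdir, "CacheDemo-0.1.egg")
with zipfile.ZipFile(egg_path, "w") as zf:
    zf.writestr("cache_demo_one.py", "VALUE = 1\n")
    zf.writestr("cache_demo_two.py", "VALUE = 2\n")

sys.path.insert(0, egg_path)

# The first import creates an importer for the archive; that is the point
# at which the zip's table of contents is read.
first = importlib.import_module("cache_demo_one")
importer_after_first = sys.path_importer_cache[egg_path]

# The second import from the same archive reuses the cached importer
# object rather than building a new one.
second = importlib.import_module("cache_demo_two")
importer_after_second = sys.path_importer_cache[egg_path]

print(type(importer_after_first).__name__)
print(importer_after_first is importer_after_second)
```

This only shows that the importer is reused; it says nothing about how expensive the first read is.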



--
Ian Bicking  /  [EMAIL PROTECTED]  /  http://blog.ianbicking.org


--
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]



Re: [Distutils] formencode as .egg in Debian ??

2005-11-22 Thread M.-A. Lemburg
Phillip J. Eby wrote:
> At 01:04 AM 11/22/2005 -0600, Bob Tanner wrote:
> 
>>On Tuesday 22 November 2005 12:15 am, Martin v. Löwis wrote:
>>
>>>I don't think Debian should use the egg structure. It apparently relies
>>>on building a long sys.path (even though through only a single .pth
>>>file);
>>
>>I'm not sure of how .eggs are implemented, but I'm going to cross-post this
>>info to the python-distutils mailing list. See below for additional comments.
>>
>>
>>>this adds additional costs to all import statements on startup.
>>>It gets worse if these are zipfiles, because then each import statement
>>>will have to look into each zipfile (until the import is resolved).
>>
>>This is the opposite of what I was told by upstream development over on
>>distutils, snippet from Phillip J. Eby <[EMAIL PROTECTED]>:
>>
>>
>>Note also that in many cases, the package will be a single .egg file,
>>(analogous to a Java .jar file) rather than a directory, and files are
>>preferable to directories in most cases as they make Python import
>>processing faster.
>>
> 
> 
> Yes, it's true, zipfile import processing is faster than normal import 
> processing; 

Only after *all* ZIP files on sys.path have been scanned
for their contents. The more you add to sys.path, the longer
Python takes to start up.

What's worse is that the slow-down affects the whole Python
installation - each and every application using Python will
have to scan all these ZIP files in case it tries to import
a non-existing module or one which it finds late on sys.path.

> it is in fact one of the reasons zipfile imports were added to 
> Python, because the zip directories are cached.  A zipfile import lookup is 
> a single dictionary lookup, whereas a directory import lookup requires 
> multiple stat() calls.  For all practical purposes, zipfiles added to 
> sys.path are free after the initial directory read operation.

They are "free" for long running applications only where
this caching makes sense.

> Note that the need for a .pth is a limitation caused by the requirement to 
> have packages importable at startup.  Packages installed in "multi-version" 
> or "deactivated" mode are only added to sys.path upon request and have no 
> impact on startup time.  Relatively few eggs *need* to be installed with a 
> .pth file; we are simply in a transitional period where people still expect 
> "installed" packages to be importable without an additional require() 
> operation.
> 
> Finally, I think it's important to note that what Debian should or should 
> not use isn't really relevant to Debian's users, who will quite simply need 
> eggs for many packages.  If Debian doesn't provide them, the users will be 
> forced to obtain them elsewhere.  Over time, the number of packages that 
> users need in egg form will continue to increase, and there will be an 
> increasing number of users wanting to know why Debian can't provide 
> them.  It's perfectly reasonable not to redo existing Debian packages to 
> use eggs, but for some packages, *not* using eggs is simply not an option.

Why should "eggs" be the only way to install a package ?

Doesn't the standard "python setup.py install" work with
eggified packages anymore (meaning that the package is
installed as normal site-packages package) ?

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Nov 22 2005)
>>> Python/Zope Consulting and Support ...http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/


::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! 





Re: [Distutils] formencode as .egg in Debian ??

2005-11-22 Thread Ian Bicking

M.-A. Lemburg wrote:
>> Finally, I think it's important to note that what Debian should or should
>> not use isn't really relevant to Debian's users, who will quite simply need
>> eggs for many packages.  If Debian doesn't provide them, the users will be
>> forced to obtain them elsewhere.  Over time, the number of packages that
>> users need in egg form will continue to increase, and there will be an
>> increasing number of users wanting to know why Debian can't provide
>> them.  It's perfectly reasonable not to redo existing Debian packages to
>> use eggs, but for some packages, *not* using eggs is simply not an option.



> Why should "eggs" be the only way to install a package ?
>
> Doesn't the standard "python setup.py install" work with
> eggified packages anymore (meaning that the package is
> installed as normal site-packages package) ?


Eggs give room for package metadata that doesn't exist otherwise. 
Putting dependencies aside, this is functionality that simply doesn't 
exist with the standard distutils installation.  In the case of 
FormEncode, it doesn't make use of any egg features (except that other 
packages may want to depend on it using setuptools).  In the case of 
other frameworks -- including TurboGears (which I think is the ultimate 
packaging goal here) -- the Egg metadata really is important, it's not 
just used for dependencies.
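
To make "package metadata" concrete, here is a small sketch that uses the standard library's importlib.metadata. That module postdates this thread and is swapped in here for setuptools' own runtime API. The sketch builds a throwaway .egg-info directory and reads it back; the project name DemoEgg and its version are invented.

```python
import importlib.metadata
import os
import tempfile

# A throwaway "site-packages" holding only a metadata directory.
site_dir = tempfile.mkdtemp()
info_dir = os.path.join(site_dir, "DemoEgg-1.2.6.egg-info")
os.mkdir(info_dir)
with open(os.path.join(info_dir, "PKG-INFO"), "w") as fh:
    fh.write(
        "Metadata-Version: 1.0\n"
        "Name: DemoEgg\n"
        "Version: 1.2.6\n"
        "Summary: Hypothetical project used only for this sketch\n"
    )

# Discover distributions by their metadata, without importing any code.
found = {
    dist.metadata["Name"]: dist.version
    for dist in importlib.metadata.distributions(path=[site_dir])
}
print(found)
```

A plain distutils install of that era left no such record behind, so a dependent package had no way to ask which project and version were installed.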



--
Ian Bicking  /  [EMAIL PROTECTED]  /  http://blog.ianbicking.org





Re: [Distutils] formencode as .egg in Debian ??

2005-11-22 Thread Martin v. Löwis

Phillip J. Eby wrote:
> Yes, it's true, zipfile import processing is faster than normal import
> processing; it is in fact one of the reasons zipfile imports were added
> to Python, because the zip directories are cached.  A zipfile import
> lookup is a single dictionary lookup, whereas a directory import lookup
> requires multiple stat() calls.  For all practical purposes, zipfiles
> added to sys.path are free after the initial directory read operation.


OTOH, it does add an overhead on startup, as it will have to read
the TOC of all zipfiles on sys.path, at least if the module you are
looking for is in the last zipfile on the path. It then also adds
memory overhead, as the TOC of all files is cached in memory.

> Note that the need for a .pth is a limitation caused by the requirement
> to have packages importable at startup.  Packages installed in
> "multi-version" or "deactivated" mode are only added to sys.path upon
> request and have no impact on startup time.  Relatively few eggs *need*
> to be installed with a .pth file; we are simply in a transitional period
> where people still expect "installed" packages to be importable without
> an additional require() operation.


People will reasonably have this expectation for a Debian package. If
you install a Debian package with some library, you expect the library
to be usable right away.

> Finally, I think it's important to note that what Debian should or
> should not use isn't really relevant to Debian's users, who will quite
> simply need eggs for many packages.  If Debian doesn't provide them, the
> users will be forced to obtain them elsewhere.


Debian should provide the packages, but not as eggs. For a Debian user,
eggs do not add advantages, and for a Debian Developer, they only add
additional hassle.

> Over time, the number of
> packages that users need in egg form will continue to increase, and
> there will be an increasing number of users wanting to know why Debian
> can't provide them.  It's perfectly reasonable not to redo existing
> Debian packages to use eggs, but for some packages, *not* using eggs is
> simply not an option.


Debian developers should work with upstream authors to keep a
distutils-based setup.py operational.

Regards,
Martin





Re: [Distutils] formencode as .egg in Debian ??

2005-11-22 Thread Martin v. Löwis

Ian Bicking wrote:
> Maybe an easier way to understand this (at least my impression) is that
> zip files are treated as read-only.  Any directory on sys.path gets
> scanned every time a new module is imported.  And you never know if
> someone added something, so you do it all over again each time.  A zip
> file is scanned only once.


The difference is that a directory inside site-python is not scanned
*at all* if the application looks for some other package. Only the
top-level site-python directory is read, and it is not scanned, but
instead a lookup operation directly asks for the subdirectory with
the package name.

If you have many zipfiles on sys.path, all applications will suffer
from having to read the TOC of all those zipfiles, even if they need
none of them. OTOH, if you had packages inside site-python, the
contents of the unused packages are simply ignored.
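
A rough sketch of this point, assuming a current CPython 3: a lookup that fails has to consult every sys.path entry, so each zip on the path gets an importer even though nothing inside it is wanted. The egg and module names are invented.

```python
import importlib
import os
import sys
import tempfile
import zipfile
import zipimport

# Five hypothetical eggs that the "application" below never uses.
workdir = tempfile.mkdtemp()
eggs = []
for i in range(5):
    egg = os.path.join(workdir, "Unused%d-0.1.egg" % i)
    with zipfile.ZipFile(egg, "w") as zf:
        zf.writestr("unused_pkg_%d.py" % i, "VALUE = %d\n" % i)
    eggs.append(egg)

saved_path = list(sys.path)
sys.path.extend(eggs)
try:
    # The application asks for a module that exists nowhere on sys.path.
    importlib.import_module("toc_demo_module_that_does_not_exist")
except ImportError:
    pass
finally:
    sys.path[:] = saved_path

# The failed lookup walked all of sys.path, so every egg now has an
# importer; building one means opening the archive to read its directory.
touched = [
    egg for egg in eggs
    if isinstance(sys.path_importer_cache.get(egg), zipimport.zipimporter)
]
print(len(touched), "of", len(eggs), "unused eggs were opened")
```

The sketch shows that the archives are visited, not what the visit costs.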

Regards,
Martin





Re: [Distutils] formencode as .egg in Debian ??

2005-11-22 Thread M.-A. Lemburg
Ian Bicking wrote:
> M.-A. Lemburg wrote:
> 
>>> Finally, I think it's important to note that what Debian should or
>>> should not use isn't really relevant to Debian's users, who will
>>> quite simply need eggs for many packages.  If Debian doesn't provide
>>> them, the users will be forced to obtain them elsewhere.  Over time,
>>> the number of packages that users need in egg form will continue to
>>> increase, and there will be an increasing number of users wanting to
>>> know why Debian can't provide them.  It's perfectly reasonable not to
>>> redo existing Debian packages to use eggs, but for some packages,
>>> *not* using eggs is simply not an option.
>>
>>
>>
>> Why should "eggs" be the only way to install a package ?
>>
>> Doesn't the standard "python setup.py install" work with
>> eggified packages anymore (meaning that the package is
>> installed as normal site-packages package) ?
> 
> 
> Eggs give room for package metadata that doesn't exist otherwise.
> Putting dependencies aside, this is functionality that simply doesn't
> exist with the standard distutils installation.  In the case of
> FormEncode, it doesn't make use of any egg features (except that other
> packages may want to depend on it using setuptools).  In the case of
> other frameworks -- including TurboGears (which I think is the ultimate
> packaging goal here) -- the Egg metadata really is important, it's not
> just used for dependencies.

Understood, but wouldn't it be reasonably possible to
also install this meta-data into a standard site-packages
package directory ?

-- 
Marc-Andre Lemburg
eGenix.com




Re: formencode as .egg in Debian ??

2005-11-22 Thread Bob Tanner
Sanghyeon Seo wrote:

>> yes, if cdbs doesn't allow that. you don't have to use cdbs. it's not
>> a goal to adapt our packaging policies to the way cdbs currently
>> works.
> 
> No, not at all.
> 
> DEB_PYTHON_SETUP_CMD := debian/setup.py
> Write your own setup.py under debian/.

Nice. I see that ability on the cdbs wiki as well.

https://wiki.duckcorp.org/DebianPackagingTutorial/CDBS


-- 
Bob Tanner <[EMAIL PROTECTED]>  | Phone : (952)943-8700
http://www.real-time.com, Minnesota, Linux | Fax   : (952)943-8500
Key fingerprint = AB15 0BDF BCDE 4369 5B42  1973 7CF1 A709 2CC1 B288





Re: [Distutils] formencode as .egg in Debian ??

2005-11-22 Thread Ian Bicking

M.-A. Lemburg wrote:

>> Eggs give room for package metadata that doesn't exist otherwise.
>> Putting dependencies aside, this is functionality that simply doesn't
>> exist with the standard distutils installation.  In the case of
>> FormEncode, it doesn't make use of any egg features (except that other
>> packages may want to depend on it using setuptools).  In the case of
>> other frameworks -- including TurboGears (which I think is the ultimate
>> packaging goal here) -- the Egg metadata really is important, it's not
>> just used for dependencies.



> Understood, but wouldn't it be reasonably possible to
> also install this meta-data into a standard site-packages
> package directory ?


An egg and Python packages don't map 1-to-1.  An egg can contain 
multiple packages (which is fairly uncommon so far), but also a 
top-level package can exist in more than one egg (i.e., namespace 
packages, like zope.interfaces or paste.script).  The metadata belongs 
to the egg, not to the package inside the egg.
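
A small sketch of one top-level package being spread over two separately installed projects. It uses implicit namespace packages (Python 3.3 and later), which postdate this thread and stand in for the setuptools mechanism of the time; two plain directories play the role of two eggs, and all names are invented.

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()
for project, module in (("ProjectA", "alpha"), ("ProjectB", "beta")):
    # No __init__.py in ns_demo/: that is what makes it a namespace package.
    pkg_dir = os.path.join(root, project, "ns_demo")
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, module + ".py"), "w") as fh:
        fh.write("NAME = %r\n" % module)
    sys.path.append(os.path.join(root, project))

importlib.invalidate_caches()
alpha = importlib.import_module("ns_demo.alpha")
beta = importlib.import_module("ns_demo.beta")
ns_demo = sys.modules["ns_demo"]

# One top-level package, two separately installed projects behind it.
print(alpha.NAME, beta.NAME, len(ns_demo.__path__))
```

Metadata stored inside ns_demo/ would have no single owner here, which is why it is attached to the project rather than to the package.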


Also, some of the metadata is encoded in the directory name itself, like 
the version information.  I think this makes it easier to do some 
scanning operations, without a single database of installed packages 
(and also respecting sys.path manipulation).
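
As a rough illustration of metadata carried in the name itself, the helper below splits an egg file or directory name into its parts. parse_egg_name is a hypothetical helper, not setuptools' own parser, and it assumes the Name-Version[-pyX.Y[-platform]].egg layout with any hyphens inside the project name already replaced by underscores.

```python
import re

# Pattern for the assumed "Name-Version-pyX.Y-platform" layout; every part
# after the name is optional.
EGG_NAME = re.compile(
    r"^(?P<name>[^-]+)"
    r"(?:-(?P<version>[^-]+)"
    r"(?:-py(?P<pyver>[^-]+)"
    r"(?:-(?P<platform>.+))?)?)?$"
)


def parse_egg_name(filename):
    """Split an egg-style file or directory name into a dict of its parts."""
    base = filename[:-len(".egg")] if filename.endswith(".egg") else filename
    match = EGG_NAME.match(base)
    if match is None:
        raise ValueError("not an egg-style name: %r" % filename)
    return match.groupdict()


print(parse_egg_name("ElementTree-1.2.6-py2.4.egg"))
```

A scan of sys.path can therefore learn project names and versions from a directory listing alone, without opening any file.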


That said, I think it would be nice if the transition was smoother. 
E.g., if a file "ElementTree-1.2.6.egg-provided" could point to an 
installed elementtree library (similar to the currently-supported 
.egg-link file, but also slightly different).  And, perhaps, 
elementtree/ElementTree.egg-info could exist (with the same data as the 
current ElementTree-1.2.6/EGG-INFO), though I think the simpler case 
where extra metadata is disallowed would be easier.  That would only 
work for situations when there's a 1-to-1 mapping from packages to 
eggs/projects, but that covers many situations, especially cases where 
we're currently seeing conflicts.  You lose the ability to easily 
support multiple versions of a package with this, though that could 
probably be handled too.



--
Ian Bicking  /  [EMAIL PROTECTED]  /  http://blog.ianbicking.org





Re: formencode as .egg in Debian ??

2005-11-22 Thread Bob Tanner
Ian Bicking wrote:

>>>Eggs give room for package metadata that doesn't exist otherwise.
>>>Putting dependencies aside, this is functionality that simply doesn't
>>>exist with the standard distutils installation.  


>> Understood, but wouldn't it be reasonably possible to
>> also install this meta-data into a standard site-packages
>> package directory ?

 
> An egg and Python packages don't map 1-to-1.  An egg can contain
> multiple packages (which is fairly uncommon so far), but also a
> top-level package can exist in more than one egg (i.e., namespace
> packages, like zope.interfaces or paste.script).  The metadata belongs
> to the egg, not to the package inside the egg.

I'd like to bring focus back to the immediate problem at hand (and yes, I
understand there is something much bigger involved).

The ultimate goal is to debianize TurboGears. Reading the above and other
posts: will using the legacy site-packages (non-egg) installation "break"
TurboGears?


-- 
Bob Tanner <[EMAIL PROTECTED]>  | Phone : (952)943-8700
http://www.real-time.com, Minnesota, Linux | Fax   : (952)943-8500
Key fingerprint = AB15 0BDF BCDE 4369 5B42  1973 7CF1 A709 2CC1 B288





Re: [Distutils] formencode as .egg in Debian ??

2005-11-22 Thread Ian Bicking

Bob Tanner wrote:

>> An egg and Python packages don't map 1-to-1.  An egg can contain
>> multiple packages (which is fairly uncommon so far), but also a
>> top-level package can exist in more than one egg (i.e., namespace
>> packages, like zope.interfaces or paste.script).  The metadata belongs
>> to the egg, not to the package inside the egg.



> I'd like to bring focus back to the immediate problem at hand (and yes, I
> understand there is something much bigger involved).
>
> The ultimate goal is to debianize TurboGears. Reading the above and other
> posts: will using the legacy site-packages (non-egg) installation "break"
> TurboGears?


Well... not really, but TurboGears will think it is broken, because it
will require packages that are available (in some non-egg form) but
that it won't recognize as available.  ElementTree in particular; I
don't know if the other packages TG uses are available as Debian 
packages currently.  That's for 0.8.  In 0.9 and ahead it will be more 
broken, because TG will both provide and consume egg metadata 
(entry_points, in particular).  So while it would be reasonably easy to 
patch TurboGears now to be installed without eggs, that's only a 
short-term solution.
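
A sketch of what "provide and consume" entry_points can look like. It uses the standard library's importlib.metadata, which postdates this thread and is swapped in for the setuptools API of the time. The project DemoPlugin, the group demo.plugins and the module ep_demo_plugin are all invented.

```python
import importlib.metadata
import os
import sys
import tempfile

site_dir = tempfile.mkdtemp()

# Provider side: a module plus metadata advertising one of its functions.
with open(os.path.join(site_dir, "ep_demo_plugin.py"), "w") as fh:
    fh.write("def greet():\n    return 'hello from the plugin'\n")
info_dir = os.path.join(site_dir, "DemoPlugin-0.1.egg-info")
os.mkdir(info_dir)
with open(os.path.join(info_dir, "PKG-INFO"), "w") as fh:
    fh.write("Metadata-Version: 1.0\nName: DemoPlugin\nVersion: 0.1\n")
with open(os.path.join(info_dir, "entry_points.txt"), "w") as fh:
    fh.write("[demo.plugins]\ngreet = ep_demo_plugin:greet\n")

# Consumer side: find plugins by group name, then import them on demand.
sys.path.insert(0, site_dir)
plugins = {}
for dist in importlib.metadata.distributions(path=[site_dir]):
    for ep in dist.entry_points:
        if ep.group == "demo.plugins":
            plugins[ep.name] = ep.load()

print(sorted(plugins), plugins["greet"]())
```

If the module were installed without its metadata directory, the import would still work but the consumer loop would find nothing, which matches the "more broken" case described above.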


However, it is also possible that you could patch Kid (which TG 
requires) to remove the requirement on ElementTree (adding that 
requirement to the Debian package metadata) and then you'd be fine into 
the future.  In that model, all the new packages you do for TurboGears 
would be installed as eggs, but packages already available won't be 
installed as eggs.



--
Ian Bicking  /  [EMAIL PROTECTED]  /  http://blog.ianbicking.org





Re: [Distutils] formencode as .egg in Debian ??

2005-11-22 Thread Phillip J. Eby

At 02:32 PM 11/22/2005 -0600, Bob Tanner wrote:

> The ultimate goal is to debianize TurboGears. Reading the above and other
> posts: will using the legacy site-packages (non-egg) installation "break"
> TurboGears?


If by that you mean, will you be able to create a Debian-packaged 
TurboGears without extensive patching to work around the issue, the answer 
is no, you will not.


Right now, easy-deb looks like the best bet for people wanting a 
Debian-packaged TurboGears, in that my understanding from the author is 
that it does in fact Debianize eggs.






Re: [Distutils] formencode as .egg in Debian ??

2005-11-22 Thread M.-A. Lemburg
Ian Bicking wrote:
> M.-A. Lemburg wrote:
> 
>>> Eggs give room for package metadata that doesn't exist otherwise.
>>> Putting dependencies aside, this is functionality that simply doesn't
>>> exist with the standard distutils installation.  In the case of
>>> FormEncode, it doesn't make use of any egg features (except that other
>>> packages may want to depend on it using setuptools).  In the case of
>>> other frameworks -- including TurboGears (which I think is the ultimate
>>> packaging goal here) -- the Egg metadata really is important, it's not
>>> just used for dependencies.
>>
>>
>>
>> Understood, but wouldn't it be reasonably possible to
>> also install this meta-data into a standard site-packages
>> package directory ?
> 
> 
> An egg and Python packages don't map 1-to-1.  An egg can contain
> multiple packages (which is fairly uncommon so far), but also a
> top-level package can exist in more than one egg (i.e., namespace
> packages, like zope.interfaces or paste.script).  The metadata belongs
> to the egg, not to the package inside the egg.
> 
> Also, some of the metadata is encoded in the directory name itself, like
> the version information.  I think this makes it easier to do some
> scanning operations, without a single database of installed packages
> (and also respecting sys.path manipulation).

Well, yes, but all of this is only needed for the egg support.

In order to keep compatibility with the existing wide-spread
approach to install packages in site-packages/ using "python setup.py
install", it should be possible (and I believe this should be the
default to not disrupt existing usage and documentation) to
run "python setup.py install" with an eggified source
distribution in addition to the command to install it as
a regular egg.

Otherwise, we'll end up with completely confused users
and two disjoint and incompatible installation mechanisms.

> That said, I think it would be nice if the transition was smoother.
> E.g., if a file "ElementTree-1.2.6.egg-provided" could point to an
> installed elementtree library (similar to the currently-supported
> .egg-link file, but also slightly different).  And, perhaps,
> elementtree/ElementTree.egg-info could exist (with the same data as the
> current ElementTree-1.2.6/EGG-INFO), though I think the simpler case
> where extra metadata is disallowed would be easier.  That would only
> work for situations when there's a 1-to-1 mapping from packages to
> eggs/projects, but that covers many situations, especially cases where
> we're currently seeing conflicts.  You lose the ability to easily
> support multiple versions of a package with this, though that could
> probably be handled too.

I'm not suggesting to port over all the features you
get from using setuptools' eggs (even though I do believe
that you can go a long way using a special egg import
hook), but it should be possible to get a regular working
installation using "python setup.py install".

PS: I understand setuptools and eggs as a feature set which
adds functionality to distutils, not as a competitive
and disjoint all-in-one solution. The latter won't fly
well with installations that require native installers
to be used such as Debian's apt-get, rpm and all
the others.

-- 
Marc-Andre Lemburg
eGenix.com




Re: [Distutils] formencode as .egg in Debian ??

2005-11-22 Thread Bob Tanner
On Tuesday 22 November 2005 03:27 pm, M.-A. Lemburg wrote:
> In summary, things get slower when importing from ZIP files;
> it really only makes sense for applications that have a long
> run time and where startup is not that important, e.g.
> Zope et al.

I was going to stay out of this discussion :-) 

When I read the above, my knee-jerk reaction is: Where is the data to back up
this statement?

Follow up questions are:

How much slower? Are we talking milliseconds, seconds, minutes? Yes, there are
variables here, but can we narrow them to a fixed set and compare zip vs
directory?

Previous post talked about memory consumption, again bytes, kilobytes, 
megabytes?
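
One way to start collecting such data is a small timing script. The sketch below is only a starting point under stated assumptions: it runs on a current CPython 3, measures a single failed import with 50 zip entries and then 50 directory entries on sys.path, takes one sample with a warm OS cache, and does not measure memory. The function names and the entry count are invented for illustration.

```python
import importlib
import os
import sys
import tempfile
import time
import zipfile


def build_entries(root, count, as_zip):
    """Create `count` sys.path entries, each holding one tiny module."""
    entries = []
    for i in range(count):
        name = "bench_%s_%d" % ("zip" if as_zip else "dir", i)
        if as_zip:
            entry = os.path.join(root, name + ".egg")
            with zipfile.ZipFile(entry, "w") as zf:
                zf.writestr(name + ".py", "VALUE = %d\n" % i)
        else:
            entry = os.path.join(root, name)
            os.mkdir(entry)
            with open(os.path.join(entry, name + ".py"), "w") as fh:
                fh.write("VALUE = %d\n" % i)
        entries.append(entry)
    return entries


def time_failed_import(entries):
    """Time one import that must consult every entry before giving up."""
    saved_path = list(sys.path)
    sys.path[:0] = entries
    importlib.invalidate_caches()
    try:
        start = time.perf_counter()
        try:
            importlib.import_module("bench_module_that_does_not_exist")
        except ImportError:
            pass
        return time.perf_counter() - start
    finally:
        # Restore sys.path and forget the importers created for the test.
        sys.path[:] = saved_path
        for entry in entries:
            sys.path_importer_cache.pop(entry, None)


def compare(count=50):
    """Return seconds spent on a failed import for zip and directory entries."""
    with tempfile.TemporaryDirectory() as root:
        zips = build_entries(root, count, as_zip=True)
        dirs = build_entries(root, count, as_zip=False)
        return {
            "zip": time_failed_import(zips),
            "directory": time_failed_import(dirs),
        }


for kind, seconds in compare().items():
    print("%-9s %.3f ms" % (kind, seconds * 1000.0))
```

Both timings include the walk over the rest of sys.path, so only the difference between them is meaningful, and a serious comparison would repeat the run in fresh interpreter processes.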


-- 
Bob Tanner <[EMAIL PROTECTED]>  | Phone : (952)943-8700
http://www.real-time.com, Minnesota, Linux | Fax   : (952)943-8500
Key fingerprint = AB15 0BDF BCDE 4369 5B42  1973 7CF1 A709 2CC1 B288




Re: [Distutils] formencode as .egg in Debian ??

2005-11-22 Thread M.-A. Lemburg
Phillip J. Eby wrote:
> At 06:33 PM 11/22/2005 +0100, M.-A. Lemburg wrote:
> 
>>Phillip J. Eby wrote:
>>
>>>Yes, it's true, zipfile import processing is faster than normal import
>>>processing;
>>
>>Only after *all* ZIP files on sys.path have been scanned
>>for their contents. The more you add to sys.path, the longer
>>Python takes to start up.
> 
> 
> This is simply not true.  If you don't believe PEP 302 and site.py, measure 
> it for yourself.  The *only* addition to startup is the time to actually 
> read the .pth file and append the entries to the list.
> 
> 
>>What's worse is that the slow-down affects the whole Python
>>installation - each and every application using Python will
>>have to scan all these ZIP files in case it tries to import
>>a non-existing module or one which it finds late on sys.path.
> 
> 
> And how often do programs attempt to import non-existing modules along 
> performance critical paths?

Every single time you fire up Python and the user has not
installed a module called "sitecustomize" (which is deliberately
not shipped with Python), Python will scan the complete sys.path
for this module... and that's just one example.

It is rather common in Python code to test for the availability
of a faster variant by trying an import (e.g. for XML parsers)
and then falling back to some slower emulations.
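
The idiom in question looks roughly like this. The standalone cElementTree module stands in for "a faster variant"; on most current installations it is absent, so the fallback branch is the one that runs.

```python
try:
    # Standalone C accelerator of that era; usually not installed today.
    import cElementTree as etree
except ImportError:
    # Reached only after every sys.path entry has been asked and said no.
    import xml.etree.ElementTree as etree

root = etree.fromstring("<thread><post author='mal'/></thread>")
print(etree.__name__, root.tag, len(root))
```

Each such probe for a missing module is a full pass over sys.path, which is the cost being discussed.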

> Note by the way that "scan all these ZIP files" is a misleading term in any 
> case - the files are not "scanned".  They are opened, and a small amount of 
> data is read from the end of the file.  Nothing that I would consider 
> "scanning" is involved.

The data read from the end of the file is the directory
which is decoded using marshal functions. You normally
call this scanning data.

Like Martin said: you always have to read the whole ZIP
directory - even if you're just interested in a single
module within the file.

Actually loading the module then requires decompressing
the code which takes a whole lot longer than just reading
a file from the file system.

In summary, things get slower when importing from ZIP files;
it really only makes sense for applications that have a long
run time and where startup is not that important, e.g.
Zope et al.

The main argument for using ZIP imports is to ease
distribution of complete pure-Python packages, not
a performance gain. You'd use one of the freeze tools
for that, e.g. mxCGIPython which creates a single
file Python interpreter which has a really good
startup time due to the fact that the Python lib
is embedded into the Python executable as static data
and then loaded on demand by the OS as needed.

-- 
Marc-Andre Lemburg
eGenix.com




Re: [Distutils] formencode as .egg in Debian ??

2005-11-22 Thread Phillip J. Eby

At 09:54 PM 11/22/2005 +0100, M.-A. Lemburg wrote:

> In order to keep compatibility with the existing wide-spread
> approach to install packages in site-packages/ using "python setup.py
> install", it should be possible (and I believe this should be the
> default to not disrupt existing usage and documentation) to
> run "python setup.py install" with an eggified source
> distribution in addition to the command to install it as
> a regular egg.


"python setup.py install" *is* the command to install it as an egg - and 
the layout is no different than what occurs with an 'extra_path' setup() 
today.  The only difference of any consequence is that it maintains a 
*single* .pth file for all eggs, rather than one .pth file per egg, and 
that's an improvement by the standard you and Martin have put forth, where 
saving startup time is a good thing.
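
A minimal sketch of how one .pth file can activate several eggs. It uses site.addsitedir(), which processes .pth files in a directory the same way site.py does for site-packages at startup. The egg names are invented, and easy-install.pth is used here simply as the file name for the shared .pth file.

```python
import importlib
import os
import site
import sys
import tempfile
import zipfile

site_dir = tempfile.mkdtemp()

# Two hypothetical eggs living side by side in one directory.
for project in ("PthDemoOne", "PthDemoTwo"):
    egg = os.path.join(site_dir, project + "-0.1.egg")
    with zipfile.ZipFile(egg, "w") as zf:
        zf.writestr(project.lower() + ".py", "PROJECT = %r\n" % project)

# A single .pth file lists both eggs, one relative path per line.
with open(os.path.join(site_dir, "easy-install.pth"), "w") as fh:
    fh.write("PthDemoOne-0.1.egg\nPthDemoTwo-0.1.egg\n")

# Reading the .pth file appends every listed path that exists to sys.path.
site.addsitedir(site_dir)

added = [p for p in sys.path if p.endswith(".egg") and "PthDemo" in p]
module = importlib.import_module("pthdemotwo")
print(len(added), module.PROJECT)
```

The number of files read at startup stays at one however many eggs are listed, although each listed egg still becomes its own sys.path entry.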




> I'm not suggesting to port over all the features you
> get from using setuptools' eggs (even though I do believe
> that you can go a long way using a special egg import
> hook),


Eggs don't need an import hook, which is one reason why they're so useful - 
they use the existing, well-established platform provided by Python 2.3 and 
above.




> but it should be possible to get a regular working
> installation using "python setup.py install".


You already do!  It's just that *dependencies* also need to be installed as 
eggs, in order for the depending package to take advantage.  Without that, 
the depending package can't guarantee what version of the dependency is in 
use, thereby increasing the developer's support load.  (Kevin Dangoor, the 
TurboGears author, has previously mentioned that he couldn't imagine trying 
to support TurboGears without eggs, simply due to the overhead of debugging 
people's dependency problems on multiple platforms and packaging 
systems.  Even the support overhead caused by him making mistakes in using 
setuptools or by bugs in setuptools itself, was still dwarfed by the number 
of trouble-free installations on Windows and Mac OS.)


And over the last few months, I believe we've also succeeded in stomping 
most of the issues that people had with getting solid non-root 
installations on their Linux distributions.  So the reasons for developers 
to prefer their dependencies to be managed as eggs will only improve over 
time, as the egg system allows Python developers to control and introspect 
their dependencies, rather than keeping that information hidden behind 
diverse platform-specific packaging tools.


So, the issue being surfaced here is that Python developers who want to use 
other people's code will want the other people's code packaged as eggs, 
even if their own package doesn't need to be an egg itself (other than to 
take advantage of the automatic dependency handling).  That's why 
easy_install exists: it can build eggs for most PyPI packages, even if the 
package author never heard of Python eggs.




> PS: I understand setuptools and eggs as a feature set which
> adds functionality to distutils, not as a competitive
> and disjoint all-in-one solution.


It's not disjoint at all, since it's trivial to package the majority of 
plain ol' distutils packages as eggs.  It isn't even competitive, in the 
sense that you are not forced to choose between one or the other.  You can 
have a "legacy" version of a package installed in the traditional way, and 
an egg version too.  Code that isn't using the dependency management 
facility will import the legacy version, and code that does will see the 
egg version.


It's only "competitive" if you feel that there must be only one way to do 
it.  (And if you do feel that way, then it also should be obvious that eggs 
are the superior solution, since they don't take away any capabilities of 
the old, only provide new ones.)




> The latter won't fly
> well with installations that require native installers
> to be used such as Debian's apt-get, rpm and all
> the others.


Currently, there is an experimental Gentoo "ebuild", and there's 
easy_deb.  I'm talking with some people about putting together an 
"easy_rpm" equivalent.  I recently heard that the easy_deb project was 
backed by Ubuntu, so it sounds like there's a fair amount of interest in 
supporting it out there.  Mainly, I see this as an issue of the packagers 
having to adapt to meet their users' changing needs.  It's reasonable to 
expect the major vendors to lag behind in terms of egg support, so there 
will naturally be a transitional period where tools like easy_deb and the 
hypothetical easy_rpm will be necessary.  (Setuptools' "bdist_rpm" command 
builds egg-based RPMs already, but a separate easy_rpm that can also 
rpm/eggify legacy packages would be less invasive.)


At the same time, it's also not reasonable for vendors to expect that 
ignoring eggs will make them go away - the practical advantages are far too 
compelling.  A developer who targets the egg runtime gets cross-platform 
dependency management and has the option of doing distributed development 
track

Re: [Distutils] formencode as .egg in Debian ??

2005-11-22 Thread Bob Ippolito

On Nov 22, 2005, at 1:27 PM, M.-A. Lemburg wrote:


Phillip J. Eby wrote:
Note by the way that "scan all these ZIP files" is a misleading term in any
case - the files are not "scanned".  They are opened, and a small amount of
data is read from the end of the file.  Nothing that I would consider
"scanning" is involved.


The data read from the end of the file is the directory
which is decoded using marshal functions. You normally
call this scanning data.

Like Martin said: you always have to read the whole ZIP
directory - even if you're just interested in a single
module with the file.

Actually loading the module then requires decompressing
the code which takes a whole lot longer than just reading
a file from the file system.


Last I checked, CPUs and RAM are a lot faster than disk.  Unless it's  
sitting in cache already, reading a zip should be way faster than  
reading an uncompressed file.  On top of that, I don't think egg zips  
are compressed by default...
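
Whether the members of a given archive are compressed can be checked rather than guessed. The following is a minimal sketch using the standard-library zipfile module; the archive name and member names are made up for illustration, and it says nothing about what setuptools itself does by default.

```python
import os
import tempfile
import zipfile


def compression_summary(archive_path):
    """Count stored vs. deflated members listed in the zip's directory."""
    counts = {"stored": 0, "deflated": 0, "other": 0}
    with zipfile.ZipFile(archive_path) as zf:
        # infolist() is built from the directory at the end of the file;
        # no member data is decompressed here.
        for info in zf.infolist():
            if info.compress_type == zipfile.ZIP_STORED:
                counts["stored"] += 1
            elif info.compress_type == zipfile.ZIP_DEFLATED:
                counts["deflated"] += 1
            else:
                counts["other"] += 1
    return counts


def demo():
    """Build a small hypothetical archive and summarize it."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "Demo-0.1.egg")
        with zipfile.ZipFile(path, "w") as zf:
            zf.writestr("demo/__init__.py", "",
                        compress_type=zipfile.ZIP_STORED)
            zf.writestr("demo/data.py", "X = 1\n" * 100,
                        compress_type=zipfile.ZIP_DEFLATED)
        return compression_summary(path)
```

Running compression_summary() on a real egg would show which case applies to it.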


-bob





Re: [Distutils] formencode as .egg in Debian ??

2005-11-22 Thread Martin v. Löwis

M.-A. Lemburg wrote:

Doesn't the standard "python setup.py install" work with
eggified packages anymore (meaning that the package is
installed as normal site-packages package) ?


No. First, an eggified package tries to download ez_setup,
i.e. it won't run the distutils setup(), but the setuptools
one. Then, "setup.py install" will not install into, say,
site-packages/mxTools, but into
site-packages/mxTools-2.0.3.egg/mxTools, and add, to easy-install.pth,
the line

/usr/lib/python2.3/site-packages/mxTools-2.0.3.egg

(which in turn adds this directory to sys.path)
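
The .pth mechanism can be tried in isolation. This is a rough sketch, assuming a throwaway directory stands in for site-packages; the egg directory name is borrowed from the example above and is otherwise hypothetical. It relies only on site.addsitedir() from the standard library.

```python
import os
import site
import sys
import tempfile


def pth_adds_directory():
    """Return True if a directory named in a .pth file ends up on sys.path."""
    saved_path = list(sys.path)
    try:
        with tempfile.TemporaryDirectory() as site_dir:
            # Hypothetical egg directory, named after the example above.
            egg_dir = os.path.join(site_dir, "mxTools-2.0.3.egg")
            os.mkdir(egg_dir)
            with open(os.path.join(site_dir, "easy-install.pth"), "w") as f:
                f.write(egg_dir + "\n")
            # addsitedir() processes every .pth file found in site_dir and
            # appends each listed path that exists to sys.path.
            site.addsitedir(site_dir)
            wanted = os.path.normcase(egg_dir)
            return any(os.path.normcase(p) == wanted for p in sys.path)
    finally:
        # Leave the interpreter's search path as we found it.
        sys.path[:] = saved_path
```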

If you edit setup.py to use the distutils setup instead of the
setuptools setup, it will install as before, but warn
that setup() is being passed unrecognized parameters.

Regards,
Martin





Re: [Distutils] formencode as .egg in Debian ??

2005-11-22 Thread Phillip J. Eby

At 02:00 PM 11/22/2005 -0600, Ian Bicking wrote:
An egg and Python packages don't map 1-to-1.  An egg can contain multiple 
packages (which is fairly uncommon so far), but also a top-level package 
can exist in more than one egg (i.e., namespace packages, like 
zope.interfaces or paste.script).  The metadata belongs to the egg, not to 
the package inside the egg.


In addition, an egg can contain various Python modules, and still have 
metadata even if no packages are involved.



That said, I think it would be nice if the transition was smoother. E.g., 
if a file "ElementTree-1.2.6.egg-provided" could point to an installed 
elementtree library (similar to the currently-supported .egg-link file, 
but also slightly different).  And, perhaps, 
elementtree/ElementTree.egg-info could exist (with the same data as the 
current ElementTree-1.2.6/EGG-INFO), though I think the simpler case where 
extra metadata is disallowed would be easier.


It's not necessary to create a new way to do this; you can simply create 
'ElementTree.egg-info' in site-packages and put the PKG-INFO file in there 
(and any other egg-info files), and it's supported by the existing 
mechanisms just fine, as long as the project has no top-level resources 
(files, data directories, etc.), and does not participate in namespace 
packages.
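
As a rough illustration of that layout, the sketch below writes an 'ElementTree.egg-info' directory containing a PKG-INFO file and reads it back. It uses importlib.metadata from the current standard library as a stand-in reader, which is an assumption on my part: the thread is about the setuptools runtime, and the helper names here are hypothetical.

```python
import importlib.metadata
import os
import tempfile


def write_egg_info(site_dir, project, version):
    """Create <project>.egg-info/PKG-INFO with minimal metadata."""
    info_dir = os.path.join(site_dir, project + ".egg-info")
    os.mkdir(info_dir)
    with open(os.path.join(info_dir, "PKG-INFO"), "w") as f:
        f.write("Metadata-Version: 1.0\n")
        f.write("Name: %s\n" % project)
        f.write("Version: %s\n" % version)
    return info_dir


def installed_projects(site_dir):
    """Map project name to version for metadata found in site_dir."""
    found = {}
    for dist in importlib.metadata.distributions(path=[site_dir]):
        found[dist.metadata["Name"]] = dist.version
    return found


def demo():
    """Use a temporary directory in place of a real site-packages."""
    with tempfile.TemporaryDirectory() as site_dir:
        write_egg_info(site_dir, "ElementTree", "1.2.6")
        return installed_projects(site_dir)
```

The package code itself is untouched; only the metadata directory is added next to it.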


So, for practical purposes, this would be more of a way to upgrade legacy 
packages to be detectable by egg-based packages, than a way to install egg 
packages as non-eggs.  However, it might be a workable compromise for 
getting many of the Debian-packaged TurboGears dependencies to be usable 
while still mostly conforming to the existing Debian policy.


A few months ago, this approach wouldn't have worked due to possible 
conflicts with locally-installed eggs, but setuptools now has runtime 
conflict management that can smooth it over as long as you haven't imported 
any of the conflicting packages yet.



  That would only work for situations when there's a 1-to-1 mapping from 
packages to eggs/projects, but that covers many situations, especially 
cases where we're currently seeing conflicts.  You lose the ability to 
easily support multiple versions of a package with this, though that 
could probably be handled too.


This approach won't support multiple simultaneous versions, but then 
neither do most system packaging tools, and if this is strictly a 
workaround for system packagers who don't want to move everything to eggs, 
then it works just fine for that.  They will, however, have to be careful 
about namespace packages in setuptools-based packages, since the package 
directories will be shared by two separately-installed projects, and each 
package will want to include its own __init__.py files.






Re: [Distutils] formencode as .egg in Debian ??

2005-11-22 Thread Martin v. Löwis

Phillip J. Eby wrote:

If you have many zipfiles on sys.path, all applications will suffer
from having to read the TOC of all those zipfiles, even if they need
none of them. OTOH, if you had packages inside site-python, the
contents of the unused packages is simply ignored.



I'm sorry, but this is, shall we say, "fact challenged"?  .pth files' 
contents are added to the *end* of sys.path.  This means that stdlib 
imports and normal site-packages imports are satisfied *before* any 
hypothetical overhead from .pth entries, whether they're zipfiles or 
directories.


Correct. I was not talking about stdlib imports. I was talking about 
imports satisfied from the end of sys.path, or imports resulting in
ImportErrors.

If Python never reaches the .pth entries at runtime, it 
will not even read the zipfile TOCs, let alone attempting to stat() for 
contained packages.


Correct. However, a false premise can imply anything: Python
*always* reaches the .pth entries at least once, in a typical
installation, while looking for sitecustomize. This will cause
a load of all zipfiles on sys.path, before site.py is done.


Please check your facts before spreading untruths like this


I did check: I have a file a.pth in site-packages, which refers to
a.zip (in the same directory), and I have an empty Python file e.py.
Running

strace -o xxx python e.py

shows, among others

open("/usr/lib/python2.3/site-packages/a.zip", O_RDONLY|O_LARGEFILE) = 5
...
read(5, "PK\3\4\n\0\0\0\0\0\202\274v3\265<\267\r\16\0\0\0\16\0\0"..., 132) = 132


So a.zip is read even though the program does not contain
a single import statement.
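
A similar check can be made from inside Python, without strace. This is only a sketch of the idea on a current interpreter: it puts a throwaway zip file on sys.path, attempts an import that fails, and then looks in sys.path_importer_cache to see whether an importer was created for the zip. The module name is hypothetical and chosen so that it should not exist.

```python
import importlib
import os
import sys
import tempfile
import zipfile
import zipimport


def zip_consulted_on_failed_import():
    """Return True if a failed import created an importer for the zip."""
    saved_path = list(sys.path)
    with tempfile.TemporaryDirectory() as d:
        zip_path = os.path.join(d, "a.zip")
        with zipfile.ZipFile(zip_path, "w") as zf:
            zf.writestr("placeholder.py", "")
        sys.path.append(zip_path)
        importlib.invalidate_caches()
        try:
            try:
                # Hypothetical name; the import is expected to fail after
                # every sys.path entry has been consulted.
                importlib.import_module("no_such_module_for_zip_demo")
            except ImportError:
                pass
            importer = sys.path_importer_cache.get(zip_path)
            return isinstance(importer, zipimport.zipimporter)
        finally:
            sys.path[:] = saved_path
            sys.path_importer_cache.pop(zip_path, None)
```

This shows only that the zip entry is visited on a failed import; it does not measure how much that costs.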

What is the untruth I'm spreading?

Regards,
Martin





Re: [Distutils] formencode as .egg in Debian ??

2005-11-22 Thread Martin v. Löwis

Phillip J. Eby wrote:
This is simply not true.  If you don't believe PEP 302 and site.py, 
measure it for yourself.  The *only* addition to startup is the time to 
actually read the .pth file and append the entries to the list.


I did. strace shows that all zip files are loaded.

And how often do programs attempt to import non-existing modules along 
performance critical paths?


Every time. At least sitecustomize is imported in most programs (except
those skipping site.py), and is not present in most installations.
The standard library catches ImportError about 250 times, although
fewer expect the failure in a typical installation.

Regards,
Martin





Re: [Distutils] formencode as .egg in Debian ??

2005-11-22 Thread Martin v. Löwis

Phillip J. Eby wrote:

Debian should provide the packages, but not as eggs.



For packages that only operate as eggs, and/or require their 
dependencies as eggs, you are stating a contradiction in terms.  Eggs 
are not merely a distribution format, any more than Java .jar files are.


So I should say

"Debian should not provide eggs, period", since what Debian provides
are packages, and eggs are not?


Debian developers should work with upstream authors to keep a
distutils-based setup.py operational.



It's perfectly operational; clearly the entire egg system is *well* 
within the Python runtime's intended operating parameters, as it uses 
only well-defined and published aspects of the Python language, API, 
stdlib, and build process.


I didn't say the egg system is inoperational. I said that distutils
setup is not operational for, for example, FormEncode: this uses
another packaging library in setup.py, not distutils setup.

Perhaps you have some other definition of "operational" in mind? 


I had "*distutils-based* setup.py" in mind.

As I've already stated, applying this same policy to Java libraries would 
mean demanding that all the .class files be extracted to the filesystem 
and any manifest files be deleted, before Debian would consent to 
package them.  In other words, it would be silly and pointless, because 
the users would then ignore the packages in favor of actual jars, 
because then their applications would actually work.


This is not the same. A Java .jar file is deployed by putting it on 
disk. For an egg, an (apparently undocumented) number of additional
steps is necessary, such as editing easy-install.pth.

In Java, the drawback of course is that each user has to edit
CLASSPATH to include all the jar files desired. easy_install
makes this unnecessary, but in a way unfriendly to dpkg (and
I assume other Linux package formats).

Regards,
Martin





Re: [Distutils] formencode as .egg in Debian ??

2005-11-22 Thread Martin v. Löwis

M.-A. Lemburg wrote:

The main argument for using ZIP imports is to ease
distribution of complete pure-Python packages, not
a performance gain.


Precisely. For this reason, python2x.zip is on sys.path,
allowing you to include the entire library (subset) in
a single file.

It may also be of convenience to users and local admins
who have to install the same package across different
machines, when the system vendor does not provide a package,
but it appears that eggs are currently unfriendly to
system vendors who want to include eggified packages
in their own packaging system.

Regards,
Martin





Re: [Distutils] formencode as .egg in Debian ??

2005-11-22 Thread Martin v. Löwis

Bob Tanner wrote:
When I read the above, my knee-jerk reaction is: Where is the data to back up 
this statement?


One could show strace outputs, and compare the number of system calls.
Compiling this into actual timing is difficult: you would have to trade
stat calls for read calls, and you would have to take the operating
system's disk and directory caching into account.


How much slower?


Not noticeably, for a small number of zip files, and non-trivial
Python scripts. Likewise, the presumed speed advantage of zip files
(for zip file directory hits) is just as unnoticeable, due to OS
caching (depending on the OS, of course, and sizes of directories).

Are we talking milliseconds, seconds, minutes? Yes, there are 
variables, here, but narrow them to a set number and compare zip vs 
directory?


Milliseconds, either way.

Previous post talked about memory consumption, again bytes, kilobytes, 
megabytes?


This is easier to say. For FormEncode-0.4, which contains 51 files,
the Zip directory is 3625 bytes. Now, this is stored in memory somehow:

Each file ends up as a directory entry, e.g.

'formencode/fields.pyc': 
('/tmp/FormEncode-0.4-py2.4.egg/formencode/fields.pyc', 8, 12603, 34753, 
69084, -28392, 13172, -784586915)


This makes 12 pointers (48 bytes) held in containers per file
(3 in the dictionary, 9 in the tuple), plus 7*12 bytes for the integers
(some shared, of course), plus 3*12 bytes for the string and tuple
headers, plus the characters for the strings (including null bytes,
4146 bytes for FormEncode). Counting this all together, the
in-memory usage is about 4300 bytes for FormEncode; if I had
this in /usr/lib/python2.3/site-packages instead of /tmp, it
would be larger.

This is roughly 2% of the egg file size. Larger eggs likely
have more files, so this might serve as an estimate. The memory
overhead would occur for all eggs installed, for all Python
scripts.
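
Anyone who wants to repeat this kind of estimate on their own interpreter can measure instead of counting by hand. The sketch below is an approximation only: it builds a dictionary of per-file tuples loosely modelled on the entry shown above (it is not the interpreter's actual cache) and sums sys.getsizeof() over it. The results depend on the Python version and the pointer size, so they will not match the figures above.

```python
import os
import sys
import tempfile
import zipfile


def directory_entry_footprint(archive_path):
    """Rough byte count for a dict of per-file tuples describing a zip."""
    entries = {}
    with zipfile.ZipFile(archive_path) as zf:
        for info in zf.infolist():
            entries[info.filename] = (
                os.path.join(archive_path, info.filename),
                info.compress_type,
                info.compress_size,
                info.file_size,
                info.header_offset,
                info.CRC,
            )
    total = sys.getsizeof(entries)
    for name, entry in entries.items():
        total += sys.getsizeof(name) + sys.getsizeof(entry)
        # Small integers are shared by the interpreter, so counting every
        # field separately overestimates a little.
        total += sum(sys.getsizeof(field) for field in entry)
    return len(entries), total


def demo():
    """Measure a three-file archive with hypothetical member names."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "Demo-0.1.egg")
        with zipfile.ZipFile(path, "w") as zf:
            for member in ("demo/__init__.py", "demo/a.py", "demo/b.py"):
                zf.writestr(member, "pass\n")
        return directory_entry_footprint(path)
```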

Regards,
Martin





Re: [Distutils] formencode as .egg in Debian ??

2005-11-22 Thread Martin v. Löwis

Phillip J. Eby wrote:
The only thing that occurs to me as even a possibility would be some 
kind of frequently-used system administration utility, like if you were 
going to rewrite all the bash builtin commands as Python scripts.


This whole discussion is not about whether the start time actually
matters - it is about whether it is a fact or not that eggs improve
the startup. Some people said it does, others said it doesn't, and this
is just the finding-of-facts phase.

Anyway,

> I'm terribly curious what Python applications exist for whom:
> 1. Startup time is a consideration, that
> 2. Haven't already been refactored to a long-running process.

For this, CGI scripts come to mind. Many people use them, and they
are often short-running, and they often get invoked frequently.

Then why was the python##.zip entry added to sys.path in Python 2.3?  My 
understanding was that it was added to allow Python to start faster by 
cutting down on extraneous stat() calls.


PEP 273 doesn't give much rationale:

"Booting"
...
"Just as there are default directories in sys.path, there must be
one or more default zip archives too."

IIRC, it was to simplify deployment, having the entire library in
a single file.

Regards,
Martin





Re: [Distutils] formencode as .egg in Debian ??

2005-11-22 Thread Phillip J. Eby

At 11:56 PM 11/22/2005 +0100, Martin v. Löwis wrote:

Phillip J. Eby wrote:

Debian should provide the packages, but not as eggs.


For packages that only operate as eggs, and/or require their dependencies 
as eggs, you are stating a contradiction in terms.  Eggs are not merely a 
distribution format, any more than Java .jar files are.


So I should say

"Debian should not provide eggs, period", since what Debian provides
are packages, and eggs are not?


I don't understand you.



Debian developers should work with upstream authors to keep a
distutils-based setup.py operational.


It's perfectly operational; clearly the entire egg system is *well* 
within the Python runtime's intended operating parameters, as it uses 
only well-defined and published aspects of the Python language, API, 
stdlib, and build process.


I didn't say the egg system is inoperational. I said that distutils
setup is not operational for, for example, FormEncode: this uses
another packaging library in setup.py, not distutils setup.


I still don't understand you.  If a package subclasses a distutils command, 
is it no longer a distutils setup?  What if it bundles a library module 
that includes a subclass of a distutils command?  Where, precisely, do you 
draw the line between a "distutils setup" and something else?  *Many* 
packages subclass distutils commands or use unusual arguments to the 
distutils setup() that cause things to be installed in unusual ways.  I'm 
very familiar with this because easy_install tries to support as many of 
those quirky subclasses or arguments as is practical, so it seems to me 
that the definition of a "distutils setup" is nowhere near as clear cut as 
your statements here imply.



As I've already stated, applying this same policy to Java libraries would 
mean demanding that all the .class files be extracted to the filesystem 
and any manifest files be deleted, before Debian would consent to package 
them.  In other words, it would be silly and pointless, because the users 
would then ignore the packages in favor of actual jars, because then 
their applications would actually work.


This is not the same. A java .jar file is deployed by putting it on disk. 
For an egg, an (apparently undocumented)


An egg must be on sys.path, if you want to use it without explicitly using 
the egg runtime.  See "The Quick Guide To Python Eggs", in particular this 
passage from http://peak.telecommunity.com/DevCenter/PythonEggs#using-eggs :


   "If you have a pure-Python egg that doesn't use any in-package data 
files, and you don't mind manually placing it on sys.path or PYTHONPATH, 
you can use the egg without installing setuptools."




number of additional
steps is necessary, such as editing easy-install.pth.


Nothing except performance considerations prevents you having a separate 
.pth file for each and every egg, just as nothing prevents distutils 
packages from being installed as directory+.pth today.  Does Debian 
currently reject packages that use the extra_path argument to setup(), like 
Numeric?




In Java, the drawback of course is that each user has to edit
CLASSPATH to include all the jar files desired. easy_install
makes this unnecessary, but in a way unfriendly to dpkg (and
I assume other Linux package formats).


I don't understand you here.  Are you saying that it's not possible for 
dpkg to do a post-install or uninstall operation like adding or removing a 
line from a file?


In any case, if you look at the approach Ian Bicking suggested and I 
commented further on, you'll see that you *can* in fact bypass this whole 
issue by packaging the egg metadata in another form, that gets rid of the 
need for .egg files or directories, as well as .pth manipulation.


That approach, however, is not significantly documented at this time (other 
than a post to the distutils-SIG earlier this year outlining the design), 
but I'd be more than happy to document it further, if it makes the need for 
the rest of this discussion go away.  :)


Here are the steps to create a "single-version" egg:

1. Build the egg

2. Unzip the egg directly into site-packages, but rename the EGG-INFO 
subdirectory in the process to ProjectName.egg-info, where ProjectName is 
the pkg_resources.safe_name() of the setup(name="...") 
argument.  (Alternately, you can take 
'filename.split("-")[0].replace("_","-")', where 'filename' is the 
os.path.basename of the egg.)


3. (optional) remove any .py/.pyc/.pyo files that have an adjacent C 
extension file of the same name, such as 'foo.py' and 'foo.pyd' or 
'foo.so'.  The .py/.pyc/.pyo are stubs created by setuptools to extract the 
C extension from a zipped egg at runtime, and are not needed by an 
extracted installation.  (This step is optional because Python's import 
gives precedence to the C extensions over the .py files, so nothing bad 
will happen if you don't delete the files.)
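
Step 2 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a packaging tool: the function names are hypothetical, the project name is derived with the filename rule given in step 2, member names in the archive are trusted as-is, and step 3 (removing the stub files) is left out.

```python
import os
import tempfile
import zipfile


def egg_info_name(egg_filename):
    """Derive 'ProjectName.egg-info' from an egg filename, as in step 2."""
    base = os.path.basename(egg_filename)
    project = base.split("-")[0].replace("_", "-")
    return project + ".egg-info"


def unpack_single_version(egg_path, site_dir):
    """Extract an egg into site_dir, renaming EGG-INFO/ on the way."""
    info_dir = egg_info_name(egg_path)
    written = []
    with zipfile.ZipFile(egg_path) as zf:
        for member in zf.namelist():
            if member.endswith("/"):
                continue  # directory entry; parents are created below
            if member.startswith("EGG-INFO/"):
                target = info_dir + member[len("EGG-INFO"):]
            else:
                target = member
            # No path-traversal check: the archive is assumed trustworthy.
            dest = os.path.join(site_dir, *target.split("/"))
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            with open(dest, "wb") as out:
                out.write(zf.read(member))
            written.append(target)
    return sorted(written)


def demo():
    """Unpack a tiny hypothetical egg into a temporary site-packages."""
    with tempfile.TemporaryDirectory() as d:
        egg = os.path.join(d, "Demo_Project-0.1-py2.4.egg")
        with zipfile.ZipFile(egg, "w") as zf:
            zf.writestr("demo/__init__.py", "")
            zf.writestr("EGG-INFO/PKG-INFO", "Name: Demo-Project\n")
        site_dir = os.path.join(d, "site-packages")
        os.mkdir(site_dir)
        return unpack_single_version(egg, site_dir)
```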


What this process will *not* do for you is address conflicts in top-level 
data files, nor will i

Re: [Distutils] formencode as .egg in Debian ??

2005-11-22 Thread John J Lee
On Tue, 22 Nov 2005, "Martin v. Löwis" wrote:
> > As 
> > I've already stated, applying this same policy to Java libraries would 
> > be to demanding that all the .class files be extracted to the filesystem 
> > and any manifest files be deleted, before Debian would consent to 
> > package them.  In other words, it would be silly and pointless, because 
> > the users would then ignore the packages in favor of actual jars, 
> > because then their applications would actually work.
> 
> This is not the same. A java .jar file is deployed by putting it on 
> disk. For an egg, an (apparently undocumented) number of additional
> steps is necessary, such as editing easy-install.pth.
> 
> In Java, the drawback of course is that each user has to edit
> CLASSPATH to include all the jar files desired. easy_setup
> makes this unnecessary, but in a way unfriendly to dpkg (and
> I assume other Linux package formats).

Actually, I believe this is the eventual primary intended mode of
operation of eggs.  In that case, the necessary sys.path manipulation is
handled either by setuptools' wrapper scripts / wrapper .exe files calling
require(version), or by calling that function elsewhere.  The difference
from .jar files is that you have the *option* of doing a global install
into site-packages, which adds an entry to the .pth file.

Phillip's primary use case for the whole egg / setuptools thing is, IIUC,
precisely the zero-install case, in particular to support installation of
plug-ins by dropping them into a directory, without even requiring any
"add it to a path" step.

Can I ask you (Martin) a couple of questions?

1. Does the above affect your concern about reading many zip files?

2. I understand your concern about memory usage (though the above seems to
make it a non-issue in practice, if used sensibly), but I must have missed
the argument you made for setuptools and/or Python Eggs being problematic
for distributors such as Debian and Gentoo.  What specifically is/are the
problem(s)?  It seems at least two distributions are already actively
moving towards use of Python Eggs, so it would be good to inform those
distributors of any problem you see before they get too far.


John



Re: [Distutils] formencode as .egg in Debian ??

2005-11-22 Thread John J Lee
On Tue, 22 Nov 2005, Phillip J. Eby wrote:

> At 11:56 PM 11/22/2005 +0100, Martin v. Löwis wrote:
> >Phillip J. Eby wrote:
> >>>Debian should provide the packages, but not as eggs.
> >>
> >>For packages that only operate as eggs, and/or require their dependencies 
> >>as eggs, you are stating a contradiction in terms.  Eggs are not merely a 
> >>distribution format, any more than Java .jar files are.
> >
> >So I should say
> >
> >"Debian should not provide eggs, period", since what Debian provides
> >are packages, and eggs are not?
> 
> I don't understand you.
[...]

I can only assume Martin is reading 'package' to mean 'a unit of software 
distribution' rather than 'a Python package'.


John



Re: [Distutils] formencode as .egg in Debian ??

2005-11-22 Thread John J Lee
On Tue, 22 Nov 2005, John J Lee wrote:
[...]
> Actually, I believe this is the eventual primary intended mode of
> operation of eggs.  In that case, the necessary sys.path manipulation is
> handled either by setuptools' wrapper scripts / wrapper .exe files calling
> require(version), or by calling that function elsewhere.  The difference
> from .jar files is that you have the *option* of doing a global install
> into site-packages, which adds an entry to the .pth file.
[...]

Forgot to add: eggs may also be installed without use of zip files.  
Given that, why are the properties of zip file imports necessarily
relevant to people using eggs (whether they be users of software
distributions or people concerned about memory usage or startup time)?


John





Re: [Distutils] formencode as .egg in Debian ??

2005-11-22 Thread Phillip J. Eby

At 11:51 PM 11/22/2005 +, John J Lee wrote:

1. Does the above affect your concern about reading many zip files?

2. I understand your concern about memory usage (though the above seems to
make it a non-issue in practice, if used sensibly), but I must have missed
the argument you made for setuptools and/or Python Eggs being problematic
for distributors such as Debian and Gentoo.  What specifically is/are the
problem(s)?  It seems at least two distributions are already actively
moving towards use of Python Eggs, so it would be good to inform those
distributors of any problem you see before they get too far.


Actually, the .egg-info approach that Ian reminded me of should alleviate 
*all* the concerns raised so far, although it requires more hands-on 
management of the package definitions.  However, if the Debian packagers 
don't care about this (it's likely no significant change for the better or 
worse compared to what they have to do already), then they can take a pass 
altogether on the file/directory issues, at the cost of slowing down tools 
that actually *do* use eggs.


The tradeoff is that .egg-info files have to have their PKG-INFO files 
opened and read in order to do dependency processing.  But actually, if 
they created full ProjectName-Version.egg-info directories, this issue 
could actually be bypassed as well.  Hm.  Yes, that's even better.  I 
didn't think of this before because the normal use of .egg-info directories 
is for "packaging" a source tree as an egg, and you don't want to have to 
rename the directory every time the version changes.  But for a system 
packager, this isn't an issue.


So, I guess for packaging tools that can't support multiple versions being 
installed (or don't desire to), the .egg-info approach allows them to 
precisely preserve the status quo with respect to the implementation 
tradeoffs, without hurting anybody or breaking anything.


Now that it's clear to me that we can skirt the performance issue for 
.egg-info directories, it seems reasonable to recommend this approach as a 
low-risk way for system vendors to support Python eggs, and to provide a 
way for them to expose metadata for their existing Python packages in a way 
that setuptools-based packages can use - just make a 
Project-Version.egg-info directory with a PKG-INFO in it, added to 
site-packages along with the actual package installation.
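
To make the point about versioned directory names concrete, here is a small sketch that recovers project names and versions from 'Project-Version.egg-info' directory names alone, without opening any PKG-INFO file. The function names are hypothetical, and the parsing assumes a project name with no '-' in it; the real egg naming scheme has escaping rules that this ignores.

```python
import os
import tempfile


def scan_egg_info_dirs(site_dir):
    """Map project name to version using only directory names in site_dir."""
    found = {}
    for entry in sorted(os.listdir(site_dir)):
        if not entry.endswith(".egg-info"):
            continue
        stem = entry[: -len(".egg-info")]
        # Assumes the plain 'Project-Version' form.
        project, sep, version = stem.partition("-")
        if sep:
            found[project] = version
        # An unversioned 'Project.egg-info' is skipped here; finding its
        # version would require opening PKG-INFO.
    return found


def demo():
    """Scan a temporary directory laid out like a site-packages."""
    with tempfile.TemporaryDirectory() as site_dir:
        for name in ("ElementTree-1.2.6.egg-info",
                     "FormEncode-0.4.egg-info",
                     "Unversioned.egg-info",
                     "elementtree"):
            os.mkdir(os.path.join(site_dir, name))
        return scan_egg_info_dirs(site_dir)
```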


Indeed, it seems like it would be reasonable to propose that perhaps the 
normal distutils install process could provide this additional metadata, 
which would then eliminate the need for any repackaging or upgrades to any 
tools that already use "setup.py install" to create their packages.  This 
would address things like ElementTree, which isn't inherently egg-based, 
but which people would like to depend on without having to rely on a 
platform-specific packaging tool to resolve the dependency "out of band".







Re: [Distutils] formencode as .egg in Debian ??

2005-11-22 Thread Martin v. Löwis

Phillip J. Eby wrote:

Debian should provide the packages, but not as eggs.



For packages that only operate as eggs, and/or require their 
dependencies as eggs, you are stating a contradiction in terms.  Eggs 
are not merely a distribution format, any more than Java .jar files are.



So I should say

"Debian should not provide eggs, period", since what Debian provides
are packages, and eggs are not?



I don't understand you.


This is getting difficult: I don't actually know what "a contradiction
in terms" is. You seemed to be saying that eggs are not a distribution
format. If so, Debian should not distribute them. If eggs are,
in fact, a distribution format: what is the contradiction then?
I would still claim that Debian should not distribute them, but
distribute policy-conforming Debian packages instead.


I didn't say the egg system is inoperational. I said that distutils
setup is not operational for, for example, FormEncode: this uses
another packaging library in setup.py, not distutils setup.



I still don't understand you.  If a package subclasses a distutils 
command, is it no longer a distutils setup?


It is not a distutils setup because it does not invoke
distutils.core.setup.

Also, if it did so, and changed the install command to do something
inherently different (like installing into a different target directory
without the user asking for it), it is not a distutils setup anymore.

What if it bundles a 
library module that includes a subclass of a distutils command?  Where, 
precisely, do you draw the line between a "distutils setup" and 
something else?


Extending distutils is fine. An extension is a feature that, if not
invoked, has no effect. easy_install changes install in a way that
has an effect.

*Many* packages subclass distutils commands or use 
unusual arguments to the distutils setup() that cause things to be 
installed in unusual ways.


Yes, but they preserve the default behaviour: packages get installed
into subdirectories of site-packages if you invoke the install
command.

This is not the same. A java .jar file is deployed by putting it on 
disk. For an egg, an (apparently undocumented)



An egg must be on sys.path, if you want to use it without explicitly 
using the egg runtime.  See "The Quick Guide To Python Eggs", in 
particular this passage from 
http://peak.telecommunity.com/DevCenter/PythonEggs#using-eggs :


   "If you have a pure-Python egg that doesn't use any in-package data 
files, and you don't mind manually placing it on sys.path or PYTHONPATH, 
you can use the egg without installing setuptools."


Right. However, easy_install apparently edits a .pth file to achieve
this effect. This is different from jarfiles, where the end-user is
expected to put them on CLASSPATH.

While this would be possible to do for egg files as well (ie
just drop it somewhere, and let the user edit PYTHONPATH), it violates
Debian policy to do so: the packages should be immediately usable
after the Debian package is installed.

Nothing except performance considerations prevents you having a separate 
.pth file for each and every egg


That is not true. Usability also suffers if sys.path becomes long.

just as nothing prevents distutils 
packages from being installed as directory+.pth today.  Does Debian 
currently reject packages that use the extra_path argument to setup(), 
like Numeric?


No. The policy states "Install your modules into 
/usr/lib/pythonX.Y/site-packages/". This is somewhat imprecise,
and it is not clear (to me) whether the python-numeric package violates
that policy or not.

>> but in a way unfriendly to dpkg
I don't understand you here.  Are you saying that it's not possible for 
dpkg to do a post-install or uninstall operation like adding or removing 
a line from a file?


That is certainly possible - but currently, each maintainer would have
to come up with his own solution. This is more tedious to do than it
could be.
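
For what it's worth, the file manipulation itself is small. The following is a sketch of a helper that a post-install or pre-removal script could call; it is hypothetical, not an existing Debian or setuptools tool, and it ignores locking and the "import" lines that a .pth file may also contain.

```python
import os
import tempfile


def _read_lines(pth_path):
    """Return the non-blank lines of the .pth file, or [] if it is missing."""
    if not os.path.exists(pth_path):
        return []
    with open(pth_path) as f:
        return [line.rstrip("\n") for line in f if line.strip()]


def _write_lines(pth_path, lines):
    with open(pth_path, "w") as f:
        f.write("".join(line + "\n" for line in lines))


def add_pth_entry(pth_path, entry):
    """Append entry to the .pth file unless it is already listed."""
    lines = _read_lines(pth_path)
    if entry not in lines:
        lines.append(entry)
        _write_lines(pth_path, lines)


def remove_pth_entry(pth_path, entry):
    """Drop entry from the .pth file; a missing entry is not an error."""
    lines = [line for line in _read_lines(pth_path) if line != entry]
    _write_lines(pth_path, lines)


def demo():
    """Add an entry twice, then remove it, in a temporary directory."""
    entry = "./FormEncode-0.4-py2.4.egg"
    with tempfile.TemporaryDirectory() as d:
        pth = os.path.join(d, "easy-install.pth")
        add_pth_entry(pth, entry)
        add_pth_entry(pth, entry)  # second call changes nothing
        after_add = _read_lines(pth)
        remove_pth_entry(pth, entry)
        after_remove = _read_lines(pth)
        return after_add, after_remove
```

Making both operations idempotent matters because maintainer scripts can be run more than once.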


Here are the steps to create a "single-version" egg:

1. Build the egg
...


This is still tedious to do, but certainly fits with Debian conventions
(policy or not) better than any other approach.

Of course, this creates additional work for package maintainers that 
wouldn't be present with setuptools' normal .egg file/directory 
distributions, and my assumption was that the maintainers would prefer 
to be able to ignore such issues and get the benefit of dependencies 
defined by the upstream developers.  Eggs keep each project in its own 
little bubble, where it can't overwrite anything else and can be 
uninstalled without removing any overlapping parts.


I don't see how the maintainer could use the dependency information
in the egg files. Debian policy is that the .deb files need to
define proper dependencies, so the maintainer has to look up
and edit the dependency information *anyway*. Using the egg
package name is of limited help, too, because Debian policy
mandates a certain naming scheme for packages, giving the
FormEncode package a name 

Re: [Distutils] formencode as .egg in Debian ??

2005-11-22 Thread Martin v. Löwis

Phillip J. Eby wrote:
I find that surprising, since I only use CGI if I'm not concerned about 
the start time.  It's not like there aren't dozens of "long-running 
process" solutions for Python web apps including mod_python, FastCGI, 
SCGI, Twisted, and even ReadyExec, to fit almost every conceivable 
need.  And since the advent of WSGI, more frameworks can be used with 
more of those deployment options than ever before.


As an example, both MoinMoin and pypi (cheeseshop) ran as CGI scripts
on python.org for quite some time. I'm not sure whether this is still
the case, but there were certainly many accesses to both, and
they produced a significant load.

Currently, viewcvs runs as a CGI script on svn.python.org. This is
because I don't know what FastCGI, SCGI, Twisted, and ReadyExec are,
and I have only heard of mod_python, and I don't have any time to
learn any of these, and then find out how to adjust viewcvs to
use them. I guess once the search engines find them, they will
also contribute to the load.

Also, the sf redirector (python.org/sf) is a CGI script. It
runs fairly quickly these days, but it might not get invoked
so often.

Regards,
Martin





Re: [Distutils] formencode as .egg in Debian ??

2005-11-22 Thread John J Lee
On Wed, 23 Nov 2005, "Martin v. Löwis" wrote:
[...]
> I don't see how the maintainer could use the dependency information
> in the egg files.

The maintainers could use the dependency information in the egg files by
writing a tool to automatically map from the egg dep info to whatever
format their packaging system uses -- for example, .deb files.
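A minimal sketch of the kind of mapping tool suggested here, assuming the naming scheme mentioned in this thread (FormEncode becomes python2.4-formencode). The function names, the supported operators, and the example requirement strings are assumptions for illustration; real Debian naming and versioning rules have more cases than this handles.

```python
import re

# Matches strings such as "FormEncode>=0.4" or a bare "SQLObject".
REQUIREMENT = re.compile(
    r"^\s*([A-Za-z0-9_.\-]+)\s*(?:(>=|<=|==|<|>)\s*([A-Za-z0-9_.\-]+))?\s*$"
)

# Egg-style comparison operators and their Debian relation equivalents.
DEBIAN_OPERATORS = {">=": ">=", "<=": "<=", "==": "=", "<": "<<", ">": ">>"}


def debian_name(project, python_version="2.4"):
    """Map a project name to a Debian-style package name (assumed scheme)."""
    return "python%s-%s" % (python_version, project.lower())


def to_debian_depends(requirements, python_version="2.4"):
    """Turn a list of requirement strings into a Depends-style field value."""
    parts = []
    for requirement in requirements:
        match = REQUIREMENT.match(requirement)
        if match is None:
            raise ValueError("unsupported requirement: %r" % requirement)
        project, operator, version = match.groups()
        entry = debian_name(project, python_version)
        if operator:
            entry += " (%s %s)" % (DEBIAN_OPERATORS[operator], version)
        parts.append(entry)
    return ", ".join(parts)


print(to_debian_depends(["FormEncode>=0.4", "SQLObject"]))
```

A maintainer would presumably still review the generated field, since upstream project names and Debian package names do not always correspond this mechanically.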


> Debian policy is that the .deb files need to
> define proper dependencies, so the maintainer has to look up
> and edit the dependency information *anyway*. Using the egg
> package name is of limited help, too, because Debian policy
> mandates a certain naming scheme for packages, giving the
> FormEncode package a name of python2.4-formencode.

Why does any of that block distributors from using egg dependency info?


John



Re: [Distutils] formencode as .egg in Debian ??

2005-11-22 Thread Andrew Bennetts
Martin v. Löwis wrote:
[...]
> 
> This whole discussion is not about whether the start time actually
> matters - it is about whether it is a fact or not that eggs improve
> the startup. Some people said it does, others said it doesn't, and this
> is just the finding-of-facts phase.
> 
> Anyway,
> 
> > I'm terribly curious what Python applications exist for whom:
> > 1. Startup time is a consideration, that
> > 2. Haven't already been refactored to a long-running process.
> 
> For this, CGI scripts come to mind. Many people use them, and they
> are often short-running, and they often get invoked frequently.

Another example would be bzr; revision control
command line tools (or command line tools in general, I suppose) feel much nicer
to use when they respond immediately.  It doesn't take many hundreds of
milliseconds of startup delay for a tool to start feeling a little bit sluggish.

-Andrew.





Re: [Distutils] formencode as .egg in Debian ??

2005-11-22 Thread David Arnold
-->"Phillip" == Phillip J Eby <[EMAIL PROTECTED]> writes:

  Phillip> This is a major advantage over developers who do not do this,
  Phillip> not only in developer effectiveness, but also because a
  Phillip> developer who depends exclusively on a specific packaging
  Phillip> system will not have the same effective reach for their
  Phillip> offering, or conversely will require a greater investment of
  Phillip> effort to support various packaging systems.



So, this would seem to imply that installation of eggs is similar to
using PEAR or CPAN?

If so, from the perspective of a Debian user, this is a major
disadvantage.  I'm used to having a single framework that manages my
packages and their dependency relationships, and it works well.

Adding a language-specific mechanism simply causes problems, with stray
files installed into directories "owned" by a .deb package, versions of
CPAN/PEAR-installed packages drifting out of date with the interpreter
and standard library, and just the cognitive load of needing to deal
with something other than apt-get.  My experiences with CPAN/PEAR
packages have been universally bad, and I now try very, very hard to use
nothing except apt/dpkg.

I understand that from a Python-only perspective eggs might have a bunch
of ease-of-use advantages, but from my point of view I'd suggest it's
better that the developer (or Debian packager) takes the trouble to make
it work with dpkg so all Debian users get to maintain the consistency
they're used to.

My 2c, etc,

d




Re: [Distutils] formencode as .egg in Debian ??

2005-11-22 Thread Phillip J. Eby

At 01:21 AM 11/23/2005 +0100, Martin v. Löwis wrote:

Phillip J. Eby wrote:

Debian should provide the packages, but not as eggs.



For packages that only operate as eggs, and/or require their 
dependencies as eggs, you are stating a contradiction in terms.  Eggs 
are not merely a distribution format, any more than Java .jar files are.


So I should say "Debian should not provide eggs, period", since what
Debian provides are packages, and eggs are not?


I don't understand you.


This is getting difficult: I don't actually know what "a contradiction
in terms" is. You seemed to be saying that eggs are not a distribution
format.


They are not a distribution format.  There are in fact three physical 
formats that an  egg can take (if we ignore .egg-link files, which are 
really needed only to work around the absence of symlinks on Windows).  In 
principle, there could be many others.


I suspect that part of the confusion stems from the fact that I prefer to use 
"package" to refer only to a Python package (the thing you import), and not to 
refer to a distribution as a "package".  However, Debian calls distributions 
"packages", so some confusion is perhaps inevitable.  What's more, it 
appears that the Debian policy calls for the Debian package to be named for 
the contained Python package, regardless of whether that's the name of the 
distribution.


An "egg" is a "distribution" of a "project" that is importable and can 
carry both standardized and individualized metadata that can be read by the 
pkg_resources module.  There are various distribution *formats* in which an 
"egg" may be physically manifested, but the "egg" itself is a logical 
concept, not a physical one.  It is therefore, as I said, "not merely a 
distribution format".  Is that any clearer?


The "contradiction in terms" was that I took your meaning of "package" to 
be the same as my term "project" - i.e., a functional collection of Python 
resources.  Projects that *are* eggs, can't be provided "but not as 
eggs".  They *are* eggs, so not providing them as eggs means not providing 
them at all.


In contrast, projects that are not built with setuptools aren't inherently 
eggs, but you can certainly make eggs out of them.  For these projects, you 
*do* have the choice to provide them "not as eggs", but then they are also 
of no use to the projects that need eggs.


As we've already briefly discussed, in the simplest form a project can be 
made into an egg just by adding an appropriately-named .egg-info/PKG-INFO file.
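To make the .egg-info/PKG-INFO idea concrete, the following sketch writes a minimal PKG-INFO file into a directory next to where a package would be installed, then reads it back with the standard library's RFC 822-style parser. The directory naming ("<Name>.egg-info"), the helper names, and the version number are assumptions for the example; this does not use pkg_resources itself.

```python
import os
import tempfile
from email.parser import Parser


def write_egg_info(site_dir, name, version):
    """Create <name>.egg-info/PKG-INFO with a few header-style fields.

    The exact directory naming expected by the egg runtime is assumed here.
    """
    info_dir = os.path.join(site_dir, "%s.egg-info" % name)
    os.mkdir(info_dir)
    with open(os.path.join(info_dir, "PKG-INFO"), "w") as handle:
        handle.write("Metadata-Version: 1.0\n")
        handle.write("Name: %s\n" % name)
        handle.write("Version: %s\n" % version)
    return info_dir


def read_pkg_info(info_dir):
    """Return (name, version) parsed from a PKG-INFO file."""
    with open(os.path.join(info_dir, "PKG-INFO")) as handle:
        message = Parser().parse(handle)
    return message["Name"], message["Version"]


site_dir = tempfile.mkdtemp()  # stands in for site-packages
info_dir = write_egg_info(site_dir, "FormEncode", "0.4")
print(read_pkg_info(info_dir))
```

The point of the sketch is only that the metadata is a small text file that can sit alongside an ordinarily installed package.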




 If so, Debian should not distribute them.


This is what I don't understand, as it has nothing to do with whether or not 
an egg is a distribution format, at least not that I can see.  My statement was that 
eggs are not merely a distribution format; they are a logical concept that 
can be physically packaged in various ways, and if it's necessary to invent 
yet another physical layout, well, we can do that too.




If eggs are,
in fact, a distribution format: what is the contradiction then?
I would still claim that Debian should not distribute them, but
distribute policy-conforming Debian packages instead.


Which would be the same as saying you wouldn't distribute, say, setuptools 
itself.  Setuptools is an egg, and can't function except as an egg, because 
it is more than a Python package.  Again, an "egg" is some specific release 
of a project and its introspectable metadata.



I still don't understand you.  If a package subclasses a distutils 
command, is it no longer a distutils setup?


It is not a distutils setup because it does not invoke
distutils.core.setup.


Now I really don't understand you.  Line 43 of setuptools/__init__.py reads:

setup = distutils.core.setup

So, how is it not invoking distutils.core.setup?


What if it bundles a library module that includes a subclass of a 
distutils command?  Where, precisely, do you draw the line between a 
"distutils setup" and something else?


Extending distutils is fine. An extension is a feature that, if not
invoked, has no effect. easy_install changes install in a way that
has an effect.


So do all the packages that rework install_data to be more to their liking 
- and there are quite a lot of them, as I discovered when I began testing 
easy_install.



Nothing except performance considerations prevents you having a separate 
.pth file for each and every egg


That is not true. Usability also suffers if sys.path becomes long.


How?  I don't understand this.  Someone using eggs rarely has reason to 
manually manipulate sys.path unless they are adding some kind of plugin 
directory to it.  If they want to know what package version they are using, 
pkg_resources provides a superior API for querying it; I can say e.g. 
'require("TurboGears")' and receive back a list of all the eggs that 
compose or are required by TurboGears, along with their locations.  (Or 
conversely, receive a DistributionNotFound or VersionConflict error 
explaining what's missing or what was already imported that'

Re: [Distutils] formencode as .egg in Debian ??

2005-11-22 Thread Phillip J. Eby

At 01:40 AM 11/23/2005 +0100, Martin v. Löwis wrote:

Phillip J. Eby wrote:
I find that surprising, since I only use CGI if I'm not concerned about 
the start time.  It's not like there aren't dozens of "long-running 
process" solutions for Python web apps including mod_python, FastCGI, 
SCGI, Twisted, and even ReadyExec, to fit almost every conceivable 
need.  And since the advent of WSGI, more frameworks can be used with 
more of those deployment options than ever before.


As an example, both MoinMoin and pypi (cheeseshop) ran as CGI scripts
on python.org for quite some time. I'm not sure whether this is still
the case, but there were certainly many accesses to both, and
they produced a significant load.

Currently, viewcvs runs as a CGI script on svn.python.org. This is
because I don't know what FastCGI, SCGI, Twisted, and ReadyExec are,
and I have only heard of mod_python, and I don't have any time to
learn any of these, and then find out how to adjust viewcvs to
use them. I guess once the search engines find them, they will
also contribute to the load.

Also, the sf redirector (python.org/sf) is a CGI script. It
runs fairly quickly these days, but it might not get invoked
so often.


Sure, I run viewcvs and MoinMoin as CGI scripts as well, because they're 
low-volume use.  My point was more that if a few milliseconds per request 
would make any meaningful difference to those applications' performance, I 
would have already migrated them to use one of the many long-running 
process options for Python web applications.  That's why it didn't occur to 
me to consider that worth caring about.


Again, I'm not saying it might not be an issue, just that I never even 
considered it because I've been doing long-running Python web apps with 
FastCGI for about 8-9 years now, so the idea of running something 
millisecond-critical as a CGI just isn't something that would occur to me.






Re: [Distutils] formencode as .egg in Debian ??

2005-11-22 Thread Phillip J. Eby

At 11:53 AM 11/23/2005 +1100, David Arnold wrote:

-->"Phillip" == Phillip J Eby <[EMAIL PROTECTED]> writes:

  Phillip> This is a major advantage over developers who do not do this,
  Phillip> not only in developer effectiveness, but also because a
  Phillip> developer who depends exclusively on a specific packaging
  Phillip> system will not have the same effective reach for their
  Phillip> offering, or conversely will require a greater investment of
  Phillip> effort to support various packaging systems.



So, this would seem to imply that installation of eggs is similar to
using PEAR or CPAN?


Not at the level I think you mean.  Apart from the .pth file, and any 
scripts, each egg has its *own*, 100% encapsulated file or directory, for 
example, which is quite different from at least CPAN.  (I don't know 
anything about PEAR.)


One thing in particular is significantly different: eggs have runtime 
discovery and introspection of metadata and dependencies.  It would be more 
appropriate to compare them with Eclipse plugins or "OSGi bundles" for 
Java.  (These are an enhanced jar format with dependency information, 
version info, platform metadata, etc.)




Adding a language-specific mechanism simply causes problems, with stray
files installed into directories "owned" by a .deb package, versions of
CPAN/PEAR-installed packages drifting out of date with the interpreter
and standard library, and just the cognitive load of needing to deal
with something other than apt-get.


Eggs cannot overwrite each other's contents, or indeed anything else other 
than scripts (which can be directed to individual package directories ala 
Stow if you prefer).  Eggs carry the Python version number they are built for.


In addition, eggs do not have to be installed in system library 
directories; an application can simply dump all its eggs and its main 
script in a single directory, and then run from there without relying on 
system packages at all.  The egg runtime identifies which versions of which 
eggs are needed to satisfy the dependencies when the script runs.




  My experiences with CPAN/PEAR
packages have been universally bad, and I now try very, very hard to use
nothing except apt/dpkg.


That's certainly understandable, but comparison with CPAN is definitely 
inappropriate, since many of the issues that can exist with CPAN (or the 
bare distutils in Python's case) simply cannot exist with .egg files and 
directories.  For example, even if a user ran easy_install as root and 
installed a new version of a package, the older .egg would still remain on 
the system, and none of the files in the new egg would overwrite the older 
egg's files, since eggs are installed as either a zipfile or directory, 
named for the package/version/python-version/platform.  (Note, by the way, 
that this means you can actually install .eggs for multiple architectures in 
the same directory and get away with it.)
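The naming scheme described above (package, version, Python version, optional platform) can be illustrated with a small parser. This is a sketch under the assumption that egg file names look like "Name-Version-pyX.Y[-platform].egg"; the regular expression and function name are not part of setuptools.

```python
import re

# Assumed layout: Name-Version-pyX.Y[-platform].egg
EGG_NAME = re.compile(
    r"^(?P<name>[^-]+)-(?P<version>[^-]+)-py(?P<pyver>[^-]+)"
    r"(?:-(?P<platform>.+))?\.egg$"
)


def parse_egg_name(filename):
    """Split an egg file name into its name/version/pyver/platform parts."""
    match = EGG_NAME.match(filename)
    if match is None:
        raise ValueError("not an egg-style file name: %r" % filename)
    return match.groupdict()


for candidate in ("SomeEgg-1.1-py2.4.egg",
                  "SomeEgg-1.2-py2.4.egg",
                  "SomeEgg-1.2-py2.4-linux-i686.egg"):
    print(parse_egg_name(candidate))
```

Because every component is part of the file name, the three files above can coexist in one directory without overwriting each other.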


Let's say the user installed SomeEgg-1.2, replacing the system-installed 
SomeEgg-1.1. If they want to put things back the way they were, they need 
only run "easy_install SomeEgg==1.1", which will find the still-untouched 
SomeEgg-1.1 sitting where it always was, and this will rewrite the .pth 
file to make SomeEgg-1.1 the active version again.  Meanwhile, whatever 
program the user installed that needed SomeEgg-1.2 will most likely 
continue to work, as long as it's using the egg dependency machinery to get 
at it.
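The version switch described above amounts to editing one line of a .pth file while leaving both eggs on disk. The sketch below models that idea on plain lists of strings; it is not easy_install's actual code, and the function name and the "./" line format are assumptions.

```python
import os


def switch_active(pth_lines, available_eggs, project, version):
    """Return new .pth lines with the requested project version active.

    pth_lines:      current entries of the .pth file, one path per item
    available_eggs: egg file names present in the install directory
    """
    wanted_prefix = "%s-%s-" % (project, version)
    matches = [egg for egg in available_eggs if egg.startswith(wanted_prefix)]
    if not matches:
        raise LookupError("no installed egg for %s %s" % (project, version))

    # Drop whichever version of this project is currently listed...
    project_prefix = project + "-"
    kept = [line for line in pth_lines
            if not os.path.basename(line).startswith(project_prefix)]
    # ...and list the requested one instead.  No egg is deleted or modified.
    return kept + ["./" + matches[0]]


installed = ["SomeEgg-1.1-py2.4.egg", "SomeEgg-1.2-py2.4.egg"]
current = ["./SomeEgg-1.2-py2.4.egg", "./Other-0.1-py2.4.egg"]
print(switch_active(current, installed, "SomeEgg", "1.1"))
```

Only the list of active paths changes, which is why the older egg can be reactivated later without reinstalling anything.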


Of course, the user who's installing some program that needs newer packages 
than are offered by the packaging vendor can simply designate another 
installation directory, and tell EasyInstall to put any new eggs and 
scripts there, instead of adding them to the main system.  They are then 
nicely isolated from any system-level changes.




I understand that from a Python-only perspective eggs might have a bunch
of ease-of-use advantages, but from my point of view I'd suggest it's
better that the developer (or Debian packager) takes the trouble to make
it work with dpkg so all Debian users get to maintain the consistency
they're used to.


Which is all very well and good, except there are plenty of packaging 
systems besides Debian, and platforms that don't even have anything 
resembling a packaging system.  (And likely plenty of Python developers 
who've never heard of Debian, and a larger number who've never used it.)


In any case, the current discussion is more about the issue of providing 
metadata so that Python developers can *tell* when the packaging system 
provides a usable version of something, without having to write tools for 
every packaging system in existence.  Providing the .egg-info directory 
with Debian-installed packages is a reasonable solution for offering 
system-provided packages, but without using the .egg file/directory 
formats.  (The irony here is that the solution perceived as more desirable 
here, is the one that *requires* the package maintainers to avoid 
inter-project conflict

.egg in Debian summary?

2005-11-22 Thread Bob Tanner
Bob Tanner wrote:

>> I don't think Debian should use the egg structure. It apparently relies
>> on building a long sys.path (even though through only a single .pth
>> file);
> 
> I'm not sure of how .eggs are implemented, but I'm going to cross-post
> this info to the python-distutils mailing list.

I have read and re-read the complete thread regarding .eggs in Debian, and I
cannot tell if any progress has been made.

Still in the discussion/fact-finding stage?

As "just a package maintainer" I was looking for the "options" to move
forward. Looking at the thread, I think these are the options (skipping the
pros and cons for now):

1. Do nothing and go with the status quo as documented in the Debian Python
policy, which is no .eggs and unpacking everything into a sub-directory of
site-packages.

2. Investigate easydeb

3. Use Phillip's .egg-info solution


Any others?
-- 
Bob Tanner <[EMAIL PROTECTED]>  | Phone : (952)943-8700
http://www.real-time.com, Minnesota, Linux | Fax   : (952)943-8500
Key fingerprint = AB15 0BDF BCDE 4369 5B42  1973 7CF1 A709 2CC1 B288

