[Python-Dev] zipimport.c broken with implicit namespace packages

2016-01-02 Thread mike . romberg

  BRIEF INTRODUCTION:  I've been using python since the early 1.X
releases.  Mostly for application development.  On occasion I've
contributed bits to the core:

> grep Romberg Misc/ACKS 
Mike Romberg

  I've recently ported a large application to python3 (it started life
using 1.1, so it has been a long road for this codebase).  The one
big killer feature of python3 I'm attempting to use is implicit
namespace packages.  But they are broken with the zipimport.c module.

  
  It seems that zipimport.c never worked with these as it is
comparing paths in the form 'a.b.c' to paths in the form 'a/b/c'.  I
created a patch that fixes this and makes zipimport work exactly the
same as a standard filesystem import.  I was getting my patch ready to
submit when I found that this problem has already been reported:

https://bugs.python.org/issue17633
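
To make the mismatch concrete, here is a toy reconstruction of the comparison. It is not the actual zipimport.c code. The archive stores slash-separated entry names, so a dotted module name never matches until it is translated. The archive contents and the prefix-based check are assumptions made for the example.

```python
import io
import zipfile

# Build an in-memory archive holding an implicit namespace package:
# neither 'ns' nor 'ns/sub' has an __init__.py.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("ns/sub/mod.py", "VALUE = 42\n")
archive = zipfile.ZipFile(buf)

names = archive.namelist()   # ['ns/sub/mod.py']: entries use '/' separators
fullname = "ns.sub"          # but the importer is asked about a dotted name

# Comparing the dotted form directly against entry names never matches.
naive_hit = any(name.startswith(fullname) for name in names)

# Translating to an archive directory prefix first makes the lookup succeed.
prefix = fullname.replace(".", "/") + "/"
fixed_hit = any(name.startswith(prefix) for name in names)

print(naive_hit, fixed_hit)  # False True
```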

  Is there anything I can do to help fix this issue?  I could polish
up my patch, create test cases, and submit them.  But it looks like the
above patch does the same thing and is in "the process".  But it has
been "in the process" for three years.  What else needs to be done?
I'll help if I can.

Mike
___
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] zipimport.c broken with implicit namespace packages

2016-01-02 Thread Brett Cannon
First off, Mike, sorry about the bug. We have unfortunately let zipimport
get into a sorry state that has made no one want to work on the code
anymore. That being said, I opened https://bugs.python.org/issue25711 to
specifically try to fix this issue once and for all and along the way
modernize zipimport by rewriting it from scratch to be more maintainable
(or whatever the module is named in case we break backwards-compatibility).

At this point the best option might be, Mike, if you do a code review for
https://bugs.python.org/issue17633, even if it is simply a LGTM. I will
then personally make sure the approved patch gets checked in for Python 3.6
in case the rewrite of zipimport misses the release.



Re: [Python-Dev] zipimport.c broken with implicit namespace packages

2016-01-02 Thread mike . romberg

-- 
> "Brett" == Brett Cannon  writes:

> I opened
> https://bugs.python.org/issue25711 to specifically try to
> fix this issue once and for all and along the way modernize
> zipimport by rewriting it from scratch to be more
> maintainable

  Every time I read about implementing a custom loader:

https://docs.python.org/3/library/importlib.html

  I've wondered why python does not have some sort of virtual
filesystem layer to deal with locating modules/packages/support
files.   Virtual file systems seem like a good way to store data on a
wide range of storage devices.

  A VFSLoader object would interface with importlib and deal with:

  - implementing a finder and loader

  - Determine the actual type of file to load (.py, .pyc, .pyo,
__pycache__, etc).

  - Do all of its work by calling virtual functions such as:
 * listdir(path)
 * read(path)
 * stat(path)  # for things like mtime, size, etc
 * write(path, data)  # not all VFS implement this

  Then things like a ziploader would just inherit from VFSLoader,
implement the straightforward methods, and everything should "just
work" :).  I see no reason why every type of loader (real filesystem,
http, ssh, sql database, etc) would not do this as well.  Leave all
the details such as implicit namespace packages, presence of
__init__.py[oc] files, .pth files, etc in one single
location and put the details on how to interact with the actual
storage device in leaf classes which don't know or care about the high
level details.  They would not even know they are loading python
modules.  It is just blobs of data to them.

  I may try my hand at creating a prototype of this for just the
zipimporter and see how it goes.

  
> At this point the best option might be, Mike, if you do a
> code review for https://bugs.python.org/issue17633, even if
> it is simply a LGTM. I will then personally make sure the
> approved patch gets checked in for Python 3.6 in case the
> rewrite of zipimport misses the release.

  Cool.  I'll see what I can do.  I was having a bit of trouble with
the register/login part of the bug tracker.  Which is why I came
here.  I'll battle with it one more time and see if I can get it to
log me in.

  The patch should be fairly simple.  In a nutshell it just does a:

  path.replace('.', '/') + '/' in two locations.  One where it checks
for the path being a directory entry in the zip file and the second to
return an implicit namespace path (instead of not found) if it is a
match.   I'll check the patch on the tracker and see if it still works
with 3.5.1.  If not I'll attach mine.
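
The two-place change described above can be sketched in Python. This is a hypothetical model, not the patch attached to issue17633: `is_dir` and `find_portion` are made-up names, the archive entries are held in a set of '/'-separated names, and the returned location uses '/' where zipimport would use the platform separator.

```python
from __future__ import annotations


def is_dir(names: set[str], fullname: str) -> bool:
    """Location 1: does the archive contain a directory for this dotted name?"""
    # The trailing '/' stops 'pk' from matching entries under 'pkg/'.
    prefix = fullname.replace(".", "/") + "/"
    return any(name.startswith(prefix) for name in names)


def find_portion(archive: str, names: set[str], fullname: str) -> tuple[str, str] | None:
    """Classify fullname as a package, a module, or a namespace portion."""
    base = fullname.replace(".", "/")
    if f"{base}/__init__.py" in names:
        return ("package", f"{archive}/{base}/__init__.py")
    if f"{base}.py" in names:
        return ("module", f"{archive}/{base}.py")
    if is_dir(names, fullname):
        # Location 2: report a namespace portion instead of "not found".
        return ("namespace", f"{archive}/{base}")
    return None


names = {"ns/sub/mod.py", "pkg/__init__.py"}
print(find_portion("app.zip", names, "ns.sub"))      # ('namespace', 'app.zip/ns/sub')
print(find_portion("app.zip", names, "ns.sub.mod"))  # ('module', 'app.zip/ns/sub/mod.py')
print(find_portion("app.zip", names, "pk"))          # None
```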

Mike


Re: [Python-Dev] zipimport.c broken with implicit namespace packages

2016-01-02 Thread Guido van Rossum
On Sat, Jan 2, 2016 at 3:26 PM,  wrote:

>
> --
> > "Brett" == Brett Cannon  writes:
>
> > I opened
> > https://bugs.python.org/issue25711 to specifically try to
> > fix this issue once and for all and along the way modernize
> > zipimport by rewriting it from scratch to be more
> > maintainable
>
>   Every time I read about implementing a custom loader:
>
> https://docs.python.org/3/library/importlib.html
>
>   I've wondered why python does not have some sort of virtual
> filesystem layer to deal with locating modules/packages/support
> files.   Virtual file systems seem like a good way to store data on a
> wide range of storage devices.
>

Yeah, but most devices already implement a *real* filesystem, so the only
time the VFS would come in handy would be for zipfiles, where we already
have a solution.
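
For reference, the existing solution for zipfiles is the built-in zipimport hook: an archive placed on sys.path is searched much like a directory. A minimal sketch; the archive and the `zipdemo` package are made up for the example.

```python
import importlib
import os
import sys
import tempfile
import zipfile

# Write a throwaway archive containing a regular package with one module.
archive = os.path.join(tempfile.mkdtemp(), "bundle.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("zipdemo/__init__.py", "")
    zf.writestr("zipdemo/hello.py", "def hello():\n    return 'hello from a zip'\n")

# The default sys.path_hooks include zipimport, so the archive is importable.
sys.path.insert(0, archive)
hello = importlib.import_module("zipdemo.hello")

print(hello.hello())                     # hello from a zip
print(type(hello.__loader__).__name__)   # zipimporter
```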


>   A VFSLoader object would interface with importlib and deal with:
>
>   - implementing a finder and loader
>
>   - Determine the actual type of file to load (.py, .pyc, .pyo,
> __pycache__, etc).
>
>   - Do all of its work by calling virtual functions such as:
>  * listdir(path)
>  * read(path)
>  * stat(path)  # for things like mtime, size, etc
>  * write(path, data)  # not all VFS implement this
>

Emulating a decent filesystem API requires you to implement functionality
that would never be used by an import loader (write() is an example -- many
of the stat() fields are another example). So it would just be overkill.


>   Then things like a ziploader would just inherit from VFSLoader,
> implement the straightforward methods, and everything should "just
> work" :).  I see no reason why every type of loader (real filesystem,
> http, ssh, sql database, etc) would not do this as well.


All those examples except "real filesystem" are of very limited practical
value.


> Leave all
> the details such as implicit namespace packages, presence of
> __init__.py[oc] files, .pth files, etc in one single
> location and put the details on how to interact with the actual
> storage device in leaf classes which don't know or care about the high
> level details.  They would not even know they are loading python
> modules.  It is just blobs of data to them.
>

Actually the import loader API is much more suitable and less work to
implement than a VFS.


>   I may try my hand at creating a prototype of this for just the
> zipimporter and see how it goes.
>

That would nevertheless be an interesting exercise -- I hope you do it and
report back.


> > At this point the best option might be, Mike, if you do a
> > code review for https://bugs.python.org/issue17633, even if
> > it is simply a LGTM. I will then personally make sure the
> > approved patch gets checked in for Python 3.6 in case the
> > rewrite of zipimport misses the release.
>
>   Cool.  I'll see what I can do.  I was having a bit of trouble with
> the register/login part of the bug tracker.  Which is why I came
> here.  I'll battle with it one more time and see if I can get it to
> log me in.
>

If you really can't manage to comment in the tracker (which sounds unlikely
-- many people have succeeded :-) you can post your LGTM on the specific
patch here.


>   The patch should be fairly simple.  In a nutshell it just does a:
>
>   path.replace('.', '/') + '/' in two locations.  One where it checks
> for the path being a directory entry in the zip file and the second to
> return an implicit namespace path (instead of not found) if it is a
> match.   I'll check the patch on the tracker and see if it still works
> with 3.5.1.  If not I'll attach mine.


Well, Brett would like to see your feedback on the specific patch. Does it
work for you?

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] zipimport.c broken with implicit namespace packages

2016-01-02 Thread Brett Cannon
I just wanted to quickly say that Guido's observation as to how a VFS is
overkill is right. Imagine implementing a loader using sqlite and you
quickly realize that doing a dull VFS is more than necessary to implement
what import needs to function.
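
Brett's thought experiment can be sketched with the standard importlib hooks. This assumes a single hypothetical table `modules(name, source)`, top-level modules only (no packages, no bytecode), and a made-up class name. The point is that import only asks the store two questions: is there source for this name, and what is it?

```python
import importlib
import importlib.abc
import importlib.util
import sqlite3
import sys


class SQLiteFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Hypothetical finder and loader serving module source from SQLite."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn

    def _source(self, fullname):
        row = self.conn.execute(
            "SELECT source FROM modules WHERE name = ?", (fullname,)
        ).fetchone()
        return None if row is None else row[0]

    def find_spec(self, fullname, path=None, target=None):
        if self._source(fullname) is None:
            return None  # not ours: let the next finder on sys.meta_path try
        return importlib.util.spec_from_loader(
            fullname, self, origin=f"sqlite:{fullname}"
        )

    def create_module(self, spec):
        return None  # fall back to the default module object

    def exec_module(self, module):
        source = self._source(module.__name__)
        exec(compile(source, module.__spec__.origin, "exec"), module.__dict__)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE modules (name TEXT PRIMARY KEY, source TEXT)")
conn.execute("INSERT INTO modules VALUES (?, ?)", ("sqlmod", "ANSWER = 6 * 7\n"))
sys.meta_path.append(SQLiteFinder(conn))

sqlmod = importlib.import_module("sqlmod")
print(sqlmod.ANSWER)  # 42
```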



Re: [Python-Dev] zipimport.c broken with implicit namespace packages

2016-01-02 Thread Brett Cannon
On Sat, 2 Jan 2016, 20:42 Brett Cannon  wrote:

> I just wanted to quickly say that Guido's observation as to how a VFS is
> overkill is right. Imagine implementing a loader using sqlite and you
> quickly realize that doing a dull VFS is more
>
"dull" -> "full"

> than necessary to implement what import needs to function.


Re: [Python-Dev] zipimport.c broken with implicit namespace packages

2016-01-02 Thread mike . romberg
> " " == Brett Cannon  writes:

 > I just wanted to quickly say that Guido's observation as to how
 > a VFS is overkill is right. Imagine implementing a loader using
 > sqlite and you quickly realize that doing a dull VFS is more
 > than necessary to implement what import needs to function.

  I fear I've made a poor choice in calling this abstract class a VFS
(I'm terrible with names).  I'm not thinking of anything along the
lines of a full file system that supports open(), seek(), read() and
everything else.   That for sure would be overkill and way more
complicated than it needs to be.

  All I'm really thinking about is a simple abstract interface that is
used by an importer to actually locate and retrieve the binary objects
that will be loaded.  For the simple case I think just two methods
would/could serve a read-only "blob/byte database":

  listdir(path)  # returns an iterable container of "files"/"dirs"
                 # found at path

  get(path)      # returns a bytes object for the given path

  I think with those two virtual calls a higher-level import layer
can locate and retrieve modules to be loaded without even knowing
where they came from.
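
Written down as a structural interface, the two calls might look like this. A sketch only: `BlobStore` and `ZipBlobStore` are hypothetical names, paths are assumed to be '/'-separated, and directories are inferred from entry names because zip archives need not contain explicit directory entries.

```python
from __future__ import annotations

import io
import zipfile
from typing import Iterable, Protocol


class BlobStore(Protocol):
    """Hypothetical read-only store: just the two calls from the proposal."""

    def listdir(self, path: str) -> Iterable[str]: ...

    def get(self, path: str) -> bytes: ...


class ZipBlobStore:
    """Leaf class: answers listdir()/get() from the entries of a zip archive."""

    def __init__(self, zf: zipfile.ZipFile):
        self.zf = zf

    def listdir(self, path: str) -> list[str]:
        prefix = path.rstrip("/") + "/" if path else ""
        children = set()
        for name in self.zf.namelist():
            if name.startswith(prefix) and name != prefix:
                # Keep only the first component below `path`, so 'a/b/c.py'
                # shows up as directory 'b' when listing 'a'.
                children.add(name[len(prefix):].split("/", 1)[0])
        return sorted(children)

    def get(self, path: str) -> bytes:
        return self.zf.read(path)  # raises KeyError for a missing entry


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("ns/sub/mod.py", "X = 1\n")
    zf.writestr("ns/data.txt", "hello")

store: BlobStore = ZipBlobStore(zipfile.ZipFile(buf))
print(store.listdir("ns"))       # ['data.txt', 'sub']
print(store.get("ns/data.txt"))  # b'hello'
```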

  The higher level would know about things such as the difference
between .py and .pyc "files" or the possible existence of __pycache__
directories and what may be found in them.  Right now the zipimporter
contains a list of file extensions to try and load (and in what
order).  It also lacks any knowledge of __pycache__ directories (which
is one of the outstanding bugs with it).   It just seems to me that
this sorta logic would be better moved to a higher layer and the zip
layer just translates paths into reads of byte blobs.
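
The split described here could look roughly like this, with format knowledge on top and plain byte retrieval underneath. The `DictBlobStore` leaf and the `resolve` function are hypothetical. The lookup order (package, then source module, then namespace directory) is an assumption loosely modelled on the filesystem importer, and .pyc/__pycache__ handling is left out to keep it short.

```python
from __future__ import annotations


class DictBlobStore:
    """Hypothetical leaf store over a dict of '/'-separated path -> bytes."""

    def __init__(self, files: dict[str, bytes]):
        self.files = files

    def listdir(self, path: str) -> list[str]:
        prefix = path + "/" if path else ""
        return sorted({p[len(prefix):].split("/", 1)[0]
                       for p in self.files if p.startswith(prefix)})

    def get(self, path: str) -> bytes:
        return self.files[path]


def resolve(store, fullname: str):
    """Higher layer: decide what `fullname` is using only listdir()/get()."""
    parent, _, name = fullname.rpartition(".")
    parent_dir = parent.replace(".", "/")
    path = f"{parent_dir}/{name}" if parent_dir else name
    entries = store.listdir(parent_dir)
    if name in entries and "__init__.py" in store.listdir(path):
        return ("package", store.get(f"{path}/__init__.py"))
    if f"{name}.py" in entries:
        return ("module", store.get(f"{path}.py"))
    if name in entries:
        return ("namespace", path)  # a directory without __init__.py
    return None


store = DictBlobStore({
    "app/__init__.py": b"",
    "app/util.py": b"X = 1\n",
    "ns/sub/mod.py": b"Y = 2\n",
})
for name in ["app", "app.util", "ns", "ns.sub", "missing"]:
    result = resolve(store, name)
    print(name, "->", result[0] if result else None)
# app -> package
# app.util -> module
# ns -> namespace
# ns.sub -> namespace
# missing -> None
```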

  I mentioned write()/put() for two reasons:

  1)  When I import a .py file then a .pyc file is created on my
  filesystem.  I don't really know what piece of code created it.
  But a write to the filesystem (assuming it is writeable and
  permissions set etc) occurs.   It might be nice if other
  storage systems (zip, SQL, etc.) could optionally support this.
  They could if the code that created the .pyc simply did a put()
  to the object that pulled in the .py file.  The interface is
  expanded by two calls (put() and delete()).

  2)  Integration with package data.  I know there are
  modules/packages out there that help a module try and locate
  data files that may be associated with a package.  I think it
  would be kinda cool for a module to instead be able to get a
  handle to the abstract class that loaded it.  It could then use
  the same listdir(), get(), and possibly write methods the importer
  did.  The writing bit of this may or may not be a good idea :)
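
On the question in point 1 of who writes the .pyc: in importlib it is the loader. `importlib.abc.SourceLoader` asks `path_stats()` for the source's 'mtime' (the only required key; 'size' is optional) and, after compiling, hands the new bytecode to its optional `set_data()` method, which is close to the put() idea. A sketch with a dict standing in for the storage; `DictSourceLoader` and the /virtual/ path are made up.

```python
import importlib.abc
import importlib.util
import sys


class DictSourceLoader(importlib.abc.SourceLoader):
    """Hypothetical loader for one module whose source lives in a dict."""

    def __init__(self, files: dict, path: str):
        self.files = files  # path -> bytes; stands in for any byte store
        self.path = path

    def get_filename(self, fullname):
        return self.path

    def get_data(self, path):
        try:
            return self.files[path]
        except KeyError:
            # SourceLoader treats OSError as "no cached bytecode here".
            raise OSError(path) from None

    def path_stats(self, path):
        # 'mtime' is required; without path_stats() nothing gets cached.
        return {"mtime": 0, "size": len(self.files[path])}

    def set_data(self, path, data):
        self.files[path] = data  # the optional write hook, i.e. "put()"


files = {"/virtual/demo_mod.py": b"GREETING = 'hi'\n"}
loader = DictSourceLoader(files, "/virtual/demo_mod.py")
spec = importlib.util.spec_from_loader("demo_mod", loader)
module = importlib.util.module_from_spec(spec)
loader.exec_module(module)

print(module.GREETING)  # hi
# set_data() is skipped when bytecode writing is disabled (python -B).
print("pyc cached:", any("__pycache__" in p for p in files),
      "| dont_write_bytecode:", sys.dont_write_bytecode)
```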


  Anyway, hope I did not muddy the waters.  I was just thinking a bit
out loud and none of this may live past my own experiments.   I was/am
just trying to think of why the importers like the zipimporter don't
work like a filesystem importer and how they would be cleaner if they
just dealt with paths and byte blobs to store/get based on those paths.

Mike


Re: [Python-Dev] zipimport.c broken with implicit namespace packages

2016-01-02 Thread Nick Coghlan
On 3 January 2016 at 16:32, Nick Coghlan  wrote:
> For folks that are interested in that, those who aren't following
> import-sig in addition to python-dev may want to take a look at
> Brett's design for the importlib.resources API:
> http://nbviewer.jupyter.org/gist/brettcannon/9c4681a77a7fa09c5347

Sorry, I meant to include a link to the import-sig thread as well:
https://mail.python.org/pipermail/import-sig/2015-November/001041.html

Cheers,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia


Re: [Python-Dev] zipimport.c broken with implicit namespace packages

2016-01-02 Thread Nick Coghlan
On 3 January 2016 at 15:29,   wrote:
>> " " == Brett Cannon  writes:
>
>  > I just wanted to quickly say that Guido's observation as to how
>  > a VFS is overkill is right. Imagine implementing a loader using
>  > sqlite and you quickly realize that doing a dull VFS is more
>  > than necessary to implement what import needs to function.
>
>   I fear I've made a poor choice in calling this abstract class a VFS
> (I'm terrible with names).  I'm not thinking of anything along the
> lines of a full file system that supports open(), seek(), read() and
> everything else.   That for sure would be overkill and way more
> complicated than it needs to be.
>
>   All I'm really thinking about is a simple abstract interface that is
> used by an importer to actually locate and retrieve the binary objects
> that will be loaded.  For the simple case I think just two methods
> would/could serve a read-only "blob/byte database":
>
>   listdir(path)  # returns an iterable container of "files"/"dirs" found
>  # at path
>
>   get(path)  # returns a bytes object for the given path

We already have the latter:
https://docs.python.org/3/library/importlib.html#importlib.abc.ResourceLoader.get_data
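
As a quick illustration of that existing read side, the loader that imported a package from a zip archive already answers get_data() for files stored next to the package. A sketch; the archive, the `respkg` package, and config.txt are made up for the example.

```python
import importlib
import os
import sys
import tempfile
import zipfile

# A throwaway archive with a package and a data file inside it.
archive = os.path.join(tempfile.mkdtemp(), "resources_demo.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("respkg/__init__.py", "")
    zf.writestr("respkg/config.txt", "answer=42\n")

sys.path.insert(0, archive)
pkg = importlib.import_module("respkg")

# pkg.__file__ points inside the archive (.../resources_demo.zip/respkg/...),
# and the package's loader (a zipimport.zipimporter) maps such paths to entries.
data_path = os.path.join(os.path.dirname(pkg.__file__), "config.txt")
print(pkg.__spec__.loader.get_data(data_path))  # b'answer=42\n'
```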

It's the former that has taken a while to get to, as the 3rd party
pkg_resources module (part of setuptools) already provides a pragmatic
API that also has the virtue of being compatible with both Python 2 &
3, and there are a few subtleties related to the possible use of temporary
files that make a robust API design trickier than it first appears to
be.

For folks that are interested in that, those who aren't following
import-sig in addition to python-dev may want to take a look at
Brett's design for the importlib.resources API:
http://nbviewer.jupyter.org/gist/brettcannon/9c4681a77a7fa09c5347

Cheers,
Nick.

P.S. If anyone actually *does* want a full "virtual file system layer"
API for non-filesystem storage locations:
http://docs.pyfilesystem.org/en/latest/filesystems.html

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia