[Python-Dev] zipimport.c broken with implicit namespace packages

2016-01-02 Thread mike . romberg

  BRIEF INTRODUCTION:  I've been using python since the early 1.X
releases.  Mostly for application development.  On occasion I've
contributed bits to the core:

> grep Romberg Misc/ACKS 
Mike Romberg

  I've recently ported a large application to python3 (it started life
as using 1.1 so it has been a long road for this codebase).  The one
big killer feature of python3 I'm attempting to use is implicit
namespace packages.  But they are broken with the zipimport.c module.

  
  It seems that zipimport.c never worked with these as it is
comparing paths in the form 'a.b.c' to paths in the form 'a/b/c'.  I
created a patch that fixes this and makes zipimport work exactly the
same as a standard filesystem import.  I was getting my patch ready to
submit when I found that this problem has already been reported:

https://bugs.python.org/issue17633

  Is there anything I can do to help fix this issue?  I could polish
up my patch create test cases and submit them.  But it looks like the
above patch does the same thing and is in "the process".  But it has
been "in the process" for three years.  What else needs to be done?
I'll help if I can.

Mike
___
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] zipimport.c broken with implicit namespace packages

2016-01-02 Thread mike . romberg

-- 
> "Brett" == Brett Cannon  writes:

> I opened
> https://bugs.python.org/issue25711 to specifically try to
> fix this issue once and for all and along the way modernize
> zipimport by rewriting it from scratch to be more
> maintainable

  Every time I read about impementing a custom loader:

https://docs.python.org/3/library/importlib.html

  I've wondered why python does not have some sort of virtual
filesystem layer to deal with locating modules/packages/support
files.   Virtual file systems seem like a good way to store data on a
wide range of storage devices.

  A VFSLoader object would interface with importlib and deal with:

  - implementing a finder and loader

  - Determine the actual type of file to load (.py, .pyc, .pyo,
__pycache__, etc).

  - Do all of it's work by calling virtual functions such as:
 * listdir(path)
 * read(path)
 * stat(path)  # for things like mtime, size, etc
 * write(path, data)  # not all VFS implement this

  Then things like a ziploader would just inherit from VFSLoader
implement the straight forward methods and everything should "just
work" :).  I see no reason why every type of loader (real filesystem,
http, ssh, sql database, etc) would not do this as well.  Leave all
the details such as implicit namespace packages, presence of
__init__.py[oc] files, .pth files, etc in one single
location and put the details on how to interact with the actual
storage device in leaf classes which don't know or care about the high
level details.  They would not even know they are loading python
modules.  It is just blobs of data to them.

  I may try my hand at creating a prototype of this for just the
zipimporter and see how it goes.

  
> At this point the best option might be, Mike, if you do a
> code review for https://bugs.python.org/issue17633, even if
> it is simply a LGTM. I will then personally make sure the
> approved patch gets checked in for Python 3.6 in case the
> rewrite of zipimport misses the release.

  Cool.  I'll see what I can do.  I was having a bit of trouble with
the register/login part of the bug tracker.  Which is why I came
here.  I'll battle with it one more time and see if I can get it to
log me in.

  The patch should be fairly simple.  In a nutshell it just does a:

  path.replace('.', '/') + '/' in two locations.  One where it checks
for the path being a directory entry in the zip file and the second to
return an implicit namespace path (instead of not found) if it is a
match.   I'll check the patch on the tracker and see if it still works
with 3.5.1.  If not I'll attach mine.

Mike
___
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] zipimport.c broken with implicit namespace packages

2016-01-02 Thread mike . romberg
> " " == Brett Cannon  writes:

 > I just wanted to quickly say that Guido's observation as to how
 > a VFS is overkill is right. Imagine implementing a loader using
 > sqlite and you quickly realize that doing a dull VFS is more
 > than necessary to implement what import needs to function.

  I fear I've made a poor choice in calling this abstract class a VFS
(I'm terrible with names).  I'm not thinking of anything along the
lines of a full file system that supports open(), seek(), read() and
everything else.   That for sure would be overkill and way more
complicated than it needs to be.

  All I'm really thinking about is a simple abstract interface that is
used by an importer to actually locate and retrieve the binary objects
that will be loaded.  For the simple case I think just two methods
would/could server a read only "blob/byte database":

  listdir(path)  # returns an iterable container of "files"/"dirs" found
 # at path
 
  get(path)  # returns a bytes object for the given path

  I think with those two virtual calls a more high level import layer
can locate and retrieve modules to be loaded without even knowing
where they came from.

  The higher level would know about things such as the difference
between .py and .pyc "files" or the possible existance of __pycache__
directories and what may be found in them.  Right now the zipimporter
contains a list of file extensions to try and load (and in what
order).  It also lacks any knowledge of __pycache__ directories (which
is one of the outstanding bugs with it).   It just seems to me that
this sorta logic would be better moved to a higher layer and the zip
layer just translates paths into reads of byte blobs.

  I mentioned write()/put() for two reasons:

  1)  When I import a .py file then a .pyc file is created on my
  filesystem.  I don't really know what piece of code created it.
  But a write to the filesystem (assuming it is writeable and
  permissions set etc) occurs.   It might be nice for other
  storage systems (zip, sql, etc) could optionally support this.
  They could if the code that crated the .pyc simply did a put()
  to the object that pulled in the .py file.  The interface is
  expanded by two calls (put() and delete()).

  2)  Integration with package data.  I know there are
  modules/packages out there that help a module try and locate
  data files that may be associated with a package.  I think it
  would be kinda cool for a module to instead be able to get a
  handle to the abstract class that loaded it.  it could then use
  the same listdir() get() and possibly write methods the importer
  did.  The writing bit of this may or may not be a good idea :)


  Anyway, hope I did not muddy the waters.  I was just thinking a bit
out loud and none of this may live past my own experiments.   I was/am
just trying to think of why the importers like the zipimporter don't
work like a filesystem importer and how they would be cleaner if they
just dealt with paths and byte blobs to store/get based on those paths.

Mike
___
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] zipimport.c broken with implicit namespace packages

2016-01-03 Thread mike . romberg
> " " == Brett Cannon  writes:

...

 > So it's possible that some abstraction might be possible, but
 > up to this point there hasn't been a motivating factor for
 > importlib to do this as zipimport is written in C and thus
 > won't benefit from any abstraction that importlib uses in its
 > Python code (hence why zipimport needs to be rewritten). Maybe
 > after we have a zip-backed importer written in Python a common
 > API will fall through and we can abstract that out, but I
 > wouldn't want to guess until the code is written and someone
 > can stare at the two implementations.

  Fair enough.  I too think it is a good idea to make base classes
after their is a need for them and not before.   Some argument could
be made that there is a need now as zipimported modules/packages don't
work exactly the same way as "normal" ones.  But since you plan a
rewrite of zipimport collecting the common stuff after that makes sense.

 > It should also be said that there is nothing requiring that an
 > importer support any concept of a file. Going back to the
 > sqlite database example, you could make it nothing more than a
 > table that maps module name to source code and bytecode:

  Yep.  I was not thinking file either.  I may have said file (again
I'm terrible with names) but I'm thinking an array of bytes with a
string id (which I call a path).  The actual storage could simply be
storing the byte array using the stringID/path.

...

 > That's enough to implement an importer and there is no mention
 > of file paths anywhere. If you really wanted to could do a
 > second table that acted as a database-backed file system for
 > reading package data files, but that is not required to meet
 > the minimum functionality -- and then some thanks to bytecode
 > support -- for an importer.

  Yea the python module name could be viewed as a path
(package.subpackage.module) and stored in a hierarchical way.  Or it
could simply be viewed as a string and stored in some flat database.


  Anyway, back at the ranch...  I've got an extended version of the
patch for issue17633 working on my system.  I've added to the test
case to check for the proper functioning of implicit namespace
packages spread between two zip files.  I still need to add tests for
a namespace package spread between a normal filesystem and a zip
file.  I expect I'll have that done in a day or two.  I'll post it to
the bug tracker.

Mike


___
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] zipimport.c broken with implicit namespace packages

2016-01-04 Thread mike . romberg
> " " == Brett Cannon  writes:

...

 > It's a reasonable thing to consider, but it would be better to
 > get zipimport fixed for you, then rewritten

  To that end, I've added a patch to the issue tracker:

https://bugs.python.org/issue17633

  My patch is issue17633-3.diff which builds upon issue17633-2.diff in
that it fixes an issue with the enumerated value used by find_loader
(find_loader was returning -1 which was not even a valid enumerated
value).

  I also expanded the test cases for zipimport to cover namespace
packages spread between multiple zip archives and zip archives and
real filesystems.

  Please let me know if there is anything else I can do.

Thanks,

Mike

___
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] More optimisation ideas

2016-02-01 Thread mike . romberg
> " " == Barry Warsaw  writes:

>> On Feb 01, 2016, at 11:40 AM, R. David Murray wrote:

>> I don't know about anyone else, but on my own development
>> systems it is not that unusual for me to *edit* the stdlib
>> files (to add debug prints) while debugging my own programs.
>> Freeze would definitely interfere with that.  I could, of
>> course, install a separate source build on my dev system, but I
>> thought it worth mentioning as a factor.

   [snip]

 > But even with system scripts, I do need to step through them
 > occasionally.  If it were a matter of changing a shebang or
 > invoking the script with a different Python
 > (e.g. /usr/bin/python3s vs. /usr/bin/python3) to get the full
 > unpacked source, that would be fine.

  If the stdlib were to use implicit namespace packages
( https://www.python.org/dev/peps/pep-0420/ ) and the various
loaders/importers as well, then python could do what I've done with an
embedded python application for years.  Freeze the stdlib (or put it
in a zipfile or whatever is fast).  Then arrange PYTHONPATH to first
look on the filesystem and then look in the frozen/ziped storage.

  Normally the filesystem part is empty.   So, modules are loaded from
the frozen/zip area.  But if you wanna override one of the frozen
modules simply copy one or more .py files onto the file system.  I've
been doing this only with modules in the global scope.  But implicit
namespace packages seem to open the door for this with packages.

Mike
___
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com