Module/package hierarchy and its separation from file structure

2008-01-23 Thread Peter Schuller
e in common. But feel free to substitute something else (a Zoo,
say).

-- 
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <[EMAIL PROTECTED]>'
Key retrieval: Send an E-Mail to [EMAIL PROTECTED]
E-Mail: [EMAIL PROTECTED] Web: http://www.scode.org

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Module/package hierarchy and its separation from file structure

2008-01-23 Thread Peter Schuller
>> I do *not* want to simply break out X into org.lib.animal.x, and
>> have org.lib.animal import org.lib.animal.x.X as X.
>
> Nevertheless, that seems the best (indeed, the Pythonic) solution to
> your problem as stated. Rather than just shooting it down, we'll have
> to know more about what actual problem you're trying to solve to
> understand why this solution doesn't fit.

That is exactly what my original post was trying very hard to
explain. The problem is the discrepancy that I described between the
organization desired in terms of file system structure, and the
organization required in terms of module hierarchy. The reason it is a
problem is that, by default, there is (in my opinion) too strong a
connection between file system structure and module hierarchy in
Python.

>> While this naively solves the problem of being able to refer to X as
>> org.lib.animal.X, the solution is anything but consistent because
>> the *identity* of X is still org.lib.animal.x.X.
>
> The term "identity" in Python means something separate from this
> concept; you seem to mean "the name of X".

Not necessarily. In part it is the name, in that __name__ will be
different. But to the extent that calling code can potentially import
them under different names, it is identity, because importing the same
module under two names results in two distinct modules (two distinct
module objects) that have no relation to each other. So, for
example, if a module has a single global protected by a mutex, there
are suddenly two copies of it. In short: identity matters.

>> Examples of ways this breaks things:
>> 
>>   * X().__class__.__name__ gives unexpected results.
>
> Who is expecting them otherwise, and why is that a problem?

Depends on the situation. One example: if your policy is that
instances log using a logger named by the fully qualified name of the
class, then someone importing and using x.y.z.Class will expect to be
able to grep for x.y.z.Class in the output of the log file.
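A minimal Python sketch of that logging policy. The helper `logger_for` is hypothetical, and the file `org/lib/animal/monkey.py` is simulated in memory with `types.ModuleType` so the snippet runs on its own. A class body records `__module__` from the module it executes in, so the logger is named after the defining module, not after whatever name a caller imported the class under:

```python
import logging
import types


def logger_for(cls):
    # Hypothetical policy: one logger per class, named by its fully qualified name.
    return logging.getLogger(f"{cls.__module__}.{cls.__qualname__}")


# Simulate the file org/lib/animal/monkey.py without touching the disk:
# a class body takes __module__ from the __name__ of the module it runs in.
monkey_mod = types.ModuleType("org.lib.animal.monkey")
exec("class Monkey:\n    pass\n", monkey_mod.__dict__)
Monkey = monkey_mod.Monkey

# Even if callers know it as org.lib.animal.Monkey, the logger name
# (and therefore what they must grep for) reflects the defining module.
print(logger_for(Monkey).name)  # org.lib.animal.monkey.Monkey
```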

>>   * Automatically generated documentation will document using the
>>   "real" package name.
>
> Here I lose all track of what problem you're trying to solve. You want
> the documentation to say exactly where the class "is" (by name), but
> you don't want the class to actually be defined at that location? I
> can't make sense of that, so probably I don't understand the
> requirement.

You are baffled that what I seem to want is for the definition of the
class (the file on disk) to be different from the location implied by the
module name. Well, this is *exactly* what I want because, like I said,
I do not want the strong connection between file system structure and
module hierarchy. The fact that this connection exists is what is
causing my problems.

Please note that this is not any kind of crazy-brained idea; lots of
languages have absolutely zero relationship between file location and
modules/namespaces.

I realize that technically Python does not have this either. Like I
said in the original post, I do realize that I can override __import__
with any arbitrary function, and/or do magic in __init__. But I also
did not want to resort to hacks, and would prefer that there be some
kind of well-established solution to the problem.

Although I was originally hesitant to use an actual example for fear
of giving the sense that I was trying to start a language war, your
answer above prompts me to do so anyway, to show in concrete terms
what I mean, for those that wonder why/how it would work.

So for example, in Ruby, there is no problem having:

File monkey.rb:

module Org
  module Lib
    module Animal
      class Monkey
        # ...
      end
    end
  end
end

File tiger.rb:

module Org
  module Lib
    module Animal
      class Tiger
        # ...
      end
    end
  end
end

This is possible because the act of addressing code to be loaded into
the interpreter is not connected to the namespace/module system, but
rather to the file system.

Some languages avoid (but do not eliminate) the problem I am having
without having this disconnect. For example, Java does have a strong
connection between file system structure and class names. However the
critical difference is that in Java, everything is modeled around
classes, and class names map directly to the file system structure. So
in Java, you would have the class

   org.lib.animal.Monkey

in

   /org/lib/animal/Monkey.java

and

   org.lib.animal.Tiger

in

   /org/lib/animal/Tiger.java

In other words, introducing a separate file does not introduce a new
package. This works well as long as you are fine with having
everything related to a class in the same file.

The problem is that with Python, not everything is a class, and a
file translates to a module, not a class. So you cannot have your
source in differe

Re: Module/package hierarchy and its separation from file structure

2008-01-24 Thread Peter Schuller
>> Not necessarily. In part it is the name, in that __name__ will be
>> different. But to the extent that calling code can potentially import
>> them under different names, it is identity, because importing the same
>> module under two names results in two distinct modules (two distinct
>> module objects) that have no relation to each other. So, for
>> example, if a module has a single global protected by a mutex, there
>> are suddenly two copies of it. In short: identity matters.
>
> That's not true. It doesn't matter if you import a module several times
> at different places and with different names; it's always the same module
> object.

Sorry, this is all my stupidity. I was being daft. When I said
importing under different names, I meant exactly that: applying
hacks to import a module under a different name by doing it relative
to a different root directory. This is, however, not what anyone is
suggesting in this discussion. I got my wires crossed. I fully
understand that "import x.y.z", "import x.y.z as B", and so on do
not affect the identity of the module.
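This is easy to confirm. In the sketch below, `xml.etree.ElementTree` is just a convenient nested standard-library module; aliasing it, or importing it through a different `from` spelling, binds the same module object, which is cached once in `sys.modules`:

```python
import importlib
import sys

# Two different spellings of the same import...
import xml.etree.ElementTree as ET
from xml.etree import ElementTree

# ...bind the very same module object, cached once in sys.modules.
print(ET is ElementTree)                                         # True
print(sys.modules["xml.etree.ElementTree"] is ET)                # True
print(importlib.import_module("xml.etree.ElementTree") is ET)    # True
```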

> Ok, there is one exception: the main script is loaded as __main__, but if
> you import it using its own file name, you get a duplicate module.
> You could confuse Python by adding a package root to sys.path and doing
> imports from inside that package and from the outside with different
> names, but... just don't do that!

Right :)
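The exception described in the quoted reply can be reproduced in a few lines. This sketch builds a throwaway package in a temporary directory (the names `dup_pkg` and `dup_mod` are hypothetical) and then commits exactly the mistake warned against: putting both the package root and the package directory on `sys.path`. One file then yields two unrelated module objects, each with its own globals:

```python
import importlib
import pathlib
import sys
import tempfile

# Throwaway package: <root>/dup_pkg/__init__.py and <root>/dup_pkg/dup_mod.py.
root = pathlib.Path(tempfile.mkdtemp())
(root / "dup_pkg").mkdir()
(root / "dup_pkg" / "__init__.py").write_text("")
(root / "dup_pkg" / "dup_mod.py").write_text("state = []  # module-level global\n")

# The mistake: both the package root and the package directory on sys.path.
sys.path.insert(0, str(root))
sys.path.insert(0, str(root / "dup_pkg"))
importlib.invalidate_caches()

outer = importlib.import_module("dup_pkg.dup_mod")  # imported via the package
inner = importlib.import_module("dup_mod")          # same file, as a top-level module

print(outer is inner)  # False: one file, two module objects
outer.state.append(1)
print(inner.state)     # []: each copy has its own globals
```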

> I don't really understand what your problem is exactly, but I think you  
> don't require any __import__ magic or arcane hacks. Perhaps the __path__  
> package attribute may be useful to you. You can add arbitrary directories  
> to this list, which are searched for submodules of the package. This way  
> you can (partially) decouple the file structure from the logical package  
> structure. But I don't think it's a good thing...

That sounds useful if I want to essentially put the contents of a
directory somewhere else, without using a symlink. In this case my
problem is more related to the "file == module" and "directory ==
module" semantics, since I want to break the contents of a single
module out into several files.

> Isn't org.lib.animal a package, reflected as a directory on disk? That's  
> the same both for Java and Python. Monkey.py and Tiger.py would be modules  
> inside that directory, just like Monkey.java and Tiger.java. Aren't they
> the same thing?

No, because in Java Monkey.java is a class. So we have class Monkey in
package org.lib.animal. In Python we would have class Monkey in module
org.lib.animal.monkey, which is redundant and does not reflect the
intended hierarchy. I have to either live with this, or put Monkey in
.../animal/__init__.py. Neither option is what I would want, ideally.
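The two options can be compared with a small sketch. It writes a hypothetical `org/lib/animal` tree to a temporary directory (the `make_tree` helper is illustrative, not a standard API) and prints the `__module__` each class ends up with:

```python
import importlib
import pathlib
import sys
import tempfile


def make_tree(files):
    """Illustrative helper: write {relative_path: source} under a fresh
    temporary directory and put that directory first on sys.path."""
    root = pathlib.Path(tempfile.mkdtemp())
    for rel, src in files.items():
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(src)
    sys.path.insert(0, str(root))
    importlib.invalidate_caches()
    return root


make_tree({
    "org/__init__.py": "",
    "org/lib/__init__.py": "",
    # Option 1: the class gets its own file, hence its own module.
    "org/lib/animal/monkey.py": "class Monkey:\n    pass\n",
    # Option 2: the class lives directly in the package's __init__.py.
    "org/lib/animal/__init__.py": "class Tiger:\n    pass\n",
})

from org.lib.animal import Tiger
from org.lib.animal.monkey import Monkey

print(Monkey.__module__)  # org.lib.animal.monkey  (extra "monkey" level)
print(Tiger.__module__)   # org.lib.animal         (but everything in __init__.py)
```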

Java does still suffer from the same problem since it forces "class ==
file" (well, "public class == file"). However it is less of a problem
since you tend to want to keep a single class in a single file, while
I have a lot more incentive to split up a module into different files
(because you may have a lot of code hiding behind the public interface
of a module).

So essentially, Java and Python have the same problem, but certain
aspects of Java happen to mitigate the effects of it. Languages like
Ruby do not have the problem at all, because the relationship between
files and modules is non-existent.


Re: Module/package hierarchy and its separation from file structure

2008-01-29 Thread Peter Schuller
> You can reassign the class's module:
>
> from org.lib.animal.monkey import Monkey
> Monkey.__module__ = 'org.lib.animal'
>
>
> (Which, I must admit, is not a bad idea in some cases.)

Is there any sense of whether this is truly a supported way of doing this,
in terms of not running into various unintended side effects? One
example would be sys.modules that I mentioned in the previous
post. Another, possibly related, might be interaction with the import
keyword and its implementation.

I will probably have to read up more on the semantics of __import__
and related machinery.
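One way to probe those side effects is a sketch like the following, which applies the quoted `__module__` reassignment inside a hypothetical `org/lib/animal/__init__.py` (the `make_tree` helper and the temporary tree are illustrative assumptions). Under these assumptions, pickling still round-trips, because pickle looks the class up as `org.lib.animal.Monkey` and that name exists, while `sys.modules` still holds the file-based submodule `org.lib.animal.monkey`:

```python
import importlib
import pathlib
import pickle
import sys
import tempfile


def make_tree(files):
    """Illustrative helper: write {relative_path: source} under a fresh
    temporary directory and put that directory first on sys.path."""
    root = pathlib.Path(tempfile.mkdtemp())
    for rel, src in files.items():
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(src)
    sys.path.insert(0, str(root))
    importlib.invalidate_caches()
    return root


make_tree({
    "org/__init__.py": "",
    "org/lib/__init__.py": "",
    "org/lib/animal/monkey.py": "class Monkey:\n    pass\n",
    "org/lib/animal/__init__.py": (
        "from .monkey import Monkey\n"
        "Monkey.__module__ = __name__  # i.e. 'org.lib.animal'\n"
    ),
})

import org.lib.animal

Monkey = org.lib.animal.Monkey
print(Monkey.__module__)  # org.lib.animal

# pickle stores a class reference as (__module__, __qualname__) and re-imports
# it on load; this works because org.lib.animal really exposes Monkey.
clone = pickle.loads(pickle.dumps(Monkey()))
print(type(clone) is Monkey)  # True

# The defining file is still imported and cached under its file-based name.
print("org.lib.animal.monkey" in sys.modules)  # True
```

This only checks two of the side effects mentioned; it is not evidence that nothing else is affected.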


Re: Module/package hierarchy and its separation from file structure

2008-01-29 Thread Peter Schuller
> You can also put, in animal/__init__.py:
>  from monkey import Monkey
> and now you can refer to it as org.lib.animal.Monkey, but keep the  
> implementation of Monkey class and all related stuff into  
> .../animal/monkey.py

The problem is that we are now back to the identity problem. The class
won't actually *BE* org.lib.animal.Monkey. Perhaps manipulating
__module__ is enough; perhaps not (for example, what about
sys.modules?). Looks like I'll just live with putting more than I
would like in the same file.
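It may help to separate the object from its recorded name. With a plain re-export (written here as the explicit relative import `from .monkey import Monkey`), the sketch below shows that both dotted paths reach one and the same class object; only the `__module__` string still points at the defining file. The `make_tree` helper and the temporary tree are illustrative assumptions:

```python
import importlib
import pathlib
import sys
import tempfile


def make_tree(files):
    """Illustrative helper: write {relative_path: source} under a fresh
    temporary directory and put that directory first on sys.path."""
    root = pathlib.Path(tempfile.mkdtemp())
    for rel, src in files.items():
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(src)
    sys.path.insert(0, str(root))
    importlib.invalidate_caches()
    return root


make_tree({
    "org/__init__.py": "",
    "org/lib/__init__.py": "",
    "org/lib/animal/monkey.py": "class Monkey:\n    pass\n",
    # Plain re-export; __module__ is left alone.
    "org/lib/animal/__init__.py": "from .monkey import Monkey\n",
})

import org.lib.animal
import org.lib.animal.monkey

# Both dotted paths lead to a single class object...
print(org.lib.animal.Monkey is org.lib.animal.monkey.Monkey)  # True
# ...which still reports the module whose file defines it.
print(org.lib.animal.Monkey.__module__)  # org.lib.animal.monkey
```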


Re: Module/package hierarchy and its separation from file structure

2008-01-30 Thread Peter Schuller
>> The problem is that we are now back to the identity problem. The class
>> won't actually *BE* org.lib.animal.Monkey.
>
> The usage is the same; it works in all cases once you redefine
> __module__.  Who cares what it really is?

The cases I listed were just examples. My point was that I wanted it
to *be* the right class, to avoid unintended consequences. If I knew
what all those possible consequences were, there would not be a
problem to begin with.

The other follow-up to your E-Mail points out a possible problem for
example. I would not have come up with that, but that does not mean
the effect does not exist. And committing to a solution that "seems to
work", only to break massively for some particular use case in the
future, is exactly why I don't want a "hack" for a solution.

I don't know Python internals well enough to state or believe with any
authority whether, let's say, stomping __module__ and hacking
sys.modules would be enough to *truly* do it correctly, in a way that
is entirely transparent. This is why I care about whether it truly
changes the real identity of the class; it's not about satisfying my
particular list of examples (because they *were* just examples).

> Whatever.  ISTM you came here looking for a particular means and not a
> particular end.

My particular preferred end is to be able to separate file hierarchy
from module hierarchy without causing unforeseen consequences. This was
the stated goal all along.

> Python already has the power to meet your stated
> needs, but you won't use that solution because it's "hacky".
> Apparently all you really wanted was the loosened file structure in
> the first place.

Yes, or failing that an alternative that mitigates the problem. And it
*is* hacky, in my opinion, if things break as a result of it (such as
the other poster's inspect example).
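The inspect example itself is not quoted in this thread. One breakage of that kind, offered here as an illustration rather than a reconstruction of the other poster's case, is that `inspect.getfile` finds a class through `sys.modules[cls.__module__]`, so after the `__module__` rewrite it reports the package's `__init__.py` instead of `monkey.py`. The `make_tree` helper and the temporary tree are illustrative assumptions:

```python
import importlib
import inspect
import pathlib
import sys
import tempfile


def make_tree(files):
    """Illustrative helper: write {relative_path: source} under a fresh
    temporary directory and put that directory first on sys.path."""
    root = pathlib.Path(tempfile.mkdtemp())
    for rel, src in files.items():
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(src)
    sys.path.insert(0, str(root))
    importlib.invalidate_caches()
    return root


make_tree({
    "org/__init__.py": "",
    "org/lib/__init__.py": "",
    "org/lib/animal/monkey.py": "class Monkey:\n    pass\n",
    "org/lib/animal/__init__.py": (
        "from .monkey import Monkey\n"
        "Monkey.__module__ = __name__  # i.e. 'org.lib.animal'\n"
    ),
})

import org.lib.animal

Monkey = org.lib.animal.Monkey
# inspect.getfile() resolves a class via sys.modules[cls.__module__].__file__,
# so it now names the package's __init__.py, not the file that defines Monkey.
print(pathlib.Path(inspect.getfile(Monkey)).name)  # __init__.py
```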


Re: Module/package hierarchy and its separation from file structure

2008-01-30 Thread Peter Schuller
> Well, all I will say is that many people on this list, myself
> included, do know Python internals, and we use the method we've been
> suggesting here, without problems.

Ok. That is useful to know (that it is being done in practice without
problems).

Thanks!


Re: Module/package hierarchy and its separation from file structure

2008-01-30 Thread Peter Schuller
> In what sense will it not be? Why do you care so much about where the
> source code for Monkey is defined? If you actually want to read the 
> source, you might need to follow the chain from "animal", see that Monkey 
> is imported from "monkey", and go look at that. But the rest of the time, 
> why would you care?
>
> There is a very good reason to care *in practice*: if there is code out
> there that assumes that the source code for Monkey is in the file it was
> found in. In practice, you might be stuck needing to work around that.
> But that's not a good reason to care *in principle*. In principle, the 
> actual location of the source code should be an implementation detail of 
> which we care nothing. It's possible that the source for Monkey doesn't 

Exactly. I *DON'T* want anything to depend on the physical location on disk.
That was exactly what I was after from the beginning: a total separation of
location on disk from location in the module hierarchy. As you say, the
location of the source should be an implementation detail. That is exactly
what I am after.

I'll have a closer look at the suggested practice of modifying __module__.

For this particular use case we probably won't end up doing that, but it
may come to be useful in the future.
