[Frans Englich] ... > My problem, AFAICT, with using os.walk() the usual way, is that in order to > construct the /hierarchial/ XML document, I need to be aware of the directory > depth, and a recursive function handles that nicely; os.walk() simply > concentrates on figuring out paths to all files in a directory, AFAICT.
walk() does a pre- (default) or post-order traversal of a directory tree; that's it. > I guess I could solve it with using os.walk() in a traditional way, by somehow > pushing libxml2 nodes on a stack, after keeping track of the directory levels > etc(string parsing..). Or, one could write ones own recursive directory > parser.. > > My question is: what is the least ugly? What is the /proper/ solution for my > problem? How would you write it in the cleanest way? Well, it's enough just to keep a dict mapping directory path to directory object. That doesn't require parsing anything. For example, here are some dumb base classes (not XML -- the XML part distracts from the essence for me): class HasPath: def __init__(self, path): self.path = path def __lt__(self, other): return self.path < other.path class Directory(HasPath): def __init__(self, path): HasPath.__init__(self, path) self.files = [] # list of File objects self.subdirs = [] # list of sub-Directory objects class File(HasPath): pass Then a generic build_tree function. It accepts optional arguments to specify factory functions for creating directory and file objects. The guts require that a directory object have .path, .files and .subdirs attributes; you probably want to season to taste, to require XML-ish spellings. It returns a directory-ish object, representing the path passed to it: def build_tree(path, Directory=Directory, File=File): top = Directory(path) path2dir = {path: top} for root, dirs, files in os.walk(path): dirobj = path2dir[root] for name in dirs: subdirobj = Directory(os.path.join(root, name)) path2dir[subdirobj.path] = subdirobj dirobj.subdirs.append(subdirobj) for name in files: dirobj.files.append(File(os.path.join(root, name))) return top That looks short and sweet to me. It could be made shorter, but not without losing clarity to my eyes. As a concrete example, here's a subclass of Directory (to be passed as the Directory factory to build_tree) with a display() method that dumps a sensibly indented listing of the tree; note that sort() gets the intended order via the inherited HasPath.__lt__() implementation: class ListingDirectory(Directory): # Display directory tree as a tree, with 4-space indents. # Files listed before subdirectories, both in alphabetical order. # Full path displayed for topmost directory, base names for all # other entries. Directories listed with trailing os.sep. def display(self, level=0): self.subdirs.sort() self.files.sort() name = self.path if level: name = os.path.basename(name) print "%s%s%s" % (' ' * level, name, os.sep) for f in self.files: print "%s%s" % (' ' * (level + 4), os.path.basename(f.path)) for d in self.subdirs: d.display(level + 4) -- http://mail.python.org/mailman/listinfo/python-list