PEP 372 -- Adding an ordered directory to collections

2008-06-16 Thread Armin Ronacher
Abstract


This PEP proposes an ordered dictionary as a new data structure for
the ``collections`` module, called "odict" in this PEP for short.  The
proposed API incorporates the experiences gained from working with
similar implementations that exist in various real-world applications
and other programming languages.


Rationale
=

In current Python versions, the widely used built-in dict type does
not specify an order for the key/value pairs stored.  This makes it
hard to use dictionaries as data storage for some specific use cases.

Some dynamic programming languages like PHP and Ruby 1.9 guarantee a
certain order on iteration.  In those languages, and existing Python
ordered-dict implementations, the ordering of items is defined by the
time of insertion of the key.  New keys are appended at the end, keys
that are overwritten and not moved.

The following example shows the behavior for simple assignments:

>>> d = odict()
>>> d['parrot'] = 'dead'
>>> d['penguin'] = 'exploded'
>>> d.items()
[('parrot', 'dead'), ('penguin', 'exploded')]

That the ordering is preserved makes an odict useful for a couple of
situations:

- XML/HTML processing libraries currently drop the ordering of
  attributes, use a list instead of a dict which makes filtering
  cumbersome, or implement their own ordered dictionary.  This affects
  ElementTree, html5lib, Genshi and many more libraries.

- There are many ordererd dict implementations in various libraries
  and applications, most of them subtly incompatible with each other.
  Furthermore, subclassing dict is a non-trivial task and many
  implementations don't override all the methods properly which can
  lead to unexpected results.

  Additionally, many ordered dicts are implemented in an inefficient
  way, making many operations more complex then they have to be.

- PEP 3115 allows metaclasses to change the mapping object used for
  the class body.  An ordered dict could be used to create ordered
  member declarations similar to C structs.  This could be useful, for
  example, for future ``ctypes`` releases as well as ORMs that define
  database tables as classes, like the one the Django framework ships.
  Django currently uses an ugly hack to restore the ordering of
  members in database models.

- Code ported from other programming languages such as PHP often
  depends on a ordered dict.  Having an implementation of an
  ordering-preserving dictionary in the standard library could ease
  the transition and improve the compatibility of different libraries.


Ordered Dict API


The ordered dict API would be mostly compatible with dict and existing
ordered dicts.  (Note: this PEP refers to the Python 2.x dictionary
API; the transfer to the 3.x API is trivial.)

The constructor and ``update()`` both accept iterables of tuples as
well as mappings like a dict does.  The ordering however is preserved
for the first case:

>>> d = odict([('a', 'b'), ('c', 'd')])
>>> d.update({'foo': 'bar'})
>>> d
collections.odict([('a', 'b'), ('c', 'd'), ('foo', 'bar')])

If ordered dicts are updated from regular dicts, the ordering of new
keys is of course undefined again unless ``sort()`` is called.

All iteration methods as well as ``keys()``, ``values()`` and
``items()`` return the values ordered by the the time the key-value
pair was inserted:

>>> d['spam'] = 'eggs'
>>> d.keys()
['a', 'c', 'foo', 'spam']
>>> d.values()
['b', 'd', 'bar', 'eggs']
>>> d.items()
[('a', 'b'), ('c', 'd'), ('foo', 'bar'), ('spam', 'eggs')]

New methods not available on dict:

``odict.byindex(index)``

Index-based lookup is supported by ``byindex()`` which returns
the key/value pair for an index, that is, the "position" of a
key in the ordered dict.  0 is the first key/value pair, -1
the last.

>>> d.byindex(2)
('foo', 'bar')

``odict.sort(cmp=None, key=None, reverse=False)``

Sorts the odict in place by cmp or key.  This works exactly
like ``list.sort()``, but the comparison functions are passed
a key/value tuple, not only the value.

>>> d = odict([(42, 1), (1, 4), (23, 7)])
>>> d.sort()
>>> d
collections.odict([(1, 4), (23, 7), (42, 1)])

``odict.reverse()``

Reverses the odict in place.

``odict.__reverse__()``

Supports reverse iteration by key.


Questions and Answers
=

What happens if an existing key is reassigned?

The key is not moved but assigned a new value in place.  This is
consistent with existing implementations and allows subclasses to
change the behavior easily::

class movingcollections.odict):
def __setitem__(self, key, value):
self.pop(key, None)
odict.__setitem__(self, key, value)

What happens if keys appear multiple times in the list passed to the
constructor?

The same as for regular dicts: The latter item overrides the
former.  This 

Re: PEP 372 -- Adding an ordered directory to collections

2008-06-18 Thread Armin Ronacher
Martin v. Löwis  v.loewis.de> writes:

> 
> > I think I have lost the thread here, sorry. So I explain again what I
> > mean. I think for this data structure it's important to keep all the
> > normal dict operations at the same speed. If you use a C
> > implementation vaguely similar to my pure python recipe you can
> > perform the del in O(1) too, because pairs are joined in (double)
> > linked list. But such data structure is O(n) to find the n-th item
> > inserted into the sequence.
> 
> Right. So byindex(n) would be O(n) then, right? If so, what's the
> purpose of having that method in the first place?
What's the purpose of having list.insert?

> The PEP doesn't give a rationale, but just proposes that the method
> be there. My guess is that it includes it for performance reasons.
> However, I think the PEP (author) is misguided in assuming that
> making byindex() a method of odict, you get better performance than
> directly doing .items()[n] - which, as you say, you won't.
Without byindex the only way to cherry pick an item is either doing
something like

i = od.iteritems()
for idx in xrange(offset):
value = idx.next()
return value

Or

return od.items()[offset]

One creates tons of unnecessary method calls, the other creates a full
blown list object just to throw it away later.  Both less than optimal
solutions that can be implemented in a more efficient way on the C
layer where one only has to iterate over the linked list offset times
and return the item.  And iteration for that linked list is most likely
something like "for (n = 0; n != offset; ++n) iter = iter->next".

Regards,
Armin

--
http://mail.python.org/mailman/listinfo/python-list

Re: PEP 372 -- Adding an ordered directory to collections

2008-06-19 Thread Armin Ronacher
Small Status update of the changes incorporated in the PEP:

  - The PEP Title was fixed.  Of course it's a dictionary not a
directory :-)

  - A questions and answers section was added that covers some
of the questions raised here and on the original thread on
python-devel.

  - Comparison behavor was documented

  - a note on Python 3 support was added (about keys-views)


Regards,
Armin

--
http://mail.python.org/mailman/listinfo/python-list


Re: New python.org site

2006-03-12 Thread Armin Ronacher
I don't like the new python.org

But i have I (in my mind) nice idea.

Dedicate python.org to the language developers and python interna. And
create a nice small page on go-python.org dedicated to the users. It
should *only* feature a documentation with a comment box on the bottom
of each side, a download section and links to important python pages as
well as a nice designed moinmoin wiki.

Regards,
Armin

-- 
http://mail.python.org/mailman/listinfo/python-list


Try Python!

2006-03-29 Thread Armin Ronacher
Hiho,

One week ago I came across the nice `Try Ruby!`_ demonstration which
features an ajax based ruby console and a 20 minutes ruby tutorial.
I really liked that application and so I started to port that to
python.
Since I got a bit confused by the very complex javascript code I wrote
a
webconsole from scratch.

The result is a very basic python console which behaves like the CLI
one, except that it can't handle `raw_input` or any other method call
trying to access `sys.stdin`.

At the moment the application is multithreaded and evaluated
expressions
in a dict holding the sessions variables of the client connections.

Because of the behaviour the application breaks down easily and isn't
secure. This happens because I haven't finished it yet. Additionally
sessions don't have a timeout so you have to restart the server if it's
eating to much RAM.

If someone is interested in putting up that application on a public
server I can tell the application to spawn from inside XEN hosts and to
use forking instead of the multithreaded approach currently used.

The application is licensed under the GNU GPL, the sourcecode is
available via svn from::

http://trac.pocoo.org/repos/trypy

Since it requires Paste, PasteDeploy and the current colubrid checkout,
here the installation for copy/pasteing:

- easy_install Paste
- easy_install PasteDeploy
- svn co http://trac.pocoo.org/repos/trypy
- cd trypy
- svn co http://trac.pocoo.org/repos/colubrid/trunk/colubrid
- python trypy.py

The last command starts the application.

And here a screenshot of a running session:
http://trac.pocoo.org/wiki/TryPy

Regards,
Armin

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Try Python!

2006-03-29 Thread Armin Ronacher
BartlebyScrivener wrote:
> Armin,
>
> Mike Meyer already took a crack at this, and his starts right up just
> by clicking on the link.
>
> http://www.mired.org/home/mwm/try_python/
Hm. Looks not that useful since you can't create any functions and you
can remove the prompt :-)

> Yours looks prettier, but I don't think novices are going to be able to
> figure out how to start it.
They don't have to figure out if someone would install that on a public
host. But therefore the application has to run inside of a jail or a
XEN since python doesn't have a secure sandbox.

Regards,
Armin

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: New Python website, new documentation ?

2006-03-30 Thread Armin Ronacher
I paste my idea (original posted in #pythonpaste a few days ago):

 Chairos: what I really miss is a "go-python.org" page :)
 which would be what?
 showing up python gui application, python wsgi
applications, a python documentation with a comment section, tutorials
and a wiki
 and everything compact
 and python.org for the companies and core developers

Something really basic. Only small tutorials, a documentation with
comments and a list of python applications. With a clean design and
only few links per page. Something like the rubyonrails page for
example.

Regards,
Armin

-- 
http://mail.python.org/mailman/listinfo/python-list