* Steve Holden:

While I am fully aware of the warnings about premature optimization, etc.,
I cannot resist an appeal to efficiency if it finally kills off this idea
that "they took 'cmp()' away" is a bad thing.

Passing a cmp= argument to sort provides the interpreter with a function
that will be called each time any pair of items has to be compared. The
key= argument, however, specifies a transformation from [x0, x1, ...,
xN] to [(key(x0), x0), (key(x1), x1), ..., (key(xN), xN)] (which calls
the key function precisely once per sortable item).
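
The difference in call counts can be made visible with a small sketch. It assumes Python 3.2 or later (or 2.7), where functools.cmp_to_key is available to adapt a comparison function; the counters and the names counting_key and counting_cmp are made up for this illustration, and the exact number of comparisons depends on the data and the sort implementation.

```python
import functools

key_calls = 0
cmp_calls = 0

def counting_key(x):
    """Hypothetical key function that counts how often it is called."""
    global key_calls
    key_calls += 1
    return x

def counting_cmp(a, b):
    """Hypothetical cmp-style function that counts how often it is called."""
    global cmp_calls
    cmp_calls += 1
    return (a > b) - (a < b)  # negative, zero or positive, like the old cmp()

data = [5, 3, 8, 1, 9, 2, 7, 4, 6, 0] * 10  # 100 items

by_key = sorted(data, key=counting_key)
by_cmp = sorted(data, key=functools.cmp_to_key(counting_cmp))

print("key calls:", key_calls)  # one call per item
print("cmp calls:", cmp_calls)  # one call per comparison the sort performs
```

Both calls produce the same ordering; only the number of trips from the C sort routine into Python code differs.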

From a C routine like sort() [in CPython, anyway] calling out from C to
a Python function to make a low-level decision like "is A less than B?"
turns out to be disastrous for execution efficiency (unlike the built-in
default comparison, which can be called directly from C in CPython).

If your data structures have a few hundred items in them it isn't going
to make a huge difference. If they have a few million then it is already
starting to affect performance ;-)

It's not either/or; the question is whether programmers still need the cmp functionality.

Consider that *correctness* is a bit more important than efficiency, and that sorting strings is quite common...

Possibly you can show me the way forward towards sorting these strings (shown below) correctly for a Norwegian locale. Possibly you can't. But one thing is for sure: if there were a cmp functionality, it would not be a problem.


<example>
Python 3.1.1 (r311:74483, Aug 17 2009, 17:02:12) [MSC v.1500 32 bit (Intel)] on win32
  Type "help", "copyright", "credits" or "license" for more information.
  >>> L = ["æ", "ø", "å"]   # This is in SORTED ORDER in Norwegian
  >>> L
  ['æ', 'ø', 'å']
  >>> L.sort()
  >>> L
  ['å', 'æ', 'ø']
  >>>
  >>> import locale
  >>> locale.getdefaultlocale()
  ('nb_NO', 'cp1252')
  >>> locale.setlocale( locale.LC_ALL )  # Just checking...
  'C'
  >>> locale.setlocale( locale.LC_ALL, "" )  # Setting default locale, Norwegian.
  'Norwegian (Bokmål)_Norway.1252'
  >>> locale.strxfrm( "æøå" )
  'æøå'
  >>> L.sort( key = locale.strxfrm )
  >>> L
  ['å', 'æ', 'ø']
  >>> locale.strcoll( "å", "æ" )
  1
  >>> locale.strcoll( "æ", "ø" )
  -1
  >>>
</example>


Note that strcoll orders the strings correctly as ["æ", "ø", "å"]; that is, it would have produced that order if it could have been passed as a cmp function to sort (or, better, to a separate routine named e.g. custom_sort).

And strcoll can be used that way in 2.x:


<example>
C:\Documents and Settings\Alf\test> py2
Python 2.6.4 (r264:75708, Oct 26 2009, 08:23:19) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> def show( list ):
...     print "[" + ", ".join( list ) + "]"
...
>>> L = [u"æ", u"ø", u"å"]
>>> show( L )
[æ, ø, å]
>>> L.sort()
>>> show( L )
[å, æ, ø]
>>> import locale
>>> locale.setlocale( locale.LC_ALL, "" )
'Norwegian (Bokm\xe5l)_Norway.1252'
>>> L.sort( cmp = locale.strcoll )
>>> show( L )
[æ, ø, å]
>>> L
[u'\xe6', u'\xf8', u'\xe5']
>>> _
</example>


The above may just be a bug in the 3.x strxfrm. But it illustrates that sometimes you have your sort order defined by a comparison function. Transforming that into a key can be practically impossible (it can also be quite inefficient).
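
For what it's worth, later versions (2.7, and 3.2 onwards; not the 3.1.1 shown above) provide functools.cmp_to_key, which wraps a comparison function in a key object so it can still be handed to sort. The sketch below uses a made-up norwegian_cmp as a stand-in for a real comparison function such as locale.strcoll, so that it does not depend on which locales are installed; the helper rank and the simplified alphabet rule (æ, ø, å after everything else) are assumptions for illustration only, not real Norwegian collation.

```python
import functools

NORWEGIAN_TAIL = "æøå"  # assumption: these three sort last, in this order

def rank(ch):
    """Hypothetical helper: position of one character in the toy ordering."""
    ch = ch.lower()
    if ch in NORWEGIAN_TAIL:
        return (1, NORWEGIAN_TAIL.index(ch))
    return (0, ord(ch))

def norwegian_cmp(a, b):
    """Toy cmp-style function: negative, zero or positive, like strcoll."""
    ka = [rank(c) for c in a]
    kb = [rank(c) for c in b]
    return (ka > kb) - (ka < kb)

L = ["å", "ø", "æ", "z", "a"]
L.sort(key=functools.cmp_to_key(norwegian_cmp))
print(L)  # ['a', 'z', 'æ', 'ø', 'å']
```

With a locale whose strcoll behaves correctly, the same pattern would be L.sort(key=functools.cmp_to_key(locale.strcoll)). The wrapper still calls back into Python for every comparison, so it keeps the cmp-style cost.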


Cheers & hth.,

- Alf
--
http://mail.python.org/mailman/listinfo/python-list
