On 9/1/2010 5:40 PM, John Bokma wrote:
Arnaud Delobelle <arno...@googlemail.com> writes:

Terry Reedy <tjre...@udel.edu> writes:

On 9/1/2010 11:40 AM, Aahz wrote:
I think that any implementation
that doesn't have O(1) for list element access is fundamentally broken,

Whereas I think that that claim is fundamentally broken in multiple ways.

and we should probably document that somewhere.

I agree that *current* algorithmic behavior of parts of CPython on
typical *current* hardware should be documented not just 'somewhere'
(which I understand it is, in the Wiki) but in a CPython doc included
in the doc set distributed with each release.

Perhaps someone or some group could write a HowTo on Programming with
CPython's Builtin Classes that would describe both the implementation
and performance and also the implications for coding style. In
particular, it could compare CPython's array lists and tuples to
singly linked lists (which are easily created in Python also).
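Such a HowTo might open with a minimal sketch like the following (the class and function names here are illustrative, not from any existing document): a singly linked list built from plain Python objects, where prepending is O(1) but indexing means walking the chain, in contrast to CPython's array-backed list.

```python
# A minimal singly linked list in pure Python.
# Prepending is constant time; indexing walks the chain, so it is O(n).

class Node:
    __slots__ = ("value", "next")

    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def prepend(head, value):
    """Return a new head node with `value` in front -- O(1)."""
    return Node(value, head)


def nth(head, index):
    """Walk `index` links from the head -- O(index)."""
    node = head
    for _ in range(index):
        node = node.next
    return node.value


# Build the chain 2 -> 1 -> 0 by repeated prepending, then read it back.
head = None
for i in range(3):
    head = prepend(head, i)

print([nth(head, i) for i in range(3)])  # -> [2, 1, 0]
```

Note the trade-off the HowTo would presumably spell out: reading all n elements via nth() costs O(n^2) on a linked list, while iterating an array list is O(n).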

But such a document, after stating that array access may be thought of
as constant time on current hardware to a useful first approximation,
should also state that repeated sequential accesses may be *much*
faster than repeated random accesses. People in the high-performance
computing community are quite aware of this difference between
simplified lies and messy truth. Because of this, array algorithms are
(should be) written differently in Fortran and C because Fortran
stores arrays by columns and C by rows and because it is usually much
faster to access the next item than one far away.
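The sequential-versus-random gap can be demonstrated with a rough timeit experiment like the one below. This is only a sketch: the measured ratio depends on machine, cache sizes, and interpreter, and in CPython the effect is muted compared to C or Fortran because a list holds pointers to objects rather than the values themselves.

```python
# Compare walking a large list in index order vs. shuffled order.
# The work done is identical; only the access pattern differs.

import random
import timeit

N = 1_000_000
data = list(range(N))

seq_order = list(range(N))
rand_order = seq_order[:]
random.shuffle(rand_order)

def walk(order):
    """Sum data[] in the given index order."""
    total = 0
    for i in order:
        total += data[i]
    return total

t_seq = timeit.timeit(lambda: walk(seq_order), number=3)
t_rand = timeit.timeit(lambda: walk(rand_order), number=3)
print(f"sequential: {t_seq:.3f}s  random: {t_rand:.3f}s")
```

Both traversals compute the same sum, so any time difference is purely the cost of the access pattern.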

I don't understand what you're trying to say.

Most generally, that I view Python as a general algorithm language and not just as a von Neumann machine programming language.

More specifically, that O() claims can be inapplicable, confusing, misleading, incomplete, or false, especially when applied to real time and to real systems with finite limits.

Aahz didn't claim that random list element access was constant time, he said it was O(1) (and that it should be part of the Python spec that it is).

Yes, I switched, because 'constant time' is a comprehensible claim that can be refuted and because that is how some will interpret O(1) (see below for proof;-).

If one takes O(1) to mean bounded, which I believe is the usual technical meaning, then all Python built-in sequence operations take bounded time because of the hard size limit. If sequences were not bounded in length, then access time would not be bounded either.
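The "hard size limit" referred to is concrete in CPython: container lengths are capped by sys.maxsize (the C-level PY_SSIZE_T_MAX), which is why, under the bounded reading, every sequence operation trivially takes bounded time.

```python
# CPython caps sequence lengths at sys.maxsize, the largest value a
# Py_ssize_t can hold (2**63 - 1 on typical 64-bit builds).
import sys

print(sys.maxsize)
# No list can ever be longer than this, so index arithmetic and the
# time spent on any single access are bounded by a fixed constant.
assert len(range(10)) <= sys.maxsize
```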

My most specific point is that O(1), interpreted as more-or-less constant time across a range of problem sizes, can be either a virtue or a vice depending on whether the constancy is a result of speeding up large problems or slowing down small problems. I furthermore contend that Python sequences on current hardware exhibit both virtue and vice, that it would be absurd to reject a system that kept the virtue without the vice, and that such absurdity should not be built into the language definition.

My fourth point is that we can meet the reasonable goal of helping some people make better use of current Python/CPython on current hardware without big-O controversy and without screwing around with the language definition and locking out the future.

Uhm, O(1) /is/ constant time, see page 45 of Introduction to Algorithms,
2nd edition.

--
Terry Jan Reedy

--
http://mail.python.org/mailman/listinfo/python-list
