from:"Harald Luessen"

Internals and complexity of types, containers and algorithms

2007-06-25 Thread Harald Luessen

Hi, I am new to python and I miss some understanding of the internals 
of some types and containers. With my C/C++ background I hope to get 
some hints to chose the best data structure for my programs. Here are 
some questions:

- Is there a brief description (not source) how the types tuple, 
string, list and dict are represented internally. Is a list behind 
the scenes just a double linked list or an array or a mixture of 
these things? Is a dict a tree or a hash array and what is the 
collision mechanism? Is the string an array with some header info?

- What is the big-O complexity of the most common algorithms for 
these types and containers? Is it O(n), O(n*log(n)) or O(n**2)? 
I mean inserting, appending (front or back), finding something 
and so on.

- The same questions for important and common library containers 
if you can name some.

- Is this information somewhere in the web? Why is it not written 
in the documentation?

- When I want to use a representation of a game board like chess 
in C/C++ I use an array[64] or bigger of int or char for the pieces. 
What data structure or type would be useful in Python when the 
performance ist most important? Is it list or string or an array 
from a library or what else?

Harald

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Internals and complexity of types, containers and algorithms

2007-06-26 Thread Harald Luessen

On Mon, 25 Jun 2007 James Stroud wrote:

>Harald Luessen wrote:
>> Hi, I am new to python and I miss some understanding of the internals 
>> of some types and containers. With my C/C++ background I hope to get 
>> some hints to chose the best data structure for my programs. Here are 
>> some questions:
>
>This depends on how you define "best". If you want speed and 
>optimization, you can use the numpy package built with ATLAS tuned for a 
>specific machine.

Are there arrays in numpy that can efficiently be used for 
other things than matrix arithmetic? Are they faster than lists 
but still easy to use?

no_piece = 0
wpawn = 1
...
board[square] = no_piece
board[square+8] = pawn
...

could be a typical pawn move.

>Beyond speed, "best" in the python community usually means "most suited" 
>from an idiomatic perspective and from the perspective of structure that 
>lends itself to long term maintainability because [C]python data 
>structures seem to undergo optimizations in their implementation at each 
>revision.

I like the python way of "best" code. But in this particular 
question the emphasis was on performance and speed.

>Your best bet is probably to forget about implementation and write code 
>that makes sense. For example, some have suggested a tuple-keyed 
>dictionary to represent a chess board:
>
>board = ((c,r) for r in xrange(1, 9) for c in 'abcdefgh')
>starting = 'RNBQKBNR' + 'P' * 8 + ' ' * 32 + 'p' * 8 + 'rnbqkbnr'
>position = dict(zip(board, starting))

I am not new to board game programming and I am used to 
an array approach or even bitboards. Therefore the dictionary 
looks a little bit strange to me.

Harald

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Internals and complexity of types, containers and algorithms

2007-06-26 Thread Harald Luessen

On Mon, 25 Jun 2007 Martin v. Löwis wrote:

>Sure, see below:
>
>- tuples are represented as arrays, with a single block for the
>  entire objects (object header, tuple size, and data)
>- list are represented as arrays, with two memory blocks:
>  one for object header and sizes, and the other one for the
>  "guts", i.e. the actual data. The list uses over-allocation,
>  to avoid resizing on each addition.
>- strings are implemented as arrays, with a single block for
>  the entire string. In addition to header, size, and data,
>  it also contains a cached hash and a pointer to the interned
>  version of the string (if any).
>- dicts are implemented as hash tables, with open addressing.
... and more interesting things ...

Thank you. That was the information I was looking for. 
I just forgot to ask for sets.

Harald

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: 2to3 refactoring [was Re: Tuple parameter unpacking in 3.x]

2008-10-06 Thread Harald Luessen

On Sun, 05 Oct 2008 "Aaron \"Castironpi\" Brady" wrote:

>There's the possibility that the most important words should go first in 
>this case:
>
>result_flag__t
>
>But, I'll admit that other people could have learned different orders of 
>scanning words than I, especially depending on their spoken language 
>backgrounds.  A poll of the newsgroup isn't exactly academically 
>impartial sampling, but there aren't any better ways to make decisions, 
>are there?  (I think it would be easy to make either one a habit.)
>
>Here's the other option in the same context:
>
>def function(vocab_list, result_flag__t, max_value):
> result, flag = result_flag__t
> pass
>
>To be thorough, there's also a trailing double underscore option.
>
>def function(vocab_list, result_flag__, max_value):
> result, flag = result_flag__
> pass
>
>Which I don't recognize from any other usages, but I defer.  If there 
>aren't any, conditionally, I think this is my favorite.

t__result_flag and result_flag__t have the advantage that you can 
search for t__ or __t as start or end of a name if you want to 
find and change all these places in the source. You can compare 
it with the decision to use reinterpret_cast(...) as a cast 
operator in C++. It is ugly but much easier to find than (long)...
A search for __ alone would have too many hits in Python.

Harald

--
http://mail.python.org/mailman/listinfo/python-list

Re: error when porting C code to Python (bitwise manipulation)

2008-07-10 Thread Harald Luessen

On Thu, 10 Jul 2008 Jordan wrote:

>On Jul 10, 1:35 pm, MRAB <[EMAIL PROTECTED]> wrote:
>> On Jul 10, 4:56 am, Jordan <[EMAIL PROTECTED]> wrote:
>>
>>
>>
>> > I am trying to rewrite some C source code for a poker hand evaluator
>> > in Python.  Putting aside all of the comments such as just using the C
>> > code, or using SWIG, etc.  I have been having problems with my Python
>> > code not responding the same way as the C version.
>>
>> > C verison:
>>
>> > unsigned find_fast(unsigned u)
>> > {
>> >     unsigned a, b, r;
>> >     u += 0xe91aaa35;
>> >     u ^= u >> 16;
>> >     u += u << 8;
>> >     u ^= u >> 4;
>> >     b  = (u >> 8) & 0x1ff;
>> >     a  = (u + (u << 2)) >> 19;
>> >     r  = a ^ hash_adjust[b];
>> >     return r;
>>
>> > }
>>
>> > my version (Python, hopefully ;)):
>>
>> > def find_fast(u):
>> >     u += 0xe91aaa35
>> >     u ^= u >> 16
>> >     u += u << 8
>> >     u ^= u >> 4
>> >     b  = (u >> 8) & 0x1ff
>> >     a  = (u + (u << 2)) >> 19
>> >     r  = a ^ hash_adjust[b]
>> >     return r
>>
>> > As far as I understand the unsigned instructions in C just increase
>> > amount of bytes the int can hold, and Python automatically converts to
>> > longs which have infinite size when necessary, so I am not sure why I
>> > am getting different results.
>>
>> > I assume that I am missing something fairly simple here, so help a
>> > n00b out if you can :)
>>
>> > Thanks in advance,
>>
>> > jnb
>>
>> You want to restrict the values to 32 bits. The result of + or << may
>> exceed 32 bits, so you need to mask off the excess bits afterwards.
>>
>> def find_fast(u):
>>     mask = 0x
>>     u  = (u + 0xe91aaa35) & mask
>>     u ^= u >> 16
>>     u  = (u + (u << 8)) & mask # can get away with only 1 mask here
>>     u ^= u >> 4
>>     b  = (u >> 8) & 0x1ff
>>     a  = ((u + (u << 2)) & mask) >> 19 # can get away with only 1 mask
>> here
>>     r  = a ^ hash_adjust[b]
>>     return r
>>
>> HTH
>
>Well, I guess there are two problemsthe masking and the fact the
>in C it seems to for some reason overflow and become a negative
>valuestill not sure why it does itSo the code with just
>masking doesn't work, you still need some sort of weird inversion like
>the ~(0x - u).weird
>
>anyone?

In C unsigned can not be negative. Why do you believe 
the numbers are negative? If your debugger is telling 
you this thow away the debugger and use printf. 
If printf is telling you this then use the right format.
printf("%u", u); // for unsigned int u
printf("%lu", u); // for unsigned long u
printf("%x", u); 
or
printf("0x%08x", u); // to see u in hex

Harald

--
http://mail.python.org/mailman/listinfo/python-list

Re: improving a huge double-for cycle

2008-09-18 Thread Harald Luessen

On Thu, 18 Sep 2008 Alexzive wrote:

>I am a python newbie and need to run following code for a task in an
>external simulation programm called "Abaqus" which makes use of python
>to access the mesh (ensamble of nodes with xy coordinates) of a
>certain geometrical model.
>
>[IN is the starting input containing the nodes to be check, there are
>some double nodes with the same x and y coordinates which need to be
>removed. SN is the output containing such double nodes]
>
>Code: Select all
>for i in range(len(IN)): #scan all elements of the list IN
>  for j in range(len(IN)):
>if i <> j:
> if IN[i].coordinates[0] == IN[j].coordinates[0]:
>   if IN[i].coordinates[1] == IN[j].coordinates[1]:
>  SN.append(IN[i].label)

I did not test the syntax, but here is an idea with sorted lists.
It should be O(NlogN).

def sk(x):
return x.coordinates[0]

IN.sort(key=sk)
for i in xrange(len(IN)):
for j in xrange(i+1, len(IN)):
if IN[i].coordinates[0] == IN[j].coordinates[0]:
if IN[i].coordinates[1] == IN[j].coordinates[1]:
SN.append(IN[i].label)
else:
break

Harald

--
http://mail.python.org/mailman/listinfo/python-list

Re: improving a huge double-for cycle

2008-09-19 Thread Harald Luessen

On Thu, 18 Sep 2008 Bruno Desthuilliers wrote:

># Harald : uncomment this and run test_results. As far as I can tell, it
># doesn't yields the expected results
>
>## IN7 = IN[:]
>## def sortk7(n):
>## return n.coordinates[0]
>
>## def doubles7():
>## "is ordering better ? - Nope Sir, it's broken..."
>## IN7.sort(key=sortk)
>## SN = []
>## sn_append = SN.append
>## in_len = len(IN)
>## for i in xrange(in_len):
>## node_i = IN[i]
>## coords_i = node_i.coordinates
>## for j in xrange(i+1, in_len):
>## if coords_i[0] == IN[j].coordinates[0]:
>## if coords_i[1] == IN[j].coordinates[1]:
>## sn_append(node_i)
>## else:
>## break
>## return SN
...
>Results here (py2.5, gentoo linux, athlonxp1800, 512 ram):
>
> >>> test_results()
>True
> >>> test_times()
>doubles0 : 1.55667901039
>doubles1 : 0.719144105911
>doubles2 : 0.703393936157
>doubles3 : 0.700654983521
>doubles4 : 0.706257104874
>doubles5 : 0.528184890747
>doubles6 : 0.461633205414
>doubles8 : 0.0134379863739
>doubles9 : 0.0108540058136

When you change my code then do it right. :-)
You forgot to change the IN to IN7 at _every_ place.
And the sortk should be sortk7 in _both_ places.

I never let the code run before myself. I just wrote it 
in the newsreader. But now i did and I have a second 
version as bonus.


IN7 = IN[:]

def sortk7(n):
return n.coordinates[0], n.coordinates[1]

def doubles7():
IN7.sort(key=sortk7)
SN = []
sn_append = SN.append
in_len = len(IN7)
for i in xrange(in_len):
node_i = IN7[i]
coords_i = node_i.coordinates
for j in xrange(i+1, in_len):
if coords_i[0] == IN7[j].coordinates[0]:
if coords_i[1] == IN7[j].coordinates[1]:
sn_append(node_i)
else:
break
return SN


def comp7( x, y ):
 return cmp( x.coordinates, y.coordinates )

def doubles7a():
IN7.sort( comp7 )
SN = []
sn_append = SN.append
in_len = len(IN7)
for i in xrange(in_len):
node_i = IN7[i]
for j in xrange(i+1, in_len):
node_j = IN7[j]
if comp7( node_i, node_j ) == 0:
sn_append(node_i)
else:
break
return SN


Here are the results. (py2.5, WindowsXP, Pentium4, 2.6GHz, 1.5GB):
My version is not so bad.

doubles0 : 1.03830598582
doubles1 : 0.47943719104
doubles2 : 0.487412506338
doubles3 : 0.475924733451
doubles4 : 0.466548681466
doubles5 : 0.340487967046
doubles6 : 0.278480365521
doubles7 : 0.0953190978183
doubles7a : 0.0784233750379
doubles8 : 0.010236496538
doubles9 : 0.00742803903848


Harald

--
http://mail.python.org/mailman/listinfo/python-list

Re: if, continuation and indentation

2010-05-27 Thread Harald Luessen

On Thu, 27 May 2010 HH wrote:

>I have a question about best practices when it comes to line wrapping/
>continuation and indentation, specifically in the case of an if
>statement.
>
>if (width == 0 and
>height == 0 and
>color == 'red' and
>emphasis == 'strong' or
>highlight > 100):
>raise ValueError("sorry, you lose")

My solution would probably look like this:

if (   width == 0
   and height == 0
   and color == 'red'
   and emphasis == 'strong'
or highlight > 100
   ):
raise ValueError("sorry, you lose")

Harald

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: IDLE file saving problem

2009-08-23 Thread Harald Luessen

On Fri, 21 Aug 2009 14:48:56 -0700 (PDT), r  wrote:

>On Aug 21, 4:10 pm, MRAB  wrote:
>
>Yes, and much more needs improvement! I have made many changes already
>and i am polishing an "IDLE 3000" for the good people of Pythonia,
>stay tuned more to come

There is a small little detail that nags me when I use IDLE:

When the cursor is somewhere in the white space at the
beginning of the lines and I use Ctrl-rightArrow  [ctrl]-[->]
to go to the first word in the line then IDLE skips
the position with the first non-whitespace letter and places
the cursor after the first word in the line.
This is not consistent because moving backwards works. [ctrl]-[<-]
Similar things happen at the end of a line.

I suggest that these movements should consider 'before the first' 
and 'after the last' text in the line as stop positions.

I think other editors do that, at least MS Visual Studio.

(The [pos1] alternating between first column and first text in line
does not help much.)

Harald

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Annoying octal notation

2009-08-24 Thread Harald Luessen

On Mon, 24 Aug 2009 Derek Martin wrote:
>Those participating in this thread have pretty much all seem to agree
>that the only places where decimal numbers with leading zeros really
>are common are either in rather specialized applications, such as
>computer-oriented data or serial numbers (which typically behave more
>like strings, from a computer science perspective), or the rather
>common one of dates.  The latter case is perhaps what's significant,
>if any of those cases are.

I don't like the 'leading 0 is octal'-syntax. I typically think of
numbers as decimal and bytes as hexadecimal.
I would even write something like this:

# iterate over bits 3 to 5
for i in range(0x00, 0x40, 0x08):
...
print "0x%02x\n" % i

0x00
0x08
0x10
0x18
0x20
0x28
0x30
0x38

For me it is easier to see where the bits are in the hex notation.

And it is very common to use numbers with leading zeroes
that are hexadecimal. Like this:

# print address and data
for i in range(0x1):
print "%04x: %d\n" % i, data[i]

: ...
0001: ...
...
000f: ...
0010: ...
...

When you are looking for examples of numbers where leading zeroes
do not mean octal then consider decimal AND hexadecimal.

>I tend to think that within the computer
>science arena, the history and prevalence of the leading 0 indicating
>octal far outweighs all of those cases combined.

I disagree.

Harald

-- 
http://mail.python.org/mailman/listinfo/python-list

Internals and complexity of types, containers and algorithms

Re: Internals and complexity of types, containers and algorithms

Re: Internals and complexity of types, containers and algorithms

Re: 2to3 refactoring [was Re: Tuple parameter unpacking in 3.x]

Re: error when porting C code to Python (bitwise manipulation)

Re: improving a huge double-for cycle

Re: improving a huge double-for cycle

Re: if, continuation and indentation

Re: IDLE file saving problem

Re: Annoying octal notation

10 matches

Site Navigation

Mail list logo

Footer information