Internals and complexity of types, containers and algorithms
Hi, I am new to python and I miss some understanding of the internals of some types and containers. With my C/C++ background I hope to get some hints to chose the best data structure for my programs. Here are some questions: - Is there a brief description (not source) how the types tuple, string, list and dict are represented internally. Is a list behind the scenes just a double linked list or an array or a mixture of these things? Is a dict a tree or a hash array and what is the collision mechanism? Is the string an array with some header info? - What is the big-O complexity of the most common algorithms for these types and containers? Is it O(n), O(n*log(n)) or O(n**2)? I mean inserting, appending (front or back), finding something and so on. - The same questions for important and common library containers if you can name some. - Is this information somewhere in the web? Why is it not written in the documentation? - When I want to use a representation of a game board like chess in C/C++ I use an array[64] or bigger of int or char for the pieces. What data structure or type would be useful in Python when the performance ist most important? Is it list or string or an array from a library or what else? Harald -- http://mail.python.org/mailman/listinfo/python-list
Re: Internals and complexity of types, containers and algorithms
On Mon, 25 Jun 2007 James Stroud wrote: >Harald Luessen wrote: >> Hi, I am new to python and I miss some understanding of the internals >> of some types and containers. With my C/C++ background I hope to get >> some hints to chose the best data structure for my programs. Here are >> some questions: > >This depends on how you define "best". If you want speed and >optimization, you can use the numpy package built with ATLAS tuned for a >specific machine. Are there arrays in numpy that can efficiently be used for other things than matrix arithmetic? Are they faster than lists but still easy to use? no_piece = 0 wpawn = 1 ... board[square] = no_piece board[square+8] = pawn ... could be a typical pawn move. >Beyond speed, "best" in the python community usually means "most suited" >from an idiomatic perspective and from the perspective of structure that >lends itself to long term maintainability because [C]python data >structures seem to undergo optimizations in their implementation at each >revision. I like the python way of "best" code. But in this particular question the emphasis was on performance and speed. >Your best bet is probably to forget about implementation and write code >that makes sense. For example, some have suggested a tuple-keyed >dictionary to represent a chess board: > >board = ((c,r) for r in xrange(1, 9) for c in 'abcdefgh') >starting = 'RNBQKBNR' + 'P' * 8 + ' ' * 32 + 'p' * 8 + 'rnbqkbnr' >position = dict(zip(board, starting)) I am not new to board game programming and I am used to an array approach or even bitboards. Therefore the dictionary looks a little bit strange to me. Harald -- http://mail.python.org/mailman/listinfo/python-list
Re: Internals and complexity of types, containers and algorithms
On Mon, 25 Jun 2007 Martin v. Löwis wrote: >Sure, see below: > >- tuples are represented as arrays, with a single block for the > entire objects (object header, tuple size, and data) >- list are represented as arrays, with two memory blocks: > one for object header and sizes, and the other one for the > "guts", i.e. the actual data. The list uses over-allocation, > to avoid resizing on each addition. >- strings are implemented as arrays, with a single block for > the entire string. In addition to header, size, and data, > it also contains a cached hash and a pointer to the interned > version of the string (if any). >- dicts are implemented as hash tables, with open addressing. ... and more interesting things ... Thank you. That was the information I was looking for. I just forgot to ask for sets. Harald -- http://mail.python.org/mailman/listinfo/python-list
Re: 2to3 refactoring [was Re: Tuple parameter unpacking in 3.x]
On Sun, 05 Oct 2008 "Aaron \"Castironpi\" Brady" wrote: >There's the possibility that the most important words should go first in >this case: > >result_flag__t > >But, I'll admit that other people could have learned different orders of >scanning words than I, especially depending on their spoken language >backgrounds. A poll of the newsgroup isn't exactly academically >impartial sampling, but there aren't any better ways to make decisions, >are there? (I think it would be easy to make either one a habit.) > >Here's the other option in the same context: > >def function(vocab_list, result_flag__t, max_value): > result, flag = result_flag__t > pass > >To be thorough, there's also a trailing double underscore option. > >def function(vocab_list, result_flag__, max_value): > result, flag = result_flag__ > pass > >Which I don't recognize from any other usages, but I defer. If there >aren't any, conditionally, I think this is my favorite. t__result_flag and result_flag__t have the advantage that you can search for t__ or __t as start or end of a name if you want to find and change all these places in the source. You can compare it with the decision to use reinterpret_cast(...) as a cast operator in C++. It is ugly but much easier to find than (long)... A search for __ alone would have too many hits in Python. Harald -- http://mail.python.org/mailman/listinfo/python-list
Re: error when porting C code to Python (bitwise manipulation)
On Thu, 10 Jul 2008 Jordan wrote: >On Jul 10, 1:35 pm, MRAB <[EMAIL PROTECTED]> wrote: >> On Jul 10, 4:56 am, Jordan <[EMAIL PROTECTED]> wrote: >> >> >> >> > I am trying to rewrite some C source code for a poker hand evaluator >> > in Python. Putting aside all of the comments such as just using the C >> > code, or using SWIG, etc. I have been having problems with my Python >> > code not responding the same way as the C version. >> >> > C verison: >> >> > unsigned find_fast(unsigned u) >> > { >> > unsigned a, b, r; >> > u += 0xe91aaa35; >> > u ^= u >> 16; >> > u += u << 8; >> > u ^= u >> 4; >> > b = (u >> 8) & 0x1ff; >> > a = (u + (u << 2)) >> 19; >> > r = a ^ hash_adjust[b]; >> > return r; >> >> > } >> >> > my version (Python, hopefully ;)): >> >> > def find_fast(u): >> > u += 0xe91aaa35 >> > u ^= u >> 16 >> > u += u << 8 >> > u ^= u >> 4 >> > b = (u >> 8) & 0x1ff >> > a = (u + (u << 2)) >> 19 >> > r = a ^ hash_adjust[b] >> > return r >> >> > As far as I understand the unsigned instructions in C just increase >> > amount of bytes the int can hold, and Python automatically converts to >> > longs which have infinite size when necessary, so I am not sure why I >> > am getting different results. >> >> > I assume that I am missing something fairly simple here, so help a >> > n00b out if you can :) >> >> > Thanks in advance, >> >> > jnb >> >> You want to restrict the values to 32 bits. The result of + or << may >> exceed 32 bits, so you need to mask off the excess bits afterwards. >> >> def find_fast(u): >> mask = 0x >> u = (u + 0xe91aaa35) & mask >> u ^= u >> 16 >> u = (u + (u << 8)) & mask # can get away with only 1 mask here >> u ^= u >> 4 >> b = (u >> 8) & 0x1ff >> a = ((u + (u << 2)) & mask) >> 19 # can get away with only 1 mask >> here >> r = a ^ hash_adjust[b] >> return r >> >> HTH > >Well, I guess there are two problemsthe masking and the fact the >in C it seems to for some reason overflow and become a negative >valuestill not sure why it does itSo the code with just >masking doesn't work, you still need some sort of weird inversion like >the ~(0x - u).weird > >anyone? In C unsigned can not be negative. Why do you believe the numbers are negative? If your debugger is telling you this thow away the debugger and use printf. If printf is telling you this then use the right format. printf("%u", u); // for unsigned int u printf("%lu", u); // for unsigned long u printf("%x", u); or printf("0x%08x", u); // to see u in hex Harald -- http://mail.python.org/mailman/listinfo/python-list
Re: improving a huge double-for cycle
On Thu, 18 Sep 2008 Alexzive wrote: >I am a python newbie and need to run following code for a task in an >external simulation programm called "Abaqus" which makes use of python >to access the mesh (ensamble of nodes with xy coordinates) of a >certain geometrical model. > >[IN is the starting input containing the nodes to be check, there are >some double nodes with the same x and y coordinates which need to be >removed. SN is the output containing such double nodes] > >Code: Select all >for i in range(len(IN)): #scan all elements of the list IN > for j in range(len(IN)): >if i <> j: > if IN[i].coordinates[0] == IN[j].coordinates[0]: > if IN[i].coordinates[1] == IN[j].coordinates[1]: > SN.append(IN[i].label) I did not test the syntax, but here is an idea with sorted lists. It should be O(NlogN). def sk(x): return x.coordinates[0] IN.sort(key=sk) for i in xrange(len(IN)): for j in xrange(i+1, len(IN)): if IN[i].coordinates[0] == IN[j].coordinates[0]: if IN[i].coordinates[1] == IN[j].coordinates[1]: SN.append(IN[i].label) else: break Harald -- http://mail.python.org/mailman/listinfo/python-list
Re: improving a huge double-for cycle
On Thu, 18 Sep 2008 Bruno Desthuilliers wrote: ># Harald : uncomment this and run test_results. As far as I can tell, it ># doesn't yields the expected results > >## IN7 = IN[:] >## def sortk7(n): >## return n.coordinates[0] > >## def doubles7(): >## "is ordering better ? - Nope Sir, it's broken..." >## IN7.sort(key=sortk) >## SN = [] >## sn_append = SN.append >## in_len = len(IN) >## for i in xrange(in_len): >## node_i = IN[i] >## coords_i = node_i.coordinates >## for j in xrange(i+1, in_len): >## if coords_i[0] == IN[j].coordinates[0]: >## if coords_i[1] == IN[j].coordinates[1]: >## sn_append(node_i) >## else: >## break >## return SN ... >Results here (py2.5, gentoo linux, athlonxp1800, 512 ram): > > >>> test_results() >True > >>> test_times() >doubles0 : 1.55667901039 >doubles1 : 0.719144105911 >doubles2 : 0.703393936157 >doubles3 : 0.700654983521 >doubles4 : 0.706257104874 >doubles5 : 0.528184890747 >doubles6 : 0.461633205414 >doubles8 : 0.0134379863739 >doubles9 : 0.0108540058136 When you change my code then do it right. :-) You forgot to change the IN to IN7 at _every_ place. And the sortk should be sortk7 in _both_ places. I never let the code run before myself. I just wrote it in the newsreader. But now i did and I have a second version as bonus. IN7 = IN[:] def sortk7(n): return n.coordinates[0], n.coordinates[1] def doubles7(): IN7.sort(key=sortk7) SN = [] sn_append = SN.append in_len = len(IN7) for i in xrange(in_len): node_i = IN7[i] coords_i = node_i.coordinates for j in xrange(i+1, in_len): if coords_i[0] == IN7[j].coordinates[0]: if coords_i[1] == IN7[j].coordinates[1]: sn_append(node_i) else: break return SN def comp7( x, y ): return cmp( x.coordinates, y.coordinates ) def doubles7a(): IN7.sort( comp7 ) SN = [] sn_append = SN.append in_len = len(IN7) for i in xrange(in_len): node_i = IN7[i] for j in xrange(i+1, in_len): node_j = IN7[j] if comp7( node_i, node_j ) == 0: sn_append(node_i) else: break return SN Here are the results. (py2.5, WindowsXP, Pentium4, 2.6GHz, 1.5GB): My version is not so bad. doubles0 : 1.03830598582 doubles1 : 0.47943719104 doubles2 : 0.487412506338 doubles3 : 0.475924733451 doubles4 : 0.466548681466 doubles5 : 0.340487967046 doubles6 : 0.278480365521 doubles7 : 0.0953190978183 doubles7a : 0.0784233750379 doubles8 : 0.010236496538 doubles9 : 0.00742803903848 Harald -- http://mail.python.org/mailman/listinfo/python-list
Re: if, continuation and indentation
On Thu, 27 May 2010 HH wrote: >I have a question about best practices when it comes to line wrapping/ >continuation and indentation, specifically in the case of an if >statement. > >if (width == 0 and >height == 0 and >color == 'red' and >emphasis == 'strong' or >highlight > 100): >raise ValueError("sorry, you lose") My solution would probably look like this: if ( width == 0 and height == 0 and color == 'red' and emphasis == 'strong' or highlight > 100 ): raise ValueError("sorry, you lose") Harald -- http://mail.python.org/mailman/listinfo/python-list
Re: IDLE file saving problem
On Fri, 21 Aug 2009 14:48:56 -0700 (PDT), r wrote: >On Aug 21, 4:10 pm, MRAB wrote: > >Yes, and much more needs improvement! I have made many changes already >and i am polishing an "IDLE 3000" for the good people of Pythonia, >stay tuned more to come There is a small little detail that nags me when I use IDLE: When the cursor is somewhere in the white space at the beginning of the lines and I use Ctrl-rightArrow [ctrl]-[->] to go to the first word in the line then IDLE skips the position with the first non-whitespace letter and places the cursor after the first word in the line. This is not consistent because moving backwards works. [ctrl]-[<-] Similar things happen at the end of a line. I suggest that these movements should consider 'before the first' and 'after the last' text in the line as stop positions. I think other editors do that, at least MS Visual Studio. (The [pos1] alternating between first column and first text in line does not help much.) Harald -- http://mail.python.org/mailman/listinfo/python-list
Re: Annoying octal notation
On Mon, 24 Aug 2009 Derek Martin wrote: >Those participating in this thread have pretty much all seem to agree >that the only places where decimal numbers with leading zeros really >are common are either in rather specialized applications, such as >computer-oriented data or serial numbers (which typically behave more >like strings, from a computer science perspective), or the rather >common one of dates. The latter case is perhaps what's significant, >if any of those cases are. I don't like the 'leading 0 is octal'-syntax. I typically think of numbers as decimal and bytes as hexadecimal. I would even write something like this: # iterate over bits 3 to 5 for i in range(0x00, 0x40, 0x08): ... print "0x%02x\n" % i 0x00 0x08 0x10 0x18 0x20 0x28 0x30 0x38 For me it is easier to see where the bits are in the hex notation. And it is very common to use numbers with leading zeroes that are hexadecimal. Like this: # print address and data for i in range(0x1): print "%04x: %d\n" % i, data[i] : ... 0001: ... ... 000f: ... 0010: ... ... When you are looking for examples of numbers where leading zeroes do not mean octal then consider decimal AND hexadecimal. >I tend to think that within the computer >science arena, the history and prevalence of the leading 0 indicating >octal far outweighs all of those cases combined. I disagree. Harald -- http://mail.python.org/mailman/listinfo/python-list