Re: Few things
Thank you for the comments and answers, and sorry for my delay in answering...

Josiah Carlson: >Decorators can do this without additional syntax. Think @accepts and @returns.<

The purpose of those pre-posts is to write something simple and very *clean* that states what the inputs and outputs must be. This is an example of a pre-post condition for a sorting function taken from that site (all of this goes inside the docstring of the function):

pre:
    # must be a list
    isinstance(a, list)
    # all elements must be comparable with all other items
    forall(range(len(a)),
           lambda i: forall(range(len(a)),
                            lambda j: (a[i] < a[j]) ^ (a[i] >= a[j])))

post[a]:
    # length of array is unchanged
    len(a) == len(__old__.a)
    # all elements given are still in the array
    forall(__old__.a, lambda e: __old__.a.count(e) == a.count(e))
    # the array is sorted
    forall([a[i] >= a[i-1] for i in range(1, len(a))])

Surely such things can be passed (at least as strings) to the @accepts and @returns decorators (using a "decorate" identifier instead of @ is probably nicer, because the @ makes Python look more like Perl, but I've seen that lots of people have already discussed that topic). The testing performed by such decorators can be "switched off" with a global boolean flag once the program is debugged and tested. So now someone can write and standardise a couple of good @accepts and @returns decorators/functors (a sketch is at the end of this message) :-]

>Having a 'faq' for permutation and combination generation would be 99% of the way there.<

Uh, I'm sorry, but I don't understand :-] Aren't such functions quite well defined?

>[Fixed] Quickselect, really, doesn't gain you a whole lot. Sure, it's a log factor faster to select a median, but many algorithms involving selecting medians (at least the ones that I run into in CS theory) end up repeatedly (logn times) selecting the 'kth' smallest element (varying k's), where sorting would actually run slightly faster.<

I've done some tests with a Quickselect that I have essentially translated and adapted to pure Python from "Numerical Recipes" (it seems a bit faster than the Quickselect coded by Raymond Hettinger that can be seen in the cookbook). I have seen that on my PC, on random sequences of FP numbers, a *single* Quickselect (to find just the median) is faster than the standard sort only for lists longer than about 3 million elements. So it's often useless. But using Psyco, that Quickselect becomes 5-6 times faster (for long lists), so it beats the (good) standard sort for lists longer than 600-3000 elements. If the Quickselect works in place (as the sort does) then it leaves a partially ordered list, and you can use it to quickly select other positions (so for close positions, like computing the two central values for the median, the second select takes nearly constant time). So coding the Quickselect in C/Pyrex can probably make it useful. If you are interested I can give you the Python Quickselect code, etc.

>Raymond Hettinger<

I have already seen that this person is working a lot on Python, often on the algorithmic parts.

Nick Coghlan: >I believe the OP was objecting to the spelling of "this integer literal is hex" and "this integer literal is octal".<

Right.

Josiah Carlson: >Regardless, I also don't believe the "I don't like this" without "this is the way it should be" will result in anything.<

You are right, I was mostly afraid of saying silly things...
Here it is. Such syntax can be like:

    number<SEP><BASE>

(Putting <BASE> at the beginning of the number is probably worse, and it goes against the normal base representation in mathematics, where you often subscript the base number.)

<SEP> cannot be "B" or "b" (which stands for "base") because the number can be a hex containing a B too... So <SEP> can be "_" (this is the subscript in TeX markup, so this agrees with the normal representation of the base).

<BASE> can be:
1) just an integer number representing the base (similar to the second parameter of "int"; this also allows specifying any base);
2) a symbol representing a smaller class of possibilities, like 0=2, 1=8, 2=10, 3=16, 4=64. Instead of such digits a letter can be used: a=2, b=8, c=10, etc.

I think the first option is better. So integer numbers can be written like:

    1010100111011_2
    154545_10
    777_8
    afa35a_16
    Fi3pK_64

Thank you to Carlos Ribeiro for your development of such docstring ideas, I appreciate them :-]

Bear hugs,
Bearophile
--
http://mail.python.org/mailman/listinfo/python-list
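Here is a minimal sketch of the switchable @accepts/@returns decorators mentioned above. The flag name CHECK_CONTRACTS and the checks-as-callables design are my own assumptions, not a standard API:

CHECK_CONTRACTS = True  # global flag: set to False once debugged and tested

def accepts(*checks):
    # each check is a callable applied to the matching positional argument
    def decorator(func):
        if not CHECK_CONTRACTS:
            return func
        def wrapper(*args):
            for check, arg in zip(checks, args):
                assert check(arg), "precondition failed on %r" % (arg,)
            return func(*args)
        return wrapper
    return decorator

def returns(check):
    # check is a callable applied to the result
    def decorator(func):
        if not CHECK_CONTRACTS:
            return func
        def wrapper(*args):
            result = func(*args)
            assert check(result), "postcondition failed on %r" % (result,)
            return result
        return wrapper
    return decorator

@accepts(lambda a: isinstance(a, list))
@returns(lambda r: all(r[i] >= r[i-1] for i in range(1, len(r))))
def sort_copy(a):
    return sorted(a)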
Re: Few things
Thank you Josiah Carlson for your answers. If certain parts of my messages upset you, then you can surely skip those parts. (Now I am reading a good big manual, "Learning Python, II ed.", so in the future I hope to write better things about this language.)

>That is simple and clean?<

Well, it's a clean way to express such complexity. And now I think decorators aren't so fit for such a purpose (they produce less clear pre-posts). The pre-posts in the docs seem better to me.

>Then again, I document and test, and haven't used pre/post conditions in 5+ years.<

Yep, documentation and doctests (etc.) are useful.

>>but I've seen that lots of people have already discussed such topic).<<
>Discussion of the @ decorator syntax is a moot point.<

I know, I noticed that... Still, only a few things are fixed in stone :-]

>Think of it like an 'example' in the documentation, where the code is provided for doing both permutations and combinations.<

Ah, now I see. In my original post of this thread I gave a URL of some routines written in C because they are about 10 times faster than the routines that I can write in Python.

>>If you are interested I can give the Python Quickselect code, etc.<<
>No thank you, I have my own.<

I see. I wasn't trying to force you to take my routines, but to offer any interested person a way to check my words.

>Ick. In Python, the language is generally read left to right, in a similar fashion to english. The prefix notation of 0 and 0x, in my opinion, reads better than your postfix-with-punctuation notation.<

As you say, it's your opinion, and I respect it, but in mathematics I think you usually put the base at the end of the number, as a small subscripted number (but I understand your following explanations).

>I'll also mention that two of your examples; afa35a_16 and Fi3pK_64, are valid python variable names<

Uh, I am sorry... As I feared, I have written a silly thing :-)

>I much prefer just using decimal and offering the proper base notation afterwards in a comment...<

I agree.

I'll avoid giving hugs,
Bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: [Python-Dev] PEP: __source__ proposal
(I answer here because I'm still not at ease on python-dev.)

It looks like an interesting feature. I think most LOGOs allow something similar (or better, with a cleaner syntax). The shells of many versions of this language keep a "pool" of functions defined with "to" in a kind of database, so you can call them and edit them one at a time (this is a small block from Wikipedia; note that LOGO allows you to do more than just turtle graphics):

to spiral :size
   if :size > 30 [stop]    ; a condition stop
   fd :size rt 15          ; many lines of action
   spiral :size * 1.02     ; the tail-end recursive call
end

In Mathematica you can edit "cells" of the interactive document: you press enter to go to the next line inside the same cell, and then you press SHIFT+enter to "compile" and execute the text in the cell (usually moving to the cell that follows the output one).

Some similar "improvements" of the CPython shell can be useful... but I think you first need to look in many places (Smalltalk, LOGO, Mathematica, Mathcad, etc.) to find the best ideas to copy from.

Bear hugs,
Bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: Number of distinct substrings of a string [continuation]
n00m:
> This worked out in 5.28s
> Imo it's not that *much* slower
> (of course, Psyco can't help here)

There's no need for a chain here, so you can rewrite this:

import itertools

def subs(s):
    return len(set(itertools.chain(
        s[i:j] for i in xrange(len(s))
               for j in xrange(i, len(s) + 1)))) - 1

as:

def subs(s):
    return len(set(s[i:j] for i in xrange(len(s))
                          for j in xrange(i, len(s) + 1))) - 1

Psyco helps a little (about 10% on my PC) if you use it like this:

from time import time
import psyco; psyco.full()

def main():
    t = time()
    fin = open("input.txt")
    fout = open("output.txt", "w")
    fin.next()
    for line in fin:
        r = set()
        s = line.rstrip()
        len_s = len(s)
        for i in xrange(len_s):
            for j in xrange(i, len_s + 1):
                r.add(s[i : j])
        print >> fout, len(r) - 1
    fout.close()
    print time() - t

main()

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: Number of distinct substrings of a string [continuation]
n00m:
> My Py solution:
> ...

Cute. You can replace:

a[ord(s[i])] += [i]

with:

a[ord(s[i])].append(i)

If you want to micro-optimize this with Psyco, this may be a little faster than lev // 2:

(lev >> 1)

But to increase speed it's usually better to reduce memory allocations. So you can try to find a way to pull this out of the function:

a = [[] for i in xrange(128)]

and to avoid a true append:

a[ord(s[i])] += [i]

(There are many ways to avoid an append. One of them is to have an already allocated list/array and just bump forward an index variable that starts from 0; a small sketch follows. This is worth doing especially when you use Psyco.)

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
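A minimal sketch of that preallocation idea (the function and names here are just illustrative, not taken from n00m's code):

def indexes_of(s, target):
    # append version:
    #   res = []
    #   for i in xrange(len(s)):
    #       if s[i] == target:
    #           res.append(i)
    # preallocated version, often faster under Psyco:
    res = [0] * len(s)      # worst-case size, allocated once
    n = 0                   # index that starts from 0 and bumps forward
    for i in xrange(len(s)):
        if s[i] == target:
            res[n] = i
            n += 1
    return res[:n]          # trim the unused tail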
Re: Number of distinct substrings of a string [continuation]
n00m:
> my home tests proved Python is a fellow fast beast of C++.
> Quite unexpected (I expected Py would be by ~10 times slower).
> PS
> Both my codes are identical in their algorithms.
> =
> 0.016 0.0150001049042 <--- exec times

Maybe in your C++ code there's something that can be improved. This is a 1:1 translation to the D (V.1) language (using my dlibs), and it's about 2.2 times faster than the Psyco version:
http://codepad.org/MQLj0ydB
With smarter memory usage (that is, avoiding all or most memory allocations inside the loops), the performance difference will surely grow.

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: Number of distinct substrings of a string [continuation]
n00m:
> I suspect that building of Suffix Tree would
> be a big exec.time-consuming overhead

In C/D/C++ there are ways to allocate memory in smarter ways, using pools, arenas, stacks, freelists, etc. With that, and using a compact encoding of the tree nodes (there are many different tricks to reduce the space used by a suffix tree), you can go fast enough. Probably a basic implementation will suffice in many cases too :-)

Anyway, you may try a pure-Python 2.x implementation:
http://suffixtree.googlecode.com/files/suffixtree-0.1.py
If you find the time and will to use that (or similar code) to improve your Python solution to problem 99, you can show us the code here...

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: Graph library for Python
Robin Becker:
> There are already very many implementations eg
>
> http://code.google.com/p/igraph
> http://www.boost.org/doc/libs/release/libs/graph
> http://ernst-schroeder.uni.lu/Digraph/doc/
> http://code.google.com/p/python-graph
> http://compbio.washington.edu/~zach/py_graph/doc/html/public/py_graph...
>
> and many others..some of the above already seem to be very useful.

There's mine too:
http://sourceforge.net/projects/pynetwork/
API:
http://code.activestate.com/recipes/466329/

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: Graph library for Python
geremy condra: > Since that's released under the python license, I'm going to > go ahead and commit the version that includes the topo > traversal, but if you have any objections you only need to > say the word and I'll take it down. No objections :-) Bye, bearophile -- http://mail.python.org/mailman/listinfo/python-list
Re: Graph library for Python
Geremy Condra:
> is there a particular way you want your attribution line to read?

You can just use my nickname (in all lowercase), with the list of parts you have used. Don't worry.

> Well, we all seem to have reinvented the wheel differently ;)

Maybe also because they are designed for different purposes.

> Bearophile, Tiago- any interest in trying to combine the
> best parts of our libraries, with an eye towards eventual
> integration into the standard library?

The first thing to do is to ask Guido and Hettinger if they are willing to put a "good" graph module into the std lib. If their answer is positive for some definition of "good", then we can think about doing something.

Several years ago I suggested putting a graph module in the std lib, and the answer was something like: "Put the lib online, and if people use it a lot, we'll see about putting it into the std lib." In the meantime my lib was used by no one and ten other graph libs are used (networkx seems among the most used), but I think none of them has shown a strong usage. (In the meantime Hettinger has written and added two or three or four GOOD data structures to the std lib using a "fast lane", avoiding the step of the popular-usage test.)

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: Which graph library is best suited for large graphs?
Wolodja Wentland:
> Which library would you choose?

This one probably uses little memory, but I don't know if it still works:
http://osl.iu.edu/~dgregor/bgl-python/

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: switch
Bruno Desthuilliers: > Well, obviously such business rules must by no mean be hardcoded. You > really need a "rule engine", configurable by your domain experts thru a > DSL that we'll design specially for you. The rule engine will generate > an AbstractScoreFactory that will instanciate appropriate IScore > implementation objects that knows what to do. [snip] Thank you very much for bringing back some sanity in this newsgroup, sometimes a good antiexample like yours is better than many explanations. Bye, bearophile -- http://mail.python.org/mailman/listinfo/python-list
Re: Seek support for new slice syntax PEP.
Steven D'Aprano: > I've lost all enthusiasm for discussing language enhancements That's probably the main downside of the moratorium. Humans need to play some to keep their will to work and improve things. Bye, bearophile -- http://mail.python.org/mailman/listinfo/python-list
Re: iterators and views of lists
Brendan Miller:
> Currently people slice and dice with well... slices, but those are
> copying, so if you want to operate over part of a range you make a
> copy, perform the operation, then copy the results back in.
>
> I was thinking you'd want something like random access iterators in
> c++, or pointers in c, to write typical in place algorithmic code. To
> me, something like non-copying slices (maybe you'd call it a list
> view?) would seem functionally similar and maybe more pythonic.

There are surely ways to improve the current situation, introducing views and other kinds of iterators (C++ design in this regard can be improved; see the last works of Andrei Alexandrescu about D). This can lead to a certain increase in the performance of Python programs, but it also surely makes writing programs harder and introduces new sources of bugs. (A toy list view is sketched below.)

Modern languages are now doing something rather different from what you ask for: introducing more immutable data, which in theory leads to more copying, but in practice allows more data sharing too, because there's less need to actually copy data when the data is immutable (see Clojure). A hypothetical Python language designed today would probably copy more from Haskell and Clojure.

Python is kept intentionally simple, because it's designed to be usable by people who are not just professional programmers, for example a biologist who needs to perform some computations and doesn't want to spend all the time learning the language itself (think about C++ again). So Python programmers live with a less performant language. When the performance of the code is not enough (even with the future Unladen Swallow), they use another language (often to write extensions used from Python code).

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
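A toy sketch of such a "list view" (a hypothetical class of mine, just to show the idea of a non-copying slice; negative indexes and slicing are left out to keep it short):

class ListView(object):
    def __init__(self, data, start, stop):
        self.data, self.start, self.stop = data, start, stop
    def __len__(self):
        return self.stop - self.start
    def __getitem__(self, i):
        return self.data[self.start + i]
    def __setitem__(self, i, value):
        # writes go straight through to the underlying list
        self.data[self.start + i] = value

items = [5, 3, 1, 4, 2]
view = ListView(items, 1, 4)
view[0] = 99    # items is now [5, 99, 1, 4, 2], nothing was copied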
Re: iterators and views of lists
Brendan Miller: > I agree though, it doesn't matter to everyone and anyone. The reason I > was interested was because i was trying to solve some specific > problems in an elegant way. I was thinking it would be cool to make > python more usable in programming competitions by giving it its own > port of the STL's algorithm library, which needs something along the > lines of C++'s more powerful iterators. It seems you have missed my post, so here it is, more explicitly: http://www.boostcon.com/site-media/var/sphene/sphwiki/attachment/2009/05/08/iterators-must-go.pdf Bye, bearophie -- http://mail.python.org/mailman/listinfo/python-list
Vectorized laziness 2
Do you remember my post about Vectorized laziness that was fully ignored by everyone here? http://groups.google.com/group/comp.lang.python/browse_thread/thread/2637aafa1274629d/ The latest Clojure v.1.1 has implemented the same idea, they are named "Chunked Sequences": http://www.infoq.com/news/2009/12/clojure-11-rc1-transients See: http://clojure.googlegroups.com/web/chunks.pdf (I know they can have some problematic corner cases.) Bye and be well, bearophile -- http://mail.python.org/mailman/listinfo/python-list
Re: syntax error : first python program
Ben Finney: > > bash# /usr/bin/python sample.py > > File "sample.py", line 2 > > name = "blah" > > ^ > > SyntaxError: invalid syntax > > Indentation is syntax in Python. But I agree, a little better error message may help there, something that contains "indentation" in it :-) Bye, bearophile -- http://mail.python.org/mailman/listinfo/python-list
Missing collections
What are the useful collections that are missing in the collections module? Let's see:

- sortedDict, sortedSet (based on a search tree)
- frozenDict
- bitArray (or bitset)
- graph
- linkedList (*)

(*) linkedList is like deque, a linked list of short arrays, but the arrays aren't 100% full.

A few more that are less important:

- biDict (bidirectional dict, a bijection)
- persistentList, persistentDict
- intervalDict
- Trie
- Dawg
- Rope
- Fibonacci heap
- Bloom filter
- Union-Find

With those I think many things are covered :-)

Regarding the standard library of Python 3, it's easy enough to create by mistake a module that collides with an equally named module of the std lib. To avoid this I think (after seeing the std lib of the D language) it can be useful to put all modules of the standard library into a namespace, like "py" or "std" or something like that. So you write:

import py.math
from py.math import sin
x = py.math.cos(1.2)
y = sin(1.2)

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: Symbols as parameters?
Martin Drautzburg, having symbols spread in the namespace is bad, and too much magic is even worse. You seem to need something like an enum that must be used with its qualified name, as in:

move(direction.up)

That works well unless the symbol names are keywords. Creating a small class like this (one that complains if you give it a string containing a keyword; a sketch is below) doesn't look too hard:

direction = Enum("up", "down", "left", "right")

Or:

direction = Enum("up, down, left, right")

Or:

direction = Enum("up down left right")

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
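A minimal sketch of such an Enum class (the design, where each symbol is just its own name as a string, is only one of the possible choices):

from keyword import iskeyword

class Enum(object):
    def __init__(self, *names):
        if len(names) == 1:
            # allow "up down" and "up, down" besides separate arguments
            names = names[0].replace(",", " ").split()
        for name in names:
            if iskeyword(name):
                raise ValueError("%r is a Python keyword" % name)
            setattr(self, name, name)

direction = Enum("up, down, left, right")
# move(direction.up) now receives the string "up"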
Re: Sorted dictionary
Raymond Hettinger: > Using an underlying list to track sorted items > means that insertion and deletion take O(n) time. > That could be reduced to O(log n) time by using > a blist or skiplist as the underlying structure > for maintaining a sort. In the collections module it can be useful to have ordered dicts and sets based on search trees. I'm thinking about B+ trees to use CPU cache locality better. It can be fun :-) Bye, bearophile -- http://mail.python.org/mailman/listinfo/python-list
Re: Efficient Running Median
Very nice. I have added a comment at the bottom, using a circular queue instead of a deque may be faster. Bye, bearophile -- http://mail.python.org/mailman/listinfo/python-list
Re: flow control and nested loops
Raymond Hettinger:
> Another approach for exiting multiple levels of loops is wrap the
> inner calls in a function and return from them when needed:
>
> def f(x):
>     for y in Y:
>         for z in Z:
>             if test1(x,y,z):
>                 return
>             frobnicate(x,y,z)
>
> for x in X:
>     f(x)

That's usually the solution I use for this problem; it's a little rough and it has some performance cost, but it's readable and simple.

In PHP, break and continue accept an optional numeric value (default is 1; 0 is of course ignored) "which tells it how many nested enclosing structures are to be broken out of":
http://us2.php.net/manual/en/control-structures.break.php
http://us2.php.net/manual/en/control-structures.continue.php

That PHP design gives me shivers; it's horrid, bug-prone and fragile:

for x in X:
    for y in Y:
        for z in Z:
            if test1(x, y, z):
                continue 3
            if test2(x, y, z):
                continue 2
            frobnicate(x, y, z)
        glortz(x, y)
    splat(x)

A better solution is to add labels to Python (I hope this code is equivalent to the original Perl one) that can optionally be used by "continue" and "break". This design is also used by D:

label OUTER:
for x in X:
    label INNER:
    for y in Y:
        for z in Z:
            if test1(x, y, z):
                continue OUTER
            if test2(x, y, z):
                continue INNER
            frobnicate(x, y, z)
        glortz(x, y)
    splat(x)

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: python memory use
Gary Robinson:
>(I could test the particular case I mention, but I'm wondering if someone has some fundamental knowledge that would lead to a basic understanding.)<

Java is one of the most memory-hungry languages, often even more than Python 2.x. Some bad people have said that Java developers were not that interested in saving RAM because Sun sells hardware, and the more RAM it uses the more they can sell ;-)

More seriously, Java uses a complex hybrid generational garbage collector, while CPython uses a much simpler reference-count GC plus a cycle detector. A reference counter usually has lower performance compared to good generational garbage collectors, especially ones hybridized with several other algorithms, but it's simpler (and most things in CPython are designed for simplicity even when they are a little less efficient; among other things this simplicity helps this open source project recruit and keep developers). It's almost deterministic (so for example in some situations you can get away with forgetting to close a file), and it often uses less memory because at any moment you keep in memory only the objects you are actually using (reference cycles add a little extra complexity to this), while a generational GC keeps a lot of extra memory unused, free, etc.

There are studies showing that if you use such good generational GCs and you accept about a 2-5X memory penalty, you can have a language about as fast as ones where you manage memory manually. Indeed today good Java programs are often no more than 2X slower than C++, and sometimes are about as fast or even faster (thanks to other optimizations, like the strong inlining of virtual methods done by HotSpot).

If you want a language that uses less RAM you can try FreePascal :-) I think that among the languages designed to work with a GC, the D language is among the ones that use less memory (because so far its GC is not that good, so it saves memory while being slower than the advanced GC used by Sun's Java).

On 64 bit systems Sun's Java has added an optimization: certain pointers are compressed to 32 bits, reducing memory usage. Similar things may be done by LLVM too in the future:
http://llvm.org/pubs/2005-06-12-MSP-PointerCompSlides.pdf
Maybe someday 64-bit CPython will do the same, or maybe Unladen Swallow or PyPy (Jython in a sense may do it already if you use it on a 64 bit JVM, I don't know).

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: Quick compare string to list
Scooter: > I'm reading in a text file, and for each line in the file, I'm looking > for the existence of phrases from a list. The list contains approx. > 120 items currently but will most likely grow. This procedure itself > is not the main function of my program and only grew out of the need > to reformat certain phrases I'm finding in a file before re-outputting > it. But as I suspected, this searching of the lists slows the whole > process way way down. Was looking for ideas of a better way to do > this. Know your basic computer science :-) http://en.wikipedia.org/wiki/Aho-Corasick_algorithm There are probably C implementations that can be used from Python, like: http://hkn.eecs.berkeley.edu/~dyoo/python/ahocorasick/ Bye, bearophile -- http://mail.python.org/mailman/listinfo/python-list
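For reference, here is a small self-contained sketch of the Aho-Corasick automaton mentioned above (my own minimal pure-Python version, not the Berkeley extension linked in the post); it finds all the phrases in one pass over the text:

from collections import deque

def build_automaton(patterns):
    # trie with transitions, failure links and output lists
    trans, fail, out = [{}], [0], [[]]
    for pat in patterns:
        node = 0
        for ch in pat:
            if ch not in trans[node]:
                trans.append({}); fail.append(0); out.append([])
                trans[node][ch] = len(trans) - 1
            node = trans[node][ch]
        out[node].append(pat)
    # breadth-first pass to compute the failure links
    queue = deque(trans[0].values())
    while queue:
        node = queue.popleft()
        for ch, nxt in trans[node].items():
            queue.append(nxt)
            f = fail[node]
            while f and ch not in trans[f]:
                f = fail[f]
            fail[nxt] = trans[f].get(ch, 0)
            if fail[nxt] == nxt:    # defensive: never fail to itself
                fail[nxt] = 0
            out[nxt].extend(out[fail[nxt]])
    return trans, fail, out

def find_all(text, automaton):
    # yields (start_position, phrase) for every match
    trans, fail, out = automaton
    node = 0
    for i, ch in enumerate(text):
        while node and ch not in trans[node]:
            node = fail[node]
        node = trans[node].get(ch, 0)
        for pat in out[node]:
            yield i - len(pat) + 1, pat

aut = build_automaton(["he", "she", "his", "hers"])
print list(find_all("ushers", aut))
# [(1, 'she'), (2, 'he'), (2, 'hers')]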
Re: Q: sort's key and cmp parameters
Paul Rubin: > Yes, think of sorting tree structures where you have to recursively > compare them til you find an unequal pair of nodes. That's cute. In what situations do you have to perform such kind of sort? Bye, bearophile -- http://mail.python.org/mailman/listinfo/python-list
Re: Enormous Input and Output Test
Terry Reedy:
> Don't waste your time with problem sites that judge raw-clock time over
> (and before) accuracy, thereby greatly favoring low-level languages and
> hack tricks over clear high-level code.

I usually don't like to solve the kind of problems shown by those sites, because those problems are too artificial (and often too difficult). But sometimes I have written some solutions.

Those sites, though, never judge "raw" running time over accuracy: in most or all such sites the programs are tested with tons of possible inputs, and if even one output is a bit wrong, the program is totally refused. This is a hard rule that encourages programmers to take very good care of program correctness. Some sites add a little more "noise" to the inputs, simulating slightly more real-world inputs, while most of those online contests give clean inputs (the input bounds are well specified in the problem statement).

From what I've seen of the best solutions submitted to those sites (some sites allow people to see the source of the contest entries), the programs usually don't (need to) use "hack tricks" as you say (even if probably some program uses them). Using hacks is often unsafe, so people usually prefer safer ways to code, because just a little bug may fully compromise the score of the program.

I agree that the timing scores on such sites often encourage low-level languages, like C (and sometimes C++, which is a multi-level language), but on the other hand such languages exist, C is used in many real-world places, so designing sites where people compete with such languages is legitimate. C allows people to understand better what's going on inside the computer; this is valuable and positive. Bashing low-level languages is silly. Even CPython is written in C. A good programmer has to know both higher and lower level languages.

And in real life sometimes you need performance. This thread shows that a normal Python program is not up to those timings for the enormous-input problem (even if there are ways to write a Python program to solve it). People at Google are trying to create a 5 times faster Python (the Unladen Swallow project) because they use lots of real-world Python code and they think Python is slow. I've found plenty of situations where CPython code is not fast enough for my purposes.

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: Q: sort's key and cmp parameters
Raymond Hettinger:
> Psychologically, the thing that I find to be interesting is
> that beginners and intermediate users seem to take to key
> functions more readily than old timers. The key function seems
> to be an easy thing to teach (perhaps because that's the
> way Excel sorts and the way SQL sorts, or perhaps it is because
> problem statements seem to arise in the form of "sort by this,
> then by that" instead of "here's how two objects should be
> compared").
>
> In contrast, some people who have have had deep experience with
> cmp functions may tend to first think of cmp solutions and then
> have a harder time seeing solutions with key functions. If you
> grew-up on C's qsort() like I did, then a key function may not
> be the first thing that pops into your head.

I love the 'key', it makes my code simpler and easier to understand (see the small example below). I am not opposed to the idea of keeping cmp, which in some rare cases may be useful or more natural. The problem is that if you allow the use of cmp, lots of programmers will use it straight away, not even bothering to learn what that strange 'key' argument may be useful for. And they miss how handy 'key' is.

I am having a very hard time (despite my implementation, explanations, etc.) explaining to D programmers why, for a flexible general-purpose function, a key function argument is often better. In the end they have added something like that to the std lib, but with a weird name like SchwartzSort that's surely not the first standard sort they look for... this is bad.

So a funny solution to this situation may be the following: to keep only 'key' in sort/sorted for a few years, so the community of Python programmers gets used to 'key' only, and then put 'cmp' back in, for the special cases ;-)

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
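A tiny example of the "sort by this, then by that" difference (Python 2 syntax; the data here is invented):

people = [("Smith", 40), ("Jones", 22), ("Smith", 22)]

# key version: sort by age, then by surname
people.sort(key=lambda p: (p[1], p[0]))

# cmp version of the same ordering
people.sort(cmp=lambda a, b: cmp(a[1], b[1]) or cmp(a[0], b[0]))

The key version also tends to be faster, because the key callable runs once per item, while the cmp callable runs once per comparison (about n*log(n) times).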
Re: Q: sort's key and cmp parameters
Paul Rubin: > bearophile: >> I am having a very hard type (despite my implementation, explanations, >> etc) explaining to D programmers why for a flexible general-purpose >> function a key function argument is often better. They in the end have >> added something like that in the std lib, but with a weird name like >> SchwartzSort that's surely not the first standard sort they look >> for... this is bad. > "key" and the DSU (Schwartz) sorting pattern makes lots of sense in > scripting languages like Python that are primarily used on relatively > small data sets. The best reason for writing something in a lower > level language like D is because you want to push the limits of your > availiable machine resources. Therefore, the DSU strategy's extra > memory consumption will be what you want less of the time than it is > in Python. I think you are quite wrong. In D dynamic arrays are built-in, they are used almost as Python lists (but they are single-typed, unless you create an array of variants), among other things they have a built-in sort. Such sort is used for small situations too. Even in a language like D *very* often what you need is to sort a small amount of data in a very flexible way. In such situations what you need isn't to squeeze the last byte or CPU cycle out of the PC, but to have a handy and short syntax, a very flexible sorting, and something that's surely not bug-prone. In such situation having a 'key' argument is *better*. Such sort can be stable. That's why the main sort/sorted routines in my dlibs accept an optional key callable, and they are very handy. I can show you examples. On the other hand once in a while in a D program you may need to sort a very large amount of data, or you may need something faster (and unstable, like a refined introsort based on a 2-pivot QS), or even more special than the things allowed by the built in sort. In such situation you can import a special sorting function from the std lib and use it, such sort can have something equivalent to a cmp (but lower level, it can accept a function template alias, to inline the comparison operation). So the idea is: in a multi-level language you very often have to sort a limited amount of data in complex ways. Because in most programs a quite large percentage of the code isn't performance-critical. In such large parts of the code what you want is something very handy, safe and flexible. So having a key function is positive. In the few spots where you need high performance, you can import (or even implement with your hands) and use a specialized sorting routine. D is a general purpose language, it's not used like C in Python projects to write performance-critical spots only. This is also visible in other parts of the language, for example there are built-in associative arrays. Their syntax is handy, and even if they aren't high performance (they are sometimes slower than Python dicts despite being O(1), but they never have a O(n^2) performance profile, as may happen to Python dicts, so they are safer) you can use them in a very quick way, even if you have to create a 10-items associative array. The end result is that in D programs you can find more hashes than in C++ code, where I think programmers use them only where simpler solutions (like iterating on an array) aren't good enough. (So D AAs have to be fast even in situations where you put 8 items inside them, like in Python dicts). 
This free usage of AAs makes the code stronger too (because once in a while you may put 1000 items in such AA, and it will keeps working efficiently, while a linear scan in a list of 1000 items starts to become not efficient). Bye, bearophile -- http://mail.python.org/mailman/listinfo/python-list
Re: Q: sort's key and cmp parameters
Paul Rubin: > Bearophile: > > sorting, and something that's surely not bug-prone. In such situation > > having a 'key' argument is *better*. Such sort can be stable. > > Nothing stops comparison sorting from being stable. Since the rest of > your post seems premised on the opposite, I hope that clears things up > for you. When I have written that post I was partially unfocused, I am sorry. What I meant is that a general sorting routine, even in D, is better to be first of all flexible. So I think it's better for the D built-in sort to be stable, because such extra invariant allows you to use the sort in more situations. Bye, bearophile -- http://mail.python.org/mailman/listinfo/python-list
Re: Q: sort's key and cmp parameters
Hrvoje Niksic:
> Note that stable sort has additional memory requirements. In situations
> where you don't need stability, but do need memory-efficient in-place
> sorting, an unstable sort might well be preferred. This is why
> libraries such as C++'s STL offer both.

There are stable sorts that need no additional memory.

In a largish program written in a general-purpose multi-level language like D (this is probably true for C++ too), often 80% of the code takes very little time to run. Such code may need to process a few small arrays, or small data sets, or to manage the GUI, etc. So optimizing those parts of the code for performance (both in memory used and CPU cycles used) is stupid. What you want in such large parts of the code is:
- to write code quickly
- to have code that's short, readable, easy to fix, and, most important of all, as little bug-prone as possible.

So what you need in such large parts of the code is something that's very flexible and safe, even if it's not top performance (in both memory and CPU). This is why, for example, D has built-in associative arrays that have a handy syntax and built-in methods, and are never O(n^2), even if they are not the most efficient ones where you need maximum performance, minimal memory use, or unusual operations. Such built-ins are useful to reduce both code length and bug count, because they are quite safe and easy to use. Stable sorts are a little safer, because they don't scramble your data in certain ways, so they can be used in more situations. This is why having Timsort in Python is good.

When you profile the D code and you see your program uses too much RAM or wastes too much time in the built-in sort, then you can switch to special sorts from the std lib. You can even write your own hash or sort for special situations (and you don't have to drop down to another language for that; you keep using the same one, even if you may want to change your programming style and use a more C-like style, with no dynamic allocations, some bit twiddling, pointers, more structs, 16 bit integers, unsigned integers, unions, compiler intrinsics, even inlined assembly that uses SSE3, etc.). In normal code you may want to adopt a more Java-like programming style, or templates, or, using a large library I have written, you may program almost in a curious Python-like style, with laziness too.

This is why I think the built-in sort has to be flexible: you are supposed to use it most of the time, and most of the time you don't need top performance. Different parts of a program have to be optimized for different things: computer performance, or programmer performance.

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: The rap against "while True:" loops
Peter Billam:
> I remember in the structured-programming revolution the
> loop { ... if whatever {break;} ... }
> idiom was The Recommended looping structure, because the code is
> more maintainable.

I think "break" was almost the antithesis of structured programming; it was seen as the little (and a bit better behaved) brother of "goto". Too many "breaks" turn code almost into spaghetti, which is the opposite of structured programming.

> With a while () {...} idiom, you commit
> yourself to a bug-prone rewrite if you ever need to insert a
> statement before the first break-test, and likewise with a
> repeat { ... } until () you commit yourself to a rewrite if
> you ever need to insert a statement after the last break-test.

I think "while True:" is used so often in Python because Python still lacks a do-while construct (or repeat-until, but this is less nice, because the condition is reversed compared to the one you use in a while-do loop). Give me a do-while and a good number of breaks and "while True"s in my Python code will be removed.

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
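For reference, the usual Python emulation of the missing do-while (read_chunk and process are hypothetical placeholders):

while True:
    data = read_chunk()     # the body runs at least once
    process(data)
    if not data:            # the "while" test, moved to the bottom
        break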
Re: How about adding slice notation to iterators/generators?
Terry Reedy: > 1. islice works with any iterator; generator method would only work with > generators A slice syntax that's syntactic sugar for islice(some_iter,1,None) may be added to all iterators. >2. iterator protocol is intentionally simple.< Slice syntax is already available for lists, tuples, strings, arrays, numpy, etc, so adding it to iterators too doesn't look like adding that large amount of information to the mind of the programmer. Bye, bearophile -- http://mail.python.org/mailman/listinfo/python-list
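To make the proposed sugar concrete, here is what it would mean today (the generator is just an example):

from itertools import islice

squares = (x * x for x in xrange(10))
# hypothetical syntax: squares[1:4] -- currently spelled as:
print list(islice(squares, 1, 4))    # prints [1, 4, 9]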
Re: The rap against "while True:" loops
Paul Rubin: > http://scholar.google.com/scholar?cluster=17368311454828547380 > > Keep in mind that the article is 35 years old though, and is purely > imperative. Lots of stuff done with cockamamie looping constructs is > more cleanly done with Python generators, itertools, higher-order > functions, etc. That's a famous and very good paper, a good read for people that like to program. The Example1 and Example2 can be rewritten in several different ways, but several compilers today are not able to perform such optimizations yet, so what Knuth has written there are still among the faster ways to implement that algorithm. Bye, bearophile -- http://mail.python.org/mailman/listinfo/python-list
Re: Reverse Iteration Through Integers
Paul Rubin:
> If you want to peel off digits from an int one by one without string
> conversions, it's easiest to do that in reverse order:
>
>     n = 961
>     digits = []
>     while n > 0:
>         n, d = divmod(n, 10)
>         digits.append(d)
>
> Look up the docs for "divmod" for an explanation of that handy function.
> Now the above gives you a reversed list of digits--what to do with it
> is an exercise for you ;-). Note that if n=0 then you get the empty list.

I think that with Psyco it's better to avoid divmod() (a sketch is below). It's very positive to teach novices to always write tests every time they write a function, because it makes their programs less buggy and often lets them solve the whole programming problem sooner:

def reverse(n):
    """ Reverse the digits of integer n, ignoring trailing zeros.

    >>> reverse(12590)
    9521
    >>> reverse("abc")
    Traceback (most recent call last):
      ...
    TypeError: unsupported operand type(s) for divmod(): 'str' and 'int'
    >>> [reverse(x) for x in (0, 1, -1, 2, -2L, 100L, -100)]
    [0, 1, -1, 2, -2L, 1L, -1]
    >>> [reverse(x) for x in (125, 1250, 123456789)]
    [521, 521, 987654321]
    >>> [reverse(x) for x in (0.0, 1.0, -5.3, 125.0, 1.23456e20)]
    [0, 1.0, -5.2998, 521.0, 654321.0]
    >>> str(reverse(169883903200298309284038223098439430943092816286 ** 123))[:35]
    '65852401624276201339740994895755844'
    """
    # code removed ...

if __name__ == "__main__":
    import doctest
    doctest.testmod()
    print "Doctests done"

Using tests (and a bit later a versioning system) is among the things that have to be taught as soon as possible :-)

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
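The divmod-free variant of the digit-peeling loop (a small sketch; under Psyco the two simple operators are often faster than the divmod call):

n = 961
digits = []
while n > 0:
    digits.append(n % 10)
    n //= 10
# digits is now [1, 6, 9]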
Re: "Python for Bioinformatics" available and in stock
Sebastian Bassi, this is a piece from #5:

ProtSeq = raw_input("Protein sequence: ").upper()
ProtDeg = {"A":4,"C":2,"D":2,"E":2,"F":2,"G":4,"H":2,
           "I":3,"K":2,"L":6,"M":1,"N":2,"P":4,"Q":2,
           "R":6,"S":6,"T":4,"V":4,"W":1,"Y":2}
SegsValues = []
for aa in range(len(ProtSeq)):

More Pythonic code is:

prot_seq = raw_input("Protein sequence: ").upper()
prot_deg = {...
segs_values = []
for aa in xrange(len(prot_seq)):

Note the use of xrange and names_with_underscores. In Python names are usually lower case and their parts are separated by underscores.

From #6:

segsvalues=[]; segsseqs=[]; segment=protseq[:15]; a=0

==>

segs_values = []
segs_seqs = []
segment = prot_seq[:15]
a = 0

If you want to limit the space in the book then you can pack those lines into a single line, but it's better to keep the underscores.

From #18:

prop = 100.*cp/len(AAseq)
return (charge,prop)

==>

prop = 100.0 * cp / len(aa_seq)
return (charge, prop)

Adding spaces between operators and after commas, and a zero after the point, improves readability.

From #35:

import re
pattern = "[LIVM]{2}.RL[DE].{4}RLE"
...
rgx = re.compile(pattern)

When the pattern gets more complex it's better to show readers how to use a re.VERBOSE pattern, to split it over more lines, indent those lines like a program, and add #comments to those lines.

The #51 is missing.

I like Python and I think Python is fit for bioinformatics purposes, but 3/4 of the purpose of a book like this is to teach bioinformatics first and computer science and Python second. And sometimes a dynamic language isn't fast enough for bioinformatics purposes, so a book about this topic probably has to contain some pieces of C/D/Java code too, to show and discuss implementations of algorithms that require heavier computations (they are often already implemented inside Biopython, etc., but someone has to write those libs too). The purpose here is not to teach how to write industrial-strength C libraries to perform those heavier computations, but to give the reader an idea (in a real lower-level language) of how those libraries are actually implemented. Because science abhors black boxes, a scientist must have an idea of how all the subsystems she/he/hir is using work inside (that's why using Mathematica can be bad for a scientist: such a person has to write "and here magic happens" in the resulting paper).

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: New syntax for blocks
r: > i think the following syntax would be quite beneficial > to replace some redundant "if's" in python code. http://python.org/dev/peps/pep-3003/ bearophile -- http://mail.python.org/mailman/listinfo/python-list
Re: #define (from C) in Python
Santiago Romero:
>Obviously, I prefer to write well structured code but I had to sacrifize SIZE by SPEED<

In C99 you have "inline" (and gcc/gcc-llvm usually inline small functions anyway), which helps avoid many macros.

> Now I'm porting the emulator to a scripted language, so I need
> even more previous design ideas before starting to code, so that
> I can achieve (I hope I'll be able to do it with this group's help)
> 100% cpu speed in an standard desktop PC.

The techniques needed to speed up Python code are very different from the ones you use in C. You may need a lot of time to learn Python performance tricks. Psyco helps a lot.

> Well, I don't agree with that, "constants" and "macros" wouldn't
> hurt python, when using them in the right situations.

I miss true constants in Python, but it may be impossible to add them to this language, because of how it uses references. In Python you rely on convention, writing their names ALL_UPPERCASE.

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: Writing an emulator in python - implementation questions (for performance)
Try creating an extension module with ShedSkin.

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: Common area of circles
Shashwat Anand:
> > Given 'n' circles and the co-ordinates of their center, and the radius of
> > all being equal i.e. 'one', How can I take out the intersection of their
> > area.

I can see two possible solutions, both approximate. In both solutions you first look whether there is a pair of circles that don't intersect; in that case the intersection area is zero. You also remove all circles fully contained in another circle.

The first strategy is easy to implement, but it probably gives lower precision. You sample randomly, many times, the rectangular area that surely contains all the circles, and you count how many of those random points are inside all circles. This gives you an approximate area of their intersection. Increasing the number of points slowly increases the precision of the result. (A sketch of this strategy is below.)

The second strategy is more complex. You convert your circles into polygons with a fixed number of vertices (like 50 or 100 or 1000 or more; this number is constant even if the circles don't all have the same radius). So you can turn a circle into a simple mesh of triangles (all vertices are on the circumference). Then you "subtract" the second polygonalized circle, which can create a hole and split triangles into pieces, and so on with the successive circles. At the end you compute the total area of the triangles left. This is doable, but you need time to implement it. The advantage is that the numerical precision of the result is probably higher.

If you implement the second solution you can implement the first one too, and use it as a test to avoid bugs. Visualizing the triangles with Pygame or MatPlotLib can be useful to look for bugs.

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
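A minimal sketch of the Monte Carlo strategy (the function name and signature are mine; it samples the bounding box of the first circle, which always contains the whole intersection):

import random

def intersection_area(centers, radius=1.0, npoints=100000):
    r2 = radius * radius
    cx, cy = centers[0]
    hits = 0
    for _ in xrange(npoints):
        x = random.uniform(cx - radius, cx + radius)
        y = random.uniform(cy - radius, cy + radius)
        # count the point only if it's inside every circle
        if all((x - a) ** 2 + (y - b) ** 2 <= r2 for a, b in centers):
            hits += 1
    box_area = (2.0 * radius) ** 2
    return box_area * hits / npoints

print intersection_area([(0.0, 0.0), (1.0, 0.0)])
# about 1.228 (the exact lens area is 2*pi/3 - sqrt(3)/2 ~= 1.2284)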
Re: Splitting a string
I don't know how fast this is (Python 2.x):

>>> from itertools import groupby
>>> t = 'si_pos_99_rep_1_0.ita'
>>> tuple(int("".join(g)) if h else "".join(g)
...       for h, g in groupby(t, str.isdigit))
('si_pos_', 99, '_rep_', 1, '_', 0, '.ita')

It doesn't work with unicode strings.

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: Voronoi diagram algorithm (Fortune’s sweepline )
Captain___nemo: > Isn't it possible to implement Voronoi diagram using Fortune's > algorithm without multithreading or hash map? Multi-threading isn't part of Fortune's algorithm. Multi-threading can be confusing, but it's better for you to learn to feel at home using hash maps (named dicts in Python), because they are both essential in computer science and in Python. Ask if you need some help regarding dicts and sets. Bye, bearophile -- http://mail.python.org/mailman/listinfo/python-list
Re: Voronoi diagram algorithm (Fortune’s sweepline )
Robert Kern: > You can see a mild modification of Fortune's original C code here: > > http://svn.scipy.org/svn/scikits/trunk/delaunay/scikits/delaunay/Voro... That's C++; C++ makes simple things hard and hard things possible :-) In Python that code may become much shorter (and slower). Bye, bearophile -- http://mail.python.org/mailman/listinfo/python-list
Re: Voronoi diagram algorithm (Fortune’s sweepline )
dorzey: > Found this python implementation: > http://www.oxfish.com/python/voronoi.py Looks very nice, and it may be ShedSkin-compilable too. Bye, bearophile -- http://mail.python.org/mailman/listinfo/python-list
Re: TypeError: int argument required
Lawrence D'Oliveiro:
>So no, using alternative quotes does not make things more readable.<

You think that this:

'<rect style="fill:blue" />'

isn't a bit more readable and simpler to write than this?

"<rect style=\"fill:blue\" />"

I think lots of people don't agree with you.

In such a situation it can also be positive to split the string into two or more parts, for example (untested):

style = ("fill:blue; stroke:pink; stroke-width:5; "
         "fill-opacity:0.1; stroke-opacity:0.9")
print >> fo, '<rect x="%d" y="%d" width="%d" height="%d" style="%s" />' % (abs_x, abs_y, w, h, style)

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: Input problem
Prasoon:
> I am new to python and using python 2.6
> I want to know when to use raw_input( ) and when to use input( )???

I think it's better to always use raw_input(), and then convert the string to the data you want to work on. Once in a while you may want to use input() to feed expressions (like formulas) to your program in a quick, simple and unsafe way...

In Python 3+ they have removed input() and renamed raw_input() as input(). You can get the functionality of the 2.x input() with eval(input()). (I think the Python 3 developers have made the right choices here.)

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
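A small example of the explicit-conversion style (Python 2; the prompt text is just illustrative):

s = raw_input("How many items? ")   # always returns a string
n = int(s)                          # explicit, safe conversion

# quick and unsafe, evaluates whatever the user types:
# x = input("Formula? ")            # e.g. typing 2**10+1 gives 1025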
Re: Good books in computer science?
Nathan Stoddard:
> The best way to become a good programmer is to program. Write a lot of
> code; work on some large projects. This will improve your skill more than
> anything else. It's also important to learn new languages regularly. I
> recommend to learn C, Python, and Lisp first.

To become very good at a practical activity (like programming or writing stories or playing piano) you have to do many things for a long time. You have to practice it a lot, but that's not enough; you also must keep pushing the limits of your skills forward, doing things that are hard for you. Reading smart books and learning from the experts in the field is usually necessary. Quite often it's useful to read good books not much related to the activity you are doing too, because the human mind works better this way.

Another thing you have to do is keep your eyes open, for example to be able to notice when your learning is stuck in some slow corner: now and then you will have to meta-learn, that is, change the way you learn and train yourself. This is a hard and often painful thing to do, but it's probably necessary if you want to become very good, because very often you learn in the wrong way, or in a not very efficient way. Howard Gardner too has written about this topic.

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: What is the best method to match a pattern in set of lines
This is a small OT post, sorry. Dennis Lee Bieber, may I ask why most or all your posts are set to "No- Archive"? > HTTP://www.bestiaria.com/ I think there are other furries beside you around here. Bye, bearophile -- http://mail.python.org/mailman/listinfo/python-list
Re: Good books in computer science?
Albert van der Horst:
> For programming practice I do the problems of
> http://projecteuler.net/

Some time ago I solved some of them with Python, D and C (some of them are quite hard for me), and I tried to produce very fast code (like a D generator of prime numbers that's about 100 times faster than some 'fast' C# prime generators I've seen in that forum). But then I stopped, because to me they seem a waste of time: they look too academic, and they don't exercise the right muscles of the mind. They may be good if you want to become good at performing numerical number theory, and bad for everyone else. Seeing how many people like to do those Project Euler puzzles, I presume my ideas aren't shared by most people.

I am now using some of my solutions to those problems to spot "performance bugs" in a new D compiler (and even in ShedSkin), so they are somewhat useful again :-)

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: Specific iterator in one line
Filip Gruszczyński: > [1, 0, 0, 1] -> ['b', 'b', 'a', 'a', 'b', 'b'] I like this version (43 golf holes), it's readable enough: [c for d in[1,0,0,1]for c in("a","bb")[d]] Bye, bearophile -- http://mail.python.org/mailman/listinfo/python-list
Re: Searching equivalent to C++ RAII or deterministic destructors
Ulrich Eckhardt: > a way to automatically release the resource, something > which I would do in the destructor in C++. Is this helpful? http://effbot.org/pyref/with.htm Bye, bearophile -- http://mail.python.org/mailman/listinfo/python-list
Re: Is Psyco good for Python 2.6.1?
Russ P.: Psyco works well for me on Python 2.6+, on Windows. You can find a compiled version around, too.

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: Is code duplication allowed in this instance?
Francesco Bochicchio:
> Possibly to prepare the test data set you might need a
> different - and already proven - implementation of
> the algorithm.

Usually a brute-force or slow-but-short algorithm is OK (besides some hard-coded input-output pairs). Sometimes you may use the first implementation of your code, which is usually simpler, before it becomes complex because of successive optimizations. But you have to be careful, because the first implementation too may be buggy.

Some other times there are simpler algorithms to test whether the output of another algorithm is correct (think of exponential algorithms with a polynomial test of the result's correctness). For example, to test a fancy Introsort implementation you can use a very small and simple O(n^2) sorter, or better, a simple linear loop that tests that the result is sorted (a sketch is below). Here, besides unit tests, languages (or tools) that allow adding pre- and post-conditions, class invariants, etc. are also useful.

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
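A sketch of that linear post-condition test (my_fancy_sort is a hypothetical function under test, assumed to return a new sorted list):

import random

def is_sorted(a):
    # linear check: every element at least as big as the previous one
    return all(a[i] >= a[i - 1] for i in xrange(1, len(a)))

data = [random.randrange(1000) for _ in xrange(100)]
result = my_fancy_sort(data)
assert is_sorted(result)
assert sorted(data) == result   # cross-check against the proven builtin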
Re: Clarity vs. code reuse/generality
On 3 Lug, 16:05, kj wrote:
> I'm will be teaching a programming class to novices, and I've run
> into a clear conflict between two of the principles I'd like to
> teach: code clarity vs. code reuse.

They are both important principles, but clarity is usually more important: short code that can't be read can't be fixed and modified, while long code that can be read is still alive.

> So I thought that I'd code a helper function, called _binary_search,
> that took five parameters: a lower limit, an upper limit, a
> one-parameter function, a target value, and a tolerance (epsilon).

Five arguments are a bit too many, even for non-novices. 2-3 arguments are more than enough for novices. And where are the doctests? Novices have to learn to think of tests as part of the normal code :-)

I think the main problem in all this discussion is that generally, to learn to program, you have to write code yourself, so your students have to invent and write their own binary search. Reading your code they are going to learn much less. Creating a binary search from almost scratch can turn out to be too hard for them, which means you have to ask them to invent something simpler :-)

Programming is (among other things) problem solving, so they have to learn to solve not-so-easy problems from the beginning. Showing them famous algorithms (and very good code to copy from) can be useful, but it's less important than learning basic problem-solving skills, and much less important than learning why solving problems has to become their purpose and pleasure :-)

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: tough-to-explain Python
kj, as Piet van Oostrum has said, that's the difference between mutable and immutable. It comes from the procedural nature of Python, and probably an explanation of this topic can't be avoided if you want to learn/teach Python. (A tiny example is at the end of this message.)

Lots of people think that a language where everything is immutable is simpler for newbies to understand (even if they have to learn what higher-order functions are). That's why some people think Scheme or other languages (even Clojure, I guess) are better for novices (Scheme has mutability, but you can avoid showing it to newbies). Some people say that languages that mostly encourage the use of immutable data structures (like F#, Clojure, Scala and many other less modern ones) help avoid bugs, maybe for novices too. On the other hand, it's hard or impossible to actually remove complexity from a system; you usually just move it elsewhere. So other things will become harder to do for those novices. I have no idea if for the average novice it's simpler to learn an immutables-based language instead of a mutables-based one (it may also be that some novices prefer the first group, and other novices prefer the second).

From what I have seen, lots of students seem able to learn Python, so it's not a bad choice. Python isn't perfect, and in *many* situations it is pragmatic, for example to increase its performance. Generally for a novice programmer running speed is not important, but it's important to have a really coherent and clean language. I've personally seen that for such people even Python looks very "dirty" (even if it's one of the less dirty ones). For example a novice wants 124 / 38 to return the fraction 62/19 and not 3 or 3.263157894736842 :-)

People may be able to invent a clean and orthogonal language that's easy to use and has very few compromises, fit for newbies. But this language may be very slow, not very useful in practice (who knows? Maybe there's a practical niche even for such a very high level language), and it doesn't teach how to use lower level languages like C :-)

Today I think there are no languages really fit for teaching. Python is one of the few fit ones, but it's getting more and more complex as time passes because it's getting used in more and more complex real-world situations (a language fit for newbies must not have abstract base classes, decorators, etc.). The D language is too complex for a newbie. Java is not too bad, but it requires you to write too much code, it's too fussy (semicolons at the end of lines? Newbies say the computer is an idiot when it refuses code just because of a missing useless semicolon!), and it misses some necessary things (first class functions! Damn). A nice language like Boo running on the Mono VM seems another option :-) In the past Pascal was good enough, but I think it's not good enough anymore. The problem is that teaching is a niche activity (even if a very important one). PLT Scheme is one of the few environments (besides Squeak, Python itself, and little more) that look refined and implemented well enough for such a purpose.

See you later,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
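The mutable/immutable difference in a tiny classroom example:

a = [1, 2]
b = a
b.append(3)     # lists are mutable: a and b are the same object,
                # so a is now [1, 2, 3] too

s = "ab"
t = s
t = t + "c"     # strings are immutable: this rebinds t to a new
                # string, and s is still "ab"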
Re: count
Vilya Harvey:
> from itertools import groupby
> for x, g in groupby([fields[1] for fields in data]):
>     print x, len(tuple(g))

Avoid that len(tuple(g)); use something like the following, it's lazy
and saves some memory.

def leniter(iterator):
    """leniter(iterator): return the length of a given
    iterator, consuming it, without creating a list.
    Never use it with infinite iterators.

    >>> leniter()
    Traceback (most recent call last):
    ...
    TypeError: leniter() takes exactly 1 argument (0 given)
    >>> leniter([])
    0
    >>> leniter([1])
    1
    >>> leniter(iter([1]))
    1
    >>> leniter(x for x in xrange(100) if x % 2)
    50
    >>> from itertools import groupby
    >>> [(leniter(g), h) for h, g in groupby("bccaad")]
    [(1, 'b'), (2, 'c'), (2, 'a'), (1, 'd')]
    >>> def foo0():
    ...     if False: yield 1
    >>> leniter(foo0())
    0
    >>> def foo1(): yield 1
    >>> leniter(foo1())
    1
    """
    # This code is faster than: sum(1 for _ in iterator)
    if hasattr(iterator, "__len__"):
        return len(iterator)
    nelements = 0
    for _ in iterator:
        nelements += 1
    return nelements

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: count
Paul Rubin:
> print x, sum(1 for _ in g)

Don't use that, use my function :-) If g has a __len__, you are
wasting time. And sum(1 ...) is (on my PC) slower.

J. Clifford Dyer:
> if __name__ == '__main__':
>     test_func(summer, 1000)
>     test_func(tupler, 1000)
>     test_func(summer, 1)
>     test_func(tupler, 1)

Have you forgotten my function?

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: Psyco 64 bits
Luis P. Mendes:
> It seems that Psyco cannot be used in such platforms.
> Or is there another version of Psyco for 64 bits platform?
> I googled and arrived to PyPy. But I don't know how to use it.

Psyco will be updated, but probably not for 64 bit CPUs. At the moment
PyPy doesn't give you much speed up. So you can try IronPython, but I
don't think it will help. There's also the Boo language, which is
often fast enough.

If your code has few hot spots, then there are many solutions that can
be used, for example Cython, Weave, ShedSkin. Or you can write
external code in C/C++/D/Fortran and interface with it through a
plethora of means, like ctypes, swig, PIL, Pyd, Boost.Python, Inline,
and many others (a tiny ctypes example is below). Running speed is a
big problem in many Python programs, so people have invented an army
of possible solutions.
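Here is how small the ctypes route can be (a sketch; it assumes a
Unix-like system where find_library can locate the C math library):

from ctypes import CDLL, c_double
from ctypes.util import find_library

libm = CDLL(find_library("m"))   # load the C math library
libm.sqrt.restype = c_double     # declare the C types involved
libm.sqrt.argtypes = [c_double]
print libm.sqrt(2.0)             # prints 1.41421356237

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list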
Re: can i write a assemly language programs in python
Simon Forman:
> Examine CorePy: http://www.corepy.org
>
> "CorePy is a complete system for developing machine-level programs in
> Python. CorePy lets developers build and execute assembly-level
> programs interactively from the Python command prompt, embed them
> directly in Python applications, or export them to standard assembly
> languages."

I have not used CorePy yet, but it's an awesome project ^_^

With some care and work even Python programs can get fast enough :-)

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: psyco V2 beta2 benchmark
larudwer, is that time_subdist subdist(i) result a significant
regression? Is it something to be fixed in Psyco2?

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: list of all possible values
David Gibb:
> For example: if my values are ['a', 'b', 'c'], then all possible lists
> of length 2 would be: aa, ab, ac, ba, bb, bc, ca, cb, cc.

>>> from itertools import product
>>> list(product("abc", repeat=2))
[('a', 'a'), ('a', 'b'), ('a', 'c'), ('b', 'a'), ('b', 'b'), ('b', 'c'),
 ('c', 'a'), ('c', 'b'), ('c', 'c')]
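And if you want them as the strings of your example:

>>> ["".join(p) for p in product("abc", repeat=2)]
['aa', 'ab', 'ac', 'ba', 'bb', 'bc', 'ca', 'cb', 'cc']

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list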
Re: A question of style .....
tuxagb:
> If I have to write an extension module with many objects, how can I
> organize the code?
> I think: every object in a separate file, and a last file with the
> PyInit_ function. But it is unmanageable.
>
> Solutions?

What do you think about using Cython?

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: turtle dump
Superchicken:
> is there a way to dump the content of a turtle window to a file or a file
> object?

A possible low-tech solution is to append the sequence of your
plotting commands to a list (maybe using a decorator, something like a
logging decorator), and then save all of them at the end into a text
file :-) See the sketch below.
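A minimal sketch of that logging-decorator idea (all the names here
are mine, and I wrap only the module-level turtle functions that
actually get used):

import functools
import turtle

command_log = []  # collects (name, args, kwargs) records

def logged(fn):
    """Record each call so the plot can be saved or replayed later."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        command_log.append((fn.__name__, args, kwargs))
        return fn(*args, **kwargs)
    return wrapper

forward = logged(turtle.forward)
left = logged(turtle.left)

forward(100)
left(90)
forward(50)

with open("plot_commands.txt", "w") as f:
    for name, args, kwargs in command_log:
        f.write("%s %r %r\n" % (name, args, kwargs))

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list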
Re: ANN: psyco V2
Very good, thank you. I'll try it when I can.

Is Psyco3 going to borrow/steal some ideas/code from Unladen Swallow?

The problem I have with Psyco1.6 is that you can't use the normal
profilers to know how many seconds of running time are taken by each
function/method of your code. Psyco1.6 has a profile() function, but I
haven't managed to use it well yet. Can you tell me how to get a
sorted list of the running times of all functions/methods of a
Psyco-digested program?

Bye and thank you,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: ANN: Shed Skin 0.2, an experimental (restricted) Python-to-C++ compiler
William Dode':
> I just tested it with a litle game, to find the places of horse on
> a board 5x5. The result is :
>
> c 5s
> gcj 7s
> java 7s
> shedskin 8s
> python + psyco 18s
> cython avec malloc *int 18s
> cython 55s avec [] python
> python 303s (5m3s)

Nice timings; can you please show me the Python, Java and C code
versions? I may do more tests. The purpose of all those "example
programs" in ShedSkin is to find bugs and to find details to speed up.

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: ANN: Shed Skin 0.2, an experimental (restricted) Python-to-C++ compiler
Skip Montanaro:
> I read just enough French to know that "avec" means "with", but I don't
> understand the difference between "avec malloc *int" and "avec []". Can you
> explain please?

Maybe it's the time difference between using a Python list from Cython
and using a C "array" allocated with a malloc from Cython.

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: ANN: Shed Skin 0.2, an experimental (restricted) Python-to-C++ compiler
Nick Craig-Wood:
> Can you give a hint as to how I debug this? I presume my program has
> some instances of non static types which is causing the problem, but
> it is going to be a very long debugging session if it takes me an hour
> each cycle ;-)
>
> The program is about 700 lines of python (excluding comments).

You can show us the Python (SSPython) code, and we can try to find the
problem. Sometimes there's no simple way to solve such problems.
Generally, for not very large programs, if ShedSkin doesn't compile in
about a minute or so then it has gone into an infinite loop (there's a
compilation flag that avoids some infinite loops, try it).

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: ANN: Shed Skin 0.2, an experimental (restricted) Python-to-C++ compiler
William Dode':
> http://hg.flibuste.net/libre/games/cheval
> Like you'll see, i tried to use exactly the same code for each langage.

It's a cute solver. A few more versions of mine:
#1, a Psyco version of mine: http://codepad.org/9m5rf7kX
#2, unrolled Psyco version: http://codepad.org/gKFLu34M
#3, a quick D (D1) version: http://codepad.org/Tk9FL7Xk

Timings (no printing), seconds, best of 3:
  #1: 4.79
  #2: 3.67
  #3: 1.10
  Your Psyco version: 13.37
  Your C version, compiled with GCC 4.3.2, -s -O3 -fomit-frame-pointer: 3.79

I have timed the whole running time of the programs, with an external
timer, so my timings include the startup of the Python VM and the
final cleanup of the GC. Please, feel free to time my code again, so
we can compare with your other timings (Java, etc.) better. I have
used Psyco 1.6 final and Python 2.6.2 on a Core2 CPU at 2 GHz. To
compile the D1 code you can use the free DMD compiler for D1; I have
used version 1.042 (www.digitalmars.com/d/download.html).

In such benchmarks it's often better to start from the fastest low
level code (written in an imperative language like C/D/C++, or a
version in a functional language like Clean or OCaml) and then use it
to create higher level versions (in Python, etc.). This gives you a
baseline timing, and usually the small differences of the higher level
code compared to the original C/Clean code don't invalidate the test.

I have changed only small things in the program; as you can see, the
Psyco program #2 is faster than your C code :-) If you learn how to
use it well, Psyco1.6 can often lead to very good performance. But
lots of people don't know how to use Psyco well (I too am ignorant: I
don't know how to profile Python programs yet).

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: ANN: Shed Skin 0.2, an experimental (restricted) Python-to-C++ compiler
greg:
> Posting benchmark times for Pyrex or Cython is pretty
> meaningless without showing the exact code that was
> used, since times can vary enormously depending on
> how much you C-ify things.

Was this link, shown by William, not enough?
http://hg.flibuste.net/libre/games/cheval/file/46797c3a5136/chevalx.pyx#l1

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: How to comment constant values?
Chris Rebert:
> Only modules, classes, and functions/methods can have docstrings
> associated with them.
> For anything else, you have to use comments; or you can mention them
> in the docstrings of related things.

What about adding docstrings to other Python things, especially to
variables?

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: ANN: Shed Skin 0.2, an experimental (restricted) Python-to-C++ compiler
William Dode':
> I updated the script (python, c and java) with your unrolled version
> + somes litle thinks.
[...]
> c 1.85s
> gcj 2.15s
> java 2.8s
> python2.5 + psyco 3.1s
> unladen-2009Q2 145s (2m45)
> python2.5 254s (4m14s)
> python3.1 300s (5m)
> ironpython1.1.1 680s (11m20)

Sorry for being late, I was away.

In your last C version this code is useless, because the C compiler is
able to perform such a simple optimization by itself (but probably
Python isn't able to, so if you want the code to be similar in all
versions it may be better to keep it):

  shift_0=shift[0]; shift_1=shift[1]; shift_2=shift[2]; shift_3=shift[3];
  shift_4=shift[4]; shift_5=shift[5]; shift_6=shift[6]; shift_7=shift[7];

This part in the Python code is useless:

  shift_0 = shift[0]
  shift_1 = shift[1]
  shift_2 = shift[2]
  shift_3 = shift[3]
  shift_4 = shift[4]
  shift_5 = shift[5]
  shift_6 = shift[6]
  shift_7 = shift[7]

Because later you copy the values locally anyway:

  def solve(nb, x, y,
            SIDE=SIDE, SQR_SIDE=SQR_SIDE, circuit=circuit,
            shift_0=shift_0, shift_1=shift_1, shift_2=shift_2,
            shift_3=shift_3, shift_4=shift_4, shift_5=shift_5,
            shift_6=shift_6, shift_7=shift_7,
            ):

So doing something like this is probably enough:

  def solve(nb, x, y,
            SIDE=SIDE, SQR_SIDE=SQR_SIDE, circuit=circuit,
            shift_0=shift[0], shift_1=shift[1], shift_2=shift[2],
            shift_3=shift[3], shift_4=shift[4], shift_5=shift[5],
            shift_6=shift[6], shift_7=shift[7],
            ):

In low-level languages like C, unrolling has to be done with care, to
avoid slowing down the code.

I have tried your latest C version using your compiler options: my
MinGW based on GCC 4.3.2 produces a crash at runtime. Compiled with
LLVM-GCC it runs in 1.31 seconds. The D version is a bit less
optimized than your last C versions, yet using DMD it runs in
1.08-1.10 seconds. Let's see if someone is able to write a C version
faster than that D code :-) Have you compiled/read my D version? In
the D version you may have missed that I used an extra trick: unsigned
integers, so it needs just two tests to see if a number is in the 0-5,
0-5 square :-)

Note that Pyd, the Python-D bridge, may not work with the latest DMD
version (it works if you use a slightly older DMD compiler):
http://pyd.dsource.org/

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: The longest word
On Jul 28, 10:26 am, NiklasRTZ wrote:
> Newbie hello onwards to the .py manual asks meantime how the longest
> word gets determined?
> First word, ok
> 'a aa aaa aa'[:'a aa aaa aa'.find(' ',1,10)]

Your language is not easy to understand, but I think you want to find
the longest word in a string. Say this is the input string:

  txt = "a aa aaa aa"

There are several ways to do it; I'll show a simple one. You can split
the string into its parts (Python doesn't have a built-in lazy split
yet, so you can split it all at once). You can do it with the string
split method. It produces a list of the words, more or less (you may
have words like "gone," with punctuation attached; taking care of them
too requires some extra code).

Once you have a list of words, you have to take the longest. A simple
way is to replace each word with a tuple that contains the length of
the word and the word itself, then pick the tuple with the highest
length value. But Python allows you to do such an operation in a
faster way: you can use the max() function; it has a "key" argument
that allows you to remap on the fly what you mean by "max". So you
just have to give it a function (reference) that, given a word,
returns its length, and such a function is "len" itself. (See the
snippet below.)

If you instead want to find the position of the longest word, the
program gets a little longer. Ask if you need something different.
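This is all it takes (plus one possible way to get the position too):

>>> txt = "a aa aaa aa"
>>> words = txt.split()
>>> max(words, key=len)
'aaa'
>>> max(enumerate(words), key=lambda (i, w): len(w))
(2, 'aaa')

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list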
Re: set variable to looping index?
Martin:
> for f in var1_fn, var2_fn, var3_fn:
>     if f.split('.')[0] == 'var1':
>         var1 = call_some_function(f)
> .
> .
> .
> etc

As usual I have big problems in understanding what you want to do.
Please, show an example of the contents of var1_fn, var2_fn, etc.

Generally you can create a dict:

  results = {}

And then fill it with name-result pairs:

  name = value.split('_')[0]
  results[name] = some_function(value)

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: Overlap in python
Jay Bird:
> I've been trying to figure out a simple algorithm on how to combine a
> list of parts that have 1D locations that overlap into a non-
> overlapping list. For example, here would be my input:
>
> part name   location
> a           5-9
> b           7-10
> c           3-6
> d           15-20
> e           18-23
>
> And here is what I need for an output:
> part name   location
> c.a.b       3-10
> d.e         15-23

That's sometimes known as "The Set Union Problem on Intervals".
Knowing the name of a problem is sometimes half the work :-) People
have written papers on this.

This is an O(N^2) brute-force algorithm, you can try it on your data:

data = """part name location
a 5-9
b 7-10
c 3-6
d 15-20
e 18-23"""

pairs = map(str.split, data.splitlines()[1:])
triples = [map(int, i.split("-")) + [name] for (name, i) in pairs]
overlap = lambda x1,y1, x2,y2: y1 >= x2 and x1 <= y2

results = []
for (x1, y1, name) in triples:
    for i, res in enumerate(results):
        if overlap(x1, y1, res[0], res[1]):
            results[i].append(name)
            results[i][0] = min(x1, res[0])
            results[i][1] = max(y1, res[1])
            break
    else:
        results.append([x1, y1, name])

print results
# Output: [[3, 10, 'a', 'b', 'c'], [15, 23, 'd', 'e']]

If you have a lot of data then it will be too slow, and you'll need a
more complex algorithm. If your intervals are many, they are small,
and they are spread over a not too large range of possible values, you
can create an integer array of indexes and "paint" the intervals on
it. (A sketch of this painting idea follows.)
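A minimal sketch of that painting idea (it assumes smallish
nonnegative integer coordinates; note that it also merges intervals
that merely touch):

points = [(5, 9), (7, 10), (3, 6), (15, 20), (18, 23)]
size = max(stop for (start, stop) in points)
painted = [False] * (size + 1)
for start, stop in points:
    for pos in xrange(start, stop + 1):
        painted[pos] = True

# Scan for runs of painted cells to recover the merged intervals.
merged = []
pos = 0
while pos <= size:
    if painted[pos]:
        start = pos
        while pos <= size and painted[pos]:
            pos += 1
        merged.append((start, pos - 1))
    else:
        pos += 1

print merged  # [(3, 10), (15, 23)]

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list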
Re: Overlap in python
Mark Dickinson:
> # create sorted list of points where an interval is entered or left
> transitions = []
> for name, (start, stop) in input:
>     transitions.extend([(start, 'Start', name), (stop, 'Stop', name)])
> transitions.sort()
>
> # scan list, accumulating blocks along the way.

Oh, right, that's the usual simple solution. Silly me for showing the
dumb quadratic one :-) For completeness, a full version of it is
below. The "pixelmap" approach may still be faster if you have tons of
small intervals in a not too large range.
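A hedged completion of Mark's sketch (the names beyond his are mine;
it keeps the parts in the order they are first seen, matching the
"c.a.b" output the OP asked for):

input_data = [("a", (5, 9)), ("b", (7, 10)), ("c", (3, 6)),
              ("d", (15, 20)), ("e", (18, 23))]

transitions = []
for name, (start, stop) in input_data:
    transitions.extend([(start, 'Start', name), (stop, 'Stop', name)])
transitions.sort()

blocks = []      # each block is [start, stop, names]
open_count = 0   # how many intervals are currently open
for point, kind, name in transitions:
    if kind == 'Start':
        if open_count == 0:
            blocks.append([point, point, []])
        open_count += 1
        blocks[-1][2].append(name)
    else:
        open_count -= 1
        blocks[-1][1] = point

for start, stop, names in blocks:
    print ".".join(names), "%d-%d" % (start, stop)
# Prints:
# c.a.b 3-10
# d.e 15-23

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list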
Re: Overlap in python
kj:
> # connect the nodes
> for i in range(len(parts) - 1):
>     part_i = parts[i]
>     for j in range(i + 1, len(parts)):

Note that's O(N^2); see the standard O(N log N) sorting-based solution
by Mark Dickinson.

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: Overlap in python
Albert van der Horst:
>That is an algorithmic question and has little to do with Python.<

Yes, but comp.lang.python isn't comp.lang.c; that kind of question is
perfectly fine here. It helps keep this place from becoming boring.

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: intricated functions: how to share a variable
TP:
> def tutu():
>
>     def toto():
>
>         print a
>         a = 4
>         print a
>
>     a = 2
>     toto()
>
> tutu()
>
> I obtain the following error:
> "UnboundLocalError: local variable 'a' referenced before assignment"
>
> This is because Python looks in the local context before looking in the
> global context.
>
> The use of "global a" in toto() does not help because global allows to force
> Python to look for the variable at the module level.
>
> So, how to share a variable between intricated functions?

Generally your purpose as a programmer is to write the least intricate
code, because the less tricky it is, the more readable it becomes, and
the bug count decreases too. So probably a better solution is to just
use the normal function semantics: you pass an argument to the
function and you take a return value back. Such a return value will be
the new version of the value you talk about.

Python 3 has the "nonlocal" statement that may do what you ask for
(see below). But like many other things in Python it must be used with
judgment, to avoid writing intricate code.
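A tiny Python 3 sketch of nonlocal applied to the example above:

def tutu():
    a = 2
    def toto():
        nonlocal a   # rebind tutu's "a", not a global, not a local
        print(a)     # 2
        a = 4
        print(a)     # 4
    toto()
    print(a)         # 4: toto() has really changed it

tutu()

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list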
Re: Parsing Binary Structures; Is there a better way / What is your way?
Martin P. Hellwig:
> On several occasions I have needed (and built) a parser that reads a
> binary piece of data with custom structure. For example (bogus one):
>
> BE
> +---------+---------+-------------+-------------+------+--------+
> | Version | Command | Instruction | Data Length | Data | Filler |
> +---------+---------+-------------+-------------+------+--------+
> Version: 6 bits
> Command: 4 bits
> Instruction: 5 bits
> Data Length: 5 bits
> Data: 0-31 bits
> Filler: filling 0 bits to make the packet dividable by 8

Have you tried Hachoir? (I think its name may have been changed to
Fusil, I don't know.) If you want to stay with plain Python, a
hand-made bit reader is small too; see the sketch below.
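A minimal sketch of such a bit reader (pure Python, big-endian bit
order as in the quoted layout; the packet bytes here are made up):

def bits(data):
    """Yield the bits of a byte string, most significant bit first."""
    for byte in data:
        b = ord(byte)
        for shift in range(7, -1, -1):
            yield (b >> shift) & 1

def take(bit_iter, n):
    """Read n bits from the bit iterator as an unsigned integer."""
    value = 0
    for _ in range(n):
        value = (value << 1) | next(bit_iter)
    return value

packet = "\xfa\xc8\x1d"  # hypothetical 3-byte packet
stream = bits(packet)
version = take(stream, 6)
command = take(stream, 4)
instruction = take(stream, 5)
data_length = take(stream, 5)
data = take(stream, data_length)
# the remaining bits are the filler

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list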
Re: ANN: Python for Bioinformatics book
Sebastian Bassi:
> All code is also available at http://py3.us/## where ## is the code
> number, for example: http://py3.us/57

The book looks interesting, but that doesn't look like a good way to
show/offer the code. I suggest also putting it into a zip that can be
downloaded.

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: Bug or feature: double strings as one
durumdara:
> I wanna ask that is a bug or is it a feature?

For me it's a bug-prone antifeature.

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: Need cleanup advice for multiline string
Robert Dailey:
> This breaks the flow of scope. Would you guys solve this
> problem by moving failMsg into global scope?
> Perhaps through some other type of syntax?

There are gals here too. This may help:
http://docs.python.org/library/textwrap.html#textwrap.dedent
A tiny example is below.
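A tiny example of dedent keeping a failMsg-style message local and
still readable in the source (the message text is made up):

import textwrap

def check_something():
    fail_msg = textwrap.dedent("""\
        Something has gone wrong.
        This second line is indented in the source,
        but not in the resulting string.""")
    return fail_msg

print check_something()

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list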
Re: Plotting Quadratic Functions, pygame
Senad Ibraimoski of Belgrade:
> Hello, I'm a new guy to this group, my professor recommended this group
> to me, because I was boring him with questions. I'm new to python, and
> I have problem plotting Quadratic Functions. Using module pygame.

Python is a tool you use to learn something else, but tools are
important: they also help shape the way you think. So tell your
teacher that questions about tools are important too.

If your purpose is to learn something, then using Pygame to plot
functions is OK. If your purpose is just to plot them, and you don't
have strange needs, then matplotlib can be better.

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: flatten a list of list
Chris Rebert:
> The OP asked for "simple", not "best", "most proper", or "fastest". My
> comment was intended to mean that the code was marginally *simpler*,
> not faster.

Yep, the OP has asked for simple code, but often that's not the right
way to solve this situation. A better way is to create (or copy) a
flatten() that's efficient and well tested & debugged, put it into
some library of utilities, and then import & use it when it's needed.
One possible such utility is sketched below.
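One possible flatten() for nested lists (a sketch; it's iterative on
an explicit stack, so deeply nested lists don't hit the recursion
limit):

def flatten(sequence):
    """flatten([1, [2, [3, 4]], 5]) -> [1, 2, 3, 4, 5]"""
    stack = [iter(sequence)]
    result = []
    while stack:
        for item in stack[-1]:
            if isinstance(item, list):
                stack.append(iter(item))
                break  # restart the while loop on the inner iterator
            result.append(item)
        else:
            stack.pop()  # this iterator is exhausted
    return result

print flatten([1, [2, [3, 4]], 5])  # [1, 2, 3, 4, 5]

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list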
Re: Annoying octal notation
MRAB:
> '_': what if in the future we want to allow them in numbers for clarity?

Hettinger says it's hard (it requires too many changes) to do that,
and Python programs don't contain big integer constants often enough,
so probably that improvement will not see the light. In the meantime I
have put a small bug in a Python program of mine, writing 100 instead
of 1000. Now in Python I write 10*1000*1000, because I try to learn
from my bugs. In D I enjoy writing 10_000_000.

---

Steven D'Aprano:
> In D, at least some people want to follow Python's lead and either drop
> support for oct literals completely, or require a 0o prefix:
> http://d.puremagic.com/issues/show_bug.cgi?id=2656

Yes, people in the D community are trying to improve things, but it's
a slow and painful process, and often it goes nowhere. There's a lot
of politics.

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: Graph library recommendations for large graphs
You may try the Python bindings for the Boost Graph Library; the graph
you talk about may fit in the 2GB of a 32 bit OS too. This is the
first link I have found (it's been a long time since I used those
graph bindings):
http://banyan.usc.edu/log/c_cpp/boost-graph-library-python-bindings

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: Algorithms as objects?
Chris Rebert:
> It sounds like you're describing the Strategy Pattern
> (http://en.wikipedia.org/wiki/Strategy_pattern).
>
> To have objects callable like functions, just implement the __call__
> special method in your class(es):

Please, can't you just use a bit of functional-style programming, like
creating nested functions on the fly? A sketch of what I mean is
below.
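A generic sketch (the domain and the names are made up): a closure is
often enough where a whole Strategy class would otherwise be written.

def make_tax_strategy(rate):
    def apply_tax(amount):
        return amount * (1 + rate)
    return apply_tax

strategies = {"US": make_tax_strategy(0.07),
              "IT": make_tax_strategy(0.22)}
print strategies["IT"](100.0)  # 122.0

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list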
Re: Can't set attribute?!
dolgion ch:
>good to know about this __slots__ thing, i'll keep it in mind<

What you have to keep in mind is that it's better not to use __slots__
unless you really need it. Generally you don't need it for the
purposes a person coming from a static language is tempted to use it
for. It's mostly useful for small classes that don't need subclassing
and have to be instantiated a large number of times. As far as I know,
it's meant as a memory optimization, not as a way to define things in
a more explicit way.

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: The future of Python immutability
John Nagle:
> The concept here is that objects have an "owner", which is either
> a thread or some synchronized object. Locking is at the "owner"
> level. This is simple until "ownership" needs to be transferred.
> Can this be made to work in a Pythonic way, without explicit
> syntax?
>
> What we want to happen, somehow, is to
> transfer the ownership of "words" from the calling thread to the object
> in "putitem", and transfer it to the calling thread in "getitem".
> How can this be done?

Several people have found ways to implement such owning of variables,
for example Bartosz Milewski:
http://bartoszmilewski.wordpress.com/2009/06/02/race-free-multithreading-ownership/

It requires some extra complexity, so statically typed languages
usually don't implement this idea, even if it avoids bugs in user
code. Implementing it in Python in a simple enough way looks like a
good topic for advanced research :-)

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Vectorized laziness inside
I've just seen this good Google Talk video from Jan 2008, "MonetDB/
X100: a (very) fast column-store", about a more efficient
column-oriented DBMS:
http://www.youtube.com/watch?v=yrLd-3lnZ58

The efficiency of this DBMS (something like 50 times faster) comes
from a few things:

- It's column-wise; this helps several other successive optimizations
too (row-oriented DBMSs still have their purpose, the column-oriented
ones are good if you want to perform statistics on most of your data,
certain kinds of data mining, etc.).

- At 13:57 it shows that instead of yielding single tuples (or single
items, it's column-oriented), it yields arrays of about 100
tuples/fields. This allows the creation of primitives that are much
more efficient, and the CPU can process them better, using SSE
instructions too. Such small arrays are designed to fit in the CPU
cache (probably L2). Such vectorized operations are also pipelined in
some way.

- The filtering operations often don't produce new vectors; they just
mark the tuples as no longer present inside an array. This helps avoid
many copies of such arrays.

- Data is kept compressed on disk, the compression is column-wise, and
decompression is done only just-in-time, to save data transfers. The
number compression takes into account that data is often sorted, so
it's delta-compressed, and then such deltas are compressed only in a
range of the Gaussian-like residual distribution (outliers are encoded
directly). This compression also allows large indexes to be kept in
RAM, which speeds things up more.

- They even show vectorized hashing, but I have not understood how it
works.

- The reading from the disks is done in a merged way, to avoid reading
the same things many times for similar queries.

(The good thing is that it's not hard to understand most things shown
in this video. But I'd like to see the C code they use as "reference",
which is just 3 times faster than their DBMS.)

A DBMS's insides work like the lazy operations that are getting more
common in Python code (see itertools), and are common in Haskell. So
to reduce the Python interpreter overhead of lazy iterations, a
vectorized strategy may be used (meant to be almost transparent for
the Python programmer, so this is an implementation thing), so that
items can be produced and moved in groups, inside arrays, like in that
video. (The DBMS too has an interpreter overhead; it's made of fast
primitives used by lazy interpreted code, so it looks a lot like a
Python/Ruby :-) ). Besides CPython, this idea can be useful for
Psyco/Unladen Swallow too. A toy example of the idea is below.

Lazy filtering is a bit less common in Python code, so I don't know if
the idea of not copying new arrays but just marking items as absent
(for example with a bit array attached to the items array) can be
useful too.

As you may know I have introduced laziness in the D language; D2 now
has a standard library that contains several of the things present in
the itertools module. I'll do some experiments to see if the ideas of
such vectorized laziness can improve lazy generators in D. I may even
test the idea of keeping arrays with holes.
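A toy Python sketch of the chunked ("vectorized") iteration idea, just
to show its shape (the names are mine):

from itertools import islice

def chunked(iterable, size=100):
    """Yield lists of up to `size` items instead of single items."""
    it = iter(iterable)
    while True:
        block = list(islice(it, size))
        if not block:
            break
        yield block

def chunked_filter(pred, blocks):
    """A filter primitive that works block by block."""
    for block in blocks:
        yield [x for x in block if pred(x)]

total = 0
for block in chunked_filter(lambda x: x % 2, chunked(xrange(1000000))):
    total += sum(block)
print total  # 250000000000, the sum of the odd numbers below a million

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list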
Re: Creating a local variable scope.
Steven D'Aprano:
> (3) Create an inner function, then call that.

Several people have given this answer.

> My personal opinion is that if you really need a local scope inside a
> function, the function is doing too much and should be split up.

I agree. And a way to split a function is to define an inner function;
that's one of their main purposes. No need to add other things to the
language as the OP suggests.

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: Google Code Jam language usage
[...]
        for j in xrange(1, len_b):
            d[i][j] += d[i - 1][j]
            if b[j] == a[i]:
                d[i][j] += d[i - 1][j - 1]
            d[i][j] %= 10000
    print "Case #%d: %04d" % (t+1, d[len(a) - 1][len_b - 1])

import psyco; psyco.full()
main()

(There are ways to speed up this Python code; I have not tried to use
a 1D matrix with shifts to find the right start of the rows as in C,
and often in such dynamic programming algorithms you can just keep 2
rows to avoid storing the whole dynamic matrix. This saves memory and
speeds up the code because there are fewer cache misses. Using Psyco
with array.array the CPU cache misses can be felt; in normal Python
code they are almost invisible, lost in the noise.)

In Python I'd like to have a built-in list&array method to assign all
items to a fixed value:

a1 = [1, 2, 3]
a2 = array.array('l', [0]) * 5
a1.set(0)
a2.set(0)

That equals the following, but is faster:

for i in xrange(len(a1)):
    a1[i] = 0
for i in xrange(len(a2)):
    a2[i] = 0

I'd also like to have 64 bit signed/unsigned integers in array.array.

Such online contests usually ignore compilation times, but in
languages that have ways to perform computations at compile time (C++
with its templates, and even more D, which has better templates and
CTFE, http://en.wikipedia.org/wiki/Compile_time_function_execution )
such times can become important, because some precomputations can be
moved to compile time. If compilation time counted too, then in such
contests Python would improve its stats (a slow compilation time can
be bad for the testing of the code, so it's a disadvantage for the
programmer too).

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: Google Code Jam language usage
> (There are ways to speed up this Python code; I have not tried to use
> a 1D matrix with shifts to find the right start of the rows as in C,
> and often in such dynamic programming algorithms you can just keep 2
> rows to avoid storing the whole dynamic matrix. This saves memory and
> speeds up the code because there are fewer cache misses.

It was easy to do; running time about 0.13 seconds:

from array import array

def main():
    b = "welcome to code jam"
    len_b = len(b)
    di = array('l', [0]) * len_b
    dim1 = array('l', [0]) * len_b

    for t in xrange(int(raw_input())):
        a = raw_input()

        for j in xrange(len_b): # set to 0
            di[j] = 0
            dim1[j] = 0

        if a[0] == b[0]:
            dim1[0] = 1

        for i in xrange(1, len(a)):
            for j in xrange(len_b): # set to 0
                di[j] = 0

            di[0] += dim1[0]
            if b[0] == a[i]:
                di[0] += 1

            for j in xrange(1, len_b):
                di[j] += dim1[j]
                if b[j] == a[i]:
                    di[j] += dim1[j - 1]
                di[j] %= 10000

            di, dim1 = dim1, di

        print "Case #%d: %04d" % (t+1, dim1[len_b - 1])

import psyco; psyco.full()
main()

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Static typing, Python, D, DbC
I write some Python code almost every day, but lately I am also using
the D language a lot. After using D for about three years I now know
some of it, but when I need to write short (< about 1000 lines)
*correct* programs, Python is still the more productive for me.

Static typing and the transitive const of D are a strange thing. It
seems obvious that they lead to safer code, but on the other hand I
have experimentally seen that my short Python programs are less buggy
than equivalent D ones. Static typing looks safer, but D offers many
low level features, and generally it contains many more traps or ways
to shoot your own foot; they more than compensate for the "lack of
safety" coming from Python's dynamic typing, even when you don't use
those low level features. Maybe for large (> about 100_000 lines)
programs D may come out to be less bug-prone than Python (thanks to
the better data hiding and static typing), but I am not sure of this
at all...

Static typing also has a significant cost. When you write D2 code,
often something doesn't work because of some transitive const or
immutable (or just because of the large number of bugs that still need
to be fixed in the D compiler). So here you pay some cost as debugging
time (or time to avoid those problems). And there is a mental cost
too, because you need to keep part of your attention on those
const-related things instead of on your algorithms, etc.

Lately, while I program in Python, one of the D features I miss most
is a built-in Design by Contract (see PEP 316), because it avoids (or
helps me quickly find and fix) many bugs. In my opinion DbC also works
very well together with doctests. You may implement a poor man's DbC
in Python like this:

class Foo:
    def _invariant(self):
        assert ...
        assert ...
        return True

    def bar(self, ...):
        assert self._invariant()
        ...
        res = ...
        assert self._invariant()
        return res

But this misses several useful features of DbC. (A slightly less
manual variant is sketched below.)

From the D standard library I have also appreciated a lazy string
split (something like a str.xsplit()). In some situations it reduces
memory waste and increases code performance.

I have installed this, on a Windows Vista OS:
http://www.python.org/ftp/python/2.7/python-2.7.msi
But I have had two problems: the 'random' module was empty, and it
didn't import the new division from the __future__, so I've had to
remove it and reinstall 2.6.6. Is this just a problem of mine?
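The poor man's DbC above can be made a bit less manual with a
decorator that checks the invariant around every public method (a
sketch, with a made-up example class):

import functools

def checked(method):
    """Check the class invariant before and after the method call."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        assert self._invariant()
        result = method(self, *args, **kwargs)
        assert self._invariant()
        return result
    return wrapper

class Account(object):
    def __init__(self, balance=0):
        self.balance = balance

    def _invariant(self):
        assert self.balance >= 0, "balance can't go negative"
        return True

    @checked
    def withdraw(self, amount):
        self.balance -= amount
        return self.balance

acc = Account(10)
print acc.withdraw(3)   # 7
# acc.withdraw(20)      # this one fails the invariant check

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list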
Re: Static typing, Python, D, DbC
John Nagle:
> Design by contract really isn't a good fit to Python.

I have sometimes used the manually coded class invariant, as I have
shown, and it has saved me from some bugs, so I have experimentally
seen you are wrong.

> I've done proof of correctness work, and there are suitable languages
> for it. It needs a language where global static analysis is
> possible, so you can reliably tell what can change what.

I see DbC for Python as a way to avoid or fix some of the bugs of the
program, not as a way to perform a proof of correctness of the code.
Even if you can't be certain, you are able to reduce the probability
of some bugs happening.

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: Static typing, Python, D, DbC
Paul Rubin:
> I think DbC as envisioned by the Eiffel guy who coined (and trademarked)
> the term is that it's a static verification technique, marketing-speak
> for annotating subroutines with pre- and post- conditions that can be
> checked with Hoare logic. Runtime checks wouldn't qualify as that.

The implementations of DbC in D and C# run their tests at run-time
(though in theory an external tool may find a way to perform part of
those tests at compile-time). A full implementation of DbC contains
several other things beside preconditions and postconditions, see
http://www.python.org/dev/peps/pep-0316/ (it misses a few things, like
loop invariants and loop variants).

For me DbC is almost as useful as unit testing (I use them at the same
time); this is why in the original post I said it's one of the things
I miss most in Python.

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: Static typing, Python, D, DbC
John Nagle:
> All signed Windows 7 drivers must pass Microsoft's static checker,
> which checks that they don't have bad pointers and call all the
> driver APIs correctly.

I didn't know this, thank you. I'll read more about this topic.

Eric S. Johansson:
> If you are careless, assertions can build up to a
> significant percentage of the code base and slow execution. The argument for
> dealing with this, last time I looked, was that you turn off assertions when
> your code is "running". This logic is flawed because bugs exist and will assert
> themselves at the worst possible time which usually means after you turned off
> the assertions.

These are real problems. In some situations the code is "fast enough"
even with those runtime tests, so in those situations it's better to
keep them active even in release builds or when you use the program
(this is often true for small script-like programs). But in some
situations you want a faster final program. Running the run-time
contracts only when you test the code, and removing them when you run
the program for real, sounds silly or dangerous. But life is made of
trade-offs, and running those contracts during testing is better than
_never_ running them. In practice, if your tests are done well enough
(they are similar to the real usage of the program), the contracts are
able to catch most of the bugs anyway, before the release build.

The run-time contracts may slow down the code, but there are a few
ways to reduce this problem. You have to write your contracts taking
care not to change the computational complexity of the routine/method
they guard (so you usually don't want to perform an O(n) test for a
routine with O(n ln n) worst-case complexity). You run your tests
often, so if some tests are too slow you see it soon and you can fix
the problem (the same is true for slow unit tests). In D there are
several ways to "layer" the work you perform at runtime; there is not
just the release build and the test build. You have the version and
debug statements, and you may use them inside DbC contracts too. So in
normal test builds I use faster contracts, but if I spot a bug I lower
the debug level and recompile the D code, activating heavier tests, to
better spot where the bug is.

One good thing of DbC is that when the "density" of contracts in your
programs gets above some threshold, most of the bugs present in the
code show up as failed assertions/contracts. To me it seems related to
percolation theory :-)
(http://en.wikipedia.org/wiki/Percolation_threshold ).

In D you may also use enforce(), which is essentially a camouflaged
throw/raise statement. If you use enforce() outside DbC contracts, you
get tests that run even in release builds, when your contracts are
disabled.

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: palindrome iteration
[...]
    >>> is_palindrome("aibohphobia" * 1000)
    True
    >>> is_palindrome(list("aibohphobia"))
    True
    >>> is_palindrome([1, 2, 3])
    Traceback (most recent call last):
      ...
    AttributeError: 'int' object has no attribute 'lower'
    """
    n = len(text)
    i = 0
    j = n - 1
    while i < j:
        assert _is_palindrome_loop_invariant(i, j, n)
        if text[i].lower() != text[j].lower():
            return False
        else:
            i += 1
            j -= 1
    return True

if __name__ == "__main__":
    import doctest
    doctest.testmod()
    print "Doctests done."

is_palindrome([1, 2, 3]) fails not because the input is a list, but
because integers lack the lower() method. It's useful to add a few
failing doctests too, to show the limits of the duck typing of the
function. As you see I have tried some corner cases: empty string,
single char, two chars, three chars, and more. I have also added a
'stress test' with a larger input.

In Python this code is not efficient because of the interpreter
overhead and because I call a lower() method on each char (a string of
length 1), so this is not good Python code. But it's easy to translate
to a lower level language, and I think Psyco does a good enough job on
it.

In the C language (plus stdbool) the tolower() is more efficient, but
strlen() wastes some time. Often it's better to keep the length of
your strings around too, for example using a "fat pointer", as in D.

// C code
//#define NDEBUG
#include "assert.h"
#include "stdio.h"
#include "string.h"
#include "stdbool.h"
#include "ctype.h"
#include "stdlib.h"

bool is_palindrome_loop_invariant(int i, int j, int length) {
    assert(i >= 0);
    assert(i < length);
    assert(j >= 0);
    assert(j < length);
    assert(i <= j);
    assert(i == length - 1 - j);
    return true;
}

bool is_palindrome(const char *text) {
    const int n = strlen(text);
    int i = 0;
    int j = n - 1;
    while (i < j) {
        assert(is_palindrome_loop_invariant(i, j, n));
        if (tolower(text[i]) != tolower(text[j])) {
            return false;
        } else {
            i++;
            j--;
        }
    }
    return true;
}

void is_palindrome_test() {
    assert(is_palindrome(""));
    assert(is_palindrome("1"));
    assert(is_palindrome("aa"));
    assert(!is_palindrome("ab"));
    assert(!is_palindrome("abc"));
    assert(is_palindrome("aba"));
    assert(is_palindrome("abA"));
    assert(is_palindrome("AbA"));
    assert(is_palindrome("ABA"));
    assert(is_palindrome("aibohphobia"));
    assert(!is_palindrome("ab"));

    const int n = 1000;
    const char *s = "aibohphobia";
    const int len_s = strlen(s);
    char *text = malloc(n * len_s + 1);
    if (text == NULL)
        exit(EXIT_FAILURE);
    int i;
    char *p = text;
    for (i = 0; i < n; i++) {
        memcpy(p, s, len_s);
        p += len_s;
    }
    text[n * len_s] = '\0';
    // puts(text);
    assert(is_palindrome(text));
}

int main() {
    is_palindrome_test();
    return EXIT_SUCCESS;
}

C experts (or good C lints) are probably able to find some
loss-of-precision or loss-of-sign errors in the following lines:

    const int n = strlen(text);
    const int len_s = strlen(s);
    char *text = malloc(n * len_s + 1);
    memcpy(p, s, len_s);

Writing perfect C code isn't an easy art. (I have used signed values
because writing correct C code with unsigned values is a pain, even
though they avoid the loss of precision or sign.)
Compiling the C code with GCC 4.5.1, with assertions disabled (NDEBUG
defined) and:
  -O3 -Wall -fomit-frame-pointer -S

The loop of is_palindrome() gets compiled to (x86, 32 bit):

L10:
    addl    $1, %esi
    subl    $1, %ebx
    cmpl    %ebx, %esi
    jge     L9
L4:
    movsbl  (%edi,%esi), %eax
    movl    %eax, (%esp)
    call    _tolower
    movl    %eax, %ebp
    movsbl  (%edi,%ebx), %eax
    movl    %eax, (%esp)
    call    _tolower
    cmpl    %eax, %ebp
    je      L10

Those two calls to _tolower inside the loop don't help performance,
but if you know your strings contain only upper case and lower case
chars, and you are able to assume a few other things about your
machine, it's easy to replace them with non-portable inlined code.

The addl and subl are the i++ and j--; GCC has moved them to the top,
a commonly done optimization. The first cmpl is the (i < j) test, and
the jge is the jump that ends the loop at the right time. The movsbl
instructions read one char each and put it in eax. The movl
instructions set up the argument for the call to the tolower function.
The last cmpl compares the results of the two tolower calls, and je
(jump if equal) jumps back to the start of the loop if they are equal.

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list