Re: Few things
Thank you for the comments and answers, and sorry for my delay in answering...

Josiah Carlson: >Decorators can do this without additional syntax. Think @accepts and @returns.<

The purpose of those pre-posts is to write something simple and very *clean* that states what the inputs and outputs must be. This is an example of a pre-post condition for a sorting function taken from that site (all of this goes inside the docstring of the function):

pre:
    # must be a list
    isinstance(a, list)
    # all elements must be comparable with all other items
    forall(range(len(a)),
           lambda i: forall(range(len(a)),
                            lambda j: (a[i] < a[j]) ^ (a[i] >= a[j])))

post[a]:
    # length of array is unchanged
    len(a) == len(__old__.a)
    # all elements given are still in the array
    forall(__old__.a, lambda e: __old__.a.count(e) == a.count(e))
    # the array is sorted
    forall([a[i] >= a[i-1] for i in range(1, len(a))])

Surely such things can be passed (at least as strings) to the @accepts and @returns decorators (using a "decorate" identifier instead of @ is probably nicer, because the @ makes Python look more like Perl, but I've seen that lots of people have already discussed that topic). The testing performed by such decorators can be "switched off" with a global boolean flag once the program is debugged and tested. So now someone can write and standardise a couple of good @accepts and @returns decorators/functors (a sketch is at the end of this message) :-]

>Having a 'faq' for permutation and combination generation would be 99% of the way there.<

Uh, I'm sorry, but I don't understand :-] Aren't such functions quite well defined?

>[Fixed] Quickselect, really, doesn't gain you a whole lot. Sure, it's a log factor faster to select a median, but many algorithms involving selecting medians (at least the ones that I run into in CS theory) end up repeatedly (logn times) selecting the 'kth' smallest element (varying k's), where sorting would actually run slightly faster.<

I've done some tests with a Quickselect that I have essentially translated and adapted to pure Python from "Numerical Recipes" (it seems a bit faster than the Quickselect coded by Raymond Hettinger that can be seen in the cookbook). I have seen that on my PC, on random sequences of FP numbers, a *single* Quickselect (to find just the median) is faster than the standard sort only for lists longer than about 3 million elements. So it's often useless. But using Psyco, that Quickselect becomes 5-6 times faster (for long lists), so it beats the (good) standard sort for lists longer than 600-3000 elements. If the Quickselect works in place (as the sort does) then it leaves a partially ordered list, and you can use it to quickly select other positions (so for close positions, like computing the two central values for the median, the second select takes nearly constant time). So coding the Quickselect in C/Pyrex can probably make it useful. If you are interested I can give you the Python Quickselect code, etc.

>Raymond Hettinger<

I have already seen that this person is working a lot on Python, often on the algorithmic parts.

Nick Coghlan: >I believe the OP was objecting to the spelling of "this integer literal is hex" and "this integer literal is octal".<

Right.

Josiah Carlson: >Regardless, I also don't believe the "I don't like this" without "this is the way it should be" will result in anything.<

You are right, I was mostly afraid of saying silly things...
Here it is. Such syntax can be like:

    number<SEP><BASE>

(Putting <BASE> at the beginning of the number is probably worse, and it goes against the normal base representation in mathematics, where you often subscript the base number.)

<SEP> cannot be "B" or "b" (which stands for "base") because the number can be a hex containing a B too... So <SEP> can be "_" (this is the subscript in TeX markup, so this agrees with the normal representation of the base).

<BASE> can be:
1) just an integer number representing the base (similar to the second parameter of "int"; this also allows specifying any base);
2) a symbol representing a smaller class of possibilities, like 0=2, 1=8, 2=10, 3=16, 4=64. Instead of such digits a letter can be used: a=2, b=8, c=10, etc.

I think the first option is better. So integer numbers can be written like:

    1010100111011_2
    154545_10
    777_8
    afa35a_16
    Fi3pK_64

Thank you to Carlos Ribeiro for your development of such docstring ideas, I appreciate them :-]

Bear hugs,
Bearophile
--
http://mail.python.org/mailman/listinfo/python-list
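Here is a minimal sketch of the switchable @accepts/@returns decorators mentioned above. The flag name CHECK_CONTRACTS and the checks-as-callables design are my own assumptions, not a standard API:

CHECK_CONTRACTS = True  # global flag: set to False once debugged and tested

def accepts(*checks):
    # each check is a callable applied to the matching positional argument
    def decorator(func):
        if not CHECK_CONTRACTS:
            return func
        def wrapper(*args):
            for check, arg in zip(checks, args):
                assert check(arg), "precondition failed on %r" % (arg,)
            return func(*args)
        return wrapper
    return decorator

def returns(check):
    # check is a callable applied to the result
    def decorator(func):
        if not CHECK_CONTRACTS:
            return func
        def wrapper(*args):
            result = func(*args)
            assert check(result), "postcondition failed on %r" % (result,)
            return result
        return wrapper
    return decorator

@accepts(lambda a: isinstance(a, list))
@returns(lambda r: all(r[i] >= r[i-1] for i in range(1, len(r))))
def sort_copy(a):
    return sorted(a)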
Re: Few things
Thank you Josiah Carlson for your answers. If certain parts of my messages upset you, then you can surely skip those parts. (Now I am reading a good big manual, "Learning Python, II ed.", so in the future I hope to write better things about this language.)

>That is simple and clean?<

Well, it's a clean way to express such complexity. And now I think decorators aren't so fit for such a purpose (they produce less clear pre-posts). The pre-posts in the docs seem better to me.

>Then again, I document and test, and haven't used pre/post conditions in 5+ years.<

Yep, documentation and doctests (etc.) are useful.

>>but I've seen that lots of people have already discussed such topic).<<
>Discussion of the @ decorator syntax is a moot point.<

I know, I noticed that... Still, only a few things are fixed in stone :-]

>Think of it like an 'example' in the documentation, where the code is provided for doing both permutations and combinations.<

Ah, now I see. In my original post of this thread I gave a URL of some routines written in C because they are about 10 times faster than the routines that I can write in Python.

>>If you are interested I can give the Python Quickselect code, etc.<<
>No thank you, I have my own.<

I see. I wasn't trying to force you to take my routines, but to offer any interested person a way to check my words.

>Ick. In Python, the language is generally read left to right, in a similar fashion to english. The prefix notation of 0 and 0x, in my opinion, reads better than your postfix-with-punctuation notation.<

As you say, it's your opinion, and I respect it, but in mathematics I think you usually put the base at the end of the number, as a small subscripted number (but I understand your following explanations).

>I'll also mention that two of your examples; afa35a_16 and Fi3pK_64, are valid python variable names<

Uh, I am sorry... As I feared, I have written a silly thing :-)

>I much prefer just using decimal and offering the proper base notation afterwards in a comment...<

I agree.

I'll avoid giving hugs,
Bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: [Python-Dev] PEP: __source__ proposal
(I answer here because I'm still not at ease on python-dev.)

It looks like an interesting feature. I think most LOGOs allow something similar (or better, with a cleaner syntax). The shells of many versions of this language keep a "pool" of functions defined with "to" in a kind of database, so you can call them and edit them one at a time (this is a small block from Wikipedia; note that LOGO allows you to do more than just turtle graphics):

to spiral :size
   if :size > 30 [stop]    ; a condition stop
   fd :size rt 15          ; many lines of action
   spiral :size * 1.02     ; the tail-end recursive call
end

In Mathematica you can edit "cells" of the interactive document: you press enter to go to the next line inside the same cell, and then you press SHIFT+enter to "compile" and execute the text in the cell (usually moving to the cell that follows the output one).

Some similar "improvements" of the CPython shell can be useful... but I think you first need to look in many places (Smalltalk, LOGO, Mathematica, Mathcad, etc.) to find the best ideas to copy from.

Bear hugs,
Bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: Number of distinct substrings of a string [continuation]
n00m:
> This worked out in 5.28s
> Imo it's not that *much* slower
> (of course, Psyco can't help here)

There's no need for a chain here, so you can rewrite this:

import itertools

def subs(s):
    return len(set(itertools.chain(
        s[i:j] for i in xrange(len(s))
               for j in xrange(i, len(s) + 1)))) - 1

as:

def subs(s):
    return len(set(s[i:j] for i in xrange(len(s))
                          for j in xrange(i, len(s) + 1))) - 1

Psyco helps a little (about 10% on my PC) if you use it like this:

from time import time
import psyco; psyco.full()

def main():
    t = time()
    fin = open("input.txt")
    fout = open("output.txt", "w")
    fin.next()
    for line in fin:
        r = set()
        s = line.rstrip()
        len_s = len(s)
        for i in xrange(len_s):
            for j in xrange(i, len_s + 1):
                r.add(s[i : j])
        print >> fout, len(r) - 1
    fout.close()
    print time() - t

main()

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: Number of distinct substrings of a string [continuation]
n00m:
> My Py solution:
> ...

Cute. You can replace:

a[ord(s[i])] += [i]

with:

a[ord(s[i])].append(i)

If you want to micro-optimize this with Psyco, this may be a little faster than lev // 2:

(lev >> 1)

But to increase speed it's usually better to reduce memory allocations. So you can try to find a way to pull this out of the function:

a = [[] for i in xrange(128)]

and to avoid a true append:

a[ord(s[i])] += [i]

(There are many ways to avoid an append. One of them is to have an already allocated list/array and just bump forward an index variable that starts from 0; a small sketch follows. This is worth doing especially when you use Psyco.)

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
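A minimal sketch of that preallocation idea (the function and names here are just illustrative, not taken from n00m's code):

def indexes_of(s, target):
    # append version:
    #   res = []
    #   for i in xrange(len(s)):
    #       if s[i] == target:
    #           res.append(i)
    # preallocated version, often faster under Psyco:
    res = [0] * len(s)      # worst-case size, allocated once
    n = 0                   # index that starts from 0 and bumps forward
    for i in xrange(len(s)):
        if s[i] == target:
            res[n] = i
            n += 1
    return res[:n]          # trim the unused tail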
Re: Number of distinct substrings of a string [continuation]
n00m:
> my home tests proved Python is a fellow fast beast of C++.
> Quite unexpected (I expected Py would be by ~10 times slower).
> PS
> Both my codes are identical in their algorithms.
> =
> 0.016 0.0150001049042 <--- exec times

Maybe in your C++ code there's something that can be improved. This is a 1:1 translation to the D (V.1) language (using my dlibs), and it's about 2.2 times faster than the Psyco version:
http://codepad.org/MQLj0ydB
With smarter memory usage (that is, avoiding all or most memory allocations inside the loops), the performance difference will surely grow.

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: Number of distinct substrings of a string [continuation]
n00m:
> I suspect that building of Suffix Tree would
> be a big exec.time-consuming overhead

In C/D/C++ there are ways to allocate memory in smarter ways, using pools, arenas, stacks, freelists, etc. With that, and using a compact encoding of the tree nodes (there are many different tricks to reduce the space used by a suffix tree), you can go fast enough. Probably a basic implementation will suffice in many cases too :-)

Anyway, you may try a pure-Python 2.x implementation:
http://suffixtree.googlecode.com/files/suffixtree-0.1.py
If you find the time and will to use that (or similar code) to improve your Python solution to problem 99, you can show us the code here...

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: Graph library for Python
Robin Becker:
> There are already very many implementations eg
>
> http://code.google.com/p/igraph
> http://www.boost.org/doc/libs/release/libs/graph
> http://ernst-schroeder.uni.lu/Digraph/doc/
> http://code.google.com/p/python-graph
> http://compbio.washington.edu/~zach/py_graph/doc/html/public/py_graph...
>
> and many others..some of the above already seem to be very useful.

There's mine too:
http://sourceforge.net/projects/pynetwork/
API:
http://code.activestate.com/recipes/466329/

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: Graph library for Python
geremy condra: > Since that's released under the python license, I'm going to > go ahead and commit the version that includes the topo > traversal, but if you have any objections you only need to > say the word and I'll take it down. No objections :-) Bye, bearophile -- http://mail.python.org/mailman/listinfo/python-list
Re: Graph library for Python
Geremy Condra:
> is there a particular way you want your attribution line to read?

You can just use my nickname (in all lowercase), with the list of parts you have used. Don't worry.

> Well, we all seem to have reinvented the wheel differently ;)

Maybe also because they are designed for different purposes.

> Bearophile, Tiago- any interest in trying to combine the
> best parts of our libraries, with an eye towards eventual
> integration into the standard library?

The first thing to do is to ask Guido and Hettinger if they are willing to put a "good" graph module into the std lib. If their answer is positive for some definition of "good", then we can think about doing something.

Several years ago I suggested putting a graph module in the std lib, and the answer was something like: "Put the lib online, and if people use it a lot, we'll see about putting it into the std lib." In the meantime my lib was used by no one and ten other graph libs are used (networkx seems among the most used), but I think none of them has shown a strong usage. (In the meantime Hettinger has written and added two or three or four GOOD data structures to the std lib using a "fast lane", avoiding the step of the popular-usage test.)

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: Which graph library is best suited for large graphs?
Wolodja Wentland:
> Which library would you choose?

This one probably uses little memory, but I don't know if it still works:
http://osl.iu.edu/~dgregor/bgl-python/

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: switch
Bruno Desthuilliers: > Well, obviously such business rules must by no mean be hardcoded. You > really need a "rule engine", configurable by your domain experts thru a > DSL that we'll design specially for you. The rule engine will generate > an AbstractScoreFactory that will instanciate appropriate IScore > implementation objects that knows what to do. [snip] Thank you very much for bringing back some sanity in this newsgroup, sometimes a good antiexample like yours is better than many explanations. Bye, bearophile -- http://mail.python.org/mailman/listinfo/python-list
Re: Seek support for new slice syntax PEP.
Steven D'Aprano: > I've lost all enthusiasm for discussing language enhancements That's probably the main downside of the moratorium. Humans need to play some to keep their will to work and improve things. Bye, bearophile -- http://mail.python.org/mailman/listinfo/python-list
Re: iterators and views of lists
Brendan Miller:
> Currently people slice and dice with well... slices, but those are
> copying, so if you want to operate over part of a range you make a
> copy, perform the operation, then copy the results back in.
>
> I was thinking you'd want something like random access iterators in
> c++, or pointers in c, to write typical in place algorithmic code. To
> me, something like non-copying slices (maybe you'd call it a list
> view?) would seem functionally similar and maybe more pythonic.

There are surely ways to improve the current situation, introducing views and other kinds of iterators (C++ design in this regard can be improved; see the last works of Andrei Alexandrescu about D). This can lead to a certain increase in the performance of Python programs, but it also surely makes writing programs harder and introduces new sources of bugs. (A toy list view is sketched below.)

Modern languages are now doing something rather different from what you ask for: introducing more immutable data, which in theory leads to more copying, but in practice allows more data sharing too, because there's less need to actually copy data when the data is immutable (see Clojure). A hypothetical Python language designed today would probably copy more from Haskell and Clojure.

Python is kept intentionally simple, because it's designed to be usable by people who are not just professional programmers, for example a biologist who needs to perform some computations and doesn't want to spend all the time learning the language itself (think about C++ again). So Python programmers live with a less performant language. When the performance of the code is not enough (even with the future Unladen Swallow), they use another language (often to write extensions used from Python code).

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
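A toy sketch of such a "list view" (a hypothetical class of mine, just to show the idea of a non-copying slice; negative indexes and slicing are left out to keep it short):

class ListView(object):
    def __init__(self, data, start, stop):
        self.data, self.start, self.stop = data, start, stop
    def __len__(self):
        return self.stop - self.start
    def __getitem__(self, i):
        return self.data[self.start + i]
    def __setitem__(self, i, value):
        # writes go straight through to the underlying list
        self.data[self.start + i] = value

items = [5, 3, 1, 4, 2]
view = ListView(items, 1, 4)
view[0] = 99    # items is now [5, 99, 1, 4, 2], nothing was copied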
Re: iterators and views of lists
Brendan Miller: > I agree though, it doesn't matter to everyone and anyone. The reason I > was interested was because i was trying to solve some specific > problems in an elegant way. I was thinking it would be cool to make > python more usable in programming competitions by giving it its own > port of the STL's algorithm library, which needs something along the > lines of C++'s more powerful iterators. It seems you have missed my post, so here it is, more explicitly: http://www.boostcon.com/site-media/var/sphene/sphwiki/attachment/2009/05/08/iterators-must-go.pdf Bye, bearophie -- http://mail.python.org/mailman/listinfo/python-list
Vectorized laziness 2
Do you remember my post about Vectorized laziness that was fully ignored by everyone here? http://groups.google.com/group/comp.lang.python/browse_thread/thread/2637aafa1274629d/ The latest Clojure v.1.1 has implemented the same idea, they are named "Chunked Sequences": http://www.infoq.com/news/2009/12/clojure-11-rc1-transients See: http://clojure.googlegroups.com/web/chunks.pdf (I know they can have some problematic corner cases.) Bye and be well, bearophile -- http://mail.python.org/mailman/listinfo/python-list
Re: syntax error : first python program
Ben Finney: > > bash# /usr/bin/python sample.py > > File "sample.py", line 2 > > name = "blah" > > ^ > > SyntaxError: invalid syntax > > Indentation is syntax in Python. But I agree, a little better error message may help there, something that contains "indentation" in it :-) Bye, bearophile -- http://mail.python.org/mailman/listinfo/python-list
Missing collections
What are the useful collections that are missing in the collections module? Let's see:

- sortedDict, sortedSet (based on a search tree)
- frozenDict
- bitArray (or bitset)
- graph
- linkedList (*)

(*) linkedList is like deque, a linked list of short arrays, but the arrays aren't 100% full.

A few more that are less important:

- biDict (bidirectional dict, a bijection)
- persistentList, persistentDict
- intervalDict
- Trie
- Dawg
- Rope
- Fibonacci heap
- Bloom filter
- Union-Find

With those I think many things are covered :-)

Regarding the standard library of Python 3, it's easy enough to create by mistake a module that collides with an equally named module of the std lib. To avoid this I think (after seeing the std lib of the D language) it can be useful to put all modules of the standard library into a namespace, like "py" or "std" or something like that. So you write:

import py.math
from py.math import sin
x = py.math.cos(1.2)
y = sin(1.2)

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: Symbols as parameters?
Martin Drautzburg, having symbols spread in the namespace is bad, and too much magic is even worse. You seem to need something like an enum that must be used with its qualified name, as in:

move(direction.up)

That works well unless the symbol names are keywords. Creating a small class like this (one that complains if you give it a string containing a keyword; a sketch is below) doesn't look too hard:

direction = Enum("up", "down", "left", "right")

Or:

direction = Enum("up, down, left, right")

Or:

direction = Enum("up down left right")

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
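A minimal sketch of such an Enum class (the design, where each symbol is just its own name as a string, is only one of the possible choices):

from keyword import iskeyword

class Enum(object):
    def __init__(self, *names):
        if len(names) == 1:
            # allow "up down" and "up, down" besides separate arguments
            names = names[0].replace(",", " ").split()
        for name in names:
            if iskeyword(name):
                raise ValueError("%r is a Python keyword" % name)
            setattr(self, name, name)

direction = Enum("up, down, left, right")
# move(direction.up) now receives the string "up"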
Re: Sorted dictionary
Raymond Hettinger: > Using an underlying list to track sorted items > means that insertion and deletion take O(n) time. > That could be reduced to O(log n) time by using > a blist or skiplist as the underlying structure > for maintaining a sort. In the collections module it can be useful to have ordered dicts and sets based on search trees. I'm thinking about B+ trees to use CPU cache locality better. It can be fun :-) Bye, bearophile -- http://mail.python.org/mailman/listinfo/python-list
Re: Efficient Running Median
Very nice. I have added a comment at the bottom, using a circular queue instead of a deque may be faster. Bye, bearophile -- http://mail.python.org/mailman/listinfo/python-list
Re: flow control and nested loops
Raymond Hettinger:
> Another approach for exiting multiple levels of loops is wrap the
> inner calls in a function and return from them when needed:
>
> def f(x):
>     for y in Y:
>         for z in Z:
>             if test1(x,y,z):
>                 return
>             frobnicate(x,y,z)
>
> for x in X:
>     f(x)

That's usually the solution I use for this problem; it's a little rough and it has some performance cost, but it's readable and simple.

In PHP, break and continue accept an optional numeric value (default is 1; 0 is of course ignored) "which tells it how many nested enclosing structures are to be broken out of":
http://us2.php.net/manual/en/control-structures.break.php
http://us2.php.net/manual/en/control-structures.continue.php

That PHP design gives me shivers; it's horrid, bug-prone and fragile:

for x in X:
    for y in Y:
        for z in Z:
            if test1(x, y, z):
                continue 3
            if test2(x, y, z):
                continue 2
            frobnicate(x, y, z)
        glortz(x, y)
    splat(x)

A better solution is to add labels to Python (I hope this code is equivalent to the original Perl one) that can optionally be used by "continue" and "break". This design is also used by D:

label OUTER:
for x in X:
    label INNER:
    for y in Y:
        for z in Z:
            if test1(x, y, z):
                continue OUTER
            if test2(x, y, z):
                continue INNER
            frobnicate(x, y, z)
        glortz(x, y)
    splat(x)

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: python memory use
Gary Robinson:
>(I could test the particular case I mention, but I'm wondering if someone has some fundamental knowledge that would lead to a basic understanding.)<

Java is one of the most memory-hungry languages, often even more than Python 2.x. Some bad people have said that Java developers were not that interested in saving RAM because Sun sells hardware, and the more RAM it uses the more they can sell ;-)

More seriously, Java uses a complex hybrid generational garbage collector, while CPython uses a much simpler reference-count GC plus a cycle detector. A reference counter usually has lower performance compared to good generational garbage collectors, especially ones hybridized with several other algorithms, but it's simpler (and most things in CPython are designed for simplicity even when they are a little less efficient; among other things this simplicity helps this open source project recruit and keep developers). It's almost deterministic (so for example in some situations you can get away with forgetting to close a file), and it often uses less memory because at any moment you keep in memory only the objects you are actually using (reference cycles add a little extra complexity to this), while a generational GC keeps a lot of extra memory unused, free, etc.

There are studies showing that if you use such good generational GCs and you accept about a 2-5X memory penalty, you can have a language about as fast as ones where you manage memory manually. Indeed today good Java programs are often no more than 2X slower than C++, and sometimes are about as fast or even faster (thanks to other optimizations, like the strong inlining of virtual methods done by HotSpot).

If you want a language that uses less RAM you can try FreePascal :-) I think that among the languages designed to work with a GC, the D language is among the ones that use less memory (because so far its GC is not that good, so it saves memory while being slower than the advanced GC used by Sun's Java).

On 64 bit systems Sun's Java has added an optimization: certain pointers are compressed to 32 bits, reducing memory usage. Similar things may be done by LLVM too in the future:
http://llvm.org/pubs/2005-06-12-MSP-PointerCompSlides.pdf
Maybe someday 64-bit CPython will do the same, or maybe Unladen Swallow or PyPy (Jython in a sense may do it already if you use it on a 64 bit JVM, I don't know).

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: Quick compare string to list
Scooter: > I'm reading in a text file, and for each line in the file, I'm looking > for the existence of phrases from a list. The list contains approx. > 120 items currently but will most likely grow. This procedure itself > is not the main function of my program and only grew out of the need > to reformat certain phrases I'm finding in a file before re-outputting > it. But as I suspected, this searching of the lists slows the whole > process way way down. Was looking for ideas of a better way to do > this. Know your basic computer science :-) http://en.wikipedia.org/wiki/Aho-Corasick_algorithm There are probably C implementations that can be used from Python, like: http://hkn.eecs.berkeley.edu/~dyoo/python/ahocorasick/ Bye, bearophile -- http://mail.python.org/mailman/listinfo/python-list
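For reference, here is a small self-contained sketch of the Aho-Corasick automaton mentioned above (my own minimal pure-Python version, not the Berkeley extension linked in the post); it finds all the phrases in one pass over the text:

from collections import deque

def build_automaton(patterns):
    # trie with transitions, failure links and output lists
    trans, fail, out = [{}], [0], [[]]
    for pat in patterns:
        node = 0
        for ch in pat:
            if ch not in trans[node]:
                trans.append({}); fail.append(0); out.append([])
                trans[node][ch] = len(trans) - 1
            node = trans[node][ch]
        out[node].append(pat)
    # breadth-first pass to compute the failure links
    queue = deque(trans[0].values())
    while queue:
        node = queue.popleft()
        for ch, nxt in trans[node].items():
            queue.append(nxt)
            f = fail[node]
            while f and ch not in trans[f]:
                f = fail[f]
            fail[nxt] = trans[f].get(ch, 0)
            if fail[nxt] == nxt:    # defensive: never fail to itself
                fail[nxt] = 0
            out[nxt].extend(out[fail[nxt]])
    return trans, fail, out

def find_all(text, automaton):
    # yields (start_position, phrase) for every match
    trans, fail, out = automaton
    node = 0
    for i, ch in enumerate(text):
        while node and ch not in trans[node]:
            node = fail[node]
        node = trans[node].get(ch, 0)
        for pat in out[node]:
            yield i - len(pat) + 1, pat

aut = build_automaton(["he", "she", "his", "hers"])
print list(find_all("ushers", aut))
# [(1, 'she'), (2, 'he'), (2, 'hers')]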
Re: Q: sort's key and cmp parameters
Paul Rubin: > Yes, think of sorting tree structures where you have to recursively > compare them til you find an unequal pair of nodes. That's cute. In what situations do you have to perform such kind of sort? Bye, bearophile -- http://mail.python.org/mailman/listinfo/python-list
Re: Enormous Input and Output Test
Terry Reedy:
> Don't waste your time with problem sites that judge raw-clock time over
> (and before) accuracy, thereby greatly favoring low-level languages and
> hack tricks over clear high-level code.

I usually don't like to solve the kind of problems shown by those sites, because those problems are too artificial (and often too difficult). But sometimes I have written some solutions.

Those sites, though, never judge "raw" running time over accuracy: in most or all such sites the programs are tested with tons of possible inputs, and if even one output is a bit wrong, the program is totally refused. This is a hard rule that encourages programmers to take very good care of program correctness. Some sites add a little more "noise" to the inputs, simulating slightly more real-world inputs, while most of those online contests give clean inputs (the input bounds are well specified in the problem statement).

From what I've seen of the best solutions submitted to those sites (some sites allow people to see the source of the contest entries), the programs usually don't (need to) use "hack tricks" as you say (even if probably some program uses them). Using hacks is often unsafe, so people usually prefer safer ways to code, because just a little bug may fully compromise the score of the program.

I agree that the timing scores on such sites often encourage low-level languages, like C (and sometimes C++, which is a multi-level language), but on the other hand such languages exist, C is used in many real-world places, so designing sites where people compete with such languages is legitimate. C allows people to understand better what's going on inside the computer; this is valuable and positive. Bashing low-level languages is silly. Even CPython is written in C. A good programmer has to know both higher and lower level languages.

And in real life sometimes you need performance. This thread shows that a normal Python program is not up to those timings for the enormous-input problem (even if there are ways to write a Python program to solve it). People at Google are trying to create a 5 times faster Python (the Unladen Swallow project) because they use lots of real-world Python code and they think Python is slow. I've found plenty of situations where CPython code is not fast enough for my purposes.

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: Q: sort's key and cmp parameters
Raymond Hettinger:
> Psychologically, the thing that I find to be interesting is
> that beginners and intermediate users seem to take to key
> functions more readily than old timers. The key function seems
> to be an easy thing to teach (perhaps because that's the
> way Excel sorts and the way SQL sorts, or perhaps it is because
> problem statements seem to arise in the form of "sort by this,
> then by that" instead of "here's how two objects should be
> compared").
>
> In contrast, some people who have have had deep experience with
> cmp functions may tend to first think of cmp solutions and then
> have a harder time seeing solutions with key functions. If you
> grew-up on C's qsort() like I did, then a key function may not
> be the first thing that pops into your head.

I love the 'key', it makes my code simpler and easier to understand (see the small example below). I am not opposed to the idea of keeping cmp, which in some rare cases may be useful or more natural. The problem is that if you allow the use of cmp, lots of programmers will use it straight away, not even bothering to learn what that strange 'key' argument may be useful for. And they miss how handy 'key' is.

I am having a very hard time (despite my implementation, explanations, etc.) explaining to D programmers why, for a flexible general-purpose function, a key function argument is often better. In the end they have added something like that to the std lib, but with a weird name like SchwartzSort that's surely not the first standard sort they look for... this is bad.

So a funny solution to this situation may be the following: to keep only 'key' in sort/sorted for a few years, so the community of Python programmers gets used to 'key' only, and then put 'cmp' back in, for the special cases ;-)

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
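A tiny example of the "sort by this, then by that" difference (Python 2 syntax; the data here is invented):

people = [("Smith", 40), ("Jones", 22), ("Smith", 22)]

# key version: sort by age, then by surname
people.sort(key=lambda p: (p[1], p[0]))

# cmp version of the same ordering
people.sort(cmp=lambda a, b: cmp(a[1], b[1]) or cmp(a[0], b[0]))

The key version also tends to be faster, because the key callable runs once per item, while the cmp callable runs once per comparison (about n*log(n) times).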
Re: Q: sort's key and cmp parameters
Paul Rubin: > bearophile: >> I am having a very hard type (despite my implementation, explanations, >> etc) explaining to D programmers why for a flexible general-purpose >> function a key function argument is often better. They in the end have >> added something like that in the std lib, but with a weird name like >> SchwartzSort that's surely not the first standard sort they look >> for... this is bad. > "key" and the DSU (Schwartz) sorting pattern makes lots of sense in > scripting languages like Python that are primarily used on relatively > small data sets. The best reason for writing something in a lower > level language like D is because you want to push the limits of your > availiable machine resources. Therefore, the DSU strategy's extra > memory consumption will be what you want less of the time than it is > in Python. I think you are quite wrong. In D dynamic arrays are built-in, they are used almost as Python lists (but they are single-typed, unless you create an array of variants), among other things they have a built-in sort. Such sort is used for small situations too. Even in a language like D *very* often what you need is to sort a small amount of data in a very flexible way. In such situations what you need isn't to squeeze the last byte or CPU cycle out of the PC, but to have a handy and short syntax, a very flexible sorting, and something that's surely not bug-prone. In such situation having a 'key' argument is *better*. Such sort can be stable. That's why the main sort/sorted routines in my dlibs accept an optional key callable, and they are very handy. I can show you examples. On the other hand once in a while in a D program you may need to sort a very large amount of data, or you may need something faster (and unstable, like a refined introsort based on a 2-pivot QS), or even more special than the things allowed by the built in sort. In such situation you can import a special sorting function from the std lib and use it, such sort can have something equivalent to a cmp (but lower level, it can accept a function template alias, to inline the comparison operation). So the idea is: in a multi-level language you very often have to sort a limited amount of data in complex ways. Because in most programs a quite large percentage of the code isn't performance-critical. In such large parts of the code what you want is something very handy, safe and flexible. So having a key function is positive. In the few spots where you need high performance, you can import (or even implement with your hands) and use a specialized sorting routine. D is a general purpose language, it's not used like C in Python projects to write performance-critical spots only. This is also visible in other parts of the language, for example there are built-in associative arrays. Their syntax is handy, and even if they aren't high performance (they are sometimes slower than Python dicts despite being O(1), but they never have a O(n^2) performance profile, as may happen to Python dicts, so they are safer) you can use them in a very quick way, even if you have to create a 10-items associative array. The end result is that in D programs you can find more hashes than in C++ code, where I think programmers use them only where simpler solutions (like iterating on an array) aren't good enough. (So D AAs have to be fast even in situations where you put 8 items inside them, like in Python dicts). 
This free usage of AAs makes the code stronger too (because once in a while you may put 1000 items in such AA, and it will keeps working efficiently, while a linear scan in a list of 1000 items starts to become not efficient). Bye, bearophile -- http://mail.python.org/mailman/listinfo/python-list
Re: Q: sort's key and cmp parameters
Paul Rubin: > Bearophile: > > sorting, and something that's surely not bug-prone. In such situation > > having a 'key' argument is *better*. Such sort can be stable. > > Nothing stops comparison sorting from being stable. Since the rest of > your post seems premised on the opposite, I hope that clears things up > for you. When I have written that post I was partially unfocused, I am sorry. What I meant is that a general sorting routine, even in D, is better to be first of all flexible. So I think it's better for the D built-in sort to be stable, because such extra invariant allows you to use the sort in more situations. Bye, bearophile -- http://mail.python.org/mailman/listinfo/python-list
Re: Q: sort's key and cmp parameters
Hrvoje Niksic:
> Note that stable sort has additional memory requirements. In situations
> where you don't need stability, but do need memory-efficient in-place
> sorting, an unstable sort might well be preferred. This is why
> libraries such as C++'s STL offer both.

There are stable sorts that need no additional memory.

In a largish program written in a general-purpose multi-level language like D (this is probably true for C++ too), often 80% of the code takes very little time to run. Such code may need to process a few small arrays, or small data sets, or to manage the GUI, etc. So optimizing those parts of the code for performance (both in memory used and CPU cycles used) is stupid. What you want in such large parts of the code is:
- to write code quickly
- to have code that's short, readable, easy to fix, and, most important of all, as little bug-prone as possible.

So what you need in such large parts of the code is something that's very flexible and safe, even if it's not top performance (in both memory and CPU). This is why, for example, D has built-in associative arrays that have a handy syntax and built-in methods, and are never O(n^2), even if they are not the most efficient ones where you need maximum performance, minimal memory use, or unusual operations. Such built-ins are useful to reduce both code length and bug count, because they are quite safe and easy to use. Stable sorts are a little safer, because they don't scramble your data in certain ways, so they can be used in more situations. This is why having Timsort in Python is good.

When you profile the D code and you see your program uses too much RAM or wastes too much time in the built-in sort, then you can switch to special sorts from the std lib. You can even write your own hash or sort for special situations (and you don't have to drop down to another language for that; you keep using the same one, even if you may want to change your programming style and use a more C-like style, with no dynamic allocations, some bit twiddling, pointers, more structs, 16 bit integers, unsigned integers, unions, compiler intrinsics, even inlined assembly that uses SSE3, etc.). In normal code you may want to adopt a more Java-like programming style, or templates, or, using a large library I have written, you may program almost in a curious Python-like style, with laziness too.

This is why I think the built-in sort has to be flexible: you are supposed to use it most of the time, and most of the time you don't need top performance. Different parts of a program have to be optimized for different things: computer performance, or programmer performance.

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: The rap against "while True:" loops
Peter Billam:
> I remember in the structured-programming revolution the
> loop { ... if whatever {break;} ... }
> idiom was The Recommended looping structure, because the code is
> more maintainable.

I think "break" was almost the antithesis of structured programming; it was seen as the little (and a bit better behaved) brother of "goto". Too many "breaks" turn code almost into spaghetti, which is the opposite of structured programming.

> With a while () {...} idiom, you commit
> yourself to a bug-prone rewrite if you ever need to insert a
> statement before the first break-test, and likewise with a
> repeat { ... } until () you commit yourself to a rewrite if
> you ever need to insert a statement after the last break-test.

I think "while True:" is used so often in Python because Python still lacks a do-while construct (or repeat-until, but this is less nice, because the condition is reversed compared to the one you use in a while-do loop). Give me a do-while and a good number of breaks and "while True"s in my Python code will be removed.

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
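For reference, the usual Python emulation of the missing do-while (read_chunk and process are hypothetical placeholders):

while True:
    data = read_chunk()     # the body runs at least once
    process(data)
    if not data:            # the "while" test, moved to the bottom
        break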
Re: How about adding slice notation to iterators/generators?
Terry Reedy: > 1. islice works with any iterator; generator method would only work with > generators A slice syntax that's syntactic sugar for islice(some_iter,1,None) may be added to all iterators. >2. iterator protocol is intentionally simple.< Slice syntax is already available for lists, tuples, strings, arrays, numpy, etc, so adding it to iterators too doesn't look like adding that large amount of information to the mind of the programmer. Bye, bearophile -- http://mail.python.org/mailman/listinfo/python-list
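To make the proposed sugar concrete, here is what it would mean today (the generator is just an example):

from itertools import islice

squares = (x * x for x in xrange(10))
# hypothetical syntax: squares[1:4] -- currently spelled as:
print list(islice(squares, 1, 4))    # prints [1, 4, 9]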
Re: The rap against "while True:" loops
Paul Rubin: > http://scholar.google.com/scholar?cluster=17368311454828547380 > > Keep in mind that the article is 35 years old though, and is purely > imperative. Lots of stuff done with cockamamie looping constructs is > more cleanly done with Python generators, itertools, higher-order > functions, etc. That's a famous and very good paper, a good read for people that like to program. The Example1 and Example2 can be rewritten in several different ways, but several compilers today are not able to perform such optimizations yet, so what Knuth has written there are still among the faster ways to implement that algorithm. Bye, bearophile -- http://mail.python.org/mailman/listinfo/python-list
Re: Reverse Iteration Through Integers
Paul Rubin:
> If you want to peel off digits from an int one by one without string
> conversions, it's easiest to do that in reverse order:
>
>     n = 961
>     digits = []
>     while n > 0:
>         n, d = divmod(n, 10)
>         digits.append(d)
>
> Look up the docs for "divmod" for an explanation of that handy function.
> Now the above gives you a reversed list of digits--what to do with it
> is an exercise for you ;-). Note that if n=0 then you get the empty list.

I think that with Psyco it's better to avoid divmod() (a sketch is below). It's very positive to teach novices to always write tests every time they write a function, because it makes their programs less buggy and often lets them solve the whole programming problem sooner:

def reverse(n):
    """ Reverse the digits of integer n, ignoring trailing zeros.

    >>> reverse(12590)
    9521
    >>> reverse("abc")
    Traceback (most recent call last):
      ...
    TypeError: unsupported operand type(s) for divmod(): 'str' and 'int'
    >>> [reverse(x) for x in (0, 1, -1, 2, -2L, 100L, -100)]
    [0, 1, -1, 2, -2L, 1L, -1]
    >>> [reverse(x) for x in (125, 1250, 123456789)]
    [521, 521, 987654321]
    >>> [reverse(x) for x in (0.0, 1.0, -5.3, 125.0, 1.23456e20)]
    [0, 1.0, -5.2998, 521.0, 654321.0]
    >>> str(reverse(169883903200298309284038223098439430943092816286 ** 123))[:35]
    '65852401624276201339740994895755844'
    """
    # code removed ...

if __name__ == "__main__":
    import doctest
    doctest.testmod()
    print "Doctests done"

Using tests (and a bit later a versioning system) is among the things that have to be taught as soon as possible :-)

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
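The divmod-free variant of the digit-peeling loop (a small sketch; under Psyco the two simple operators are often faster than the divmod call):

n = 961
digits = []
while n > 0:
    digits.append(n % 10)
    n //= 10
# digits is now [1, 6, 9]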
Re: "Python for Bioinformatics" available and in stock
Sebastian Bassi, this is a piece from #5:

ProtSeq = raw_input("Protein sequence: ").upper()
ProtDeg = {"A":4,"C":2,"D":2,"E":2,"F":2,"G":4,"H":2,
           "I":3,"K":2,"L":6,"M":1,"N":2,"P":4,"Q":2,
           "R":6,"S":6,"T":4,"V":4,"W":1,"Y":2}
SegsValues = []
for aa in range(len(ProtSeq)):

More Pythonic code is:

prot_seq = raw_input("Protein sequence: ").upper()
prot_deg = {...
segs_values = []
for aa in xrange(len(prot_seq)):

Note the use of xrange and names_with_underscores. In Python names are usually lower case and their parts are separated by underscores.

From #6:

segsvalues=[]; segsseqs=[]; segment=protseq[:15]; a=0

==>

segs_values = []
segs_seqs = []
segment = prot_seq[:15]
a = 0

If you want to limit the space in the book then you can pack those lines into a single line, but it's better to keep the underscores.

From #18:

prop = 100.*cp/len(AAseq)
return (charge,prop)

==>

prop = 100.0 * cp / len(aa_seq)
return (charge, prop)

Adding spaces between operators and after commas, and a zero after the point, improves readability.

From #35:

import re
pattern = "[LIVM]{2}.RL[DE].{4}RLE"
...
rgx = re.compile(pattern)

When the pattern gets more complex it's better to show readers how to use a re.VERBOSE pattern, to split it over more lines, indent those lines like a program, and add #comments to those lines.

The #51 is missing.

I like Python and I think Python is fit for bioinformatics purposes, but 3/4 of the purpose of a book like this is to teach bioinformatics first and computer science and Python second. And sometimes a dynamic language isn't fast enough for bioinformatics purposes, so a book about this topic probably has to contain some pieces of C/D/Java code too, to show and discuss implementations of algorithms that require heavier computations (they are often already implemented inside Biopython, etc., but someone has to write those libs too). The purpose here is not to teach how to write industrial-strength C libraries to perform those heavier computations, but to give the reader an idea (in a real lower-level language) of how those libraries are actually implemented. Because science abhors black boxes, a scientist must have an idea of how all the subsystems she/he/hir is using work inside (that's why using Mathematica can be bad for a scientist: such a person has to write "and here magic happens" in the resulting paper).

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: New syntax for blocks
r: > i think the following syntax would be quite beneficial > to replace some redundant "if's" in python code. http://python.org/dev/peps/pep-3003/ bearophile -- http://mail.python.org/mailman/listinfo/python-list
Re: #define (from C) in Python
Santiago Romero:
>Obviously, I prefer to write well structured code but I had to sacrifize SIZE by SPEED<

In C99 you have "inline" (and gcc/gcc-llvm usually inline small functions anyway), which helps avoid many macros.

> Now I'm porting the emulator to a scripted language, so I need
> even more previous design ideas before starting to code, so that
> I can achieve (I hope I'll be able to do it with this group's help)
> 100% cpu speed in an standard desktop PC.

The techniques needed to speed up Python code are very different from the ones you use in C. You may need a lot of time to learn Python performance tricks. Psyco helps a lot.

> Well, I don't agree with that, "constants" and "macros" wouldn't
> hurt python, when using them in the right situations.

I miss true constants in Python, but it may be impossible to add them to this language, because of how it uses references. In Python you rely on convention, writing their names ALL_UPPERCASE.

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: Writing an emulator in python - implementation questions (for performance)
Try creating an extension module with ShedSkin.

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: Common area of circles
Shashwat Anand:
> > Given 'n' circles and the co-ordinates of their center, and the radius of
> > all being equal i.e. 'one', How can I take out the intersection of their
> > area.

I can see two possible solutions, both approximate. In both solutions you first look whether there is a pair of circles that don't intersect; in that case the intersection area is zero. You also remove all circles fully contained in another circle.

The first strategy is easy to implement, but it probably gives lower precision. You sample randomly, many times, the rectangular area that surely contains all the circles, and you count how many of those random points are inside all circles. This gives you an approximate area of their intersection. Increasing the number of points slowly increases the precision of the result. (A sketch of this strategy is below.)

The second strategy is more complex. You convert your circles into polygons with a fixed number of vertices (like 50 or 100 or 1000 or more; this number is constant even if the circles don't all have the same radius). So you can turn a circle into a simple mesh of triangles (all vertices are on the circumference). Then you "subtract" the second polygonalized circle, which can create a hole and split triangles into pieces, and so on with the successive circles. At the end you compute the total area of the triangles left. This is doable, but you need time to implement it. The advantage is that the numerical precision of the result is probably higher.

If you implement the second solution you can implement the first one too, and use it as a test to avoid bugs. Visualizing the triangles with Pygame or MatPlotLib can be useful to look for bugs.

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
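A minimal sketch of the Monte Carlo strategy (the function name and signature are mine; it samples the bounding box of the first circle, which always contains the whole intersection):

import random

def intersection_area(centers, radius=1.0, npoints=100000):
    r2 = radius * radius
    cx, cy = centers[0]
    hits = 0
    for _ in xrange(npoints):
        x = random.uniform(cx - radius, cx + radius)
        y = random.uniform(cy - radius, cy + radius)
        # count the point only if it's inside every circle
        if all((x - a) ** 2 + (y - b) ** 2 <= r2 for a, b in centers):
            hits += 1
    box_area = (2.0 * radius) ** 2
    return box_area * hits / npoints

print intersection_area([(0.0, 0.0), (1.0, 0.0)])
# about 1.228 (the exact lens area is 2*pi/3 - sqrt(3)/2 ~= 1.2284)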
Re: Splitting a string
I don't know how fast this is (Python 2.x):

>>> from itertools import groupby
>>> t = 'si_pos_99_rep_1_0.ita'
>>> tuple(int("".join(g)) if h else "".join(g)
...       for h, g in groupby(t, str.isdigit))
('si_pos_', 99, '_rep_', 1, '_', 0, '.ita')

It doesn't work with unicode strings.

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: Voronoi diagram algorithm (Fortune’s sweepline )
Captain___nemo: > Isn't it possible to implement Voronoi diagram using Fortune's > algorithm without multithreading or hash map? Multi-threading isn't part of Fortune's algorithm. Multi-threading can be confusing, but it's better for you to learn to feel at home using hash maps (named dicts in Python), because they are both essential in computer science and in Python. Ask if you need some help regarding dicts and sets. Bye, bearophile -- http://mail.python.org/mailman/listinfo/python-list
Re: Voronoi diagram algorithm (Fortune’s sweepline )
Robert Kern: > You can see a mild modification of Fortune's original C code here: > > http://svn.scipy.org/svn/scikits/trunk/delaunay/scikits/delaunay/Voro... That's C++; C++ makes simple things hard and hard things possible :-) In Python that code may become much shorter (and slower). Bye, bearophile -- http://mail.python.org/mailman/listinfo/python-list
Re: Voronoi diagram algorithm (Fortune’s sweepline )
dorzey: > Found this python implementation: > http://www.oxfish.com/python/voronoi.py Looks very nice, and it may be ShedSkin-compilable too. Bye, bearophile -- http://mail.python.org/mailman/listinfo/python-list
Re: TypeError: int argument required
Lawrence D'Oliveiro:
>So no, using alternative quotes does not make things more readable.<

You think that this:

'<rect style="fill:blue" />'

isn't a bit more readable and simpler to write than this?

"<rect style=\"fill:blue\" />"

I think lots of people don't agree with you.

In such a situation it can also be positive to split the string into two or more parts, for example (untested):

style = ("fill:blue; stroke:pink; stroke-width:5; "
         "fill-opacity:0.1; stroke-opacity:0.9")
print >> fo, '<rect x="%d" y="%d" width="%d" height="%d" style="%s" />' % (abs_x, abs_y, w, h, style)

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: Input problem
Prasoon:
> I am new to python and using python 2.6
> I want to know when to use raw_input( ) and when to use input( )???

I think it's better to always use raw_input(), and then convert the string to the data you want to work on. Once in a while you may want to use input() to feed expressions (like formulas) to your program in a quick, simple and unsafe way...

In Python 3+ they have removed input() and renamed raw_input() as input(). You can get the functionality of the 2.x input() with eval(input()). (I think the Python 3 developers have made the right choices here.)

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
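A small example of the explicit-conversion style (Python 2; the prompt text is just illustrative):

s = raw_input("How many items? ")   # always returns a string
n = int(s)                          # explicit, safe conversion

# quick and unsafe, evaluates whatever the user types:
# x = input("Formula? ")            # e.g. typing 2**10+1 gives 1025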
Re: Good books in computer science?
Nathan Stoddard:
> The best way to become a good programmer is to program. Write a lot of
> code; work on some large projects. This will improve your skill more than
> anything else. It's also important to learn new languages regularly. I
> recommend to learn C, Python, and Lisp first.

To become very good at a practical activity (like programming or writing stories or playing piano) you have to do many things for a long time. You have to practice it a lot, but that's not enough; you also must keep pushing the limits of your skills forward, doing things that are hard for you. Reading smart books and learning from the experts in the field is usually necessary. Quite often it's useful to read good books not much related to the activity you are doing too, because the human mind works better this way.

Another thing you have to do is keep your eyes open, for example to be able to notice when your learning is stuck in some slow corner: now and then you will have to meta-learn, that is, change the way you learn and train yourself. This is a hard and often painful thing to do, but it's probably necessary if you want to become very good, because very often you learn in the wrong way, or in a not very efficient way. Howard Gardner too has written about this topic.

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: What is the best method to match a pattern in set of lines
This is a small OT post, sorry. Dennis Lee Bieber, may I ask why most or all your posts are set to "No- Archive"? > HTTP://www.bestiaria.com/ I think there are other furries beside you around here. Bye, bearophile -- http://mail.python.org/mailman/listinfo/python-list
Re: Good books in computer science?
Albert van der Horst:
> For programming practice I do the problems of
> http://projecteuler.net/

Some time ago I solved some of them with Python, D and C (some of them are quite hard for me), and I tried to produce very fast code (like a D generator of prime numbers that's about 100 times faster than some 'fast' C# prime generators I've seen in that forum). But then I stopped, because to me they seem a waste of time: they look too academic, and they don't exercise the right muscles of the mind. They may be good if you want to become good at performing numerical number theory, and bad for everyone else. Seeing how many people like to do those Project Euler puzzles, I presume my ideas aren't shared by most people.

I am now using some of my solutions to those problems to spot "performance bugs" in a new D compiler (and even in ShedSkin), so they are somewhat useful again :-)

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: Specific iterator in one line
Filip Gruszczyński: > [1, 0, 0, 1] -> ['b', 'b', 'a', 'a', 'b', 'b'] I like this version (43 golf holes), it's readable enough: [c for d in[1,0,0,1]for c in("a","bb")[d]] Bye, bearophile -- http://mail.python.org/mailman/listinfo/python-list
Re: Searching equivalent to C++ RAII or deterministic destructors
Ulrich Eckhardt: > a way to automatically release the resource, something > which I would do in the destructor in C++. Is this helpful? http://effbot.org/pyref/with.htm Bye, bearophile -- http://mail.python.org/mailman/listinfo/python-list
Re: Is Psyco good for Python 2.6.1?
Russ P.: Psyco works well for me on Python 2.6+, on Windows. You can find a compiled version around, too.

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: Is code duplication allowed in this instance?
Francesco Bochicchio:
> Possibly to prepare the test data set you might need a
> different - and already proven - implementation of
> the algorithm.

Usually a brute-force or slow-but-short algorithm is OK (besides some hard-coded input-output pairs). Sometimes you may use the first implementation of your code, which is usually simpler, before it becomes complex because of successive optimizations. But you have to be careful, because the first implementation too may be buggy.

Some other times there are simpler algorithms to test whether the output of another algorithm is correct (think of exponential algorithms with a polynomial test of the result's correctness). For example, to test a fancy Introsort implementation you can use a very small and simple O(n^2) sorter, or better, a simple linear loop that tests that the result is sorted (a sketch is below). Here, besides unit tests, languages (or tools) that allow adding pre- and post-conditions, class invariants, etc. are also useful.

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
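A sketch of that linear post-condition test (my_fancy_sort is a hypothetical function under test, assumed to return a new sorted list):

import random

def is_sorted(a):
    # linear check: every element at least as big as the previous one
    return all(a[i] >= a[i - 1] for i in xrange(1, len(a)))

data = [random.randrange(1000) for _ in xrange(100)]
result = my_fancy_sort(data)
assert is_sorted(result)
assert sorted(data) == result   # cross-check against the proven builtin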
Re: Clarity vs. code reuse/generality
On 3 Lug, 16:05, kj wrote:
> I'm will be teaching a programming class to novices, and I've run
> into a clear conflict between two of the principles I'd like to
> teach: code clarity vs. code reuse.

They are both important principles, but clarity is usually more important: short code that can't be read can't be fixed and modified, while long code that can be read is still alive.

> So I thought that I'd code a helper function, called _binary_search,
> that took five parameters: a lower limit, an upper limit, a
> one-parameter function, a target value, and a tolerance (epsilon).

Five arguments are a bit too many, even for non-novices. 2-3 arguments are more than enough for novices. And where are the doctests? Novices have to learn to think of tests as part of the normal code :-)

I think the main problem in all this discussion is that generally, to learn to program, you have to write code yourself, so your students have to invent and write their own binary search. Reading your code they are going to learn much less. Creating a binary search from almost scratch can turn out to be too hard for them, which means you have to ask them to invent something simpler :-)

Programming is (among other things) problem solving, so they have to learn to solve not-so-easy problems from the beginning. Showing them famous algorithms (and very good code to copy from) can be useful, but it's less important than learning basic problem-solving skills, and much less important than learning why solving problems has to become their purpose and pleasure :-)

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: tough-to-explain Python
kj, as Piet van Oostrum has said, that's the difference between mutable and immutable. It comes from the procedural nature of Python, and probably an explanation of this topic can't be avoided if you want to learn/teach Python. (A tiny example is at the end of this message.)

Lots of people think that a language where everything is immutable is simpler for newbies to understand (even if they have to learn what higher-order functions are). That's why some people think Scheme or other languages (even Clojure, I guess) are better for novices (Scheme has mutability, but you can avoid showing it to newbies). Some people say that languages that mostly encourage the use of immutable data structures (like F#, Clojure, Scala and many other less modern ones) help avoid bugs, maybe for novices too. On the other hand, it's hard or impossible to actually remove complexity from a system; you usually just move it elsewhere. So other things will become harder to do for those novices. I have no idea if for the average novice it's simpler to learn an immutables-based language instead of a mutables-based one (it may also be that some novices prefer the first group, and other novices prefer the second).

From what I have seen, lots of students seem able to learn Python, so it's not a bad choice. Python isn't perfect, and in *many* situations it is pragmatic, for example to increase its performance. Generally for a novice programmer running speed is not important, but it's important to have a really coherent and clean language. I've personally seen that for such people even Python looks very "dirty" (even if it's one of the less dirty ones). For example a novice wants 124 / 38 to return the fraction 62/19 and not 3 or 3.263157894736842 :-)

People may be able to invent a clean and orthogonal language that's easy to use and has very few compromises, fit for newbies. But this language may be very slow, not very useful in practice (who knows? Maybe there's a practical niche even for such a very high level language), and it doesn't teach how to use lower level languages like C :-)

Today I think there are no languages really fit for teaching. Python is one of the few fit ones, but it's getting more and more complex as time passes because it's getting used in more and more complex real-world situations (a language fit for newbies must not have abstract base classes, decorators, etc.). The D language is too complex for a newbie. Java is not too bad, but it requires you to write too much code, it's too fussy (semicolons at the end of lines? Newbies say the computer is an idiot when it refuses code just because of a missing useless semicolon!), and it misses some necessary things (first class functions! Damn). A nice language like Boo running on the Mono VM seems another option :-) In the past Pascal was good enough, but I think it's not good enough anymore. The problem is that teaching is a niche activity (even if a very important one). PLT Scheme is one of the few environments (besides Squeak, Python itself, and little more) that look refined and implemented well enough for such a purpose.

See you later,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
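The mutable/immutable difference in a tiny classroom example:

a = [1, 2]
b = a
b.append(3)     # lists are mutable: a and b are the same object,
                # so a is now [1, 2, 3] too

s = "ab"
t = s
t = t + "c"     # strings are immutable: this rebinds t to a new
                # string, and s is still "ab"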
Re: count
Vilya Harvey:
> from itertools import groupby
> for x, g in groupby([fields[1] for fields in data]):
>     print x, len(tuple(g))

Avoid that len(tuple(g)); use something like the following, it's lazy
and saves some memory.

def leniter(iterator):
    """leniter(iterator): return the length of a given
    iterator, consuming it, without creating a list.
    Never use it with infinite iterators.

    >>> leniter()
    Traceback (most recent call last):
    ...
    TypeError: leniter() takes exactly 1 argument (0 given)
    >>> leniter([])
    0
    >>> leniter([1])
    1
    >>> leniter(iter([1]))
    1
    >>> leniter(x for x in xrange(100) if x % 2)
    50
    >>> from itertools import groupby
    >>> [(leniter(g), h) for h, g in groupby("bccaad")]
    [(1, 'b'), (2, 'c'), (2, 'a'), (1, 'd')]
    >>> def foo0():
    ...     if False: yield 1
    >>> leniter(foo0())
    0
    >>> def foo1(): yield 1
    >>> leniter(foo1())
    1
    """
    # This code is faster than: sum(1 for _ in iterator)
    if hasattr(iterator, "__len__"):
        return len(iterator)
    nelements = 0
    for _ in iterator:
        nelements += 1
    return nelements

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: count
Paul Rubin:
> print x, sum(1 for _ in g)

Don't use that, use my function :-) If g has a __len__, you are
wasting time. And sum(1 ...) is (on my PC) slower.

J. Clifford Dyer:
> if __name__ == '__main__':
>     test_func(summer, 1000)
>     test_func(tupler, 1000)
>     test_func(summer, 1)
>     test_func(tupler, 1)

Have you forgotten my function?

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: Psyco 64 bits
Luis P. Mendes:
> It seems that Psyco cannot be used in such platforms.
> Or is there another version of Psyco for 64 bits platform?
> I googled and arrived to PyPy. But I don't know how to use it.

Psyco will be updated, but probably not for 64 bit CPUs. At the moment
PyPy doesn't give you much speed up. So you can try IronPython, but I
don't think it will help. There's also the Boo language, which is
often fast enough.

If your code has few hot spots, then there are many solutions that can
be used, for example Cython, Weave, ShedSkin. Or you can write
external code in C/C++/D/Fortran and interface with it through a
plethora of means, like ctypes, swig, PIL, Pyd, Boost.Python, Inline,
and many others (a tiny ctypes example is below). Running speed is a
big problem in many Python programs, so people have invented an army
of possible solutions.
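Here is how small the ctypes route can be (a sketch; it assumes a
Unix-like system where find_library can locate the C math library):

from ctypes import CDLL, c_double
from ctypes.util import find_library

libm = CDLL(find_library("m"))   # load the C math library
libm.sqrt.restype = c_double     # declare the C types involved
libm.sqrt.argtypes = [c_double]
print libm.sqrt(2.0)             # prints 1.41421356237

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list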
Re: can i write a assemly language programs in python
Simon Forman:
> Examine CorePy: http://www.corepy.org
>
> "CorePy is a complete system for developing machine-level programs in
> Python. CorePy lets developers build and execute assembly-level
> programs interactively from the Python command prompt, embed them
> directly in Python applications, or export them to standard assembly
> languages."

I have not used CorePy yet, but it's an awesome project ^_^

With some care and work even Python programs can get fast enough :-)

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: psyco V2 beta2 benchmark
larudwer, is that time_subdist subdist(i) result a significant
regression? Is it something to be fixed in Psyco2?

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: list of all possible values
David Gibb:
> For example: if my values are ['a', 'b', 'c'], then all possible lists
> of length 2 would be: aa, ab, ac, ba, bb, bc, ca, cb, cc.

>>> from itertools import product
>>> list(product("abc", repeat=2))
[('a', 'a'), ('a', 'b'), ('a', 'c'), ('b', 'a'), ('b', 'b'), ('b', 'c'),
 ('c', 'a'), ('c', 'b'), ('c', 'c')]
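And if you want them as the strings of your example:

>>> ["".join(p) for p in product("abc", repeat=2)]
['aa', 'ab', 'ac', 'ba', 'bb', 'bc', 'ca', 'cb', 'cc']

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list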
Re: A question of style .....
tuxagb:
> If I have to write an extension module with many objects, how can I
> organize the code?
> I think: every object in a separate file, and a last file with the
> PyInit_ function. But it is unmanageable.
>
> Solutions?

What do you think about using Cython?

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: turtle dump
Superchicken:
> is there a way to dump the content of a turtle window to a file or a file
> object?

A possible low-tech solution is to append the sequence of your
plotting commands to a list (maybe using a decorator, something like a
logging decorator), and then save all of them at the end into a text
file :-) See the sketch below.
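A minimal sketch of that logging-decorator idea (all the names here
are mine, and I wrap only the module-level turtle functions that
actually get used):

import functools
import turtle

command_log = []  # collects (name, args, kwargs) records

def logged(fn):
    """Record each call so the plot can be saved or replayed later."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        command_log.append((fn.__name__, args, kwargs))
        return fn(*args, **kwargs)
    return wrapper

forward = logged(turtle.forward)
left = logged(turtle.left)

forward(100)
left(90)
forward(50)

with open("plot_commands.txt", "w") as f:
    for name, args, kwargs in command_log:
        f.write("%s %r %r\n" % (name, args, kwargs))

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list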
Re: ANN: psyco V2
Very good, thank you. I'll try it when I can.

Is Psyco3 going to borrow/steal some ideas/code from Unladen Swallow?

The problem I have with Psyco1.6 is that you can't use the normal
profilers to know how many seconds of running time are taken by each
function/method of your code. Psyco1.6 has a profile() function, but I
haven't managed to use it well yet. Can you tell me how to get a
sorted list of the running times of all functions/methods of a
Psyco-digested program?

Bye and thank you,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: ANN: Shed Skin 0.2, an experimental (restricted) Python-to-C++ compiler
William Dode':
> I just tested it with a litle game, to find the places of horse on
> a board 5x5. The result is :
>
> c 5s
> gcj 7s
> java 7s
> shedskin 8s
> python + psyco 18s
> cython avec malloc *int 18s
> cython 55s avec [] python
> python 303s (5m3s)

Nice timings; can you please show me the Python, Java and C code
versions? I may do more tests. The purpose of all those "example
programs" in ShedSkin is to find bugs and to find details to speed up.

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: ANN: Shed Skin 0.2, an experimental (restricted) Python-to-C++ compiler
Skip Montanaro:
> I read just enough French to know that "avec" means "with", but I don't
> understand the difference between "avec malloc *int" and "avec []". Can you
> explain please?

Maybe it's the time difference between using a Python list from Cython
and using a C "array" allocated with a malloc from Cython.

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: ANN: Shed Skin 0.2, an experimental (restricted) Python-to-C++ compiler
Nick Craig-Wood:
> Can you give a hint as to how I debug this? I presume my program has
> some instances of non static types which is causing the problem, but
> it is going to be a very long debugging session if it takes me an hour
> each cycle ;-)
>
> The program is about 700 lines of python (excluding comments).

You can show us the Python (SSPython) code, and we can try to find the
problem. Sometimes there's no simple way to solve such problems.
Generally, for not very large programs, if ShedSkin doesn't compile in
about a minute or so then it has gone into an infinite loop (there's a
compilation flag that avoids some infinite loops, try it).

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: ANN: Shed Skin 0.2, an experimental (restricted) Python-to-C++ compiler
William Dode':
> http://hg.flibuste.net/libre/games/cheval
> Like you'll see, i tried to use exactly the same code for each langage.

It's a cute solver. A few more versions of mine:
#1, a Psyco version of mine: http://codepad.org/9m5rf7kX
#2, unrolled Psyco version: http://codepad.org/gKFLu34M
#3, a quick D (D1) version: http://codepad.org/Tk9FL7Xk

Timings (no printing), seconds, best of 3:
  #1: 4.79
  #2: 3.67
  #3: 1.10
  Your Psyco version: 13.37
  Your C version, compiled with GCC 4.3.2, -s -O3 -fomit-frame-pointer: 3.79

I have timed the whole running time of the programs, with an external
timer, so my timings include the startup of the Python VM and the
final cleanup of the GC. Please, feel free to time my code again, so
we can compare with your other timings (Java, etc.) better. I have
used Psyco 1.6 final and Python 2.6.2 on a Core2 CPU at 2 GHz. To
compile the D1 code you can use the free DMD compiler for D1; I have
used version 1.042 (www.digitalmars.com/d/download.html).

In such benchmarks it's often better to start from the fastest low
level code (written in an imperative language like C/D/C++, or a
version in a functional language like Clean or OCaml) and then use it
to create higher level versions (in Python, etc.). This gives you a
baseline timing, and usually the small differences of the higher level
code compared to the original C/Clean code don't invalidate the test.

I have changed only small things in the program; as you can see, the
Psyco program #2 is faster than your C code :-) If you learn how to
use it well, Psyco1.6 can often lead to very good performance. But
lots of people don't know how to use Psyco well (I too am ignorant: I
don't know how to profile Python programs yet).

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: ANN: Shed Skin 0.2, an experimental (restricted) Python-to-C++ compiler
greg:
> Posting benchmark times for Pyrex or Cython is pretty
> meaningless without showing the exact code that was
> used, since times can vary enormously depending on
> how much you C-ify things.

Was this link, shown by William, not enough?
http://hg.flibuste.net/libre/games/cheval/file/46797c3a5136/chevalx.pyx#l1

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: How to comment constant values?
Chris Rebert:
> Only modules, classes, and functions/methods can have docstrings
> associated with them.
> For anything else, you have to use comments; or you can mention them
> in the docstrings of related things.

What about adding docstrings to other Python things, especially to
variables?

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: ANN: Shed Skin 0.2, an experimental (restricted) Python-to-C++ compiler
William Dode':
> I updated the script (python, c and java) with your unrolled version
> + somes litle thinks.
[...]
> c 1.85s
> gcj 2.15s
> java 2.8s
> python2.5 + psyco 3.1s
> unladen-2009Q2 145s (2m45)
> python2.5 254s (4m14s)
> python3.1 300s (5m)
> ironpython1.1.1 680s (11m20)

Sorry for being late, I was away.

In your last C version this code is useless, because the C compiler is
able to perform such a simple optimization by itself (but probably
Python isn't able to, so if you want the code to be similar in all
versions it may be better to keep it):

  shift_0=shift[0]; shift_1=shift[1]; shift_2=shift[2]; shift_3=shift[3];
  shift_4=shift[4]; shift_5=shift[5]; shift_6=shift[6]; shift_7=shift[7];

This part in the Python code is useless:

  shift_0 = shift[0]
  shift_1 = shift[1]
  shift_2 = shift[2]
  shift_3 = shift[3]
  shift_4 = shift[4]
  shift_5 = shift[5]
  shift_6 = shift[6]
  shift_7 = shift[7]

Because later you copy the values locally anyway:

  def solve(nb, x, y,
            SIDE=SIDE, SQR_SIDE=SQR_SIDE, circuit=circuit,
            shift_0=shift_0, shift_1=shift_1, shift_2=shift_2,
            shift_3=shift_3, shift_4=shift_4, shift_5=shift_5,
            shift_6=shift_6, shift_7=shift_7,
            ):

So doing something like this is probably enough:

  def solve(nb, x, y,
            SIDE=SIDE, SQR_SIDE=SQR_SIDE, circuit=circuit,
            shift_0=shift[0], shift_1=shift[1], shift_2=shift[2],
            shift_3=shift[3], shift_4=shift[4], shift_5=shift[5],
            shift_6=shift[6], shift_7=shift[7],
            ):

In low-level languages like C, unrolling has to be done with care, to
avoid slowing down the code.

I have tried your latest C version using your compiler options: my
MinGW based on GCC 4.3.2 produces a crash at runtime. Compiled with
LLVM-GCC it runs in 1.31 seconds. The D version is a bit less
optimized than your last C versions, yet using DMD it runs in
1.08-1.10 seconds. Let's see if someone is able to write a C version
faster than that D code :-) Have you compiled/read my D version? In
the D version you may have missed that I used an extra trick: unsigned
integers, so it needs just two tests to see if a number is in the 0-5,
0-5 square :-)

Note that Pyd, the Python-D bridge, may not work with the latest DMD
version (it works if you use a slightly older DMD compiler):
http://pyd.dsource.org/

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: The longest word
On Jul 28, 10:26 am, NiklasRTZ wrote:
> Newbie hello onwards to the .py manual asks meantime how the longest
> word gets determined?
> First word, ok
> 'a aa aaa aa'[:'a aa aaa aa'.find(' ',1,10)]

Your language is not easy to understand, but I think you want to find
the longest word in a string. Say this is the input string:

  txt = "a aa aaa aa"

There are several ways to do it; I'll show a simple one. You can split
the string into its parts (Python doesn't have a built-in lazy split
yet, so you can split it all at once). You can do it with the string
split method. It produces a list of the words, more or less (you may
have words like "gone," with punctuation attached; taking care of them
too requires some extra code).

Once you have a list of words, you have to take the longest. A simple
way is to replace each word with a tuple that contains the length of
the word and the word itself, then pick the tuple with the highest
length value. But Python allows you to do such an operation in a
faster way: you can use the max() function; it has a "key" argument
that allows you to remap on the fly what you mean by "max". So you
just have to give it a function (reference) that, given a word,
returns its length, and such a function is "len" itself. (See the
snippet below.)

If you instead want to find the position of the longest word, the
program gets a little longer. Ask if you need something different.
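This is all it takes (plus one possible way to get the position too):

>>> txt = "a aa aaa aa"
>>> words = txt.split()
>>> max(words, key=len)
'aaa'
>>> max(enumerate(words), key=lambda (i, w): len(w))
(2, 'aaa')

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list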
Re: set variable to looping index?
Martin:
> for f in var1_fn, var2_fn, var3_fn:
>     if f.split('.')[0] == 'var1':
>         var1 = call_some_function(f)
> .
> .
> .
> etc

As usual I have big problems in understanding what you want to do.
Please, show an example of the contents of var1_fn, var2_fn, etc.

Generally you can create a dict:

  results = {}

And then fill it with name-result pairs:

  name = value.split('_')[0]
  results[name] = some_function(value)

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: Overlap in python
Jay Bird:
> I've been trying to figure out a simple algorithm on how to combine a
> list of parts that have 1D locations that overlap into a non-
> overlapping list. For example, here would be my input:
>
> part name   location
> a           5-9
> b           7-10
> c           3-6
> d           15-20
> e           18-23
>
> And here is what I need for an output:
> part name   location
> c.a.b       3-10
> d.e         15-23

That's sometimes known as "The Set Union Problem on Intervals".
Knowing the name of a problem is sometimes half the work :-) People
have written papers on this.

This is an O(N^2) brute-force algorithm, you can try it on your data:

data = """part name location
a 5-9
b 7-10
c 3-6
d 15-20
e 18-23"""

pairs = map(str.split, data.splitlines()[1:])
triples = [map(int, i.split("-")) + [name] for (name, i) in pairs]
overlap = lambda x1,y1, x2,y2: y1 >= x2 and x1 <= y2

results = []
for (x1, y1, name) in triples:
    for i, res in enumerate(results):
        if overlap(x1, y1, res[0], res[1]):
            results[i].append(name)
            results[i][0] = min(x1, res[0])
            results[i][1] = max(y1, res[1])
            break
    else:
        results.append([x1, y1, name])

print results
# Output: [[3, 10, 'a', 'b', 'c'], [15, 23, 'd', 'e']]

If you have a lot of data then it will be too slow, and you'll need a
more complex algorithm. If your intervals are many, they are small,
and they are spread over a not too large range of possible values, you
can create an integer array of indexes and "paint" the intervals on
it. (A sketch of this painting idea follows.)
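A minimal sketch of that painting idea (it assumes smallish
nonnegative integer coordinates; note that it also merges intervals
that merely touch):

points = [(5, 9), (7, 10), (3, 6), (15, 20), (18, 23)]
size = max(stop for (start, stop) in points)
painted = [False] * (size + 1)
for start, stop in points:
    for pos in xrange(start, stop + 1):
        painted[pos] = True

# Scan for runs of painted cells to recover the merged intervals.
merged = []
pos = 0
while pos <= size:
    if painted[pos]:
        start = pos
        while pos <= size and painted[pos]:
            pos += 1
        merged.append((start, pos - 1))
    else:
        pos += 1

print merged  # [(3, 10), (15, 23)]

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list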
Re: Overlap in python
Mark Dickinson:
> # create sorted list of points where an interval is entered or left
> transitions = []
> for name, (start, stop) in input:
>     transitions.extend([(start, 'Start', name), (stop, 'Stop', name)])
> transitions.sort()
>
> # scan list, accumulating blocks along the way.

Oh, right, that's the usual simple solution. Silly me for showing the
dumb quadratic one :-) For completeness, a full version of it is
below. The "pixelmap" approach may still be faster if you have tons of
small intervals in a not too large range.
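A hedged completion of Mark's sketch (the names beyond his are mine;
it keeps the parts in the order they are first seen, matching the
"c.a.b" output the OP asked for):

input_data = [("a", (5, 9)), ("b", (7, 10)), ("c", (3, 6)),
              ("d", (15, 20)), ("e", (18, 23))]

transitions = []
for name, (start, stop) in input_data:
    transitions.extend([(start, 'Start', name), (stop, 'Stop', name)])
transitions.sort()

blocks = []      # each block is [start, stop, names]
open_count = 0   # how many intervals are currently open
for point, kind, name in transitions:
    if kind == 'Start':
        if open_count == 0:
            blocks.append([point, point, []])
        open_count += 1
        blocks[-1][2].append(name)
    else:
        open_count -= 1
        blocks[-1][1] = point

for start, stop, names in blocks:
    print ".".join(names), "%d-%d" % (start, stop)
# Prints:
# c.a.b 3-10
# d.e 15-23

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list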
Re: Overlap in python
kj:
> # connect the nodes
> for i in range(len(parts) - 1):
>     part_i = parts[i]
>     for j in range(i + 1, len(parts)):

Note that's O(N^2); see the standard O(N log N) sorting-based solution
by Mark Dickinson.

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: Overlap in python
Albert van der Horst:
>That is an algorithmic question and has little to do with Python.<

Yes, but comp.lang.python isn't comp.lang.c; that kind of question is
perfectly fine here. It helps keep this place from becoming boring.

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: intricated functions: how to share a variable
TP:
> def tutu():
>
>     def toto():
>
>         print a
>         a = 4
>         print a
>
>     a = 2
>     toto()
>
> tutu()
>
> I obtain the following error:
> "UnboundLocalError: local variable 'a' referenced before assignment"
>
> This is because Python looks in the local context before looking in the
> global context.
>
> The use of "global a" in toto() does not help because global allows to force
> Python to look for the variable at the module level.
>
> So, how to share a variable between intricated functions?

Generally your purpose as a programmer is to write the least intricate
code, because the less tricky it is, the more readable it becomes, and
the bug count decreases too. So probably a better solution is to just
use the normal function semantics: you pass an argument to the
function and you take a return value back. Such a return value will be
the new version of the value you talk about.

Python 3 has the "nonlocal" statement that may do what you ask for
(see below). But like many other things in Python it must be used with
judgment, to avoid writing intricate code.
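A tiny Python 3 sketch of nonlocal applied to the example above:

def tutu():
    a = 2
    def toto():
        nonlocal a   # rebind tutu's "a", not a global, not a local
        print(a)     # 2
        a = 4
        print(a)     # 4
    toto()
    print(a)         # 4: toto() has really changed it

tutu()

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list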
Re: Parsing Binary Structures; Is there a better way / What is your way?
Martin P. Hellwig:
> On several occasions I have needed (and built) a parser that reads a
> binary piece of data with custom structure. For example (bogus one):
>
> BE
> +---------+---------+-------------+-------------+------+--------+
> | Version | Command | Instruction | Data Length | Data | Filler |
> +---------+---------+-------------+-------------+------+--------+
> Version: 6 bits
> Command: 4 bits
> Instruction: 5 bits
> Data Length: 5 bits
> Data: 0-31 bits
> Filler: filling 0 bits to make the packet dividable by 8

Have you tried Hachoir? (I think its name may have been changed to
Fusil, I don't know.) If you want to stay with plain Python, a
hand-made bit reader is small too; see the sketch below.
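A minimal sketch of such a bit reader (pure Python, big-endian bit
order as in the quoted layout; the packet bytes here are made up):

def bits(data):
    """Yield the bits of a byte string, most significant bit first."""
    for byte in data:
        b = ord(byte)
        for shift in range(7, -1, -1):
            yield (b >> shift) & 1

def take(bit_iter, n):
    """Read n bits from the bit iterator as an unsigned integer."""
    value = 0
    for _ in range(n):
        value = (value << 1) | next(bit_iter)
    return value

packet = "\xfa\xc8\x1d"  # hypothetical 3-byte packet
stream = bits(packet)
version = take(stream, 6)
command = take(stream, 4)
instruction = take(stream, 5)
data_length = take(stream, 5)
data = take(stream, data_length)
# the remaining bits are the filler

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list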
Re: ANN: Python for Bioinformatics book
Sebastian Bassi:
> All code is also available at http://py3.us/## where ## is the code
> number, for example: http://py3.us/57

The book looks interesting, but that doesn't look like a good way to
show/offer the code. I suggest also putting it into a zip that can be
downloaded.

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: Bug or feature: double strings as one
durumdara:
> I wanna ask that is a bug or is it a feature?

For me it's a bug-prone antifeature.

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: Need cleanup advice for multiline string
Robert Dailey:
> This breaks the flow of scope. Would you guys solve this
> problem by moving failMsg into global scope?
> Perhaps through some other type of syntax?

There are gals here too. This may help:
http://docs.python.org/library/textwrap.html#textwrap.dedent
A tiny example is below.
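A tiny example of dedent keeping a failMsg-style message local and
still readable in the source (the message text is made up):

import textwrap

def check_something():
    fail_msg = textwrap.dedent("""\
        Something has gone wrong.
        This second line is indented in the source,
        but not in the resulting string.""")
    return fail_msg

print check_something()

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list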
Re: Plotting Quadratic Functions, pygame
Senad Ibraimoski of Belgrade:
> Hello, I'm a new guy to this group, my professor recommended this group
> to me, because I was boring him with questions. I'm new to python, and
> I have problem plotting Quadratic Functions. Using module pygame.

Python is a tool you use to learn something else, but tools are
important: they also help shape the way you think. So tell your
teacher that questions about tools are important too.

If your purpose is to learn something, then using Pygame to plot
functions is OK. If your purpose is just to plot them, and you don't
have strange needs, then matplotlib can be better.

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: flatten a list of list
Chris Rebert:
> The OP asked for "simple", not "best", "most proper", or "fastest". My
> comment was intended to mean that the code was marginally *simpler*,
> not faster.

Yep, the OP has asked for simple code, but often that's not the right
way to solve this situation. A better way is to create (or copy) a
flatten() that's efficient and well tested & debugged, put it into
some library of utilities, and then import & use it when it's needed.
One possible such utility is sketched below.
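One possible flatten() for nested lists (a sketch; it's iterative on
an explicit stack, so deeply nested lists don't hit the recursion
limit):

def flatten(sequence):
    """flatten([1, [2, [3, 4]], 5]) -> [1, 2, 3, 4, 5]"""
    stack = [iter(sequence)]
    result = []
    while stack:
        for item in stack[-1]:
            if isinstance(item, list):
                stack.append(iter(item))
                break  # restart the while loop on the inner iterator
            result.append(item)
        else:
            stack.pop()  # this iterator is exhausted
    return result

print flatten([1, [2, [3, 4]], 5])  # [1, 2, 3, 4, 5]

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list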
Re: Annoying octal notation
MRAB:
> '_': what if in the future we want to allow them in numbers for clarity?

Hettinger says it's hard (it requires too many changes) to do that,
and Python programs don't contain big integer constants often enough,
so probably that improvement will not see the light. In the meantime I
have put a small bug in a Python program of mine, writing 100 instead
of 1000. Now in Python I write 10*1000*1000, because I try to learn
from my bugs. In D I enjoy writing 10_000_000.

---

Steven D'Aprano:
> In D, at least some people want to follow Python's lead and either drop
> support for oct literals completely, or require a 0o prefix:
> http://d.puremagic.com/issues/show_bug.cgi?id=2656

Yes, people in the D community are trying to improve things, but it's
a slow and painful process, and often it goes nowhere. There's a lot
of politics.

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: Graph library recommendations for large graphs
You may try the Python bindings for the Boost Graph Library; the graph
you talk about may fit in the 2GB of a 32 bit OS too. This is the
first link I have found (it's been a long time since I used those
graph bindings):
http://banyan.usc.edu/log/c_cpp/boost-graph-library-python-bindings

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: Algorithms as objects?
Chris Rebert:
> It sounds like you're describing the Strategy Pattern
> (http://en.wikipedia.org/wiki/Strategy_pattern).
>
> To have objects callable like functions, just implement the __call__
> special method in your class(es):

Please, can't you just use a bit of functional-style programming, like
creating nested functions on the fly? A sketch of what I mean is
below.
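A generic sketch (the domain and the names are made up): a closure is
often enough where a whole Strategy class would otherwise be written.

def make_tax_strategy(rate):
    def apply_tax(amount):
        return amount * (1 + rate)
    return apply_tax

strategies = {"US": make_tax_strategy(0.07),
              "IT": make_tax_strategy(0.22)}
print strategies["IT"](100.0)  # 122.0

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list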
Re: Can't set attribute?!
dolgion ch:
>good to know about this __slots__ thing, i'll keep it in mind<

What you have to keep in mind is that it's better not to use __slots__
unless you really need it. Generally you don't need it for the
purposes a person coming from a static language is tempted to use it
for. It's mostly useful for small classes that don't need subclassing
and have to be instantiated a large number of times. As far as I know,
it's meant as a memory optimization, not as a way to define things in
a more explicit way.

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: The future of Python immutability
John Nagle:
> The concept here is that objects have an "owner", which is either
> a thread or some synchronized object. Locking is at the "owner"
> level. This is simple until "ownership" needs to be transferred.
> Can this be made to work in a Pythonic way, without explicit
> syntax?
>
> What we want to happen, somehow, is to
> transfer the ownership of "words" from the calling thread to the object
> in "putitem", and transfer it to the calling thread in "getitem".
> How can this be done?

Several people have found ways to implement such owning of variables,
for example Bartosz Milewski:
http://bartoszmilewski.wordpress.com/2009/06/02/race-free-multithreading-ownership/

It requires some extra complexity, so statically typed languages
usually don't implement this idea, even if it avoids bugs in user
code. Implementing it in Python in a simple enough way looks like a
good topic for advanced research :-)

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Vectorized laziness inside
I've just seen this good Google Talk video from Jan 2008, "MonetDB/
X100: a (very) fast column-store", about a more efficient
column-oriented DBMS:
http://www.youtube.com/watch?v=yrLd-3lnZ58

The efficiency of this DBMS (something like 50 times faster) comes
from a few things:

- It's column-wise; this helps several other successive optimizations
too (row-oriented DBMSs still have their purpose, the column-oriented
ones are good if you want to perform statistics on most of your data,
certain kinds of data mining, etc.).

- At 13:57 it shows that instead of yielding single tuples (or single
items, it's column-oriented), it yields arrays of about 100
tuples/fields. This allows the creation of primitives that are much
more efficient, and the CPU can process them better, using SSE
instructions too. Such small arrays are designed to fit in the CPU
cache (probably L2). Such vectorized operations are also pipelined in
some way.

- The filtering operations often don't produce new vectors; they just
mark the tuples as no longer present inside an array. This helps avoid
many copies of such arrays.

- Data is kept compressed on disk, the compression is column-wise, and
decompression is done only just-in-time, to save data transfers. The
number compression takes into account that data is often sorted, so
it's delta-compressed, and then such deltas are compressed only in a
range of the Gaussian-like residual distribution (outliers are encoded
directly). This compression also allows large indexes to be kept in
RAM, which speeds things up more.

- They even show vectorized hashing, but I have not understood how it
works.

- The reading from the disks is done in a merged way, to avoid reading
the same things many times for similar queries.

(The good thing is that it's not hard to understand most things shown
in this video. But I'd like to see the C code they use as "reference",
which is just 3 times faster than their DBMS.)

A DBMS's insides work like the lazy operations that are getting more
common in Python code (see itertools), and are common in Haskell. So
to reduce the Python interpreter overhead of lazy iterations, a
vectorized strategy may be used (meant to be almost transparent for
the Python programmer, so this is an implementation thing), so that
items can be produced and moved in groups, inside arrays, like in that
video. (The DBMS too has an interpreter overhead; it's made of fast
primitives used by lazy interpreted code, so it looks a lot like a
Python/Ruby :-) ). Besides CPython, this idea can be useful for
Psyco/Unladen Swallow too. A toy example of the idea is below.

Lazy filtering is a bit less common in Python code, so I don't know if
the idea of not copying new arrays but just marking items as absent
(for example with a bit array attached to the items array) can be
useful too.

As you may know I have introduced laziness in the D language; D2 now
has a standard library that contains several of the things present in
the itertools module. I'll do some experiments to see if the ideas of
such vectorized laziness can improve lazy generators in D. I may even
test the idea of keeping arrays with holes.
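A toy Python sketch of the chunked ("vectorized") iteration idea, just
to show its shape (the names are mine):

from itertools import islice

def chunked(iterable, size=100):
    """Yield lists of up to `size` items instead of single items."""
    it = iter(iterable)
    while True:
        block = list(islice(it, size))
        if not block:
            break
        yield block

def chunked_filter(pred, blocks):
    """A filter primitive that works block by block."""
    for block in blocks:
        yield [x for x in block if pred(x)]

total = 0
for block in chunked_filter(lambda x: x % 2, chunked(xrange(1000000))):
    total += sum(block)
print total  # 250000000000, the sum of the odd numbers below a million

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list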
Re: Creating a local variable scope.
Steven D'Aprano:
> (3) Create an inner function, then call that.

Several people have given this answer.

> My personal opinion is that if you really need a local scope inside a
> function, the function is doing too much and should be split up.

I agree. And a way to split a function is to define an inner function;
that's one of their main purposes. No need to add other things to the
language as the OP suggests.

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: Google Code Jam language usage
[...]
        for j in xrange(1, len_b):
            d[i][j] += d[i - 1][j]
            if b[j] == a[i]:
                d[i][j] += d[i - 1][j - 1]
            d[i][j] %= 10000
    print "Case #%d: %04d" % (t+1, d[len(a) - 1][len_b - 1])

import psyco; psyco.full()
main()

(There are ways to speed up this Python code; I have not tried to use
a 1D matrix with shifts to find the right start of the rows as in C,
and often in such dynamic programming algorithms you can just keep 2
rows to avoid storing the whole dynamic matrix. This saves memory and
speeds up the code because there are fewer cache misses. Using Psyco
with array.array the CPU cache misses can be felt; in normal Python
code they are almost invisible, lost in the noise.)

In Python I'd like to have a built-in list&array method to assign all
items to a fixed value:

a1 = [1, 2, 3]
a2 = array.array('l', [0]) * 5
a1.set(0)
a2.set(0)

That equals the following, but is faster:

for i in xrange(len(a1)):
    a1[i] = 0
for i in xrange(len(a2)):
    a2[i] = 0

I'd also like to have 64 bit signed/unsigned integers in array.array.

Such online contests usually ignore compilation times, but in
languages that have ways to perform computations at compile time (C++
with its templates, and even more D, which has better templates and
CTFE, http://en.wikipedia.org/wiki/Compile_time_function_execution )
such times can become important, because some precomputations can be
moved to compile time. If compilation time counted too, then in such
contests Python would improve its stats (a slow compilation time can
be bad for the testing of the code, so it's a disadvantage for the
programmer too).

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: Google Code Jam language usage
> (There are ways to speed up this Python code; I have not tried to use
> a 1D matrix with shifts to find the right start of the rows as in C,
> and often in such dynamic programming algorithms you can just keep 2
> rows to avoid storing the whole dynamic matrix. This saves memory and
> speeds up the code because there are fewer cache misses.

It was easy to do; running time about 0.13 seconds:

from array import array

def main():
    b = "welcome to code jam"
    len_b = len(b)
    di = array('l', [0]) * len_b
    dim1 = array('l', [0]) * len_b

    for t in xrange(int(raw_input())):
        a = raw_input()

        for j in xrange(len_b): # set to 0
            di[j] = 0
            dim1[j] = 0

        if a[0] == b[0]:
            dim1[0] = 1

        for i in xrange(1, len(a)):
            for j in xrange(len_b): # set to 0
                di[j] = 0

            di[0] += dim1[0]
            if b[0] == a[i]:
                di[0] += 1

            for j in xrange(1, len_b):
                di[j] += dim1[j]
                if b[j] == a[i]:
                    di[j] += dim1[j - 1]
                di[j] %= 10000

            di, dim1 = dim1, di

        print "Case #%d: %04d" % (t+1, dim1[len_b - 1])

import psyco; psyco.full()
main()

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Static typing, Python, D, DbC
I write some Python code almost every day, but lately I am also using
the D language a lot. After using D for about three years I now know
some of it, but when I need to write short (< about 1000 lines)
*correct* programs, Python is still the more productive for me.

Static typing and the transitive const of D are a strange thing. It
seems obvious that they lead to safer code, but on the other hand I
have experimentally seen that my short Python programs are less buggy
than equivalent D ones. Static typing looks safer, but D offers many
low level features, and generally it contains many more traps or ways
to shoot your own foot; they more than compensate for the "lack of
safety" coming from Python's dynamic typing, even when you don't use
those low level features. Maybe for large (> about 100_000 lines)
programs D may come out to be less bug-prone than Python (thanks to
the better data hiding and static typing), but I am not sure of this
at all...

Static typing also has a significant cost. When you write D2 code,
often something doesn't work because of some transitive const or
immutable (or just because of the large number of bugs that still need
to be fixed in the D compiler). So here you pay some cost as debugging
time (or time to avoid those problems). And there is a mental cost
too, because you need to keep part of your attention on those
const-related things instead of on your algorithms, etc.

Lately, while I program in Python, one of the D features I miss most
is a built-in Design by Contract (see PEP 316), because it avoids (or
helps me quickly find and fix) many bugs. In my opinion DbC also works
very well together with doctests. You may implement a poor man's DbC
in Python like this:

class Foo:
    def _invariant(self):
        assert ...
        assert ...
        return True

    def bar(self, ...):
        assert self._invariant()
        ...
        res = ...
        assert self._invariant()
        return res

But this misses several useful features of DbC. (A slightly less
manual variant is sketched below.)

From the D standard library I have also appreciated a lazy string
split (something like a str.xsplit()). In some situations it reduces
memory waste and increases code performance.

I have installed this, on a Windows Vista OS:
http://www.python.org/ftp/python/2.7/python-2.7.msi
But I have had two problems: the 'random' module was empty, and it
didn't import the new division from the __future__, so I've had to
remove it and reinstall 2.6.6. Is this just a problem of mine?
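The poor man's DbC above can be made a bit less manual with a
decorator that checks the invariant around every public method (a
sketch, with a made-up example class):

import functools

def checked(method):
    """Check the class invariant before and after the method call."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        assert self._invariant()
        result = method(self, *args, **kwargs)
        assert self._invariant()
        return result
    return wrapper

class Account(object):
    def __init__(self, balance=0):
        self.balance = balance

    def _invariant(self):
        assert self.balance >= 0, "balance can't go negative"
        return True

    @checked
    def withdraw(self, amount):
        self.balance -= amount
        return self.balance

acc = Account(10)
print acc.withdraw(3)   # 7
# acc.withdraw(20)      # this one fails the invariant check

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list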
Re: Static typing, Python, D, DbC
John Nagle:
> Design by contract really isn't a good fit to Python.

I have sometimes used the manually coded class invariant, as I have
shown, and it has saved me from some bugs, so I have experimentally
seen you are wrong.

> I've done proof of correctness work, and there are suitable languages
> for it. It needs a language where global static analysis is
> possible, so you can reliably tell what can change what.

I see DbC for Python as a way to avoid or fix some of the bugs of the
program, not as a way to perform a proof of correctness of the code.
Even if you can't be certain, you are able to reduce the probability
of some bugs happening.

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: Static typing, Python, D, DbC
Paul Rubin:
> I think DbC as envisioned by the Eiffel guy who coined (and trademarked)
> the term is that it's a static verification technique, marketing-speak
> for annotating subroutines with pre- and post- conditions that can be
> checked with Hoare logic. Runtime checks wouldn't qualify as that.

The implementations of DbC in D and C# run their tests at run-time
(though in theory an external tool may find a way to perform part of
those tests at compile-time). A full implementation of DbC contains
several other things beside preconditions and postconditions, see
http://www.python.org/dev/peps/pep-0316/ (it misses a few things, like
loop invariants and loop variants).

For me DbC is almost as useful as unit testing (I use them at the same
time); this is why in the original post I said it's one of the things
I miss most in Python.

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: Static typing, Python, D, DbC
John Nagle:
> All signed Windows 7 drivers must pass Microsoft's static checker,
> which checks that they don't have bad pointers and call all the
> driver APIs correctly.

I didn't know this, thank you. I'll read more about this topic.

Eric S. Johansson:
> If you are careless, assertions can build up to a
> significant percentage of the code base and slow execution. The argument for
> dealing with this, last time I looked, was that you turn off assertions when
> your code is "running". This logic is flawed because bugs exist and will assert
> themselves at the worst possible time which usually means after you turned off
> the assertions.

These are real problems. In some situations the code is "fast enough"
even with those runtime tests, so in those situations it's better to
keep them active even in release builds or when you use the program
(this is often true for small script-like programs). But in some
situations you want a faster final program. Running the run-time
contracts only when you test the code, and removing them when you run
the program for real, sounds silly or dangerous. But life is made of
trade-offs, and running those contracts during testing is better than
_never_ running them. In practice, if your tests are done well enough
(they are similar to the real usage of the program), the contracts are
able to catch most of the bugs anyway, before the release build.

The run-time contracts may slow down the code, but there are a few
ways to reduce this problem. You have to write your contracts taking
care not to change the computational complexity of the routine/method
they guard (so you usually don't want to perform an O(n) test for a
routine with O(n ln n) worst-case complexity). You run your tests
often, so if some tests are too slow you see it soon and you can fix
the problem (the same is true for slow unit tests). In D there are
several ways to "layer" the work you perform at runtime; there is not
just the release build and the test build. You have the version and
debug statements, and you may use them inside DbC contracts too. So in
normal test builds I use faster contracts, but if I spot a bug I lower
the debug level and recompile the D code, activating heavier tests, to
better spot where the bug is.

One good thing of DbC is that when the "density" of contracts in your
programs gets above some threshold, most of the bugs present in the
code show up as failed assertions/contracts. To me it seems related to
percolation theory :-)
(http://en.wikipedia.org/wiki/Percolation_threshold ).

In D you may also use enforce(), which is essentially a camouflaged
throw/raise statement. If you use enforce() outside DbC contracts, you
get tests that run even in release builds, when your contracts are
disabled.

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
Re: palindrome iteration
[...]
    >>> is_palindrome("aibohphobia" * 1000)
    True
    >>> is_palindrome(list("aibohphobia"))
    True
    >>> is_palindrome([1, 2, 3])
    Traceback (most recent call last):
      ...
    AttributeError: 'int' object has no attribute 'lower'
    """
    n = len(text)
    i = 0
    j = n - 1
    while i < j:
        assert _is_palindrome_loop_invariant(i, j, n)
        if text[i].lower() != text[j].lower():
            return False
        else:
            i += 1
            j -= 1
    return True

if __name__ == "__main__":
    import doctest
    doctest.testmod()
    print "Doctests done."

is_palindrome([1, 2, 3]) fails not because the input is a list, but
because integers lack the lower() method. It's useful to add a few
failing doctests too, to show the limits of the duck typing of the
function. As you see I have tried some corner cases: empty string,
single char, two chars, three chars, and more. I have also added a
'stress test' with a larger input.

In Python this code is not efficient because of the interpreter
overhead and because I call a lower() method on each char (a string of
length 1), so this is not good Python code. But it's easy to translate
to a lower level language, and I think Psyco does a good enough job on
it.

In the C language (plus stdbool) the tolower() is more efficient, but
strlen() wastes some time. Often it's better to keep the length of
your strings around too, for example using a "fat pointer", as in D.

// C code
//#define NDEBUG
#include "assert.h"
#include "stdio.h"
#include "string.h"
#include "stdbool.h"
#include "ctype.h"
#include "stdlib.h"

bool is_palindrome_loop_invariant(int i, int j, int length) {
    assert(i >= 0);
    assert(i < length);
    assert(j >= 0);
    assert(j < length);
    assert(i <= j);
    assert(i == length - 1 - j);
    return true;
}

bool is_palindrome(const char *text) {
    const int n = strlen(text);
    int i = 0;
    int j = n - 1;
    while (i < j) {
        assert(is_palindrome_loop_invariant(i, j, n));
        if (tolower(text[i]) != tolower(text[j])) {
            return false;
        } else {
            i++;
            j--;
        }
    }
    return true;
}

void is_palindrome_test() {
    assert(is_palindrome(""));
    assert(is_palindrome("1"));
    assert(is_palindrome("aa"));
    assert(!is_palindrome("ab"));
    assert(!is_palindrome("abc"));
    assert(is_palindrome("aba"));
    assert(is_palindrome("abA"));
    assert(is_palindrome("AbA"));
    assert(is_palindrome("ABA"));
    assert(is_palindrome("aibohphobia"));
    assert(!is_palindrome("ab"));

    const int n = 1000;
    const char *s = "aibohphobia";
    const int len_s = strlen(s);
    char *text = malloc(n * len_s + 1);
    if (text == NULL)
        exit(EXIT_FAILURE);
    int i;
    char *p = text;
    for (i = 0; i < n; i++) {
        memcpy(p, s, len_s);
        p += len_s;
    }
    text[n * len_s] = '\0';
    // puts(text);
    assert(is_palindrome(text));
}

int main() {
    is_palindrome_test();
    return EXIT_SUCCESS;
}

C experts (or good C lints) are probably able to find some
loss-of-precision or loss-of-sign errors in the following lines:

    const int n = strlen(text);
    const int len_s = strlen(s);
    char *text = malloc(n * len_s + 1);
    memcpy(p, s, len_s);

Writing perfect C code isn't an easy art. (I have used signed values
because writing correct C code with unsigned values is a pain, even
though they avoid the loss of precision or sign.)
Compiling the C code with GCC 4.5.1, with assertions disabled (NDEBUG
defined) and:
  -O3 -Wall -fomit-frame-pointer -S

The loop of is_palindrome() gets compiled to (x86, 32 bit):

L10:
    addl    $1, %esi
    subl    $1, %ebx
    cmpl    %ebx, %esi
    jge     L9
L4:
    movsbl  (%edi,%esi), %eax
    movl    %eax, (%esp)
    call    _tolower
    movl    %eax, %ebp
    movsbl  (%edi,%ebx), %eax
    movl    %eax, (%esp)
    call    _tolower
    cmpl    %eax, %ebp
    je      L10

Those two calls to _tolower inside the loop don't help performance,
but if you know your strings contain only upper case and lower case
chars, and you are able to assume a few other things about your
machine, it's easy to replace them with non-portable inlined code.

The addl and subl are the i++ and j--; GCC has moved them to the top,
a commonly done optimization. The first cmpl is the (i < j) test, and
the jge is the jump that ends the loop at the right time. The movsbl
instructions read one char each and put it in eax. The movl
instructions set up the argument for the call to the tolower function.
The last cmpl compares the results of the two tolower calls, and je
(jump if equal) jumps back to the start of the loop if they are equal.

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list