Re: Standard Deviation One-liner

2011-06-03 Thread Raymond Hettinger
On Jun 3, 10:55 am, Billy Mays  wrote:
> I'm trying to shorten a one-liner I have for calculating the standard
> deviation of a list of numbers.  I have something so far, but I was
> wondering if it could be made any shorter (without imports).
>
> Here's my function:
>
> a=lambda d:(sum((x-1.*sum(d)/len(d))**2 for x in d)/(1.*(len(d)-1)))**.5
>
> The functions is invoked as follows:
>
>  >>> a([1,2,3,4])
> 1.2909944487358056

Besides trying to do it in one line, it is also interesting to write a
one-pass version with incremental results:

  http://mathcentral.uregina.ca/QQ/database/QQ.09.06/h/murtaza2.html

Another interesting avenue is to aim for the highest possible accuracy.
Consider using math.fsum() to avoid rounding errors in the summation
of large numbers of nearly equal values.
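For concreteness, here are sketches of both ideas: a two-pass version built on math.fsum() and a one-pass incremental (Welford-style) version. The function names are mine, not from the thread:

```python
import math

def stdev_fsum(data):
    # Two passes: an accurate mean via math.fsum(), then an accurate
    # sum of squared deviations, again via math.fsum().
    n = len(data)
    mean = math.fsum(data) / n
    return (math.fsum((x - mean) ** 2 for x in data) / (n - 1)) ** 0.5

def stdev_incremental(data):
    # One pass (Welford's update): maintain a running mean and a running
    # sum of squared deviations from the current mean.
    mean = m2 = 0.0
    k = 0
    for k, x in enumerate(data, start=1):
        delta = x - mean
        mean += delta / k
        m2 += delta * (x - mean)
    return (m2 / (k - 1)) ** 0.5
```

Both give the sample standard deviation the original lambda computes, but without summing the data once per element.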


Raymond

-
follow my python tips on twitter: @raymondh


-- 
http://mail.python.org/mailman/listinfo/python-list


Bloom Filter in 22 lines of Python (updated)

2011-06-03 Thread Raymond Hettinger
Thanks for all the feedback on the earlier post.

I've updated the recipe with a cleaner API, simpler code,
easier subclassing, and optional optimizations
for better cache utilization and speed:

 http://code.activestate.com/recipes/577684-bloom-filter/


Raymond

--
follow my python tips and recipes on twitter: @raymondh


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Bloom Filter in 22 lines of Python (updated)

2011-06-08 Thread Raymond Hettinger
On Jun 6, 10:47 am, geremy condra  wrote:
> On Fri, Jun 3, 2011 at 1:17 PM, Raymond Hettinger  wrote:
> > Thanks for all the feedback on the earlier post.
>
> > I've updated the recipe to use a cleaner API, simpler code,
> > more easily subclassable, and with optional optimizations
> > for better cache utilization and speed:
>
> >  http://code.activestate.com/recipes/577684-bloom-filter/
>
> Any chance of this landing in collections?

I would be happy to add something like this to collections
once the API matures.

Right now, the constructor expects the user to know the desired
memory size and number of probes.  Those could be computed
from the maximum number of unique keys and the tolerable
false positive rate.

The most convenient constructor could be:
b = BloomFilter(myseq)
where len(myseq) is presumed to indicate the maximum number of
unique entries and there is a default tolerable false positive
rate of 1 in 10,000.
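The sizing math here is standard Bloom-filter theory rather than anything from the recipe itself; a sketch of how such a constructor could derive its parameters:

```python
import math

def bloom_parameters(num_keys, error_rate=1.0 / 10000):
    # Standard formulas: bits m = -n*ln(p) / (ln 2)**2 and
    # probes k = (m/n) * ln 2, for n keys and false-positive rate p.
    num_bits = math.ceil(-num_keys * math.log(error_rate) / math.log(2) ** 2)
    num_probes = round(num_bits / num_keys * math.log(2))
    return num_bits, num_probes
```

For 1000 expected keys at the default 1-in-10,000 rate, this works out to roughly 19,200 bits (about 2.4 KB) and 13 probes.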

The API already lets the user substitute different methods
for get_probes().  It should also allow the bytearray to be
easily replaced by other objects with indexable access
(array objects, mmap objects, etc).

We could also provide union, intersection, and difference
operations but these would be a little awkward for general
purpose use because the result is only meaningful when
the sizes, probe functions, and input domain are all the same.

Fortunately, there is a lot of prior art, so the API design
can be informed by what has worked well for others.


Raymond

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Is there any advantage or disadvantage to using sets over list comps to ensure a list of unique entries?

2011-06-25 Thread Raymond Hettinger
On Jun 20, 9:43 pm, deathweaselx86  wrote:
> Howdy guys, I am new.
>
> I've been converting lists to sets, then back to lists again to get
> unique lists.
> e.g
>
> Python 2.5.2 (r252:60911, Jan 20 2010, 21:48:48)
> [GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> foo = ['1','2','3']
> >>> bar = ['2','5']
> >>> foo.extend(bar)
> >>> foo = list(set(foo))

That works great and is very clear.

If you want to also preserve order, use an OrderedDict:

>>> list(OrderedDict.fromkeys('abracadabra'))
['a', 'b', 'r', 'c', 'd']


Raymond
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Dictionaries and incrementing keys

2011-06-25 Thread Raymond Hettinger
On Jun 14, 12:57 pm, Steve Crook  wrote:
> Today I spotted an alternative:
>
> dict[key] = dict.get(key, 0) + 1
>
> Whilst certainly more compact, I'd be interested in views on how
> pythonesque this method is.

It is very pythonesque in that it was the traditional way to do it
(and also one of the fastest ways).

Now we have collections.Counter which simplifies the code to:

   c = Counter()
   ...
   c[key] += 1

For existing keys, it is as fast as a regular dictionary (because it
is a dict subclass and it does not override or extend either
__getitem__ or __setitem__).  For new keys, it is a little slower
because it calls the __missing__ method which returns zero.
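A quick illustration of both behaviors -- new keys default to zero via __missing__, while existing keys update exactly like a plain dict:

```python
from collections import Counter

c = Counter()
for letter in 'abracadabra':
    c[letter] += 1          # missing keys start at 0, then increment

# looking up an absent key also yields 0 instead of raising KeyError
assert c['a'] == 5 and c['z'] == 0
```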

Raymond

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: blist question

2011-07-07 Thread Raymond Hettinger
On Jul 7, 5:08 am, dmitrey  wrote:
> hi all,
> I feel lack of native Python lists operations (e.g. taking N greatest
> elements with the involved key function and O(n) speed)

Take a look at heapq.nlargest()...
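For example, taking the N greatest elements with a key function (strictly speaking this runs in O(n log N) rather than O(n), which amounts to the same thing when N is small):

```python
import heapq

scores = [('alice', 31), ('bob', 17), ('carol', 45), ('dave', 29)]

# the two records with the greatest second field, best first
top_two = heapq.nlargest(2, scores, key=lambda rec: rec[1])
```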


> and
> occasionally found blist: http://pypi.python.org/pypi/blist/
> Its entry says it is better than Python list.

It should say:  better in some respects and worse in others.

Do you honestly think that python's list implementation
as a simple array of pointers can be beaten in every
category of performance?


> Did anyone ensure?
> Will it ever be merged into Python source code?

It was rejected as a replacement for the existing list implementation.
There is some chance it will get accepted into the collections module
someday.


Raymond
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: a little parsing challenge ☺

2011-07-17 Thread Raymond Hettinger
On Jul 17, 12:47 am, Xah Lee  wrote:
> i hope you'll participate. Just post solution here. Thanks.

http://pastebin.com/7hU20NNL


Raymond
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: a little parsing challenge ☺

2011-07-17 Thread Raymond Hettinger
On Jul 17, 7:15 am, Thomas 'PointedEars' Lahn 
wrote:
> Did you notice the excessive crosspost?  Please do not feed the troll.

IMO, this was a legitimate cross post since it is for a multi-language
programming challenge and everyone can learn from comparing the
results.


Raymond
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: a little parsing challenge ☺

2011-07-17 Thread Raymond Hettinger
On Jul 17, 8:49 am, Thomas Boell  wrote:
> But why do you enumerate with start=1? Shouldn't you start with index 0?

The problem specification says that the char number should match
the emacs goto-char function, which is indexed from one, not from
zero.  This is testable by taking the output of the program and
running it through emacs to see that the cursor gets moved exactly to
the location of the mismatched delimiter.
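A minimal sketch of the approach (my own illustration, not the pastebin solution itself): scan with a stack and report 1-based positions so they line up with goto-char:

```python
def check_delimiters(text, pairs={'(': ')', '[': ']', '{': '}'}):
    # Returns the 1-based position of the first mismatched delimiter,
    # or None if all delimiters balance.
    closers = set(pairs.values())
    stack = []                                  # (position, opener) pairs
    for pos, ch in enumerate(text, start=1):    # 1-based, like goto-char
        if ch in pairs:
            stack.append((pos, ch))
        elif ch in closers:
            if not stack or pairs[stack.pop()[1]] != ch:
                return pos                      # mismatched closer
    return stack[-1][0] if stack else None      # unclosed opener, if any
```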


Raymond

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: list(), tuple() should not place at "Built-in functions" in documentation

2011-07-19 Thread Raymond Hettinger
On Jul 14, 6:21 pm, Inside  wrote:
> As the subject line says: because "list" and "tuple" aren't functions,
> they are types.  Is that right?

list() and tuple() are in the right place in the documentation because
they would be harder to find if listed elsewhere.   Tools like str(),
int(), list(), tuple() etc tend to be used like functions, so the
current location in the docs is where they have been for years.

A simple fact of documentation is that tools don't always fall
cleanly into distinct categories.  Accordingly, the Library Reference
includes str,int,list,tuple, etc in both Section 2 for builtin
functions and Section 5 for builtin types.

'nuff said,


Raymond
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Guido rethinking removal of cmp from sort method

2011-03-26 Thread Raymond Hettinger
On Mar 26, 4:39 am, Mark Dickinson  wrote:
> On Mar 25, 2:00 pm, Stefan Behnel  wrote:
> > Westley Martínez, 25.03.2011 14:39:
>
> > > On Fri, 2011-03-25 at 07:11 +0100, Stefan Behnel wrote:
> > >> Steven D'Aprano, 25.03.2011 06:46:
> > >>> On Thu, 24 Mar 2011 18:32:11 -0700, Carl Banks wrote:
>
> >  It's probably the least justified builtin other than pow.
>
> > >>> I don't know about that. Correctly, efficiently and *quickly*
> > >>> implementing the three-argument version of pow is exactly the sort of
> > >>> thing that should be in the built-ins, or at least the standard library.
>
> > >> I think that touches it already. We have a "math" module, so why is 
> > >> there a
> > >> *builtin* for doing special math? How much code is there really that only
> > >> uses pow() and does not import "math"?
>
> > > pow() and math.pow() are different.
>
> > I don't find that a good excuse for pow() being a builtin. Why not just
> > rename it to "math.powmod" instead?
>
> +1 (modulo bikeshedding about the name).  I've long thought that three-
> argument pow belongs in the standard library rather than the core.
> And then the existence of the ** operator obviates the need for two-
> argument pow.

IMO, there is far too much willingness to churn long-standing APIs for
nearly zero benefit.  In my decade as a core Python developer, I've
come to view deprecations as things that cause unnecessary pain for
users and are partially to blame for so many organizations staying
with older, unmaintained versions of Python.

When it comes to moving pow() to another module or changing the
signature for the __pow__ magic method, there's not much of a win to
be had, but it will affect tons of existing code (including examples
in printed books where pow() is commonly used in examples).

I think we (as a group) need to develop a strong aversion to breaking
existing APIs and instead focus on making existing APIs more robust or
giving them useful extensions.

The Python language is not like a personal library where changing APIs
has limited (and predictable) costs.

Raymond
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Understanding the Pystone measurement

2011-03-26 Thread Raymond Hettinger
On Mar 24, 9:21 pm, "tleeuwenb...@gmail.com" 
wrote:
> Hi there,
>
> Is there a good writeup of what the pystone measurement actually
> means? I'm working on benchmarking of some Python code at work, and
> I'm interested in how Pystone might be relevant to me. I've tried
> googling, but I can't find any introductory / definitional
> descriptions of what this module/measurement means.

Pystone provides an estimate of the relative performance of different
pieces of hardware running single threaded Python apps.

If your old laptop scores 12000 pystones and your new shiny macbook
scores 84000, then you can expect your non-i/o bound single threaded
python apps to run about seven times faster on the new machine.


Raymond
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Understanding the Pystone measurement

2011-03-26 Thread Raymond Hettinger
I forgot to mention that PyStone is just a Python translation from C
of the venerable Dhrystone benchmark.  See 
http://en.wikipedia.org/wiki/Dhrystone
for a short write-up on the history, purposes, and limitations of the
benchmark.

Raymond
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: FW: ciao

2011-03-26 Thread Raymond Hettinger
On Mar 26, 11:34 am, MRAB  wrote:
> On 26/03/2011 18:07, bledar seferi wrote:
>
> >     3. Write a function that takes a list of integers as an argument
> >     and returns a set containing those numbers that occur 2 or more
> >     times in the given list.  For example, for the input
> >     list=[1,2,3,3,4,4,5,6,7,7,7] the function will return set([3,4,7]).
>
> >     Can you help me write this program in Python?
>
> >     Thanks
>
> I'd use a defaultdict to count them.

Or use itertools.groupby() to group consecutive values, count the
length of each group, and emit the value whenever the count is more
than one:

>>> from itertools import groupby
>>> s = [1,2,3,3,4,4,5,6,7,7,7]
>>> [k for k, g in groupby(s) if len(list(g)) > 1]
[3, 4, 7]


Raymond
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Guido rethinking removal of cmp from sort method

2011-03-28 Thread Raymond Hettinger
On Mar 28, 8:43 am, Steven D'Aprano  wrote:
> Thank you for spending the time to get some hard data, but I can't
> replicate your results since you haven't shown your code. Rather than
> attempt to guess what you did and duplicate it, I instead came up with my
> own timing measurements. Results are shown here, my code follows below:
>
> [steve@sylar ~]$ python2.7 sort_test.py
> Comparison function: 12.1256039143
> Key function: 3.51603388786
> Double sort: 2.33165812492
> cmp_to_key: 28.1129128933
>
> By far the best result comes from two calls to sort. Not far off is the
> key function solution. (Admittedly, coming up with a key function for non-
> numeric data would be challenging.) The comparison function, and the
> cmp_to_key solution, are clearly *much* slower.

There is less here than meets the eye.  The cmp-function dispatch in
cmp_to_key() is written in pure Python.  Essentially, the timings are
showing the already well-known fact that call forwarding is faster in
C than in pure python.

Each of the O(n log n) comparisons is run through pure Python code
that looks like this:

    def __lt__(self, other):
        # mycmp is a nested-scope variable pointing to
        # the user-defined cmp-function
        return mycmp(self.obj, other.obj) < 0
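The complete pure-Python wrapper amounts to roughly the following (this is essentially what functools.cmp_to_key does; only __lt__ is strictly needed by sort, and the real version defines all six comparisons):

```python
def cmp_to_key(mycmp):
    # Convert an old-style cmp function into a key= function by wrapping
    # each element in an object whose comparisons defer to mycmp.
    class K(object):
        __slots__ = ['obj']
        def __init__(self, obj):
            self.obj = obj
        def __lt__(self, other):
            return mycmp(self.obj, other.obj) < 0
        def __eq__(self, other):
            return mycmp(self.obj, other.obj) == 0
    return K
```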

I intend to add a C version of cmp_to_key() so that no trips around
the eval-loop are needed.  It should be about as efficient as
bound_method dispatch (very fast), leaving the user-supplied cmp-
function as the dominant cost in the handful of cases where the
superfast O(n) key-function approach can't be used for some reason.

The memory overhead should either stay the same or drop slightly
(currently at 14 bytes per element on a 32-bit build and 28 bytes per
element on a 64-bit build).

Also note that running timings on Py2.7 instead of Py3.2 disadvantages
key-functions.  In 2.7, key-functions use sortwrapper objects which
were removed in 3.2 (saving memory and reducing dispatch time
considerably).  Also, in Py2.7 cmp logic is called even when a key-
function is defined (because key-function logic was wrapped around the
already existing sort with cmp-logic) so you pay the price of both
(i.e. creating a 2-tuple for cmp arguments on every call).  In Py3.x
the cmp logic is gone, making the remaining key-function logic even
faster.  IOW, key-function logic is way faster and more space
efficient in 3.2.

One of the implications of the last paragraph is that if 2.7 style cmp-
logic were reinstated in 3.3, not only would it clutter the API with
two-ways-to-do-it, it would also make the common case
(using key-functions) run much more slowly.  In other words, the
performance mavens will have shot themselves (and us) in the foot.

Raymond
-- 
http://mail.python.org/mailman/listinfo/python-list


Fun python 3.2 one-liner

2011-03-29 Thread Raymond Hettinger
from collections import Counter
from itertools import product

print('\n'.join('*'*(c//2000) for _,c in sorted(Counter(map(sum,
product(range(6), repeat=8))).items())))


almost-normally-yours,

Raymond
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Fun python 3.2 one-liner

2011-03-29 Thread Raymond Hettinger
>>> print('\n'.join('*'*(c//2000) for _,c in sorted(Counter(map(sum,
>>> product(range(6), repeat=8))).items())))

[output: a roughly bell-shaped ASCII histogram, one row of stars for
each possible sum from 0 to 40, with the longest rows in the middle]



Re-posting (forgot to attach the output).
Besides the interesting output, other
interesting virtues include very low
memory usage and that the inner-loop
pipeline runs entirely in C.


Raymond
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: delete namespaces

2011-03-29 Thread Raymond Hettinger
On Mar 29, 6:14 pm, monkeys paw  wrote:
> How do i delete a module namespace once it has been imported?
 . . .
> Then i make a modification to banner.py. When i import it again,
> the new changes are not reflected. Is there a global variable i can
> modify?

In Python2.x, you can use the reload() function:

Help on built-in function reload in module __builtin__:

reload(...)
reload(module) -> module

Reload the module.  The module must have been successfully
imported before.
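For example (a detail from later versions, not the original Python 2 answer: in Python 3.4+ the same function lives in importlib):

```python
import importlib
import json            # stands in for the user's banner module

# after editing the module's source file on disk:
json = importlib.reload(json)   # re-executes the module and returns it
```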


Raymond
-- 
http://mail.python.org/mailman/listinfo/python-list


Dictionary Descriptors

2011-03-30 Thread Raymond Hettinger
On the python-ideas list, someone made a wild proposal to add
descriptors to dictionaries.

None of the respondents seemed to realize that you could (not should,
just could) already implement this using hooks already present in the
language.  I'm posting an example here because I thought you all might
find it to be both interesting and educational.

For more details on how it works and how it relates to descriptors,
see http://mail.python.org/pipermail/python-ideas/2011-March/009657.html

Raymond

 sample code 

class MyDict(object):
def __init__(self, mapping):
self.mapping = mapping
def __getitem__(self, key):
value = self.mapping[key]
if hasattr(value, '__get__'):
print('Invoking descriptor on', key)
return value.__get__(key)
print('Getting', key)
return value
def __setitem__(self, key, value):
self.mapping[key] = value

class Property:
def __init__(self, getter):
self.getter = getter
def __get__(self, key):
return self.getter(key)

if __name__ == '__main__':
md = MyDict({})
md['x'] = 10
md['_y'] = 20
md['y'] = Property(lambda key: md['_'+key])
print(eval('x+y+1', {}, md))

 output 

Getting x
Invoking descriptor on y
Getting _y
31
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Fun python 3.2 one-liner

2011-03-30 Thread Raymond Hettinger
On Mar 30, 2:19 am, Martin De Kauwe  wrote:
> what is the character limit on a one liner :P. Very interesting
> jesting apart, any more?

Sure, here are three one-liners using itertools.groupby() to emulate
some Unix pipelines:

  sort letters | uniq   # list unique values
  sort letters | uniq -c# count unique values
  sort letters | uniq -d# find duplicates

>>> from itertools import groupby

>>> [k for k, g in groupby(sorted('abracadabra'))]
['a', 'b', 'c', 'd', 'r']

>>> [(k, len(list(g))) for k, g in groupby(sorted('abracadabra'))]
[('a', 5), ('b', 2), ('c', 1), ('d', 1), ('r', 2)]

>>> [k for k, g in groupby(sorted('abracadabra')) if len(list(g)) > 1]
['a', 'b', 'r']


Raymond


P.S.  Of course, there are many ways to do this.

>>> sorted(set('abracadabra'))
['a', 'b', 'c', 'd', 'r']

>>> sorted(Counter('abracadabra').items())
[('a', 5), ('b', 2), ('c', 1), ('d', 1), ('r', 2)]

>>> sorted(k for k,c in Counter('abracadabra').items() if c > 1)
['a', 'b', 'r']
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: learn python the hard way exercise 42 help

2011-03-30 Thread Raymond Hettinger
On Mar 30, 6:48 am, neil harper  wrote:
> http://pastie.org/1735028
> hey guys play is confusing me, i get how next gets the first room, which
> is passed when the instance of Game() is created, but how does it get
> the next room?

It might help show calling patterns if you added print statements to
the while loop:

    def play(self):
        next = self.start
        while True:
            room = getattr(self, next)
            print "--- Calling the method:", room, "---"
            next = room()
            print "--- That method returned:", next, "---"

Raymond
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: popular programs made in python?

2011-03-30 Thread Raymond Hettinger
On Mar 29, 7:32 am, Neil Alt  wrote:
> i mean made with python only, not just a small part of python.

BitTorrent was a huge success.


Raymond
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: delete namespaces

2011-03-30 Thread Raymond Hettinger
[monkeys paw]
> > How do i delete a module namespace once it has been imported?
 . . .
> > Then i make a modification to banner.py. When i import it again,
> > the new changes are not reflected.

[Terry Reedy]
> The best thing, if possible, is to restart the program.
> If you develop banner.py with adequate tests, you will want to restart
> the test anyway, and you should not need to modify much thereafter.

This is excellent advice.

You're much better-off starting fresh each time.


Raymond

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Directly Executable Files in Python

2011-03-30 Thread Raymond Hettinger
On Mar 28, 8:37 pm, Jordan Meyer  wrote:
> Is it possible to make a directly executable (such as .exe on Windows) file 
> from scripts written in Python? So as to prevent the end-user from having to 
> download an interpreter to run the program.

http://docs.python.org/faq/programming.html#how-can-i-create-a-stand-alone-binary-from-a-python-script


Raymond
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Why aren't copy and deepcopy in __builtins__?

2011-03-30 Thread Raymond Hettinger
On Mar 27, 8:29 pm, John Ladasky  wrote:
> Simple question.  I use these functions much more frequently than many
> others which are included in __builtins__.  I don't know if my
> programming needs are atypical, but my experience has led me to wonder
> why I have to import these functions.

I asked Guido about this once and he said that he didn't
consider them to be part of the core.  He worried that
they would be overused by beginners and they would be
a distraction from learning plain, simple Python which
doesn't often need either copy() or deepcopy().

AFAICT, he was right.  I've seen large projects where
deepcopy and copy where not used even once.

Raymond
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: best python games?

2011-03-30 Thread Raymond Hettinger
On Mar 25, 7:39 pm, sogeking99  wrote:
> hey guys, what are some of the best games made in python? free games
> really. like pygames stuff. i want to see what python is capable of.
>
> cant see any good one on pygames site really, though they have nothing
> like sort by rating or most downloaded as far as i can tell

At Pycon, I saw some impressive looking games written
during the PyWeek, Python Game Programming Challenge
http://www.pyweek.org/

I think they're fine examples of what Python is capable of.


Raymond
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Alias for an attribute defined in a superclass

2011-03-31 Thread Raymond Hettinger
On Mar 31, 3:14 pm, Ben Finney  wrote:
> Howdy all,
>
> I want to inherit from a class, and define aliases for many of its
> attributes. How can I refer to “the attribute that will be available by
> name ‘spam’ once this class is defined”?
>
>     class Foo(object):
>         def spam(self):
>             pass
>
>         def eggs(self):
>             pass
>
>     class Bar(Foo):
>         beans = Foo.spam
>         mash = Foo.eggs
>
> Is that the right way to do it?

For methods, that will work just fine.  For attributes, you will need
to make @property accessors that get and set the underlying attribute.
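For a plain data attribute, that could look like this (illustrative names, not from the original post):

```python
class Foo(object):
    def __init__(self):
        self.spam = 0        # the original attribute name

class Bar(Foo):
    @property
    def beans(self):         # read alias for 'spam'
        return self.spam

    @beans.setter
    def beans(self, value):  # write alias for 'spam'
        self.spam = value
```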


Raymond
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: a better way to invert a list?

2011-04-05 Thread Raymond Hettinger
[Ian Kelly]
> Which is O(n).  If that is too verbose, you could also use a dictionary:
>
> def invert(p):
>     return dict(map(reversed, enumerate(p)))


def inv(p):
    return dict(zip(p, itertools.count()))
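For example, a quick check that the mapping really inverts a permutation:

```python
import itertools

def inv(p):
    # pair each value with its position; the dict maps value -> index
    return dict(zip(p, itertools.count()))

p = [2, 0, 1]
q = inv(p)                  # {2: 0, 0: 1, 1: 2}
assert all(q[p[i]] == i for i in range(len(p)))
```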


Raymond
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Fun python 3.2 one-liner

2011-04-05 Thread Raymond Hettinger
On Apr 5, 6:38 am, Daniel Fetchinson 
wrote:
> >> what is the character limit on a one liner :P.
>
> > For PEP 8 compliance, 80 characters. :-)
>
> Yeah, but we don't live in the 80's or 90's anymore and our screens
> can support xterms (or let alone IDE widows) much wider than 80
> characters. I'm using 140 for python these days. Seriously, who would
> want to limit him/herself to 80 characters in 2011?

I wonder how many people will shorten otherwise clear
variable names just to get their lines to fit in 80 characters?


Raymond
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Why is __root checked for in OrderedDict?

2011-04-07 Thread Raymond Hettinger
On Apr 7, 4:13 am, andrew cooke  wrote:
> If you look at the code in
> http://hg.python.org/cpython/file/6adbf5f3dafb/Lib/collections/__init...
> the attribute __root is checked for, and only created if missing.  Why?
>
> I ask because, from what I understand, the __init__ method will only be 
> called when the object is first being created, so __root will always be 
> missing.

First of all, three cheers for reading the source code!

A user can call __init__() even after the OrderedDict instance has
already been created.  If so, the __root attribute will already exist
and the whole operation becomes equivalent to an update().

You can see the same behavior with regular dictionaries:

>>> d = dict(a=1, b=2)
>>> d.__init__(b=4, c=5)
>>> d
{'a': 1, 'c': 5, 'b': 4}


Raymond

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Why is __root checked for in OrderedDict?

2011-04-07 Thread Raymond Hettinger
On Apr 7, 2:40 pm, andrew cooke  wrote:
> Is that normal?  I mean, OK, it's possible (and yes I forgot it could be 
> called directly), but is there any usual reason to do so?

It's common for subclasses to call their parent's __init__ method, so
that should emulate dict as nearly as possible to help avoid
surprises.

For the most part, client code won't use __init__ directly because
update() already provides equivalent functionality (but with a better
name).

For list.__init__(), the behavior is different.  It clears the list,
so someone may be using an __init__ call when it isn't practical to
use the syntactic equivalents "del a[:]" or "a[:] = []".

Outside those cases, I don't think it is "normal", but who knows what future
programmers will need to do?


> I guess what I'm asking is: if I'm writing library code should I be this 
> careful?

For the standard library, it pays to really think out the corner cases
because you never know what people are going to rely on.  The Liskov
substitution principle asks us to make OrderedDict behave as much like
dict as possible (so that OD's can be used a drop-in substitute when
needed).

Also subclassers tend to stress the API in ways that wouldn't be
common for the base class.

For standard library code, I think all the little things matter:
* int(), bool(), list() etc all return a "zero"
  value when called with no arguments
* the ordered dict code internally calls __update()
  instead of update() so that a subclass can override
  update() without breaking the constructor
* eval(repr(obj)) should round-trip wherever possible
* containers should behave reasonably well even if someone
  makes the container reference itself:  a=[]; a.append(a)
* classes should try to be picklable, copyable, and deepcopyable, etc
 ...

Raymond
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Tips on Speeding up Python Execution

2011-04-08 Thread Raymond Hettinger
On Apr 8, 12:25 am, Chris Angelico  wrote:
> On Fri, Apr 8, 2011 at 5:04 PM, Abhijeet Mahagaonkar
>
>  wrote:
> > I was able to isolate that the major chunk of run time is eaten up in
> > opening webpages, reading from them, and extracting text.
> > I wanted to know if there is a way to concurrently calling the functions.
>
> So, to clarify: you have code that's loading lots of separate pages,
> and the time is spent waiting for the internet? If you're saturating
> your connection, then this won't help, but if they're all small pages
> and they're coming over the internet, then yes, you certainly CAN
> fetch them concurrently. As the Perl folks say, There's More Than One
> Way To Do It; one is to spawn a thread for each request, then collect
> up all the results at the end. Look up the 'threading' module for
> details:
>
> http://docs.python.org/library/threading.html

The docs for Python3.2 have a nice example for downloading multiple
webpages in parallel:

http://docs.python.org/py3k/library/concurrent.futures.html#threadpoolexecutor-example
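The shape of that example, with a stand-in fetch function so the sketch doesn't depend on network access (the linked version does a real download with urllib):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    # stand-in for a real download, e.g. urllib.request.urlopen(url).read()
    return '<html>%s</html>' % url

urls = ['http://a.example', 'http://b.example', 'http://c.example']
with ThreadPoolExecutor(max_workers=5) as executor:
    # map() runs the calls concurrently but returns results in input order
    pages = list(executor.map(fetch, urls))
```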

Raymond
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Generators and propagation of exceptions

2011-04-08 Thread Raymond Hettinger
On Apr 8, 8:55 am, r  wrote:
> I had a problem for which I've already found a "satisfactory"
> work-around, but I'd like to ask you if there is a better/nicer
> looking solution. Perhaps I'm missing something obvious.
>
> The code looks like this:
>
> stream-of-tokens = token-generator(stream-of-characters)
> stream-of-parsed-expressions = parser-generator(stream-of-tokens)
> stream-of-results = evaluator-generator(stream-of-parsed-expressions)
>
> each of the above functions consumes and implements a generator:
>
> def evaluator-generator(stream-of-tokens):
>   for token in stream-of-tokens:
>     try:
>        yield token.evaluate()           # evaluate() returns a Result
>     except Exception as exception:
>        yield ErrorResult(exception) # ErrorResult is a subclass of Result
>
> The problem is that, when I use the above mechanism, the errors
> propagate to the output embedded in the data streams. This means, I
> have to make them look like real data (in the example above I'm
> wrapping the exception with an ErrorExpression object) and raise them
> and intercept them again at each level until they finally trickle down
> to the output. It feels a bit like writing in C (checking error codes
> and propagating them to the caller).

You're right, checking and propagating at each level doesn't feel
right.

If you need an exception to break out of the loops and close the
generators, then it seems you should just ignore the exception on
every level except for the one where you really want to handle the
exception (possibly in an outer control-loop):

  while True:
      try:
          for result in evaluator-generator(stream-of-parsed-expressions):
              print(result)
      except Exception as e:
          handle_the_exception(e)   # the exception floats up to here


>
> OTOH, if I don't intercept the exception inside the loop, it will
> break the for loop and close the generator. So the error no longer
> affects a single token/expression but it kills the whole session. I
> guess that's because the direction flow of control is sort of
> orthogonal to the direction of flow of data.
>
> Any idea for a working and elegant solution?
>
> Thanks,
>
> -r

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Generators and propagation of exceptions

2011-04-08 Thread Raymond Hettinger
On Apr 8, 8:55 am, r  wrote:
> I had a problem for which I've already found a "satisfactory"
> work-around, but I'd like to ask you if there is a better/nicer
> looking solution. Perhaps I'm missing something obvious.
>
> The code looks like this:
>
> stream-of-tokens = token-generator(stream-of-characters)
> stream-of-parsed-expressions = parser-generator(stream-of-tokens)
> stream-of-results = evaluator-generator(stream-of-parsed-expressions)
>
> each of the above functions consumes and implements a generator:
>
> def evaluator-generator(stream-of-tokens):
>   for token in stream-of-tokens:
>     try:
>        yield token.evaluate()           # evaluate() returns a Result
>     except Exception as exception:
>        yield ErrorResult(exception) # ErrorResult is a subclass of Result
>
> The problem is that, when I use the above mechanism, the errors
> propagate to the output embedded in the data streams. This means, I
> have to make them look like real data (in the example above I'm
> wrapping the exception with an ErrorExpression object) and raise them
> and intercept them again at each level until they finally trickle down
> to the output. It feels a bit like writing in C (checking error codes
> and propagating them to the caller).

You could just let the exception go up to an outermost control-loop
without handling it at all on a lower level.  That is what exceptions do
for you: terminate all the loops, unwind the stacks, and propagate up
to some level where the exception is caught:

  while 1:
      try:
          results = evaluator-generator(stream-of-parsed-expressions)
          for result in results:
              print(result)
      except Exception as e:
          handle_the_exception(e)

OTOH, If you want to catch the exception at the lowest level and wrap
it up as data (the ErrorResult in your example), there is a way to
make it more convenient.  Give the ErrorResult object some identify
methods that correspond to the methods being called by upper levels.
This will let the object float through without you cluttering each
level with detect-and-reraise logic.

   class ErrorResult:
       def __iter__(self):
           # pass through an enclosing iterator
           yield self

Here's a simple demo of how the pass through works:

>>> from itertools import *
>>> list(chain([1,2,3], ErrorResult(), [4,5,6]))
[1, 2, 3, <__main__.ErrorResult object at 0x2250f70>, 4, 5, 6]
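
A fuller sketch of the wrap-as-data idea, with a hypothetical evaluate()
stage added for illustration (the names are not from the original post):

```python
class ErrorResult:
    def __init__(self, exc=None):
        self.exc = exc

    def __iter__(self):
        # identity method: pass through an enclosing iterator unchanged
        yield self

def evaluate(tokens):
    # hypothetical lowest-level stage that wraps failures as data
    for t in tokens:
        try:
            yield 10 // t
        except ZeroDivisionError as e:
            yield ErrorResult(e)

results = list(evaluate([1, 2, 0, 5]))
print([r if not isinstance(r, ErrorResult) else 'error' for r in results])
# -> [10, 5, 'error', 2]
```

The error object rides along in the data stream, and the upper levels need
no detect-and-reraise logic of their own.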


> Any idea for a working and elegant solution?

Hope these ideas have helped.


Raymond

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Generators and propagation of exceptions

2011-04-08 Thread Raymond Hettinger
On Apr 8, 12:47 pm, r  wrote:
> Anyway, thank you all for helping me out and bringing some ideas to
> the table. I was hoping there might be some pattern specifically
> designed for thiskind of job (exception generators anyone?), which
> I've overlooked. If not anything else, knowing that this isn't the
> case, makes me feel better about the solution I've chosen.

Sounds like you've gathered a bunch of good ideas and can now be
pretty sure of your design choices.

While it doesn't help your current use case, you might be interested
in the iter_except() recipe in 
http://docs.python.org/py3k/library/itertools.html#itertools-recipes

Raymond

twitter: @raymondh
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Creating unit tests on the fly

2011-04-08 Thread Raymond Hettinger
On Apr 8, 12:10 pm, Roy Smith  wrote:
> I can even create new test cases from these on the fly with something
> like:
>
>  newClass = type("newClass", (BaseSmokeTest,), {'route': '/my/newly/
> discovered/anchor'})
>
> (credit 
> tohttp://jjinux.blogspot.com/2005/03/python-create-new-class-on-fly.html
> for that neat little trick).  The only thing I don't see is how I can
> now get unittest.main(), which is already running, to notice that a
> new test case has been created and add it to the list of test cases to
> run.  Any ideas on how to do that?

The basic unittest.main() runner isn't well suited to this task.  It
flows in a pipeline of discovery -> test_suite -> test_runner.

I think you're going to need a queue of tests, with your own test
runner consuming the queue, and your on-the-fly test creator running
as a producer thread.

Writing your own test runner isn't difficult.  1) wait on the queue
for a new test case. 2) invoke test_case.run() with a TestResult
object to hold the result 3) accumulate or report the results 4)
repeat forever.
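
A minimal single-threaded sketch of those four steps (the producer thread is
simulated here by pre-loading the queue; SmokeTest and the sentinel protocol
are illustrative, not from the original post):

```python
import queue
import unittest

class SmokeTest(unittest.TestCase):
    def test_ok(self):
        self.assertTrue(True)

def run_from_queue(q):
    # 1) wait on the queue, 2) run each case with its own TestResult,
    # 3) accumulate the results, 4) repeat until a None sentinel arrives.
    results = []
    while True:
        case = q.get()
        if case is None:
            break
        result = unittest.TestResult()
        case.run(result)
        results.append(result)
    return results

q = queue.Queue()
q.put(SmokeTest('test_ok'))   # a producer thread would do this on the fly
q.put(None)
results = run_from_queue(q)
print(all(r.wasSuccessful() for r in results))   # -> True
```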

Raymond

twitter: @raymondh

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: python on iPad (PyPad)

2011-04-09 Thread Raymond Hettinger
On Apr 8, 10:13 pm, Jon Dowdall  wrote:
> Hi All,
>
> Sorry for the blatant advertising but hope some of you may be interested
> to know that I've created an iPad application containing the python
> interpreter and a simple execution environment. It's available in iTunes
> athttp://itunes.apple.com/us/app/pypad/id428928902?mt=8#
>
> I wanted to have python available 'on the go' without carrying a laptop.
> The current implementation is based on my need to test simple python
> functions in an isolated environment. I hope to add more iOS specific
> capabilities if there is enough interest.

I'm definitely interested in your adding more capabilities.

Nice work so far.  I'm surprised at the number of things that you got
work in the first version: long integers, simple classes, a number of
modules successfully import (math and itertools for example).

I don't now how difficult the work is, but it would be nice to have a
way to save multiple scripts, have an interactive prompt, and get more
modules to work (random, collections, functools).

Thanks again for your work.


Raymond
twitter: @raymondh



-- 
http://mail.python.org/mailman/listinfo/python-list


Re: dict.setdefault()

2011-04-11 Thread Raymond Hettinger
On Apr 11, 2:35 pm, rantingrick  wrote:
> setdefault should take **kw args in the case of needing to set
> multiple defaults at one time. I would even settle for an *arg list if
> i had to. Anything is better than...
>
> d.setdefault(blah, blah)
> d.setdefault(blah, blah)
> d.setdefault(blah, blah)
> d.setdefault(blah, blah)
> if blah is not blah:
>     d.setdefault(blah, blah)

I hate to feed a troll, but I'm curious when you see that pattern of
calls.  The typical use case always makes immediate use of the value
returned by setdefault():

   d.setdefault(key, []).append(value)

The pattern you've shown would more typically be expressed like this:

   for key, default_value in multiple_defaults.items():
       if key not in d:
           d[key] = default_value

Or people sometimes use updates to get the same effect:

   result = multiple_defaults.copy()
   result.update(d)

Another approach is to use something like ChainMap() which has been
added to Python 3.3. 
http://docs.python.org/dev/library/collections.html#collections.ChainMap

   result = ChainMap(d, multiple_defaults)


Raymond
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: dict.setdefault()

2011-04-11 Thread Raymond Hettinger
On Apr 11, 4:25 pm, Tim Chase  wrote:
> Finally, if it were added, I'd call it something like merge()

Guido rejected merge() a long time ago.

Anyway, there is a new ChainMap() tool in the collections module for
Py3.3 that should address a number of use cases for handling default
values.


Raymond

twitter: @raymondh
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Questions about GIL and web services from a n00b

2011-04-15 Thread Raymond Hettinger
> > Is the limiting factor CPU?
>
> > If it isn't (i.e. you're blocking on IO to/from a web service) then the
> > GIL won't get in your way.
>
> > If it is, then run as many parallel *processes* as you have cores/CPUs
> > (assuming you're designing an application that can have multiple
> > instances running in parallel so that you can run over multiple servers
> > anyway).
>
> Great question.  At this point, there isn't a limiting factor, but yes
> the concern is around CPU in the future with lots of threads handling
> many simultaneous transactions.

In the Python world, the usual solution to high transaction loads is
to use event-driven processing (using an async library such as
Twisted) rather than using multi-threading which doesn't scale well in
any language.

Also, the usual way to take advantage of multiple-cores is to run
multiple pythons in separate processes.

Threading is really only an answer if you need to share data between
threads, if you only have limited scaling needs, and are I/O bound
rather than CPU bound.


Raymond

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Equivalent code to the bool() built-in function

2011-04-18 Thread Raymond Hettinger
On Apr 16, 1:24 pm, candide  wrote:
> Consider the following code :
>
> # --
> def bool_equivalent(x):
>      return True if x else False

It's faster to write:

def bool_equivalent(x):
return not not x


Raymond
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: suggestions, comments on an "is_subdict" test

2011-04-23 Thread Raymond Hettinger
On Apr 22, 8:18 am, MRAB  wrote:
> On 22/04/2011 15:57, Irmen de Jong wrote:
>
>
>
>
>
>
>
> > On 22-4-2011 15:55, Vlastimil Brom wrote:
> >> Hi all,
> >> I'd like to ask for comments or advice on a simple code for testing a
> >> "subdict", i.e. check whether all items of a given dictionary are
> >> present in a reference dictionary.
> >> Sofar I have:
>
> >> def is_subdict(test_dct, base_dct):
> >>      """Test whether all the items of test_dct are present in base_dct."""
> >>      unique_obj = object()
> >>      for key, value in test_dct.items():
> >>          if not base_dct.get(key, unique_obj) == value:
> >>              return False
> >>      return True
>
> >> I'd like to ask for possibly more idiomatic solutions, or more obvious
> >> ways to do this. Did I maybe missed some builtin possibility?
>
> > I would use:
>
> > test_dct.items()<= base_dct.items()
>
> In Python 2:
>
>  >>> test_dct = {"foo": 0, "bar": 1}
>  >>> base_dct = {"foo": 0, "bar": 1, "baz": 2}
>  >>>
>  >>> test_dct.items() <= base_dct.items()
> False
>
> In Python 3:
>
>  >>> test_dct = {"foo": 0, "bar": 1}
>  >>> base_dct = {"foo": 0, "bar": 1, "baz": 2}
>  >>> test_dct.items() <= base_dct.items()
> True
>
> YMMV

That is because it is spelled differently in Python 2.7:

>>> test_dct = {"foo": 0, "bar": 1}
>>> base_dct = {"foo": 0, "bar": 1, "baz": 2}
>>> test_dct.viewitems() <= base_dct.viewitems()
True

Raymond

-- 
http://mail.python.org/mailman/listinfo/python-list


Simple map/reduce utility function for data analysis

2011-04-25 Thread Raymond Hettinger
Here's a handy utility function for you guys to play with:

http://code.activestate.com/recipes/577676/


Raymond
twitter: @raymondh
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Simple map/reduce utility function for data analysis

2011-04-26 Thread Raymond Hettinger
On Apr 25, 7:42 pm, Paul Rubin  wrote:
> Raymond Hettinger  writes:
> > Here's a handy utility function for you guys to play with:
> >    http://code.activestate.com/recipes/577676/
>
> Cute, but why not use collections.defaultdict for the return dict?
> Untested:

My first draft had a defaultdict but that implementation detail would
get exposed to the user unless the return value was first coerced to a
regular dict.  Also, I avoided modern python features so the code
would run well on psyco and so that it would make sense to beginning
users.


> Untested:
>   d = defaultdict(list)
>   for key,value in ifilter(bool,imap(mapper, data)):
>  d[key].append(value)
>   ...

Nice use of itertools.  FWIW, ifilter() will accept None for the first
argument -- that's a bit faster than using bool().
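
The ifilter/imap names are the Python 2 spellings; in Python 3 the same idea
uses the filter() and map() built-ins.  A quick check that a None predicate
behaves exactly like passing bool:

```python
# filter(None, ...) keeps only the truthy items, same as filter(bool, ...),
# but skips one function call per element.
items = [('a', 1), None, ('b', 2), (), ('c', 3)]
assert list(filter(None, items)) == list(filter(bool, items))
print(list(filter(None, items)))   # -> [('a', 1), ('b', 2), ('c', 3)]
```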


Raymond
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Case study: debugging failed assertRaises bug

2011-04-26 Thread Raymond Hettinger
On Apr 25, 11:05 pm, Steven D'Aprano  wrote:
> I've just spent two hours banging my head against what I *thought*
> (wrongly!) was a spooky action-at-a-distance bug in unittest, so I
> thought I'd share it with anyone reading.

Thanks for telling your story.
I'm sure the lessons learned
will be helpful to your readers.


Raymond
twitter: @raymondh
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: De-tupleizing a list

2011-04-26 Thread Raymond Hettinger
On Apr 25, 8:28 pm, Gnarlodious  wrote:
> I have an SQLite query that returns a list of tuples:
>
> [('0A',), ('1B',), ('2C',), ('3D',),...
>
> What is the most Pythonic way to loop through the list returning a
> list like this?:
>
> ['0A', '1B', '2C', '3D',...

You could unpack the 1-tuple the same way you would with a 2-tuple.

>>> result = [('0A',), ('1B',), ('2C',), ('3D',)]
>>> for elem, in result:
    print elem


0A
1B
2C
3D
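
The same one-tuple unpacking works inside a list comprehension, which builds
the flattened list the original poster asked for in one step:

```python
result = [('0A',), ('1B',), ('2C',), ('3D',)]
flattened = [elem for (elem,) in result]   # unpack each 1-tuple
print(flattened)   # -> ['0A', '1B', '2C', '3D']
```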


Raymond
http://twitter.com/raymondh
-- 
http://mail.python.org/mailman/listinfo/python-list


Have you read the Python docs lately?

2011-04-27 Thread Raymond Hettinger
A number of developers have been working on adding examples and useful
advice to the docs.  To sharpen your skills, here are some pieces of
recommended reading:

http://docs.python.org/dev/library/heapq.html#priority-queue-implementation-notes

http://docs.python.org/dev/library/bisect.html#searching-sorted-lists

http://docs.python.org/dev/library/re.html#writing-a-tokenizer

http://docs.python.org/dev/library/cmd.html#cmd-example

http://docs.python.org/dev/library/collections.html#ordereddict-examples-and-recipes

http://docs.python.org/dev/howto/logging.html

http://docs.python.org/dev/howto/sorting.html

http://docs.python.org/dev/library/collections.html#collections.namedtuple


Raymond

python tips on twitter: @raymondh



-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Have you read the Python docs lately?

2011-04-28 Thread Raymond Hettinger
On Apr 27, 11:28 pm, Paul Rubin  wrote:
> Raymond Hettinger  writes:
> > A number of developers have been working on adding examples and useful
> > advice to the docs.  To sharpen your skills, here are some pieces of
> > recommended reading:
>
> Thanks, those are nice.  The logging one looks especially useful.  The
> module always looked very confusing to me (too Java-like), and I've
> dreaded the day when I might have to figure out how to use it instead of
> my own ad-hoc logging.  I can sleep better now ;-).

Vinay put also put together a logging cookbook:

   http://docs.python.org/howto/logging-cookbook.html


Raymond
-- 
http://mail.python.org/mailman/listinfo/python-list


Coolest Python recipe of all time

2011-05-02 Thread Raymond Hettinger
I think it is time to give some visibility to some of the instructive
and very cool recipes in ActiveState's python cookbook.

My vote for the coolest recipe of all time is:


http://code.activestate.com/recipes/365013-linear-equations-solver-in-3-lines/

What are your favorites?


Raymond
twitter: @raymondh
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Coolest Python recipe of all time

2011-05-03 Thread Raymond Hettinger
On May 2, 11:29 pm, Gregory Ewing  wrote:
> Terry Reedy wrote:
> > The trick is that replacing x with j and evaluating
> > therefore causes (in Python) all the coefficients of x (now j) to be
> > added together separately from all the constant terms to reduce the
> > linear equation to a*x+b (= 0 implied).
>
> Hmmm... so if we used quaternions, could we solve systems
> of linear equations in 3 variables?

Yes :-)

The implementation of a Quaternion class and the Quartic equation is
left as an exercise for the reader ;-)


Raymond
@raymondh
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Coolest Python recipe of all time

2011-05-03 Thread Raymond Hettinger
On May 2, 10:04 pm, Stefan Behnel  wrote:
> The bad thing about this recipe is that it requires quite a bit of
> background knowledge in order to infer that the code the developer is
> looking at is actually correct. At first sight, it looks like an evil hack,
> and the lack of documentation doesn't help either.

The recipe is cool in the same way that a magic trick is cool.

A practical recipe would use a more general purpose method (perhaps
using finite differences or continued fractions); it would have
documentation and tests; it would accept a regular python function
instead of a string; and it would avoid an unsanitized eval().  But
then it would completely lose its panache, its flourish, and the
pleasant gratification that you get when solving the riddle of how it
works.

We should have a separate thread for the most practical, best
documented, least surprising, and most boring recipe ;-)


Raymond
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Coolest Python recipe of all time

2011-05-03 Thread Raymond Hettinger
On May 2, 11:23 pm, Stefan Behnel  wrote:
> Terry Reedy, 03.05.2011 08:00:
>
> > On 5/3/2011 1:04 AM, Stefan Behnel wrote:
>
> >> The bad thing about this recipe is that it requires quite a bit of
> >> background knowledge in order to infer that the code the developer is
> >> looking at is actually correct.
>
> > The main math knowledge needed is the trivial fact that if a*x + b = 0,
> > then x = -b/a. The other math knowledge needed is that complex numbers add
> > componentwise. The trick is that replacing x with j and evaluating
> > therefore causes (in Python) all the coefficients of x (now j) to be added
> > together separately from all the constant terms to reduce the linear
> > equation to a*x+b (= 0 implied).
>
> As your above paragraph proves, it's the kind of implementation that
> requires three lines of executing code and at least 6 lines of additional
> comment. Hopefully accompanied by an excuse of the developer.

If you found nothing educational, interesting, or amusing about the
three-line linear equation solver, then you're *really* going to hate
this one:

  
http://groups.google.com/group/comp.lang.python/browse_frm/thread/e46de4596e93188b/


Raymond
@raymondh



-- 
http://mail.python.org/mailman/listinfo/python-list


Today's fun and educational Python recipe

2011-05-04 Thread Raymond Hettinger
Here's a 22-line beauty for a classic and amazing algorithm:
http://bit.ly/bloom_filter

The wiki article on the algorithm is brief and well-written:
http://en.wikipedia.org/wiki/Bloom_filter

It turns out that people in the 1970's were pretty smart :-)
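
As a taste of the idea (this is a stripped-down sketch, not the linked
recipe): each key deterministically selects num_probes bit positions, add()
sets them, and a membership test requires all of them to be set.  A real
implementation would pack the bits; a plain bytearray keeps the sketch
readable.

```python
from random import Random

class BloomFilter:
    def __init__(self, num_bits=1000, num_probes=8):
        self.num_bits = num_bits
        self.num_probes = num_probes
        self.bits = bytearray(num_bits)   # one byte per bit, for clarity

    def _probes(self, key):
        # each key deterministically seeds its own probe sequence
        hasher = Random(key).randrange
        for _ in range(self.num_probes):
            yield hasher(self.num_bits)

    def add(self, key):
        for i in self._probes(key):
            self.bits[i] = 1

    def __contains__(self, key):
        return all(self.bits[i] for i in self._probes(key))

bf = BloomFilter()
for state in ['Alabama', 'Alaska', 'Arizona']:
    bf.add(state)
print('Alaska' in bf)     # -> True: Bloom filters never give false negatives
print('Atlantis' in bf)   # almost surely False; rare false positives possible
```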


Raymond

---
follow my other python tips and recipes on twitter:  @raymondh
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Today's fun and educational Python recipe

2011-05-04 Thread Raymond Hettinger
> > It turns out that people in the 1970's were pretty smart :-)
>
> I think that often, the cleverness of people is inversely proportional
> to the amount of CPU power and RAM that they have in their computer.

The Google guys have plenty of CPU power *and* plenty of
cleverness :-)

According to the wikipedia article, Google BigTable uses Bloom filters
to reduce the disk lookups for non-existent rows or columns.  The
Google Chrome web browser also uses Bloom filters to speed up its Safe
Browsing service.


> Also: wasn't there a talk on Pycon in which a bloom filter was mentioned?

Yes!  As a matter of fact there was:
http://www.slideshare.net/c.titus.brown/pycon-2011-talk-ngram-assembly-with-bloom-filters


Raymond

---
follow my other python tips and recipes on twitter:  @raymondh
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Today's fun and educational Python recipe

2011-05-04 Thread Raymond Hettinger
On May 4, 12:42 pm, Terry Reedy  wrote:
> On 5/4/2011 2:17 PM, Raymond Hettinger wrote:
>
> > Here's a 22-line beauty for a classic and amazing algorithm:
> >http://bit.ly/bloom_filter
>
> > The wiki article on the algorithm is brief and well-written:
> >http://en.wikipedia.org/wiki/Bloom_filter
>
> As I understand the article, the array of num_bits should have
> num_probes (more or less independent) bits set for each key. But as I
> understand the code
>
>          for i in range(self.num_probes):
>              h, array_index = divmod(h, num_words)
>              h, bit_index = divmod(h, 32)
>              yield array_index, 1 << bit_index
>
> the same bit is being set or tested num_probes times. The last three
> lines have no dependence on i that I can see, so they appear to do the
> same thing each time. This seems like a bug.

The 512 bits in h are progressively eaten-up between iterations.  So
each pass yields a different (array index, bit_mask) pair.

It's easy to use the interactive prompt to show that different probes
are produced on each pass:

>>> bf = BloomFilter(num_bits=1000, num_probes=8)
>>> pprint(list(bf.get_probes('Alabama')))
[(19, 1073741824),
 (11, 64),
 (9, 134217728),
 (25, 1024),
 (24, 33554432),
 (6, 16),
 (7, 16777216),
 (22, 1048576)]

The 512 bits are uncorrelated -- otherwise sha512 wouldn't be much of
a cryptographic hash ;)

The fifty state example in the recipe is a reasonable demonstration
that the recipe works as advertised.  It successfully finds all fifty
states (the true positives) and it tries 100,000 negatives resulting
in only a handful of false positives.  That should be somewhat
convincing that it all works.


Raymond

---
follow my other python tips and recipes on twitter:  @raymondh
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Today's fun and educational Python recipe

2011-05-04 Thread Raymond Hettinger
On May 4, 12:27 pm, Paul Rubin  wrote:
> Raymond Hettinger  writes:
> > Here's a 22-line beauty for a classic and amazing algorithm:
> >http://bit.ly/bloom_filter
>
> The use of pickle to serialize the keys is a little bit suspicious if
> there might be a reason to dump the filter to disk and re-use it in
> another run of the program.  Pickle representation might change between
> Python releases, for example.  It's just supposed to stay interoperable
> between versions, not necessarily bitwise-identical.
>
> Otherwise it's quite nice.  I'd suggest adding a .update() operation
> that adds keys from a user-supplied iterator.

I chose pickle because it solved the problem of turning arbitrary
objects into bytes which are needed as inputs to sha512.

It seems that this particular choice of hash function is distracting
some readers away from the interesting part of how a Bloom filter
works.

Since the choice of hash functions is completely arbitrary, I'm
thinking of substituting something a little more basic.  What do you
think of this alternative as a way to make it clearer that each
successive probe is uncorrelated, yet fully dependent on the key?

def get_probes(self, key):
    hasher = Random(key).randrange
    num_words = len(self.arr)
    for _ in range(self.num_probes):
        array_index = hasher(num_words)
        bit_index = hasher(32)
        yield array_index, 1 << bit_index


Raymond
---

follow my other python tips and recipes on twitter:  @raymondh
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Today's fun and educational Python recipe

2011-05-04 Thread Raymond Hettinger
On May 4, 5:26 pm, Terry Reedy  wrote:
> The test would be more convincing to many with 10 other geographic
> names (hard to come by, I know), or other english names or words or even
> with longer random strings that matched the lengths of the state names.
> But an average of 5/10 false positives in 5 runs is good.

I've just posted an update with a spell checker using a large
dictionary.  That should make it easy to validate against some real
world text samples.


Raymond
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Coolest Python recipe of all time

2011-05-06 Thread Raymond Hettinger
[Steven D'Aprano]:
> As written, amb is just a brute-force solver using more magic than is
> good for any code, but it's fun to play with.

With a small change in API, much of the magic isn't needed.

from itertools import product

def amb(func, *argument_ranges):
    for args in product(*argument_ranges):
        if func(*args):
            print(args)

amb(lambda a, b, c, d: 5*a + 10*b + 20*c + 50*d == 45,
    range(3, 21),   # number of 5 cent coins
    range(11),      # number of 10 cent coins
    range(6),       # number of 20 cent coins
    range(3),       # number of 50 cent coins
    )

def test(a, b, c, d):
    s = "The %s brown %s jumped over the %s %s." % (a, b, c, d)
    num_vowels = sum(s.count(c) for c in 'aeiou')
    return num_vowels in (12, 18, 19)

amb(test,
    ['quick', 'slow', 'hungry', 'wise-old'],
    ['fox', 'hare', 'turtle', 'kangaroo'],
    ['lazy', 'stupid', 'sleepy', 'confused'],
    ['dog', 'aardvark', 'sloth', 'wombat'],
    )

amb(lambda x, y, z: x*x + y*y == z*z,
    range(1, 11),
    range(1, 11),
    range(1, 11),
    )


Raymond


follow my recipes and tips on twitter: @raymondh
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: checking if a list is empty

2011-05-06 Thread Raymond Hettinger
On May 5, 11:36 pm, Jabba Laci  wrote:
> Hi,
>
> If I want to check if a list is empty, which is the more pythonic way?
>
> li = []
>
> (1) if len(li) == 0:
> ...
> or
> (2) if not li:

The Python core developers use the second form.
See http://www.python.org/dev/peps/pep-0008/
for the official recommendation.


Raymond

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Coolest Python recipe of all time

2011-05-07 Thread Raymond Hettinger
On May 7, 1:29 am, Steven D'Aprano  wrote:
> On Fri, 06 May 2011 12:36:09 -0600, Ian Kelly wrote:
> > The amb engine would conceptually execute this function for every
> > possible combination of a, b, and c,
>
> Which pretty much is the definition of "brute-force solver", no?

FWIW, here's one of my favorite brute-force solvers:

   http://code.activestate.com/recipes/576615-alphametics-solver/


Raymond

---
follow my recipes and tips on twitter:  @raymondh

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Coolest Python recipe of all time

2011-05-09 Thread Raymond Hettinger
On May 9, 2:31 am, Trent Nelson  wrote:
> > What are your favorites?
>
> I think I've posted this before, but I love my 
> 3-lines-if-you-ignore-the-scaffolding language translator.  Not because it's 
> clever code -- quite the opposite, the code is dead simple -- but because it 
> encompasses one of the things I love about Python the most: it gets shit done.
>
>     In [1]: from translate import *
>
>     In [2]: translate('French', 'The quick brown fox jumped over the lazy 
> dog.')
>     Le renard brun rapide a sauté par-dessus le chien paresseux.
>
>     In [3]: translate('German', 'The quick brown fox jumped over the lazy 
> dog.')
>     Der schnelle braune Fuchs sprang über den faulen Hund.
>
>     In [4]: translate('Spanish', 'The quick brown fox jumped over the lazy 
> dog.')
>     El zorro marrón rápido saltó sobre el perro perezoso.
>
> translate.py:
>
>     import sys
>     from urllib import urlopen, urlencode
>     from BeautifulSoup import BeautifulSoup
>
>     url = 'http://babelfish.altavista.com/tr'
>     languages = {
>         'French'    : 'en_fr',
>         'German'    : 'en_de',
>         'Italian'   : 'en_it',
>         'Spanish'   : 'en_es',
>         'Russian'   : 'en_ru',
>         'Portuguese': 'en_pt',
>         'Dutch'     : 'en_nl',
>         'Japanese'  : 'en_ja',
>     }
>
>     def translate(lang, text):
>         kwds = { 'trtext' : text, 'lp' : languages[lang]}
>         soup = BeautifulSoup(urlopen(url, urlencode(kwds)))
>         print soup.find('div', style='padding:10px;').string
>
>     if __name__ == '__main__':
>         translate(sys.argv[1], sys.argv[2])

That completely rocks!  Thanks for the recipe.


Raymond

--
follow my recipes and tips on twitter: @raymondh
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: string formatting

2011-05-10 Thread Raymond Hettinger
> Which is the preferred way of string formatting?
>
> (1) "the %s is %s" % ('sky', 'blue')
>
> (2) "the {0} is {1}".format('sky', 'blue')
>
> (3) "the {} is {}".format('sky', 'blue')
>
> As I know (1) is old style. (2) and (3) are new but (3) is only
> supported from Python 2.7+.
>
> Which one should be used?


Sometimes, I use both ;-)
That can save you from ugly escapes such as %%s or {{0}}.

Here's an example from the standard library:
http://hg.python.org/cpython/file/7254c03b7180/Lib/collections.py#l235

Note the template has both {typename} formatting for the first pass
and %r style formatting in the generated code.
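
A toy illustration of the two-pass idea (not the actual namedtuple
template): .format() fills in {typename} on the first pass, while the %r
placeholder survives untouched into the generated source for a later pass:

```python
template = "def __repr__(self): return '{typename}(%r)' % (self.value,)"
source = template.format(typename='Point')
print(source)
# -> def __repr__(self): return 'Point(%r)' % (self.value,)
```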

Raymond


follow my tips and recipes on twitter: @raymondh
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python 3 dict question

2011-05-10 Thread Raymond Hettinger
On May 6, 12:40 pm, dmitrey  wrote:
> hi all,
> suppose I have Python dict myDict and I know it's not empty.
> I have to get any (key, value) pair from the dict (no matter which
> one) and perform some operation.
> In Python 2 I used mere
> key, val = myDict.items()[0]
> but in Python 3 myDict.items() return iterator.
> Of course, I could use
> for key, val in myDict.items():
>    do_something
>    break
> but maybe there is any better way?

If your use case allows the item to be removed, then use:

   key, val = myDict.popitem()

Otherwise, use:

   key, val = next(iter(myDict.items()))

The latter is nice because next() allows you to supply a default
argument in case the dictionary is empty:

   key, val = next(iter(myDict.items()), (None, None))

Raymond

-
follow my tips and recipes on twitter: @raymondh


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Faster Recursive Fibonacci Numbers

2011-05-24 Thread Raymond Hettinger
On May 17, 8:50 am, RJB  wrote:
> I noticed some discussion of recursion. the trick is to find a
> formula where the arguments are divided, not decremented.
> I've had a "divide-and-conquer" recursion for the Fibonacci numbers
> for a couple of years in C++ but just for fun rewrote it
> in Python.  It was easy.  Enjoy.  And tell me how I can improve it!
>
> def fibo(n):
>         """A Faster recursive Fibonaci function
> Use a formula from Knuth Vol 1 page 80, section 1.2.8:
>            If F[n] is the n'th Fibonaci number then
>                    F[n+m] = F[m]*F[n+1] + F[m-1]*F[n].
>   First set m = n+1
>    F[ 2*n+1 ] = F[n+1]**2 + F[n]**2.
>
>   Then put m = n in Knuth's formula,
>            F[ 2*n ] = F[n]*F[n+1] + F[n-1]* F[n],
>    and replace F[n+1] by F[n]+F[n-1],
>            F[ 2*n ] = F[n]*(F[n] + 2*F[n-1]).
> """
>         if n<=0:
>                 return 0
>         elif n<=2:
>                 return 1
>         elif n%2==0:
>                 half=n//2
>                 f1=fibo(half)
>                 f2=fibo(half-1)
>                 return f1*(f1+2*f2)
>         else:
>                 nearhalf=(n-1)//2
>                 f1=fibo(nearhalf+1)
>                 f2=fibo(nearhalf)
>                 return f1*f1 + f2*f2
>
> RJB the Lurkerhttp://www.csci.csusb.edu/dick/cs320/lab/10.html

There are many ways to write this function.  The one I like shows-off
a general purpose dynamic programming technique while staying *very*
close to a common textbook definition of a fibonacci number:

import functools

@functools.lru_cache()
def fibo(n):
    return 1 if n < 2 else fibo(n-1) + fibo(n-2)

Raymond
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: super() in class defs?

2011-05-25 Thread Raymond Hettinger
On May 25, 4:31 pm, Ian Kelly  wrote:
> Right.  It's unnecessary, so why saddle yourself with it?

FWIW, I expect to release a blog post tomorrow about the principal use
cases for super() and how to use it effectively.

With just a little bit of know-how, it can be an important tool in
your Python toolkit.

If any of the comp.lang.python readers want to review and comment on
my latest draft, please email me and I'll send it to you directly.

Cheers,


Raymond Hettinger

my email address is listed at 
http://users.rcn.com/python/download/Descriptor.htm

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Parse config file and command-line arguments, to get a single collection of options

2011-05-25 Thread Raymond Hettinger
On May 25, 9:38 pm, Ben Finney  wrote:
> Howdy all,
>
> Python's standard library has modules for configuration file parsing
> (configparser) and command-line argument parsing (optparse, argparse). I
> want to write a program that does both, but also:
>
> * Has a cascade of options: default option values, overridden by config
>   file options, overridden by command-line options.
>
> * Reads a different, or even additional, configuration file if specified
>   on the command-line (e.g. --config-file foo.conf) and yet still obeys
>   the above cascade.
>
> * Allows a single definition of an option (e.g. logging level) to define
>   the same option for parsing from configuration files and the command
>   line.
>
> * Unifies the parsed options into a single collection for the rest of
>   the program to access without caring where they came from.
>
> How can I achieve this with minimum deviation from the Python standard
> library?

One thought is start with something like ChainMap,
http://code.activestate.com/recipes/305268-chained-map-lookups/?in=user-178123
, or some variant to unify multiple mapping objects into a single
prioritized collection.  A mapping for command line args can be made
by using vars() on an argparse namespace to create a dictionary.
ConfigParser's mapping is accessible via its get() method.  With a
ChainMap style object you can add other option sources such as
os.environ.  This should get you started on your grand unified, do-
everything-at-once vision with minimal deviation from the standard
library.
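A minimal sketch of that approach, assuming the ChainMap that later landed
in collections (Python 3.3+); the option names here are hypothetical:

```python
import argparse
import configparser
from collections import ChainMap

# Lowest-priority layer: hard-coded defaults.
defaults = {"log_level": "WARNING", "color": "auto"}

# Highest-priority layer: command-line arguments, via vars() on the
# argparse namespace; drop options the user didn't actually supply.
parser = argparse.ArgumentParser()
parser.add_argument("--log-level", dest="log_level")
parser.add_argument("--color")
namespace = parser.parse_args(["--log-level", "DEBUG"])
cli = {k: v for k, v in vars(namespace).items() if v is not None}

# Middle layer: the config file.
cp = configparser.ConfigParser()
cp.read_string("[main]\nlog_level = INFO\n")
cfg = dict(cp["main"])

# Lookups try the command line first, then the config file, then defaults.
options = ChainMap(cli, cfg, defaults)
print(options["log_level"])   # DEBUG beats INFO beats WARNING
print(options["color"])       # falls through to the default
```

The same ChainMap can take os.environ as another layer, slotted at
whatever priority makes sense for the application.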

Raymond



-- 
http://mail.python.org/mailman/listinfo/python-list


Python's super() considered super!

2011-05-26 Thread Raymond Hettinger
I just posted a tutorial and how-to guide for making effective use of
super().

One of the reviewers, David Beazley, said, "Wow, that's really
great! I see this becoming the definitive post on the subject"

The direct link is:

  http://rhettinger.wordpress.com/2011/05/26/super-considered-super/

It would also be great if some of you would upvote it on HackerNews.


Raymond Hettinger
---
follow my python tips on twitter: @raymondh
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python's super() considered super!

2011-05-26 Thread Raymond Hettinger
> It would also be great if some of you would upvote it on HackerNews.


Here's a link to the super() how-to-guide and commentary:  bit.ly/
iFm8g3

Raymond
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python's super() considered super!

2011-05-27 Thread Raymond Hettinger
On May 26, 6:39 pm, Ben Finney  wrote:
> We also, though, need *real* URLs. Blind URLs through obfuscation
> services have their uses, but surely not in a forum like this. The real
> URL is http://news.ycombinator.com/item?id=2588262>.

Fair enough.  I had copied the link from Jesse's tweet (where shorter
is better).

Hope you enjoyed the post.


Raymond
-- 
http://mail.python.org/mailman/listinfo/python-list


Class decorators might also be super too

2011-05-28 Thread Raymond Hettinger
David Beazley wrote a class decorator blog post that is worth reading:
  http://dabeaz.blogspot.com/2011/05/class-decorators-might-also-be-super.html

Raymond

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Class decorators might also be super too

2011-05-29 Thread Raymond Hettinger
On May 28, 11:33 pm, Michele Simionato 
wrote:
> He is basically showing that using mixins for implementing logging is not 
> such a good idea, i.e. you can get the same effect in a better way by making 
> use of other Python features. I argued the same thing many times in the past. 
> I even wrote a module once (strait) to reimplement 99% of multiple 
> inheritance without multiple inheritance, just to show that in can be done in 
> few lines of code in a language as powerful as Python.

More importantly, anyone who reads posts such as David' will walk away
with a deeper understanding of class decorators, mixins, and
inheritance.  That makes the post worthwhile even if someone never
ends up using those particular coding technique.


Raymond
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: float("nan") in set or as key

2011-05-29 Thread Raymond Hettinger
On May 28, 4:41 pm, MRAB  wrote:
> Here's a curiosity. float("nan") can occur multiple times in a set or as
> a key in a dict:

Which is by design.

NaNs intentionally have multiple possible instances (some
implementations even include distinct payload values).

Sets and dicts intentionally recognize an instance as being equal to
itself (identity-implies-equality); otherwise, you could put a NaN in
a set/dict but not be able to retrieve it.  Basic invariants would
fail -- such as:  assert all(elem in container for elem in container).

The interesting thing is that people experimenting with exotic objects
(things with random hash functions, things with unusual equality or
ordering relations, etc) are "surprised" when those objects display
their exotic behaviors.

To me, the "NaN curiosities" are among the least interesting.  It's
more fun to break sort algorithms with sets (which override the
ordering relations with subset/superset relations) or with an object
that mutates a list during the sort.  Now, that is curious :-)

Also, Dr Mertz wrote a Charming Python article full of these
curiosities:
http://gnosis.cx/publish/programming/charming_python_b25.txt

IMO, equality and ordering are somewhat fundamental concepts.  If a
class is written that twists those concepts around a bit, then it
should be no surprise if curious behavior emerges.  Heck, I would
venture to guess that something as simple as assuming the speed of
light is constant might yield twin paradoxes and other
curiosities ;-)

Raymond
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: 3.1.4 release candidate 1

2011-05-29 Thread Raymond Hettinger
On May 29, 3:44 pm, Benjamin Peterson  wrote:
> On behalf of the Python development team, I'm happy as a swallow to announce a
> release candidate for the fourth bugfix release for the Python 3.1
> series, Python
> 3.1.4.

The Pi release of Python :-)


Raymond

P.S.  For the most part, if you have a choice, then you are much
better off using Py3.2 than any release (even a bugfix release) of
Py3.1
-- 
http://mail.python.org/mailman/listinfo/python-list


Updated blog post on how to use super()

2011-05-31 Thread Raymond Hettinger
I've tightened the wording a bit, made much better use of keyword
arguments instead of kwds.pop(arg), and added a section on defensive
programming (protecting a subclass from inadvertently missing an MRO
requirement).  Also, there is an entry on how to use assertions to
validate search order requirements and make them explicit.

  http://bit.ly/py_super
 or
  http://rhettinger.wordpress.com/2011/05/26/super-considered-super/

Any further suggestions are welcome.  I'm expecting this to evolve
into how-to guide to be included in the regular Python standard
documentation.  The goal is to serve as a reliable guide to using
super and how to design cooperative classes in a way that lets
subclasses compose and extend them.


Raymond Hettinger


follow my python tips on twitter: @raymondh
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Moronicity Xha Lee, Jargonizer

2005-09-29 Thread Raymond Hettinger
James Stroud wrote:
> There needs to be an email filter that, when a thread is begun by a specific
> user . . . it cans every
> message in that thread.

The tried-and-true solution is both simple and civil, "Don't feed the
trolls."


Raymond

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Idle bytecode query on apparently unreachable returns

2005-10-11 Thread Raymond Hettinger
[Tom Anderson]:
> What puzzles me, though, are bytecodes 17, 39 and 42 - surely these aren't
> reachable? Does the compiler just throw in a default 'return None'
> epilogue, with routes there from every code path, even when it's not
> needed? If so, why?

Since unreachable code is never executed, there is no performance
payoff for optimizing it away.  It is not hard to write a dead-code
elimination routine, but why bother?  It would save a few bytes, slow
down compilation time, save nothing at runtime, and make the compiler
more complex/fragile.
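The epilogue is easy to inspect with the dis module. (Note that the exact
bytecode varies by version, and later CPython releases did grow passes that
trim some unreachable code, so this is illustrative only.)

```python
import dis

def f(x):
    if x:
        return 1
    return 2

# Every code path ends in a RETURN-type instruction; the opcodes and
# offsets differ from one CPython version to the next.
ops = [ins.opname for ins in dis.get_instructions(f)]
print(ops)
```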

FWIW, the peephole optimizer in Python/compile.c is mature -- the low
hanging fruit has already been harvested, leaving the field of
remaining optimizations somewhat barren.


Raymond

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: random number generator thread safety

2005-11-08 Thread Raymond Hettinger
Mike Brown wrote:
> I have questions about thread safety in the 'random' module.
>
> When using the random.Random class (be it Mersenne Twister or Wichmann-Hill
> based), is it sufficiently thread-safe (preserving entropy and guarding
> against attack) to just have each thread work with its own random.Random
> instance? Or do I need to wrap my calls to each instance's methods to use
> locks? Wichmann-Hill in particular has the warning in its .random()
> vulnerability; do I need to make an exception for that case?

Thread-safety has nothing to do with preserving entropy or guarding
against attack.  All of the entropy in an MT sequence is contained in
the seed (upto 624 bytes) and that entropy is preserved through all
subsequent calls.

Create unique instances of MT generators when you want to be able to
reproduce the sequence later.
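For example, each consumer (thread or otherwise) can own its own seeded
instance; the sequences are fully reproducible and independent:

```python
import random

# Two generators seeded identically reproduce the same sequence; each
# instance carries its own independent Mersenne Twister state.
a = random.Random(8675309)
b = random.Random(8675309)
assert [a.random() for _ in range(5)] == [b.random() for _ in range(5)]

# Saving and restoring state replays the same sub-sequence exactly.
state = a.getstate()
first = a.random()
a.setstate(state)
assert a.random() == first
```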

Nothing in the random module provides cryptographic guarantees.  If you
want crypto-strength, then use real encryption.  Search SourceForge for
patches that show how to use MD5 and other cryptographic hash functions
as a basis for random number generation.

As far as I'm concerned, there is no reason other than backwards
compatability to use the Wichmann-Hill generator -- the Mersenne
Twister is a much better generator.



Raymond

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: random number generator thread safety

2005-11-08 Thread Raymond Hettinger
> > Thread-safety has nothing to do with preserving entropy or guarding
> > against attack.  All of the entropy in an MT sequence is contained in
> > the seed (upto 624 bytes) and that entropy is preserved through all
> > subsequent calls.
>
> I think the concern is that there can be a thread switch after some
> intermediate result is computed but before the state is updated.

The concern expressed by the OP was "preserving entropy or guarding
against attack".

>  That
> would mean two threads can get random numbers that are identical or
> anyway correlated.  Whether that can happen with Python's MT, I don't
> know.

It can't.  That is what the docs mean when they say that MT is
thread-safe.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: which feature of python do you like most?

2005-11-08 Thread Raymond Hettinger
> which feature of python do you like most?

Indentation

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: exception raised by nested iterator being ignored by for loop

2005-11-19 Thread Raymond Hettinger
james t kirk wrote:
> I'm writing a wrapper class to handle the line merging and filtering
> for a log file analysis app
>
> The problem I'm running into is that the StopIteration exception
> raised when the wrapped file goes past EOF isn't causing the second
> for loop to stop.

Admiral Kirk,

The innermost for-loop is catching its own StopIteration.  You need to
explicitly raise StopIteration in your next() method when there are no
more lines in the file (hint, add an else-clause to the for-loop).

Alternatively, you could simplify your life by writing the whole
wrapper as a generator:


import gzip

def wrapper(filename):
    if filename.endswith(".gz"):
        fh = gzip.GzipFile(filename, "r")
    else:
        fh = open(filename, "r")
    for line in fh:
        if not line.startswith("t"):    # filter out lines starting with 't'
            yield line.rstrip()
    fh.close()


Raymond Hettinger
Starfleet Command

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Underscores in Python numbers

2005-11-19 Thread Raymond Hettinger
Gustav Hållberg wrote:
> I tried finding a discussion around adding the possibility to have
> optional underscores inside numbers in Python. This is a popular option
> available in several "competing" scripting langauges, that I would love
> to see in Python.
>
> Examples:
>   1_234_567
>   0xdead_beef
>   3.141_592

I suppose it could be done.  OTOH, one could argue that most production
code has no business hardwiring-in numerical constants greater than 999
;-)

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: hash()

2005-12-05 Thread Raymond Hettinger
[John Marshall]
> For strings of > 1 character, what are the chances
> that hash(st) and hash(st[::-1]) would return the
> same value?

Python's string hash algorithm is non-commutative, so a collision with
a reversed string is not likely.  The exact answer depends on the
population of strings being hashed, but it's not hard to compute
collision statistics for a sampling of those strings:

   collisions = len(sample) - len(set(hash(s) for s in sample))


FWIW, here is how Python computes string hash values:

static long
string_hash(PyStringObject *a)
{
    register int len;
    register unsigned char *p;
    register long x;

    len = a->ob_size;
    p = (unsigned char *) a->ob_sval;
    x = *p << 7;
    while (--len >= 0)
        x = (1000003*x) ^ *p++;
    x ^= a->ob_size;
    if (x == -1)
        x = -2;
    return x;
}
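A rough Python transcription of that algorithm makes the non-commutative
mixing easy to see. This is purely illustrative: a 64-bit mask stands in
for C long wraparound, and the hash used by modern CPython is a different,
randomized algorithm.

```python
MASK = 0xFFFFFFFFFFFFFFFF   # stand-in for C long overflow

def old_string_hash(s):
    """Illustrative transcription of the 2.x string hash (not exact)."""
    if not s:
        return 0
    x = s[0] << 7
    for c in s:
        x = ((1000003 * x) ^ c) & MASK
    x ^= len(s)
    if x == MASK:           # the C code maps -1 to -2; approximated here
        x -= 1
    return x

# A string and its reverse hash differently: the multiply-and-xor step
# depends on the order in which the characters arrive.
assert old_string_hash(b"ab") != old_string_hash(b"ba")
```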



> My goal is to uniquely identify multicharacter strings,
> all of which begin with "/" and never end with "/".
> Therefore, st != st[::-1].

Just use a set -- no string reversal is needed for detection of unique
multicharacter strings.


Raymond

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: PEP 288 ponderings

2005-01-02 Thread Raymond Hettinger
[Steven Bethard]
> (1) What's the benefit of the generator versions of these functions over
> the class-based versions?

Generators are easier to write, are clearer, and run faster.

They automatically
* create a distinct generator-iterator object upon each invocation
* create the next() and idempotent __iter__() methods.
* raise StopIteration  upon termination
* remain stopped if next() is called too many times
* save their own local variable state, avoiding the need for self.var references
* resume execution at the point of the last yield
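A side-by-side sketch (in modern syntax) shows how much of that bookkeeping
the generator form absorbs automatically:

```python
class CountdownIterator:
    """Class-based: all the iterator machinery is spelled out by hand."""
    def __init__(self, n):
        self.n = n
    def __iter__(self):
        return self
    def __next__(self):
        if self.n <= 0:
            raise StopIteration
        self.n -= 1
        return self.n + 1

def countdown(n):
    """Generator: the same behavior with none of the boilerplate."""
    while n > 0:
        yield n
        n -= 1

assert list(CountdownIterator(3)) == [3, 2, 1]
assert list(countdown(3)) == [3, 2, 1]

# A stopped generator stays stopped, even if next() is called again.
g = countdown(1)
assert next(g) == 1
assert list(g) == []
```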



> (2) Since in all the examples there's a one-to-one correlation between
> setting a generator attribute and calling the generator's next function,
> aren't these generator attribute assignments basically just trying to
> define the 'next' parameter list?

They are not the same.  The generator needs some way to receive the values.  The
function arguments cannot be used because they are needed to create the
generator-iterator.  The yield statements likewise won't work because the first
yield is not encountered until well after the first next() call.

The given examples are minimal and are intended only to demonstrate the idea.



> I definitely don't like the
> idea of a magical __self__ variable that isn't declared anywhere.

It is no more magical than f.__name__ or f.__doc__ for functions.  The concept
is almost identical to that for threading.local().  Also, the __self__ argument
is a non-issue because there are other alternate approaches such as providing a
function that retrieves the currently running generator.  Which approach is
ultimately taken is a matter of aesthetics -- the PEP itself concentrates on the
core idea instead of debating syntax.

The real issue with PEP 288's idea for generator attributes is that the current
C implementation doesn't readily accommodate this change.  Major surgery would
be required :-(

The more important part of the PEP is the idea for generator exceptions.  The
need arises in the context of flushing/closing resources upon generator
termination.



Raymond Hettinger


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: why does UserDict.DictMixin use keys instead of __iter__?

2005-01-04 Thread Raymond Hettinger
[Steven Bethard]
> Sorry, my intent was not to say that I didn't know from the docs that
> UserDict.DictMixin required keys().  Clearly it's documented.  My
> question was *why* does it use keys()?  Why use keys() when keys() can
> be derived from __iter__, and __iter__ IMHO looks to be a more basic
> part of the mapping protocol.

Viewed from the present, __iter__() may seem more basic.  However, it is a
recent innovation.  The keys() method, on the other hand, goes back to the
beginning.  There were no shortage of mapping-like classes defining keys() but
not __iter__().

Still, if __iter__() is provided, UserDict.DictMixin will take advantage of it.
The same is also true for __contains__(), and iteritems().
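(For modern readers: DictMixin's role was later taken over by
collections.abc.MutableMapping, which builds the derived methods on
__getitem__/__iter__/__len__ rather than keys(). A minimal sketch, with a
hypothetical case-insensitive mapping as the example:)

```python
from collections.abc import MutableMapping

class LowerDict(MutableMapping):
    """Only five methods are written by hand; keys(), items(), values(),
    get(), update(), etc. all come from the mixin."""
    def __init__(self):
        self._data = {}
    def __getitem__(self, key):
        return self._data[key.lower()]
    def __setitem__(self, key, value):
        self._data[key.lower()] = value
    def __delitem__(self, key):
        del self._data[key.lower()]
    def __iter__(self):
        return iter(self._data)
    def __len__(self):
        return len(self._data)

d = LowerDict()
d["Spam"] = 1
assert d["SPAM"] == 1
assert list(d.keys()) == ["spam"]
assert d.get("missing", 0) == 0
```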


Raymond Hettinger


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: generator expressions: performance anomaly?

2005-01-16 Thread Raymond Hettinger
"John Machin" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
> Please consider the timings below, where a generator expression starts
> out slower than the equivalent list comprehension, and gets worse:
>
> >python -m timeit -s "orig=range(10)" "lst=orig[:];lst[:]=(x for x
> in orig)"
 . . .
> >python -m timeit -s "orig=range(20)" "lst=orig[:];lst[:]=(x for x
> in orig)"

This has nothing to do with genexps and everything to do with list slice
assignment.

List slice assignment is an example of a tool with a special case optimization
for inputs that know their own length -- that enables the tool to pre-allocate
its result rather than growing and resizing in spurts.  Other such tools include
tuple(), map() and zip().

The incredulous tone of the posting suggests a presumption that genexps always
outperform list comps.  That is not the case. Sometimes you want a list instead
of a generator because you want more than just iteration; perhaps the consumer
function may be able to take advantage of length reporting, indexing, slicing,
re-iterability or other useful list behaviors.

Also, the timing jig bites.  Use xrange() instead of range().  Otherwise,
there's an immediate forfeit of the memory savings normally being sought by the
use of generators and iterators.



> Background: There was/is a very recent thread about ways of removing
> all instances of x from a list. /F proposed a list comprehension to
> build the result list.

FWIW, deques have a natural idiom for this kind of situation:

for i in xrange(len(mydeque)):
x = mydeque.popleft()
if some_condition(x):
mydeque.append(x)
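Filled in with a concrete condition, the idiom mutates the deque in place
in a single pass while preserving the relative order of the survivors:

```python
from collections import deque

mydeque = deque([3, 10, 4, 7, 12])

# Rotate every element through once, re-appending only the keepers.
for _ in range(len(mydeque)):
    x = mydeque.popleft()
    if x >= 5:                  # some_condition(x)
        mydeque.append(x)

assert list(mydeque) == [10, 7, 12]
```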



> Given a requirement to mutate the original list,
> this necessitates the assignment to lst[:].

Sometimes people introduce arbitrary requirements and twist themselves in knots
when real applications tend to allow a wider range of alternative solutions.
There is a funky code smell emanating from any code resembling
a[:]=some_iterator.  It would make me examine how 'a' is being used and check
whether the surrounding code can use the iterator or an new object.



> Comments, clues, ... please.

As a point of interest, note that Py2.4 improved some of its built-in iterators
to report their length if requested.  Now you know why.

>>> d = dict(a=1, b=2, c=3, d=4)
>>> len(d.iteritems())
4



Raymond Hettinger


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: generator expressions: performance anomaly?

2005-01-16 Thread Raymond Hettinger
[Raymond Hettinger]
> >List slice assignment is an example of a tool with a special case
optimization
> >for inputs that know their own length -- that enables the tool to
pre-allocate
> >its result rather than growing and resizing in spurts.  Other such tools
include
> >tuple(), map() and zip().

[John Machin]
> My reading of the source: if the input is not a list or tuple, a
> (temporary) tuple is built from the input, using PySequence_Tuple() in
> abstract.c. If the input cannot report its own length, then that
> function resorts to "growing and resizing in spurts", using the
> following code:
>
> if (j >= n) {
> if (n < 500)
> n += 10;
> else
> n += 100;
> if (_PyTuple_Resize(&result, n) != 0) {
>
> Perhaps it could be changed to use a proportional increase, like
> list_resize() in listobject.c, which advertises (amortised) linear
> time.

Check out the current source.  The time machine beat you to it.

Keep the faith,


Raymond Hettinger


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: generator expressions: performance anomaly?

2005-01-17 Thread Raymond Hettinger
[Delaney, Timothy C]
> Nick's other suggestion - that genexps propagate __len__ - might
> still be interesting. Of course, it would only be applicable for
> unconditional genexps(i.e. no if clause).

Length transparency for iterators is not as general as one would expect.  I once
spent a good deal of effort exploring where it made sense, and I was surprised
to find that it only rarely works out.  Length transparency is an unexpectedly
thorny subject with many dead-ends which precludes a fully general solution such
as that proposed by Nick.

For a recap of my research, see the docstring for Lib/test/test_iterlen.py .


Raymond Hettinger


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: generator expressions: performance anomaly?

2005-01-19 Thread Raymond Hettinger
[Steve Holden]
> Since it doesn't yet optimize 2+5 to a constant-folded 7 you should
> realize that you are suggesting a large increase in the compiler's
> analytical powers.

FWIW, limited constant folding is already in CVS for Py2.5.



Raymond Hettinger


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: is this sort method the same as the one in python 2.4

2005-01-29 Thread Raymond Hettinger
"Lowell Kirsh"
> I'm trying to emulate the sorted() method introduced in python 2.4. The
> only difference is that it takes a sequence as one of its arguments
> rather than being a method of the sequence class. Does my method do the
> same as the sorted()?

Almost.  This is closer to the mark:

import sys

def sorted(iterable, cmp=None, key=None, reverse=False):
    "return a sorted copy of its input"
    if sys.version_info >= (2, 4):      # defer to the real built-in
        import __builtin__
        return __builtin__.sorted(iterable, cmp, key, reverse)
    seq = list(iterable)
    if reverse:
        seq.reverse()                   # preserve stability
    if key is not None:
        seq = [(key(elem), i, elem) for i, elem in enumerate(seq)]
        if cmp is not None:             # compare keys, not decorated tuples
            cmp = lambda a, b, _cmp=cmp: _cmp(a[0], b[0])
    if cmp is not None:
        seq.sort(cmp)
    else:
        seq.sort()
    if key is not None:
        seq = [elem for (key, i, elem) in seq]
    if reverse:
        seq.reverse()
    return seq


Try it against the tests in Lib/test/test_builtin.py.

The differences from your version:
* >= 2.4 rather than just > 2.4
* renamed the parameter to iterable
* handle the case where both cmp and key are defined
* add an enumerated tie breaker to prevent full key comparisons
* preserve stability by using reverse twice

The real sorted() does the same thing but is implemented a bit differently.  A
custom key wrapper is applied to each object so that only the key value gets
compared (no need for a full tuple with a tie breaker value).

Raymond Hettinger



-- 
http://mail.python.org/mailman/listinfo/python-list


Re: is this sort method the same as the one in python 2.4

2005-01-30 Thread Raymond Hettinger
"Lowell Kirsh"
> How come you reverse the list twice? And why does this preserve stability?

It's easy to see if you trace through the steps:

Given sample the following dataset and a desire to sort on the first field:
>>> data = [('a', 1), ('a', 2), ('b', 3)]

Here are the step:
>>> data.reverse()
>>> data
[('b', 3), ('a', 2), ('a', 1)]
>>> data.sort(key=lambda record: record[0])
>>> data
[('a', 2), ('a', 1), ('b', 3)]
>>> data.reverse()
>>> data
[('b', 3), ('a', 1), ('a', 2)]

Note, in the final result, the two equal records (the ones with 'a') appear in
the same order as the original dataset (that is what stability means).

Now, try it without the initial reversal and note that stability is not
preserved:
>>> data = [('a', 1), ('a', 2), ('b', 3)]
>>> data.sort(key=lambda record: record[0])
>>> data.reverse()
>>> data
[('b', 3), ('a', 2), ('a', 1)]

Here's another way of accomplishing the original sort and preserving stability:
>>> data = [('a', 1), ('a', 2), ('b', 3)]
>>> sorted(data, cmp = lambda x,y:  cmp(y[0], x[0]))
[('b', 3), ('a', 1), ('a', 2)]



Raymond Hettinger




-- 
http://mail.python.org/mailman/listinfo/python-list


Re: is this sort method the same as the one in python 2.4

2005-01-30 Thread Raymond Hettinger
"Pedro Werneck"
>
> What about this ?
>
>
> #
> if sys.version_info >= (2,4):
> def sorted(iterable, *args, **kwds):
> seq = list(iterable)
> seq.sort(*args, **kwds)
> return seq
> #
>
> It worked against the TestSorted in lib/test/test_builtins.py

The key= and reverse= parameters were introduced to list.sort() in Py2.4.
Consequently, the above code won't provide the desired functionality in Py2.3
and prior.


Raymond Hettinger


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: set, dict and other structures

2005-01-31 Thread Raymond Hettinger
<[EMAIL PROTECTED]>
> I'm frequently using Py2.4 sets, I find them quite useful, and I like
> them, even if they seem a little slower than dicts.

Glad to hear that you find them to be a useful abstraction.

Since sets are internally implemented as dicts, they should have almost
identical performance (net of a small difference for the time to forward the
calls).


> Sets also need the
> same memory of dicts (can they be made to use less memory, not storing
> values? Maybe this requires too much code rewriting).

Yes, an alternate implementation could be written that would take less memory
(by about 1/3) and be a bit faster.  The approach would be to copy relevant
sections of the dictionary implementation and then remove anything pertaining to
dict values.  If the need arises and I find the time, this will probably happen
someday.

In the meantime, the current implementation is rock solid and gives great
performance.  Have you encountered situations where a 1/3 memory savings and a
slight speed improvement would have made a critical difference in a real
application?



> I presume such sets are like this because they are kind of dicts. If
> this is true, then converting a dict to a set (that means converting
> just the keys; this is often useful for a successive processing with
> set operators like intersection, etc) can be a very quick operation;
> but this little code shows me that this isn't true, set-ify a dict is
> even slower than doing it on a list. Can this operation made faster (by
> people that writes Python interpreter)?

Dicts take longer than lists because dict iteration takes longer than for lists.

If set-ifying a dictionary turns out to be an essential and time-critical
operation, it is possible to add some special case code for this.  The question
is whether it is worth it.   If you find the need is pressing, file a feature
request explaining why it is essential.  Then, I'll implement it for you.  On
the other hand, if this is a theoretical nice-to-have, then why clutter up the
code (with pre-sizing and forcing all values to True).
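(As it turned out, later Python versions made this cheap without special
casing: set() over a dict iterates just the keys, and dict key views grew
direct support for the set operators.) For illustration:

```python
d1 = {"a": 1, "b": 2, "c": 3}
d2 = {"b": 20, "c": 30, "d": 40}

# set() over a dict builds a set straight from the keys.
assert set(d1) == {"a", "b", "c"}

# Modern dict key views behave like sets with no explicit conversion.
assert d1.keys() & d2.keys() == {"b", "c"}
assert d1.keys() - d2.keys() == {"a"}
```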



> I've studied the nice sets python module from Py2.3, and I am...
> well... writing something similar, a (python) graph module for sparse
> graphs. I've found 3 modules like that around on the Net, but I think
> they can be improved (even if I'm not an expert of Python, and I know,
> my code is probably quite rough/naive compared to "sets" module one...)
> Graphs are complex data structures, so probably one person needs from a
> graph data structure are different from other people needs, so it's
> probably not easy to define a "standard" graph module fit for everyone.
> Still, I think it can be useful to have a standard module to manage
> them.
> Do you think it be useful to add a (well made) standard module like
> that?

Start by posting a module outside the standard library; gathering user feedback;
and, if popular, propose it for inclusion in the standard library.

My guess is that there will be two issues.  One is that no one implementation of
graphs seems to satisfy all users.  The second is that only a small fraction of
Python users need for graph support (there is probably a much greater demand for
combinatorics, numeric/numarray, or financial modules).  If the appeal is not
broad, it has little chance of making it into the standard library.


Raymond Hettinger


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: returning True, False or None

2005-02-04 Thread Raymond Hettinger
"Steven Bethard"
> For a given list:
> * If all values are None, the function should return None.
> * If at least one value is True, the function should return True.
> * Otherwise, the function should return False.
 . . .
> Right now, my code looks like:
>
>  if True in lst:
>  return True
>  elif False in lst:
>  return False
>  else:
>  return None
>
> This has a light code smell for me though -- can anyone see a simpler
> way of writing this?


return max(lst)
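Note that max() leans on Python 2's ability to order mixed types as
None < False < True; under Python 3 that cross-type comparison raises
TypeError. A version that spells out the three rules directly works in
either (the helper name here is made up for illustration):

```python
def summarize(lst):
    """None if all values are None; True if any is True; else False."""
    if any(v is True for v in lst):
        return True
    if all(v is None for v in lst):
        return None
    return False

assert summarize([None, None]) is None
assert summarize([None, True, False]) is True
assert summarize([None, False]) is False
```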


Raymond Hettinger


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: returning True, False or None

2005-02-04 Thread Raymond Hettinger
"Steven Bethard"
> For a given list:
> * If all values are None, the function should return None.
> * If at least one value is True, the function should return True.
> * Otherwise, the function should return False.

One more approach, just for grins:

s = set(lst)
return True in s or s == set([None]) and None


Raymond Hettinger


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Why is there no instancemethod builtin?

2005-06-20 Thread Raymond Hettinger
[George Sakkis]
> The fact that strings don't have __iter__ is an implementation
> detail.  I can't think of any reason other than historic and perhaps
> backwards compatibility for this;
> iterables should IMHO by definition be exactly
> the objects with __iter__).

There would be no benefit other than satisfying that particular world
view.  It is a feature that all sequences are automatically iterable
without having to write a custom __iter__ method.  That makes life a
bit easier for writers of sequence-like classes.



[Michele Simionato]
> I think strings do not have __iter__ on purpose, exactly to
> distinguish them from other iterables, since sometimes it is nice
> to consider them atomic, but I am not sure of this. You should
> ask the developers.

That is not correct.  Since any sequence is automatically iterable
(because of the presence of __getitem__), the inclusion of a separate
__iter__ method is purely optional.  The option to include a custom
__iter__ method has been exercised only when it has offered some
performance benefit.  IOW, the inclusion of __iter__ for a sequence is
an arbitrary implementation detail -- there is no other significance.


> Anyway, the right definition of iterable is
> (as I was told) "an object X such that iter(X) does not throw an
> exception". Objects following the __getitem__ protocol
>- such as strings -are iterables even if they do not have
>an __iter__  method.

An object is iterable if and only if it provides either __iter__ or
__getitem__.  
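The old-style sequence protocol is easy to demonstrate: a class providing
__getitem__ but no __iter__ is still fully iterable (this fallback remains
in modern Python as well):

```python
class Squares:
    """Sequence protocol only: indexed from 0 until IndexError."""
    def __getitem__(self, i):
        if i >= 5:
            raise IndexError(i)
        return i * i

s = Squares()
assert not hasattr(s, "__iter__")
assert list(s) == [0, 1, 4, 9, 16]   # iter() falls back to __getitem__
assert 9 in s                        # so does the 'in' operator
```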



Raymond

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: a dictionary from a list

2005-06-27 Thread Raymond Hettinger
[Roy Smith]
> I also think the published description is needlessly confusing.  Why does
> it use
>
>{'one': 2, 'two': 3}
>
> as the example mapping when
>
>{'one': 1, 'two': 2}
>
> would illustrate exactly the same point but be easier to comprehend.  The
> mapping given is the kind of thing I would expect to see in an obfuscated
> programming contest.
>
> Also, what's the point of the last example:
>
>dict([(['one', 'two'][i-2], i) for i in (2, 3)])
>
> It boils down to passing a list of tuples as an argument, which is already
> illustrated by other examples.  This is just a complicated and obtuse way
> to construct the list of tuples.  What does it add to the understanding of
> how the dict() constructor works?

If you can switch from indignation to constructive criticism, then
consider sending a note to Andrew Kuchling suggesting ways to improve
the examples.


Raymond

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Lisp development with macros faster than Python development?..

2005-07-06 Thread Raymond Hettinger
[EMAIL PROTECTED] wrote:
> I've been reading the beloved Paul Graham's "Hackers and Painters".
> He claims he developed a web app at light speed using Lisp and lots
> of macros.
>
> It got me curious if Lisp
> is inherently faster to develop complex apps in.


With Lisp or Forth, a master programmer has unlimited power and
expressiveness.  With Python, even a regular guy can reach for the
stars.


Raymond

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Lisp development with macros faster than Python development?..

2005-07-06 Thread Raymond Hettinger
[EMAIL PROTECTED] wrote:
> The problem is that questions like 'What lang is fastest to develop
> in?'
> are hard to answer definitively.

FWIW, Google's answer to that question is C++, Java, and Python.  For
any given problem, any of the three are acceptable.  Each programmer or
engineering team gets to decide based on his or her language
expertise.*



Raymond


* Source: Greg Stein's keynote address at PyCon 2005.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: f*cking re module

2005-07-06 Thread Raymond Hettinger
> There's really not a single good re tutorial or documentation
>I could found!

With * being a greedy operator, your post's subject line matches
"firetrucking," which, of course, has nothing to do with regular
expressions, or python.org's re how-to guide, or Amazon's 18 books on
the subject, or the hundreds of available on-line tutorials.
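Taken literally as a pattern, the subject line really does match -- `f*`
means "zero or more f characters", so a leading f isn't even required:

```python
import re

# 'f*' matches zero or more f's, then the literal 'cking' must follow.
assert re.search(r"f*cking", "firetrucking")
assert re.search(r"f*cking", "ducking")      # zero f's needed
assert not re.search(r"f*cking", "python")
```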

http://www.amk.ca/python/howto/regex/
http://www.amazon.com/exec/obidos/search-handle-form/102-9115182-7050550
http://www.google.com/search?q=regular+expression+tutorial


Raymond

-- 
http://mail.python.org/mailman/listinfo/python-list


  1   2   3   4   5   6   7   8   >