Re: copy on write

2012-02-02 Thread Hrvoje Niksic
Steven D'Aprano  writes:

> Perhaps you are thinking that Python could determine ahead of time
> whether x[1] += y involved a list or a tuple, and not perform the
> final assignment if x was a tuple. Well, maybe, but such an approach
> (if possible!) is fraught with danger and mysterious errors even
> harder to debug than the current situation. And besides, what should
> Python do about non-built-in types? There is no way in general to
> predict whether x[1] = something will succeed except to actually try
> it.

An alternative approach is simply not to perform the final assignment
when the in-place method is available on the contained object.  No
prediction is needed, because the contained object has to be examined
anyway.  Currently, lhs[ind] += rhs is implemented like this:

item = lhs[ind]
if hasattr(item, '__iadd__'):
    lhs.__setitem__(ind, item.__iadd__(rhs))
else:
    lhs.__setitem__(ind, item + rhs)
# (Note item assignment in both "if" branches.)

It could, however, be implemented like this:

item = lhs[ind]
if hasattr(item, '__iadd__'):
    item += rhs  # no assignment, item supports in-place change
else:
    lhs.__setitem__(ind, lhs[ind] + rhs)

This would raise the exact same exception in the tuple case, but without
executing the in-place assignment.  On the other hand, some_list[ind] += 1
would continue working exactly the same as it does now.
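For illustration, the surprise in the current behavior is easy to reproduce (a Python 3 sketch; the discussion above used Python 2 spellings, but the semantics are the same):

```python
# A tuple holding a mutable list: += mutates the list in place,
# then the final tuple item assignment raises TypeError anyway.
t = ([1],)
try:
    t[0] += [2]
except TypeError:
    pass  # 'tuple' object does not support item assignment

# The list was already extended before the exception was raised:
assert t[0] == [1, 2]
```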

In the same vein, in-place methods should not have a return value
(i.e. they should return None), as per Python convention that functions
called for side effect don't return values.

The alternative behavior is unfortunately not backward-compatible (it
ignores the return value of augmented methods), so I'm not seriously
proposing it, but I believe it would have been a better implementation
of augmented assignments than the current one.  The present interface
doesn't just bite those who try to use augmented assignment on tuples
holding mutable objects, but also those who do the same with read-only
properties, which is an even more reasonable thing to do.  For example,
with obj.list_attr being a list, one would expect obj.list_attr += [1, 2, 3]
to do the same thing as obj.list_attr.extend([1, 2, 3]).  And it almost does,
except it also follows up with an assignment after the list has already
been changed, and the assignment to a read-only property raises an
exception.  Refusing to modify the list would have been fine, modifying
it without raising an exception (as described above) would have been
better, but modifying it and *then* raising an exception is a surprise
that takes some getting used to.
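The read-only-property case described above can be sketched like this (Python 3; the class and attribute names are illustrative, not from the original thread):

```python
class Obj:
    """Hypothetical class exposing a list via a read-only property."""
    def __init__(self):
        self._items = []

    @property
    def list_attr(self):
        return self._items

o = Obj()
try:
    o.list_attr += [1, 2, 3]  # extends the list, then fails to assign
except AttributeError:
    pass  # property has no setter

# The list was modified *and* an exception was raised:
assert o.list_attr == [1, 2, 3]
```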
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: round down to nearest number

2012-02-11 Thread Hrvoje Niksic
Terry Reedy  writes:

> On 2/9/2012 8:23 PM, noydb wrote:
>> So how would you round UP always?  Say the number is 3219, so you want
 (//100+1)*100
> 3400

Note that that doesn't work for numbers that are already round:

>>> (3300//100+1)*100
3400    # 3300 would be correct

I'd go with Chris Rebert's (x + 99) // 100.
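The (x + 99) // 100 idiom generalizes to any step; a small sketch (the helper name is mine, not from the thread):

```python
def round_up(x, step=100):
    """Round x up to the nearest multiple of step."""
    # Adding step-1 before floor division pushes any non-multiple
    # over the next boundary without disturbing exact multiples.
    return (x + step - 1) // step * step

assert round_up(3219) == 3300
assert round_up(3300) == 3300   # already-round numbers stay put
assert round_up(3301) == 3400
```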
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Porting the 2-3 heap data-structure library from C to Python

2012-03-07 Thread Hrvoje Niksic
Alec Taylor  writes:

> The source-code used has been made available:
> http://www.cosc.canterbury.ac.nz/research/RG/alg/ttheap.h
> http://www.cosc.canterbury.ac.nz/research/RG/alg/ttheap.c
>
> I plan on wrapping it in a class.

You should get acquainted with the Python/C API, which is the standard
way of extending Python with high-performance (and/or system-specific) C
code.  See "Extending and Embedding" and "Python/C API" sections at
http://docs.python.org/.

There is also a mailing list for help with the C API, see
http://mail.python.org/mailman/listinfo/capi-sig for details.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Porting the 2-3 heap data-structure library from C to Python

2012-03-10 Thread Hrvoje Niksic
Stefan Behnel  writes:

>> which is the standard way of extending Python with high-performance
>> (and/or system-specific) C code.
>
> Well, it's *one* way.  Certainly not the easiest way, neither the most
> portable and you'll have a hard time making it the fastest.

I didn't say it was easy, but standard, in the sense of documented in
Python documentation.  Python/C is as portable as Python itself, and as
fast as the platform allows.  I understand your desire to promote
Cython, but please stop resorting to FUD in doing so.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python and Lisp : car and cdr

2011-06-19 Thread Hrvoje Niksic
Ethan Furman  writes:

>> def car(L):
>> return L[0]
>> def cdr(L):
>> return L[1]
>
> IANAL (I am not a Lisper), but shouldn't that be 'return L[1:]' ?

Not for the linked list implementation he presented.

>> def length(L):
>> if not L: return 0
>> return 1 + length(cdr(L))
>
> How is this different from regular ol' 'len' ?

len would just return 2 for every linked list, and would raise an
exception for the empty list (represented by None in Lie's implementation).

A more Pythonic implementation would represent the linked list as a
first-class object with car and cdr being attributes, allowing for
fairly natural expression of __len__, __iter__, etc.  For example:

class List(object):
    __slots__ = 'car', 'cdr'

    def __init__(self, it=()):
        it = iter(it)
        try:
            self.car = it.next()
        except StopIteration:
            pass
        else:
            self.cdr = List(it)

    def __len__(self):
        if not hasattr(self, 'cdr'):
            return 0
        return 1 + len(self.cdr)

    def __iter__(self):
        head = self
        while hasattr(head, 'cdr'):
            yield head.car
            head = head.cdr

    def __repr__(self):
        return "%s(%r)" % (type(self).__name__, list(self))

>>> l = List([1, 2, 3])
>>> l
List([1, 2, 3])
>>> l.car
1
>>> l.cdr
List([2, 3])
>>> l.cdr.cdr.car
3
>>> l.cdr.cdr.cdr
List([])
>>> tuple(l)
(1, 2, 3)
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How does CO_FUTURE_DIVISION compiler flag get propagated?

2011-07-02 Thread Hrvoje Niksic
Terry  writes:

> Future division ("from __future__ import division") works within
> scripts executed by import or execfile(). However, it does not work
> when entered interactively in the interpreter like this:
>
> >>> from __future__ import division
> >>> a=2/3

Are you referring to the interactive interpreter normally invoked by
just running "python"?  That seems to work for me:

Python 2.7.1+ (r271:86832, Apr 11 2011, 18:13:53) 
[GCC 4.5.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 2/3
0
>>> from __future__ import division
>>> 2/3
0.6666666666666666
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Possible File iteration bug

2011-07-14 Thread Hrvoje Niksic
Billy Mays  writes:

> Is there any way to just create a new generator that clears its
> `closed` status?

You can define getLines in terms of the readline file method, which does
return new data when it is available.

def getLines(f):
    lines = []
    while True:
        line = f.readline()
        if line == '':
            break
        lines.append(line)
    return lines

or, more succinctly:

def getLines(f):
    return list(iter(f.readline, ''))
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Convert '165.0' to int

2011-07-22 Thread Hrvoje Niksic
Frank Millman  writes:

> int(float(x)) does the job, and I am happy with that. I was just
> asking if there were any alternatives.

int(float(s)) will corrupt integers larger than 2**53, should you ever
need them.  int(decimal.Decimal(s)) works with numbers of arbitrary
size.
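The precision loss is easy to check (Python 3 spelling; int(float(s)) rounds to the nearest representable double):

```python
from decimal import Decimal

s = str(2**53 + 1)                    # '9007199254740993', beyond float precision
assert int(float(s)) == 2**53         # silently rounded down by one
assert int(Decimal(s)) == 2**53 + 1   # exact at any size
```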
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: list comprehension to do os.path.split_all ?

2011-07-30 Thread Hrvoje Niksic
Neil Cerutti  writes:

> On 2011-07-29, Dennis Lee Bieber  wrote:
>>  Fine... So normpath it first...
>>
> os.path.normpath(r'C:/windows').split(os.sep)
>> ['C:', 'windows']

That apparently doesn't distinguish between r'C:\windows' and
r'C:windows'.  On Windows the first is an absolute path, the second a
relative path, and both contain a drive letter.

> while tail != '':
>     retval.append(tail)
>     head, tail = os.path.split(head)
> else:
>     if os.path.isabs(path):
>         retval.append(os.path.sep)
> return list(reversed(retval))

Note that using 'else' after 'while' is superfluous if the loop doesn't
contain a 'break' statement.
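A self-contained sketch of the split-everything loop discussed above (the function name and the handling of the leftover head are my choices; behavior of os.path.split is platform-dependent, shown here for POSIX paths):

```python
import os.path

def split_all(path):
    """Split a path into all of its components by repeated os.path.split."""
    parts = []
    head, tail = os.path.split(path)
    while tail != '':
        parts.append(tail)
        head, tail = os.path.split(head)
    if head:
        parts.append(head)  # leftover root ('/') or drive prefix
    return list(reversed(parts))
```

On POSIX, split_all('/usr/local/bin') yields ['/', 'usr', 'local', 'bin'], and relative paths come back without a root component.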
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Use-cases for alternative iterator

2011-03-10 Thread Hrvoje Niksic
Steven D'Aprano  writes:

> I've never seen this second form in actual code. Does anyone use it,
> and if so, what use-cases do you have?

Since APIs that signal end-of-iteration by returning a sentinel have
fallen out of favor in Python (with good reason), this form is rare, but
still it's sometimes useful.  I've used it in actual code for reading a
file in fixed-size chunks, like this:

for chunk in iter(lambda: f.read(CHUNK_SIZE), ''):
    ...
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: generator / iterator mystery

2011-03-13 Thread Hrvoje Niksic
Dave Abrahams  writes:

> >>> list(chain(  *(((x,n) for n in range(3)) for x in 'abc')  ))
> [('c', 0), ('c', 1), ('c', 2), ('c', 0), ('c', 1), ('c', 2), ('c', 0), ('c', 
> 1), ('c', 2)]
>
> Huh?  Can anyone explain why the last result is different?

list(chain(*EXPR)) is constructing a tuple out of EXPR.  In your case,
EXPR evaluates to a generator expression that yields generator
expressions iterated over by chain and then by list.  It is equivalent
to the following generator:

def outer():
    for x in 'abc':
        def inner():
            for n in range(3):
                yield x, n
        yield inner()

list(chain(*outer()))
... the same result as above ...

The problem is that all the different instances of the inner() generator
refer to the same "x" variable, whose value has been changed to 'c' by
the time any of them is called.  The same gotcha is often seen in code
that creates closures in a loop, such as:

>>> fns = [(lambda: x+1) for x in range(3)]
>>> map(apply, fns)
[3, 3, 3]   # most people would expect [1, 2, 3]

In your case the closure is less explicit because it's being created by
a generator expression, but the principle is exactly the same.  The
classic fix for this problem is to move the closure creation into a
function, which forces a new cell to be allocated:

def adder(x):
    return lambda: x+1

>>> fns = [adder(x) for x in range(3)]
>>> map(apply, fns)
[1, 2, 3]

This is why your enum3 variant works.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What other languages use the same data model as Python?

2011-05-03 Thread Hrvoje Niksic
Steven D'Aprano  writes:

> "Python's data model is different from other languages"
>
> which is perfectly correct, if you think of C as "other languages". But 
> it's equally correct to say that Python's data model is the same as other 
> languages. As I understand it, Python and Ruby have the same data model. 
> So does Java, so long as you only consider objects[...]
> What other languages use the same, or mostly similar, data model as 
> Python?

Count in Common Lisp and Scheme.

I would say that, considering currently most popular languages and
platforms, Python's data model is in the majority.  It is only the
people coming from a C++ background that tend to be confused by it.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: __dict__ attribute for built-in types

2011-10-27 Thread Hrvoje Niksic
candide  writes:

> But beside this, how to recognise classes whose object doesn't have a
> __dict__ attribute ?

str, list and others aren't classes, they are types.  While all
(new-style) classes are types, not all types are classes.  It's
instances of classes (types created by executing the "class" statement
or its equivalent) that automatically get a __dict__, unless __slots__
was used at class definition time to suppress it.  Built-in and
extension types can choose whether to implement __dict__.

(Mechanics of defining built-in and extension types are of course
implementation-specific.  CPython allows adding __dict__ to any
extension type by setting the tp_dictoffset member of the type
definition struct to the appropriate offset into the instance struct.)
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: __dict__ attribute for built-in types

2011-10-28 Thread Hrvoje Niksic
candide  writes:

> Le 28/10/2011 00:57, Hrvoje Niksic a écrit :
>
>> was used at class definition time to suppress it.  Built-in and
>> extension types can choose whether to implement __dict__.
>>
>
> Is it possible in the CPython implementation to write something like this :
>
> "foo".bar = 42
>
> without raising an attribute error ?

No, and for good reason.  Strings are immutable, so that you needn't
care which particular instance of "foo" you're looking at, they're all
equivalent.  The interpreter uses that fact to cache instances of short
strings such as Python identifiers, so that most places that look at a
string like "foo" are in fact dealing with the same instance.  If one
could change an attribute of a particular instance of "foo", the
interpreter would no longer be allowed to cache them transparently.  The
same goes for integers and other immutable built-in objects.

If you really need to attach state to strings, subclass them as Steven
explained.  All code that accepts strings (including all built-ins) will
work just fine, transparent caching will not happen, and attributes are
writable.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Dictionary sorting

2011-11-04 Thread Hrvoje Niksic
Ben Finney  writes:

> Tim Chase  writes:
>
>> On 11/03/11 16:36, Terry Reedy wrote:
>> > CPython iterates (and prints) dict items in their arbitrary internal
>> > hash table order, which depends on the number and entry order of the
>> > items. It is a bug to depend on that arbitrary order in any way.
>>
>> Does this "never trust it" hold even for two consecutive iterations
>> over an unchanged dict? I didn't see anything in the docs[1] to make
>> such a claim,
>
> Exactly.

This is false.  The docs say:

If items(), keys(), values(), iteritems(), iterkeys(), and
itervalues() are called with no intervening modifications to the
dictionary, the lists will directly correspond. This allows the
creation of (value, key) pairs using zip(): pairs = zip(d.values(),
d.keys()).

(http://docs.python.org/library/stdtypes.html#mapping-types-dict)

> The order of retrieval is entirely up to the implementation.

This part is still true, but the order won't change behind your back if
you're not touching the dict.
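The guarantee quoted from the docs is easy to check (Python 3 spelling, where keys()/values()/items() are views and zip returns an iterator):

```python
d = {'banana': 2, 'apple': 1, 'cherry': 3}

# With no intervening modifications, values() and keys() correspond
# pairwise, so the (value, key) zip from the docs is well-defined:
pairs = list(zip(d.values(), d.keys()))
assert pairs == [(v, k) for k, v in d.items()]
```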
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Server Questions (2 of them)

2011-11-20 Thread Hrvoje Niksic
Andrew  writes:

> How to do you create a server that accepts a set of user code?
[...]

Look up the "exec" statement, the server can use it to execute any code
received from the client as a string.

Note "any code", though; exec runs in no sandbox and if a malicious
client defines addition(1, 2) to execute os.system('sudo rm -rf /'), the
server will happily do just that.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: unpack('>f', b'\x00\x01\x00\x00')

2011-12-01 Thread Hrvoje Niksic
Chris Rebert  writes:

> C does not have a built-in fixed-point datatype, so the `struct`
> module doesn't handle fixed-point numbers directly.

The built-in decimal module supports fixed-point arithmetic, but the
struct module doesn't know about it.  A bug report (or patch) by someone
who works with binary representations of fixed-point would be a good
start to improve it.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: order independent hash?

2011-12-02 Thread Hrvoje Niksic
Chris Angelico  writes:

>> The hash can grow with (k,v) pairs accumulated in the run time.
>> An auto memory management mechanism is required for a hash of a non-fixed 
>> size of (k,v) pairs.
>
> That's a hash table

In many contexts "hash table" is shortened to "hash" when there is no
ambiguity.  This is especially popular among Perl programmers where the
equivalent of dict is called a hash.

> Although strictly speaking, isn't that "Python dicts are implemented
> as hash tables in CPython"? Or is the hashtable implementation
> mandated?

It's pretty much mandated because of the __hash__ protocol.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: order independent hash?

2011-12-04 Thread Hrvoje Niksic
Terry Reedy  writes:

>> [Hashing is] pretty much mandated because of the __hash__ protocol.
>
> Lib Ref 4.8. Mapping Types — dict
> "A mapping object maps hashable values to arbitrary objects."
>
> This does not say that the mapping has to *use* the hash value ;-).
> Even if it does, it could use a tree structure instead of a hash
> table.

An arbitrary mapping doesn't, but reference to the hash protocol was in
the context of implementation constraints for dicts themselves (my
response quotes the relevant part of Chris's message).  If a Python
implementation tried to implement dict as a tree, instances of classes
that define only __eq__ and __hash__ would not be correctly inserted in
such a dict.  This would be a major source of incompatibility with
Python code, both in the standard library and at large.
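A minimal example of the kind of class the argument above relies on (illustrative names; it defines __eq__ and __hash__ but no ordering, so a tree-based dict would have nothing to compare with):

```python
class Point:
    """Hashable, comparable for equality, but with no ordering defined."""
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        return hash((self.x, self.y))

d = {Point(1, 2): 'a'}
# A distinct but equal instance finds the entry via hash + equality:
assert d[Point(1, 2)] == 'a'
```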
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: order independent hash?

2011-12-07 Thread Hrvoje Niksic
Chris Angelico  writes:

> 2011/12/5 Hrvoje Niksic :
>> If a Python implementation tried to implement dict as a tree,
>> instances of classes that define only __eq__ and __hash__ would not
>> be correctly inserted in such a dict.
>
> Couldn't you just make a tree of hash values? Okay, that's probably
> not the most useful way to do things, but technically it'd comply with
> the spec.

That's a neat idea.  The leaves of the tree would contain a list of
items with the same hash, but that's what you effectively get with a
linear-probe hash table anyway.

As you said, not immediately useful, but one could imagine the technique
being of practical use when implementing Python or a Python-compatible
language in a foreign environment that supports only tree-based
collections.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: order independent hash?

2011-12-08 Thread Hrvoje Niksic
Tim Chase  writes:

> From an interface perspective, I suppose it would work.  However one
> of the main computer-science reasons for addressing by a hash is to
> get O(1) access to items (modulo pessimal hash structures/algorithms
> which can approach O(N) if everything hashes to the same
> value/bucket), rather than the O(logN) time you'd get from a tree. So
> folks reaching for a hash/map might be surprised if performance
> degraded with the size of the contents.

In a language like Python, the difference between O(1) and O(log n) is
not the primary reason why programmers use dict; they use it because
it's built-in, efficient compared to alternatives, and convenient to
use.  If Python dict had been originally implemented as a tree, I'm sure
it would be just as popular.

Omitting the factor of O(log n) as functionally equivalent to O(1) is
applicable to many situations and is sometimes called "soft-O" notation.
One example from practice is the pre-2011 C++, where the standardization
committee failed to standardize hash tables on time for the 1998
standard.  Although this was widely recognized as an oversight, a large
number of programs simply used tree-based std::maps and never noticed a
practical difference between average-constant-time and
logarithmic complexity lookups.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: order independent hash?

2011-12-09 Thread Hrvoje Niksic
Steven D'Aprano  writes:

> Except for people who needed dicts with tens of millions of items.

Huge tree-based dicts would be somewhat slower than today's hash-based
dicts, but they would be far from unusable.  Trees are often used to
organize large datasets for quick access.

The case of dicts which require frequent access, such as those used to
implement namespaces, is different, and more interesting.  Those dicts
are typically quite small, and for them the difference between O(log n)
and O(1) is negligible in both theory (since n is "small", i.e. bounded)
and practice.  In fact, depending on the details of the implementation,
the lookup in a small tree could even be marginally faster.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: unzip function?

2012-01-18 Thread Hrvoje Niksic
Neal Becker  writes:

> python has builtin zip, but not unzip
>
> A bit of googling found my answer for my decorate/sort/undecorate problem:
>
> a, b = zip (*sorted ((c,d) for c,d in zip (x,y)))
>
> That zip (*sorted...
>
> does the unzipping.
>
> But it's less than intuitively obvious.
>
> I'm thinking unzip should be a builtin function, to match zip.

"zip" and "unzip" are one and the same since zip is inverse to itself:

>>> [(1, 2, 3), (4, 5, 6)]
[(1, 2, 3), (4, 5, 6)]
>>> zip(*_)
[(1, 4), (2, 5), (3, 6)]
>>> zip(*_)
[(1, 2, 3), (4, 5, 6)]
>>> zip(*_)
[(1, 4), (2, 5), (3, 6)]

What you seem to call unzip is simply zip with a different signature,
taking a single argument:

>>> def unzip(x):
...   return zip(*x)
...
>>> [(1, 2, 3), (4, 5, 6)]
[(1, 2, 3), (4, 5, 6)]
>>> unzip(_)
[(1, 4), (2, 5), (3, 6)]
>>> unzip(_)
[(1, 2, 3), (4, 5, 6)]
>>> unzip(_)
[(1, 4), (2, 5), (3, 6)]
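In Python 3 zip returns an iterator, so the same round-trip needs an explicit list() to materialize it; the self-inverse property is unchanged:

```python
pairs = [(1, 2, 3), (4, 5, 6)]

def unzip(x):
    # zip with a different signature, as described above
    return list(zip(*x))

assert unzip(pairs) == [(1, 4), (2, 5), (3, 6)]
# Applying it twice gets back the original rows:
assert unzip(unzip(pairs)) == pairs
```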
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: while True or while 1

2012-01-23 Thread Hrvoje Niksic
Dave Angel  writes:

> I do something similar when there's a portion of code that should
> never be reached:
>
> assert("reason why I cannot get here")

Shouldn't that be assert False, "reason why I cannot get here"?
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python's "only one way to do it" philosophy isn't good?

2007-06-29 Thread Hrvoje Niksic
Douglas Alan <[EMAIL PROTECTED]> writes:

> I think you overstate your case.  Lispers understand iteration
> interfaces perfectly well, but tend to prefer mapping fuctions to
> iteration because mapping functions are both easier to code (they
> are basically equivalent to coding generators) and efficient (like
> non-generator-implemented iterators).  The downside is that they are
> not quite as flexible as iterators (which can be hard to code) and
> generators, which are slow.

Why do you think generators are any slower than hand-coded iterators?
Consider a trivial sequence iterator:

$ python -m timeit -s 'l=[1] * 100
class foo(object):
  def __init__(self, l):
    self.l = l
    self.i = 0
  def __iter__(self):
    return self
  def next(self):
    self.i += 1
    try:
      return self.l[self.i - 1]
    except IndexError:
      raise StopIteration
' 'tuple(foo(l))'
1 loops, best of 3: 173 usec per loop

The equivalent generator is not only easier to write, but also
considerably faster:

$ python -m timeit -s 'l=[1] * 100
def foo(l):
  i = 0
  while 1:
    try:
      yield l[i]
    except IndexError:
      break
    i += 1
' 'tuple(foo(l))'
1 loops, best of 3: 46 usec per loop
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Looking for an interpreter that does not request internet access

2007-06-29 Thread Hrvoje Niksic
James Alan Farrell <[EMAIL PROTECTED]> writes:

> Hello,
> I recently installed new anti-virus software and was surprised the
> next time I brought up IDLE, that it was accessing the internet.
>
> I dislike software accessing the internet without telling me about it,
> especially because of my slow dial up connection (there is no option
> where I live), but also because I feel it unsafe.

When I start up IDLE, I get this message:


Personal firewall software may warn about the connection IDLE
makes to its subprocess using this computer's internal loopback
interface.  This connection is not visible on any external
interface and no data is sent to or received from the Internet.


It would seem to explain the alarm you're seeing.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python's "only one way to do it" philosophy isn't good?

2007-06-29 Thread Hrvoje Niksic
Douglas Alan <[EMAIL PROTECTED]> writes:

>>>  The downside is that they are not quite as flexible as iterators
>>> (which can be hard to code) and generators, which are slow.
>
>> Why do you think generators are any slower than hand-coded iterators?
>
> Generators aren't slower than hand-coded iterators in *Python*, but
> that's because Python is a slow language.

But then it should be slow for both generators and iterators.

> *Perhaps* there would be some opportunities for more optimization if
> they had used a less general mechanism.)

Or if the generators were built into the language and directly
supported by the compiler.  In some cases implementing a feature is
*not* a simple case of writing a macro, even in Lisp.  Generators may
well be one such case.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Memory leak issue with complex data structure

2007-07-05 Thread Hrvoje Niksic
Alan Franzoni <[EMAIL PROTECTED]> writes:

> I have a serious "leak" issue; even though I clear all those sets
> and I delete all the references I can have to the current namespace,
> memory is not freed.

Maybe the memory is freed (marked as available for further use by
Python), just not released to the operating system.[1]  To test against
that, try to allocate more Python structures and see if they reuse the
freed memory or if they allocate even more memory.  Even better, run
code like this:

while 1:
    ... populate your data structures ...
    clear()

If this causes Python to allocate more and more memory, it means you
have a real leak.  If not, it means that the GC is working fine, but
it's not possible to release the memory to the OS.


[1]
Not giving freed memory back to the system is not (necessarily) a
Python bug; the same thing happens in C and is a consequence of
managed memory being assigned to the process as a contiguous block.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is the most efficient way to test for False in a list?

2007-07-09 Thread Hrvoje Niksic
"Diez B. Roggisch" <[EMAIL PROTECTED]> writes:

>> but what is your best way to test for False in a list?
[...]
>>> status = all(list)
>> Am I mistaken, or is this no identity test for False at all?
>
> You are mistaken.
> all take an iterable and returns if each value of it is true.

Testing for truth is not the same as an identity test for False.  OP's
message doesn't make it clear which one he's looking for.  This
illustrates the difference:

>>> False in [3, 2, 1, 0, -1]
True    # no False here
>>> all([3, 2, 1, 0, -1])
False   # false value present, not necessarily False
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Per thread data

2007-07-09 Thread Hrvoje Niksic
Will McGugan <[EMAIL PROTECTED]> writes:

> Is there a canonical way of storing per-thread data in Python?

mydata = threading.local()
mydata.x = 1
...

http://docs.python.org/lib/module-threading.html
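A short sketch of the per-thread isolation (the worker function and results dict are illustrative):

```python
import threading

local = threading.local()
results = {}

def worker(n):
    local.x = n          # each thread gets its own, independent .x
    results[n] = local.x

threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# No thread ever saw another thread's value of local.x:
assert results == {0: 0, 1: 1, 2: 2}
```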
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is the most efficient way to test for False in a list?

2007-07-09 Thread Hrvoje Niksic
Paul McGuire <[EMAIL PROTECTED]> writes:

>> >>> False in [3, 2, 1, 0, -1]
>> True    # no False here
>> >>> all([3, 2, 1, 0, -1])
>> False   # false value present, not necessarily False
>
> I think if you want identity testing, you'll need to code your own;

I'm aware of that, I simply pointed out that "False in list" and
any(list) are not equivalent and where the difference lies.

> any(map(lambda _ : _ is False,[3,2,1,0,-1]))

Note that you can use itertools.imap to avoid the unnecessary
intermediate list creation.  Even better is to use a generator
expression:

>>> any(x is False for x in [3, 2, 1, 0, -1])
False
>>> any(x is False for x in [3, 2, 1, 0, -1, False])
True
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: os.wait() losing child?

2007-07-12 Thread Hrvoje Niksic
Nick Craig-Wood <[EMAIL PROTECTED]> writes:

>>  I think your polling way works; it seems there no other way around this 
>>  problem other than polling or extending Popen class.
>
> I think polling is probably the right way of doing it...

It requires the program to wake up every 0.1s to poll for freshly
exited subprocesses.  That doesn't consume excess CPU cycles, but it
does prevent the kernel from swapping it out when there is nothing to
do.  Sleeping in os.wait allows the operating system to know exactly
what the process is waiting for, and to move it out of the way until
those conditions are met.  (Pedants would also notice that polling
introduces on average 0.1/2 seconds delay between the subprocess dying
and the parent reaping it.)

In general, a program that waits for something should do so in a
single call to the OS.  OP's usage of os.wait was exactly correct.

Fortunately the problem can be worked around by hanging on to Popen
instances until they are reaped.  If all of them are kept referenced
when os.wait is called, they will never end up in the _active list
because the list is only populated in Popen.__del__.

> Internally subprocess uses os.waitpid(pid) just waiting for its own
> specific pids.  IMHO this is the right way of doing it other than
> os.wait() which waits for any pids.  os.wait() can reap children
> that you weren't expecting (say some library uses os.system())...

system calls waitpid immediately after the fork.  This can still be a
problem for applications that call wait in a dedicated thread, but the
program can always ignore the processes it doesn't know anything
about.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: os.wait() losing child?

2007-07-12 Thread Hrvoje Niksic
Jason Zheng <[EMAIL PROTECTED]> writes:

> greg wrote:
>> Jason Zheng wrote:
>>> Hate to reply to my own thread, but this is the working program
>>> that can demonstrate what I posted earlier:
>> I've figured out what's going on. The Popen class has a
>> __del__ method which does a non-blocking wait of its own.
>> So you need to keep the Popen instance for each subprocess
>> alive until your wait call has cleaned it up.
>> The following version seems to work okay.
>>
> It still doesn't work on my machine. I took a closer look at the Popen
> class, and I think the problem is that the __init__ method always
> calls a method _cleanup, which polls every existing Popen
> instance.

Actually, it's not that bad.  _cleanup only polls the instances that
are no longer referenced by user code, but still running.  If you hang
on to Popen instances, they won't be added to _active, and __init__
won't reap them (_active is only populated from Popen.__del__).

This version is a trivial modification of your code to that effect.
Does it work for you?

#!/usr/bin/python

import os
from subprocess import Popen

pids = {}
counts = [0, 0, 0]

for i in xrange(3):
    p = Popen('sleep 1', shell=True, cwd='/home', stdout=file(os.devnull, 'w'))
    pids[p.pid] = p, i
    print "Starting child process %d (%d)" % (i, p.pid)

while (True):
    pid, ignored = os.wait()
    try:
        p, i = pids[pid]
    except KeyError:
        # not one of ours
        continue
    del pids[pid]
    counts[i] += 1

    # terminate if count > 10
    if (counts[i] == 10):
        print "Child Process %d terminated." % i
        if reduce(lambda x, y: x and (y >= 10), counts):
            break
        continue

    print "Child Process %d terminated, restarting" % i
    p = Popen('sleep 1', shell=True, cwd='/home', stdout=file(os.devnull, 'w'))
    pids[p.pid] = p, i
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: os.wait() losing child?

2007-07-12 Thread Hrvoje Niksic
Nick Craig-Wood <[EMAIL PROTECTED]> writes:

>>  This can still be a problem for applications that call wait in a
>>  dedicated thread, but the program can always ignore the processes
>>  it doesn't know anything about.
>
> Ignoring them isn't good enough because it means that the bit of
> code which was waiting for that process to die with os.getpid() will
> never get called, causing a deadlock in that bit of code.

It won't deadlock, it will get an ECHILD or equivalent error because
it's waiting for a PID that doesn't correspond to a running child
process.  I agree that this can be a problem if and when you use
libraries that can call system().  (In that case waiting for SIGCHLD
is probably a good solution.)

> What is really required is a select() like interface to wait which
> takes more than one pid.  I don't think there is such a thing
> though, so polling is your next best option.

Except for the problems outlined in my previous message.  And the fact
that polling becomes very expensive (O(n) per check) once the number
of processes becomes large.  Unless one knows that a library can and
does call system, wait is the preferred solution.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: os.wait() losing child?

2007-07-13 Thread Hrvoje Niksic
Jason Zheng <[EMAIL PROTECTED]> writes:

> Hrvoje Niksic wrote:
>>> greg wrote:
>> Actually, it's not that bad.  _cleanup only polls the instances that
>> are no longer referenced by user code, but still running.  If you hang
>> on to Popen instances, they won't be added to _active, and __init__
>> won't reap them (_active is only populated from Popen.__del__).
>>
>
> Perhaps that's the difference between Python 2.4 and 2.5.
[...]
> Nope it still doesn't work. I'm running python 2.4.4, tho.

That explains it, then, and also why greg's code didn't work.  You
still have the option to try to run 2.5's subprocess.py under 2.4.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to create new files?

2007-07-13 Thread Hrvoje Niksic
Robert Dailey <[EMAIL PROTECTED]> writes:

> class filestream:
>   def __init__( self, filename ):
>   self.m_file = open( filename, "rwb" )
[...]
> So far, I've found that unlike with the C++ version of fopen(), the
> Python 'open()' call does not create the file for you when opened
> using the mode 'w'.

According to your code, you're not using 'w', you're using 'rwb'.  In
that respect Python's open behaves the same as C's fopen.

> Also, you might notice that my "self.m_file.read()" function is wrong,
> according to the python docs at least. read() takes the number of
> bytes to read, however I was not able to find a C++ equivalent of
> "sizeof()" in Python. If I wanted to read in a 1 byte, 2 byte, or 4
> byte value from data into python I have no idea how I would do this.

Simply read as much data as you need.  If you need to unpack external
data into Python object and vice versa, look at the struct module
(http://docs.python.org/lib/module-struct.html).
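For instance, unpacking a made-up record of 1-, 2- and 4-byte unsigned
integers might look like this (the format string and values are
invented for illustration):

```python
import struct

# Pack a little-endian record: 1-byte, 2-byte and 4-byte unsigned
# integers, back to back with no padding ('<' disables alignment).
data = struct.pack('<BHI', 7, 300, 70000)

# calcsize plays the role of C's sizeof() for a given format string
assert struct.calcsize('<BHI') == 1 + 2 + 4

# unpack reads the fields back out of the raw bytes
one, two, four = struct.unpack('<BHI', data)
assert (one, two, four) == (7, 300, 70000)
```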
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Question about PyDict_SetItemString

2007-07-13 Thread Hrvoje Niksic
lgx <[EMAIL PROTECTED]> writes:

> From Google results, I find some source code written like that.  But
> some code is written like below:
>
> obj =  PyString_FromString("value");
> PyDict_SetItemString(pDict,"key",obj);
> Py_DECREF(obj);
>
> So, which one is correct?

The latter is correct.  While PyDict_GetItemString returns a borrowed
reference, PyDict_SetItemString doesn't steal the reference.  This
makes sense because adding to the dictionary can fail for various
reasons (insufficient memory, invalid key, hash or comparison
functions failing), and that allows you to write code like this:

obj = <new reference>;
int err = PyDict_SetItemString(dict, "key", obj);
Py_DECREF(obj);
if (err)
  return NULL;   /* or whatever is appropriate in your case */

That won't leak regardless of whether PyDict_SetItemString succeeded,
and will correctly propagate an error if it occurs.

Please note that there is a new mailing list for Python/C API
questions, see http://mail.python.org/mailman/listinfo/capi-sig .
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: os.wait() losing child?

2007-07-13 Thread Hrvoje Niksic
Jason Zheng <[EMAIL PROTECTED]> writes:

>>> Nope it still doesn't work. I'm running python 2.4.4, tho.
>> That explains it, then, and also why greg's code didn't work.  You
>> still have the option to try to run 2.5's subprocess.py under 2.4.
> Is it more convenient to just inherit the Popen class?

You'd still need to change its behavior to not call _cleanup.  For
example, by removing "your" instances from subprocess._active before
chaining up to Popen.__init__.

> I'm concerned about portability of my code. It will be run on
> multiple machines with mixed Python 2.4 and 2.5 environments.

I don't think there is a really clean way to handle this.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Implementaion of random.shuffle

2007-07-16 Thread Hrvoje Niksic
Steve Holden <[EMAIL PROTECTED]> writes:

> So it would appear that the developers chose the Knuth algorithm
> (with a slight variation) for *their* implementation. Now you have
> to ask yourself whether your surmise is genuinely correct (in which
> case the documentation may contain a bug) or whether the
> documentation is indeed correct and you are in error.

That is a good question.  The random module uses the Mersenne twister,
which has a repetition period of 2**19937.  The number of n-sized
permutations of a list with n elements is n!, while each shuffle
requires n calls to the PRNG.  This means that to be able to generate
all permutations, the PRNG must have a period of at least n! * n.  In
the case of MT, it means that, regarding the period, you are safe for
lists with around 2079 elements.  shuffle's documentation may have
been written before the random module was converted to use the MT.

2**19937 being a really huge number, it's impossible to exhaust the
Mersenne twister by running it in sequence.  However, there is also
the question of the spread of the first shuffle.  Ideally we'd want
any shuffle, including the first one, to be able to produce any of the
n! permutations.  To achieve that, the initial state of the PRNG must
be able to support at least n! different outcomes, which means that
the PRNG must be seeded by at least log2(n!) bits of randomness from
an outside source.  For reference, Linux's /dev/random stops blocking
when 64 bits of randomness are available from the entropy pool, which
means that, in the worst case, shuffling more than 20 elements cannot
represent all permutations in the first shuffle!
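The cutoff can be checked numerically; here is a sketch that uses
math.lgamma to compute log2(n!) without overflowing:

```python
import math

def log2_factorial(n):
    # lgamma(n + 1) == ln(n!); divide by ln(2) to convert to base 2
    return math.lgamma(n + 1) / math.log(2)

# Find the largest n with n! * n <= 2**19937: enough period for every
# permutation, times the n PRNG calls each shuffle consumes.
n = 1
while log2_factorial(n + 1) + math.log(n + 1, 2) <= 19937:
    n += 1
# n comes out at about 2079, the figure quoted above
```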
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Accessing Python variables in an extension module

2007-07-16 Thread Hrvoje Niksic
MD <[EMAIL PROTECTED]> writes:

> 2) Is there anyway to find the type of the object in C using something
> like a switch statement? I was looking for something like this
>switch type(object) {
>   STRING: "This is a string object";
>   break;
>   INTEGER: "This is an integer object";
>   break;
>   BOOLEAN: "This is a boolean object";
>   .
>   .
>   }

Not switch, but the closest you'll get is:

if (object->ob_type == &PyString_Type) {
  ... string
}
else if (object->ob_type == &PyInt_Type) {
  ... int
}
else if (object->ob_type == &PyBool_Type) {
  ... bool
}

> I don't want to run all the C Py***_Check functions on the object.

Py*_Check are not expensive if the object really is of the target
type.  They are necessary to support subtyping correctly.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Implementaion of random.shuffle

2007-07-17 Thread Hrvoje Niksic
Steven D'Aprano <[EMAIL PROTECTED]> writes:

> In the case of CPython, the current implementation uses the Mersenne
> Twister, which has a huge period of 2**19937. However, 2081! is
> larger than that number, which means that at best a list of 2081
> items or longer can't be perfectly shuffled (not every permutation
> can be selected by the algorithm).

Note that each shuffle requires n calls to the PRNG, not just one,
which reduces the theoretically safe list size by 1.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Semantics of file.close()

2007-07-17 Thread Hrvoje Niksic
"Evan Klitzke" <[EMAIL PROTECTED]> writes:

> You should take a look at the man pages for close(2) and write(2) (not
> fclose). Generally you will only get an error in C if you try to close
> a file that isn't open. In Python you don't even have to worry about
> that -- if you close a regular file object more than once no exception
> will be thrown, _unless_ you are using os.close(), which mimics the C
> behavior. If you are out of space, in C you will get an error returned
> by the call to write (even if the data isn't actually flushed to disk
> yet by the kernel). I'm pretty sure Python mimics this behavior, so an
> exception would be called on the write, not on the close operation.

But the writes are buffered, and close causes the buffer to be
flushed.  file.close can throw an exception just like fclose, but it
will still ensure that the file is closed.

> > How do I ensure that the close() methods in my finally clause do
> > not throw an exception?

In the general case, you can't.  Preferably you'd want to make sure
that both files are closed:

f1 = file(...)
try:
  f2 = file(...)
  try:
    ... do something with f1 and f2 ...
  finally:
    f2.close()
finally:
  f1.close()

Now file.close would be called on both files regardless of where an
exception occurs.  If you use Python 2.5, this would be a good use
case for the "nested" function from the contextlib module, which allow
you to write the above more elegantly:

from __future__ import with_statement
from contextlib import nested

with nested(file(...), file(...)) as (f1, f2):
  ... do something with f1 and f2 ...


Finally, most of this applies to files open for writing, where Python
is forced to flush the cache on close.  If the file is opened for
reading, you can assume that the exception will not be raised (and you
can safely swallow it by wrapping f.close() in a try/except IOError if you
want to be sure; after all, you can't lose data when closing a file
open for reading).
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Semantics of file.close()

2007-07-18 Thread Hrvoje Niksic
"Evan Klitzke" <[EMAIL PROTECTED]> writes:

>> But the writes are buffered, and close causes the buffer to be
>> flushed.  file.close can throw an exception just like fclose, but
>> it will still ensure that the file is closed.
>
> Is this buffering being done by Python or the kernel?

It is done in the user space, by the C stdio library which Python
currently uses for IO.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Interpreting os.lstat()

2007-07-19 Thread Hrvoje Niksic
Adrian Petrescu <[EMAIL PROTECTED]> writes:

> I checked the online Python documentation at 
> http://python.org/doc/1.5.2/lib/module-stat.html
> but it just says to "consult the documentation for your system.".

The page you're looking for is at
http://www.python.org/doc/current/lib/os-file-dir.html .  For lstat it
says "Like stat(), but do not follow symbolic links."  For stat it
says:

Perform a stat() system call on the given path. The return value
is an object whose attributes correspond to the members of the
stat structure, namely: st_mode (protection bits), st_ino (inode
number), st_dev (device), st_nlink (number of hard links), st_uid
(user ID of owner), st_gid (group ID of owner), st_size (size of
file, in bytes), st_atime (time of most recent access), st_mtime
(time of most recent content modification), st_ctime (platform
dependent; time of most recent metadata change on Unix, or the
time of creation on Windows)
[...]
For backward compatibility, the return value of stat() is also
accessible as a tuple of at least 10 integers giving the most
important (and portable) members of the stat structure, in the
order st_mode, st_ino, st_dev, st_nlink, st_uid, st_gid, st_size,
st_atime, st_mtime, st_ctime. More items may be added at the end
by some implementations. The standard module stat defines
functions and constants that are useful for extracting information
from a stat structure. (On Windows, some items are filled with
dummy values.)
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: class C: vs class C(object):

2007-07-20 Thread Hrvoje Niksic
"[EMAIL PROTECTED]" <[EMAIL PROTECTED]> writes:

> In particular, old-style classes are noticeably faster than
> new-style classes for some things (I think it was attribute lookup
> that surprised me recently, possibly related to the property
> stuff...)

Can you post an example that we can benchmark?  I ask because the
opposite is usually claimed, that (as of Python 2.4 or 2.5) new-style
classes are measurably faster.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: subprocess (spawned by os.system) inherits open TCP/UDP/IP port

2007-07-20 Thread Hrvoje Niksic
alf <[EMAIL PROTECTED]> writes:

> still would like to find out why it is happening (now FD_CLOEXEC
> narrowed may yahooing/googling searches). While realize that file
> descriptors are shared by forked processes it is still weird why the
> port moves to the child process once parent gets killed. what it the
> parent got multiple subprocesses.

Netstat probably shows only one of the processes that hold to the
port, possibly the one with the lowest PID (the parent).

> Plus it is kind of unintuitive os.system does not protect from such
> behavoir which is for me more an equivalent of like issuing a ne
> wcommand/ starting a process from the shell.

It is considered a feature that fork/exec'ed programs inherit file
descriptors -- that's how stdin and stdout get inherited all the time.
It doesn't occur often with network connections because shells rarely
have reason to open them.
-- 
http://mail.python.org/mailman/listinfo/python-list


Multiple regex match idiom

2007-05-09 Thread Hrvoje Niksic
I often have the need to match multiple regexes against a single
string, typically a line of input, like this:

if (matchobj = re1.match(line)):
  ... re1 matched; do something with matchobj ...
elif (matchobj = re2.match(line)):
  ... re2 matched; do something with matchobj ...
elif (matchobj = re3.match(line)):


Of course, that doesn't work as written because Python's assignments
are statements rather than expressions.  The obvious rewrite results
in deeply nested if's:

matchobj = re1.match(line)
if matchobj:
  ... re1 matched; do something with matchobj ...
else:
  matchobj = re2.match(line)
  if matchobj:
... re2 matched; do something with matchobj ...
  else:
matchobj = re3.match(line)
if matchobj:
  ...

Normally I have nothing against nested ifs, but in this case the deep
nesting unnecessarily complicates the code without providing
additional value -- the logic is still exactly equivalent to the
if/elif/elif/... shown above.

There are ways to work around the problem, for example by writing a
utility predicate that passes the match object as a side effect, but
that feels somewhat non-standard.  I'd like to know if there is a
Python idiom that I'm missing.  What would be the Pythonic way to
write the above code?
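The "utility predicate" workaround mentioned above, sketched out (the
class name is invented):

```python
import re

class Matcher(object):
    """Remember the last match object as a side effect, so the
    if/elif chain stays flat."""
    def match(self, regex, line):
        self.m = re.match(regex, line)
        return self.m is not None

m = Matcher()
line = "foo=42"
if m.match(r"(\w+)=(\d+)", line):
    # the match object is available as m.m
    key, value = m.m.group(1), m.m.group(2)
elif m.match(r"\s*#", line):
    key, value = None, None   # a comment line
else:
    key, value = None, line
```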
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Reading a file and resuming reading.

2007-05-25 Thread Hrvoje Niksic
"Karim Ali" <[EMAIL PROTECTED]> writes:

> -
> while not eof  <- really want the EOF and not just an empty line!
> readline by line
> end while;
> -

for line in open_file:
  ...

It will stop on EOF, not on empty line.

> But also, in case for one reason or another the program crashes, I
> want to be able to rexecute it and for it to resume reading from the
> same position as it left. If a while loop like the one above can be
> implemented I can do this simply by counting the lines!

If you open the file in binary mode, you can easily keep track of the
position in file:

from __future__ import with_statement  # needed on Python 2.5

bytepos = 0
with file(filename, 'rb') as f:
  for line in f:
    ... process line ...
    bytepos += len(line)

If you need to restart the operation, simply seek to the previously
known position:

# restart with the old bytepos
with file(filename, 'rb') as f:
  f.seek(bytepos)
  for line in f:
    ... process line ...
    bytepos += len(line)
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Puzzled by "is"

2007-08-09 Thread Hrvoje Niksic
Grzegorz Słodkowicz <[EMAIL PROTECTED]> writes:

>> Seriously, it's just an optimization by the implementers. There is
>> no need for more than one empty tuple, since tuples can never be
>> modified once created.
>>
>> But they decided not to create (1, ) in advance. They probably knew
>> that hardly anybody would want to create that tuple ;-) [Seriously:
>> if you started trying to predict which tuples would be used you
>> would go insane, but the empty tuple is the most likely candidate].
>>
> That's just theorisation but I'd rather expect the interpreter simply
> not to create a second tuple while there already is an identical
> one.

But then tuple creation would be slowed down by searching for whether
an "identical one" already exists.  In the general case, that is quite
unlikely, so it's not done.  (I suspect that only the requirement to
store the list of all tuples somewhere would outweigh any potential
gains of this strategy; and if the search were implemented as a hash
table lookup, even more space would be wasted.)  It's done for the
empty tuple because no search is necessary, only a size test.
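This is easy to observe in CPython (a quick sketch; the identity of
the empty tuple is a CPython implementation detail):

```python
# The empty tuple is shared; dynamically built non-empty tuples are not.
a = ()
b = tuple([])
assert a is b          # one cached empty tuple (CPython behavior)

t1 = tuple([1])
t2 = tuple([1])
assert t1 == t2        # equal contents...
assert t1 is not t2    # ...but no search for an existing identical tuple
```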

> Admittedly the empty tuple is a special case but then 'Special cases
> aren't special enough to break the rules'.

Except no rule is being broken.  As others have pointed out, since
tuples are immutable, caching them is quite safe.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Fatest standard way to sum bytes (and their squares)?

2007-08-12 Thread Hrvoje Niksic
Erik Max Francis <[EMAIL PROTECTED]> writes:

> So far the fastest way I've found is using the `sum` builtin and
> generators::
>
>   ordinalSum = sum(ord(x) for x in data)
>   ordinalSumSquared = sum(ord(x)**2 for x in data)

For ordinalSum, using imap is almost twice as fast:

$ python -m timeit -s 'data=[chr(x) for x in xrange(256)]' 'sum(ord(x) for x in 
data)'
10000 loops, best of 3: 92.4 usec per loop
$ python -m timeit -s 'data=[chr(x) for x in xrange(256)]; from itertools 
import imap' 'sum(imap(ord, data))'
10000 loops, best of 3: 55.4 usec per loop

Of course, that optimization doesn't work for the squared sum; using a
lambda only pessimizes it.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Fatest standard way to sum bytes (and their squares)?

2007-08-13 Thread Hrvoje Niksic
Erik Max Francis <[EMAIL PROTECTED]> writes:

> Hrvoje Niksic wrote:
>
>> For ordinalSum, using imap is almost twice as fast:
>> $ python -m timeit -s 'data=[chr(x) for x in xrange(256)]'
>> 'sum(ord(x) for x in data)'
>> 10000 loops, best of 3: 92.4 usec per loop
>> $ python -m timeit -s 'data=[chr(x) for x in xrange(256)]; from itertools 
>> import imap' 'sum(imap(ord, data))'
>> 10000 loops, best of 3: 55.4 usec per loop
>
> You're using data which is a list of chars (strings), rather than a
> string itself, which is what the format is in.  The imap
> optimization doesn't appear to work quite as dramatically well for
> me with strings instead of lists, but it certainly is an
> improvement.

I wouldn't expect to see any difference in strings and lists.  In this
simple test I get approximately the same ~1.7x speedup:

$ python -m timeit 'sum(ord(x) for x in "abcdefghijklmnopqrstuvwxyz")'
100000 loops, best of 3: 12.7 usec per loop
$ python -m timeit -s 'from itertools import imap' 'sum(imap(ord, 
"abcdefghijklmnopqrstuvwxyz"))'
100000 loops, best of 3: 7.42 usec per loop
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: File Read Cache - How to purge?

2007-08-21 Thread Hrvoje Niksic
Signal <[EMAIL PROTECTED]> writes:

> 2. Is there anyway to somehow to take advantage of this "caching" by
> initializing it without reading through the entire file first?
>
> 3. If the answer to #2 is No, then is there a way to purge this
> "cache" in order to get a more accurate result in my routine?  That
> is without having to read another large file first?

On a Unix system the standard way to purge the cache is to unmount the
file system and remount it.  If you can't do that on Windows, you can
get the same effect by placing the test files on an external (USB)
hard drive; unplugging the drive and plugging it back again will
almost certainly force the OS to flush any associated caches.  Having
to do that is annoying, even as a last resort, but still better than
nothing.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: python 2.5.1 segfault, multithreading & dual core issue?

2007-08-21 Thread Hrvoje Niksic
Paul Sijben <[EMAIL PROTECTED]> writes:

> I am running a multi-threaded python application in a dual core
> intel running Ubuntu.
[...]

Judging from the stack trace, this patch has a good chance of fixing
your problem:

http://mail.python.org/pipermail/python-dev/2007-August/074232.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to optimise this code?

2007-08-21 Thread Hrvoje Niksic
Christof Winter <[EMAIL PROTECTED]> writes:

> To get rid of the if statements, replace __init__ function with:
>
>  def __init__(self, tc):
>  functionToCall = eval("self.testCase%s" % tc)

Or functionToCall = getattr(self, "testCase" + tc)

eval can introduce unwanted side effects.
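A minimal sketch of the getattr version (class and method names are
hypothetical):

```python
class TestCases(object):
    # hypothetical test-case methods, for illustration only
    def testCase1(self):
        return "ran case 1"
    def testCase2(self):
        return "ran case 2"

obj = TestCases()
tc = "2"
# getattr looks the method up by name without evaluating a string of
# code, so a malicious or malformed tc can't run arbitrary expressions
functionToCall = getattr(obj, "testCase" + tc)
assert functionToCall() == "ran case 2"
```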
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: File Read Cache - How to purge?

2007-08-21 Thread Hrvoje Niksic
Nick Craig-Wood <[EMAIL PROTECTED]> writes:

> If you are running linux > 2.6.18 then you can use
> /proc/sys/vm/drop_caches for exactly that purpose.
>
>   http://www.linuxinsight.com/proc_sys_vm_drop_caches.html

That URL claims that you need to run "sync" before dropping the cache,
and so do other resources.  I wonder if that means that dropping the
cache is unsafe on a running system.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: File Read Cache - How to purge?

2007-08-22 Thread Hrvoje Niksic
Steve Holden <[EMAIL PROTECTED]> writes:

>> That URL claims that you need to run "sync" before dropping the
>> cache, and so do other resources.  I wonder if that means that
>> dropping the cache is unsafe on a running system.
>
> Good grief. Just let the operating system do its job, for Pete's
> sake, and go find something else to obsess about.

Purging the page cache for the purposes of benchmarking (such as
measuring cold start time of large applications) is an FAQ, not an
"obsession".  No one is arguing that the OS shouldn't do its job in
the general case.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: File Read Cache - How to purge?

2007-08-22 Thread Hrvoje Niksic
Nick Craig-Wood <[EMAIL PROTECTED]> writes:

>> >   http://www.linuxinsight.com/proc_sys_vm_drop_caches.html
>> 
>>  That URL claims that you need to run "sync" before dropping the cache,
>>  and so do other resources.  I wonder if that means that dropping the
>>  cache is unsafe on a running system.
>
> It isn't unsafe, the OS just can't drop pages which haven't been
> synced to disk so you won't get all the pages dropped unless you
> sync first.

Thanks for the clarification.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Does shuffle() produce uniform result ?

2007-08-24 Thread Hrvoje Niksic
tooru honda <[EMAIL PROTECTED]> writes:

> I have read the source code of the built-in random module,
> random.py.  After also reading Wiki article on Knuth Shuffle
> algorithm, I wonder if the shuffle method implemented in random.py
> produces results with modulo bias.

It doesn't have modulo bias because it doesn't use modulo to produce a
random index; it multiplies the floating point value with the desired
range.  I'm not sure if that method produces any measurable bias.
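To illustrate the difference (my sketch, not the module's code): here
is the modulo bias you would get using a single raw byte as the random
value, versus the float-scaling approach shuffle actually uses:

```python
import random

# Take every possible byte value once and reduce it mod 10.  Since
# 256 = 25*10 + 6, remainders 0-5 each come up one extra time: that
# surplus is the modulo bias.
counts = [0] * 10
for raw in range(256):
    counts[raw % 10] += 1
assert counts == [26] * 6 + [25] * 4

# shuffle instead scales a float in [0.0, 1.0) by the desired range,
# which has no such wrap-around bias:
index = int(random.random() * 10)
assert 0 <= index < 10
```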
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Registering a python function in C

2007-08-31 Thread Hrvoje Niksic
fernando <[EMAIL PROTECTED]> writes:

> Could someone post an example on how to register a python function as
> a callback in a C function?

If I understand correctly, your C function receives a Python function
(as a function object of type PyObject *), which you need to call from
C.  To do that, call PyObject_CallFunction(obj, format, args...) where
format and args are documented in
http://docs.python.org/api/arg-parsing.html.

Does that help?

Also note that there is a dedicated mailing list for the Python/C
API; see http://mail.python.org/mailman/listinfo/capi-sig .
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: fcntl problems

2007-08-31 Thread Hrvoje Niksic
"mhearne808[insert-at-sign-here]gmail[insert-dot-here]com" <[EMAIL PROTECTED]> 
writes:

> I think I'm still confused.

What Miles tried to tell you is that you should call fcntl.flock from
both PA and PB.  In the example you posted, you failed to call it from
PB.  No lock call, so no locking happened.

> I have a script that will be run from a cron job once a minute.  One
> of the things this script will do is open a file to stash some
> temporary results.  I expect that this script will always finish its
> work in less than 15 seconds, but I didn't want to depend on that.
> 
> Thus I started to look into file locking, which I had hoped I could
> use in the following fashion:
>
> Process A opens file foo
> Process A locks file foo
> Process A takes more than a minute to do its work
> Process B wakes up
> Process B determines that file foo is locked
> Process B quits in disgust
> Process A finishes its work

File locking supports that scenario, as you suspected.  You need to
use flock with LOCK_EX|LOCK_NB.  If the call succeeds, you got the
lock.  If you get an exception whose errno is EWOULDBLOCK, you quit in
disgust.
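A sketch of that pattern in modern syntax (POSIX only; the lock file
path is illustrative):

```python
import errno
import fcntl
import os
import tempfile

# Open (or create) a well-known lock file; the name is made up.
lockfile = open(os.path.join(tempfile.gettempdir(), 'myscript.lock'), 'w')
try:
    # Exclusive lock, non-blocking: fail immediately if already held
    fcntl.flock(lockfile, fcntl.LOCK_EX | fcntl.LOCK_NB)
except (IOError, OSError) as e:
    if e.errno == errno.EWOULDBLOCK:
        # a previous run still holds the lock: quit in disgust
        raise SystemExit(1)
    raise
# ... do up to a minute's worth of work; the kernel releases the
# lock when the process exits, even after a crash ...
```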
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Printing lists in columns

2007-09-04 Thread Hrvoje Niksic
[EMAIL PROTECTED] writes:

>> for row in izip_longest(*d, fillvalue='*'):
>>  print ', '.join(row)
>>
>> HTH
>
> I thought that but when I tried it I recieved a
> "Syntax Error: Invalid Syntax"
> with a ^ pointing to fillvalue :S

Python isn't too happy about adding individual keyword arguments after
an explicit argument tuple.  Try this instead:

for row in izip_longest(*d, **dict(fillvalue='*')):
 print ', '.join(row)
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: creating really big lists

2007-09-05 Thread Hrvoje Niksic
Dr Mephesto <[EMAIL PROTECTED]> writes:

> I would like to create a pretty big list of lists; a list 3,000,000
> long, each entry containing 5 empty lists. My application will
> append data each of the 5 sublists, so they will be of varying
> lengths (so no arrays!).
>
> Does anyone know the most efficient way to do this? I have tried:
>
> list = [[[],[],[],[],[]] for _ in xrange(3000000)]

You might want to use a tuple as the container for the lower-level
lists -- it's more compact and costs less allocation-wise.

But the real problem is not list allocation vs tuple allocation, nor
is it looping in Python; surprisingly, it's the GC.  Notice this:

$ python
Python 2.5.1 (r251:54863, May  2 2007, 16:56:35)
[GCC 4.1.2 (Ubuntu 4.1.2-0ubuntu4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import time
>>> t0=time.time(); l=[([],[],[],[],[]) for _ in xrange(3000000)];
>>> t1=time.time()
>>> t1-t0
143.89971613883972

Now, with the GC disabled:
$ python
Python 2.5.1 (r251:54863, May  2 2007, 16:56:35)
[GCC 4.1.2 (Ubuntu 4.1.2-0ubuntu4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import gc
>>> gc.disable()
>>> import time
>>> t0=time.time(); l=[([],[],[],[],[]) for _ in xrange(3000000)];
>>> t1=time.time()
>>> t1-t0
2.9048631191253662

The speed difference is staggering, almost 50-fold.  I suspect GC
degrades the (amortized) linear-time list building into quadratic
time.  Since you allocate all the small lists, the GC gets invoked
every 700 or so allocations, and has to visit more and more objects in
each pass.  I'm not sure if this can be fixed (shouldn't the
generational GC only have to visit the freshly created objects rather
than all of them?), but it has been noticed on this group before.

If you're building large data structures and don't need to reclaim
cyclical references, I suggest turning GC off, at least during
construction.
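A minimal sketch of confining the GC-off window to the construction
phase; the try/finally re-enables collection even if building fails:

```python
import gc

gc.disable()
try:
    # build the large structure while the cycle collector is off
    big = [([], [], [], [], []) for _ in range(100000)]
finally:
    gc.enable()

assert gc.isenabled()
assert len(big) == 100000
```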
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: creating really big lists

2007-09-06 Thread Hrvoje Niksic
Dr Mephesto <[EMAIL PROTECTED]> writes:

> I need some real speed!

Is the speed with the GC turned off sufficient for your usage?
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Autogenerate functions (array of lambdas)

2007-09-06 Thread Hrvoje Niksic
Chris Johnson <[EMAIL PROTECTED]> writes:

> What I want to do is build an array of lambda functions, like so:
>
> a = [lambda: i for i in range(10)]

Use a factory function for creating the lambdas.  The explicit
function call will force a new variable binding to be created each
time, and the lambda will refer to that binding rather than to the
loop variable binding, which is reused for all loop iterations.  For
example:

def makefn(i):
return lambda: i

>>> a = [makefn(i) for i in xrange(10)]
>>> [f() for f in a]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

The alternative is to explicitly import the value into the lambda's
parameter list, as explained by others.
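That alternative, for reference, binds the current value as a default
argument, which is evaluated once per lambda at creation time:

```python
# Default argument freezes the value of i for each lambda:
a = [lambda i=i: i for i in range(10)]
assert [f() for f in a] == list(range(10))

# Without it, every lambda shares the loop variable and sees its
# final value:
b = [lambda: i for i in range(10)]
assert [f() for f in b] == [9] * 10
```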
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Generating a unique identifier

2007-09-08 Thread Hrvoje Niksic
Steven D'Aprano <[EMAIL PROTECTED]> writes:

> Should garbage-collecting 16 million strings really take 20+
> minutes?

It shouldn't.  For testing purposes I've created a set of 16 million
strings like this:

s = set()
for n in xrange(16000000):
  s.add('somerandomprefix' + str(n))  # prefix makes the strings a bit larger

It takes maybe about 20 seconds to create the set.  Quitting Python
takes 4-5 seconds.  This is stock Python 2.5.1.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: MemoryError on reading mbox file

2007-09-12 Thread Hrvoje Niksic
Christoph Krammer <[EMAIL PROTECTED]> writes:

> I have to convert a huge mbox file (~1.5G) to MySQL.

Have you tried commenting out the MySQL portion of the code?  Does the
code then manage to finish processing the mailbox?
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Dynamically removing methods in new-style classes

2007-09-12 Thread Hrvoje Niksic
[EMAIL PROTECTED] writes:

> I am trying unsuccessfully to remove some methods from an instance,

You can't remove the method from an instance because the method is
stored in its class.

> With the older python classes I could have done:
> self.__class__.__dict__[''test1"] to achieve the desired result.

self.__class__.test1 still works, doesn't it?  Removing methods can be
achieved the same way:

>>> class X(object):
...  def blah(self): pass
...
>>> x=X()
>>> x.blah
<bound method X.blah of <__main__.X object at 0x...>>
>>> del type(x).blah
>>> x.blah
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'X' object has no attribute 'blah'
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: class that keeps track of instances

2007-09-17 Thread Hrvoje Niksic
<[EMAIL PROTECTED]> writes:

> 1) New instance has to have a property called 'name'
> 2) When instance is attemped to created, e.g., x=kls(name='myname'), and
> there already exists an instance with obj.name =='myname', that
> pre-existing instance is returned, instead of making new one.  
> 3) A class property 'all' for class gives me a list of all the
> instances.  So kls.all lets me iterates through all instances.
> 4) When all the hard-link to an instance is deleted, the instance should
> be deleted, just like an instance from any regular class does.

import weakref

class Meta(type):
  all = property(lambda type: type.cache.values())

class kls(object):
  __metaclass__ = Meta
  cache = weakref.WeakValueDictionary()
  def __new__(cls, name):
    if name in kls.cache:
      return kls.cache[name]
    self = object.__new__(cls)
    self.name = name
    kls.cache[name] = self
    return self

>>> x = kls(name='foo')
>>> x
<__main__.kls object at 0xb7d5dc8c>
>>> x is kls(name='foo')
True
>>> x is kls(name='bar')
False
>>> print kls.all# only one instance, 'bar' was short-lived
[<__main__.kls object at 0xb7d5dc8c>]
>>> x = 'somethingelse'
>>> print kls.all
[]

> Assuming that I have to write it on my own, what should I do?  I
> tried to implement it using weakref.WeakValueDictionary and
> metaclass, but instance doesn't disappear when I think it should
> disappear.  I am also wondering if it is easier to keeping
> {name:id(obj)} would be a better solution.

The problem is that, given just an id, you have no way to get hold
of the actual object.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: super() doesn't get superclass

2007-09-18 Thread Hrvoje Niksic
Bruno Desthuilliers <[EMAIL PROTECTED]> writes:

> If a class X is in the MRO of class Y, then X is a superclass of Y. I
> agree that the documentation for super is somewhat misleading (and
> obviously wrong), but it still *gives access to* (at least one of)
> the superclass(es).

I believe the confusion comes from different assumptions about what
"superclasses" refers to.  super() iterates over superclasses of the
*instance* in use, but an individual call to super does not
necessarily invoke the superclass of the *implementation* of the
method.  For example, given a random class:

class X(Y):
  def foo(self):
super(X, self).foo()

...there is in fact no guarantee that super() calls a superclass of
X.  However, it is certainly guaranteed that it will call a superclass
of type(self).

Pre-2.2 Python used a simpler scheme where the superclass was always
called, but it caused problems with diamond inheritance where some
methods would be called either twice or not at all.  (This is
explained in http://www.python.org/download/releases/2.2.3/descrintro/
in some detail.)
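A minimal diamond sketch (my illustration, not from the original post)
makes this concrete.  The class names are hypothetical; a list records
the call order instead of printing:

```python
# Diamond inheritance: D derives from B and C, which both derive from A.
# The MRO of D is [D, B, C, A, object].
order = []

class A(object):
    def foo(self):
        order.append('A')

class B(A):
    def foo(self):
        order.append('B')
        super(B, self).foo()   # from a D instance, this calls C.foo!

class C(A):
    def foo(self):
        order.append('C')
        super(C, self).foo()

class D(B, C):
    def foo(self):
        order.append('D')
        super(D, self).foo()

D().foo()
# order is now ['D', 'B', 'C', 'A']: every method ran exactly once,
# and B's super() call reached C, which is not a superclass of B --
# but it is a superclass of type(self), i.e. of D.
```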
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: super() doesn't get superclass

2007-09-19 Thread Hrvoje Niksic
Ben Finney <[EMAIL PROTECTED]> writes:

> Hrvoje Niksic <[EMAIL PROTECTED]> writes:
>
>> class X(Y):
>>   def foo(self):
>> super(X, self).foo()
>> 
>> ...there is in fact no guarantee that super() calls a superclass of
>> X.  However, it is certainly guaranteed that it will call a superclass
>> of type(self).
>
> Not even that. It could call *any class in the inheritance
> hierarchy*,

The inheritance hierarchiy is populated by the various (direct and
indirect) superclasses of type(self).

> depending on how the MRO has resolved "next class". Even one that is
> neither an ancestor nor a descendant of X.

My point exactly.  A superclass of X is not the same as a superclass
of type(self).  super() iterates over the latter, where you expect
the former.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: super() doesn't get superclass

2007-09-19 Thread Hrvoje Niksic
Ben Finney <[EMAIL PROTECTED]> writes:

> Evan is claiming that "the next class in the MRO _is_ a superclass",
> apparently by his definition or some other that I've not seen.

The definition of superclass is not the issue, the issue is
"superclass *of which class*"?  You expect super(A, self) to iterate
only over superclasses of A, even when self is an instance of a
subtype of A.  What really happens is that super(A, self) yields the
next method in type(self)'s MRO, which can and does include
classes that are not by any definition superclasses of A.  All of
those classes are, however, superclasses of the instance's type.

I think it is not possible to have super(A, self) only call
superclasses of A and at the same time having multiple inheritance
work without calling some methods in the hierarchy twice or not at
all.  Guido's paper at http://tinyurl.com/qkjgp explains the reasoning
behind super in some detail.

>> I agree that the documentation for super is somewhat misleading (and
>> obviously wrong),
>
> Well, that's the first time someone has acknowledged that in this
> thread, so I guess this is something.

For the record, I also agree with that.  The documentation should
document in some detail that super(type, obj) yields superclasses of
type(obj), not of type, and that the "type" argument is only used for
super to be able to locate the next type in the list.

>> I wouldn't use such an extreme word as 'madness', but I totally agree
>> that this should be corrected. Care to submit a doc patch ?
>
> I don't understand what practical uses 'super' is intended for

It's intended for cooperative multiple inheritance, a la CLOS's
call-next-method.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: super() doesn't get superclass

2007-09-19 Thread Hrvoje Niksic
Michele Simionato <[EMAIL PROTECTED]> writes:

> On Sep 19, 12:36 pm, Bruno Desthuilliers <[EMAIL PROTECTED]> wrote:
>
>> The next class in the MRO *is* a superclass of the *instance*. Else it
>> wouldn't be in the MRO !-)
>
> Bruno, there is no such a thing as a superclass in a multiple
> inheritance world, and it is a very bad idea to continue to use that
> terminology.

Your arguments against the superclass term seem to assume that there
is only a single superclass to a particular class.  In the example you
give in your essay, I would say that all of A, B, and T are
superclasses of C, and Python's super correctly iterates over all of
them.

Wikipedia defines superclass as a "class from which other classes are
derived", which seems perfectly valid for MI.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: super() doesn't get superclass

2007-09-19 Thread Hrvoje Niksic
Michele Simionato <[EMAIL PROTECTED]> writes:

> On Sep 19, 1:16 pm, Hrvoje Niksic <[EMAIL PROTECTED]> wrote:
>> Your arguments against the superclass term seem to assume that there
>> is only a single superclass to a particular class.
>
> If you say "the" superclass, then you also assume it is unique.

FWIW, Bruno said "a", at least in the section you quoted.

> But the big issue is that the order of the methods depends on the
> second argument to super, the instance, so there is no useful
> concept of the superclass of the first argument of super.

No argument here.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: super() doesn't get superclass

2007-09-20 Thread Hrvoje Niksic
Ben Finney <[EMAIL PROTECTED]> writes:

>> The definition of superclass is not the issue, the issue is
>> "superclass *of which class*"?  You expect super(A, self) to iterate
>> only over superclasses of A, even when self is an instance of a
>> subtype of A.
>
> Yes. Those are the specific parameters to the function call, so that
> *is* what I expect.

The specific parameters are a type and an instance.  Those same
parameters can and do allow for an implementation that accesses
supertypes of type(self).  That is in fact more logical; otherwise one
could simply iterate over A.__bases__ and we wouldn't need an
elaborate 'super' construct.

Not iterating only over A's superclasses is the entire *point* of
super.  The only deficiency of super I see in this thread is
incomplete documentation.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: __contains__() : Bug or Feature ???

2007-09-21 Thread Hrvoje Niksic
"[EMAIL PROTECTED]" <[EMAIL PROTECTED]> writes:

> I need to overload the operator in and let him return an object
> ... It seems it is not a behavior Python expect :

Python expects it all right, but it intentionally converts the value
to a boolean.  The 'in' operator calls PySequence_Contains, which
returns a boolean value at the C level.  User-supplied __contains__ is
implemented as an adaptor in typeobject.c (slot_sq_contains).  It
takes the value returned by your __contains__ implementation and
converts it to 0 or 1.
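A short sketch (mine, not from the original exchange) shows the
coercion in action:

```python
class Box(object):
    def __contains__(self, item):
        # Deliberately return a non-boolean, "truthy" value.
        return "yop"

b = Box()
result = 1 in b
# The 'in' operator coerces the returned value through the
# slot adaptor: result is the bool True, not the string "yop".
# Calling the method directly bypasses the coercion:
direct = b.__contains__(1)   # "yop"
```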

I don't think you can overload 'in' as you want without pervasive
changes to CPython source code.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: __contains__() and overload of in : Bug or Feature ???

2007-09-21 Thread Hrvoje Niksic
"[EMAIL PROTECTED]" <[EMAIL PROTECTED]> writes:

>> The string "yop" evaluates to the boolean value True, as it is not
>> empty.
>
> Does it means that when overloading an operator, python just
> wrap the call to the method and keep control of the returned
> values ???

In the case of the 'in' operator, it does.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Calling constructor but not initializer

2007-09-21 Thread Hrvoje Niksic
Steven D'Aprano <[EMAIL PROTECTED]> writes:

> I can construct an empty instance in the __new__ constructor, and I
> can initialize an non-empty instance in the __init__ initializer,
> but I can't think of any good way to stop __init__ from being called
> if the instance is empty. In pseudo-code, I want to do something
> like this:
>
> class Parrot(object):
> def __new__(cls, data):
> construct a new empty instance
> if data is None:
> return that empty instance
> else:
> call __init__ on the instance to populate it
> return the non-empty instance

Suggestion 1: since you "construct a new empty instance" in both
cases, simply move the entire logic to __init__.

Suggestion 2: name your initialization method something other than
__init__, and the
calling-type-object-automatically-calls-__init__-after-__new__
behavior simply disappears.
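A rough sketch of suggestion 2 (the names `_populate` and `items` are
my inventions, not from the original post):

```python
class Parrot(object):
    def __new__(cls, data=None):
        # Always construct the empty instance here.
        self = object.__new__(cls)
        self.items = []
        # Populate only when data is given.  Because the initializer
        # is not named __init__, Python never calls it automatically.
        if data is not None:
            self._populate(data)
        return self

    def _populate(self, data):
        self.items = list(data)

empty = Parrot()          # stays empty
full = Parrot([1, 2, 3])  # populated via _populate
```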

Can you specify the way you'd like to instantiate the class?
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: sorting a list numbers stored as strings

2007-09-25 Thread Hrvoje Niksic
"Delaney, Timothy (Tim)" <[EMAIL PROTECTED]> writes:

> Yep - appears I must have been misremembering from another language
> (dunno which)

Tcl
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: sorteddict PEP proposal [started off as orderedict]

2007-09-25 Thread Hrvoje Niksic
Steven Bethard <[EMAIL PROTECTED]> writes:

> With this is the implementation, I'm definitely -1. Not because it's a
> bad implementation, but because if the iteration is always doing a
> sort, then there's no reason for a separate data structure.

Agreed.  A true sorted dict would keep its keys sorted in the first
place, a la C++ std::map.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: sorteddict PEP proposal [started off as orderedict]

2007-09-26 Thread Hrvoje Niksic
Duncan Booth <[EMAIL PROTECTED]> writes:

> I think that's the point though: you can't write one implementation that has good 
> performance for all patterns of use

An implementation of sorted dict using a balanced tree as the
underlying data structure would give decent performance in all the
mentioned use cases.  For example, red-black trees search, insert, and
delete in O(log n) time.
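The standard library has no balanced tree, but a toy sketch using
bisect (my assumption, not a red-black tree: its insert and delete are
O(n) because of list shifting) shows the interface such a sorteddict
could present:

```python
import bisect

class SortedDict(object):
    """Toy sorted mapping: the key list is kept sorted at all times,
    so iteration never needs to re-sort.  A real implementation would
    use a balanced tree to make insert/delete O(log n) as well."""
    def __init__(self):
        self._keys = []
        self._data = {}

    def __setitem__(self, key, value):
        if key not in self._data:
            bisect.insort(self._keys, key)   # O(n) here, O(log n) in a tree
        self._data[key] = value

    def __getitem__(self, key):
        return self._data[key]

    def __delitem__(self, key):
        del self._data[key]
        self._keys.remove(key)

    def keys(self):
        return list(self._keys)   # already sorted, no sort on access

d = SortedDict()
for k in (3, 1, 2):
    d[k] = str(k)
# d.keys() == [1, 2, 3] without any per-iteration sorting
```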
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: the address of list.append and list.append.__doc__

2007-09-26 Thread Hrvoje Niksic
HYRY <[EMAIL PROTECTED]> writes:

> This works, but I think the key of DOC is too long, so I want to use
> the id of list.append.__doc__ as the key; or use the id of
> list.append:

Using the id is not a good idea because id's are not permanent.  Using
list.append as the hash key will work and will internally use the
pointer to produce the hash key, which is probably what you want
anyway.
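For instance (my sketch, with invented doc strings):

```python
# Methods of builtin types are fixed, long-lived objects, so they hash
# and compare consistently and can serve directly as dictionary keys:
DOC = {list.append: 'doc for append', list.extend: 'doc for extend'}

# Lookup through the type works from anywhere:
entry = DOC[list.append]   # 'doc for append'
```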

> So, I asked how to get list.append from a.append

>>> def unbound(meth):
...   return getattr(type(meth.__self__), meth.__name__)
...
>>> unbound(a.append)
<method 'append' of 'list' objects>

> and why id(list.append.__doc__) changes.

Because the doc for builtins is internally kept in a read-only C
string for efficiency.  The Python string is built only when actually
used.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: sorteddict PEP proposal [started off as orderedict]

2007-09-26 Thread Hrvoje Niksic
Mark Summerfield <[EMAIL PROTECTED]> writes:

> On 26 Sep, 09:51, Hrvoje Niksic <[EMAIL PROTECTED]> wrote:
>> Duncan Booth <[EMAIL PROTECTED]> writes:
>> > I think that's the point though: you can't write one implementation
>> > that has good performance for all patterns of use
>>
>> An implementation of sorted dict using a balanced tree as the
>> underlying data structure would give decent performance in all the
>> mentioned use cases.  For example, red-black trees search, insert, and
>> delete in O(log n) time.
>
> Basically, as implemented, I have to invalidate if there is any
> change [...]

No argument here, as long as the limitation is understood to be a
consequence of the current implementation model.  Seriously proposing
a sorteddict that is mere syntactic sugar over dict dooms the PEP to
rejection.

Major programming language libraries have included sorted mapping and
set types for a while now, making the performance and complexity
constraints generally well understood.  We should make use of that
knowledge when designing sorteddict.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: sorteddict PEP proposal [started off as orderedict]

2007-09-26 Thread Hrvoje Niksic
Paul Hankin <[EMAIL PROTECTED]> writes:

>> An implementation of sorted dict using a balanced tree as the
>> underlying data structure would give decent performance in all the
>> mentioned use cases.  For example, red-black trees search, insert,
>> and delete in O(log n) time.
>
> But dicts do search, insert and delete in O(1) time, so using some
> variety of balanced tree will give you much worse performance when
> you're doing regular dict operations.

I wouldn't call it "much worse"; while O(log(n)) is worse than O(1),
it's still very fast, which is why popular programming language
libraries have an ordered mapping type based on balanced trees.  Also
note that dict performance can degrade with hash collisions, while
trees can maintain complexity guarantees on all operations.

In the end, it's a tradeoff.  Hash tables offer O(1) access, but lack
ordering.  Balanced trees offer ordering at the price of O(log n)
access.  Both have their uses, but neither is syntactic sugar for the
other.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: ~ bit-wise unary operator

2007-09-26 Thread Hrvoje Niksic
Ladislav Andel <[EMAIL PROTECTED]> writes:

> Hello, why ~ bit-wise unary operator returns -(x+1) and not bit
> inversion of the given integer?

On 2s-complement architectures, -(x+1) *is* bit inversion of the given
integer.

> example:
> a = 7978
> a = ~a
> python returns -7979
>
> but I need to get back 57557 as in C language.

Python does exactly what C does in this case.

$ cat a.c
#include <stdio.h>
int main(void)
{
  int a = 7978;
  a = ~a;
  printf("%d\n", a);
  return 0;
}
$ gcc a.c
$ ./a.out
-7979

If you want 16-bit unsigned arithmetic, use 2**16 + ~a, which yields
57557.
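Masking with & gives the same 16-bit result (my addition to the
original reply):

```python
a = 7978
b = ~a
# Same value C computes for a signed int:
assert b == -7979

# Two equivalent ways to view the low 16 bits as an unsigned number:
assert 2**16 + b == 57557
assert b & 0xFFFF == 57557
```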
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: PyObject_CallObject: difference between functions and class methods

2007-09-27 Thread Hrvoje Niksic
[ Note that there is now a mailing list dedicated to the C API:
http://mail.python.org/mailman/listinfo/capi-sig ]

mauro <[EMAIL PROTECTED]> writes:

> I am trying to call within a C extension a Python function provided as
> an argument by the user with: PyObject_Call(). The C extension should
> work also if the user supplies a class method, but in this case I am
> getting an error. Do I need to explicitly pass 'self' as an argument
> to PyObject_Call()?

You don't.  The reference to self will be added automatically when
invoking the function you receive as object.method.

> if ((tmp_args = PyTuple_New(1)) == NULL)
>   PyErr_SetString( PyExc_ReferenceError, "attempt to access a 
> null-
> pointer" );
> PyTuple_SetItem(tmp_args, 0, paramlist);

Maybe you are mismanaging the reference count -- PyTuple_SetItem
steals a reference to its argument.  Anyway, why not use
PyObject_CallFunction or PyObject_CallFunctionObjArgs?  For example:

PyObject *
mymodule_main(PyObject *ignored, PyObject *func)
{
  PyObject *result, *my_param;
  /* ... do something, e.g. create my_param ... */

  /* call func */
  result = PyObject_CallFunction(func, "O", my_param);
  Py_DECREF(my_param);  /* assuming you no longer need it */
  if (!result)
return NULL;

  /* ... do something with result ... */

  Py_DECREF(result);
  Py_INCREF(Py_None);
  return Py_None;  /* or whatever */
}
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Cross-platform time out decorator

2007-09-27 Thread Hrvoje Niksic
Joel <[EMAIL PROTECTED]> writes:

> I found the solution :
> http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/440569
> describes a solution based on threads. I tested it and it works
> perfectly.

Note that, unlike the original alarm code, it doesn't really interrupt
the timed-out method, it just returns the control back to the caller,
using an exception to mark that a timeout occurred.  The "timed out"
code is still merrily running in the background.  I don't know if it's
a problem in your case, but it's an important drawback.
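A stripped-down sketch of the thread-based approach (the names are
mine, not the recipe's) makes the drawback visible: join() merely
stops *waiting*, the worker thread keeps going.

```python
import threading

class Timeout(Exception):
    pass

def call_with_timeout(seconds, func, *args):
    """Run func in a daemon thread; give up waiting after `seconds`.
    On timeout the worker is NOT interrupted -- it keeps running in
    the background."""
    result = []
    t = threading.Thread(target=lambda: result.append(func(*args)))
    t.daemon = True
    t.start()
    t.join(seconds)
    if t.is_alive():
        raise Timeout('still running after %s seconds' % seconds)
    return result[0]
```

For example, call_with_timeout(5, fetch_url, url) (fetch_url being a
hypothetical function) would raise Timeout after five seconds while
the fetch itself continued in the background.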
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Cross-platform time out decorator

2007-09-27 Thread Hrvoje Niksic
Joel <[EMAIL PROTECTED]> writes:

>> Note that, unlike the original alarm code, it doesn't really interrupt
>> the timed-out method, it just returns the control back to the caller,
>> using an exception to mark that a timeout occurred.  The "timed out"
>> code is still merrily running in the background.  I don't know if it's
>> a problem in your case, but it's an important drawback.
>
> There should be a method to stop the thread though?

Not in Python.  Thread killing primitives differ between systems and
are unsafe in general, so they're not exposed to the interpreter.  On
Windows you can attempt to use ctypes to get to TerminateThread, but
you'll need to hack at an uncomfortably low level and be prepared to
deal with the consequences, such as memory leaks.  If the timeouts
happen rarely and the code isn't under your control (so you have no
recourse but to terminate the thread), it might be worth it though.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Cross-platform time out decorator

2007-09-27 Thread Hrvoje Niksic
"Chris Mellon" <[EMAIL PROTECTED]> writes:

> You can use ctypes and the Python API to raise a Python exception in
> the thread.

How, by changing the thread's exception state?
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Can I overload the compare (cmp()) function for a Lists ([]) index function?

2007-09-28 Thread Hrvoje Niksic
xkenneth <[EMAIL PROTECTED]> writes:

> Looking to do something similair. I'm working with alot of timestamps
> and if they're within a couple seconds I need them to be indexed and
> removed from a list.
> Is there any possible way to index with a custom cmp() function?
>
> I assume it would be something like...
>
> list.index(something,mycmp)

The obvious option is reimplementing the functionality of index as an
explicit loop, such as:

def myindex(lst, something, mycmp):
for i, el in enumerate(lst):
if mycmp(el, something) == 0:
return i
raise ValueError("element not in list")

Looping in Python is slower than looping in C, but since you're
calling a Python function per element anyway, the loop overhead might
be negligible.

A more imaginative way is to take advantage of the fact that index
uses the '==' operator to look for the item.  You can create an object
whose == operator calls your comparison function and use that object
as the argument to list.index:

class Cmp(object):
def __init__(self, item, cmpfun):
self.item = item
self.cmpfun = cmpfun
def __eq__(self, other):
return self.cmpfun(self.item, other) == 0

# list.index(Cmp(something, mycmp))

For example:

>>> def mycmp(s1, s2):
...   return cmp(s1.lower(), s2.lower())
...
>>> ['foo', 'bar', 'baz'].index(Cmp('bar', mycmp))
1
>>> ['foo', 'bar', 'baz'].index(Cmp('Bar', mycmp))
1
>>> ['foo', 'bar', 'baz'].index(Cmp('nosuchelement', mycmp))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: list.index(x): x not in list

The timeit module shows, somewhat surprisingly, that the first method
is ~1.5 times faster, even for larger lists.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Limits on search length

2007-10-01 Thread Hrvoje Niksic
Daryl Lee <[EMAIL PROTECTED]> writes:

> I am trying to locate all lines in a suite of files with quoted
> strings of particular lengths.  A search pattern like r'".{15}"'
> finds 15-character strings very nicely.  But I have some very long
> ones, and a pattern like r'".{272}"' fails miserably, even though I
> know I have at least one 272-character string.

It seems to work for me.  Which version of Python are you using?

Here is how I tested it.  First, I modified your program so that it
actually runs (sys and re imports were missing) and removed
unnecessary globbing and file opening:

import sys, re

searchPattern  = sys.argv[1]
cpat = re.compile(searchPattern)

lineNumber = 0
for line in sys.stdin:
lineNumber += 1
m = cpat.search(line)
if m is not None:
print "(", lineNumber, ")", line

Now, create a file with three lines, each with a string of different
length:

$ printf '"%*s"\n' 271 '' > fl
$ printf '"%*s"\n' 272 '' >> fl
$ printf '"%*s"\n' 273 '' >> fl

And run the script:

$ python scriptfile '".{272}"' < fl
( 2 ) "[... 272 blanks]"

That looks correct to me.

> In the short term, I can resort to locating the character positions
> of the quotes,

You can also catch all strings and only filter those of the length you
care about.
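That filtering approach might look like this (my sketch, with an
invented sample line):

```python
import re

# A sample line containing one 272-character quoted string:
line = 'x = "%s";' % ('a' * 272)

# Catch every quoted string, then keep only the lengths of interest:
strings = re.findall(r'"([^"]*)"', line)
long_strings = [s for s in strings if len(s) >= 272]
# long_strings holds the single 272-character string
```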
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Reentrancy of Python interpreter

2007-10-01 Thread Hrvoje Niksic
Brad Johnson <[EMAIL PROTECTED]> writes:

> I have a place where I execute a Python command that calls into C++
> code which then in turn calls back into Python using the same
> interpreter. I get a fatal error which is "PyThreadStage_Get: no
> current thread."

Does the C++ code call into the interpreter from a different thread?
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: s.split() on multiple separators

2007-10-02 Thread Hrvoje Niksic
Antoon Pardon <[EMAIL PROTECTED]> writes:

> It may be convincing if you only consider natural numbers in
> ascending order. Suppose you have the sequence a .. b and you want
> the reverse.  If you work with included bounds the reverse is just b
> .. a. If you use the python convention, things become more
> complicated.

It's a tradeoff.  The convention used by Python (and Lisp, Java and
others) is more convenient for other things.  Length of the sequence
x[a:b] is simply b-a.  An empty sequence is denoted x[a:a],
where you would need to use the weird x[a:a-1] with inclusive bounds.
Subsequences such as x[a:b] and x[b:c] merge smoothly into x[a:c],
making it natural to iterate over subsequences without visiting an
element twice.
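Each of those properties is easy to check:

```python
x = list(range(10))
a, b, c = 2, 5, 8
assert len(x[a:b]) == b - a          # length is simply b - a
assert x[a:a] == []                  # empty slice, no a-1 needed
assert x[a:b] + x[b:c] == x[a:c]     # adjacent slices merge seamlessly
```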

> Another problem is if you are working with floats. Suppose you have
> a set of floats. Now you want the subset of numbers that are between
> a and b included.  If you want to follow the convention that means
> you have to find the smallest float that is bigger than b, not a
> trivial task.

The exact same argument can be used against the other convention: if
you are working with inclusive bounds, and you need to represent the
subset [a, b), you need to find the largest float that is smaller than
b.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: enumerate overflow

2007-10-03 Thread Hrvoje Niksic
Raymond Hettinger <[EMAIL PROTECTED]> writes:

> [Paul Rubin]
>> I hope in 3.0 there's a real fix, i.e. the count should promote to
>> long.
>
> In Py2.6, I will mostly likely put in an automatic promotion to long
> for both enumerate() and count().  It took a while to figure-out how
> to do this without killing the performance for normal cases (ones
> used in real programs, not examples contrived to say, "omg, see what
> *could* happen").

Using PY_LONG_LONG for the counter, and PyLong_FromLongLong to create
the Python number should work well for huge sequences without
(visibly) slowing down the normal case.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: migrating to packages

2007-10-03 Thread Hrvoje Niksic
[EMAIL PROTECTED] writes:

> I will expose my case quicly.
> The MYCLASES.py file contains the A class, so i can use
> from MYCLASES import A
> a = ()
>
> Using the "package mode" (wich looks fine BTW), having the simple
> MYCLASES/
>  __init__.py
>  A.py
>
> forces my (i guess) to use the
> from MYCLASES.A import A

Exactly.  Using mypackage.mymodule instead of just mymodule is the
entire *point* of a package.  That way, if someone creates another
module with the same name (mymodule), it won't conflict with
yours.  If you don't want to change mymodule to mypackage.mymodule,
why use a package in the first place?
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: migrating to packages

2007-10-04 Thread Hrvoje Niksic
Bruno Desthuilliers <[EMAIL PROTECTED]> writes:

> it's quite common to use the __init__.py of the package (as
> explained by Ben) as a facade to the internal organization of the
> package, so you can change this internal organization without
> breaking client code.

We agree on that.  It is the OP who *wants* to access his modules
directly without ever naming the package.  That is why I think he is
missing the point of having a package in the first place.

>> That way, if someone creates another module with the same
>> name (mymodule), it won't conflict with yours.  If you don't want
>> to change mymodule to mypackage.mymodule, why use a package in the
>> first place?
>
> Because you have too much code to keep it in a single file.

There is no "single file", the OP already has modules A and B.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: migrating to packages

2007-10-04 Thread Hrvoje Niksic
Bruno Desthuilliers <[EMAIL PROTECTED]> writes:

>> We agree on that.  It is the OP who *wants* to access his modules
>> directly without ever naming the package.
>
> To be exact, he wants to reorganize his source code (splitting a
> file that's getting too big AFAICT)

You're right, I misread his original problem statement (as you also
correctly pointed out later in the post).  So yes, a package will do
what he wants, simply by arranging the necessary imports in
__init__.py.  Sorry about the misunderstanding.

>> That is why I think he is missing the point of having a package in
>> the first place.
>
> MHO opinion is that *you* are missing *one* of the point*s* of having
> packages.

:-)
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: migrating to packages

2007-10-05 Thread Hrvoje Niksic
Gerardo Herzig <[EMAIL PROTECTED]> writes:

> If the original MYCLASSES.py has 5 different classes ,say A,B,C,D,E
> , each one has to be imported (as A and B) in order to be used for
> the client code. The thing is, there are more than 5 classes, and
> looks like a lot of unnecesary work to me, since a particular
> program can use 1,2, or 3 classes at the timeThats why im
> watching the way to override the `import statement'...
>
> Damn client code!!!

You can create both a package and a compatibility module.  The package
would be broken into modules for modularity, while the compatibility
module would import what old code needs from the package, like this:

# old.py:
from new.submodule1 import A, B
from new.submodule2 import C, D
...

Now, old code can keep using "from old import A" and such, while new
code would import new.submodule1, new.submodule2, etc., as necessary.

Old code is no worse off because, although it uses the compatibility
module that just imports everything, that is in essence what the
previous module did as well.  On the other hand, new code can make use
of the modularity and reduce load times by only importing what it
really needs.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: remove list elements..

2007-10-05 Thread Hrvoje Niksic
Abandoned <[EMAIL PROTECTED]> writes:

> I do this use FOR easly but the speed very imported for me. I want
> to the fastest method please help me.

Can you post the code snippet that was too slow for you?  Are the
lists sorted?
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Don't use __slots__

2007-10-08 Thread Hrvoje Niksic
Steven D'Aprano <[EMAIL PROTECTED]> writes:

> Well, I've read the thread, and I've read the thread it links to,
> and for the life of me I'm still no clearer as to why __slots__
> shouldn't be used except that:
[...]
> But is there actually anything *harmful* that can happen if I use
> __slots__?

Here is one harmful consequence: __slots__ breaks multiple
inheritance:

class A(object):
  __slots__ = ['a', 'b']

class B(object):
  __slots__ = ['c']

class AB(A, B):
  pass

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Error when calling the metaclass bases
multiple bases have instance lay-out conflict

Even if A and B had the exact same slots, for example ['a', 'b'], it
wouldn't make a difference.  AB explicitly setting __slots__ to
something like ['a', 'b', 'c'] doesn't help either.  But that is only
a technical answer to your technical question which misses the real
problem people like Aahz and Guido have with __slots__.  (I don't
claim to represent them, of course, the following is my
interpretation.)

The backlash against __slots__ is a consequence of it being so easy to
misunderstand what __slots__ does and why it exists.  Seeing __slots__
has led some people to recommend __slots__ to beginners as a way to
"catch spelling mistakes", or as a way to turn Python's classes into
member-declared structures, a la Java.  For people coming from Java
background, catching mistakes as early as possible is almost a dogma,
and they are prone to accept the use of __slots__ (and living with the
shortcomings) as a rule.

Python power users scoff at that because it goes against everything
that makes Python Python.  Use of __slots__ greatly reduces class
flexibility, by both disabling __dict__ and __weakref__ by default,
and by forcing a tight instance layout that cripples inheritance.
With people using __slots__ for the majority of their classes, it
becomes much harder for 3rd-party code to attach an unforeseen
attribute to an existing object.  Even with single inheritance,
__slots__ has unintuitive semantics because subclasses automatically
get __dict__ and __weakref__, thereby easily breaking the "benefits"
of their use.
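For example (my illustration), a subclass that declares no __slots__
of its own silently regains __dict__, defeating both the memory
savings and the supposed "spelling check":

```python
class A(object):
    __slots__ = ('a',)

class B(A):        # defines no __slots__ of its own
    pass

a, b = A(), B()
try:
    a.typo = 1     # blocked: A instances have no __dict__
    raised = False
except AttributeError:
    raised = True
# raised is True here

b.typo = 1         # accepted: B instances grew a __dict__ again
# b.__dict__ == {'typo': 1}
```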

__slots__ is a low-level tool that allows creation of dict-less
objects without resorting to Python/C.  As long as one understands it
as such, there is no problem with using it.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Singleton

2007-10-10 Thread Hrvoje Niksic
[EMAIL PROTECTED] writes:

> Now when I run the 'run.py', it will print two different numbers.
> sys.modules tells me that 'mod1' is imported as both 'one.mod1' and
> 'mod1', which explains the result.

If I were you, I'd make sure that the module duplicate problem is
resolved first, for example by putting run.py somewhere outside one/.
Then the singleton problem disappears as well.

> It is possible to solve this by always importing with the complete
> path like 'one.mod1', even when inside the 'one' directory, but
> that's an error waiting to happen.

Is it, really?  As far as I can tell, Python handles that case rather
robustly.  For example:

$ mkdir one
$ touch one/__init__.py
$ touch one/mod1.py one/mod2.py
$ echo 'import mod2' > one/mod1.py
$ python
Python 2.5.1 (r251:54863, May  2 2007, 16:56:35)
[GCC 4.1.2 (Ubuntu 4.1.2-0ubuntu4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import one.mod1
>>> import sys
>>> sorted(sys.modules)
['UserDict', '__builtin__', '__main__', '_codecs', '_sre', '_types', 'codecs', 
'copy_reg', 'encodings', 'encodings.aliases', 'encodings.codecs', 
'encodings.encodings', 'encodings.types', 'encodings.utf_8', 'exceptions', 
'linecache', 'one', 'one.mod1', 'one.mod2', 'os', 'os.path', 'posix', 
'posixpath', 're', 'readline', 'rlcompleter', 'signal', 'site', 'sre_compile', 
'sre_constants', 'sre_parse', 'stat', 'sys', 'types', 'warnings', 'zipimport']

Although mod1 imports mod2 simply with "import mod2", the fact that
mod1 itself is imported as part of "one" is respected.  As a result,
mod2 is imported as "one.mod2", exactly as if it were imported from
outside the "one" package.

run.py is an exception because it is started directly using "python
run.py", so it never gets the information that it's supposed to be
part of a package.  To fix the problem, all you need to do is make
sure that executable scripts such as run.py are either placed safely
outside the package, or that they take care to always use absolute
imports, such as "import one.mod1" instead of "import mod1".  Placing
them outside the package is a good example of preventing an error
waiting to happen, like the one you hinted at.
-- 
http://mail.python.org/mailman/listinfo/python-list

