Re: [Python-Dev] Ext4 data loss

2009-03-12 Thread Adam Olsen
On Tue, Mar 10, 2009 at 2:11 PM, Christian Heimes  wrote:
> Multiple blogs and news sites are swamped with a discussion about ext4
> and KDE 4.0. Theodore Ts'o - the developer of ext4 - explains the issue
> at
> https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/317781/comments/54.
>
>
> Python's file type doesn't use fsync() and may be the victim of the
> very same issue, too. Should we do anything about it?

It's a kernel defect and we shouldn't touch it.

Traditionally you were hooped regardless of what you did, just with
smaller windows.  Did you want to lose your file 50% of the time or
only 10% of the time?  Heck, 1% of the time you lose the *entire*
filesystem.

Along came journaling file systems.  They guarantee the filesystem
itself stays intact, but not your file.  Still, if you hedge your bets
it's a fairly small window.  In fact if you kill performance you can
eliminate the window: write to a new file, flush all the buffers, then
use the journaling filesystem to rename; few people do that though,
due to the insane performance loss.

What we really want is a simple memory barrier.  We don't need the
file to be saved *now*, just so long as it gets saved before the
rename does.  Unfortunately the filesystem APIs don't touch on this,
as they were designed when losing the entire filesystem was
acceptable.  What we need is a heuristic to make them work in this
scenario.  Lo and behold ext3's data=ordered did just that!

Personally, I consider journaling to be a joke without that.  It has
different justifications, but not this critical one.  Yet the ext4
developers didn't see it that way, so it was sacrificed to new
performance improvements (delayed allocation).

2.6.30 has patches lined up that will fix this use case, making sure
the file is written before the rename.  We don't have to touch it.

Of course, if you're planning to use the file without renaming it, then
you probably do need an explicit fsync(), and an API for that might
help after all.  That's a different problem though, and it has always
existed.


-- 
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] pyc files, constant folding and borderline portability issues

2009-04-06 Thread Adam Olsen
On Mon, Apr 6, 2009 at 2:22 PM, Mark Dickinson  wrote:
> Well, I'd say that the obvious solution here is to compute
> the constant 2.0**53 just once, somewhere outside the
> inner loop.  In any case, that value would probably be better
> written as 2.0**DBL_MANT_DIG (or something similar).
>
> As Antoine reported, the constant-folding caused quite
> a confusing bug report (issue #5593):  the problem (when
> we eventually tracked it down) was that the folded
> constant was in a .pyc file, and so wasn't updated when
> the compiler flags changed.

Another way of looking at this is that we have a ./configure option
which affects .pyc output.  Therefore, we should add a flag to the
magic number, causing the .pyc to be regenerated as needed.

Whether that's better or worse than removing constant folding I
haven't decided.  I have such low expectations of floating point that
I'm not surprised by bugs like this.  I'm more surprised that people
expect consistent, deterministic results...
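
For anyone who hasn't seen it, the folding is easy to observe (a
sketch; the exact constant baked into the .pyc depends on the build's
compile flags, which is the whole problem):

import dis

# The peephole optimizer folds 2.0**53 into a single constant at
# compile time, so the result below is computed by the compiler:
dis.dis(compile("x = 2.0**53", "<example>", "exec"))
# ...shows LOAD_CONST 9007199254740992.0 rather than a BINARY_POWER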


-- 
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] please consider changing --enable-unicode default to ucs4

2009-10-07 Thread Adam Olsen
On Sun, Sep 20, 2009 at 10:17, Zooko O'Whielacronx  wrote:
> On Sun, Sep 20, 2009 at 8:27 AM, Antoine Pitrou  wrote:
>> AFAIK, C extensions should fail loading when they have the wrong UCS2/4 
>> setting.
>
> That would be an improvement!  Unfortunately we instead get mysterious
> misbehavior of the module, e.g.:
>
> http://bugs.python.org/setuptools/msg309
> http://allmydata.org/trac/tahoe/ticket/704#comment:5

The real issue here is that people are getting confused because
Python's option is misnamed.  We support UTF-16 and UTF-32, not UCS-2
and UCS-4.  This
means that when decoding UTF-8, any scalar value outside the BMP will
be split into a pair of surrogates on UTF-16 builds; if we were using
UCS-2 that'd be an error instead (and *nothing* would understand
surrogates.)
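
The build flavor is easy to check (a sketch; sys.maxunicode is 0xFFFF
on UTF-16 builds and 0x10FFFF on UTF-32 builds):

import sys

c = u"\U00010000"        # the first scalar value outside the BMP
if sys.maxunicode == 0xFFFF:
    assert len(c) == 2   # stored as a surrogate pair on narrow builds
else:
    assert len(c) == 1   # a single code point on wide builds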

Yet we are getting an error here.  However, if you look at the details
you'll notice it's on a 6-byte UTF-8 code unit sequence, corresponding
in the second link to U+6E657770.  Although the original UTF-8 design
left open the possibility of encoding up to 31 bits (up to
U+7FFFFFFF), this was removed in RFC 3629 and is now strictly
prohibited.  The modern Unicode character set itself also imposes that
restriction: there is nothing beyond U+10FFFF.  Nothing should create
such a high code point, and even if one appeared internally, an RFC
3629-conformant UTF-8 encoder must refuse to pass it through.

Something more subtle must be going on.  Possibly several bugs (such
as a non-conformant encoder or garbage being misinterpreted as UTF-8).


-- 
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] Retrieve an arbitrary element from a set without removing it

2009-10-24 Thread Adam Olsen
On Fri, Oct 23, 2009 at 11:04, Vitor Bosshard  wrote:
> I see this as being useful for frozensets as well, where you can't get
> an arbitrary element easily due to the obvious lack of .pop(). I ran
> into this recently, when I had a frozenset that I knew had 1 element
> (it was the difference between 2 other sets), but couldn't get to that
> element easily (get the pun?)

item, = set_of_one
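
(Tuple unpacking works on any iterable, frozensets included, and it
raises ValueError if there isn't exactly one element, which doubles as
a free sanity check.)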


-- 
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] Bizarre mtime behaviour

2009-11-01 Thread Adam Olsen
On Sun, Nov 1, 2009 at 08:23, Antoine Pitrou  wrote:
> Hello,
>
> I wondered if someone had a clue about the following behaviour.
> While debugging an erratic test_mailbox failure on RDM's buildbot (and other
> machines), it turned out that the system sometimes set the wrong mtime on a
> directory:
>
> $ date && python -c 'import os; os.link("setup.py", "t/c")' && stat t && date
>
> Sun Nov  1 09:49:04 EST 2009
>  File: `t'
>  Size: 144             Blocks: 0          IO Block: 4096   directory
> Device: 811h/2065d      Inode: 223152      Links: 2
> Access: (0755/drwxr-xr-x)  Uid: ( 1001/  pitrou)   Gid: ( 1005/  pitrou)
> Access: 2009-11-01 09:10:11.0 -0500
> Modify: 2009-11-01 09:49:03.0 -0500
> Change: 2009-11-01 09:49:03.0 -0500
> Sun Nov  1 09:49:04 EST 2009
>
> As you see above, the mtime for directory 't' is set to a full second before 
> the
> actual modification has happened.
>
> Sprinkling traces of time.time() and os.path.getmtime() on Lib/mailbox.py 
> shows
> this is exactly what trips up test_mailbox. I've posted a patch to fix it
> (see issue #6896), but I would like to know if such OS behaviour is normal.

Looks like an OS bug to me.  Linux I'm guessing?


-- 
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] decimal.py: == and != comparisons involving NaNs

2009-11-13 Thread Adam Olsen
On Mon, Nov 9, 2009 at 06:01, Mark Dickinson  wrote:
> Well, when running in some form of 'non-stop' mode, where (quiet) NaN
> results are supposed to be propagated to the end of a computation, you
> certainly want equality comparisons with nan just to silently return false.
> E.g., in code like:
>
> if x == 0:
>     ...
> else:
>     ...
>
> nans should just end up in the second branch, without the programmer
> having had to think about it too hard.

if x != 0:
    ...
else:
    ...

nans should just end up in the first branch, without the programmer
having had to think about it too hard.

There is a more consistent alternative: have all comparisons involving
NaN also return NaN, signifying they're unordered.  Let bool coercion
raise the exception.  Thus, both examples would raise an exception,
but a programmer who wants to handle NaN could do so explicitly:

temp = x == 0
if temp.isnan() or temp:
    ...
else:
    ...

IEEE 754 is intended for a very different context.  I don't think it
makes sense to attempt literal conformance to it.


-- 
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] decimal.py: == and != comparisons involving NaNs

2009-11-13 Thread Adam Olsen
On Fri, Nov 13, 2009 at 14:52, Mark Dickinson  wrote:
> On Fri, Nov 13, 2009 at 9:50 PM, Mark Dickinson  wrote:
>> And they do:  nan != 0 returns False.  Maybe I'm missing your point
>> here?
>
> Aargh!  True!  I meant to say True!

Huh.  Somewhere along the line I lost track of how Python handled NaN.
I thought "comparisons always evaluate to false" was the rule.


-- 
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-26 Thread Adam Olsen
On Mon, Jan 25, 2010 at 23:57, Terry Reedy  wrote:
> On 1/25/2010 9:32 PM, Nick Coghlan wrote:
>
>> However, as Cameron pointed out, the O() value for an operation is an
>> important characteristic of containers, and having people get used to an
>> O(1) list.pop(0) in CPython could create problems not only for other
>> current Python implementations but also for future versions of CPython
>> itself.
>
> The idea that CPython should not be improved because it would spoil
> programmers strikes me as a thin, even desperate objection. One could say
> that same thing about the recent optimization of string += string so that
> repeated concats are O(n) instead of O(n*n). What a trap if people move code
> to other implementations (or older Python) without that new feature.

This is a much better optimization than the string appending
optimization, as it is both portable and robust.

I find it shocking to change a semantic I've come to see as a core
part of the language, but I can't come up with a rational reason to
oppose it.  The approach is sane and the performance impact is
(presumably) negligible.
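
For reference, the portable way to get guaranteed O(1) pops from the
left today is collections.deque:

from collections import deque

d = deque([1, 2, 3])
d.popleft()   # O(1) by specification, on every implementation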


-- 
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] Decimal <-> float comparisons in py3k.

2010-03-19 Thread Adam Olsen
On Thu, Mar 18, 2010 at 12:41, Mark Dickinson  wrote:
> I'm only seeing two arguments against this at the moment: (1) it has
> the potential to break code that relies on being able to sort
> heterogeneous lists.  But given that heterogeneous lists can't be
> sorted stably anyway (see my earlier post about lists containing ints,
> floats and Decimals), perhaps this is an acceptable risk. (2) A few of
> the posters here (Steven, Nick, and me) feel that it's slightly more
> natural to allow these comparisons;  but I think the argument's fairly
> evenly balanced at the moment between those who'd prefer an exception
> and those who'd prefer to allow the comparisons.

Conceptually I like the idea of them all being comparable, but are
there any real use cases involving heterogeneous lists?  All the
examples I've seen have focused on how they're broken, not on how
they'd be correct (including possible math after the comparison!) if
the language compared properly.

Without such use cases, allowing comparison seems like a lot of work
for nothing.


-- 
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] Mixing float and Decimal -- thread reboot

2010-03-20 Thread Adam Olsen
On Sat, Mar 20, 2010 at 09:11, Antoine Pitrou  wrote:
> Mark Dickinson  gmail.com> writes:
>>
>> On Fri, Mar 19, 2010 at 9:50 PM, Guido van Rossum  python.org>
> wrote:
>> > There is one choice which I'm not sure about. Should a mixed
>> > float/Decimal operation return a float or a Decimal?
>>
>> I'll just say that it's much easier to return a Decimal if you want to
>> be able to make guarantees about rounding behaviour, basically because
>> floats can be converted losslessly to Decimals.  I also like the fact
>> that the decimal module offers more control (rounding mode, precision,
>> flags, wider exponent range) than float.
>
> A problem, though, is that decimals are much slower than floats. If you have a
> decimal creeping in some part of a calculation it could degrade performance
> quite a bit.

For a little context, we have this numeric tower:

int -> Fraction -> float -> complex

And coincidentally (or not), we have an unspoken rule that you can go
right, but never left (int/int -> float, int/Fraction -> Fraction,
Fraction/float -> float, Fraction/complex -> complex, etc).  This
gives us a preference for fast, inexact results.
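
To make that concrete (a sketch, using py3k's true division):

from fractions import Fraction

type(1 / 3)                   # float   (int / int goes right)
type(1 + Fraction(1, 3))      # Fraction
type(Fraction(1, 3) + 0.5)    # float
type(0.5 + 1j)                # complex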

Decimal is more precise, and pays a performance cost for it.  It also
seems odd to stick it between float and complex (nobody's planning a
ComplexDecimal, right?)  That suggests it should go between Fraction
and float.  Decimal/float -> float.


-- 
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] Mixing float and Decimal -- thread reboot

2010-03-20 Thread Adam Olsen
On Sat, Mar 20, 2010 at 17:20, Greg Ewing  wrote:
> There are two ways in which that linear tower is overly
> simplistic:
>
> * It conflates the notions of exactness and width. They're
> really orthogonal concepts, and to reflect this you would
> need two parallel towers, with exact and inexact versions
> of each type.
>
> * Decimal and float really belong side-by-side in the
> tower, rather than one above the other. Neither of them is
> inherently any more precise or exact than the other.
>
> There doesn't seem to be any good solution here. For every
> use case in which Decimal+float->float appears better, there
> seems to be another one for which Decimal+float->Decimal
> appears better.

Sure, from a purist point of view my post is completely wrong.  It
doesn't correspond to the mathematical reality.

What it does correspond to is the code.  Only going rightward through
the types is what we have today.  A linear progression is a lot
simpler to understand than any sort of cycle; parallel progressions
aren't even on the table.

float has been the king of inexact types for a long time.  All
other things being equal, that's good enough for me.


-- 
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] Decimal <-> float comparisons in py3k.

2010-03-20 Thread Adam Olsen
On Sat, Mar 20, 2010 at 22:48, Greg Ewing  wrote:
> Nick Coghlan wrote:
>
>> Note that Antoine's point was that float("0.1") and
>> Decimal.from_float(0.1) should compare equal.
>
> That would mean that Decimal("0.1") != float("0.1"), which might be
> surprising to someone who didn't realise they were mixing floats
> and decimals.

Much like Fraction("0.1") != float("0.1") is surprising.

The only way to get rid of surprises around float is to get rid of
float, and that ain't happening.
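
To spell out the parallel (a sketch; Fraction("0.1") is exactly 1/10,
while float("0.1") is the nearest binary approximation):

from fractions import Fraction

Fraction("0.1") == float("0.1")   # False
float("0.1") == 0.1               # True, of course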


-- 
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] Mixing float and Decimal -- thread reboot

2010-03-21 Thread Adam Olsen
On Sun, Mar 21, 2010 at 00:58, Nick Coghlan  wrote:
> I don't actually mind either way - the pragmatic tower is about coding
> convenience rather than numeric purity (and mixing Fractions and
> Decimals in the same algorithm is somewhat nonsensical - they're
> designed for two completely different problem domains).

I think the rule I've been going on is that ideal types (int,
Fraction) are on one end and pragmatic types (float, complex) are on
the other.  Since Decimal can be used exactly it clearly bridges both
groups.

*However*, there are other possible types out there; would they fit
into my system?  I've just taken a look at sympy, and although it's
clearly an ideal type, it also allows mixing with float and complex,
both producing sympy types.  That puts it clearly past float and
complex in the tower.

I have no idea where Decimal should go.


-- 
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] Mixing float and Decimal -- thread reboot

2010-03-21 Thread Adam Olsen
On Sun, Mar 21, 2010 at 16:59, Steven D'Aprano  wrote:
> If naive users are going to use the interpreter as a calculator, they're
> going to start off using floats and ints simply because they require
> less typing. My idea is to allow a gentle learning curve with Decimal
> (and Fraction) without scaring them off with exceptions or excessive
> warnings: a single warning per session would be okay, a warning after
> every operation would be excessive in my opinion, and exceptions by
> default would be right out.

That strikes me as a passive-aggressive way of saying we tolerate it
for interactive use, but don't you dare mix them in real programs.

A warning should be regarded as a bug in real programs (unless it's a
transitional measure), so it might as well be an exception.  Don't
guess and all that.


-- 
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] Mixing float and Decimal -- thread reboot

2010-03-22 Thread Adam Olsen
On Mon, Mar 22, 2010 at 12:26, Mark Dickinson  wrote:
> I don't want to let the abstractions of the numeric tower get in the
> way of the practicalities:  we should modify the abstractions if
> necessary!  In particular, it's not clear to me that all numeric types
> have to be comparable with each other.  It might make sense for
> Decimal + complex mixed-type operations to be disallowed, for example.

Only until a needy user breaks out the duct tape and builds a
ComplexDecimal type. ;)

The nature of the beast is that more will be added later.  So long as
Decimal == complex works right, I don't see a problem with Decimal +
complex raising an exception.


-- 
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] Mixing float and Decimal -- thread reboot

2010-03-23 Thread Adam Olsen
On Tue, Mar 23, 2010 at 11:31, Mark Dickinson  wrote:
> Agreed, notwithstanding the above comments.  Though to avoid the
> problems described above, I think the only way to make this acceptable
> would be to prevent hashing of signaling nans.  (Which the decimal
> module currently does; it also prevents hashing of quiet NaNs, but I
> can't see any good rationale for that.)

a = Decimal('nan')
a != a

They don't follow the behaviour required for being hashable.

float NaN should stop being hashable as well.


-- 
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] Mixing float and Decimal -- thread reboot

2010-03-23 Thread Adam Olsen
On Tue, Mar 23, 2010 at 12:04, Mark Dickinson  wrote:
> On Tue, Mar 23, 2010 at 5:48 PM, Adam Olsen  wrote:
>> a = Decimal('nan')
>> a != a
>>
>> They don't follow the behaviour required for being hashable.
>
> What's this required behaviour?  The only rule I'm aware of is that if
> a == b then hash(a) == hash(b).  That's not violated here.
>
> Note that containment tests check identity before equality, so there's
> no problem with putting (float) nans in sets or dicts:
>
>>>> x = float('nan')
>>>> s = {x}
>>>> x in s
> True

Ergh, I thought that got changed.  Nevermind then.


-- 
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] Mixing float and Decimal -- thread reboot

2010-03-25 Thread Adam Olsen
On Thu, Mar 25, 2010 at 04:18, Steven D'Aprano  wrote:
> def myfunc(x, y):
>    if x == y:
>        return 1.0
>    else:
>        return something_complicated**(x-y)
>
>
> Optimising floating point code is fraught with dangers (the above fails
> for x=y=INF as well as NAN) but anything that make Not A Numbers
> pretend to be numbers is a bad thing.

What about this:

def myfunc(x):
    if x >= THRESHOLD:
        return 1.0
    else:
        return something_complicated(x)

If one of these behaves correctly in the presence of NaN, it's more
likely a fluke than a designed-in feature.  It's certainly not obvious
without covering every comparison with comments.

Maybe that's the solution.  Signal by default on comparison, but add a
collection of naneq/naneg/etc functions (math module, methods,
whatever) that use a particular quiet mapping, making the whole thing
explicit?
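
Something like this, say (a sketch of the kind of helper I mean;
naneq is just the name used above, not an existing function):

import math

def naneq(x, y):
    # the quiet mapping: NaN compares unequal instead of signaling
    if math.isnan(x) or math.isnan(y):
        return False
    return x == y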


-- 
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] Why is nan != nan?

2010-03-25 Thread Adam Olsen
On Thu, Mar 25, 2010 at 18:57, Steven D'Aprano  wrote:
> Simply put: we should treat "two unclear values are different" as more
> compelling than "two unclear values are the same" as it leads to fewer,
> smaller, errors. Consider:
>
> log(-1) = NAN  # maths equality, not assignment
> log(-2) = NAN
>
> If we allow NAN = NAN, then we permit the error:
>
> log(-1) = NAN = log(-2)
> therefore log(-1) = log(-2)
> and 1 = 2
>
> But if we make NAN != NAN, then we get:
>
> log(-1) != log(-2)
>
> and all of mathematics does not collapse into a pile of rubble. I think
> that is a fairly compelling reason to prefer inequality over equality.
>
> One objection might be that while log(-1) and log(-2) should be
> considered different NANs, surely NANs should be equal to themselves?
>
> -1 = -1
> implies log(-1) = log(-1)

IMO, this just shows how ludicrous it is to compare NaNs.  No matter
what we do, it will imply some insane mathematical consequence and
break somebody's code.

They are, after all, an error passed silently.

Why is it that complex raises an exception when sorted, forcing you to
use a sane (and explicit) method, but for NaN it's okay to silently
fail?


-- 
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] Why is nan != nan?

2010-03-27 Thread Adam Olsen
On Fri, Mar 26, 2010 at 17:16, Raymond Hettinger
 wrote:
> Of the ideas I've seen in this thread, only two look reasonable:
> * Do nothing.  This is attractive because it doesn't break anything.
> * Have float.__eq__(x, y) return True whenever x and y are
>    the same NaN object.  This is attractive because it is a
>    minimal change that provides a little protection for
>    simple containers.
> I support either of those options.

What's the flaw in using isnan()?


-- 
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] Why is nan != nan?

2010-03-27 Thread Adam Olsen
On Sat, Mar 27, 2010 at 18:27, Robert Kern  wrote:
> On 2010-03-27 13:36 , Adam Olsen wrote:
>> What's the flaw in using isnan()?
>
> There are implicit comparisons being done inside list.__contains__() and
> other such methods. They do not, and should not, know about isnan().

Those methods should raise an exception.  Conceptually, NaN should
contaminate the result and make list.__contains__() return some
"unsortable" value, but we don't want to bend the whole language
backwards just for one obscure feature, especially when we have a much
better approach most of the time (exceptions).

The reason why NaN's current behaviour is so disturbing is that it
increases the mental load of everybody dealing with floats.  When you
write new code or debug a program you have to ask yourself what might
happen if a NaN is produced.  When maintaining existing code you have
to figure out if it's written a specific way to get NaN to work right,
or if it's a fluke that NaNs work right in code that was never
intended for them and never sees them on developer machines.  This is
all the subtlety we work so hard to avoid normally, so why make an
exception here?  NaNs themselves have use cases, but their subtlety
doesn't.


-- 
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] Why is nan != nan?

2010-03-28 Thread Adam Olsen
On Sun, Mar 28, 2010 at 17:55, Greg Ewing  wrote:
> Steven D'Aprano wrote:
>
>> I disagree -- if I ask:
>>
>> 3.0 in [1.0, 2.0, float('nan'), 3.0]
>>
>> I should get True, not an exception.
>
> Yes, I don't think anyone would disagree that NaN should compare
> unequal to anything that isn't a NaN. Problems only arise when
> comparing two NaNs.

NaN includes real numbers.  Although a NaN is originally produced for
results that are not real numbers, further operations could produce a
real number; we'd never know, as NaN has no precision.  (For example,
sqrt(-1)**2 is NaN in real arithmetic, yet computed through complex
numbers it comes out as -1.)  Extending with complex numbers instead
gives enough precision to show how this can happen.


-- 
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] Signals, threads, blocking C functions

2006-09-04 Thread Adam Olsen
On 9/4/06, Nick Maclaren <[EMAIL PROTECTED]> wrote:
> Jean-Paul Calderone <[EMAIL PROTECTED]> wrote:
> > On Mon, 04 Sep 2006 17:24:56 +0100,
> > David Hopwood <[EMAIL PROTECTED]
> > der.co.uk> wrote:
> > >Jean-Paul Calderone wrote:
> > >> PyGTK would presumably implement its pending call callback by writing a
> > >> byte to a pipe which it is also passing to poll().
> > >
> > >But doing that in a signal handler context invokes undefined behaviour
> > >according to POSIX.
> >
> > write(2) is explicitly listed as async-signal safe in IEEE Std 1003.1, 2004.
> > Was this changed in a later edition?  Otherwise, I don't understand what you
> > mean by this.
>
> Try looking at the C90 or C99 standard, for a start :-(
>
> NOTHING may safely be done in a real signal handler, except possibly
> setting a value of type static volatile sig_atomic_t.  And even that
> can be problematic.  And note that POSIX defers to C on what the C
> languages defines.  So, even if the function is async-signal-safe,
> the code that calls it can't be!

I don't believe that is true.  It says (or at least SUSv3 says) that:

"""  3.26 Async-Signal-Safe Function

A function that may be invoked, without restriction, from
signal-catching functions. No function is async-signal-safe unless
explicitly described as such."""

Sure, it doesn't give me a warm-fuzzy feeling of knowing why it works,
but we can expect that it magically does.  My understanding is that
threading in general is the same way...

Of course that doesn't preclude bugs in the various implementations,
but those trump the standards anyway.

-- 
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] Signals, threads, blocking C functions

2006-09-04 Thread Adam Olsen
On 9/4/06, Gustavo Carneiro <[EMAIL PROTECTED]> wrote:
>   Now, we've had this API for a long time already (at least 2.5
> years).  I'm pretty sure it works well enough on most *nix systems.
> Even if it works 99% of the time, it's way better than *failing*
> *100%* of the times, which is what happens now with Python.

Failing 1% of the time is as bad as failing 100% of the time, if your
goal is to eliminate the short timeout on poll().  1% is quite a lot,
and it would probably have an annoying tendency to trigger repeatedly
when the user does certain things (not reproducible by you, of course).

That said, I do hope we can get 100%, or at least enough nines that we
can increase the timeout significantly.

-- 
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] Signals, threads, blocking C functions

2006-09-08 Thread Adam Olsen
On 9/8/06, Jan Kanis <[EMAIL PROTECTED]> wrote:
> At the risk of waking up a thread that was already declared dead, but
> perhaps this is usefull.

I don't think we should let this die, at least not yet.  Nick seems to
be arguing that ANY signal handler is prone to random crashes or
corruption (due to bugs).  However, we already have a signal handler,
so we should already be exposed to the random crashes/corruption.

If we're going to rely on signal handling being correct then I think
we should also rely on write() being correct.  Note that I'm not
suggesting an API that allows arbitrary signal handlers, but rather
one that calls write() on an array of prepared file descriptors
(ignoring errors).

Ensuring modifications to that array are atomic would be tricky, but I
think it would be doable if we use a read-copy-update approach (with
two alternating signal handler functions).  Not sure how to ensure
there are no currently running signal handlers in another thread,
though.  Maybe we'd have to rip the atomic read/write stuff out of the
Linux sources to ensure it's *always* defined behavior.

Looking into the existing signalmodule.c, I see no attempts to ensure
atomic access to the Handlers data structure.  Is the current code
broken, at least on non-x86 platforms?

-- 
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] Signals, threads, blocking C functions

2006-09-08 Thread Adam Olsen
On 9/8/06, Adam Olsen <[EMAIL PROTECTED]> wrote:
> Ensuring modifications to that array are atomic would be tricky, but I
> think it would be doable if we use a read-copy-update approach (with
> two alternating signal handler functions).  Not sure how to ensure
> there's no currently running signal handlers in another thread though.
>  Maybe have to rip the atomic read/write stuff out of the Linux
> sources to ensure it's *always* defined behavior.

Doh, except that's exactly what sig_atomic_t is for.  Ah well, can't
win them all.

-- 
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] Signals, threads, blocking C functions

2006-09-10 Thread Adam Olsen
On 9/9/06, Nick Maclaren <[EMAIL PROTECTED]> wrote:
> I can't honestly promise to put any time into this in the forseeable
> future, but will try (sometime).  If anyone wants to tackle this,
> please ask me for comments/help/etc.

It took me a while to realize just what was wrong with my proposal,
but I did, and it led me to a new proposal.  I'd appreciate it if you
could point out any holes in it.  First though, for the benefit of
those reading, I'll try to explain the (multiple!) reasons why mine
fails.

First, sig_atomic_t essentially promises that the compiler will behave
atomically and the CPU it runs on will behave locally atomically.  It
does not claim to make writes visible to other CPUs in an atomic way,
and thus you could have different bytes show up at different times.
The x86 architecture uses a very simple scheme and won't do this
(unless the compiler itself does), but other architectures will.

Second, the start of a write call may be delayed a very long time.
This means that a fd may not be written to for hours until after the
signal started.  We can't release any fd's used for such a purpose, or
else risk random writing to them if they get reused later..

Third, it doesn't resolve the existing problems.  If I'm going to fix
signals I should fix ALL of signals. :)

Now on to my new proposal.  I do still use write().  If you can't
accept that, I think we should rip signals out entirely and just let
them kill the process.  Signals are not a reliable feature of any OS.

We create a single pipe and use it for all signals.  We never release
it, instead letting the OS do it when the process gets cleaned up.  We
write the signal number to it as a byte (assuming there's at most 256
unique signals).

This much would allow a GUI's poll loop to wake up when there is a
signal, and give control back to the Python main loop, which could
then read off the signals and queue up their handler functions.
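
In rough pseudo-Python, the mechanism looks like this (a sketch only;
the real thing would be C inside the interpreter, and the print is
just a stand-in for dispatching the registered Python handler):

import os, select, signal

rfd, wfd = os.pipe()   # created once, never released

def low_level_handler(signum, frame):
    # the only work done at signal time: one async-signal-safe write()
    os.write(wfd, chr(signum))

signal.signal(signal.SIGUSR1, low_level_handler)

# A GUI's poll loop just adds rfd to the fds it already watches:
ready, _, _ = select.select([rfd], [], [], 1.0)
if rfd in ready:
    for c in os.read(rfd, 512):
        print "dispatching signal", ord(c)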

The only problem is when there is no GUI poll loop.  We don't want
Python to have to poll the fd; we'd rather it just check a variable.
Is it possible to set/clear a flag in a sufficiently portable
(reentrant-safe, non-blocking, thread-safe) fashion?

-- 
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] Signals, threads, blocking C functions

2006-09-11 Thread Adam Olsen
On 9/11/06, Gustavo Carneiro <[EMAIL PROTECTED]> wrote:
> On 9/11/06, Adam Olsen <[EMAIL PROTECTED]> wrote:
> > This much would allow a GUI's poll loop to wake up when there is a
> > signal, and give control back to the python main loop, which could
> > then read off the signals and queue up their handler functions.
>
>   I like this approach.  Not only we would get a poll-able file
> descriptor to notify a GUI main loop when signals arrive, we'd also
> avoid the lack of async safety in Py_AddPendingCall /
> Py_MakePendingCalls which affects _current_ Python code.
>
>   Note that the file descriptor of the read end of the pipe has to
> become a public Python API so that 3rd party extensions may poll it.
> This is crucial.

Yeah, so long as Python still does the actual reading.


> > The only problem is when there is no GUI poll loop.  We don't want
> > python to have to poll the fd, we'd rather it just check a variable.
> > Is it possible to set/clear a flag in a sufficiently portable
> > (reentrant-safe, non-blocking, thread-safe) fashion?
>
>   It's simple.  That pipe file descriptor has to be changed to
> non-blocking mode in both ends of the pipe, obviously, with fcntl.
> Then, to find out whether a signal happened or not we modify
> PyErr_CheckSignals() to try to read from the pipe.  If it reads bytes
> from the pipe, we process the corresponding python signal handlers or
> raise KeyboardInterrupt.  If the read() syscall returns zero bytes
> read, we know no signal was delivered and move on.

Aye, but my point was that a syscall is costly, and we'd like to avoid
it if possible.

We'll probably have to benchmark it though, to find out if it's worth
the hassle.


>   The only potential problem left is that, by changing the pipe file
> descriptor to non-blocking mode we can only write as many bytes to it
> without reading from the other side as the pipe buffer allows.  If a
> large number of signals arrive very quickly, that buffer may fill and
> we lose signals.  But I think the default buffer should be more than
> enough.  And normally programs don't receive lots of signals in a
> small time window.  If it happens we may lose signals, but that's very
> rare, and who cares anyway.

Indeed, we need to document very clearly that:
* Signals may be dropped if there is a burst
* Signals may be delayed for a very long time, and if you replace a
previous handler your new handler may get signals intended for the old
handler

-- 
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] Signals, threads, blocking C functions

2006-09-11 Thread Adam Olsen
On 9/11/06, Greg Ewing <[EMAIL PROTECTED]> wrote:
> Gustavo Carneiro wrote:
> >   The only potential problem left is that, by changing the pipe file
> > descriptor to non-blocking mode we can only write as many bytes to it
> > without reading from the other side as the pipe buffer allows.  If a
> > large number of signals arrive very quickly, that buffer may fill and
> > we lose signals.
>
> That might be an argument for *not* trying to
> communicate the signal number by the value
> written to the pipe, but keep a separate set
> of signal-pending flags, and just use the pipe
> as a way of indicating that *something* has
> happened.

That brings you back to how you access the flags variable.  At best it
is very difficult, requiring unique assembly code for every supported
platform.  At worst, some platforms may not have any way to do it from
an interrupt context.

A possible alternative is to keep a set of flags for every thread, but
that requires that the threads poll their variable regularly, and
possibly a wake-up pipe for each thread.

-- 
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] Signals, threads, blocking C functions

2006-09-12 Thread Adam Olsen
On 9/12/06, Greg Ewing <[EMAIL PROTECTED]> wrote:
> Adam Olsen wrote:
>
> > That brings you back to how you access the flags variable.
>
> The existing signal handler sets a flag, doesn't it?
> So it couldn't be any more broken than the current
> implementation.
>
> If we get too paranoid about this, we'll just end
> up deciding that signals can't be used for anything,
> at all, ever. That doesn't seem very helpful,
> although technically I suppose it would solve
> the problem. :-)
>
> My own conclusion from all this is that if you
> can't rely on writing to a variable in one part
> of your program and reading it back in another,
> then computer architectures have become far
> too clever for their own good. :-(

They've been that way for a long, long time.  The irony is that x86 is
immensely stupid in this regard, and as a result most programmers
remain unaware of it.

Other architectures have much more interesting read/write and cache
reordering semantics, and the code is certainly broken there.  C
leaves it undefined with good reason.

My previous mention of using a *single* flag may survive corruption
simply because we can tolerate false positives.  Signal handlers would
write 0xFFFFFFFF, the poll loop would check if *any* bit is set.  If
so, write 0x0, read off the fd, then loop around and check it again.
If the start of the read() acts as a write-barrier it SHOULD guarantee
we don't miss any positive writes.

Hmm, if that works we should be able to generalize it for all the
other flags too.  Something to think about anyway...

-- 
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] Signals, threads, blocking C functions

2006-09-12 Thread Adam Olsen
On 9/12/06, Gustavo Carneiro <[EMAIL PROTECTED]> wrote:
> On 9/12/06, Adam Olsen <[EMAIL PROTECTED]> wrote:
> > My previous mention of using a *single* flag may survive corruption
> > simply because we can tolerate false positives.  Signal handlers would
> > write 0xFFFFFFFF, the poll loop would check if *any* bit is set.  If
> > so, write 0x0, read off the fd, then loop around and check it again.
> > If the start of the read() acts as a write-barrier it SHOULD guarantee
> > we don't miss any positive writes.
>
>   Why write 0xFFFFFFFF?  Why can't the variable be of a "volatile
> char" type?  Assuming sizeof(char) == 1, please don't tell me
> architecture XPTO will write the value 4 bits at a time! :P

Nope.  It'll write 32 bits, then break that up into 8 bits :)
Although, at the moment I can't fathom what harm that would cause...

For the record, all volatile does is prevent compiler reordering
across sequence points.

Interestingly, it seems "volatile sig_atomic_t" is the correct way to
declare a variable for (single-threaded) signal handling.  Odd that
volatile didn't show up in any of the previous documentation I read.


>   I see your point of using a flag to avoid the read() syscall most of
> the time.  Slightly more complex, but possibly worth it.
>
>   I was going to describe a possible race condition, then wrote the
> code below to help explain it, modified it slightly, and now I think
> the race is gone.  In any case, the code might be helpful to check if
> we are in sync.  Let me know if you spot any  race condition I missed.
>
>
> static volatile char signal_flag;
> static int signal_pipe_r, signal_pipe_w;
>
> PyErr_CheckSignals()
> {
>     if (signal_flag) {
>         char signum;
>         signal_flag = 0;
>         while (read(signal_pipe_r, &signum, 1) == 1)
>             process_signal(signum);
>     }
> }

I'd prefer this to be a "while (signal_flag)" instead, although it
should technically work either way.


> static void
> signal_handler(int signum)
> {
>     char signum_c = signum;
>     signal_flag = 1;
>     write(signal_pipe_w, &signum_c, 1);
> }

This is wrong.  PyErr_CheckSignals could check and clear signal_flag
before you reach the write() call.  "signal_flag = 1" should come
after.

-- 
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] Signals, threads, blocking C functions

2006-09-12 Thread Adam Olsen
On 9/12/06, Gustavo Carneiro <[EMAIL PROTECTED]> wrote:
> On 9/12/06, Adam Olsen <[EMAIL PROTECTED]> wrote:
> > My previous mention of using a *single* flag may survive corruption
> > simply because we can tolerate false positives.  Signal handlers would
> > write 0xFFFFFFFF, the poll loop would check if *any* bit is set.  If
> > so, write 0x0, read off the fd, then loop around and check it again.
> > If the start of the read() acts as a write-barrier it SHOULD guarantee
> > we don't miss any positive writes.

> PyErr_CheckSignals()
> {
>     if (signal_flag) {
>         char signum;
>         signal_flag = 0;
>         while (read(signal_pipe_r, &signum, 1) == 1)
>             process_signal(signum);
>     }
> }

The more I think about this, the less I like relying on read()
imposing a hardware write barrier.  Unless somebody can say otherwise,
I think we'd be better off putting dummy
PyThread_acquire_lock/PyThread_release_lock calls in there.

-- 
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] infinities

2006-11-27 Thread Adam Olsen
On 11/26/06, tomer filiba <[EMAIL PROTECTED]> wrote:
> i found several places in my code where i use positive infinity
> (posinf) for various things, i.e.,
>
>
> i like the concept, but i hate the "1e1" stuff... why not add
> posinf, neginf, and nan to the float type? i find it much more readable as:
>
> if limit < 0:
>     limit = float.posinf
>
> posinf, neginf and nan are singletons, so there's no problem with
> adding as members to the type.

There's no reason this has to be part of the float type.  Just define
your own PosInf/NegInf singletons and PosInfType/NegInfType classes,
giving them the appropriate special methods.

NaN is a bit iffier, but in your case it's sufficient to raise an
exception whenever it would be created.
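
A minimal version of what I mean (a sketch covering only the rich
comparisons; a real one would want __eq__, arithmetic, and so on):

class PosInfType(object):
    def __lt__(self, other): return False
    def __le__(self, other): return other is PosInf
    def __gt__(self, other): return other is not PosInf
    def __ge__(self, other): return True
    def __repr__(self): return 'PosInf'

PosInf = PosInfType()

assert 10**1000 < PosInf
assert PosInf > 1e300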

Consider submitting it to the Python Cookbook when you're done. ;)

-- 
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] a feature i'd like to see in python #1: better iteration control

2006-12-03 Thread Adam Olsen
On 12/3/06, Ben Wing <[EMAIL PROTECTED]> wrote:
> many times writing somewhat complex loops over lists i've found the need
> to sometimes delete an item from the list.  currently there's no easy
> way to do so; basically, you have to write something like

As I don't believe there's any need for a language extension, I've
posted my approach to this on comp.lang.python:

http://groups.google.com/group/comp.lang.python/browse_thread/thread/724aa6bcf021cfad/c4c629bd1bacc12b#c4c629bd1bacc12b

Note that deleting from the list as you iterate over it tends to have
very poor performance for larger lists.  My post includes some
timings, demonstrating this.
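
The usual non-mutating alternatives, for reference (a sketch):

items = [1, 2, 3, 4, 5, 6]

# rebuild in one pass: O(n) instead of O(n**2)
items = [x for x in items if x % 2 == 0]

# or iterate over a copy if you must mutate in place
# (still quadratic, but at least correct)
for x in items[:]:
    if x > 4:
        items.remove(x)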

-- 
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] Twisted Isn't Specific (was Re: Trial balloon: microthreads library in stdlib)

2007-02-14 Thread Adam Olsen
On 2/14/07, Phillip J. Eby <[EMAIL PROTECTED]> wrote:
> When one is nested inside the other.  This isn't a common need, but it's
> occasionally useful if you need to switch back and forth between blocking
> and non-blocking code.  For example, suppose that you have some code that
> wants to offer a synchronous interface to an asynchronous library...  and
> the synchronous code is being called from a FastCGI "accept" event
> loop.  The inner code can't use the outer event loop, because the outer
> loop isn't going to proceed until the inner code is finished.

This would also let you wrap sys.stdout.write() in a nested event loop
so as to allow print statements to still work while you have it set to
non-blocking mode, but I could see it argued that using print
statements at all is wrong at that point.

-- 
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] microthreading vs. async io

2007-02-15 Thread Adam Olsen
On 2/15/07, Joachim König-Baltes <[EMAIL PROTECTED]> wrote:
> [EMAIL PROTECTED] wrote:
> E.g. having a wait(events = [], timeout = -1) method would be sufficient
> for most cases, where an event would specify

I agree with everything except this.  A simple function call would
have O(n) cost, thus being unacceptable for servers with many open
connections.  Instead you need it to maintain a set of events and let
you add or remove from that set as needed.


> I have implemented something like the above, based on greenlets.

I assume greenlets would be an internal implementation detail, not
exposed to the interface?

-- 
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] microthreading vs. async io

2007-02-15 Thread Adam Olsen
On 2/15/07, Joachim König-Baltes <[EMAIL PROTECTED]> wrote:
> Adam Olsen wrote:
> >> I have implemented something like the above, based on greenlets.
> >
> > I assume greenlets would be an internal implementation detail, not
> > exposed to the interface?
> Yes, you could use stackless, perhaps even Twisted,
> but I'm not sure if that would work because the requirement for the
> "reads single-threaded" is the simple wait(...) function call that does
> a yield
> (over multiple stack levels down to the function that created the task),
> something that is only provided by greenlet and stackless to my knowledge.

I don't think we're on the same page then.  The way I see it you want
a single async IO implementation shared by everything while having a
collection of event loops that cooperate "just enough".  The async IO
itself would likely end up being done in C.

-- 
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] microthreading vs. async io

2007-02-15 Thread Adam Olsen
On 2/15/07, Joachim König-Baltes <[EMAIL PROTECTED]> wrote:
> Adam Olsen schrieb:
> > I don't think we're on the same page then.  The way I see it you want
> > a single async IO implementation shared by everything while having a
> > collection of event loops that cooperate "just enough".  The async IO
> > itself would likely end up being done in C.
> >
> No, I'd like to have:
>
> - An interface for a task to specify the events it's interested in, and
> waiting
>   for at least one of the events (with a timeout).
> - an interface for creating a task (similar to creating a thread)
> - an interface for a schedular to manage the tasks

My tasks are transient and only wait on one thing at a time (if not
waiting for the scheduler to let them run!).  I have my own semantics
for creating tasks that incorporate exception propagation.  My
exception propagation (and return handling) require a scheduler with
specific support for them.  The net result is that I'd have to wrap
everything you provide, if not monkey-patch it, because it doesn't
provide the interfaces to let me wrap it properly.

All I want is a global select.poll() object that all the event loops
can hook into and each will get a turn to run after each call.

Well, that, plus I want it to work around all the platform-specific
peculiarities.
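
Roughly like this (a sketch; register/run_once are illustrative names,
not an existing API):

import select

poller = select.poll()
callbacks = {}   # fd -> the owning event loop's callback

def register(fd, callback):
    poller.register(fd, select.POLLIN)
    callbacks[fd] = callback

def run_once(timeout_ms):
    # every event loop gets a turn whenever one of its fds is ready
    for fd, event in poller.poll(timeout_ms):
        callbacks[fd](event)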


-- 
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] microthreading vs. async io

2007-02-15 Thread Adam Olsen
On 2/15/07, Jean-Paul Calderone <[EMAIL PROTECTED]> wrote:
> Is the only problem here that this style of development hasn't been
> made visible enough?

Perhaps not the only problem, but definitely a big part of it.  I
looked for such a thing in Twisted after Python 2.5 came out and was
unable to find it.  If I had, I might not have bothered to update my
own microthreads to use Python 2.5 (my proof-of-concept was based on
real threads).

-- 
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] microthreading vs. async io

2007-02-26 Thread Adam Olsen
On 2/25/07, Armin Rigo <[EMAIL PROTECTED]> wrote:
> Hi Adam,
>
> On Thu, Feb 15, 2007 at 06:17:03AM -0700, Adam Olsen wrote:
> > > E.g. have a wait(events = [], timeout = -1) method would be sufficient
> > > for most cases, where an event would specify
> >
> > I agree with everything except this.  A simple function call would
> > have O(n) cost, thus being unacceptable for servers with many open
> > connections.  Instead you need it to maintain a set of events and let
> > you add or remove from that set as needed.
>
> I just realized that this is not really true in the present context.
> If the goal is to support programs that "look like" they are
> multi-threaded, i.e. don't use callbacks, as I think is Joachim's goal,
> then most of the time the wait() function would be only called with a
> *single* event, rarely two or three, never more.  Indeed, in this model
> a large server is implemented with many microthreads: at least one per
> client.  Each of them blocks in a separate call to wait().  In each such
> call, only the events relevant to that client are mentioned.
>
> In other words, the cost is O(n), but n is typically 1 or 2.  It is not
> the total number of events that the whole application is currently
> waiting on.  Indeed, the scheduler code doing the real OS call (e.g. to
> select()) can collect the events in internal dictionaries, or in Poll
> objects, or whatever, and update these dictionaries or Poll objects with
> the 1 or 2 new events that a call to wait() introduces.  In this
> respect, the act of *calling* wait() already means "add these events to
> the set of all events that need waiting for", without the need for a
> separate API for doing that.

That would depend on whether Joachim's wait() refers to the individual
tasks' calls or the scheduler's call.  I assumed it referred to the
scheduler.  In the basic form it would literally be select.select(),
which has O(n) cost and often fairly large n.


-- 
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] with_traceback

2007-02-28 Thread Adam Olsen
On 2/27/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> On Tue, 27 Feb 2007 13:37:21 +1300, Greg Ewing <[EMAIL PROTECTED]> wrote:
>
> >I don't like that answer. I can think of legitimate
> >reasons for wanting to pre-create exceptions, e.g. if
> >I'm intending to raise and catch a particular exception
> >frequently and I don't want the overhead of creating
> >a new instance each time.
>
> This seems like kind of a strange micro-optimization to have an impact on a 
> language change discussion.  Wouldn't it be better just to optimize instance 
> creation overhead?  Or modify __new__ on your particular heavily-optimized 
> exception to have a free-list, so it can be both correct (you can still 
> mutate exceptions) and efficient (you'll only get a new exception object if 
> you really need it).

It sounds like we should always copy the exception given to raise, and
that not doing so is an optimization (albeit a commonly hit one).

Not arguing for or against, just making an observation.

On second thought, we could check that the refcount is 1 and avoid
copying in the common case of "raise Foo()".  Is reraising common
enough that we need to optimize it?


-- 
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] with_traceback

2007-02-28 Thread Adam Olsen
On 2/28/07, Greg Ewing <[EMAIL PROTECTED]> wrote:
> Adam Olsen wrote:
>
> > It sounds like we should always copy the exception given to raise,
>
> I don't like that either, for all the reasons that
> make it infeasible to copy an arbitrary object in a
> general way.

Exceptions aren't arbitrary objects though.  The requirement that they
inherit from BaseException is specifically to create a common
interface.  Copying would be an extension of that interface.

I believe calling copy.copy() would be sufficient.
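
For example (a sketch):

import copy

try:
    raise ValueError('boom')
except ValueError, e:
    e2 = copy.copy(e)    # shallow copy; args and attributes carry over
    assert e2.args == e.args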

-- 
Adam Olsen, aka Rhamphoryncus
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] with_traceback

2007-02-28 Thread Adam Olsen
On 2/28/07, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> I am beginning to think that there are serious problems with attaching
> the traceback to the exception; I really don't like the answer that
> pre-creating an exception is unpythonic in Py3k.

How plausible would it be to optimize all exception instantiation?
Perhaps use slots and a freelist for everything inheriting from
BaseException and not inheriting from other builtin types?
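
Roughly this per-class version, say (only a sketch; the recycle() hook
is hypothetical and would have to be automated for it to be
transparent):

class FastError(Exception):
    _free = []                      # per-class free list

    def __new__(cls, *args):
        if cls._free:
            return cls._free.pop()  # reuse a dead instance
        return Exception.__new__(cls)

    def recycle(self):
        # Hypothetical hook: return the instance once it's dead.
        self._free.append(self)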


> On 2/28/07, Jean-Paul Calderone <[EMAIL PROTECTED]> wrote:
> > On Wed, 28 Feb 2007 18:29:11 -0700, Adam Olsen <[EMAIL PROTECTED]> wrote:
> > >
> > >I believe calling copy.copy() would be sufficient.
> > >
> > That doesn't sound like an improvement to me.  Normal code will be more
> > wasteful.  Code which the author has gone out of his way to tune will be
> > as wasteful as /average/ code currently is, and more wasteful than tuned
> > code now is.
> >


-- 
Adam Olsen, aka Rhamphoryncus
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] with_traceback

2007-02-28 Thread Adam Olsen
On 2/28/07, Greg Ewing <[EMAIL PROTECTED]> wrote:
> Adam Olsen wrote:
>
> > How plausible would it be to optimize all exception instantiation?
> > Perhaps use slots and a freelist for everything inheriting from
> > BaseException and not inheriting from other builtin types?
>
> I'm not sure a free list would help much for instances
> of user define classes, since creating one involves setting
> up a dict, etc. And if you use __slots__ you end up with
> objects of different sizes, which isn't free-list-friendly.

Not easy, but doable.  Perhaps a plan B if nobody comes up with a plan A.

-- 
Adam Olsen, aka Rhamphoryncus
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] Performance of pre-creating exceptions?

2007-03-02 Thread Adam Olsen
$ python2.5 -m timeit -r 10 -n 100 -s 'class Foo(Exception): pass' \
      'try: raise Foo()' 'except: pass'
100 loops, best of 10: 2.49 usec per loop
$ python2.5 -m timeit -r 10 -n 100 -s 'class Foo(Exception):' \
      -s '  def __init__(self): pass' 'try: raise Foo()' 'except: pass'
100 loops, best of 10: 3.15 usec per loop
$ python2.5 -m timeit -r 10 -n 100 -s 'e = Exception()' \
      'try: raise e' 'except: pass'
100 loops, best of 10: 2.03 usec per loop

We can get more than half of the benefit simply by using a default
__init__ rather than a python one.  If you need custom attributes but
they're predefined you could subclass the exception and have them as
class attributes.  Given that, is there really a need to pre-create
exceptions?
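
That is, something like this sketch:

class ConfigError(Exception):
    # Predefined data lives on the class, so the cheap default
    # (C-level) __init__ is still the one that runs on each raise.
    category = "config"
    hint = "check settings.ini"

try:
    raise ConfigError()
except ConfigError:
    pass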

-- 
Adam Olsen, aka Rhamphoryncus
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Adding socket timeout to urllib2

2007-03-06 Thread Adam Olsen
On 3/6/07, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> On 3/6/07, Facundo Batista <[EMAIL PROTECTED]> wrote:
> > Guido van Rossum wrote:
> >
> > >> - I'll modify urlopen for it to accept a socket_timeout parameter,
> > >> default to None
> > >
> > > I'd call it timeout. There can't really be much ambiguity can there?
> >
> > Yes and no. For example, if I do a
> > ``urllib2.urlopen("ftp://ftp.myhome.com.ar/blah.txt";, timeout=10)``, the
> > timeout is about the socket or about the file transfer?
>
> Think of it this way. "Timeout" doesn't mean the whole thing needs to
> be completed in 10 secs. It means that over 10 secs of no activity
> causes it to be aborted.

IOW, It's an idle timeout.

-- 
Adam Olsen, aka Rhamphoryncus
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Adding socket timeout to urllib2

2007-03-06 Thread Adam Olsen
On 3/6/07, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> On 3/6/07, Adam Olsen <[EMAIL PROTECTED]> wrote:
> > IOW, It's an idle timeout.
>
> That's not in wikipedia. :-)

I know, I checked before posting. ;)


> It's the only timeout that is available to us, realistically; the
> socket module calls it timeout everywhere. So I think that should be a
> good name for it. The argument name doesn't need to server as complete
> documentation. I don't expect we'll ever see another kind of timeout
> added to this same API, and if we do, we'll just have to pick a
> different name for it. ;-)


On 3/6/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> Not quite.  It's a timeout when you are waiting for some sort of response.
> If you make a connection to an ftp server to send files the connection
> shouldn't be aborted if you take more than 10 seconds to prepare the file
> you want to upload.  OTOH, if you send the file and don't get an
> acknowledgement back for 10 seconds, then you get a TimeoutError.

I think calling it "timeout" in the API is fine.  The documentation
can then clarify that it's an idle timeout, except it only applies
when blocked in a network operation.

-- 
Adam Olsen, aka Rhamphoryncus
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Adding socket timeout to urllib2

2007-03-07 Thread Adam Olsen
On 3/6/07, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> On 3/6/07, Adam Olsen <[EMAIL PROTECTED]> wrote:
> > I think calling it "timeout" in the API is fine.  The documentation
> > can then clarify that it's an idle timeout, except it only applies
> > when blocked in a network operation.
>
> Since "idel timeout" is not a commonly understood term it would be
> even better if it was explained without using it.

I disagree, but meh, I'll stick to my http://pink.bikeshed.org/

-- 
Adam Olsen, aka Rhamphoryncus
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Breaking calls to object.__init__/__new__

2007-03-21 Thread Adam Olsen
On 3/21/07, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> On 3/21/07, Adam Olsen <[EMAIL PROTECTED]> wrote:
> > super() has always felt strange to me.
>
> When used in __init__? Or in general? If the former, that's because
> it's a unique Python wart to even be able to use super for __init__.

In general.  Too many things could fail without errors, so it wasn't
obvious how to use it correctly.  None of the articles I've read
helped either.

> > Now, with PEP 3102 and the strict __init__, not so much.
>
> Works for me. :-)


-- 
Adam Olsen, aka Rhamphoryncus
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Breaking calls to object.__init__/__new__

2007-03-21 Thread Adam Olsen
On 3/21/07, Adam Olsen <[EMAIL PROTECTED]> wrote:
> On 3/21/07, Jean-Paul Calderone <[EMAIL PROTECTED]> wrote:
> > On Wed, 21 Mar 2007 15:45:16 -0700, Guido van Rossum <[EMAIL PROTECTED]> 
> > wrote:
> > >See python.org/sf/1683368. I'd like to invite opinions on whether it's
> > >worth breaking an unknown amount of user code in 2.6 for the sake of
> > >stricter argument checking for object.__init__ and object.__new__. I
> > >think it probably isn't; but the strict version could be added to 3.0
> > >and a warning issued in 2.6 in -Wpy3k mode. Alternatively, we could
> > >introduce the stricter code in 2.6, fix the stdlib modules that it
> > >breaks, and hope for the best. Opinions?
> > >
> >
> > Perhaps I misunderstand the patch, but it would appear to break not just
> > some inadvisable uses of super(), but an actual core feature of super().
> > Maybe someone can set me right.  Is this correct?
> >
> >   class Base(object):
> >   def __init__(self, important):
> >   # Don't upcall with `important` because object is the base
> >   # class and its __init__ doesn't care (or won't accept) it
> >   super(Base, self).__init__()
> >   self.a = important
> >
> > If so, what are the implications for this?
> >
> >   class Other(object):
> >   def __init__(self, important):
> >   # Don't upcall with `important` because object is the base
> >   # class and its __init__ doesn't care (or won't accept) it
> >   super(Other, self).__init__()
> >   self.b = important
> >
> >   class Derived(Base, Other):
> >   pass
> >
> >
> > (A similar example could be given where Base and Other take differently
> > named arguments with nothing to do with each other.  The end result is
> > the same either way, I think.)
>
> The common name is actually critical.  Your argument names are
> essentially a shared namespace, just like that on the object itself,
> and they're both modifying it on the assumption of being the only
> thing that does so.
>
> There's two ways to fix your example.  First, adding a common base
> class which is the "owner" of that name:
>
> class Owner(object):
> def __init__(self, important, **kwargs):
> super(Owner, self).__init__(**kwargs)  # important is skipped
>
> class Left(Owner):
> def __init__(self, important, **kwargs):
> super(Left, self).__init__(important=important, **kwargs)
>
> class Right(Owner):
> def __init__(self, important, **kwargs):
> super(Right, self).__init__(important=important, **kwargs)
>
> class Derived(Left, Right):
> pass
>
> >>> Derived("hi")
>
>
> The other is to rename the argument, removing the namespace conflict:
>
> class Left(object):
> def __init__(self, oranges, **kwargs):
> super(Left, self).__init__(oranges=oranges, **kwargs)
>
> class Right(object):
> def __init__(self, apples, **kwargs):
> super(Right, self).__init__(apples=apples, **kwargs)
>
> class Derived(Left, Right):
> pass

Hmm, where's that "undo post" button...

That should be:

class Left(object):
def __init__(self, oranges, **kwargs):
super(Left, self).__init__(**kwargs)

class Right(object):
def __init__(self, apples, **kwargs):
super(Right, self).__init__(**kwargs)

class Derived(Left, Right):
pass

And I would have gotten an error when I tested it had I been using the
strict __init__.

>
> >>> Derived(apples=3, oranges=8)
>
> In this second version you could clean up Derived's interface by
> adding either "def __init__(self, apples, oranges, **kwargs)" and
> passing them both explicitly, or by adding "def __init__(self, *,
> **kwargs)" and requiring they by given to you by name.  Either way
> you're completely safe.
>
>
> >
> > I think I understand the desire to pull keyword arguments out at each
> > step of the upcalling process, but I don't see how it can work, since
> > "up" calling isn't always what's going on - given a diamond, there's
> > arbitrary side-calling, so for cooperation to work every method has to
> > pass on every argument, so object.__init__ has to take arbitrary args,
> > since no one knows when their "up" call will actually hit object.
> >
> > Since without diamonds, naive "by-name" upcalling works, I assume that
> > super() is actually intended to be used with diamonds, so this seems
> > relevant.
> >
> > I hope I've just overlooked something.  Writing this email feels very
> > strange.
>
> super() has always felt strange to me.  Now, with PEP 3102 and the
> strict __init__, not so much.
>
> --
> Adam Olsen, aka Rhamphoryncus
>


-- 
Adam Olsen, aka Rhamphoryncus
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Breaking calls to object.__init__/__new__

2007-03-21 Thread Adam Olsen
On 3/21/07, Jean-Paul Calderone <[EMAIL PROTECTED]> wrote:
> On Wed, 21 Mar 2007 15:45:16 -0700, Guido van Rossum <[EMAIL PROTECTED]> 
> wrote:
> >See python.org/sf/1683368. I'd like to invite opinions on whether it's
> >worth breaking an unknown amount of user code in 2.6 for the sake of
> >stricter argument checking for object.__init__ and object.__new__. I
> >think it probably isn't; but the strict version could be added to 3.0
> >and a warning issued in 2.6 in -Wpy3k mode. Alternatively, we could
> >introduce the stricter code in 2.6, fix the stdlib modules that it
> >breaks, and hope for the best. Opinions?
> >
>
> Perhaps I misunderstand the patch, but it would appear to break not just
> some inadvisable uses of super(), but an actual core feature of super().
> Maybe someone can set me right.  Is this correct?
>
>   class Base(object):
>   def __init__(self, important):
>   # Don't upcall with `important` because object is the base
>   # class and its __init__ doesn't care (or won't accept) it
>   super(Base, self).__init__()
>   self.a = important
>
> If so, what are the implications for this?
>
>   class Other(object):
>   def __init__(self, important):
>   # Don't upcall with `important` because object is the base
>   # class and its __init__ doesn't care (or won't accept) it
>   super(Other, self).__init__()
>   self.b = important
>
>   class Derived(Base, Other):
>   pass
>
>
> (A similar example could be given where Base and Other take differently
> named arguments with nothing to do with each other.  The end result is
> the same either way, I think.)

The common name is actually critical.  Your argument names are
essentially a shared namespace, just like that on the object itself,
and they're both modifying it on the assumption of being the only
thing that does so.

There's two ways to fix your example.  First, adding a common base
class which is the "owner" of that name:

class Owner(object):
def __init__(self, important, **kwargs):
super(Owner, self).__init__(**kwargs)  # important is skipped

class Left(Owner):
def __init__(self, important, **kwargs):
super(Left, self).__init__(important=important, **kwargs)

class Right(Owner):
def __init__(self, important, **kwargs):
super(Right, self).__init__(important=important, **kwargs)

class Derived(Left, Right):
pass

>>> Derived("hi")


The other is to rename the argument, removing the namespace conflict:

class Left(object):
def __init__(self, oranges, **kwargs):
super(Left, self).__init__(oranges=oranges, **kwargs)

class Right(object):
def __init__(self, apples, **kwargs):
super(Right, self).__init__(apples=apples, **kwargs)

class Derived(Left, Right):
pass

>>> Derived(apples=3, oranges=8)

In this second version you could clean up Derived's interface by
adding either "def __init__(self, apples, oranges, **kwargs)" and
passing them both explicitly, or by adding "def __init__(self, *,
**kwargs)" and requiring they by given to you by name.  Either way
you're completely safe.


>
> I think I understand the desire to pull keyword arguments out at each
> step of the upcalling process, but I don't see how it can work, since
> "up" calling isn't always what's going on - given a diamond, there's
> arbitrary side-calling, so for cooperation to work every method has to
> pass on every argument, so object.__init__ has to take arbitrary args,
> since no one knows when their "up" call will actually hit object.
>
> Since without diamonds, naive "by-name" upcalling works, I assume that
> super() is actually intended to be used with diamonds, so this seems
> relevant.
>
> I hope I've just overlooked something.  Writing this email feels very
> strange.

super() has always felt strange to me.  Now, with PEP 3102 and the
strict __init__, not so much.

-- 
Adam Olsen, aka Rhamphoryncus
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Breaking calls to object.__init__/__new__

2007-03-22 Thread Adam Olsen
On 3/22/07, Thomas Wouters <[EMAIL PROTECTED]> wrote:
> On 3/22/07, Adam Olsen <[EMAIL PROTECTED]> wrote:
> > In general.  Too many things could fail without errors, so it wasn't
> > obvious how to use it correctly.  None of the articles I've read
> > helped either.
>
> I've been thinking about writing an article that explains how to use
> super(), so let's start here :) This is a long post that I'll probably
> eventually copy-paste-and-edit into an article of some sort, when I get the
> time. Please do comment, except with 'MI is insane' -- I already know that.
> Nevertheless, I think MI has its uses.

I'm going to be blunt, and I apologize if I offend.  In short, your
article is no better than any of the others.

TOOWTDI

What you've done is list off various ways why multiple inheritance and
super() can fail, and then provide a toolbox from which a programmer
can cobble together a solution to fit their exact needs.  It's not
pythonic.  What we need is a *single* style that can be applied
consistently to 90+% of problems while still requiring minimal effort
to read later.

Using keyword arguments and consuming them is the best I've seen so
far.  Sure it's a little verbose, but the verbosity is repetitive and
easy to tune out.  It also requires the classes to cooperate.  News
flash: Python isn't C++ or Java.  Python uses a shared __dict__ rather
than private namespaces in each class.  Python *always* requires the
classes to cooperate.

If you want to combine uncooperative classes you need to use
delegation.  I'm sure somebody could whip up a metaclass to automate
it, especially with the new metaclass syntax, not to mention ABCs to
say "I'm string-ish" when you're delegating str.

-- 
Adam Olsen, aka Rhamphoryncus
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Breaking calls to object.__init__/__new__

2007-03-22 Thread Adam Olsen
On 3/22/07, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> Can we move this to c.l.py or python-ideas? I don't think it has any
> bearing on the decision on whether object.__init__() or
> object.__new__() should reject excess arguments. Or if it does I've
> lost the connection through the various long articles.

It's about use-cases involving what object.__init__() does.  However,
as it supports the decision that has been already made, you are right
in that it no longer belongs on python-dev.  I'll move further replies
somewhere else.


> I also would like to ask Mr. Olsen to tone down his rhetoric a bit.
> There's nothing unpythonic about designing an API using positional
> arguments.

Again, I apologize for that.  But the unpythonic comment referred only
to providing an assortment of unobvious choices, rather than a single
obvious one (perhaps with specialty options rarely used).  It was not
in reference to my previous argument against positional arguments.

-- 
Adam Olsen, aka Rhamphoryncus
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] function for counting items in a sequence

2007-04-07 Thread Adam Olsen
On 4/7/07, Steven Bethard <[EMAIL PROTECTED]> wrote:
> Here's a patch implementing collections.counts() as suggested above:

The name doesn't make it obvious to me what's going on.  Maybe
countunique()?  Some other options are countdistinct() and
countduplicates().


>  >>> items = 'acabbacba'
>  >>> item_counts = counts(items)
>  >>> for item in 'abcd':
>  ... print item, item_counts[item]
>  ...
>  a 4
>  b 3
>  c 2
>  d 0

Would become:

>>> items = 'acabbacba'
>>> counts = countunique(items)
>>> for item in 'abcd':
... print item, counts[item]
...
a 4
b 3
c 2
d 0

-- 
Adam Olsen, aka Rhamphoryncus
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] New Super PEP

2007-04-28 Thread Adam Olsen
On 4/28/07, Calvin Spealman <[EMAIL PROTECTED]> wrote:
> Comments welcome, of course. Bare with my first attempt at crafting a PEP.
>
> PEP: XXX
> Title: Super As A Keyword
> Version: $Revision$
> Last-Modified: $Date$
> Author: Calvin Spealman <[EMAIL PROTECTED]>
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 30-Apr-2007
> Python-Version: 2.6
> Post-History:

You need a section on alternate proposals.

-- 
Adam Olsen, aka Rhamphoryncus
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-13 Thread Adam Olsen
On 9/13/07, Hrvoje Nikšić <[EMAIL PROTECTED]> wrote:
> On Thu, 2007-09-13 at 13:15 +0200, "Martin v. Löwis" wrote:
> > > To put it another way, would it actually matter if the reference
> > > counts for such objects became hopelessly wrong due to non-atomic
> > > adjustments?
> >
> > If they drop to zero (which may happen due to non-atomic adjustments),
> > Python will try to release the static memory, which will crash the
> > malloc implementation.
>
> More precisely, Python will call the deallocator appropriate for the
> object type.  If that deallocator does nothing, the object continues to
> live.  Such objects could also start out with a refcount of sys.maxint
> or so to ensure that calls to the no-op deallocator are unlikely.
>
> The part I don't understand is how Python would know which objects are
> global/static.  Testing for such a thing sounds like something that
> would be slower than atomic incref/decref.

I've explained my experiments here:
http://www.artima.com/forums/flat.jsp?forum=106&thread=214235&start=30&msRange=15#279978

Basically though, atomic incref/decref won't work.  Once you've got
two threads modifying the same location the costs skyrocket.  Even
without being properly atomic you'll get the same slowdown on x86
(whose cache coherency is fairly strict.)

The only two options are:
A) Don't modify an object on every incref/decref.  Deletion must be
delayed.  This lets you share (thread-safe) objects.
B) Don't share *any* objects.  This is a process model (even if
they're lightweight like erlang).  For the near future, it's much
easier to do this using real processes though.

Threading is much more powerful, but it remains to be proven that it
can be done efficiently.

-- 
Adam Olsen, aka Rhamphoryncus
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-13 Thread Adam Olsen
On 9/13/07, Justin Tulloss <[EMAIL PROTECTED]> wrote:
>
>
> On 9/13/07, Adam Olsen <[EMAIL PROTECTED]> wrote:
> >
> > Basically though, atomic incref/decref won't work.  Once you've got
> > two threads modifying the same location the costs skyrocket.  Even
> > without being properly atomic you'll get the same slowdown on x86
> > (whose cache coherency is fairly strict.)
>
>
> I'm a bit skeptical of the actual costs of atomic incref. For there to be
> contention, you would have to be modifying the same memory location
> at the exact same time. That seems unlikely to ever happen. We can't bank on
> it never happening, but an occasionally expensive operation is ok. After
> all, it's occasional.

That was my initial expectation too.  However, the incref *is* a
modification.  It's not simply an issue of the "exact same time", but
anything that causes the cache entries to bounce back and forth and
delay the rest of the pipeline.  If you have a simple loop like "for i
in range(count): 1.0+n", then the 1.0 literal will get shared between
threads, and the refcount will get hammered.

Is it reasonable to expect that much sharing?  I think it is.
Literals are an obvious example, but there's also configuration data
passed between threads.  Pystone seems to have enough sharing to kill
performance.  And after all, isn't sharing the whole point (even in
the definition) of threads?

-- 
Adam Olsen, aka Rhamphoryncus
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-13 Thread Adam Olsen
On 9/13/07, Greg Ewing <[EMAIL PROTECTED]> wrote:
> [EMAIL PROTECTED] wrote:
> > what if ... we use atomic test-and-set to
> > handle reference counting (with a lock for those CPU architectures where we
> > haven't written the necessary assembler fragment), then implement a lock for
> > each mutable type and another for global state (thread state, interpreter
> > state, etc)?
>
> Could be worth a try. A first step might be to just implement
> the atomic refcounting, and run that single-threaded to see
> if it has terribly bad effects on performance.

I've done this experiment.  It was about 12% on my box.  Later, once I
had everything else setup so I could run two threads simultaneously, I
found much worse costs.  All those literals become shared objects that
create contention.

I'm now working on an approach that writes out refcounts in batches to
reduce contention.  The initial cost is much higher, but it scales
better too.  I've currently got it to just under 50% cost, meaning two
threads is a slight net gain.

-- 
Adam Olsen, aka Rhamphoryncus
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-14 Thread Adam Olsen
On 9/14/07, Justin Tulloss <[EMAIL PROTECTED]> wrote:
>
> On 9/14/07, Adam Olsen <[EMAIL PROTECTED]> wrote:
> > > Could be worth a try. A first step might be to just implement
> > > the atomic refcounting, and run that single-threaded to see
> > > if it has terribly bad effects on performance.
> >
> > I've done this experiment.  It was about 12% on my box.  Later, once I
> > had everything else setup so I could run two threads simultaneously, I
> > found much worse costs.  All those literals become shared objects that
> > create contention.
>
> It's hard to argue with cold hard facts when all we have is raw speculation.
> What do you think of a model where there is a global "thread count" that
> keeps track of how many threads reference an object? Then there are
> thread-specific reference counters for each object. When a thread's refcount
> goes to 0, it decrefs the object's thread count. If you did this right,
> hopefully there would only be cache updates when you update the thread
> count, which will only be when a thread first references an object and when
> it last references an object.
>
> I mentioned this idea earlier and it's growing on me. Since you've actually
> messed around with the code, do you think this would alleviate some of the
> contention issues?

There would be some poor worst-case behaviour.  In the case of
literals you'd start referencing them when you call a function, then
stop when the function returns.  Same for any shared datastructure.

I think caching/buffering refcounts in general holds promise though.
My current approach uses a crude hash table as a cache and only
flushes when there's a collision or when the tracing GC starts up.  So
far I've only got about 50% of the normal performance, but that's with
90% or more scalability, and I'm hoping to keep improving it.
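
As a rough illustration, a pure-Python toy model of the buffering idea
(the names are mine; the real thing is in C and per-thread):

class RefcountBuffer(object):
    SIZE = 1024

    def __init__(self, apply_delta):
        self.slots = [None] * self.SIZE   # (obj, pending delta)
        self.apply_delta = apply_delta    # touches the shared count

    def adjust(self, obj, delta):
        i = id(obj) % self.SIZE
        slot = self.slots[i]
        if slot is not None and slot[0] is not obj:
            self.apply_delta(*slot)       # collision: flush old entry
            slot = None
        if slot is None:
            self.slots[i] = (obj, delta)
        else:
            self.slots[i] = (obj, slot[1] + delta)

    def flush(self):
        # Called e.g. when the tracing GC starts up.
        for slot in self.slots:
            if slot is not None:
                self.apply_delta(*slot)
        self.slots = [None] * self.SIZE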

-- 
Adam Olsen, aka Rhamphoryncus
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-14 Thread Adam Olsen
On 9/14/07, Jean-Paul Calderone <[EMAIL PROTECTED]> wrote:
> On Fri, 14 Sep 2007 17:43:39 -0400, James Y Knight <[EMAIL PROTECTED]> wrote:
> >
> >On Sep 14, 2007, at 3:30 PM, Jean-Paul Calderone wrote:
> >>On Fri, 14 Sep 2007 14:13:47 -0500, Justin Tulloss  <[EMAIL PROTECTED]>
> >>wrote:
> >>>Your idea can be combined with the maxint/2 initial refcount for
> >>>>non-disposable objects, which should about eliminate thread-count
> >>>>updates
> >>>>for them.
> >>>>--
> >>>
> >>>I don't really like the maxint/2 idea because it requires us to
> >>>differentiate between globals and everything else. Plus, it's a  hack. I'd
> >>>like a more elegant solution if possible.
> >>
> >>It's not really a solution either.  If your program runs for a couple
> >>minutes and then exits, maybe it won't trigger some catastrophic  behavior
> >>from this hack, but if you have a long running process then you're  almost
> >>certain to be screwed over by this (it wouldn't even have to be *very*
> >>long running - a month or two could do it on a 32bit platform).
> >
> >Not true: the refcount becoming 0 only calls a dealloc function.. For
> >objects which are not deletable, the dealloc function should simply  set the
> >refcount back to maxint/2. Done.
> >
>
> So, eg, replace the Py_FatalError in none_dealloc with an assignment to
> ob_refcnt?  Good point, sounds like it could work (I'm pretty sure you
> know more about deallocation in CPython than I :).

As I've said, this is all moot.  The cache coherence protocols on x86
means this will be nearly as slow as proper atomic refcounting, and
will not scale if multiple threads regularly touch the object.  My
experience is that they will touch it regularly.

-- 
Adam Olsen, aka Rhamphoryncus
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] urllib exception compatibility

2007-09-27 Thread Adam Olsen
On 9/27/07, Greg Ewing <[EMAIL PROTECTED]> wrote:
> Gregory P. Smith wrote:
> > Is IOError is the right name to use?  OSError is raised for things that
> > are not IO such as subprocess, dlopen, system.
>
> The trouble with either of these is that the class
> of errors we're talking about don't necessarily come
> directly from the OS or I/O library.
>
> Often I raise my own EnvironmentError instances for
> things which don't have any associated OS error code
> but are nonetheless environment-related, such as an
> error in a file format.
>
> I don't reuse IOError or OSError because I feel as
> though I ought to supply an errno with these, but
> there isn't any.
>
> I suppose we could pick one of these and make it
> official that it's okay to instantiate it without
> an errno. But it's hard to decide which one,
> because they both sound too narrow in scope.
>
> I don't like EMError either, btw. Maybe EnvError?
> Although that sounds like it has something to do
> with the unix environment variables.

ExternalError?  Pretty vague though.

-- 
Adam Olsen, aka Rhamphoryncus
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] GC Changes

2007-10-01 Thread Adam Olsen
On 10/1/07, Greg Ewing <[EMAIL PROTECTED]> wrote:
> Justin Tulloss wrote:
> > Would
> > somebody care to give me a brief overview on how the current gc module
> > interacts with the interpreter
>
> The cyclic GC kicks in when memory is running low. Since

This isn't true at all.  It's triggered by heuristics based on the
total number of allocated objects.  It doesn't know how much memory is
available and is not called if an allocation fails.


-- 
Adam Olsen, aka Rhamphoryncus
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] GC Changes

2007-10-01 Thread Adam Olsen
On 10/1/07, Greg Ewing <[EMAIL PROTECTED]> wrote:
> Adam Olsen wrote:
> > This isn't true at all.  It's triggered by heuristics based on the
> > total number of allocated objects.
>
> Hmmm, all right, it seems I don't know what I'm
> talking about. I'll shut up now before I spread
> any more misinformation. Sorry.

Hey, no worries.  I half expect someone to correct me. ;)

-- 
Adam Olsen, aka Rhamphoryncus
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Explicit Tail Calls

2007-10-12 Thread Adam Olsen
On 10/12/07, Shane Hathaway <[EMAIL PROTECTED]> wrote:
> Shane Hathaway wrote:
> > Shane Hathaway wrote:
> >> I'm interested in seeing a good way to write tail calls in Python.  Some
> >> algorithms are more readable when expressed using tail recursion.
> >
> > About ten seconds after I wrote the previous message, I realized two things:
> >
> > - It's easy to write "return Return" instead of "raise Return".  So
> > "raise TailCall" is probably better.
> >
> > - I can write a complete implementation of this idea with nothing but a
> > simple decorator.  Check it out!
>
> With yet another 10 seconds, I realized my quick implementation actually
> does nothing to optimize tail calls.  Working on a fix.

Since tail calls aren't going to be accepted into Python anyway, the
implementation details are off-topic for python-dev.  Please take them
up elsewhere (such as my offer to discuss in private.)

-- 
Adam Olsen, aka Rhamphoryncus
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Explicit Tail Calls

2007-10-12 Thread Adam Olsen
On 10/12/07, Shane Hathaway <[EMAIL PROTECTED]> wrote:
> Hello,
>
> I'm interested in seeing a good way to write tail calls in Python.  Some
> algorithms are more readable when expressed using tail recursion.
>
> I know tail call optimization has been discussed before [1], but I would
> like to consider a different approach.  The previous discussion centered
> on implicit tail call optimization, which incurs the risk of changing
> the behavior of currently working code.  (For example, is it safe to
> optimize tail calls within try...finally blocks?  Probably not.  And I
> generally want all stack frames to appear in tracebacks, unless I say
> otherwise.)
>
> I would like to suggest an explicit form of tail calls.  A new built-in
> exception type called "Return" will be added, and it will be used like this:

So long as you're willing to make it explicit (which I strongly
encourage), you can accomplish nearly anything you'd like with
decorators and functions.  There doesn't seem to be strong enough use
cases to get anything into the core language anyway.

If you don't like the existing decorator recipes I can help you come
up with a better one in private, off the list.


> def fact2(n, v):
> if n:
> raise Return(fact2, n-1, v*n)
> else:
> return v

I hope your use cases are better than this. ;)
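
For reference, the recipes I mean are trampolines, roughly like this
(a sketch, with made-up names):

class _Bounce(object):
    def __init__(self, func, *args):
        self.func = func
        self.args = args

def trampoline(func):
    # Keep calling in a loop while the function asks to "bounce",
    # instead of growing the stack with real recursion.
    def wrapper(*args):
        result = func(*args)
        while isinstance(result, _Bounce):
            result = result.func(*result.args)
        return result
    return wrapper

def _fact2(n, v):
    if n:
        return _Bounce(_fact2, n - 1, v * n)
    return v

fact2 = trampoline(_fact2)   # fact2(100000, 1) uses constant stack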


> The interpreter will catch Return exceptions and use them to call
> something else.  The caller of a function that uses "raise Return" will
> see the result of the tail call as the returned value, rather than the
> Return exception.  I am not yet considering implementation details.
>
> Not all algorithms are good candidates for this.  I used the fact2
> example only because it's readily available.  I know there are other
> people interested in tail call optimization in Python [2] [3]; perhaps
> some of them are watching and can provide better examples.
>
> Furthermore, there might be some explicit syntax that converts "return
> f(...)" statements to "raise Return(f, ...)", such as a decorator.
> However, I'm less interested in the syntax and more interested in the
> basic capability.
>
> Shane
>
> [1] http://mail.python.org/pipermail/python-dev/2004-July/046150.html
> [2] http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/474088
> [3] http://www.voidspace.org.uk/python/weblog/arch_d7_2007_09_22.shtml#e833
> ___
> Python-Dev mailing list
> [email protected]
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: 
> http://mail.python.org/mailman/options/python-dev/rhamph%40gmail.com
>


-- 
Adam Olsen, aka Rhamphoryncus
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Fwd: Deadlock by a second import in a thread

2007-10-19 Thread Adam Olsen
On 10/19/07, Facundo Batista <[EMAIL PROTECTED]> wrote:
> 2007/10/19, Adam Olsen <[EMAIL PROTECTED]>:
>
> > The solution then is, if your python file will ever be imported, you
> > must write a main function and do all the work there instead.  Do not
> > write it in the style of a script (with significant work in the global
> > scope.)
>
> I had this as a good coding style, not as mandatory.
>
> I agree with you that the OP shouldn't be doing that, but note that
> the main problem arises here because the import inside strptime is
> completely unpredictable for an external user.
>
> Do you recommend closing the bug as "won't fix", saying something like...
>
> The deadlock happens because strptime has an import inside it, and
> recursive imports are not allowed in different threads.
>
> As a general rule and good coding style, don't run your code when the
> module is imported, but put it in a function like "main" in the second 
> file,
> import it and call it from the first one. This will solve your problem.
>
> Note that this happens to you with strptime, but could happen with a lot
> of functions that do this internal import of something else. So,
> you'll never
> be sure.
>
> What do you think?

Whether this is a minor problem due to poor style or a major problem
due to a language defect is a matter of perspective.  I'm working on
redesigning Python's threading support, expecting it to be used a
great deal more, which'd push it into the major problem category.

For now I'd leave it open.

-- 
Adam Olsen, aka Rhamphoryncus
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Deadlock by a second import in a thread

2007-10-19 Thread Adam Olsen
On 10/19/07, Facundo Batista <[EMAIL PROTECTED]> wrote:
> Hi!
>
> I was looking to this bug:   http://bugs.python.org/issue1255
>
> It basically creates a deadlock in Python by doing the following:
>
> - aa.py imports bb.py
> - bb.py imports time and generates a thread
> - the thread uses time.strptime
>
> The deadlock is because the strptime function imports another module,
> line 517 of timemodule.c:
>
>   PyObject *strptime_module = PyImport_ImportModule("_strptime");
>
> This situation is well known; I found a lot of references to this
> import-thread-import problem in discussions and previous bugs (e.g.:
> http://bugs.python.org/issue683658).
>
> What I did *not* find, and why I'm asking here, is how to solve it.
>
> Exists a known solution to this?

When python encounters a recursive import within a single thread it
allows you to get access to partially-imported modules, making the
assumption that you won't do any significant work until the entire
import process completes.

Only one thread is allowed to do an import at a time though, as
they'll do significant work with it immediately, so being
partially-imported would be a race condition.

Writing a python file as a script means you do significant work in the
body, rather than in a function called after importing completes.
Importing this python file then violates the assumption for
single-threaded recursive imports, and creating threads then violates
their safety assumptions.

The solution then is, if your python file will ever be imported, you
must write a main function and do all the work there instead.  Do not
write it in the style of a script (with significant work in the global
scope.)
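
Concretely, the bb.py from the report would become something like this
sketch:

# bb.py, restructured so importing it has no side effects:
import threading
import time

def work():
    time.strptime("2007", "%Y")  # strptime's internal import is now safe

def main():
    t = threading.Thread(target=work)
    t.start()
    t.join()

if __name__ == "__main__":       # only do real work when run directly
    main()

aa.py then calls bb.main() from its own top level only when run as a
script, so no import lock is held while the thread does its imports.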

-- 
Adam Olsen, aka Rhamphoryncus
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] XML codec?

2007-11-08 Thread Adam Olsen
On 11/8/07, Walter Dörwald <[EMAIL PROTECTED]> wrote:
> Martin v. Löwis wrote:
>
> >> Then how about the suggested "xml-auto-detect"?
> >
> > That is better.
>
> OK.
>
> >>> Then, I'd claim that the problem that the codec solves doesn't really
> >>> exist. IOW, most XML parsers implement the auto-detection of encodings,
> >>> anyway, and this is where architecturally this functionality belongs.
> >> But not all XML parsers support all encodings. The XML codec makes it
> >> trivial to add this support to an existing parser.
> >
> > I would like to question this claim. Can you give an example of a parser
> > that doesn't support a specific encoding
>
> It seems that e.g. expat doesn't support UTF-32:
>
> from xml.parsers import expat
>
> p = expat.ParserCreate()
> e = "utf-32"
> s = (u"" % e).encode(e)
> p.Parse(s, True)
>
> This fails with:
>
> Traceback (most recent call last):
>File "gurk.py", line 6, in 
>  p.Parse(s, True)
> xml.parsers.expat.ExpatError: not well-formed (invalid token): line 1,
> column 1
>
> Replace "utf-32" with "utf-16" and the problem goes away.
>
> > and where adding such a codec
> > solves that problem?
> >
> > In particular, why would that parser know how to process Python Unicode
> > strings?
>
> It doesn't have to. You can use an XML encoder to reencode the unicode
> string into bytes (forcing an encoding that the parser knows):
>
> import codecs
> from xml.parsers import expat
>
> ci = codecs.lookup("xml-auto-detect")
> p = expat.ParserCreate()
> e = "utf-32"
> s = (u"" % e).encode(e)
> s = ci.encode(ci.decode(s)[0], encoding="utf-8")[0]
> p.Parse(s, True)
>
> >> Furthermore encoding-detection might be part of the responsibility of
> >> the XML parser, but this decoding phase is totally distinct from the
> >> parsing phase, so why not put the decoding into a common library?
> >
> > I would not object to that - just to expose it as a codec. Adding it
> > to the XML library is fine, IMO.
>
> But it does make sense as a codec. The decoding phase of an XML parser
> has to turn a byte stream into a unicode stream. That's the job of a codec.

Yes, an XML parser should be able to use UTF-8, UTF-16, UTF-32, etc
codecs to do the decoding.  There's no need to create a magical
mystery codec to pick out which though.  It's not even sufficient for
XML:

1) Round-tripping a file should be done in the original encoding.
Hiding the auto-detected encoding inside a codec doesn't let you
see what it picked.
2) The encoding may be specified externally from the file/stream[1].
The XML parser needs to handle these out-of-band encodings anyway.


[1] http://mail.python.org/pipermail/xml-sig/2004-October/010649.html

-- 
Adam Olsen, aka Rhamphoryncus
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] XML codec?

2007-11-09 Thread Adam Olsen
On Nov 9, 2007 6:10 AM, Walter Dörwald <[EMAIL PROTECTED]> wrote:
>
> Martin v. Löwis wrote:
> >>> Yes, an XML parser should be able to use UTF-8, UTF-16, UTF-32, etc
> >>> codecs to do the encoding.  There's no need to create a magical
> >>> mystery codec to pick out which though.
> >> So the code is good, if it is inside an XML parser, and it's bad if it
> >> is inside a codec?
> >
> > Exactly so. This functionality just *isn't* a codec - there is no
> > encoding. Instead, it is an algorithm for *detecting* an encoding.
>
> And what do you do once you've detected the encoding? You decode the
> input, so why not combine both into an XML decoder?

It seems to me that parsing XML requires 3 steps:
1) determine encoding
2) decode byte stream
3) parse XML (including handling of character references)

All an xml codec does is make the first part a side-effect of the
second part.  Rather than this:

encoding = detect_encoding(raw_data)
decoded_data = raw_data.decode(encoding)
tree = parse_xml(decoded_data, encoding)  # Verifies encoding

You'd have this:

e = codecs.getincrementaldecoder("xml-auto-detect")()
decoded_data = e.decode(raw_data, True)
tree = parse_xml(decoded_data, e.encoding)  # Verifies encoding

It's clear to me that detecting an encoding is actually the simplest
part of all this (so long as there's an API to do it!)  Putting it
inside a codec seems like the wrong subdivision of responsibility.

(An example using streams would end up closer, but it still seems
wrong to me.  Encoding detection is always one way, while codecs are
always two way (even if lossy.))
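
To illustrate, step 1 is genuinely small (a BOM-only sketch; a real
detector would also check the byte patterns from the XML spec and then
parse the encoding declaration):

import codecs

def detect_encoding(raw):
    # Order matters: the UTF-32-LE BOM starts with the UTF-16-LE BOM.
    boms = [
        (codecs.BOM_UTF32_LE, "utf-32-le"),
        (codecs.BOM_UTF32_BE, "utf-32-be"),
        (codecs.BOM_UTF8, "utf-8"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
    ]
    for bom, name in boms:
        if raw.startswith(bom):
            return name
    return "utf-8"   # XML's default when nothing else is declared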

-- 
Adam Olsen, aka Rhamphoryncus
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] XML codec?

2007-11-09 Thread Adam Olsen
On Nov 9, 2007 3:59 PM, M.-A. Lemburg <[EMAIL PROTECTED]> wrote:
> Martin v. Löwis wrote:
> >> It makes working with XML data a lot easier: you simply don't have to
> >> bother with the encoding of the XML data anymore and can just let the
> >> codec figure out the details. The XML parser can then work directly
> >> on the Unicode data.
> >
> > Having the functionality indeed makes things easier. However, I don't
> > find
> >
> >   s.decode(xml.detect_encoding(s))
> >
> > particularly more difficult than
> >
> >   s.decode("xml-auto-detection")
>
> Not really, but the codec has more control over what happens to
> the stream, ie. it's easier to implement look-ahead in the codec
> than to do the detection and then try to push the bytes back onto
> the stream (which may or may not be possible depending on the
> nature of the stream).

io.BufferedReader() standardizes a .peek() API, making it trivial.  I
don't see why we couldn't require it.

(As an aside, .peek() will fail to do what detect_encoding() needs if
BufferedReader's buffer size is too small.  I do wonder if that
limitation is appropriate.)
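
E.g. this sketch, where detect_encoding() is an assumed helper however
the xml side ends up spelling it:

import io

def open_xml(path):
    f = io.open(path, "rb")           # an io.BufferedReader
    head = f.peek(4)                  # look ahead; consumes nothing
    enc = detect_encoding(head)       # assumed helper, returns a name
    return io.TextIOWrapper(f, encoding=enc)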


-- 
Adam Olsen, aka Rhamphoryncus
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] [poll] New name for __builtins__

2007-11-28 Thread Adam Olsen
On Nov 28, 2007 8:20 AM, Christian Heimes <[EMAIL PROTECTED]> wrote:
> I'm sending this mail to Python-dev in the hope to reach more developers.
>
> GvR wants to rename __builtin__ to reduce confusion between
> __builtin__ and __builtins__. He wanted to start a poll on the new name
> but apparently he forgot.

In a recent thread on python-ideas[1] it was suggested that builtins
be added as an argument to eval and exec.  I'd prefer to do that and
eliminate the name altogether.

If not that I suggest something like __inject_builtins__.  This
implies it's a command to eval/exec, and doesn't necessarily reflect
your current builtins (which are canonically accessible as an
attribute of your frame.)


[1] http://mail.python.org/pipermail/python-ideas/2007-November/001250.html

-- 
Adam Olsen, aka Rhamphoryncus
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] [poll] New name for __builtins__

2007-11-28 Thread Adam Olsen
On Nov 28, 2007 11:50 AM, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> On Nov 28, 2007 10:46 AM, Adam Olsen <[EMAIL PROTECTED]> wrote:
> > On Nov 28, 2007 8:20 AM, Christian Heimes <[EMAIL PROTECTED]> wrote:
> > > I'm sending this mail to Python-dev in the hope to reach more developers.
> > >
> > > GvR likes to rename the __builtin__ to reduce confusing between
> > > __builtin__ and __builtins__. He wanted to start a poll on the new name
> > > but apparently he forgot.
> >
> > In a recent thread on python-ideas[1] it was suggested that builtins
> > be added as an argument to eval and exec.  I'd prefer to do that and
> > eliminate the name altogether.
> > [1] http://mail.python.org/pipermail/python-ideas/2007-November/001250.html
>
> You can do that but the special entry in globals is still required in
> order to pass it on to all scopes that need it.
>
> > If not that I suggest something like __inject_builtins__.  This
> > implies it's a command to eval/exec, and doesn't necessarily reflect
> > your current builtins (which are canonically accessible as an
> > attribute of your frame.)
>
> You're misunderstanding the reason why __builtins__ exists at all. It
> is used *everywhere* as the root namespace, not just as a special case
> to inject different builtins.

Ahh, so only replacing __builtins__ is unsupported (an implementation
detail, as it may be cached), not all use of it?  It is confusing that
something normally unsupported becomes required for eval/exec.
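
For instance, the special entry as it behaves today (if I understand
it correctly):

ns = {"__builtins__": {"len": len}}   # inject a tiny builtins set
assert eval("len('abc')", ns) == 3
try:
    eval("open('/etc/passwd')", ns)   # open() wasn't injected
except NameError:
    pass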


> ATM I'm torn between __root__ and __python__.

-1 on __python__.  It seems to be an abbreviation of "python
interpreter core" or the like, but on its own it implies nothing about
what it means.

Contrast that with __root__ where we all know what a root is, even
though it doesn't imply what kind of root it is or how its used.

__root_globals__ would be another option, showing clearly how it
relates to our existing use of the "globals" term.


-- 
Adam Olsen, aka Rhamphoryncus
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Signals+Threads (PyGTK waking up 10x/sec).

2007-12-06 Thread Adam Olsen
On Dec 6, 2007 9:56 PM, Sean Reifschneider <[EMAIL PROTECTED]> wrote:
> Overview (my reading of it):
>
>PyGTK wakes up 10x a second in its main loop because signals may be
>delivered to another thread and will only get picked up if the mainloop
>wakes up.
>
> In the following thread:
>
>http://mail.python.org/pipermail/python-dev/2006-September/068569.html
>
> it sounds like the patch at:
>
>http://bugs.python.org/issue1564547
>
> doesn't solve the problem.  A recent gnome bug brings this issue back up:
>
>http://bugzilla.gnome.org/show_bug.cgi?id=481569
>
> I went ahead and closed the python issue as "rejected" to hopefully get
> some more activity on it.
>
> I thought about this some, and I wondered if there was some way we could
> signal the sleeping thread when a signal came in on another thread.  Like
> perhaps we could make some code to create a pipe, and put it someplace that
> all threads could get access to.  Then, if a thread gets a signal, write on
> this pipe.  The mainloop could include this file descriptor in the set it's
> watching, so it would wake up when the signal came in.
>
> Is this something Python should provide, or something PyGTK should do?  If
> an approach like the above would work, we could make it so that select()
> always created this file descriptor and added it to one of the FD sets, so
> that it would do the right thing behind the scenes.
>
> I have no idea if this is a reasonable approach, but it's something that
> came to mind when I thought about the problem and was an approach I didn't
> see mentioned before in the discussion.

That's pretty much what issue1564547 does.  I think there's two marks
against it:
* Using poll and fd's is pretty platform specific for what should be a
general-purpose API
* Handling signals is icky, hard to get right, and nobody trusts it

Since I don't think there are any more immediate solutions, I'll provide
a plan B: my threading patch[1] will have a dedicated signal-handling
thread, allowing signals to be processed even while another thread is
blocked.  I'm also providing an interrupt API the gtk bindings could
use to support wakeups, while keeping the poll+fd details private.


[1] http://code.google.com/p/python-safethread/
  The patch is, of course, out of date.
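
For the record, a Python-level approximation of the pipe approach
(only a sketch; the point above is that the handler's write really
belongs in C, which this doesn't achieve):

import fcntl
import os
import select
import signal

rfd, wfd = os.pipe()
flags = fcntl.fcntl(wfd, fcntl.F_GETFL)
fcntl.fcntl(wfd, fcntl.F_SETFL, flags | os.O_NONBLOCK)  # never block

def _wake(signum, frame):
    os.write(wfd, "x")                # just wake the main loop

signal.signal(signal.SIGUSR1, _wake)

# The main loop then watches rfd along with its other descriptors:
# ready, _, _ = select.select([rfd] + other_fds, [], [])
# if rfd in ready:
#     os.read(rfd, 512)              # drain, then run the handlers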

-- 
Adam Olsen, aka Rhamphoryncus
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Signals+Threads (PyGTK waking up 10x/sec).

2007-12-07 Thread Adam Olsen
On Dec 7, 2007 2:35 PM,  <[EMAIL PROTECTED]> wrote:
>
> On 02:48 pm, [EMAIL PROTECTED] wrote:
> >Not only that, but current python signal handling is not theorethically
> >async safe; there are race conditions in the Py_AddPendingCalls API,
> >and it
> >just happens to work most of the time.

[This refers to the internal datastructures used by
Py_AddPendingCalls, which aren't updated in a safe way.
Hard/impossible to fix in C, but fairly easy with embedded assembly.]


> Twisted has encountered one such issue, described here:
>
> http://twistedmatrix.com/trac/ticket/1997#comment:12

[This refers to the overall design, which is inherently racey.]


> Unfortunately, I don't know enough about signals to suggest or comment
> on the solution.  Any Python/C wrapper around a syscall which can be
> interrupted needs to somehow atomically check for the presence of
> pending python signal handlers; I don't know of any POSIX API to do
> that.

Overall, what you'd need to do is register a wakeup function (to be
called by a signal handler or another thread), and have that wakeup
function cancel whatever you're doing.  The hard part is it needs to
work at *ANY* time while it's registered, before you've even called
the library function or syscall you intend to cancel!

I currently know of two methods of achieving this:
1) If reading a file or socket, first poll the fd, then do a
non-blocking read.  The wakeup function writes to a wakeup pipe you
also poll, which then wakes you up.  A wakeup after poll completes is
ignored, but the non-blocking read will finish promptly enough anyway.
2) Use sigsetjmp before a syscall (I wouldn't trust a library call),
then have the signal handler jump completely out of the operation.
This is evil and unportable, but probably works.

Additionally, this only makes SIGINT with the default behaviour work
right, as that can be implemented entirely in C.  If you want to handle
arbitrary signals running arbitrary python code you really need a
second thread to run them in.

-- 
Adam Olsen, aka Rhamphoryncus
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Signals+Threads (PyGTK waking up 10x/sec).

2007-12-08 Thread Adam Olsen
On Dec 8, 2007 1:38 PM, Gustavo Carneiro <[EMAIL PROTECTED]> wrote:
> On 08/12/2007, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> > On Dec 8, 2007 9:57 AM, Gustavo Carneiro <[EMAIL PROTECTED]> wrote:
> > > Which is the best solution?  I think my patch fixes two problems: 1. the
> > > need to have a FD to wake up poll() (t o fix the problem with what we
> are
> > > discussing in this thread), and 2. make Python's signal handling more
> > > reliable (not 100% reliable because it doesn't handle longer bursts of
> > > signals than the pipe buffer can take, but at least is race free).
> >
> > I think it's okay to drop signals if too many come. The FD should be
> > put in non-blocking mode so the signal handler won't block forever.
> > Does Unix even promise that a signal gets delivered twice if it gets
> > sent quickly twice in a row?
>
> Good point.

Note that we may drop a *different* signal, not just extra deliveries
of the same one.  I don't know if Unix will do that.  Then again, I've been
unable to find documentation promising it'd deliver any signal at all.

-- 
Adam Olsen, aka Rhamphoryncus
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Signals+Threads (PyGTK waking up 10x/sec).

2007-12-08 Thread Adam Olsen
On Dec 8, 2007 2:56 PM,  <[EMAIL PROTECTED]> wrote:
> On 05:20 pm, [EMAIL PROTECTED] wrote:
> >The best solution I can think of is to add a new API that takes a
> >signal and a file descriptor and registers a C-level handler for that
> >signal which writes a byte to the file descriptor. You can then create
> >a pipe, connect the signal handler to the write end, and add the read
> >end to your list of file descriptors passed to select() or poll(). The
> >handler must be written in C in order to avoid the race condition
> >referred to by Glyph (signals arriving after the signal check in the
> >VM main loop but before the select()/poll() system call is entered
> >will not be noticed until the select()/poll() call completes).
>
> This paragraph jogged my memory.  I remember this exact solution being
> discussed now, a year ago when I was last talking about these issues.
>
> There's another benefit to implementing a write-a-byte C signal handler.
> Without this feature, it wouldn't make sense to have passed the
> SA_RESTART flag to sigaction, because any GUIs written in Python could
> have spent an indefinite amount of time waiting to deliver their signal
> to Python code.  So, if you had to handle SIGCHLD in Python, for
> example, calls like file().write() would suddenly start raising a new
> exception (EINTR).  With it, you could avoid a whole class of subtle
> error-handling code in Twisted programs.

SA_RESTART still isn't useful.  The low-level poll call (not write!)
must stop and call back into python.  If that doesn't indicate an
error you can safely restart your poll call though, and follow it with
a (probably non-blocking) write.

Note that the only reason to use C for a low-level handler here is to
give access to sig_atomic_t and avoid needing locks.  If you ran the
signal handlers in a background thread (using sigwait to trigger them)
you could use a Python handler.
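A sketch of that second approach, using APIs that only landed in
Python 3.3 (signal.sigwait and signal.pthread_sigmask); at the time of
this thread it would have required C:

import signal
import threading

HANDLED = {signal.SIGUSR1, signal.SIGTERM}

def signal_thread():
    # Ordinary Python code, not a C signal handler, so there are no
    # async-signal-safety constraints here.
    while True:
        signum = signal.sigwait(HANDLED)
        print("got signal", signum)

# Block the signals in the main thread; new threads inherit the mask,
# so only the sigwait() call above will ever receive them.
signal.pthread_sigmask(signal.SIG_BLOCK, HANDLED)
threading.Thread(target=signal_thread, daemon=True).start()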


-- 
Adam Olsen, aka Rhamphoryncus
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Signals+Threads (PyGTK waking up 10x/sec).

2007-12-08 Thread Adam Olsen
On Dec 8, 2007 4:28 PM, Guido van Rossum <[EMAIL PROTECTED]> wrote:
>
> On Dec 8, 2007 2:36 PM, Adam Olsen <[EMAIL PROTECTED]> wrote:
> > On Dec 8, 2007 2:56 PM,  <[EMAIL PROTECTED]> wrote:
> > > On 05:20 pm, [EMAIL PROTECTED] wrote:
> > > >The best solution I can think of is to add a new API that takes a
> > > >signal and a file descriptor and registers a C-level handler for that
> > > >signal which writes a byte to the file descriptor. You can then create
> > > >a pipe, connect the signal handler to the write end, and add the read
> > > >end to your list of file descriptors passed to select() or poll(). The
> > > >handler must be written in C in order to avoid the race condition
> > > >referred to by Glyph (signals arriving after the signal check in the
> > > >VM main loop but before the select()/poll() system call is entered
> > > >will not be noticed until the select()/poll() call completes).
> > >
> > > This paragraph jogged my memory.  I remember this exact solution being
> > > discussed now, a year ago when I was last talking about these issues.
> > >
> > > There's another benefit to implementing a write-a-byte C signal handler.
> > > Without this feature, it wouldn't make sense to have passed the
> > > SA_RESTART flag to sigaction, because any GUIs written in Python could
> > > have spent an indefinite amount of time waiting to deliver their signal
> > > to Python code.  So, if you had to handle SIGCHLD in Python, for
> > > example, calls like file().write() would suddenly start raising a new
> > > exception (EINTR).  With it, you could avoid a whole class of subtle
> > > error-handling code in Twisted programs.
> >
> > SA_RESTART still isn't useful.  The low-level poll call (not write!)
> > must stop and call back into python.  If that doesn't indicate an
> > error you can safely restart your poll call though, and follow it with
> > a (probably non-blocking) write.
>
> Can't say I understand all of this, but it does reiterate that there
> are more problems with signals than just the issue that Gustavo is
> trying to squash. The possibility of having *any* I/O interrupted is
> indeed a big worry. Though perhaps this could be alleviated by rigging
> things so that signals get delivered (at the C level) to the main
> thread and the rest of the code runs in a non-main thread?

That's the approach my threading patch will take, although reversed
(signals are handled by a background thread, leaving the main thread
as the *main* thread.)

I share your concern about interrupting whatever random syscalls (not
even limited to I/O!) that a library happens to use.


> > Note that the only reason to use C for a low-level handler here is
> > give access to sigatomic_t and avoid needing locks.  If you ran the
> > signal handler in a background thread (using sigwait to trigger them)
> > you could use a python handler.
>
> I haven't seen Gustavo's patch yet, but *my* reason for using a C
> handler was different -- it was because writing a byte to a pipe in
> Python would do nothing to fix Gustavo's issue.
>
> Looking at the man page for sigwait()  it could be an alternative
> solution, but I'm not sure how it would actually allow PyGTK to catch
> KeyboardInterrupt.

My mail at [1] was referring to this.  Option 1 involved writing to a
pipe that gets polled while option 2 requires we generate a new signal
targeting the specific thread we want to interrupt.

I'd like to propose an interim solution though: pygtk could install
their own SIGINT handler during the gtk mainloop (or all gtk code?),
have it write to a pipe monitored by gtk, and have gtk raise
KeyboardInterrupt if it gets used.  This won't allow custom SIGINT
handlers or any other signal handlers to run promptly, but it should
be good enough for OLPC's use case.


[1] http://mail.python.org/pipermail/python-dev/2007-December/075607.html
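The shape of that interim fix, sketched in Python for clarity.  In
practice pygtk would have to install the handler in C for exactly the
reasons discussed in this thread, and the gobject.io_add_watch usage
here is from memory, so treat it as an approximation:

import os
import signal
import gobject
import gtk

r, w = os.pipe()

def sigint_handler(signum, frame):
    os.write(w, '\0')  # defer the real work to the main loop

def on_wakeup(source, condition):
    gtk.main_quit()  # or translate into KeyboardInterrupt here
    return True      # keep the watch installed

signal.signal(signal.SIGINT, sigint_handler)
gobject.io_add_watch(r, gobject.IO_IN, on_wakeup)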

-- 
Adam Olsen, aka Rhamphoryncus
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Signals+Threads (PyGTK waking up 10x/sec).

2007-12-08 Thread Adam Olsen
On Dec 8, 2007 5:21 PM, Guido van Rossum <[EMAIL PROTECTED]> wrote:
>
> On Dec 8, 2007 3:57 PM, Adam Olsen <[EMAIL PROTECTED]> wrote:
> >
> > On Dec 8, 2007 4:28 PM, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> > >
> > > On Dec 8, 2007 2:36 PM, Adam Olsen <[EMAIL PROTECTED]> wrote:
> > > > On Dec 8, 2007 2:56 PM,  <[EMAIL PROTECTED]> wrote:
> > > > > On 05:20 pm, [EMAIL PROTECTED] wrote:
> > > > > >The best solution I can think of is to add a new API that takes a
> > > > > >signal and a file descriptor and registers a C-level handler for that
> > > > > >signal which writes a byte to the file descriptor. You can then 
> > > > > >create
> > > > > >a pipe, connect the signal handler to the write end, and add the read
> > > > > >end to your list of file descriptors passed to select() or poll(). 
> > > > > >The
> > > > > >handler must be written in C in order to avoid the race condition
> > > > > >referred to by Glyph (signals arriving after the signal check in the
> > > > > >VM main loop but before the select()/poll() system call is entered
> > > > > >will not be noticed until the select()/poll() call completes).
> > > > >
> > > > > This paragraph jogged my memory.  I remember this exact solution being
> > > > > discussed now, a year ago when I was last talking about these issues.
> > > > >
> > > > > There's another benefit to implementing a write-a-byte C signal 
> > > > > handler.
> > > > > Without this feature, it wouldn't make sense to have passed the
> > > > > SA_RESTART flag to sigaction, because any GUIs written in Python could
> > > > > have spent an indefinite amount of time waiting to deliver their 
> > > > > signal
> > > > > to Python code.  So, if you had to handle SIGCHLD in Python, for
> > > > > example, calls like file().write() would suddenly start raising a new
> > > > > exception (EINTR).  With it, you could avoid a whole class of subtle
> > > > > error-handling code in Twisted programs.
> > > >
> > > > SA_RESTART still isn't useful.  The low-level poll call (not write!)
> > > > must stop and call back into python.  If that doesn't indicate an
> > > > error you can safely restart your poll call though, and follow it with
> > > > a (probably non-blocking) write.
> > >
> > > Can't say I understand all of this, but it does reiterate that there
> > > are more problems with signals than just the issue that Gustavo is
> > > trying to squash. The possibility of having *any* I/O interrupted is
> > > indeed a big worry. Though perhaps this could be alleviated by rigging
> > > things so that signals get delivered (at the C level) to the main
> > > thread and the rest of the code runs in a non-main thread?
> >
> > That's the approach my threading patch will take, although reversed
> > (signals are handled by a background thread, leaving the main thread
> > as the *main* thread.)
>
> Hm... Does this mean you're *always* creating an extra thread to handle 
> signals?

Yup, Py_Initialize will do it.


> > I share your concern about interrupting whatever random syscalls (not
> > even limited to I/O!) that a library happens to use.
> >
> >
> > > > Note that the only reason to use C for a low-level handler here is
> > > > give access to sigatomic_t and avoid needing locks.  If you ran the
> > > > signal handler in a background thread (using sigwait to trigger them)
> > > > you could use a python handler.
> > >
> > > I haven't seen Gustavo's patch yet, but *my* reason for using a C
> > > handler was different -- it was because writing a byte to a pipe in
> > > Python would do nothing to fix Gustavo's issue.
> > >
> > > Looking at the man page for sigwait()  it could be an alternative
> > > solution, but I'm not sure how it would actually allow PyGTK to catch
> > > KeyboardInterrupt.
> >
> > My mail at [1] was referring to this.  Option 1 involved writing to a
> > pipe that gets polled while option 2 requires we generate a new signal
> > targeting the specific thread we want to interrupt.
> >
> > I'd like to propose an interim solution though: pygtk could install
> > their own SIGINT handler during the gtk mainloop (or all gtk code?),
> > have it write to a pipe monitored by gtk, and have gtk raise
> > KeyboardInterrupt if it gets used.

Re: [Python-Dev] Signals+Threads (PyGTK waking up 10x/sec).

2007-12-08 Thread Adam Olsen
On Dec 8, 2007 6:30 PM, Gustavo Carneiro <[EMAIL PROTECTED]> wrote:
> On 09/12/2007, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> > Gustavo, at some point you suggested making changes to Python so that
> > all signals are blocked in all threads except for the main thread. I
> > think I'd be more inclined to give that the green light than the patch
> > using pipes for all signal handling, as long as we can make sure that
> > this blocking of all signals isn't inherited by fork()'ed children --
> > we had serious problems with that in 2.4 where child processes were
> > unkillable (except for SIGKILL).
>
> I don't think that solution works after all.  We can only block signals for
> certain threads inside the threads themselves.  But we do not control all
> threads.  Some are created by C libraries, and these threads will not have
> signals blocked by default, and also there is no 'thread creation hook' that
> we can use.

Note that new threads inherit signal masks from their creator.  It's
only threads created before loading python that are a problem.  For my
threading patch I plan to document that as simply something embedders
have to do.


> > I'd also be OK with a patch that
> > leaves the existing signal handling code intact but *adds* a way to
> > have a signal handler written in C that writes one byte to one end of
> > a pipe -- where the pipe is provided by Python code.
>
> I think this is most balanced approach of all.

Yeah, use the existing Handlers array to record which signals have
come in, rather than using the byte passed through the pipe.

And I missed a problem in bug #1643738: Handlers[...].tripped should
be a sig_atomic_t, not int.


-- 
Adam Olsen, aka Rhamphoryncus
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Signals+Threads (PyGTK waking up 10x/sec).

2007-12-08 Thread Adam Olsen
On Dec 8, 2007 6:54 PM, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> On Dec 8, 2007 5:30 PM, Adam Olsen <[EMAIL PROTECTED]> wrote:
> > On Dec 8, 2007 5:21 PM, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> > > Hm... Does this mean you're *always* creating an extra thread to handle 
> > > signals?
> >
> > Yup, Py_Initialize will do it.
>
> That's unacceptable. It must be possible to build Python without
> threads (and still support signals -- in fact one could argue that
> signals make *more* sense when there are no threads :-).

For my patch it won't make much sense to disable threads, so I don't
mind taking liberties there.

>
> [...]
>
> > To summarize, there's two problems to be solved:
> > 1) low-level corruption in the signal handlers as they record a new
> > signal, such as in Py_AddPendingCalls
>
> This is purely theoretical, right? Has anyone ever observed this?

I've never heard of it happening.  If the compiler doesn't do much
reordering (the CPU isn't an issue as this is only called in the main
thread) then the most you might get is dropped calls.

It's fairly safe the way signal handlers use it, but they'd work just
as well (and be easier to understand/verify) without the whole queue
aspect; just setting some flags and resetting _Py_Ticker.


> > 2) high-level wakeup race: "check for pending signals, have a signal
> > come in, then call a blocking syscall/library (oblivious to the new
> > signal)."
>
> Right. That's the race which really does happen, and for which the
> current lame-y work-around is to use a short timeout.
>
> [...]
>
> > > Anyway, I would still like to discuss this on #python-dev Monday.
> > > Adam, in what time zone are you? (I'm PST.) Who else is interested?
> >
> > MST.
>
> Unfortunately I can't stay at work later than 5:30 or so which would
> be too early for you I believe. I could try again after 8pm, your 9pm.
> Would that work at all? Otherwise I'd rather try earlier in the day if
> that works at all for you.

5:30 am or 5:30 pm?  Any time after 11 am MST (10 am PST) should be
fine for me.  (My previous email was a little naive about how late I
get up.)  I shouldn't be gone until around midnight MST.


-- 
Adam Olsen, aka Rhamphoryncus
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Signals+Threads (PyGTK waking up 10x/sec).

2007-12-10 Thread Adam Olsen
On Dec 10, 2007 4:26 PM, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> > Adam & I are now on #python-dev. Can you join?
>
> I think we successfully resolved this. Adam Olsen will produce a patch
> that allows one to specify a single file descriptor to which a zero
> byte will be written by the C-level signal handler. Twisted and PyGTK
> will have to coordinate about this file descriptor. Glyph believes
> this is possible and sufficient.

http://bugs.python.org/issue1583
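
[The patch landed as signal.set_wakeup_fd() in Python 2.6/3.0.  A
minimal sketch of how an event loop uses it; the draining and dispatch
details are illustrative, not prescriptive:]

import fcntl
import os
import select
import signal

r, w = os.pipe()
# The write end should be non-blocking so the C-level handler can
# never block inside a signal handler.
flags = fcntl.fcntl(w, fcntl.F_GETFL)
fcntl.fcntl(w, fcntl.F_SETFL, flags | os.O_NONBLOCK)
signal.set_wakeup_fd(w)  # the C-level handler writes a byte here

while True:
    try:
        select.select([r], [], [])  # sleep with no timeout at all
        os.read(r, 512)             # drain the wakeup byte(s)
        # By now the interpreter has had a chance to run any
        # Python-level signal handlers.
    except KeyboardInterrupt:
        break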

>
> (A preliminary version of the patch may be found here:
> http://dpaste.com/27576/ )
>
> We considered two alternatives:
>
> (a) A patch by myself where the file descriptor would instead be
> passed together with a signal handler. This was eventually rejected
> because it places an extra burden on every piece of code that
> registers a signal handler.
>
> (b) A more elaborate patch by Adam which would allow many file
> descriptors to be registered. This was rejected for being more code
> and solving a problem that most likely doesn't exist (multiple
> independent main loops running in different threads).
>
> We also located the exact source of the 100 msec timeout in PyGTK:
>
> http://svn.gnome.org/viewvc/pygtk/trunk/gtk/gtk.override?annotate=2926
>
> line 1075: *timeout = 100;
>
> The recommendation for the OLPC XO project is to remove this line or
> make the timeout much larger, as the only reason why this was even
> added to PyGTK is wanting a fast response to ^C from the console,
> which doesn't represent a viable use case on the XO.
>
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
> ___
> Python-Dev mailing list
> [email protected]
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: 
> http://mail.python.org/mailman/options/python-dev/rhamph%40gmail.com
>



-- 
Adam Olsen, aka Rhamphoryncus
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Signals+Threads (PyGTK waking up 10x/sec).

2007-12-11 Thread Adam Olsen
On Dec 11, 2007 11:00 PM, Andrew Bennetts <[EMAIL PROTECTED]> wrote:
> Guido van Rossum wrote:
> > On Dec 11, 2007 4:54 PM, Jan Claeys <[EMAIL PROTECTED]> wrote:
> > > Op vrijdag 07-12-2007 om 07:26 uur [tijdzone -0700], schreef Sean
> > > Reifschneider:
> > > > I would say that this is an optimization that helps a specific set of
> > > > platforms, including one that I think we really care about, the OLPC
> > > > which needs it for decreased battery use.
> > >
> > > Almost every laptop user would benefit from it, and even some desktop or
> > > server users might save on their electric power bill...
> >
> > Do you have data to support this claim?
>
> http://www.lesswatts.org/projects/powertop/powertop.php
>
> Some quotes plucked from that page:
>
> "In the screenshot, the laptop isn't doing very well. Most of the time the
> processor is in C2, and then only for an average of 4.4 milliseconds at a 
> time.
> If the laptop spent most of its time in C4 for at least 20 milliseconds, the
> battery life would have been approximately one hour longer."
>
> "When running a full GNOME desktop, 3 wakeups per second is achievable."
>
> There's considerable effort being invested in the GNOME and Linux software 
> stack
> at the moment to get rid of unnecessary CPU wakeups, and people are reporting
> significant improvements in laptop power consumption as a result of that work.

There's a known-issues page on there, at the bottom of which is
sealert, which used python, gtk, and threads.  It has since been
rewritten to not use threads, but it did exhibit the problem that
set_wakeup_fd fixes (or at least our half of the fix).

https://bugzilla.redhat.com/show_bug.cgi?id=239893

-- 
Adam Olsen, aka Rhamphoryncus
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] [Python-3000] Need closure on __cmp__ removal

2008-01-04 Thread Adam Olsen
On Jan 4, 2008 12:18 PM, Guido van Rossum <[EMAIL PROTECTED]> wrote:

> In the past some folks have been pushing for the resurrection of (some
> form of) __cmp__, which is currently removed from Py3k (except for
> some remnants which we'll clean up in due time).
>
> I'd like to get closure on this issue. If someone volunteers within a
> week to write a PEP, I'll give them a month to write the PEP, and then
> I'll review it. The PEP better come with a patch implementing
> (roughly) the desired behavior as well, relative to the 3.0 branch.
>
> If I don't hear from a committed volunteer within a week, I'll drop
> this and start removing __cmp__ references aggressively (starting with
> issue #1717). Saying "if no-one else volunteers, I can give it a shot"
> is not sufficient commitment. Saying "I will give it a shot" is. If
> someone commits but no PEP+patch is in my possession by February 4
> (and no attenuating circumstances have been brought to my attention),
> I will assume the PEP won't happen and will start removing __cmp__
> references. Once a PEP and patch are presented, I'll review them and
> make a decision.


I can't speak for the others, but I know I've decided not to pursue it.

-- 
Adam Olsen, aka Rhamphoryncus
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Small RFEs and the Bug Tracker

2008-02-22 Thread Adam Olsen
On Fri, Feb 22, 2008 at 4:57 AM, Nick Coghlan <[EMAIL PROTECTED]> wrote:
># Feature request resolutions
>accepted - feature request accepted (possibly via attached patch)
>rejected - feature request rejected

Can we make the names a little longer?  "feature accepted" and
"feature rejected" are more obvious than simply "accepted" and
"rejected".  Same for some of the bug ones.


># Bug report resolutions
>fixed - reported bug fixed (possibly via attached patch)
>invalid - reported behaviour is intentional and not a bug
>works for me - bug could not be replicated from bug report
>out of date - bug is already fixed in later Python version
>wont fix - valid bug, but not fixable in CPython (very rare)
>
># Common resolutions
>    duplicate - same as another issue (refer to other issue in a comment)


-- 
Adam Olsen, aka Rhamphoryncus
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Documentation reorganization [was: ... for ability to execute zipfiles & directories]

2008-03-04 Thread Adam Olsen
On Tue, Mar 4, 2008 at 3:13 PM, Stephen J. Turnbull <[EMAIL PROTECTED]> wrote:
> Georg Brandl writes:
>   > You speak my mind. For ages I've wanted to put the builtins together with
>   > the language reference into a new document called "Python Core Language".
>   > I've just never had the time to draft a serious proposal.
>
>  I think that combination is reasonable, but I would like to see the
>  clear division between the language (ie, the syntax) and the built-in
>  functionality maintained.  I'm not sure I like the proposed title for
>  that reason.

Such a division would make it unnecessarily hard to find documentation
on True, False, None, etc.  They've become keywords for pragmatic
purposes (to prevent accidental modification), not because we think
they ideally should be syntax instead of builtins.


-- 
Adam Olsen, aka Rhamphoryncus
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Documentation reorganization [was: ... for ability to execute zipfiles & directories]

2008-03-04 Thread Adam Olsen
On Tue, Mar 4, 2008 at 5:04 PM, Steve Holden <[EMAIL PROTECTED]> wrote:
> Greg Ewing wrote:
>  > Adam Olsen wrote:
>  >> Such a division would make it unnecessarily hard to find documentation
>  >> on True, False, None, etc.  They've become keywords for pragmatic
>  >> purposes (to prevent accidental modification), not because we think
>  >> they ideally should be syntax instead of builtins.
>  >
>  > Maybe the solution is to rename the Library Reference
>  > to the Class and Module Reference or something like
>  > that.
>  >
>  Although DRY is fine as a programming principle, it fails for pedagogic
>  purposes. We should therefore be prepared to repeat the same material in
>  different contexts (hopefully by including some common documentation
>  source rather than laborious and error-prone copy-and-paste).
>
>  Document things where people expect to find them. (Now *there's* a
>  usability study screaming to be done ... and SoC is coming up).

Python's usage of import makes it clear when something is imported
from a library, as opposed to being an integral part of the language.
To me, this is an obvious basis for deciding whether to look in the
language reference or in the stdlib reference.

That said, it would be useful to also have a link for major builtin
types in the stdlib section, if only for people who learned to look
for them there.


-- 
Adam Olsen, aka Rhamphoryncus
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Documentation reorganization

2008-03-04 Thread Adam Olsen
On Tue, Mar 4, 2008 at 8:03 PM, Stephen J. Turnbull <[EMAIL PROTECTED]> wrote:
> Adam Olsen writes:
>   > On Tue, Mar 4, 2008 at 3:13 PM, Stephen J. Turnbull <[EMAIL PROTECTED]> 
> wrote:
>
>   > > I would like to see the clear division between the language (ie,
>   > > the syntax) and the built-in functionality maintained.  I'm not
>   > > sure I like the proposed title for that reason.
>   >
>   > Such a division would make it unnecessarily hard to find documentation
>   > on True, False, None, etc.  They've become keywords for pragmatic
>   > purposes (to prevent accidental modification), not because we think
>   > they ideally should be syntax instead of builtins.
>
>  This is Python; of course practicality beats purity.  I have no
>  problem with putting some keywords in the "built-in functionality"
>  section, or even (boggle) duplicate them across the two sections.

-1 on duplicating anything.  Provide links to a single location
instead.  Otherwise you end up with two explanations of the same
thing, with different wording and subtle differences or missing
details.


>  I too was put off by the separation of syntax from built-in
>  functionality when I first started using the documentation, but later
>  I came to appreciate it.  I'm a relatively casual user of Python, and
>  having a spare "syntax" section has made it much easier to learn new
>  syntax such as comprehensions and generators.  I suspect it will make
>  it a lot easier to learn the differences between Python 2 and Python
>  3, too.  I do not want to lose that.

I learned them through third-party docs.  Even now I have a hard time
finding list comprehensions/generator expressions in the docs.
Apparently they're in the Expressions section, under "Displays for
lists, sets and dictionaries", and neither that phrase nor "list
comprehension" nor "generator expression" appears in the index.  The
term "displays" is pretty obscure as well, not something I've seen
used besides on this list or right there in the documentation.


>  I don't pretend to be speaking for anyone else, but I'd be surprised
>  if I were unique.

Your experiences *shouldn't* be unique, but I'm afraid they might be.
Another example is the use of BNF, which, although dominant in its
field, presents a steep learning curve for most programmers.

I'm afraid this has turned into a rant, but it should be seen as the
experiences of someone who relies on the documentation a great deal.
Is there a better way to channel feedback on the documentation?

-- 
Adam Olsen, aka Rhamphoryncus
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Equality on method objects

2008-03-09 Thread Adam Olsen
On Sun, Mar 9, 2008 at 9:02 AM, Armin Rigo <[EMAIL PROTECTED]> wrote:
> Hi all,
>
>  In Python 2.5, I made an attempt to make equality consistent for the
>  various built-in and user-defined method types.  I failed, though, as
>  explained in http://bugs.python.org/issue1617161.  The outcome of this
>  discussion is that, first of all, we need to decide which behavior is
>  "correct":
>
> >>> [].append == [].append
> True or False?
>
>  (See the issue tracker for why the answer should probably be False.)
>
>  The general question is: if x.foo and y.foo resolve to the same method,
>  should "x.foo == y.foo" delegate to "x == y" or be based on "x is y"?
>
>  The behavior about this has always been purely accidental, with three
>  different results for user-defined methods versus built-in methods
>  versus method wrappers (those who know what the latter are, raise your
>  hand).
>
>  (Yes, Python < 2.5 managed three different behaviors instead of just
>  two: one of the types (don't ask me which) would base its equality on
>  the identity of the 'self', but still compute its hash from the hash of
>  'self'...)

They should only compare equal if they're interchangeable.  In the
case of a mutable container (i.e. a list), a value comparison of self
is irrelevant garbage, so the methods should always be compared (and
hashed) based on identity.

IOW, "x = []; x.append == x.append" should be True, and everything
else should be False.
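
Spelled out as an interpreter session (this is the behaviour CPython
eventually settled on):

>>> x = []
>>> x.append == x.append    # same list: interchangeable
True
>>> [].append == [].append  # different lists: not interchangeable
False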

-- 
Adam Olsen, aka Rhamphoryncus
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Equality on method objects

2008-03-10 Thread Adam Olsen
On Mon, Mar 10, 2008 at 4:26 AM, Armin Rigo <[EMAIL PROTECTED]> wrote:
> Hi Phillip,
>
>
>  On Sun, Mar 09, 2008 at 07:05:12PM -0400, Phillip J. Eby wrote:
>  > I did not, however, need the equality of bound methods to be based on
>  > object value equality, just value identity.
>  >
>  > ...at least until recently, anyway.  I do have one library that wants
>  > to have equality-based comparison of im_self.  What I ended up doing
>  > is writing code that tests what the current Python interpreter is
>  > doing, and if necessary implements a special method type, just for
>  > purposes of working around the absence of im_self equality
>  > testing.  However, it's a pretty specialized case (...)
>
>  I found myself in exactly the same case: a pretty specialized example
>  where I wanted bound methods to use im_self equality rather than
>  identity, solved by writing my own bound-method-like object.  But that's
>  not really hard to do, and the general tendency (which matches my own
>  opinion too) seems to be that using im_self identity is less surprising.
>
>  In general, "x.append" is interchangeable with "x.append" even if
>  "x.append is not x.append", so let's go for the least surprizing
>  behavior: "m1.im_self is m2.im_self and m1.im_func==m2.im_func".
>  Objection?

+1

-- 
Adam Olsen, aka Rhamphoryncus
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Complexity documentation request

2008-03-12 Thread Adam Olsen
On Mon, Mar 10, 2008 at 12:05 PM, Daniel Stutzbach
<[EMAIL PROTECTED]> wrote:
> On Sun, Mar 9, 2008 at 9:22 AM, Aahz <[EMAIL PROTECTED]> wrote:
>  >  There probably would be some value in a wiki page on python.org that
>  >  provides this information, particularly across versions.  You may be
>  >  able to find volunteers to help on comp.lang.python.
>
>  I just created a very basic one at
>  http://wiki.python.org/moin/TimeComplexity?action=show
>
>  I'm not that familiar with the Wiki syntax, so the tables are kind of
>  ugly at the moment.
>
>  I wasn't sure about many of the set() operations, so I didn't include those.

For python's purposes, I think it's simpler to classify an operation
as either "linear" or "near constant", then have an explanation that
"near constant" is only the typical performance (it doesn't make
guarantees about worst case behaviour), may include O(log n)
implementations, etc.  That suffices to distinguish use cases, and
anything more specific may be dominated by constant factors anyway.

Something like sort is a special case.  I don't think the language
needs to guarantee any particular performance, yet it's worth
documenting that CPython has a rather good implementation.
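
Concretely, the distinction being proposed (a quick sketch):

import timeit

setup = "data = list(range(100000)); s = set(data)"
# "linear": list membership scans every element
print(timeit.timeit("99999 in data", setup, number=100))
# "near constant": set membership is a hash lookup
print(timeit.timeit("99999 in s", setup, number=100))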

-- 
Adam Olsen, aka Rhamphoryncus
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Improved thread switching

2008-03-19 Thread Adam Olsen
On Tue, Mar 18, 2008 at 1:29 AM, Stefan Ring <[EMAIL PROTECTED]> wrote:
> The company I work for has over the last couple of years created an
>  application server for use in most of our customer projects. It embeds Python
>  and most project code is written in Python by now. It is quite 
> resource-hungry
>  (several GB of RAM, MySQL databases of 50-100GB). And of course it is
>  multi-threaded and, at least originally, we hoped to make it utilize multiple
>  processor cores. Which, as we all know, doesn't sit very well with Python. 
> Our
>  application runs heavy background calculations most of the time (in Python)
>  and has to service multiple (few) GUI clients at the same time, also using
>  Python. The problem was that a single background thread would increase the
>  response time of the client threads by a factor of 10 or (usually) more.
>
>  This led me to add a dirty hack to the Python core to make it switch threads
>  more frequently. While this hack greatly improved response time for the GUI
>  clients, it also slowed down the background threads quite a bit. top would
>  often show significantly less CPU usage -- 80% instead of the more usual 
> 100%.
>
>  The problem with thread switching in Python is that the global semaphore used
>  for the GIL is regularly released and immediately reacquired. Unfortunately,
>  most of the time this leads to the very same thread winning the race on the
>  semaphore again and thus more wait time for the other threads. This is where
>  my dirty patch intervened and just did a nanosleep() for a short amount of
>  time (I used 1000 nsecs).

Can you try with a call to sched_yield(), rather than nanosleep()?  It
should have the same benefit but with less of a performance hit.

If it works, but is still too much hit, try tuning the checkinterval
to see if you can find an acceptable throughput/responsiveness
balance.
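
For example (the check interval defaulted to 100 bytecode
instructions at the time):

import sys
sys.setcheckinterval(10)    # switch more often: better latency
# or:
sys.setcheckinterval(1000)  # switch less often: better throughput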

-- 
Adam Olsen, aka Rhamphoryncus
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Improved thread switching

2008-03-19 Thread Adam Olsen
On Wed, Mar 19, 2008 at 10:09 AM, Stefan Ring <[EMAIL PROTECTED]> wrote:
> Adam Olsen  gmail.com> writes:
>
>  > Can you try with a call to sched_yield(), rather than nanosleep()?  It
>  > should have the same benefit but without as much performance hit.
>  >
>  > If it works, but is still too much hit, try tuning the checkinterval
>  > to see if you can find an acceptable throughput/responsiveness
>  > balance.
>  >
>
>  I tried that, and it had no effect whatsoever. I suppose it would have
>  an effect on a single CPU or an otherwise heavily loaded SMP system,
>  but that's not the scenario we care about.

So you've got a lightly loaded SMP system?  Multiple threads all
blocked on the GIL, multiple CPUs to run them, but only one CPU is
active?  In that case I can imagine how sched_yield() might finish
before the other CPUs wake up a thread.

A FIFO scheduler would be the right thing here, but it's only a short
term solution.  Care for a long term solution? ;)

http://code.google.com/p/python-safethread/

-- 
Adam Olsen, aka Rhamphoryncus
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Improved thread switching

2008-03-19 Thread Adam Olsen
On Wed, Mar 19, 2008 at 10:42 AM, Stefan Ring <[EMAIL PROTECTED]> wrote:
>
> On Mar 19, 2008 05:24 PM, Adam Olsen <[EMAIL PROTECTED]> wrote:
>
>  > On Wed, Mar 19, 2008 at 10:09 AM, Stefan Ring <[EMAIL PROTECTED]> wrote:
>  > > Adam Olsen  gmail.com> writes:
>  > >
>  > > > Can you try with a call to sched_yield(), rather than nanosleep()?
>  > >  > It
>  > >  > should have the same benefit but without as much performance hit.
>  > >  >
>  > > > If it works, but is still too much hit, try tuning the
>  > >  > checkinterval
>  > >  > to see if you can find an acceptable throughput/responsiveness
>  > >  > balance.
>  > >  >
>  > >
>  > > I tried that, and it had no effect whatsoever. I suppose it would
>  > >  have an effect on a single CPU or an otherwise heavily loaded SMP
>  > >  system, but that's not the scenario we care about.
>  >
>  > So you've got a lightly loaded SMP system?  Multiple threads all
>  > blocked on the GIL, multiple CPUs to run them, but only one CPU is
>  > active?  I that case I can imagine how sched_yield() might finish
>  > before the other CPUs wake up a thread.
>  >
>  > A FIFO scheduler would be the right thing here, but it's only a short
>  > term solution.  Care for a long term solution? ;)
>  >
>  > http://code.google.com/p/python-safethread/
>
>  I've already seen that but it would not help us in our current
>  situation. The performance penalty really is too heavy. Our system is
>  slow enough already ;). And it would be very difficult, bordering on
>  impossible, to parallelize. Plus, I can imagine that all extension modules
>  (and our own code) would have to be adapted.
>
>  The FIFO scheduler is perfect for us because the load is typically quite
>  low. It's mostly at those times when someone runs a lengthy calculation
>  that all other users suffer greatly increased response times.

So you want responsiveness when idle but throughput when busy?

Are those calculations primarily python code, or does a C library do
the grunt work?  If it's a C library you shouldn't be affected by
safethread's increased overhead.


-- 
Adam Olsen, aka Rhamphoryncus
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Improved thread switching

2008-03-19 Thread Adam Olsen
On Wed, Mar 19, 2008 at 11:25 AM, Stefan Ring <[EMAIL PROTECTED]> wrote:
> Adam Olsen  gmail.com> writes:
>
>
> > So you want responsiveness when idle but throughput when busy?
>
>  Exactly ;)
>
>
>  > Are those calculations primarily python code, or does a C library do
>  > the grunt work?  If it's a C library you shouldn't be affected by
>  > safethread's increased overhead.
>  >
>
>  It's Python code all the way. Frankly, it's a huge mess, but it would be very
>  very hard to come up with a scalable solution that would allow to optimize
>  certain hotspots and redo them in C or C++. There isn't even anything left to
>  optimize in particular because all those low hanging fruit have already been
>  taken care of. So it's just ~30kloc Python code over which the total time 
> spent
>  is quite uniformly distributed :(.

I see.  Well, at this point I think the most you can do is file a bug
so the problem doesn't get forgotten.  If nothing else, if my
safethread stuff goes in it'll very likely include a --with-gil
option, so I may put together a FIFO scheduler.


-- 
Adam Olsen, aka Rhamphoryncus
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Py3k and asyncore/asynchat

2008-03-24 Thread Adam Olsen
On Mon, Mar 24, 2008 at 3:04 PM, Thomas Wouters <[EMAIL PROTECTED]> wrote:
>
> On Fri, Feb 15, 2008 at 9:11 PM, Josiah Carlson <[EMAIL PROTECTED]>
> wrote:
> > Twisted core has been proposed, but I believe the consensus was that
> > it wasn't desirable, generally.
> >
>
> I remember only a couple of dissenting voices, and only a small number of
> participants. Of the dissenting voices, I do not recall any actual arguments
> about undesireability, just misunderstandings of how Twisted actually works.
> Getting Twisted core (meaning Deferreds, a simple reactor and the Protocol
> class) into the core is still on my TODO list.
>
>
> > I'm also pretty sure that people learn twisted because everyone learns
> > twisted.  It's one of those buzz-words ;).
> >
>
> I think that's quite an unfair assessment, even in jest :) Twisted is well
> worth learning to actually use it, as it's a very versatile event loop and
> does it best to integrate nicely with other event systems. And including it
> in the standard library improves integration with other event loops by
> creating a single interface. It's not a matter of dropping it in, though; it
> requires some careful structuring to avoid embarrassing situations like we
> have with the xml package, but still allow people to provide their own reactor.
>
> In case you're wondering how the twisted reactor in the stdlib is useful to
> people not using Twisted, take a look at what you currently need to do to
> combine stdlib modules like urllib and ftplib with event systems like
> Tkinter and PyGTK. Not to mention that the Twisted implementations of
> various protocols are really quite, quite good -- in many cases quite a lot
> better than the stdlib ones. But including those takes yet more time.

In that sense it'd be competing with safethread for inclusion in
Python.  Whereas safethread requires few if any API changes, twisted
requires an entirely new API that can be event-driven.  Worse, we'd
likely be stuck maintaining both APIs for a long time, if not forever.

Twisted may be one of the best (if not *the* best) ways of writing
concurrent programs today, but it doesn't need to be in the stdlib for
that.  If safethread is going to solve many of the same problems, with
less changes required by the users of the language, then this is the
wrong time to add twisted.


-- 
Adam Olsen, aka Rhamphoryncus
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Py3k and asyncore/asynchat

2008-03-24 Thread Adam Olsen
On Mon, Mar 24, 2008 at 3:37 PM, Thomas Wouters <[EMAIL PROTECTED]> wrote:
>
> On Mon, Mar 24, 2008 at 10:21 PM, Adam Olsen <[EMAIL PROTECTED]> wrote:
> >
> > Twisted may be one of the best (if not *the* best) ways of writing
> > concurrent programs today, but it doesn't need to be in the stdlib for
> > that.  If safethread is going to solve many of the same problems, with
> > less changes required by the users of the language, then this is the
> > wrong time to add twisted.
> >
>
> You must have missed the part where we already have a large set of event
> loops, and not having a single interface to them is in fact hurting people.
> Twisted goes out of its way to interact nicely with event loops, but it can
> only do that with ones it knows about (and are versatile enough to hook
> into.) Having a single event system in the standard library is definitely
> advantageous, even if safethreads were available everywhere and the
> performance in the common case was satisfactory. It used to be the case that
> people thought asyncore was this standard event system, but it's long since
> ceased to be.

I'm not opposed to standardizing on twisted as the canonical way to do
events in python, or to having an event system in python.  My concern
is that it may be used as an excuse to slowly rewrite the entire language
into an event-driven model.

However, that was based on the assumption that modules like urllib2
weren't already event-driven.  Looking now, it seems urllib2 already
is, with the event-driven core wrapped in a blocking API for simple
use cases.

So long as we're only talking about replacing existing event-driven
stuff, and so long as we retain the existing blocking API (including
calling from threads!), I don't think I have any valid opposition.

-- 
Adam Olsen, aka Rhamphoryncus
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Known doctest bug with unicode?

2008-04-18 Thread Adam Olsen
On Fri, Apr 18, 2008 at 8:27 AM, Jeroen Ruigrok van der Werven
<[EMAIL PROTECTED]> wrote:
> # vim: set fileencoding=utf-8 :
>
>  kanamap = {
> u'あ': 'a'
>  }
>
>  def transpose(word):
> """Convert a word in kana to its equivalent Hepburn romanisation.
>
> >>> transpose(u'あ')
> 'a'
> """
> transposed = ''
> for character in word:
> transposed += kanamap[character]
> return transposed
>
>  if __name__ == '__main__':
> import doctest
> doctest.testmod()
>
>  doctest:
>
>  [16:24] [EMAIL PROTECTED] (1) {20} % python trans.py
>  **
>  File "trans.py", line 11, in __main__.transpose
>  Failed example:
> transpose(u'あ')
>  Exception raised:
> Traceback (most recent call last):
>   File "doctest.py", line 1212, in __run
> compileflags, 1) in test.globs
>   File "", line 1, in 
> transpose(u'あ')
>   File "trans.py", line 16, in transpose
> transposed += kanamap[character]
> KeyError: u'\xe3'
>  **
>  1 items had failures:
>1 of   1 in __main__.transpose
>  ***Test Failed*** 1 failures.
>
>  normal interpreter:
>
>  >>> from trans import transpose
>  >>> transpose(u'あ')
>  'a'

What you've got is an 8-bit byte string containing a unicode literal.
Although it gets through the module's compilation stage intact,
doctest passes the example to the compiler again, which defaults to
iso-8859-1.  Thus
u'あ'.encode('utf-8').decode('latin-1') -> u'\xe3\x81\x82'.

Possible solutions:
1. Make the docstring itself unicode, assuming doctest allows this.
2. Call doctest explicitly, giving it the correct encoding.
3. See if you can put an encoding declaration in the doctest itself.
4. Make doctest smarter, so that it can grab the original module's encoding.
5. Wait until 3.0, where this is hopefully fixed by making doctests
use unicode by default?
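
Option 1, for example, is a one-character change, assuming doctest
copes with unicode docstrings on the Python version in question
(worth verifying):

def transpose(word):
    u"""Convert a word in kana to its equivalent Hepburn romanisation.

    >>> transpose(u'あ')
    'a'
    """
    # The u prefix makes the docstring unicode, so the example that
    # doctest recompiles is already decoded and never mis-read as
    # latin-1.
    return ''.join(kanamap[character] for character in word)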

-- 
Adam Olsen, aka Rhamphoryncus
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Optimization of Python ASTs: How should we deal with constant values?

2008-05-08 Thread Adam Olsen
On Thu, May 8, 2008 at 5:22 PM, Thomas Lee <[EMAIL PROTECTED]> wrote:
> Nick Coghlan wrote:
>>
>> There are a lot of micro-optimisations that are actually context
>> independent, so moving them before the symtable pass should be quite
>> feasible - e.g. replacing "return None" with "return", stripping dead code
>> after a return statement, changing a "if not" statement into an "if"
>> statement with the two suites reversed, changing "(1, 2, 3)" into a stored
>> constant, folding "1 + 2" into the constant "3".
>>
>> I believe the goal is to see how many of the current bytecode
>> optimisations can actually be brought forward to the AST generation stage,
>> rather than waiting until after the bytecode symtable calculation and
>> compilation passes.
>>
> That's been the aim so far. It's been largely successful with the exception
> of a few edge cases (most notably the functions vs. generator stuff). The
> elimination of unreachable paths (whether they be things like "if 0: ..." or
> "return; ... more code ...") completely breaks generators since we might
> potentially be blowing away "yield" statements during the elimination
> process.

Also breaks various sanity checks relating to the global statement.
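
The generator half of the problem is easy to demonstrate; the mere
presence of a yield, reachable or not, is what makes a function a
generator:

>>> def f():
...     return
...     yield None  # dead code, but it still makes f a generator
...
>>> f()
<generator object at 0x...>

Blowing away the dead "yield None" would silently turn f back into an
ordinary function returning None.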

-- 
Adam Olsen, aka Rhamphoryncus
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Optimization of Python ASTs: How should we deal with constant values?

2008-05-08 Thread Adam Olsen
On Thu, May 8, 2008 at 5:54 PM, Thomas Lee <[EMAIL PROTECTED]> wrote:
> Adam Olsen wrote:
>>
>> On Thu, May 8, 2008 at 5:22 PM, Thomas Lee <[EMAIL PROTECTED]> wrote:
>>
>>>
>>> Nick Coghlan wrote:
>>>
>>>>
>>>> There are a lot of micro-optimisations that are actually context
>>>> independent, so moving them before the symtable pass should be quite
>>>> feasible - e.g. replacing "return None" with "return", stripping dead
>>>> code
>>>> after a return statement, changing a "if not" statement into an "if"
>>>> statement with the two suites reversed, changing "(1, 2, 3)" into a
>>>> stored
>>>> constant, folding "1 + 2" into the constant "3".
>>>>
>>>> I believe the goal is to see how many of the current bytecode
>>>> optimisations can actually be brought forward to the AST generation
>>>> stage,
>>>> rather than waiting until after the bytecode symtable calculation and
>>>> compilation passes.
>>>>
>>>>
>>>
>>> That's been the aim so far. It's been largely successful with the
>>> exception
>>> of a few edge cases (most notably the functions vs. generator stuff). The
>>> elimination of unreachable paths (whether they be things like "if 0: ..."
>>> or
>>> "return; ... more code ...") completely breaks generators since we might
>>> potentially be blowing away "yield" statements during the elimination
>>> process.
>>>
>>
>> Also breaks various sanity checks relating to the global statement.
>>
>>
>
> What sanity checks are these exactly? Is this related to the lnotab?

Here we are.  In 2.4.4:

>>> def foo():
...   print test
...   if 0:
... import test
...
>>> foo()
Traceback (most recent call last):
  File "", line 1, in ?
  File "", line 2, in foo
NameError: global name 'test' is not defined

2.5.1 correctly raises an UnboundLocalError instead.

Searching the bug DB for old bugs involving the optimizer showed
several variations on this, such as "if 0: yield None" at module scope
not raising a SyntaxError (Issue 1875, still open in trunk).  Clearly
there needs to be a general approach to avoid affecting these checks
with optimizations.

-- 
Adam Olsen, aka Rhamphoryncus
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] C API for gc.enable() and gc.disable()

2008-05-31 Thread Adam Olsen
On Sat, May 31, 2008 at 10:11 PM, Alexandre Vassalotti
<[EMAIL PROTECTED]> wrote:
> Would anyone mind if I did add a public C API for gc.disable() and
> gc.enable()? I would like to use it as an optimization for the pickle
> module (I found out that I get a good 2x speedup just by disabling the
> GC while loading large pickles). Of course, I could simply import the
> gc module and call the functions there, but that seems overkill to me.
> I included the patch below for review.

I'd rather see the underlying problem fixed.  The collector behaves
quadratically if you load enough objects to trigger a full collection
a few times.
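
[For reference, the pure-Python form of the optimization under
discussion, as a sketch:]

import gc
import pickle

def load_big_pickle(path):
    gc.disable()  # skip collections while building the object graph
    try:
        with open(path, 'rb') as f:
            return pickle.load(f)
    finally:
        gc.enable()  # always restore collection, even on error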


-- 
Adam Olsen, aka Rhamphoryncus
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


  1   2   3   >