Re: [Python-Dev] [poll] New name for __builtins__

2007-12-03 Thread Nick Coghlan
Guido van Rossum wrote:
> On Dec 2, 2007 7:40 AM, Nick Coghlan <[EMAIL PROTECTED]> wrote:
>> Just for the record, I also like the idea of __builtins__ being a magic
>> alias for the boringly-but-practically named builtins module.
> 
> [Imagine me jumping up and down and screaming at the top of my lungs
> out of frustration:]
> 
> BUT THAT'S NOT WHAT IT IS! IT'S A HOOK FOR SANDBOXING! YOU SHOULD
> NEVER BE USING __builtins__ DIRECTLY EXCEPT WHEN CONTROLLING THE SET
> OF BUILTINS AVAILABLE TO UNTRUSTED CODE!
> 

I never mess with the builtin definitions under either name, but I agree 
that my description was highly inaccurate. It's not a topic I've spent 
much time considering :)
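
For readers following along, here is a minimal sketch of the sandboxing hook Guido describes, written for Python 3. The name sandbox_globals and the choice of allowed builtins are illustrative assumptions. Restricting __builtins__ this way only controls name lookup for the executed code.

```python
# Hypothetical example: hand executed code a reduced set of builtins.
sandbox_globals = {"__builtins__": {"len": len}}

# Names present in the supplied mapping resolve normally.
exec("n = len('abc')", sandbox_globals)

# Builtins left out of the mapping are simply not found.
try:
    exec("open('somefile')", sandbox_globals)
except NameError as exc:
    blocked = str(exc)
else:
    blocked = None
```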

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://www.boredomandlaziness.org
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] blocking a non-blocking socket

2007-12-03 Thread Bill Janssen
Thanks, Audun.  If you look at the code, you'll see that both a
connect method and a do_handshake method already exist, and work
pretty much as you describe.  The issue is what to do when the user
doesn't call them explicitly, i.e. specifies do_handshake_on_connect=True.
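
A minimal sketch of the explicit path, where the caller drives do_handshake on a non-blocking socket itself. It assumes the exception classes ssl.SSLWantReadError and ssl.SSLWantWriteError from the current Python 3 ssl module, which postdate this thread; the function name finish_handshake and its wait parameter are hypothetical.

```python
import select
import ssl


def finish_handshake(sock, wait=select.select):
    """Retry do_handshake() on a non-blocking SSL socket until it completes.

    `wait` is a hypothetical hook with the same call shape as select.select,
    so the retry loop can be exercised without a real connection.
    """
    while True:
        try:
            sock.do_handshake()
            return
        except ssl.SSLWantReadError:
            # The handshake needs more incoming data first.
            wait([sock], [], [])
        except ssl.SSLWantWriteError:
            # The handshake needs the socket to become writable first.
            wait([], [sock], [])
```

With do_handshake_on_connect=True none of this is visible to the caller, which is the case the question above is about.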

> Another way of doing it could be to expose a connect() method on the ssl
> objects.  It changes the socket.ssl api, but I'd say it is in the same
> spirit as the do_handshake_on_connect parameter since no existing code
> will break.  The caller then calls connect() until it does not return

Bill


Re: [Python-Dev] Non-string keys in namespace dicts

2007-12-03 Thread Guido van Rossum
On Dec 2, 2007 12:49 PM, Neil Toronto <[EMAIL PROTECTED]> wrote:
> It turned out not *that* hard to code around for attribute caching, and
> the extra cruft only gets invoked on a cache miss. The biggest problem
> isn't speed - it's that it's possible (though extremely unlikely), while
> testing keys for equality, that a rich compare alters the underlying
> dict. This causes the caching lookup to have to try to get an entry
> pointer again, which could invoke the rich compare, which might alter
> the underlying dict..

How about subclasses of str? These have all the same issues...
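
To illustrate the point, a small Python 3 sketch of a str subclass whose equality test has a side effect during an ordinary dict lookup. The names NoisyStr and compare_log are made up for the example; here the side effect only records the call, but it could just as well touch the dict.

```python
compare_log = []


class NoisyStr(str):
    """Hypothetical str subclass whose equality test has a side effect."""

    # Defining __eq__ would otherwise disable hashing, so keep str's hash.
    __hash__ = str.__hash__

    def __eq__(self, other):
        compare_log.append(str(other))  # the side effect
        return str.__eq__(self, other)


namespace = {NoisyStr("attr"): 1}

# Looking up with a plain str finds a stored key with the same hash but a
# different identity, so the dict falls back to the rich compare.
value = namespace["attr"]
```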

> I'm working on making it as fast as the original when the MRO is short.
> Question for Guido: should I roll this into the fastglobals patch?

No, please keep them separate.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] Non-string keys in namespace dicts

2007-12-03 Thread Guido van Rossum
On Dec 2, 2007 6:28 PM, Phillip J. Eby <[EMAIL PROTECTED]> wrote:
> I don't see a problem with requiring dictionary key comparisons to be
> side-effect-free - even in the general case of dictionaries, not just
> namespace ones.

Me neither -- but the problem is enforcement.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] Non-string keys in namespace dicts

2007-12-03 Thread Neil Toronto
Guido van Rossum wrote:
> On Dec 2, 2007 12:49 PM, Neil Toronto <[EMAIL PROTECTED]> wrote:
>> It turned out not *that* hard to code around for attribute caching, and
>> the extra cruft only gets invoked on a cache miss. The biggest problem
>> isn't speed - it's that it's possible (though extremely unlikely), while
>> testing keys for equality, that a rich compare alters the underlying
>> dict. This causes the caching lookup to have to try to get an entry
>> pointer again, which could invoke the rich compare, which might alter
>> the underlying dict..
> 
> How about subclasses of str? These have all the same issues...

Yeah. I ended up having it, per class, permanently revert to uncached 
lookups when it detects that a class dict in the MRO has non-string 
keys. That's flagged by lookdict_string, which uses PyString_CheckExact.
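
A rough Python 3 analogue of that check, for illustration only. PyString_CheckExact is the C-level test; type(key) is str plays the same role here. The helper names are hypothetical, and the sketch only computes the predicate rather than reverting a class permanently as the patch does.

```python
def has_only_exact_str_keys(namespace):
    """Return True if every key is a plain str, not a subclass or another type."""
    return all(type(key) is str for key in namespace)


def can_cache_lookups(cls):
    """Hypothetical policy: cache only if every class dict in the MRO qualifies."""
    return all(has_only_exact_str_keys(vars(klass)) for klass in cls.__mro__)
```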

Neil


Re: [Python-Dev] Non-string keys in namespace dicts

2007-12-03 Thread Phillip J. Eby
At 12:27 PM 12/3/2007 -0700, Neil Toronto wrote:
>Guido van Rossum wrote:
> > On Dec 2, 2007 12:49 PM, Neil Toronto <[EMAIL PROTECTED]> wrote:
> >> It turned out not *that* hard to code around for attribute caching, and
> >> the extra cruft only gets invoked on a cache miss. The biggest problem
> >> isn't speed - it's that it's possible (though extremely unlikely), while
> >> testing keys for equality, that a rich compare alters the underlying
> >> dict. This causes the caching lookup to have to try to get an entry
> >> pointer again, which could invoke the rich compare, which might alter
> >> the underlying dict..
> >
> > How about subclasses of str? These have all the same issues...
>
>Yeah. I ended up having it, per class, permanently revert to uncached
>lookups when it detects that a class dict in the MRO has non-string
>keys. That's flagged by lookdict_string, which uses PyString_CheckExact.

I'm a bit confused here.  Isn't the simplest way to cache attribute 
lookups to just have a cache dictionary in the type, and update that 
dictionary whenever a change is made to a superclass?  That's 
essentially how __slotted__ attribute changes on base classes work 
now, isn't it?  Why do we need to mess around with the dictionary 
entries themselves in order to do that?



Re: [Python-Dev] Non-string keys in namespace dicts

2007-12-03 Thread Neil Toronto
Phillip J. Eby wrote:
> At 12:27 PM 12/3/2007 -0700, Neil Toronto wrote:
>> Guido van Rossum wrote:
>> > How about subclasses of str? These have all the same issues...
>>
>> Yeah. I ended up having it, per class, permanently revert to uncached
>> lookups when it detects that a class dict in the MRO has non-string
>> keys. That's flagged by lookdict_string, which uses PyString_CheckExact.
> 
> I'm a bit confused here.  Isn't the simplest way to cache attribute 
> lookups to just have a cache dictionary in the type, and update that 
> dictionary whenever a change is made to a superclass?  That's 
> essentially how __slotted__ attribute changes on base classes work now, 
> isn't it?  Why do we need to mess around with the dictionary entries 
> themselves in order to do that?

The nice thing about caching pointers to dict entries is that they don't 
change as often as values do. There are fewer ways to invalidate an 
entry pointer: inserting set, resize, clear, and delete. If you cache 
values, non-inserting set could invalidate as well.

Because inserting into namespace dicts should be very rare, caching 
entries rather than values should reduce the number of times cache 
entries are invalidated to near zero. Updating is expensive, so that's 
good for performance.

Rare updating also means it's okay to invalidate the entire cache rather 
than single entries, so the footprint of the caching mechanism in the 
dict can be very small. For example, I've got a single 64-bit counter in 
each dict that gets incremented after every potentially invalidating 
operation. That comes down to 8 bytes of storage and two extra machine 
instructions (currently) per invalidating operation. The cache checks it 
against its own counter, and updating ensures that it's synced.
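
A Python-level sketch of the counter idea, under stated assumptions: the real patch works on C dict entries, whereas this version caches which namespace holds a name, which stays valid across plain value updates in the same way. VersionedDict and LocationCache are hypothetical names, and only __setitem__, __delitem__ and clear are instrumented.

```python
class VersionedDict(dict):
    """Hypothetical dict that counts operations able to invalidate entries."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.version = 0

    def __setitem__(self, key, value):
        if key not in self:
            self.version += 1  # inserting set; a plain value update does not count
        super().__setitem__(key, value)

    def __delitem__(self, key):
        super().__delitem__(key)
        self.version += 1

    def clear(self):
        super().clear()
        self.version += 1


class LocationCache:
    """Remembers which namespace in a chain holds each name."""

    def __init__(self, chain):
        self.chain = chain    # namespaces searched in order, like an MRO
        self.seen = None      # versions observed at the last sync
        self.where = {}       # name -> namespace holding it
        self.resyncs = 0

    def lookup(self, name):
        current = [ns.version for ns in self.chain]
        if current != self.seen:
            # Some namespace gained or lost a key: drop the whole cache.
            self.where.clear()
            self.seen = current
            self.resyncs += 1
        if name not in self.where:
            for ns in self.chain:
                if name in ns:
                    self.where[name] = ns
                    break
            else:
                raise KeyError(name)
        return self.where[name][name]


base = VersionedDict(x=1)
derived = VersionedDict()
cache = LocationCache([derived, base])
```

Updating base["x"] leaves the cache in sync, while inserting "x" into derived bumps its counter and forces a resync on the next lookup.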

Some version of the non-string keys problem would exist with any caching 
mechanism, though. An evil rich compare can always monkey about with 
class dicts in the MRO. If a caching scheme caches values and doesn't 
account for that, it could return stale values. If it caches entries and 
doesn't account for that, it could segfault. I suppose you could argue 
that returning stale values is fitting punishment for using an evil rich 
compare, though the punishee isn't always the same person as the punisher.

Neil



Re: [Python-Dev] Non-string keys in namespace dicts

2007-12-03 Thread Guido van Rossum
On Dec 3, 2007 3:48 PM, Phillip J. Eby <[EMAIL PROTECTED]> wrote:
> Actually, you're missing the part where such evil code *can't* muck
> things up for class dictionaries.  Type dicts aren't reachable via
> ordinary Python code; you *have* to modify them via setattr.  (The
> __dict__ of types returns a read-only proxy object, so the most evil
> rich compare you can imagine still can't touch it.)

What's to prevent that evil comparison from calling setattr on the class?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] Non-string keys in namespace dicts

2007-12-03 Thread Phillip J. Eby
At 03:26 PM 12/3/2007 -0700, Neil Toronto wrote:
>Phillip J. Eby wrote:
> > At 12:27 PM 12/3/2007 -0700, Neil Toronto wrote:
> >> Guido van Rossum wrote:
> >> > How about subclasses of str? These have all the same issues...
> >>
> >> Yeah. I ended up having it, per class, permanently revert to uncached
> >> lookups when it detects that a class dict in the MRO has non-string
> >> keys. That's flagged by lookdict_string, which uses PyString_CheckExact.
> >
> > I'm a bit confused here.  Isn't the simplest way to cache attribute
> > lookups to just have a cache dictionary in the type, and update that
> > dictionary whenever a change is made to a superclass?  That's
> > essentially how __slotted__ attribute changes on base classes work now,
> > isn't it?  Why do we need to mess around with the dictionary entries
> > themselves in order to do that?
>
>The nice thing about caching pointers to dict entries is that they don't
>change as often as values do. There are fewer ways to invalidate an
>entry pointer: inserting set, resize, clear, and delete. If you cache
>values, non-inserting set could invalidate as well.
>
>Because inserting into namespace dicts should be very rare, caching
>entries rather than values should reduce the number of times cache
>entries are invalidated to near zero. Updating is expensive, so that's
>good for performance.
>
>Rare updating also means it's okay to invalidate the entire cache rather
>than single entries, so the footprint of the caching mechanism in the
>dict can be very small. For example, I've got a single 64-bit counter in
>each dict that gets incremented after every potentially invalidating
>operation. That comes down to 8 bytes of storage and two extra machine
>instructions (currently) per invalidating operation. The cache checks it
>against its own counter, and updating ensures that it's synced.
>
>Some version of the non-string keys problem would exist with any caching
>mechanism, though. An evil rich compare can always monkey about with
>class dicts in the MRO. If a caching scheme caches values and doesn't
>account for that, it could return stale values. If it caches entries and
>doesn't account for that, it could segfault. I suppose you could argue
>that returning stale values is fitting punishment for using an evil rich
>compare, though the punishee isn't always the same person as the punisher.

Actually, you're missing the part where such evil code *can't* muck 
things up for class dictionaries.  Type dicts aren't reachable via 
ordinary Python code; you *have* to modify them via setattr.  (The 
__dict__ of types returns a read-only proxy object, so the most evil 
rich compare you can imagine still can't touch it.)
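
A short Python 3 illustration of the read-only proxy; the class name Example is arbitrary.

```python
import types


class Example:
    attr = 1


proxy = Example.__dict__
is_proxy = isinstance(proxy, types.MappingProxyType)

# Writing through the proxy is refused.
try:
    proxy["attr"] = 2
except TypeError:
    direct_write_blocked = True
else:
    direct_write_blocked = False

# setattr on the type is the supported route, and the proxy reflects the change.
setattr(Example, "attr", 2)
```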

This means that MRO cache invalidation can already be detected using 
"type"'s tp_setattro implementation.  And setting attributes on types 
is already extremely rare.  It doesn't seem to me that there's any 
need to use the same namespace speedup mechanism here: capturing 
setattr operations on a type should be sufficient to implement 
invalidation, without mucking about with dictionary entries.  An 
ordinary dict should suffice.
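
A sketch of what that could look like at the Python level, assuming a metaclass as a stand-in for type's tp_setattro. All names here (InvalidatingMeta, cached_lookup, _value_caches) are hypothetical, and the eager walk over subclasses is just the simplest way to show invalidation.

```python
import weakref

# Hypothetical per-class value caches: class -> {attribute name: value}.
_value_caches = weakref.WeakKeyDictionary()


def _invalidate(cls):
    """Drop the cache for cls and, transitively, for its subclasses."""
    _value_caches.pop(cls, None)
    for sub in cls.__subclasses__():
        _invalidate(sub)


class InvalidatingMeta(type):
    """Hypothetical metaclass standing in for type's tp_setattro hook."""

    def __setattr__(cls, name, value):
        super().__setattr__(name, value)
        _invalidate(cls)

    def __delattr__(cls, name):
        super().__delattr__(name)
        _invalidate(cls)


def cached_lookup(cls, name):
    """Resolve name along the MRO, remembering the value in an ordinary dict."""
    cache = _value_caches.setdefault(cls, {})
    if name not in cache:
        for klass in cls.__mro__:
            if name in vars(klass):
                cache[name] = vars(klass)[name]
                break
        else:
            raise AttributeError(name)
    return cache[name]


class Base(metaclass=InvalidatingMeta):
    x = 1


class Child(Base):
    pass
```

After cached_lookup(Child, "x") has cached the value, assigning Base.x goes through the metaclass and discards the stale entry for both classes.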

Of course, I suppose there are use cases where somebody uses a class 
attribute as a "global" of sorts, and those use cases would be slowed 
down.  However, if you want to use the entry caching approach, you 
wouldn't need to worry about the segfault case.  (Since somebody 
would have to use C to get at the "real" dictionary.)



Re: [Python-Dev] Non-string keys in namespace dicts

2007-12-03 Thread Neil Toronto
Phillip J. Eby wrote:
> At 03:26 PM 12/3/2007 -0700, Neil Toronto wrote:
>> Phillip J. Eby wrote:
>> > At 12:27 PM 12/3/2007 -0700, Neil Toronto wrote:
>> Some version of the non-string keys problem would exist with any caching
>> mechanism, though. An evil rich compare can always monkey about with
>> class dicts in the MRO. If a caching scheme caches values and doesn't
>> account for that, it could return stale values. If it caches entries and
>> doesn't account for that, it could segfault. I suppose you could argue
>> that returning stale values is fitting punishment for using an evil rich
>> compare, though the punishee isn't always the same person as the 
>> punisher.
> 
> Actually, you're missing the part where such evil code *can't* muck 
> things up for class dictionaries.  Type dicts aren't reachable via 
> ordinary Python code; you *have* to modify them via setattr.  (The 
> __dict__ of types returns a read-only proxy object, so the most evil 
> rich compare you can imagine still can't touch it.)

Interesting. But I'm going to have to say it probably wouldn't work as 
well, since C code can and does alter tp_dict directly. Those places in 
the core would have to be altered to invalidate the cache. There's also 
the issue of extensions, which so far have been able to alter any 
tp_dict without problems. It'd also be really annoying for a class to 
have to notify all of its subclasses when one of its attributes changed.

In other words, I can see the footprint being rather large and difficult 
to manage. By hooking right into dicts and letting them track when 
things change, every other piece of code in the system can happily 
continue doing whatever it likes without needing to worry that it might 
invalidate some cache entry somewhere. I'm confident that's the right 
design choice whether it's best to cache entries or not.

I hope you don't feel that I'm just trying to be contradictory. I'm 
actually enjoying the discussion a lot. I'd rather have my grand ideas 
tested now than discover I was wrong later.

Neil


Re: [Python-Dev] Non-string keys in namespace dicts

2007-12-03 Thread Phillip J. Eby
At 03:51 PM 12/3/2007 -0800, Guido van Rossum wrote:
>On Dec 3, 2007 3:48 PM, Phillip J. Eby <[EMAIL PROTECTED]> wrote:
> > Actually, you're missing the part where such evil code *can't* muck
> > things up for class dictionaries.  Type dicts aren't reachable via
> > ordinary Python code; you *have* to modify them via setattr.  (The
> > __dict__ of types returns a read-only proxy object, so the most evil
> > rich compare you can imagine still can't touch it.)
>
>What's to prevent that evil comparison from calling setattr on the class?

If you're caching values, it should be sufficient to have setattr 
trigger the invalidation.  For entries, I have to admit I don't 
understand the approach well enough to make a specific proposal.



Re: [Python-Dev] Non-string keys in namespace dicts

2007-12-03 Thread Neil Toronto
Phillip J. Eby wrote:
> At 03:51 PM 12/3/2007 -0800, Guido van Rossum wrote:
>> On Dec 3, 2007 3:48 PM, Phillip J. Eby <[EMAIL PROTECTED]> wrote:
>> > Actually, you're missing the part where such evil code *can't* muck
>> > things up for class dictionaries.  Type dicts aren't reachable via
>> > ordinary Python code; you *have* to modify them via setattr.  (The
>> > __dict__ of types returns a read-only proxy object, so the most evil
>> > rich compare you can imagine still can't touch it.)
>>
>> What's to prevent that evil comparison from calling setattr on the class?
> 
> If you're caching values, it should be sufficient to have setattr 
> trigger the invalidation.  For entries, I have to admit I don't 
> understand the approach well enough to make a specific proposal.

As long as you could determine whether PyDict_SetItem inserted a new 
key, it would make sense. (If it only updates a value, the cache doesn't 
need to change because the pointer to the entry is still valid and the 
entry points to the new value.) The PyDict_SetItem API would have to 
change, or the dict would have to somehow pass the information 
out-of-band. Neither option sounds great to me, so I'd go with caching 
values from setattr.
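
For illustration, the missing information at the Python level: a plain d[key] = value, like PyDict_SetItem, does not say whether the key was new. The helper below is hypothetical and pays for the answer with an extra lookup.

```python
def set_and_report(namespace, key, value):
    """Store value and return True only if key was newly inserted."""
    inserted = key not in namespace
    namespace[key] = value
    return inserted
```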

Neil



Re: [Python-Dev] Non-string keys in namespace dicts

2007-12-03 Thread Neil Toronto
I apologize - I had forgotten what you were telling me by the time I 
replied. Here's a better answer.

> Phillip J. Eby wrote:
>> At 03:26 PM 12/3/2007 -0700, Neil Toronto wrote:
>> Actually, you're missing the part where such evil code *can't* muck 
>> things up for class dictionaries.  Type dicts aren't reachable via 
>> ordinary Python code; you *have* to modify them via setattr.  (The 
>> __dict__ of types returns a read-only proxy object, so the most evil 
>> rich compare you can imagine still can't touch it.)

C code can and does alter tp_dict directly already. If caching were 
implemented within type's setattr, all these places would have to be 
changed to use setattr only. That doesn't seem so bad at first. It's a 
change in convention, certainly: a new informal rule that says "no 
monkeying with a PyTypeObject's tp_dict, period". Lack of observance 
could be difficult to debug, as a PyDict_SetItem would appear to have 
worked just fine to C code but not show up to Python code.

Neil



Re: [Python-Dev] Non-string keys in namespace dicts

2007-12-03 Thread Phillip J. Eby
At 10:17 PM 12/3/2007 -0700, Neil Toronto wrote:
>Phillip J. Eby wrote:
> > Actually, you're missing the part where such evil code *can't* muck
> > things up for class dictionaries.  Type dicts aren't reachable via
> > ordinary Python code; you *have* to modify them via setattr.  (The
> > __dict__ of types returns a read-only proxy object, so the most evil
> > rich compare you can imagine still can't touch it.)
>
>Interesting. But I'm going to have to say it probably wouldn't work as
>well, since C code can and does alter tp_dict directly. Those places in
>the core would have to be altered to invalidate the cache.

Eh?  Where is the type dictionary altered outside of setattr and 
class creation?


>  There's also
>the issue of extensions, which so far have been able to alter any
>tp_dict without problems.

Do you have any actual examples?

Believe me, I'm the last person to suggest removing useful hack, er, 
hooks.  :)  But I don't think that type __dict__ munging is actually 
common at all.


>It'd also be really annoying for a class to
>have to notify all of its subclasses when one of its attributes changed.

It's not all subclasses - only those subclasses that don't shadow the 
attribute.  Also, it's not necessarily the case that notification 
would be O(subclasses) - it could be done via a version counter, as 
in your approach.  Admittedly, that would require an extra bit of 
indirection, since you'd need to keep (and check) counters for each descriptor.



Re: [Python-Dev] Non-string keys in namespace dicts

2007-12-03 Thread Neil Toronto
Phillip J. Eby wrote:
> At 10:17 PM 12/3/2007 -0700, Neil Toronto wrote:
>> Interesting. But I'm going to have to say it probably wouldn't work as
>> well, since C code can and does alter tp_dict directly. Those places in
>> the core would have to be altered to invalidate the cache.
> 
> Eh?  Where is the type dictionary altered outside of setattr and class 
> creation?

You're right - my initial grep turned up stuff that looked like tp_dict 
monkeying out of context. The ctypes module does it a lot, but only in 
its various *_new functions.

>> It'd also be really annoying for a class to
>> have to notify all of its subclasses when one of its attributes changed.
> 
> It's not all subclasses - only those subclasses that don't shadow the 
> attribute.  Also, it's not necessarily the case that notification would 
> be O(subclasses) - it could be done via a version counter, as in your 
> approach.  Admittedly, that would require an extra bit of indirection, 
> since you'd need to keep (and check) counters for each descriptor.

And the extra overhead comes back to bite us again, and probably in a 
critical path. (I'm sure you've been bitten in a critical path before.) 
That's been the issue with all of these caching schemes so far - Python 
is just too durned dynamic to guarantee them anything they can exploit 
for efficiency, so they end up slowing down common operations. (Not that 
I'd change a bit of Python, mind you.)

For example, almost everything I've tried slows down attribute lookups 
on built-in types. Adding one 64-bit version counter check and a branch 
on failure incurs a 3-5% penalty. That's not the end of the world, but 
it makes pybench take about 0.65% longer.

I finally overcame that by making a custom dictionary type to use as the 
cache. I haven't yet found a case where my caching lookups are slower - 
they're all faster so far for builtins and Python objects with any size 
MRO - but I haven't tested exhaustively and I haven't done failing 
hasattr-style lookups. Turns out that not finding an attribute all the 
way up the MRO (which can lead to a persistent cache miss if done with 
the same name) is rather frequent in Python and is expected to be fast. 
I can cache missing attributes as easily as present attributes, but they 
could pile up if someone decides to hasattr an object with a zillion 
different names.

I have a cunning plan, though, which is probably best explained using a 
patch.

At any rate, I'm warming to this setattr idea, and I'll likely try that 
next whether my current approach works out or not.

Neil