Re: [Python-Dev] Proposal: defaultdict

2006-02-18 Thread Martin v. Löwis
Adam Olsen wrote:
> Demo/metaclass/Meta.py:55

That wouldn't break. If you had actually read the code, you would have
seen it is

    try:
        ga = dict['__getattr__']
    except KeyError:
        pass

How would it break if dict had a default factory? ga would get the
__getattr__ value, and everything would be fine. The KeyError is
ignored, after all.
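
With today's collections.defaultdict (the eventual outcome of this proposal), the point is easy to check: a stored '__getattr__' value is returned whether or not a factory is set, so the except branch is dead either way. A minimal sketch:

```python
from collections import defaultdict

# A stored value is returned whether or not a default factory is set;
# the except KeyError branch is never taken in either case.
for d in ({}, defaultdict(list)):
    d['__getattr__'] = 'ga_value'
    try:
        ga = d['__getattr__']
    except KeyError:
        ga = None
    assert ga == 'ga_value'
```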

> Demo/tkinter/guido/AttrDialog.py:121  # Subclasses override self.classes

Hmm

    try:
        cl = self.classes[c]
    except KeyError:
        cl = 'unknown'

So cl wouldn't be 'unknown'. Why would that be a problem?


> Lib/ConfigParser.py:623

    try:
        v = map[var]
    except KeyError:
        raise InterpolationMissingOptionError(
            option, section, rest, var)

So there is no InterpolationMissingOptionError. *Of course not*.
The whole point would be to provide a value for all interpolation
variables.
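
With collections.defaultdict (what this proposal became), the behaviour is easy to demonstrate: every lookup on the interpolation map succeeds, so the error branch simply never runs. The map contents and sentinel value here are illustrative:

```python
from collections import defaultdict

# Interpolation map with a default factory: lookups never fail, so the
# branch that would raise InterpolationMissingOptionError is unreachable.
map = defaultdict(lambda: "<unset>", {"var": "value"})
try:
    v = map["missing"]
except KeyError:
    v = "InterpolationMissingOptionError"   # never reached
assert v == "<unset>"
```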

> Lib/random.py:315

This entire function samples k elements with indices between 0
and len(population). Now, people "shouldn't" be passing dictionaries
in the first place; that specific code tests whether there are
valid values at indices 0, n//2, and n. If the dictionary
isn't really a sequence (i.e. if it doesn't provide values
at all indices), the function may later fail even if it passes
that test.

With a default-valued dictionary, the function would not fail,
but a large number of samples might be the default value.

> Lib/string.py:191

Same as with ConfigParser: the interpolation will always succeed,
interpolating all values (rather than leaving $identifier in the
string). That would be precisely the expected behaviour.
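
With string.Template and a default factory on the mapping (a sketch using today's collections.defaultdict; the template text is illustrative), substitution indeed always succeeds:

```python
from collections import defaultdict
from string import Template

t = Template("$who likes $what")
m = defaultdict(lambda: "<missing>", who="Guido")
# On a plain dict, substitute() raises KeyError for absent identifiers;
# with a default factory, every identifier resolves to some value.
assert t.substitute(m) == "Guido likes <missing>"
```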

> Lib/weakref.py:56  # Currently uses UserDict but I assume it will
> switch to dict eventually

Or, rather, UserDict might grow the on_missing feature as well.

That is irrelevant for this issue, though:

    o = self.data[key]()
    if o is None:
        raise KeyError, key  # line 56
    else:
        return o

So we are looking for lookup failures in self.data, here:
self.data is initialized to {} in UserDict, with no
default factory. So there cannot be a change in behaviour.
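
For context, the explicit raise in that WeakValueDictionary code fires only when the weak referent has died, which a default factory on self.data would not affect. A sketch (assuming CPython, where refcounting collects the referent immediately):

```python
import weakref

class Obj:
    pass

o = Obj()
wd = weakref.WeakValueDictionary()
wd["k"] = o
assert wd["k"] is o   # referent alive: lookup succeeds

del o                 # CPython: refcount hits zero, referent is collected now
try:
    wd["k"]
    raised = False
except KeyError:
    raised = True
assert raised         # a dead referent surfaces as KeyError
```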


> Perhaps the KeyError shouldn't ever get triggered in this case, I'm
> not sure.  I think that's besides the point though.  The programmer
> clearly expected it would.

No. I now see your problem: an "except KeyError" does *not* mean
that the programmer "clearly expects it will" raise a KeyError.
Instead, the programmer expects it *might* raise a KeyError, and
tries to deal with that situation.

If the situation doesn't arise, the code continues just fine.

Regards,
Martin
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] ssize_t branch merged

2006-02-18 Thread Martin v. Löwis
Neal Norwitz wrote:
> I suppose that might be nice, but would require configure magic.  I'm
> not sure how it could be done on Windows.

Contributions are welcome. On Windows, it can be hard-coded.

Actually, something like

#if SIZEOF_SIZE_T == SIZEOF_INT
#define PY_SSIZE_T_MAX INT_MAX
#elif SIZEOF_SIZE_T == SIZEOF_LONG
#define PY_SSIZE_T_MAX LONG_MAX
#else
#error What is size_t equal to?
#endif

might work.

> There are much more important problems to address at this point IMO. 
> Just review the recent fixes related to Py_BuildValue() on
> python-checkins to see what I mean.

Nevertheless, it would be desirable IMO if it expanded to a literal,
so that the preprocessor could understand it.

Regards,
Martin


Re: [Python-Dev] bytes.from_hex()

2006-02-18 Thread Ron Adam
Josiah Carlson wrote:
> Bob Ippolito <[EMAIL PROTECTED]> wrote:
>>
>> On Feb 17, 2006, at 8:33 PM, Josiah Carlson wrote:
>>
>>> Greg Ewing <[EMAIL PROTECTED]> wrote:
 Stephen J. Turnbull wrote:
>> "Guido" == Guido van Rossum <[EMAIL PROTECTED]> writes:
> Guido> - b = bytes(t, enc); t = text(b, enc)
>
> +1  The coding conversion operation has always felt like a  
> constructor
> to me, and in this particular usage that's exactly what it is.  I
> prefer the nomenclature to reflect that.
 This also has the advantage that it competely
 avoids using the verbs "encode" and "decode"
 and the attendant confusion about which direction
 they go in.

 e.g.

s = text(b, "base64")

 makes it obvious that you're going from the
 binary side to the text side of the base64
 conversion.
>>> But you aren't always getting *unicode* text from the decoding of  
>>> bytes,
>>> and you may be encoding bytes *to* bytes:
>>>
>>> b2 = bytes(b, "base64")
>>> b3 = bytes(b2, "base64")
>>>
>>> Which direction are we going again?
>> This is *exactly* why the current set of codecs are INSANE.   
>> unicode.encode and str.decode should be used *only* for unicode  
>> codecs.  Byte transforms are entirely different semantically and  
>> should be some other method pair.
> 
> The problem is that we are overloading data types.  Strings (and bytes)
> can contain both encoded text as well as data, or even encoded data.

Right

> Educate the users.  Raise better exceptions telling people why their
> encoding or decoding failed, as Ian Bicking already pointed out.  If
> bytes.encode() and the equivalent of text.decode() is going to disappear,

+1 on better documentation all around with regards to encodings and 
Unicode.  So far the best explanation I've found is in PEP 100. 
The Python docs and built-in help hardly explain more than the minimal 
argument list for the encoding and decoding methods, and the str and 
unicode type constructor arguments aren't explained any better.

> Bengt Richter had a good idea with bytes.recode() for strictly bytes
> transformations (and the equivalent for text), though it is ambiguous as
> to the direction; are we encoding or decoding with bytes.recode()?  In
> my opinion, this is why .encode() and .decode() makes sense to keep on
> both bytes and text, the direction is unambiguous, and if one has even a
> remote idea of what the heck the codec is, they know their result.
> 
>  - Josiah

I like the bytes.recode() idea a lot. +1

It seems to me it's a far more useful idea than encoding and decoding by 
overloading and could do both and more.  It has a lot of potential to be 
an intermediate step for encoding as well as being used for many other 
translations to byte data.
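
Bengt's bytes.recode() is hypothetical, but it can be approximated today as a free function over bytes using the bytes-to-bytes codecs; the name `recode` and this helper are assumptions for illustration, not an existing API:

```python
import codecs

def recode(data: bytes, codec: str) -> bytes:
    """Hypothetical bytes-to-bytes transform (sketch of Bengt's idea)."""
    out = codecs.encode(data, codec)
    assert isinstance(out, bytes)   # stays on the bytes side throughout
    return out

assert recode(b"hi", "base64") == b"aGk=\n"
```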

I think I would prefer that encode and decode be just functions with 
well defined names and arguments instead of being methods or arguments 
to string and Unicode types.

I'm not sure exactly how this would work. Maybe it would need two 
sets of encodings, i.e. decoders and encoders.  An exception would be
raised if the codec wasn't found for the direction one was going in.

Roughly... something or other like:

 import encodings

 def tostr(obj, encoding):
     if encoding not in encoders:
         raise LookupError('encoding not found in encoders')
     # check if obj works with encoding to string
     # ...
     b = bytes(obj).recode(encoding)
     return str(b)

 def tounicode(obj, decoding):
     if decoding not in decoders:
         raise LookupError('decoding not found in decoders')
     # check if obj works with decoding to unicode
     # ...
     b = bytes(obj).recode(decoding)
     return unicode(b)

Anyway... food for thought.

Cheers,
Ronald Adam









Re: [Python-Dev] The decorator(s) module

2006-02-18 Thread Georg Brandl
Guido van Rossum wrote:
> WFM. Patch anyone?

Done.
http://python.org/sf/1434038

Georg

> On 2/17/06, Ian Bicking <[EMAIL PROTECTED]> wrote:
>> Alex Martelli wrote:
>> > Maybe we could fix that by having property(getfunc) use
>> > getfunc.__doc__ as the __doc__ of the resulting property object
>> > (easily overridable in more normal property usage by the doc=
>> > argument, which, I feel, should almost invariably be there).



Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-18 Thread Martin v. Löwis
Aahz wrote:
> The problem is that they don't understand that "Martin v. L?wis" is not
> Unicode -- once all strings are Unicode, this is guaranteed to work.

This specific call, yes. I don't think the problem will go away as long
as both encode and decode are available for both strings and byte
arrays.

> While it's not absolutely true, my experience of watching Unicode
> confusion is that the simplest approach for newbies is: encode FROM
> Unicode, decode TO Unicode.

I think this is what should be ingrained into the library, also. It
shouldn't try to give additional meaning to these terms.
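
In today's Python 3 this is exactly the built-in model (str has only encode, bytes only decode), which the newbie rule describes:

```python
u = "Martin v. Löwis"
b = u.encode("utf-8")          # encode FROM Unicode, producing bytes
assert isinstance(b, bytes)
assert b.decode("utf-8") == u  # decode TO Unicode
```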

Regards,
Martin


Re: [Python-Dev] bytes.from_hex()

2006-02-18 Thread Josiah Carlson

Ron Adam <[EMAIL PROTECTED]> wrote:
> Josiah Carlson wrote:
> > Bengt Richter had a good idea with bytes.recode() for strictly bytes
> > transformations (and the equivalent for text), though it is ambiguous as
> > to the direction; are we encoding or decoding with bytes.recode()?  In
> > my opinion, this is why .encode() and .decode() makes sense to keep on
> > both bytes and text, the direction is unambiguous, and if one has even a
> > remote idea of what the heck the codec is, they know their result.
> > 
> >  - Josiah
> 
> I like the bytes.recode() idea a lot. +1
> 
> It seems to me it's a far more useful idea than encoding and decoding by 
> overloading and could do both and more.  It has a lot of potential to be 
> an intermediate step for encoding as well as being used for many other 
> translations to byte data.

Indeed it does.

> I think I would prefer that encode and decode be just functions with 
> well defined names and arguments instead of being methods or arguments 
> to string and Unicode types.

Attaching it to string and unicode objects is a useful convenience. 
Just like x.replace(y, z) is a convenience for string.replace(x, y, z) . 
Tossing the encode/decode somewhere else, like encodings, or even string,
I see as a backwards step.

> I'm not sure exactly how this would work. Maybe it would need two 
> sets of encodings, i.e. decoders and encoders.  An exception would be
> raised if the codec wasn't found for the direction one was going in.
> 
> Roughly... something or other like:
> 
>  import encodings
> 
>  def tostr(obj, encoding):
>      if encoding not in encoders:
>          raise LookupError('encoding not found in encoders')
>      # check if obj works with encoding to string
>      # ...
>      b = bytes(obj).recode(encoding)
>      return str(b)
> 
>  def tounicode(obj, decoding):
>      if decoding not in decoders:
>          raise LookupError('decoding not found in decoders')
>      # check if obj works with decoding to unicode
>      # ...
>      b = bytes(obj).recode(decoding)
>      return unicode(b)
> 
> Anyway... food for thought.

Again, the problem is ambiguity; what does bytes.recode(something) mean?
Are we encoding _to_ something, or are we decoding _from_ something? 
Are we going to need to embed the direction in the encoding/decoding
name (to_base64, from_base64, etc.)?  That isn't any better than
binascii.b2a_base64 .  What about .reencode and .redecode?  It seems as
though the 're' added as a prefix to .encode and .decode makes it
clearer that you get the same type back as you put in, and it is also
unambiguous as to direction.

The question remains: is str.decode() returning a string or unicode
depending on the argument passed, when the argument quite literally
names the codec involved, difficult to understand?  I don't believe so;
am I the only one?
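
For what it's worth, the value-dependent return type Josiah defends survives today in codecs.encode/decode; a Python 3 sketch (the 2006 str.decode behaved analogously):

```python
import codecs

# Which *type* comes back depends on which codec is named:
assert codecs.decode(b"aGk=\n", "base64") == b"hi"   # bytes -> bytes
assert codecs.decode(b"hi", "utf-8") == "hi"         # bytes -> str
```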

 - Josiah



Re: [Python-Dev] Proposal: defaultdict

2006-02-18 Thread Walter Dörwald
Guido van Rossum wrote:
> On 2/17/06, Ian Bicking <[EMAIL PROTECTED]> wrote:
>> Guido van Rossum wrote:
>> > d = {}
>> > d.default_factory = set
>> > ...
>> > d[key].add(value)
>>
>> Another option would be:
>>
>>d = {}
>>d.default_factory = set
>>d.get_default(key).add(value)
>>
>> Unlike .setdefault, this would use a factory associated with the dictionary, 
>> and no default value would get passed in.
>> Unlike the proposal, this would not override __getitem__ (not overriding
>> __getitem__ is really the only difference with the proposal).  It would be 
>> clear reading the code that you were not
>> implicitly asserting they "key in d" was true.
>>
>> "get_default" isn't the best name, but another name isn't jumping out at me 
>> at the moment.  Of course, it is not a Pythonic
>> argument to say that an existing method should be overridden, or 
>> functionality made nameless simply because we can't think
>> of a name (looking to anonymous functions of course ;)
>
> I'm torn. While trying to implement this I came across some ugliness in 
> PyDict_GetItem() -- it would make sense if this also
> called
> on_missing(), but it must return a value without incrementing its
> refcount, and isn't supposed to raise exceptions -- so what to do if 
> on_missing() returns a value that's not inserted in the
> dict?
>
> If the __getattr__()-like operation that supplies and inserts a
> dynamic default was a separate method, we wouldn't have this problem.
>
> OTOH most reviewers here seem to appreciate on_missing() as a way to do 
> various other ways of alterning a dict's
> __getitem__() behavior behind a caller's back -- perhaps it could even be 
> (ab)used to
> implement case-insensitive lookup.
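
The case-insensitive (ab)use Guido mentions can already be sketched as a dict subclass; a real version would also need to cover get(), __contains__(), and friends, since they bypass __getitem__:

```python
class CaseInsensitiveDict(dict):
    """Illustrative sketch: normalize keys to lower case on the way in and out."""
    def __setitem__(self, key, value):
        super().__setitem__(key.lower(), value)
    def __getitem__(self, key):
        return super().__getitem__(key.lower())

d = CaseInsensitiveDict()
d["Content-Type"] = "text/plain"
assert d["content-type"] == "text/plain"
assert d["CONTENT-TYPE"] == "text/plain"
```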

I don't like the fact that on_missing()/default_factory can change the 
behaviour of __getitem__, which up to now has been
something simple and understandable.
Why don't we put the on_missing()/default_factory functionality into get() 
instead?

d.get(key, default) does what it did before. d.get(key) invokes on_missing() 
(and dict would have default_factory == type(None))
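
A rough sketch of Walter's get()-based variant, under the assumption that plain __getitem__ stays untouched; the class name and details are illustrative, not a real API:

```python
class FactoryDict(dict):
    """Sketch: the default factory hooks get(), not __getitem__."""
    def __init__(self, factory=type(None), *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.default_factory = factory

    def get(self, key, *default):
        if default:                      # d.get(key, default): old behaviour
            return super().get(key, default[0])
        if key not in self:              # d.get(key): invoke the factory
            self[key] = self.default_factory()
        return super().__getitem__(key)

d = FactoryDict(set)
d.get("k").add(1)
assert d["k"] == {1}
assert d.get("missing", 42) == 42 and "missing" not in d
```

Note that d["absent"] still raises KeyError, which is the point of the proposal.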

Bye,
   Walter Dörwald





Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-18 Thread M.-A. Lemburg
Martin, v. Löwis wrote:
>> How are users confused?
> 
> Users do
> 
> py> "Martin v. Löwis".encode("utf-8")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xf6 in position 11:
> ordinal not in range(128)
> 
> because they want to convert the string "to Unicode", and they have
> found a text telling them that .encode("utf-8") is a reasonable
> method.
> 
> What it *should* tell them is
> 
> py> "Martin v. Löwis".encode("utf-8")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> AttributeError: 'str' object has no attribute 'encode'

I've already explained why we have .encode() and .decode()
methods on strings and Unicode many times. I've also
explained the misunderstanding that codecs can only do
Unicode-string conversions. And I've explained that
the .encode() and .decode() methods *do* check the return
types of the codecs and only allow strings or Unicode
on return (no lists, instances, tuples or anything else).

You seem to ignore this fact.

If we were to follow your idea, we should remove .encode()
and .decode() altogether and refer users to the codecs.encode()
and codecs.decode() function. However, I doubt that users
will like this idea.

>> bytes.encode CAN only produce bytes.
> 
> I don't understand MAL's design, but I believe in that design,
> bytes.encode could produce anything (say, a list). A codec
> can convert anything to anything else.

True. However, note that the .encode()/.decode() methods on
strings and Unicode narrow down the possible return types.
The corresponding .bytes methods should only allow bytes and
Unicode.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Feb 18 2006)
>>> Python/Zope Consulting and Support ...http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/


::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! 


Re: [Python-Dev] Serial function call composition syntax foo(x, y) -> bar() -> baz(z)

2006-02-18 Thread Michael Hudson
"Guido van Rossum" <[EMAIL PROTECTED]> writes:

> It's only me that's allowed to top-post. :-)

At least you include attributions these days! 

Cheers,
mwh

-- 
  SPIDER:  'Scuse me. [scuttles off]
  ZAPHOD:  One huge spider.
FORD:  Polite though.
   -- The Hitch-Hikers Guide to the Galaxy, Episode 11


Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-18 Thread Thomas Wouters
On Sat, Feb 18, 2006 at 12:06:37PM +0100, M.-A. Lemburg wrote:

> I've already explained why we have .encode() and .decode()
> methods on strings and Unicode many times. I've also
> explained the misunderstanding that codecs can only do
> Unicode-string conversions. And I've explained that
> the .encode() and .decode() methods *do* check the return
> types of the codecs and only allow strings or Unicode
> on return (no lists, instances, tuples or anything else).
> 
> You seem to ignore this fact.

Actually, I think the problem is that while we all agree the
bytestring/unicode methods are a useful way to convert from bytestring to
unicode and back again, we disagree on their *general* usefulness. Sure, the
codecs mechanism is powerful, and even more so because they can determine
their own return type. But it still smells and feels like a Perl attitude,
for the reasons already explained numerous times, as well:

 - The return value for the non-unicode encodings depends on the value of
   the encoding argument.

 - The general case, by and large, especially in non-powerusers, is to
   encode unicode to bytestrings and to decode bytestrings to unicode. And
   that is a hard enough task for many of the non-powerusers. Being able to
   use the encode/decode methods for other tasks isn't helping them.

That is why I disagree with the hypergeneralization of the encode/decode
methods, regardless of the fact that it is a natural expansion of the
implementation of codecs. Sure, it looks 'right' and 'natural' when you look
at the implementation. It sure doesn't look natural, to me and to many
others, when you look at the task of encoding and decoding
bytestrings/unicode.

-- 
Thomas Wouters <[EMAIL PROTECTED]>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


Re: [Python-Dev] Proposal: defaultdict

2006-02-18 Thread Michael Hudson
"Guido van Rossum" <[EMAIL PROTECTED]> writes:

> I'm torn. While trying to implement this I came across some ugliness
> in PyDict_GetItem() -- it would make sense if this also called
> on_missing(), but it must return a value without incrementing its
> refcount, and isn't supposed to raise exceptions

This last bit has been a painful lie for quite some time.  I don't
know what can be done about it, though -- avoid the use of
PyDict_GetItem() in situations where you don't expect string only
dicts (so using it on globals and instance dicts would still be ok)?

> -- so what to do if
> on_missing() returns a value that's not inserted in the dict?

Well, like some others I am a bit uncomfortable with changing the
semantics of such an important operation on such an important data
structure.  But then I'm also not that unhappy with setdefault, so I
must be weird.

> If the __getattr__()-like operation that supplies and inserts a
> dynamic default was a separate method, we wouldn't have this problem.

Yes.

> OTOH most reviewers here seem to appreciate on_missing() as a way to
> do various other ways of alterning a dict's __getitem__() behavior
> behind a caller's back -- perhaps it could even be (ab)used to
> implement case-insensitive lookup.

Well, I'm not sure I do.

There seems to be quite a conceptual difference between being able to
make a new kind of dictionary and mess with the behaviour of one that
exists already, but I don't know if that matters in practice (the fact
that you can currently do things like "import sys; sys.__dict__.clear()"
doesn't seem to cause real problems).

Finally, I'll just note that subclassing to modify the behaviour of a
builtin type has generally been actively discouraged in python so far.
If all dictionary lookups went through a method that you could
override in Python (i.e. subclasses could replace ma_lookup, in
effect) this would be easy to do in Python code.  But they don't, and
bug reports suggesting that they do have been rejected in the past
(and I agree with the rejection, fwiw).

So that rambled a bit.  But in essence: I'd much prefer the
addition of a method or a type to modification of existing behaviour.

Cheers,
mwh

-- 
  If you're talking "useful", I'm not your bot.
-- Tim Peters, 08 Nov 2001


Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-18 Thread M.-A. Lemburg
Martin v. Löwis wrote:
> M.-A. Lemburg wrote:
>> Just because some codecs don't fit into the string.decode()
>> or bytes.encode() scenario doesn't mean that these codecs are
>> useless or that the methods should be banned.
> 
> No. The reason to ban string.decode and bytes.encode is that
> it confuses users.

Instead of starting to ban everything that can potentially
confuse a few users, we should educate those users and tell
them what these methods mean and how they should be used.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] bytes.from_hex()

2006-02-18 Thread Michael Hudson
This posting is entirely tangential.  Be warned.

"Martin v. Löwis" <[EMAIL PROTECTED]> writes:

> It's worse than that. The return *type* depends on the *value* of
> the argument. I think there is little precedence for that:

There's one extremely significant example where the *value* of
something impacts on the type of something else: functions.  The types
of everything involved in str([1]) and len([1]) are the same but the
results are different.  This shows up in PyPy's type annotation; most
of the time we just track types indeed, but when something is called
we need to have a pretty good idea of the potential values, too.

Relevant to the point at hand?  No.  Apologies for wasting your time. :)

Cheers,
mwh

-- 
  The ultimate laziness is not using Perl.  That saves you so much
  work you wouldn't believe it if you had never tried it.
-- Erik Naggum, comp.lang.lisp


Re: [Python-Dev] bytes.from_hex()

2006-02-18 Thread Ron Adam
Josiah Carlson wrote:
> Ron Adam <[EMAIL PROTECTED]> wrote:
>> Josiah Carlson wrote:
>>> Bengt Richter had a good idea with bytes.recode() for strictly bytes
>>> transformations (and the equivalent for text), though it is ambiguous as
>>> to the direction; are we encoding or decoding with bytes.recode()?  In
>>> my opinion, this is why .encode() and .decode() makes sense to keep on
>>> both bytes and text, the direction is unambiguous, and if one has even a
>>> remote idea of what the heck the codec is, they know their result.
>>>
>>>  - Josiah
>> I like the bytes.recode() idea a lot. +1
>>
>> It seems to me it's a far more useful idea than encoding and decoding by 
>> overloading and could do both and more.  It has a lot of potential to be 
>> an intermediate step for encoding as well as being used for many other 
>> translations to byte data.
> 
> Indeed it does.
> 
>> I think I would prefer that encode and decode be just functions with 
>> well defined names and arguments instead of being methods or arguments 
>> to string and Unicode types.
> 
> Attaching it to string and unicode objects is a useful convenience. 
> Just like x.replace(y, z) is a convenience for string.replace(x, y, z) . 
> Tossing the encode/decode somewhere else, like encodings, or even string,
> I see as a backwards step.
> 
>> I'm not sure exactly how this would work. Maybe it would need two 
>> sets of encodings, i.e. decoders and encoders.  An exception would be
>> raised if the codec wasn't found for the direction one was going in.
>>
>> Roughly... something or other like:
>>
>>  import encodings
>>
>>  def tostr(obj, encoding):
>>      if encoding not in encoders:
>>          raise LookupError('encoding not found in encoders')
>>      # check if obj works with encoding to string
>>      # ...
>>      b = bytes(obj).recode(encoding)
>>      return str(b)
>>
>>  def tounicode(obj, decoding):
>>      if decoding not in decoders:
>>          raise LookupError('decoding not found in decoders')
>>      # check if obj works with decoding to unicode
>>      # ...
>>      b = bytes(obj).recode(decoding)
>>      return unicode(b)
>>
>> Anyway... food for thought.
> 
> Again, the problem is ambiguity; what does bytes.recode(something) mean?
> Are we encoding _to_ something, or are we decoding _from_ something? 

This was just an example of one way that might work, but here are my 
thoughts on why I think it might be good.


In this case, the ambiguity is reduced as far as the encoding and 
decoding operations are concerned.

  somestring = encodings.tostr( someunicodestr, 'latin-1')

It's pretty clear to me what is happening:

it will encode the object named someunicodestr to a string, using 
the 'latin-1' encoder.

It would also result in clear errors if the specified encoding is 
unavailable, or, if it is available, not compatible with the given 
*someunicodestr* object type.

Further hints could be gained by.

 help(encodings.tostr)

Which could result in... something like...
 """
 encodings.tostr(obj, encoding) -> string

 Encode a unicode string using an encoder codec to a
 non-unicode string, or transform a non-unicode string
 to another non-unicode string using an encoder codec.
 """

And if that's not enough, then help(encodings) could give more clues. 
These steps would be what I would do. And then the next thing would be 
to find the python docs entry on encodings.

Placing them in encodings seems like a fairly good place to look for 
these functions if you are working with encodings.  So I find that just 
as convenient as having them be string methods.

There is no intermediate default encoding involved above (the bytes 
object is used instead), so you wouldn't get some of the messages the 
present system produces when ascii is the default.

(Yes, I know that won't be the case once Py3k is here, either.)

> Are we going to need to embed the direction in the encoding/decoding
> name (to_base64, from_base64, etc.)?  That isn't any better than
> binascii.b2a_base64 .  

No, that's why I suggested two separate lists (or dictionaries might be 
better).  They can contain the same names, but the lists they are in 
determine the context and point to the needed codec.  And that step is 
abstracted out by putting it inside the encodings.tostr() and 
encodings.tounicode() functions.

So either function would call 'base64' from the correct codec list and 
get the correct encoding or decoding codec it needs.


> What about .reencode and .redecode?  It seems as
> though the 're' added as a prefix to .encode and .decode makes it
> clearer that you get the same type back as you put in, and it is also
> unambiguous as to direction.
But then wouldn't we end up with a multitude of ways to do things?

 s.encode(codec) == s.redecode(codec)
 s.decode(codec) == s.reencode(codec)
 unicode(s, codec) == s.decode(codec)
 str(u, codec) == u.encode(codec)
 str(s, codec) == s.encode(codec)

Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-18 Thread M.-A. Lemburg
Thomas Wouters wrote:
> On Sat, Feb 18, 2006 at 12:06:37PM +0100, M.-A. Lemburg wrote:
> 
>> I've already explained why we have .encode() and .decode()
>> methods on strings and Unicode many times. I've also
>> explained the misunderstanding that codecs can only do
>> Unicode-string conversions. And I've explained that
>> the .encode() and .decode() methods *do* check the return
>> types of the codecs and only allow strings or Unicode
>> on return (no lists, instances, tuples or anything else).
>>
>> You seem to ignore this fact.
> 
> Actually, I think the problem is that while we all agree the
> bytestring/unicode methods are a useful way to convert from bytestring to
> unicode and back again, we disagree on their *general* usefulness. Sure, the
> codecs mechanism is powerful, and even more so because they can determine
> their own return type. But it still smells and feels like a Perl attitude,
> for the reasons already explained numerous times, as well:

It's by no means a Perl attitude.

The main reason is symmetry and the fact that strings and Unicode
should be as similar as possible in order to simplify the task of
moving from one to the other.

>  - The return value for the non-unicode encodings depends on the value of
>the encoding argument.

Not really: you'll always get a basestring instance.

>  - The general case, by and large, especially in non-powerusers, is to
>encode unicode to bytestrings and to decode bytestrings to unicode. And
>that is a hard enough task for many of the non-powerusers. Being able to
>use the encode/decode methods for other tasks isn't helping them.

Agreed.

Still, I believe that this is an educational problem. There are
a couple of gotchas users will have to be aware of (and this is
unrelated to the methods in question):

* "encoding" always refers to transforming original data into
  a derived form

* "decoding" always refers to transforming a derived form of
  data back into its original form

* for Unicode codecs the original form is Unicode, the derived
  form is, in most cases, a string

As a result, if you want to use a Unicode codec such as utf-8,
you encode Unicode into a utf-8 string and decode a utf-8 string
into Unicode.

Encoding a string is only possible if the string itself is
original data, e.g. some data that is supposed to be transformed
into a base64 encoded form.

Decoding Unicode is only possible if the Unicode string itself
represents a derived form, e.g. a sequence of hex literals.
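These rules can be illustrated with the stdlib codecs machinery (a sketch in modern spelling; the thread's 2.x forms like `'data'.encode('base64')` map onto `codecs.encode()` today):

```python
import codecs

# Unicode codec: Unicode text is the original form, bytes the derived form.
encoded = "héllo".encode("utf-8")
assert encoded.decode("utf-8") == "héllo"

# base64 codec: here the *bytes* are the original form; encoding derives
# a base64 representation, decoding recovers the original bytes.
derived = codecs.encode(b"data", "base64")
assert derived == b"ZGF0YQ==\n"
assert codecs.decode(derived, "base64") == b"data"
```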

> That is why I disagree with the hypergeneralization of the encode/decode
> methods, regardless of the fact that it is a natural expansion of the
> implementation of codecs. Sure, it looks 'right' and 'natural' when you look
> at the implementation. It sure doesn't look natural, to me and to many
> others, when you look at the task of encoding and decoding
> bytestrings/unicode.

That's because you only look at one specific task.

Codecs also unify the various interfaces to common encodings
such as base64, uu or zip which are not Unicode related.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Feb 18 2006)
>>> Python/Zope Consulting and Support ...http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/


::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! 
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Proposal: defaultdict

2006-02-18 Thread Pierre Barbier de Reuille
Quoting [EMAIL PROTECTED]:

>
> Guido> Over lunch with Alex Martelli, he proposed that a subclass of
> Guido> dict with this behavior (but implemented in C) would be a good
> Guido> addition to the language.
>
> Instead, why not define setdefault() the way it should have been done in the
> first place?  When you create a dict it has the current behavior.  If you
> then call its setdefault() method that becomes the default value for missing
> keys.
>
> d = {'a': 1}
> d['b']  # raises KeyError
> d.get('c')  # evaluates to None
> d.setdefault(42)
> d['b']  # evaluates to 42
> d.get('c')  # evaluates to 42
>
> For symmetry, setdefault() should probably be undoable: deldefault(),
> removedefault(), nodefault(), default_free(), whatever.

Well, first, so as not to break the current interface, and second because I think it
reads better, I would prefer:

  d = {'a': 1}
  d['b']  # raises KeyError
  d.get('c')  # evaluates to None
  d.default = 42
  d['b']  # evaluates to 42
  d.get('c')  # evaluates to 42

And to undo the default, you can simply do:

  del d.default

And of course, you can get the current value:

  d.default

But then, as proposed many times, I would rather see a function call, like:

d.default = lambda key: 42

The argument of the function is the current key. It would allow things like:

d.default = time_consuming_operation

where time_consuming_operation takes a single argument.
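A rough sketch of what that attribute-based API could look like (hypothetical: `DefaultDict` and the `default` attribute protocol are illustrations of the proposal, not an existing class):

```python
class DefaultDict(dict):
    """Hypothetical sketch of the proposed d.default protocol."""

    def __getitem__(self, key):
        try:
            return dict.__getitem__(self, key)
        except KeyError:
            factory = self.__dict__.get("default")
            if factory is None:
                raise
            return factory(key)  # the default callable receives the key

d = DefaultDict({'a': 1})
d.default = lambda key: 42
assert d['b'] == 42       # missing key -> default value
del d.default             # undo the default
try:
    d['b']
except KeyError:
    pass                  # raises again, as a plain dict would
```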

>
> The only question in my mind is whether or not getting a non-existent value
> under the influence of a given default value should stick that value in the
> dictionary or not.
>
> down-with-more-builtins-ly, y'rs,
>
> Skip




Re: [Python-Dev] bytes.from_hex()

2006-02-18 Thread Adam Olsen
On 2/18/06, Josiah Carlson <[EMAIL PROTECTED]> wrote:
> Look at what we've currently got going for data transformations in the
> standard library to see what these removals will do: base64 module,
> binascii module, binhex module, uu module, ...  Do we want or need to
> add another top-level module for every future encoding/codec that comes
> out (or does everyone think that we're done seeing codecs)?  Do we want
> to keep monkey-patching binascii with names like 'a2b_hqx'?  While there
> is currently one text->text transform (rot13), do we add another module
> for text->text transforms? Would it start having names like t2e_rot13()
> and e2t_rot13()?

If top-level modules are the problem then why not make codecs into a package?

from codecs import utf8, base64

utf8.encode(u) -> b
utf8.decode(b) -> u
base64.encode(b) -> b
base64.decode(b) -> b
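A sketch of what such a package could look like (hypothetical: neither `codecs.utf8` nor `codecs.base64` exists; plain classes stand in for the proposed sub-modules):

```python
import base64 as _base64

class utf8:
    @staticmethod
    def encode(u):           # text -> bytes
        return u.encode("utf-8")
    @staticmethod
    def decode(b):           # bytes -> text
        return b.decode("utf-8")

class b64:
    @staticmethod
    def encode(b):           # bytes -> bytes
        return _base64.b64encode(b)
    @staticmethod
    def decode(b):           # bytes -> bytes
        return _base64.b64decode(b)

assert utf8.decode(utf8.encode("abc")) == "abc"
assert b64.encode(b"abc") == b"YWJj"
```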

--
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] A codecs nit

2006-02-18 Thread M.-A. Lemburg
Barry Warsaw wrote:
> On Wed, 2006-02-15 at 22:07 +0100, M.-A. Lemburg wrote:
> 
>> Those are not pseudo-encodings, they are regular codecs.
>>
>> It's a common misunderstanding that codecs are only seen as serving
>> the purpose of converting between Unicode and strings.
>>
>> The codec system is deliberately designed to be general enough
>> to also work with many other types, e.g. it is easily possible to
>> write a codec that convert between the hex literal sequence you
>> have above to a list of ordinals:
> 
> Slightly off-topic, but one thing that's always bothered me about the
> current codecs implementation is that str.encode() (and friends)
> implicitly treats its argument as a module name, and imports it, even if the
> module doesn't live in the encodings package.  That seems like a mistake
> to me (and a potential security problem if the import has side-effects).

It was a mistake, yes, and thanks for bringing this up.

Codec packages should implement and register their own
codec search functions.

> I don't know whether at the very least restricting the imports to the
> encodings package would make sense or would break things.
> 
> >>> import sys
> >>> sys.modules['smtplib']
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> KeyError: 'smtplib'
> >>> ''.encode('smtplib')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> LookupError: unknown encoding: smtplib
> >>> sys.modules['smtplib']
> <module 'smtplib' from '...'>
> 
> I can't see any reason for allowing any randomly importable module to
> act like an encoding.

The encodings package search function will try to import
the module and then check the module signature. If the
module fails to export the codec registration API, then
it raises the LookupError you see above.

At the time, it was nice to be able to write codec
packages as Python packages and have them readily usable
by just putting the package on the sys.path.

This was a side-effect of the way the encodings search
function worked. The original design idea was to have
all 3rd party codecs register themselves with the
codec registry. However, this implies that the application
using the codecs would have to run the registration
code at least once. Since the encodings package search
function provided a more convenient way, this was used
by most codec package programmers.

In Py 2.5 we'll change that. The encodings package search
function will only allow codecs in that package to be
imported. All other codec packages will have to provide
their own search function and register this with the
codecs registry.
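What "register their own search function" means in practice can be sketched with the real `codecs.register()` API (the `identity` codec below is made up for illustration):

```python
import codecs

def _search(name):
    # A search function receives the normalized codec name and returns
    # a CodecInfo for names it owns, or None to let other searches run.
    if name != "identity":
        return None
    return codecs.CodecInfo(
        name="identity",
        encode=lambda s, errors="strict": (s.encode("utf-8"), len(s)),
        decode=lambda b, errors="strict": (bytes(b).decode("utf-8"), len(b)),
    )

codecs.register(_search)
assert "abc".encode("identity") == b"abc"
assert b"abc".decode("identity") == "abc"
```

No implicit import is involved: only names the registered search functions claim are resolvable, which is exactly the behaviour the patch moves to.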

The big question is: what to do about 2.3 and 2.4 - adding
the same patch will cause serious breakage, since popular
codec packages such as Tamito's Japanese package rely
on the existing behavior.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] Adventures with ASTs - Inline Lambda

2006-02-18 Thread skip

talin> ... whereas with 'given' you can't be certain when to stop
talin> parsing the argument list.

So require parens around the arglist:

(x*y given (x, y))

Skip


[Python-Dev] Stackless Python sprint at PyCon 2006

2006-02-18 Thread Richard Tew
Hi,

During the sprint period after PyCon, we are planning to sprint on bringing Stackless up to date and making it more current and approachable. A key part of this is porting it, together with the recently completed 64-bit changes, to the latest version of Python. At the end of the sprint we hope to have up-to-date, working 32- and 64-bit versions.

If anyone on this list who is attending PyCon has some time to spare during the sprint period, and an interest in perhaps getting more familiar with Stackless, you would be more than welcome to join us and help out. Familiarity with the Python source code and its workings would be a great help in the work we hope to get done, especially for participants with an interest in ensuring and testing that the ported code works on platforms other than those we will be developing on (Windows XP and Windows XP x64 edition).

Obviously being the most familiar with the Stackless Python source code, Christian Tismer has kindly offered us guidance by acting as the coach for the sprint, taking time away from the PyPy sprint.

In any case, if you have any questions, or are interested, please feel free to reply, whether here, to this email address, or to [EMAIL PROTECTED].

Thanks,
Richard Tew
Senior Programmer
CCP Games

You can read more about the sprint, and the scheduled PyCon talk about how Stackless is used in EVE Online, the massively multiplayer game we make, at the following URL:
http://www.stackless.com/Members/rmtew/pycon2006

And don't forget the Stackless website :)
http://www.stackless.com/



Re: [Python-Dev] The decorator(s) module

2006-02-18 Thread Alex Martelli

On Feb 18, 2006, at 12:38 AM, Georg Brandl wrote:

> Guido van Rossum wrote:
>> WFM. Patch anyone?
>
> Done.
> http://python.org/sf/1434038

I reviewed the patch and added a comment on it,  but since the point  
may be controversial I had better air it here for discussion: in 2.4,  
property(fset=acallable) does work (maybe silly, but it does make a  
write-only property) -- with the patch as given, it would stop  
working (due to attempts to get __doc__ from the None value of fget);  
I think we should ensure it keeps working (and add a unit test to  
that effect).
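The behaviour Alex wants preserved is easy to check; a minimal sketch of a write-only property:

```python
class C(object):
    def _set_x(self, value):
        self._x = value

    # fset only: a write-only property -- maybe silly, but legal in 2.4.
    x = property(fset=_set_x)

c = C()
c.x = 10
assert c._x == 10
try:
    c.x                      # no fget -> reading the property fails
except AttributeError:
    pass
else:
    raise AssertionError("expected AttributeError")
```

A patch that unconditionally reads `fget.__doc__` breaks the `property(fset=...)` construction above, which is the regression being flagged.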


Alex




Re: [Python-Dev] bytes.from_hex()

2006-02-18 Thread Aahz
On Sat, Feb 18, 2006, Ron Adam wrote:
>
> I like the bytes.recode() idea a lot. +1
> 
> It seems to me it's a far more useful idea than encoding and decoding by 
> overloading and could do both and more.  It has a lot of potential to be 
> an intermediate step for encoding as well as being used for many other 
> translations to byte data.
> 
> I think I would prefer that encode and decode be just functions with 
> well defined names and arguments instead of being methods or arguments 
> to string and Unicode types.
> 
> I'm not sure on exactly how this would work. Maybe it would need two 
> sets of encodings, i.e. decoders and encoders.  An exception would be
> given if it wasn't found for the direction one was going in.

Here's an idea I don't think I've seen before:

bytes.recode(b, src_encoding, dest_encoding)

This requires the user to state up-front what the source encoding is.
One of the big problems that I see with the whole encoding mess is that
so much of it contains implicit assumptions about the source encoding;
this gets away from that.
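As a free-function model of the idea (hypothetical; `bytes.recode()` is only a proposal), charset transcoding with an explicit source encoding could look like:

```python
def recode(data, src_encoding, dest_encoding):
    # Hypothetical model of the proposed bytes.recode(): the caller
    # states the source encoding explicitly; nothing is assumed about it.
    return data.decode(src_encoding).encode(dest_encoding)

latin = "café".encode("latin-1")
assert recode(latin, "latin-1", "utf-8") == "café".encode("utf-8")
```

This sketch only covers charset-to-charset transcoding; bytes-to-bytes transforms like base64 would need the codecs-level `codecs.decode`/`codecs.encode` pair instead.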
-- 
Aahz ([EMAIL PROTECTED])   <*> http://www.pythoncraft.com/

"19. A language that doesn't affect the way you think about programming,
is not worth knowing."  --Alan Perlis


Re: [Python-Dev] bytes.from_hex()

2006-02-18 Thread M.-A. Lemburg
Aahz wrote:
> On Sat, Feb 18, 2006, Ron Adam wrote:
>> I like the bytes.recode() idea a lot. +1
>>
>> It seems to me it's a far more useful idea than encoding and decoding by 
>> overloading and could do both and more.  It has a lot of potential to be 
>> an intermediate step for encoding as well as being used for many other 
>> translations to byte data.
>>
>> I think I would prefer that encode and decode be just functions with 
>> well defined names and arguments instead of being methods or arguments 
>> to string and Unicode types.
>>
>> I'm not sure on exactly how this would work. Maybe it would need two 
>> sets of encodings, i.e. decoders and encoders.  An exception would be
>> given if it wasn't found for the direction one was going in.
> 
> Here's an idea I don't think I've seen before:
> 
> bytes.recode(b, src_encoding, dest_encoding)
> 
> This requires the user to state up-front what the source encoding is.
> One of the big problems that I see with the whole encoding mess is that
> so much of it contains implicit assumptions about the source encoding;
> this gets away from that.

You might want to look at the codecs.py module: it has all these
things and a lot more.

http://docs.python.org/lib/module-codecs.html
http://svn.python.org/view/python/trunk/Lib/codecs.py?view=markup

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] Stateful codecs [Was: str object going in Py3K]

2006-02-18 Thread Walter Dörwald
M.-A. Lemburg wrote:
> Walter Dörwald wrote:
> I'd suggest we keep codecs.lookup() the way it is and
> instead add new functions to the codecs module, e.g.
> codecs.getencoderobject() and codecs.getdecoderobject().
>
> Changing the codec registration is not much of a problem:
> we could simply allow 6-tuples to be passed into the
> registry.
 OK, so codecs.lookup() returns 4-tuples, but the registry stores 6-tuples 
 and the search functions must return 6-tuples.
 And we add codecs.getencoderobject() and codecs.getdecoderobject() as well 
 as new classes codecs.StatefulEncoder and
 codecs.StatefulDecoder. What about old search functions that return 
 4-tuples?
>>>
>>> The registry should then simply set the missing entries to None and the 
>>> getencoderobject()/getdecoderobject() would then
>>> have
>>> to raise an error.
>>
>> Sounds simple enough and we don't lose backwards compatibility.
>>
>>> Perhaps we should also deprecate codecs.lookup() in Py 2.5 ?!
>>
>> +1, but I'd like to have a replacement for this, i.e. a function that 
>> returns all info the registry has about an encoding:
>>
>> 1. Name
>> 2. Encoder function
>> 3. Decoder function
>> 4. Stateful encoder factory
>> 5. Stateful decoder factory
>> 6. Stream writer factory
>> 7. Stream reader factory
>>
>> and if this is an object with attributes, we won't have any problems if we 
>> extend it in the future.
>
> Shouldn't be a problem: just expose the registry dictionary
> via the _codecs module.
>
> The rest can then be done in a Python function defined in
> codecs.py using a CodecInfo class.

This would require the Python code to call codecs.lookup() and then look into 
the codecs dictionary (normalizing the encoding
name again). Maybe we should make a version of _PyCodec_Lookup() that allows 
4- and 6-tuples available to Python and use that?
The official PyCodec_Lookup() would then have to downgrade the 6-tuples to 
4-tuples.
>> BTW, if we change the API, can we fix the return value of the stateless 
>> functions? As the stateless function always
>> encodes/decodes the complete string, returning the length of the string 
>> doesn't make sense.
>> codecs.getencoder() and codecs.getdecoder() would have to continue to return 
>> the old variant of the functions, but
>> codecs.getinfo("latin-1").encoder would be the new encoding function.
>
> No: you can still write stateless encoders or decoders that do
> not process the whole input string. Just because we don't have
> any of those in Python, doesn't mean that they can't be written
> and used. A stateless codec might want to leave the work
> of buffering bytes at the end of the input data which cannot
> be processed to the caller.

But what would the caller do with that info? It can't retry encoding/decoding the 
rejected input, because the state of the codec
has been thrown away already.
> It is also possible to write
> stateful codecs on top of such stateless encoding and decoding
> functions.

That's what the codec helper functions from Python/_codecs.c are for.

Anyway, I've started implementing a patch that just adds 
codecs.StatefulEncoder/codecs.StatefulDecoder. UTF8, UTF8-Sig, UTF-16,
UTF-16-LE and UTF-16-BE are already working.
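The buffering behaviour those stateful classes provide corresponds to what later shipped as incremental codecs; a minimal illustration with the API as it exists in current Python:

```python
import codecs

# Feed a two-byte UTF-8 character in two chunks; the stateful decoder
# buffers the incomplete first byte instead of failing on it.
dec = codecs.getincrementaldecoder("utf-8")()
data = "é".encode("utf-8")          # b'\xc3\xa9'
assert dec.decode(data[:1]) == ""   # partial character: buffered
assert dec.decode(data[1:]) == "é"  # completed on the next chunk
```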
Bye,
Walter Dörwald





Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-18 Thread Martin v. Löwis
M.-A. Lemburg wrote:
> I've already explained why we have .encode() and .decode()
> methods on strings and Unicode many times. I've also
> explained the misunderstanding that codecs can only do
> Unicode-string conversions. And I've explained that
> the .encode() and .decode() method *do* check the return
> types of the codecs and only allow strings or Unicode
> on return (no lists, instances, tuples or anything else).
> 
> You seem to ignore this fact.

I'm not ignoring the fact that you have explained this
many times. I just fail to understand your explanations.

For example, you said at some point that codecs are not
restricted to Unicode. However, I don't recall any
explanation what the restriction *is*, if any restriction
exists. No such restriction seems to be documented.

> True. However, note that the .encode()/.decode() methods on
> strings and Unicode narrow down the possible return types.
> The corresponding .bytes methods should only allow bytes and
> Unicode.

I forgot that: what is the rationale for that restriction?

Regards,
Martin


Re: [Python-Dev] bytes.from_hex()

2006-02-18 Thread Martin v. Löwis
Michael Hudson wrote:
> There's one extremely significant example where the *value* of
> something impacts on the type of something else: functions.  The types
> of everything involved in str([1]) and len([1]) are the same but the
> results are different.  This shows up in PyPy's type annotation; most
> of the time we just track types indeed, but when something is called
> we need to have a pretty good idea of the potential values, too.
> 
> Relavent to the point at hand?  No.  Apologies for wasting your time
> :)

Actually, I think it is relevant. I never thought about it this way,
but now that you mention it, you are right.

This demonstrates that the string argument to .encode is actually
a function name, atleast the way it is implemented now. So
.encode("uu") and .encode("rot13") are *two* different methods,
instead of being a single method.

This brings me back to my original point: "rot13" should be a function,
not a parameter to some function. In essence, .encode reimplements
apply(), with the added feature of not having to pass the function
itself, but just its name.
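Spelled out with the module-level codec functions, the name-dispatch Martin describes looks like this (a small illustration using the real rot_13 codec):

```python
import codecs

# ".encode('rot13')" is effectively a function call dispatched by name;
# written as explicit function application it reads:
assert codecs.encode("abc", "rot_13") == "nop"
assert codecs.decode("nop", "rot_13") == "abc"
```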

Maybe this design results from a really deep understanding of

Namespaces are one honking great idea -- let's do more of those!

Regards,
Martin


Re: [Python-Dev] Stackless Python sprint at PyCon 2006

2006-02-18 Thread Martin v. Löwis
Richard Tew wrote:
> If anyone on this list who is attending PyCon, has some time to spare
> during the sprint period and an interest in perhaps getting more
> familiar with Stackless, you would be more than welcome in joining us to
> help out.  Familiarity with the Python source code and its workings
> would be a great help in the work we hope to get done.  Especially
> participants with an interest in ensuring and testing that the porting
> done works on other platforms than those we will be developing on
> (Windows XP and Windows XP x64 edition).

If you are going to work on XP x64, make sure you have the latest
platform SDK installed on these machines. I plan to build AMD64
binaries with the platform SDK, not with VS 2005.

Regards,
Martin


Re: [Python-Dev] Adventures with ASTs - Inline Lambda

2006-02-18 Thread Talin
[EMAIL PROTECTED] wrote:

>talin> ... whereas with 'given' you can't be certain when to stop
>talin> parsing the argument list.
>
>So require parens around the arglist:
>
>(x*y given (x, y))
>
>Skip
>  
>
I would not be opposed to mandating the parens, and it's an easy enough 
change to make. The patch on SF lets you do it both ways, which will 
give people who are interested a chance to get a feel for the various 
alternatives.

I realize of course that this is a moot point. But perhaps I can help to 
winnow down the dozens of rejected lambda replacement proposals to just 
a few rejected lambda proposals :)

-- Talin


Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-18 Thread M.-A. Lemburg
Martin v. Löwis wrote:
> M.-A. Lemburg wrote:
>> I've already explained why we have .encode() and .decode()
>> methods on strings and Unicode many times. I've also
>> explained the misunderstanding that codecs can only do
>> Unicode-string conversions. And I've explained that
>> the .encode() and .decode() method *do* check the return
>> types of the codecs and only allow strings or Unicode
>> on return (no lists, instances, tuples or anything else).
>>
>> You seem to ignore this fact.
> 
> I'm not ignoring the fact that you have explained this
> many times. I just fail to understand your explanations.

Feel free to ask questions.

> For example, you said at some point that codecs are not
> restricted to Unicode. However, I don't recall any
> explanation what the restriction *is*, if any restriction
> exists. No such restriction seems to be documented.

The codecs are not restricted w/r to the data types
they work on. It's up to the codecs to define which
data types are valid and which they take on input and
return.

>> True. However, note that the .encode()/.decode() methods on
>> strings and Unicode narrow down the possible return types.
>> The corresponding .bytes methods should only allow bytes and
>> Unicode.
> 
> I forgot that: what is the rationale for that restriction?

To assure that only those types can be returned from those
methods, i.e. instances of basestring, which in turn permits
type inference for those methods.

The codecs functions encode() and decode() don't have these
restrictions, and thus provide a generic interface to the
codec's encode and decode functions. It's up to the caller
to restrict the allowed encodings and as result the
possible input/output types.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] Stateful codecs [Was: str object going in Py3K]

2006-02-18 Thread M.-A. Lemburg
Walter Dörwald wrote:
> M.-A. Lemburg wrote:
>> Walter Dörwald wrote:
>> I'd suggest we keep codecs.lookup() the way it is and
>> instead add new functions to the codecs module, e.g.
>> codecs.getencoderobject() and codecs.getdecoderobject().
>>
>> Changing the codec registration is not much of a problem:
>> we could simply allow 6-tuples to be passed into the
>> registry.
> OK, so codecs.lookup() returns 4-tuples, but the registry stores 6-tuples 
> and the search functions must return 6-tuples.
> And we add codecs.getencoderobject() and codecs.getdecoderobject() as 
> well as new classes codecs.StatefulEncoder and
> codecs.StatefulDecoder. What about old search functions that return 
> 4-tuples?
 The registry should then simply set the missing entries to None and the 
 getencoderobject()/getdecoderobject() would then
 have
 to raise an error.
>>> Sounds simple enough and we don't lose backwards compatibility.
>>>
 Perhaps we should also deprecate codecs.lookup() in Py 2.5 ?!
>>> +1, but I'd like to have a replacement for this, i.e. a function that 
>>> returns all info the registry has about an encoding:
>>>
>>> 1. Name
>>> 2. Encoder function
>>> 3. Decoder function
>>> 4. Stateful encoder factory
>>> 5. Stateful decoder factory
>>> 6. Stream writer factory
>>> 7. Stream reader factory
>>>
>>> and if this is an object with attributes, we won't have any problems if we 
>>> extend it in the future.
>> Shouldn't be a problem: just expose the registry dictionary
>> via the _codecs module.
>>
>> The rest can then be done in a Python function defined in
>> codecs.py using a CodecInfo class.
> 
> This would require the Python code to call codecs.lookup() and then look into 
> the codecs dictionary (normalizing the encoding
> name again). Maybe we should make a version of _PyCodec_Lookup() that allows 
> 4- and 6-tuples available to Python and use that?
> The official PyCodec_Lookup() would then have to downgrade the 6-tuples to 
> 4-tuples.

Hmm, you're right: the dictionary may not have the requested codec
info yet (it's only used as cache) and only a call to _PyCodec_Lookup()
would fill it.

>>> BTW, if we change the API, can we fix the return value of the stateless 
>>> functions? As the stateless function always
>>> encodes/decodes the complete string, returning the length of the string 
>>> doesn't make sense.
>>> codecs.getencoder() and codecs.getdecoder() would have to continue to 
>>> return the old variant of the functions, but
>>> codecs.getinfo("latin-1").encoder would be the new encoding function.
>> No: you can still write stateless encoders or decoders that do
>> not process the whole input string. Just because we don't have
>> any of those in Python, doesn't mean that they can't be written
>> and used. A stateless codec might want to leave the work
>> of buffering bytes at the end of the input data which cannot
>> be processed to the caller.
> 
> But what would the caller do with that info? It can't retry encoding/decoding 
> the rejected input, because the state of the codec
> has been thrown away already.

This depends a lot on the nature of the codec. It may well be
possible to work on chunks of input data in a stateless way,
e.g. say you have a string of 4-byte hex values, then the decode
function would be able to work on 4 bytes each and let the caller
buffer any remaining bytes for the next call. There'd be no need for
keeping state in the decoder function.
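A sketch of that chunked-but-stateless scheme (hypothetical helper; the fixed group size of four hex digits is assumed from the example above):

```python
def decode_hex_groups(data):
    # Stateless chunked decoding: consume only complete four-digit hex
    # groups and report how much input was used; the *caller* buffers
    # the tail and re-feeds it with the next chunk of input.
    usable = len(data) - len(data) % 4
    decoded = bytes.fromhex(data[:usable].decode("ascii"))
    return decoded, usable

out, consumed = decode_hex_groups(b"4142434")   # seven hex digits
assert (out, consumed) == (b"AB", 4)            # b"434" awaits more input
```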

>> It is also possible to write
>> stateful codecs on top of such stateless encoding and decoding
>> functions.
> 
> That's what the codec helper functions from Python/_codecs.c are for.

I'm not sure what you mean here.

> Anyway, I've started implementing a patch that just adds 
> codecs.StatefulEncoder/codecs.StatefulDecoder. UTF8, UTF8-Sig, UTF-16,
> UTF-16-LE and UTF-16-BE are already working.

Nice :-)

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] Adventures with ASTs - Inline Lambda

2006-02-18 Thread Nick Coghlan
Talin wrote:
> [EMAIL PROTECTED] wrote:
> 
>>talin> ... whereas with 'given' you can't be certain when to stop
>>talin> parsing the argument list.
>>
>> So require parens around the arglist:
>>
>>(x*y given (x, y))
>>
>> Skip
>>  
>>
> I would not be opposed to mandating the parens, and it's an easy enough 
> change to make. The patch on SF lets you do it both ways, which will 
> give people who are interested a chance to get a feel for the various 
> alternatives.

Another ambiguity is that when they're optional it is unclear whether or not 
adding them means the callable now expects a tuple argument (i.e., doubled 
parens at the call site). If they're mandatory, then it is clear that only 
doubled parentheses at the definition point require doubled parentheses at the 
call site (this is, not coincidentally, exactly the same rule as applies for 
normal functions).

> I realize of course that this is a moot point. But perhaps I can help to 
> winnow down the dozens of rejected lambda replacement proposals to just 
> a few rejected lambda proposals :)

Heh.

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://www.boredomandlaziness.org


Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-18 Thread Martin v. Löwis
M.-A. Lemburg wrote:
>>>True. However, note that the .encode()/.decode() methods on
>>>strings and Unicode narrow down the possible return types.
>>>The corresponding .bytes methods should only allow bytes and
>>>Unicode.
>>
>>I forgot that: what is the rationale for that restriction?
> 
> 
> To assure that only those types can be returned from those
> methods, ie. instances of basestring, which in return permits
> type inference for those methods.

Hmm. So it is for type inference. Where is that documented?

This looks pretty inconsistent. Either codecs can give arbitrary
return types, then .encode/.decode should also be allowed to
give arbitrary return types, or codecs should be restricted.
What's the point of first allowing a wide interface, and then
narrowing it?

Also, if type inference is the goal, what is the point in allowing
two result types?

Regards,
Martin


Re: [Python-Dev] Proposal: defaultdict

2006-02-18 Thread James Y Knight
On Feb 18, 2006, at 2:33 AM, Martin v. Löwis wrote:
> I don't understand. In the rationale of PEP 333, it says
> "The rationale for requiring a dictionary is to maximize portability
> between servers. The alternative would be to define some subset of a
> dictionary's methods as being the standard and portable interface."
>
> That rationale is not endangered: if the environment continues to
> be a dict exactly, servers continue to be guaranteed what precise
> set of operations is available on the environment.

Yes it is endangered.

> Well, as you say: you get a KeyError if there is an error with the  
> key.
> With a default_factory, there isn't normally an error with the key.

But there should be. Consider the case of two servers. One which  
takes all the items out of the dictionary (using items()) and puts  
them in some other data structure. Then it checks if the "Date"  
header has been set. It was not, so it adds it. Consider another  
similar server which checks if the "Date" header has been set on the  
dict passed in by the user. The default_factory then makes one up.  
Different behavior due to internal implementation details of how the  
server uses the dict object, which is what the restriction to  
_exactly_ dict prevents.
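The divergence described here can be made concrete using the later collections.defaultdict as a stand-in for the proposed default_factory behaviour (the two "servers" are hypothetical, and 'Date' is used loosely for illustration):

```python
from collections import defaultdict

# An environ whose missing keys are silently defaulted (stand-in for the
# proposed default_factory behaviour on plain dicts).
environ = defaultdict(str)
environ['PATH_INFO'] = '/'

# Server A copies the concrete items out first: no 'Date' header exists.
seen_by_items = dict(environ.items())
print('Date' in seen_by_items)      # False

# Server B probes with __getitem__: the factory fabricates a value and
# *inserts* the key, so the header now appears to have been set.
environ['Date']
print('Date' in environ)            # True
```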

Consider another server which takes the dict instance and transports  
it across thread boundaries, from the wsgi-app's thread to the main  
server thread. Because WSGI specifies that you can only use 'dict',  
and the server checked that type(obj) == dict, it is guaranteed that  
using the dict won't run thread-unsafe code. That is now broken,  
since dict.__getitem__ can now invoke arbitrary user code. That is a  
major change.

James


Re: [Python-Dev] The decorator(s) module

2006-02-18 Thread Georg Brandl
Alex Martelli wrote:
> On Feb 18, 2006, at 12:38 AM, Georg Brandl wrote:
> 
>> Guido van Rossum wrote:
>>> WFM. Patch anyone?
>>
>> Done.
>> http://python.org/sf/1434038
> 
> I reviewed the patch and added a comment on it,  but since the point  
> may be controversial I had better air it here for discussion: in 2.4,  
> property(fset=acallable) does work (maybe silly, but it does make a  
> write-only property) -- with the patch as given, it would stop  
> working (due to attempts to get __doc__ from the None value of fget);  
> I think we should ensure it keeps working (and add a unit test to  
> that effect).

Yes, of course. Thanks for pointing that out.

I updated the patch and hope it's now bullet-proof when no fget argument
is given.

Georg



Re: [Python-Dev] Proposal: defaultdict

2006-02-18 Thread Martin v. Löwis
James Y Knight wrote:
> But there should be. Consider the case of two servers. One which  takes
> all the items out of the dictionary (using items()) and puts  them in
> some other data structure. Then it checks if the "Date"  header has been
> set. It was not, so it adds it. Consider another  similar server which
> checks if the "Date" header has been set on the  dict passed in by the
> user. The default_factory then makes one up.  Different behavior due to
> internal implementation details of how the  server uses the dict object,
> which is what the restriction to  _exactly_ dict prevents.

Right. I would claim that this is an artificial example: you can't
provide a HTTP_DATE value in a default_factory implementation, since
you don't know what the key is.

However, you are now making up a different rationale from the one the
PEP specifies: The PEP says that you need an "exact dict" so that
everybody knows precisely how the  dictionary behaves; instead of having
to define which precise subset of the dict API  is to be used.

*That* goal is still achieved: everybody knows that the dict might
have an on_missing/default_factory implementation. So to find out
whether HTTP_DATE has a value (which might be defaulted), you need
to invoke d['HTTP_DATE'].

> Consider another server which takes the dict instance and transports  it
> across thread boundaries, from the wsgi-app's thread to the main  server
> thread. Because WSGI specifies that you can only use 'dict',  and the
> server checked that type(obj) == dict, it is guaranteed that  using the
> dict won't run thread-unsafe code. That is now broken,  since
> dict.__getitem__ can now invoke arbitrary user code. That is a  major
> change.

Not at all. dict.__getitem__ could always invoke arbitrary user code,
through __hash__.
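Martin's point can be demonstrated directly: even a plain dict lookup runs user-defined code when the key type overrides __hash__ (a minimal sketch):

```python
class LoudKey:
    """A key type whose hashing runs arbitrary user code."""
    calls = 0

    def __hash__(self):
        LoudKey.calls += 1   # side effect executed during dict operations
        return 1

d = {}
k = LoudKey()
d[k] = 'value'   # one __hash__ call on insertion
d[k]             # and another one inside plain dict.__getitem__
print(LoudKey.calls)   # 2
```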

Regards,
Martin


Re: [Python-Dev] Proposal: defaultdict

2006-02-18 Thread Adam Olsen
On 2/18/06, James Y Knight <[EMAIL PROTECTED]> wrote:
> On Feb 18, 2006, at 2:33 AM, Martin v. Löwis wrote:
> > Well, as you say: you get a KeyError if there is an error with the
> > key.
> > With a default_factory, there isn't normally an error with the key.
>
> But there should be. Consider the case of two servers. One which
> takes all the items out of the dictionary (using items()) and puts
> them in some other data structure. Then it checks if the "Date"
> header has been set. It was not, so it adds it. Consider another
> similar server which checks if the "Date" header has been set on the
> dict passed in by the user. The default_factory then makes one up.
> Different behavior due to internal implementation details of how the
> server uses the dict object, which is what the restriction to
> _exactly_ dict prevents.

It just occurred to me, what effect does this have on repr?  Does it
attempt to store the default_factory in the representation, or does it
remove it?  Is it even possible to store a reference to a builtin such
as list and have eval restore it?
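For the record, the defaultdict that eventually landed in collections answers this by storing the factory in the repr, at the cost of the repr not being eval()-able:

```python
from collections import defaultdict

d = defaultdict(list)
d['a'].append(1)
r = repr(d)
print(r)   # defaultdict(<class 'list'>, {'a': [1]})

# The embedded <class 'list'> is not valid syntax, so eval cannot restore it.
try:
    eval(r)
except SyntaxError:
    print('not eval()-able')
```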

--
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-18 Thread M.-A. Lemburg
Martin v. Löwis wrote:
> M.-A. Lemburg wrote:
>>>> True. However, note that the .encode()/.decode() methods on
>>>> strings and Unicode narrow down the possible return types.
>>>> The corresponding .bytes methods should only allow bytes and
>>>> Unicode.
>>> I forgot that: what is the rationale for that restriction?
>>
>> To assure that only those types can be returned from those
>> methods, ie. instances of basestring, which in return permits
>> type inference for those methods.
> 
> Hmm. So it is for type inference.
> Where is that documented?

Somewhere in the python-dev mailing list archives ;-)

Seriously, we should probably add this to the documentation.

> This looks pretty inconsistent. Either codecs can give arbitrary
> return types, then .encode/.decode should also be allowed to
> give arbitrary return types, or codecs should be restricted.

No.

As I've said before: the .encode() and .decode() methods
are convenience methods to interface to codecs which take
string/Unicode on input and create string/Unicode output.

> What's the point of first allowing a wide interface, and then
> narrowing it?

The codec interface is an abstract interface. It is flexible
enough to allow codecs to define possible input and output
types while being strict about the method names and signatures.

Much like the file interface in Python, the copy protocol
or the pickle interface.

> Also, if type inference is the goal, what is the point in allowing
> two result types?

I'm not sure I understand the question: type inference is about
being able to infer the types of (among other things) function
return objects. This is what the restriction guarantees - much
like int() guarantees that you get either an integer or a long.
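The int() analogy, recast in modern terms where int has absorbed long, looks like this (a small illustration, not part of the original discussion):

```python
# int() guarantees an integer result regardless of the magnitude of its
# input, just as .encode()/.decode() were to guarantee a basestring:
values = [int('5'), int('9' * 40)]   # a small and an arbitrarily large value
print(all(isinstance(v, int) for v in values))   # True
```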

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Feb 18 2006)
>>> Python/Zope Consulting and Support ...http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/


::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! 


Re: [Python-Dev] bytes.from_hex()

2006-02-18 Thread Josiah Carlson

Ron Adam <[EMAIL PROTECTED]> wrote:
> Josiah Carlson wrote:
[snip]
> > Again, the problem is ambiguity; what does bytes.recode(something) mean?
> > Are we encoding _to_ something, or are we decoding _from_ something? 
> 
> This was just an example of one way that might work, but here are my 
> thoughts on why I think it might be good.
> 
> In this case, the ambiguity is reduced as far as the encoding and 
> decoding operations are concerned.
> 
>   somestring = encodings.tostr( someunicodestr, 'latin-1')
> 
> It's pretty clear what is happening to me.
> 
>  It will encode to a string an object, named someunicodestr, with 
> the 'latin-1' encoder.

But now how do you get it back?  encodings.tounicode(..., 'latin-1')?,
unicode(..., 'latin-1')?

What about string transformations:
somestring = encodings.tostr(somestr, 'base64')

How do we get that back?  encodings.tostr() again is completely
ambiguous, str(somestring, 'base64') seems a bit awkward (switching
namespaces)?


> And also result in clear errors if the specified encoding is 
> unavailable, and if it is, if it's not compatible with the given 
> *someunicodestr* obj type.
> 
> Further hints could be gained by.
> 
>  help(encodings.tostr)
> 
> Which could result in... something like...
>  """
>  encoding.tostr( <obj>, <codec> ) -> string
> 
>  Encode a unicode string using an encoder codec to a
>  non-unicode string, or transform a non-unicode string
>  to another non-unicode string using an encoder codec.
>  """
> 
> And if that's not enough, then help(encodings) could give more clues. 
> These steps would be what I would do. And then the next thing would be 
> to find the python docs entry on encodings.
> 
> Placing them in encodings seems like a fairly good place to look for 
> these functions if you are working with encodings.  So I find that just 
> as convenient as having them be string methods.
> 
>> There is no intermediate default encoding involved above (the bytes 
> object is used instead), so you wouldn't get some of the messages the 
> present system results in when ascii is the default.
> 
> (Yes, I know it won't when P3K is here also)
> 
> > Are we going to need to embed the direction in the encoding/decoding
> name (to_base64, from_base64, etc.)?  That doesn't seem any better than
> binascii.b2a_base64.
> 
> No, that's why I suggested two separate lists (or dictionaries might be 
> better).  They can contain the same names, but the lists they are in 
> determine the context and point to the needed codec.  And that step is 
> abstracted out by putting it inside the encodings.tostr() and 
> encodings.tounicode() functions.
> 
> So either function would call 'base64' from the correct codec list and 
> get the correct encoding or decoding codec it needs.

Either the API you have described is incomplete, you haven't noticed the
directional ambiguity you are describing, or I have completely lost it.


> > What about .reencode and .redecode?  It seems as
> > though the 're' added as a prefix to .encode and .decode makes it
> > clearer that you get the same type back as you put in, and it is also
> > unambiguous to direction.
> 
> But then wouldn't we end up with multitude of ways to do things?
> 
>  s.encode(codec) == s.redecode(codec)
>  s.decode(codec) == s.reencode(codec)
>  unicode(s, codec) == s.decode(codec)
>  str(u, codec) == u.encode(codec)
>  str(s, codec) == s.encode(codec)
>  unicode(s, codec) == s.reencode(codec)
>  str(u, codec) == s.redecode(codec)
>  str(s, codec) == s.redecode(codec)
> 
> Umm .. did I miss any?  Which ones would you remove?
> 
> Which ones of those will succeed with which codecs?

I must not be expressing myself very well.

Right now:
s.encode() -> s
s.decode() -> s, u
u.encode() -> s, u
u.decode() -> u

Martin et al's desired change to encode/decode:
s.decode() -> u
u.encode() -> s

No others.

What my thoughts on .reencode() and .redecode() would get you given
Martin et al's desired change:
s.reencode() -> s (you get encoded strings as strings)
s.redecode() -> s (you get decoded strings as strings)
u.reencode() -> u (you get encoded unicode as unicode)
u.redecode() -> u (you get decoded unicode as unicode)

If one wants to go from unicode to string, one uses .encode(). If one
wants to go from string to unicode, one uses .decode().  If one wants to
keep their type unchanged, but encode or decode the data/text, one would
use .reencode() and .redecode(), depending on whether their source is an
encoded block of data, or the original data they want to encode.

The other bonus is that if given .reencode() and .redecode(), one can
quite easily verify that the source is possible as a source, and that
you would get back the proper type.  How this would occur behind the
scenes is beyond the scope of this discussion, but it seems to me to be
easy, given what I've read about the current mechanism.
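The proposed .reencode()/.redecode() semantics can be sketched as free functions, in Python 3 terms where bytes stands in for the old str (the names are the proposal's, the implementation an assumption built on codecs):

```python
import codecs

def reencode(data, codec):
    """Same type in and out: encode bytes to encoded bytes."""
    return codecs.encode(data, codec)

def redecode(data, codec):
    """Same type in and out: decode encoded bytes back to bytes."""
    return codecs.decode(data, codec)

blob = reencode(b'hello', 'base64')    # b'aGVsbG8=\n'
assert redecode(blob, 'base64') == b'hello'
```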

Whether the constru

Re: [Python-Dev] Stateful codecs [Was: str object going in Py3K]

2006-02-18 Thread Walter Dörwald
M.-A. Lemburg wrote:
> Walter Dörwald wrote:
>> M.-A. Lemburg wrote:
>>> Walter Dörwald wrote:
>>>> [...]
>>>>> Perhaps we should also deprecate codecs.lookup() in Py 2.5 ?!
>>>> +1, but I'd like to have a replacement for this, i.e. a function that 
>>>> returns all info the registry has about an encoding:
>>>>
>>>> 1. Name
>>>> 2. Encoder function
>>>> 3. Decoder function
>>>> 4. Stateful encoder factory
>>>> 5. Stateful decoder factory
>>>> 6. Stream writer factory
>>>> 7. Stream reader factory
>>>>
>>>> and if this is an object with attributes, we won't have any problems if we 
>>>> extend it in the future.
>>> Shouldn't be a problem: just expose the registry dictionary
>>> via the _codecs module.
>>>
>>> The rest can then be done in a Python function defined in
>>> codecs.py using a CodecInfo class.
>>
>> This would require the Python code to call codecs.lookup() and then look 
>> into the codecs dictionary (normalizing the
>> encoding name again). Maybe we should make a version of __PyCodec_Lookup() 
>> that allows 4- and 6-tuples available to Python
>> and use that? The official PyCodec_Lookup() would then have to downgrade the 
>> 6-tuples to 4-tuples.
>
> Hmm, you're right: the dictionary may not have the requested codec info yet 
> (it's only used as cache) and only a call to
> _PyCodec_Lookup() would fill it.

I'm now trying a different approach: codecs.lookup() returns a subclass of 
tuple. We could deprecate calling __getitem__() in
2.5/2.6 and then remove the tuple subclassing later.
>>>> BTW, if we change the API, can we fix the return value of the stateless 
>>>> functions? As the stateless function always
>>>> encodes/decodes the complete string, returning the length of the string 
>>>> doesn't make sense. codecs.getencoder() and
>>>> codecs.getdecoder() would have to continue to return the old variant of 
>>>> the functions, but
>>>> codecs.getinfo("latin-1").encoder would be the new encoding function.
>>> No: you can still write stateless encoders or decoders that do
>>> not process the whole input string. Just because we don't have
>>> any of those in Python, doesn't mean that they can't be written and used. A 
>>> stateless codec might want to leave the work
>>> of buffering bytes at the end of the input data which cannot
>>> be processed to the caller.
>>
>> But what would the call do with that info? It can't retry encoding/decoding 
>> the rejected input, because the state of the
>> codec has been thrown away already.
>
> This depends a lot on the nature of the codec. It may well be
> possible to work on chunks of input data in a stateless way,
> e.g. say you have a string of 4-byte hex values, then the decode
> function would be able to work on 4 bytes each and let the caller
> buffer any remaining bytes for the next call. There'd be no need for keeping 
> state in the decoder function.

So incomplete byte sequences would be silently ignored.
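The chunked, stateless scheme under discussion can be sketched as follows (a toy illustration of decoding whole 4-digit hex groups and leaving the remainder to the caller, not any real codec API):

```python
def decode_hex_chunks(data):
    """Decode as many complete 4-byte hex groups as possible, returning
    (decoded_bytes, bytes_consumed); the caller buffers the remainder."""
    usable = len(data) - len(data) % 4
    decoded = bytes.fromhex(data[:usable].decode('ascii'))
    return decoded, usable

# Six hex digits: one complete 4-byte group plus a 2-byte remainder.
decoded, used = decode_hex_chunks(b'414243')
print(decoded, used)   # b'AB' 4  (b'43' is left for the next call)
```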

>>> It is also possible to write
>>> stateful codecs on top of such stateless encoding and decoding
>>> functions.
>>
>> That's what the codec helper functions from Python/_codecs.c are for.
>
> I'm not sure what you mean here.

_codecs.utf_8_decode() etc. use (result, count) tuples as the return value, 
because those functions are the building blocks of
the codecs themselves.
>> Anyway, I've started implementing a patch that just adds 
>> codecs.StatefulEncoder/codecs.StatefulDecoder. UTF8, UTF8-Sig,
>> UTF-16, UTF-16-LE and UTF-16-BE are already working.
>
> Nice :-)

gencodec.py is updated now too. The rest should be manageble too. I'll leave 
updating the CJKV codecs to Hye-Shik though.

Bye,
   Walter Dörwald





Re: [Python-Dev] Proposal: defaultdict

2006-02-18 Thread Bernhard Herzog
"Guido van Rossum" <[EMAIL PROTECTED]> writes:

> If the __getattr__()-like operation that supplies and inserts a
> dynamic default was a separate method, we wouldn't have this problem.

Why implement it in the dictionary type at all?  If, for intance, the
default value functionality were provided as a decorator, it could be
used with all kinds of mappings.  I.e. you could have something along
these lines:

class defaultwrapper(object):

def __init__(self, base, factory):
self.__base = base
self.__factory = factory

def __getitem__(self, key):
try:
return self.__base[key]
except KeyError:
value = self.__factory()
self.__base[key] = value
return value

def __getattr__(self, attr):
return getattr(self.__base, attr)


def test():
dd = defaultwrapper({}, list)
dd["abc"].append(1)
dd["abc"].append(2)
dd["def"].append(1)
assert sorted(dd.keys()) == ["abc", "def"]
assert sorted(dd.values()) == [[1], [1, 2]]
assert sorted(dd.items()) == [("abc", [1, 2]), ("def", [1])]
assert dd.has_key("abc")
assert not dd.has_key("xyz")


The precise semantics would have to be determined yet, of course.

> OTOH most reviewers here seem to appreciate on_missing() as a way to
> do various other ways of alterning a dict's __getitem__() behavior
> behind a caller's back -- perhaps it could even be (ab)used to
> implement case-insensitive lookup.

case-insensitive lookup could be implemented with another
wrapper/decorator.  If you need both case-insitivity and a default
value, you can easily stack the decorators.
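The stacking idea can be sketched like this (both wrapper classes are minimal invented examples in the spirit of the defaultwrapper above, not a worked-out design):

```python
class defaulting:
    """Minimal default-value wrapper, in the spirit of defaultwrapper above."""
    def __init__(self, base, factory):
        self._base, self._factory = base, factory
    def __getitem__(self, key):
        try:
            return self._base[key]
        except KeyError:
            return self._base.setdefault(key, self._factory())
    def __contains__(self, key):
        return key in self._base

class caseinsensitive:
    """Hypothetical second decorator: folds keys to lower case."""
    def __init__(self, base):
        self._base = base
    def __getitem__(self, key):
        return self._base[key.lower()]
    def __contains__(self, key):
        return key.lower() in self._base

# Stacking the two decorators gives both behaviours at once.
d = caseinsensitive(defaulting({}, list))
d['ABC'].append(1)
print('abc' in d, d['abc'])   # True [1]
```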

   Bernhard

-- 
Intevation GmbH http://intevation.de/
Skencil   http://skencil.org/
Thuban  http://thuban.intevation.org/


Re: [Python-Dev] bytes.from_hex()

2006-02-18 Thread Ron Adam
Aahz wrote:
> On Sat, Feb 18, 2006, Ron Adam wrote:
>> I like the bytes.recode() idea a lot. +1
>>
>> It seems to me it's a far more useful idea than encoding and decoding by 
>> overloading and could do both and more.  It has a lot of potential to be 
>> an intermediate step for encoding as well as being used for many other 
>> translations to byte data.
>>
>> I think I would prefer that encode and decode be just functions with 
>> well defined names and arguments instead of being methods or arguments 
>> to string and Unicode types.
>>
>> I'm not sure on exactly how this would work. Maybe it would need two 
>> sets of encodings, ie.. decoders, and encoders.  An exception would be
>> given if it wasn't found for the direction one was going in.
> 
> Here's an idea I don't think I've seen before:
> 
> bytes.recode(b, src_encoding, dest_encoding)
> 
> This requires the user to state up-front what the source encoding is.
> One of the big problems that I see with the whole encoding mess is that
> so much of it contains implicit assumptions about the source encoding;
> this gets away from that.

Yes, but it's not just the encodings that are implicit, it is also the 
types.

s.encode(codec)  # explicit source type, ? dest type
s.decode(codec)  # explicit source type, ? dest type

encodings.tostr(obj, codec) # implicit *known* source type
# explicit dest type

encodings.tounicode(obj, codec) # implicit *known* source type
# explicit dest type

In this case the source is implicit, but there can be a well defined 
check to validate the source type against the codec being used.  It's my 
feeling the user *knows* what he already has, and so it's more important 
that the resulting object type is explicit.

In your suggestion...

bytes.recode(b, src_encoding, dest_encoding)

Here the encodings are both explicit, but both the source and the 
destination types of the bytes are not.  Since it is working on bytes, they 
could have come from anywhere, and after the translation they would then 
be cast to the type the user *thinks* it should result in.  A likely 
source of errors that would pass silently.

The way I see it, the bytes type should be a lower-level object that 
doesn't care what byte transformation it does, i.e. they are all one-way 
byte-to-byte transformations determined by context.  And it should have 
the capability to read from and write to types without translating in 
the same step.  Keep it simple.

Then it could be used as a lower level byte translator to implement 
encodings and other translations in encoding methods or functions 
instead of trying to make it replace the higher level functionality.
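A toy model of that lower-level, byte-to-byte recode() idea might look like this (the function, the table, and the codec names are all invented here for illustration):

```python
import binascii

# Every transformation is byte-to-byte; the direction is carried in the
# (invented) codec name itself, as in the from_* spellings discussed.
_TRANSFORMS = {
    'hex': binascii.hexlify,
    'from_hex': binascii.unhexlify,
    'base64': binascii.b2a_base64,
    'from_base64': binascii.a2b_base64,
}

def recode(b, name):
    return _TRANSFORMS[name](b)

round_trip = recode(recode(b'\x00\xff', 'hex'), 'from_hex')
print(round_trip == b'\x00\xff')   # True
```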

Cheers,
Ron



Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-18 Thread Thomas Wouters
On Sat, Feb 18, 2006 at 01:21:18PM +0100, M.-A. Lemburg wrote:

> It's by no means a Perl attitude.

In your eyes, perhaps. It certainly feels that way to me (or I wouldn't have
said it :). Perl happens to be full of general constructs that were added
because they were easy to add, or they were useful in edgecases. The
encode/decode methods remind me of that, even though I fully understand the
reasoning behind it, and the elegance of the implementation.

> The main reason is symmetry and the fact that strings and Unicode
> should be as similar as possible in order to simplify the task of
> moving from one to the other.

Yes, and this is a design choice I don't agree with. They're different
types. They do everything similarly, except when they are mixed together
(unicode takes precedence, in general, encoding the bytestring from the
default encoding.) Going from one to the other isn't symmetric, though. I
understand that you disagree; the disagreement is on the fundamental choice
of allowing 'encode' and 'decode' to do *more* than going from and to
unicode. I regret that decision, not the decision to make encode and decode
symmetric (which makes sense, after the decision to overgeneralize
encode/decode is made.)

> >  - The return value for the non-unicode encodings depends on the value of
> >the encoding argument.

> Not really: you'll always get a basestring instance.

Which is not a particularly useful distinction, since in any real world
application, you have to be careful not to mix unicode with (non-ascii)
bytestrings. The only way to reliably deal with unicode is to have it
well-contained (when migrating an application from using bytestrings to
using unicode) or to use unicode everywhere, decoding/encoding at
entrypoints. Containment is hard to achieve.

> Still, I believe that this is an educational problem. There are
> a couple of gotchas users will have to be aware of (and this is
> unrelated to the methods in question):
> 
> * "encoding" always refers to transforming original data into
>   a derived form
> 
> * "decoding" always refers to transforming a derived form of
>   data back into its original form
> 
> * for Unicode codecs the original form is Unicode, the derived
>   form is, in most cases, a string
> 
> As a result, if you want to use a Unicode codec such as utf-8,
> you encode Unicode into a utf-8 string and decode a utf-8 string
> into Unicode.
> 
> Encoding a string is only possible if the string itself is
> original data, e.g. some data that is supposed to be transformed
> into a base64 encoded form.
> 
> Decoding Unicode is only possible if the Unicode string itself
> represents a derived form, e.g. a sequence of hex literals.

Most of these gotchas would not have been gotchas had encode/decode only
been usable for unicode encodings.

> > That is why I disagree with the hypergeneralization of the encode/decode
> > methods
[..]
> That's because you only look at one specific task.

> Codecs also unify the various interfaces to common encodings
> such as base64, uu or zip which are not Unicode related.

No, I think you misunderstand. I object to the hypergeneralization of the
*encode/decode methods*, not the codec system. I would have been fine with
another set of methods for non-unicode transformations. Although I would
have been even more fine if they got their encoding not as a string, but as,
say, a module object, or something imported from a module.

Not that I think any of this matters; we have what we have and I'll have to
live with it ;)

-- 
Thomas Wouters <[EMAIL PROTECTED]>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


Re: [Python-Dev] bytes.from_hex()

2006-02-18 Thread Terry Reedy

"Josiah Carlson" <[EMAIL PROTECTED]> wrote in message 
news:[EMAIL PROTECTED]

> Again, the problem is ambiguity; what does bytes.recode(something) mean?
> Are we encoding _to_ something, or are we decoding _from_ something?
> Are we going to need to embed the direction in the encoding/decoding
> name (to_base64, from_base64, etc.)?

To me, that seems simple and clear.  b.recode('from_base64') obviously 
requires that b meet the restrictions of base64.  Similarly for 'from_hex'.

> That doesn't seem any better than binascii.b2a_base64

I think 'from_base64' is *much* better.  I think there are now 4 
string-to-string transform modules that do similar things.  Not optimal to 
me.

> What about .reencode and .redecode?  It seems as
> though the 're' added as a prefix to .encode and .decode makes it
> clearer that you get the same type back as you put in, and it is also
> unambiguous to direction.

To me, the 're' prefix is awkward, confusing, and misleading.

Terry J. Reedy





Re: [Python-Dev] ssize_t branch merged

2006-02-18 Thread Travis E. Oliphant
Martin v. Löwis wrote:
> Neal Norwitz wrote:
> 
>>I suppose that might be nice, but would require configure magic.  I'm
>>not sure how it could be done on Windows.
> 
> 
> Contributions are welcome. On Windows, it can be hard-coded.
> 
> Actually, something like
> 
> #if SIZEOF_SIZE_T == SIZEOF_INT
> #define PY_SSIZE_T_MAX INT_MAX
> #elif SIZEOF_SIZE_T == SIZEOF_LONG
> #define PY_SSIZE_T_MAX LONG_MAX
> #else
> #error What is size_t equal to?
> #endif
> 
> might work.


Why not just

#if SIZEOF_SIZE_T == 2
#define PY_SSIZE_T_MAX 0x7fff
#elif SIZEOF_SIZE_T == 4
#define PY_SSIZE_T_MAX 0x7fffffff
#elif SIZEOF_SIZE_T == 8
#define PY_SSIZE_T_MAX 0x7fffffffffffffff
#elif SIZEOF_SIZE_T == 16
#define PY_SSIZE_T_MAX 0x7fffffffffffffffffffffffffffffff
#endif

?



Re: [Python-Dev] Proposal: defaultdict

2006-02-18 Thread Phillip J. Eby
At 01:44 PM 02/18/2006 -0500, James Y Knight wrote:
>On Feb 18, 2006, at 2:33 AM, Martin v. Löwis wrote:
> > I don't understand. In the rationale of PEP 333, it says
> > "The rationale for requiring a dictionary is to maximize portability
> > between servers. The alternative would be to define some subset of a
> > dictionary's methods as being the standard and portable interface."
> >
> > That rationale is not endangered: if the environment continues to
> > be a dict exactly, servers continue to be guaranteed what precise
> > set of operations is available on the environment.
>
>Yes it is endangered.

So we'll update the spec to say you can't use a dict that has the default 
set.  It's not reasonable to expect that language changes might not require 
updates to a PEP.  Certainly, we don't have to worry about being backward 
compatible when it's only Python 2.5 that's affected by the change.  :)



Re: [Python-Dev] bytes.from_hex()

2006-02-18 Thread Ron Adam

Josiah Carlson wrote:
> Ron Adam <[EMAIL PROTECTED]> wrote:
>> Josiah Carlson wrote:
> [snip]
>>> Again, the problem is ambiguity; what does bytes.recode(something) mean?
>>> Are we encoding _to_ something, or are we decoding _from_ something? 
>> This was just an example of one way that might work, but here are my 
>> thoughts on why I think it might be good.
>>
>> In this case, the ambiguity is reduced as far as the encoding and 
>> decoding operations are concerned.
>>
>>   somestring = encodings.tostr( someunicodestr, 'latin-1')
>>
>> It's pretty clear what is happening to me.
>>
>>  It will encode to a string an object, named someunicodestr, with 
>> the 'latin-1' encoder.
> 
> But now how do you get it back?  encodings.tounicode(..., 'latin-1')?,
> unicode(..., 'latin-1')?

Yes, just do:

  someunicodestr = encodings.tounicode( somestring, 'latin-1')



> What about string transformations:
> somestring = encodings.tostr(somestr, 'base64')
 >
> How do we get that back?  encodings.tostr() again is completely
> ambiguous, str(somestring, 'base64') seems a bit awkward (switching
> namespaces)?

In the case where a string is converted to another string, it would 
probably be best to have a requirement that they all get converted to 
unicode as an intermediate step.  By doing that it becomes an explicit 
two-step operation.

 # string to string encoding
 u_string = encodings.tounicode(s_string, 'base64')
 s2_string = encodings.tostr(u_string, 'base64')

Or you could have a convenience function to do it in the encodings 
module also.

def strtostr(s, sourcecodec, destcodec):
u = tounicode(s, sourcecodec)
return tostr(u, destcodec)

Then...

s2 = encodings.strtostr(s, 'base64', 'base64')

Which would be kind of pointless in this example, but it would be a good 
way to test a codec.

assert s == s2
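The proposed helpers can be sketched in Python 3 terms (str for unicode, bytes for the old str); the function names and type checks follow the proposal, but the implementation here is an assumption, not a real API:

```python
def tounicode(obj, codec):
    # bytes in, unicode text out
    if not isinstance(obj, bytes):
        raise TypeError('tounicode() expects a byte string')
    return obj.decode(codec)

def tostr(obj, codec):
    # unicode text in, bytes out
    if not isinstance(obj, str):
        raise TypeError('tostr() expects a unicode string')
    return obj.encode(codec)

def strtostr(s, sourcecodec, destcodec):
    # explicit two-step conversion through unicode, as described above
    return tostr(tounicode(s, sourcecodec), destcodec)

print(strtostr(b'caf\xe9', 'latin-1', 'utf-8'))   # b'caf\xc3\xa9'
```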


>>> Are we going to need to embed the direction in the encoding/decoding
>>> name (to_base64, from_base64, etc.)?  That doesn't seem any better than
>>> binascii.b2a_base64 .  
>> No, that's why I suggested two separate lists (or dictionaries might be 
>> better).  They can contain the same names, but the lists they are in 
>> determine the context and point to the needed codec.  And that step is 
>> abstracted out by putting it inside the encodings.tostr() and 
>> encodings.tounicode() functions.
>>
>> So either function would call 'base64' from the correct codec list and 
>> get the correct encoding or decoding codec it needs.
> 
> Either the API you have described is incomplete, you haven't noticed the
> directional ambiguity you are describing, or I have completely lost it.

Most likely I gave an incomplete description of the API in this case 
because there are probably several ways to implement it.



>>> What about .reencode and .redecode?  It seems as
>>> though the 're' added as a prefix to .encode and .decode makes it
>>> clearer that you get the same type back as you put in, and it is also
>>> unambiguous to direction.

...

 > I must not be expressing myself very well.
 >
> Right now:
> s.encode() -> s
> s.decode() -> s, u
> u.encode() -> s, u
> u.decode() -> u
> 
> Martin et al's desired change to encode/decode:
> s.decode() -> u
> u.encode() -> s
 >
 > No others.

Which would be similar to the functions I suggested.  The main 
difference is only whether it would be better to have them as methods or 
separate factory functions, and the spelling of the names.  Both have 
their advantages, I think.


>> The method bytes.recode(), always does a byte transformation which can 
>> be almost anything.  It's the context bytes.recode() is used in that 
>> determines what's happening.  In the above cases, it's using an encoding 
>> transformation, so what it's doing is precisely what you would expect by 
>> its context.
> 
> Indeed, there is a translation going on, but it is not clear as to
> whether you are encoding _to_ something or _from_ something.  What does
> s.recode('base64') mean?  Are you encoding _to_ base64 or _from_ base64? 
> That's where the ambiguity lies.

Bengt didn't propose adding .recode() to the string types, but only the 
bytes type.  The byte type would "recode" the bytes using a specific 
transformation.  I believe his view is it's a lower level API than 
strings that can be used to implement the higher level encoding API 
with, not replace the encoding API.  Or that is the way I interpreted 
the suggestion.


>> There isn't a bytes.decode(), since that's just another transformation. 
>> So only the one method is needed.  Which makes it easer to learn.
> 
> But ambiguous.

What's ambiguous about it?  It's no more ambiguous than any math 
operation where you can do it one way with one operation and get your 
original value back by using the same operation with an inverse value.

n2 = n+1; n3 = n2+(-1); n == n3
n2 = n*2; n3 = n2*(.5); n == n3


>> Learning how the current system works comes awfully close to reverse 
>

Re: [Python-Dev] bytes.from_hex()

2006-02-18 Thread Josiah Carlson

Ron Adam <[EMAIL PROTECTED]> wrote:
> Josiah Carlson wrote:
> > Ron Adam <[EMAIL PROTECTED]> wrote:
> >> Josiah Carlson wrote:
> > [snip]
> >>> Again, the problem is ambiguity; what does bytes.recode(something) mean?
> >>> Are we encoding _to_ something, or are we decoding _from_ something? 
> >> This was just an example of one way that might work, but here are my 
> >> thoughts on why I think it might be good.
> >>
> >> In this case, the ambiguity is reduced as far as the encoding and 
> >> decoding operations are concerned.
> >>
> >>   somestring = encodings.tostr( someunicodestr, 'latin-1')
> >>
> >> It's pretty clear what is happening to me.
> >>
> >>  It will encode to a string an object, named someunicodestr, with 
> >> the 'latin-1' encoder.
> > 
> > But now how do you get it back?  encodings.tounicode(..., 'latin-1')?,
> > unicode(..., 'latin-1')?
> 
> Yes, Just do.
> 
>   someunicodestr = encoding.tounicode( somestring, 'latin-1')
> 
> > What about string transformations:
> > somestring = encodings.tostr(somestr, 'base64')
>  >
> > How do we get that back?  encodings.tostr() again is completely
> > ambiguous, str(somestring, 'base64') seems a bit awkward (switching
> > namespaces)?
> 
> In the case where a string is converted to another string. It would 
> probably be best to have a requirement that they all get converted to 
> unicode as an intermediate step.  By doing that it becomes an explicit 
> two-step operation.
> 
>  # string to string encoding
>  u_string = encodings.tounicode(s_string, 'base64')
>  s2_string = encodings.tostr(u_string, 'base64')

Except that makes it even more ambiguous.

Is encodings.tounicode() encoding, or decoding?  According to everything
you have said so far, it would be decoding.  But if I am decoding binary
data, why should it be spending any time as a unicode string?  What do I
mean?

x = f.read() #x contains base-64 encoded binary data
y = encodings.to_unicode(x, 'base64')

y now contains BINARY DATA, except that it is a unicode string

z = encodings.to_str(y, 'latin-1')

Later you define a str_to_str function, which I (or someone else) would
use like:

z = str_to_str(x, 'base64', 'latin-1')

But the trick is that I don't want some unicode string encoded in
latin-1, I want my binary data unencoded.  They may happen to be the
same in this particular example, but that doesn't mean that it makes any
sense to the user.

[...]

> >>> What about .reencode and .redecode?  It seems as
> >>> though the 're' added as a prefix to .encode and .decode makes it
> >>> clearer that you get the same type back as you put in, and it is also
> >>> unambiguous to direction.
> 
> ...
> 
>  > I must not be expressing myself very well.
>  >
> > Right now:
> > s.encode() -> s
> > s.decode() -> s, u
> > u.encode() -> s, u
> > u.decode() -> u
> > 
> > Martin et al's desired change to encode/decode:
> > s.decode() -> u
> > u.encode() -> s
>  >
>  > No others.
> 
> Which would be similar to the functions I suggested.  The main 
> difference is only whether it would be better to have them as methods or 
> separate factory functions and the spelling of the names.  Both have 
> their advantages I think.

While others would disagree, I personally am not a fan of to* or from*
style namings, for either function names (especially in the encodings
module) or methods.  Just a personal preference.

Of course, I don't find the current situation regarding
str/unicode.encode/decode to be confusing either, but maybe it's because
my unicode experience is strictly within the realm of GUI widgets, where
compartmentalization can be easier.


> >> The method bytes.recode(), always does a byte transformation which can 
> >> be almost anything.  It's the context bytes.recode() is used in that 
> >> determines what's happening.  In the above cases, it's using an encoding 
> >> transformation, so what it's doing is precisely what you would expect by 
> >> its context.

[THIS IS THE AMBIGUITY]
> > Indeed, there is a translation going on, but it is not clear as to
> > whether you are encoding _to_ something or _from_ something.  What does
> > s.recode('base64') mean?  Are you encoding _to_ base64 or _from_ base64? 
> > That's where the ambiguity lies.
> 
> Bengt didn't propose adding .recode() to the string types, but only the 
> bytes type.  The byte type would "recode" the bytes using a specific 
> transformation.  I believe his view is it's a lower level API than 
> strings that can be used to implement the higher level encoding API 
> with, not replace the encoding API.  Or that is the way I interpreted 
> the suggestion.

But again, what would the transformation be?  To something?  From
something?  'to_base64', 'from_base64', 'to_rot13' (which happens to be
identical to) 'from_rot13', ...  Saying it would "recode ... using a
specific transformation" is a cop-out, what would the translation be? 
How would it work?  How would it be sp

Re: [Python-Dev] Proposal: defaultdict

2006-02-18 Thread Greg Ewing
Would people perhaps feel better if defaultdict
*wasn't* a subclass of dict, but a distinct mapping
type of its own? That would make it clearer that it's
not meant to be a drop-in replacement for a dict
in arbitrary contexts.

Greg
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Proposal: defaultdict

2006-02-18 Thread Raymond Hettinger
[Greg Ewing]
> Would people perhaps feel better if defaultdict
> *wasn't* a subclass of dict, but a distinct mapping
> type of its own? That would make it clearer that it's
> not meant to be a drop-in replacement for a dict
> in arbitrary contexts.

Absolutely.  That's the right way to avoid Liskov violations from altered 
invariants and API changes.  Besides, with Python's propensity for duck typing, 
there's no reason to subclass when we don't have to.


Raymond




Re: [Python-Dev] Proposal: defaultdict

2006-02-18 Thread Greg Ewing
Bengt Richter wrote:

> My guess is that realistically default_factory will be used
> to make clean code for filling a dict, and then turning the factory
> off if it's to be passed into unknown contexts.

This suggests that maybe the autodict behaviour shouldn't
be part of the dict itself, but provided by a wrapper
around the dict.

Then you can fill the dict through the wrapper, and still
have a normal dict underneath to use for other purposes.
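A minimal sketch of such a wrapper (the class and attribute names here are hypothetical, not from the thread):

```python
class AutoFiller(object):
    # hypothetical wrapper: auto-creates entries during the filling
    # phase, while the underlying dict stays a plain dict
    def __init__(self, factory, target=None):
        self.factory = factory
        self.data = {} if target is None else target

    def __getitem__(self, key):
        if key not in self.data:
            self.data[key] = self.factory()
        return self.data[key]

w = AutoFiller(list)
w['a'].append(1)
w['a'].append(2)
plain = w.data            # an ordinary dict, usable "for other purposes"
assert plain == {'a': [1, 2]}
assert type(plain) is dict
```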

Greg


Re: [Python-Dev] Proposal: defaultdict

2006-02-18 Thread Bengt Richter
On Sat, 18 Feb 2006 10:44:15 +0100 (CET), "Walter Dörwald" 
<[EMAIL PROTECTED]> wrote:

>Guido van Rossum wrote:
>> On 2/17/06, Ian Bicking <[EMAIL PROTECTED]> wrote:
>>> Guido van Rossum wrote:
>>> > d = {}
>>> > d.default_factory = set
>>> > ...
>>> > d[key].add(value)
>>>
>>> Another option would be:
>>>
>>>d = {}
>>>d.default_factory = set
>>>d.get_default(key).add(value)
>>>
>>> Unlike .setdefault, this would use a factory associated with the
>>> dictionary, and no default value would get passed in.
>>> Unlike the proposal, this would not override __getitem__ (not overriding
>>> __getitem__ is really the only difference with the proposal).  It would
>>> be clear reading the code that you were not
>>> implicitly asserting the "key in d" was true.
>>>
>>> "get_default" isn't the best name, but another name isn't jumping out
>>> at me at the moment.  Of course, it is not a Pythonic
>>> argument to say that an existing method should be overridden, or
>>> functionality made nameless simply because we can't think
>>> of a name (looking to anonymous functions of course ;)
>>
>> I'm torn. While trying to implement this I came across some ugliness
>> in PyDict_GetItem() -- it would make sense if this also
>> called
>> on_missing(), but it must return a value without incrementing its
>> refcount, and isn't supposed to raise exceptions -- so what to do if
>> on_missing() returns a value that's not inserted in the
>> dict?
>>
>> If the __getattr__()-like operation that supplies and inserts a
>> dynamic default was a separate method, we wouldn't have this problem.
>>
>> OTOH most reviewers here seem to appreciate on_missing() as a way to do
>> various other ways of altering a dict's
>> __getitem__() behavior behind a caller's back -- perhaps it could even
>> be (ab)used to
>> implement case-insensitive lookup.
>
>I don't like the fact that on_missing()/default_factory can change the
>behaviour of __getitem__, which up to now has been
>something simple and understandable.
>Why don't we put the on_missing()/default_factory functionality into
>get() instead?
>
>d.get(key, default) does what it did before. d.get(key) invokes
>on_missing() (and dict would have default_factory == type(None))
>
OTOH, I forgot why it was desirable in the first place to overload d[k]
with defaulting logic. E.g., why wouldn't d.defaulting[k] be ok to write
when you want the d.default_factory action?

on_missing feels more like a tracing hook though, so maybe it could always
act either way if defined.

Also, for those wanting to avoid lambda:42 as factory, would a callable test
cost a lot? Of course then the default_factory name might require revision.

Regards,
Bengt Richter



[Python-Dev] buildbot is all green

2006-02-18 Thread Neal Norwitz
http://www.python.org/dev/buildbot/

Whoever is first to break the build, buys a round of drinks at PyCon! 
That's over 400 people and counting: 
http://www.python.org/pycon/2006/pycon-attendees.txt

Remember to run the tests *before* checkin. :-)

n


Re: [Python-Dev] Proposal: defaultdict

2006-02-18 Thread Steve Holden
Martin v. Löwis wrote:
> Guido van Rossum wrote:
> 
>>Feedback?
> 
> 
> I would like this to be part of the standard dictionary type,
> rather than being a subtype.
> 
> d.setdefault([]) (one argument) should install a default value,
> and d.cleardefault() should remove that setting; d.default
> should be read-only. Alternatively, d.default could be assignable
> and del-able.
> 
The issue with setting the default this way is that a copy would have to 
be created if the behavior was to differ from the sometimes-confusing 
default argument behavior for functions.


> Also, I think has_key/in should return True if there is a default.
> 
It certainly seems desirable to see True where d[some_key] doesn't raise 
an exception, but one could argue either way.

regards
  Steve
-- 
Steve Holden   +44 150 684 7255  +1 800 494 3119
Holden Web LLC www.holdenweb.com
PyCon TX 2006  www.python.org/pycon/



Re: [Python-Dev] Proposal: defaultdict

2006-02-18 Thread Steve Holden
Guido van Rossum wrote:
> On 2/16/06, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> 
>>Over lunch with Alex Martelli, he proposed that a subclass of dict
>>with this behavior (but implemented in C) would be a good addition to
>>the language. It looks like it wouldn't be hard to implement. It could
>>be a builtin named defaultdict. The first, required, argument to the
>>constructor should be the default value. Remaining arguments (even
>>keyword args) are passed unchanged to the dict constructor.
> 
> 
> Thanks for all the constructive feedback. Here are some responses and
> a new proposal.
> 
> - Yes, I'd like to kill setdefault() in 3.0 if not sooner.
> 
> - It would indeed be nice if this was an optional feature of the
> standard dict type.
> 
> - I'm ignoring the request for other features (ordering, key
> transforms). If you want one of these, write a PEP!
> 
> - Many, many people suggested to use a factory function instead of a
> default value. This is indeed a much better idea (although slightly
> more cumbersome for the simplest cases).
> 
One might think about calling it if it were callable, otherwise using it 
literally. Of course this would require jiggery-pokery in the cases 
where you actually *wanted* the default value to be a callable (you'd 
have to provide a callable to return the callable as a default).

> - Some people seem to think that a subclass constructor signature must
> match the base class constructor signature. That's not so. The
> subclass constructor must just be careful to call the base class
> constructor with the correct arguments. Think of the subclass
> constructor as a factory function.
> 
True, but then this does get in the way of treating the base dict and 
its defaulting subtype polymorphically. That might not be a big issue.

> - There's a fundamental difference between associating the default
> value with the dict object, and associating it with the call. So
> proposals to invent a better name/signature for setdefault() don't
> compete. (As to one specific such proposal, adding an optional bool as
> the 3rd argument to get(), I believe I've explained enough times in
> the past that flag-like arguments that always get a constant passed in
> at the call site are a bad idea and should usually be refactored into
> two separate methods.)
> 
> - The inconsistency introduced by __getitem__() returning a value for
> keys while get(), __contains__(), and keys() etc. don't show it,
> cannot be resolved usefully. You'll just have to live with it.
> Modifying get() to do the same thing as __getitem__() doesn't seem
> useful -- it just takes away a potentially useful operation.
> 
> So here's a new proposal.
> 
> Let's add a generic missing-key handling method to the dict class, as
> well as a default_factory slot initialized to None. The implementation
> is like this (but in C):
> 
> def on_missing(self, key):
>   if self.default_factory is not None:
> value = self.default_factory()
> self[key] = value
> return value
>   raise KeyError(key)
> 
> When __getitem__() (and *only* __getitem__()) finds that the requested
> key is not present in the dict, it calls self.on_missing(key) and
> returns whatever it returns -- or raises whatever it raises.
> __getitem__() doesn't need to raise KeyError any more, that's done by
> on_missing().
> 
> The on_missing() method can be overridden to implement any semantics
> you want when the key isn't found: return a value without inserting
> it, insert a value without copying it, only do it for certain key
> types/values, make the default incorporate the key, etc.
> 
> But the default implementation is designed so that we can write
> 
> d = {}
> d.default_factory = list
> 
> to create a dict that inserts a new list whenever a key is not found
> in __getitem__(), which is most useful in the original use case:
> implementing a multiset so that one can write
> 
> d[key].append(value)
> 
> to add a new key/value to the multiset without having to handle the
> case separately where the key isn't in the dict yet. This also works
> for sets instead of lists:
> 
> d = {}
> d.default_factory = set
> ...
> d[key].add(value)
> 
This seems like a very good compromise.
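For readers following along, the quoted proposal can be modeled in pure Python with a dict subclass (the real proposal is for the C-level dict type; this is only illustrative):

```python
class autodict(dict):
    # illustrative Python model of the proposal: __getitem__ falls
    # back to on_missing(), which consults default_factory
    default_factory = None

    def on_missing(self, key):
        if self.default_factory is not None:
            value = self.default_factory()
            self[key] = value
            return value
        raise KeyError(key)

    def __getitem__(self, key):
        try:
            return dict.__getitem__(self, key)
        except KeyError:
            return self.on_missing(key)

d = autodict()
d.default_factory = list
d['key'].append('value')      # no special-casing of the first hit
assert d['key'] == ['value']
```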

[non-functional alternatives ...]
> 
regards
  Steve
-- 
Steve Holden   +44 150 684 7255  +1 800 494 3119
Holden Web LLC www.holdenweb.com
PyCon TX 2006  www.python.org/pycon/



Re: [Python-Dev] Proposal: defaultdict

2006-02-18 Thread Josiah Carlson

Greg Ewing <[EMAIL PROTECTED]> wrote:
> Bengt Richter wrote:
> 
> > My guess is that realistically default_factory will be used
> > to make clean code for filling a dict, and then turning the factory
> > off if it's to be passed into unknown contexts.
> 
> This suggests that maybe the autodict behaviour shouldn't
> be part of the dict itself, but provided by a wrapper
> around the dict.
> 
> Then you can fill the dict through the wrapper, and still
> have a normal dict underneath to use for other purposes.

I prefer this to changing dictionaries directly.  The actual wrapper
could sit in the collections module, ready for subclassing/replacement
of the on_missing method.

 - Josiah



Re: [Python-Dev] Proposal: defaultdict

2006-02-18 Thread Raymond Hettinger
> > Also, I think has_key/in should return True if there is a default.

> It certainly seems desirable to see True where d[some_key]
> doesn't raise an exception, but one could argue either way.

Some things can be agreed by everyone:

* if __contains__ always returns True, then it is a useless feature (since 
scripts containing a line such as "if k in dd" can always eliminate that line 
without affecting the algorithm).

* if defaultdicts are supposed to be drop-in dict substitutes, then having
__contains__ always return True will violate basic dict invariants:
   del d[some_key]
   assert some_key not in d


Raymond 


Re: [Python-Dev] bytes.from_hex()

2006-02-18 Thread Ron Adam
Josiah Carlson wrote:
> Ron Adam <[EMAIL PROTECTED]> wrote:


> Except that ambiguates it even further.
>
> Is encodings.tounicode() encoding, or decoding?  According to everything
> you have said so far, it would be decoding.  But if I am decoding binary
> data, why should it be spending any time as a unicode string?  What do I
> mean?

Encoding and decoding are relative concepts.  It's all encoding from one
thing to another.  Whether it's "decoding" or "encoding" depends on the
relationship of the current encoding to a standard encoding.

The confusion introduced by "decode" is when the 'default_encoding'
changes, will change, or is unknown.


> x = f.read() #x contains base-64 encoded binary data
> y = encodings.to_unicode(x, 'base64')
> 
> y now contains BINARY DATA, except that it is a unicode string

No, that wasn't what I was describing.  You get a Unicode string object
as the result, not a bytes object with binary data.  See the toy example
at the bottom.


> z = encodings.to_str(y, 'latin-1')
> 
> Later you define a str_to_str function, which I (or someone else) would
> use like:
> 
> z = str_to_str(x, 'base64', 'latin-1')
> 
> But the trick is that I don't want some unicode string encoded in
> latin-1, I want my binary data unencoded.  They may happen to be the
> same in this particular example, but that doesn't mean that it makes any
> sense to the user.

If you want bytes then you would use the bytes() type to get bytes
directly.  Not encode or decode.

 binary_unicode = bytes(unicode_string)

The exact byte order and representation would need to be decided by the
python developers in this case.  The internal representation, 
'unicode-internal', is UCS-2, I believe.



>> It's no more ambiguous than any math 
>> operation where you can do it one way with one operation and get your 
>> original value back with the same operation by using an inverse value.
>>
>> n2 = n+1; n3 = n2+(-1); n == n3
>> n2 = n*2; n3 = n2*(.5); n == n3
> 
> Ahh, so you are saying 'to_base64' and 'from_base64'.  There is one
> major reason why I don't like that kind of a system: I can't just say
> encoding='base64' and use str.encode(encoding) and str.decode(encoding),
> I necessarily have to use, str.recode('to_'+encoding) and
> str.recode('from_'+encoding) .  Seems a bit awkward.

Yes, but the encodings API could abstract out the 'to_base64' and
'from_base64' so you can just say 'base64' and have it work either way.

Maybe a toy "incomplete" example might help.



# in module bytes.py or someplace else.
class bytes(list):
   """
   bytes methods defined here
   """



# in module encodings.py

# using a dict of tuples, but other solutions would
# work just as well.
unicode_codecs = {
   'base64': ('from_base64', 'to_base64'),
   }

def tounicode(obj, from_codec):
    b = bytes(obj)
    b = b.recode(unicode_codecs[from_codec][0])
    return unicode(b)

def tostr(obj, to_codec):
    b = bytes(obj)
    b = b.recode(unicode_codecs[to_codec][1])
    return str(b)



# in your application

import encodings

... a bunch of code ...

u = encodings.tounicode(s, 'base64')

# or if going the other way

s = encodings.tostr(u, 'base64')



Does this help?  Is the relationship between the bytes object and the
encodings API clearer here?  If not maybe we should discuss it further
off line.


Cheers,
Ronald Adam



Re: [Python-Dev] Proposal: defaultdict

2006-02-18 Thread Steve Holden
Martin v. Löwis wrote:
> Adam Olsen wrote:
> 
>>Still -1.  It's better, but it violates the principle of encapsulation
>>by mixing how-you-use-it state with what-it-stores state.  In doing
>>that it has the potential to break an API documented as accepting a
>>dict.  Code that expects d[key] to raise an exception (and catches the
>>resulting KeyError) will now silently "succeed".
> 
> 
> Of course it will, and without quotes. That's the whole point.
> 
> 
>>I believe that necessitates a PEP to document it.
> 
> 
> You are missing the rationale of the PEP process. The point is
> *not* documentation. The point of the PEP process is to channel
> and collect discussion, so that the BDFL can make a decision.
> The BDFL is not bound at all to the PEP process.
> 
> To document things, we use (or should use) documentation.
> 
>
One could wish this ideal had been the case for the import extensions 
defined in PEP 302.

regards
  Steve
-- 
Steve Holden   +44 150 684 7255  +1 800 494 3119
Holden Web LLC www.holdenweb.com
PyCon TX 2006  www.python.org/pycon/



Re: [Python-Dev] buildbot is all green

2006-02-18 Thread Benji York
Neal Norwitz wrote:
> http://www.python.org/dev/buildbot/

If there's interest in slightly nicer buildbot CSS (something like 
http://buildbot.zope.org/) I'd be glad to contribute.
--
Benji York


Re: [Python-Dev] Proposal: defaultdict

2006-02-18 Thread Terry Reedy
> Quoting [EMAIL PROTECTED]:
>> The only question in my mind is whether or not getting a non-existent 
>> value
>> under the influence of a given default value should stick that value in 
>> the
>> dictionary or not.

It seems to me that there are at least two types of default dicts, which 
have opposite answers to that question.

One is a 'universal dict' that maps every key to something -- the default 
if nothing else.  That should not have the default ever explicitly entered. 
Udict.keys() should only give the keys *not* mapped to the universal value.

Another is the accumulator dict.  The default value is the identity (0, [], 
or whatever) for the type of accumulation.  An adict must have the identity 
added, even though that null will usually be immediately incremented by +=1 
or .append(ob) or whatever.

Guido's last proposal was for the default default_dict to cater to the 
second type (and others needing the same behavior) while catering to the 
first by making the default fill-in method over-rideable.

It we go with, for instance, wrappers in the collections module instead of 
modification of dict, then perhaps there should be at least two wrappers 
included, with each of these two behaviors.
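The two flavors described above can be sketched as wrappers like these (class names are hypothetical):

```python
class UniversalDict(dict):
    # 'universal' flavor: every lookup succeeds, but misses are never
    # stored, so keys() reports only the explicitly-entered keys
    def __init__(self, default, *args, **kwargs):
        dict.__init__(self, *args, **kwargs)
        self.default = default

    def __getitem__(self, key):
        return dict.get(self, key, self.default)

class AccumulatorDict(dict):
    # 'accumulator' flavor: a missing key gets the identity value
    # inserted, ready for in-place accumulation
    def __init__(self, factory, *args, **kwargs):
        dict.__init__(self, *args, **kwargs)
        self.factory = factory

    def __getitem__(self, key):
        if key not in self:
            self[key] = self.factory()
        return dict.__getitem__(self, key)

u = UniversalDict(0)
_ = u['missing']
assert 'missing' not in u          # nothing was stored

a = AccumulatorDict(list)
a['k'].append(1)
assert 'k' in a and a['k'] == [1]  # the identity was stored, then used
```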

Terry Jan Reedy





Re: [Python-Dev] ssize_t branch merged

2006-02-18 Thread Martin v. Löwis
Travis E. Oliphant wrote:
> Why not just
> 
> #if SIZEOF_SIZE_T == 2
> #define PY_SSIZE_T_MAX 0x7fff
> #elif SIZEOF_SIZE_T == 4
> #define PY_SSIZE_T_MAX 0x7fffffff
> #elif SIZEOF_SIZE_T == 8
> #define PY_SSIZE_T_MAX 0x7fffffffffffffff
> #elif SIZEOF_SIZE_T == 16
> #define PY_SSIZE_T_MAX 0x7fffffffffffffffffffffffffffffff
> #endif

That would not work: 0x7fffffffffffffff is not a valid
integer literal. 0x7fffffffffffffffL might work,
or 0x7fffffffffffffffLL, or 0x7fffffffffffffffi64.
Which of these is correct depends on the compiler.

How to spell 128-bit integral constants, I don't know;
it appears that MS foresees a i128 suffix for them.

Regards,
Martin


Re: [Python-Dev] Proposal: defaultdict

2006-02-18 Thread Martin v. Löwis
Raymond Hettinger wrote:
>>>Also, I think has_key/in should return True if there is a default.
> * if __contains__ always returns True, then it is a useless feature (since 
> scripts containing a line such as "if k in dd" can always eliminate that line 
> without affecting the algorithm).

If you mean "if __contains__ always returns True for a default dict,
then it is a useless feature", I disagree. The code using "if k in dd"
cannot be eliminated if you don't know that you have a default dict.

> * if defaultdicts are supposed to be drop-in dict substitutes, then having
> __contains__ always return True will violate basic dict invariants:
>del d[some_key]
>assert some_key not in d

If you have a default value, you cannot ultimately del a key. This
sequence is *not* a basic mapping invariant. If it was, then it would
be also an invariant that, after del d[some_key], d[some_key] will
raise a KeyError. This kind of invariant doesn't take into account
that there might be a default value.

Regards,
Martin


Re: [Python-Dev] buildbot is all green

2006-02-18 Thread Nick Coghlan
Neal Norwitz wrote:
> http://www.python.org/dev/buildbot/
> 
> Whoever is first to break the build, buys a round of drinks at PyCon! 
> That's over 400 people and counting: 
> http://www.python.org/pycon/2006/pycon-attendees.txt
> 
> Remember to run the tests *before* checkin. :-)

I don't think we can blame Tim's recent checkins for test_logging subsequently 
breaking on Solaris though ;)

There still seems to be something a bit temperamental in that test. . .

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://www.boredomandlaziness.org


Re: [Python-Dev] buildbot is all green

2006-02-18 Thread Martin v. Löwis
Benji York wrote:
>>http://www.python.org/dev/buildbot/
> 
> 
> If there's interest in slightly nicer buildbot CSS (something like 
> http://buildbot.zope.org/) I'd be glad to contribute.

I personally don't care much about the visual look of web pages.
However, people have commented that the buildbot page is ugly,
so yes, please do contribute something.

Bonus points for visually separating the "trunk" columns from
the "2.4" columns. Would a vertical line be appropriate? Bigger
spacing?

Regards,
Martin


Re: [Python-Dev] buildbot is all green

2006-02-18 Thread Martin v. Löwis
Neal Norwitz wrote:
> http://www.python.org/dev/buildbot/

Unfortunately, test_logging still fails sporadically on Solaris.

Regards,
Martin


Re: [Python-Dev] Proposal: defaultdict

2006-02-18 Thread Raymond Hettinger
[Martin v. Löwis]
> If you have a default value, you cannot ultimately del a key. This
> sequence is *not* a basic mapping invariant.

You believe that key deletion is not basic to mappings?


> This kind of invariant doesn't take into account
> that there might be a default value.

Precisely.  Therefore, a defaultdict subclass violates the Liskov Substitution 
Principle.

Of course, the __del__ followed by __contains__ sequence is not the only invariant 
that is thrown off.  There are plenty of examples.  Here's one that is 
absolutely basic to the method's contract:

k, v = dd.popitem()
assert k not in dd

Any code that was expecting a dictionary and uses popitem() as a means of 
looping over and consuming entries will fail.

No one should kid themselves that a default dictionary is a drop-in substitute. 
Much of the dict's API has an ambiguous meaning when applied to defaultdicts.
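The failing invariant above only materialises under a "membership is universal" reading of __contains__. A minimal, hypothetical sketch (the class name is illustrative, and this is not anyone's proposed implementation) makes the breakage concrete:

```python
# Hypothetical sketch: a dict whose __contains__ reports True for every
# key, on the theory that a default gives every key a value.
class UniversalDict(dict):
    def __init__(self, default):
        dict.__init__(self)
        self.default = default
    def __getitem__(self, key):
        try:
            return dict.__getitem__(self, key)
        except KeyError:
            return self.default
    def __contains__(self, key):
        return True  # every key "has" a value under this reading

dd = UniversalDict(0)
dd['a'] = 1
k, v = dd.popitem()
assert k in dd  # the consumption invariant "k not in dd" now fails
```

A subclass that leaves __contains__ alone (inheriting it from dict) would preserve the popitem() invariant; the breakage is a property of the universal-membership reading, not of defaulting as such.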

If all keys are in-theory predefined, what is the meaning of len(dd)?

Should dd.items() include any entries where the value is equal to the default, 
or should the collection never store those?  If the former, then how do you 
access the entries without looping over the whole contents?  If the latter, 
then do you worry that "dd[k]=v" does not imply "(k,v) in dd.items()"?


Raymond 



Re: [Python-Dev] buildbot is all green

2006-02-18 Thread Georg Brandl
Neal Norwitz wrote:
> http://www.python.org/dev/buildbot/
> 
> Whoever is first to break the build, buys a round of drinks at PyCon! 
> That's over 400 people and counting: 
> http://www.python.org/pycon/2006/pycon-attendees.txt
> 
> Remember to run the tests *before* checkin. :-)

Don't we have a Windows slave yet?

Georg



Re: [Python-Dev] Proposal: defaultdict

2006-02-18 Thread Martin v. Löwis
Raymond Hettinger wrote:
>> If you have a default value, you cannot ultimately del a key. This
>> sequence is *not* a basic mapping invariant.
> 
> 
> You believe that key deletion is not basic to mappings?

No, not in the sense that the key will go away through deletion.
I view a mapping as a modifiable partial function. There is some
initial key/value association (in a classic mapping, it is initially
empty), and then there are modifications. Key deletion means to
reset the key to the initial association.

> Of course, the __del__ followed by __contains__ sequence is not the only
> invariant that is thrown off.  There are plenty of examples.  Here's one
> that is absolutely basic to the method's contract:
> 
> k, v = dd.popitem()
> assert k not in dd
> 
> Any code that was expecting a dictionary and uses popitem() as a means
> of looping over and consuming entries will fail.

Well, code that loops over a dictionary using popitem typically
terminates when the dictionary becomes false (or its length becomes
zero). That code wouldn't be affected by the behaviour of "in".
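That consumption idiom can be sketched as follows (an illustrative snippet, not stdlib code); note that the loop is driven by the dict's truth value, so a default-aware __contains__ never enters into it:

```python
# Consume a dict with popitem(), terminating on emptiness rather than
# on membership tests.
d = {'a': 1, 'b': 2}
consumed = {}
while d:              # false once the dict is empty
    k, v = d.popitem()
    consumed[k] = v
```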

> No one should kid themselves that a default dictionary is a drop-in
> substitute. Much of the dict's API has an ambiguous meaning when applied
> to defaultdicts.

Right. But it is only ambiguous until specified. Of course, in the face
of ambiguity, refuse the temptation to guess.

> If all keys are in-theory predefined, what is the meaning of len(dd)?

Taking my definition from the beginning of the message, it is the number
of keys that have been modified from the initial mapping.

> Should dd.items() include any entries where the value is equal to the
> default or should the collection never store those?

It should include all modified items, and none of the unmodified ones.
Explicitly assigning the default value still makes the entry modified;
you need to del it to set it back to "unmodified".
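A minimal sketch of this "modifiable partial function" reading (the class name is illustrative, and this is not the proposed C implementation):

```python
# Lookups fall back to a fixed default; len()/items()/del see only the
# keys that have been explicitly modified.
class PartialFunctionDict(dict):
    def __init__(self, default):
        dict.__init__(self)
        self.default = default
    def __getitem__(self, key):
        try:
            return dict.__getitem__(self, key)
        except KeyError:
            return self.default  # unmodified keys map to the default

pf = PartialFunctionDict(0)
pf['a'] = 0                      # assigning the default still "modifies"
assert len(pf) == 1 and ('a', 0) in pf.items()
del pf['a']                      # deletion resets 'a' to the default
assert len(pf) == 0 and pf['a'] == 0
```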

> If the former, then
> how do you access the entries without looping over the whole contents? 

Not sure I understand the question. You use d[k] to access an entry.

Regards,
Martin


Re: [Python-Dev] Proposal: defaultdict

2006-02-18 Thread Raymond Hettinger
[Terry Reedy]
> One is a 'universal dict' that maps every key to something -- the default if 
> nothing else.  That should not have the default ever explicitly entered. 
> Udict.keys() should only give the keys *not* mapped to the universal value.

Would you consider it a mapping invariant that "k in dd" implies "k in 
dd.keys()"?

Is the notion of __contains__ at odds with notion of universality?


Raymond 



Re: [Python-Dev] Proposal: defaultdict

2006-02-18 Thread Josiah Carlson
"Raymond Hettinger" <[EMAIL PROTECTED]> wrote:
> [Martin v. Löwis]
> > This kind of invariant doesn't take into account
> > that there might be a default value.
> 
> Precisely.  Therefore, a defaultdict subclass violates the Liskov 
> Substitution 
> Principle.

class defaultdict(dict):
    def __getitem__(self, key):
        try:
            return dict.__getitem__(self, key)
        except KeyError:
            return self.on_missing(key)
    def on_missing(self, key):
        if not hasattr(self, 'default') or not callable(self.default):
            raise KeyError, key
        r = self[key] = self.default()
        return r

In my opinion, the above implementation as a subclass "does the right
thing" in regards to __del__, __contains__, get, pop, popitem, __len__,
has_key, and anything else I can think of.  Does it violate the Liskov
Substitution Principle?  Yes, but only if user code relies on dd[key]
raising a KeyError on a lack of a key.  This can be easily remedied by
removing the default when it is unneeded, at which point, you get your
Liskov Substitution.


> Of course, the __del__ followed __contains__ sequence is not the only 
> invariant 
> that is thrown-off.  There are plenty of examples.  Here's one that is 
> absolutely basic to the method's contract:
> 
> k, v = dd.popitem()
> assert k not in dd
> 
> Any code that was expecting a dictionary and uses popitem() as a means of 
> looping over and consuming entries will fail.


>>> a = defaultdict()
>>> a.default = list
>>> a['hello']
[]
>>> k, v = a.popitem()
>>> assert k not in a
>>> 

Seems to work for the above implementation.


> No one should kid themselves that a default dictionary is a drop-in 
> substitute. 
> Much of the dict's API has an ambiguous meaning when applied to defaultdicts.

Actually, if one is careful, the dict's API is completely unchanged,
except for direct access to the object via b = a[i].

>>> del a['hello']
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
KeyError: 'hello'
>>> 'hello' in a
False
>>> a.get('hello')
>>> a.pop('hello')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
KeyError: 'pop(): dictionary is empty'
>>> a.popitem()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
KeyError: 'popitem(): dictionary is empty'
>>> len(a)
0
>>> a.has_key('hello')
False

> If all keys are in-theory predefined, what is the meaning of len(dd)?

It depends on the sequence of actions.  Play around with the above
defaultdict implementation.  From what I understood of Guido's original
post, this is essentially what he was proposing, only implemented in C.

> Should dd.items() include any entries where the value is equal to the
> default or should the collection never store those?

Yes, it should store any value which was stored via 'dd[k]=v', or any
default value created via access by 'v=dd[k]'.

> If the former, then how do you access 
> the entries without looping over the whole contents?

Presumably one is looking for a single kind of default (empty list, 0,
etc.) because one wanted to accumulate into them, similar to one of the
following...

for item, value in input:
    try:
        d[item] += value
        #or d[item].append(value)
    except KeyError:
        d[item] = value
        #or d[item] = [value]

which becomes

for item, value in input:
    dd[item] += value
    #or dd[item].append(value)

Once accumulation has occurred, iteration over them via .iteritems(),
.items(), .popitem(), etc., would progress exactly the same way as with
a regular dictionary.  If the code which is using the accumulated data
does things like...

for key in wanted_keys:
    try:
        value = dd[key]
    except KeyError:
        continue
    #do something nontrivial with value

rather than...

for key in wanted_keys:
    if key not in dd:
        continue
    value = dd[key]
    #do something nontrivial with value

Then the user has at least three options to make it 'work right':
1. User can change to using 'in' to iterate rather than relying on a
KeyError.
2. User could remember to remove the default.
3. User can create a copy of the default dictionary via dict(dd) and
pass it into the code which relies on the non-defaulting dictionary.
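Option 2 can be illustrated with a Python 3 flavoured transcription of the subclass sketched earlier in this thread (a sketch in that spirit, not an exact quote of it): deleting the default attribute restores plain-dict KeyError behaviour, recovering substitutability on demand.

```python
class DefaultDict(dict):
    def __getitem__(self, key):
        try:
            return dict.__getitem__(self, key)
        except KeyError:
            return self.on_missing(key)
    def on_missing(self, key):
        if not callable(getattr(self, 'default', None)):
            raise KeyError(key)
        r = self[key] = self.default()
        return r

dd = DefaultDict()
dd.default = list
dd['x'].append(1)     # default kicks in, a fresh list is stored
del dd.default        # option 2: remove the default...
try:
    dd['missing']
except KeyError:
    pass              # ...and missing keys raise again, as in a plain dict
```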


> If the latter, then do you 
> worry that "dd[k]=v" does not imply "(k,v) in dd.items()"?

I personally wouldn't want the latter.

My post probably hasn't convinced you, but much of the confusion, I
believe, is based on Martin's original belief that 'k in dd' should
always return true if there is a default.  One can argue that way, but
then you end up on the circular train of thought that gets you to "you
can't do anything useful if that is the case, .popitem() doesn't work,
len() is undefined, ...".  Keep it simple, keep it sane.

 - Josiah
