Re: [Python-Dev] bytes.from_hex()

2006-03-02 Thread Ron Adam
Josiah Carlson wrote:
> Greg Ewing <[EMAIL PROTECTED]> wrote:
>>u = unicode(b)
>>u = unicode(b, 'utf8')
>>b = bytes['utf8'](u)
>>u = unicode['base64'](b)   # encoding
>>b = bytes(u, 'base64') # decoding
>>u2 = unicode['piglatin'](u1)   # encoding
>>u1 = unicode(u2, 'piglatin')   # decoding
> 
> Your provided semantics feel cumbersome and confusing to me, as compared
> with str/unicode.encode/decode() .
> 
>  - Josiah

This uses syntax to determine the direction of encoding.  It would be 
easier and clearer to just require two arguments or a tuple.

  u = unicode(b, 'encode', 'base64')
  b = bytes(u, 'decode', 'base64')

  b = bytes(u, 'encode', 'utf-8')
  u = unicode(b, 'decode', 'utf-8')

  u2 = unicode(u1, 'encode', 'piglatin')
  u1 = unicode(u2, 'decode', 'piglatin')



It looks somewhat cleaner if you combine them in a path style string.

  b = bytes(u, 'encode/utf-8')
  u = unicode(b, 'decode/utf-8')
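
A minimal sketch of how such a path-style argument might be parsed, written for Python 3 (where str and bytes stand in for the unicode and bytes of the proposal). The convert() helper and its 'direction/codec' format are hypothetical illustrations of the idea above, not an existing API; only codecs.encode() and codecs.decode() are real library calls.

```python
import codecs

def convert(obj, spec):
    """Apply a 'direction/codec' spec such as 'encode/utf-8' to obj.

    Hypothetical helper: the spec format is an assumption made for
    illustration, not part of any Python release.
    """
    direction, _, codec = spec.partition("/")
    if direction == "encode":
        return codecs.encode(obj, codec)
    if direction == "decode":
        return codecs.decode(obj, codec)
    raise ValueError("direction must be 'encode' or 'decode': %r" % spec)

b = convert(u"caf\u00e9", "encode/utf-8")   # text -> bytes
u = convert(b, "decode/utf-8")              # bytes -> text
```

The direction is carried by data rather than by syntax, which is the point being argued; a misspelled direction fails loudly with ValueError.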

Ron

___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Arena-freeing obmalloc ready for testing

2006-03-02 Thread Tim Peters
[Tim Peters]
...
> Only obmalloc.c is changed in that branch, and you can get it directly from:
>
> 
> 

Heck no -- sorry, that pins it to an out-of-date revision.  Use the shorter



to get the current revision.


Re: [Python-Dev] Arena-freeing obmalloc ready for testing

2006-03-02 Thread Fredrik Lundh
Tim Peters wrote:

> For simpler fun, run this silly little program, and look at memory
> consumption at the prompts:
>
> """
> x = []
> for i in xrange(100):
>     x.append([])
> raw_input("full ")
> del x[:]
> raw_input("empty ")
> """
>
> For example, in a release build on WinXP, VM size is about 48MB at the
> "full" prompt, and drops to 3MB at the "empty" prompt.

hurray!

can we have more Tim sprints, please ?







[Python-Dev] ref leak w/except hooks

2006-03-02 Thread Neal Norwitz
The following code leaks a reference.  Original test case from
Lib/test/test_sys.py in test_original_excepthook.

import sys, StringIO
eh = sys.__excepthook__
try:
  raise ValueError(42)
except ValueError, exc:
  exc_type, exc_value, exc_tb = sys.exc_info()
  eh(exc_type, None, None)
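
One rough way to check for this kind of leak without a debug build is to compare sys.getrefcount() on a probe object before and after repeated calls. The sketch below is Python 3 and CPython-specific; refcount_delta(), Marker, leaky() and clean() are made-up names for illustration and are not taken from test_sys.py.

```python
import sys

def refcount_delta(func, obj, repeats=10):
    """Return how many extra references to obj survive repeats calls of func(obj)."""
    before = sys.getrefcount(obj)
    for _ in range(repeats):
        func(obj)
    return sys.getrefcount(obj) - before

class Marker(object):
    """Fresh probe object, so nothing else holds references to it."""

_stash = []

def leaky(obj):
    _stash.append(obj)      # keeps one reference per call: simulates a leak

def clean(obj):
    repr(obj)               # uses obj but keeps no reference

marker = Marker()
leaked = refcount_delta(leaky, marker)
not_leaked = refcount_delta(clean, marker)
```

A delta that grows with the number of calls points at a leak; a delta of zero does not prove there is none, since the leaked reference may be to some other object.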


Re: [Python-Dev] bytes.from_hex()

2006-03-02 Thread Just van Rossum
Ron Adam wrote:

> Josiah Carlson wrote:
> > Greg Ewing <[EMAIL PROTECTED]> wrote:
> >>u = unicode(b)
> >>u = unicode(b, 'utf8')
> >>b = bytes['utf8'](u)
> >>u = unicode['base64'](b)   # encoding
> >>b = bytes(u, 'base64') # decoding
> >>u2 = unicode['piglatin'](u1)   # encoding
> >>u1 = unicode(u2, 'piglatin')   # decoding
> > 
> > Your provided semantics feel cumbersome and confusing to me, as
> > compared with str/unicode.encode/decode() .
> > 
> >  - Josiah
> 
> This uses syntax to determine the direction of encoding.  It would be 
> easier and clearer to just require two arguments or a tuple.
> 
>   u = unicode(b, 'encode', 'base64')
>   b = bytes(u, 'decode', 'base64')
> 
>   b = bytes(u, 'encode', 'utf-8')
>   u = unicode(b, 'decode', 'utf-8')
> 
>   u2 = unicode(u1, 'encode', 'piglatin')
>   u1 = unicode(u2, 'decode', 'piglatin')
> 
> 
> 
> It looks somewhat cleaner if you combine them in a path style string.
> 
>   b = bytes(u, 'encode/utf-8')
>   u = unicode(b, 'decode/utf-8')

It gets from bad to worse :(

I always liked the asymmetry between

u = unicode(s, "utf8")

and

s = u.encode("utf8")

which I think was the original design of the unicode API. Kudos to
whoever came up with that.

When I saw

b = bytes(u, "utf8")

mentioned for the first time, I thought: why on earth must the bytes
constructor be coupled to the unicode API?!?! It makes no sense to me
whatsoever. Bytes have so much more use besides encoded text.

I believe (please correct me if I'm wrong) that the encoding argument of
bytes() was invented to make it easier to write byte literals. Perhaps a
true bytes literal notation is in order after all?

My preference for bytes -> unicode -> bytes API would be this:

u = unicode(b, "utf8")  # just like we have now
b = u.tobytes("utf8")   # like u.encode(), but being explicit
                        # about the resulting type
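
For comparison, the same asymmetric pair can be written in Python 3, which postdates this thread: the text constructor decodes and a method on the text object encodes. This is only a sketch of the shape of the API, with str playing the role of unicode; the tobytes() name proposed above is not used because no such method exists.

```python
data = b"na\xc3\xafve"            # UTF-8 encoded bytes
text = str(data, "utf8")          # constructor decodes: bytes -> text
again = text.encode("utf8")       # method encodes: text -> bytes
```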

As to base64, while it works as a codec ("Why a base64 codec? Because we
can!"), I don't find it a natural API at all, for such conversions.

(I do however agree with Greg Ewing that base64 encoded data is text,
not ascii-encoded bytes ;-)

Just-my-2-cts


Re: [Python-Dev] bytes thoughts

2006-03-02 Thread Nick Coghlan
Greg Ewing wrote:
> Baptiste Carvello wrote:
> 
>> while manipulating binary data will happen mostly with bytes objects, some 
>> operations are better done with ints, like the bit manipulations with the 
>> &|~^ 
>> operators.
> 
> Why not just support bitwise operations directly
> on the bytes object?
> 

+1!
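
A sketch of what element-wise bitwise operations on bytes could look like, written as plain Python 3 helper functions. The names bytes_xor(), bytes_and() and bytes_invert() are hypothetical; the bytes type itself does not define the & | ^ ~ operators, so this only illustrates the suggestion.

```python
def _check(a, b):
    if len(a) != len(b):
        raise ValueError("operands must have the same length")

def bytes_xor(a, b):
    """XOR two equal-length bytes objects, byte by byte."""
    _check(a, b)
    return bytes(x ^ y for x, y in zip(a, b))

def bytes_and(a, b):
    """AND two equal-length bytes objects, byte by byte."""
    _check(a, b)
    return bytes(x & y for x, y in zip(a, b))

def bytes_invert(a):
    """Flip every bit; the mask keeps each result in range(256)."""
    return bytes(~x & 0xFF for x in a)

flipped = bytes_xor(b"\x0f\xf0", b"\xff\xff")
```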

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://www.boredomandlaziness.org


Re: [Python-Dev] ref leak w/except hooks

2006-03-02 Thread Aahz
On Thu, Mar 02, 2006, Neal Norwitz wrote:
>
> The following code leaks a reference.  Original test case from
> Lib/test/test_sys.py in test_original_excepthook.

Did you submit a SF bug report?
-- 
Aahz ([EMAIL PROTECTED])   <*> http://www.pythoncraft.com/

"19. A language that doesn't affect the way you think about programming,
is not worth knowing."  --Alan Perlis


Re: [Python-Dev] defaultdict and on_missing()

2006-03-02 Thread Aahz
On Wed, Mar 01, 2006, Guido van Rossum wrote:
>
> Operations with two or more arguments are often better expressed as
> function calls -- for example, map() and filter() don't make much
> sense as methods on callables or sequences.

OTOH, my personal style is to always use re.compile() because I can
never remember the order of arguments for re.match()/re.search().
-- 
Aahz ([EMAIL PROTECTED])   <*> http://www.pythoncraft.com/

"19. A language that doesn't affect the way you think about programming,
is not worth knowing."  --Alan Perlis


Re: [Python-Dev] Arena-freeing obmalloc ready for testing

2006-03-02 Thread Nick Craig-Wood
On Thu, Mar 02, 2006 at 01:43:00AM -0600, Tim Peters wrote:
> I'm optimistic, because the new test compares a quantity already being
> tested by the macro, a second time against 0, and it's hard to get
> cheaper than that.  However, the new branch isn't predictable, so who
> knows?

When compiling with gcc at least you could give the compiler a hint,
eg

  http://kerneltrap.org/node/4705

> For example, in a release build on WinXP, VM size is about 48MB at the
> "full" prompt, and drops to 3MB at the "empty" prompt.  In the trunk
> (without this patch), VM size falls relatively little from what it is
> at the "full" prompt

Excellent work!

-- 
Nick Craig-Wood <[EMAIL PROTECTED]> -- http://www.craig-wood.com/nick


Re: [Python-Dev] defaultdict and on_missing()

2006-03-02 Thread Barry Warsaw
On Thu, 2006-03-02 at 07:26 -0800, Aahz wrote:
> On Wed, Mar 01, 2006, Guido van Rossum wrote:
> >
> > Operations with two or more arguments are often better expressed as
> > function calls -- for example, map() and filter() don't make much
> > sense as methods on callables or sequences.
> 
> OTOH, my personal style is to always use re.compile() because I can
> never remember the order of arguments for re.match()/re.search().

Agreed.
-Barry





Re: [Python-Dev] When will regex really go away?

2006-03-02 Thread skip
Neal> I'll do this, except there are some issues:

Neal>  * Lib/reconvert.py imports regex.  Ok to move regex, regsub, reconvert
Neal>    to lib-old?
Neal>  * ./Demo/pdist/rcslib.py & ./Demo/sockets/mcast.py import regsub
...
Neal>  * A whole mess of Demos and Tools use regex.  What to do about them?
...

How about creating Demo/old and populating it with stuff that imports regex,
regsub or reconvert?

Neal> I don't know how to convert the uses of regsub to re, any
Neal> volunteers?

Whippersnapper...  sheesh!  I still remember when all we had was regex.  And
we were thankful for it, by golly.  Now you'd think the young-uns never knew
it existed. 

As for converting from regex to re that's what reconvert is for.  Give it a
whirl.  The docstring shows how to use it.  Yet another Andrew Kuchling gem
as I recall (or maybe an effbot gem).  Either way, I was happy it was there
when I needed it.  Go in peace, reconvert.

Skip


Re: [Python-Dev] defaultdict and on_missing()

2006-03-02 Thread Guido van Rossum
On 3/2/06, Barry Warsaw <[EMAIL PROTECTED]> wrote:
> On Thu, 2006-03-02 at 07:26 -0800, Aahz wrote:
> > OTOH, my personal style is to always use re.compile() because I can
> > never remember the order of arguments for re.match()/re.search().
>
> Agreed.

I don't have that problem, because the order is the same either way:

 re.compile(pattern).match(line)
 re.match(pattern, line)

:-)
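
The two spellings side by side, as a small runnable sketch; the pattern and the sample line are invented for illustration. In both forms the pattern comes first and the string being matched comes second.

```python
import re

pattern = r"(\w+)=(\d+)"
line = "retries=3"

compiled = re.compile(pattern).match(line)   # pattern, then line
direct = re.match(pattern, line)             # pattern, then line
```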

--
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] C++ for CPython 3? (Re: str.count is slow)

2006-03-02 Thread Fredrik Lundh
[EMAIL PROTECTED] wrote:

> I'm not saying Python 3 should be written in C++, I'm only saying
> that doing so would have not just disadvantages.

someone also pointed out in private mail (I think; it doesn't seem to
have made it to this list) that CPython's extensive use of "inheritance
by aggregation" is invalid C.

switching to C++ would be one way to address that, of course.







Re: [Python-Dev] When will regex really go away?

2006-03-02 Thread Fredrik Lundh
[EMAIL PROTECTED] wrote:

> The docstring shows how to use it.  Yet another Andrew Kuchling gem
> as I recall (or maybe an effbot gem).

amk, most likely.

and in 92.65% of all cases, switching from "regex" to "re" involves adding
\ in front of (, | and ) if they don't already have them, and removing \ from
any instances of (, | and ) that already have them.  or something like that.
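
The rule of thumb above can be sketched as a tiny converter. This is a hypothetical helper, swap_group_escapes(), written in Python 3 for illustration; it is not the reconvert module, and it deliberately ignores character classes and every other difference between the two syntaxes.

```python
import re

def swap_group_escapes(old_pattern):
    """Swap escaped and bare forms of (, | and ) in a regex-style pattern."""
    out = []
    i = 0
    while i < len(old_pattern):
        ch = old_pattern[i]
        if ch == "\\" and i + 1 < len(old_pattern):
            nxt = old_pattern[i + 1]
            # \( \| \) lose their backslash; other escapes pass through.
            out.append(nxt if nxt in "(|)" else ch + nxt)
            i += 2
        elif ch in "(|)":
            out.append("\\" + ch)    # bare ( | ) become literals
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)

new_pattern = swap_group_escapes(r"\(spam\|eggs\) (v2)")
match = re.match(new_pattern, "eggs (v2)")
```

Here the old-style groups become re groups, and the parentheses that were literal in the old syntax stay literal in the new one.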







Re: [Python-Dev] defaultdict and on_missing()

2006-03-02 Thread Bill Janssen
> For an example of methods gone horribly wrong, look at Java, where you
> have .length, String.length(), and Collection.size().
> Give me len() any day. I believe Ruby has similar confusing diversity
> for looping (each/forEach).

But Java is plagued with the same disease that hit Modula-3, distrust
of inheritance.  Done right, a base class called
SomethingOfWhichTheLengthCanBeComputed would have been defined, with a
method "length", and all these other classes would have inherited from
it.  Never too late to learn from the mistakes of the past...
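
For illustration, here is how the two positions can meet in Python 3: collections.abc.Sized, an abstract base class in the standard library of later Python versions, plays the part of the "something whose length can be computed" base class, while len() remains the single way to ask. The Playlist class is a made-up example.

```python
from collections.abc import Sized

class Playlist(Sized):
    """Made-up container that inherits the 'has a length' contract."""

    def __init__(self, tracks):
        self._tracks = list(tracks)

    def __len__(self):
        return len(self._tracks)

items = [Playlist(["a", "b"]), "abc", [1, 2, 3, 4], {"k": 1}]
lengths = [len(x) for x in items]
all_sized = all(isinstance(x, Sized) for x in items)
```

Built-in types that define __len__ count as Sized without inheriting from it explicitly.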

Bill


Re: [Python-Dev] defaultdict and on_missing()

2006-03-02 Thread Aahz
On Thu, Mar 02, 2006, Guido van Rossum wrote:
> On 3/2/06, Barry Warsaw <[EMAIL PROTECTED]> wrote:
>> On Thu, 2006-03-02 at 07:26 -0800, Aahz wrote:
>>>
>>> OTOH, my personal style is to always use re.compile() because I can
>>> never remember the order of arguments for re.match()/re.search().
>>
>> Agreed.
> 
> I don't have that problem, because the order is the same either way:
> 
>  re.compile(pattern).match(line)
>  re.match(pattern, line)

But that would require thinking!  ;-)  More seriously, much as I hate the
way ''.join() looks, I have never gotten mixed up about argument order as
I used to with string.join().
-- 
Aahz ([EMAIL PROTECTED])   <*> http://www.pythoncraft.com/

"19. A language that doesn't affect the way you think about programming,
is not worth knowing."  --Alan Perlis


Re: [Python-Dev] Arena-freeing obmalloc ready for testing

2006-03-02 Thread Tim Peters
[Tim Peters]
>> ...
>> However, the new branch isn't predictable, so who knows?

[Nick Craig-Wood]
> When compiling with gcc at least you could give the compiler a hint,
> eg
>
>   http://kerneltrap.org/node/4705

By "the new branch isn't predictable", I mean that there's apparently
no way to guess which way it's going to go that's better than flipping
a coin.  If I could annotate the branch with the probability of the
branch being taken, my best guess would be "half the time".

OTOH, if I rearranged the code a little, the new test could become
highly predictable.  OK, I'll do that :-)

> ...
> Excellent work!

Thanks redirected to Evan, who worked hard for a long time on this. 
Thanks, Evan!


Re: [Python-Dev] C++ for CPython 3? (Re: str.count is slow)

2006-03-02 Thread Stephen J. Turnbull
> "martin" == martin  <[EMAIL PROTECTED]> writes:

martin> There are a few advantages [to C++], though, mainly:

martin> - increased type-safety, in particular for API that isn't
martin> type-checked at all at the moment (e.g. PyArg_ParseTuple)

That's merely an advantage to having a C++ *compiler*.  No need to
actually use the C++ *language*.  :-)

XEmacs has had a policy of compiling without warnings under *both* C
and C++ for about 5 years now, and it catches a lot of stupidity
before it leaves the developer's sandbox.

The feature programmers are occasionally annoyed that a pet C idiom
gets disallowed, but that's the only downside we've experienced.

-- 
School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
   Ask not how you can "do" free software business;
  ask what your business can "do for" free software.


Re: [Python-Dev] bytes.from_hex()

2006-03-02 Thread Josiah Carlson

Just van Rossum <[EMAIL PROTECTED]> wrote:
> 
> Ron Adam wrote:
> 
> > Josiah Carlson wrote:
> > > Greg Ewing <[EMAIL PROTECTED]> wrote:
> > >>u = unicode(b)
> > >>u = unicode(b, 'utf8')
> > >>b = bytes['utf8'](u)
> > >>u = unicode['base64'](b)   # encoding
> > >>b = bytes(u, 'base64') # decoding
> > >>u2 = unicode['piglatin'](u1)   # encoding
> > >>u1 = unicode(u2, 'piglatin')   # decoding
> > > 
> > > Your provided semantics feel cumbersome and confusing to me, as
> > > compared with str/unicode.encode/decode() .
> > > 
> > >  - Josiah
> > 
> > This uses syntax to determine the direction of encoding.  It would be 
> > easier and clearer to just require two arguments or a tuple.
> > 
> >   u = unicode(b, 'encode', 'base64')
> >   b = bytes(u, 'decode', 'base64')
> > 
> >   b = bytes(u, 'encode', 'utf-8')
> >   u = unicode(b, 'decode', 'utf-8')
> > 
> >   u2 = unicode(u1, 'encode', 'piglatin')
> >   u1 = unicode(u2, 'decode', 'piglatin')
> > 
> > 
> > 
> > It looks somewhat cleaner if you combine them in a path style string.
> > 
> >   b = bytes(u, 'encode/utf-8')
> >   u = unicode(b, 'decode/utf-8')
> 
> It gets from bad to worse :(
> 
> I always liked the asymmetry between
> 
> u = unicode(s, "utf8")
> 
> and
> 
> s = u.encode("utf8")
> 
> which I think was the original design of the unicode API. Kudos to
> whoever came up with that.

I personally have never used that mechanism.  I always used
s.decode('utf8') and u.encode('utf8').  I prefer the symmetry that
.encode() and .decode() offer.


> When I saw
> 
> b = bytes(u, "utf8")
> 
> mentioned for the first time, I thought: why on earth must the bytes
> constructor be coupled to the unicode API?!?! It makes no sense to me
> whatsoever.

It's not a 'unicode API'.  See integers for another example where a
second argument to a type object defines how to interpret the other
argument, or even arrays/structs where the first argument defines the
interpretation.
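
A short Python 3 illustration of the cases mentioned: for int() the second argument says how to read the first, while for array and struct the first argument (a type code or format string) says how to read what follows. The sample values are arbitrary.

```python
import array
import struct

n_hex = int("ff", 16)                        # same text, read as base 16
n_bin = int("101", 2)                        # read as base 2

octets = array.array("B", [1, 2, 3])         # "B": unsigned bytes
packed = struct.pack("<H", 258)              # "<H": little-endian 16-bit
(unpacked,) = struct.unpack("<H", packed)
```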


> Bytes have so much more use besides encoded text.

Agreed.


> I believe (please correct me if I'm wrong) that the encoding argument of
> bytes() was invented to make it easier to write byte literals. Perhaps a
> true bytes literal notation is in order after all?

Maybe, but I think the other earlier use-case was for using:
s2 = bytes(s1, 'base64')
If bytes objects received an .encode() method, or even a .tobytes()
method.  I could be misremembering.


> My preference for bytes -> unicode -> bytes API would be this:
> 
> u = unicode(b, "utf8")  # just like we have now
> b = u.tobytes("utf8")   # like u.encode(), but being explicit
>                         # about the resulting type
> 
> As to base64, while it works as a codec ("Why a base64 codec? Because we
> can!"), I don't find it a natural API at all, for such conversions.

Depending on whose definition of codec you listen to (is it a
compressor/decompressor, or a coder/decoder?), either very little of
what we have as 'codecs' are actual codecs (only zlib, etc.), or all of
them are.

I would imagine that base64, etc., were made into codecs, or really
encodings, because base64 is an 'encoding' of binary data in base64
format.  Similar to the way you can think of utf8 as an 'encoding' of
textual data in utf8 format.  I would argue, due to the "one obvious way
to do it", that using encodings/codecs should be preferred to one-shot
encoding/decoding functions in various modules (with some exceptions).

These exceptions are things like pickle, marshal, struct, etc., which
may take a non-basestring object and convert it into a byte string,
which is arguably an encoding of the object in a particular format.
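
To make the comparison concrete, here is base64 reached both ways in Python 3: through the codec machinery and through the one-shot function in the base64 module. This is only a sketch of the two spellings; it does not settle which one should be preferred.

```python
import base64
import codecs

raw = b"data"
via_codec = codecs.encode(raw, "base64")      # bytes-to-bytes codec
via_module = base64.encodebytes(raw)          # one-shot module function
restored = codecs.decode(via_codec, "base64")
```

Both calls produce the same bytes, including the trailing newline that the line-oriented encoder appends.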


 - Josiah



Re: [Python-Dev] bytes.from_hex()

2006-03-02 Thread Stephen J. Turnbull
> "Greg" == Greg Ewing <[EMAIL PROTECTED]> writes:

Greg> But the base64 string itself *does* have text semantics.

What do you mean by that?  The strings of abstract "characters"
defined by RFC 3548 cannot be concatenated in general, they may only
be split at 4-character intervals, they can't be reliably searched as
text for a given octet or substring of the underlying binary object,
and deletion or insertion of octets can't be done without decoding and
re-encoding the whole string.  And of course humans can make neither
head nor tail of them in most cases.  The only useful semantics that
they have is "you can apply the base64 decoder" to them.

In other words, by far the most important effect of endowing that
string with "text semantics" is to force programmers to remember not
to use them.

Do you really mean to call that "text semantics"?
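
Two of the claims above, concatenation and substring search, can be checked with a few lines of Python 3. The b64() wrapper is a hypothetical convenience around the real base64.b64encode(); the sample inputs are arbitrary.

```python
import base64

def b64(data):
    """Return the base64 form of data as a text string."""
    return base64.b64encode(data).decode("ascii")

left, right = b"ab", b"cd"

joined_text = b64(left) + b64(right)      # 'YWI=Y2Q='
whole_text = b64(left + right)            # 'YWJjZA=='
concat_ok = (joined_text == whole_text)   # False: padding gets in the way

found_in_binary = right in (left + right)               # True
found_in_text = b64(right).rstrip("=") in whole_text    # False
```

Concatenation only works when the pieces are split at 3-octet (4-character) boundaries, for example b64(b"abc") + b64(b"def") == b64(b"abcdef").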

Greg> To me this is no different than using a string of decimal
Greg> digit characters to represent an integer, or a string of
Greg> hexadecimal digit characters to represent a bit
Greg> pattern. Would you say that those are not text, either?

"No different"?  OK, I'll take you at your word.

T2YgY291cnNlIEkgd291bGQgY29uc2lkZXIgdGhvc2UgdGV4dC4gIFRoZXkncmUgaHVtYW4t
cmVhZGFibGUu

Greg> What about XML? What would you consider the proper data type
Greg> for an XML document to be inside a Python program -- bytes
Greg> or text?

Neither.  If I must choose one of those ... well, "I know I have a
choice of programming languages, and I won't be using Python for this
task."  Fortunately, there's ElementTree.

What you presumably meant was "what would you consider the proper type
for (P)CDATA?"  And my answer is "text" for text, and "bytes" for
binary data (eg, image or audio).  Let ElementTree handle the wire
format: if an Element's text attribute has type "bytes", convert to
base64 and then to the appropriate coded character set for the
channel.  I don't wanna know about the content transfer encoding, and
I should have no need to.
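
A rough sketch of that division of labour in Python 3, using the standard xml.etree.ElementTree module. The set_payload() and serialize() helpers and the "encoding" attribute are assumptions made for illustration; ElementTree itself does not convert bytes to base64 automatically.

```python
import base64
import xml.etree.ElementTree as ET

def set_payload(elem, data):
    """Store text as-is; store binary data as base64 text (hypothetical helper)."""
    if isinstance(data, bytes):
        elem.text = base64.b64encode(data).decode("ascii")
        elem.set("encoding", "base64")     # made-up marker attribute
    else:
        elem.text = data

def serialize(root, charset="utf-8"):
    """Produce the wire format; the caller never sees base64 directly."""
    return ET.tostring(root, encoding=charset)

root = ET.Element("doc")
set_payload(ET.SubElement(root, "title"), u"logo")
image = ET.SubElement(root, "image")
set_payload(image, b"\x00\x01")
wire = serialize(root)
```

The application hands over text or bytes; only the two helpers know about the content transfer encoding.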

Greg> You seem to want to reserve the term "text" for data that
Greg> doesn't ever have to be understood even a little bit by a
Greg> computer program, but that seems far too restrictive to me,
Greg> and a long way from established usage.

What I want to reserve "text" for is data streams that nonprogrammer
humans might want to manipulate with pencil, paper, scissors, and
paste, or programmers with re and text[n:m] = text2.  I have no
objection to computers using it, too, and even asking us humans to
respect some restrictions on the use of [:]= and +.  But to tell us to
give up those operations entirely makes it into non-text IMO.

Greg> [The] assumption [that the channel is ASCII-compatible] could
Greg> be very wrong.  What happens if it turns out they really need
Greg> to be encoded as UTF-16, or as EBCDIC?  All hell breaks
Greg> loose, as far as I can see, unless the programmer has kept
Greg> very firmly in mind that there is an implicit ASCII encoding
Greg> involved.

Greg> It's exactly to avoid the need for those kinds of mental
Greg> gymnastics

Agreed, such bookkeeping would be annoying.  But there's no _need_ for
it any way you look at it: just leave binary objects as-is until
you're ready to put them on the wire.[1]  Attach a binary-to-wire codec
to this end of the wire, and inject your data there. This puts the
responsibility where it belongs: with the author of the wire driver.
That's the point, which you already mentioned: nobody but authors of
wire drivers[2] and introspective code will need to _explicitly_ call
.encode('base64').

Greg> that Py3k will have a unified, encoding-agnostic data type
Greg> for all character strings.

Yeah, but if base64 produces character strings, Unicode becomes a
unified, encoding-agnostic data type for all data.  Just base64
everything, and now we don't need a bytes type, right?

Note that this is precisely what Emacs/MULE does (with a variable
width non-Unicode internal encoding and "base256" instead of base64),
so as demented as it may sound, it's all too historically plausible.
And it can be implemented, by accident, at the application program
level.  Why expose our users to increased risk of such trouble?


Footnotes: 
[1]  Of course you may want to manipulate the binary data, even as
text.  But who's going to use the base64 format for that purpose?

[2]  I mean to include those who are writing the git.object_id(),
PGP_key.fingerprint(), and ElementTree.write() methods.

-- 
School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
   Ask not how you can "do" free software business;
  ask what your business can "do for" free software.

Re: [Python-Dev] ref leak w/except hooks

2006-03-02 Thread Brett Cannon
On 3/2/06, Neal Norwitz <[EMAIL PROTECTED]> wrote:
> The following code leaks a reference.  Original test case from
> Lib/test/test_sys.py in test_original_excepthook.
>
> import sys, StringIO
> eh = sys.__excepthook__
> try:
>   raise ValueError(42)
> except ValueError, exc:
>   exc_type, exc_value, exc_tb = sys.exc_info()
>   eh(exc_type, None, None)

Which can be simplified to::

    from sys import __excepthook__ as eh

    try:
        raise BaseException
    except:
        eh(BaseException, None, None)

It fails if the first argument to sys.__excepthook__ is either a
built-in exception or a classic class.  It looks like strings and
new-style classes do not trigger it.

-Brett


Re: [Python-Dev] ref leak w/except hooks

2006-03-02 Thread Brett Cannon
On 3/2/06, Brett Cannon <[EMAIL PROTECTED]> wrote:
> On 3/2/06, Neal Norwitz <[EMAIL PROTECTED]> wrote:
> > The following code leaks a reference.  Original test case from
> > Lib/test/test_sys.py in test_original_excepthook.
> >
> > import sys, StringIO
> > eh = sys.__excepthook__
> > try:
> >   raise ValueError(42)
> > except ValueError, exc:
> >   exc_type, exc_value, exc_tb = sys.exc_info()
> >   eh(exc_type, None, None)
>
> Which can be simplified to::
>
>     from sys import __excepthook__ as eh
>
>     try:
>         raise BaseException
>     except:
>         eh(BaseException, None, None)
>
> It fails if the first argument to sys.__excepthook__ is either a
> built-in exception or a classic class.  It looks like strings and
> new-style classes do not trigger it.

And is now fixed in rev. 42794 .

-Brett


Re: [Python-Dev] Making staticmethod objects callable?

2006-03-02 Thread Guido van Rossum
On 3/1/06, Nicolas Fleury <[EMAIL PROTECTED]> wrote:
> Guido van Rossum wrote:
> > In which context did you find a need for defining a static method and
> > calling it inside the class definition? I'm guessing that you're
> > playing dubious scoping games.
>
> I'm not.  I almost never use staticmethod actually.  I find them not
> very pythonic, in my humble own definition of pythonic.
>
> But since staticmethod is a standard built-in, I considered valid the
> question of a programmer relatively new to Python (but obviously
> appreciating its dynamic nature) wondering why calling a static method
> inside a class definition doesn't work.  A use case is not hard to
> imagine, especially a private static method called only to build a class
> attribute.

Imagined use cases aren't particularly interesting.

As to the "why" question, the answer is simply that it wasn't
considered an important use case since nobody could come up with a
reason why you'd want to do that apart from exploring the language.
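
For readers wondering what the use case and a workaround look like, here is a small sketch. It assumes the behaviour under discussion, namely that a staticmethod object is not itself callable inside the class body on the Python versions of the time; the Palette class and its names are invented for illustration. Calling the plain function first and wrapping it afterwards sidesteps the question.

```python
class Palette(object):
    def _build_table():
        # Still a plain function at this point, so it can be called
        # directly inside the class body.
        return dict((n, n * n) for n in range(4))

    TABLE = _build_table()                      # class attribute built once
    _build_table = staticmethod(_build_table)   # wrap only after using it
```

After the class is created, Palette._build_table() works as an ordinary static method and Palette.TABLE holds the precomputed value.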

> I don't know the philosophy behind making staticmethod a built-in
> (instead of a function in a module only used in specific occasions), but
> my guess was that what is normal scoping/regrouping in Java/C++/C# was
> worth common use support in Python.  But your comment about "dubious
> scoping games" makes me think I, again, didn't guess right;)

At the time there wasn't much thought about it -- it just seemed
reasonable to support something that was supported by syntax in other
languages with at least a built-in decorator in Python.

> So yes, I'm proposing something I'll probably never use, but I think
> would make Python more "welcoming".

I don't see how adding features that nobody uses helps.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] Making staticmethod objects callable?

2006-03-02 Thread Fredrik Lundh
Guido van Rossum wrote:

> I don't see how adding features that nobody uses helps.

the amount of odd staticmethod uses you see on comp.lang.python
these days is staggering.







Re: [Python-Dev] bytes.from_hex()

2006-03-02 Thread Delaney, Timothy (Tim)
Just van Rossum wrote:

> My preference for bytes -> unicode -> bytes API would be this:
> 
> u = unicode(b, "utf8")  # just like we have now
> b = u.tobytes("utf8")   # like u.encode(), but being explicit
>                         # about the resulting type

+1 - I was going to write exactly the same thing. The `bytes` type
shouldn't know anything about unicode - conversion between bytes and
unicode is entirely the responsibility of the unicode type.

Alternatively, rather than as part of the constructor (though that seems
the obvious place) some people may prefer a classmethod:

unicode.frombytes(cls, encoding)

It gives a nice symmetry.
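
A sketch of that symmetry in Python 3, using a str subclass so it can actually run. Text, frombytes() and tobytes() are hypothetical names modelled on the proposal; the exact signature is an assumption (here the classmethod takes the bytes object and the encoding), and neither method exists on the real str type.

```python
class Text(str):
    """Illustrative stand-in for the unicode type in the proposal."""

    @classmethod
    def frombytes(cls, data, encoding):
        # The text type owns the bytes -> text conversion.
        return cls(data.decode(encoding))

    def tobytes(self, encoding):
        # ...and the text -> bytes conversion as well.
        return self.encode(encoding)

u = Text.frombytes(b"\xc3\xa9t\xc3\xa9", "utf8")
b = u.tobytes("utf8")
```

In this arrangement the bytes type needs no knowledge of encodings at all.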

Tim Delaney


Re: [Python-Dev] bytes.from_hex()

2006-03-02 Thread Delaney, Timothy (Tim)
Delaney, Timothy (Tim) wrote:

> unicode.frombytes(cls, encoding)

unicode.frombytes(encoding) ...

Tim Delaney


Re: [Python-Dev] bytes.from_hex()

2006-03-02 Thread Greg Ewing
Ron Adam wrote:

> This uses syntax to determine the direction of encoding.  It would be 
> easier and clearer to just require two arguments or a tuple.
> 
>   u = unicode(b, 'encode', 'base64')
>   b = bytes(u, 'decode', 'base64')

The point of the exercise was to avoid using the terms
'encode' and 'decode' entirely, since some people claim
to be confused by them.

While I succeeded in that, I concede that the result
isn't particularly intuitive and is arguably even more
confusing.

If we're going to continue to use 'encode' and 'decode',
why not just make them functions:

   b = encode(u, 'utf-8')
   u = decode(b, 'utf-8')

In the case of Unicode encodings, if you get them
backwards you'll get a type error.

The advantage of using functions over methods or
constructor arguments is that they can be applied
uniformly to any input and output types.
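
Function-style encode and decode can be tried in Python 3 through codecs.encode() and codecs.decode(), which are real standard-library functions with this argument order. The sketch below is an illustration of the idea rather than the API being proposed here; the sample strings are arbitrary.

```python
import codecs

b = codecs.encode(u"\u00e9t\u00e9", "utf-8")   # text -> bytes
u = codecs.decode(b, "utf-8")                  # bytes -> text

# Getting the direction backwards for a text encoding is a type error.
try:
    codecs.encode(b, "utf-8")
    backwards = "no error"
except TypeError:
    backwards = "TypeError"

# The same function also covers bytes-to-bytes codecs.
hex_bytes = codecs.encode(b"hi", "hex")
```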

--
Greg


Re: [Python-Dev] C++ for CPython 3? (Re: str.count is slow)

2006-03-02 Thread Greg Ewing
Fredrik Lundh wrote:

> someone also pointed out in private mail (I think; it doesn't seem to
> have made it to this list) that CPython's extensive use of "inheritance
> by aggregation" is invalid C.
> 
> switching to C++ would be one way to address that, of course.

A rather heavyweight solution to a problem that does
not seem to have been a problem in practice so far,
only in theory.

Practicality beats purity once again...

--
Greg


Re: [Python-Dev] bytes thoughts

2006-03-02 Thread Baptiste Carvello
Greg Ewing wrote:
> Why not just support bitwise operations directly
> on the bytes object?
> 

Sure, what counts is that all the nice features that Python has for editing 
binary data are usable with the bytes object.
These include bitwise operations, the hex() and oct() representation functions and 
literals, and the struct module (as Paul Svensson kindly reminded me). Am I forgetting 
something?
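
For illustration, here is roughly how those features look with the bytes type
of present-day Python 3. This is a sketch only: that type has no bitwise
operators of its own, so the XOR below goes element by element, and the sample
values are made up.

```python
import struct

data = bytes([0x0F, 0xF0, 0xAA])
mask = bytes([0xFF, 0x0F, 0x55])

# Bitwise XOR, applied byte by byte (iterating bytes yields ints).
xored = bytes(a ^ b for a, b in zip(data, mask))

# Hex representation of the raw bytes.
as_hex = data.hex()

# struct: pack a little-endian unsigned 16-bit value and one unsigned byte.
packed = struct.pack("<HB", 0x1234, 0x56)
value, tail = struct.unpack("<HB", packed)
```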



Re: [Python-Dev] bytes.from_hex()

2006-03-02 Thread Greg Ewing
Stephen J. Turnbull wrote:

> What you presumably meant was "what would you consider the proper type
> for (P)CDATA?"

No, I mean the whole thing, including all the <...> tags
etc. Like you see when you load an XML file into a text
editor. (BTW, doesn't the fact that you *can* load an
XML file into what we call a "text editor" say something?)

> nobody but authors of
> wire drivers[2] and introspective code will need to _explicitly_ call
> .encode('base64').

Even a wire driver writer will only need it if he's
trying to turn a text wire into a binary wire, as
far as I can see.
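
A small sketch of that one case, using the standard base64 module: arbitrary
binary data is carried over a channel that only accepts ASCII text. The
payload and the variable names are made up for illustration.

```python
import base64

# Arbitrary binary payload, including bytes that are not valid ASCII.
payload = bytes(range(250, 256)) + b"\x00\x01"

# Sender: binary -> ASCII-only text that is safe for the text wire.
wire_text = base64.b64encode(payload).decode("ascii")

# Receiver: text from the wire -> the original binary payload.
received = base64.b64decode(wire_text)
```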

--
Greg


Re: [Python-Dev] C++ for CPython 3? (Re: str.count is slow)

2006-03-02 Thread martin
Quoting Fredrik Lundh <[EMAIL PROTECTED]>:

> > I'm not saying Python 3 should be written in C++, I'm only saying
> > that doing so would have not just disadvantages.
>
> someone also pointed out in private mail (I think; it doesn't seem to
> have made it to this list) that CPython's extensive use of "inheritance
> by aggregation" is invalid C.
>
> switching to C++ would be one way to address that, of course.

My preferred way of fixing it is to do it the "proper" C way, i.e.
make PyObject the first member of each derived type.

Regards,
Martin



Re: [Python-Dev] C++ for CPython 3? (Re: str.count is slow)

2006-03-02 Thread martin
Quoting Greg Ewing <[EMAIL PROTECTED]>:

> A rather heavyweight solution to a problem that does
> not seem to have been a problem in practice so far,
> only in theory.

The problem does exist in practice. Python is deliberately
built with -fno-strict-aliasing when GCC is used, and might
get compiled incorrectly by any other advanced C compiler.

The problem with that bug is that it is both very hard to
find when it exists, and very hard to dismiss as theoretical,
unless an extensive source code review is performed. Have
you done such a review of the Python source code, so that you
know there is no potential for misinterpretation and can claim
the problem is only theoretical?

Regards,
Martin






Re: [Python-Dev] C++ for CPython 3? (Re: str.count is slow)

2006-03-02 Thread martin
Quoting "Stephen J. Turnbull" <[EMAIL PROTECTED]>:

> martin> - increased type-safety, in particular for API that isn't
> martin> type-checked at all at the moment (e.g. PyArg_ParseTuple)
>
> That's merely an advantage to having a C++ *compiler*.  No need to
> actually use the C++ *language*.  :-)

I don't understand. How can you use a C++ compiler, but not the C++
language? Either a program is required to conform to the C++ syntax
(in which case it is a C++ program), or it isn't.

In the specific example of ParseTuple, I don't see a C++ solution
without templates, FWIW.

> XEmacs has had a policy of compiling without warnings under *both* C
> and C++ for about 5 years now, and it catches a lot of stupidity
> before it leaves the developer's sandbox.

Right. It might be possible to write C++ programs that are also
C programs, and it is then possible to release binaries of these
through the C compiler. However, in the Python case, I doubt it
would gain that much. As the recent const dilemma shows, C99 and C++98
have, unfortunately, different interpretations of "const" (with the
C interpretation being more strict).

Regards,
Martin




Re: [Python-Dev] bytes.from_hex()

2006-03-02 Thread Stephen J. Turnbull
> "Greg" == Greg Ewing <[EMAIL PROTECTED]> writes:

Greg> (BTW, doesn't the fact that you *can* load an XML file into
Greg> what we call a "text editor" say something?)

Why not answer that question for yourself, and then turn that answer
into a description of "text semantics"?

For me, it says that, just like a gzipped file or the Linux kernel, I
can load an XML file into a text editor.  But unlike the .gz or
vmlinuz, I can easily find many useful things to do to the XML string
in the text editor.

Doesn't that make base64 non-text by analogy to other "look but don't
touch" strings like a .gz or vmlinuz?

-- 
School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
   Ask not how you can "do" free software business;
  ask what your business can "do for" free software.