Re: [Python-Dev] Multigigabyte memory usage in the OpenIndiana Buildbot

2011-09-07 Thread Jesus Cea

On 06/09/11 07:27, Nick Coghlan wrote:
> It may be the case that with the reduced memory limit, your
> machine may not be able to run concurrent slaves for 2.7, 3.2 and
> 3.x as I believe it does now.

Antoine has changed the buildmaster configuration to send me only one
build at a time. It doesn't solve the issue: I don't have enough
resources even for a single build.

I just sent this email to the owner of the machine:

"""
XXX, I know you are very busy, but I would like to request
formally the removal of the SWAP capping for my zone.

After investigating the issue, I learned this:

1. Python "make test" launches a python process that can consume >300MB
of RAM.

2. Under Solaris, a 300MB process doing a "fork()" will consume 600MB.
That is, Solaris reserves this much memory just in case the processes
modify their memory (to avoid an "out of memory" condition simply
because a process writes to its own memory space).

3. So, if a 300MB process is forked 10 times, it is going to
"virtually" use 3GB. The real memory used is actually far less in the
buildbot case, because the forked processes don't modify their own
memory much (forked processes use Copy On Write).

4. So, the required memory to run the buildbots is actually "modest"
compared with the "virtual" memory used.

5. A 4GB SWAP is not enough to run a single buildbot instance. I can
have up to 6 instances, but 4GB is not enough for 1. The Python devs
have modified the buildbot master to send me at most two builds
simultaneously, trying to help. It is not helping because 4GB of swap
is not enough even for a single instance.

6. With an uncapped SWAP, the actual swapping would be quite low,
because the swap is used to ensure memory reservation for the forked
processes in the worst case (that the forked processes mess with their
own copy of the 300MB address space, COW (Copy On Write)). In practice
4GB of RAM and uncapped SWAP would be enough, with no (or little)
actual swapping.

For these reasons I formally request a reconfiguration of my zone to
uncap my SWAP usage.

The proof is actually very simple:

"""
import time, os

a="a"*1024*1024*512

os.fork() # 2 processes
os.fork() # 4 processes
os.fork() # 8 processes

time.sleep(10)
"""

Running the previous program does this to my swap: (Solaris 10 Update 9)

"""
[root@buffy /]# swap -s
total: 684704k bytes allocated + 3732892k reserved = 4417596k used,
31829688k available
"""

After the programs die, I have this:

"""
[root@buffy /]# swap -s
total: 156680k bytes allocated + 43284k reserved = 199964k used,
36118796k available
"""

In this machine, I have 4GB of RAM, 32GB of swap.

So, this trivial test requires >4GB of RAM+SWAP even if it is actually
using only ~512MB of RAM. Solaris is (rightly) playing safe being sure
the program can actually play/modify its memory space.

X, if you can't/don't want to modify my zone configuration, let me
know, so I can think what to do next. If I have to talk to somebody
else, please let me know.

Sorry to bother you with these details. I really appreciate the
effort you and your team are doing with OpenIndiana in general and
supporting the Python buildbots under OI in particular. I hope we can
solve this situation.

Thanks for your time and effort.

PS: I think that such memory+swap requirements are quite high, anyway,
and I will pursue it. But in the meantime I need the buildbot online,
as it was a couple of weeks ago :-)

Thanks!
"""

So, the problem is that a) "make test" takes quite a bit of RAM and b)
the buildbot forks some "big" processes, so the virtual memory needed
is BIG.
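
As a rough check of the "swap -s" numbers above, here is a minimal sketch
of the reservation arithmetic. It assumes strict accounting (every forked
copy of the 512MB string is fully reserved, nothing is overcommitted) and
ignores the interpreter's own footprint; "reserved_kib" is a hypothetical
helper written for this note, not part of any tool.

```python
def reserved_kib(process_mib, forks):
    """Worst-case extra reservation, in KiB, when every process created by
    `forks` successive os.fork() calls must be able to dirty its own copy."""
    processes = 2 ** forks        # each fork() call doubles the process count
    extra_copies = processes - 1  # the original copy is already allocated
    return extra_copies * process_mib * 1024


print(2 ** 3)                # 8 processes after three fork() calls
print(reserved_kib(512, 3))  # 3670016
```

The result, 3670016 KiB, is close to the growth in the "reserved" figure
between the two "swap -s" runs (3732892k - 43284k = 3689608k).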

Linux is known for "overcommitting" memory. That is, playing fast and
risky by not actually reserving memory, hoping the process will not
actually use it or will do an "exec" immediately. So this problem may
not be apparent under Linux, but it is there.
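
For anyone wanting to check which policy a given Linux box uses, a small
sketch follows. It reads /proc/sys/vm/overcommit_memory, where 0 means
heuristic overcommit, 1 means always overcommit and 2 means strict
accounting. The helper name is made up for illustration, and it returns
None where the file does not exist (for example on Solaris).

```python
from pathlib import Path

# Meaning of the values stored in /proc/sys/vm/overcommit_memory on Linux.
POLICIES = {
    0: "heuristic overcommit",
    1: "always overcommit",
    2: "strict accounting (no overcommit)",
}


def overcommit_policy(path="/proc/sys/vm/overcommit_memory"):
    """Return a description of the overcommit policy, or None if unavailable."""
    proc_file = Path(path)
    if not proc_file.exists():
        return None
    return POLICIES.get(int(proc_file.read_text().strip()), "unknown")


print(overcommit_policy())
```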

So I have two questions:

1. Can we reduce the memory footprint of the tests? I can't
understand why the python test process is taking so much memory.

2. Why is buildbot "forking()" big processes? Can we do something to
change this?

I will wait a few days for the OpenIndiana team to reply. If the result
is not satisfactory, I will try to set up a virtual machine with the
required resources myself. Crossing fingers...

-- 
Jesus Cea Avion _/_/  _/_/_/_/_/_/
[email protected] - http://www.jcea.es/ _/_/_/_/  _/_/_/_/  _/_/
jabber / xmpp:[email protected] _/_/_/_/  _/_/_/_/_/
.  _/_/  _/_/_/_/  _/_/  _/_/
"Things are not so easy"  _/_/  _/_/_/_/  _/_/_/_/  _/_/
"My name is Dump, Core Dump"   _/_/_/_/_/_/  _/_/  _/_/
"El amor es poner tu felicidad en la felicidad de otro" - Leibniz

Re: [Python-Dev] Multigigabyte memory usage in the OpenIndiana Buildbot

2011-09-07 Thread Antoine Pitrou
On Wed, 07 Sep 2011 13:38:23 +0200
Jesus Cea  wrote:
> 
> So, the problem is that a) "make test" takes quite a bit of RAM and b)
> the buildbot forks some "big" processes, so the virtual memory needed
> is BIG.

Note that buildbots run "make buildbottest", not "make test".

> So I have two questions:
> 
> 1. Can we reduce the memory footprint of the tests?. I can't
> understand why the python test process is taking so much memory.

Because the test suite will by construction load all the stdlib (minus
the few modules which don't have a test suite), and create numerous
test scenarios. Depending on the memory allocator, fragmentation can
make it difficult to reclaim memory that has been formally freed after
a test is run.

If "-j" is used, tests get run in a separate process each, so that
approach might be an answer.
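
To illustrate why per-test processes help, here is a minimal sketch of
the general idea: each unit of work runs in a fresh interpreter, so
whatever memory it used goes back to the OS when the child exits. This
is not the test runner's actual code; "run_isolated" and "run_all" are
hypothetical names, and the snippets stand in for real tests.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_isolated(snippet):
    """Run one snippet in a fresh interpreter and report its outcome."""
    result = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout.strip()


def run_all(snippets, jobs=4):
    """Run snippets in parallel child processes, keeping input order."""
    # The threads only wait on child processes, so they stay cheap.
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(run_isolated, snippets))


if __name__ == "__main__":
    print(run_all(["print(1 + 1)", "import sys; sys.exit(3)"]))
```

The parent stays small because it never imports the code under test; the
cost is one interpreter startup per unit of work.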

> 2. Why buildbot is "forking()" big processes?. Can we do something to
> change this?.

Because we need to test for various functionalities, such as os.fork()
and os.exec*(), but also the command-line behaviour of the interpreter,
the distutils module, the packaging module, the subprocess module, the
multiprocessing module... (this list is not exhaustive).

Regards

Antoine.


___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Maintenance burden of str.swapcase

2011-09-07 Thread Antoine Pitrou
On Wed, 07 Sep 2011 11:15:04 +0900
"Stephen J. Turnbull"  wrote:
> Antoine Pitrou writes:
> 
>  > Bytes objects are often used for partly ASCII strings,
> 
> All I can say to that phrase is, "urk, ISO 2022 anyone?"

You could also point out UTF-16 or EBCDIC, but I fail to see how that's
relevant. Do you have problems with ISO 2022 when parsing, say, e-mail
headers?

>  > not arbitrary "arrays of bytes". And making indexing of bytes
>  > objects return ints was IMHO a mistake.
> 
> Bytes objects are not ASCII strings, even though they can be used to
> represent them.

I'm talking about practice, not some idealistic view of the world.
In many use cases (XML, HTML, e-mail headers, many other text-based
protocols), you can get a mixture of ASCII "commands", and opaque
binary stuff (which will or will not, depending on these "commands",
have a meaningful unicode decoding).

In the stdlib, bytes objects are accessed far more often to poke at
some text-like data, than to poke at arbitrary numbers.

> With PEP 393,
> there isn't even really a space excuse.

Of course there is. Any single non-ASCII byte of data mingled with
aforementioned ASCII "commands" will make it switch to a less efficient
representation.

And "surrogateescape" will be a performance problem in itself, when
used on large binary data; if you use "latin1" instead, you are risking
far greater confusion; ask David about that dilemma. :-)

> AFAICS, anything that should be done with ASCII-punned magic numbers
> ("protocol tokens", if you prefer) can be done with slices and (ta-da!)
> case conversion.

So, basically, you're saying that we should remove useful functionality
and tell people to reimplement an ad hoc version of it when they need
it. That sounds obnoxious.

Regards

Antoine.


Re: [Python-Dev] Maintenance burden of str.swapcase

2011-09-07 Thread Simon Cross
On Tue, Sep 6, 2011 at 10:36 PM, "Martin v. Löwis"  wrote:
>> Which applications? I'm not sure the number of applications using
>> str.swapcase gets even as high as ten.
>
> I think this is what people underestimate. I can't name
> applications either - but that doesn't mean they don't exist.
> I'm deeply convinced that the majority of Python code (and
> I mean *large* majority) is unpublished.
>
> I expect thousands of uses world-wide.

http://www.google.com/codesearch#search/&q=swapcase%20lang:%5Epython$&type=cs

There are quite a few hits but more people appear to be
re-implementing it than using it (I haven't gone to the trouble of
mining the search results to get an accurate picture though).


Re: [Python-Dev] Maintenance burden of str.swapcase

2011-09-07 Thread Simon Cross
On Wed, Sep 7, 2011 at 6:31 PM, Simon Cross
 wrote:
> http://www.google.com/codesearch#search/&q=swapcase%20lang:%5Epython$&type=cs
>
> There are quite a few hits but more people appear to be
> re-implementing it than using it (I haven't gone to the trouble of
> mining the search results to get an accurate picture though).

Scratch that -- I should gloss over search results less. It looks like
the most common use case is to provide a consistent string-like API
somewhere else. So removing it is likely to cause headaches (e.g. test
failures) for the people who are wrapping it.


Re: [Python-Dev] Maintenance burden of str.swapcase

2011-09-07 Thread Stephen J. Turnbull
Antoine Pitrou writes:

 > You could also point out UTF-16 or EBCDIC, but I fail to see how that's
 > relevant. Do you have problems with ISO 2022 when parsing, say, e-mail
 > headers?

Yes, of course!  Especially when it's say, packed EUC not encapsulated
in MIME words.  I think Mailman now handles that without crashing, but
it took 10 years.  Most Emacs MUAs still blow chunks on that.  My
procmail recipes and my employer's virus checker both occasionally punt.

The point about ISO 2022 is that it allows arbitrary binary crap in
the stream, delimited by appropriate well-defined constructs.  Just
like the ASCII-like tokens in the protocols you talk about.  But
parsing full-bore ISO 2022 is non-trivial, especially if you're going
to try to provide error-handling that's useful to the user.  Nobody
ever really took it seriously as a solution to the problem of
internationalization in the 15 years or so when it was the only
solution, and even less so once it became clear that UCSes were going
to get traction.

 > >  > not arbitrary "arrays of bytes". And making indexing of bytes
 > >  > objects return ints was IMHO a mistake.
 > > 
 > > Bytes objects are not ASCII strings, even though they can be used to
 > > represent them.
 > 
 > I'm talking about practice,

So am I, and so is Nick.

 > not some idealistic view of the world.
 > In many use cases (XML, HTML, e-mail headers, many other text-based
 > protocols), you can get a mixture of ASCII "commands", and opaque
 > binary stuff (which will or will not, depending on these "commands",
 > have a meaningful unicode decoding).

Yeah, so what?  Those protocol tokens are deliberately chosen to
resemble ASCII text, but you need to parse them out of the binary
sludge somehow, and the surrounding content remains binary sludge
until deserialized or (for text) decoded.  How is having b[0] return a
bytes object, rather than an integer, going to help in that?
Especially if the value is not in the ASCII range?
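
For readers who have not run into the behaviour under discussion, a short
Python 3 illustration follows. The request line is an invented example.

```python
request = b"GET / HTTP/1.1"

first = request[0]     # indexing a bytes object yields an int
prefix = request[0:1]  # slicing yields a bytes object of length 1

print(first)                        # 71, the code for ASCII "G"
print(prefix)                       # b'G'
print(request[:3] == b"GET")        # True: tokens can be compared via slices
print(request.startswith(b"GET "))  # True
print(b"\xff"[0], b"\xff"[0:1])     # 255 b'\xff': a value with no ASCII meaning
```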

 > > AFAICS, anything that should be done with ASCII-punned magic numbers
 > > ("protocol tokens", if you prefer) can be done with slices and (ta-da!)
 > > case conversion.
 > 
 > So, basically, you're saying that we should remove useful functionality

No, that *was* Nick's position; I specifically opposed the suggestion
that "lower" and "upper" be removed, and he concurred after a bit of
thought.  And remember, he's talking about removing "swapcase".  Which
RFC defines a protocol where that would be useful?  How about "title"?

 > and tell people to reimplement an adhoc version of it when they
 > need it.

Of course not; I'm with Michael Foord on that: nobody should ever be
asked to reimplement swapcase!  My position is simply that bytes are
not text, and the occasional reminder (such as b[0] returning an
integer, not a bytes object) is good.  My experience has been that it
makes a lot of sense to layer these things, for example transforming a
protocol stream serialized as octets into a more structured object
composed of protocol tokens and payloads.  It's *not* text, and the
relevant techniques are different.

It's like the old saw about "aha, I'll use regexps to solve this
problem!" and now you have *two* problems.

I don't advocate getting rid of regexps, and I don't advocate removing
methods from bytes (although I do dream about it occasionally).  I do
advocate that people think twice before implementing complex text-like
algorithms on binary protocol streams.  If the stream really is
text-like, then transform it into text of a known, well-behaved
encoding, and then apply the powerful text-processing facilities
provided for str.  If it's not, then transform to a token stream or
whatever makes sense.  In both cases, do as little "text processing"
on bytes objects as possible, and put more structure on the content as
soon as possible.
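
A minimal sketch of that layering, under the assumption of a simple
"Name: value" header line: the token is decoded to text as early as
possible, while the payload stays as bytes until something knows how to
interpret it. "parse_header" is a hypothetical helper, not an existing
API.

```python
def parse_header(line: bytes):
    """Split a raw header line into a text token and a binary payload."""
    name, sep, value = line.partition(b":")
    if not sep:
        raise ValueError("not a header line: %r" % (line,))
    # The field name is a protocol token: decode it strictly as ASCII so
    # anything unexpected fails loudly here rather than later.
    token = name.decode("ascii").lower()
    # The value is left as bytes; bytes.strip() removes ASCII whitespace.
    return token, value.strip()


print(parse_header(b"Content-Length: 42\r\n"))  # ('content-length', b'42')
```

A non-ASCII byte in the field name raises UnicodeDecodeError at the
boundary, which is the kind of early structure being argued for.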

If you really need the efficiency, then do what you need to do.  As I
say, I don't have any practical objection to keeping your tools for
that case.  But such applications, although important (I guess), are a
minority.

 > That sounds obnoxious.

Good advice almost always sounds obnoxious to the recipient.
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Maintenance burden of str.swapcase

2011-09-07 Thread Glyph Lefkowitz
On Sep 7, 2011, at 10:26 AM, Stephen J. Turnbull wrote:

> How about "title"?

>>> 'content-length'.title()
'Content-Length'

You might say that the protocol "has" to be case-insensitive so this is a silly 
frill: there are definitely enough case-sensitive crappy bits of network 
middleware out there that this function is critically important for an HTTP 
server.
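
A quick check of how far title() gets on header names, on both str and
bytes, in Python 3. The header names below are chosen only as examples;
the last two show where title() differs from the conventional spelling.

```python
for name in ("content-length", "etag", "www-authenticate"):
    print(name.title())
# Content-Length
# Etag               (conventionally written ETag)
# Www-Authenticate   (conventionally written WWW-Authenticate)

print(b"content-length".title())  # b'Content-Length'
```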

In general I'd like to defend keeping as many of these methods as possible for 
compatibility (porting to Py3 is already hard enough).  Although even I might 
have a hard time defending 'swapcase', which is never used _at all_ within 
Twisted, on text or bytes.  The only use-case I can think of for that method is 
goofy joke text filters, and it wouldn't be very good at that either.

-glyph



Re: [Python-Dev] Maintenance burden of str.swapcase

2011-09-07 Thread Nick Coghlan
On Thu, Sep 8, 2011 at 3:51 AM, Glyph Lefkowitz  wrote:
> On Sep 7, 2011, at 10:26 AM, Stephen J. Turnbull wrote:
>
>> How about "title"?
>
> >>> 'content-length'.title()
> 'Content-Length'
>
> You might say that the protocol "has" to be case-insensitive so this is a
> silly frill: there are definitely enough case-sensitive crappy bits of
> network middleware out there that this function is critically important for
> an HTTP server.

Actually, the HTTP header case occurred to me as well shortly after
sending my last message, so I think it's a legitimate reason to keep
the methods around on bytes and bytearray.

So, putting my "practicality beats purity" hat back on, I would
describe the status quo as follows:

1. Binary data is not text, so bytes and bytearray are deliberately
conceptualised as arrays of arbitrary integers in the range 0-255
rather than as arrays of 8-bit 'characters'. This distinction is one
of the core design principles separating Python 3 from Python 2.

2. However, the use of ASCII words and characters is a common feature
of many existing wire protocols, so it is useful to be able to
manipulate binary sequences that contain data in an ASCII-compatible
format without having to convert them to text first. Retaining
additional ASCII-based methods also eases the transition to Python 3
for code that manipulates binary data using the 2.x str type.

3. ASCII whitespace characters are used as delimiters in many formats.
Thus, various methods such as split(), partition(), strip() and their
variants, retain their "ASCII whitespace" default arguments and
expandtabs() is also retained.

4. Padding values out to fill fields of a certain size is needed for
some formats. Thus, center(), ljust(), rjust(), zfill() are retained
(again retaining their ASCII space default fill character in the case
of the first 3 methods)

5. Identifying ASCII alphanumeric data is important for some formats.
Thus, isalnum(), isalpha() and isdigit() are retained.

6. Case insensitive ASCII comparisons are important for some formats
(e.g. RFC 822 headers, HTTP headers). Thus, upper(), lower(),
isupper() and islower() are retained.

7. Even correct mixed case ASCII can be important for some formats
(e.g. HTTP headers). Thus, capitalize(), title() and istitle() are
retained.

8. A valid use for swapcase() on binary data has not been identified,
but once all the other ASCII based methods are being kept around for
the various reasons given above, it doesn't seem worth the effort to
get rid of this one (despite the additional implementation effort
needed for alternate implementations).

9. Algorithms that operate purely on binary data or purely on text can
just use literals of the appropriate type (if they use literals at
all). Algorithms that are designed to operate on either kind of data
may want to adopt an implicit decode/encode approach to handle binary
inputs (this allows assumptions regarding the input encoding to be
made explicit).
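
Point 9 could look roughly like the sketch below: one text-based
implementation, with binary inputs decoded on the way in and re-encoded
on the way out. The function name and the ASCII default are assumptions
made for the example, not an existing API.

```python
def normalize_token(token, encoding="ascii"):
    """Strip and lower-case a token, returning the same type it was given."""
    is_binary = isinstance(token, (bytes, bytearray))
    # Decode binary input so the algorithm itself only ever sees text; the
    # encoding assumption is spelled out in the signature.
    text = token.decode(encoding) if is_binary else token
    result = text.strip().lower()
    return result.encode(encoding) if is_binary else result


print(normalize_token("  Content-Type "))   # content-type
print(normalize_token(b"  Content-Type "))  # b'content-type'
```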

I'm actually fairly happy with that rationalisation for the current
Python 3 set up. I'd been thinking recently that we would have been
better off if more of the methods that rely on the data using an ASCII
compatible encoding scheme had been removed from bytes and bytearray,
but swapcase() is really the only one we can't give a decent
justification for beyond "it was there in 2.x".
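
For reference, a few of the retained methods applied to bytes in
Python 3. The request line and header names are invented sample data.

```python
request_line = b"GET /index.html HTTP/1.1\r\n"

fields = request_line.split()  # default separator: ASCII whitespace
padded = b"42".rjust(6)        # default fill character: ASCII space
zero_filled = b"42".zfill(6)
is_number = b"42".isdigit()
same_header = b"Content-Type".lower() == b"CONTENT-TYPE".lower()
swapped = b"Content-Type".swapcase()

print(fields)       # [b'GET', b'/index.html', b'HTTP/1.1']
print(padded)       # b'    42'
print(zero_filled)  # b'000042'
print(is_number)    # True
print(same_header)  # True
print(swapped)      # b'cONTENT-tYPE'
```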

Cheers,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia


Re: [Python-Dev] Multigigabyte memory usage in the OpenIndiana Buildbot

2011-09-07 Thread Jesus Cea

On 07/09/11 14:32, Antoine Pitrou wrote:
> If "-j" is used, tests get run in a separate process each, so that
>  approach might be an answer.

Antoine, I think this would be the answer. Each test would be a bit
slower, because I would launch a new python process per test, but I
could run 16 tests in parallel (I have 16 CPUs and, actually, most
tests are not CPU intensive). I am sorry to bother you with these
details and waste your time, but could you possibly change my buildbot
configuration to launch, let's say, 4 test processes in parallel, just
for testing?

Another option would be to have a single Python process and "fork" for
each test. That would launch each test in a separate process without
requiring a full python interpreter launch each time. Is this the
way "-j" is implemented, or is "-j" something external, like "make -j"?
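
To make the "fork per test" idea concrete, a rough Unix-only sketch
follows. It is not a description of how "-j" works; "run_in_fork" is a
hypothetical helper, and the callables stand in for real tests.

```python
import os


def run_in_fork(func):
    """Run func() in a forked child; return the child's exit status."""
    pid = os.fork()
    if pid == 0:
        # Child: run the test, then leave without returning to the caller.
        try:
            func()
            os._exit(0)
        except BaseException:
            os._exit(1)
    # Parent: wait for the child; its memory is released when it exits.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1


print(run_in_fork(lambda: None))   # 0
print(run_in_fork(lambda: 1 / 0))  # 1
```

Note that under strict reservation each fork still has to reserve a copy
of the parent's address space, so this only helps if the parent stays
small.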

BTW, the (nice and helpful) OpenIndiana folks have told me a few hours
ago that they would increase my swap limit to 16GB. I am now waiting
for this change to be done.

I want my six builds in parallel (2.7, 3.2, 3.x, in 32 and 64 bits) back!

Sorry for wasting your time with these mundane details...

- -- 
Jesus Cea Avion _/_/  _/_/_/_/_/_/
[email protected] - http://www.jcea.es/ _/_/_/_/  _/_/_/_/  _/_/
jabber / xmpp:[email protected] _/_/_/_/  _/_/_/_/_/
.  _/_/  _/_/_/_/  _/_/  _/_/
"Things are not so easy"  _/_/  _/_/_/_/  _/_/_/_/  _/_/
"My name is Dump, Core Dump"   _/_/_/_/_/_/  _/_/  _/_/
"El amor es poner tu felicidad en la felicidad de otro" - Leibniz


[Python-Dev] python -m tokenize in 3.x ?

2011-09-07 Thread Meador Inge
Hi All,

I have been investigating some 'tokenize' bugs recently.  As a part of
that investigation I was trying to use '-m tokenize', which works
great in 2.x:

[meadori@motherbrain cpython]$ python2.7 -m tokenize test.py
1,0-1,5:        NAME    'print'
1,6-1,21:       STRING  '"Hello, World!"'
1,21-1,22:      NEWLINE '\n'
2,0-2,0:        ENDMARKER       ''

In 3.x, however, the functionality has been removed and replaced with
some hard-wired test code:

[meadori@motherbrain cpython]$ python3 -m tokenize test.py
TokenInfo(type=57 (ENCODING), string='utf-8', start=(0, 0), end=(0, 0), line='')
TokenInfo(type=1 (NAME), string='def', start=(1, 0), end=(1, 3),
line='def parseline(self, line):')
TokenInfo(type=1 (NAME), string='parseline', start=(1, 4), end=(1,
13), line='def parseline(self, line):')
TokenInfo(type=53 (OP), string='(', start=(1, 13), end=(1, 14),
line='def parseline(self, line):')
...

Why is this?  I found the commit where the functionality was removed
[1], but no explanation.  Any objection to adding this feature back?

[1] http://hg.python.org/cpython/rev/51e24512e305/
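
In the meantime, output similar to the 2.x listing can be produced from
the 3.x module API. This is only a sketch: "dump_tokens" is a
hypothetical helper, and the format string merely imitates the 2.x
layout.

```python
import io
import tokenize


def dump_tokens(source: bytes):
    """Return one 'start-end: TYPE string' line per token in source."""
    lines = []
    # tokenize.tokenize() expects a readline callable that returns bytes.
    for tok in tokenize.tokenize(io.BytesIO(source).readline):
        (srow, scol), (erow, ecol) = tok.start, tok.end
        lines.append("%d,%d-%d,%d:\t%s\t%r" % (
            srow, scol, erow, ecol, tokenize.tok_name[tok.type], tok.string))
    return lines


for line in dump_tokens(b'print("Hello, World!")\n'):
    print(line)
```

Unlike 2.x, the 3.x token stream starts with an ENCODING token.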

-- 
# Meador


Re: [Python-Dev] Maintenance burden of str.swapcase

2011-09-07 Thread Stephen J. Turnbull
Glyph Lefkowitz writes:
 > On Sep 7, 2011, at 10:26 AM, Stephen J. Turnbull wrote:
 > 
 > > How about "title"?
 > 
 > >>> 'content-length'.title()
 > 'Content-Length'
 > 
 > You might say that the protocol "has" to be case-insensitive so
 > this is a silly frill:

Not me, sir.  My whole point about the "bytes should be more like str"
controversy is the dual of that: you don't know what will be coming at
you, so the regularities and (normally allowable) fuzziness of text
processing are inadmissible.

 > there are definitely enough case-sensitive crappy bits of network
 > middleware out there that this function is critically important for
 > an HTTP server.

"Critically important" is surely an overstatement.  You could always
title-case the literal strings containing field names in the source.

The problem with having lots of str-like features on bytes is that you
lose TOOWDTI, or worse, to many performance-happy coders, use of bytes
becomes TOOWDTI "because none of the characters[sic] I'm planning to
process myself are non-ASCII".  This is the road to Babel; it's
workable for one-off scripts but it's asking for long-term trouble in
multi-module applications.  The choice of decoding to str and
processing in that form should be made as attractive as possible.

On the other hand, it is undeniably useful for protocol tokens to have
mnemonic representations even in binary protocols.  Textual
manipulations on those tokens should be convenient.

It seems to me that what might be an improvement over the current
situation (maybe for Py4k only, though) is for bytes and
(PEP-393-style) str to share representation, and have a "cast" method
which would convert from one to the other, validating that the range
constraints on the representation are satisfied.  The problem I see is
that this either sanctions the practice of using latin-1 as "ASCII
plus anything", which is an unpleasant hack, or you'd need to check in
text methods that nothing is done with non-ASCII values other than
checks for set membership (including equality comparison, of course).

OTOH, AFAICS, Antoine's claim that inserting a non-latin-1 character
in a str that happens to contain only ASCII values would convert the
representation to multioctets (true), and therefore this doesn't give
the desired efficiency properties, is beside the point.  Just don't do
that!  You *can't* do that in a bytes object, anyway; use of str in
this way is a "consenting adults" issue.  You trade off the
convenience of the full suite of text tools vs. the possibility that
somebody might insert such a character -- but for the algorithms
they're going to be using, they shouldn't be doing that anyway.
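
The representation switch itself is easy to observe on a CPython that
implements PEP 393 (3.3 or later). The sketch below assumes such an
interpreter; the exact sizes are implementation details and will differ
between versions and platforms.

```python
import sys

ascii_only = "a" * 1000
with_latin1 = "a" * 999 + "\xe9"  # still fits in one byte per character
with_wide = "a" * 999 + "\u20ac"  # EURO SIGN is outside latin-1

for label, s in [("ascii", ascii_only),
                 ("latin-1", with_latin1),
                 ("wide", with_wide)]:
    # Same length each time; only the storage per character changes.
    print(label, len(s), sys.getsizeof(s))
```

On such an interpreter the third string takes roughly twice the memory
of the first two, although all three have the same length.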



Re: [Python-Dev] python -m tokenize in 3.x ?

2011-09-07 Thread Guido van Rossum
My guess is that there was no specific intent -- most likely it occurred
to nobody that the main() functionality was actually useful. I'd say
it's fine to put it back, and then document it (so it won't be removed
again :-).

--Guido

On Wed, Sep 7, 2011 at 7:06 PM, Meador Inge  wrote:
> Hi All,
>
> I have been investing some 'tokenize' bugs recently.  As a part of
> that investigation I was trying to use '-m tokenize', which works
> great in 2.x:
>
> [meadori@motherbrain cpython]$ python2.7 -m tokenize test.py
> 1,0-1,5:        NAME    'print'
> 1,6-1,21:       STRING  '"Hello, World!"'
> 1,21-1,22:      NEWLINE '\n'
> 2,0-2,0:        ENDMARKER       ''
>
> In 3.x, however, the functionality has been removed and replaced with
> some hard-wired test code:
>
> [meadori@motherbrain cpython]$ python3 -m tokenize test.py
> TokenInfo(type=57 (ENCODING), string='utf-8', start=(0, 0), end=(0, 0), 
> line='')
> TokenInfo(type=1 (NAME), string='def', start=(1, 0), end=(1, 3),
> line='def parseline(self, line):')
> TokenInfo(type=1 (NAME), string='parseline', start=(1, 4), end=(1,
> 13), line='def parseline(self, line):')
> TokenInfo(type=53 (OP), string='(', start=(1, 13), end=(1, 14),
> line='def parseline(self, line):')
> ...
>
> Why is this?  I found the commit where the functionality was removed
> [1], but no explanation.  Any objection to adding this feature back?
>
> [1] http://hg.python.org/cpython/rev/51e24512e305/
>
> --
> # Meador



-- 
--Guido van Rossum (python.org/~guido)