Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader

2011-05-24 Thread Victor Stinner
On Tuesday 24 May 2011 at 15:24 +1000, Nick Coghlan wrote:
> On Tue, May 24, 2011 at 10:08 AM, Victor Stinner wrote:
> > It's trivial to replace a call to codecs.open() with a call to open(),
> > because the two APIs are very close. The main difference is that
> > codecs.open() doesn't support universal newlines, so you have to use
> > open(..., newline='') to keep the same behaviour (keep newlines
> > unchanged). This task can be done by 2to3. But I suppose that most
> > people will be happy with the universal newline mode.
> 
> Is there any reason that codecs.open() can't become a thin wrapper
> around builtin open in 3.3?

Yes, it's trivial to implement codecs.open() using:

import builtins

def open(filename, mode='rb', encoding=None, errors='strict',
         buffering=1):
    return builtins.open(filename, mode, buffering,
                         encoding, errors, newline='')

But do we really need two ways to open a file? An extract of import
this:
"There should be one-- and preferably only one --obvious way to do it."

Another example: Python 3.2 has subprocess.Popen, os.popen and
platform.popen to open a subprocess. platform.popen is now deprecated in
Python 3.3. Well, it's already better than Python 2.5 which has
os.popen(), os.popen2(), os.popen3(), os.popen4(), os.spawnl(),
os.spawnle(), os.spawnlp(), os.spawnlpe(), os.spawnv(), os.spawnve(),
os.spawnvp(), os.spawnvpe(), subprocess.Popen, platform.popen and maybe
others :-)

> How API compatible is TextIOWrapper with StreamReader/StreamWriter?

It's fully compatible.

> How hard would it to be change them to be adapters over the main IO
> machinery rather than independent classes?

I don't understand your proposal. We don't need StreamReader and
StreamWriter to open a stream as a text file, only incremental decoders
and encoders. Why do you want to keep them?

Victor

___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader

2011-05-24 Thread M.-A. Lemburg
Victor Stinner wrote:
> Hi,
> 
> In Python 2, codecs.open() is the best way to read and/or write files
> using Unicode. But in Python 3, open() is preferred with its fast io
> module. I would like to deprecate codecs.open() because it can be
> replaced by open() and io.TextIOWrapper. I would like your opinion and
> that's why I'm writing this email.

I think you should have moved this part of your email
further up, since it explains the reason why this idea was
rejected for now:

> I opened an issue for this idea. Brett and Marc-Andre Lemburg don't
> want to deprecate codecs.open() & friends because they want to be able
> to write code working on Python 2 and on Python 3 without any change. I
> don't think it's realistic: nontrivial programs require at least the six
> module, and most likely the 2to3 program. The six module can have its
> "codecs.open" function if codecs.open is removed from Python 3.4.

And now for something completely different:

> codecs.open() and StreamReader, StreamWriter and StreamReaderWriter
> classes of the codecs module don't support universal newlines, still
> have some issues with stateful codecs (like UTF-16/32 BOMs), and each
> codec has to implement a StreamReader and a StreamWriter class.
> 
> StreamReader and StreamWriter are stateless codecs (no reset() or
> setstate() method), and so it's not possible to write a generic fix for
> all child classes in the codecs module. Each stateful codec has to
> handle special cases like seek() problems. For example, UTF-16 codec
> duplicates some IncrementalEncoder/IncrementalDecoder code into its
> StreamWriter/StreamReader class.

Please read PEP 100 regarding StreamReader and StreamWriter.
Those codecs parts were explicitly designed to be stateful,
unlike the stateless encoder/decoder methods.

Please read my reply on the ticket:

"""
StreamReader and StreamWriter classes provide the base codec
implementations for stateful interaction with streams. They
define the interface and provide a working implementation for
those codecs that choose not to implement their own variants.

Each codec can, however, implement variants which are optimized
for the specific encoding or intercept certain stream methods
to add functionality or improve the encoding/decoding
performance.

Both are essential parts of the codec interface.

TextIOWrapper and StreamReaderWriter are merely wrappers
around streams that make use of the codecs. They don't
provide any codec logic themselves. That's the conceptual
difference.
"""

> The io module is well tested, supports non-seekable streams, handles
> corner-cases correctly (like UTF-16/32 BOMs) and supports any kind of
> newlines including a "universal newline" mode. TextIOWrapper reuses
> incremental encoders and decoders, so BOM issues were fixed only once,
> in TextIOWrapper.
> 
> It's trivial to replace a call to codecs.open() with a call to open(),
> because the two APIs are very close. The main difference is that
> codecs.open() doesn't support universal newlines, so you have to use
> open(..., newline='') to keep the same behaviour (keep newlines
> unchanged). This task can be done by 2to3. But I suppose that most
> people will be happy with the universal newline mode.
> 
> I don't see which use case is not covered by TextIOWrapper. But I know
> some cases which are not supported by StreamReader/StreamWriter.

This is a misunderstanding of the concepts behind the two.

StreamReader and StreamWriters are implemented by the codecs,
they are part of the API that each codec has to provide in order
to register in the Python codecs system. Their purpose is
to provide a stateful interface and work efficiently and
directly on streams rather than buffers.

Here's my reply from the ticket regarding using incremental
encoders/decoders for the StreamReader/Writer parts of the
codec set of APIs:

"""
The point about having them use incremental codecs for encoding and
decoding is a good one and would need to be investigated. If possible,
we could use incremental encoders/decoders for the standard
StreamReader/Writer base classes or add new IncrementalStreamReader/Writer
classes which then use the IncrementalEncoder/Decoder per default.

Please open a new ticket for this.
"""

> StreamReader, StreamWriter, StreamReaderWriter and EncodedFile are not
> used in the Python 3 standard library. I tried removing them: except for
> the tests in test_codecs which test them directly, the full test suite passes.
>
> Read the issue for more information: http://bugs.python.org/issue8796

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, May 24 2011)
>>> Python/Zope Consulting and Support ...http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/

2011-06-20: EuroPython 2011, Florence, Italy   27 days to go

Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader

2011-05-24 Thread Vinay Sajip
Victor Stinner writes:

> I opened an issue for this idea. Brett and Marc-Andre Lemburg don't
> want to deprecate codecs.open() & friends because they want to be able
> to write code working on Python 2 and on Python 3 without any change. I
> don't think it's realistic: nontrivial programs require at least the six
> module, and most likely the 2to3 program. The six module can have its
> "codecs.open" function if codecs.open is removed from Python 3.4.

What's "non-trivial"? Both pip and virtualenv (widely used programs) were ported
to Python 3 using a single codebase for 2.x and 3.x, because it seemed to
involve the least ongoing maintenance burden. Though these particular programs
don't use codecs.open, I don't see much value in making it harder to write
programs which can run under both 2.x and 3.x; that's not going to speed
adoption of 3.x.

I find 2to3 very useful indeed for showing where changes may need to be made for
2.x/3.x portability, but do not use it as an automatic conversion tool. The six
module is very useful, too, but some projects won't necessarily want to add it
as an additional dependency, and will reimplement just the parts they need from
that bag of tricks.

So I would also want to keep codecs.open() and friends, at least for now -
though it seems to make sense to implement them as wrappers (as Nick
suggested).

Regards,

Vinay Sajip




Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader

2011-05-24 Thread Victor Stinner
On Tuesday 24 May 2011 at 08:16, Vinay Sajip wrote:
> So I would also want to keep codecs.open() and friends, at least for now

Well, I would agree to keep codecs.open() (if we patch it to reuse
TextIOWrapper and add a note to say that it is kept for backward
compatibility and open() should be preferred in Python 3), but deprecate
StreamReader, StreamWriter and EncodedFile.

As I wrote, codecs.open() is useful in Python 2. But I don't know of any
program or library using StreamReader or StreamWriter directly.

I found some projects (e.g. twisted-mail, feeds2imap, pyflag, pygsm, ...)
implementing their own Python codec (cool!), and their codecs include their
own StreamReader and StreamWriter classes, but I don't think that these
classes are used.

Victor



Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader

2011-05-24 Thread Victor Stinner
On Tuesday 24 May 2011 at 10:03 +0200, M.-A. Lemburg wrote:
> Please read PEP 100 regarding StreamReader and StreamWriter.
> Those codecs parts were explicitly designed to be stateful,
> unlike the stateless encoder/decoder methods.

Yes, it is possible to implement stateful StreamReader and StreamWriter
classes and we have such codecs (I gave the example of UTF-16), but the
state is not exposed (getstate / setstate), and so it's not possible to
write generic code to handle the codec state in the base StreamReader
and StreamWriter classes. io.TextIOWrapper requires encoder.setstate(0)
for example.
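To make the point concrete, here is a minimal sketch using the Python 3
incremental codec API: setstate(0) tells a BOM-writing encoder that the
BOM has already been emitted, which is what TextIOWrapper relies on after
a seek into the middle of a stream. StreamWriter exposes no such method.

```python
import codecs

enc = codecs.getincrementalencoder('utf-16')()
first = enc.encode('a')            # first call emits the BOM
assert first.startswith(codecs.BOM)

enc.reset()
enc.setstate(0)                    # "the BOM was already written"
no_bom = enc.encode('b')           # so it is not emitted again
assert not no_bom.startswith(codecs.BOM)
```

Without the setstate(0) call, the encoder would prepend the BOM a second
time after the reset.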

> Each codec can, however, implement variants which are optimized
> for the specific encoding or intercept certain stream methods
> to add functionality or improve the encoding/decoding
> performance.

Can you give me some examples?

> TextIOWrapper and StreamReaderWriter are merely wrappers
> around streams that make use of the codecs. They don't
> provide any codec logic themselves. That's the conceptual
> difference.
> ...
> StreamReader and StreamWriters ... work efficiently and
> directly on streams rather than buffers.

StreamReader, StreamWriter, TextIOWrapper and StreamReaderWriter all
have a file-like API: tell(), seek(), read(), readline(), write(), etc.
The implementations may differ, but the API is the same, and so the
use cases are the same.

I don't see in which cases I should use StreamReader or StreamWriter
instead of TextIOWrapper. I thought that TextIOWrapper was specific to
files on disk, but TextIOWrapper is already used for other purposes,
like sockets.
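For instance, a short sketch of wrapping a plain in-memory byte stream —
nothing file-specific is involved, and the stateful BOM handling works
because TextIOWrapper drives the incremental codec:

```python
import io

# TextIOWrapper accepts any buffered byte stream, not just files on disk
raw = io.BytesIO()
text = io.TextIOWrapper(raw, encoding='utf-16', newline='')
text.write('hello\n')
text.write('world\n')
text.flush()

# the BOM is written exactly once, at the start of the stream
assert raw.getvalue().decode('utf-16') == 'hello\nworld\n'
```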

> Here's my reply from the ticket regarding using incremental
> encoders/decoders for the StreamReader/Writer parts of the
> codec set of APIs:
> 
> """
> The point about having them use incremental codecs for encoding and
> decoding is a good one and would need to be investigated. If possible,
> we could use incremental encoders/decoders for the standard
> StreamReader/Writer base classes or add new
> IncrementalStreamReader/Writer classes which then use the
> IncrementalEncoder/Decoder per default.

Why do you want to write a duplicate feature? TextIOWrapper is already
here, it's working and widely used.

I am working on codec issues (like CJK encodings, see #12100, #12057,
#12016) and I would like to remove StreamReader and StreamWriter to have
*less* code to maintain.

If you want to add more code, will you be available to maintain it? It
looks like you are busy, and some people (not me ;-)) are still waiting
for .transform()/.untransform()!

Victor



Re: [Python-Dev] Stable buildbots update

2011-05-24 Thread Antoine Pitrou
On Mon, 23 May 2011 19:16:36 +0200
Tarek Ziadé wrote:
> 
> I have now completed the cleanup and we're back on green-land for the
> stable bots.
> 
> The red slaves should get green when they catch up with the latest rev
> (they are slow). If they're not and they are failing in packaging or
> sysconfig let me know.
> 
> Sorry again if it has taken so long. Setting up Solaris and BSD VMs
> took some time ;)

Thank you very much! What a beautiful sight this is:
http://www.python.org/dev/buildbot/all/waterfall?category=3.x.stable

(until a sporadic failure comes up, that is)

Regards

Antoine.


Re: [Python-Dev] CPython optimization: storing reference counters outside of objects

2011-05-24 Thread Artur Siekielski
2011/5/24 Sturla Molden:
>> Oh, and using explicit shared memory or mmap is much harder, because
>> you have to map the whole object graph into bytes.
>
> It sounds like you need PYRO, POSH or multiprocessing's proxy objects.

PYRO/multiprocessing proxies aren't a comparable solution because their
performance is orders of magnitude worse. You are comparing direct memory
access with serialization/message passing through sockets/pipes.

POSH might be good, but the project has been dead for 8 years. And this
copy-on-write approach is nice because you don't need changes/restrictions
to your code, or a special garbage collector.


Artur


[Python-Dev] "streams" vs "buffers"

2011-05-24 Thread Antoine Pitrou
On Tue, 24 May 2011 10:03:22 +0200
"M.-A. Lemburg" wrote:
> 
> StreamReader and StreamWriters are implemented by the codecs,
> they are part of the API that each codec has to provide in order
> to register in the Python codecs system. Their purpose is
> to provide a stateful interface and work efficiently and
> directly on streams rather than buffers.

I think you are trying to make a conceptual distinction which doesn't
exist in practice. Your OS uses buffers to represent "streams" to you.

Also, how come StreamReader has internal members named "bytebuffer",
"charbuffer" and "linebuffer"?
There certainly seems to be some (non-trivial) amount of buffering
going on there, and probably quite slow and inefficient since it's pure
Python (TextIOWrapper is written in C).
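That buffering is easy to observe; a small sketch using the behaviour of
codecs.StreamReader in Python 3 (charbuffer is the internal member named
above, so this pokes at implementation details, not a public API):

```python
import codecs
import io

reader = codecs.getreader('utf-8')(io.BytesIO(b'hello world'))
# ask for up to 10 bytes from the stream but only 1 decoded character
first = reader.read(10, 1)
assert first == 'h'
# the other decoded characters wait in StreamReader's pure-Python buffer
assert reader.charbuffer == 'ello worl'
```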

Regards

Antoine.




Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader

2011-05-24 Thread M.-A. Lemburg
Victor Stinner wrote:
> On Tuesday 24 May 2011 at 10:03 +0200, M.-A. Lemburg wrote:
>> Please read PEP 100 regarding StreamReader and StreamWriter.
>> Those codecs parts were explicitly designed to be stateful,
>> unlike the stateless encoder/decoder methods.
> 
> Yes, it is possible to implement stateful StreamReader and StreamWriter
> classes and we have such codecs (I gave the example of UTF-16), but the
> state is not exposed (getstate / setstate), and so it's not possible to
> write generic code to handle the codec state in the base StreamReader
> and StreamWriter classes. io.TextIOWrapper requires encoder.setstate(0)
> for example.

So instead of always suggesting to deprecate everything,
how about coming up with a proposal to add meaningful
new methods to those base classes?

>> Each codec can, however, implement variants which are optimized
>> for the specific encoding or intercept certain stream methods
>> to add functionality or improve the encoding/decoding
>> performance.
> 
> Can you give me some examples?

See the UTF-16 codec in the stdlib for example. This uses
some of the available possibilities to interpret the BOM
and then switch the encoder/decoder methods accordingly.

A lot more could be done for other variable length encoding
codecs, e.g. UTF-8, since these often have problems near
the end of a read due to missing bytes.

The base class implementation provides a general purpose
implementation to cover the case, but it's not efficient,
since it doesn't know anything about the encoding
characteristics.

Such an implementation would have to be done per codec
and that's why we have per codec StreamReader/Writer
APIs.
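The "missing bytes near the end of a read" problem is exactly what the
incremental decoders solve generically; a short illustration with a
multi-byte UTF-8 sequence split across two chunks:

```python
import codecs

dec = codecs.getincrementaldecoder('utf-8')()
# 'é' is the two-byte sequence C3 A9 in UTF-8; split it across reads
assert dec.decode(b'hello \xc3') == 'hello '   # trailing byte is buffered
assert dec.decode(b'\xa9') == 'é'              # completed by the next chunk
```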

>> TextIOWrapper and StreamReaderWriter are merely wrappers
>> around streams that make use of the codecs. They don't
>> provide any codec logic themselves. That's the conceptual
>> difference.
>> ...
>> StreamReader and StreamWriters ... work efficiently and
>> directly on streams rather than buffers.
> 
> StreamReader, StreamWriter, TextIOWrapper and StreamReaderWriter all
> have a file-like API: tell(), seek(), read(), readline(), write(), etc.
> The implementations may differ, but the API is the same, and so the
> use cases are the same.
> 
> I don't see in which cases I should use StreamReader or StreamWriter
> instead of TextIOWrapper. I thought that TextIOWrapper was specific to
> files on disk, but TextIOWrapper is already used for other purposes,
> like sockets.

I have no idea why TextIOWrapper was added to the stdlib
instead of making StreamReaderWriter more capable,
since StreamReaderWriter has been available since Python 1.6
(and is being used by codecs.open()).

Perhaps we should deprecate TextIOWrapper instead and
replace it with codecs.StreamReaderWriter? ;-)

Seriously, I don't see use of TextIOWrapper as an argument
for removing StreamReader/Writer parts of the codecs API.

>> Here's my reply from the ticket regarding using incremental
>> encoders/decoders for the StreamReader/Writer parts of the
>> codec set of APIs:
>>
>> """
>> The point about having them use incremental codecs for encoding and
>> decoding is a good one and would need to be investigated. If possible,
>> we could use incremental encoders/decoders for the standard
>> StreamReader/Writer base classes or add new
>> IncrementalStreamReader/Writer classes which then use the
>> IncrementalEncoder/Decoder per default.
> 
> Why do you want to write a duplicate feature? TextIOWrapper is already
> here, it's working and widely used.

See above and please also try to understand why we have per-codec
implementations for streams. I'm tired of repeating myself.

I would much prefer to see the codec-specific functionality
in TextIOWrapper added back to the codecs where it
belongs.

> I am working on codec issues (like CJK encodings, see #12100, #12057,
> #12016) and I would like to remove StreamReader and StreamWriter to have
> *less* code to maintain.
>
> If you want to add more code, will you be available to maintain it? It
> looks like you are busy, and some people (not me ;-)) are still waiting
> for .transform()/.untransform()!

I dropped the ball on the idea after the strong wave of
comments against those methods. People will simply have
to use codecs.encode() and codecs.decode().

-- 
Marc-Andre Lemburg
eGenix.com


Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader

2011-05-24 Thread Nick Coghlan
On Tue, May 24, 2011 at 6:58 PM, Victor Stinner wrote:
> StreamReader, StreamWriter, TextIOWrapper and StreamReaderWriter all
> have a file-like API: tell(), seek(), read(), readline(), write(), etc.
> The implementations may differ, but the API is the same, and so the
> use cases are the same.
>
> I don't see in which cases I should use StreamReader or StreamWriter
> instead of TextIOWrapper. I thought that TextIOWrapper was specific to
> files on disk, but TextIOWrapper is already used for other purposes,
> like sockets.

Back up a step here. It's important to remember that the codecs module
*long* predates the existence of the Python 3 I/O model and the io
module in particular.

Just as PEP 302 defines how module importers should be written, PEP
100 defines how text codecs should be written (i.e. in terms of
StreamReader and StreamWriter).

PEP 3116 then defines how such codecs can be used as part of the
overall I/O stack as redesigned for Python 3.

Now, there may be an opportunity here to rationalise things a bit and
re-use the *new* io module interfaces as the basis for an updated
codec API PEP, but we shouldn't be hasty in deprecating an old API
that is about "how to write codecs" just because it is similar to a
shiny new one that is about "how to process I/O data".

Cheers,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia


Re: [Python-Dev] Stable buildbots update

2011-05-24 Thread Nick Coghlan
On Tue, May 24, 2011 at 7:56 PM, Antoine Pitrou wrote:
> Thank you very much! What a beautiful sight this is:
> http://www.python.org/dev/buildbot/all/waterfall?category=3.x.stable
>
> (until a sporadic failure comes up, that is)

I could turn test_crashers back on if you like ;)

Great work to all involved in tidying things up post-merge!

Cheers,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia


Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader

2011-05-24 Thread Walter Dörwald
On 24.05.11 02:08, Victor Stinner wrote:

> [...]
> codecs.open() and StreamReader, StreamWriter and StreamReaderWriter
> classes of the codecs module don't support universal newlines, still
> have some issues with stateful codecs (like UTF-16/32 BOMs), and each
> codec has to implement a StreamReader and a StreamWriter class.
> 
> StreamReader and StreamWriter are stateless codecs (no reset() or
> setstate() method),

They *are* stateful, they just don't expose their state to the public.

> and so it's not possible to write a generic fix for
> all child classes in the codecs module. Each stateful codec has to
> handle special cases like seek() problems.

Yes, which in theory makes it possible to implement shortcuts for
certain codecs (e.g. the UTF-32-BE/LE codecs could simply multiply the
character position by 4 to get the byte position). However AFAICR none
of the readers/writers does that.
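A sketch of the shortcut described here, using the fixed-width UTF-32-BE
encoding (four bytes per character, and the explicit-endianness variant
writes no BOM), so byte offsets are simply 4x character offsets:

```python
# with a fixed-width codec, char position -> byte position is a multiply
data = 'abcdef'.encode('utf-32-be')

char_pos = 3                       # the character 'd'
byte_pos = char_pos * 4            # no BOM, so no extra offset needed
assert data[byte_pos:byte_pos + 4].decode('utf-32-be') == 'd'
```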

> For example, UTF-16 codec
> duplicates some IncrementalEncoder/IncrementalDecoder code into its
> StreamWriter/StreamReader class.

Actually it's the other way round: When I implemented the incremental
codecs, I copied code from the StreamReader/StreamWriter classes.

> The io module is well tested, supports non-seekable streams, handles
> correctly corner-cases (like UTF-16/32 BOMs) and supports any kind of
> newlines including a "universal newline" mode. TextIOWrapper reuses
> incremental encoders and decoders, so BOM issues were fixed only once,
> in TextIOWrapper.
> 
> It's trivial to replace a call to codecs.open() with a call to open(),
> because the two APIs are very close. The main difference is that
> codecs.open() doesn't support universal newlines, so you have to use
> open(..., newline='') to keep the same behaviour (keep newlines
> unchanged). This task can be done by 2to3. But I suppose that most
> people will be happy with the universal newline mode.
> 
> I don't see which use case is not covered by TextIOWrapper. But I know
> some cases which are not supported by StreamReader/StreamWriter.

This could be partially fixed by implementing generic
StreamReader/StreamWriter classes that reuse the incremental codecs, but
I don't think that's worth it.

> [...] 

Servus,
   Walter


Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader

2011-05-24 Thread Antoine Pitrou
On Tue, 24 May 2011 20:25:11 +1000
Nick Coghlan wrote:
> 
> Just as PEP 302 defines how module importers should be written, PEP
> 100 defines how text codecs should be written (i.e. in terms of
> StreamReader and StreamWriter).
> 
> PEP 3116 then defines how such codecs can be used as part of the
> overall I/O stack as redesigned for Python 3.

The I/O stack doesn't use StreamReader and StreamWriter. That's the
whole point. Stream* have been made useless by the new I/O stack.

> Now, there may be an opportunity here to rationalise things a bit and
> re-use the *new* io module interfaces as the basis for an updated
> codec API PEP, but we shouldn't be hasty in deprecating an old API
> that is about "how to write codecs" just because it is similar to a
> shiny new one that is about "how to process I/O data".

OK, can you explain the difference to us, concretely?

Thanks

Antoine.




Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader

2011-05-24 Thread Łukasz Langa

Message from Walter Dörwald, written on 2011-05-24 at 12:16:

>> I don't see which use case is not covered by TextIOWrapper. But I know
>> some cases which are not supported by StreamReader/StreamWriter.
> 
> This could be partially fixed by implementing generic
> StreamReader/StreamWriter classes that reuse the incremental codecs, but
> I don't think that's worth it.

Why not?

-- 
Best regards,
Łukasz Langa
Senior Systems Architecture Engineer

IT Infrastructure Department
Grupa Allegro Sp. z o.o.


Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader

2011-05-24 Thread Victor Stinner
On Tuesday 24 May 2011 at 08:16, Vinay Sajip wrote:
> > I opened an issue for this idea. Brett and Marc-Andre Lemburg don't
> > want to deprecate codecs.open() & friends because they want to be able
> > to write code working on Python 2 and on Python 3 without any change. I
> > don't think it's realistic: nontrivial programs require at least the six
> > module, and most likely the 2to3 program. The six module can have its
> > "codecs.open" function if codecs.open is removed from Python 3.4.
> 
> What's "non-trivial"? Both pip and virtualenv (widely used programs) were 
> ported
> to Python 3 using a single codebase for 2.x and 3.x, because it seemed to
> involve the least ongoing maintenance burden. Though these particular programs
> don't use codecs.open, I don't see much value in making it harder to write
> programs which can run under both 2.x and 3.x; that's not going to speed
> adoption of 3.x.

pip has a pip.backwardcompat module which is very similar to six. If
codecs.open() is deprecated or removed, it will be trivial to add a
wrapper for codecs.open() or open() to six and pip.backwardcompat.
virtualenv.py also starts with a thin compatibility layer.

But yes, each program using a compatibility layer/module will have to be
updated.
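Such a wrapper is tiny. A hedged sketch of what a compatibility layer
could carry (the name open_text is made up here for illustration; it is
not part of six or pip.backwardcompat):

```python
import codecs
import sys

if sys.version_info[0] >= 3:
    def open_text(path, mode='r', encoding=None):
        # keep newlines untranslated, matching codecs.open() behaviour
        return open(path, mode, encoding=encoding, newline='')
else:
    def open_text(path, mode='r', encoding=None):
        return codecs.open(path, mode, encoding)
```

Code on either Python version then calls open_text() and gets the same
newline-preserving behaviour.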

Victor



Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader

2011-05-24 Thread Antoine Pitrou
On Tue, 24 May 2011 12:16:49 +0200
Walter Dörwald wrote:
> 
> > and so it's not possible to write a generic fix for
> > all child classes in the codecs module. Each stateful codec has to
> > handle special cases like seek() problems.
> 
> Yes, which in theory makes it possible to implement shortcuts for
> certain codecs (e.g. the UTF-32-BE/LE codecs could simply multiply the
> character position by 4 to get the byte position). However AFAICR none
> of the readers/writers does that.

And in practice, TextIOWrapper.tell() does a similar optimization in
a generic way. I'm linking to the Python implementation for readability:
http://hg.python.org/cpython/file/5c716437a83a/Lib/_pyio.py#l1741

TextIOWrapper.seek() is straightforward due to the structure of the
integer "cookie" returned by TextIOWrapper.tell().

In practice, TextIOWrapper gets much more love than
Stream{Reader,Writer} because it's an essential part of the new I/O
stack. As Victor said, problems which Stream* have had for years are
solved neatly in TextIOWrapper.

Therefore, leaving Stream{Reader,Writer} in is not a matter of "choice"
and "freedom given to users". It's giving people the misleading
possibility of using non-optimized, poorly debugged, less featureful
implementations of the same basic idea (a Unicode stream abstraction).
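A small sketch of the tell()/seek() cookie mechanism mentioned above (the
cookie value itself is opaque; only its round-trip behaviour is
guaranteed):

```python
import io

buf = io.BytesIO('première\nseconde\n'.encode('utf-8'))
f = io.TextIOWrapper(buf, encoding='utf-8', newline='')
f.readline()
pos = f.tell()     # opaque cookie: byte position plus decoder state
assert f.readline() == 'seconde\n'
f.seek(pos)        # restores the stream and the decoder snapshot together
assert f.readline() == 'seconde\n'
```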

Regards

Antoine.




Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader

2011-05-24 Thread Victor Stinner
On Tuesday 24 May 2011 at 12:42 +0200, Łukasz Langa wrote:
> Message from Walter Dörwald, written on 2011-05-24 at 12:16:
> 
> >> I don't see which use case is not covered by TextIOWrapper. But I know
> >> some cases which are not supported by StreamReader/StreamWriter.
> > 
> > This could be partially fixed by implementing generic
> > StreamReader/StreamWriter classes that reuse the incremental codecs, but
> > I don't think that's worth it.
> 
> Why not?

We already have an implementation of this idea: it is called
io.TextIOWrapper.

Victor



Re: [Python-Dev] CPython optimization: storing reference counters outside of objects

2011-05-24 Thread Maciej Fijalkowski
On Sun, May 22, 2011 at 1:57 AM, Artur Siekielski wrote:
> Hi.
> The problem with reference counters is that they are very often
> incremented/decremented, even for read-only algorithms (like traversal
> of a list). It has two drawbacks:
> 1. CPU cache lines (64 bytes on x86) containing the beginning of a
> PyObject are very often invalidated, resulting in losing many chances
> to use the CPU caches

Not sure exactly what scenario you are discussing here, but storing
reference counts outside of objects has (at least on a single
processor) worse cache locality than storing them inside objects.
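The refcount traffic under discussion can be observed from Python itself; a small sketch using sys.getrefcount:

```python
import sys

# Even "read-only" looking code writes memory: each new reference
# bumps ob_refcnt, dirtying the cache line that holds the object header.
x = []
r1 = sys.getrefcount(x)
refs = [x, x, x]          # three new references to the same object
r2 = sys.getrefcount(x)
assert r2 == r1 + 3       # three refcount writes happened
```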

>
> However the drawback is that such design introduces a new level of
> indirection which is a pointer inside a PyObject instead of a direct
> value. Also it seems that the "block" with refcounts would have to be
> a non-trivial data structure.

That would almost certainly be slower for most use cases, except for
the copy-on-write fork. I guess recycler papers might be an
interesting read:
http://www.research.ibm.com/people/d/dfb/recycler.html

This is the best reference-counting GC I'm aware of.

>
> I'm not a compiler/profiling expert so the main question is if such
> design can work, and maybe someone was thinking about something
> similar? And if CPython was profiled for CPU cache usage?

CPython was not designed for CPU cache usage as far as I'm aware.

From my (heavily biased) point of view, PyPy is a way better platform
to perform such experiments (and PyPy has been profiled for CPU cache
usage). The main advantage is that you can code your GC without the
need to modify the interpreter. On the other hand you obviously don't
get benefits on CPython, but maybe it's worth experimenting.

Cheers,
fijal


Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader

2011-05-24 Thread Walter Dörwald
On 24.05.11 12:58, Victor Stinner wrote:
> On Tuesday 24 May 2011 at 12:42 +0200, Łukasz Langa wrote:
>> Message written by Walter Dörwald on 2011-05-24 at 12:16:
>>
>>>> I don't see which usecase is not covered by TextIOWrapper. But I know
>>>> some cases which are not supported by StreamReader/StreamWriter.
>>>
>>> This could be partially fixed by implementing generic
>>> StreamReader/StreamWriter classes that reuse the incremental codecs, but
>>> I don't think that's worth it.
>>
>> Why not?
> 
> We already have an implementation of this idea: it is called
> io.TextIOWrapper.

Exactly.

From another post by Victor:

> As I wrote, codecs.open() is useful in Python 2. But I don't know any
> program or library using directly StreamReader or StreamWriter.

So: implementing this would be a lot of work, would duplicate existing
functionality, and the result would be mostly unused.

Servus,
   Walter






Re: [Python-Dev] CPython optimization: storing reference counters outside of objects

2011-05-24 Thread Stefan Behnel

Maciej Fijalkowski, 24.05.2011 13:31:

CPython was not designed for CPU cache usage as far as I'm aware.


That's a pretty bold statement to make on this list. Even if it wasn't 
originally "designed" for (efficient?) CPU cache usage, it's certainly been 
around for long enough to have received numerous performance tweaks in that 
regard.


I doubt that efficient CPU cache usage was a major design goal of PyPy 
right from the start. IMHO, the project has changed its objectives way too 
many times to claim something like that, especially at the low level where 
the CPU cache becomes relevant. I remember that not so long ago, PyPy was 
hugely memory hungry compared to CPython. Although, one could certainly 
call *that* "designed for CPU cache usage"... ;)


Stefan



Re: [Python-Dev] CPython optimization: storing reference counters outside of objects

2011-05-24 Thread Sturla Molden

On 24.05.2011 11:55, Artur Siekielski wrote:


PYRO/multiprocessing proxies isn't a comparable solution because of
ORDERS OF MAGNITUDE worse performance. You compare here direct memory
access vs serialization/message passing through sockets/pipes.


The bottleneck is likely the serialization, but only if you serialize 
large objects. IPC is always very fast, at least on localhost.
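A rough, machine-dependent sketch of the serialization cost in question (illustrative only, not the dpaste benchmarks referenced later in the thread):

```python
import pickle
import timeit

# Compare a pickle round-trip of a structure with direct access to it.
# Absolute numbers vary by machine; the point is that serialization
# dominates once the data gets large, not the IPC transport itself.
data = list(range(10_000))
t_ser = timeit.timeit(lambda: pickle.loads(pickle.dumps(data)), number=50)
t_sum = timeit.timeit(lambda: sum(data), number=50)
```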


Just out of curiosity, have you considered using a database? Sqlite and 
BSD DB can even be put in shared memory if you want. It sounds like you 
are trying to solve a database problem using os.fork, something which is 
more or less doomed to fail (i.e. you have to replicate all effort put 
into scaling up databases). If a database is too slow, I am rather sure 
you need something other than Python as well.


Sturla


Re: [Python-Dev] CPython optimization: storing reference counters outside of objects

2011-05-24 Thread Sturla Molden

On 24.05.2011 13:31, Maciej Fijalkowski wrote:


Not sure exactly what scenario you are discussing here, but storing
reference counts outside of objects has (at least on a single
processor) worse cache locality than storing them inside objects.



Artur Siekielski is not talking about cache locality, but copy-on-write 
fork on Linux et al.


When reference counts are updated after forking, memory pages marked 
copy-on-write are copied if they store reference counts. And then he 
quickly runs out of memory. He wants to put reference counts and 
PyObjects in different pages, so only the pages with reference counts 
get copied.
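A toy illustration of the proposed split (not real CPython code): the object holds an index into an external refcount block instead of an inline ob_refcnt, so after fork() only the refcount pages would be copied on write.

```python
# Hypothetical sketch: refcounts live in a separate block, which in the
# real proposal would occupy its own memory pages.
refcnt_block = [0] * 1024

class SplitObject:
    def __init__(self, index):
        self.refcnt_index = index      # indirection replaces ob_refcnt
        refcnt_block[index] = 1

def incref(op):
    refcnt_block[op.refcnt_index] += 1

def decref(op):
    refcnt_block[op.refcnt_index] -= 1

obj = SplitObject(0)
incref(obj)                            # writes refcnt_block, not obj
assert refcnt_block[0] == 2
```

The cost is exactly the extra indirection Artur mentions: every incref/decref now touches two memory locations instead of one.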


I don't think he cares about cache locality at all, but the rest of us 
do :-)



Sturla







Re: [Python-Dev] CPython optimization: storing reference counters outside of objects

2011-05-24 Thread Sturla Molden

On 24.05.2011 11:55, Artur Siekielski wrote:


POSH might be good, but the project is dead for 8 years. And this
copy-on-write is nice because you don't need changes/restrictions to
your code, or a special garbage collector.


Then I have a solution for you, one that is cheaper than anything else 
you are trying to do (taking work hours into account):


BUY MORE RAM!

RAM is damn cheap. You just need more of it. And 64-bit Python :-)


Sturla



Re: [Python-Dev] CPython optimization: storing reference counters outside of objects

2011-05-24 Thread Antoine Pitrou
On Tue, 24 May 2011 14:05:26 +0200
Stefan Behnel  wrote:
> 
> I doubt that efficient CPU cache usage was a major design goal of PyPy 
> right from the start. IMHO, the project has changed its objectives way too 
> many times to claim something like that, especially at the low level where 
> the CPU cache becomes relevant. I remember that not so long ago, PyPy was 
> hugely memory hungry compared to CPython. Although, one could certainly 
> call *that* "designed for CPU cache usage"... ;)

Well, to be honest, "hugely memory hungry" doesn't necessarily mean
cache-averse. It depends on the locality of memory access patterns.

Regards

Antoine.




Re: [Python-Dev] CPython optimization: storing reference counters outside of objects

2011-05-24 Thread Stefan Behnel

Antoine Pitrou, 24.05.2011 14:32:

On Tue, 24 May 2011 14:05:26 +0200Stefan Behnel wrote:


I doubt that efficient CPU cache usage was a major design goal of PyPy
right from the start. IMHO, the project has changed its objectives way too
many times to claim something like that, especially at the low level where
the CPU cache becomes relevant. I remember that not so long ago, PyPy was
hugely memory hungry compared to CPython. Although, one could certainly
call *that* "designed for CPU cache usage"... ;)


Well, to be honest, "hugely memory hungry" doesn't necessarily mean
cache-averse. It depends on the locality of memory access patterns.


Sure. AFAIR (and Maciej is certainly the right person to prove me wrong), 
the problem at the time was that the overall memory footprint of objects 
was too high. That, at least, speaks against efficient cache usage and 
makes it more likely to result in cache thrashing.


In any case, we're talking about a historical problem they already fixed.

Stefan



Re: [Python-Dev] CPython optimization: storing reference counters outside of objects

2011-05-24 Thread Nick Coghlan
On Tue, May 24, 2011 at 10:05 PM, Stefan Behnel  wrote:
> Maciej Fijalkowski, 24.05.2011 13:31:
>>
>> CPython was not designed for CPU cache usage as far as I'm aware.
>
> That's a pretty bold statement to make on this list. Even if it wasn't
> originally "designed" for (efficient?) CPU cache usage, it's certainly been
> around for long enough to have received numerous performance tweaks in that
> regard.

As a statement of Guido's original intent, I'd side with Maciej (Guido
has made it pretty clear that he subscribes to the "first, make it
work, and only worry about making it faster if that first approach
isn't good enough" school of thought). Various *parts* of CPython, on
the other hand, have indeed been optimised over the years to be quite
aware of potential low level CPU and RAM effects (e.g. dicts, sorting,
the small object allocator).

Cheers,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia


Re: [Python-Dev] CPython optimization: storing reference counters outside of objects

2011-05-24 Thread Artur Siekielski
2011/5/24 Sturla Molden :
> On 24.05.2011 11:55, Artur Siekielski wrote:
>>
>> PYRO/multiprocessing proxies isn't a comparable solution because of
>> ORDERS OF MAGNITUDE worse performance. You compare here direct memory
>> access vs serialization/message passing through sockets/pipes.
> The bottleneck is likely the serialization, but only if you serialize large
> objects. IPC is always very fast, at least on localhost .

It cannot be "fast" compared to direct memory access. Here is a
benchmark: summing numbers in a small list in a child process using
multiprocessing "manager": http://dpaste.org/QzKr/ , and using
implicit copy of the structure after fork(): http://dpaste.org/q3eh/.
The first is 200 TIMES SLOWER. It means if the work finishes in 20
seconds using fork(), the same work will require more than one hour
using multiprocessing manager.

> If a database is too slow, I am rather sure you need
> something else than Python as well.

Disk access is about 1000x slower than memory access in C, and Python
in a worst case is 50x slower than C, so there is still a huge win
(not to mention that in a common case Python is only a few times
slower).


Artur


Re: [Python-Dev] CPython optimization: storing reference counters outside of objects

2011-05-24 Thread Terry Reedy

On 5/24/2011 8:25 AM, Sturla Molden wrote:


Artur Siekielski is not talking about cache locality, but copy-on-write
fork on Linux et al.

When reference counts are updated after forking, memory pages marked
copy-on-write are copied if they store reference counts. And then he
quickly runs out of memory. He wants to put reference counts and
PyObjects in different pages, so only the pages with reference counts
get copied.

I don't think he cares about cache locality at all, but the rest of us
do :-)


It seems clear that separating reference counts from objects satisfies a 
specialized need and should be done in a special, patched version of 
CPython rather than in the general distribution.


--
Terry Jan Reedy



Re: [Python-Dev] [Python-checkins] cpython: Issue #12049: Add RAND_bytes() and RAND_pseudo_bytes() functions to the ssl

2011-05-24 Thread Victor Stinner
On Tuesday 24 May 2011 at 11:27 -0400, Terry Reedy wrote:
> >
> > +.. function:: RAND_bytes(num)
> > +
> > +   Returns *num* cryptographically strong pseudo-random bytes.
> > +
> > +   .. versionadded:: 3.3
> > +
> > +.. function:: RAND_pseudo_bytes(num)
> > +
> > +   Returns (bytes, is_cryptographic): bytes are *num* pseudo-random bytes,
> > +   is_cryptographic is True if the bytes generated are cryptographically
> > +   strong.
> > +
> > +   .. versionadded:: 3.3
> 
> I am curious what 'cryptographically strong' means, what the real 
> difference is between the above two functions, and how these do not 
> duplicate what is in random.random.

An important feature of a CPRNG (cryptographic pseudo-random number
generator) is that even if you know all of its output, you cannot
rebuild its internal state to guess the next (or previous) numbers. The
CPRNG can for example hash its output using SHA-1: you will have to
"break" the SHA-1 hash (maybe using "salt").

Another important feature is that even if you know the internal state,
you will not be able to guess all previous and next numbers, because the
internal state is regularly updated using an external source of entropy.
Use RAND_add() to do that explicitly.
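A toy sketch of those two properties (emphatically NOT a real CPRNG): each output is a hash of a hidden state, the state is advanced through a one-way step, and external entropy can be mixed in, roughly in the spirit of RAND_add():

```python
import hashlib
import os

# Toy illustration only. The output never exposes the state directly,
# and the state is hashed forward so past states cannot be recovered.
_state = os.urandom(20)

def rand_bytes():
    global _state
    out = hashlib.sha1(b"out" + _state).digest()
    _state = hashlib.sha1(b"next" + _state).digest()  # one-way step
    return out

def rand_add(entropy):
    global _state
    _state = hashlib.sha1(_state + entropy).digest()  # mix in entropy

a = rand_bytes()
rand_add(b"external entropy")
b = rand_bytes()
assert a != b and len(a) == 20
```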

We may add a link to Wikipedia:
http://en.wikipedia.org/wiki/CPRNG

Read the "Requirements" section, it's maybe more correct than my
explanation:
http://en.wikipedia.org/wiki/CPRNG#Requirements

About the random module, it must not be used to generate passwords or
certificates, because it is easy to rebuild the internal state of a
Mersenne Twister generator if you know the previous 624 numbers. Since
you know the state, it's also easy to generate all the next numbers. Seeding a
Mersenne Twister PRNG doesn't help. See my Hasard project if you would
like to learn more about PRNG ;-)
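This attack can be sketched directly against CPython's random module (a demonstration, not production code): untempering 624 consecutive 32-bit outputs recovers the full internal state, after which a clone predicts the generator exactly.

```python
import random

def untemper(y):
    """Invert the Mersenne Twister tempering transform for one output."""
    y ^= y >> 18
    y ^= (y << 15) & 0xEFC60000
    x = y
    for _ in range(4):                      # invert y ^= (y << 7) & mask
        x = y ^ ((x << 7) & 0x9D2C5680)
    y = x
    x = y
    for _ in range(2):                      # invert y ^= y >> 11
        x = y ^ (x >> 11)
    return x & 0xFFFFFFFF

rng = random.Random(42)
outputs = [rng.getrandbits(32) for _ in range(624)]   # observed output
state = [untemper(o) for o in outputs]                # recovered state

clone = random.Random()
clone.setstate((3, tuple(state) + (624,), None))      # load the state
assert all(clone.getrandbits(32) == rng.getrandbits(32)
           for _ in range(10))                        # perfect prediction
```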

We may also add a link from random to SSL.RAND_bytes() and
SSL.RAND_pseudo_bytes().

https://bitbucket.org/haypo/hasard/

Victor



Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader

2011-05-24 Thread Terry Reedy

On 5/24/2011 6:14 AM, M.-A. Lemburg wrote:


I have no idea why TextIOWrapper was added to the stdlib
instead of making StreamReaderWriter more capable,
since StreamReaderWriter had already been available in Python
since Python 1.6 (and this is being used by codecs.open()).


As I understand it, you (and others) wrote codecs long ago and recently 
other people wrote the new i/o stack, which sometimes uses codecs, and 
when they needed to add a few details, they 'naturally' added them to 
the module they were working on and understood (and planned to rewrite 
in C) rather than to the older module that they maybe did not completely 
understand and which is only in Python.


Then Victor comes along to do maintenance on some of the Asian codecs and 
discovers that he needs to make changes in two (or more?) places rather 
than one, which he naturally finds unsatisfactory.



Perhaps we should deprecate TextIOWrapper instead and
replace it with codecs.StreamReaderWriter ? ;-)


I think we should separate two issues: removing internal implementation 
duplication and removing external api duplication. I should think that 
the former should not be too controversial. The latter, I know, is more 
contentious. One problem is that stdlib changes that perhaps 'should' 
have been made in 3.0/1 could not be discovered until the moratorium and 
greater focus on the stdlib.


--
Terry Jan Reedy



Re: [Python-Dev] Stable buildbots update

2011-05-24 Thread Terry Reedy

On 5/24/2011 6:27 AM, Nick Coghlan wrote:

On Tue, May 24, 2011 at 7:56 PM, Antoine Pitrou  wrote:

Thank you very much! What a beautiful sight this is:
http://www.python.org/dev/buildbot/all/waterfall?category=3.x.stable

(until a sporadic failure comes up, that is)


I could turn test_crashers back on if you like ;)


No need. One xp (but not the other) and win7 turned red again.

--
Terry Jan Reedy



Re: [Python-Dev] CPython optimization: storing reference counters outside of objects

2011-05-24 Thread geremy condra
On Tue, May 24, 2011 at 8:44 AM, Terry Reedy  wrote:
> On 5/24/2011 8:25 AM, Sturla Molden wrote:
>
>> Artur Siekielski is not talking about cache locality, but copy-on-write
>> fork on Linux et al.
>>
>> When reference counts are updated after forking, memory pages marked
>> copy-on-write are copied if they store reference counts. And then he
>> quickly runs out of memory. He wants to put reference counts and
>> PyObjects in different pages, so only the pages with reference counts
>> get copied.
>>
>> I don't think he cares about cache locality at all, but the rest of us
>> do :-)
>
> It seems clear that separating reference counts from objects satisfies a
> specialized need and should be done in a special, patched version of CPython
> rather than the general distribution.

I'm not sure I agree, especially given that the classical answer to
GIL woes has been to tell people to fork() themselves. There has to be
a lot of code out there that would benefit from this.

Geremy Condra


Re: [Python-Dev] cpython: move specialized dir implementations into __dir__ methods (closes #12166)

2011-05-24 Thread Georg Brandl
On 24.05.2011 18:08, benjamin.peterson wrote:
> http://hg.python.org/cpython/rev/8f403199f999
> changeset:   70331:8f403199f999
> user:Benjamin Peterson 
> date:Tue May 24 11:09:06 2011 -0500
> summary:
>   move specialized dir implementations into __dir__ methods (closes #12166)

> +static PyMethodDef module_methods[] = {
> +{"__dir__", module_dir, METH_NOARGS,
> + PyDoc_STR("__dir__() -> specialized dir() implementation")},
> +{0}
> +};

>  static PyMethodDef type_methods[] = {
>  {"mro", (PyCFunction)mro_external, METH_NOARGS,
>   PyDoc_STR("mro() -> list\nreturn a type's method resolution order")},
> @@ -2585,6 +2661,8 @@
>   PyDoc_STR("__instancecheck__() -> check if an object is an instance")},
>  {"__subclasscheck__", type___subclasscheck__, METH_O,
>   PyDoc_STR("__subclasscheck__() -> check if a class is a subclass")},
> +{"__dir__", type_dir, METH_NOARGS,
> + PyDoc_STR("__dir__() -> specialized __dir__ implementation for types")},

>  static PyMethodDef object_methods[] = {
>  {"__reduce_ex__", object_reduce_ex, METH_VARARGS,
>   PyDoc_STR("helper for pickle")},
> @@ -3449,6 +3574,8 @@
>   PyDoc_STR("default object formatter")},
>  {"__sizeof__", object_sizeof, METH_NOARGS,
>   PyDoc_STR("__sizeof__() -> size of object in memory, in bytes")},
> +{"__dir__", object_dir, METH_NOARGS,
> + PyDoc_STR("__dir__() -> default dir() implementation")},

This is interesting: I thought we used "->" to specify the return value (or
its type).  __instancecheck__ and __subclasscheck__ set a different
precedent, while __sizeof__ follows it.

I didn't look at the files to check for other examples.

Georg



Re: [Python-Dev] cpython: move specialized dir implementations into __dir__ methods (closes #12166)

2011-05-24 Thread Benjamin Peterson
2011/5/24 Georg Brandl :
> On 24.05.2011 18:08, benjamin.peterson wrote:
>> http://hg.python.org/cpython/rev/8f403199f999
>> changeset:   70331:8f403199f999
>> user:        Benjamin Peterson 
>> date:        Tue May 24 11:09:06 2011 -0500
>> summary:
>>   move specialized dir implementations into __dir__ methods (closes #12166)
>
>> +static PyMethodDef module_methods[] = {
>> +    {"__dir__", module_dir, METH_NOARGS,
>> +     PyDoc_STR("__dir__() -> specialized dir() implementation")},
>> +    {0}
>> +};
>
>>  static PyMethodDef type_methods[] = {
>>      {"mro", (PyCFunction)mro_external, METH_NOARGS,
>>       PyDoc_STR("mro() -> list\nreturn a type's method resolution order")},
>> @@ -2585,6 +2661,8 @@
>>       PyDoc_STR("__instancecheck__() -> check if an object is an instance")},
>>      {"__subclasscheck__", type___subclasscheck__, METH_O,
>>       PyDoc_STR("__subclasscheck__() -> check if a class is a subclass")},
>> +    {"__dir__", type_dir, METH_NOARGS,
>> +     PyDoc_STR("__dir__() -> specialized __dir__ implementation for 
>> types")},
>
>>  static PyMethodDef object_methods[] = {
>>      {"__reduce_ex__", object_reduce_ex, METH_VARARGS,
>>       PyDoc_STR("helper for pickle")},
>> @@ -3449,6 +3574,8 @@
>>       PyDoc_STR("default object formatter")},
>>      {"__sizeof__", object_sizeof, METH_NOARGS,
>>       PyDoc_STR("__sizeof__() -> size of object in memory, in bytes")},
>> +    {"__dir__", object_dir, METH_NOARGS,
>> +     PyDoc_STR("__dir__() -> default dir() implementation")},
>
> This is interesting: I thought we used "->" to specify the return value (or
> its type).  __instancecheck__ and __subclasscheck__ set a different
> precedent, while __sizeof__ follows.

Yes, I was wondering about that, so I just picked one. :) "->" seems
to be better for return values, though, given the resemblance to
annotations.


-- 
Regards,
Benjamin


Re: [Python-Dev] CPython optimization: storing reference counters outside of objects

2011-05-24 Thread Cesare Di Mauro
2011/5/24 Stefan Behnel 

> Maciej Fijalkowski, 24.05.2011 13:31:
>
>  CPython was not designed for CPU cache usage as far as I'm aware.
>>
>
>  That's a pretty bold statement to make on this list. Even if it wasn't
> originally "designed" for (efficient?) CPU cache usage, it's certainly been
> around for long enough to have received numerous performance tweaks in that
> regard.
>
> Stefan


Maybe a change on memory allocation granularity can help here.

Raising it to 16 and 32 bytes for 32- and 64-bit systems respectively
guarantees that an access to ob_refcnt and/or ob_type will put on the cache
line some other information for the same object, which is usually required
by itself (except for very simple ones, such as PyNone, PyEllipsis, etc.).

Think about a long, a tuple, a list, a dictionary, etc.: all of them have
some critical data after these fields, that most likely will be accessed
after INCRef or type checking.

Regards,
Cesare


Re: [Python-Dev] Stable buildbots update

2011-05-24 Thread Bill Janssen
Ned Deily  wrote:

> In article <[email protected]>,
>  "Stephen J. Turnbull"  wrote:
> > Are you saying you expect Mac OS X 10.4 "Tiger" to go green once the
> > bots update?  If so, I'm impressed, and "thank you!" to all involved.
> > Apple and MacPorts have long since washed their hands of that release.
> 
> OS X 10.4 does have its quirks that make it challenging to get all of 
> the tests to run without a few cornercase failures but, besides the 
> buildbots, I still test regularly with 10.4 and occasionally build 
> there, too.  And, FWIW, while top-of-trunk MacPorts may not officially 
> support 10.4, many ports work there just fine including python2.6, 2.7, 
> and 3.1.  (3.2 has a build issue that may get fixed in 3.2.1).

Perhaps more importantly, parc-leopard-1 and parc-tiger-1 are two of the
very few usually-connected buildbots we have running on big-endian
architectures, along with loewis-sun (I *think* Solaris-10 on SPARC is
still big-endian).

Bill


Re: [Python-Dev] [Python-checkins] cpython: Issue #12049: Add RAND_bytes() and RAND_pseudo_bytes() functions to the ssl

2011-05-24 Thread Terry Reedy

On 5/24/2011 12:06 PM, Victor Stinner wrote:

On Tuesday 24 May 2011 at 11:27 -0400, Terry Reedy wrote:


+.. function:: RAND_bytes(num)
+
+   Returns *num* cryptographically strong pseudo-random bytes.
+
+   .. versionadded:: 3.3
+
+.. function:: RAND_pseudo_bytes(num)
+
+   Returns (bytes, is_cryptographic): bytes are *num* pseudo-random bytes,
+   is_cryptographic is True if the bytes generated are cryptographically
+   strong.
+
+   .. versionadded:: 3.3


I am curious what 'cryptographically strong' means, what the real
difference is between the above two functions, and how these do not
duplicate what is in random.random.


An important feature of a CPRNG (cryptographic pseudo-random number
generator) is that even if you know all of its output, you cannot
rebuild its internal state to guess the next (or previous) numbers. The
CPRNG can for example hash its output using SHA-1: you will have to
"break" the SHA-1 hash (maybe using "salt").


So it is presumably slower. I still do not get RAND_pseudo_bytes, which 
somehow decides internally what to do.



 Another important feature is that even if you know the internal state,
you will not be able to guess all previous and next numbers, because the
internal state is regularly updated using an external source of entropy.
Use RAND_add() to do that explicitly.

We may add a link to Wikipedia:
http://en.wikipedia.org/wiki/CPRNG


That would be helpful


Read the "Requirements" section, it's maybe more correct than my
explanation:
http://en.wikipedia.org/wiki/CPRNG#Requirements

About the random module, it must not be used to generate passwords or
certificates, because it is easy to rebuild the internal state of a
Mersenne Twister generator if you know the previous 624 numbers. Since
you know the state, it's also easy to generate all the next numbers. Seeding a
Mersenne Twister PRNG doesn't help. See my Hasard project if you would
like to learn more about PRNG ;-)

We may also add a link from random to SSL.RAND_bytes() and
SSL.RAND_pseudo_bytes().


--
Terry Jan Reedy




Re: [Python-Dev] CPython optimization: storing reference counters outside of objects

2011-05-24 Thread Sturla Molden

On 24.05.2011 17:39, Artur Siekielski wrote:


Disk access is about 1000x slower than memory access in C, and Python
in a worst case is 50x slower than C, so there is still a huge win
(not to mention that in a common case Python is only a few times
slower).


You can put databases in shared memory (e.g. Sqlite and BSD DB have options
for this). On Linux you can also mount /dev/shm as a ramdisk. Also, why
assume that the database developers at Oracle et al. have not done
sufficient optimization?
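For example, SQLite's shared-cache mode keeps a database entirely in memory while letting several connections see it (a sketch using the stdlib sqlite3 module; the database name "shm_demo" is arbitrary):

```python
import sqlite3

# Two connections to the same in-memory database within one process,
# via SQLite's shared-cache URI form.
uri = "file:shm_demo?mode=memory&cache=shared"
db1 = sqlite3.connect(uri, uri=True)
db1.execute("CREATE TABLE t (x INTEGER)")
db1.execute("INSERT INTO t VALUES (42)")
db1.commit()

db2 = sqlite3.connect(uri, uri=True)   # sees db1's data
assert db2.execute("SELECT x FROM t").fetchone()[0] == 42
```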

Sturla




Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader

2011-05-24 Thread Martin (gzlist)
On 24/05/2011, Victor Stinner  wrote:
>
> In Python 2, codecs.open() is the best way to read and/or write files
> using Unicode. But in Python 3, open() is preferred with its fast io
> module. I would like to deprecate codecs.open() because it can be
> replaced by open() and io.TextIOWrapper. I would like your opinion and
> that's why I'm writing this email.

There are some modules that try to stay compatible with Python 2 and 3
without a source translation step. Removing the codecs classes would
mean they'd have to add a few more compatibility hacks, but could be
done.

As an aside, I'm still not sure how the io module should be used.
Example, a simple task I've used StreamWriter classes for is to wrap
stdout. If the stdout.encoding can't represent a character, using
"replace" means you can write any unicode string without throwing a
UnicodeEncodeError.

With the io module, it seems you need to construct a new TextIOWrapper
object, passing the attributes of the old one as parameters, and as
soon as someone passes something that's not a TextIOWrapper (say, a
StringIO object) your code breaks. Is the intention that code dealing
with streams needs to be covered in isinstance checks in Python 3?
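For the stdout-wrapping case Martin describes, a sketch of the io-module equivalent of a StreamWriter with errors="replace" (shown over a BytesIO rather than a real stdout):

```python
import io

# Rebuild a text layer with the desired error handler; characters the
# encoding cannot represent are replaced instead of raising
# UnicodeEncodeError.
buf = io.BytesIO()
out = io.TextIOWrapper(buf, encoding="ascii", errors="replace", newline="")
out.write("café\n")
out.flush()
assert buf.getvalue() == b"caf?\n"
```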

Martin


[Python-Dev] [pyodbc] Setting values to SQL_* constants while creating a connection object

2011-05-24 Thread srinivasan munisamy
Hi,
I would like to know how to set values for SQL_* constants while
creating a db connection through the pyodbc module.
For example, i am getting a connection object like below:

In [27]: dbh1 =
pyodbc.connect("DSN=;UID=;PWD=;DATABASE=;APP=")

In [28]: dbh1.getinfo(pyodbc.SQL_DESCRIBE_PARAMETER)

Out[28]: True

I want to set this SQL_DESCRIBE_PARAMETER to false for this connection
object. How could I do that?
Please help me in figuring it out.

Thanks,
Srini


Re: [Python-Dev] [pyodbc] Setting values to SQL_* constants while creating a connection object

2011-05-24 Thread Terry Reedy

On 5/24/2011 5:09 PM, srinivasan munisamy wrote:

Hi,
I would like to know how to set values for SQL_* constants


Please direct Python usage questions to python-list or other user 
discussion forums. Python-Dev is for discussion of development of the next 
versions of Python.

--
Terry Jan Reedy



Re: [Python-Dev] [Python-checkins] cpython: Issue #12049: Add RAND_bytes() and RAND_pseudo_bytes() functions to the ssl

2011-05-24 Thread Nick Coghlan
On Wed, May 25, 2011 at 3:52 AM, Terry Reedy  wrote:
> On 5/24/2011 12:06 PM, Victor Stinner wrote:
>> An important feature of a CPRNG (cryptographic pseudo-random number
>> generator) is that even if you know all of its output, you cannot
>> rebuild its internal state to guess next (or maybe previous number). The
>> CPRNG can for example hash its output using SHA-1: you will have to
>> "break" the SHA-1 hash (maybe using "salt").
>
> So it is presumably slower. I still do not get RAND_pseudo_bytes, which
> somehow decides internally what to do.

The more important feature here is that it is exposing *OpenSSL's*
random number generation, rather than our own. A CPRNG isn't
*necessarily* slower than a non-crypto one (particularly on systems
with dedicated crypto hardware), but they can definitely fail to
return data if there isn't enough entropy available in the pool (and
the system has to have a usable entropy source in the first place).

The RAND_bytes() documentation should probably make it clearer that
unlike the random module and RAND_pseudo_bytes(), RAND_bytes() can
*fail* (by raising SSLError) if it isn't in a position to provide the
requested random data.

The pseudo_bytes version just encapsulates a fallback technique that
may be suitable in some circumstances: if crypto quality random data
is not available, fall back on PRNG data instead of failing. It is
most suitable for tasks like prototyping an algorithm in Python for
later conversion to C, or similar tasks where it is desirable to use
the OpenSSL PRNG over the one in the random module.
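
A minimal sketch of that distinction (the wrapper function below is
hypothetical, not part of the ssl module):

```python
import ssl

def get_random_bytes(num):
    # Hypothetical helper: prefer crypto-strong bytes, and only fall
    # back to the OpenSSL PRNG if the entropy pool cannot satisfy us.
    try:
        # ssl.RAND_bytes() raises ssl.SSLError when the PRNG has not
        # been seeded with enough entropy to be unpredictable.
        return ssl.RAND_bytes(num)
    except ssl.SSLError:
        # ssl.RAND_pseudo_bytes() returns (bytes, is_cryptographic)
        # and always succeeds; guard with hasattr since not every
        # build exposes it.
        if hasattr(ssl, "RAND_pseudo_bytes"):
            data, is_cryptographic = ssl.RAND_pseudo_bytes(num)
            return data
        raise

print(len(get_random_bytes(16)))  # 16
```

Whether silently degrading to PRNG data is acceptable depends entirely
on what the bytes are used for, which is why the stdlib exposes the two
primitives separately instead of baking in a fallback.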

Cheers,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia


Re: [Python-Dev] [Python-checkins] Daily reference leaks (234021dcad93): sum=61

2011-05-24 Thread Nick Coghlan
On Wed, May 25, 2011 at 1:09 PM,   wrote:
> results for 234021dcad93 on branch "default"
> 
>
> test_packaging leaked [128, 128, 128] references, sum=384

Is there a new cache in packaging that regrtest needs to know about
and either ignore or clear when checking reference counts?

Cheers,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia


Re: [Python-Dev] [Python-checkins] cpython: Issue #12049: Add RAND_bytes() and RAND_pseudo_bytes() functions to the ssl

2011-05-24 Thread Petri Lehtinen
Terry Reedy wrote:
> On 5/24/2011 12:06 PM, Victor Stinner wrote:
> >Le mardi 24 mai 2011 à 11:27 -0400, Terry Reedy a écrit :
> >>>
> >>>+.. function:: RAND_bytes(num)
> >>>+
> >>>+   Returns *num* cryptographically strong pseudo-random bytes.
> >>>+
> >>>+   .. versionadded:: 3.3
> >>>+
> >>>+.. function:: RAND_pseudo_bytes(num)
> >>>+
> >>>+   Returns (bytes, is_cryptographic): bytes are *num* pseudo-random bytes,
> >>>+   is_cryptographic is True if the bytes generated are cryptographically
> >>>+   strong.
> >>>+
> >>>+   .. versionadded:: 3.3
> >>
> >>I am curious what 'cryptographically strong' means, what the real
> >>difference is between the above two functions, and how these do not
> >>duplicate what is in random.random.
> >
> >An important feature of a CPRNG (cryptographic pseudo-random number
> >generator) is that even if you know all of its output, you cannot
> >rebuild its internal state to guess next (or maybe previous number). The
> >CPRNG can for example hash its output using SHA-1: you will have to
> >"break" the SHA-1 hash (maybe using "salt").
> 
> So it is presumably slower. I still do not get RAND_pseudo_bytes,
> which somehow decides internally what to do.

According to the RAND_bytes manual page from OpenSSL:

RAND_bytes() puts num cryptographically strong pseudo-random
bytes into buf. An error occurs if the PRNG has not been seeded
with enough randomness to ensure an unpredictable byte
sequence.

RAND_pseudo_bytes() puts num pseudo-random bytes into buf.
Pseudo-random byte sequences generated by RAND_pseudo_bytes() will
be unique if they are of sufficient length, but are not
necessarily unpredictable. They can be used for non-cryptographic
purposes and for certain purposes in cryptographic protocols, but
usually not for key generation etc.

And:

RAND_bytes() returns 1 on success, 0 otherwise. The error code can
be obtained by ERR_get_error(3). RAND_pseudo_bytes() returns 1 if
the bytes generated are cryptographically strong, 0 otherwise.
Both functions return -1 if they are not supported by the current
RAND method.

So it seems to me that RAND_bytes() either returns cryptographically
strong data or fails (is it possible to detect the failure with the
Python function? Should this be documented?). RAND_pseudo_bytes()
always succeeds but does not necessarily generate cryptographically
strong data.

> 
> > Another important feature is that even if you know the internal state,
> >you will not be able to guess all previous and next numbers, because the
> >internal state is regularly updated using an external source of entropy.
> >Use RAND_add() to do that explicitly.
> >
> >We may add a link to Wikipedia:
> >http://en.wikipedia.org/wiki/CPRNG
> 
> That would be helpful.
> >
> >Read the "Requirements" section, it's maybe more correct than my
> >explanation:
> >http://en.wikipedia.org/wiki/CPRNG#Requirements
> >
> >About the random module, it must not be used to generate passwords or
> >certificates, because it is easy to rebuild the internal state of a
> >Mersenne Twister generator if you know the previous 624 numbers. Since
> >you know the state, it's also easy to generate all subsequent numbers.
> >Seeding a Mersenne Twister PRNG doesn't help. See my Hasard project if
> >you would like to learn more about PRNGs ;-)
> >
> >We may also add a link from random to SSL.RAND_bytes() and
> >SSL.RAND_pseudo_bytes().

Obviously, the user needs to be familiar with the concept of
"cryptographically strong randomness" to use these functions.
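
As a small illustration of the Mersenne Twister point quoted above (a
sketch, not tied to the patch under discussion): the random module is
fully deterministic given its internal state, so two identically seeded
generators emit identical streams.

```python
import random

# Two Mersenne Twister generators seeded identically produce the
# exact same stream -- fine for simulations, unusable for secrets.
a = random.Random(12345)
b = random.Random(12345)
stream_a = [a.random() for _ in range(5)]
stream_b = [b.random() for _ in range(5)]
print(stream_a == stream_b)  # True
```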

Petri Lehtinen