Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader
On Tuesday, 24 May 2011 at 15:24 +1000, Nick Coghlan wrote:
> On Tue, May 24, 2011 at 10:08 AM, Victor Stinner wrote:
> > It's trivial to replace a call to codecs.open() by a call to open(),
> > because the two APIs are very close. The main difference is that
> > codecs.open() doesn't support universal newlines, so you have to use
> > open(..., newline='') to keep the same behaviour (keep newlines
> > unchanged). This task can be done by 2to3. But I suppose that most
> > people will be happy with the universal newline mode.
>
> Is there any reason that codecs.open() can't become a thin wrapper
> around builtin open in 3.3?

Yes, it's trivial to implement codecs.open using:

    def open(filename, mode='rb', encoding=None, errors='strict', buffering=1):
        return builtins.open(filename, mode, buffering, encoding, errors, newline='')

But do we really need two ways to open a file? Extract of import this: "There should be one-- and preferably only one --obvious way to do it."

Another example: Python 3.2 has subprocess.Popen, os.popen and platform.popen to open a subprocess. platform.popen is now deprecated in Python 3.3. Well, it's already better than Python 2.5, which has os.popen(), os.popen2(), os.popen3(), os.popen4(), os.spawnl(), os.spawnle(), os.spawnlp(), os.spawnlpe(), os.spawnv(), os.spawnve(), os.spawnvp(), os.spawnvpe(), subprocess.Popen, platform.popen and maybe others :-)

> How API compatible is TextIOWrapper with StreamReader/StreamWriter?

It's fully compatible.

> How hard would it be to change them to be adapters over the main IO
> machinery rather than independent classes?

I don't understand your proposition. We don't need StreamReader and StreamWriter to open a stream as a text file, only incremental decoders and encoders. Why do you want to keep them?

Victor

___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe:
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
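[Editorial note: the quoted newline difference is easy to check directly. A minimal sketch for Python 3, using a throwaway temporary file; the file contents are invented for illustration:]

```python
import codecs
import os
import tempfile

# Write a file with Windows-style line endings.
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, 'wb') as f:
    f.write(b'a\r\nb\r\n')

# codecs.open() keeps newlines unchanged...
with codecs.open(path, 'r', encoding='ascii') as f:
    assert f.read() == 'a\r\nb\r\n'

# ...which matches the built-in open() only with newline=''.
with open(path, encoding='ascii', newline='') as f:
    assert f.read() == 'a\r\nb\r\n'

# The default open() applies universal newline translation instead.
with open(path, encoding='ascii') as f:
    assert f.read() == 'a\nb\n'

os.remove(path)
print('ok')
```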
Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader
Victor Stinner wrote:
> Hi,
>
> In Python 2, codecs.open() is the best way to read and/or write files
> using Unicode. But in Python 3, open() is preferred with its fast io
> module. I would like to deprecate codecs.open() because it can be
> replaced by open() and io.TextIOWrapper. I would like your opinion and
> that's why I'm writing this email.

I think you should have moved this part of your email further up, since it explains the reason why this idea was rejected for now:

> I opened an issue for this idea. Brett and Marc-Andre Lemburg don't
> want to deprecate codecs.open() & friends because they want to be able
> to write code working on Python 2 and on Python 3 without any change. I
> don't think it's realistic: nontrivial programs require at least the six
> module, and most likely the 2to3 program. The six module can have its
> "codecs.open" function if codecs.open is removed from Python 3.4.

And now for something completely different:

> codecs.open() and the StreamReader, StreamWriter and StreamReaderWriter
> classes of the codecs module don't support universal newlines, still
> have some issues with stateful codecs (like UTF-16/32 BOMs), and each
> codec has to implement a StreamReader and a StreamWriter class.
>
> StreamReader and StreamWriter are stateless codecs (no reset() or
> setstate() method), and so it's not possible to write a generic fix for
> all child classes in the codecs module. Each stateful codec has to
> handle special cases like seek() problems. For example, the UTF-16 codec
> duplicates some IncrementalEncoder/IncrementalDecoder code into its
> StreamWriter/StreamReader class.

Please read PEP 100 regarding StreamReader and StreamWriter. Those codec parts were explicitly designed to be stateful, unlike the stateless encoder/decoder methods.

Please read my reply on the ticket:

"""
StreamReader and StreamWriter classes provide the base codec implementations for stateful interaction with streams. They define the interface and provide a working implementation for those codecs that choose not to implement their own variants.

Each codec can, however, implement variants which are optimized for the specific encoding or intercept certain stream methods to add functionality or improve the encoding/decoding performance.

Both are essential parts of the codec interface. TextIOWrapper and StreamReaderWriter are merely wrappers around streams that make use of the codecs. They don't provide any codec logic themselves. That's the conceptual difference.
"""

> The io module is well tested, supports non-seekable streams, handles
> corner cases correctly (like UTF-16/32 BOMs) and supports any kind of
> newlines including a "universal newline" mode. TextIOWrapper reuses
> incremental encoders and decoders, so BOM issues were fixed only once,
> in TextIOWrapper.
>
> It's trivial to replace a call to codecs.open() by a call to open(),
> because the two APIs are very close. The main difference is that
> codecs.open() doesn't support universal newlines, so you have to use
> open(..., newline='') to keep the same behaviour (keep newlines
> unchanged). This task can be done by 2to3. But I suppose that most
> people will be happy with the universal newline mode.
>
> I don't see which use case is not covered by TextIOWrapper. But I know
> some cases which are not supported by StreamReader/StreamWriter.

This is a misunderstanding of the concepts behind the two. StreamReaders and StreamWriters are implemented by the codecs; they are part of the API that each codec has to provide in order to register in the Python codecs system. Their purpose is to provide a stateful interface and to work efficiently and directly on streams rather than buffers.

Here's my reply from the ticket regarding using incremental encoders/decoders for the StreamReader/Writer parts of the codec set of APIs:

"""
The point about having them use incremental codecs for encoding and decoding is a good one and would need to be investigated. If possible, we could use incremental encoders/decoders for the standard StreamReader/Writer base classes or add new IncrementalStreamReader/Writer classes which then use the IncrementalEncoder/Decoder per default. Please open a new ticket for this.
"""

> StreamReader, StreamWriter, StreamReaderWriter and EncodedFile are not
> used in the Python 3 standard library. I tried removing them: except for
> the tests in test_codecs which test them directly, the full test suite passes.
>
> Read the issue for more information: http://bugs.python.org/issue8796

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, May 24 2011)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/

2011-06-20: EuroPython 2011, Florence, Italy               27 days to go
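[Editorial note: the point that the Stream* classes are part of each codec's registered interface can be observed via the codec registry itself. A small sketch (Python 3):]

```python
import codecs
import io

# Every registered codec carries its own Stream* classes;
# codecs.lookup() returns them as part of the CodecInfo record.
info = codecs.lookup('utf-16')
assert issubclass(info.streamwriter, codecs.StreamWriter)
assert issubclass(info.streamreader, codecs.StreamReader)

# The UTF-16 StreamWriter is a specialized subclass: it emits the BOM
# on the first write, then switches to a plain LE/BE encoder.
buf = io.BytesIO()
writer = info.streamwriter(buf)
writer.write('hi')
assert buf.getvalue().startswith(codecs.BOM_UTF16)
print('ok')
```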
Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader
Victor Stinner <... <at> haypocalc.com> writes:

> I opened an issue for this idea. Brett and Marc-Andre Lemburg don't
> want to deprecate codecs.open() & friends because they want to be able
> to write code working on Python 2 and on Python 3 without any change. I
> don't think it's realistic: nontrivial programs require at least the six
> module, and most likely the 2to3 program. The six module can have its
> "codecs.open" function if codecs.open is removed from Python 3.4.

What's "non-trivial"? Both pip and virtualenv (widely used programs) were ported to Python 3 using a single codebase for 2.x and 3.x, because it seemed to involve the least ongoing maintenance burden. Though these particular programs don't use codecs.open, I don't see much value in making it harder to write programs which can run under both 2.x and 3.x; that's not going to speed adoption of 3.x.

I find 2to3 very useful indeed for showing where changes may need to be made for 2.x/3.x portability, but I do not use it as an automatic conversion tool. The six module is very useful, too, but some projects won't necessarily want to add it as an additional dependency, and will reimplement just the parts they need from that bag of tricks.

So I would also want to keep codecs.open() and friends, at least for now - though it seems to make sense to implement them as wrappers (as Nick suggested).

Regards,

Vinay Sajip
Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader
On Tuesday, 24 May 2011 at 08:16 +0000, Vinay Sajip wrote:
> So I would also want to keep codecs.open() and friends, at least for now

Well, I would agree to keep codecs.open() (if we patch it to reuse TextIOWrapper and add a note to say that it is kept for backward compatibility and that open() should be preferred in Python 3), but deprecate StreamReader, StreamWriter and EncodedFile.

As I wrote, codecs.open() is useful in Python 2. But I don't know any program or library using StreamReader or StreamWriter directly. I found some projects (e.g. twisted-mail, feeds2imap, pyflag, pygsm, ...) implementing their own Python codec (cool!), and their codec has its own StreamReader and StreamWriter class, but I don't think that these classes are used.

Victor
Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader
On Tuesday, 24 May 2011 at 10:03 +0200, M.-A. Lemburg wrote:
> Please read PEP 100 regarding StreamReader and StreamWriter.
> Those codecs parts were explicitly designed to be stateful,
> unlike the stateless encoder/decoder methods.

Yes, it is possible to implement stateful StreamReader and StreamWriter classes, and we have such codecs (I gave the example of UTF-16), but the state is not exposed (getstate / setstate), and so it's not possible to write generic code to handle the codec state in the base StreamReader and StreamWriter classes. io.TextIOWrapper requires encoder.setstate(0), for example.

> Each codec can, however, implement variants which are optimized
> for the specific encoding or intercept certain stream methods
> to add functionality or improve the encoding/decoding
> performance.

Can you give me some examples?

> TextIOWrapper and StreamReaderWriter are merely wrappers
> around streams that make use of the codecs. They don't
> provide any codec logic themselves. That's the conceptual
> difference.
> ...
> StreamReader and StreamWriters ... work efficiently and
> directly on streams rather than buffers.

StreamReader, StreamWriter, TextIOWrapper and StreamReaderWriter all have a file-like API: tell(), seek(), read(), readline(), write(), etc. The implementation is maybe different, but the API is just the same, and so the use cases are just the same.

I don't see in which case I should use StreamReader or StreamWriter instead of TextIOWrapper. I thought that TextIOWrapper was specific to files on disk, but TextIOWrapper is already used for other purposes, like sockets.

> Here's my reply from the ticket regarding using incremental
> encoders/decoders for the StreamReader/Writer parts of the
> codec set of APIs:
>
> """
> The point about having them use incremental codecs for encoding and
> decoding is a good one and would need to be investigated.
> If possible, we could use incremental encoders/decoders for the standard
> StreamReader/Writer base classes or add new
> IncrementalStreamReader/Writer classes which then use
> the IncrementalEncoder/Decoder per default.

Why do you want to write a duplicate feature? TextIOWrapper is already here, it's working and widely used.

I am working on codec issues (like CJK encodings, see #12100, #12057, #12016) and I would like to remove StreamReader and StreamWriter to have *less* code to maintain. If you want to add more code, will you be available to maintain it? It looks like you are busy; some people (not me ;-)) are still waiting for .transform()/.untransform()!

Victor
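[Editorial note: the setstate(0) requirement Victor mentions can be demonstrated against the stdlib UTF-16 codec. A small sketch (Python 3); it relies on CPython's utf_16 implementation, where state 0 means "past the start of the stream":]

```python
import codecs

# The incremental UTF-16 encoder exposes its BOM state via
# getstate()/setstate(), which is what lets TextIOWrapper manage it
# generically after a seek().
enc = codecs.getincrementalencoder('utf-16')()
assert enc.encode('a').startswith(codecs.BOM_UTF16)   # first write: BOM

enc.setstate(0)   # "not at the start of the stream": suppress the BOM
assert not enc.encode('b').startswith(codecs.BOM_UTF16)

# The StreamWriter API defines no such hooks, so generic code cannot
# save or restore a writer's codec state.
assert not hasattr(codecs.StreamWriter, 'setstate')
assert not hasattr(codecs.StreamWriter, 'getstate')
print('ok')
```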
Re: [Python-Dev] Stable buildbots update
On Mon, 23 May 2011 19:16:36 +0200
Tarek Ziadé wrote:
>
> I have now completed the cleanup and we're back on green-land for the
> stable bots.
>
> The red slaves should get green when they catch up with the latest rev
> (they are slow). If they're not and they are failing in packaging or
> sysconfig let me know.
>
> Sorry again if it has taken so long. Setting up Solaris and BSD VMs
> took some time ;)

Thank you very much! What a beautiful sight this is:
http://www.python.org/dev/buildbot/all/waterfall?category=3.x.stable

(until a sporadic failure comes up, that is)

Regards

Antoine.
Re: [Python-Dev] CPython optimization: storing reference counters outside of objects
2011/5/24 Sturla Molden :
>> Oh, and using explicit shared memory or mmap is much harder, because
>> you have to map the whole object graph into bytes.
>
> It sounds like you need PYRO, POSH or multiprocessing's proxy objects.

PYRO/multiprocessing proxies aren't a comparable solution because of ORDERS OF MAGNITUDE worse performance. You are comparing direct memory access vs serialization/message passing through sockets/pipes here. POSH might be good, but the project has been dead for 8 years.

And this copy-on-write is nice because you don't need changes/restrictions to your code, or a special garbage collector.

Artur
[Python-Dev] "streams" vs "buffers"
On Tue, 24 May 2011 10:03:22 +0200
"M.-A. Lemburg" wrote:
>
> StreamReader and StreamWriters are implemented by the codecs,
> they are part of the API that each codec has to provide in order
> to register in the Python codecs system. Their purpose is
> to provide a stateful interface and work efficiently and
> directly on streams rather than buffers.

I think you are trying to make a conceptual distinction which doesn't exist in practice. Your OS uses buffers to represent "streams" to you.

Also, how come StreamReader has internal members named "bytebuffer", "charbuffer" and "linebuffer"? There certainly seems to be some (non-trivial) amount of buffering going on there, and probably quite slow and inefficient since it's pure Python (TextIOWrapper is written in C).

Regards

Antoine.
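[Editorial note: the buffering Antoine points at is observable from the outside. A quick sketch (Python 3); the attribute names are the ones in `Lib/codecs.py`, and the `read(size, chars)` calling convention is the StreamReader one, not the io one:]

```python
import codecs
import io

raw = io.BytesIO('héllo\nworld\n'.encode('utf-8'))
reader = codecs.getreader('utf-8')(raw)

# Ask for a single character while letting the reader pull up to 100
# bytes from the stream: the surplus decoded text lands in the
# pure-Python "charbuffer".
assert reader.read(100, chars=1) == 'h'
assert reader.charbuffer.startswith('é')
print('ok')
```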
Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader
Victor Stinner wrote:
> On Tuesday, 24 May 2011 at 10:03 +0200, M.-A. Lemburg wrote:
>> Please read PEP 100 regarding StreamReader and StreamWriter.
>> Those codecs parts were explicitly designed to be stateful,
>> unlike the stateless encoder/decoder methods.
>
> Yes, it is possible to implement stateful StreamReader and StreamWriter
> classes, and we have such codecs (I gave the example of UTF-16), but the
> state is not exposed (getstate / setstate), and so it's not possible to
> write generic code to handle the codec state in the base StreamReader
> and StreamWriter classes. io.TextIOWrapper requires encoder.setstate(0),
> for example.

So instead of always suggesting to deprecate everything, how about you come up with a proposal to add meaningful new methods to those base classes?

>> Each codec can, however, implement variants which are optimized
>> for the specific encoding or intercept certain stream methods
>> to add functionality or improve the encoding/decoding
>> performance.
>
> Can you give me some examples?

See the UTF-16 codec in the stdlib for example. This uses some of the available possibilities to interpret the BOM mark and then switches the encoder/decoder methods accordingly.

A lot more could be done for other variable-length encoding codecs, e.g. UTF-8, since these often have problems near the end of a read due to missing bytes. The base class provides a general-purpose implementation to cover that case, but it's not efficient, since it doesn't know anything about the encoding characteristics. Such an implementation would have to be done per codec, and that's why we have per-codec StreamReader/Writer APIs.

>> TextIOWrapper and StreamReaderWriter are merely wrappers
>> around streams that make use of the codecs. They don't
>> provide any codec logic themselves. That's the conceptual
>> difference.
>> ...
>> StreamReader and StreamWriters ... work efficiently and
>> directly on streams rather than buffers.
>
> StreamReader, StreamWriter, TextIOWrapper and StreamReaderWriter all
> have a file-like API: tell(), seek(), read(), readline(), write(), etc.
> The implementation is maybe different, but the API is just the same, and
> so the use cases are just the same.
>
> I don't see in which case I should use StreamReader or StreamWriter
> instead of TextIOWrapper. I thought that TextIOWrapper was specific to
> files on disk, but TextIOWrapper is already used for other purposes,
> like sockets.

I have no idea why TextIOWrapper was added to the stdlib instead of making StreamReaderWriter more capable, since StreamReaderWriter had already been available since Python 1.6 (and it is being used by codecs.open()).

Perhaps we should deprecate TextIOWrapper instead and replace it with codecs.StreamReaderWriter ? ;-)

Seriously, I don't see use of TextIOWrapper as an argument for removing the StreamReader/Writer parts of the codecs API.

>> Here's my reply from the ticket regarding using incremental
>> encoders/decoders for the StreamReader/Writer parts of the
>> codec set of APIs:
>>
>> """
>> The point about having them use incremental codecs for encoding and
>> decoding is a good one and would need to be investigated.
>> If possible, we could use incremental encoders/decoders for the
>> standard StreamReader/Writer base classes or add new
>> IncrementalStreamReader/Writer classes which then use
>> the IncrementalEncoder/Decoder per default.
>
> Why do you want to write a duplicate feature? TextIOWrapper is already
> here, it's working and widely used.

See above, and please also try to understand why we have per-codec implementations for streams. I'm tired of repeating myself.

I would much prefer to see the codec-specific functionality in TextIOWrapper added back to the codecs where it belongs.

> I am working on codec issues (like CJK encodings, see #12100, #12057,
> #12016) and I would like to remove StreamReader and StreamWriter to have
> *less* code to maintain.
>
> If you want to add more code, will you be available to maintain it? It
> looks like you are busy; some people (not me ;-)) are still
> waiting for .transform()/.untransform()!

I dropped the ball on the idea after the strong wave of comments against those methods. People will simply have to use codecs.encode() and codecs.decode().

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, May 24 2011)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/

2011-06-20: EuroPython 2011, Florence, Italy               27 days to go

::: Try our new mxODBC.Connect Python Database Interface for free ! :::

eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
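[Editorial note: the BOM-interpreting behaviour of the UTF-16 codec described above can be seen on the reader side too. A small sketch (Python 3):]

```python
import codecs
import io

# The utf-16 StreamReader interprets the BOM on its first read and then
# decodes the rest of the stream with the plain LE decode function --
# the per-codec specialization MAL describes.
data = codecs.BOM_UTF16_LE + 'payload'.encode('utf-16-le')
reader = codecs.getreader('utf-16')(io.BytesIO(data))
assert reader.read() == 'payload'   # BOM consumed, byte order detected
print('ok')
```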
Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader
On Tue, May 24, 2011 at 6:58 PM, Victor Stinner wrote:
> StreamReader, StreamWriter, TextIOWrapper and StreamReaderWriter all
> have a file-like API: tell(), seek(), read(), readline(), write(), etc.
> The implementation is maybe different, but the API is just the same, and
> so the use cases are just the same.
>
> I don't see in which case I should use StreamReader or StreamWriter
> instead of TextIOWrapper. I thought that TextIOWrapper was specific to
> files on disk, but TextIOWrapper is already used for other purposes,
> like sockets.

Back up a step here. It's important to remember that the codecs module *long* predates the existence of the Python 3 I/O model and the io module in particular.

Just as PEP 302 defines how module importers should be written, PEP 100 defines how text codecs should be written (i.e. in terms of StreamReader and StreamWriter).

PEP 3116 then defines how such codecs can be used as part of the overall I/O stack as redesigned for Python 3.

Now, there may be an opportunity here to rationalise things a bit and re-use the *new* io module interfaces as the basis for an updated codec API PEP, but we shouldn't be hasty in deprecating an old API that is about "how to write codecs" just because it is similar to a shiny new one that is about "how to process I/O data".

Cheers,
Nick.

-- 
Nick Coghlan | [email protected] | Brisbane, Australia
Re: [Python-Dev] Stable buildbots update
On Tue, May 24, 2011 at 7:56 PM, Antoine Pitrou wrote:
> Thank you very much! What a beautiful sight this is:
> http://www.python.org/dev/buildbot/all/waterfall?category=3.x.stable
>
> (until a sporadic failure comes up, that is)

I could turn test_crashers back on if you like ;)

Great work to all involved in tidying things up post-merge!

Cheers,
Nick.

-- 
Nick Coghlan | [email protected] | Brisbane, Australia
Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader
On 24.05.11 02:08, Victor Stinner wrote:
> [...]
> codecs.open() and the StreamReader, StreamWriter and StreamReaderWriter
> classes of the codecs module don't support universal newlines, still
> have some issues with stateful codecs (like UTF-16/32 BOMs), and each
> codec has to implement a StreamReader and a StreamWriter class.
>
> StreamReader and StreamWriter are stateless codecs (no reset() or
> setstate() method),

They *are* stateful, they just don't expose their state to the public.

> and so it's not possible to write a generic fix for
> all child classes in the codecs module. Each stateful codec has to
> handle special cases like seek() problems.

Yes, which in theory makes it possible to implement shortcuts for certain codecs (e.g. the UTF-32-BE/LE codecs could simply multiply the character position by 4 to get the byte position). However, AFAICR none of the readers/writers does that.

> For example, the UTF-16 codec
> duplicates some IncrementalEncoder/IncrementalDecoder code into its
> StreamWriter/StreamReader class.

Actually it's the other way round: when I implemented the incremental codecs, I copied code from the StreamReader/StreamWriter classes.

> The io module is well tested, supports non-seekable streams, handles
> corner cases correctly (like UTF-16/32 BOMs) and supports any kind of
> newlines including a "universal newline" mode. TextIOWrapper reuses
> incremental encoders and decoders, so BOM issues were fixed only once,
> in TextIOWrapper.
>
> It's trivial to replace a call to codecs.open() by a call to open(),
> because the two APIs are very close. The main difference is that
> codecs.open() doesn't support universal newlines, so you have to use
> open(..., newline='') to keep the same behaviour (keep newlines
> unchanged). This task can be done by 2to3. But I suppose that most
> people will be happy with the universal newline mode.
>
> I don't see which use case is not covered by TextIOWrapper. But I know
> some cases which are not supported by StreamReader/StreamWriter.

This could be partially fixed by implementing generic StreamReader/StreamWriter classes that reuse the incremental codecs, but I don't think that's worth it.

> [...]

Servus,
Walter
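[Editorial note: the fixed-width shortcut Walter mentions would look roughly like this. This is a hypothetical sketch, not stdlib code; the class name and the `seek_char()` helper are invented for illustration:]

```python
import codecs
import io

class UTF32LEStreamReader(codecs.StreamReader):
    """Hypothetical reader using the fixed-width shortcut: a code point
    is always 4 bytes in UTF-32, so character positions map directly to
    byte positions without any decoding."""

    def decode(self, input, errors='strict'):
        # Returns (decoded_text, bytes_consumed), as StreamReader expects.
        return codecs.utf_32_le_decode(input, errors, False)

    def seek_char(self, char_pos):
        self.reset()                    # drop the internal buffers
        self.stream.seek(char_pos * 4)  # 4 bytes per code point

r = UTF32LEStreamReader(io.BytesIO('abcdef'.encode('utf-32-le')))
r.seek_char(3)
assert r.read() == 'def'
print('ok')
```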
Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader
On Tue, 24 May 2011 20:25:11 +1000
Nick Coghlan wrote:
>
> Just as PEP 302 defines how module importers should be written, PEP
> 100 defines how text codecs should be written (i.e. in terms of
> StreamReader and StreamWriter).
>
> PEP 3116 then defines how such codecs can be used as part of the
> overall I/O stack as redesigned for Python 3.

The I/O stack doesn't use StreamReader and StreamWriter. That's the whole point. Stream* have been made useless by the new I/O stack.

> Now, there may be an opportunity here to rationalise things a bit and
> re-use the *new* io module interfaces as the basis for an updated
> codec API PEP, but we shouldn't be hasty in deprecating an old API
> that is about "how to write codecs" just because it is similar to a
> shiny new one that is about "how to process I/O data".

Ok, can you explain the difference to us, concretely?

Thanks

Antoine.
Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader
On 2011-05-24, at 12:16, Walter Dörwald wrote:

>> I don't see which use case is not covered by TextIOWrapper. But I know
>> some cases which are not supported by StreamReader/StreamWriter.
>
> This could be partially fixed by implementing generic
> StreamReader/StreamWriter classes that reuse the incremental codecs, but
> I don't think that's worth it.

Why not?

-- 
Best regards,
Łukasz Langa
Senior Systems Architecture Engineer

IT Infrastructure Department
Grupa Allegro Sp. z o.o.
Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader
On Tuesday, 24 May 2011 at 08:16 +0000, Vinay Sajip wrote:
> > I opened an issue for this idea. Brett and Marc-Andre Lemburg don't
> > want to deprecate codecs.open() & friends because they want to be able
> > to write code working on Python 2 and on Python 3 without any change. I
> > don't think it's realistic: nontrivial programs require at least the six
> > module, and most likely the 2to3 program. The six module can have its
> > "codecs.open" function if codecs.open is removed from Python 3.4.
>
> What's "non-trivial"? Both pip and virtualenv (widely used programs) were
> ported to Python 3 using a single codebase for 2.x and 3.x, because it
> seemed to involve the least ongoing maintenance burden. Though these
> particular programs don't use codecs.open, I don't see much value in
> making it harder to write programs which can run under both 2.x and 3.x;
> that's not going to speed adoption of 3.x.

pip has a pip.backwardcompat module which is very similar to six. If codecs.open() is deprecated or removed, it will be trivial to add a wrapper for codecs.open() or open() to six and pip.backwardcompat. virtualenv.py also starts with a thin compatibility layer.

But yes, each program using a compatibility layer/module will have to be updated.

Victor
Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader
On Tue, 24 May 2011 12:16:49 +0200
Walter Dörwald wrote:
>
> > and so it's not possible to write a generic fix for
> > all child classes in the codecs module. Each stateful codec has to
> > handle special cases like seek() problems.
>
> Yes, which in theory makes it possible to implement shortcuts for
> certain codecs (e.g. the UTF-32-BE/LE codecs could simply multiply the
> character position by 4 to get the byte position). However AFAICR none
> of the readers/writers does that.
And in practice, TextIOWrapper.tell() does a similar optimization in
a generic way. I'm linking to the Python implementation for readability:
http://hg.python.org/cpython/file/5c716437a83a/Lib/_pyio.py#l1741
TextIOWrapper.seek() is straightforward due to the structure of the
integer "cookie" returned by TextIOWrapper.tell().
In practice, TextIOWrapper gets much more love than
Stream{Reader,Writer} because it's an essential part of the new I/O
stack. As Victor said, problems which Stream* have had for years are
solved neatly in TextIOWrapper.
Therefore, leaving Stream{Reader,Writer} in is not a matter of "choice"
and "freedom given to users". It's giving people the misleading
possibility of using non-optimized, poorly debugged, less featureful
implementations of the same basic idea (a unicode stream abstraction).
Regards
Antoine.
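[Editorial note: the cookie-based tell()/seek() optimization is easy to observe from the outside. A small sketch (Python 3) with a stateful, BOM-carrying encoding:]

```python
import io

buf = io.BytesIO()
f = io.TextIOWrapper(buf, encoding='utf-16', newline='\n')
f.write('héllo\nwörld\n')
f.seek(0)
f.readline()

cookie = f.tell()   # an opaque int encoding byte position + codec state
assert f.readline() == 'wörld\n'

f.seek(cookie)      # round-trips even though utf-16 is stateful (BOM)
assert f.readline() == 'wörld\n'
print('ok')
```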
Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader
On Tuesday, 24 May 2011 at 12:42 +0200, Łukasz Langa wrote:
> On 2011-05-24, at 12:16, Walter Dörwald wrote:
>
> >> I don't see which use case is not covered by TextIOWrapper. But I know
> >> some cases which are not supported by StreamReader/StreamWriter.
> >
> > This could be partially fixed by implementing generic
> > StreamReader/StreamWriter classes that reuse the incremental codecs, but
> > I don't think that's worth it.
>
> Why not?

We already have an implementation of this idea; it is called io.TextIOWrapper.

Victor
Re: [Python-Dev] CPython optimization: storing reference counters outside of objects
On Sun, May 22, 2011 at 1:57 AM, Artur Siekielski wrote:
> Hi.
> The problem with reference counters is that they are very often
> incremented/decremented, even for read-only algorithms (like traversal
> of a list). It has two drawbacks:
> 1. CPU cache lines (64 bytes on X86) containing a beginning of a
> PyObject are very often invalidated, resulting in losing many chances
> to use the CPU caches

Not sure what scenario exactly you are discussing here, but storing reference counts outside of objects has (at least on a single processor) worse cache locality than inside objects.

> However the drawback is that such design introduces a new level of
> indirection which is a pointer inside a PyObject instead of a direct
> value. Also it seems that the "block" with refcounts would have to be
> a non-trivial data structure.

That would almost certainly be slower for most use cases, except for the copy-on-write fork. I guess the recycler papers might be an interesting read: http://www.research.ibm.com/people/d/dfb/recycler.html - this is the best reference-counting GC I'm aware of.

> I'm not a compiler/profiling expert so the main question is if such
> design can work, and maybe someone was thinking about something
> similar? And if CPython was profiled for CPU cache usage?

CPython was not designed for CPU cache usage as far as I'm aware. From my (heavily biased) point of view, PyPy is a way better platform to perform such experiments (and PyPy has been profiled for CPU cache usage). The main advantage is that you can code your GC without the need to modify the interpreter. On the other hand you obviously don't get benefits on CPython, but maybe it's worth experimenting.

Cheers,
fijal
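[Editorial note: Artur's first point, that refcounts are touched even by read-only traversals, can be made concrete in CPython with sys.getrefcount(); a small illustration:]

```python
import sys

x = object()
lst = [x]
base = sys.getrefcount(x)   # counts: the name 'x', the list slot, and
                            # the temporary reference held by getrefcount()

# A purely read-only traversal still writes to x's refcount field:
# the loop variable takes (and later drops) an extra reference.
for item in lst:
    during = sys.getrefcount(x)

assert during == base + 1
print('ok')
```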
Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader
On 24.05.11 12:58, Victor Stinner wrote: > On Tuesday, 24 May 2011 at 12:42 +0200, Łukasz Langa wrote: >> Message written by Walter Dörwald on 2011-05-24 at 12:16: >> I don't see which use case is not covered by TextIOWrapper. But I know some cases which are not supported by StreamReader/StreamWriter. >>> >>> This could be partially fixed by implementing generic >>> StreamReader/StreamWriter classes that reuse the incremental codecs, but >>> I don't think that's worth it. >> >> Why not? > > We already have an implementation of this idea, it is called > io.TextIOWrapper. Exactly. From another post by Victor: > As I wrote, codecs.open() is useful in Python 2. But I don't know any > program or library directly using StreamReader or StreamWriter. So: implementing this is a lot of work, duplicates existing functionality and is mostly unused. Servus, Walter
Re: [Python-Dev] CPython optimization: storing reference counters outside of objects
Maciej Fijalkowski, 24.05.2011 13:31: CPython was not designed for CPU cache usage as far as I'm aware. That's a pretty bold statement to make on this list. Even if it wasn't originally "designed" for (efficient?) CPU cache usage, it's certainly been around for long enough to have received numerous performance tweaks in that regard. I doubt that efficient CPU cache usage was a major design goal of PyPy right from the start. IMHO, the project has changed its objectives way too many times to claim something like that, especially at the low level where the CPU cache becomes relevant. I remember that not so long ago, PyPy was hugely memory hungry compared to CPython. Although, one could certainly call *that* "designed for CPU cache usage"... ;) Stefan
Re: [Python-Dev] CPython optimization: storing reference counters outside of objects
On 24.05.2011 11:55, Artur Siekielski wrote: PYRO/multiprocessing proxies aren't a comparable solution because of ORDERS OF MAGNITUDE worse performance. You compare here direct memory access vs serialization/message passing through sockets/pipes. The bottleneck is likely the serialization, but only if you serialize large objects. IPC is always very fast, at least on localhost. Just out of curiosity, have you considered using a database? Sqlite and BSD DB can even be put in shared memory if you want. It sounds like you are trying to solve a database problem using os.fork, something which is more or less doomed to fail (i.e. you have to replicate all the effort put into scaling up databases). If a database is too slow, I am rather sure you need something other than Python as well. Sturla
Re: [Python-Dev] CPython optimization: storing reference counters outside of objects
On 24.05.2011 13:31, Maciej Fijalkowski wrote: Not sure which scenario exactly you are discussing here, but storing reference counts outside of objects has (at least on a single processor) worse cache locality than inside objects. Artur Siekielski is not talking about cache locality, but copy-on-write fork on Linux et al. When reference counts are updated after forking, memory pages marked copy-on-write are copied if they store reference counts. And then he quickly runs out of memory. He wants to put reference counts and PyObjects in different pages, so only the pages with reference counts get copied. I don't think he cares about cache locality at all, but the rest of us do :-) Sturla
Re: [Python-Dev] CPython optimization: storing reference counters outside of objects
On 24.05.2011 11:55, Artur Siekielski wrote: POSH might be good, but the project has been dead for 8 years. And this copy-on-write is nice because you don't need changes/restrictions to your code, or a special garbage collector. Then I have a solution for you, one that is cheaper than anything else you are trying to do (taking work hours into account): BUY MORE RAM! RAM is damn cheap. You just need more of it. And 64-bit Python :-) Sturla
Re: [Python-Dev] CPython optimization: storing reference counters outside of objects
On Tue, 24 May 2011 14:05:26 +0200 Stefan Behnel wrote: > > I doubt that efficient CPU cache usage was a major design goal of PyPy > right from the start. IMHO, the project has changed its objectives way too > many times to claim something like that, especially at the low level where > the CPU cache becomes relevant. I remember that not so long ago, PyPy was > hugely memory hungry compared to CPython. Although, one could certainly > call *that* "designed for CPU cache usage"... ;) Well, to be honest, "hugely memory hungry" doesn't necessarily mean cache-averse. It depends on the locality of memory access patterns. Regards Antoine.
Re: [Python-Dev] CPython optimization: storing reference counters outside of objects
Antoine Pitrou, 24.05.2011 14:32: On Tue, 24 May 2011 14:05:26 +0200, Stefan Behnel wrote: I doubt that efficient CPU cache usage was a major design goal of PyPy right from the start. IMHO, the project has changed its objectives way too many times to claim something like that, especially at the low level where the CPU cache becomes relevant. I remember that not so long ago, PyPy was hugely memory hungry compared to CPython. Although, one could certainly call *that* "designed for CPU cache usage"... ;) Well, to be honest, "hugely memory hungry" doesn't necessarily mean cache-averse. It depends on the locality of memory access patterns. Sure. AFAIR (and Maciej is certainly the right person to prove me wrong), the problem at the time was that the overall memory footprint of objects was too high. That, at least, speaks against efficient cache usage and makes it more likely to result in cache thrashing. In any case, we're talking about a historical problem they already fixed. Stefan
Re: [Python-Dev] CPython optimization: storing reference counters outside of objects
On Tue, May 24, 2011 at 10:05 PM, Stefan Behnel wrote: > Maciej Fijalkowski, 24.05.2011 13:31: >> >> CPython was not designed for CPU cache usage as far as I'm aware. > > That's a pretty bold statement to make on this list. Even if it wasn't > originally "designed" for (efficient?) CPU cache usage, it's certainly been > around for long enough to have received numerous performance tweaks in that > regard. As a statement of Guido's original intent, I'd side with Maciej (Guido has made it pretty clear that he subscribes to the "first, make it work, and only worry about making it faster if that first approach isn't good enough" school of thought). Various *parts* of CPython, on the other hand, have indeed been optimised over the years to be quite aware of potential low level CPU and RAM effects (e.g. dicts, sorting, the small object allocator). Cheers, Nick. -- Nick Coghlan | [email protected] | Brisbane, Australia
Re: [Python-Dev] CPython optimization: storing reference counters outside of objects
2011/5/24 Sturla Molden : > On 24.05.2011 11:55, Artur Siekielski wrote: >> >> PYRO/multiprocessing proxies aren't a comparable solution because of >> ORDERS OF MAGNITUDE worse performance. You compare here direct memory >> access vs serialization/message passing through sockets/pipes. > The bottleneck is likely the serialization, but only if you serialize large > objects. IPC is always very fast, at least on localhost. It cannot be "fast" compared to direct memory access. Here is a benchmark: summing numbers in a small list in a child process using a multiprocessing "manager": http://dpaste.org/QzKr/ , and using an implicit copy of the structure after fork(): http://dpaste.org/q3eh/. The first is 200 TIMES SLOWER. It means that if the work finishes in 20 seconds using fork(), the same work will require more than one hour using a multiprocessing manager. > If a database is too slow, I am rather sure you need > something other than Python as well. Disk access is about 1000x slower than memory access in C, and Python in the worst case is 50x slower than C, so there is still a huge win (not to mention that in the common case Python is only a few times slower). Artur
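The dpaste links above have since expired; the following is a hedged reconstruction of the *shape* of Artur's benchmark, not his original code (Unix-only because of os.fork; a fork-context Manager is used for the proxy side):

```python
import multiprocessing
import os

data = list(range(2000))

def sum_in_forked_child(values):
    """Sum in a fork()ed child: the list is reached through copy-on-write
    pages, so no serialization happens at all."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:                      # child sees `values` via COW pages
        os.close(r)
        os.write(w, str(sum(values)).encode())
        os._exit(0)
    os.close(w)
    os.waitpid(pid, 0)
    result = int(os.read(r, 64))
    os.close(r)
    return result

fork_total = sum_in_forked_child(data)

# Manager path: every shared[i] below is a full IPC round-trip to the
# manager process, which is where the orders-of-magnitude gap comes from.
ctx = multiprocessing.get_context("fork")
with ctx.Manager() as mgr:
    shared = mgr.list(data)
    manager_total = sum(shared[i] for i in range(len(shared)))

assert fork_total == manager_total == sum(data)
```

Timing the two halves (e.g. with time.perf_counter) shows the proxy version losing badly as the list grows, in line with the numbers quoted above.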
Re: [Python-Dev] CPython optimization: storing reference counters outside of objects
On 5/24/2011 8:25 AM, Sturla Molden wrote: Artur Siekielski is not talking about cache locality, but copy-on-write fork on Linux et al. When reference counts are updated after forking, memory pages marked copy-on-write are copied if they store reference counts. And then he quickly runs out of memory. He wants to put reference counts and PyObjects in different pages, so only the pages with reference counts get copied. I don't think he cares about cache locality at all, but the rest of us do :-) It seems clear that separating reference counts from objects satisfies a specialized need and should be done in a special, patched version of CPython rather than the general distribution. -- Terry Jan Reedy
Re: [Python-Dev] [Python-checkins] cpython: Issue #12049: Add RAND_bytes() and RAND_pseudo_bytes() functions to the ssl
On Tuesday, 24 May 2011 at 11:27 -0400, Terry Reedy wrote: > > > > +.. function:: RAND_bytes(num) > > + > > + Returns *num* cryptographically strong pseudo-random bytes. > > + > > + .. versionadded:: 3.3 > > + > > +.. function:: RAND_pseudo_bytes(num) > > + > > + Returns (bytes, is_cryptographic): bytes are *num* pseudo-random bytes, > > + is_cryptographic is True if the bytes generated are cryptographically > > + strong. > > + > > + .. versionadded:: 3.3 > > I am curious what 'cryptographically strong' means, what the real > difference is between the above two functions, and how these do not > duplicate what is in random.random. An important feature of a CPRNG (cryptographic pseudo-random number generator) is that even if you know all of its output, you cannot rebuild its internal state to guess the next (or even previous) numbers. The CPRNG can for example hash its output using SHA-1: you will have to "break" the SHA-1 hash (maybe using "salt"). Another important feature is that even if you know the internal state, you will not be able to guess all previous and next numbers, because the internal state is regularly updated using an external source of entropy. Use RAND_add() to do that explicitly. We may add a link to Wikipedia: http://en.wikipedia.org/wiki/CPRNG Read the "Requirements" section, it's maybe more correct than my explanation: http://en.wikipedia.org/wiki/CPRNG#Requirements About the random module: it must not be used to generate passwords or certificates, because it is easy to rebuild the internal state of a Mersenne Twister generator if you know the previous 624 numbers. Since you know the state, it's also easy to generate all following numbers. Seeding a Mersenne Twister PRNG doesn't help. See my Hasard project if you would like to learn more about PRNGs ;-) We may also add a link from random to SSL.RAND_bytes() and SSL.RAND_pseudo_bytes().
https://bitbucket.org/haypo/hasard/ Victor
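Victor's point about the random module can be demonstrated in a few lines; os.urandom stands in here for an OS-level CSPRNG of the kind ssl.RAND_bytes exposes (this is an illustrative sketch, not the ssl API itself):

```python
import os
import random

# The Mersenne Twister is fully deterministic: anyone who recovers its
# internal state (reconstructible from 624 consecutive outputs) can
# predict the entire future stream.
rng = random.Random(42)
leaked_state = rng.getstate()        # pretend an attacker obtained this
observed = [rng.random() for _ in range(5)]

attacker = random.Random()
attacker.setstate(leaked_state)      # replaying the leaked state...
predicted = [attacker.random() for _ in range(5)]
assert predicted == observed         # ...predicts every output exactly

# An OS CSPRNG does not permit this kind of state reconstruction,
# which is why secrets must come from it rather than from random.
token = os.urandom(16)
assert len(token) == 16
```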
Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader
On 5/24/2011 6:14 AM, M.-A. Lemburg wrote: I have no idea why TextIOWrapper was added to the stdlib instead of making StreamReaderWriter more capable, since StreamReaderWriter had already been available in Python since Python 1.6 (and this is being used by codecs.open()). As I understand it, you (and others) wrote codecs long ago, and recently other people wrote the new i/o stack, which sometimes uses codecs; when they needed to add a few details, they 'naturally' added them to the module they were working on and understood (and planned to rewrite in C) rather than to the older module that they maybe did not completely understand and which is only in Python. Then Victor comes along to do maintenance on some of the Asian codecs and discovers that he needs to make changes in two (or more?) places rather than one, which he naturally finds unsatisfactory. Perhaps we should deprecate TextIOWrapper instead and replace it with codecs.StreamReaderWriter ? ;-) I think we should separate two issues: removing internal implementation duplication and removing external API duplication. I should think that the former should not be too controversial. The latter, I know, is more contentious. One problem is that stdlib changes that perhaps 'should' have been made in 3.0/3.1 could not be discovered until the moratorium and greater focus on the stdlib. -- Terry Jan Reedy
Re: [Python-Dev] Stable buildbots update
On 5/24/2011 6:27 AM, Nick Coghlan wrote: On Tue, May 24, 2011 at 7:56 PM, Antoine Pitrou wrote: Thank you very much! What a beautiful sight this is: http://www.python.org/dev/buildbot/all/waterfall?category=3.x.stable (until a sporadic failure comes up, that is) I could turn test_crashers back on if you like ;) No need. One xp (but not the other) and win7 turned red again. -- Terry Jan Reedy
Re: [Python-Dev] CPython optimization: storing reference counters outside of objects
On Tue, May 24, 2011 at 8:44 AM, Terry Reedy wrote: > On 5/24/2011 8:25 AM, Sturla Molden wrote: > >> Artur Siekielski is not talking about cache locality, but copy-on-write >> fork on Linux et al. >> >> When reference counts are updated after forking, memory pages marked >> copy-on-write are copied if they store reference counts. And then he >> quickly runs out of memory. He wants to put reference counts and >> PyObjects in different pages, so only the pages with reference counts >> get copied. >> >> I don't think he cares about cache locality at all, but the rest of us >> do :-) > > It seems clear that separating reference counts from objects satisfies a > specialized need and should be done in a special, patched version of CPython > rather than the general distribution. I'm not sure I agree, especially given that the classical answer to GIL woes has been to tell people to fork() themselves. There has to be a lot of code out there that would benefit from this. Geremy Condra
Re: [Python-Dev] cpython: move specialized dir implementations into __dir__ methods (closes #12166)
On 24.05.2011 18:08, benjamin.peterson wrote:
> http://hg.python.org/cpython/rev/8f403199f999
> changeset: 70331:8f403199f999
> user: Benjamin Peterson
> date: Tue May 24 11:09:06 2011 -0500
> summary:
> move specialized dir implementations into __dir__ methods (closes #12166)
> +static PyMethodDef module_methods[] = {
> +{"__dir__", module_dir, METH_NOARGS,
> + PyDoc_STR("__dir__() -> specialized dir() implementation")},
> +{0}
> +};
> static PyMethodDef type_methods[] = {
> {"mro", (PyCFunction)mro_external, METH_NOARGS,
> PyDoc_STR("mro() -> list\nreturn a type's method resolution order")},
> @@ -2585,6 +2661,8 @@
> PyDoc_STR("__instancecheck__() -> check if an object is an instance")},
> {"__subclasscheck__", type___subclasscheck__, METH_O,
> PyDoc_STR("__subclasscheck__() -> check if a class is a subclass")},
> +{"__dir__", type_dir, METH_NOARGS,
> + PyDoc_STR("__dir__() -> specialized __dir__ implementation for types")},
> static PyMethodDef object_methods[] = {
> {"__reduce_ex__", object_reduce_ex, METH_VARARGS,
> PyDoc_STR("helper for pickle")},
> @@ -3449,6 +3574,8 @@
> PyDoc_STR("default object formatter")},
> {"__sizeof__", object_sizeof, METH_NOARGS,
> PyDoc_STR("__sizeof__() -> size of object in memory, in bytes")},
> +{"__dir__", object_dir, METH_NOARGS,
> + PyDoc_STR("__dir__() -> default dir() implementation")},
This is interesting: I thought we used "->" to specify the return value (or
its type). __instancecheck__ and __subclasscheck__ set a different
precedent, while __sizeof__ follows.
I didn't look at the files to check for other examples.
Georg
Re: [Python-Dev] cpython: move specialized dir implementations into __dir__ methods (closes #12166)
2011/5/24 Georg Brandl :
> On 24.05.2011 18:08, benjamin.peterson wrote:
>> http://hg.python.org/cpython/rev/8f403199f999
>> changeset: 70331:8f403199f999
>> user: Benjamin Peterson
>> date: Tue May 24 11:09:06 2011 -0500
>> summary:
>> move specialized dir implementations into __dir__ methods (closes #12166)
>
>> +static PyMethodDef module_methods[] = {
>> + {"__dir__", module_dir, METH_NOARGS,
>> + PyDoc_STR("__dir__() -> specialized dir() implementation")},
>> + {0}
>> +};
>
>> static PyMethodDef type_methods[] = {
>> {"mro", (PyCFunction)mro_external, METH_NOARGS,
>> PyDoc_STR("mro() -> list\nreturn a type's method resolution order")},
>> @@ -2585,6 +2661,8 @@
>> PyDoc_STR("__instancecheck__() -> check if an object is an instance")},
>> {"__subclasscheck__", type___subclasscheck__, METH_O,
>> PyDoc_STR("__subclasscheck__() -> check if a class is a subclass")},
>> + {"__dir__", type_dir, METH_NOARGS,
>> + PyDoc_STR("__dir__() -> specialized __dir__ implementation for
>> types")},
>
>> static PyMethodDef object_methods[] = {
>> {"__reduce_ex__", object_reduce_ex, METH_VARARGS,
>> PyDoc_STR("helper for pickle")},
>> @@ -3449,6 +3574,8 @@
>> PyDoc_STR("default object formatter")},
>> {"__sizeof__", object_sizeof, METH_NOARGS,
>> PyDoc_STR("__sizeof__() -> size of object in memory, in bytes")},
>> + {"__dir__", object_dir, METH_NOARGS,
>> + PyDoc_STR("__dir__() -> default dir() implementation")},
>
> This is interesting: I thought we used "->" to specify the return value (or
> its type). __instancecheck__ and __subclasscheck__ set a different
> precedent, while __sizeof__ follows.
Yes, I was wondering about that, so I just picked one. :) "->" seems
to be better for return values, though, given the resemblance to
annotations.
--
Regards,
Benjamin
Re: [Python-Dev] CPython optimization: storing reference counters outside of objects
2011/5/24 Stefan Behnel > Maciej Fijalkowski, 24.05.2011 13:31: > > CPython was not designed for CPU cache usage as far as I'm aware. >> > > That's a pretty bold statement to make on this list. Even if it wasn't > originally "designed" for (efficient?) CPU cache usage, it's certainly been > around for long enough to have received numerous performance tweaks in that > regard. > > Stefan Maybe a change in memory allocation granularity can help here. Raising it to 16 and 32 bytes for 32- and 64-bit systems respectively guarantees that an access to ob_refcnt and/or ob_type will pull some other information for the same object onto the cache line, information which is usually needed anyway (except for very simple objects, such as PyNone, PyEllipsis, etc.). Think about a long, a tuple, a list, a dictionary, etc.: all of them have some critical data after these fields, which will most likely be accessed after an INCREF or a type check. Regards, Cesare
Re: [Python-Dev] Stable buildbots update
Ned Deily wrote: > In article <[email protected]>, > "Stephen J. Turnbull" wrote: > > Are you saying you expect Mac OS X 10.4 "Tiger" to go green once the > > bots update? If so, I'm impressed, and "thank you!" to all involved. > > Apple and MacPorts have long since washed their hands of that release. > > OS X 10.4 does have its quirks that makes it challenging to get all of > the tests to run without a few cornercase failures but, besides the > buildbots, I still test regularly with 10.4 and occasionally build > there, too. And, FWIW, while top-of-trunk MacPorts may not officially > support 10.4, many ports work there just fine including python2.6, 2.7, > and 3.1. (3.2 has a build issue that may get fixed in 3.2.1). Perhaps more importantly, parc-leopard-1 and parc-tiger-1 are two of the very few usually-connected buildbots we have running on big-endian architectures, along with loewis-sun (I *think* Solaris-10 on SPARC is still big-endian). Bill
Re: [Python-Dev] [Python-checkins] cpython: Issue #12049: Add RAND_bytes() and RAND_pseudo_bytes() functions to the ssl
On 5/24/2011 12:06 PM, Victor Stinner wrote: Le mardi 24 mai 2011 à 11:27 -0400, Terry Reedy a écrit : +.. function:: RAND_bytes(num) + + Returns *num* cryptographically strong pseudo-random bytes. + + .. versionadded:: 3.3 + +.. function:: RAND_pseudo_bytes(num) + + Returns (bytes, is_cryptographic): bytes are *num* pseudo-random bytes, + is_cryptographic is True if the bytes generated are cryptographically + strong. + + .. versionadded:: 3.3 I am curious what 'cryptographically strong' means, what the real difference is between the above two functions, and how these do not duplicate what is in random.random. An important feature of a CPRNG (cryptographic pseudo-random number generator) is that even if you know all of its output, you cannot rebuild its internal state to guess next (or maybe previous number). The CPRNG can for example hash its output using SHA-1: you will have to "break" the SHA-1 hash (maybe using "salt"). So it is presumably slower. I still do not get RAND_pseudo_bytes, which somehow decides internally what to do. Another important feature is that even if you know the internal state, you will not be able to guess all previous and next numbers, because the internal state is regulary updated using an external source of entropy. Use RAND_add() to do that explicitly. We may add a link to Wikipedia: http://en.wikipedia.org/wiki/CPRNG That would be helpful Read the "Requirements" section, it's maybe more correct than my explanation: http://en.wikipedia.org/wiki/CPRNG#Requirements About the random module, it must not be used to generate passwords or certificates, because it is easy to rebuild the internal state of a Mersenne Twister generator if you know the previous 624 numbers. Since you know the state, it's also easy to generate all next numbers. Seed a Mersenne Twister PRNG doesn't help. See my Hasard project if you would like to learn more about PRNG ;-) We may also add a link from random to SSL.RAND_bytes() and SSL.RAND_pseudo_bytes(). 
-- Terry Jan Reedy
Re: [Python-Dev] CPython optimization: storing reference counters outside of objects
On 24.05.2011 17:39, Artur Siekielski wrote: Disk access is about 1000x slower than memory access in C, and Python in the worst case is 50x slower than C, so there is still a huge win (not to mention that in the common case Python is only a few times slower). You can put databases in shared memory (e.g. Sqlite and BSDDB have options for this). On Linux you can also mount /dev/shm as a ramdisk. Also, why do you doubt that the database developers at Oracle et al. have made sufficient optimizations? Sturla
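Sturla's database suggestion is cheap to try with the stdlib: sqlite3 accepts ":memory:" for a RAM-only database (and a database file placed under /dev/shm gives a RAM-backed one that several processes can open). A minimal sketch:

```python
import sqlite3

# ":memory:" keeps the whole database in RAM -- no disk access at all.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v INTEGER)")
conn.executemany("INSERT INTO kv VALUES (?, ?)",
                 [("a", 1), ("b", 2), ("c", 3)])
total = conn.execute("SELECT SUM(v) FROM kv").fetchone()[0]
assert total == 6
conn.close()
```

Whether this approaches the speed of direct in-process memory access for Artur's workload is exactly the point under debate in this thread.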
Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader
On 24/05/2011, Victor Stinner wrote: > > In Python 2, codecs.open() is the best way to read and/or write files > using Unicode. But in Python 3, open() is preferred with its fast io > module. I would like to deprecate codecs.open() because it can be > replaced by open() and io.TextIOWrapper. I would like your opinion and > that's why I'm writing this email. There are some modules that try to stay compatible with Python 2 and 3 without a source translation step. Removing the codecs classes would mean they'd have to add a few more compatibility hacks, but it could be done. As an aside, I'm still not sure how the io module should be used. For example, a simple task I've used StreamWriter classes for is to wrap stdout. If stdout.encoding can't represent a character, using "replace" means you can write any unicode string without throwing a UnicodeEncodeError. With the io module, it seems you need to construct a new TextIOWrapper object, passing the attributes of the old one as parameters, and as soon as someone passes something that's not a TextIOWrapper (say, a StringIO object) your code breaks. Is the intention that code dealing with streams needs to be covered in isinstance checks in Python 3? Martin
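Martin's stdout-wrapping use case can be written both ways; in this sketch a BytesIO stands in for sys.stdout.buffer so it stays self-contained:

```python
import codecs
import io

# The codecs idiom from Python 2: a StreamWriter with errors="replace"
# never raises UnicodeEncodeError, it substitutes "?" instead.
buf = io.BytesIO()
codecs.getwriter("ascii")(buf, errors="replace").write("naïve")
assert buf.getvalue() == b"na?ve"

# The io-module equivalent: re-wrap the binary stream in a TextIOWrapper.
buf2 = io.BytesIO()
out = io.TextIOWrapper(buf2, encoding="ascii", errors="replace")
out.write("naïve")
out.flush()
assert buf2.getvalue() == b"na?ve"
```

For a real program the io-style equivalent would be something like `sys.stdout = io.TextIOWrapper(sys.stdout.buffer, errors="replace")`, which, as the post notes, assumes the stream actually exposes a `buffer` attribute.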
[Python-Dev] [pyodbc] Setting values to SQL_* constants while creating a connection object
Hi,
I would like to know how to set values for SQL_* constants while
creating a db connection through the pyodbc module.
For example, i am getting a connection object like below:
In [27]: dbh1 =
pyodbc.connect("DSN=;UID=;PWD=;DATABASE=;APP=")
In [28]: dbh1.getinfo(pyodbc.SQL_DESCRIBE_PARAMETER)
Out[28]: True
I want to set this SQL_DESCRIBE_PARAMETER to false for this connection
object. How could i do that?
Please help me in figuring it out.
Thanks,
Srini
Re: [Python-Dev] [pyodbc] Setting values to SQL_* constants while creating a connection object
On 5/24/2011 5:09 PM, srinivasan munisamy wrote: Hi, I would like to know how to set values for SQL_* constants Please direct Python usage questions to python-list or other user discussion forums. python-dev is for discussion of development of the next versions of Python. -- Terry Jan Reedy
Re: [Python-Dev] [Python-checkins] cpython: Issue #12049: Add RAND_bytes() and RAND_pseudo_bytes() functions to the ssl
On Wed, May 25, 2011 at 3:52 AM, Terry Reedy wrote: > On 5/24/2011 12:06 PM, Victor Stinner wrote: >> An important feature of a CPRNG (cryptographic pseudo-random number >> generator) is that even if you know all of its output, you cannot >> rebuild its internal state to guess next (or maybe previous number). The >> CPRNG can for example hash its output using SHA-1: you will have to >> "break" the SHA-1 hash (maybe using "salt"). > > So it is presumably slower. I still do not get RAND_pseudo_bytes, which > somehow decides internally what to do. The more important feature here is that it is exposing *OpenSSL's* random number generation, rather than our own. A CPRNG isn't *necessarily* slower than a non-crypto one (particularly on systems with dedicated crypto hardware), but it can definitely fail to return data if there isn't enough entropy available in the pool (and the system has to have a usable entropy source in the first place). The RAND_bytes() documentation should probably make it clearer that unlike the random module and RAND_pseudo_bytes(), RAND_bytes() can *fail* (by raising SSLError) if it isn't in a position to provide the requested random data. The pseudo_bytes version just encapsulates a fallback technique that may be suitable in some circumstances: if crypto-quality random data is not available, fall back on PRNG data instead of failing. It is most suitable for tasks like prototyping an algorithm in Python for later conversion to C, or similar tasks where it is desirable to use the OpenSSL PRNG over the one in the random module. Cheers, Nick. -- Nick Coghlan | [email protected] | Brisbane, Australia
Re: [Python-Dev] [Python-checkins] Daily reference leaks (234021dcad93): sum=61
On Wed, May 25, 2011 at 1:09 PM, wrote: > results for 234021dcad93 on branch "default" > > > test_packaging leaked [128, 128, 128] references, sum=384 Is there a new cache in packaging that regrtest needs to know about and either ignore or clear when checking reference counts? Cheers, Nick. -- Nick Coghlan | [email protected] | Brisbane, Australia
Re: [Python-Dev] [Python-checkins] cpython: Issue #12049: Add RAND_bytes() and RAND_pseudo_bytes() functions to the ssl
Terry Reedy wrote: > On 5/24/2011 12:06 PM, Victor Stinner wrote: > >Le mardi 24 mai 2011 à 11:27 -0400, Terry Reedy a écrit : > >>> > >>>+.. function:: RAND_bytes(num) > >>>+ > >>>+ Returns *num* cryptographically strong pseudo-random bytes. > >>>+ > >>>+ .. versionadded:: 3.3 > >>>+ > >>>+.. function:: RAND_pseudo_bytes(num) > >>>+ > >>>+ Returns (bytes, is_cryptographic): bytes are *num* pseudo-random bytes, > >>>+ is_cryptographic is True if the bytes generated are cryptographically > >>>+ strong. > >>>+ > >>>+ .. versionadded:: 3.3 > >> > >>I am curious what 'cryptographically strong' means, what the real > >>difference is between the above two functions, and how these do not > >>duplicate what is in random.random. > > > >An important feature of a CPRNG (cryptographic pseudo-random number > >generator) is that even if you know all of its output, you cannot > >rebuild its internal state to guess next (or maybe previous number). The > >CPRNG can for example hash its output using SHA-1: you will have to > >"break" the SHA-1 hash (maybe using "salt"). > > So it is presumably slower. I still do not get RAND_pseudo_bytes, > which somehow decides internally what to do. According to the RAND_bytes manual page from OpenSSL: RAND_bytes() puts num cryptographically strong pseudo-random bytes into buf. An error occurs if the PRNG has not been seeded with enough randomness to ensure an unpredictable byte sequence. RAND_pseudo_bytes() puts num pseudo-random bytes into buf. Pseudo-random byte sequences generated by RAND_pseudo_bytes() will be unique if they are of sufficient length, but are not necessarily unpredictable. They can be used for non-cryptographic purposes and for certain purposes in cryptographic protocols, but usually not for key generation etc. And: RAND_bytes() returns 1 on success, 0 otherwise. The error code can be obtained by ERR_get_error(3). RAND_pseudo_bytes() returns 1 if the bytes generated are cryptographically strong, 0 otherwise. 
Both functions return -1 if they are not supported by the current RAND method. So it seems to me that RAND_bytes() either returns cryptographically strong data or fails (is it possible to detect the failure with the Python function? Should this be documented?). RAND_pseudo_bytes() always succeeds but does not necessarily generate cryptographically strong data. > > > Another important feature is that even if you know the internal state, > >you will not be able to guess all previous and next numbers, because the > >internal state is regularly updated using an external source of entropy. > >Use RAND_add() to do that explicitly. > > > >We may add a link to Wikipedia: > >http://en.wikipedia.org/wiki/CPRNG > > That would be helpful > > > >Read the "Requirements" section, it's maybe more correct than my > >explanation: > >http://en.wikipedia.org/wiki/CPRNG#Requirements > > > >About the random module, it must not be used to generate passwords or > >certificates, because it is easy to rebuild the internal state of a > >Mersenne Twister generator if you know the previous 624 numbers. Since > >you know the state, it's also easy to generate all next numbers. Seeding a > >Mersenne Twister PRNG doesn't help. See my Hasard project if you would > >like to learn more about PRNG ;-) > > > >We may also add a link from random to SSL.RAND_bytes() and > >SSL.RAND_pseudo_bytes(). Obviously, the user needs to be familiar with the concept of "cryptographically strong randomness" to use these functions. Petri Lehtinen
