Re: PEP 393 vs UTF-8 Everywhere
eryk sun :
> On Sat, Jan 21, 2017 at 8:21 PM, Pete Forman wrote:
>> Marko Rauhamaa writes:
>> py> low = '\uDC37'
>>>
>>> That should raise a SyntaxError exception.
>>
>> Quite. [...]
>
> CPython allows surrogate codes for use with the "surrogateescape" and
> "surrogatepass" error handlers, which are used for POSIX and Windows
> file-system encoding, respectively.

Yes, but at the cost of violating Unicode, leading to unprintable strings etc. In my opinion, Python should have "stayed pure" instead of playing cheap tricks with surrogates.

(Of course, Unicode itself is a mess, but that's another story.)

Marko

--
https://mail.python.org/mailman/listinfo/python-list
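The round trip eryk sun describes can be reproduced with bytes.decode() and str.encode(). This is a minimal sketch that assumes a POSIX-style file name held as raw bytes. The sample name b"caf\xe9.txt" is invented for illustration. An undecodable byte 0xNN becomes the lone surrogate U+DCNN. Encoding with the same handler restores the original bytes. A strict UTF-8 encode refuses the string, which is the "unprintable strings" cost Marko objects to.

```python
# A file name as a POSIX kernel might hand it over: mostly ASCII/UTF-8, plus
# one stray Latin-1 byte (0xE9) that is not valid UTF-8 on its own.
raw_name = b"caf\xe9.txt"

# 'surrogateescape' maps each undecodable byte 0xNN to the lone surrogate U+DCNN.
name = raw_name.decode("utf-8", "surrogateescape")
print(ascii(name))  # 'caf\udce9.txt'

# Encoding with the same handler gives back exactly the original bytes.
assert name.encode("utf-8", "surrogateescape") == raw_name

# A strict encode refuses the lone surrogate, so the str cannot be written
# to a strict UTF-8 stream (a terminal, a file) without special handling.
try:
    name.encode("utf-8")
except UnicodeEncodeError as exc:
    print("strict UTF-8 encode fails:", exc.reason)

# 'surrogatepass' (the handler eryk sun associates with Windows) encodes the
# surrogate itself, producing bytes that are not valid UTF-8.
print(name.encode("utf-8", "surrogatepass"))
```

The design point is that surrogateescape only works because the str is allowed to hold a code point that no valid UTF-8 text can produce. That is exactly the permissiveness being argued about in this thread.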
Re: PEP 393 vs UTF-8 Everywhere
Steve D'Aprano :
> On Sun, 22 Jan 2017 06:52 am, Marko Rauhamaa wrote:
>> Also, [surrogates] don't exist as Unicode code points. Python
>> shouldn't allow surrogate characters in strings.
>
> Not quite. This is where it gets a bit messy and confusing. The bottom
> line is: surrogates *are* code points, but they aren't *characters*.

All animals are equal, but some animals are more equal than others.

> Strings which contain surrogates are strictly speaking illegal,
> although some programming languages (including Python) allow them.

Python shouldn't allow them.

> The Unicode standard defines surrogates as follows:
> [...]
>
> - Surrogate Code Point. A Unicode code point in the range
> U+D800..U+DFFF. Reserved for use by UTF-16,

The writer of the standard is playing word games, maybe to offer a fig leaf to Windows, Java et al.

> By the letter of the Unicode standard, [Python] should not do this,
> but nevertheless it does and it appears to do no real harm and have
> some benefit.

I'm afraid Python's choice may lead to exploitable security holes in Python programs.

>>> py> low = '\uDC37'
>>
>> That should raise a SyntaxError exception.
>
> If Python was strictly conforming, that is correct, but it turns out
> there are some useful things you can do with strings if you allow
> surrogates.

Conceptual confusion is a high price to pay for such tricks.

Marko
Re: How to create a socket.socket() object from a socket fd?
On 2017-01-22 01:03, Grant Edwards wrote:
> On 2017-01-21, Christian Heimes wrote:
>
>> You might be interested in my small module
>> https://pypi.python.org/pypi/socketfromfd/ . I just released a new
>> version with a fix for Python 2. Thanks for the hint! :)
>>
>> The module correctly detects address family, socket type and proto from
>> a fd. It works correctly with e.g. IPv6 or Unix sockets. Ticket
>> https://bugs.python.org/issue28134 has additional background information
>> on the matter.
>
> Yes, thanks!
>
> Just a few minutes ago I stumbled across that issue. For Python3, I
> was using:
>
>   sock = socket.socket(fileno=fd)
>
> But as you point out in that issue, the Python3 docs are wrong: when
> using socket.socket(fileno=fd) you _do_ have to specify the correct
> family and type parameters that correspond to the socket file
> descriptor. So, I started looking for os.getsockopt(), which doesn't
> exist.
>
> I see you use ctypes to call getsockopt (that was going to be my next
> step).
>
> I suspect the code I'm working on will end up being re-written in C for
> the real product (so that it can run in-process in a thread rather
> than as an external helper process). If not, I'll have to use your
> module (or something like it) so that the solution will work on both
> IPv4 and IPv6 TCP sockets (I'd also like it to work with Unix domain
> sockets, but the piece at the other end of the socket connection
> currently only supports TCP).

I wanted to fix the function before 3.6.0 came out, but I faced some resistance. My approach was deemed too magic and not fully functional on some platforms. Other core devs are underestimating the severity of the issue. Even in simple examples with IPv4 and IPv6 it breaks getpeername().

I'd appreciate it if you could jump in and leave a comment on the ticket to show your support. :)

By the way, I just pushed another commit with a new feature just for you. See for yourself: https://github.com/tiran/socketfromfd/commit/0a2fd6dae86267cedea5ae8b956e2876e6057c74

Christian
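A note for readers on a newer Python. The socket documentation says that since Python 3.7, socket.socket(fileno=fd) auto-detects family, type and proto from the descriptor. That addresses the 3.6 problem Grant and Christian discuss here. The sketch below assumes Python 3.7+ and a POSIX system, where sockets are ordinary file descriptors so os.dup() applies. The helper names socket_from_fd() and roundtrip() are hypothetical and are not part of socketfromfd or the standard library.

```python
import os
import socket
import sys


def socket_from_fd(fd):
    """Wrap an existing socket file descriptor in a new socket.socket object.

    Assumes Python 3.7 or later, where family, type and proto are
    auto-detected from the descriptor when only fileno is passed. On
    Python 3.6 and earlier they silently defaulted to AF_INET and
    SOCK_STREAM, the behaviour discussed in this thread.
    """
    if sys.version_info < (3, 7):
        raise RuntimeError("Python < 3.7: pass family and type explicitly")
    # Duplicate the descriptor so closing the new object does not close the
    # caller's fd (assumes POSIX, where a socket is a plain file descriptor).
    return socket.socket(fileno=os.dup(fd))


def roundtrip(family, host):
    """Create a listening socket, rebuild it from its fd, and compare."""
    with socket.socket(family, socket.SOCK_STREAM) as listener:
        listener.bind((host, 0))
        listener.listen(1)
        with socket_from_fd(listener.fileno()) as clone:
            same_addr = clone.getsockname() == listener.getsockname()
            return clone.family, clone.type, same_addr


if __name__ == "__main__":
    print(roundtrip(socket.AF_INET, "127.0.0.1"))
    if socket.has_ipv6:
        try:
            print(roundtrip(socket.AF_INET6, "::1"))
        except OSError as exc:  # e.g. IPv6 disabled on this host
            print("IPv6 loopback unavailable:", exc)
```

Comparing getsockname() on the clone with the original is a cheap check that the detected family matches. A wrong family is exactly what makes address-returning calls like getpeername() misbehave.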
Re: PEP 393 vs UTF-8 Everywhere
On Sun, 22 Jan 2017 07:34 pm, Marko Rauhamaa wrote:

> Steve D'Aprano :
>
>> On Sun, 22 Jan 2017 06:52 am, Marko Rauhamaa wrote:
>>> Also, [surrogates] don't exist as Unicode code points. Python
>>> shouldn't allow surrogate characters in strings.
>>
>> Not quite. This is where it gets a bit messy and confusing. The bottom
>> line is: surrogates *are* code points, but they aren't *characters*.
>
> All animals are equal, but some animals are more equal than others.

Huh?

>> Strings which contain surrogates are strictly speaking illegal,
>> although some programming languages (including Python) allow them.
>
> Python shouldn't allow them.

That's one opinion.

>> The Unicode standard defines surrogates as follows:
>> [...]
>>
>> - Surrogate Code Point. A Unicode code point in the range
>> U+D800..U+DFFF. Reserved for use by UTF-16,
>
> The writer of the standard is playing word games, maybe to offer a fig
> leaf to Windows, Java et al.

Seriously?

>> By the letter of the Unicode standard, [Python] should not do this,
>> but nevertheless it does and it appears to do no real harm and have
>> some benefit.
>
> I'm afraid Python's choice may lead to exploitable security holes in
> Python programs.

Feel free to back that up with an actual demonstration of an exploit, rather than just FUD.

>>>> py> low = '\uDC37'
>>>
>>> That should raise a SyntaxError exception.
>>
>> If Python was strictly conforming, that is correct, but it turns out
>> there are some useful things you can do with strings if you allow
>> surrogates.
>
> Conceptual confusion is a high price to pay for such tricks.

There's a lot to comprehend about Unicode. I don't see that Python's non-strict implementation is harder to understand than the strict version.

--
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure enough, things got worse.
Re: PEP 393 vs UTF-8 Everywhere
Steve D'Aprano :
> On Sun, 22 Jan 2017 07:34 pm, Marko Rauhamaa wrote:
>
>> Steve D'Aprano :
>>
>>> On Sun, 22 Jan 2017 06:52 am, Marko Rauhamaa wrote:
>>>> Also, [surrogates] don't exist as Unicode code points. Python
>>>> shouldn't allow surrogate characters in strings.
>>>
>>> Not quite. This is where it gets a bit messy and confusing. The
>>> bottom line is: surrogates *are* code points, but they aren't
>>> *characters*.
>>
>> All animals are equal, but some animals are more equal than others.
>
> Huh?

There is no difference between 0xD800 and 0xD8000000. They are both numbers that don't--and won't--represent anything in Unicode. It's pointless to call one a "code point" and not the other one. A code point that isn't code for anything can barely be called a code point.

I'm guessing 0xD800 is called a code point because it was always called that. It was dropped out when UTF-16 was invented, but they didn't want to "demote" the number retroactively, especially since Windows and Java were already allowing them in strings.

>>> By the letter of the Unicode standard, [Python] should not do this,
>>> but nevertheless it does and it appears to do no real harm and have
>>> some benefit.
>>
>> I'm afraid Python's choice may lead to exploitable security holes in
>> Python programs.
>
> Feel free to back that up with an actual demonstration of an exploit,
> rather than just FUD.

It might come as a surprise to programmers that pathnames cannot be UTF-encoded or displayed. Also, those situations might not show up during testing but only with appropriately crafted input.

Marko
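This makes the failure mode Marko describes concrete. It shows an encoding error, not a security exploit. A str holding a surrogate raises UnicodeEncodeError as soon as it is strictly encoded, for example when printed to a UTF-8 stream. The two helpers below have hypothetical names chosen for illustration. One detects surrogate code points. The other uses the standard 'backslashreplace' error handler to get a form that can always be displayed.

```python
def has_surrogates(text):
    """True if text contains any code point in the range U+D800..U+DFFF.

    Python's UTF-8 codec rejects every such code point in strict mode, even
    two that would form a valid UTF-16 pair, so any hit here means a plain
    text.encode("utf-8") will raise UnicodeEncodeError.
    """
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in text)


def displayable(text):
    """Return a version of text that is always safe to encode as UTF-8.

    Surrogates are shown as backslash escapes such as \\udce9 instead of
    raising UnicodeEncodeError.
    """
    return text.encode("utf-8", "backslashreplace").decode("utf-8")


# A file name decoded from undecodable bytes, as in the surrogateescape case.
name = b"caf\xe9.txt".decode("utf-8", "surrogateescape")
print(has_surrogates(name))  # True
print(displayable(name))     # caf\udce9.txt
```

Escaping for display keeps the original str intact for later use with the file system. Replacing the surrogates in the str itself would break the round trip.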
Is Python SSL API thread-safe?
Is the Python SSL API thread-safe with respect to recv() and send()?

IOW, can I have one thread doing blocking recv() calls on an SSL connection object while "simultaneously" a second thread is calling send() on that same connection object?

I assumed that was allowed, but I can't find anything in the documentation that actually says it is.

--
Grant
Re: Is Python SSL API thread-safe?
On 2017-01-22, Grant Edwards wrote:
> Is the Python SSL API thread-safe with respect to recv() and send()?
>
> IOW, can I have one thread doing blocking recv() calls on an SSL
> connection object while "simultaneously" a second thread is calling
> send() on that same connection object?

I think this question is equivalent to asking "is OpenSSL thread-safe", the answer to which would appear to be "yes":

https://www.openssl.org/docs/man1.0.2/crypto/threads.html

(the necessary functions mentioned on that page, threadid_func and locking_function, are indeed set by Python).
Re: Is Python SSL API thread-safe?
On 2017-01-22 21:18, Grant Edwards wrote:
> Is the Python SSL API thread-safe with respect to recv() and send()?
>
> IOW, can I have one thread doing blocking recv() calls on an SSL
> connection object while "simultaneously" a second thread is calling
> send() on that same connection object?
>
> I assumed that was allowed, but I can't find anything in the
> documentation that actually says it is.

OpenSSL and Python's ssl module are thread-safe. However, IO is not safe with respect to reentrancy: you cannot safely share an SSLSocket between threads without a mutex. Certain aspects of the TLS protocol can cause interesting side effects. A recv() call can send data across the wire and a send() call can receive data from the wire, e.g. during re-keying.

In order to achieve reentrancy, you have to do all IO yourself, either by operating the SSL connection in non-blocking mode or with a memory BIO:

https://docs.python.org/3/library/ssl.html#ssl-nonblocking
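Here is a minimal sketch of the "non-blocking mode plus a mutex" approach, under stated assumptions. The SerializedTLS class and its lock-then-select design are illustrative and are not part of the ssl module. ssl.SSLWantReadError and ssl.SSLWantWriteError, however, are the real exceptions a non-blocking SSLSocket raises. Only one thread touches the TLS object at a time. No thread holds the lock while waiting, so a reader parked in recv() does not block a writer. The memory BIO route (ssl.MemoryBIO with SSLContext.wrap_bio()) is not shown.

```python
import select
import socket
import ssl
import threading


class SerializedTLS:
    """Share one SSLSocket between a reader thread and a writer thread.

    The socket runs in non-blocking mode and every call into it happens
    under a single lock, so a recv() that has to write handshake data can
    never overlap with a send() on another thread. Waiting is done with
    select() outside the lock. Assumes the TLS handshake already completed.
    """

    def __init__(self, sock):
        sock.setblocking(False)
        self._sock = sock
        self._lock = threading.Lock()

    def _wait(self, for_read):
        # Sleep (without holding the lock) until the socket is ready.
        if for_read:
            select.select([self._sock], [], [])
        else:
            select.select([], [self._sock], [])

    def recv(self, bufsize):
        while True:
            with self._lock:
                try:
                    return self._sock.recv(bufsize)
                except (ssl.SSLWantReadError, BlockingIOError):
                    # No data yet. BlockingIOError covers a plain socket,
                    # which is what the demo below uses.
                    for_read = True
                except ssl.SSLWantWriteError:
                    # TLS must send something before it can read.
                    for_read = False
            self._wait(for_read)

    def sendall(self, data):
        view = memoryview(data)
        while view:
            with self._lock:
                try:
                    sent = self._sock.send(view)
                except ssl.SSLWantReadError:
                    # TLS must read from the peer (e.g. re-keying) first.
                    for_read = True
                except (ssl.SSLWantWriteError, BlockingIOError):
                    for_read = False
                else:
                    view = view[sent:]
                    continue
            self._wait(for_read)


if __name__ == "__main__":
    # Plain socketpair for the demo; a real TLS pair needs certificates.
    a, b = socket.socketpair()
    left, right = SerializedTLS(a), SerializedTLS(b)
    received = []
    reader = threading.Thread(target=lambda: received.append(right.recv(4)))
    reader.start()
    left.sendall(b"ping")
    reader.join()
    print(received)
    a.close()
    b.close()
```

Every attempt goes to the TLS object before any select(). This matters because OpenSSL may already hold decrypted data that select() on the raw socket cannot see.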
pyuno in Libreoffice 5.1.4.2
Does anyone know if the changes outlined here [1] have been implemented?

Supposedly changes have been made to pyuno to make it more pythonic.

Item 2 Cellranges says that:

cell = sheet.getCellByPosition(cellCol + col, cellRow + row)

Can be written as:

cell = sheet.cellrange[cellRow + row, cellCol + col]

But when I try that I get:

Traceback (most recent call last):
  File "/home/jfb/.config/libreoffice/4/user/Scripts/python/enter_INV/enter_INV.py", line 68, in keyPressed
    move_selected_cell(1, 0)
  File "/home/jfb/.config/libreoffice/4/user/Scripts/python/enter_INV/enter_INV.py", line 112, in move_selected_cell
    cell = sheet.cellrange[cellRow + row, cellCol + col]
AttributeError: cellrange

Or maybe I am misunderstanding how to use it.

Regards,
Jim

[1] https://cgit.freedesktop.org/libreoffice/core/commit/?id=af8143bc40cf2cfbc12e77c9bb7de01b655f7b30
Re: pyuno in Libreoffice 5.1.4.2
On 2017-01-23 00:10, Jim wrote:
> Does anyone know if the changes outlined here [1] have been implemented?
>
> Supposedly changes have been made to pyuno to make it more pythonic.
>
> Item 2 Cellranges says that:
>
> cell = sheet.getCellByPosition(cellCol + col, cellRow + row)
>
> Can be written as:
>
> cell = sheet.cellrange[cellRow + row, cellCol + col]
>
> But when I try that I get:
>
> Traceback (most recent call last):
>   File "/home/jfb/.config/libreoffice/4/user/Scripts/python/enter_INV/enter_INV.py", line 68, in keyPressed
>     move_selected_cell(1, 0)
>   File "/home/jfb/.config/libreoffice/4/user/Scripts/python/enter_INV/enter_INV.py", line 112, in move_selected_cell
>     cell = sheet.cellrange[cellRow + row, cellCol + col]
> AttributeError: cellrange
>
> Or maybe I am misunderstanding how to use it.
>
I think it might be direct indexing of the sheet:

cell = sheet[cellRow + row, cellCol + col]

> Regards,
> Jim
>
> [1] https://cgit.freedesktop.org/libreoffice/core/commit/?id=af8143bc40cf2cfbc12e77c9bb7de01b655f7b30
Re: Is Python SSL API thread-safe?
On 2017-01-22, Christian Heimes wrote:
> On 2017-01-22 21:18, Grant Edwards wrote:
>> Is the Python SSL API thread-safe with respect to recv() and send()?
>>
>> IOW, can I have one thread doing blocking recv() calls on an SSL
>> connection object while "simultaneously" a second thread is calling
>> send() on that same connection object?
>>
>> I assumed that was allowed, but I can't find anything in the
>> documentation that actually says it is.
>
> OpenSSL and Python's ssl module are thread-safe. However, IO is not safe
> with respect to reentrancy: you cannot safely share an SSLSocket between
> threads without a mutex. Certain aspects of the TLS protocol can cause
> interesting side effects. A recv() call can send data across the wire and
> a send() call can receive data from the wire, e.g. during re-keying.
>
> In order to achieve reentrancy, you have to do all IO yourself, either by
> operating the SSL connection in non-blocking mode or with a memory BIO:
> https://docs.python.org/3/library/ssl.html#ssl-nonblocking

IOW, what I'm doing is not safe. Rats.

--
Grant
Re: pyuno in Libreoffice 5.1.4.2
On 01/22/2017 07:02 PM, MRAB wrote:
> On 2017-01-23 00:10, Jim wrote:
>> Does anyone know if the changes outlined here [1] have been implemented?
>>
>> Supposedly changes have been made to pyuno to make it more pythonic.
>>
>> Item 2 Cellranges says that:
>>
>> cell = sheet.getCellByPosition(cellCol + col, cellRow + row)
>>
>> Can be written as:
>>
>> cell = sheet.cellrange[cellRow + row, cellCol + col]
>>
>> But when I try that I get:
>>
>> Traceback (most recent call last):
>>   File "/home/jfb/.config/libreoffice/4/user/Scripts/python/enter_INV/enter_INV.py", line 68, in keyPressed
>>     move_selected_cell(1, 0)
>>   File "/home/jfb/.config/libreoffice/4/user/Scripts/python/enter_INV/enter_INV.py", line 112, in move_selected_cell
>>     cell = sheet.cellrange[cellRow + row, cellCol + col]
>> AttributeError: cellrange
>>
>> Or maybe I am misunderstanding how to use it.
>>
> I think it might be direct indexing of the sheet:
>
> cell = sheet[cellRow + row, cellCol + col]
>
>> Regards,
>> Jim
>>
>> [1] https://cgit.freedesktop.org/libreoffice/core/commit/?id=af8143bc40cf2cfbc12e77c9bb7de01b655f7b30

You are correct, that worked. Thank you very much.

Regards,
Jim
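Anyone hitting the same AttributeError should note that the two spellings in Jim's quote take their indices in opposite orders. getCellByPosition() is column first, while the newer direct indexing is row first. Below is a sketch of a hypothetical helper, cell_at(), that prefers the indexing form and falls back to the UNO call. The fallback rests on an unverified assumption: that an older pyuno raises TypeError when a sheet is subscripted. The stand-in classes exist only so the example runs outside LibreOffice.

```python
def cell_at(sheet, row, col):
    """Return the cell at (row, col) of a pyuno sheet object.

    Newer pyuno indexes the sheet directly, row first: sheet[row, col].
    The classic UNO method takes the arguments the other way round:
    getCellByPosition(column, row).
    """
    try:
        return sheet[row, col]
    except TypeError:
        # Assumption: a pyuno build without the new indexing support raises
        # TypeError ("not subscriptable") here. Note this also swallows a
        # TypeError raised inside a real __getitem__.
        return sheet.getCellByPosition(col, row)


# Hypothetical stand-ins for pyuno sheets so the sketch runs anywhere.
class OldStyleSheet:
    def getCellByPosition(self, col, row):
        return ("cell", col, row)


class NewStyleSheet(OldStyleSheet):
    def __getitem__(self, key):
        row, col = key
        return self.getCellByPosition(col, row)


print(cell_at(OldStyleSheet(), 2, 5))  # ('cell', 5, 2)
print(cell_at(NewStyleSheet(), 2, 5))  # ('cell', 5, 2)
```

Both paths reach the same cell, which is the point: swapping the UNO call for indexing means swapping the argument order too.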
Re: PEP 393 vs UTF-8 Everywhere
On Mon, 23 Jan 2017 02:19 am, Marko Rauhamaa wrote:

> Steve D'Aprano :
>
>> On Sun, 22 Jan 2017 07:34 pm, Marko Rauhamaa wrote:
>>
>>> Steve D'Aprano :
>>>
>>>> On Sun, 22 Jan 2017 06:52 am, Marko Rauhamaa wrote:
>>>>> Also, [surrogates] don't exist as Unicode code points. Python
>>>>> shouldn't allow surrogate characters in strings.
>>>>
>>>> Not quite. This is where it gets a bit messy and confusing. The
>>>> bottom line is: surrogates *are* code points, but they aren't
>>>> *characters*.
>>>
>>> All animals are equal, but some animals are more equal than others.
>>
>> Huh?
>
> There is no difference between 0xD800 and 0xD8000000.

Arithmetic disagrees:

py> 0xD800 == 0xD8000000
False

> They are both
> numbers that don't--and won't--represent anything in Unicode.

Your use of hex notation 0x... indicates that you're talking about code units rather than U+... code points. The first one, 0xD800, could be:

- a Little Endian double-byte code unit for 'Ø' in either UCS-2 or UTF-16;
- a Big Endian double-byte code unit that has no special meaning in UCS-2;
- one half of a surrogate pair (two double-byte code units) in Big Endian UTF-16, encoding some unknown supplementary code point.

The second one, 0xD8000000, could be:

- a C long (four-byte int) 3623878656, which is out of range for Big Endian UCS-4 or UTF-32;
- the Little Endian four-byte code unit for 'Ø' in either UCS-4 or UTF-32.

> It's pointless to call one a "code point" and not the other one.

Neither of them are code points. You're confusing the concrete representation with the abstract character. Perhaps you meant to compare the code point U+D800 to, well, there's no comparison to be made, because "U+D8000000" is not valid and is completely out of range. The largest code point is U+10FFFF.

> A code point
> that isn't code for anything can barely be called a code point.

It does have a purpose. Or even more than one.

- It ensures that there is a one-to-one mapping between code points and code units in any specific encoding and byte-order.
- By reserving those code points, it ensures that they cannot be accidentally used by the standard for something else.
- It makes it easier to talk about the entities: "U+D800 is a surrogate code point reserved for UTF-16 surrogates", as opposed to "U+D800 isn't anything, but if it was something, it would be a code point reserved for UTF-16 surrogates".
- Or worse, forcing us to talk in terms of code units (implementation) instead of abstract characters, which is painfully verbose: "0xD800 in Big Endian UTF-16, or 0x00D8 in Little Endian UTF-16, or 0x0000D800 in Big Endian UTF-32, or 0x00D80000 in Little Endian UTF-32, doesn't map to any code point but is reserved for UTF-16 surrogate pairs."

And, an entirely unforeseen purpose:

- It allows languages like Python to (ab)use surrogate code points for round-tripping file names which aren't valid Unicode.

[...]

>>> I'm afraid Python's choice may lead to exploitable security holes in
>>> Python programs.
>>
>> Feel free to back that up with an actual demonstration of an exploit,
>> rather than just FUD.
>
> It might come as a surprise to programmers that pathnames cannot be
> UTF-encoded or displayed.

Many things come as surprises to programmers, and many pathnames cannot be UTF-encoded. To be precise, Mac OS requires pathnames to be both valid and normalised UTF-8, and it would be nice if that practice spread. But Windows only requires pathnames to consist of UCS-2 code points, and Linux pathnames are arbitrary bytes that may include characters which are illegal on Windows. So you don't need to involve surrogates to have undecodable pathnames.

> Also, those situations might not show up
> during testing but only with appropriately crafted input.

I'm not seeing a security exploit here.

--
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure enough, things got worse.
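Steve's list of interpretations can be checked with Python's own codecs. This small worked example assumes only the standard utf-16/utf-32 codecs and the 'surrogatepass' error handler. It makes three points. First, the little-endian code units of 'Ø' (U+00D8), read back as big-endian integers, give both 0xD800 and 0xD8000000. Second, 'surrogatepass' shows where U+D800 itself would sit in each encoding form. Third, sys.maxunicode confirms U+10FFFF as the upper limit.

```python
import sys

# 'Ø' is U+00D8. Its little-endian code units, read as big-endian integers,
# give the two hex numbers from the thread.
utf16_le = "\u00d8".encode("utf-16-le")  # b'\xd8\x00'
utf32_le = "\u00d8".encode("utf-32-le")  # b'\xd8\x00\x00\x00'
print(hex(int.from_bytes(utf16_le, "big")))  # 0xd800
print(int.from_bytes(utf32_le, "big"))       # 3623878656, i.e. 0xD8000000

# The surrogate code point U+D800 in each encoding form. A strict encoder
# refuses it; 'surrogatepass' shows the code units it would occupy.
for codec in ("utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le"):
    print(codec, "\ud800".encode(codec, "surrogatepass").hex())
# utf-16-be d800 / utf-16-le 00d8 / utf-32-be 0000d800 / utf-32-le 00d80000

# U+D800 is a code point Python's chr() accepts; 0xD8000000 is not.
print(ascii(chr(0xD800)))         # '\ud800'
print(sys.maxunicode == 0x10FFFF)  # True
try:
    chr(0xD8000000)
except (ValueError, OverflowError) as exc:
    print("out of range:", type(exc).__name__)
```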