Re: PEP 393 vs UTF-8 Everywhere

2017-01-22 Thread Marko Rauhamaa
eryk sun :

> On Sat, Jan 21, 2017 at 8:21 PM, Pete Forman  wrote:
>> Marko Rauhamaa  writes:
>>
>>>> py> low = '\uDC37'
>>>
>>> That should raise a SyntaxError exception.
>>
>> Quite. [...]
>
> CPython allows surrogate codes for use with the "surrogateescape" and
> "surrogatepass" error handlers, which are used for POSIX and Windows
> file-system encoding, respectively.

Yes, but at the cost of violating Unicode, leading to unprintable
strings etc. In my opinion, Python should have "stayed pure" instead of
playing cheap tricks with surrogates.
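
For readers following along, here is a minimal sketch of the behaviour under discussion, using only the built-in str.encode() and the standard "surrogatepass" error handler. The helper name try_encode is made up for this illustration. It shows that CPython accepts a lone surrogate in a string literal, that the strict UTF-8 codec refuses to encode it (one sense in which such a string is "unprintable"), and that "surrogatepass" lets it through.

```python
# A lone low surrogate; CPython accepts this literal without complaint.
low = '\uDC37'


def try_encode(text, errors='strict'):
    """Return the UTF-8 bytes for text, or None if the codec refuses.

    try_encode is a hypothetical helper written for this illustration.
    """
    try:
        return text.encode('utf-8', errors)
    except UnicodeEncodeError:
        return None


strict_result = try_encode(low)                    # None: strict UTF-8 rejects it
passed_result = try_encode(low, 'surrogatepass')   # b'\xed\xb0\xb7'

print(len(low), strict_result, passed_result)
```

Whether this counts as a useful escape hatch or a conceptual wart is exactly what the thread is arguing about; the sketch only shows what the interpreter does.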

(Of course, Unicode itself is a mess, but that's another story.)


Marko
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: PEP 393 vs UTF-8 Everywhere

2017-01-22 Thread Marko Rauhamaa
Steve D'Aprano :

> On Sun, 22 Jan 2017 06:52 am, Marko Rauhamaa wrote:
>> Also, [surrogates] don't exist as Unicode code points. Python
>> shouldn't allow surrogate characters in strings.
>
> Not quite. This is where it gets a bit messy and confusing. The bottom
> line is: surrogates *are* code points, but they aren't *characters*.

All animals are equal, but some animals are more equal than others.

> Strings which contain surrogates are strictly speaking illegal,
> although some programming languages (including Python) allow them.

Python shouldn't allow them.

> The Unicode standard defines surrogates as follows:
> [...]
>
> - Surrogate Code Point. A Unicode code point in the range 
>   U+D800..U+DFFF. Reserved for use by UTF-16,

The writer of the standard is playing word games, maybe to offer a fig
leaf to Windows, Java et al.

> By the letter of the Unicode standard, [Python] should not do this,
> but nevertheless it does and it appears to do no real harm and have
> some benefit.

I'm afraid Python's choice may lead to exploitable security holes in
Python programs.

>>> py> low = '\uDC37'
>> 
>> That should raise a SyntaxError exception.
>
> If Python was strictly conforming, that is correct, but it turns out
> there are some useful things you can do with strings if you allow
> surrogates.

Conceptual confusion is a high price to pay for such tricks.


Marko


Re: How to create a socket.socket() object from a socket fd?

2017-01-22 Thread Christian Heimes
On 2017-01-22 01:03, Grant Edwards wrote:
> On 2017-01-21, Christian Heimes  wrote:
> 
>> You might be interested in my small module
>> https://pypi.python.org/pypi/socketfromfd/ . I just released a new
>> version with a fix for Python 2. Thanks for the hint! :)
>>
>> The module correctly detects address family, socket type and proto from
>> a fd. It works correctly with e.g. IPv6 or Unix sockets. Ticket
>> https://bugs.python.org/issue28134 has additional background information
>> on the matter.
> 
> Yes, thanks!
> 
> Just a few minutes ago I stumbled across that issue.  For Python3, I
> was using:
> 
>   sock = socket.socket(fileno=fd)
> 
> But as you point out in that issue, the Python3 docs are wrong: when
> using socket.socket(fileno=fd) you _do_ have to specify the correct
> family and type parameters that correspond to the socket file
> descriptor. So, I started looking for os.getsockopt(), which doesn't
> exist.
> 
> I see you use ctypes to call getsockopt (that was going to be my next
> step).
> 
> I suspect the code I'm working on will end up being re-written in C for
> the real product (so that it can run in-process in a thread rather
> than as an external helper process).  If not, I'll have to use your
> module (or something like it) so that the solution will work on both
> IPv4 and IPv6 TCP sockets (I'd also like it to work with Unix domain
> sockets, but the piece at the other end of the socket connection
> currently only supports TCP).

I wanted to fix the function before 3.6.0 came out but I faced some
resistance. My approach was deemed too magic and not fully functional on
some platforms. Other core devs are underestimating the severity of the
issue. Even in simple examples with IPv4 and IPv6 it breaks
getpeername(). I'd appreciate it if you could jump in and leave a comment on
the ticket to show your support. :)
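
As a rough illustration of the workaround discussed above (not the socketfromfd module's implementation), the sketch below passes the family and type explicitly alongside fileno= so the resulting object matches the descriptor. The helper name socket_from_fd is hypothetical, and the use of os.dup() assumes a POSIX platform where sockets are ordinary file descriptors.

```python
import os
import socket


def socket_from_fd(fd, family, sock_type, proto=0):
    """Wrap a duplicate of fd in a new socket object.

    socket_from_fd is a hypothetical helper; the caller must supply the
    family and type that really match the descriptor.
    """
    # os.dup() leaves the original descriptor open and independent (POSIX).
    return socket.socket(family, sock_type, proto, fileno=os.dup(fd))


# Usage: rebuild one end of a connected pair from its bare descriptor.
left, right = socket.socketpair()
rebuilt = socket_from_fd(left.fileno(), left.family, left.type)
rebuilt.sendall(b"ping")
reply = right.recv(4)
for s in (left, right, rebuilt):
    s.close()
```

The sketch sidesteps the hard part, which is discovering the family and type when all you have is the integer descriptor; that detection is what the module and the ticket are about.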

By the way I just pushed another commit with a new feature just for you.
But see for yourself:

https://github.com/tiran/socketfromfd/commit/0a2fd6dae86267cedea5ae8b956e2876e6057c74

Christian


Re: PEP 393 vs UTF-8 Everywhere

2017-01-22 Thread Steve D'Aprano
On Sun, 22 Jan 2017 07:34 pm, Marko Rauhamaa wrote:

> Steve D'Aprano :
> 
>> On Sun, 22 Jan 2017 06:52 am, Marko Rauhamaa wrote:
>>> Also, [surrogates] don't exist as Unicode code points. Python
>>> shouldn't allow surrogate characters in strings.
>>
>> Not quite. This is where it gets a bit messy and confusing. The bottom
>> line is: surrogates *are* code points, but they aren't *characters*.
> 
> All animals are equal, but some animals are more equal than others.

Huh?


>> Strings which contain surrogates are strictly speaking illegal,
>> although some programming languages (including Python) allow them.
> 
> Python shouldn't allow them.

That's one opinion.


>> The Unicode standard defines surrogates as follows:
>> [...]
>>
>> - Surrogate Code Point. A Unicode code point in the range
>>   U+D800..U+DFFF. Reserved for use by UTF-16,
> 
> The writer of the standard is playing word games, maybe to offer a fig
> leaf to Windows, Java et al.

Seriously?


>> By the letter of the Unicode standard, [Python] should not do this,
>> but nevertheless it does and it appears to do no real harm and have
>> some benefit.
> 
> I'm afraid Python's choice may lead to exploitable security holes in
> Python programs.

Feel free to back up that with an actual demonstration of an exploit, rather
than just FUD.


>>>> py> low = '\uDC37'
>>> 
>>> That should raise a SyntaxError exception.
>>
>> If Python was strictly conforming, that is correct, but it turns out
>> there are some useful things you can do with strings if you allow
>> surrogates.
> 
> Conceptual confusion is a high price to pay for such tricks.

There's a lot to comprehend about Unicode. I don't see that Python's
non-strict implementation is harder to understand than the strict version.



-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.



Re: PEP 393 vs UTF-8 Everywhere

2017-01-22 Thread Marko Rauhamaa
Steve D'Aprano :

> On Sun, 22 Jan 2017 07:34 pm, Marko Rauhamaa wrote:
>
>> Steve D'Aprano :
>> 
>>> On Sun, 22 Jan 2017 06:52 am, Marko Rauhamaa wrote:
>>>> Also, [surrogates] don't exist as Unicode code points. Python
>>>> shouldn't allow surrogate characters in strings.
>>>
>>> Not quite. This is where it gets a bit messy and confusing. The
>>> bottom line is: surrogates *are* code points, but they aren't
>>> *characters*.
>> 
>> All animals are equal, but some animals are more equal than others.
>
> Huh?

There is no difference between 0xD800 and 0xD8000000. They are both
numbers that don't--and won't--represent anything in Unicode. It's
pointless to call one a "code point" and not the other one. A code point
that isn't code for anything can barely be called a code point.

I'm guessing 0xD800 is called a code point because it was always called
that. It was dropped out when UTF-16 was invented but they didn't want
to "demote" the number retroactively, especially since Windows and Java
already were allowing them in strings.

>>> By the letter of the Unicode standard, [Python] should not do this,
>>> but nevertheless it does and it appears to do no real harm and have
>>> some benefit.
>> 
>> I'm afraid Python's choice may lead to exploitable security holes in
>> Python programs.
>
> Feel free to back up that with an actual demonstration of an exploit,
> rather than just FUD.

It might come as a surprise to programmers that pathnames cannot be
UTF-encoded or displayed. Also, those situations might not show up
during testing but only with appropriately crafted input.


Marko


Is Python SSL API thread-safe?

2017-01-22 Thread Grant Edwards
Is the Python SSL API thread-safe with respect to recv() and send()?

IOW, can I have one thread doing blocking recv() calls on an SSL
connection object while "simultaneously" a second thread is calling
send() on that same connection object?

I assumed that was allowed, but I can't find anything in the
documentation that actually says it is.

-- 
Grant





Re: Is Python SSL API thread-safe?

2017-01-22 Thread Jon Ribbens
On 2017-01-22, Grant Edwards  wrote:
> Is the Python SSL API thread-safe with respect to recv() and send()?
>
> IOW, can I have one thread doing blocking recv() calls on an SSL
> connection object while "simultaneously" a second thread is calling
> send() on that same connection object?

I think this question is equivalent to asking "is OpenSSL thread-safe",
the answer to which would appear to be "yes":
https://www.openssl.org/docs/man1.0.2/crypto/threads.html
(the necessary functions mentioned on that page, threadid_func and
locking_function are indeed set by Python).


Re: Is Python SSL API thread-safe?

2017-01-22 Thread Christian Heimes
On 2017-01-22 21:18, Grant Edwards wrote:
> Is the Python SSL API thread-safe with respect to recv() and send()?
> 
> IOW, can I have one thread doing blocking recv() calls on an SSL
> connection object while "simultaneously" a second thread is calling
> send() on that same connection object?
> 
> I assumed that was allowed, but I can't find anything in the
> documentation that actually says it is.

OpenSSL and Python's ssl module are thread-safe. However, IO is not safe
concerning reentrancy. You cannot safely share an SSLSocket between
threads without a mutex. Certain aspects of the TLS protocol can cause
interesting side effects. A recv() call can send data across a wire and
a send() call can receive data from the wire, e.g. during re-keying.

In order to achieve reentrancy, you have to do all IO yourself by
operating the SSL connection in non-blocking mode or with a Memory BIO:
https://docs.python.org/3/library/ssl.html#ssl-nonblocking
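
To make the Memory BIO suggestion concrete, here is a minimal sketch (an illustration, not code from the thread) using ssl.MemoryBIO and SSLContext.wrap_bio() from the standard library. The function name start_client_handshake and the host name "example.org" are placeholders, and no network traffic takes place: the TLS engine only writes into an in-memory buffer, and the application decides when and from which thread those bytes reach a real socket.

```python
import ssl


def start_client_handshake(hostname):
    """Begin a TLS client handshake entirely in memory.

    start_client_handshake is a hypothetical helper. It returns the
    SSLObject plus the two BIOs the caller has to shuttle bytes through.
    """
    context = ssl.create_default_context()
    incoming = ssl.MemoryBIO()   # bytes received from the peer go in here
    outgoing = ssl.MemoryBIO()   # bytes to send to the peer come out here
    tls = context.wrap_bio(incoming, outgoing, server_hostname=hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        # Expected: the ClientHello has been written to `outgoing`, and the
        # engine now needs the server's reply to be fed into `incoming`.
        pass
    return tls, incoming, outgoing


tls, incoming, outgoing = start_client_handshake("example.org")
client_hello = outgoing.read()   # the application would now send these bytes
print(len(client_hello), "bytes of handshake data ready to send")
```

Because all socket reads and writes happen in application code, one place can serialise them (for example a single IO thread or an event loop), which avoids the hidden reads inside send() and writes inside recv() described above.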



pyuno in Libreoffice 5.1.4.2

2017-01-22 Thread Jim

Does anyone know if the changes outlined here [1] have been implemented?

Supposedly changes have been made to pyuno to make it more pythonic.

Item 2 Cellranges says that:
cell = sheet.getCellByPosition(cellCol + col, cellRow + row)
Can be written as:
cell = sheet.cellrange[cellRow + row, cellCol + col]

But when I try that I get:

Traceback (most recent call last):
  File "/home/jfb/.config/libreoffice/4/user/Scripts/python/enter_INV/enter_INV.py", line 68, in keyPressed
    move_selected_cell(1, 0)
  File "/home/jfb/.config/libreoffice/4/user/Scripts/python/enter_INV/enter_INV.py", line 112, in move_selected_cell
    cell = sheet.cellrange[cellRow + row, cellCol + col]
AttributeError: cellrange

Or maybe I am misunderstanding how to use it.

Regards,  Jim

[1] 
https://cgit.freedesktop.org/libreoffice/core/commit/?id=af8143bc40cf2cfbc12e77c9bb7de01b655f7b30 





Re: pyuno in Libreoffice 5.1.4.2

2017-01-22 Thread MRAB

On 2017-01-23 00:10, Jim wrote:
> Does anyone know if the changes outlined here [1] have been implemented?
>
> Supposedly changes have been made to pyuno to make it more pythonic.
>
> Item 2 Cellranges says that:
> cell = sheet.getCellByPosition(cellCol + col, cellRow + row)
> Can be written as:
> cell = sheet.cellrange[cellRow + row, cellCol + col]
>
> But when I try that I get:
>
> Traceback (most recent call last):
>   File "/home/jfb/.config/libreoffice/4/user/Scripts/python/enter_INV/enter_INV.py", line 68, in keyPressed
>     move_selected_cell(1, 0)
>   File "/home/jfb/.config/libreoffice/4/user/Scripts/python/enter_INV/enter_INV.py", line 112, in move_selected_cell
>     cell = sheet.cellrange[cellRow + row, cellCol + col]
> AttributeError: cellrange
>
> Or maybe I am misunderstanding how to use it.

I think it might be direct indexing of the sheet:

cell = sheet[cellRow + row, cellCol + col]

> Regards,  Jim
>
> [1]
> https://cgit.freedesktop.org/libreoffice/core/commit/?id=af8143bc40cf2cfbc12e77c9bb7de01b655f7b30






Re: Is Python SSL API thread-safe?

2017-01-22 Thread Grant Edwards
On 2017-01-22, Christian Heimes  wrote:
> On 2017-01-22 21:18, Grant Edwards wrote:
>> Is the Python SSL API thread-safe with respect to recv() and send()?
>> 
>> IOW, can I have one thread doing blocking recv() calls on an SSL
>> connection object while "simultaneously" a second thread is calling
>> send() on that same connection object?
>> 
>> I assumed that was allowed, but I can't find anything in the
>> documentation that actually says it is.
>
> OpenSSL and Python's ssl module are thread-safe. However, IO is not safe
> concerning reentrancy. You cannot safely share an SSLSocket between
> threads without a mutex. Certain aspects of the TLS protocol can cause
> interesting side effects. A recv() call can send data across a wire and
> a send() call can receive data from the wire, e.g. during re-keying.
>
> In order to achieve reentrancy, you have to do all IO yourself by
> operating the SSL connection in non-blocking mode or with a Memory BIO:
> https://docs.python.org/3/library/ssl.html#ssl-nonblocking

IOW, what I'm doing is not safe.  Rats.

-- 
Grant





Re: pyuno in Libreoffice 5.1.4.2

2017-01-22 Thread Jim

On 01/22/2017 07:02 PM, MRAB wrote:
> On 2017-01-23 00:10, Jim wrote:
>> Does anyone know if the changes outlined here [1] have been implemented?
>>
>> Supposedly changes have been made to pyuno to make it more pythonic.
>>
>> Item 2 Cellranges says that:
>> cell = sheet.getCellByPosition(cellCol + col, cellRow + row)
>> Can be written as:
>> cell = sheet.cellrange[cellRow + row, cellCol + col]
>>
>> But when I try that I get:
>>
>> Traceback (most recent call last):
>>   File "/home/jfb/.config/libreoffice/4/user/Scripts/python/enter_INV/enter_INV.py", line 68, in keyPressed
>>     move_selected_cell(1, 0)
>>   File "/home/jfb/.config/libreoffice/4/user/Scripts/python/enter_INV/enter_INV.py", line 112, in move_selected_cell
>>     cell = sheet.cellrange[cellRow + row, cellCol + col]
>> AttributeError: cellrange
>>
>> Or maybe I am misunderstanding how to use it.
>
> I think it might be direct indexing of the sheet:
>
> cell = sheet[cellRow + row, cellCol + col]
>
>> Regards,  Jim
>>
>> [1]
>> https://cgit.freedesktop.org/libreoffice/core/commit/?id=af8143bc40cf2cfbc12e77c9bb7de01b655f7b30



You are correct, that worked. Thank you very much.

Regards,  Jim




Re: PEP 393 vs UTF-8 Everywhere

2017-01-22 Thread Steve D'Aprano
On Mon, 23 Jan 2017 02:19 am, Marko Rauhamaa wrote:

> Steve D'Aprano :
> 
>> On Sun, 22 Jan 2017 07:34 pm, Marko Rauhamaa wrote:
>>
>>> Steve D'Aprano :
>>> 
>>>> On Sun, 22 Jan 2017 06:52 am, Marko Rauhamaa wrote:
>>>>> Also, [surrogates] don't exist as Unicode code points. Python
>>>>> shouldn't allow surrogate characters in strings.
>>>>
>>>> Not quite. This is where it gets a bit messy and confusing. The
>>>> bottom line is: surrogates *are* code points, but they aren't
>>>> *characters*.
>>> 
>>> All animals are equal, but some animals are more equal than others.
>>
>> Huh?
> 
> There is no difference between 0xD800 and 0xD8000000.

Arithmetic disagrees:

py> 0xD800 == 0xD8000000
False


> They are both 
> numbers that don't--and won't--represent anything in Unicode.

Your use of hex notation 0x... indicates that you're talking about code
units rather than U+... code points. The first one 0xD800 could be:

- a Little Endian double-byte code unit for 'Ø' in either UCS-2 or UTF-16;

- a Big Endian double-byte code unit that has no special meaning in UCS-2;

- one half of a surrogate pair (two double-byte code units) in Big Endian
  UTF-16, encoding some unknown supplementary code point.

The second one 0xD8000000 could be:

- a C long (four-byte int) 3623878656, which is out of range for Big Endian
  UCS-4 or UTF-32;

- the Little Endian four-byte code unit for 'Ø' in either UCS-4 or UTF-32.


> It's pointless to call one a "code point" and not the other one. 

Neither of them are code points. You're confusing the concrete
representation with the abstract character.

Perhaps you meant to compare the code point U+D800 to, well, there's no
comparison to be made, because "U+D8000000" is not valid and is completely
out of range. The largest code point is U+10FFFF.
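
The distinction between code units and code points can be checked interactively. The following is a small sketch using only the standard codecs and sys.maxunicode; the helper name decodes is made up for the illustration, and U+10000 is just a convenient supplementary code point whose surrogate pair happens to start with 0xD800.

```python
import sys

# The same two bytes mean different things depending on the byte order.
raw = bytes([0xD8, 0x00])
little = raw.decode('utf-16-le')          # U+00D8, i.e. 'Ø'


def decodes(data, encoding):
    """Return True if data is a complete, valid sequence in encoding.

    decodes is a hypothetical helper written for this illustration.
    """
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# Read as Big Endian UTF-16, 0xD800 is only half of a surrogate pair.
big_ok = decodes(raw, 'utf-16-be')        # False

# A supplementary code point needs two 16-bit code units in UTF-16.
pair = '\U00010000'.encode('utf-16-be')   # b'\xd8\x00\xdc\x00'

# The decimal value quoted above corresponds to 0xD8000000, which is far
# beyond the largest code point.
largest = sys.maxunicode                  # 0x10FFFF
out_of_range = 3623878656 > largest       # True

print(repr(little), big_ok, pair, hex(largest), out_of_range)
```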


> A code point 
> that isn't code for anything can barely be called a code point.

It does have a purpose. Or even more than one.

- It ensures that there is a one-to-one mapping between code points and
  code units in any specific encoding and byte-order.

- By reserving those code points, it ensures that they cannot be
  accidentally used by the standard for something else.

- It makes it easier to talk about the entities: "U+D800 is a surrogate 
  code point reserved for UTF-16 surrogates", as opposed to "U+D800 isn't
  anything, but if it was something, it would be a code point reserved 
  for UTF-16 surrogates".

- Or worse, forcing us to talk in terms of code units (implementation)
  instead of abstract characters, which is painfully verbose:

  "0xD800 in Big Endian UTF-16, or 0x00D8 in Little Endian UTF-16, or
  0x0000D800 in Big Endian UTF-32, or 0x00D80000 in Little Endian
  UTF-32, doesn't map to any code point but is reserved for UTF-16
  surrogate pairs."


And, an entirely unforeseen purpose:

- It allows languages like Python to (ab)use surrogate code points for
  round-tripping file names which aren't valid Unicode.
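
A minimal sketch of that round-tripping, assuming a file name stored as Latin-1 bytes on a system whose file-system encoding is UTF-8. The name b'caf\xe9.txt' and the helper is_strictly_encodable are invented for the example. The "surrogateescape" error handler maps each undecodable byte to a lone surrogate in the range U+DC80..U+DCFF and maps it back on encoding.

```python
# Hypothetical file name: 'café.txt' encoded as Latin-1, so not valid UTF-8.
raw_name = b'caf\xe9.txt'

# Decoding smuggles the bad byte 0xE9 through as the lone surrogate U+DCE9.
name = raw_name.decode('utf-8', 'surrogateescape')

# Encoding with the same handler restores the original bytes exactly.
round_tripped = name.encode('utf-8', 'surrogateescape')


def is_strictly_encodable(text):
    """Return True if text can be encoded as strict UTF-8.

    is_strictly_encodable is a hypothetical helper for this illustration.
    """
    try:
        text.encode('utf-8')
        return True
    except UnicodeEncodeError:
        return False


print(ascii(name))                    # 'caf\udce9.txt'
print(round_tripped == raw_name)      # True
print(is_strictly_encodable(name))    # False
```

The last line is the situation raised earlier in the thread: the string can be passed back to the operating system unchanged, but it cannot be written to a strict UTF-8 stream without choosing an error handler.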


[...]
>>> I'm afraid Python's choice may lead to exploitable security holes in
>>> Python programs.
>>
>> Feel free to back up that with an actual demonstration of an exploit,
>> rather than just FUD.
> 
> It might come as a surprise to programmers that pathnames cannot be
> UTF-encoded or displayed. 

Many things come as surprises to programmers, and many pathnames cannot be
UTF-encoded.

To be precise, Mac OS requires pathnames to be both valid and normalised
UTF-8, and it would be nice if that practice spread. But Windows only
requires pathnames to consist of UCS-2 code points, and Linux pathnames are
arbitrary bytes that may include characters which are illegal on Windows.
So you don't need to involve surrogates to have undecodable pathnames.


> Also, those situations might not show up 
> during testing but only with appropriately crafted input.

I'm not seeing a security exploit here.



-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.
