[issue854511] Thai encoding alias for 'cp874'

2017-08-29 Thread era

era added the comment:

Closing the entire enhancement request just because one detail is off seems 
insane.

Anyway, until the day in the distant future when Python can support encoding 
names in common circulation, http://stackoverflow.com/a/1064191/874188 offers a 
crude workaround.


import encodings

if 'windows_874' not in encodings.aliases.aliases:
    encodings.aliases.aliases['windows_874'] = 'cp874'

This is tricky in a number of ways; in practice, this snippet needs to be at 
the very start of your source file. Also, the underscore is correct even for 
email encoding names like =?windows-874?Q?hello=3F?= which use a dash (the dash 
gets remapped to underscore internally when looking up the encoding alias).
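
To illustrate the dash-to-underscore remapping, here is a minimal sketch that registers the alias and then looks the codec up under the dashed spelling. It assumes nothing in the process has already tried (and failed) to look up 'windows-874', since a failed lookup may be cached by the encodings package.

```python
import codecs
import encodings.aliases

# Register the alias under its underscore spelling; setdefault leaves any
# existing entry alone.
encodings.aliases.aliases.setdefault('windows_874', 'cp874')

# The dashed spelling is normalized to 'windows_874' during lookup.
info = codecs.lookup('windows-874')
print(info.name)

# Byte 0xA1 is a Thai letter in this code page.
thai = b'\xa1'.decode('windows-874')
print(repr(thai))
```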

--
nosy: +era

___
Python tracker 
<http://bugs.python.org/issue854511>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue35547] email.parser / email.policy does not correctly handle multiple RFC2047 encoded-word tokens across RFC5322 folded headers

2018-12-20 Thread era


era added the comment:

I don't think this is a bug. My impression is that encoded words should be 
decodable in isolation.
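
As a small illustration of what "decodable in isolation" means, the sketch below decodes two made-up encoded words one at a time with email.header.decode_header. The sample words are invented for this example; each one carries its own charset and decodes without reference to its neighbour.

```python
from email.header import decode_header

# Hypothetical sample words, each complete on its own.
words = ['=?utf-8?q?caf=C3=A9?=', '=?iso-8859-1?q?na=EFve?=']

decoded = []
for word in words:
    # A single encoded word yields exactly one (bytes, charset) pair.
    [(raw, charset)] = decode_header(word)
    decoded.append(raw.decode(charset))

print(decoded)
```

A multi-byte character split across two encoded words would fail this per-word decoding, which is the situation the report is about.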

--
nosy: +era

___
Python tracker 
<https://bugs.python.org/issue35547>
___



[issue36261] email examples should not gratuitously mess with preamble

2019-03-11 Thread era


New submission from era:

Several of the examples in the email module documentation modify the preamble.

This is not good practice. The email MIME preamble is really only useful for 
communicating information about MIME itself, not for general human-readable 
content like 'Our family reunion'.

The MIME preamble is problematic because it typically only supports ASCII and 
often defaults to an English-language message, even when applications are used 
in locales where English is not widely understood.  For this reason, it is 
moderately useful to be able to override the preamble from Python code; but 
this should by no means be done routinely, and the documentation should 
certainly not demonstrate this in basic examples.
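
For comparison, a basic example can build a multipart message without touching the preamble at all. This is only a sketch of what such an example might look like; the addresses, file name, and payload are placeholders invented here.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg['Subject'] = 'Our family reunion'
msg['From'] = 'sender@example.org'      # placeholder address
msg['To'] = 'family@example.org'        # placeholder address
msg.set_content('Photos from the reunion are attached.')

# Adding an attachment converts the message to multipart/mixed.
msg.add_attachment(b'placeholder bytes', maintype='application',
                   subtype='octet-stream', filename='photo.bin')

# Nothing assigns msg.preamble, so it keeps its default of None.
print(msg.get_content_type(), msg.preamble)
```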

--
components: email
messages: 337657
nosy: barry, era, r.david.murray
priority: normal
severity: normal
status: open
title: email examples should not gratuitously mess with preamble
type: behavior
versions: Python 3.7

___
Python tracker 
<https://bugs.python.org/issue36261>
___



[issue34459] email.contentmanager should use IANA encoding

2018-08-22 Thread era


New submission from era:

https://github.com/python/cpython/blob/3.7/Lib/email/contentmanager.py#L64 
currently contains the following code:

def get_text_content(msg, errors='replace'):
    content = msg.get_payload(decode=True)
    charset = msg.get_param('charset', 'ASCII')
    return content.decode(charset, errors=errors)

This breaks when the IANA character set is not identical to the Python encoding 
name. For example, pass it a message with

Content-type: text/plain; charset=cp-850

This breaks for two separate reasons (and I will report two separate bugs): the 
IANA character-set label should be looked up and converted to a Python codec 
name (that's this bug), and the character-set alias 'cp-850' is not defined in 
the lookup table in the first place.

There are probably other places in contentmanager.py where a similar mapping 
should take place. 

I do not have a proper patch, but in general outline, the fix would look like

+import email.charset
+
 def get_text_content(msg, errors='replace'):
     content = msg.get_payload(decode=True)
     charset = msg.get_param('charset', 'ASCII')
-    return content.decode(charset, errors=errors)
+    encoding = email.charset.Charset(charset).output_charset
+    return content.decode(encoding, errors=errors)

This was discovered in this Stack Overflow post: 
https://stackoverflow.com/a/51961225/874188
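
Until something along those lines lands, a caller can do the mapping itself. The following is a rough sketch under stated assumptions, not the proposed patch: EXTRA_LABELS, to_python_codec, and decode_payload are hypothetical names introduced here, and the table holds only the two labels mentioned in these reports.

```python
import codecs

# Hypothetical table of labels seen in the wild; not part of the stdlib.
EXTRA_LABELS = {'cp-850': 'cp850', 'windows-874': 'cp874'}

def to_python_codec(label, default='ascii'):
    """Map a charset label to a Python codec name, or fall back to default."""
    label = label.strip().lower()
    candidate = EXTRA_LABELS.get(label, label)
    try:
        return codecs.lookup(candidate).name
    except LookupError:
        return default

def decode_payload(raw, label, errors='replace'):
    """Decode raw bytes using the codec chosen for the given label."""
    return raw.decode(to_python_codec(label), errors=errors)

print(decode_payload(b'caf\x82', 'cp-850'))
```

Falling back to ASCII with errors='replace' loses information but avoids a traceback on unknown labels; a stricter caller could re-raise instead.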

--
components: email
messages: 323869
nosy: barry, era, r.david.murray
priority: normal
severity: normal
status: open
title: email.contentmanager should use IANA encoding
versions: Python 3.7

___
Python tracker 
<https://bugs.python.org/issue34459>
___



[issue34460] email.charset: common IANA labels missing

2018-08-22 Thread era


New submission from era:

The email.charset module should contain common informal character-set 
identifiers even if they are not formally specified in an IANA RFC.

From a quick grep of a pile of recent email, I find the following:

    46 "cp-850"
     6 "windows-874"

For scale, the same collection contained around 10,000 messages with "utf-8" 
and 2,000 with "iso-8859-1".  Still, the fact that there are multiple 
occurrences in a spool of recent messages indicates that they are fairly common.

Currently, the email module throws a traceback if you attempt to parse a 
message whose character set is not known to Python. This is not possible to 
prevent in the general case, but making it more robust with encodings which are 
reasonably prevalent in the wild would definitely be desirable.  

For what it's worth, "cp-850" is apparently an alias for IBM code page 850 
which is defined with the name "cp850" in RFC1345.  "windows-874" is an 
official designation which is detailed in 
https://www.iana.org/assignments/charset-reg/windows-874 which is apparently 
equivalent to the Python codec "cp874".
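
In the meantime, the email.charset module exposes add_alias() for registering labels like these at run time. A minimal sketch, assuming the mappings cp-850 -> cp850 and windows-874 -> cp874 are the desired ones; note that it changes module-level state for the whole process, and it does not establish that every code path in the email package consults these tables.

```python
import email.charset

# Register the informal labels as aliases of names Python already knows.
email.charset.add_alias('cp-850', 'cp850')
email.charset.add_alias('windows-874', 'cp874')

for label in ('cp-850', 'windows-874'):
    cs = email.charset.Charset(label)
    # input_charset is the canonical name; input_codec is the Python codec.
    print(label, '->', cs.input_charset, cs.input_codec)
```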

--
components: email
messages: 323870
nosy: barry, era, r.david.murray
priority: normal
severity: normal
status: open
title: email.charset: common IANA labels missing
versions: Python 3.6

___
Python tracker 
<https://bugs.python.org/issue34460>
___



[issue34459] email.contentmanager should use IANA encoding

2018-08-22 Thread era


era added the comment:

https://bugs.python.org/issue34460 now requests the addition of "cp-850" and 
"windows-874" as charset aliases in the email.charset module.

--

___
Python tracker 
<https://bugs.python.org/issue34459>
___



[issue17305] IDNA2008 encoding missing

2013-12-02 Thread era

era added the comment:

At least the following existing domain names are rejected by the current 
implementation, apparently because they are not IDNA2003-compatible.

XN--NNC9BXA1KSA.COM
XN--14-CUD4D3A.COM
XN--YGB4AR5HPA.COM
XN---14-00E9E9A.COM
XN--MGB2DAM4BK.COM
XN--6-ZHCPPA1B7A.COM
XN--3-YMCCH8IVAY.COM
XN--3-YMCLXLE2A3F.COM
XN--4-ZHCJXA0E.COM
XN--014-QQEUW.COM
XN--118-Y2EK60DC2ZB.COM

As a workaround, in the code where I needed to process these, I used a fallback 
to string[4:].decode('punycode'); this was in a code path where I had already 
lowercased the string and established that string[0:4] == 'xn--'.

As a partial remedy, supporting a relaxed interpretation of the spec somehow 
would be useful; see also (tangentially) issue #12263.
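
The fallback described above could look roughly like this in Python 3. It is a sketch only: decode_label is a hypothetical helper, it handles a single label rather than a full domain name, and the sample label is a well-known test value rather than one of the names listed above.

```python
def decode_label(label):
    """Decode one domain label, falling back to raw punycode if IDNA rejects it."""
    label = label.lower()
    try:
        # Strict path: the stdlib 'idna' codec (IDNA2003 rules).
        return label.encode('ascii').decode('idna')
    except UnicodeError:
        if label.startswith('xn--'):
            # Relaxed path: strip the ACE prefix and decode the punycode body.
            return label[4:].encode('ascii').decode('punycode')
        raise

print(decode_label('XN--MNCHEN-3YA'))
```

The relaxed path skips the validity checks that the idna codec performs, so its output should be treated as untrusted display text.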

--
nosy: +era

___
Python tracker 
<http://bugs.python.org/issue17305>
___



[issue1757072] Zipfile robustness

2014-11-02 Thread era

era added the comment:

For those who cannot update just yet, see also the workaround at 
http://stackoverflow.com/a/21996397/874188

--
nosy: +era

___
Python tracker 
<http://bugs.python.org/issue1757072>
___



[issue22929] cp874 encoding almost empty

2014-11-24 Thread era

New submission from era:

I created a simple script to map character codes in the 8bit range to Unicode 
for simple lookup:

https://github.com/tripleee/8bit

In the generated output, on Python 2.6.6 (but corroborated on Python 2.7.6), 
almost all character codes come up as "undefined" in CP874.

According to http://en.wikipedia.org/wiki/ISO/IEC_8859-11, CP874 should be a 
superset of ISO-8859-11, with a few character codes *added* in the ISO control 
range.
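
A quick way to check a claim like this independently of the linked script is to decode each high byte directly. The sketch below is written for Python 3 and is not the script from the repository; high_half_table is a name made up here, and it only makes sense for single-byte codecs.

```python
def high_half_table(codec):
    """Map each byte 0x80..0xFF to its character in codec, or None if undefined."""
    table = {}
    for code in range(0x80, 0x100):
        try:
            table[code] = bytes([code]).decode(codec)
        except UnicodeDecodeError:
            table[code] = None  # undefined in this codec
    return table

cp874 = high_half_table('cp874')
defined = sum(1 for ch in cp874.values() if ch is not None)
print('cp874 defines %d of 128 high codes' % defined)
print('0xA1 ->', repr(cp874[0xA1]))
```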

--
components: Unicode
messages: 231596
nosy: era, ezio.melotti, haypo
priority: normal
severity: normal
status: open
title: cp874 encoding almost empty
type: behavior
versions: Python 2.7

___
Python tracker 
<http://bugs.python.org/issue22929>
___



[issue22929] cp874 encoding almost empty

2014-11-24 Thread era

era added the comment:

My apologies -- I already attempted to close this as a mistake on my part, but 
apparently, that failed too.  )-:  Sorry.

--
resolution:  -> not a bug
status: open -> closed

___
Python tracker 
<http://bugs.python.org/issue22929>
___



[issue17254] add thai encoding aliases to encodings.aliases

2015-01-22 Thread era

Changes by era:


--
nosy: +era

___
Python tracker 
<http://bugs.python.org/issue17254>
___



[issue24430] ZipFile.read() cannot decrypt multiple members from Windows 7zfm

2015-06-11 Thread era

New submission from era:

The attached archive from the Windows version of the 7z file manager (7zFM 
version 9.20) cannot be decrypted into memory.  The first file succeeds, but 
the second one fails.

The following small program is able to unzip other encrypted zip archives 
(tried one created by Linux 7z version 9.04 on Debian from the package 
p7zip-full, and one from plain zip 3.0-3 which comes from the InfoZip 
distribution, as well as a number of archives of unknown provenance) but fails 
on the attached one.

from zipfile import ZipFile
from sys import argv

container = ZipFile(argv[1])
for member in container.namelist():
    print("member %s" % member)
    try:
        extracted = container.read(member)
        print("extracted %s" % repr(extracted)[0:64])
    except RuntimeError as err:
        extracted = container.read(member, 'hello')
        container.setpassword('hello')
        print("extracted with password 'hello': %s" % repr(extracted)[0:64])

Here is the output and backtrace:

member hello/
extracted ''
member hello/goodbye.txt
Traceback (most recent call last):
  File "./nst.py", line 13, in <module>
    extracted = container.read(member, 'hello')
  File "/usr/lib/python2.6/zipfile.py", line 834, in read
    return self.open(name, "r", pwd).read()
  File "/usr/lib/python2.6/zipfile.py", line 901, in open
    raise RuntimeError("Bad password for file", name)
RuntimeError: ('Bad password for file', 'hello/goodbye.txt')

The 7z command is able to extract it just fine:

$ 7z -phello x /tmp/hello.zip

7-Zip 9.04 beta  Copyright (c) 1999-2009 Igor Pavlov  2009-05-30
p7zip Version 9.04 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,1 CPU)

Processing archive: /tmp/hello.zip

Extracting  hello
Extracting  hello/goodbye.txt
Extracting  hello/hello.txt

Everything is Ok

Folders: 1
Files: 2
Size:   15
Compressed: 560

--
files: hello.zip
messages: 245165
nosy: era
priority: normal
severity: normal
status: open
title: ZipFile.read() cannot decrypt multiple members from Windows 7zfm
Added file: http://bugs.python.org/file39680/hello.zip

___
Python tracker 
<http://bugs.python.org/issue24430>
___



[issue24430] ZipFile.read() cannot decrypt multiple members from Windows 7zFM

2015-06-11 Thread era

Changes by era:


--
components: +Library (Lib)
title: ZipFile.read() cannot decrypt multiple members from Windows 7zfm -> 
ZipFile.read() cannot decrypt multiple members from Windows 7zFM
type:  -> behavior

___
Python tracker 
<http://bugs.python.org/issue24430>
___



[issue24430] ZipFile.read() cannot decrypt multiple members from Windows 7zFM

2015-06-11 Thread era

era added the comment:

The call to .setpassword() doesn't seem to make any difference.  I was hoping 
it would offer a workaround, but it didn't.
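
For reference, in Python 3 the password can be supplied either per call via pwd= or once via setpassword(), and both expect bytes. The sketch below shows the per-call form; read_all is a hypothetical helper, and it is exercised on an unencrypted in-memory archive because the standard library cannot create encrypted ones. It is not a fix for the failure reported here.

```python
import io
from zipfile import ZipFile

def read_all(archive, password=None):
    """Return {member name: bytes}; password must be bytes in Python 3."""
    contents = {}
    with ZipFile(archive) as container:
        for member in container.namelist():
            # pwd is only consulted for members flagged as encrypted.
            contents[member] = container.read(member, pwd=password)
    return contents

# Build a small unencrypted archive in memory to exercise the helper.
buffer = io.BytesIO()
with ZipFile(buffer, 'w') as z:
    z.writestr('hello/goodbye.txt', 'goodbye\n')
    z.writestr('hello/hello.txt', 'hello\n')

result = read_all(buffer, password=b'hello')
print(sorted(result))
```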

--

___
Python tracker 
<http://bugs.python.org/issue24430>
___



[issue28122] email.header.decode_header can not decode string with quotation

2016-09-13 Thread era

era added the comment:

The double quotes around the "human readable" part of the email address are not 
allowed.  Python is handling this correctly.

--
nosy: +era

___
Python tracker 
<http://bugs.python.org/issue28122>
___



[issue28577] ipaddress.ip_network(...).hosts() returns nothing for an IPv4 /32

2016-11-01 Thread era

New submission from era:

I would expect the following code to return ['10.9.8.8'] but it returns an 
empty list.

yosemite-osx$ python3
Python 3.5.1 (default, Dec 26 2015, 18:08:53) 
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import ipaddress
>>> list(ipaddress.ip_network('10.9.8.7/32').hosts())
[]

This seems to happen for every /32 address.  I'm guessing the logic which wants 
to exclude the gateway and broadcast addresses from a block should treat a /32 
as a special case.

I tried to look for a previous bug submission but I could not find one.  As 
such, it seems peculiar if this has not been reported before.  Is this actually 
expected behavior by some rule I am overlooking?
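
To see why a /32 is an edge case for any logic that drops the first and last address, it helps to inspect the network object itself. A minimal sketch:

```python
import ipaddress

net = ipaddress.ip_network('10.9.8.7/32')

# A /32 holds exactly one address, which is both the first and the last.
print(net.num_addresses)
print(net.network_address, net.broadcast_address)

# Iterating over the network, rather than calling hosts(), yields it.
print([str(addr) for addr in net])
```

Iterating over the network object does not depend on how hosts() treats this case, so it behaves the same way across Python versions.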

I tested on Linux 3.4 and OSX Yosemite Homebrew / Python 3.5.1.

--
components: Library (Lib)
messages: 279855
nosy: era
priority: normal
severity: normal
status: open
title: ipaddress.ip_network(...).hosts() returns nothing for an IPv4 /32
versions: Python 3.4, Python 3.5

___
Python tracker 
<http://bugs.python.org/issue28577>
___



[issue28577] ipaddress.ip_network(...).hosts() returns nothing for an IPv4 /32

2016-11-01 Thread era

era added the comment:

(Meh, silly typo, of course the expected output is ['10.9.8.7'], sorry about 
that!)

--

___
Python tracker 
<http://bugs.python.org/issue28577>
___



[issue28577] ipaddress.ip_network(...).hosts() returns nothing for an IPv4 /32

2016-11-01 Thread era

era added the comment:

@xiang.zhang thanks for the quick reply.

I find this behavior surprising.  If I process a list of addresses, like

import ipaddress

ips = (
    '10.9.8.7/32',
    '10.11.12.8/28',
)

for test in ['10.9.8.7', '10.11.12.10']:
    # strict=False because 10.11.12.8/28 has host bits set
    if test in [str(y) for x in ips
                for y in ipaddress.ip_network(x, strict=False).hosts()]:
        print('{0} found'.format(test))
    else:
        print('{0} not found'.format(test))

I would expect both addresses to print "found", but that's not how the current 
implementation works.

I agree that the /28 should not include the gateway and broadcast addresses, 
but I would not expect the explicitly listed /32 address to completely 
disappear from the output.

Are my expectations incorrect?  For code like this, what should I use instead, 
if not hosts()?
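
One possible answer, offered as a sketch rather than as the module's recommended idiom, is to test containment against the network objects with the `in` operator instead of expanding hosts(). Note that this also matches the network and broadcast addresses of the /28, which hosts() would exclude. The strict=False argument is an assumption needed because 10.11.12.8/28 has host bits set, and `covered` is a name made up for this example.

```python
import ipaddress

ips = ('10.9.8.7/32', '10.11.12.8/28')
# strict=False normalizes 10.11.12.8/28 to its enclosing network 10.11.12.0/28.
networks = [ipaddress.ip_network(x, strict=False) for x in ips]

def covered(address):
    """Return True if address falls inside any of the listed networks."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in networks)

for test in ['10.9.8.7', '10.11.12.10', '10.11.13.1']:
    print('{0} {1}'.format(test, 'found' if covered(test) else 'not found'))
```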

--

___
Python tracker 
<http://bugs.python.org/issue28577>
___



[issue28577] ipaddress.ip_network(...).hosts() returns nothing for an IPv4 /32

2016-11-01 Thread era

era added the comment:

Quick googling did not turn up anything like a credible authoritative reference 
for this, but in actual practice, I have seen /32 used to designate a single 
individual IP address in CIDR notation quite a lot.

I can see roughly three options:

  1. Status quo.  Silently surprise users who expect this to work.
  2. Silently fix.  Hard-code /32 to return a range of one IP address.
  3. Let users choose.  Similarly to the "strict=True" keyword argument in the 
constructor method, the code could allow for either lenient or strict semantics.
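
Option 3 could be prototyped outside the library along these lines. This is purely a sketch: usable_hosts and its `lenient` parameter are hypothetical and do not exist in the ipaddress module.

```python
import ipaddress

def usable_hosts(network, lenient=False):
    """Hypothetical wrapper around hosts(); `lenient` is not a real parameter."""
    net = ipaddress.ip_network(network)
    if lenient and net.num_addresses == 1:
        # Treat a single-address network as designating that one host.
        return [net.network_address]
    return list(net.hosts())

lenient_32 = usable_hosts('10.9.8.7/32', lenient=True)
usable_30 = usable_hosts('10.9.8.4/30')
print([str(a) for a in lenient_32], [str(a) for a in usable_30])
```

Keeping the default strict preserves existing behavior for callers who rely on it, which is the same trade-off the constructor's strict argument makes.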

By the by, I don't see how the bug you linked to is relevant here; did you 
mistype the bug number?  #27863 is about _elementtree.

--

___
Python tracker 
<http://bugs.python.org/issue28577>
___



[issue27683] ipaddress subnet slicing iterator malfunction

2016-11-01 Thread era

era added the comment:

#28577 requests a similar special case for /32

--
nosy: +era

___
Python tracker 
<http://bugs.python.org/issue27683>
___