[issue9980] str(float) failure

2010-09-29 Thread John Machin
Changes by John Machin : -- nosy: +sjmachin ___ Python tracker <http://bugs.python.org/issue9980> ___ ___ Python-bugs-list mailing list Unsubscribe:

[issue7198] Extraneous newlines with csv.writer on Windows

2011-03-19 Thread John Machin
John Machin added the comment: Can somebody please review my "doc patch" submitted 2 months ago?

[issue7198] Extraneous newlines with csv.writer on Windows

2011-03-19 Thread John Machin
John Machin added the comment: Skip, The changes that I suggested have NOT been made. Please re-read the doc page you pointed to. The "writer" paragraph does NOT mention that newline='' is required when writing. The "writer" examples do NOT include newline=

[issue10954] No warning for csv.writer API change

2011-03-19 Thread John Machin
John Machin added the comment: The doc patch proposed by Skip on 2011-01-24 for this bug has NOT been reviewed, let alone applied. Sibling bug #7198 has been closed in error. Somebody please help. -- nosy: +skip.montanaro

[issue10954] No warning for csv.writer API change

2011-03-19 Thread John Machin
John Machin added the comment: Terry, I have already made the point """the docs bug is #7198. This is the meaningful-exception bug.""" My review is """changing 'should' to 'must' is not very useful without a consistent interpr

[issue7198] Extraneous newlines with csv.writer on Windows

2010-12-23 Thread John Machin
John Machin added the comment: Please re-open this. The binary/text mode problem still exists with Python 3.X on Windows. Quite simply, there is no option available to the caller to open the output file in binary mode, because the module is throwing str objects at the file. The module's
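The point about the module "throwing str objects at the file" can be illustrated with a minimal sketch. It assumes Python 3 and uses an in-memory `io.BytesIO` as a stand-in for a file opened in binary mode; the variable names are illustrative only.

```python
import csv
import io

# io.BytesIO stands in for a file opened with mode "wb" (an assumption
# made so the sketch needs no real file).
binary_target = io.BytesIO()

try:
    # csv.writer hands str objects to write(), which a binary file rejects.
    csv.writer(binary_target).writerow(["a", "b"])
    binary_error = None
except TypeError as exc:
    binary_error = exc

print(type(binary_error).__name__)
```

The exception comes from the file object's `write()` method, not from a check in the csv module itself.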

[issue7198] Extraneous newlines with csv.writer on Windows

2010-12-26 Thread John Machin
John Machin added the comment: Skip, I'm WRITING, not reading. Please read the 3.1 documentation for csv.writer. It does NOT mention newline='', and neither does the example. Please fix. Other problems with the examples: (1) They encourage a bad habit (open inside the call t

[issue7198] Extraneous newlines with csv.writer on Windows

2011-01-19 Thread John Machin
John Machin added the comment: "docpatch" for 3.x csv docs: In the csv.writer docs, insert the sentence "If csvfile is a file object, it should be opened with newline=''." immediately after the sentence "csvfile can be any object with a write() method.
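A minimal sketch of what the proposed sentence asks callers to do, assuming Python 3. The temporary path and the sample rows are made up for illustration.

```python
import csv
import os
import tempfile

rows = [["id", "name"], [1, "spam"]]  # sample data, purely illustrative

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "out.csv")  # hypothetical output file

    # newline='' stops the text layer from translating the "\r\n" that
    # csv.writer emits, so Windows does not end up with "\r\r\n".
    with open(path, "w", newline="") as csvfile:
        csv.writer(csvfile).writerows(rows)

    # Read the raw bytes back to see exactly what reached the disk.
    with open(path, "rb") as check:
        raw = check.read()

print(raw)
```

Opening the file in a `with` block, rather than inside the `csv.writer(...)` call, also ensures it is closed and flushed.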

[issue10954] No warning for csv.writer API change

2011-01-20 Thread John Machin
John Machin added the comment: I believe that both csv.reader and csv.writer should fail with a meaningful message if mode is binary or newline is not ''

[issue10954] No warning for csv.writer API change

2011-01-22 Thread John Machin
John Machin added the comment: I don't understand "Changing csv api is a feature request that could only happen in 3.3". This is NOT a request for an API change. Lennert's point is that an API change was made in 3.0 as compared with 2.6 but there is no fixer in 2to3. What

[issue10954] No warning for csv.writer API change

2011-01-23 Thread John Machin
John Machin added the comment: Skip, the docs bug is #7198. This is the meaningful-exception bug.

[issue11204] re module: strange behaviour of space inside {m, n}

2011-02-12 Thread John Machin
New submission from John Machin : A pattern like r"b{1,3}\Z" matches "b", "bb", and "bbb", as expected. There is no documentation of the behaviour of r"b{1, 3}\Z" -- it matches the LITERAL TEXT "b{1, 3}" in normal mode and "b
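The normal-mode behaviour described in this report can be reproduced with a short sketch. It reflects how the CPython `re` parser is understood to treat a malformed repeat (falling back to literal text); other versions or engines may differ.

```python
import re

compact = re.compile(r"b{1,3}\Z")   # a real {m,n} quantifier
spaced = re.compile(r"b{1, 3}\Z")   # space after the comma

# The compact form matches one to three b's.
compact_hits = [s for s in ("b", "bb", "bbb") if compact.match(s)]

# The spaced form is not parsed as a quantifier, so it does not match "bb" ...
spaced_repeat = spaced.match("bb")
# ... but it does match the literal text.
spaced_literal = spaced.match("b{1, 3}")

print(compact_hits, spaced_repeat, spaced_literal is not None)
```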

[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0

2010-03-30 Thread John Machin
New submission from John Machin : Unicode 5.2.0 chapter 3 (Conformance) has a new section (headed "Constraints on Conversion Processes") after requirement D93. Recent Pythons e.g. 3.1.2 don't comply. Using the Unicode example: >>> print(ascii(b"\xc2\x41\x42

[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0

2010-03-31 Thread John Machin
John Machin added the comment: @lemburg: "failing byte" seems rather obvious: first byte that you meet that is not valid in the current state. I don't understand your explanation, especially "does not have the high bit set". I think you mean "is a valid start

[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0

2010-03-31 Thread John Machin
John Machin added the comment: @ezio.melotti: Your second sentence is true, but it is not the whole truth. Bytes in the range C0-FF (whose high bit *is* set) ALSO shouldn't be considered part of the sequence because they (like 00-7F) are invalid as continuation bytes; they are either st

[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0

2010-04-01 Thread John Machin
John Machin added the comment: @ezio.melotti: """I'm considering valid all the bytes that start with '10...'""" Sorry, WRONG. Read what I wrote: """Further, some bytes in the range 80-BF are NOT always valid as the first con

[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0

2010-04-01 Thread John Machin
John Machin added the comment: Unicode has been frozen at 0x10FFFF. That's it. There is no such thing as a valid 5-byte or 6-byte UTF-8 string.

[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0

2010-04-01 Thread John Machin
John Machin added the comment: @lemburg: RFC 2279 was obsoleted by RFC 3629 over 6 years ago. The standard now says 21 bits is it. F5-FF are declared to be invalid. I don't understand what you mean by "supporting those possibilities". The code is correctly issuing an error me

[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0

2010-04-01 Thread John Machin
John Machin added the comment: Patch review: Preamble: pardon my ignorance of how the codebase works, but trunk unicodeobject.c is r79494 (and allows encoding of surrogate codepoints), py3k unicodeobject.c is r79506 (and bans the surrogate caper) and I can't find the r79542 that the

[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0

2010-04-01 Thread John Machin
John Machin added the comment: Chapter 3, page 94: """As a consequence of the well-formedness conditions specified in Table 3-7, the following byte values are disallowed in UTF-8: C0–C1, F5–FF""" Of course they should be handled by the simple expedient of setti
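The quoted list of disallowed byte values can be checked against a current decoder. This is a minimal sketch assuming Python 3; the helper name `is_rejected` is made up for illustration.

```python
# Byte values the quoted passage says can never appear in well-formed UTF-8.
disallowed = [0xC0, 0xC1] + list(range(0xF5, 0x100))


def is_rejected(byte_value):
    """Return True if the strict UTF-8 decoder refuses this single byte."""
    try:
        bytes([byte_value]).decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


rejected = [is_rejected(value) for value in disallowed]
print(all(rejected))
```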

[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0

2010-04-01 Thread John Machin
John Machin added the comment: @lemburg: """perhaps applying the same logic as for the other sequences is a better strategy""" What other sequences??? F5-FF are invalid bytes; they don't start valid sequences. What same logic?? At the start of a charact

[issue8308] raw_bytes.decode('cp932') -- spurious mappings

2010-04-03 Thread John Machin
New submission from John Machin : According to the following references, the bytes 80, A0, FD, FE, and FF are not defined in cp932: http://msdn.microsoft.com/en-au/goglobal/cc305152.aspx http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT http://demo.icu-project.org/icu-bin

[issue8308] raw_bytes.decode('cp932') -- spurious mappings

2010-04-04 Thread John Machin
John Machin added the comment: Thanks, Martin. Issue closed as far as I'm concerned.

[issue4574] reading UTF16-encoded text file crashes if \r on 64-char boundary

2008-12-07 Thread John Machin
New submission from John Machin : Problem in the newline handling in io.py, class IncrementalNewlineDecoder, method decode. It reads text files in 128-byte chunks. Converting CR LF to \n requires special case handling when '\r' is detected at the end of the
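The special case in question, a '\r' at the end of one chunk with its '\n' at the start of the next, can be sketched with the public `io.IncrementalNewlineDecoder` class in current Python 3. Passing `None` as the wrapped decoder (so that the input is already `str`) is a simplification for this sketch and is not the setup from the report.

```python
import io

# decoder=None means the chunks passed in are already text; translate=True
# asks for "\r\n" and "\r" to be converted to "\n".
decoder = io.IncrementalNewlineDecoder(None, translate=True)

# The CR LF pair is split across two chunks.
first = decoder.decode("line one\r")
second = decoder.decode("\nline two", final=True)

combined = first + second
print(repr(combined))
```

The trailing '\r' of the first chunk is held back until the next chunk shows whether it belongs to a CR LF pair.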

[issue4669] bytes,join and bytearray.join not in manual; help for bytes.join is wrong.

2008-12-15 Thread John Machin
New submission from John Machin : These methods are parallel to str.join, seem to work as expected, and have "help" entries. However there is nothing in the Library Reference Manual about them. >>> help(bytearray.join) Help on method_descriptor: join(...) B.joi
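A short sketch of the two methods in question, assuming Python 3. The sample values are illustrative.

```python
parts = [b"spam", b"eggs"]

# bytes.join parallels str.join: the separator is the object the method
# is called on, and the result has the separator's type.
joined_bytes = b",".join(parts)
joined_bytearray = bytearray(b",").join(parts)

print(joined_bytes, joined_bytearray)
```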

[issue4669] bytes,join and bytearray.join not in manual; help for bytes.join is wrong.

2008-12-19 Thread John Machin
John Machin added the comment: Terry, you are right. I missed that. My report was based on looking via the index and finding only "(str method)", no "(byte[sarray] method)".

[issue4742] 3.0 distutils byte-compiling -> Syntax error: unknown encoding: cp1252

2008-12-24 Thread John Machin
New submission from John Machin : File foo3.py is [cut down (orig 87Kb)] output of 2to3 conversion tool and (coincidentally) is still valid 2.x syntax. There are no syntax errors reported by any of the following: \python26\python -c "import foo3" \python26\python foo3.py

[issue4742] 3.0 distutils byte-compiling -> Syntax error: unknown encoding: cp1252

2008-12-24 Thread John Machin
John Machin added the comment: A clue: >>> print(ascii(b'\xa0\x93\x94\xb7'.decode('cp1252'))) '\xa0\u201c\u201d\xb7' Could be that it only happens where there's a cp1252 character that's not in latin1; see files x93.py and x94.py (have problem)

[issue4742] 3.0 distutils byte-compiling -> Syntax error: unknown encoding: cp1252

2008-12-24 Thread John Machin
Changes by John Machin : Removed file: http://bugs.python.org/file12445/py3encbug.zip

[issue4743] intra-pkg multiple import (import local1, local2) not fixed

2008-12-24 Thread John Machin
New submission from John Machin : In a package, "import local1, local2" is not fixed. Here's some real live 2to3 output showing the problem and the workaround: import ExcelFormulaParser, ExcelFormulaLexer -import ExcelFormulaParser -import ExcelFormulaLexer +from . import Exc

[issue4742] 3.0 distutils byte-compiling -> Syntax error: unknown encoding: cp1252

2008-12-30 Thread John Machin
John Machin added the comment: TWO POINTS: (1) I am not very concerned about chars like \x9d which are not valid in the declared encoding; I am more concerned with chars like \x93 and \x94 which *ARE* valid in the declared encoding. Please ensure that these cases are included in tests. (2

[issue4742] 3.0 distutils byte-compiling -> Syntax error: unknown encoding: cp1252

2008-12-30 Thread John Machin
John Machin added the comment: (1) what am I supposed to infer from "Yup"?? That all of that \x9d stuff was a mistake? (2) +def tearDown(self): +pyc_file = os.path.join(os.path.dirname(__file__), 'cp1252.pyc') +if os.path.exists(pyc_file): +

[issue4626] compile() doesn't ignore the source encoding when a string is passed in

2008-12-30 Thread John Machin
Changes by John Machin : -- nosy: +sjmachin

[issue4971] Incorrect title case

2009-01-17 Thread John Machin
John Machin added the comment: Martin:"""Considering this note, the simple titlecase of U+01C5 *is* U+01C4: the titlecase value is omitted, hence it is the same as uppercase, hence it is U+01C4.""" Perhaps we are looking at different files; in the Unicode 5.1 Uni

[issue5107] built-in open(..., encoding=vague_default)

2009-01-29 Thread John Machin
New submission from John Machin : Docs say """The default encoding is platform dependent""" but don't say how to find out what that is, or how it is determined. On my Windows XP SP3 setup, the default is cp1252, but the best/only guess at finding out witho
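One way to inspect the default on a given machine, as a hedged sketch for current Python 3: `locale.getpreferredencoding(False)` is the call later versions of the `open()` documentation point to, and the `encoding` attribute of an opened text file shows what was actually chosen. The two can differ in spelling or under UTF-8 mode, so the sketch only prints them side by side. The scratch file name is made up.

```python
import locale
import os
import tempfile

# What the locale machinery reports as the preferred encoding.
guessed = locale.getpreferredencoding(False)

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "probe.txt")  # hypothetical scratch file
    # Open without an explicit encoding and ask the file object what it chose.
    with open(path, "w") as handle:
        actual = handle.encoding

print(guessed, actual)
```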

[issue13782] xml.etree.ElementTree: Element.append doesn't type-check its argument

2012-01-13 Thread John Machin
New submission from John Machin : import xml.etree.ElementTree as et node = et.Element('x') node.append(not_an_Element_instance) 2.7 and 3.2 produce no complaint at all. 2.6 and 3.1 produce an AssertionError. However cElementTree in all 4 versions produces a TypeError. Please fix 2
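For comparison, a minimal sketch of the same call on a current Python 3, where appending a non-Element is expected to raise `TypeError`. The string argument is an arbitrary stand-in for `not_an_Element_instance`.

```python
import xml.etree.ElementTree as et

node = et.Element("x")

try:
    node.append("not an Element")  # arbitrary non-Element value
    append_error = None
except TypeError as exc:
    append_error = exc

print(type(append_error).__name__, len(node))
```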

[issue13899] re pattern r"[\A]" should work like "A" but matches nothing. Ditto B and Z.

2012-01-28 Thread John Machin
New submission from John Machin : Expected behaviour illustrated using "C": >>> import re >>> re.findall(r'[\C]', 'CCC') ['C', 'C', 'C'] >>> re.compile(r'[\C]', 128) literal 67 <_sre.SRE_Patte

[issue13899] re pattern r"[\A]" should work like "A" but matches nothing. Ditto B and Z.

2012-01-28 Thread John Machin
John Machin added the comment: @ezio: Of course the context is "inside a character class". I expect r'[\b]' to act like r'\b' aka r'\x08' aka backspace because (1) that is the treatment applied to all other C-like control char escapes (2) the docs say
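The documented special case for `\b` can be shown with a small sketch, assuming Python 3. Inside a character class it stands for backspace, and outside it is a word boundary.

```python
import re

# Inside [...] the escape \b means the backspace character (\x08).
inside_class = re.findall(r"[\b]", "a\x08b")

# Outside a class, \b is a zero-width word boundary.
outside_class = re.findall(r"\bb", "a b")

print(inside_class, outside_class)
```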

[issue13899] re pattern r"[\A]" should work like "A" but matches nothing. Ditto B and Z.

2012-01-29 Thread John Machin
John Machin added the comment: @Ezio: Comparison of the behaviour of \letter inside/outside character classes is irrelevant. The rules for inside can be expressed simply as: 1. Letters dDsSwW are special; they represent categories as documented, and do in fact have a similar meaning outside

[issue13899] re pattern r"[\A]" should work like "A" but matches nothing. Ditto B and Z.

2012-01-29 Thread John Machin
John Machin added the comment: Whoops: "normal Python rules for backslash escapes" should have had a note "but revert to the C behaviour of stripping the \ from unrecognised escapes" which is what re appears to do i

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2009-08-03 Thread John Machin
John Machin added the comment: Problem is memory leak from repeated calls of e.g. compiled_pattern.search(some_text). Task Manager performance panel shows increasing memory usage with regex but not with re. It appears to be cumulative i.e. changing to another pattern or text doesn't re

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2009-08-10 Thread John Machin
John Machin added the comment: Adding to vbr's report: [2.6.2, Win XP SP3] (1) bug mallocs memory inside loop (2) also happens to regex.findall with patterns 'a{0,0}' and '\B' (3) regex.sub('', 'x', 'abcde') has

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2009-08-11 Thread John Machin
John Machin added the comment: What is the expected timing comparison with re? Running the Aug10#3 version on Win XP SP3 with Python 2.6.3, I see regex typically running at only 20% to 50% of the speed of re in ASCII mode, with not-very-atypical tests (find all Python identifiers in a line

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2009-08-15 Thread John Machin
John Machin added the comment: Simplification of mark's first two problems: Problem 1: looks like regex's negative look-ahead assertion is broken >>> re.findall(r'(?!a)\w', 'abracadabra') ['b', 'r', 'c', 'd', 'b
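The reference behaviour of the standard `re` module for this pattern, as a minimal sketch. The third-party `regex` module under discussion is deliberately not imported here.

```python
import re

# (?!a) rejects any position where the next character is "a", so only the
# non-"a" word characters are returned.
non_a_letters = re.findall(r"(?!a)\w", "abracadabra")

print(non_a_letters)
```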

[issue4847] csv fails when file is opened in binary mode

2009-02-23 Thread John Machin
John Machin added the comment: Sorry, folks, we've got an understanding problem here. CSV files are typically NOT created by text editors. They are created e.g. by "save as csv" from a spreadsheet program, or as an output option by some database query program. They can have

[issue5455] csv module no longer works as expected when file opened in binary mode

2009-03-08 Thread John Machin
John Machin added the comment: This is in effect a duplicate of issue 4847. Summary: The docs are CORRECT. The 3.X implementation is WRONG. The 2.X implementation is CORRECT. See examples in my comment on issue 4847.

[issue4847] csv fails when file is opened in binary mode

2009-03-08 Thread John Machin
John Machin added the comment: Before patching, could we discuss the requirements? There are two different concepts: (1) "text" file (assume that CR and/or LF are line terminators, and provide methods for accessing a line at a time) versus "binary" file (no such assumptions

[issue4847] csv fails when file is opened in binary mode

2009-03-08 Thread John Machin
John Machin added the comment: ... and it looks like Option 2 might already *almost* be in place. Continuing with the previous example (book1.csv has embedded lone LFs): C:\devel\csv>\python30\python -c "import csv; print(repr(list(csv.reader(open('book1.csv',
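The embedded-LF case can be sketched without the original book1.csv, which is not available here. The sample data below is invented, and `io.StringIO(..., newline="")` stands in for a file opened in text mode with `newline=''` on a current Python 3.

```python
import csv
import io

# Invented sample: one quoted field contains a lone LF, rows end in CR LF.
data = 'a,"first\nsecond"\r\nb,c\r\n'

# newline="" keeps the line endings untranslated so the csv module can
# tell the embedded LF apart from the row terminators.
rows = list(csv.reader(io.StringIO(data, newline="")))

print(rows)
```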

[issue4847] csv fails when file is opened in binary mode

2009-03-09 Thread John Machin
John Machin added the comment: pitrou> Please look at the doc for open() and io.TextIOWrapper. The `newline` parameter defaults to None, which means universal newlines with newline translation. Setting to '' (yes, the empty string) enables universal newlines but disables newlin
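The difference between the two `newline` settings described here can be seen with an in-memory stream. This is a minimal sketch assuming Python 3, with made-up sample bytes.

```python
import io

raw = b"one\r\ntwo\rthree\n"  # mixed line endings, purely illustrative

# newline=None: universal newlines WITH translation to "\n".
translated = io.TextIOWrapper(
    io.BytesIO(raw), encoding="ascii", newline=None
).read()

# newline="": universal newlines for line splitting, but NO translation.
untranslated = io.TextIOWrapper(
    io.BytesIO(raw), encoding="ascii", newline=""
).read()

print(repr(translated), repr(untranslated))
```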

[issue5095] msi missing from "bdist --help-formats"

2009-03-25 Thread John Machin
John Machin added the comment: The 2.6.1 documentation consists of a *single* line: "distutils.command.bdist_msi — Build a Microsoft Installer binary package". AFAICT this is the *only* mention of "msi" in the docs (outside the msilib module). I heard about it only by word-o

[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0

2010-07-03 Thread John Machin
John Machin added the comment: About the E0 80 81 61 problem: my interpretation is that you are correct, the 80 is not valid in the current state (start byte == E0), so no look-ahead, three FFFDs must be issued followed by 0061. I don't really care about issuing too many FFFDs so long
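The interpretation given here (three U+FFFD followed by U+0061) can be compared with what a current decoder produces. A minimal sketch, assuming a Python 3 release whose UTF-8 decoder follows the Unicode "maximal subpart" recommendation:

```python
# E0 must be followed by A0..BF, so 80 is invalid in that state; the
# decoder then treats 80 and 81 as stray continuation bytes.
decoded = b"\xe0\x80\x81\x61".decode("utf-8", "replace")

print(ascii(decoded))
```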