Changes by John Machin :
--
nosy: +sjmachin
___
Python tracker
<http://bugs.python.org/issue9980>
___
___
Python-bugs-list mailing list
Unsubscribe:
John Machin added the comment:
Can somebody please review my "doc patch" submitted 2 months ago?
--
Python tracker
<http://bugs.python.org/issue7198>
John Machin added the comment:
Skip, the changes that I suggested have NOT been made. Please re-read the doc
page you pointed to. The "writer" paragraph does NOT mention that newline='' is
required when writing. The "writer" examples do NOT include newline=''
John Machin added the comment:
The doc patch proposed by Skip on 2001-01-24 for this bug has NOT been
reviewed, let alone applied. Sibling bug #7198 has been closed in error.
Somebody please help.
--
nosy: +skip.montanaro
Python tracker
<h
John Machin added the comment:
Terry, I have already made the point """the docs bug is #7198. This is the
meaningful-exception bug."""
My review is """changing 'should' to 'must' is not very useful without a
consistent interpr
John Machin added the comment:
Please re-open this. The binary/text mode problem still exists with Python 3.X
on Windows. Quite simply, there is no option available to the caller to open
the output file in binary mode, because the module is throwing str objects at
the file. The module's
John Machin added the comment:
Skip, I'm WRITING, not reading. Please read the 3.1 documentation for
csv.writer. It does NOT mention newline='', and neither does the example.
Please fix.
Other problems with the examples: (1) They encourage a bad habit (open inside
the call t
John Machin added the comment:
"docpatch" for 3.x csv docs:
In the csv.writer docs, insert the sentence "If csvfile is a file object, it
should be opened with newline=''." immediately after the sentence "csvfile can
be any object with a write() method."
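The proposed wording can be illustrated with a short sketch (file name and path are arbitrary):

```python
import csv
import os
import tempfile

# Per the proposed doc wording: a file passed to csv.writer should be
# opened with newline='', so that the writer's own '\r\n' row terminators
# are not translated a second time by the text layer (doubling the CR
# on Windows).
path = os.path.join(tempfile.mkdtemp(), "out.csv")
with open(path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["a", "b"])
    writer.writerow(["1", "2"])

# Reading back in binary shows each row ends in exactly b'\r\n'.
with open(path, "rb") as f:
    data = f.read()
print(data)
```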
John Machin added the comment:
I believe that both csv.reader and csv.writer should fail with a meaningful
message if mode is binary or newline is not ''.
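A sketch of the status quo this comment objects to: a csv.writer wrapped around a binary-mode file only fails at write time, with a generic TypeError raised by the file object rather than a meaningful message from the csv module pointing at the real problem (binary mode / newline handling):

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "out.csv")
raised = None
with open(path, "wb") as f:  # wrong mode: csv.writer needs a text file
    try:
        csv.writer(f).writerow(["a", "b"])
    except TypeError as exc:
        # The error comes from the file object rejecting a str, so it
        # says nothing about binary mode vs newline='' being the cause.
        raised = exc
print(raised)
```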
--
Python tracker
<http://bugs.python.o
John Machin added the comment:
I don't understand "Changing csv api is a feature request that could only
happen in 3.3". This is NOT a request for an API change. Lennert's point is
that an API change was made in 3.0 as compared with 2.6 but there is no fixer
in 2to3. What
John Machin added the comment:
Skip, the docs bug is #7198. This is the meaningful-exception bug.
--
Python tracker
<http://bugs.python.org/issue10
New submission from John Machin :
A pattern like r"b{1,3}\Z" matches "b", "bb", and "bbb", as expected. There is
no documentation of the behaviour of r"b{1, 3}\Z" -- it matches the LITERAL
TEXT "b{1, 3}" in normal mode and "b
New submission from John Machin :
Unicode 5.2.0 chapter 3 (Conformance) has a new section (headed "Constraints on
Conversion Processes") after requirement D93. Recent Pythons e.g. 3.1.2 don't
comply. Using the Unicode example:
>>> print(ascii(b"\xc2\x41\x42"
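On a current Python, the truncated example above decodes the way the Unicode text requires: the C2 lead byte alone becomes one U+FFFD, and 41 42 decode normally as 'A' 'B':

```python
# b"\xc2\x41\x42": C2 opens a 2-byte sequence, but 41 is not a valid
# continuation byte, so only C2 is replaced; 41 and 42 decode as 'A' 'B'.
decoded = b"\xc2\x41\x42".decode("utf-8", "replace")
print(ascii(decoded))  # prints '\ufffdAB'
```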
John Machin added the comment:
@lemburg: "failing byte" seems rather obvious: first byte that you meet that is
not valid in the current state. I don't understand your explanation, especially
"does not have the high bit set". I think you mean "is a valid start
John Machin added the comment:
@ezio.melotti: Your second sentence is true, but it is not the whole truth.
Bytes in the range C0-FF (whose high bit *is* set) ALSO shouldn't be considered
part of the sequence because they (like 00-7F) are invalid as continuation
bytes; they are either st
John Machin added the comment:
#ezio.melotti: """I'm considering valid all the bytes that start with '10...'"""
Sorry, WRONG. Read what I wrote: """Further, some bytes in the range 80-BF are
NOT always valid as the first con
John Machin added the comment:
Unicode has been frozen at 0x10FFFF. That's it. There is no such thing as a
valid 5-byte or 6-byte UTF-8 string.
--
Python tracker
<http://bugs.python.org/i
John Machin added the comment:
@lemburg: RFC 2279 was obsoleted by RFC 3629 over 6 years ago. The standard now
says 21 bits is it. F5-FF are declared to be invalid. I don't understand what
you mean by "supporting those possibilities". The code is correctly issuing an
error me
John Machin added the comment:
Patch review:
Preamble: pardon my ignorance of how the codebase works, but trunk
unicodeobject.c is r79494 (and allows encoding of surrogate codepoints), py3k
unicodeobject.c is r79506 (and bans the surrogate caper) and I can't find the
r79542 that the
John Machin added the comment:
Chapter 3, page 94: """As a consequence of the well-formedness conditions
specified in Table 3-7, the following byte values are disallowed in UTF-8:
C0–C1, F5–FF"""
Of course they should be handled by the simple expedient of setti
John Machin added the comment:
@lemburg: """perhaps applying the same logic as for the other sequences is a
better strategy"""
What other sequences??? F5-FF are invalid bytes; they don't start valid
sequences. What same logic?? At the start of a charact
New submission from John Machin :
According to the following references, the bytes 80, A0, FD, FE, and FF are not
defined in cp932:
http://msdn.microsoft.com/en-au/goglobal/cc305152.aspx
http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT
http://demo.icu-project.org/icu-bin
John Machin added the comment:
Thanks, Martin. Issue closed as far as I'm concerned.
--
Python tracker
<http://bugs.python.org/issue8308>
New submission from John Machin <[EMAIL PROTECTED]>:
Problem in the newline handling in io.py, class
IncrementalNewlineDecoder, method decode. It reads text files in 128-
byte chunks. Converting CR LF to \n requires special case handling
when '\r' is detected at the end of the
New submission from John Machin :
These methods are parallel to str.join, seem to work as expected, and
have "help" entries. However there is nothing in the Library Reference
Manual about them.
>>> help(bytearray.join)
Help on method_descriptor:
join(...)
B.joi
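For reference, the then-undocumented methods work exactly like str.join, but on bytes-like sequences:

```python
# bytes.join and bytearray.join concatenate an iterable of bytes-like
# objects, inserting the separator between elements; bytearray.join
# returns a bytearray.
joined_bytes = b", ".join([b"a", b"b", b"c"])
joined_array = bytearray(b"-").join([b"x", b"y"])
print(joined_bytes)  # b'a, b, c'
print(joined_array)  # bytearray(b'x-y')
```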
John Machin added the comment:
Terry, you are right. I missed that. My report was based on looking via
the index and finding only "(str method)", no "(byte[sarray] method)".
Python tracker
<http://bu
New submission from John Machin :
File foo3.py is [cut down (orig 87Kb)] output of 2to3 conversion tool
and (coincidentally) is still valid 2.x syntax. There are no syntax
errors reported by any of the following:
\python26\python -c "import foo3"
\python26\python foo3.py
John Machin added the comment:
A clue:
>>> print(ascii(b'\xa0\x93\x94\xb7'.decode('cp1252')))
'\xa0\u201c\u201d\xb7'
Could be that it only happens where there's a cp1252 character that's
not in latin1; see files x93.py and x94.py (have problem)
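The clue can be made concrete: \x93 and \x94 are curly quotation marks in cp1252 but have no printable assignment in latin1, which is exactly the "in cp1252 but not in latin1" split suspected above:

```python
# cp1252 maps 0x93/0x94 to U+201C/U+201D (curly quotes); latin-1 passes
# every byte through unchanged, so the same bytes become C1 controls.
cp1252_text = b"\x93quoted\x94".decode("cp1252")
latin1_text = b"\x93quoted\x94".decode("latin1")
print(ascii(cp1252_text))  # '\u201cquoted\u201d'
print(ascii(latin1_text))  # '\x93quoted\x94'
```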
Changes by John Machin :
Removed file: http://bugs.python.org/file12445/py3encbug.zip
Python tracker
<http://bugs.python.org/issue4742>
New submission from John Machin :
In a package, "import local1, local2" is not fixed. Here's some real
live 2to3 output showing the problem and the workaround:
import ExcelFormulaParser, ExcelFormulaLexer
-import ExcelFormulaParser
-import ExcelFormulaLexer
+from . import Exc
John Machin added the comment:
TWO POINTS:
(1) I am not very concerned about chars like \x9d which are not valid in
the declared encoding; I am more concerned with chars like \x93 and \x94
which *ARE* valid in the declared encoding. Please ensure that these
cases are included in tests.
(2
John Machin added the comment:
(1) what am I supposed to infer from "Yup"?? That all of that \x9d stuff
was a mistake?
(2)
+    def tearDown(self):
+        pyc_file = os.path.join(os.path.dirname(__file__), 'cp1252.pyc')
+        if os.path.exists(pyc_file):
+
Changes by John Machin :
--
nosy: +sjmachin
Python tracker
<http://bugs.python.org/issue4626>
John Machin added the comment:
Martin:"""Considering this note, the simple titlecase of U+01C5 *is*
U+01C4: the titlecase value is omitted, hence it is the same as
uppercase, hence it is U+01C4."""
Perhaps we are looking at different files; in the Unicode 5.1
Uni
New submission from John Machin :
Docs say """The default encoding is platform dependent""" but don't say
how to find out what that is, or how it is determined. On my Windows XP
SP3 setup, the default is cp1252, but the best/only guess at finding out
witho
New submission from John Machin :
import xml.etree.ElementTree as et
node = et.Element('x')
node.append(not_an_Element_instance)
2.7 and 3.2 produce no complaint at all.
2.6 and 3.1 produce an AssertionError.
However cElementTree in all 4 versions produces a TypeError.
Please fix 2
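On current Pythons the behaviour has since converged on the cElementTree side: append() type-checks its argument and raises TypeError immediately. A sketch:

```python
import xml.etree.ElementTree as et

node = et.Element("x")
error = None
try:
    node.append("not an Element")  # deliberately the wrong type
except TypeError as exc:
    # Rejected up front with TypeError, rather than an AssertionError
    # or silent acceptance as in the versions compared above.
    error = exc
print(error)
```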
New submission from John Machin :
Expected behaviour illustrated using "C":
>>> import re
>>> re.findall(r'[\C]', 'CCC')
['C', 'C', 'C']
>>> re.compile(r'[\C]', 128)
literal 67
<_sre.SRE_Patte
John Machin added the comment:
@ezio: Of course the context is "inside a character class".
I expect r'[\b]' to act like r'\b' aka r'\x08' aka backspace because (1) that
is the treatment applied to all other C-like control char escapes (2) the docs
say
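The two behaviours being compared can be checked directly: outside a character class \b is the word-boundary assertion, while inside a class it is the backspace character \x08, consistent with the other C-style escapes:

```python
import re

# Outside a class: \b asserts a word boundary.
assert re.search(r"\bword\b", "a word here")

# Inside a class: \b is the backspace character \x08,
# just like \n and \t are the usual control characters.
assert re.fullmatch(r"[\b]", "\x08")
assert re.fullmatch(r"[\n]", "\n")
assert not re.fullmatch(r"[\b]", "b")
```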
John Machin added the comment:
@Ezio: Comparison of the behaviour of \letter inside/outside character classes
is irrelevant. The rules for inside can be expressed simply as:
1. Letters dDsSwW are special; they represent categories as documented, and do
in fact have a similar meaning outside
John Machin added the comment:
Whoops: "normal Python rules for backslash escapes" should have had a note "but
revert to the C behaviour of stripping the \ from unrecognised escapes" which
is what re appears to do i
John Machin added the comment:
Problem is memory leak from repeated calls of e.g.
compiled_pattern.search(some_text). Task Manager performance panel shows
increasing memory usage with regex but not with re. It appears to be
cumulative i.e. changing to another pattern or text doesn't re
John Machin added the comment:
Adding to vbr's report: [2.6.2, Win XP SP3] (1) bug mallocs memory
inside loop (2) also happens to regex.findall with patterns 'a{0,0}' and
'\B' (3) regex.sub('', 'x', 'abcde') has
John Machin added the comment:
What is the expected timing comparison with re? Running the Aug10#3
version on Win XP SP3 with Python 2.6.3, I see regex typically running
at only 20% to 50% of the speed of re in ASCII mode, with
not-very-atypical tests (find all Python identifiers in a line
John Machin added the comment:
Simplification of mark's first two problems:
Problem 1: looks like regex's negative look-ahead assertion is broken
>>> re.findall(r'(?!a)\w', 'abracadabra')
['b', 'r', 'c', 'd', 'b', 'r']
John Machin added the comment:
Sorry, folks, we've got an understanding problem here. CSV files are
typically NOT created by text editors. They are created e.g. by "save as
csv" from a spreadsheet program, or as an output option by some database
query program. They can have
John Machin added the comment:
This is in effect a duplicate of issue 4847.
Summary:
The docs are CORRECT.
The 3.X implementation is WRONG.
The 2.X implementation is CORRECT.
See examples in my comment on issue 4847.
--
nosy: +sjmachin
John Machin added the comment:
Before patching, could we discuss the requirements?
There are two different concepts:
(1) "text" file (assume that CR and/or LF are line terminators, and
provide methods for accessing a line at a time) versus "binary" file (no
such assumptions
John Machin added the comment:
... and it looks like Option 2 might already *almost* be in place.
Continuing with the previous example (book1.csv has embedded lone LFs):
C:\devel\csv>\python30\python -c "import csv;
print(repr(list(csv.reader(open('book1.csv',
John Machin added the comment:
pitrou> Please look at the doc for open() and io.TextIOWrapper. The
`newline` parameter defaults to None, which means universal newlines
with newline translation. Setting to '' (yes, the empty string) enables
universal newlines but disables newlin
John Machin added the comment:
The 2.6.1 documentation consists of a *single* line:
"distutils.command.bdist_msi — Build a Microsoft Installer binary
package". AFAICT this is the *only* mention of "msi" in the docs
(outside the msilib module). I heard about it only by word-o
John Machin added the comment:
About the E0 80 81 61 problem: my interpretation is that you are correct, the
80 is not valid in the current state (start byte == E0), so no look-ahead,
three FFFDs must be issued followed by 0061. I don't really care about issuing
too many FFFDs so long
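Current Pythons implement exactly the interpretation given here: no look-ahead past an invalid continuation byte, one U+FFFD per rejected byte:

```python
# E0 opens a 3-byte sequence but only accepts A0-BF as its first
# continuation byte, so E0, 80 and 81 are each replaced separately,
# and then 61 decodes as 'a' -- three FFFDs followed by 0061.
decoded = b"\xe0\x80\x81\x61".decode("utf-8", "replace")
print(ascii(decoded))  # '\ufffd\ufffd\ufffda'
```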