[issue46337] urllib.parse: Allow more flexibility in schemes and URL resolution behavior
karl added the comment: Just to note that IANA maintains a list of officially registered schemes: https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml In addition, Wikipedia has a list of unofficial schemes: https://en.wikipedia.org/wiki/List_of_URI_schemes#Unofficial_but_common_URI_schemes -- nosy: +karlcow ___ Python tracker <https://bugs.python.org/issue46337> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13294] http.server - HEAD request when no resource is defined.
New submission from karl : A very simple HTTP server:

```python
#!/usr/bin/python3
import http.server
from os import chdir

# CONFIG
ROOTPATH = '/Your/path/'
PORT = 8000

# CODE
def run(server_class=http.server.HTTPServer,
        server_handler=http.server.SimpleHTTPRequestHandler):
    server_address = ('', PORT)
    httpd = server_class(server_address, server_handler)
    httpd.serve_forever()

class MyRequestHandler(http.server.SimpleHTTPRequestHandler):
    def do_GET(self):
        pass

if __name__ == '__main__':
    chdir(ROOTPATH)
    print("server started on PORT: " + str(PORT))
    run(server_handler=MyRequestHandler)
```

Let's start the server:

% python3 serveur1.py
server started on PORT: 8000

And let's do a GET request with curl:

% curl -v http://localhost:8000/
* About to connect() to localhost port 8000 (#0)
* Trying ::1... Connection refused
* Trying 127.0.0.1... connected
* Connected to localhost (127.0.0.1) port 8000 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.21.4 (universal-apple-darwin11.0) libcurl/7.21.4 OpenSSL/0.9.8r zlib/1.2.5
> Host: localhost:8000
> Accept: */*
>
* Empty reply from server
* Connection #0 to host localhost left intact
curl: (52) Empty reply from server
* Closing connection #0

The server sends nothing, because do_GET is overridden to do nothing and I haven't defined anything in case of errors. So far so good. Now let's do a HEAD request on the same resource:

% curl -vsI http://localhost:8000/
* About to connect() to localhost port 8000 (#0)
* Trying ::1... Connection refused
* Trying 127.0.0.1... connected
* Connected to localhost (127.0.0.1) port 8000 (#0)
> HEAD / HTTP/1.1
> User-Agent: curl/7.21.4 (universal-apple-darwin11.0) libcurl/7.21.4 OpenSSL/0.9.8r zlib/1.2.5
> Host: localhost:8000
> Accept: */*
>
* HTTP 1.0, assume close after body
< HTTP/1.0 200 OK
HTTP/1.0 200 OK
< Server: SimpleHTTP/0.6 Python/3.1.2
Server: SimpleHTTP/0.6 Python/3.1.2
< Date: Sun, 30 Oct 2011 14:19:00 GMT
Date: Sun, 30 Oct 2011 14:19:00 GMT
< Content-type: text/html; charset=utf-8
Content-type: text/html; charset=utf-8
< Content-Length: 346
Content-Length: 346
<
* Closing connection #0

The server shows the request in its log:

localhost - - [30/Oct/2011 10:19:00] "HEAD / HTTP/1.1" 200 -

and answers it. I would suggest that the default behavior for HEAD be similar to the one for GET here, i.e. nothing; or that the library be modified so that, for any resource not yet defined, the server answers 403 Forbidden. I could submit a patch in the next few days. -- components: Library (Lib) messages: 146639 nosy: karlcow, orsenthil priority: normal severity: normal status: open title: http.server - HEAD request when no resource is defined. type: feature request versions: Python 3.1, Python 3.2 ___ Python tracker <http://bugs.python.org/issue13294> ___
[issue13295] html5 template for Lib/http/server.py
New submission from karl : The code has a set of old HTML templates. Here is a patch to change it to very simple html5 templates. -- components: Library (Lib) files: server-html5.patch keywords: patch messages: 146641 nosy: karlcow, orsenthil priority: normal severity: normal status: open title: html5 template for Lib/http/server.py type: feature request versions: Python 3.1, Python 3.2 Added file: http://bugs.python.org/file23554/server-html5.patch ___ Python tracker <http://bugs.python.org/issue13295> ___
[issue13295] html5 template for Lib/http/server.py
karl added the comment: Ezio, Martin, HTML 3.2 and HTML 4.01 are not outdated; they have stable specifications. That said, their doctypes have no influence at all in browsers. The html5 doctype was chosen because it is the minimal string of characters that puts browsers into strict rendering mode (see Quirks Mode in CSS). The W3C validator is the only tool implementing an SGML parser able to understand HTML 3.2 and HTML 4.01. Note also that the W3C validator includes an html5 validator, if the concern is the validity of the output. -- ___ Python tracker <http://bugs.python.org/issue13295> ___
[issue13295] html5 template for Lib/http/server.py
karl added the comment: Yup. It doesn't bring anything except putting the output in line with the reality of browser implementations. You may close it, I don't mind. -- ___ Python tracker <http://bugs.python.org/issue13295> ___
[issue13294] http.server - HEAD request when no resource is defined.
karl added the comment: Eric, two possible solutions to explore: either HEAD reports exactly the same thing as a GET without the body (that is the role of HEAD), which indeed means adding support for HEAD; or we create a catch-all answer for any unknown or unimplemented method, with a "501 Method not implemented" response from the server. Right now HEAD returns something :) I still need to propose a patch; the day job gets in the way :) -- ___ Python tracker <http://bugs.python.org/issue13294> ___
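The second option could be sketched like this (the handler name here is hypothetical, not the actual patch). Note that BaseHTTPRequestHandler already answers 501 for any method that has no matching do_<METHOD> handler, so the catch-all mostly amounts to not inheriting SimpleHTTPRequestHandler's default do_HEAD:

```python
import http.server

class StrictHandler(http.server.SimpleHTTPRequestHandler):
    # Hypothetical sketch: instead of inheriting the default do_HEAD,
    # answer 501 so an unimplemented method is explicit. Methods with
    # no do_<METHOD> at all already get a 501 from the base class.
    def do_HEAD(self):
        self.send_error(501, "Method not implemented")
```

A HEAD request against a server built with this handler then gets a clean 501 instead of the inherited 200.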
[issue2193] Cookie Colon Name Bug
karl added the comment: @Luke did you have the opportunity to look at http://greenbytes.de/tech/webdav/rfc6265.html ? If there is something in that document which doesn't match reality, it would be good to have feedback about it. -- ___ Python tracker <http://bugs.python.org/issue2193> ___
[issue6098] xml.dom.minidom incorrectly claims DOM Level 3 conformance
karl added the comment: The source of 3.1/lib/python3.1/xml/dom/__init__.py is correct: === minidom -- A simple implementation of the Level 1 DOM with namespace support added (based on the Level 2 specification) and other minor Level 2 functionality. === Even the Level 2 implementation is partial. Some of the Level 3 implementation is based on a 9 April 2002 Working Draft; comments like this one are in the code: # Node interfaces from Level 3 (WD 9 April 2002) Note that a big change in the xml.dom code will be needed to implement Web DOM Core in the future; maybe an xml.dom.webdomcore would be welcome. http://www.w3.org/TR/domcore/ The request is valid. -- nosy: +karlcow ___ Python tracker <http://bugs.python.org/issue6098> ___
[issue5762] AttributeError: 'NoneType' object has no attribute 'replace'
karl added the comment: The markup that triggers the error described earlier in the comments is a root element carrying an empty default namespace declaration; the same markup without xmlns="" serializes correctly. So the mistake really occurs when xmlns="". I have checked, and such markup is conformant according to the XML specification: xmlns="" or bar="" are conformant on the root element. XML Namespaces are defined in another specification, http://www.w3.org/TR/REC-xml-names/. In the section on namespace defaulting, http://www.w3.org/TR/REC-xml-names/#defaulting, the specification is clear: "The attribute value in a default namespace declaration MAY be empty. This has the same effect, within the scope of the declaration, of there being no default namespace." The "if data:" guard proposed earlier in the comments solves the issue. I have attached a unit testcase as requested by Mark Lawrence (BreamoreBoy). -- keywords: +patch nosy: +karlcow Added file: http://bugs.python.org/file19239/test-minidom-xmlns.patch ___ Python tracker <http://bugs.python.org/issue5762> ___
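A minimal reproduction sketch (the element and attribute names are just examples): on affected versions, serializing a document whose root carries xmlns="" raised the AttributeError from this issue's title; with the proposed guard it round-trips cleanly:

```python
from xml.dom.minidom import parseString

# xmlns="" (an empty default namespace declaration) is legal XML.
doc = parseString('<root xmlns="" bar=""/>')
print(doc.documentElement.toxml())
```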
[issue2193] Cookie Colon Name Bug
karl added the comment: The rules for parsing and for setting cookies are different. A server should always produce strict cookies, so the production rules should follow the specification. Adam Barth is working right now on an update of the "HTTP State Management Mechanism" specification; see http://tools.ietf.org/html/draft-ietf-httpstate-cookie The name production rules are still the "token" rules defined in RFC 2616. Which characters browsers ignore or not varies from browser to browser. (The IETF server is down right now, and I can't link to the appropriate section for parsing the values.) -- nosy: +karlcow ___ Python tracker <http://bugs.python.org/issue2193> ___
[issue2193] Cookie Colon Name Bug
karl added the comment: Ah, the server is back. The rules for user agents are defined here: http://tools.ietf.org/html/draft-ietf-httpstate-cookie#section-5 -- ___ Python tracker <http://bugs.python.org/issue2193> ___
[issue2193] Cookie Colon Name Bug
karl added the comment: John: Ah sorry if I misunderstood. The bug seems to be about the cookie name and which characters are legal in it. What I was trying to say is that the processing of the cookie name differs depending on whether you are a client or a server, *and* that there is a specification being developed by Adam Barth (with browser vendors) to obsolete RFC 2109.

In the case of a server sending Set-Cookie: Name=Value to the client, the production rules must always be strict. That is, when the module is used for creating a cookie, the "colon" character is indeed forbidden. The "token" syntax for valid and invalid characters is now the one defined in RFC 2616. It means that any US-ASCII character is authorized EXCEPT: the control characters (octets 0-31) and DEL (octet 127); the characters "(", ")", "<", ">", "@", ",", ";", ":", "\", "/", "[", "]", "?", "=", "{", "}"; the double quote character itself; US-ASCII SP (octet 32); and the horizontal tab (octet 9).

If you use the Cookie module for a client, it is no longer the same story. In the case of a client storing the value of a cookie sent by a server, see section "5.2. The Set-Cookie Header", http://tools.ietf.org/html/draft-ietf-httpstate-cookie-20#section-5.2 quote: "If the user agent does not ignore the Set-Cookie header field in its entirety, the user agent MUST parse the field-value of the Set-Cookie header field as a set-cookie-string (defined below). NOTE: The algorithm below is more permissive than the grammar in Section 4.1. For example, the algorithm strips leading and trailing whitespace from the cookie name and value (but maintains internal whitespace), whereas the grammar in Section 4.1 forbids whitespace in these positions. User agents use this algorithm so as to interoperate with servers that do not follow the recommendations in Section 4." /quote Then the algorithm is described.

Which means that what the server parses back will not necessarily be what the server generated. Section 5.4 describes how the Cookie header should be sent to the server, with an algorithm for what the server will receive. John, do you think there is a missing algorithm for parsing the value of the Cookie header when sent by the client? -- ___ Python tracker <http://bugs.python.org/issue2193> ___
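On the production side, today's http.cookies module does enforce strict names: an illegal character in a cookie name raises CookieError. A small illustration (the cookie names are just examples):

```python
from http.cookies import SimpleCookie, CookieError

c = SimpleCookie()
c['session_id'] = 'abc123'      # a legal token name is accepted
print(c.output())

try:
    c['bad key'] = 'x'          # a separator character (here: space) is rejected
except CookieError as exc:
    print('rejected:', exc)
```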
[issue2193] Cookie Colon Name Bug
karl added the comment: Agreed. :) Hence my question about parsing rules for libraries: is interoperability a goal here? -- ___ Python tracker <http://bugs.python.org/issue2193> ___
[issue2193] Cookie Colon Name Bug
karl added the comment: @aclover see my comment http://bugs.python.org/issue2193#msg125423 Adam Barth works for Google on Chrome. The RFC is being written in cooperation with other browser developers. If you have comments about this RFC, you are welcome to raise them on Freenode in #whatwg. -- ___ Python tracker <http://bugs.python.org/issue2193> ___
[issue18182] xml.dom.createElement() does not take implicit namespaces into account
karl added the comment: The current specification, as of today, documents https://dom.spec.whatwg.org/#dom-document-createelementns

If you run this in the browser console:

```js
var nsdoc = 'http://foo.bar/zoo';
var xmldoc = document.implementation.createDocument(nsdoc, 'Zoo', null);
var cpd = document.createElementNS(nsdoc, 'Compound');
var chimp = document.createElementNS(nsdoc, 'Chimp');
cpd.appendChild(chimp);
xmldoc.documentElement.appendChild(cpd);
/* serializing */
var docserializer = new XMLSerializer();
var flatxml = docserializer.serializeToString(xmldoc);
flatxml
```

you get a document entirely in the http://foo.bar/zoo namespace:

<Zoo xmlns="http://foo.bar/zoo"><Compound><Chimp/></Compound></Zoo>

But if you run the same thing with createElement instead of createElementNS:

```js
var nsdoc = 'http://foo.bar/zoo';
var xmldoc = document.implementation.createDocument(nsdoc, 'Zoo', null);
var cpd = document.createElement('Compound');
var chimp = document.createElement('Chimp');
cpd.appendChild(chimp);
xmldoc.documentElement.appendChild(cpd);
/* serializing */
var docserializer = new XMLSerializer();
var flatxml = docserializer.serializeToString(xmldoc);
flatxml
```

you get:

<Zoo xmlns="http://foo.bar/zoo"><Compound xmlns="http://www.w3.org/1999/xhtml"><Chimp/></Compound></Zoo>

createElement puts the elements in the XHTML namespace http://www.w3.org/1999/xhtml, which is a completely different beast. I don't think there is an issue here, and we can close this bug safely. -- nosy: +karlcow ___ Python tracker <https://bugs.python.org/issue18182> ___
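For comparison, a sketch of a similar construction with Python's minidom (reusing the example namespace URI and names from above): createElementNS records the namespace on the node, but minidom's serializer does not emit xmlns declarations for you, which is essentially the limitation this issue is about:

```python
from xml.dom.minidom import getDOMImplementation

NS = 'http://foo.bar/zoo'
impl = getDOMImplementation()
doc = impl.createDocument(NS, 'Zoo', None)

cpd = doc.createElementNS(NS, 'Compound')
doc.documentElement.appendChild(cpd)

# The namespace is tracked on the node object...
print(cpd.namespaceURI)  # http://foo.bar/zoo
# ...but no xmlns attribute is serialized unless you set one yourself.
print(doc.documentElement.toxml())
```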
[issue22377] %Z in strptime doesn't match EST and others
karl added the comment: I created a PR following the recommendations of p-ganssle https://github.com/python/cpython/pull/16507 -- nosy: +karlcow ___ Python tracker <https://bugs.python.org/issue22377> ___
[issue19683] test_minidom has many empty tests
karl added the comment: @zach.ware @r.david.murray So I was looking at that issue. There is a lot of work. I have a couple of questions, because there are different categories.

# Empty tests for existing functions.

This seems straightforward, as filling them in would complete the module. Example:

```python
def testGetAttributeNode(self):
    pass
```

https://github.com/python/cpython/blob/3e04cd268ee9a57f95dc78d8974b21a6fac3f666/Lib/test/test_minidom.py#L412 which refers to `getAttributeNode`: https://github.com/python/cpython/blob/3e04cd268ee9a57f95dc78d8974b21a6fac3f666/Lib/xml/dom/minidom.py#L765-L768 https://github.com/python/cpython/blob/3e04cd268ee9a57f95dc78d8974b21a6fac3f666/Lib/test/test_minidom.py#L285-L294

# Tests without any logical reference in the module.

This is puzzling because I'm not sure which DOM feature they should be testing. For example:

```python
def testGetAttrList(self):
    pass
```

https://github.com/python/cpython/blob/3e04cd268ee9a57f95dc78d8974b21a6fac3f666/Lib/test/test_minidom.py#L383-L384

Or maybe this is just supposed to test Element.attributes returning a list of attributes, such as the `NamedNodeMap [ def="ghi", jkl="mno" ]` returned by a browser:

```python
>>> import xml.dom.minidom
>>> from xml.dom.minidom import parse, Node, Document, parseString
>>> from xml.dom.minidom import getDOMImplementation
>>> dom = parseString("<abc/>")
>>> el = dom.documentElement
>>> el.setAttribute("def", "ghi")
>>> el.setAttribute("jkl", "mno")
>>> el.attributes
```

or is it supposed to test something like:

```python
>>> el.attributes.items()
[('def', 'ghi'), ('jkl', 'mno')]
```

This is slightly confusing, and the missing docstrings are not making it easier.

# Tests which do not really test the module(?)

I think for example of this one, which tests that `del` works, but which doesn't have anything to do with the DOM:

```python
def testDeleteAttr(self):
    dom = Document()
    child = dom.appendChild(dom.createElement("abc"))
    self.confirm(len(child.attributes) == 0)
    child.setAttribute("def", "ghi")
    self.confirm(len(child.attributes) == 1)
    del child.attributes["def"]
    self.confirm(len(child.attributes) == 0)
    dom.unlink()
```

https://github.com/python/cpython/blob/3e04cd268ee9a57f95dc78d8974b21a6fac3f666/Lib/test/test_minidom.py#L285-L294

especially when there is a function for it, `removeAttribute` (https://www.w3.org/TR/2000/REC-DOM-Level-2-Core-20001113/core.html#ID-6D6AC0F9), which is tested just below that test: https://github.com/python/cpython/blob/3e04cd268ee9a57f95dc78d8974b21a6fac3f666/Lib/test/test_minidom.py#L296-L305 So I guess these should be removed, or am I missing something in the testing logic?

# Missing docstrings.

Both the testing module and the module itself lack a lot of docstrings. Would it be good to fix this too, probably in a separate commit?

# DOM Level 2

So the module's intent is to implement DOM Level 2, but does that make sense in light of https://dom.spec.whatwg.org/ ? Should minidom try to follow the current DOM spec? -- nosy: +karlcow ___ Python tracker <https://bugs.python.org/issue19683> ___
[issue19683] test_minidom has many empty tests
karl added the comment: err… erratum to my previous comment. The module docstring actually says: """ Simple implementation of the Level 1 DOM. Namespaces and other minor Level 2 features are also supported. """ https://github.com/python/cpython/blob/c65119d5bfded03f80a9805889391b66fa7bf551/Lib/xml/dom/minidom.py#L1-L3 https://www.w3.org/TR/REC-DOM-Level-1/ -- ___ Python tracker <https://bugs.python.org/issue19683> ___
[issue9004] datetime.utctimetuple() should not set tm_isdst flag to 0
karl added the comment: @gaurav The pull request https://github.com/python/cpython/pull/10870 has been closed in favor of https://github.com/python/cpython/pull/15773 which has already been merged. So we can probably close here. -- message_count: 7.0 -> 8.0 nosy: +karlcow nosy_count: 7.0 -> 8.0 pull_requests: +16145 pull_request: https://github.com/python/cpython/pull/15773 ___ Python tracker <https://bugs.python.org/issue9004> ___
[issue1375011] http.cookies, Cookie.py: Improper handling of duplicate cookies
karl added the comment: Relevant spec https://tools.ietf.org/html/rfc6265 -- nosy: +karlcow ___ Python tracker <https://bugs.python.org/issue1375011> ___
[issue44423] copy2 / sendfile fails on linux with large file
New submission from karl : When copying a large file, e.g.

-rwxrwxr-x 1 1002 1001 5359338160 Feb 9 2019 xxx_file_xxx.mdx

copy2 / sendfile / fastcopy fails with:

```
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/usr/local/lib/python3.8/dist-packages/pybcpy/diff_bak_copy.py", line 212, in _init_copy_single
    shutil.copy2(f, dest_path)
  File "/usr/lib/python3.8/shutil.py", line 432, in copy2
    copyfile(src, dst, follow_symlinks=follow_symlinks)
  File "/usr/lib/python3.8/shutil.py", line 272, in copyfile
    _fastcopy_sendfile(fsrc, fdst)
  File "/usr/lib/python3.8/shutil.py", line 169, in _fastcopy_sendfile
    raise err
  File "/usr/lib/python3.8/shutil.py", line 149, in _fastcopy_sendfile
    sent = os.sendfile(outfd, infd, offset, blocksize)
OSError: [Errno 75] Value too large for defined data type: 'xxx_file_xxx.mdx' -> 'dest/xxx_file_xxx.mdx'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.8/dist-packages/pybcpy/__main__.py", line 433, in <module>
    main_func()
  File "/usr/local/lib/python3.8/dist-packages/pybcpy/__main__.py", line 425, in main_func
    args.func(args)
  File "/usr/local/lib/python3.8/dist-packages/pybcpy/__main__.py", line 75, in cmd_init
    ) = dbak.init_backup_repo(tarmode=args.tar)
  File "/usr/local/lib/python3.8/dist-packages/pybcpy/diff_bak_copy.py", line 231, in init_backup_repo
    files = p.map(self._init_copy_single, ifiles)
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value
OSError: [Errno 75] Value too large for defined data type: 'xxx_file_xxx.mdx' -> 'dest/xxx_file_xxx.mdx'
```

Reference to the calling code: https://github.com/kr-g/pybcpy/blob/master/pybcpy/diff_bak_copy.py -- messages: 395862 nosy: kr-g priority: normal severity: normal status: open title: copy2 / sendfile fails on linux with large file type: crash versions: Python 3.8 ___ Python tracker <https://bugs.python.org/issue44423> ___
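Until the root cause is identified, a user-level workaround can be sketched as follows (a hypothetical helper, not part of shutil): fall back to a plain buffered copy when the sendfile() fast path fails with EOVERFLOW (errno 75):

```python
import errno
import shutil

def copy2_with_fallback(src, dst):
    """Hypothetical workaround sketch: if shutil.copy2's fast path fails
    with EOVERFLOW (errno 75, as in this report), retry with a plain
    buffered copy via shutil.copyfileobj and re-apply the metadata."""
    try:
        shutil.copy2(src, dst)
    except OSError as exc:
        if exc.errno != errno.EOVERFLOW:
            raise
        with open(src, 'rb') as fsrc, open(dst, 'wb') as fdst:
            shutil.copyfileobj(fsrc, fdst)
        shutil.copystat(src, dst)
```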
[issue44423] copy2 / sendfile fails on linux with large file
karl added the comment: could not reproduce the error -- stage: -> resolved status: open -> closed ___ Python tracker <https://bugs.python.org/issue44423> ___
[issue25479] Increase unit test coverage for abc.py
Change by karl : -- keywords: +patch nosy: +karlcow nosy_count: 2.0 -> 3.0 pull_requests: +22875 pull_request: https://github.com/python/cpython/pull/24034 ___ Python tracker <https://bugs.python.org/issue25479> ___
[issue25479] Increase unit test coverage for abc.py
karl added the comment: @iritkatriel GitHub PR done: https://github.com/python/cpython/pull/24034 -- ___ Python tracker <https://bugs.python.org/issue25479> ___
[issue4643] cgitb.html fails if getattr call raises exception
Change by karl : -- nosy: +karlcow nosy_count: 4.0 -> 5.0 pull_requests: +22878 stage: test needed -> patch review pull_request: https://github.com/python/cpython/pull/24038 ___ Python tracker <https://bugs.python.org/issue4643> ___
[issue4643] cgitb.html fails if getattr call raises exception
karl added the comment: Converted into GitHub PR https://github.com/python/cpython/pull/24038 -- ___ Python tracker <https://bugs.python.org/issue4643> ___
[issue4643] cgitb.html fails if getattr call raises exception
karl added the comment: > The getattr call here has a default value, so it should not raise > AttributeError. It should also not raise any other exception because a valid > implementation of __getattr__ should raise only AttributeError: True, but the intent of the patch here is probably to surface a meaningful error about the script being inspected rather than an error in the cgitb library itself, since cgitb is a traceback tool. A bit like an HTML validator which keeps processing in order to produce something meaningful. Diving into previous issues about scanvars: https://bugs.python.org/issue966992 and the similar https://bugs.python.org/issue1047397 The last comment there is basically this issue. There is a patch which is a lot better for this bug and which addresses the issues here: https://github.com/python/cpython/pull/15094 It has not been merged yet; not sure why, or whether anything is missing. This bug could probably be closed as a duplicate of https://bugs.python.org/issue966992 and someone needs to push the latest bits for https://github.com/python/cpython/pull/15094 What do you think @iritkatriel? I will close my PR. -- ___ Python tracker <https://bugs.python.org/issue4643> ___
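The point quoted above can be illustrated with a minimal sketch (the class is hypothetical): getattr's default only swallows AttributeError, so a misbehaving __getattr__ can still blow up inside a traceback formatter:

```python
class Misbehaving:
    # A broken __getattr__: a valid implementation should raise
    # only AttributeError, but this one raises something else.
    def __getattr__(self, name):
        raise RuntimeError("boom")

obj = Misbehaving()
try:
    getattr(obj, "missing", None)  # the default value does not help here
except RuntimeError as exc:
    print("escaped:", exc)
```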
[issue41748] HTMLParser: parsing error
karl added the comment: Ezio, TL;DR: testing in browsers and adding two tests for this issue. Should I create a PR just for the tests? https://github.com/python/cpython/blame/63298930fb531ba2bb4f23bc3b915dbf1e17e9e1/Lib/test/test_htmlparser.py#L479-L485

A: comma without spaces
---
Test for browsers: data:text/html,<div class=bar,baz=asd>text</div>
Serializations:
* Firefox, Gecko (86.0a1 (2020-12-28) (64-bit))
* Edge, Blink (Version 89.0.752.0 (Version officielle) Canary (64 bits))
* Safari, WebKit (Release 117 (Safari 14.1, WebKit 16611.1.7.2))
Same serialization in these 3 rendering engines:
<div class="bar,baz=asd">text</div>

Adding:

```python
def test_comma_between_unquoted_attributes(self):
    # bpo 41748
    self._run_check('<div class=bar,baz=asd>',
                    [('starttag', 'div', [('class', 'bar,baz=asd')])])
```

❯ ./python.exe -m test -v test_htmlparser
…
test_comma_between_unquoted_attributes (test.test_htmlparser.HTMLParserTestCase) ... ok
…
Ran 47 tests in 0.168s
OK
== Tests result: SUCCESS ==
1 test OK.
Total duration: 369 ms
Tests result: SUCCESS

So this is working as expected for the first test.

B: comma with spaces
---
Test for browsers: data:text/html,<div class=bar ,baz=asd>text</div>
Serializations:
* Firefox, Gecko (86.0a1 (2020-12-28) (64-bit))
* Edge, Blink (Version 89.0.752.0 (Version officielle) Canary (64 bits))
* Safari, WebKit (Release 117 (Safari 14.1, WebKit 16611.1.7.2))
Same serialization in these 3 rendering engines:
<div class="bar" ,baz="asd">text</div>

Adding:

```python
def test_comma_with_space_between_unquoted_attributes(self):
    # bpo 41748
    self._run_check('<div class=bar ,baz=asd>',
                    [('starttag', 'div', [('class', 'bar'), (',baz', 'asd')])])
```

❯ ./python.exe -m test -v test_htmlparser

This is failing:

== FAIL: test_comma_with_space_between_unquoted_attributes (test.test_htmlparser.HTMLParserTestCase)
--
Traceback (most recent call last):
  File "/Users/karl/code/cpython/Lib/test/test_htmlparser.py", line 493, in test_comma_with_space_between_unquoted_attributes
    self._run_check('<div class=bar ,baz=asd>',
  File "/Users/karl/code/cpython/Lib/test/test_htmlparser.py", line 95, in _run_check
    self.fail("received events did not match expected events" +
AssertionError: received events did not match expected events
Source: '<div class=bar ,baz=asd>'
Expected: [('starttag', 'div', [('class', 'bar'), (',baz', 'asd')])]
Received: [('data', '')]
--

I started to look into the code of parser.py, which I'm not familiar with (yet): https://github.com/python/cpython/blob/63298930fb531ba2bb4f23bc3b915dbf1e17e9e1/Lib/html/parser.py#L42-L52 Do you have a suggestion to fix it? -- nosy: +karlcow ___ Python tracker <https://bugs.python.org/issue41748> ___
[issue41748] HTMLParser: comma in attribute values with/without space
Change by karl : -- title: HTMLParser: parsing error -> HTMLParser: comma in attribute values with/without space ___ Python tracker <https://bugs.python.org/issue41748> ___
[issue41748] HTMLParser: comma in attribute values with/without space
karl added the comment: Ah! This is fixing it:

```diff
diff --git a/Lib/html/parser.py b/Lib/html/parser.py
index 6083077981..790666 100644
--- a/Lib/html/parser.py
+++ b/Lib/html/parser.py
@@ -44,7 +44,7 @@
   (?:\s*=+\s*                # value indicator
     (?:'[^']*'               # LITA-enclosed value
       |"[^"]*"               # LIT-enclosed value
-      |(?!['"])[^>\s]*       # bare value
+      |(?!['"])[^>]*         # bare value
     )
     (?:\s*,)*                # possibly followed by a comma
   )?(?:\s|/(?!>))*
```

Ran 48 tests in 0.175s
OK
== Tests result: SUCCESS ==
-- ___ Python tracker <https://bugs.python.org/issue41748> ___
[issue41748] HTMLParser: comma in attribute values with/without space
Change by karl : -- keywords: +patch pull_requests: +22904 stage: test needed -> patch review pull_request: https://github.com/python/cpython/pull/24072 ___ Python tracker <https://bugs.python.org/issue41748> ___
[issue28937] str.split(): allow removing empty strings (when sep is not None)
Change by karl : -- nosy: +karlcow ___ Python tracker <https://bugs.python.org/issue28937> ___
[issue42821] HTMLParser: subsequent duplicate attributes should be ignored
New submission from karl : This came up while working on issue 41748.

browser input: data:text/html,<div class=bar class=foo>text</div>
browser output: <div class="bar">text</div>

Actual HTMLParser output (see https://github.com/python/cpython/pull/24072#discussion_r551158342):
('starttag', 'div', [('class', 'bar'), ('class', 'foo')])

Expected HTMLParser output:
('starttag', 'div', [('class', 'bar')])

-- components: Library (Lib) messages: 384308 nosy: karlcow priority: normal severity: normal status: open title: HTMLParser: subsequent duplicate attributes should be ignored versions: Python 3.10 ___ Python tracker <https://bugs.python.org/issue42821> ___
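A sketch of the expected behavior (a hypothetical helper, not part of the HTMLParser API): keep only the first occurrence of each attribute name, as browsers do:

```python
def dedupe_attrs(attrs):
    # Keep only the first occurrence of each attribute name,
    # matching what browsers do with duplicate attributes.
    seen = {}
    for name, value in attrs:
        seen.setdefault(name, value)  # later duplicates are ignored
    return list(seen.items())

print(dedupe_attrs([('class', 'bar'), ('class', 'foo')]))  # [('class', 'bar')]
```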
[issue25258] HtmlParser doesn't handle void element tags correctly
karl added the comment: The parsing rules for tokenization of HTML are at https://html.spec.whatwg.org/multipage/parsing.html#tokenization In the stack of open elements, there are specific rules for certain elements: https://html.spec.whatwg.org/multipage/parsing.html#special From a DOM point of view, there is indeed no difference between <img src="somewhere"> and <img src="somewhere"/>: https://software.hixie.ch/utilities/js/live-dom-viewer/?%3C!DOCTYPE%20html%3E%0A%3Cimg%20src%3D%22somewhere%22%3E%3Cimg%20src%3D%22somewhere%22%2F%3E -- nosy: +karlcow ___ Python tracker <https://bugs.python.org/issue25258> ___
[issue25258] HtmlParser doesn't handle void element tags correctly
karl added the comment: I wonder if the confusion comes from the name. HTMLParser is more of a tokenizer than a full HTML parser, but that's probably a detail. It doesn't create a DOM tree you can access, but it could help you build one (which is not the same thing as a DOM Document object). https://html.spec.whatwg.org/multipage/parsing.html#overview-of-the-parsing-model

> Implementations that do not support scripting do not have to actually create
> a DOM Document object, but the DOM tree in such cases is still used as the
> model for the rest of the specification.

-- ___ Python tracker <https://bugs.python.org/issue25258> ___
[issue19683] test_minidom has many empty tests
karl added the comment: @zach.ware @r.david.murray I'm going through the source currently. I see that the test file is using:

    class MinidomTest(unittest.TestCase):
        def confirm(self, test, testname="Test"):
            self.assertTrue(test, testname)

Is there a specific reason to use this form instead of directly using self.assertEqual or similar methods for new tests, or for reorganizing some of the existing ones? I see that it is used, for example, for giving a testname, but in

    def testAAA(self):
        dom = parseString("<abc/>")
        el = dom.documentElement
        el.setAttribute("spam", "jam2")
        self.confirm(el.toxml() == '<abc spam="jam2"/>', "testAAA")

testAAA is not specifically helping. :)

-- ___ Python tracker <https://bugs.python.org/issue19683> ___
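For comparison, a hypothetical rewrite of such a test with assertEqual, which reports both values on failure and makes the testname label unnecessary (a sketch, not actual test_minidom code):

```python
import unittest
from xml.dom.minidom import parseString

class MinidomExampleTest(unittest.TestCase):
    def test_set_attribute(self):
        dom = parseString("<abc/>")
        el = dom.documentElement
        el.setAttribute("spam", "jam2")
        # On failure, assertEqual prints both sides of the comparison,
        # so no label argument like "testAAA" is needed.
        self.assertEqual(el.toxml(), '<abc spam="jam2"/>')

# Run the single test programmatically instead of via unittest.main().
result = unittest.TestResult()
unittest.defaultTestLoader.loadTestsFromTestCase(MinidomExampleTest).run(result)
print(result.wasSuccessful())  # True
```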
[issue19683] test_minidom has many empty tests
karl added the comment: These methods are not used anywhere in the code. https://github.com/python/cpython/blob/5c30145afb6053998e3518befff638d207047f00/Lib/xml/dom/minidom.py#L71-L80 What was the purpose when they were created… hmm, maybe blame would give a clue. Ah, they were added a long time ago https://github.com/python/cpython/commit/73678dac48e5858e40cba6d526970cba7e7c769c#diff-365c30899ded02b18a2d8f92de47af6ca213eefe7883064c8723598da600ea42R83-R88 but never used? Or was it in the spirit of reserving the name for future use? https://developer.mozilla.org/en-US/docs/Web/API/Node/firstChild -- ___ Python tracker <https://bugs.python.org/issue19683> ___
[issue19683] test_minidom has many empty tests
karl added the comment: Ah no. They ARE used, through defproperty and minicompat.py:

    get = getattr(klass, ("_get_" + name))

-- ___ Python tracker <https://bugs.python.org/issue19683> ___
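For context, a condensed sketch of the defproperty pattern (modeled on, not copied from, Lib/xml/dom/minicompat.py); the indirect getattr lookup is exactly why a plain text search misses the _get_* call sites:

```python
def defproperty(klass, name, doc):
    # Look up the accessor indirectly -- the kind of call that hides
    # the usage of _get_firstChild, _get_childNodes, ... from grep.
    get = getattr(klass, "_get_" + name)
    def set(self, value, name=name):
        raise AttributeError("attribute '%s' is read-only" % name)
    setattr(klass, name, property(get, set, doc=doc))

class Node:
    def _get_firstChild(self):
        return "first-child-node"

defproperty(Node, "firstChild", doc="The first child of this node.")
print(Node().firstChild)  # first-child-node
```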
[issue19683] test_minidom has many empty tests
Change by karl : -- pull_requests: +22980 stage: needs patch -> patch review pull_request: https://github.com/python/cpython/pull/24152 ___ Python tracker <https://bugs.python.org/issue19683> ___
[issue41748] HTMLParser: comma in attribute values with/without space
karl added the comment: Status: the PR should be ready and complete https://github.com/python/cpython/pull/24072 and will eventually be merged at some point. Thanks to ezio.melotti for the wonderful guidance. -- ___ Python tracker <https://bugs.python.org/issue41748> ___
[issue36661] Missing dataclass decorator import in dataclasses module docs
karl added the comment: This should be closed. The PR has been merged and the doc is now up to date. -- nosy: +karlcow ___ Python tracker <https://bugs.python.org/issue36661> ___
[issue40236] datetime.datetime.strptime get day error
karl added the comment: Same on macOS 10.15.6 (19G73).

Python 3.8.3 (v3.8.3:6f8c8320e9, May 13 2020, 16:29:34)
[Clang 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import datetime
>>> datetime.datetime.strptime("2024-0-3 00:00:00", "%Y-%W-%w %H:%M:%S")
datetime.datetime(2024, 1, 3, 0, 0)
>>> datetime.datetime.strptime("2024-1-3 00:00:00", "%Y-%W-%w %H:%M:%S")
datetime.datetime(2024, 1, 3, 0, 0)

Also https://pubs.opengroup.org/onlinepubs/007908799/xsh/strptime.html

Note that ISO 8601 doesn't have this issue:

    %V - ISO 8601 week of the year as a decimal number [01, 53].

https://en.wikipedia.org/wiki/ISO_week_date

-- nosy: +karlcow ___ Python tracker <https://bugs.python.org/issue40236> ___
[issue40236] datetime.datetime.strptime get day error
karl added the comment: Also this.

>>> import datetime
>>> d0 = datetime.datetime.strptime("2024-0-3 00:00:00", "%Y-%W-%w %H:%M:%S")
>>> d0.strftime("%Y-%W-%w %H:%M:%S")
'2024-01-3 00:00:00'
>>> d1 = datetime.datetime.strptime("2024-1-3 00:00:00", "%Y-%W-%w %H:%M:%S")
>>> d1.strftime("%Y-%W-%w %H:%M:%S")
'2024-01-3 00:00:00'
>>> d2301 = datetime.datetime.strptime("2023-0-1 00:00:00", "%Y-%W-%w %H:%M:%S")
>>> d2311 = datetime.datetime.strptime("2023-1-1 00:00:00", "%Y-%W-%w %H:%M:%S")
>>> d2301
datetime.datetime(2022, 12, 26, 0, 0)
>>> d2311
datetime.datetime(2023, 1, 2, 0, 0)
>>> d2311.strftime("%Y-%W-%w %H:%M:%S")
'2023-01-1 00:00:00'
>>> d2301.strftime("%Y-%W-%w %H:%M:%S")
'2022-52-1 00:00:00'

Week 0 of 2023 became week 52 of 2022 (which is correct but might lead to surprises).

-- ___ Python tracker <https://bugs.python.org/issue40236> ___
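An aside relative to the report above: the ISO 8601 week directives added in Python 3.6 (%G, %V, %u) sidestep the week-0 ambiguity, since ISO weeks are numbered 01-53 and round-trip cleanly:

```python
import datetime

# %G = ISO year, %V = ISO week (01-53), %u = ISO weekday (1 = Monday).
# ISO week 1 of 2023 starts on Monday 2023-01-02.
d = datetime.datetime.strptime("2023-01-1 00:00:00", "%G-%V-%u %H:%M:%S")
print(d)                       # 2023-01-02 00:00:00
print(d.strftime("%G-%V-%u"))  # 2023-01-1 -- round-trips, unlike %W week 0
```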
[issue42104] xml.etree should support contains() function
New submission from karl : In XPath 1.0 the function contains() is available:

> Function: boolean contains(string, string)
> The contains function returns true if the first argument string contains the
> second argument string, and otherwise returns false.

In https://www.w3.org/TR/1999/REC-xpath-19991116/#function-contains

```
<p class="doc">One attribute: doc</p>
<p class="doc test">Two Attributes: doc test</p>
<p class="test">One Attribute: test</p>
```

Currently, we can do this:

```
>>> from lxml import etree
>>> root = etree.fromstring("""
... <root>
...   <p class="doc">One attribute</p>
...   <p class="doc test">Two Attributes: doc test</p>
...   <p class="doc2 test">Two Attributes: doc2 test</p>
... </root>
... """)
>>> elts = root.xpath("//p[@class='doc']")
>>> elts, etree.tostring(elts[0])
([<Element p at 0x...>], b'<p class="doc">One attribute</p>\n ')
```

One way of extracting the list of 2 elements which contain the attribute value doc with XPath is:

```
>>> root.xpath("//p[contains(@class, 'doc')]")
[<Element p at 0x...>, <Element p at 0x...>]
>>> [etree.tostring(elt) for elt in root.xpath("//p[contains(@class, 'doc')]")]
[b'<p class="doc">One attribute: doc</p>\n ', b'<p class="doc test">Two Attributes: doc test</p>\n ']
```

There is no easy way to extract all elements containing a "doc" value in a multi-valued attribute in Python 3.10 with xml.etree, which is quite common in HTML.

```
>>> import xml.etree.ElementTree as ET
>>> root = ET.fromstring("""
... <root>
...   <p class="doc">One attribute: doc</p>
...   <p class="doc test">Two Attributes: doc test</p>
...   <p class="test">One Attribute: test</p>
... </root>
... """)
>>> root.xpath("//p[contains(@class, 'doc')]")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'xml.etree.ElementTree.Element' object has no attribute 'xpath'
```

-- components: Library (Lib) messages: 379185 nosy: karlcow priority: normal severity: normal status: open title: xml.etree should support contains() function type: enhancement versions: Python 3.10 ___ Python tracker <https://bugs.python.org/issue42104> ___
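Until such support exists, the usual workaround is to filter in Python; a sketch using sample markup assumed from the report (not code from the original message):

```python
import xml.etree.ElementTree as ET

# Sample markup assumed from the report: class attributes 'doc',
# 'doc test' and 'test'.
root = ET.fromstring("""
<root>
  <p class="doc">One attribute: doc</p>
  <p class="doc test">Two Attributes: doc test</p>
  <p class="test">One Attribute: test</p>
</root>
""")

# Emulate //p[contains(@class, 'doc')]; splitting on whitespace also
# avoids the substring pitfall where a plain contains() would match
# an unrelated token such as class="doc2".
matches = [p for p in root.iter('p') if 'doc' in p.get('class', '').split()]
print([p.get('class') for p in matches])  # ['doc', 'doc test']
```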
[issue24772] Smaller viewport shifts the "expand left menu" character into the text
karl added the comment: I'm at the Mozilla All Hands this week. I'll check next week whether my solution still makes sense, and will make a pull request and/or propose another solution. Thanks for the reminder; adding it to my calendar. -- ___ Python tracker <https://bugs.python.org/issue24772> ___
[issue24772] Smaller viewport shifts the "expand left menu" character into the text
karl added the comment: So I had time to look at it today, and it would probably be better to solve https://bugs.python.org/issue23312, which would make this one redundant and would actually provide a solution for many people. -- ___ Python tracker <https://bugs.python.org/issue24772> ___
[issue23312] google thinks the docs are mobile unfriendly
karl added the comment: This issue should probably be addressed now on https://github.com/python/python-docs-theme -- nosy: +karlcow ___ Python tracker <https://bugs.python.org/issue23312> ___
[issue23312] google thinks the docs are mobile unfriendly
karl added the comment: I created https://github.com/python/python-docs-theme/issues/30 -- ___ Python tracker <https://bugs.python.org/issue23312> ___
[issue8136] urllib.unquote decodes percent-escapes with Latin-1
karl added the comment: #8143 was fixed.

Python 2.7.10 (default, Feb 7 2017, 00:08:15)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.34)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib
>>> urllib.unquote(u"%CE%A3")
u'\xce\xa3'

What should become of this one?

-- nosy: +karlcow ___ Python tracker <http://bugs.python.org/issue8136> ___
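For contrast, Python 3's urllib.parse.unquote decodes percent-escapes as UTF-8 by default and takes an explicit encoding argument, so the Latin-1 behaviour shown above is opt-in (an aside, not part of the original comment):

```python
from urllib.parse import unquote

# Default is UTF-8: %CE%A3 is the UTF-8 encoding of GREEK CAPITAL LETTER SIGMA.
print(unquote("%CE%A3"))                      # Σ
# Opting back into Latin-1 reproduces the 2.x result u'\xce\xa3' shown above.
print(unquote("%CE%A3", encoding="latin-1"))  # Î£
```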
[issue15799] httplib client and statusline
New submission from karl: The current parsing of the HTTP status line seems strange with regard to its definition in HTTP. http://hg.python.org/cpython/file/3.2/Lib/http/client.py#l307

Currently the code is:

    version, status, reason = line.split(None, 2)

>>> status1 = "HTTP/1.1 200 OK"
>>> status2 = "HTTP/1.1 200 "
>>> status3 = "HTTP/1.1 200"
>>> status1.split(None, 2)
['HTTP/1.1', '200', 'OK']
>>> status2.split(None, 2)
['HTTP/1.1', '200']
>>> status3.split(None, 2)
['HTTP/1.1', '200']

According to the production rules of HTTP/1.1bis, only status1 and status2 are valid:

    status-line = HTTP-version SP status-code SP reason-phrase CRLF
    — http://tools.ietf.org/html/draft-ietf-httpbis-p1-messaging-20#section-3.1.2

with

    reason-phrase = *( HTAB / SP / VCHAR / obs-text )

aka zero or more characters. I'm also not sure what the expected ValueErrors are with the additional parsing rules, which seem even more bogus. A first modification would be:

>>> status1.split(' ', 2)
['HTTP/1.1', '200', 'OK']
>>> status2.split(' ', 2)
['HTTP/1.1', '200', '']

which is correct for the first two, with an empty reason-phrase. The third one is still no good:

>>> status3.split(' ', 2)
['HTTP/1.1', '200']

An additional check could be done with len(status.split(' ', 2)) == 3, which will return False in the third case. Do you want me to create a patch and a test for it?

-- messages: 169293 nosy: karlcow priority: normal severity: normal status: open title: httplib client and statusline type: enhancement versions: Python 3.2 ___ Python tracker <http://bugs.python.org/issue15799> ___
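For illustration, a strict check of the quoted grammar could look like this (a sketch, not the http.client implementation; the is_valid_status_line helper is hypothetical):

```python
import re

# status-line = HTTP-version SP status-code SP reason-phrase
# reason-phrase = *( HTAB / SP / VCHAR / obs-text ), i.e. possibly empty,
# but the SP before it is mandatory.
STATUS_LINE = re.compile(r'HTTP/\d\.\d [0-9]{3} [\t\x20-\x7e\x80-\xff]*')

def is_valid_status_line(line):
    return STATUS_LINE.fullmatch(line) is not None

print(is_valid_status_line("HTTP/1.1 200 OK"))  # True
print(is_valid_status_line("HTTP/1.1 200 "))    # True (empty reason-phrase)
print(is_valid_status_line("HTTP/1.1 200"))     # False (missing second SP)
```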
[issue15799] httplib client and statusline
karl added the comment: OK. Status lines 1 and 2 are valid; the third one is invalid and should trigger raise BadStatusLine(line). The code at line 318 is bogus, as it will happily parse the third line without raising an exception. http://hg.python.org/cpython/file/3.2/Lib/http/client.py#l318 -- ___ Python tracker <http://bugs.python.org/issue15799> ___
[issue15799] httplib client and statusline
karl added the comment: Fair enough. It could be a warning when:

* there is more than one space between the HTTP version and the status code
* the space after the status code is missing

I'm not advocating for being strict only. I'm advocating for giving developers the tools to assess that things are right and to choose whether or not to ignore problems, so they don't have to patch the libraries or rewrite modules when writing code that needs to be strict about validating responses and requests. :)

ps: I haven't checked yet whether the server counterpart of httplib is strict about the production rule.

-- ___ Python tracker <http://bugs.python.org/issue15799> ___
[issue15799] httplib client and statusline
karl added the comment: So what do we do with it? Do I create a patch, or do we close this bug? :) No hard feelings about it. -- ___ Python tracker <http://bugs.python.org/issue15799> ___
[issue21325] Missing Generic EXIF library for images in the standard library
New submission from karl: There is room for a consistent and good EXIF library in the Python standard library. -- components: Library (Lib) messages: 216978 nosy: karlcow priority: normal severity: normal status: open title: Missing Generic EXIF library for images in the standard library type: enhancement ___ Python tracker <http://bugs.python.org/issue21325> ___
[issue15851] Lib/robotparser.py doesn't accept setting a user agent string, instead it uses the default.
karl added the comment: Mark, the code uses urllib to demonstrate the issue with Wikipedia and other sites which block python-urllib user agents, because that string is used by many spam harvesters. The proposal is about giving robotparser a way to set the user agent. -- ___ Python tracker <http://bugs.python.org/issue15851> ___
[issue15851] Lib/robotparser.py doesn't accept setting a user agent string, instead it uses the default.
karl added the comment: Note that one of the proposals is to just document in https://docs.python.org/3/library/urllib.robotparser.html the approach made in msg169722 (available in 3.4+):

    robotparser.URLopener.version = 'MyVersion'

-- ___ Python tracker <http://bugs.python.org/issue15851> ___
[issue15851] Lib/robotparser.py doesn't accept setting a user agent string, instead it uses the default.
karl added the comment:

→ python
Python 2.7.5 (default, Mar 9 2014, 22:15:05)
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import robotparser
>>> rp = robotparser.RobotFileParser('http://somesite.test.site/robots.txt')
>>> rp.read()
>>>

Let's check the server logs:

127.0.0.1 - - [23/Jun/2014:08:44:37 +0900] "GET /robots.txt HTTP/1.0" 200 92 "-" "Python-urllib/1.17"

Robotparser in 2.* was by default using the Python-urllib/1.17 user agent, which is traditionally blocked by many sysadmins. A solution has already been proposed above; this is the proposed test for 3.4:

import urllib.robotparser
import urllib.request

opener = urllib.request.build_opener()
opener.addheaders = [('User-agent', 'MyUa/0.1')]
urllib.request.install_opener(opener)
rp = urllib.robotparser.RobotFileParser('http://localhost:')
rp.read()

The issue is not about changing the lib anymore, but just about documenting how to change the RobotFileParser default UA. We can change the title of this issue if it's confusing, or close it and open a new one for documenting what makes it easier. :)

Currently robotparser.py imports the urllib user agent: http://hg.python.org/cpython/file/7dc94337ef67/Lib/urllib/request.py#l364 It's a common failure we encounter when using urllib in general, including robotparser.

As for Wikipedia, they fixed their server-side user agent sniffing and do not filter python-urllib anymore.

GET /robots.txt HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate, compress
Host: en.wikipedia.org
User-Agent: Python-urllib/1.17

HTTP/1.1 200 OK
Accept-Ranges: bytes
Age: 3161
Cache-control: s-maxage=3600, must-revalidate, max-age=0
Connection: keep-alive
Content-Encoding: gzip
Content-Length: 5208
Content-Type: text/plain; charset=utf-8
Date: Sun, 22 Jun 2014 23:59:16 GMT
Last-modified: Tue, 26 Nov 2013 17:39:43 GMT
Server: Apache
Set-Cookie: GeoIP=JP:Tokyo:35.6850:139.7514:v4; Path=/; Domain=.wikipedia.org
Vary: X-Subdomain
Via: 1.1 varnish, 1.1 varnish, 1.1 varnish
X-Article-ID: 19292575
X-Cache: cp1065 miss (0), cp4016 hit (1), cp4009 frontend hit (215)
X-Content-Type-Options: nosniff
X-Language: en
X-Site: wikipedia
X-Varnish: 2529666795, 2948866481 2948865637, 4134826198 4130750894

Many other sites still do. :)

-- versions: +Python 3.4 -Python 3.5 ___ Python tracker <http://bugs.python.org/issue15851> ___
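An alternative to installing a global opener is to subclass RobotFileParser and fetch robots.txt with an explicit User-Agent; a sketch modeled on, but not copied from, the stdlib read() (the UARobotFileParser name and the example.test host are made up):

```python
import urllib.error
import urllib.request
import urllib.robotparser

class UARobotFileParser(urllib.robotparser.RobotFileParser):
    """RobotFileParser that fetches robots.txt with a custom User-Agent."""
    def __init__(self, url='', user_agent='MyUa/0.1'):
        super().__init__(url)
        self.user_agent = user_agent

    def read(self):
        req = urllib.request.Request(self.url,
                                     headers={'User-Agent': self.user_agent})
        try:
            f = urllib.request.urlopen(req)
        except urllib.error.HTTPError as err:
            # Mirror the stdlib's handling of error responses.
            if err.code in (401, 403):
                self.disallow_all = True
            elif 400 <= err.code < 500:
                self.allow_all = True
        else:
            self.parse(f.read().decode('utf-8').splitlines())

# Offline demo: feed rules directly instead of calling read() over the network.
rp = UARobotFileParser('http://example.test/robots.txt', user_agent='MyUa/0.1')
rp.parse(['User-agent: *', 'Disallow: /private/'])
print(rp.can_fetch('MyUa/0.1', 'http://example.test/private/page'))  # False
```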
[issue12455] urllib2 forces title() on header names, breaking some requests
karl added the comment: Mark, I'm happy to follow up. I would be in favor of removing any capitalization and not changing headers at all, whatever they are, because it doesn't matter per spec: browsers do not care about the capitalization, and I haven't identified Web compatibility issues regarding it. That said, it seems that Cal in msg139512 had an issue; I would love to know which server/API had this behavior, to file a bug at http://webcompat.com/ So… where do we stand? A feature, or removing anything which modifies the capitalization of headers? -- ___ Python tracker <http://bugs.python.org/issue12455> ___
[issue5550] [urllib.request]: Comparison of HTTP headers should be insensitive to the case
karl added the comment: @Mark, yup, I can do that. I just realized that since my contribution there is now a PSF Contributor Agreement. This is signed. I need to dive into the code again to remember where things fail. -- ___ Python tracker <http://bugs.python.org/issue5550> ___
[issue15873] datetime: add ability to parse RFC 3339 dates and times
karl added the comment: I had the issue today: I needed to parse a date in the following format

    2014-04-04T23:59:00+09:00

and could not with strptime. I see a discussion from March 2014, http://code.activestate.com/lists/python-ideas/26883/ but no follow-up. For references: http://www.w3.org/TR/NOTE-datetime http://tools.ietf.org/html/rfc3339 -- nosy: +karlcow ___ Python tracker <http://bugs.python.org/issue15873> ___
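An aside relative to this 2014 report: since Python 3.7 this exact format parses out of the box:

```python
import datetime

# fromisoformat (Python 3.7+) handles this shape directly, and strptime's
# %z accepts a colon in the UTC offset since 3.7 as well.
d = datetime.datetime.fromisoformat("2014-04-04T23:59:00+09:00")
d2 = datetime.datetime.strptime("2014-04-04T23:59:00+09:00",
                                "%Y-%m-%dT%H:%M:%S%z")
print(d.utcoffset())  # 9:00:00
print(d == d2)        # True
```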
[issue15873] datetime: add ability to parse RFC 3339 dates and times
karl added the comment: On closer inspection, Anders Hovmöller's proposal doesn't work: https://github.com/boxed/iso8601 At least not for the fractional-seconds part. In http://tools.ietf.org/html/rfc3339#section-5.6, it is defined as:

    time-secfrac = "." 1*DIGIT

In http://www.w3.org/TR/NOTE-datetime, same thing:

    s = one or more digits representing a decimal fraction of a second

Anders considers it to be only six digits; it can be more or it can be less. :) Will comment on GitHub too. -- ___ Python tracker <http://bugs.python.org/issue15873> ___
[issue15873] datetime: add ability to parse RFC 3339 dates and times
karl added the comment: Noticed some people doing the same thing https://github.com/tonyg/python-rfc3339 http://home.blarg.net/~steveha/pyfeed.html https://wiki.python.org/moin/WorkingWithTime -- ___ Python tracker <http://bugs.python.org/issue15873> ___
[issue15873] datetime: add ability to parse RFC 3339 dates and times
karl added the comment: After inspection, the best library for parsing RFC 3339 style dates is definitely https://github.com/tonyg/python-rfc3339/ Main code at https://github.com/tonyg/python-rfc3339/blob/master/rfc3339.py -- ___ Python tracker <http://bugs.python.org/issue15873> ___
[issue5550] [urllib.request]: Comparison of HTTP headers should be insensitive to the case
karl added the comment: OK, this is an attempt at solving the issue with lowercasing. I find my get_header a bit complicated, but tell me if you have a better idea. :) I'll modify the patches. I have tried to run the tests on the Mac here, but I currently have an issue:

→ ./python.exe -V
Python 3.5.0a0
Traceback (most recent call last):
  File "./Tools/scripts/patchcheck.py", line 6, in <module>
    import subprocess
  File "/Users/karl/code/cpython/Lib/subprocess.py", line 353, in <module>
    import signal
ImportError: No module named 'signal'
make: *** [patchcheck] Error 1

-- Added file: http://bugs.python.org/file36695/issue-5550-3.patch ___ Python tracker <http://bugs.python.org/issue5550> ___
[issue5550] [urllib.request]: Comparison of HTTP headers should be insensitive to the case
Changes by karl : Removed file: http://bugs.python.org/file36695/issue-5550-3.patch ___ Python tracker <http://bugs.python.org/issue5550> ___
[issue5550] [urllib.request]: Comparison of HTTP headers should be insensitive to the case
karl added the comment: And I made a typo in patch 3. Submitting patch 4; sorry about that. -- Added file: http://bugs.python.org/file36698/issue-5550-4.patch ___ Python tracker <http://bugs.python.org/issue5550> ___
[issue17322] urllib.request add_header() currently allows trailing spaces (and other weird stuff)
karl added the comment: Just a follow-up to give the now-stable RFC for HTTP/1.1 header field-name parsing: http://tools.ietf.org/html/rfc7230#section-3.2.4 -- ___ Python tracker <http://bugs.python.org/issue17322> ___
[issue5550] [urllib.request]: Comparison of HTTP headers should be insensitive to the case
karl added the comment: OK, my tests are OK.

→ ./python.exe -m unittest -v Lib/test/test_urllib2net.py
test_close (Lib.test.test_urllib2net.CloseSocketTest) ... ok
test_custom_headers (Lib.test.test_urllib2net.OtherNetworkTests) ... FAIL
test_file (Lib.test.test_urllib2net.OtherNetworkTests) ...
test_ftp (Lib.test.test_urllib2net.OtherNetworkTests) ... skipped "Resource 'ftp://gatekeeper.research.compaq.com/pub/DEC/SRC/research-reports/00README-Legal-Rules-Regs' is not available"
test_headers_case_sensitivity (Lib.test.test_urllib2net.OtherNetworkTests) ... ok
test_redirect_url_withfrag (Lib.test.test_urllib2net.OtherNetworkTests) ... ok
test_sites_no_connection_close (Lib.test.test_urllib2net.OtherNetworkTests) ... ok
test_urlwithfrag (Lib.test.test_urllib2net.OtherNetworkTests) ... ok
test_ftp_basic (Lib.test.test_urllib2net.TimeoutTest) ... ok
test_ftp_default_timeout (Lib.test.test_urllib2net.TimeoutTest) ... ok
test_ftp_no_timeout (Lib.test.test_urllib2net.TimeoutTest) ... ok
test_ftp_timeout (Lib.test.test_urllib2net.TimeoutTest) ... ok
test_http_basic (Lib.test.test_urllib2net.TimeoutTest) ... ok
test_http_default_timeout (Lib.test.test_urllib2net.TimeoutTest) ... ok
test_http_no_timeout (Lib.test.test_urllib2net.TimeoutTest) ... ok
test_http_timeout (Lib.test.test_urllib2net.TimeoutTest) ... ok

I wonder if test_custom_headers fails because of my modifications.

-- ___ Python tracker <http://bugs.python.org/issue5550> ___
[issue22478] tests for urllib2net are in bad shapes
New submission from karl:

→ ./python.exe -V
Python 3.4.2rc1+
→ hg tip
changeset: 92532:6dcc96fa3970
tag: tip
parent: 92530:ad45c2707006
parent: 92531:8eb4eec8626c
user: Benjamin Peterson
date: Mon Sep 22 22:44:21 2014 -0400
summary: merge 3.4 (#22459)

When working on issue #5550, I realized that some tests are currently failing. Here is the log of running:

→ ./python.exe -m unittest -v Lib/test/test_urllib2net.py
test_close (Lib.test.test_urllib2net.CloseSocketTest) ... ok
test_custom_headers (Lib.test.test_urllib2net.OtherNetworkTests) ... FAIL
test_file (Lib.test.test_urllib2net.OtherNetworkTests) ...
test_ftp (Lib.test.test_urllib2net.OtherNetworkTests) ... skipped "Resource 'ftp://gatekeeper.research.compaq.com/pub/DEC/SRC/research-reports/00README-Legal-Rules-Regs' is not available"
test_redirect_url_withfrag (Lib.test.test_urllib2net.OtherNetworkTests) ... ok
test_sites_no_connection_close (Lib.test.test_urllib2net.OtherNetworkTests) ... ok
test_urlwithfrag (Lib.test.test_urllib2net.OtherNetworkTests) ... ok
test_ftp_basic (Lib.test.test_urllib2net.TimeoutTest) ... ok
test_ftp_default_timeout (Lib.test.test_urllib2net.TimeoutTest) ... ok
test_ftp_no_timeout (Lib.test.test_urllib2net.TimeoutTest) ... ok
test_ftp_timeout (Lib.test.test_urllib2net.TimeoutTest) ... ok
test_http_basic (Lib.test.test_urllib2net.TimeoutTest) ... ok
test_http_default_timeout (Lib.test.test_urllib2net.TimeoutTest) ... ok
test_http_no_timeout (Lib.test.test_urllib2net.TimeoutTest) ... ok
test_http_timeout (Lib.test.test_urllib2net.TimeoutTest) ... ok

======================================================================
ERROR: test_file (Lib.test.test_urllib2net.OtherNetworkTests) (url='file:/Users/karl/code/cpython/%40test_61795_tmp')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/karl/code/cpython/Lib/test/test_urllib2net.py", line 243, in _test_urls
    f = urlopen(url, req, TIMEOUT)
  File "/Users/karl/code/cpython/Lib/test/test_urllib2net.py", line 33, in wrapped
    return _retry_thrice(func, exc, *args, **kwargs)
  File "/Users/karl/code/cpython/Lib/test/test_urllib2net.py", line 23, in _retry_thrice
    return func(*args, **kwargs)
  File "/Users/karl/code/cpython/Lib/urllib/request.py", line 447, in open
    req = Request(fullurl, data)
  File "/Users/karl/code/cpython/Lib/urllib/request.py", line 267, in __init__
    origin_req_host = request_host(self)
  File "/Users/karl/code/cpython/Lib/urllib/request.py", line 250, in request_host
    host = _cut_port_re.sub("", host, 1)
TypeError: expected string or buffer

======================================================================
ERROR: test_file (Lib.test.test_urllib2net.OtherNetworkTests) (url=('file:///nonsensename/etc/passwd', None, <class 'urllib.error.URLError'>))
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/karl/code/cpython/Lib/test/test_urllib2net.py", line 243, in _test_urls
    f = urlopen(url, req, TIMEOUT)
  File "/Users/karl/code/cpython/Lib/test/test_urllib2net.py", line 33, in wrapped
    return _retry_thrice(func, exc, *args, **kwargs)
  File "/Users/karl/code/cpython/Lib/test/test_urllib2net.py", line 23, in _retry_thrice
    return func(*args, **kwargs)
  File "/Users/karl/code/cpython/Lib/urllib/request.py", line 447, in open
    req = Request(fullurl, data)
  File "/Users/karl/code/cpython/Lib/urllib/request.py", line 267, in __init__
    origin_req_host = request_host(self)
  File "/Users/karl/code/cpython/Lib/urllib/request.py", line 250, in request_host
    host = _cut_port_re.sub("", host, 1)
TypeError: expected string or buffer

======================================================================
FAIL: test_custom_headers (Lib.test.test_urllib2net.OtherNetworkTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/karl/code/cpython/Lib/test/test_urllib2net.py", line 186, in test_custom_headers
    self.assertEqual(request.get_header('User-agent'), 'Test-Agent')
AssertionError: 'Python-urllib/3.4' != 'Test-Agent'
- Python-urllib/3.4
+ Test-Agent

----------------------------------------------------------------------
Ran 16 tests in 124.879s

FAILED (failures=1, errors=2, skipped=1)

-- components: Tests messages: 227417 nosy: karlcow priority: normal severity: normal status: open title: tests for urllib2net are in bad shapes versions: Python 3.5 ___ Python tracker <http://bugs.python.org/issue22478> ___
[issue5550] [urllib.request]: Comparison of HTTP headers should be insensitive to the case
karl added the comment: Opened issue #22478 for the tests failing. Not related to my modification. -- ___ Python tracker <http://bugs.python.org/issue5550> ___
[issue22478] tests for urllib2net are in bad shapes
karl added the comment: OK, let's see.

→ ./python.exe -m unittest -v Lib.test.test_urllib2net.OtherNetworkTests.test_custom_headers
test_custom_headers (Lib.test.test_urllib2net.OtherNetworkTests) ... FAIL

======================================================================
FAIL: test_custom_headers (Lib.test.test_urllib2net.OtherNetworkTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/karl/code/cpython/Lib/test/test_urllib2net.py", line 186, in test_custom_headers
    self.assertEqual(request.get_header('User-agent'), 'Test-Agent')
AssertionError: 'Python-urllib/3.4' != 'Test-Agent'
- Python-urllib/3.4
+ Test-Agent

----------------------------------------------------------------------
Ran 1 test in 0.551s

FAILED (failures=1)

→ ./python.exe
Python 3.4.2rc1+ (3.4:8eb4eec8626c+, Sep 23 2014, 21:53:11)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.51)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib.request
>>> url = 'http://127.0.0.1/'
>>> opener = urllib.request.build_opener()
>>> request = urllib.request.Request(url)
>>> request.header_items()
[]
>>> request.headers
{}
>>> request.add_header('User-Agent', 'Test-Agent')
>>> request.headers
{'User-agent': 'Test-Agent'}
>>> request.header_items()
[('User-agent', 'Test-Agent')]
>>> opener.open(request)
<http.client.HTTPResponse object at 0x...>
>>> request.get_header('User-agent'), 'Test-Agent'
('Test-Agent', 'Test-Agent')
>>> request.header_items()
[('User-agent', 'Test-Agent'), ('Host', '127.0.0.1')]
>>> request.headers
{'User-agent': 'Test-Agent'}

OK, so far so good. And my server recorded:

127.0.0.1 - - [24/Sep/2014:17:07:41 +0900] "GET / HTTP/1.1" 200 9897 "-" "Test-Agent"

Let's do it the way the test has been designed.

→ ./python.exe
Python 3.4.2rc1+ (3.4:8eb4eec8626c+, Sep 23 2014, 21:53:11)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.51)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib.request
>>> url = 'http://127.0.0.1/'
>>> opener = urllib.request.build_opener()
>>> request = urllib.request.Request(url)
>>> request.header_items()
[]
>>> opener.open(request)
<http.client.HTTPResponse object at 0x...>
>>> request.header_items()
[('User-agent', 'Python-urllib/3.4'), ('Host', '127.0.0.1')]
>>> request.has_header('User-agent')
True
>>> request.add_header('User-Agent', 'Test-Agent')
>>> opener.open(request)
<http.client.HTTPResponse object at 0x...>
>>> request.get_header('User-agent'), 'Test-Agent'
('Python-urllib/3.4', 'Test-Agent')
>>> request.add_header('Foo', 'bar')
>>> request.header_items()
[('User-agent', 'Test-Agent'), ('Host', '127.0.0.1'), ('Foo', 'bar')]
>>> opener.open(request)
<http.client.HTTPResponse object at 0x...>
>>> request.header_items()
[('User-agent', 'Test-Agent'), ('Host', '127.0.0.1'), ('Foo', 'bar')]
>>> request.get_header('User-agent'), 'Test-Agent'
('Python-urllib/3.4', 'Test-Agent')
>>> request.headers
{'User-agent': 'Test-Agent', 'Foo': 'bar'}

And the server recorded:

127.0.0.1 - - [24/Sep/2014:17:12:52 +0900] "GET / HTTP/1.1" 200 9897 "-" "Python-urllib/3.4"
127.0.0.1 - - [24/Sep/2014:17:12:52 +0900] "GET / HTTP/1.1" 200 9897 "-" "Python-urllib/3.4"
127.0.0.1 - - [24/Sep/2014:17:14:15 +0900] "GET / HTTP/1.1" 200 9897 "-" "Python-urllib/3.4"

So it seems that the User-Agent is immutable once it has been set the first time. It is not in the same dictionary:

>>> request.unredirected_hdrs
{'User-agent': 'Python-urllib/3.4', 'Host': '127.0.0.1'}

-- ___ Python tracker <http://bugs.python.org/issue22478> ___
[issue22478] tests for urllib2net are in bad shapes
karl added the comment: Ah! The User-Agent (or anything else in unredirected_hdrs) will not be updated if it has already been set once.
https://hg.python.org/cpython/file/064f6baeb6bd/Lib/urllib/request.py#l1154

>>> headers = dict(request.unredirected_hdrs)
>>> headers
{'User-agent': 'Python-urllib/3.4', 'Host': '127.0.0.1'}
>>> request.headers
{'User-agent': 'Test-Agent', 'Foo': 'cool'}
>>> headers.update(dict((k, v) for k, v in request.headers.items() if k not in headers))
>>> headers
{'User-agent': 'Python-urllib/3.4', 'Host': '127.0.0.1', 'Foo': 'cool'}

--
___ Python tracker <http://bugs.python.org/issue22478> ___
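The merge above can be reproduced without a live Request object. A minimal, self-contained restatement of the merge done around request.py#l1154 (the dictionaries are hard-coded stand-ins for the request state shown in the session):

```python
# Headers already present in unredirected_hdrs are never overridden by
# headers added later via add_header(); only brand-new keys get through.
unredirected_hdrs = {'User-agent': 'Python-urllib/3.4', 'Host': '127.0.0.1'}
request_headers = {'User-agent': 'Test-Agent', 'Foo': 'cool'}

headers = dict(unredirected_hdrs)
headers.update((k, v) for k, v in request_headers.items() if k not in headers)
print(headers)
# {'User-agent': 'Python-urllib/3.4', 'Host': '127.0.0.1', 'Foo': 'cool'}
```

This is why the server keeps logging "Python-urllib/3.4" even after add_header('User-Agent', 'Test-Agent').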
[issue5550] [urllib.request]: Comparison of HTTP headers should be insensitive to the case
karl added the comment: OK, after fixing my repo (thanks, orsenthil) I got the tests running properly. The inspection order of the two dictionaries was not right, so I had to modify the patch a bit.

→ ./python.exe -m unittest -v Lib.test.test_urllib2net.OtherNetworkTests.test_headers_case_sensitivity
test_headers_case_sensitivity (Lib.test.test_urllib2net.OtherNetworkTests) ... ok
--
Ran 1 test in 0.286s

OK

→ ./python.exe -m unittest -v Lib.test.test_urllib2net.OtherNetworkTests.test_custom_headers
test_custom_headers (Lib.test.test_urllib2net.OtherNetworkTests) ... ok
--
Ran 1 test in 0.575s

OK

New patch: issue5550-5.patch. Unlinking issue5550-4.patch.

--
Added file: http://bugs.python.org/file36717/issue5550-5.patch
___ Python tracker <http://bugs.python.org/issue5550> ___
[issue5550] [urllib.request]: Comparison of HTTP headers should be insensitive to the case
Changes by karl : Removed file: http://bugs.python.org/file36698/issue-5550-4.patch ___ Python tracker <http://bugs.python.org/issue5550> ___
[issue15799] httplib client and statusline
karl added the comment: Let's close this.

>>> "HTTP/1.1   301 ".split(None, 2)
['HTTP/1.1', '301']
>>> "HTTP/1.1   301 ".split(' ', 2)
['HTTP/1.1', '', ' 301 ']

I think it would be nice to have a way to warn without stopping, but the last comment from r.david.murray makes sense too. :)

--
resolution: -> not a bug
status: open -> closed
___ Python tracker <http://bugs.python.org/issue15799> ___
[issue17319] http.server.BaseHTTPRequestHandler send_response_only doesn't check the type and value of the code.
karl added the comment: Here is where this is defined in the new RFC:
http://tools.ietf.org/html/rfc7230#section-3.1.2

status-line = HTTP-version SP status-code SP reason-phrase CRLF

Things to enforce:

status-code = 3DIGIT

Response status codes are now defined in http://tools.ietf.org/html/rfc7231#section-6 with something important:

HTTP status codes are extensible. HTTP clients are not required to understand the meaning of all registered status codes, though such understanding is obviously desirable. However, a client MUST understand the class of any status code, as indicated by the first digit, and treat an unrecognized status code as being equivalent to the x00 status code of that class, with the exception that a recipient MUST NOT cache a response with an unrecognized status code. For example, if an unrecognized status code of 471 is received by a client, the client can assume that there was something wrong with its request and treat the response as if it had received a 400 (Bad Request) status code. The response message will usually contain a representation that explains the status.

That should help. The full registry of status codes is defined at http://www.iana.org/assignments/http-status-codes/http-status-codes.xhtml

@dmi.baranov In the patch:

+def _is_valid_status_code(code):
+    return isinstance(code, int) and 0 <= code <= 999

Maybe there is a missing check that len(str(code)) == 3.

--
___ Python tracker <http://bugs.python.org/issue17319> ___
[issue18119] urllib.FancyURLopener does not treat URL fragments correctly
karl added the comment: This is the correct behavior. A GET to http://example.com/foo with a response containing 302 and Location: /bar#test must trigger http://example.com/bar#test

Location is defined in http://tools.ietf.org/html/rfc7231#section-7.1.2

7.1.2. Location

The "Location" header field is used in some responses to refer to a specific resource in relation to the response. The type of relationship is defined by the combination of request method and status code semantics.

Location = URI-reference

The field value consists of a single URI-reference. When it has the form of a relative reference ([RFC3986], Section 4.2), the final value is computed by resolving it against the effective request URI ([RFC3986], Section 5).

A bit further in the spec:

If the Location value provided in a 3xx (Redirection) response does not have a fragment component, a user agent MUST process the redirection as if the value inherits the fragment component of the URI reference used to generate the request target (i.e., the redirection inherits the original reference's fragment, if any).

For example, a GET request generated for the URI reference "http://www.example.org/~tim" might result in a 303 (See Other) response containing the header field:

Location: /People.html#tim

which suggests that the user agent redirect to "http://www.example.org/People.html#tim"

Likewise, a GET request generated for the URI reference "http://www.example.org/index.html#larry" might result in a 301 (Moved Permanently) response containing the header field:

Location: http://www.example.net/index.html

which suggests that the user agent redirect to "http://www.example.net/index.html#larry", preserving the original fragment identifier.

--
nosy: +karlcow
___ Python tracker <http://bugs.python.org/issue18119> ___
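A quick check of how the stdlib handles the two cases quoted above (plain urllib.parse, independent of the FancyURLopener code path): urljoin resolves the relative Location and keeps its fragment, but it does not implement the MUST-inherit rule for the second case, so a client has to reattach the original fragment itself.

```python
from urllib.parse import urljoin, urldefrag

# Case 1: Location carries its own fragment; urljoin keeps it.
print(urljoin('http://www.example.org/~tim', '/People.html#tim'))
# http://www.example.org/People.html#tim

# Case 2: Location has no fragment; RFC 7231 says the redirect inherits
# the original one, but urljoin alone does not do that.
target = urljoin('http://www.example.org/index.html#larry',
                 'http://www.example.net/index.html')
print(target)  # http://www.example.net/index.html

# Reattaching the original fragment by hand:
_, frag = urldefrag('http://www.example.org/index.html#larry')
if frag and not urldefrag(target)[1]:
    target = target + '#' + frag
print(target)  # http://www.example.net/index.html#larry
```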
[issue18119] urllib.FancyURLopener does not treat URL fragments correctly
karl added the comment: Takahashi-san, ah sorry, I misunderstood which part you were talking about. I wrongly assumed you were talking about navigation. Yes, for the request which is sent to the server it should be as in http://tools.ietf.org/html/rfc7230#section-5.3.1

So, refactoring your example.

1st request:

GET /foo HTTP/1.1
Accept: text/html
Host: example.com

Server response:

HTTP/1.1 302 Found
Location: /bar#test

Second request must be:

GET /bar HTTP/1.1
Accept: text/html
Host: example.com

As for the navigation context, it is indeed handled by the piece of code taking charge of the document after it is parsed, not the one doing the HTTP request. (Putting it here just so that people understand.)

(To be tested.) For a server receiving an invalid request-line, http://tools.ietf.org/html/rfc7230#section-3.1.1 says:

Recipients of an invalid request-line SHOULD respond with either a 400 (Bad Request) error or a 301 (Moved Permanently) redirect with the request-target properly encoded. A recipient SHOULD NOT attempt to autocorrect and then process the request without a redirect, since the invalid request-line might be deliberately crafted to bypass security filters along the request chain.

--
___ Python tracker <http://bugs.python.org/issue18119> ___
[issue18119] urllib.FancyURLopener does not treat URL fragments correctly
karl added the comment: In class urlopen_HttpTests
https://hg.python.org/cpython/file/4f314dedb84f/Lib/test/test_urllib.py#l191

there is a test for invalid redirects:

def test_invalid_redirect(self):
https://hg.python.org/cpython/file/4f314dedb84f/Lib/test/test_urllib.py#l247

and one for fragments:

def test_url_fragment(self):
https://hg.python.org/cpython/file/4f314dedb84f/Lib/test/test_urllib.py#l205

which refers to http://bugs.python.org/issue11703 (code in https://hg.python.org/cpython/file/d5688a94a56c/Lib/urllib.py)

--
___ Python tracker <http://bugs.python.org/issue18119> ___
[issue18119] urllib.FancyURLopener does not treat URL fragments correctly
karl added the comment: OK, I fixed the code. The issue is here:
https://hg.python.org/cpython/file/1e1c6e306eb4/Lib/urllib/request.py#l656

newurl = urlunparse(urlparts)

Basically it reinjects the fragment into the new URL. The fix is easy:

if urlparts.fragment:
    urlparts = list(urlparts)
    urlparts[5] = ""
newurl = urlunparse(urlparts)

I was trying to write a test for it, but failed. Could someone help me with the test so I can complete the patch? Added the code patch only.

--
keywords: +patch
Added file: http://bugs.python.org/file36832/issue18119-code-only.patch
___ Python tracker <http://bugs.python.org/issue18119> ___
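The fix can be exercised on its own with urllib.parse. A standalone sketch of the same logic, outside HTTPRedirectHandler (the helper name is hypothetical):

```python
from urllib.parse import urlparse, urlunparse

def strip_fragment(url):
    # Same idea as the fix: drop the fragment (index 5 of the 6-tuple)
    # before rebuilding the URL used for the redirected request.
    urlparts = urlparse(url)
    if urlparts.fragment:
        urlparts = list(urlparts)
        urlparts[5] = ""
    return urlunparse(urlparts)

print(strip_fragment('http://example.com/bar#test'))  # http://example.com/bar
print(strip_fragment('http://example.com/bar'))       # http://example.com/bar
```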
[issue11448] docs for HTTPConnection.set_tunnel are ambiguous
karl added the comment: ooops right, my bad. s/on port 8080. We first/on port 8080, we first/ better? -- ___ Python tracker <http://bugs.python.org/issue11448> ___
[issue747320] rfc2822 formatdate functionality duplication
karl added the comment: Eric, what do you recommend to move forward with this bug and its patches? I need guidance. Do you have an example for "(A minor thing: I would use “attribute” instead of “variable” in the docstrings.)"? Also, which code base should I use? A lot of water has gone under the bridge in one year. :) -- ___ Python tracker <http://bugs.python.org/issue747320> ___
[issue3791] bsddb not completely removed
karl added the comment: On the Mac version there is an issue with the Python version installed by default.

Python 2.5.1 (r251:54863, Jan 13 2009, 10:26:13)
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin

  File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/dbhash.py", line 5, in
    import bsddb
  File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/bsddb/__init__.py", line 51, in
    import _bsddb
ImportError: No module named _bsddb

--
components: +Macintosh -Windows
nosy: +karlcow
versions: +Python 2.5 -Python 3.0
___ Python tracker <http://bugs.python.org/issue3791> ___
[issue747320] rfc2822 formatdate functionality duplication
karl added the comment: http://tools.ietf.org/html/draft-ietf-httpbis-p2-semantics-22#section-7.1.1 quoting from HTTP/1.1 bis:

Prior to 1995, there were three different formats commonly used by servers to communicate timestamps. For compatibility with old implementations, all three are defined here. The preferred format is a fixed-length and single-zone subset of the date and time specification used by the Internet Message Format [RFC5322].

HTTP-date = IMF-fixdate / obs-date

An example of the preferred format is

Sun, 06 Nov 1994 08:49:37 GMT    ; IMF-fixdate

Examples of the two obsolete formats are

Sunday, 06-Nov-94 08:49:37 GMT   ; obsolete RFC 850 format
Sun Nov  6 08:49:37 1994         ; ANSI C's asctime() format

A recipient that parses a timestamp value in an HTTP header field MUST accept all three formats. A sender MUST generate the IMF-fixdate format when sending an HTTP-date value in a header field.

What http.server.BaseHTTPRequestHandler.date_time_string is currently doing:

>>> import time
>>> timestamp = time.time()
>>> weekdayname = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
>>> monthname = [None, 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
>>> year, month, day, hh, mm, ss, wd, y, z = time.gmtime(timestamp)
>>> s = "%s, %02d %3s %4d %02d:%02d:%02d GMT" % (weekdayname[wd], day, monthname[month], year, hh, mm, ss)
>>> s
'Mon, 25 Feb 2013 19:26:34 GMT'

What email.utils.formatdate is doing:

>>> import email.utils
>>> email.utils.formatdate(timeval=None, localtime=False, usegmt=True)
'Mon, 25 Feb 2013 19:40:04 GMT'
>>> import time
>>> ts = time.time()
>>> email.utils.formatdate(timeval=ts, localtime=False, usegmt=True)
'Mon, 25 Feb 2013 19:51:50 GMT'

I created a patch:

s = email.utils.formatdate(timestamp, False, True)

I didn't touch the log method, which has a different format that is anyway not compatible with email.utils.

--
keywords: +patch
nosy: +karlcow
Added file: http://bugs.python.org/file29240/server.patch
___ Python tracker <http://bugs.python.org/issue747320> ___
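As a sanity check of the replacement, email.utils.formatdate with usegmt=True produces exactly the IMF-fixdate form; here it is run against the spec's own example instant (a hypothetical snippet, not part of the patch):

```python
import email.utils

# 784111777 is the epoch second for the IMF-fixdate example in the spec:
# Sun, 06 Nov 1994 08:49:37 GMT
ts = 784111777
imf = email.utils.formatdate(ts, localtime=False, usegmt=True)
print(imf)  # Sun, 06 Nov 1994 08:49:37 GMT
```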
[issue7370] BaseHTTPServer reinventing rfc822 date formatting
karl added the comment: I think it is now fixed by my patch in http://bugs.python.org/issue747320 -- nosy: +karlcow ___ Python tracker <http://bugs.python.org/issue7370> ___
[issue747320] rfc2822 formatdate functionality duplication
karl added the comment: Made a mistake in the previous server.patch; use server2.patch instead. -- Added file: http://bugs.python.org/file29241/server2.patch ___ Python tracker <http://bugs.python.org/issue747320> ___
[issue747320] rfc2822 formatdate functionality duplication
Changes by karl : Removed file: http://bugs.python.org/file29240/server.patch ___ Python tracker <http://bugs.python.org/issue747320> ___
[issue11448] docs for HTTPConnection.set_tunnel are ambiguous
karl added the comment: This is a possible additional example for set_tunnel, a modification of python3.3/html/_sources/library/http.client.txt. Hope it helps. -- nosy: +karlcow Added file: http://bugs.python.org/file29243/http.client.patch ___ Python tracker <http://bugs.python.org/issue11448> ___
[issue17302] HTTP/2.0 - Implementations/Testing efforts
New submission from karl: Are there plans to develop an HTTP/2.0 library in parallel with the specification development? It will not be ready for years, but it would be good to have an evolving implementation. Or should it be done outside of python.org? Reference: https://github.com/http2

--
components: Library (Lib)
messages: 183086
nosy: karlcow
priority: normal
severity: normal
status: open
title: HTTP/2.0 - Implementations/Testing efforts
versions: Python 3.4
___ Python tracker <http://bugs.python.org/issue17302> ___
[issue17302] HTTP/2.0 - Implementations/Testing efforts
karl added the comment: Agreed on HTTP/1.1; is there a plan to fix it too? ;) Because the current http.server seems to be untouchable without breaking stuff all around. :) -- ___ Python tracker <http://bugs.python.org/issue17302> ___
[issue12921] http.server.BaseHTTPRequestHandler.send_error and trailing newline
karl added the comment: Testing your code in Listing 1:

→ curl -sI http://localhost:9000/
HTTP/1.0 501 Unsupported method ('HEAD')
Server: BaseHTTP/0.6 Python/3.3.0
Date: Tue, 26 Feb 2013 23:38:32 GMT
Content-Type: text/html;charset=utf-8
Connection: close

So this is normal, per http://tools.ietf.org/html/draft-ietf-httpbis-p2-semantics-22#section-6.6.2, except that it would be better to use "501 Not Implemented", though the prose is optional. The Content-Type is also kind of useless. That would deserve another bug. And:

→ curl http://localhost:9000/
Server: BaseHTTP/0.6 Python/3.3.0
Date: Tue, 26 Feb 2013 23:39:46 GMT
Content-Type: text/html;charset=utf-8
Connection: close

[HTML error page]
Error response
Error code: 500
Message: Traceback (most recent call last): File "server.py", line 9, in do_GET assert(False) AssertionError.
Error code explanation: 500 - Server got itself in trouble.

OK. The server is answering with HTTP/1.0 and then a traceback… which has nothing to do here. We can see that in more detail with telnet:

→ telnet localhost 9000
Trying ::1...
telnet: connect to address ::1: Connection refused
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
GET / HTTP/1.1
Host: localhost:9000

HTTP/1.0 500 Traceback (most recent call last): File "server.py", line 9, in do_GET assert(False) AssertionError
Server: BaseHTTP/0.6 Python/3.3.0
Date: Tue, 26 Feb 2013 23:49:04 GMT
Content-Type: text/html;charset=utf-8
Connection: close

[HTML error page]
Error response
Error code: 500
Message: Traceback (most recent call last): File "server.py", line 9, in do_GET assert(False) AssertionError.
Error code explanation: 500 - Server got itself in trouble.

Note that when not sending the traceback, with the following code:

#!/usr/bin/env python3.3
import http.server
import traceback

class httphandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        try:
            assert(False)
        except:
            self.send_error(500)

if __name__ == '__main__':
    addr = ('', 9000)
    http.server.HTTPServer(addr, httphandler).serve_forever()

everything is working well:

→ telnet localhost 9000
Trying ::1...
telnet: connect to address ::1: Connection refused
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
GET / HTTP/1.1
Host: localhost:9000

HTTP/1.0 500 Internal Server Error
Server: BaseHTTP/0.6 Python/3.3.0
Date: Tue, 26 Feb 2013 23:51:46 GMT
Content-Type: text/html;charset=utf-8
Connection: close

[HTML error page]
Error response
Error code: 500
Message: Internal Server Error.
Error code explanation: 500 - Server got itself in trouble.

Connection closed by foreign host.

I'm looking at http://hg.python.org/cpython/file/3.3/Lib/http/server.py#l404

For the second part of your message: I don't think the two issues should be mixed. Maybe open another bug report.

--
nosy: +karlcow
___ Python tracker <http://bugs.python.org/issue12921> ___
[issue12921] http.server.BaseHTTPRequestHandler.send_error and trailing newline
karl added the comment: OK, I understand a bit better now. self.send_error(code, msg) is used for:

* the body
* the HTTP header
* and the log

That's bad, very bad. I do not think it should be used for the HTTP header at all.

--
___ Python tracker <http://bugs.python.org/issue12921> ___
[issue12921] http.server.BaseHTTPRequestHandler.send_error and trailing newline
karl added the comment: OK, I modified the code of server.py so that the server doesn't send the private message but the one already assigned by the library, as it should. If there is a need for customization, there should be two separate variables, though that could lead to the same issues. After the modifications, this is what I get:

→ telnet localhost 9000
Trying ::1...
telnet: connect to address ::1: Connection refused
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
GET / HTTP/1.1
Host: localhost:9000

HTTP/1.0 500 Internal Server Error
Server: BaseHTTP/0.6 Python/3.3.0
Date: Wed, 27 Feb 2013 00:21:21 GMT
Content-Type: text/html;charset=utf-8
Connection: close

[HTML error page]
Error response
Error code: 500
Message: Traceback (most recent call last): File "server.py", line 11, in do_GET assert(False) AssertionError.
Error code explanation: 500 - Server got itself in trouble.

Connection closed by foreign host.

I joined the patch: server.issue12921.patch

--
keywords: +patch
Added file: http://bugs.python.org/file29255/server.issue12921.patch
___ Python tracker <http://bugs.python.org/issue12921> ___
[issue17302] HTTP/2.0 - Implementations/Testing efforts
karl added the comment: Read the thread. Thanks, Antoine; I understand better now. I'm still discovering how the community really works. Trying to fix a few things in the meantime:

http://bugs.python.org/issue12921
http://bugs.python.org/issue747320
http://bugs.python.org/issue11448
http://bugs.python.org/issue7370 (maybe this one is a duplicate of 747320)

This one, http://bugs.python.org/issue15799, which is still open, makes me think that it might be good for the new class to have strict production rules, and some parsing rules with a warning mode, if the user cares. Thanks again for the context, Antoine. Maybe we should close this bug as postponed?

--
___ Python tracker <http://bugs.python.org/issue17302> ___
[issue17319] http.server.BaseHTTPRequestHandler send_response_only doesn't check the type and value of the code.
New submission from karl:

def send_response_only(self, code, message=None)
http://hg.python.org/cpython/file/3.3/Lib/http/server.py#l448

There is no type checking on code, nor whether the code is appropriate. Let's take:

==
#!/usr/bin/env python3.3
import http.server

class HTTPHandler(http.server.BaseHTTPRequestHandler):
    "A very simple server"
    def do_GET(self):
        self.send_response(200)
        self.send_header('Content-type', 'text/plain')
        self.end_headers()
        self.wfile.write(bytes('Response body\n\n', 'latin1'))

if __name__ == '__main__':
    addr = ('', 9000)
    http.server.HTTPServer(addr, HTTPHandler).serve_forever()
==

A request is working well:

→ http GET localhost:9000
HTTP/1.0 200 OK
Server: BaseHTTP/0.6 Python/3.3.0
Date: Thu, 28 Feb 2013 04:00:44 GMT
Content-type: text/plain

Response body

And the server log is:

127.0.0.1 - - [27/Feb/2013 23:00:44] "GET / HTTP/1.1" 200 -

Then let's try:

==
#!/usr/bin/env python3.3
import http.server

class HTTPHandler(http.server.BaseHTTPRequestHandler):
    "A very simple server"
    def do_GET(self):
        self.send_response(999)
        self.send_header('Content-type', 'text/plain')
        self.end_headers()
        self.wfile.write(bytes('Response body\n\n', 'latin1'))

if __name__ == '__main__':
    addr = ('', 9000)
    http.server.HTTPServer(addr, HTTPHandler).serve_forever()
==

The response is:

→ http GET localhost:9000
HTTP/1.0 999
Server: BaseHTTP/0.6 Python/3.3.0
Date: Thu, 28 Feb 2013 03:55:54 GMT
Content-type: text/plain

Response body

and the server log is:

127.0.0.1 - - [27/Feb/2013 22:55:12] "GET / HTTP/1.1" 999 -

And finally:

==
#!/usr/bin/env python3.3
import http.server

class HTTPHandler(http.server.BaseHTTPRequestHandler):
    "A very simple server"
    def do_GET(self):
        self.send_response('foobar')
        self.send_header('Content-type', 'text/plain')
        self.end_headers()
        self.wfile.write(bytes('Response body\n\n', 'latin1'))

if __name__ == '__main__':
    addr = ('', 9000)
    http.server.HTTPServer(addr, HTTPHandler).serve_forever()
==

The response is then:

→ http GET localhost:9000
HTTPConnectionPool(host='localhost', port=9000): Max retries exceeded with url: /

and the server log is:

127.0.0.1 - - [27/Feb/2013 22:56:39] "GET / HTTP/1.1" foobar -
Exception happened during processing of request from ('127.0.0.1', 53917)
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/socketserver.py", line 306, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/socketserver.py", line 332, in process_request
    self.finish_request(request, client_address)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/socketserver.py", line 345, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/socketserver.py", line 666, in __init__
    self.handle()
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/http/server.py", line 400, in handle
    self.handle_one_request()
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/http/server.py", line 388, in handle_one_request
    method()
  File "../25/server.py", line 8, in do_GET
    self.send_response('foobar')
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/http/server.py", line 444, in send_response
    self.send_response_only(code, message)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/http/server.py", line 459, in send_response_only
    (self.protocol_version, code, message)).encode(
TypeError: %d format: a number is required, not str

Both the error messages and the server logs are not very helpful. Shall we fix it? What do others think? I guess there should be test cases too? I'm happy to write unit test cases.
[issue17307] HTTP PUT request Example
karl added the comment: Senthil: about the PUT/POST distinction, I would say: the POST and PUT methods differ only by the intent of the enclosed representation. In the case of a POST, the target resource (URI) on the server decides what the meaning of the enclosed representation in the POST request is. In a PUT request, the enclosed representation is meant to replace the state of the target resource (URI) on the server. That is why PUT is idempotent.

About the code:

* http.client

I would remove the following comment, because the term "file" is confusing in HTTP terms:

# This will create a resource file with contents of BODY

or I would say more exactly:

# This creates an HTTP message
# with the content of BODY as the enclosed representation
# for the resource http://localhost:8080/foobar

>>> import http.client
>>> BODY = "Some data"
>>> conn = http.client.HTTPConnection("localhost", 8080)
>>> conn.request("PUT", "/foobar", BODY)  # The actual PUT request
>>> response = conn.getresponse()
>>> print(response.status, response.reason)

Maybe it would be good to display the message which is sent, so people can understand what goes on the wire.

* urllib

The client code for urllib doesn't create any challenge. I would add a Content-Type, but no hard feelings about it. On the other hand, the server code makes me a bit uncomfortable. It leads people into believing that this is the way you should reply to a PUT, which is not really the case.

1. If the resource did not exist and has been successfully created, the server MUST answer 201.
2. If the resource already exists and has been successfully replaced/modified, then the server SHOULD answer either 200 or 204 (depending on the design choice).

There are plenty of other cases depending on the constraints. See http://tools.ietf.org/html/draft-ietf-httpbis-p2-semantics-22#section-4.3.4 for the details.

If we keep the server code, I would like to have a note saying that it is not usable as-is in production code.

Does that make sense? :)

--
nosy: +karlcow
___ Python tracker <http://bugs.python.org/issue17307> ___
[issue12455] urllib2 forces title() on header names, breaking some requests
karl added the comment: Note that HTTP header field names are case-insensitive. See http://tools.ietf.org/html/draft-ietf-httpbis-p1-messaging#section-3.2

Each HTTP header field consists of a case-insensitive field name followed by a colon (":"), optional whitespace, and the field value.

Basically the author of a request can set them to whatever he/she wants. But we should, IMHO, respect the author's intent. It might happen that someone chooses a specific casing to deal with broken servers and/or proxies. So a set/get/send cycle should not modify the headers at all.

--
nosy: +karlcow
___ Python tracker <http://bugs.python.org/issue12455> ___
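To illustrate why forcing str.title() on field names loses the author's chosen casing (a plain-Python demonstration, independent of urllib2's internals):

```python
# str.title() capitalizes the first letter of each alphanumeric run and
# lowercases the rest, so deliberately-cased header names get mangled.
for name in ('SOAPAction', 'X-WSSE', 'DNT'):
    print(name, '->', name.title())
# SOAPAction -> Soapaction
# X-WSSE -> X-Wsse
# DNT -> Dnt
```

A picky server that matches header names byte-for-byte (which it should not do, but some do) will then fail to recognize the mangled name.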
[issue17322] urllib.request add_header() currently allows trailing spaces
New submission from karl: For HTTP header field-name parsing, see http://tools.ietf.org/html/draft-ietf-httpbis-p1-messaging-22#section-3.2.4

No whitespace is allowed between the header field-name and colon. In the past, differences in the handling of such whitespace have led to security vulnerabilities in request routing and response handling. A server MUST reject any received request message that contains whitespace between a header field-name and colon with a response code of 400 (Bad Request). A proxy MUST remove any such whitespace from a response message before forwarding the message downstream.

In Python 3.3 currently:

>>> import urllib.request
>>> req = urllib.request.Request('http://www.example.com/')
>>> req.add_header('FoO ', 'Yeah')
>>> req.header_items()
[('Foo ', 'Yeah'), ('User-agent', 'Python-urllib/3.3'), ('Host', 'www.example.com')]

The space has not been removed, so we should fix that at least. This is a bug. I'm not familiar with the specific security issues mentioned in the spec.

Note that many other things can be done too. :/

>>> req.add_header('FoO \n blah', 'Yeah')
>>> req.add_header('Foo:Bar\nFoo2', 'Yeah')
>>> req.header_items()
[('Foo:bar\nfoo2', 'Yeah'), ('Foo \n blah', 'Yeah'), ('Foo ', 'Yeah'), ('User-agent', 'Python-urllib/3.3'), ('Host', 'www.example.com')]

I will look into making a patch tomorrow.

--
components: Library (Lib)
messages: 183234
nosy: karlcow, orsenthil
priority: normal
severity: normal
status: open
title: urllib.request add_header() currently allows trailing spaces
versions: Python 3.3
___ Python tracker <http://bugs.python.org/issue17322> ___
[issue17322] urllib.request add_header() currently allows trailing spaces (and other weird stuff)
Changes by karl : -- title: urllib.request add_header() currently allows trailing spaces -> urllib.request add_header() currently allows trailing spaces (and other weird stuff) ___ Python tracker <http://bugs.python.org/issue17322> ___
[issue17322] urllib.request add_header() currently allows trailing spaces (and other weird stuff)
karl added the comment: Yet another one: leading spaces. :(

>>> req = urllib.request.Request('http://www.example.com/')
>>> req.header_items()
[]
>>> req.add_header(' Foo3', 'Ooops')
>>> req.header_items()
[(' foo3', 'Ooops')]
>>> req.headers
{' foo3': 'Ooops'}

--
___ Python tracker <http://bugs.python.org/issue17322> ___
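A field-name validator along the lines of what a fix could use (a sketch against the RFC 7230 token grammar; the helper name is hypothetical). It rejects every malformed name shown in this issue:

```python
import re

# RFC 7230 token: tchar = "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+"
#                       / "-" / "." / "^" / "_" / "`" / "|" / "~" / DIGIT / ALPHA
_TOKEN_RE = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+\Z")

def is_valid_field_name(name):
    return _TOKEN_RE.match(name) is not None

print(is_valid_field_name('Foo'))            # True
print(is_valid_field_name('FoO '))           # False: trailing space
print(is_valid_field_name(' Foo3'))          # False: leading space
print(is_valid_field_name('Foo:Bar\nFoo2'))  # False: colon and newline
```

add_header could call such a check and raise ValueError on a non-token name, which would cover the trailing-space, leading-space, and header-injection cases at once.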