[issue37254] POST large file to server (using http.server.CGIHTTPRequestHandler), always reset by server.
New submission from shajianrui:

Windows 10, Python 3.7.

I hit a problem when using the http.server module. I set up a basic server with HTTPServer and CGIHTTPRequestHandler (no threading or forking) and tried to POST a large file (>2 MB); the server always resets the connection. In some very rare cases the POST does finish (very slowly), but the CGI script I am posting to always reports that it received an incomplete file (the "incomplete file issue" below).

== First Try ==

At first I thought (a misunderstanding, but it led to a usable workaround) that "self.rfile.read(nbytes)" at line 1199 was not blocking, so it finished reading before the POST was complete. I modified the code like this:

    if self.command.lower() == "post" and nbytes > 0:
        # data = self.rfile.read(nbytes)    # the original line 1199, commented out
        databuf = bytearray(nbytes)
        datacount = 0
        while datacount < nbytes:
            buf = self.rfile.read(self.request.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
            # print("Got " + str(len(buf)) + " bytes.")
            for i in range(len(buf)):
                databuf[datacount] = buf[i]
                datacount += 1
                if datacount == nbytes:
                    # print("Done.")
                    break
        data = bytes(databuf)    # now we have the data

In this modification I simply read 65536 bytes (the socket's default SO_RCVBUF) from rfile repeatedly until I have nbytes bytes. With this change it works (the correct file is received), and the POST is much faster than with the original http.server module (when the "incomplete file issue" appears there).

== Second Try ==

However, I now know that blocking is not the problem, because "self.rfile.read()" does block if the file has not been completely POSTed. I inspected the TCP stream with Wireshark and found that, in the middle of the transfer, the server's receive window stays at 256, so I think the problem lies in the variable "rbufsize", which is passed to makefile() when the rfile of the request handler object is created. That at least explains the low speed, but I don't know whether it also causes the reset and the incomplete file issue.

I went back to the original http.server module and made a subclass of socketserver.StreamRequestHandler, overriding its setup() method: I copied setup() from StreamRequestHandler into a new file and modified what is line 770 in the socketserver module:

    # self.rfile = self.connection.makefile('rb', self.rbufsize)    # original line 770
    self.rfile = self.connection.makefile('rb', 65536)

The POST then became much faster (faster than my first modification), but the server printed an error (modified_http_server_bad.py is my copy of the http.server module):

    File "c:\Users\Administrator\Desktop\cgi-server-test\modified_http_server_bad.py", line 1204, in run_cgi
        while select.select([self.rfile._sock], [], [], 0)[0]:
    AttributeError: '_io.BufferedReader' object has no attribute '_sock'

Since that line only wants the socket of the current request handler, I changed "self.rfile._sock" to "self.connection" in my copy of http.server (I don't know whether this causes other problems; it is just a workaround). Now it works well again: the CGI script receives the correct file (it returns the correct SHA1 of the uploaded file), and the POST process is REALLY MUCH FASTER!

== Question ==

So here are the questions:

1- What causes the server to reset the connection? It seems to be because the default buffer size of rfile is too small.
2- What causes the CGI script to receive an incomplete file? I really have no idea about this one.
It also seems this problem disappears if I enlarge the buffer.

Other information:

1- The "incomplete file issue" usually appears on the first POST to the server, and almost all of the other POST connections are reset.
2- Once the server starts resetting connections, the "incomplete file issue" never appears again (actually it still happens, but Chrome only shows a RESET page; see point 4 below).
3- Once the server starts resetting connections, it takes a long time to terminate the server with Ctrl+C.
4- When the connection is reset, the response printed by the CGI script is received correctly, and it shows that the CGI script received an incomplete file, with a byte count far lower than the correct number. (I use Chrome to do the POST, so it only shows a reset message and the real response is ignored.)

Please help.
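For reference, a minimal sketch of the "Second Try" above might look like the following, assuming Python 3.7's http.server/socketserver layout (the class name BufferedCGIHandler and the port are made up for illustration). As the report notes, run_cgi() still references self.rfile._sock on the non-fork path, so that line has to be changed separately (e.g. to self.connection) before large POSTs go through:

    import http.server

    class BufferedCGIHandler(http.server.CGIHTTPRequestHandler):
        """CGIHTTPRequestHandler variant whose rfile is buffered (the default rbufsize is 0)."""

        def setup(self):
            super().setup()    # creates the default unbuffered SocketIO rfile
            # Re-wrap the connection with a 64 KiB buffered reader, in the spirit of
            # the modified StreamRequestHandler.setup() described above.
            self.rfile = self.connection.makefile('rb', 65536)

    if __name__ == "__main__":
        # Reminder: run_cgi() must also be adjusted (self.rfile._sock -> self.connection),
        # as described above, for the trailing-data drain to work with a buffered rfile.
        httpd = http.server.HTTPServer(("0.0.0.0", 8000), BufferedCGIHandler)
        httpd.serve_forever()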
[issue37301] CGIHTTPServer doesn't handle long POST requests
shajianrui added the comment:

I have the same problem, and use a similar workaround:

1. I set rbufsize to -1.
2. I use self.connection.recv() instead of self.rfile.read(), like this:

    while select.select([self.connection], [], [], 0)[0]:
        if not self.connection.recv(1):
            break

However, while going through the code I found the following comment at line 967 of server.py (I am using Python 3.7):

    # Make rfile unbuffered -- we need to read one line and then pass
    # the rest to a subprocess, so we can't use buffered input.
    rbufsize = 0

It seems the author deliberately made rfile unbuffered for some reason, and I know nothing about it. If you know more about it, please give me some hints, thank you.

--
nosy: +shajianrui

___
Python tracker <https://bugs.python.org/issue37301>
___
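For what it's worth, here is a small sketch of what that comment appears to be guarding against (the sockets and strings below are made up for the illustration): with a buffered rfile, readline() typically pulls body bytes into the Python-level buffer, where a forked CGI child that only inherits the raw file descriptor could never see them.

    import socket

    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    cli = socket.create_connection(srv.getsockname())
    conn, _ = srv.accept()

    cli.sendall(b"HEADER LINE\r\nBODY BYTES")

    buffered = conn.makefile("rb")    # default buffering: io.BufferedReader
    print(buffered.readline())        # b'HEADER LINE\r\n'
    # The body has typically already been recv()'d into the reader's own buffer,
    # so it is no longer readable from conn's file descriptor by a child process:
    print(buffered.peek(16))          # typically b'BODY BYTES'

    # With rbufsize = 0 the rfile is a raw socket.SocketIO instead: readline()
    # reads byte by byte, and the unread body stays on the socket for the child.

    for s in (buffered, cli, conn, srv):
        s.close()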
[issue37301] CGIHTTPServer doesn't handle long POST requests
shajianrui added the comment:

@vsbogd, thank you for your reply. But I have found another problem. I derived a subclass from socketserver.StreamRequestHandler and set its rbufsize to 0 (as CGIHTTPRequestHandler does), like the demo below:

testserver.py:

    import socketserver

    class TestRequestHandler(socketserver.StreamRequestHandler):
        rbufsize = 0    # simulate CGIHTTPRequestHandler

        def handle(self):
            while True:
                data = self.rfile.read(65536*1024)    # the client should send 65536*1024 bytes
                print(len(data))
                if len(data) == 0:
                    print("Connection closed.")
                    break

    s = socketserver.TCPServer(("0.0.0.0", 8001), TestRequestHandler)
    s.serve_forever()

testclient.py:

    import socket

    data = bytearray(65536*1024)
    for i in range(65536*1024):
        data[i] = 64    # whatever value you like

    c = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    c.connect(("127.0.0.1", 8001))
    c.send(data)

testserver.py gets the whole 65536*1024 bytes from every "data = self.rfile.read(65536*1024)" call. Its normal output is:

    67108864
    0
    Connection closed.
    67108864
    0
    Connection closed.

In other words, the problem with "rfile.read(nbytes)" cannot be reproduced with this demo. I don't know why; it seems this is not only a problem of "rfile.read(nbytes)" itself. I guess CGIHTTPRequestHandler actually does something that makes "rfile.read(nbytes)" behave strangely, but I have failed to find such a line in its code.

--
___
Python tracker <https://bugs.python.org/issue37301>
___
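One hedged aside about testclient.py above: socket.send() is not guaranteed to hand the whole buffer to the kernel in a single call, so a reproduction client is usually written with sendall() instead, for example (same port 8001 as the demo):

    import socket

    payload = bytes(65536 * 1024)    # 64 MiB of zero bytes; the fill value does not matter
    with socket.create_connection(("127.0.0.1", 8001)) as c:
        c.sendall(payload)           # loops internally until every byte is handed to the kernel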
[issue37301] CGIHTTPServer doesn't handle long POST requests
shajianrui added the comment:

Yes, I reproduced this problem with a slight modification to your demo. With your original version I failed to reproduce it; maybe 2*65536 bytes is too small. I changed the size of the data (in test_cgi_client.py) to 8*65536, 16*65536 and 32*65536, and the result is shown in the attached image. From the image you can see:

- From tests 1 to 7 the length received does not change: always 195540 bytes, and the sending process is really slow.
- At test 8 the server receives the complete data, and the sending finishes in a flash. (Maybe the socket suddenly enlarged its buffer?)
- From tests 9 to 10 the "buffer" seems to become smaller and smaller.

However, in my own demo (posted in my last message) the data can be up to 65536*1024 bytes and the problem "seldom" appears. I say "seldom" because I have now confirmed: if too many (more than 10) copies of testclient.py run at the same time, testserver.py shows the problem too. Like this:

testserver.py output:

    Connection closed.
    67108864
    0
    Connection closed.
    67108864
    0
    Connection closed.
    67108864
    0
    Connection closed.
    195640        # from here the problem shows up
    42440740
    2035240
    9327940
    13074300
    35004
    0
    Connection closed.
    67108864
    0
    Connection closed.

It seems it is normal behavior for rfile.read() to return fewer bytes than we ask it to read. So now my question is: why are the bytes returned by "rfile.read()" so few when the rfile belongs to a CGIHTTPRequestHandler?

--
Added file: https://bugs.python.org/file48428/image.PNG

___
Python tracker <https://bugs.python.org/issue37301>
___
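As a hedged illustration of that closing question (the socketpair() setup and the sizes below are made up for the demo): with rbufsize = 0, makefile() returns a raw socket.SocketIO, and a raw read(n) performs a single recv(), so it may legitimately return far fewer than n bytes instead of waiting for all of them.

    import socket

    a, b = socket.socketpair()    # emulated over loopback on Windows (Python 3.5+)
    raw = b.makefile("rb", 0)     # same buffering mode as CGIHTTPRequestHandler's rfile
    a.sendall(b"x" * 100)

    chunk = raw.read(1000)        # one recv(): returns what has arrived, not 1000 bytes
    print(type(raw).__name__, len(chunk))    # typically prints: SocketIO 100

    for s in (raw, a, b):
        s.close()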
[issue37301] CGIHTTPServer doesn't handle long POST requests
shajianrui added the comment:

Sorry for replying so late, and thank you very much for your reply and explanation.

First, replying to your last post: I think that, at least in the non-Unix environment, CGIHTTPRequestHandler reads the whole (expected) data from rfile and then passes it to the CGI script. And considering the code for the Unix environment, I don't think setting rbufsize to -1 is a good idea. I prefer a safer way: a read() loop that reads until nbytes have been received. It is much slower but more compatible. Like this:

    if self.command.lower() == "post" and nbytes > 0:
        # data = self.rfile.read(nbytes)    # original code at line 1199
        databuf = bytearray(nbytes)
        datacount = 0
        while datacount < nbytes:
            buf = self.rfile.read(self.request.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))    # or any chunk size you like
            if len(buf) == 0:
                print("Connection closed before nbytes reached.")
                break
            for i in range(len(buf)):
                databuf[datacount] = buf[i]
                datacount += 1
                if datacount == nbytes:
                    break
        data = bytes(databuf)

This code is only for explanation, not for use.

--
___
Python tracker <https://bugs.python.org/issue37301>
___
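A self-contained version of that loop, written as a helper function, might look like the sketch below; read_exact is a made-up name, and the idea is simply to keep reading until nbytes have been collected or the peer closes the connection early.

    def read_exact(rfile, nbytes, chunk_size=65536):
        """Read exactly nbytes from rfile, tolerating short reads.

        Returns fewer bytes only if the connection is closed early.
        """
        buf = bytearray()
        while len(buf) < nbytes:
            chunk = rfile.read(min(chunk_size, nbytes - len(buf)))
            if not chunk:    # EOF before nbytes arrived
                break
            buf += chunk
        return bytes(buf)

    # Hypothetical use inside run_cgi(), replacing "data = self.rfile.read(nbytes)":
    #     data = read_exact(self.rfile, nbytes)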