[issue37254] POST large file to server (using http.server.CGIHTTPRequestHandler), always reset by server.

2019-06-12 Thread shajianrui

New submission from shajianrui :

Windows 10, Python 3.7

I ran into a problem when using the http.server module. I set up a basic server 
with the classes HTTPServer and CGIHTTPRequestHandler (no threading or forking) 
and tried to POST a large file (>2MB). The server always resets the connection. 
In some very rare cases the POST completes (very slowly), but the CGI script I 
am posting to always reports that it received an incomplete file (I call this 
the "incomplete file issue").

==First Try==

At first I thought (this was a misunderstanding, but it led to a passable 
workaround) that "self.rfile.read(nbytes)" at line 1199 was not blocking, so 
that it returned before the POST had finished. I therefore modified the code 
like this:

    if self.command.lower() == "post" and nbytes > 0:
        # data = self.rfile.read(nbytes)   # original line 1199, commented out
        databuf = bytearray(nbytes)
        datacount = 0
        while datacount < nbytes:
            # Read at most one socket receive buffer's worth per call.
            buf = self.rfile.read(
                self.request.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
            # print("Get " + str(len(buf)) + " bytes.")
            if not buf:
                break   # connection closed before nbytes arrived
            for i in range(len(buf)):
                databuf[datacount] = buf[i]
                datacount += 1
                if datacount == nbytes:
                    # print("Done.")
                    break
        data = bytes(databuf)   # now we have the data

In this modification I repeatedly read up to 65536 bytes (the socket's default 
receive buffer size) from rfile until I have nbytes bytes. Now it works well 
(the correct file is received), and it is much faster than a POST with the 
original http.server module (in the cases where the "incomplete file issue" 
appears).
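The read-until-complete loop above can be sketched more compactly. This is only a minimal illustration, not code from http.server: read_exactly and its chunk_size parameter are hypothetical names, and the sketch assumes a blocking stream whose read() returns an empty bytes object at end of file.

```python
def read_exactly(stream, nbytes, chunk_size=65536):
    """Read exactly nbytes from stream, tolerating short reads.

    Hypothetical helper, not part of http.server. Returns fewer than
    nbytes only if the stream reaches end of file first.
    """
    chunks = []
    remaining = nbytes
    while remaining > 0:
        # Never ask for more than is still missing, so nothing past
        # the request body is consumed.
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:
            break  # peer closed the connection early
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

Collecting chunks and joining them once avoids the byte-by-byte copy, and capping each request at the remaining count keeps the loop from reading beyond nbytes.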

==Second Try==

However, I now know that blocking is not the problem, because 
"self.rfile.read()" should block if the file has not been POSTed completely.

I checked the TCP stream with Wireshark and found that in the middle of the 
transfer the server's receive window is always 256. So I think the problem lies 
in the variable "rbufsize", which is passed to makefile() when the rfile of the 
XXXRequestHandler object is created. At the least it explains the low speed, 
but I don't know whether it also leads to the resets and the incomplete file 
issue.

I went back to the original version of the http.server module. Then I made a 
subclass of socketserver.StreamRequestHandler and overrode its setup() method. 
I first copied the code of setup() from StreamRequestHandler and then modified 
line 770 (770 is the line number in the socketserver module, but I created the 
new subclass in a new file):

    # self.rfile = self.connection.makefile('rb', self.rbufsize)   # original line 770
    self.rfile = self.connection.makefile('rb', 65536)

Then the POST became much faster (faster than with my first modification)!
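Because StreamRequestHandler.setup() passes the class attribute rbufsize to makefile(), a plain StreamRequestHandler subclass can probably get the same buffer size without copying setup(). The following is a sketch under that assumption; BufferedHandler, demo, and the attributes stored on the server object (expected, rfile_type, received) are hypothetical names used only for illustration.

```python
import socket
import socketserver
import threading


class BufferedHandler(socketserver.StreamRequestHandler):
    # setup() hands this value to makefile('rb', ...), so overriding
    # the attribute is enough; setup() itself stays untouched.
    rbufsize = 65536

    def handle(self):
        # Record what kind of object rfile is and how much arrived.
        self.server.rfile_type = type(self.rfile).__name__
        self.server.received = len(self.rfile.read(self.server.expected))


def demo(nbytes=1 << 20):
    """Serve one request on an ephemeral port and send it nbytes zeros."""
    with socketserver.TCPServer(("127.0.0.1", 0), BufferedHandler) as srv:
        srv.expected = nbytes
        worker = threading.Thread(target=srv.handle_request)
        worker.start()
        with socket.create_connection(srv.server_address) as client:
            client.sendall(bytes(nbytes))
        worker.join()
        return srv.rfile_type, srv.received
```

Note that this applies to a handler written from scratch. CGIHTTPRequestHandler sets rbufsize = 0 on purpose, so changing it there may have side effects.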

But the server printed an error:

  File "c:\Users\Administrator\Desktop\cgi-server-test\modified_http_server_bad.py", line 1204, in run_cgi
    while select.select([self.rfile._sock], [], [], 0)[0]:
AttributeError: '_io.BufferedReader' object has no attribute '_sock'

(modified_http_server_bad.py is a copy of the http.server module.)

Since I know the code wants the socket of the current request handler, I 
modified the http.server module and changed "self.rfile._sock" to 
"self.connection" (I don't know whether this causes problems; it is just a 
workaround).

OK, now it works well again. The CGI script gets the correct file (it returns 
the correct SHA1 of the uploaded file), and the POST is REALLY MUCH FASTER!

==Question==

So here are my questions:
1- What causes the server to reset the connection? It seems to be because the 
default buffer size of rfile is too small.
2- What causes the CGI script to get an incomplete file? I really have no idea. 
It seems this problem also disappears if I enlarge the buffer.

Other information:
1- The "incomplete file issue" usually appears on the first POST to the server, 
and almost all of the other POST connections are reset.
2- Once the server starts resetting connections, the "incomplete file issue" 
is never seen again (it actually still happens, but Chrome only shows a RESET 
page; see 4- below).
3- Once the server starts resetting connections, it takes a long time to 
terminate the server with Ctrl+C.
4- When the connection is reset, the response printed by the CGI script is still 
received correctly, and it shows that the CGI script received an incomplete 
file: the byte count is much lower than the correct number. (I use Chrome to do 
the POST, so it just shows a reset message and ignores the real response.)

Please help.

--

[issue37301] CGIHTTPServer doesn't handle long POST requests

2019-06-16 Thread shajianrui


shajianrui  added the comment:

I have the same problem, and use a similar workaround:
1. I set rbufsize to -1.
2. I use self.connection.recv() instead of self.rfile.read(), like this:
    while select.select([self.connection], [], [], 0)[0]:
        if not self.connection.recv(1):
            break
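The drain loop in step 2 can be tried in isolation. This is a minimal sketch, not the http.server code: drain_pending is a hypothetical helper, and it works on any connected socket object.

```python
import select
import socket


def drain_pending(sock):
    """Discard bytes already waiting on sock without blocking.

    Hypothetical helper; returns the number of bytes thrown away.
    """
    discarded = 0
    # A zero timeout turns select() into a poll: it reports at once
    # whether sock is readable instead of waiting.
    while select.select([sock], [], [], 0)[0]:
        if not sock.recv(1):
            break  # an empty result means the peer closed the connection
        discarded += 1
    return discarded
```

One caveat: select() only sees bytes still held by the socket. If rfile is buffered, some bytes may already sit in its internal buffer, where this loop cannot reach them.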

However, when I went through the code, I found this comment at line 967 of 
server.py (I am using Python 3.7):
    # Make rfile unbuffered -- we need to read one line and then pass
    # the rest to a subprocess, so we can't use buffered input.
    rbufsize = 0

It seems the author made rfile unbuffered for a reason, and I don't know enough 
about it.

If you know more about this, please give me some hints. Thank you.

--
nosy: +shajianrui

___
Python tracker 
<https://bugs.python.org/issue37301>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37301] CGIHTTPServer doesn't handle long POST requests

2019-06-18 Thread shajianrui


shajianrui  added the comment:

@vsbogd, thank you for your reply. But I found another problem.

I derived a subclass from socketserver.StreamRequestHandler and set rbufsize 
to 0 (as CGIHTTPRequestHandler does), as in the demo below:

testserver.py:
    import socketserver

    class TestRequestHandler(socketserver.StreamRequestHandler):
        rbufsize = 0  # simulate CGIHTTPRequestHandler

        def handle(self):
            while True:
                # The client should send 65536*1024 bytes.
                data = self.rfile.read(65536*1024)
                print(len(data))
                if len(data) == 0:
                    print("Connection closed.")
                    break

    s = socketserver.TCPServer(("0.0.0.0", 8001), TestRequestHandler)
    s.serve_forever()

testclient.py:
    import socket

    data = bytearray(65536*1024)
    for i in range(65536*1024):
        data[i] = 64  # any value will do
    c = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    c.connect(("127.0.0.1", 8001))
    c.sendall(data)  # sendall() keeps sending until the whole payload is out

testserver.py gets the whole 65536*1024 bytes from every "data = 
self.rfile.read(65536*1024)" call. The normal output of testserver.py is:

testserver.py output:
67108864
0
Connection closed.
67108864
0
Connection closed.

In other words, the "rfile.read(nbytes)" problem cannot be reproduced with 
this demo.

I don't know why. It seems this is not only a problem with 
"rfile.read(nbytes)". I guess CGIHTTPRequestHandler does something that makes 
"rfile.read(nbytes)" behave strangely. However, I have failed to find such a 
line in the code.

--




[issue37301] CGIHTTPServer doesn't handle long POST requests

2019-06-19 Thread shajianrui

shajianrui  added the comment:

Yes, I can reproduce this problem with a slight modification to your demo.

With your original version I failed to reproduce it; maybe 2*65536 bytes is 
too small.

I just changed the size of the data (in test_cgi_client.py) to 8*65536, 16*65536 
and 32*65536, and the result is shown in the image I attached.

From the image you can see:
- In tests 1 to 7 the length received doesn't change: it is always
  195540 bytes. And the sending is really slow.
- In the 8th test the server receives the complete data, and the
  sending finishes in a flash. (Maybe the socket suddenly
  enlarges the buffer?)
- In tests 9 and 10 the "buffer" seems to become smaller and smaller.

However, in my demo (posted in my last message), the data can be up to 
65536*1024 bytes and it 【seldom】 produces this problem.

I say "seldom" because I have now confirmed: if too many (more than 10) 
testclient.py instances run at the same time, testserver.py produces the 
problem too. Like this:

testserver.py output:
Connection closed.
67108864
0
Connection closed.
67108864
0
Connection closed.
67108864
0
Connection closed.
195640   # From here the problem shows up.
42440740
2035240
9327940
13074300
35004
0
Connection closed.
67108864
0
Connection closed.

It seems this is normal behavior for rfile.read(): it may return fewer bytes 
than we ask it to read.
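The short-read behavior can be shown with a socket pair. This is a sketch, not the tracker's test case: raw_short_read and buffered_full_read are hypothetical function names. The only assumption is that makefile("rb", 0) gives an unbuffered file object (as with rbufsize = 0) and makefile("rb", -1) gives a buffered one.

```python
import socket
import threading
import time


def raw_short_read():
    """Unbuffered: read(n) returns whatever a single recv() yields."""
    a, b = socket.socketpair()
    with a, b:
        a.sendall(b"abc")             # only 3 bytes are available
        rfile = b.makefile("rb", 0)   # unbuffered, as with rbufsize = 0
        data = rfile.read(6)          # asks for 6, gets what is there
        rfile.close()
        return data


def buffered_full_read():
    """Buffered: read(n) keeps reading until n bytes or end of file."""
    a, b = socket.socketpair()

    def slow_sender():
        a.sendall(b"abc")
        time.sleep(0.1)               # the second half arrives later
        a.sendall(b"def")

    with a, b:
        sender = threading.Thread(target=slow_sender)
        sender.start()
        rfile = b.makefile("rb", -1)  # default buffering
        data = rfile.read(6)          # waits for all 6 bytes
        sender.join()
        rfile.close()
        return data
```

On a typical system the first function should return only the 3 bytes that have arrived, while the second waits for all 6. That matches the observation that an unbuffered rfile can hand back less than nbytes.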

Now I have a question: why does "rfile.read()" return so few bytes when the 
rfile belongs to a CGIHTTPRequestHandler?

--
Added file: https://bugs.python.org/file48428/image.PNG




[issue37301] CGIHTTPServer doesn't handle long POST requests

2019-06-22 Thread shajianrui


shajianrui  added the comment:

Sorry for replying so late, and thank you very much for your reply and 
explanation.

First, a reply to your last post: I think that, at least in the non-Unix 
environment, CGIHTTPRequestHandler reads the whole (expected) data from rfile 
and then passes it to the CGI script.

And considering the code for the Unix environment, I don't think setting 
rbufsize to -1 is a good idea. I prefer a safer way: call read() in a loop 
until nbytes bytes have been received. It is much slower but more compatible. 
Like this:

    if self.command.lower() == "post" and nbytes > 0:
        # data = self.rfile.read(nbytes)   # original code at line 1199
        databuf = bytearray(nbytes)
        datacount = 0
        while datacount < nbytes:
            # You can choose your own chunk size here.
            buf = self.rfile.read(
                self.request.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
            if len(buf) == 0:
                print("Connection closed before nbytes reached.")
                break
            for i in range(len(buf)):
                databuf[datacount] = buf[i]
                datacount += 1
                if datacount == nbytes:
                    break
        data = bytes(databuf)

This code is only for explanation... Not for use...
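If the per-byte copy is the main cost of the loop, one possible variation is to let the stream write directly into the preallocated buffer. This is a sketch only: read_body_into is a hypothetical helper, and it assumes the stream offers readinto(), which raw and buffered binary file objects from the io module normally do.

```python
def read_body_into(stream, nbytes):
    """Fill a preallocated buffer through readinto().

    Hypothetical helper. Returns fewer than nbytes only if the stream
    ends (or yields no data) before nbytes have arrived.
    """
    buf = bytearray(nbytes)
    view = memoryview(buf)
    filled = 0
    while filled < nbytes:
        # readinto() writes into the unfilled tail of the buffer and
        # returns how many bytes it stored.
        got = stream.readinto(view[filled:])
        if not got:
            break  # end of file before nbytes arrived
        filled += got
    return bytes(view[:filled])
```

The memoryview slice lets each call continue where the previous one stopped without copying the buffer.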

--
