Thanks Andre for so much info.
Yes we do have statistics for the requests in terms of how long they take:
As you can see, 99.996% take less than 100 ms, 0.002% take between 100 ms and
200 ms, and another 0.002% take more than 200 ms. Max QPS is 2314.
You have just reminded me that the timeouts may be due to network latency,
because the mobile app may be on a poor network when it accesses the system. We
will continue investigating this issue.
Thanks again.
On 2018/3/28 Wednesday PM 9:18, André Warnier (tomcat) wrote:
Hi.
This is a bit of a different discussion, which is why I am now marking
it OT.
I was quite impressed by the numbers you list below (and I still am,
nevertheless).
So my first impression was that I am totally incompetent to comment
further, because I have never dealt with such numbers, and I have no
idea of the kind of hardware/software architecture which is needed to
deal with that kind of thing.
But then I made some calculations, based on the number of servers
which you mentioned earlier, and that leads - at least on the surface
- to a quite different view :
100 servers * 24 cores = 2400 cores
42,368,982 requests per day
On average thus :
= 423,689 per server / day
= 17,653 per core / day
= 735 per core / hour
= 12 per core / minute
= 5 s / request
(and that does not look so exceptional anymore - except for your
servers budget)
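The averages above can be reproduced with a few lines of Python. This is only a sketch under the same assumption of a perfectly even spread over all cores; the function and key names are made up for illustration. Note that the last figure is the average interval between two requests arriving at one core, not a measured processing time.

```python
# Figures quoted in this thread.
SERVERS = 100
CORES_PER_SERVER = 24
REQUESTS_PER_DAY = 42_368_982


def per_core_load(requests_per_day, servers, cores_per_server):
    """Average request counts, assuming load is spread evenly over all cores."""
    per_server_day = requests_per_day / servers
    per_core_day = per_server_day / cores_per_server
    per_core_hour = per_core_day / 24
    per_core_minute = per_core_hour / 60
    return {
        "per_server_day": per_server_day,
        "per_core_day": per_core_day,
        "per_core_hour": per_core_hour,
        "per_core_minute": per_core_minute,
        # Average gap between two requests arriving at the same core.
        "seconds_between_requests": 60 / per_core_minute,
    }


load = per_core_load(REQUESTS_PER_DAY, SERVERS, CORES_PER_SERVER)
for name, value in load.items():
    print(f"{name}: {value:,.1f}")
```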
Of course, I am sure that the kind of averages which I calculate above
are very rough, that the load on your systems is probably not evenly
spread, that not all the time of those 2400 cores is dedicated to this
application, and so on.
But as a starting point, it provides at least one observation :
You have 42,368,982 requests per day, of which 6,619 (42,368,982 minus the
42,362,363 successful ones) fail with a timeout, and that is 0.016% of the total.
And we know that the timeout of an HTTP client is normally in the range
of 3-5 minutes.
Yet we see above that, as a very rough average, the mean time for a
request is in the order of 5 seconds.
So at least on the surface, it would look like the requests that fail
take at least approximately 36 times longer (3 min * 60 = 180 s,
divided by 5) than the average.
So are these 0.016% of requests really exceptional in how long they
take to complete?
And if yes, do you know why ?
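As a cross-check, the failure count and the "36 times" ratio can be derived from the total and successful request counts quoted further down in this thread. This is a rough sketch; the 3-minute timeout and the 5-second average are the assumptions made in this message, not measured values.

```python
TOTAL_REQUESTS = 42_368_982
SUCCESSFUL_REQUESTS = 42_362_363
CLIENT_TIMEOUT_S = 3 * 60  # lower end of the assumed 3-5 minute client timeout
ROUGH_MEAN_S = 5           # rough per-core average from the calculation above

failed = TOTAL_REQUESTS - SUCCESSFUL_REQUESTS
failure_rate_pct = 100 * failed / TOTAL_REQUESTS
slowdown = CLIENT_TIMEOUT_S / ROUGH_MEAN_S

print(f"failed requests: {failed:,}")
print(f"failure rate: {failure_rate_pct:.4f}%")
print(f"timeout / rough mean: {slowdown:.0f}x")
```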
Do you have any statistics which classify the requests in terms of how
long they take ?
Like :
- between 1 and 5 seconds : n1
- between 6 and 15 seconds : n2
...
- more than nnn seconds : nx (subject to client timeout, so error in
the log)
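A classification like the one above could be produced along these lines. This is a minimal sketch: the bucket bounds and the sample durations are invented for illustration, and the real durations would have to come from your own logs (for example, Tomcat's AccessLogValve can record the processing time with the %D or %T pattern codes).

```python
from bisect import bisect_right
from collections import Counter

# Bucket boundaries in seconds (illustrative; the last bucket is open-ended).
BOUNDS_S = [1, 5, 15, 60, 180]


def bucket_label(duration_s, bounds=BOUNDS_S):
    """Return the label of the bucket that a request duration falls into."""
    i = bisect_right(bounds, duration_s)
    if i == 0:
        return f"< {bounds[0]} s"
    if i == len(bounds):
        return f">= {bounds[-1]} s"
    return f"{bounds[i - 1]}-{bounds[i]} s"


def histogram(durations_s, bounds=BOUNDS_S):
    """Count how many requests fall into each duration bucket."""
    return Counter(bucket_label(d, bounds) for d in durations_s)


# Made-up sample durations in seconds, standing in for parsed log values.
sample = [0.04, 0.09, 0.15, 2.0, 7.5, 200.0]
counts = histogram(sample)
for label, count in counts.items():
    print(f"{label}: {count}")
```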
On 28.03.2018 13:13, PANG J. wrote:
As shown below,
Yesterday's total was 42,368,982 requests. Not all were successful, but
42,362,363 were.
The failed requests were timeouts.
Thanks.
On 2018/3/28 Wednesday PM 6:37, André Warnier (tomcat) wrote:
On 28.03.2018 12:31, PANG J. wrote:
By "the client" I meant the mobile app.
The mobile app gets its results from the server via an SDK.
Ok. But it is very likely that your "mobile app SDK", also has a
timeout after it sends
a request to a server. Or are you /sure/ that it waits forever ?
/Precisely what/ makes you think that it is a server-side timeout ?
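To illustrate what a client-side timeout looks like, here is a small Python sketch. It is not your SDK's code: it uses a local socket pair as a stand-in for the app and the server, and the helper name and the 0.2 s timeout are invented for the demonstration. The "server" end receives the request and simply never answers; the client gives up on its own.

```python
import socket


def wait_for_reply(sock, timeout_s):
    """Wait for a reply; return None if the client's own timeout expires."""
    sock.settimeout(timeout_s)
    try:
        return sock.recv(1024)
    except socket.timeout:
        return None


# A connected pair of sockets stands in for the mobile app and the server.
client, server = socket.socketpair()
client.sendall(b"request")

# The server side never sends anything back, so the client stops waiting.
reply = wait_for_reply(client, timeout_s=0.2)
print("client gave up waiting" if reply is None else reply)

# The request did arrive; nothing on the server side timed out.
server_saw = server.recv(1024)
print("server received:", server_saw)

client.close()
server.close()
```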
In future we may move the computing task into the app itself,
but currently it runs on the server side.
Thanks.
On 2018/3/28 Wednesday PM 6:11, André Warnier (tomcat) wrote:
I believe that the timeout which Pang J. is mentioning, may be the
browser-side timeout,
which is fixed at the browser level at about 5 minutes or so.
When a browser sends a request to a server, and it does not receive
/some/ response within the next +-5 minutes, then the browser will drop
the connection to the server, and pop up a message saying "sorry, the
server appears not to respond.."
In other words, it is not a server timeout, it is a client timeout.
The only way to avoid this is to ensure that the server sends at
least /some/ temporary response to the client (*), regularly, so that
this browser timeout does not occur.
Unfortunately, that is a bit more complicated to set up than just
some parameter somewhere.
But there must be plenty of past discussions of this issue already
on the www, and
solution guidelines.
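The general idea of sending /some/ temporary response at regular intervals can be sketched as below. This is a language-neutral illustration written in Python, not Tomcat code: the function name, the single-space heartbeat and the intervals are all invented, and error handling is left out. Whether a given client actually resets its timeout on such partial data depends on that client.

```python
import threading
import time


def respond_with_heartbeats(job, interval_s=1.0, heartbeat=b" "):
    """Yield heartbeat chunks while job() runs in a worker thread, then its result."""
    result = {}

    def run():
        result["value"] = job()

    worker = threading.Thread(target=run)
    worker.start()
    while True:
        # Wait up to one interval for the job to finish.
        worker.join(timeout=interval_s)
        if not worker.is_alive():
            break
        # Still running: send a small chunk so the connection is not silent.
        yield heartbeat
    yield result["value"]


def slow_job():
    """Stand-in for a long-running computation."""
    time.sleep(0.35)
    return b"final result"


chunks = list(respond_with_heartbeats(slow_job, interval_s=0.1))
print(chunks)
```

In a real server each yielded chunk would be written and flushed to the response stream as soon as it is produced.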