Sven,
compare with an Erlang VM (Cowboy) on a standard PC, i5-4570 CPU @
3.20GHz × 4, on Linux ...
Concurrent requests: 8
$ ab -k -c 8 -n 10240 http://127.0.0.1:8080/
This is ApacheBench, Version 2.3 <$Revision: 1706008 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 127.0.0.1 (be patient)
Completed 1024 requests
Completed 2048 requests
Completed 3072 requests
Completed 4096 requests
Completed 5120 requests
Completed 6144 requests
Completed 7168 requests
Completed 8192 requests
Completed 9216 requests
Completed 10240 requests
Finished 10240 requests
Server Software:
Server Hostname: 127.0.0.1
Server Port: 8080
Document Path: /
Document Length: 7734 bytes
Concurrency Level: 8
Time taken for tests: 0.192 seconds
Complete requests: 10240
Failed requests: 0
Keep-Alive requests: 10143
Total transferred: 80658152 bytes
HTML transferred: 79196160 bytes
Requests per second: 53414.29 [#/sec] (mean)
Time per request: 0.150 [ms] (mean)
Time per request: 0.019 [ms] (mean, across all concurrent requests)
Transfer rate: 410871.30 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 0 0 0.2 0 3
Waiting: 0 0 0.2 0 3
Total: 0 0 0.2 0 3
Percentage of the requests served within a certain time (ms)
50% 0
66% 0
75% 0
80% 0
90% 0
95% 1
98% 1
99% 1
100% 3 (longest request)
And here with 1000 concurrent requests ...
$ ab -k -c 1000 -n 10240 http://127.0.0.1:8080/
This is ApacheBench, Version 2.3 <$Revision: 1706008 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 127.0.0.1 (be patient)
Completed 1024 requests
Completed 2048 requests
Completed 3072 requests
Completed 4096 requests
Completed 5120 requests
Completed 6144 requests
Completed 7168 requests
Completed 8192 requests
Completed 9216 requests
Completed 10240 requests
Finished 10240 requests
Server Software:
Server Hostname: 127.0.0.1
Server Port: 8080
Document Path: /
Document Length: 7734 bytes
Concurrency Level: 1000
Time taken for tests: 0.225 seconds
Complete requests: 10240
Failed requests: 0
Keep-Alive requests: 10232
Total transferred: 80660288 bytes
HTML transferred: 79196160 bytes
Requests per second: 45583.23 [#/sec] (mean)
Time per request: 21.938 [ms] (mean)
Time per request: 0.022 [ms] (mean, across all concurrent requests)
Transfer rate: 350642.85 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 1 3.3 0 23
Processing: 0 6 16.1 0 198
Waiting: 0 6 16.1 0 198
Total: 0 7 18.0 0 211
Percentage of the requests served within a certain time (ms)
50% 0
66% 2
75% 6
80% 10
90% 21
95% 32
98% 47
99% 108
100% 211 (longest request)
On 15.12.2016 at 15:00, Sven Van Caekenberghe wrote:
Joachim,
On 15 Dec 2016, at 11:43, jtuc...@objektfabrik.de wrote:
Vitor,
On 14.12.16 at 19:23, Vitor Medina Cruz wrote:
If I tell you that my current estimate is that a Smalltalk image with Seaside
will not be able to handle more than 20 concurrent users, in many cases even
less.
Seriously? That is kind of a low number; I would expect more from each image.
Certainly it depends on many things, but it is still very low for a
rough estimate. Why do you say that?
Seriously, I think 20 is very optimistic, for several reasons.
First, you want to be fast and responsive for every single user, so there is
absolutely no point in running too close to any limit. It's easy to lose users by
providing a bad experience.
Second, in a CRUD application you work a lot with DB queries, and you connect to
all kinds of external systems and do I/O. Some of these things simply block the VM.
Even if that is only for 0.3 seconds, you postpone processing for every "unaffected"
user by those 0.3 seconds, which adds up to significant delays in response time. And
for heavy DB operations, 0.3 seconds is not a terribly bad estimate. Add to that the
materialization and related work within the Smalltalk image.
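To make the effect concrete, here is a minimal sketch (Python standing in for the image, purely illustrative; the 0.3 s figure is the estimate above): when one request performs I/O that blocks the whole VM, every request queued behind it pays the same latency.

```python
import time

def handle(request, blocking_io=0.0):
    # One request's server-side work; blocking_io stands in for a
    # DB call that blocks the whole VM, not just this green thread.
    time.sleep(blocking_io)
    return request

# Five requests arrive together; only the first does 0.3 s of blocking I/O.
start = time.time()
latencies = []
for i, io in enumerate([0.3, 0, 0, 0, 0]):
    handle(i, blocking_io=io)
    latencies.append(time.time() - start)

# Every "unaffected" request still waited behind the blocking one.
print([round(l, 1) for l in latencies])  # all roughly 0.3 s
```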
Seaside adapters usually start a green thread for each request. But some things
need to be serialized (as in a critical block), so in reality users block each
other far more often than you'd like.
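The serialization point can be sketched the same way (Python threads standing in for green threads; the 0.05 s of work is an illustrative assumption): even though every request gets its own thread, a shared critical section makes them run one after another.

```python
import threading
import time

lock = threading.Lock()   # stands in for a shared critical section

def request(results, i):
    # Each request runs on its own thread, but the critical
    # section admits only one of them at a time.
    with lock:
        time.sleep(0.05)  # work done while holding the lock
        results.append(i)

results = []
threads = [threading.Thread(target=request, args=(results, i)) for i in range(4)]
start = time.time()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start

# Four "concurrent" requests took about 4 x 0.05 s: fully serialized.
print(len(results), elapsed >= 0.2)
```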
So if you asked me for a more realistic estimate, I'd correct myself down
to somewhere between 5 and a maximum of perhaps 10 users. Everything beyond that
means using all those fancy tricks and tools people mention in this thread.
So what you absolutely need to do is start with an estimate of 5 concurrent
users per image and look for ways to distribute work among servers/images so
that these blocking situations are kept to a minimum. If you find your software
does much better, congratulate yourself and stack up new machines more slowly
than initially planned.
Before you turn around and say Smalltalk is unsuitable for the web, let's take
a brief look at what concurrent users really means. Concurrent users are users
that request some processing from the server at the very same time (say,
within an interval of 200-400 ms). This is not the same as 5 people being
currently logged on to the server and requesting something occasionally. 5
concurrent users can mean 20, 50, or 100 users who are logged in at the same time.
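That relation between logged-in and concurrent users can be put into numbers with Little's law (mean concurrency = arrival rate × mean request time). The figures below are illustrative assumptions, not measurements from this thread:

```python
# Little's law: L = lambda * W
logged_in = 100        # users with an open session (assumed)
think_time = 10.0      # seconds between clicks per user (assumed)
request_time = 0.3     # seconds of server work per request (estimate above)

arrival_rate = logged_in / think_time       # 10 requests/second overall
concurrent = arrival_rate * request_time    # mean requests in flight
print(concurrent)  # about 3 concurrent requests from 100 logged-in users
```

So under these assumptions, 100 simultaneously logged-in users translate to only a handful of truly concurrent requests.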
Then there is this sad "share all vs. share nothing" argument. In Seaside you
keep all your objects alive (read from the DB and materialized) between web requests.
In share-nothing, you read everything back from disk/DB whenever a request comes in.
This also takes time and resources (and possibly blocks the server for the blink of an
eye or two). You trade RAM for CPU cycles and I/O. It is extremely hard to predict which
works better, and I guess nobody ever ran A/B tests. It's all just theoretical blah blah
and guesses about what must definitely be better in one's world.
Why do I bring up this share-everything stuff? Because it usually means
that each logged-on user holds onto a load of objects on the server
side (session storage): their user account, shopping cart, settings, last
purchases, account information and whatnot. That's easily a list of a few
thousand objects (even if they are only proxies) that take up space and want to be
inspected by the garbage collector. So each connected user not only needs CPU
cycles whenever they send a request to the server, but also uses RAM. In our
case, this can easily be 5-10 MB of objects per user. Add to that the shadow
copies that your persistence mechanism needs for undo and such, plus all the
data Seaside needs for continuations etc., and each logged-on user needs 15, 20
or more MB of object space. Connect ten users and you have 150-200 MB. That is
not a problem per se, but it also means there is some hard limit, especially in a
32-bit world. You don't want your server to slow down because it cannot
allocate new memory, or can't find contiguous slots for things and GCs all the
time.
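The memory budget is easy to sanity-check with the numbers above (the image baseline and heap limit are assumed round figures for a 32-bit VM, not measurements):

```python
mb_per_user = 15        # session objects + shadow copies + continuations (lower bound above)
users = 10
image_baseline_mb = 50  # assumed size of the image before any sessions
heap_limit_mb = 1024    # assumed practical ceiling on a 32-bit VM

session_mb = users * mb_per_user            # 150 MB of session state
total_mb = image_baseline_mb + session_mb   # 200 MB resident
headroom_mb = heap_limit_mb - total_mb      # what is left before GC pressure
print(session_mb, total_mb, headroom_mb)  # -> 150 200 824
```

Ten users fit comfortably; a few dozen at the upper 20 MB figure already eat most of such a heap.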
To sum up, I think the number of influencing factors is way too high to really
give a good estimate. Our experience (based on our mix of computation and I/O)
says that 5 concurrent users per image is doable without negative impact on
other users. Some operations take so much time that you really need to move
them out of the front-facing image and distribute the work to backend servers. More
than 5 is probably possible, but chances are there are operations that
affect all users, and with every additional user there is a growing chance that
two or more request the very same operation within a very short
interval. This will make things worse and worse.
So I trust you guys have lots of cool tools around and know loads of
tricks to wring much more power out of a single Smalltalk image, but you also
need to consider your productivity and speed in creating new features and
fixing bugs. Sometimes throwing hardware at a problem like growth, and starting
with a clever architecture that scales on multiple layers, is just the right
thing to do. To me, handling 7 instead of 5 concurrent users is not such a big
win as long as we are not in a position where we have so many users that this
really matters. For sites like Amazon, Google, Facebook etc., saving 40% in
server cost by optimizing the software (investing a few person-years) is
significant. I hope we'll soon change our minds about this question ;-)
So load balancing and outsourcing services to backend servers are key to
scalability. This, btw, is not Smalltalk-specific (some people seem to think
you won't get these problems in Java or Ruby because they are made for the
web...).
Joachim
Everything you say, all your considerations, especially the last paragraph,
is correct and I agree.
But some people will only remember the very low number you seem to be
suggesting (which is more of a worst-case scenario, with Seaside plus blocking/slow
connections to back-end systems).
On the other hand, plain HTTP access to a Pharo image can be quite fast. Here is
a quick & dirty benchmark I just did on one of our modern/big machines (inside an
LXD container, light load) using a single stock image on Linux.
$ pharo Pharo.image printVersion
[version] 4.0 #40626
$ pharo Pharo.image eval 'ZnServer startDefaultOn: 1701. 1 hour wait' &
$ ab -k -c 8 -n 10240 http://127.0.0.1:1701/bytes/32
This is ApacheBench, Version 2.3 <$Revision: 1638069 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 127.0.0.1 (be patient)
Completed 1024 requests
Completed 2048 requests
Completed 3072 requests
Completed 4096 requests
Completed 5120 requests
Completed 6144 requests
Completed 7168 requests
Completed 8192 requests
Completed 9216 requests
Completed 10240 requests
Finished 10240 requests
Server Software: Zinc
Server Hostname: 127.0.0.1
Server Port: 1701
Document Path: /bytes/32
Document Length: 32 bytes
Concurrency Level: 8
Time taken for tests: 1.945 seconds
Complete requests: 10240
Failed requests: 0
Keep-Alive requests: 10240
Total transferred: 2109440 bytes
HTML transferred: 327680 bytes
Requests per second: 5265.17 [#/sec] (mean)
Time per request: 1.519 [ms] (mean)
Time per request: 0.190 [ms] (mean, across all concurrent requests)
Transfer rate: 1059.20 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 2
Processing: 0 2 8.0 2 309
Waiting: 0 1 8.0 1 309
Total: 0 2 8.0 2 309
Percentage of the requests served within a certain time (ms)
50% 2
66% 2
75% 2
80% 2
90% 2
95% 3
98% 3
99% 3
100% 309 (longest request)
More than 5K req/s (10K requests, 8 concurrent clients).
Granted, this is only a 32-byte payload over the loopback network
interface. But it marks the other end of the interval: the maximum speed.
A more realistic payload (7K of HTML) gives the following:
$ ab -k -c 8 -n 10240 http://127.0.0.1:1701/dw-bench
This is ApacheBench, Version 2.3 <$Revision: 1638069 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 127.0.0.1 (be patient)
Completed 1024 requests
Completed 2048 requests
Completed 3072 requests
Completed 4096 requests
Completed 5120 requests
Completed 6144 requests
Completed 7168 requests
Completed 8192 requests
Completed 9216 requests
Completed 10240 requests
Finished 10240 requests
Server Software: Zinc
Server Hostname: 127.0.0.1
Server Port: 1701
Document Path: /dw-bench
Document Length: 7734 bytes
Concurrency Level: 8
Time taken for tests: 7.874 seconds
Complete requests: 10240
Failed requests: 0
Keep-Alive requests: 10240
Total transferred: 80988160 bytes
HTML transferred: 79196160 bytes
Requests per second: 1300.46 [#/sec] (mean)
Time per request: 6.152 [ms] (mean)
Time per request: 0.769 [ms] (mean, across all concurrent requests)
Transfer rate: 10044.25 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 1 6 183.4 1 7874
Waiting: 1 6 183.4 1 7874
Total: 1 6 183.4 1 7874
Percentage of the requests served within a certain time (ms)
50% 1
66% 1
75% 1
80% 1
90% 1
95% 1
98% 1
99% 1
100% 7874 (longest request)
That is more than 1K req/s.
In both cases we are talking about sub-1ms req/resp cycles!
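For what it's worth, ab's two "Time per request" lines are just algebra over the totals. Recomputing them from the dw-bench run above (numbers copied from the output, so small rounding differences against ab's own figures are expected):

```python
requests = 10240
concurrency = 8
seconds = 7.874          # "Time taken for tests" from the run above

rps = requests / seconds                      # ~1300 req/s
per_client_ms = concurrency / rps * 1000      # mean, per client: ~6.15 ms
across_all_ms = 1 / rps * 1000                # mean, across all: ~0.77 ms
print(round(rps), round(per_client_ms, 3), round(across_all_ms, 3))
```

The per-client mean is dominated by a single 7.9 s outlier; the percentile table shows the typical request completing in about 1 ms.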
I think all commercial users of Pharo today know what is possible and what
needs to be done to achieve their goals. Pure speed might not be the main
consideration; the ease/speed/joy of development, and simply being capable of solving
complex problems and offering compelling solutions to end users, is probably
more important.
Sven