I did not say we are the fastest, far from it. I absolutely do not want to get 
into a contest; there is no point in doing so.

(The dw-bench page was meant to be generated dynamically on each request, 
without caching. Did you do that too?)

My point was: Pharo is good enough for most web applications. The rest of the 
challenge is standard software architecture, design and development. I choose 
to do that in Pharo because I like it so much. It is perfectly fine by me that 
99.xx % of the world makes other decisions, for whatever reason.
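
As an aside, the ab figures quoted below are internally consistent; here is a quick sanity check of the derived means against the raw totals, in plain Python (numbers copied from the two runs against the stock image; small differences vs ab's printed values come from ab keeping more timer precision internally):

```python
# Sanity-check the derived figures ApacheBench reports from its raw
# totals. Numbers are copied from the two runs quoted further down
# (10240 requests, concurrency 8, total wall-clock time in seconds).

def ab_means(total_seconds, requests, concurrency):
    """Recompute ab's three derived means from the totals."""
    rps = requests / total_seconds                   # Requests per second
    per_req_all = total_seconds / requests * 1000.0  # ms, across all clients
    per_req = per_req_all * concurrency              # ms, mean per client
    return rps, per_req_all, per_req

# /bytes/32 run: ab reported 5265.17 req/s, 1.519 ms, 0.190 ms
print(ab_means(1.945, 10240, 8))   # roughly 5264.8 req/s, 0.190 ms, 1.520 ms

# /dw-bench run: ab reported 1300.46 req/s, 6.152 ms, 0.769 ms
print(ab_means(7.874, 10240, 8))   # roughly 1300.5 req/s, 0.769 ms, 6.152 ms
```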

> On 16 Dec 2016, at 09:57, volkert <volk...@nivoba.de> wrote:
> 
> Sven,
> 
> compare with an Erlang VM (Cowboy) on a standard PC (i5-4570 CPU @ 3.20GHz × 
> 4) running Linux ...
> 
> Concurrent requests: 8
> 
> $ ab -k -c 8 -n 10240 http://127.0.0.1:8080/
> This is ApacheBench, Version 2.3 <$Revision: 1706008 $>
> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
> Licensed to The Apache Software Foundation, http://www.apache.org/
> 
> Benchmarking 127.0.0.1 (be patient)
> Completed 1024 requests
> Completed 2048 requests
> Completed 3072 requests
> Completed 4096 requests
> Completed 5120 requests
> Completed 6144 requests
> Completed 7168 requests
> Completed 8192 requests
> Completed 9216 requests
> Completed 10240 requests
> Finished 10240 requests
> 
> 
> Server Software:
> Server Hostname:        127.0.0.1
> Server Port:            8080
> 
> Document Path:          /
> Document Length:        7734 bytes
> 
> Concurrency Level:      8
> Time taken for tests:   0.192 seconds
> Complete requests:      10240
> Failed requests:        0
> Keep-Alive requests:    10143
> Total transferred:      80658152 bytes
> HTML transferred:       79196160 bytes
> Requests per second:    53414.29 [#/sec] (mean)
> Time per request:       0.150 [ms] (mean)
> Time per request:       0.019 [ms] (mean, across all concurrent requests)
> Transfer rate:          410871.30 [Kbytes/sec] received
> 
> Connection Times (ms)
>              min  mean[+/-sd] median   max
> Connect:        0    0   0.0      0       0
> Processing:     0    0   0.2      0       3
> Waiting:        0    0   0.2      0       3
> Total:          0    0   0.2      0       3
> 
> Percentage of the requests served within a certain time (ms)
>  50%      0
>  66%      0
>  75%      0
>  80%      0
>  90%      0
>  95%      1
>  98%      1
>  99%      1
> 100%      3 (longest request)
> 
> 
> And here with 1000 concurrent requests ...
> 
> $ ab -k -c 1000 -n 10240 http://127.0.0.1:8080/
> This is ApacheBench, Version 2.3 <$Revision: 1706008 $>
> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
> Licensed to The Apache Software Foundation, http://www.apache.org/
> 
> Benchmarking 127.0.0.1 (be patient)
> Completed 1024 requests
> Completed 2048 requests
> Completed 3072 requests
> Completed 4096 requests
> Completed 5120 requests
> Completed 6144 requests
> Completed 7168 requests
> Completed 8192 requests
> Completed 9216 requests
> Completed 10240 requests
> Finished 10240 requests
> 
> 
> Server Software:
> Server Hostname:        127.0.0.1
> Server Port:            8080
> 
> Document Path:          /
> Document Length:        7734 bytes
> 
> Concurrency Level:      1000
> Time taken for tests:   0.225 seconds
> Complete requests:      10240
> Failed requests:        0
> Keep-Alive requests:    10232
> Total transferred:      80660288 bytes
> HTML transferred:       79196160 bytes
> Requests per second:    45583.23 [#/sec] (mean)
> Time per request:       21.938 [ms] (mean)
> Time per request:       0.022 [ms] (mean, across all concurrent requests)
> Transfer rate:          350642.85 [Kbytes/sec] received
> 
> Connection Times (ms)
>              min  mean[+/-sd] median   max
> Connect:        0    1   3.3      0      23
> Processing:     0    6  16.1      0     198
> Waiting:        0    6  16.1      0     198
> Total:          0    7  18.0      0     211
> 
> Percentage of the requests served within a certain time (ms)
>  50%      0
>  66%      2
>  75%      6
>  80%     10
>  90%     21
>  95%     32
>  98%     47
>  99%    108
> 100%    211 (longest request)
> 
> 
> 
> Am 15.12.2016 um 15:00 schrieb Sven Van Caekenberghe:
>> Joachim,
>> 
>>> On 15 Dec 2016, at 11:43, jtuc...@objektfabrik.de wrote:
>>> 
>>> Victor,
>>> 
>>> Am 14.12.16 um 19:23 schrieb Vitor Medina Cruz:
>>>> If I tell you that my current estimate is that a Smalltalk image with 
>>>> Seaside will not be able to handle more than 20 concurrent users, in many 
>>>> cases even less.
>>>> 
>>>> Seriously? That is kind of a low number; I would expect more from each 
>>>> image. Certainly it depends on many things, but it is very low for 
>>>> a rough estimate. Why do you say that?
>>> Seriously, I think 20 is very optimistic, for several reasons.
>>> 
>>> One, you want to be fast and responsive for every single user, so there is 
>>> absolutely no point in going too close to any limit. It's easy to lose 
>>> users by providing a bad experience.
>>> 
>>> Second, in a CRUD application, you mostly work a lot with DB queries. And 
>>> you connect to all kinds of stuff and do I/O. Some of these things simply 
>>> block the VM. Even if that is only for 0.3 seconds, you postpone processing 
>>> for each "unaffected" user by these 0.3 seconds, so this adds up to 
>>> significant delays in response time. And if you do some heavy DB 
>>> operations, 0.3 seconds is not a terribly bad estimate. Add to that the 
>>> materialization and such within the Smalltalk image.
>>> 
>>> Seaside adapters usually start green threads for each request. But 
>>> there are things that need to be serialized (like in a critical block). So 
>>> in reality, users block each other far more often than you'd like.
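
(A small aside: the serialization effect Joachim describes is easy to demonstrate. A minimal sketch in plain Python, with OS threads standing in for Seaside's green threads and a `threading.Lock` standing in for a Smalltalk critical section; purely illustrative, not Seaside's actual code:)

```python
import threading
import time

# One shared lock guards the "critical" part of every request handler,
# mimicking request threads that funnel through a serialized section.
lock = threading.Lock()
state = {"active": 0, "max_active": 0}
state_guard = threading.Lock()  # protects the counters themselves

def handle_request():
    with lock:  # the serialized part: only one request may be inside
        with state_guard:
            state["active"] += 1
            state["max_active"] = max(state["max_active"], state["active"])
        time.sleep(0.005)  # simulated shared work (e.g. a DB round trip)
        with state_guard:
            state["active"] -= 1

# Eight "concurrent users" fire their requests at the same time.
threads = [threading.Thread(target=handle_request) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Despite 8 concurrent requests, at most one was ever inside the
# critical section, so they were processed one after another.
print(state["max_active"])  # -> 1
```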
>>> 
>>> So if you asked me to give a more realistic estimation, I'd correct myself 
>>> down to a number between 5 and probably a maximum of 10 users. Everything 
>>> else means you must use all those fancy tricks and tools people mention in 
>>> this thread.
>>> So what you absolutely need to do is start with an estimate of 5 concurrent 
>>> users per image and look for ways to distribute work among servers/images 
>>> so that these blocking situations are down to a minimum. If you find your 
>>> software works much better, congratulate yourself and stack up new machines 
>>> more slowly than initially estimated.
>>> 
>>> 
>>> Before you turn around and say: Smalltalk is unsuitable for the web, let's 
>>> take a brief look at what concurrent users really means. Concurrent users 
>>> are users that request some processing from the server at the very same 
>>> time (maybe within an interval of 200-400 msec). This is not the same as 5 
>>> people being currently logged on to the server and requesting something 
>>> sometimes. 5 concurrent users can be 20, 50, 100 users who are logged in at 
>>> the same time.
>>> 
>>> Then there is this sad "share all vs. share nothing" argument. In Seaside 
>>> you keep all your objects alive (read from the db and materialized) between 
>>> web requests. In share-nothing, you read everything back from disk/db 
>>> whenever a request comes in. This also takes time and resources (and 
>>> possibly blocks the server for the blink of an eye or two). You trade RAM 
>>> for CPU cycles and I/O. It is extremely hard to predict which works better, 
>>> and I guess nobody ever ran A/B tests. It's all just theoretical bla bla 
>>> and guesses about what definitely must be better in one's world.
>>> 
>>> Why do I come up with this share-everything stuff? Because it usually means 
>>> that each user that is logged on holds onto a load of objects on the server 
>>> side (session storage), like their user account, shopping cart, settings, 
>>> last purchases, account information and whatnot. That's easily a list of a 
>>> few thousand objects (even if only proxies) that take up space and want 
>>> to be inspected by the garbage collector. So each connected user not only 
>>> needs CPU cycles whenever they send a request to the server, but also uses 
>>> RAM. In our case, this can easily be 5-10 MB of objects per user. Add to 
>>> that the shadow copies that your persistence mechanism needs for undo and 
>>> such, and all the data Seaside needs for continuations etc., and each 
>>> logged-on user needs 15, 20 or more MB of object space. Connect ten users 
>>> and you have 150-200 MB. That is not a problem per se, but it also means 
>>> there is some hard limit, especially in a 32-bit world. You don't want your 
>>> server to slow down because it cannot allocate new memory or can't find 
>>> contiguous slots for stuff and GCs all the time.
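
(The back-of-the-envelope arithmetic here is worth making explicit; a trivial Python sketch, taking the per-user figures above (15-20 MB of live session objects per logged-on user) as assumptions:)

```python
# Rough session-memory estimate per image, using the assumed range of
# 15-20 MB of live objects per logged-on user (session state, shadow
# copies, continuation data, etc.).
MB_PER_USER_LOW, MB_PER_USER_HIGH = 15, 20

def session_footprint(users):
    """Return the (low, high) estimated session memory in MB."""
    return users * MB_PER_USER_LOW, users * MB_PER_USER_HIGH

print(session_footprint(10))  # -> (150, 200), i.e. 150-200 MB for 10 users
```

With a 32-bit image capped well under 4 GB, that range shows why session memory, not CPU, can become the first hard limit.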
>>> 
>>> To sum up, I think the number of influencing factors is way too high to 
>>> really give a good estimate. Our experience (based on our mix of 
>>> computation and I/O) says that 5 concurrent users per image is doable 
>>> without negative impact on other users. Some operations take so much time 
>>> that you really need to move them out of the front-facing image and 
>>> distribute work to backend servers. More than 5 is probably possible but 
>>> chances are that there are operations that will affect all users and with 
>>> every additional user there is a growing chance that you have 2 or more 
>>> requesting the very same operation within a very short interval. This will 
>>> make things worse and worse.
>>> 
>>> So I trust in you guys having lots of cool tools around and knowing loads 
>>> of tricks to wring much more power out of a single Smalltalk image, but 
>>> you also need to take a look at your productivity and speed in creating new 
>>> features and fixing bugs. Sometimes throwing hardware at a problem like 
>>> growth and starting with a clever architecture to scale on multiple layers 
>>> is just the perfect thing to do. To me, handling 7 instead of 5 concurrent 
>>> users is not such a big win as long as we are not in a position where we 
>>> have so many users that this really matters. For sites like Amazon, Google, 
>>> Facebook etc. saving 40% in server cost by optimizing the software 
>>> (investing a few man years) is significant. I hope we'll soon change our 
>>> mind about this question ;-)
>>> 
>>> So load balancing and services outsourced to backend servers are key to 
>>> scalability. This, btw, is not Smalltalk-specific (some people seem to 
>>> think you won't get these problems in Java or Ruby because they are made 
>>> for the web...).
>>> 
>>> Joachim
>> Everything you say, all your considerations, especially the last paragraph, 
>> is correct and I agree.
>> 
>> But some people will only remember the very low number you seem to be 
>> suggesting (which is more of a worst-case scenario, with 
>> Seaside + blocking/slow connections to back-end systems).
>> 
>> On the other hand, plain HTTP access to a Pharo image can be quite fast. 
>> Here is a quick & dirty benchmark I just did on one of our modern/big 
>> machines (inside an LXD container, light load) using a single stock image on 
>> Linux.
>> 
>> 
>> $ pharo Pharo.image printVersion
>> [version] 4.0 #40626
>> 
>> $ pharo Pharo.image eval 'ZnServer startDefaultOn: 1701. 1 hour wait' &
>> 
>> $ ab -k -c 8 -n 10240 http://127.0.0.1:1701/bytes/32
>> This is ApacheBench, Version 2.3 <$Revision: 1638069 $>
>> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
>> Licensed to The Apache Software Foundation, http://www.apache.org/
>> 
>> Benchmarking 127.0.0.1 (be patient)
>> Completed 1024 requests
>> Completed 2048 requests
>> Completed 3072 requests
>> Completed 4096 requests
>> Completed 5120 requests
>> Completed 6144 requests
>> Completed 7168 requests
>> Completed 8192 requests
>> Completed 9216 requests
>> Completed 10240 requests
>> Finished 10240 requests
>> 
>> 
>> Server Software:        Zinc
>> Server Hostname:        127.0.0.1
>> Server Port:            1701
>> 
>> Document Path:          /bytes/32
>> Document Length:        32 bytes
>> 
>> Concurrency Level:      8
>> Time taken for tests:   1.945 seconds
>> Complete requests:      10240
>> Failed requests:        0
>> Keep-Alive requests:    10240
>> Total transferred:      2109440 bytes
>> HTML transferred:       327680 bytes
>> Requests per second:    5265.17 [#/sec] (mean)
>> Time per request:       1.519 [ms] (mean)
>> Time per request:       0.190 [ms] (mean, across all concurrent requests)
>> Transfer rate:          1059.20 [Kbytes/sec] received
>> 
>> Connection Times (ms)
>>               min  mean[+/-sd] median   max
>> Connect:        0    0   0.0      0       2
>> Processing:     0    2   8.0      2     309
>> Waiting:        0    1   8.0      1     309
>> Total:          0    2   8.0      2     309
>> 
>> Percentage of the requests served within a certain time (ms)
>>   50%      2
>>   66%      2
>>   75%      2
>>   80%      2
>>   90%      2
>>   95%      3
>>   98%      3
>>   99%      3
>>  100%    309 (longest request)
>> 
>> 
>> More than 5K req/s (10K requests, 8 concurrent clients).
>> 
>> Granted, this is only for just 32 bytes payload and the loopback network 
>> interface. But this is the other end of the interval, the maximum speed.
>> 
>> A more realistic payload (7K HTML) gives the following:
>> 
>> 
>> $ ab -k -c 8 -n 10240 http://127.0.0.1:1701/dw-bench
>> This is ApacheBench, Version 2.3 <$Revision: 1638069 $>
>> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
>> Licensed to The Apache Software Foundation, http://www.apache.org/
>> 
>> Benchmarking 127.0.0.1 (be patient)
>> Completed 1024 requests
>> Completed 2048 requests
>> Completed 3072 requests
>> Completed 4096 requests
>> Completed 5120 requests
>> Completed 6144 requests
>> Completed 7168 requests
>> Completed 8192 requests
>> Completed 9216 requests
>> Completed 10240 requests
>> Finished 10240 requests
>> 
>> 
>> Server Software:        Zinc
>> Server Hostname:        127.0.0.1
>> Server Port:            1701
>> 
>> Document Path:          /dw-bench
>> Document Length:        7734 bytes
>> 
>> Concurrency Level:      8
>> Time taken for tests:   7.874 seconds
>> Complete requests:      10240
>> Failed requests:        0
>> Keep-Alive requests:    10240
>> Total transferred:      80988160 bytes
>> HTML transferred:       79196160 bytes
>> Requests per second:    1300.46 [#/sec] (mean)
>> Time per request:       6.152 [ms] (mean)
>> Time per request:       0.769 [ms] (mean, across all concurrent requests)
>> Transfer rate:          10044.25 [Kbytes/sec] received
>> 
>> Connection Times (ms)
>>               min  mean[+/-sd] median   max
>> Connect:        0    0   0.0      0       0
>> Processing:     1    6 183.4      1    7874
>> Waiting:        1    6 183.4      1    7874
>> Total:          1    6 183.4      1    7874
>> 
>> Percentage of the requests served within a certain time (ms)
>>   50%      1
>>   66%      1
>>   75%      1
>>   80%      1
>>   90%      1
>>   95%      1
>>   98%      1
>>   99%      1
>>  100%   7874 (longest request)
>> 
>> 
>> That is more than 1K req/s.
>> 
>> In both cases we are talking about sub-1 ms req/resp cycles!
>> 
>> I think all commercial users of Pharo today know what is possible and what 
>> needs to be done to achieve their goals. Pure speed might not be the main 
>> consideration; ease/speed/joy of development and simply being capable of 
>> solving complex problems and offering compelling solutions to end users are 
>> probably more important.
>> 
>> Sven
>> 
>> 
>> 
> 
> 
> 

