> On 27.06.2018 at 15:08, Andrei Stebakov <lisper...@gmail.com> wrote:
> 
> Thank you guys for your insightful answers. I wish we could have some kind of
> article summarizing those approaches, so that the next devs wouldn't have to
> reinvent the wheel but could start with a tried approach and maybe improve it.
> As I have only scratched the surface learning Pharo, I may have some naive
> questions.
> Does the fact (fact?) that Pharo uses green threads (not native OS threads)
> impact performance?

Yes and no. There is nothing wrong with green threads. They are super
lightweight and enable concurrency within a single image. If you look at
Erlang/OTP, it handles tens of thousands of green threads easily. The
performance bottleneck is due to the fact that one image cannot utilize
multiple cores of a CPU. So it is usual to spread several images across the
cores and have them handle things concurrently.
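
For a feel of how lightweight they are, you can fork thousands of green
threads from a Pharo playground (a minimal sketch; the count is arbitrary):

    "Fork 10000 green threads; each just sleeps for a second.
     This completes without noticeable scheduler pressure."
    1 to: 10000 do: [ :i |
        [ (Delay forSeconds: 1) wait ] fork ]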

> With two Pharo images running in parallel on a two-core system, how does it
> handle multiple requests at a time? There must always be some unblocked
> thread waiting for connections and delegating requests to request handlers in
> different green threads (using the fork operation). Is my understanding correct?

Not completely. The thread accepting connections is itself a green thread. It
doesn't need to poll: the accepting socket waits on a system resource that
gets signalled when a connection comes in.
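
In Pharo that pattern looks roughly like this (a simplified sketch of the
idea, not the actual code of a server like Zinc; handleRequestOn: is a
made-up placeholder for your own request handler):

    | listener |
    listener := Socket newTCP.
    listener listenOn: 8080 backlogSize: 32.
    [ [ | client |
        "waitForAcceptFor: blocks this green thread on the OS socket;
         the VM signals it when a connection arrives"
        client := listener waitForAcceptFor: 60.
        client ifNotNil: [
            "each request is handled in its own forked green thread"
            [ self handleRequestOn: client ] fork ] ] repeat ] fork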

> So even if one of those threads has to wait on a long IO operation (say from 
> DB2) that shouldn't impact the performance of other handlers?

Exactly. That is how you get maximum throughput: while one green thread is
blocked on IO, the scheduler simply runs the others.
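
A tiny playground demo of that (the sleep times are arbitrary):

    "The first forked thread blocks on a simulated slow external call;
     the second is scheduled and finishes long before it."
    [ (Delay forSeconds: 10) wait.
      Transcript crShow: 'slow DB call finished' ] fork.
    [ Transcript crShow: 'other handler answered immediately' ] fork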

> I think that in most cases the CPU time for request processing is minimal, as
> the bottleneck is in lengthy IO operations, DB waits and calls to external
> RESTful services. So two images on two cores should be enough to handle
> hundreds of simultaneous requests, since most of the time the threads will
> wait on external operations, not using the local CPU.

Yes, it depends on the use case of course.

> Please let me know if this summary that I got from this thread makes sense.
> Yes, I fully agree that using docker pharo containers under some load 
> balancing is the way to go. 
> 
I think your summary is pretty accurate. Docker also has the advantage that a
lot of memory is shared: if you start 100 Pharo containers from the same
image, most read-only resources, including the VM, are in memory only once.
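
A minimal sketch of such a setup with docker-compose (the image name and
config paths are made up; nginx in front does the load balancing):

    # docker-compose.yml -- several Pharo containers behind one nginx frontend
    version: "3"
    services:
      pharo1:
        image: mycompany/pharo-app   # hypothetical application image
        restart: always
      pharo2:
        image: mycompany/pharo-app
        restart: always
      nginx:
        image: nginx
        ports:
          - "80:80"
        volumes:
          - ./nginx.conf:/etc/nginx/nginx.conf:ro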

Hope it helps,

Norbert
>> On Wed, Jun 27, 2018, 04:10 jtuc...@objektfabrik.de 
>> <jtuc...@objektfabrik.de> wrote:
>> Norbert,
>> 
>> 
>> thanks for your insights, explanations and thoughts. It is good to read and
>> learn from people who are a step or two ahead...
>> 
>>> On 27.06.18 at 09:31, Norbert Hartl wrote:
>>> Joachim,
>>> 
>>>> On 27.06.2018 at 07:42, jtuc...@objektfabrik.de wrote:
>>>> 
>>>> Norbert,
>>>> 
>>>>> On 26.06.18 at 21:41, Norbert Hartl wrote:
>>>>> 
>>>>> 
>>>>> On 26.06.2018 at 20:44, Andrei Stebakov <lisper...@gmail.com> wrote:
>>>>> 
>>>>>> What would be an example of a load balancer for Pharo images? Can we run
>>>>>> multiple images on the same server, or, for the sake of the balancing
>>>>>> configuration, can we only run one image per server?
>>>>>> 
>>>>> There are a lot of possibilities. You can start multiple images on
>>>>> different ports and use nginx with an upstream rule to load balance. I
>>>>> would recommend using docker for spawning multiple images on a host,
>>>>> again with nginx as frontend load balancer. The point is that you can
>>>>> have at least twice as many images running as you have CPU cores. And of
>>>>> course a lot more.
>>>>> 
>>>> 
>>>> the last time I checked nginx, the load balancing and sticky session stuff
>>>> was not available in the free edition. So I guess you either pay for nginx
>>>> (which I think is good) or you know some free third-party add-ons...
>>>> 
>>> there is the upstream module which provides load balancing. But you are
>>> right, I think sticky sessions are not part of it. The closest you get,
>>> IIRC, is IP-based hashing.
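>>>
>>> A minimal upstream block with IP-based hashing looks like this (the ports
>>> and names are made up):
>>>
>>>     upstream pharo_backend {
>>>         ip_hash;                  # same client IP -> same backend image
>>>         server 127.0.0.1:8081;
>>>         server 127.0.0.1:8082;
>>>     }
>>>     server {
>>>         listen 80;
>>>         location / {
>>>             proxy_pass http://pharo_backend;
>>>         }
>>>     }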
>> I see.
>> 
>>> 
>>>> I wonder what exactly the benefit of Docker is in that game? On our
>>>> servers we run 10 images on 4 cores with HT (8 virtual cores) and very
>>>> rarely have real performance problems. We use Glorp, so there is a lot of
>>>> SQL querying going on for quite basic things already. So my guess would be
>>>> that your "2 images per core" is conservative and leaves air for even a
>>>> third one, depending on all the factors already discussed here.
>>>>  
>>> Docker is pretty nice. You can have the exact same deployment artefact
>>> started multiple times. I used tools like daemontools, monit, etc. before,
>>> but starting the image, assigning ports etc. you have to do yourself, which
>>> is cumbersome, and I don't like any of those tools anymore. Once you have
>>> created your Docker image you can start it multiple times, and because
>>> networking is virtualized, all containers can serve on the same port.
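>>>
>>> For example (the image name is made up), every container listens on the
>>> same internal port and only the host mapping differs:
>>>
>>>     docker run -d --name pharo1 -p 8081:8080 mycompany/pharo-app
>>>     docker run -d --name pharo2 -p 8082:8080 mycompany/pharo-app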
>> 
>> oh, I see. This is a plus. We're not using any containers and have to
>> provide individual configurations for each image we start up. It works well,
>> with not too many moving parts (our resources are very limited), and we try
>> to keep things as simple as possible. As long as we can live with providing
>> a statically sized pool of machines and images and the load doesn't vary too
>> much, this is not too bad. But once you need to dynamically add and remove
>> images to cope with load peaks and lows, our approach will probably become
>> cumbersome and complicated.
>> OTOH, I guess using Docker just means solving the same problems on another
>> level - but I guess there are lots of tools in the container area that can
>> help here (like the Traefik thing mentioned in another thread).
>> 
>>> 
>>> I think talking about performance these days is not easy. Modern machines 
>>> are so fast that you need a lot of users before you experience any problems.
>> ... depending on your usage of resources. As I said, we're using SQL heavily 
>> because of the way Glorp works. So it is easy to introduce bottlenecks even 
>> for smaller jobs.
>>> The mention of "2 images per core" I need to explain. A CPU core can
>>> execute only one thing at a time, therefore 1 image per core would be
>>> enough. The second one is for the time slices where there are gaps in
>>> processing, i.e. the process is suspended, switched out, etc. It is just a
>>> rule of thumb that it is good to have one process waiting in the scheduling
>>> queue so it can step in as soon as there are free cycles. The "2 images per
>>> core" assumes that you can put an arbitrary load on one image. Under that
>>> assumption a third image won't give you anything, because it cannot do
>>> anything the other two images cannot do.
>>> So according to the "hard" facts it does not help to have more than two
>>> images. On the other hand, each image is single-threaded, and using more
>>> images lowers the probability that processes block each other because they
>>> are executed within one image. On yet another hand, if you use a database,
>>> a lot of a process's time is spent waiting for the response of the
>>> database, so other processes can be executed. And so on. So in the end you
>>> have to try it.
>> 
>> You are correct. The third image can only jump in if both of the others are
>> in a wait state. It "feels" as if there were enough air for a third one to
>> operate, but we'd have to try whether that holds true.
>> 
>>> 
>>>> What's not to be underestimated is all the stuff around monitoring and
>>>> restarting images when things go wrong, but that's another story...
>>>> 
>>> Docker has a restart policy, so restarting shouldn't be an issue with it.
>>> Monitoring is always hard. I use Prometheus with Grafana, but that is quite
>>> a bit to set up. In the end you get graphs and you can define alerts on
>>> thresholds for system values.
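>>>
>>> A restart policy is just a flag when starting the container, e.g. (the
>>> image name is made up):
>>>
>>>     docker run -d --restart unless-stopped mycompany/pharo-app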
>> Well, that is also true for monit (which we use); the question always is:
>> what do you make of those numbers? We have situations in which an image
>> responds to HTTP requests as if all were good. But for some reason, DB2
>> sometimes takes forever to answer queries, and will probably answer with a
>> "cannot handle requests at this time" after literally a minute or so. Other
>> DB connections work well in parallel. We're still looking for ways to
>> recognize such situations externally (and thinking about moving from DB2 to
>> PostgreSQL).
>> 
>>> If the topic gets accepted, Marcus and I will talk about these things at
>>> ESUG.
>> 
>> So if anybody from the program committee is reading this: please accept and
>> schedule Norbert's and Marcus' talk. I'll be hanging on their every word,
>> and I guess I won't be alone ;-)
>> 
>> 
>> Joachim
>> 
>>> 
>>>  
>>> Norbert
>>>> Joachim
>>>> 
>>> 
>> 
>> -- 
>> -----------------------------------------------------------------------
>> Objektfabrik Joachim Tuchel          mailto:jtuc...@objektfabrik.de
>> Fliederweg 1                         http://www.objektfabrik.de
>> D-71640 Ludwigsburg                  http://joachimtuchel.wordpress.com
>> Telefon: +49 7141 56 10 86 0         Fax: +49 7141 56 10 86 1
>> 
