If I have been using mod_perl for development, does that influence whether I drive a Maserati? :)
> unsubscribe.
>
> On Fri, Dec 25, 2020 at 10:30 PM André Warnier (tomcat/perl) <a...@ice-sa.com> wrote:
>
>> Hello James.
>> Bravo and many thanks for this excellent overview of your activities. Of course the setup (in your previous message) and the activities are very impressive by themselves.
>> But in addition, even though your message is not in itself a perl advocacy message, I feel that it would have its right place in some perl/mod_perl advocacy forum, because it touches on some general ideas which are valid /also/ for perl and mod_perl.
>> It was very refreshing to read for once a clear exposé of why it is still important nowadays to think before programming, to program efficiently, and to choose the right tool for the job at hand (be it perl, mod_perl, or any other) without the kind of off-the-cuff general a-priori assumptions which tend to plague these discussions.
>>
>> And even though our own (commercial) activities and setups do not have anything even close to the scope which you describe, I would like to say that the same basic principles which you mention in your exposé are just as valid when you scale down as when you scale up.
>> ("--you can’t just throw memory, CPUs, power at a problem – you have to think – how can I do what I need to do with the least resources..")
>> Even when you think of a single server, or a single server rack, at any one period in time there is always a practical limit as to how much memory or how many CPUs you can fit in a given server, or how many servers you can fit in a rack, or how many additional Gb of bandwidth you can allocate per server, beyond which there is a sudden "quantum jump" as to how practical and cost-effective a whole project becomes.
>> In that sense, I particularly enjoyed your examples of the database and of the additional power line.
>>
>> On 24.12.2020 02:38, James Smith wrote:
>> > We don’t use perl for everything: yes we use it for web data, yes we still use it as the glue language in a lot of cases, but the most complex stuff is done with C (not even C++ as that is too slow). Others on site use Python, Java, Rust, Go, PHP, along with looking at using GPUs in cases where code can be highly parallelised.
>> >
>> > It is not just one application – but many, many applications… All with a common goal of understanding the human genome, and using it to assist in developing new understanding and techniques which can advance health care.
>> >
>> > We are a very large sequencing centre (one of the largest in the world) – what I was pointing out is that you can’t just throw memory, CPUs, power at a problem – you have to think – how can I do what I need to do with the least resources. Rather than what resources can I throw at the problem.
>> >
>> > Currently we are acting as the central repository for all COVID-19 sequencing in the UK, along with one of the largest “wet” labs sequencing data for it – and that is half the sequenced samples in the whole world. The UK is sequencing more COVID-19 genomes a day than most other countries have sequenced since the start of the pandemic in Feb/Mar. This has led to us discovering a new more transmissible version of the virus, and in what part of the country the different strains are present – no other country in the world has the information, technology or infrastructure in place to achieve this.
>> >
>> > But this is just a small part of the genomic sequencing we are looking at – we work on:
>> > * other pathogens – e.g. Plasmodium (Malaria);
>> > * cancer genomes (and how effective drugs are);
>> > * the Human Cell Atlas (of which we are a major part), which is looking at how the expression of genes (in the simplest terms, which ones are switched on and switched off) differs between tissues;
>> > * sequencing the genomes of other animals to understand their evolution;
>> > * and looking at some other species in detail, to see what we can learn from them when they have defective genes;
>> >
>> > Although all these are currently scaled back so that we can work relentlessly to support the medical teams and other researchers in getting on top of COVID-19.
>> >
>> > What is interesting is that many of the developers we have on campus (well all wfh at the moment) are (relatively) old, as we learnt to develop code on machines with limited CPU and limited memory – so that things had to be efficient, had to be compact…. And that is as important now as it was 20 or 30 years ago – the data we handle is going up faster than Moore’s Law! Many of us have pride in doing things as efficiently as possible.
>> >
>> > It took around 10 years to sequence and assemble the first human genome {well we are still tinkering with it and filling in the gaps} – now at the institute we can sequence and assemble around 400 human genomes in a day – to the same quality!
>> >
>> > So most of our issues are due to the scale of the problems we face – e.g. the human genome has 3 billion base-pairs (A, C, G, Ts), so normal solutions don’t scale to that. Once, many years ago, we looked at setting up an Oracle database where there was at least 1 row for every base pair – recording all variants (think of them as spelling mistakes, for example a T rather than an A, or an extra letter inserted or deleted) for that base pair… The schema was set up – and then they realised it would take 12 months to load the data which we had then (which is probably less than a millionth of what we have now)!
>> >
>> > Moving compute off site is a problem as the transfer of the level of data we have would cause a problem – you can’t easily move all the data to the compute – so you have to bring the compute to the data.
>> >
>> > The site I worked on before I became a more general developer was doing that – and the code that was written 12-15 years ago is actually still going strong – it has seen a few changes over the years – many displays have had to be redeveloped as the scale of the data has got so big that even the summary pages we produced 10 years ago have to be summarised because they are so large.
>> >
>> > *From:* Mithun Bhattacharya <mit...@gmail.com>
>> > *Sent:* 24 December 2020 00:06
>> > *To:* mod_perl list <modperl@perl.apache.org>
>> > *Subject:* Re: Confused about two development utils [EXT]
>> >
>> > James would you be able to share more info about your setup ?
>> >
>> > 1. What exactly is your application doing which requires so much memory and CPU - is it something like gene splicing (no i don't know much about it beyond Jurassic Park :D )
>> >
>> > 2. Do you feel Perl was the best choice for whatever you are doing and if yes then why ? How much of your stuff is using mod_perl considering you mentioned not much is web related ?
>> >
>> > 3. What are the challenges you are currently facing with your implementation ?
>> >
>> > On Wed, Dec 23, 2020 at 6:58 AM James Smith <j...@sanger.ac.uk> wrote:
>> >
>> >     Oh but memory is a problem – but not if you have just a small cluster of machines!
>> >
>> >     Our boxes are larger than that – but they all run virtual machines {only a small proportion web related} – machines/memory would rapidly become a problem in our data centre - we run VMware [995 hosts] and OpenStack [10,000s of hosts] + a selection of large memory machines {measured in TBs of memory per machine}.
>> >
>> >     We would be looking at somewhere between 0.5 PB and 1 PB of memory – not just the price of buying that amount of memory - for many machines we need the fastest memory money can buy for the workload, but we would need a lot more CPUs than we currently have as we would need a larger number of machines to have 64GB virtual machines {we would get 2 VMs per host}. We currently have approx. 1-2000 CPUs running our hardware (last time I had a figure) – it would probably need to go to approximately 5-10,000! It is not just the initial outlay but the environmental and financial cost of running that number of machines, and finding space to run them without putting the cooling costs through the roof!! That is without considering what additional constraints on storage having the extra machines may have (at the last count a year ago we had over 30 PBytes of storage on site – and a large amount of offsite backup).
>> >
>> >     We would also stretch the amount of power we can get from the national grid to power it all - we currently have 3 feeds from different parts of the national grid (we are fortunately in a position where this is possible) and the dedicated link we would need to add more power would be at least 50 miles long!
>> >
>> >     So - managing cores/memory is vitally important to us – moving to the cloud is an option we are looking at – but that is more than 4 times the price of our onsite set-up (with substantial discounts from AWS) and would require an upgrade of our existing link to the internet – which is currently 40Gbit of data (I think).
>> >
>> >     Currently we are analysing very large amounts of data directly linked to the current major world problem – this is why the UK is currently being isolated as we have discovered and can track a new strain, in near real time – other countries have no ability to do this – we in a day can and do handle, sequence and analyse more samples than the whole of France has sequenced since February. We probably don’t have more of the new variant strain than in other areas of the world – it is just that we know we have because of the amount of sequencing and analysis that we in the UK have done.
>> >
>> >     *From:* Matthias Peng <pengmatth...@gmail.com>
>> >     *Sent:* 23 December 2020 12:02
>> >     *To:* mod_perl list <modperl@perl.apache.org>
>> >     *Subject:* Re: Confused about two development utils [EXT]
>> >
>> >     Today memory is not a serious problem; each of our servers has 64GB of memory.
>> >
>> >         Forgot to add - so our FCGI servers need a lot (and I mean a lot) more memory than the mod_perl servers to serve the same level of content (just in case memory blows up with FCGI backends)
>> >
>> >         -----Original Message-----
>> >         From: James Smith <j...@sanger.ac.uk>
>> >         Sent: 23 December 2020 11:34
>> >         To: André Warnier (tomcat/perl) <a...@ice-sa.com>; modperl@perl.apache.org
>> >         Subject: RE: Confused about two development utils [EXT]
>> >
>> >         > This costs memory, and all the more since many perl modules are not thread-safe, so if you use them in your code, at this moment the only safe way to do it is to use the Apache httpd prefork model. This means that each Apache httpd child process has its own copy of the perl interpreter, which means that the memory used by this embedded perl interpreter has to be counted n times (as many times as there are Apache httpd child processes running at any one time).
>> >
>> >         This isn’t quite true - if you load modules before the process forks then they can cleverly share the same parts of memory. It is useful to be able to "pre-load" core functionality which is used across all functions {this is the case in Linux anyway}. It also speeds up child process generation as the modules are already in memory and converted to byte code.
>> >
>> >         One of the great advantages of mod_perl is Apache2::SizeLimit, which can blow away large child processes - and then if needed create new ones. This is not the case with some of the FCGI solutions as the individual processes can grow if there is a memory leak or a request that retrieves a large amount of content (even if not served), but perl can't give the memory back.
>> >         So FCGI processes only get bigger and bigger and eventually blow up memory (or hit swap first)
>> >
>> > --
>> > The Wellcome Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
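
For anyone reading the archive later, the two mod_perl points James makes in the oldest message (preloading modules before Apache forks, and using Apache2::SizeLimit to retire oversized children) might look roughly like the startup file below. This is only a sketch under assumptions: the file name, the preloaded modules and every numeric limit are placeholders, and nothing in the thread says what the Sanger setup actually uses.

```perl
# startup.pl: a hypothetical mod_perl 2 startup file, pulled in from
# httpd.conf with a line such as:
#     PerlRequire /path/to/startup.pl      (the path is a placeholder)
# It runs once in the parent httpd process, before any children fork.
use strict;
use warnings;

# Preload modules that most handlers need. The two modules named here
# are examples only; the thread does not say which ones are preloaded.
# The empty import list () loads the code without importing symbols.
use POSIX ();
use Data::Dumper ();

# Apache2::SizeLimit ends a child once it exceeds these limits, after
# the current request has finished; Apache forks a replacement when it
# needs one. Sizes are in KB and the values are illustrative, not tuned.
use Apache2::SizeLimit ();
Apache2::SizeLimit->set_max_process_size(150_000);   # cap on total size
Apache2::SizeLimit->set_max_unshared_size(120_000);  # cap on unshared memory
Apache2::SizeLimit->set_check_interval(5);           # check every 5th request

1;
```

The size check only runs if the handler is also registered in httpd.conf, for example with `PerlCleanupHandler Apache2::SizeLimit`. Code loaded in the parent this way is shared copy-on-write with the prefork children (on Linux, as James notes), so only the pages a child later modifies count against its unshared size.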