Hi Pablo

>In my university, we have a room with 24 computers and one nfs server
>serving the home folders for all of them. SAGE is installed in each of
>the computers individually. As the course progresses, we're running into
>severe performance problems when using SAGE in this setting. We have now
>switched to local access, and we can proceed with the course without
>problems, but we'd like to have the home folders shared between the
>different computers if at all possible.
>  * If only one, or a few computers log in, performance is good.
>  * If all the students use SAGE at once with local access, performance
>is good, too.
>  * When they log into their nfs accounts, performance is poor but,
>after a while (which is getting longer as the course progresses),
>students can work normally.
>  * As a side comment, the nfs server seems to have enough RAM, CPU and
>bandwidth idle when all the computers struggle to open up SAGE.
>  So I'd say our problem is related to the large size of the .mozilla
>and .sage folders going over NFS (compared to the small
>configuration folders of other programs). As the course progresses,
>these folders are getting bigger, and that would explain the
>performance issues and non-issues.
>
>  My questions are:
>  * Does this make sense to you?
>  * Has any of you tried a similar configuration?
>  * Any hints on how we can get shared folders back? Maybe samba would
>do better? Maybe rsync the folders on login and logout? Maybe use a
>single SAGE server?

I run an Ubuntu Dapper server with 130 NFS clients.
The server is a SUN V60X with 6G RAM and 2x 3GHz Xeon chips,
7 years old, still running strong. It has 6x 10kRPM U320
SCSI disks in hardware RAID5.

Recently it had problems when 50 users were already on and then
a group of 54 students walked into a lab and logged on simultaneously
to start SAGE. There were problems without SAGE too, but it was,
I think, worse during the SAGE course.

Though CPU and RAM usage were low, I/O would spike,
I/O wait would climb to 30-50% or more for 30 minutes,
and the load average would climb from 0.1 to 25 for 30 minutes.
Users would see 30s to 120s waits on a click on a GNOME
desktop (all clients are Linux too), and would hard-restart
machines. This despite the desktop clients recently having
been upgraded from 2.4GHz/512M_RAM/7200RPM_disks to
3.0GHz/4G_RAM/10kRPM_disks.

Installing the sysstat package collects stats, and the commands
sar and sar -b showed clearly that I/O was the culprit
on the server. The htop package is a great improvement
on top, though it doesn't show I/O wait by default.
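
If you want to watch those two numbers without sysstat, they can be
read straight from /proc. A minimal sketch (Linux-only; the 5-second
interval is just an example) that samples the 1-minute load average
and the iowait share of CPU time:

#!/usr/bin/env python
# Sample the 1-minute load average and the iowait share of CPU time
# from /proc. sar gives far more detail; this only shows where those
# two numbers come from. Stop with Ctrl-C.
import time

def cpu_times():
    f = open("/proc/stat")
    fields = f.readline().split()  # "cpu user nice system idle iowait ..."
    f.close()
    return [int(x) for x in fields[1:]]

prev = cpu_times()
while True:
    time.sleep(5)
    cur = cpu_times()
    delta = [c - p for c, p in zip(cur, prev)]
    prev = cur
    iowait_pct = 100.0 * delta[4] / max(sum(delta), 1)  # 5th field is iowait
    load1 = open("/proc/loadavg").read().split()[0]
    print("load1=%s iowait=%.1f%%" % (load1, iowait_pct))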

I had already fixed a Mozilla problem on Ubuntu Jaunty
clients: each user downloads and stores the huge (61M)
anti-phishing file .mozilla/firefox/default.87w/urlclassifier3.sqlite.
But even with that solved, Mozilla caused server slowdown, more
so than GNOME alone. The version of SAGE did not matter
(4.0 through 4.1). To save space, we patched
/usr/local/src/sage/devel/sage-AIMS-autosave-patch/sage/server/notebook/user_conf.py
and set

'max_history_length': 10,     # default was 100
'autosave_interval': 120*60,  # default was 60*60, i.e. autosave half as often
I'm not sure whether less SAGE auto-saving activity here
also helps reduce the load on the NFS server. The SAGE
installs are local to each PC in /usr/local/src/, but the
.sage directories are on the central home server. Each
student has a desktop icon that runs sage -notebook on
their own PC.

I did the following:
- upgraded the RAM from 3G to 6G. The other services on that
server (IMAP, print, DHCP, and some other things run next to NFS)
use little RAM, but the kernel file cache fills up all of it!
I believe this helps a little to moderate the effects under load.
- installed the linux-server kernel from Ubuntu, which
uses the deadline I/O scheduler and more tweaks:
http://www.ubuntu.com/products/whatisubuntu/serveredition/features/kernel
This made a massive difference: the load only climbs to 14
now, it only climbs for 10 minutes, and the lag on
clients is only around 10s during that time.
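
You can also check which I/O scheduler a running kernel uses, per
block device, through sysfs. A minimal sketch, assuming standard /sys
paths and sd* device names:

# Print the I/O scheduler per block device; the kernel marks the
# active one in [brackets], e.g. "noop anticipatory [deadline] cfq".
# To switch at runtime (as root):
#   echo deadline > /sys/block/sda/queue/scheduler
import glob

for path in glob.glob("/sys/block/sd*/queue/scheduler"):
    print("%s: %s" % (path, open(path).read().strip()))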

I have not had to make other tweaks. I am looking at
two new SUN X4150s with 8G RAM and 8x 300G SAS drives
in hardware RAID6, with drbd (software network RAID1)
over it, as upgrades which will probably improve the speed.
I can now relax, as the kernel scheduler has reduced this
problem to a lower priority.

One idea was to move just .sage locally (like your rsync
idea). My first reaction as a system admin is that
it is a bit of a cowboy hack, not a clever tweak.
But there may be merit. There is local scratch space
on each client. Students tend to sit in the same spot,
so I may not even have to rsync all the time if they
stick to the same computer. Just make .sage a symlink into
local space and let sage create files there. Or, for some
preservation, make a .sage-on-nfs which rsyncs to/from a local
.sage at each login/logout. Rsyncing around login time seems clunky;
perhaps offload it to the user with a "backup sage history to
file server" icon and a reverse "get sage history from
file server" one. If that is too much info for the user, replace
the desktop icon with a wrapper: sync from the server, start sage,
and hope they let the sync back to the server complete.
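
Assuming a local /scratch area, a .sage-on-nfs master copy, and that
this version of sage honors the DOT_SAGE environment variable (all
three worth verifying), the wrapper might look like:

#!/usr/bin/env python
# Rough sketch of the replace-the-desktop-icon idea: pull .sage from
# the NFS home to local scratch, run the notebook against the local
# copy, then push changes back. "/scratch" and ".sage-on-nfs" are
# made-up names; "sage" is assumed to be on the PATH.
import os, subprocess

home = os.path.expanduser("~")
nfs = os.path.join(home, ".sage-on-nfs")          # master copy on NFS
local = "/scratch/%s/.sage" % os.environ["USER"]  # local scratch space

if not os.path.isdir(local):
    os.makedirs(local)
if os.path.isdir(nfs):
    # sync-from-server (trailing slashes: copy contents, not the dir)
    subprocess.call(["rsync", "-a", "--delete", nfs + "/", local + "/"])

env = dict(os.environ, DOT_SAGE=local)
subprocess.call(["sage", "-notebook"], env=env)   # blocks until quit

# sync-back-to-server; hope the student lets it complete
subprocess.call(["rsync", "-a", "--delete", local + "/", nfs + "/"])

The pure-symlink variant would skip DOT_SAGE entirely and just point
~/.sage at the scratch directory once.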

Like I said, I didn't have to move .sage locally.
I'd start by looking at the server's kernel/hard disk/RAID specs, actually.

regards,
Jan



-- 
   .~. 
   /V\     Jan Groenewald
  /( )\    www.aims.ac.za
  ^^-^^ 
