Hi Devs,

Firstly, thanks for making the effort to get SAGE this far. At the moment I'm an MMA (Mathematica) user tracking its progress, and I have some 'user-level' questions about SAGE's development, specifically related to parallel calculations. I'll also indulge in some observations that may be a little off-topic (?).
Context: I'm a member of the Business faculty within Sydney Uni - home of the Magma black hole, it seems, though I was totally unaware of that software :) My use case is not computer algebra research, so this may all be off topic. I have used MMA's computer algebra engine, usually as one step on the way to substituting a data observation for a symbol rather than as an end in itself - though I have done that too. I have also 'played' with writing some MPI-based C code, with lots of frustration (no parallel debugger) and no 'real' success - again, no parallel debugger :)

The questions: [remember, non-mathematician, so be gentle and try not to laugh out loud :)]

On Feb 8 2007, 1:29 pm, "William Stein" <[EMAIL PROTECTED]> wrote:
> On Wed, 07 Feb 2007 18:07:32 -0700, didier deshommes <[EMAIL PROTECTED]> wrote:
> > On 2/2/07, David Harvey <[EMAIL PROTECTED]> wrote:
> > > Hi everyone,
> > > It would be good if someone who was at the parallel computing workshop
> > > could volunteer to give a talk about parallelism issues at SAGE Days 3,
> > > as they might relate to SAGE. I'm sure some of the other SAGE
> > > developers would be interested to hear about it, and we all probably
> > > will have some thoughts after letting it percolate around for two
> > > weeks.
> >
> > Here are some questions that I would love to see answered:
> > * OpenMP vs MPI: which one would be better for SAGE and why?
> > * Is there a clear winner in the python world for parallel
> > applications? googling for either "open mp python" or "mpi python"
> > shows several modules.
>
> MPI. It's industry strength, very mature, openmpi is an excellent
> implementation, python supports it very well, as does IPython,
> and it's widely used. OpenMP is barely deployed in comparison.
> SAGE is about using today's technology, not tomorrow's. That said,
> I bet most of the implementation of parallel algorithms in SAGE
> will not require MPI at all (i.e., they'll use either pthreads
> for low-level things, ipython for mid-level, and dsage for
> task farming).
>
> > * Are there any plans to rewrite basic algorithms so that they are
> > parallel-friendly?

Q1) Is the focus on performing symbolic calculations in parallel, or are numerically oriented calculations currently being considered?

> Yes. But this won't happen on a grand scale until at least a dozen
> sample applications get written to use parallelism (not re-written --
> all sequential algorithms will stay), and we learn from the outcome
> of this.

Q2) Have the MPI-based PETSc and TAO libraries been considered for inclusion? Or considered and rejected?

PETSc: http://www-unix.mcs.anl.gov/petsc/petsc-as/ (and http://acts.nersc.gov/petsc/)
TAO: http://www-unix.mcs.anl.gov/tao/

Rather than being an extra wheel on the SAGE 'car', they might represent the addition of a couple of extra cylinders to the SAGE car engine :)

> > * Does anyone know of computer algebra systems that take
> > advantage of more than 1 CPU?
>
> Maple, Mathematica, MATLAB, and ParallelGAP. But with the first
> three who knows how or what they do. The last is dated.

Wolfram has gridMathematica and its Personal Grid Edition, but I've not used them - for reasons I mention below in some observations. I was never certain what this meant for their symbolic calculations.

> > * How much of an issue is thread-safety in current libraries that SAGE
> > uses? Are there ways around it?
>
> Python is not thread safe. This stops a number of (probably bad,
> in retrospect!) ideas in their tracks.
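(An aside on the "python supports MPI very well" point above: from the user side, the sort of MPI-from-Python I've been playing with looks roughly like the sketch below. I'm assuming the mpi4py bindings purely for illustration - I don't know which binding, if any, the devs have in mind - and the sum-of-squares is just a stand-in for a real calculation.)

    # Minimal "MPI from Python" sketch, assuming the mpi4py bindings.
    # Run with something like:  mpiexec -n 4 python sum_of_squares.py
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    # Rank 0 prepares one chunk of (fake) data per process...
    if rank == 0:
        chunks = [range(i * 1000, (i + 1) * 1000) for i in range(size)]
    else:
        chunks = None

    # ...each process receives its chunk and does a partial calculation...
    chunk = comm.scatter(chunks, root=0)
    partial = sum(x * x for x in chunk)

    # ...and the partial results are combined back on rank 0.
    total = comm.reduce(partial, op=MPI.SUM, root=0)
    if rank == 0:
        print("sum of squares: %s" % total)

Even at this toy scale I found myself wanting a parallel debugger, which is one of the observations below.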
> I doubt threaded techniques are going to be used much, if at all,
> in the core SAGE library. Probably IPython will be used a lot
> in the core library, and dsage will be used a lot by end users.

Some observations. Disclaimer: these observations relate to a particular use case and aren't general, and nothing mentioned here is intended to criticize or invalidate the SAGE effort.

- With parallel programming it seems that a good (any?) parallel debugger becomes essential. I found things getting very tricky when writing for MPI.

- The SAGE notebook interface is an excellent idea; it seems similar to an MMA notebook in some respects. I wonder whether, for the same reasons, SAGE users will come to need something like the Wolfram Workbench? I found that application (Eclipse-based) to be very useful when writing some MMA scripts. I guess this might be the case for some SAGE (non-math-research) users.

- Putting the above two observations together, it seems that a SAGE IDE might lie ahead. Continuing the 'build a car, not another wheel' philosophy, I thought it worth pointing out that the Eclipse project has a Parallel Tools Platform, as well as a Dynamic Languages Toolkit and Pydev. Tying these together as a SAGE workbench might be something the Eclipse Foundation, or one of its members, would be interested in sponsoring as a SAGE IDE effort? (Disclaimer: I use Netbeans (primarily) _and_ Eclipse.)

  Eclipse PTP: http://www.eclipse.org/ptp/
  Eclipse DLTK: http://www.eclipse.org/dltk/
  Pydev: http://pydev.sourceforge.net/

- MPI vs threading: this is very application-specific. One potential use of SAGE is to do very sophisticated calculations on very large data sets. I don't wish to depreciate the importance of 'best-speed' implementations - their benefits accrue whether you use one CPU or many. Nonetheless I have found that there quickly comes a point where data-related resource demands, latencies and 'issues' dominate calculation time. In those cases an MPI-based algorithm that I can throw 100 computers' CPUs, memory, hard disks and network bandwidth at beats a single-computer (multiple-core) multi-threaded implementation. I'd be very happy with single-threaded but genuinely (beyond one machine) scalable algorithms.

- Please think of a data point as a symbol in a computer algebra system :) That is, if at all possible, treat data points as first-class citizens. My MMA gripe, and the reason for not using gridMathematica, is that it is atrocious at handling data. I quickly realized that writing to use gridMMA with my data would be more painful and take longer than switching to learn R and using Condor - calling MMA only when desperately in need of some arbitrary-precision calculation.

- If you aren't already familiar with Amazon's web services, I'd strongly encourage taking an hour or two to explore them, specifically the Amazon machine images. To my mind this is a seismic shift in the availability of computing power, and it can change your mindset when thinking about SAGE applications. Given that 1TB of storage will become available on demand, I do think the 'sophisticated calculation on massive datasets' scenario will become a much more common use case. It will also shift what you consider to be a 'standard' use case. Example: for USD 80.00 I can employ 100 machines, each with 8 CPUs, for one hour - i.e. 800 CPUs and 100 x 8 GB of memory.
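(To make that concrete: spinning up such instances programmatically is roughly the sketch below. I'm assuming the boto library here just for illustration, and the AMI id, keypair name, instance type and credentials are all placeholders, not real values.)

    # Rough sketch of launching a batch of EC2 instances, assuming boto.
    import boto

    conn = boto.connect_ec2(aws_access_key_id='MY_ACCESS_KEY',
                            aws_secret_access_key='MY_SECRET_KEY')

    # Ask for 100 instances of a (placeholder) machine image in one call.
    reservation = conn.run_instances('ami-00000000',
                                     min_count=100,
                                     max_count=100,
                                     key_name='my-keypair',
                                     instance_type='m1.xlarge')

    # Refresh and report each instance's state ('pending' -> 'running').
    for instance in reservation.instances:
        instance.update()
        print("%s %s" % (instance.id, instance.state))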
For some more $'s, each instance could have up to 1TB (or probably higher in a year or so) of storage :)

Currently I use an Amazon machine image as a (Gnome) desktop machine. I know one user who considered using an 8-CPU instance to compile their code more rapidly, and this might help some SAGE devs too?

Again, thanks for all the exceptional efforts; I'll continue to watch in anticipation. (Currently my open-source efforts are spent on a Ruby ORM - Sequel.)

Regards
Mark