[sage-devel] Re: parallel computation

William Stein Mon, 19 May 2008 20:17:57 -0700

On Mon, May 19, 2008 at 6:57 PM, MarkV <[EMAIL PROTECTED]> wrote:
> Q1) Is the focus on performing symbolic calculations in parallel, or
> are numerically oriented calculations currently being considered?


Both are equally important and relevant to Sage.

>> Yes.   But this won't happen on a grand scale until at least a dozen
>> sample applications get written to use parallelism (not re-written --
>> all sequential algorithms will stay), and we learn from the outcome
>> of this.
>
> Q2)  Have the MPI-based PETSC and TAO libraries been considered for
> inclusion?

No, they have never been considered.  Just shipping MPI itself
(a dependency) would significantly complicate Sage.   We do make
mpi4py and openmpi available as optional packages.

>  Considered and rejected?
>
> PETSC: http://www-unix.mcs.anl.gov/petsc/petsc-as/  (and
> http://acts.nersc.gov/petsc/)
> TAO: http://www-unix.mcs.anl.gov/tao/
>
> Rather than being an extra wheel on the SAGE 'car' they might
> represent the addition of a couple of extra cylinders to the SAGE car
> engine :)
>
>> > * Does anyone know of computer algebra systems that take
>> > advantage of more than 1 CPU?
>>
>> Maple, Mathematica, MATLAB, and ParallelGap.  But with the first
>> three who knows how or what they do.  The last is dated.
>
> Wolfram has gridMathematica and its personal grid edition, but I've
> not used them - for reasons I mention below in some observations.
> I was never certain what this meant for their symbolic calculations.
>
>> > * How much of an issue is thread-safety in current libraries that SAGE
>> > uses? Are there ways around it?
>>
>> Python is not thread safe.   This stops a number of (probably bad,
>> in retrospect!) ideas in their tracks.
>>
>> I doubt threaded techniques are going to be used much, if at all,
>> in the core SAGE library.  Probably IPython will be used a lot
>> in the core library, and dsage will be used a lot by end users.
>>
>
> Some observations.
> Disclaimer.  These observations relate to a particular use case and
> aren't general.  Nothing mentioned here is intended to criticize or
> invalidate the SAGE effort.
>
>  - With parallel programming it seems that a good (any?) parallel
> debugger becomes essential.  I found things getting very tricky when
> writing for MPI.

What do you mean by a good parallel debugger?  Are there some
examples you have in mind?

Writing good parallel code is usually not easy. (It depends on the
application though.  Sometimes it is easy.)

>  - The SAGE notebook interface is an excellent idea, it seems similar
> to a MMA notebook in some respects. I wonder if it won't, for the same
> reasons, become the case that SAGE users come to need something like
> the Wolfram workbench?  I found that application (Eclipse based) to be
> very useful in writing some MMA scripts.  I guess this might be the
> case for some SAGE (non-math research) users.

I think there are several IDEs for doing Python development.  With
a little work one could use any of them for Sage, since after all
Sage can be viewed as just another Python library.

>  - Putting the above two observations together it seemed that a SAGE
> IDE might lie ahead.  Continuing the 'build a car, not another wheel'
> philosophy, I thought it might be worth pointing out that the Eclipse
> project has a Parallel tools project, as well as a dynamic languages
> toolkit and Pydev.  Tying these together as a SAGE workbench might be
> something the Eclipse foundation, or one of its members, might be
> interested in as sponsor for a SAGE IDE effort? (Disclaimer I use
> Netbeans (primarily) _and_ Eclipse)
> Eclipse PTP: http://www.eclipse.org/ptp/
> Eclipse DLTK: http://www.eclipse.org/dltk/
> Pydev: http://pydev.sourceforge.net/

The only situation in which I could imagine there being a SAGE
IDE is if it were web-based.  If it isn't web-based, it's pointless -- just
use Pydev/Eclipse, as you suggest above, or one of the many other
non-web-based Python IDEs such as WingIDE, Eric3, Boa,
BlackAdder, or Komodo.

Right now a lot of people already use the Sage notebook as part of their
workflow from idea to peer reviewed code accepted into Sage.

>  - MPI vs threading.  This is very application specific.  One
> potential use of SAGE is to do very sophisticated calculations on very
> large data sets. I don't wish to depreciate the importance of 'best-speed'
> implementations; their benefits accrue whether one uses one or more
> CPUs.  Nonetheless I have found that there quickly comes a point
> where data-related resource demands, latencies and 'issues' can
> dominate calculation time.  In these cases an MPI-based algorithm that
> I can throw 100 computers' CPUs, memory, hard disks and network
> bandwidth at beats a single-computer (multiple-core) multi-threaded
> implementation.  I'd be very happy with single-threaded but
> genuinely (beyond one machine) scalable algorithms.

To be honest, I don't know anybody who uses Sage who actually programs
directly with MPI.  For parallel computation, people use IPython1, DSage,
or the new simple and robust parallel job queue system that Gary
Furnish recently implemented (which we're now using a *lot* for automated
testing and building of Sage).
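
As a rough illustration of the job-queue style of parallelism mentioned
above, here is a minimal sketch using the multiprocessing module from
the standard library of current Python versions.  It is not Gary
Furnish's implementation, and it is not IPython1 or DSage; the names
run_job and run_queue and the job sizes are made up for the example.
Each job runs in its own process, which sidesteps the thread-safety
problem entirely.

```python
from multiprocessing import Pool


def run_job(n):
    """Hypothetical stand-in for one independent job (e.g. testing one file)."""
    return n, sum(i * i for i in range(n))


def run_queue(jobs, workers=2):
    """Run independent jobs in separate processes and collect the results."""
    with Pool(processes=workers) as pool:
        # imap_unordered yields each result as soon as its job finishes,
        # so slow jobs do not hold up the fast ones.
        return dict(pool.imap_unordered(run_job, jobs))


if __name__ == "__main__":
    results = run_queue([10, 100, 1000])
    for n in sorted(results):
        print(n, results[n])
```

This only works well when the jobs are independent of each other, which
is the case for things like building or testing separate files.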

>  - Please think of a data point as a symbol in a computer algebra
> system :)

I have no clue what that means.  I don't know what a "symbol in a computer
algebra system" means.  Do you mean an indeterminate like x (e.g., a polynomial
ring variable)?   Or a symbolic variable x, as in "sin(x^2 + 1)"?

> That is, if at all possible, treat data points as first
> class citizens. My MMA gripe and the reason for not using
> gridMathematica is that it is atrocious at handling data.  I quickly
> realized that writing to use gridMMA with my data would be more
> painful and take longer than if I switched to learn R and used Condor
> - calling MMA only when desperately in need of some arbitrary
> precision calculation.

This comment would be vastly more helpful if you could formulate
what about MMA makes it atrocious at handling data.  I could imagine
many ways in which I would find it atrocious at handling specific
data, but I don't know what you're thinking of.

>  - If you aren't already familiar with Amazon's web services I'd
> strongly encourage taking an hour or two to explore them.
> Specifically the Amazon machine images.  To my mind this is a seismic
> shift in the availability of computing power, and can change your
> mindset when thinking about SAGE applications. Given that 1TB will
> become available on demand I do think the 'sophisticated calculation
> on massive datasets' will become a much more common use case.  It will
> also shift what you consider to be a 'standard' use case - Example:
> For USD 80.00 I can employ 100 machines each with 8 cpu's for one
> hour, i.e. 800 cpu's, and +100x8GB memory.  For some more $'s each
> instance could have up to 1TB (or probably higher in a year or so) of
> storage :)
> Currently I use an Amazon machine image as a (Gnome) desktop machine -
> I know one user considered using an 8-cpu instance to compile their
> code more rapidly and this might help some SAGE devs?

Cool. Some of us Sage devs are at universities with
access to supercomputers.  These are no good for interactive work,
but are great for some things.
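
For reference, the Amazon figures quoted above can be checked with a
few lines of Python.  The inputs are taken from the quoted example
(USD 80.00 for 100 machines with 8 CPUs and 8GB of memory each, for one
hour); the variable names are only for illustration, and the prices are
those quoted in the message, not current ones.

```python
machines = 100
cpus_per_machine = 8
memory_gb_per_machine = 8
total_cost_usd = 80.00  # quoted price for one hour of all 100 machines

total_cpus = machines * cpus_per_machine
total_memory_gb = machines * memory_gb_per_machine
cost_per_machine_hour = total_cost_usd / machines
cost_per_cpu_hour = total_cost_usd / total_cpus

print("CPUs:", total_cpus)
print("Memory (GB):", total_memory_gb)
print("USD per machine-hour:", round(cost_per_machine_hour, 2))
print("USD per CPU-hour:", round(cost_per_cpu_hour, 2))
```

So the quoted example comes to 800 CPUs and 800GB of memory, at about
USD 0.80 per machine-hour or USD 0.10 per CPU-hour.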

William

