[sage-devel] Re: parallel computation

Mark V Mon, 19 May 2008 22:48:36 -0700

Hi William,
Thanks for taking time to respond so promptly.

On Tue, May 20, 2008 at 1:17 PM, William Stein <[EMAIL PROTECTED]> wrote:
> On Mon, May 19, 2008 at 6:57 PM, MarkV <[EMAIL PROTECTED]> wrote:
>> Q1) Is the focus on performing symbolic calculations in parallel, or
>> are numerically oriented calculations currently being considered?
>
> Both are equally important and relevant to Sage.
>


Great.

>>> Yes.   But this won't happen on a grand scale until at least a dozen
>>> sample applications get written to use parallelism (not re-written --
>>> all sequential algorithms will stay), and we learn from the outcome
>>> of this.
>>
>> Q2)  Have the MPI-based PETSc and TAO libraries been considered for
>> inclusion?
>
> No, they have never been considered.  Just shipping MPI itself
> (a dependency) would significantly complicate Sage.   We do make

Yes, my impression is that building MPI in would take some effort.
I don't underestimate that effort, nor the competing demands for
(scarce) developer resources.

> mpi4py and openmpi available as optional packages.

Thanks, I'll look into those.

>
>>  Considered and rejected?
>>
>> PETSC: http://www-unix.mcs.anl.gov/petsc/petsc-as/  (and
>> http://acts.nersc.gov/petsc/)
>> TAO: http://www-unix.mcs.anl.gov/tao/
>>
>> Rather than being an extra wheel on the SAGE 'car' they might
>> represent the addition of a couple of extra cylinders to the SAGE car
>> engine :)
>>
>>> > * Does anyone know of computer algebra systems that take
>>> > advantage of more than 1 CPU?
>>>
>>> Maple, Mathematica, MATLAB, and ParallelGap.  But with the first
>>> three, who knows how or what they do.  The last is dated.
>>
>> Wolfram has gridMathematica and its personal grid edition, but I've
>> not used them - for reasons I mention below in some observations.
>> I was never certain what this meant for their symbolic calculations.
>>
>>> > * How much of an issue is thread-safety in current libraries that SAGE
>>> > uses? Are there ways around it?
>>>
>>> Python is not thread safe.   This stops a number of (probably bad,
>>> in retrospect!) ideas in their tracks.
>>>
>>> I doubt threaded techniques are going to be used much, if at all,
>>> in the core SAGE library.  Probably IPython will be used a lot
>>> in the core library, and dsage will be used a lot by end users.
>>>
>>
>> Some observations.
>> Disclaimer.  These observations relate to a particular use case and
>> aren't general.  Nothing mentioned here is intended to criticize or
>> invalidate the SAGE effort.
>>
>>  - With parallel programming it seems that a good (any?) parallel
>> debugger becomes essential.  I found things getting very tricky when
>> writing for MPI.
>
> What do you mean by a good parallel debugger?  Are there some
> examples you have in mind?

Etnus TotalView was the only one I saw that looked like it would do
the job, but it cost about USD 3k for a license (annual, from memory).
I think it has since been taken over by Intel, but I haven't been
following that closely since I'm not likely to afford such fees;
getting an MMA license was a big deal :)

As far as I know, Eclipse PTP is the only open-source parallel
IDE effort, though I haven't looked for a while, so this may have changed.

>
> Writing good parallel code is usually not easy. (It depends on the
> application though.  Sometimes it is easy.)
>

Yep, it gave me a bloody nose and put me on the canvas on a few occasions.

>>  - The SAGE notebook interface is an excellent idea; it seems similar
>> to an MMA notebook in some respects. I wonder whether, for the same
>> reasons, SAGE users will come to need something like the Wolfram
>> Workbench.  I found that application (Eclipse-based) very useful
>> for writing some MMA scripts.  I guess this might be the case for
>> some SAGE (non-math research) users.
>
> I think there are several IDEs for doing Python development.  With
> a little work one could use any of them for Sage, since after all
> Sage can be viewed as just another Python library.

Yes that is true.

>
>>  - Putting the above two observations together, it seemed that a SAGE
>> IDE might lie ahead.  Continuing the 'build a car, not another wheel'
>> philosophy, I thought it worth pointing out that the Eclipse
>> project has a Parallel Tools Platform, as well as a Dynamic Languages
>> Toolkit and Pydev.  The Eclipse Foundation, or one of its members,
>> might be interested in sponsoring a SAGE IDE effort that ties these
>> together into a SAGE workbench. (Disclaimer: I use
>> NetBeans (primarily) _and_ Eclipse.)
>> Eclipse PTP: http://www.eclipse.org/ptp/
>> Eclipse DLTK: http://www.eclipse.org/dltk/
>> Pydev: http://pydev.sourceforge.net/
>
> The only possible situation in which I could imagine there being a SAGE
> IDE would be something web-based.  If it isn't web-based, it's pointless;
> just

Fair enough.

> use Pydev/Eclipse, as you suggest above, or one of the many, many other
> non-web-based Python IDEs such as WingIDE, Eric3, Boa,
> BlackAdder, or Komodo.

Yes, there is probably nothing preventing anyone from using PTP, Pydev,
etc. by switching Eclipse perspectives.

> Right now a lot of people already use the Sage notebook as part of their
> workflow from idea to peer reviewed code accepted into Sage.

Yes, notebooks are great.  I just found, using Mathematica, that you
can end up writing a whole 'system' that you then need to debug.
By building on Python you have probably covered the cases that led to
the Wolfram Workbench.

>>  - MPI vs threading.  This is very application specific.  One
>> potential use of SAGE is to do very sophisticated calculations on very
>> large data sets. I don't wish to downplay the importance of 'best-speed'
>> implementations; their benefits accrue whether one uses one CPU or
>> many.  Nonetheless, I have found that there quickly comes a point
>> where data-related resource demands, latencies and 'issues' can
>> dominate calculation time.  In these cases an MPI-based algorithm to
>> which I can throw 100 computers' CPUs, memory, hard disks and network
>> bandwidth beats a single-computer (multi-core), multi-threaded
>> implementation.  I'd be very happy with single-threaded but
>> genuinely (beyond one machine) scalable algorithms.
>
> To be honest, I don't know anybody who uses Sage who actually programs
> directly with MPI.  For parallel computation, people use IPython1, DSage,
> or the new simple and robust parallel job queue system that Gary
> Furnish recently implemented (which we're now using a *lot* for automated
> testing and building of Sage).

Thanks, I'll look into those.
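The "single-threaded but genuinely scalable" style Mark describes above can be sketched in plain Python: partition the data, let each worker process its chunk with ordinary sequential code, then combine the partial results. This is a minimal sketch under stated assumptions, not MPI, IPython1, DSage or the job queue William mentions; the function names (`partition`, `chunk_stats`, `combine`, `scatter_reduce`) are hypothetical, and the `mapper` parameter is an assumed stand-in for whatever actually distributes the chunks.

```python
from functools import reduce
from typing import Callable, Iterable, List, Sequence, Tuple

# Partial result for one chunk: (count, sum, sum of squares).
Stats = Tuple[int, float, float]


def partition(data: Sequence[float], n_parts: int) -> List[Sequence[float]]:
    """Split data into n_parts contiguous chunks whose sizes differ by at most one."""
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    q, r = divmod(len(data), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        size = q + (1 if i < r else 0)
        chunks.append(data[start:start + size])
        start += size
    return chunks


def chunk_stats(chunk: Sequence[float]) -> Stats:
    """Plain single-threaded work on one chunk; it needs no shared state."""
    return (len(chunk), float(sum(chunk)), float(sum(x * x for x in chunk)))


def combine(a: Stats, b: Stats) -> Stats:
    """Merge two partial results. Associative, so merge order does not matter."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def scatter_reduce(data: Sequence[float], n_workers: int,
                   mapper: Callable = map) -> Stats:
    """Hand one chunk to each worker via `mapper`, then reduce the partials.

    `mapper` defaults to the built-in sequential map (assumption: any callable
    with the same (function, iterable) signature could be substituted).
    """
    partials: Iterable[Stats] = mapper(chunk_stats, partition(data, n_workers))
    return reduce(combine, partials, (0, 0.0, 0.0))


if __name__ == "__main__":
    data = [float(i) for i in range(1, 101)]
    n, s, ss = scatter_reduce(data, n_workers=8)
    mean = s / n
    print(n, mean, ss / n - mean * mean)  # population mean and variance: 100 50.5 833.25
```

Passing `concurrent.futures.ProcessPoolExecutor().map` as `mapper` would spread the chunks over local cores; spreading them over many machines is what MPI (for example via mpi4py) or the tools mentioned in the thread would add, which this sketch does not attempt.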

>
>>  - Please think of a data point as a symbol in a computer algebra
>> system :)
>
> I have no clue what that means.  I don't know what a "symbol in a computer
> algebra system" means.  Do you mean an indeterminate like x (e.g., a
> polynomial ring variable)?   Or a symbolic variable x like in "sin(x^2 + 1)"?

The latter: the x in "sin(x^2+1)".

>
>> That is, if at all possible, treat data points as first-class
>> citizens. My MMA gripe, and the reason for not using
>> gridMathematica, is that it is atrocious at handling data.  I quickly
>> realized that writing for gridMMA with my data would be more
>> painful and take longer than switching to R and Condor,
>> calling MMA only when I desperately needed some
>> arbitrary-precision calculation.
>
> This comment would be vastly more helpful if you could formulate
> what about MMA makes it atrocious at handling data.  I could imagine

MMA is great in that 'everything' is a list, but this was an
uncomfortable match when trying to work with a database table (say,
one of several GB or more).
I also found that parsing incoming data, and manipulating data, could be
tricky: not impossible, but not natural compared to, say, R.
Another big issue was MMA's caching of results.  This is great for
speeding up calculations that might be repeated, but it was a killer
when it came to data analysis.
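To illustrate the list-versus-table mismatch, here is a minimal sketch of how plain Python can stream rows from a database table through a cursor instead of materializing the whole table as one in-memory list. It uses only the standard-library `sqlite3` module (a DB-API driver, not an ORM); the table `readings`, its columns, and the helpers `iter_rows` and `running_mean` are hypothetical.

```python
import sqlite3
from typing import Iterator, Tuple


def iter_rows(conn: sqlite3.Connection,
              batch_size: int = 1000) -> Iterator[Tuple[int, float]]:
    """Yield (id, value) rows, holding at most batch_size rows in memory at once."""
    cur = conn.execute("SELECT id, value FROM readings ORDER BY id")
    while True:
        batch = cur.fetchmany(batch_size)
        if not batch:
            break
        yield from batch


def running_mean(rows: Iterator[Tuple[int, float]]) -> float:
    """Compute a mean in a single pass without storing the rows."""
    count, total = 0, 0.0
    for _id, value in rows:
        count += 1
        total += value
    return total / count if count else float("nan")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # hypothetical toy table for the demo
    conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, value REAL)")
    conn.executemany("INSERT INTO readings (value) VALUES (?)",
                     [(float(v),) for v in range(10)])
    print(running_mean(iter_rows(conn, batch_size=3)))  # 4.5
    conn.close()
```

The same generator pattern works with other DB-API drivers; an ORM would add object mapping on top but is not required for one-pass statistics.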

Python probably has the database angle covered with some Object
Relational Mapping library; I'm coming from a Ruby background, so I
haven't used Python before, but I hear the two are similar.
I don't know if SAGE caches results, but this was a real pain point in
my use of MMA.

> many ways in which I would find it atrocious at handling specific
> data, but I don't know what you're thinking of.
>
>>  - If you aren't already familiar with Amazon's Web Services, I'd
>> strongly encourage taking an hour or two to explore them,
>> specifically the Amazon Machine Images.  To my mind this is a seismic
>> shift in the availability of computing power, and it can change your
>> mindset when thinking about SAGE applications. Given that 1TB will
>> become available on demand, I do think 'sophisticated calculation
>> on massive datasets' will become a much more common use case.  It will
>> also shift what you consider to be a 'standard' use case.  Example:
>> for USD 80.00 I can employ 100 machines, each with 8 CPUs, for one
>> hour, i.e. 800 CPUs, plus 100 x 8GB of memory.  For a few more dollars each
>> instance could have up to 1TB (or probably more in a year or so) of
>> storage :)
>> Currently I use an Amazon machine image as a (GNOME) desktop machine.
>> I know of one user who considered using an 8-CPU instance to compile their
>> code more rapidly, and this might help some SAGE devs too.
>
> Cool. Some of us Sage devs are at universities with
> access to supercomputers.  These are no good for interactive work,
> but are great for some things.
>

Yep, I've dealt with them (supercomputers), and written my own
screen-saver-based app for use in student labs (desktops).  Amazon
machine instances come _close_ to offering the best of both the
supercomputer and desktop worlds, with the added benefit that you
only pay for what you use: no queues, no allocation applications, no
sysadmin dictating what you can and can't install, or where, how and
when you install it.  All for $0.10/CPU/hr!
The RightScale guys describe firing up 500 instances for a
calculation, and there are also cases of people starting 3400 machines
on demand.  It wasn't clear whether these were 1-, 4- or 8-CPU
instances, or whether they ran independently or dependently.
If your desktop is connected to an MPI pool of that size, things start to
look interesting :)  Obviously there will be bottlenecks, so it won't
suit every use.
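The two price points quoted in this thread are consistent with each other: at the $0.10 per CPU-hour rate given above, the 100-machine example works out as follows.

```latex
\underbrace{100 \times 8}_{\text{machines} \times \text{CPUs each}} = 800\ \text{CPUs},
\qquad
800\ \text{CPUs} \times 1\,\text{h} \times \$0.10/(\text{CPU}\cdot\text{h}) = \$80
```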

Cheers
Mark
> William
>

--~--~---------~--~----~------------~-------~--~----~
To post to this group, send email to sage-devel@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/sage-devel
URLs: http://www.sagemath.org
-~----------~----~----~----~------~----~------~--~---
