On Tue, May 20, 2008 at 3:48 PM, Mark V <[EMAIL PROTECTED]> wrote:
> Hi William,
> Thanks for taking the time to respond so promptly.
>
> On Tue, May 20, 2008 at 1:17 PM, William Stein <[EMAIL PROTECTED]> wrote:
>> On Mon, May 19, 2008 at 6:57 PM, MarkV <[EMAIL PROTECTED]> wrote:
>>> Q1) Is the focus on performing symbolic calculations in parallel, or are numerically oriented calculations currently being considered?
>>
>> Both are equally important and relevant to Sage.
>
> Great.
>
>>>> Yes. But this won't happen on a grand scale until at least a dozen sample applications get written to use parallelism (not re-written -- all sequential algorithms will stay), and we learn from the outcome of this.
>>>
>>> Q2) Have the MPI-based PETSc and TAO libraries been considered for inclusion?
>>
>> No, they have never been considered. Just shipping MPI itself (a dependency) would significantly complicate Sage. We do make
>
> Yes, my impression is that building MPI in would take some effort. I don't underestimate that effort, nor the competing demands for (scarce) dev resources.
>
>> mpi4py and openmpi available as optional packages.
>
> Thanks, I'll look into those.
>
>>> Considered and rejected?
>>>
>>> PETSc: http://www-unix.mcs.anl.gov/petsc/petsc-as/ (and http://acts.nersc.gov/petsc/)
>>> TAO: http://www-unix.mcs.anl.gov/tao/
>>>
>>> Rather than being an extra wheel on the SAGE 'car', they might represent the addition of a couple of extra cylinders to the SAGE car engine :)
>>>
>>>> > * Does anyone know of computer algebra systems that take advantage of more than 1 CPU?
>>>>
>>>> Maple, Mathematica, MATLAB, and ParallelGap. But with the first three, who knows how or what they do. The last is dated.
>>>
>>> Wolfram has gridMathematica and its personal grid edition, but I've not used them - for reasons I mention below in some observations. I was never certain what this meant for their symbolic calculations.
>>>
>>>> > * How much of an issue is thread-safety in current libraries that SAGE uses? Are there ways around it?
>>>>
>>>> Python is not thread safe. This stops a number of (probably bad, in retrospect!) ideas in their tracks.
>>>>
>>>> I doubt threaded techniques are going to be used much, if at all, in the core SAGE library. Probably IPython will be used a lot in the core library, and dsage will be used a lot by end users.
>>>
>>> Some observations.
>>> Disclaimer: these observations relate to a particular use case and aren't general. Nothing mentioned here is intended to criticize or invalidate the SAGE effort.
>>>
>>> - With parallel programming it seems that a good (any?) parallel debugger becomes essential. I found things getting very tricky when writing for MPI.
>>
>> What do you mean by a good parallel debugger? Are there some examples you have in mind?
>
> Etnus TotalView was the only one I saw that looked like it would do the job, but then it was USD 3k for a license, annual from memory. Subsequently I think it has been taken over by Intel - I haven't been following that closely, since I'm not likely to afford such fees - getting a MMA license was a big deal :)
>
> As far as I know, the Eclipse PTP is the only open-source parallel coding IDE effort. Not having looked for a while, this might have changed?
>
>> Writing good parallel code is usually not easy. (It depends on the application, though. Sometimes it is easy.)
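As a point of reference on the optional mpi4py/openmpi route mentioned above: a minimal mpi4py program is quite small. The sketch below is purely illustrative (the script name sum_squares.py is hypothetical, and it assumes the optional openmpi and mpi4py packages are installed; it is not code from Sage itself). Each process works on its own slice of a sum, and the pieces are combined on rank 0.

    # sum_squares.py -- hypothetical example script, not part of Sage.
    # Each MPI process computes a partial sum; rank 0 collects the total.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD        # communicator containing all processes
    rank = comm.Get_rank()       # this process's id: 0 .. size-1
    size = comm.Get_size()       # total number of processes

    # Each process takes every size-th term, starting at its own rank.
    partial = sum(i * i for i in xrange(rank, 10**6, size))

    # Combine the partial sums on the root process (rank 0).
    total = comm.reduce(partial, op=MPI.SUM, root=0)
    if rank == 0:
        print "sum of squares below 10^6:", total

Run with something like: mpirun -np 8 python sum_squares.py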
>
> Yep, it gave me a bloody nose and put me on the canvas on a few occasions.
>
>>> - The SAGE notebook interface is an excellent idea; it seems similar to a MMA notebook in some respects. I wonder if it won't, for the same reasons, become the case that SAGE users come to need something like the Wolfram Workbench? I found that application (Eclipse based) to be very useful in writing some MMA scripts. I guess this might be the case for some SAGE (non-math research) users.
>>
>> I think there are several IDEs for doing Python development. With a little work one could use any of them for Sage, since after all Sage can be viewed as just another Python library.
>
> Yes, that is true.
>
>>> - Putting the above two observations together, it seemed that a SAGE IDE might lie ahead. Continuing the 'build a car, not another wheel' philosophy, I thought it might be worth pointing out that the Eclipse project has a Parallel Tools project, as well as a dynamic languages toolkit and Pydev. Tying these together as a SAGE workbench might be something the Eclipse Foundation, or one of its members, might be interested in sponsoring as a SAGE IDE effort? (Disclaimer: I use Netbeans (primarily) _and_ Eclipse.)
>>> Eclipse PTP: http://www.eclipse.org/ptp/
>>> Eclipse DLTK: http://www.eclipse.org/dltk/
>>> Pydev: http://pydev.sourceforge.net/
>>
>> The only possible situation in which I could imagine there being a SAGE IDE would be something web-based. If it isn't web based, it's pointless -- just
>
> Fair enough.
>
>> use Pydev/Eclipse, as you suggest above, or one of the many, many other non-web-based Python IDEs such as WingIDE, PyDev, Eric3, Boa, BlackAdder, or Komodo.
>
> Yes, there is probably nothing preventing anyone using PTP and Pydev, etc. by switching perspectives.
>
>> Right now a lot of people already use the Sage notebook as part of their workflow from idea to peer-reviewed code accepted into Sage.
>
> Yes, notebooks are great. I just found, using Mathematica, that you can end up writing a 'system' where you need to debug. By using Python you probably have covered the cases that led to the Wolfram Workbench.
>
>>> - MPI vs threading. This is very application specific. One potential use of SAGE is to do very sophisticated calculations on very large data sets. I don't wish to deprecate the importance of 'best-speed' implementations; their benefits accrue whether one uses one or more CPUs. Nonetheless, I have found that there quickly comes a point where data-related resource demands, latencies and 'issues' can dominate calculation time. In these cases an MPI-based algorithm that I can throw 100 computers' CPUs, memory, hard-disk and network bandwidths at dominates/beats a single-computer (multiple core), multiple-threaded implementation. I'd be very happy with single-threaded but genuinely (beyond one machine) scalable algorithms.
>>
>> To be honest, I don't know anybody who uses Sage who actually programs directly with MPI. For parallel computation, people use IPython1, DSage, or the new simple and robust parallel job queue system that Gary Furnish recently implemented (which we're now using a *lot* for automated testing and building of Sage).
>
> Thanks, I'll look into those.
>
>>> - Please think of a data point as a symbol in a computer algebra system :)
>>
>> I have no clue what that means. I don't know what a "symbol in a computer algebra system" means.
>> Do you mean an indeterminate like x (e.g., a polynomial ring variable)? Or a symbolic variable x like in "sin(x^2 + 1)"?
>
> The x in "sin(x^2+1)".
>
>>> That is, if at all possible, treat data points as first-class citizens. My MMA gripe, and the reason for not using gridMathematica, is that it is atrocious at handling data. I quickly realized that writing to use gridMMA with my data would be more painful and take longer than if I switched to learn R and used Condor - calling MMA only when desperately in need of some arbitrary-precision calculation.
>>
>> This comment would be vastly more helpful if you could formulate what about MMA makes it atrocious at handling data. I could imagine
>
> MMA is great in that 'everything' is a list, but this was an uncomfortable match/mapping when trying to use a database table (say xGB+). I also found parsing incoming data and manipulating data could be tricky - not impossible, but not natural compared to R, say. Another big issue was MMA's caching of results. This is great for speeding up calculations that might be repeated, but was a killer when it came to data analysis.
>
> Python probably has the database angle covered with some Object Relational Mapping library - I'm coming from a Ruby background, so I haven't used Python previously, but I hear they are similar. I don't know if SAGE caches results, but this was a real pain point in my use of MMA.
>
>> many ways in which I would find it atrocious at handling specific data, but I don't know what you're thinking of.
>
>>> - If you aren't already familiar with Amazon's web services, I'd strongly encourage taking an hour or two to explore them. Specifically the Amazon machine images. To my mind this is a seismic shift in the availability of computing power, and it can change your mindset when thinking about SAGE applications. Given that 1TB will become available on demand, I do think the 'sophisticated calculation on massive datasets' case will become a much more common use case. It will also shift what you consider to be a 'standard' use case. Example: for USD 80.00 I can employ 100 machines, each with 8 CPUs, for one hour, i.e. 800 CPUs and 100 x 8 GB of memory. For some more $'s each instance could have up to 1TB (or probably higher in a year or so) of storage :)
>>> Currently I use an Amazon machine image as a (Gnome) desktop machine - I know one user considered using an 8-cpu instance to compile their code more rapidly, and this might help some SAGE devs?
>>
>> Cool. Some of us Sage devs are at universities with access to supercomputers. These are no good for interactive work, but are great for some things.
>
> Yep, I've dealt with them (supercomputers), and I've written my own screen-saver based app for use in student labs (desktops). Amazon machine instances are _close_ to offering the best of both the supercomputer and desktop worlds, i.e. both worlds. With the benefit that you only pay for what you use: no queues, no applications, no sysadmin defining what you can and can't install, or where, how and when you install it. All for $0.10/cpu/hr!
> The RightScale guys describe firing up 500 instances for a calculation; there are also cases of people starting 3400 machines on demand. It wasn't clear if these were 1-, 4- or 8-cpu instances, running independently or dependently.
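On William's question above about what the 'x' is: the two readings look quite different in Sage, as far as I can tell from the documentation. A small sketch (standard Sage syntax, not checked against a current release):

    # In a Sage session:

    # (1) an indeterminate, i.e. a polynomial ring variable
    R.<t> = QQ[]            # the polynomial ring QQ[t]
    p = t^2 + 1             # an element of R; purely algebraic

    # (2) a symbolic variable, the x in sin(x^2 + 1)
    x = var('x')            # a symbolic variable
    f = sin(x^2 + 1)        # a symbolic expression
    f.diff(x)               # calculus works on symbolic expressions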
> If your desktop is connected to an MPI pool of that size, things start to look interesting :) Obviously there will be bottlenecks, so it won't be suitable for all and sundry uses.
>
> Cheers
> Mark
>
>> William
Hi Devs,

I suppose the issues below could be described as related to the question: is SAGE being written as a desktop application, or as a grid/cloud application?

From what I've gleaned so far (and I've largely compiled these points from snippets above), the current SAGE parallel plans are:

- rewrite basic algorithms so that they are parallel-friendly, after some sample applications get written to use parallelism. In the meantime all sequential algorithms stay.
- shipping MPI itself (a dependency) would significantly complicate Sage, so use of the MPI packages remains optional.

I do appreciate the "If it isn't web based, it's pointless" philosophy, but it does seem inconsistent with the view that integrating MPI is more nuisance than value. It seems ironic to insist that building on distributed technology for the _interface_ to an application has primary importance (a very sensible approach), yet attach secondary importance to using distributed technology for the application itself.

I'm also not convinced that a non-web-browser application is unsuitable as a platform for grid/cloud applications - this seems to be a question of the protocol used to communicate. The Eclipse and other IDE projects seem to handle web technologies, and debug MPI's distributed communication, quite fine.

If the goal is to beat the 4M's, then I'd argue that by choosing a grid/cloud chassis (to continue the SAGE analogy of building a car rather than reinventing components), SAGE would be guaranteed to eventually achieve that goal, if not immediately. Assessing performance against the 4M's might then have to be done differently, since one is comparing a desktop-centric app against a grid/cloud-centric application. For example, the startup time would typically be larger and the distribution of calculations carries some overhead - one _might_ have to settle for "Sage is not worse when using 2 CPUs or cores on a local machine".

From what I've seen of gridMathematica, it is difficult/awkward to take a desktop-oriented chassis and build it into a truly parallel/grid/cloud application. A grid chassis for SAGE would, it seems to me, build on MPI and no doubt Globus infrastructure/technology - it does sound like lots of wheel reinventing lies ahead if the results of the MPI and Globus efforts aren't exploited/integrated from the ground up. I would be surprised if the Globus group/participants were totally uninterested in assisting such an effort.

Of course all this is moot if the goal is a desktop application that has some parallel compute options.

My 2c :)

Cheers
Mark
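P.S. One way to make the "not worse on 2 CPUs or cores" expectation concrete is Amdahl's law: if a fraction p of a computation can be parallelised, the best possible speedup on N CPUs is 1 / ((1 - p) + p/N), before any distribution overhead is counted. A rough illustration in plain Python (my own sketch, nothing to do with Sage internals):

    # Upper bound on speedup from Amdahl's law, ignoring the
    # communication/startup overhead (which only makes things worse).
    def amdahl_speedup(p, n):
        # p = fraction of the work that parallelises, n = number of CPUs
        return 1.0 / ((1.0 - p) + p / n)

    for n in (2, 8, 800):
        print "p = 0.90, %4d cpus -> speedup <= %.1f" % (n, amdahl_speedup(0.90, n))
    # p = 0.90,    2 cpus -> speedup <= 1.8
    # p = 0.90,    8 cpus -> speedup <= 4.7
    # p = 0.90,  800 cpus -> speedup <= 9.9

So even 800 CPUs only buy about a 10x speedup if a tenth of the work stays sequential, which is why I'd settle for "not worse on 2 CPUs" locally as long as the algorithms genuinely scale beyond one machine.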