[all][nabla] proposition for a new project in sandbox

Luc Maisonobe Sun, 13 Apr 2008 12:23:33 -0700

Hello,

I have been playing with an idea for a new project for a few months. I asked for advice both at ApacheCon Europe and through direct contacts; all the responses I received were quite positive and suggested that I set up a component in the sandbox. This message is the first public announcement and is intended to collect the opinion of the whole commons community about this project. In short: can I play in the sandbox with this, or should I find another place for it? Another possibility would be to put it inside [math], but that would be really strange.

The project already has a name: Nabla, which is an operator used in mathematics and physics for differentiation. It is a simple triangle pointing downwards (see http://mathworld.wolfram.com/Nabla.html). Let's call the component I want to develop [nabla] from now on, to match our local habits here. There is some code for it, but it was developed only by myself, in my spare time, on my personal computer, and has never been distributed to anyone. So I can consider that I developed it under the Apache umbrella and put it in the sandbox with the Apache headers and license. I am already a commons committer and have filed an Individual Contributor License Agreement with Apache.

[nabla] will be a mathematics/physics library aimed at performing the symbolic differentiation of any function provided in compiled bytecode form.

Here is a typical use case for such a library. For some simulation purposes, suppose I use a class with a method computing the consumption of performing an action as a function of its start time:


public class DifficultComputation {
  public double f(double t) {
    // some lengthy equations here
  }
}

Now, in addition to computing the consumption itself, I want to be able to compute the sensitivity of this consumption to start time changes. This would allow me to say: if the action is started at t = 10 seconds, then the consumption will be 1.2 kilograms, and this consumption will increase by 10 grams for each second I delay the start. The value of 10 grams per second of delay is computed by differentiating the original equation. There are several ways to do that.
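
To make the arithmetic explicit: the sensitivity is the derivative f'(t) at the start time, and it gives a first-order estimate of the consumption for a small delay. The 5 second delay below is only an illustrative value, not a figure from the use case above.

```latex
f(10 + \Delta t) \approx f(10) + f'(10)\,\Delta t
                 = 1.2\ \mathrm{kg} + (0.010\ \mathrm{kg/s})\,\Delta t
% for an assumed delay of \Delta t = 5 s:
f(15) \approx 1.2\ \mathrm{kg} + 0.010\ \mathrm{kg/s} \times 5\ \mathrm{s} = 1.25\ \mathrm{kg}
```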

The first way relies on mathematical transformations of the equations implemented in the function f. It implies mathematical analysis and new development, which is very error-prone (computing the differential of a function is much more complex than computing the function itself). It is only feasible if you know the equations or have the source code of the function. This approach may be used with symbolic computation packages like Mathematica or Axiom, where you develop your equations using these programs and have them generate the implementation for you. However, the produced code is available only for some languages (typically Fortran and C), it is awful and cannot be maintained (it is not intended to be), and it needs to be integrated with the rest of the application, which is already a difficult task.

The second way is to use numerical finite-difference schemes. These algorithms basically compute several values by changing the start time by a small known amount and looking at the various results. This implies setting up the step, which may be difficult if you don't already know the behavior of the function (should I use one microsecond or one century here? In fact it depends on the problem). This is also either quite computation intensive if you use high-order schemes with 4, 6 or 8 points, or inaccurate if you don't use them. It is also impossible to use too close to the function's domain boundaries, which are often the locations where we really want to explore.
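
The step-size problem described above can be seen with a minimal sketch. Everything here is an assumption made for illustration: the function f is a hypothetical stand-in for the "lengthy equations", and the class and method names are not part of [nabla]. The sketch applies a two-point centered difference with several steps and compares the result with the derivative worked out by hand.

```java
import java.util.function.DoubleUnaryOperator;

class FiniteDifferenceDemo {

    // Hypothetical stand-in for the lengthy equations of f(t).
    static double f(double t) {
        return Math.sin(t) * Math.exp(-0.1 * t);
    }

    // Derivative of the stand-in function, worked out by hand (product rule),
    // used only as a reference to measure the finite-difference error.
    static double exactDerivative(double t) {
        return Math.exp(-0.1 * t) * (Math.cos(t) - 0.1 * Math.sin(t));
    }

    // Two-point centered scheme: (g(t + h) - g(t - h)) / (2 h).
    static double centered(DoubleUnaryOperator g, double t, double h) {
        return (g.applyAsDouble(t + h) - g.applyAsDouble(t - h)) / (2 * h);
    }

    public static void main(String[] args) {
        double t = 10.0;
        // The error depends strongly on the step: too large a step gives
        // truncation error, too small a step gives rounding error.
        for (double h : new double[] {1.0, 1.0e-3, 1.0e-6, 1.0e-12}) {
            double error = Math.abs(centered(FiniteDifferenceDemo::f, t, h) - exactDerivative(t));
            System.out.printf("h = %.1e  error = %.3e%n", h, error);
        }
    }
}
```

Note that the centered scheme evaluates the function on both sides of t, which is why it cannot be used right at a domain boundary.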

[nabla] provides a third way to get this result. It analyses the bytecode of the function at run time, performs the exact symbolic mathematical transforms, and generates a new class implementing the differentiated function. There is still a computation cost, but it is the same as you would get from manually differentiated code, plus a one-time bytecode differentiation overhead (but we can also cache results).
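
To picture the kind of result meant here, the sketch below shows a function next to the derivative one would write by hand using the usual differentiation rules. It is only an illustration under assumptions: the function and all names are hypothetical, the derivative is written manually, and nothing here is [nabla]'s API or its generated output.

```java
class DerivativeSketch {

    // Hypothetical original function: f(t) = t * sin(t) + 3 t^2.
    static double f(double t) {
        return t * Math.sin(t) + 3 * t * t;
    }

    // Exact derivative obtained by hand with the product and power rules:
    // f'(t) = sin(t) + t * cos(t) + 6 t.
    // No step size is involved, and only the point t itself is evaluated.
    static double fPrime(double t) {
        return Math.sin(t) + t * Math.cos(t) + 6 * t;
    }

    public static void main(String[] args) {
        double t = 10.0;
        System.out.println("f(t)  = " + f(t));
        System.out.println("f'(t) = " + fPrime(t));
    }
}
```

The cost of calling fPrime is comparable to the cost of calling f, which is the "manually differentiated code" cost mentioned above.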


This approach has the following benefits:
 - derivation is exact
 - there is no problem-dependent step size to handle
 - derivation can be computed even at domain boundaries
 - there is no special handling of source
   (no symbolic package with its own language, no source code
    generation, no integration with the rest of application)
 - one writes and maintains only the basic equation and gets the
   derivative for free
 - it is effective even when source code is not available (but there
   are licensing issues in this case of course, since what I do
   automatically is really ... derived work)

The only drawback I see is that functions calling native code cannot be handled. In this case, we have a fallback available with finite-difference schemes.

The existing implementation is not yet ready for production. A lot of work has been done, but there are many missing features. [nabla] can handle simple functions from end to end (i.e. up to creating an instance of the differentiated class that is fully functional). Making this code available in the sandbox would let people look at it, comment on it, participate if they are interested, and help it go live.


What do you think about it ?
Luc

