Re: [sympy] symjit

Oscar Benjamin Sat, 12 Apr 2025 12:59:43 -0700

On Sat, 12 Apr 2025 at 18:01, Shahriar Iravanian <[email protected]> wrote:
>
> Regarding the example, this is a tough test!


It is and in some ways it is not realistic but actually in some ways
it is. A common case will certainly be small expressions e.g. for
simple ODEs as you show in the README. Another case though is large
uncanonicalised expressions that result from things like solving a
system of linear equations or differentiating large expressions etc.
In these cases it is very common that large expressions will have many
repeating subexpressions and it is important to have an implementation
that can handle that.

In my example if N is the number of iterations in the loop then the
tree for e grows exponentially like O(2^N) whereas the DAG grows
linearly i.e. O(N). The differentiation to make ed at the end explodes
this even further. You can see the sizes of these from protosym for
N=10:

In [5]: e.count_ops_graph()
Out[5]: 24

In [6]: e.count_ops_tree()
Out[6]: 8189

In [7]: ed.count_ops_graph()
Out[7]: 68

In [8]: ed.count_ops_tree()
Out[8]: 55291

Those numbers correspond to the number of instructions in the IR if
repeating subexpressions are handled or not. For ed the difference is
a factor of 1000 but if you increase N it can be arbitrarily large.

Usually more realistic expressions will not be as extreme as this but
this sort of effect is very often there so if we want to evaluate
large expressions numerically it is generally important to take it
into account somehow.

> It shows that there is a bug in the x64-86 rust backend. Interestingly, the 
> Python backend and the rust one on ARM64 (MacOS) give the correct answer:
>
> e = x**2 + x
> for _ in range(10):
>     e = e**2 + e
> ed = e.diff(x)
> f = compile_func([x], [ed], backend='python')
>
> In [23]: %time f(0.001)
> CPU times: user 94 μs, sys: 9 μs, total: 103 μs
> Wall time: 105 μs
> Out[23]: array([1.02233423])

Sorry I should have said that I was testing on Linux x86-64. I confirm
that the Python backend gives the correct answer:

In [7]: from symjit import compile_func

In [8]: f = compile_func([x], [ed], backend='python')

In [9]: f(0.001)
Out[9]: array([1.02233423])

In [10]: f = compile_func([x], [ed])
In [11]: f(0.001)
Out[11]: array([0.00100401])

> However, the bigger question is where symjit fits in the Python/sympy 
> ecosystem. It is lightweight because sympy expressions act as an intermediate 
> representation. LLVM and other compilers do a lot of work to recreate the 
> control flow graph (first as a tree and later on as a DAG) from a linear 
> sequence of instructions. Symjit doesn't do this because it starts from a 
> tree representation. Of course, as you mentioned, the downside is that 
> generating sympy expressions can be computationally expensive. I don't know 
> what the right abstraction is. Symjit already converts sympy expressions into 
> its internal tree structure (with nodes like Unary and Binary). We could 
> expose this structure to the users. Moreover, it is possible to augment the 
> tree structure by adding loops, aggregate functions, and various functional 
> accessories to allow for more complex programs. However, this interface will 
> be the key and must be designed carefully.

Is it possible to have an interface that builds instructions one at a
time? You could imagine something like we have the expression
sin(x**2) + x**2 and we can compile this with something like:

from symjit import FuncBuilder

B, [x] = FuncBuilder('x')
a = B.pow(x, 2)
b = B.sin(a)
c = B.add(a, b)
f = B.compile(c)

print(f(1.0))

In my mind each step here appends a new operation with a new
destination operand. The intermediate variables a, b and c refer to
the output of a particular step rather than being trees. My hope would
be to get something equivalent to this LLVM IR which reuses %".0" to
avoid computing x**2 twice:

In [13]: print((sin(x**2) + x**2).to_llvm_ir([x]))
...
define double @"jit_func1"(double %"x")
{
%".0" = call double @llvm.pow.f64(double %"x", double 0x4000000000000000)
%".1" = call double @llvm.sin.f64(double %".0")
%".2" = fadd double %".1", %".0"
ret double %".2"
}

--
Oscar

-- 
You received this message because you are subscribed to the Google Groups 
"sympy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion visit 
https://groups.google.com/d/msgid/sympy/CAHVvXxRzJMPsA9JtiF593q8rJLQN7mwbS%2Buv6Ts15jCo2s%2BD1A%40mail.gmail.com.

Re: [sympy] symjit

Reply via email to