On Fri, 11 Apr 2025 at 20:49, Shahriar Iravanian <irvanian1...@gmail.com> wrote:
>
> The latest version of symjit (1.5.0) has just been published. By now, the 
> Rust backend is stabilized and generates code on Linux/Darwin/Windows and 
> x86-64 and arm64 machines.

Wow, this is amazing. I have been thinking for a long time that exactly
this is needed, for precisely the reasons you show in the README, but I
was working under the assumption that it would need something like
llvmlite. In protosym I added a lambdify function based on llvmlite, but
symjit is just as fast without the massive llvmlite dependency and can
even be pure Python, so it is super-portable.

I am amazed at how simple the symjit code seems to be for what it
achieves. Maybe these things are not as complicated as they seem if you
are someone who just knows how to write machine code...

I have a prototype of how I wanted this to work for sympy in protosym:

https://github.com/oscarbenjamin/protosym

For comparison this is how protosym does it:

# pip install protosym llvmlite
from protosym.simplecas import x, y, cos, sin, lambdify, Matrix, Expr

e = x**2 + x
for _ in range(10):
    e = e**2 + e
ed = e.diff(x)

f = lambdify([x], ed)
print(f(0.001))

The expression here is converted to LLVM IR and compiled with
llvmlite. I'll show a simpler expression as a demonstration:

In [9]: print((sin(x)**2 + cos(x)).to_llvm_ir([x]))

; ModuleID = "mod1"
target triple = "unknown-unknown-unknown"
target datalayout = ""

declare double    @llvm.pow.f64(double %Val1, double %Val2)
declare double    @llvm.sin.f64(double %Val)
declare double    @llvm.cos.f64(double %Val)

define double @"jit_func1"(double %"x")
{
%".0" = call double @llvm.sin.f64(double %"x")
%".1" = call double @llvm.pow.f64(double %".0", double 0x4000000000000000)
%".2" = call double @llvm.cos.f64(double %"x")
%".3" = fadd double %".1", %".2"
ret double %".3"
}

For the particular benchmark expression ed shown above, protosym is faster
at both compilation and evaluation:

This is protosym:

In [3]: %time f(0.001)
CPU times: user 37 μs, sys: 4 μs, total: 41 μs
Wall time: 51 μs
Out[3]: 1.0223342283660657

In [4]: %timeit f(0.001)
657 ns ± 18.6 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

This is the equivalent with symjit's compile_func:
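
The setup I used for this was roughly the following, with the same
expression rebuilt as a sympy expression, and assuming compile_func takes
the argument list and then the expression:

from sympy import symbols
from symjit import compile_func

x = symbols('x')
e = x**2 + x
for _ in range(10):
    e = e**2 + e
ed = e.diff(x)

f = compile_func([x], ed)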

In [3]: %time f(0.001)
CPU times: user 306 μs, sys: 8 μs, total: 314 μs
Wall time: 257 μs
Out[3]: array([0.00100401])

In [4]: %timeit f(0.001)
25.1 μs ± 148 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

I think the reason for the speed difference here is that protosym first
converts the expression into a forward graph, much like sympy's cse
function, which handles all the repeated subexpressions efficiently. I
think symjit generates the code recursively without handling the repeated
subexpressions. Also, the number from symjit here is incorrect, as
confirmed by using exact rational numbers:

In [6]: ed.subs({x: Rational(0.001)}).evalf()
Out[6]: 1.02233422836607

I'm not sure whether that difference is because the forward graph is more
numerically accurate or whether it is a bug in symjit.
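
To show what I mean by the forward graph, sympy's cse function does the
same kind of factoring out of repeated subexpressions at the expression
level:

from sympy import symbols, cse

x = symbols('x')
e = x**2 + x
for _ in range(3):
    e = e**2 + e

# cse returns a list of (symbol, subexpression) replacements plus the
# reduced expressions that refer to them, i.e. a forward graph.
replacements, reduced = cse(e)
for sym, sub in replacements:
    print(sym, '=', sub)
print(reduced[0])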

> Symjit also has a new plain Python-based backend, which depends only on the 
> Python standard library and numpy (the numpy dependency is not strictly 
> necessary) but can generate and run machine code routines. Currently, the 
> Python backend is used as a backup in cases where the compiled Rust code is 
> unavailable. However, it already works very well with a minimum performance 
> drop compared to the Rust backend.

Possibly the most useful thing to do is to publish this as two separate
packages, like symjit and symjit-rust. Then anyone can pip install symjit
for any Python version without needing binaries on PyPI or a Rust
toolchain installed locally. The symjit-rust backend can be an optional
dependency that makes things faster if installed. It could also be
possible to have symjit depend conditionally on symjit-rust, but only for
platforms where a binary is provided on PyPI. That way no one ever ends up
doing pip install symjit and having it fail due to a missing Rust
toolchain.
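
Just to illustrate the sort of split I mean (these module names are
hypothetical, only to show the shape of it):

# Hypothetical sketch: neither symjit_rust nor symjit.pybackend exists
# under these names today.
try:
    # compiled Rust backend, installed only where a wheel is available
    from symjit_rust import compile_func
except ImportError:
    # pure-Python backend that works everywhere
    from symjit.pybackend import compile_func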

> I would like to have your suggestions and recommendations about the next 
> steps. I hope to add features that align with the maintainers' goals for 
> sympy. Some possibilities:
>
> 1. Expanding on the current focus on numerical computation and 
> numpy/scipy/matplotlib inter-operability, for example, adding other data 
> types besides double (single floats, complex numbers...).

I don't know how difficult it is to do these things, but generally yes,
those would be useful.

> 2. Fast polynomial evaluation, not only for floating point types, but also 
> over Z, Zp, and Q. The Python-only backend can be tightly coupled to the 
> polynomial subsystem. However, I don't know how useful having such a fast 
> polynomial evaluation function is, but, for example, it may be useful in the 
> combinatorial phase of the Zassenhaus algorithm. On the other hand, it seems 
> that sympy pivots toward using Flint for many such computations.

Generally SymPy is going to use FLINT for these things, but FLINT is only
an optional dependency. Some downstream users may prefer not to use FLINT
because it has a different license (LGPL), whereas symjit has the MIT
license, which pairs better with SymPy's BSD license.

If symjit provided more general capability to just generate machine
code then I am sure that SymPy could make use of it for many of these
things. It would probably make more sense for the code that implements
those things to be in SymPy itself though with symjit as an optional
dependency that provides the code generation.

> 3. A different area would be the Satisfiability module, where writing fast 
> SAT/SMT solver, with or without interfacing with Z3 or other solvers, is 
> possible.

That would also be great but again I wonder if it makes sense to
include such specific things in symjit itself.

I think that what you have made here in symjit is something that people
will want to use more broadly than just with SymPy. Maybe the most useful
thing would be for symjit to focus on the core code generation and
execution as a primitive that other libraries can build on. In other
words, the ideal thing here would be for symjit to provide a general
interface so that e.g. sympy's lambdify function could use symjit to
generate the code that it wants, rather than symjit providing a
compile_func function directly.

One downside of compile_func is precisely the fact that its input has
to be a sympy expression and just creating sympy expressions is slow.
This is something that we want to improve in sympy but realistically
the way to improve that is by using other types/representations like
symengine or protosym etc. I have some ideas for building new
representations of expressions so that many internal parts of sympy
could use those instead of the current slow expressions. Unfortunately
it is not going to be possible to make the user-facing sympy
expressions much faster unless at some point there is a significant
break in compatibility.

The ideal thing here would be for symjit to provide an interface that can
be used to generate the code without needing a SymPy expression as input.
For example, how would protosym use symjit without needing to create a
SymPy expression?
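
To make that concrete, the kind of interface I have in mind would look
something like this (all of these names are made up, it is only a sketch
of the shape of such an API):

import symjit  # the Builder used below does not actually exist today

# Entirely hypothetical API, just to illustrate the idea: the caller builds
# the computation step by step and symjit only sees opcodes, never sympy
# expressions.
b = symjit.Builder(nargs=1)      # hypothetical builder object
x = b.arg(0)
s = b.call('sin', x)
s2 = b.mul(s, s)                 # sin(x)**2, reusing the value s
c = b.call('cos', x)
y = b.add(s2, c)
f = b.compile(y)                 # -> callable taking one double
print(f(0.001))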

I think that the reason protosym is faster for the benchmark shown above
is the forward graph, so symjit could use the same idea. What might be
best, though, is to leave that sort of thing to other libraries that would
build on symjit, and for symjit to focus on being very good at providing
comprehensive code generation capabilities. I think that many Python
libraries would want to build on symjit to do all sorts of things, because
being able to generate machine code directly like this is better in many
ways than existing approaches like llvmlite, numba, numexpr etc.

The thing that is nice about generating LLVM IR, as compared to generating
machine code directly, is that it gives you unlimited virtual registers
and LLVM then figures out how to map them onto a finite number of real
registers in the backend. This makes the IR particularly suitable for
dumping the forward graph into without needing to think about different
architectures. Can symjit's machine code builders achieve the same sort of
thing? It's not clear to me exactly how the registers are being managed.
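
For comparison, this is roughly what emitting the forward graph through
llvmlite looks like: every intermediate value just gets a fresh SSA name
and LLVM's backend does the register allocation later.

from llvmlite import ir

dbl = ir.DoubleType()
mod = ir.Module(name="mod1")
fn = ir.Function(mod, ir.FunctionType(dbl, [dbl]), name="jit_func1")
x, = fn.args
bld = ir.IRBuilder(fn.append_basic_block())

# the intrinsics only need to be declared; LLVM lowers them later
sin = ir.Function(mod, ir.FunctionType(dbl, [dbl]), name="llvm.sin.f64")
cos = ir.Function(mod, ir.FunctionType(dbl, [dbl]), name="llvm.cos.f64")

s = bld.call(sin, [x])        # a fresh SSA value, no register chosen yet
s2 = bld.fmul(s, s)           # sin(x)**2 written as a product, reusing s
c = bld.call(cos, [x])
bld.ret(bld.fadd(s2, c))

print(mod)                    # textual IR, similar to the dump shown above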

There is one important architecture for SymPy that symjit does not yet
generate code for, which is wasm, e.g. so that you can run it in the
browser:

https://live.sympy.org/

I don't know whether this sort of thing is even possible in wasm though,
with its different memory safety rules.

Does symjit work with PyPy or GraalPython, or is it only for CPython?

--
Oscar
