On Fri, 11 Apr 2025 at 20:49, Shahriar Iravanian <irvanian1...@gmail.com> wrote:
>
> The latest version of symjit (1.5.0) has just been published. By now, the
> Rust backend is stabilized and generates code on Linux/Darwin/Windows and
> x86-64 and arm64 machines.
Wow, this is amazing. I have been thinking for a long time that exactly this is needed, for precisely the reasons you show in the README, but I was working under the assumption that it would need something like llvmlite. In protosym I added a lambdify function based on llvmlite, but symjit is just as fast without the massive llvmlite dependency and can even be pure Python, so super-portable. I am amazed at how simple the symjit code seems to be for what it achieves. Maybe these things are not as complicated as they seem if you know how to write machine code...

I have a prototype of how I wanted this to work for sympy in protosym:

https://github.com/oscarbenjamin/protosym

For comparison this is how protosym does it:

# pip install protosym llvmlite
from protosym.simplecas import x, y, cos, sin, lambdify, Matrix, Expr

e = x**2 + x
for _ in range(10):
    e = e**2 + e
ed = e.diff(x)
f = lambdify([x], ed)
print(f(.0001))

The expression here is converted to LLVM IR and compiled with llvmlite. I'll show a simpler expression as a demonstration:

In [9]: print((sin(x)**2 + cos(x)).to_llvm_ir([x]))

; ModuleID = "mod1"
target triple = "unknown-unknown-unknown"
target datalayout = ""

declare double @llvm.pow.f64(double %Val1, double %Val2)
declare double @llvm.sin.f64(double %Val)
declare double @llvm.cos.f64(double %Val)

define double @"jit_func1"(double %"x")
{
  %".0" = call double @llvm.sin.f64(double %"x")
  %".1" = call double @llvm.pow.f64(double %".0", double 0x4000000000000000)
  %".2" = call double @llvm.cos.f64(double %"x")
  %".3" = fadd double %".1", %".2"
  ret double %".3"
}

For the particular benchmark ed shown above, protosym is faster both at compilation and at evaluation.

This is protosym:

In [3]: %time f(0.001)
CPU times: user 37 μs, sys: 4 μs, total: 41 μs
Wall time: 51 μs
Out[3]: 1.0223342283660657

In [4]: %timeit f(0.001)
657 ns ± 18.6 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

This is the equivalent with symjit's compile_func:

In [3]: %time f(0.001)
CPU times: user 306 μs, sys: 8 μs, total: 314 μs
Wall time: 257 μs
Out[3]: array([0.00100401])

In [4]: %timeit f(0.001)
25.1 μs ± 148 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

I think the reason for the speed difference here is that protosym first converts the expression into a forward graph, much like sympy's cse function, which handles all the repeated subexpressions efficiently (I include a small cse illustration further down). I think symjit generates the code recursively without handling the repeated subexpressions.

Also the number from symjit here is incorrect, as confirmed by using exact rational numbers:

In [6]: ed.subs({x: Rational(0.001)}).evalf()
Out[6]: 1.02233422836607

I'm not sure whether that difference is to do with the forward graph being more numerically accurate or whether it is a bug in symjit?

> Symjit also has a new plain Python-based backend, which depends only on the
> Python standard library and numpy (the numpy dependency is not strictly
> necessary) but can generate and run machine code routines. Currently, the
> Python backend is used as a backup in cases where the compiled Rust code is
> unavailable. However, it already works very well with a minimal performance
> drop compared to the Rust backend.

Possibly the most useful thing to do is to publish this as two separate packages like symjit and symjit-rust. Then anyone can pip install symjit for any Python version without needing binaries on PyPI or a Rust toolchain installed locally. The symjit-rust backend can be an optional dependency that makes things faster if installed (a sketch of what the fallback could look like is below).
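Here is the small cse illustration mentioned above. It is only an analogy for what protosym does internally (protosym builds the forward graph directly rather than calling cse), but it shows how the repeated subexpressions collapse:

from sympy import symbols, cse

x = symbols('x')

# the same expression as in the benchmark above
e = x**2 + x
for _ in range(10):
    e = e**2 + e
ed = e.diff(x)

# cse factors out each repeated subexpression so that it only needs to be
# evaluated once; evaluating a forward graph gives you the same sharing.
replacements, [reduced] = cse(ed)
print(len(replacements))   # number of shared subexpressions evaluated once

A code generator working from the (replacements, reduced) form emits each shared subexpression once, which is where I think the evaluation-time difference above comes from.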
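On the packaging idea, the pure-Python symjit could pick up the Rust backend automatically whenever the optional package happens to be installed, roughly with this pattern (the module paths here are made up for illustration; only compile_func is a real symjit name):

# Hypothetical sketch of the optional-dependency fallback; these module
# paths do not exist in symjit today.
try:
    # fast code generator from the optional symjit-rust wheel
    from symjit_rust import compile_func
except ImportError:
    # pure Python backend that is always available
    from symjit.python_backend import compile_func

Then pip install symjit works everywhere and installing the optional symjit-rust package just makes it faster.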
It would also be possible to have symjit depend conditionally on symjit-rust, but only for platforms where a binary is provided on PyPI. That way no one ever ends up doing pip install symjit and having it fail due to a missing Rust toolchain.

> I would like to have your suggestions and recommendations about the next
> steps. I hope to add features that align with the maintainers' goals for
> sympy. Some possibilities:
>
> 1. Expanding on the current focus on numerical computation and
> numpy/scipy/matplotlib inter-operability, for example, adding other data
> types besides double (single floats, complex numbers...).

I don't know how difficult it is to do these things but generally yes, those would be useful.

> 2. Fast polynomial evaluation, not only for floating point types, but also
> over Z, Zp, and Q. The Python-only backend can be tightly coupled to the
> polynomial subsystem. However, I don't know how useful having such a fast
> polynomial evaluation function is, but, for example, it may be useful in the
> combinatorial phase of the Zassenhaus algorithm. On the other hand, it seems
> that sympy pivots toward using Flint for many such computations.

Generally SymPy is going to use FLINT for these things, but FLINT is only an optional dependency. Some downstream users may prefer not to use FLINT because it has a different licence (LGPL), whereas symjit has the MIT licence, which pairs better with SymPy's BSD licence. If symjit provided a more general capability to just generate machine code then I am sure that SymPy could make use of it for many of these things. It would probably make more sense for the code that implements those things to be in SymPy itself though, with symjit as an optional dependency that provides the code generation.

> 3. A different area would be the Satisfiability module, where writing a fast
> SAT/SMT solver, with or without interfacing with Z3 or other solvers, is
> possible.

That would also be great, but again I wonder if it makes sense to include such specific things in symjit itself. I think that what you have made here in symjit is something that people will want to use more broadly than SymPy. Maybe the most useful thing would be for symjit to focus on the core code generation and execution as a primitive that other libraries can build on.

In other words, the ideal thing here would be that symjit provides a general interface so that e.g. sympy's lambdify function could use symjit to generate the code that it wants, rather than symjit providing a compile_func function directly. One downside of compile_func is precisely the fact that its input has to be a sympy expression, and just creating sympy expressions is slow. This is something that we want to improve in sympy, but realistically the way to improve it is by using other types/representations like symengine or protosym etc. I have some ideas for building new representations of expressions so that many internal parts of sympy could use those instead of the current slow expressions. Unfortunately it is not going to be possible to make the user-facing sympy expressions much faster unless at some point there is a significant break in compatibility.

The ideal thing here would be for symjit to provide an interface that can be used to generate the code without needing a SymPy expression as input. For example, how would protosym use symjit without needing to create a SymPy expression? I think that the reason that protosym is faster for the benchmark shown above is because of the forward graph, and so symjit could use the same idea (a rough sketch of the kind of input I mean is below).
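Here is the rough sketch I mean. The input is a flat forward graph where each node refers to earlier nodes by index, so a repeated subexpression is stored and evaluated only once. I am evaluating it with a throwaway Python interpreter just to show the semantics; the point would be for symjit to turn the same structure into machine code (none of this is symjit's actual API):

import math

def eval_graph(graph, args):
    # Trivial interpreter for a forward graph: values[i] holds the result
    # of node i, and later nodes refer to earlier ones by index.
    values = []
    for op, *operands in graph:
        if op == "arg":
            values.append(args[operands[0]])
        elif op == "add":
            values.append(values[operands[0]] + values[operands[1]])
        elif op == "mul":
            values.append(values[operands[0]] * values[operands[1]])
        else:
            # unary math functions: sin, cos, exp, ...
            values.append(getattr(math, op)(values[operands[0]]))
    return values[-1]

# sin(x)**2 + cos(x) as a forward graph
graph = [
    ("arg", 0),     # 0: x
    ("sin", 0),     # 1: sin(x)
    ("mul", 1, 1),  # 2: sin(x)**2 (node 1 is reused, not recomputed)
    ("cos", 0),     # 3: cos(x)
    ("add", 2, 3),  # 4: sin(x)**2 + cos(x)
]
print(eval_graph(graph, [0.001]))   # approximately 1.0000005

Any frontend (sympy's lambdify, protosym, a polynomial evaluator, ...) could build this kind of graph cheaply without going through SymPy expressions at all.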
What might be best though is to leave that sort of thing for other libraries that would build on symjit, and for symjit to focus on being very good at providing comprehensive code generation capabilities. I think that many Python libraries would want to build on symjit to do all sorts of things, because being able to generate machine code directly like this is better in many ways than existing approaches like llvmlite, numba, numexpr etc.

The thing that is nice about generating LLVM IR, as compared to generating machine code directly, is that it gives you unlimited virtual registers and LLVM then figures out how to use a finite number of real registers on the backend. This makes the IR particularly suitable as a target for dumping the forward graph into, without needing to think about different architectures. Can symjit's machine code builders achieve the same sort of thing? It's not clear to me exactly how the registers are being managed.

There is one important architecture for SymPy that symjit does not yet generate code for, which is wasm, e.g. so that it can run in the browser: https://live.sympy.org/. I don't know whether this sort of thing is even possible in wasm though, with its different memory safety rules.

Does symjit work with PyPy or GraalPython, or can it only be used with CPython?

--
Oscar