On Sat, Mar 15, 2025 at 10:50 PM Krishnav Bajoria <[email protected]>
wrote:

> Hello again SymPy community,
> First of all, I would like to thank Aaron and Tilo for their valuable
> comments and ideas. I have taken note of them. For the past two weeks I
> have been studying the entire assumptions system in SymPy, and along
> with their ideas I have come up with some observations and the plan I
> have for GSoC:
> 1) As pointed out earlier, regarding the performance of the assumptions, I
> plan to improve it in the following ways:
> - As suggested by Tilo, I would look to cache results, similar to how the
> @memoize decorator is currently used in the AssumptionKeys class.
> - By rewriting the existing handler code to take the two-fold approach
> suggested by Aaron, with separate fast queries and slow queries.
> - I also think that by improving the abilities of refine, which currently
> has very little functionality, we can reduce the number of calls when
> dealing with complex expressions.
> 2) Improving the deductive capabilities and adding functionality
> related to quantifiers:
> - I have been researching some new algorithms. Since Tilo said that he
> will be implementing the SMT idea in his spare time, I wouldn't look
> into that, and would instead focus on elimination techniques like
> Fourier-Motzkin elimination, quantifier elimination, and Tarski's algorithm.
> - I also think we should add implementation and support for
> quantifiers, for more general reasoning capability in SymPy. I think we
> could proceed along the lines of functions like allargs, anyarg, etc.,
> which are written in a declarative programming paradigm, and, as Tilo
> also suggested, we should add more general rules in that style. I think
> implementing quantifiers this way would be a good idea.
>
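Of the elimination techniques mentioned in the plan above, Fourier-Motzkin is compact enough to sketch directly. The following is a toy, exact-arithmetic sketch (illustrative only, not SymPy code, and without the redundancy pruning a real implementation needs): each constraint is a pair `(coeffs, bound)` meaning `sum(coeffs[i] * x[i]) <= bound`.

```python
from fractions import Fraction

def fm_eliminate(constraints, j):
    """Eliminate variable j from a system of linear inequalities
    a . x <= b: pair every upper bound on x[j] with every lower
    bound, scale so the j-coefficients cancel, and add."""
    pos, neg, rest = [], [], []
    for a, b in constraints:
        # a[j] > 0 gives an upper bound on x[j], a[j] < 0 a lower bound
        (pos if a[j] > 0 else neg if a[j] < 0 else rest).append((a, b))
    out = list(rest)
    for ap, bp in pos:
        for an, bn in neg:
            sp = Fraction(1, ap[j])    # scales j-coefficient to +1
            sn = Fraction(-1, an[j])   # scales j-coefficient to -1
            a = [sp * p + sn * n for p, n in zip(ap, an)]
            out.append((a, sp * bp + sn * bn))
    return out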

These are all good points of approach. My suggestion would be to focus on
just one or two of these. Otherwise, you won't be able to complete
everything by the end of the summer, unless you implement only a
half-baked solution for each.

And for what it's worth, I think the "two-fold" idea is a lot harder than
it sounds. It's something we've known we've needed to do for a long time,
but it has never really been implemented. I don't want to discourage you if
that's the project you want to do, but just be aware that it's going to
require a good bit of prototyping and trial and error to get a good
design. Personally, I would suggest this project more for someone who is
already fairly familiar with the SymPy assumptions system(s) and the issues
with them. Really, ideally, those of us who are already familiar with it
should sit down and build a design for what this should look like so that
it could actually be feasibly implemented in a GSoC project, but that
hasn't happened yet.

And anyway, it's generally useful to improve the capabilities of the "new"
assumptions system without worrying about how to integrate it into the
core. We will be able to design a much better integration into the core
once we have a better idea of what the capabilities of the new system can
be and how fast they can be. So a project just focused on implementing some
algorithm is fine.

Aaron Meurer


> Till now, these are the main observations and ideas I have come up with. I
> would appreciate any guidance or comments on whether they are relevant,
> and on how much of an impact this would have on SymPy as a whole, not just
> the assumptions module. Any additional algorithms or ideas that could be
> added to the assumptions system would be gladly welcomed and helpful for
> me too.
> Best Regards,
> Krishnav Bajoria.
>
>
> On Sat, Mar 8, 2025 at 11:13 PM Aaron Meurer <[email protected]> wrote:
>
>> On Fri, Mar 7, 2025 at 2:12 PM Tilo RC <[email protected]> wrote:
>> >
>> > > 1. Improving the performance of the underlying system. An approach
>> > here is to make better use of faster dependencies like Z3 as well as
>> > SAT solvers. This also requires rewriting the existing code to use a
>> > better approach that doesn't try to do too many things at once for
>> > simple queries.
>> >
>> > I don't think using faster dependencies will improve the speed. The
>> process of passing the data to C++ is not very fast, as the Python bindings
>> for Z3 are not very efficient, which defeats the whole purpose. Maybe there
>> is some other library that has more optimized python bindings though. In
>> general, the SAT problems being solved are not very difficult so I don't
>> think a more advanced algorithm would be all that helpful.
>>
>> Yes, the C-based solvers are really only going to help for the most
>> complex queries. What we really need though is to make common queries
>> faster. I think this basically boils down to two things:
>>
>> - Make it simpler for the assumptions system to have "fast" queries
>> and "slow" queries and make the sorts of things that are called often
>> and need to be fast only use the fast queries.
>> - Avoid using assumptions at all in cases where they aren't needed.
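A minimal sketch of what such a fast/slow split could look like (the names and interfaces here are hypothetical, not SymPy's actual API): cheap structural handlers are tried first and return None for "unknown", and only an explicit full query falls back to the expensive SAT-based path.

```python
# Hypothetical two-tier query interface (illustrative names only).

def ask_fast(query, fast_handlers):
    """Cheap pass: direct handler lookups only; None means unknown."""
    for handler in fast_handlers:
        result = handler(query)
        if result is not None:
            return result
    return None

def ask_full(query, fast_handlers, slow_prover):
    """Fast pass first; fall back to the expensive prover only if needed."""
    result = ask_fast(query, fast_handlers)
    if result is not None:
        return result
    return slow_prover(query)
```

Code that runs during automatic evaluation would call only `ask_fast`, so the slow prover is never triggered implicitly.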
>>
>> >
>> > Currently, a single query to ask results in a lot of calls to satask
>> due to all of the recursive calls made to the _eval_is_* handlers. So
>> either a) reducing the number of calls to satask or b) making calls to
>> satask, which are currently very expensive, more efficient would greatly
>> speed up the assumptions.
>> >
>> > I have a few ideas for how to do both of these things:
>> >
>> > a.1. Some form of caching can be implemented. Recursive calls to ask
>> end up repeating the same calls to ask many times, so caching would avoid
>> doing repeated work. It would also allow some of the _eval_is_*
>> handlers to be redesigned with the knowledge that caching is in
>> place.
>> >
>> > a.2. Maybe replace the _eval_is_* handlers completely with sathandlers
>> rules. This would mean no more recursive calls at all so calls to ask are
>> guaranteed to only call satask once. I'm not sure if this would always be
>> possible for all of them though?
>> >
>> > b.1. I came up with an SMT algorithm to replace the rules generated by
>> facts.py which would greatly reduce the size of boolean satisfiability
>> problems. I think I might want to implement this myself though if I have
>> time. It may even be possible to come up with a more general algorithm that
>> could replace the sathandlers and the _eval_is_* handlers entirely.
>> >
>> > b.2. The two SAT queries made by satask are very similar. If SymPy had
>> an incremental sat solver, some repeated work could be avoided. See
>> https://github.com/sympy/sympy/issues/27477. It might also be possible
>> to check if the assumptions were inconsistent only once during a
>> preprocessing step and not repeat this every single time a recursive call
>> to ask is made.
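The "check consistency once in a preprocessing step" idea in b.2 can be sketched with a deliberately tiny brute-force SAT solver (toy code, not SymPy's SAT machinery; clauses are lists of signed integer literals):

```python
from itertools import product

def brute_sat(clauses, nvars):
    """Tiny brute-force SAT check: clauses are lists of ints,
    where literal i means variable i is True and -i means False."""
    for bits in product([False, True], repeat=nvars):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            return True
    return False

def make_checker(assumption_clauses, nvars):
    """Check the assumption base for consistency once, up front,
    then answer many entailment queries against the same base
    without repeating that check."""
    if not brute_sat(assumption_clauses, nvars):
        raise ValueError("inconsistent assumptions")
    def entails(lit):
        # assumptions entail lit iff (assumptions AND NOT lit) is unsat
        return not brute_sat(assumption_clauses + [[-lit]], nvars)
    return entails
```

With an incremental solver, `entails` would additionally reuse the solver state between queries instead of restarting from scratch each time.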
>> >
>> > A third idea is to look into how Maple's assumption system works and
>> take ideas from there. In most respects it's a lot better than SymPy's.
>> There are several papers related to how it works:
>> > - https://link.springer.com/chapter/10.1007/3-540-57272-4_27
>> > - https://dl.acm.org/doi/pdf/10.1145/120694.120749
>> > There are also more papers that might be worth looking into in the list
>> of papers that cite those two. It's also possible to actually look at the
>> source code for Maple's assumption system, and even step through it using a
>> debugger. Of course, Maple is proprietary software, so I'm not sure to what
>> extent it would be legally okay to take inspiration from it. You can get a
>> free 14-day trial of Maple. One reason not to go down this path is that
>> Maple's assumption system works very differently than SymPy's in some ways,
>> so it might be a lot of work to switch and not necessarily better.
>> >
>> > > 3. Implement new algorithms that improve the deductive capabilities of
>> > the assumptions system. This is the sort of project that was done in
>> > 2023
>> https://github.com/sympy/sympy/wiki/GSoC-2023-Report-Tilo-Reneau%E2%80%90Cardoso:-Improving-Relational-Assumptions-in-SymPy%E2%80%99s-New-Assumptions
>> .
>> > There are a lot of potential directions here, but the thing that we
>> > need the most is further improvements to assumptions on inequalities.
>> >
>> > Currently, queries to ask that involve inequalities can only be solved
>> if all of the variables involved are assumed to be real using the old
>> assumptions. It would be nice to be able to handle complex queries in
>> general — I think it would even be possible to handle queries that made
>> assumptions about the imaginary component of variables. It would also be
>> nice to be able to handle queries that involved variables that might not be
>> finite, and queries with non-rational coefficients. I don't think
>> implementing all of those changes would be too difficult, and this might be
>> something I work on in the next few months.
>> >
>> > As part of a club at the college I go to, I've been trying to help some
>> of the members contribute to SymPy and an ambitious goal for the club is to
>> get them to implement the changes I described above. As there's only
>> limited time in the semester, and I'm not sure how much time they'll be
>> willing to invest in this, I don't know how much progress will get done. So
>> far it's taken a full three weeks just to get their first pull requests
>> opened two days ago. Hopefully progress will speed up.
>> >
>> > > 4. Rewrite the way assumptions handlers are written so that they use
>> > something simpler and less error prone than the current 3-valued logic
>> > _eval_is_* handlers. My idea is to use something based on pattern
>> > matching. I haven't really fleshed this idea out much yet, so if
>> > you're interested in it we'd need to work together to prototype what
>> > it would look like.
>> >
>> > It would be nice to be able to write them with a declarative paradigm.
>> In other words, rather than using for loops and if statements and so on, it
>> would be nice to be able to write rules in a form closer to actual
>> mathematical logic. I think the sathandlers currently do a good job of
>> this. For example, consider the sathandlers for sums:
>> >
>> > ### Add ##
>> >
>> > @class_fact_registry.multiregister(Add)
>> > def _(expr):
>> >     return [allargs(x, Q.positive(x), expr) >> Q.positive(expr),
>> >             allargs(x, Q.negative(x), expr) >> Q.negative(expr),
>> >             allargs(x, Q.real(x), expr) >> Q.real(expr),
>> >             allargs(x, Q.rational(x), expr) >> Q.rational(expr),
>> >             allargs(x, Q.integer(x), expr) >> Q.integer(expr),
>> >             exactlyonearg(x, ~Q.integer(x), expr) >> ~Q.integer(expr),
>> >             ]
>> >
>> > The single line "allargs(x, Q.positive(x), expr) >> Q.positive(expr)"
>> (see these lines of code) is mostly equivalent to the following (except
>> maybe the `_PositivePredicate_number` call):
>> >
>> > @PositivePredicate.register(Add)
>> > def _(expr, assumptions):
>> >     if expr.is_number:
>> >         return _PositivePredicate_number(expr, assumptions)
>> >
>> >     r = ask(Q.real(expr), assumptions)
>> >     if r is not True:
>> >         return r
>> >
>> >     nonneg = 0
>> >     for arg in expr.args:
>> >         if ask(Q.positive(arg), assumptions) is not True:
>> >             if ask(Q.negative(arg), assumptions) is False:
>> >                 nonneg += 1
>> >             else:
>> >                 break
>> >     else:
>> >         if nonneg < len(expr.args):
>> >             return True
>> >
>> > Actually, the handler above is more powerful because it understands
>> that if a sum contains some positive and some zero terms, it will be
>> positive, while the sathandlers rule does not do this. But we could modify
>> the sathandlers rule so that it was not just as powerful, but even more
>> powerful:
>> >
>> > Equivalent(Q.positive(expr), anyarg(x, Q.positive(x), expr) & allargs(x,
>> Q.zero(x) | Q.positive(x), expr))
>> >
>> > Assuming expr is of type Add, the line of logic above means: expr is
>> positive if and only if at least one of its summands is positive and all of
>> its summands are either zero or positive.
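As a toy illustration of that biconditional in plain Python (not the sathandlers machinery), given only the sign status of each summand:

```python
def add_is_positive(arg_signs):
    """Toy evaluation of: Q.positive(Add(...)) iff at least one summand
    is positive and every summand is zero or positive.
    Entries of arg_signs are 'pos', 'zero', or 'neg'."""
    return (any(s == 'pos' for s in arg_signs)
            and all(s in ('pos', 'zero') for s in arg_signs))
```

So `['pos', 'zero']` gives True (the case the imperative handler also catches), while `['zero', 'zero']` and `['pos', 'neg']` give False, matching the rule's reading.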
>> >
>> > Anyway, the power of declarative programming is that you can specify
>> what you want a program to do without worrying about the how. This means
>> that you don't need nearly as many lines of code and the logic is a lot
>> more human readable and closer to actual mathematical logic so it's easier
>> to verify.
>>
>> You're right about all these things. But I just wanted to clarify that
>> for this point I was specifically talking about the old assumptions
>> handlers (the various _eval_is_* methods). These are the most
>> pervasive around SymPy and they are currently written with a 3-valued
>> logic that is difficult to do correctly. The problem with them is also
>> that they hard-code the computation of the assumptions, which limits
>> optimizations that can be done (currently only basic caching is
>> implemented).
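For reference, the 3-valued logic here uses True/False/None, with None meaning "unknown"; SymPy has helpers along these lines in sympy.core.logic (fuzzy_and and friends). A self-contained sketch of two connectives shows why hand-writing this logic inline in every _eval_is_* handler is error-prone:

```python
def fuzzy_and(values):
    """Three-valued AND: any False decides the result as False;
    otherwise any None (unknown) makes the whole result unknown."""
    result = True
    for v in values:
        if v is False:
            return False
        if v is None:
            result = None
    return result

def fuzzy_not(v):
    """Three-valued NOT: unknown stays unknown."""
    return None if v is None else not v
```

The subtlety is that None must propagate correctly through every branch; a handler that accidentally treats None as False silently loses deductions, which is exactly the class of bug a declarative rewrite would eliminate.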
>>
>> Aaron Meurer
>>
>> >
>> > Tilo Reneau-Cardoso
>> > On Monday, March 3, 2025 at 9:49:21 AM UTC-8 [email protected] wrote:
>> >>
>> >> It looks like you've been making good progress on your pull requests
>> >> to the assumptions system. There is a lot of work that needs to be
>> >> done in the assumptions system, so the challenge for GSoC is finding a
>> >> small piece of that work that can be targeted for a suitable summer
>> >> project. There's a few important areas that the assumptions need to be
>> >> improved in. These areas are not completely independent, but it should
>> >> be possible to make a project that just focuses on one of them.
>> >>
>> >> 1. Improving the performance of the underlying system. An approach
>> >> here is to make better use of faster dependencies like Z3 as well as
>> >> SAT solvers. This also requires rewriting the existing code to use a
>> >> better approach that doesn't try to do too many things at once for
>> >> simple queries.
>> >>
>> >> 2. Reduce the parts of SymPy that make unnecessary use of the
>> >> assumptions. This is particularly the case in the core, where many
>> >> automatic operations use assumptions. Ideally no *automatic* operation
>> >> should use assumptions. By automatic, I mean the things that happen
>> >> when an expression object is constructed, without any further user
>> >> intervention. Here's a prototypical example:
>> >>
>> >> In [1]: from sympy.functions.elementary._trigonometric_special import
>> cos_257
>> >>
>> >> In [2]: %time a = cos_257()
>> >> CPU times: user 8.52 s, sys: 314 ms, total: 8.83 s
>> >> Wall time: 8.82 s
>> >>
>> >> It takes over 8 seconds just to *create* this expression, without
>> >> doing anything with it. And it isn't a particularly large expression,
>> >> at least for SymPy. (note that this function is cached so you'll need
>> >> to restart Python each time you want to time the function). See
>> >> https://github.com/sympy/sympy/pull/24566.
>> >>
>> >> This is happening because the objects that are being created are
>> >> calling assumptions, which are doing relatively expensive numerical
>> >> calculations on the objects.
>> >>
>> >> 3. Implement new algorithms that improve the deductive capabilities of
>> >> the assumptions system. This is the sort of project that was done in
>> >> 2023
>> https://github.com/sympy/sympy/wiki/GSoC-2023-Report-Tilo-Reneau%E2%80%90Cardoso:-Improving-Relational-Assumptions-in-SymPy%E2%80%99s-New-Assumptions
>> .
>> >> There are a lot of potential directions here, but the thing that we
>> >> need the most is further improvements to assumptions on inequalities.
>> >>
>> >> 4. Rewrite the way assumptions handlers are written so that they use
>> >> something simpler and less error prone than the current 3-valued logic
>> >> _eval_is_* handlers. My idea is to use something based on pattern
>> >> matching. I haven't really fleshed this idea out much yet, so if
>> >> you're interested in it we'd need to work together to prototype what
>> >> it would look like.
>> >>
>> >> As far as the Risch algorithm, there's also work that can be done
>> >> there, although if you're interested in that I would suggest reading
>> >> up on the Risch algorithm. A project there would be much more
>> >> mathematically involved. My suggestion for starting there would be
>> >> starting with Manuel Bronstein's "Symbolic Integration Tutorial"
>> >> (there are other good resources too, including Bronstein's book) and
>> >> in particular, try to understand how the rational integration
>> >> algorithm works (currently implemented in SymPy in ratint.py). That is
>> >> the most basic part of the algorithm, and should at least give you an
>> >> idea of what the mathematics for Risch looks like.
>> >>
>> >> Aaron Meurer
>> >>
>> >> On Sun, Feb 23, 2025 at 4:12 AM Krishnav Bajoria
>> >> <[email protected]> wrote:
>> >> >
>> >> > Hi SymPy Community,
>> >> >
>> >> > I’m Krishnav Bajoria, a third-year student at IIT BHU, Varanasi, in
>> the Department of Mathematics and Computing (GitHub handle:
>> krishnavbajoria02). I am proficient in Python and have a fairly strong
>> mathematical background, which was also what led me to start
>> contributing to SymPy. In terms of my contributions to SymPy, I have had
>> three PRs merged—two related to assumptions and one in
>> codegen/utilities.lambdify—and a few more under review. (Links to the PRs:
>> https://github.com/sympy/sympy/pull/27511, https://github.com/sympy/sympy/pull/27512, https://github.com/sympy/sympy/pull/27552
>> )
>> >> >
>> >> > As I was exploring potential GSoC projects, I became particularly
>> interested in two from the ideas page:
>> >> >
>> >> > Improvements to the Assumptions System
>> >> > The Risch Integration Algorithm
>> >> >
>> >> > Since I’ve already worked a bit on assumptions, I’ve been delving
>> deeper into the discussions and implementation details to understand its
>> current state. I wanted to share what I’ve gathered so far and get feedback
>> on whether I’m on the right track, or if there are other key areas I should
>> focus on.
>> >> >
>> >> > Assumptions System: What I’ve Studied So Far
>> >> >
>> >> > Old vs. New Assumptions System:
>> >> >
>> >> > The transition from the old rule-based system, which attached
>> inferences directly to the objects as attributes, to the new
>> predicate-based system, which separates inferences from objects and
>> evaluates them dynamically, was meant to enhance flexibility. However,
>> there are still inconsistencies where the new system does not propagate
>> certain facts correctly.
>> >> >
>> >> > Key Issues and Challenges:
>> >> >
>> >> > Some assumptions are incorrectly inferred or not inferred at all due
>> to missing rules.
>> >> > Symbolic variables (e.g., x.is_real) sometimes fail to evaluate
>> correctly when additional constraints are introduced.
>> >> > Cases where the assumptions system contradicts itself in different
>> parts of SymPy.
>> >> > Performance concerns—some assumption queries take longer than
>> expected due to inference chains.
>> >> >
>> >> > Areas I’ve Been Exploring for Improvements:
>> >> >
>> >> > Understanding handlers in the new system and whether they need
>> restructuring.
>> >> > Reviewing past discussions on improving assumptions and identifying
>> still-relevant open issues.
>> >> > Exploring ways to improve inference accuracy, especially in edge
>> cases.
>> >> > Considering whether any aspects of the old system could be
>> reintroduced to address current limitations.
>> >> >
>> >> > I’d appreciate hearing from the community—does this list cover the
>> main areas that need attention? Are there any other fundamental issues I
>> should be looking at?
>> >> >
>> >> > Considering Risch Integration
>> >> >
>> >> > Apart from assumptions, I’ve also started reading about the Risch
>> integration algorithm. I haven’t explored it as deeply yet, but I find it
>> intriguing and would like to get insights on its feasibility as a GSoC
>> project.
>> >> >
>> >> > At this stage, I’m just considering both projects for discussion and
>> plan to finalize my choice based on the feedback and guidance from the
>> community.
>> >> >
>> >> > Thank you for your time and consideration. I look forward to your
>> response!
>> >> >
>> >> > Best Regards,
>> >> > Krishnav Bajoria.
>> >> >
>> >> > --
>> >> > You received this message because you are subscribed to the Google
>> Groups "sympy" group.
>> >> > To unsubscribe from this group and stop receiving emails from it,
>> send an email to [email protected].
>> >> > To view this discussion visit
>> https://groups.google.com/d/msgid/sympy/42483b67-c51d-442e-b245-61527ef09fdan%40googlegroups.com
>> .
>> >
>> > --
>> > You received this message because you are subscribed to the Google
>> Groups "sympy" group.
>> > To unsubscribe from this group and stop receiving emails from it, send
>> an email to [email protected].
>> > To view this discussion visit
>> https://groups.google.com/d/msgid/sympy/f093b5fb-53e9-4d2f-a769-b88d134760c0n%40googlegroups.com
>> .
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "sympy" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion visit
>> https://groups.google.com/d/msgid/sympy/CAKgW%3D6Kvw%2BPokLMg%3Dsti9tdROwQS-03n1bwtaiQK35ztkjVXDQ%40mail.gmail.com
>> .
>>
