Yes, opening a PR would be a good idea. It will be easier to discuss these ideas on a PR.
Aaron Meurer

On Sunday, March 9, 2025 at 12:56:46 AM UTC-7 [email protected] wrote:

> So my next steps should be:
> - Trying to test other aspects of factorint(), like the ones mentioned above.
> - Learning and using strategies for generating "interesting" integers in
>   the case of factorint().
> - Running hypothesis in verbose mode for more information on generated
>   values.
>
> Should I open a PR for hypothesis testing of factorint()? That way, we
> can track progress.
> I also discovered another function that can be tested: digits()
> <https://github.com/sympy/sympy/blob/b836671fe8459e9301c620117b660c6c8ca20264/sympy/ntheory/digits.py#L7>.
>
> A simple example: digits(2345, 34) == [34, 2, 0, 33] can be easily tested
> by generating N, n (N is the number and n is the base), then calculating
> accordingly to check the assertion. This can also benefit from hypothesis
> IMO. Let me know what you think.
>
> On Sun, 9 Mar 2025 at 02:50, Aaron Meurer <[email protected]> wrote:
>
>> Yes, factorint is a better example of something that can be tested
>> with hypothesis. It's the example I gave on the issue
>> https://github.com/sympy/sympy/issues/20914.
>>
>> It's also a good example of how we can start with something simple and
>> build out a more rigorous test.
>>
>> There's other properties that could be added to the test as well, for
>> instance
>>
>>     assert isprime(prime)
>>     assert exp >= 1
>>     assert isinstance(prime, int)
>>     assert isinstance(exp, int)
>>
>> And we can also test the various flags to factorint.
>>
>> As for the existing test, for now, we should generally leave any
>> existing manual tests intact. Hypothesis should be treated as an
>> extension to manual testing, not a complete replacement. For instance,
>> some of the assertions in that test you showed are based on specific
>> inputs that are known to potentially cause issues. Hypothesis might
>> not necessarily generate an example like them.
>> Plus, you'll notice that that test is marked as @slow, meaning some of
>> the numbers being tested are too slow compared to the inputs we might
>> want to generate from hypothesis.
>>
>> This is actually one thing that will need to be considered in this
>> project. Hypothesis tries to always generate "interesting" examples in
>> its strategies, in addition to random ones. But what hypothesis
>> considers "interesting" is based on some heuristics that apply to a
>> broad category of programming. For instance, the "interesting"
>> integers from st.integers() are things like -1, 0, 1, etc. These are
>> important to test, but for factorint, we also want to make sure we
>> test "interesting" integers in terms of their prime factorizations.
>> This might mean numbers that have both small and large prime factors,
>> numbers that have many prime factors, numbers that have very few
>> prime factors, numbers with factors that are interesting corner cases
>> in terms of the specific algorithms that are implemented, etc. Some
>> of these are not distributed very well on the number line, so we might
>> have to create a custom strategy that generates them with higher
>> likelihood. Otherwise, they would basically never be chosen at random.
>>
>> Hypothesis also limits the size of the maximum integer generated by
>> integers() (probably to something like 2**64). But factorint can
>> handle numbers much larger than that. Creating custom input strategies
>> is going to be a big part of this project, so it's something you
>> should be thinking about, and learn how to do (it also can be one of
>> the more challenging parts of using hypothesis effectively). As a
>> start, I would learn how to run hypothesis in verbose mode, so that
>> you can see the actual inputs it is generating, then take a look at
>> those inputs and try to see if they actually cover all the important
>> cases for the given function.
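
The two suggestions in the paragraph above (a custom strategy and verbose mode) could be sketched roughly as follows. This is an illustration only, not code from the SymPy test suite: the strategy name integers_with_known_factors, the bounds on the prime indices and exponents, and the max_examples value are all assumptions picked for the example. The idea is to draw a prime factorization first and multiply it out, so that numbers with many factors or repeated factors show up far more often than they would from st.integers().

```python
from hypothesis import Verbosity, given, settings, strategies as st
from sympy import factorint, isprime, prime


@st.composite
def integers_with_known_factors(draw):
    """Hypothetical strategy: build n from a drawn prime factorization."""
    # Draw distinct prime indices; prime(k) is the k-th prime (prime(1) == 2).
    # The upper bounds here are arbitrary choices for the sketch.
    indices = draw(st.lists(st.integers(min_value=1, max_value=200),
                            max_size=6, unique=True))
    expected = {}
    n = 1
    for k in indices:
        p = int(prime(k))
        e = draw(st.integers(min_value=1, max_value=5))
        expected[p] = e
        n *= p ** e
    # An empty list of indices yields n == 1 with an empty factorization.
    return n, expected


# Verbosity.verbose makes hypothesis print each generated example, so the
# inputs can be inspected for coverage of the cases we care about.
@settings(max_examples=25, verbosity=Verbosity.verbose, deadline=None)
@given(integers_with_known_factors())
def test_factorint_recovers_factorization(case):
    n, expected = case
    factors = factorint(n)
    # The factorization we built n from should be exactly what comes back.
    assert {int(p): int(e) for p, e in factors.items()} == expected
    for p, e in factors.items():
        assert isprime(p)
        assert e >= 1
```

This only produces products of small primes, so it would still need to be extended (for example with a few large primes mixed in) to reach the algorithm-specific corner cases mentioned above.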
>>
>> The code for factorint is very complex, and testing it rigorously
>> requires testing a lot of different kinds of corner cases. Hypothesis
>> is very good at this sort of thing, but it wasn't built with these
>> specific types of corner cases in mind, so it will need some help to
>> get there.
>>
>> Aaron Meurer
>>
>> On Sat, Mar 8, 2025 at 1:02 PM Pradyot Ranjan <[email protected]> wrote:
>> >
>> > That makes a lot more sense. Thanks!
>> > This would be a better test then, I guess:
>> > I tried hypothesis testing of factorint(). This is what my test
>> > method looks like:
>> >
>> > from hypothesis import given, strategies as st
>> > from sympy import factorint
>> >
>> > @given(n=st.integers())
>> > def test_factorint(n):
>> >     factors = factorint(n)
>> >     # Multiplying the factors back together should give n again.
>> >     product = 1
>> >     for prime, exp in factors.items():
>> >         product *= prime ** exp
>> >     assert product == n
>> >
>> >
>> > The test runs for all positive and negative integers. I can extend
>> > this to test for kwargs as well. This will eliminate a lot of assert
>> > statements here. This test also doesn't take any significant amount
>> > of time.
>> >
>> > On Sat, 8 Mar 2025 at 23:04, Aaron Meurer <[email protected]> wrote:
>> >>
>> >> On Sat, Mar 8, 2025 at 2:33 AM Pradyot Ranjan <[email protected]> wrote:
>> >> >
>> >> > I tried using hypothesis to test for prime. The function returns
>> >> > the nth prime number, and I tried generating the nth prime myself
>> >> > and compared both (here n is given by hypothesis). The test passes
>> >> > but the only problem is it takes painfully long to test. I tried
>> >> > limiting the n value to 100,000 and it still takes around 40s. We
>> >> > can test composite and other related functions similarly. We can
>> >> > mark these tests as "slow" and run them separately if this is the
>> >> > approach we are looking for.
>> >>
>> >> This isn't really the right way to use hypothesis in this context. I'm
>> >> assuming this is slow because your prime generating test function is
>> >> slow. But what's to say that function is even correct? At best you
>> >> could have an obviously correct function that is very slow.
>> >> Or you'll just be reimplementing the function that's already in
>> >> sympy, which is pointless for a test.
>> >>
>> >> For hypothesis, you should think about properties that a function
>> >> should have and test those. For prime generation, you can check that
>> >> the output is prime using isprime(). Testing that the nth prime is
>> >> actually the nth prime is difficult without actually generating all n
>> >> primes. prime() basically already does this itself internally, so
>> >> there's not really a point to doing this in a test. You could test some
>> >> mathematical bounds. Personally, though, I would focus on some other
>> >> functions which have more easy to test properties. Not every function
>> >> in SymPy is easy to property test, because not every function has
>> >> straightforward properties that can be tested. Instead of trying to
>> >> come up with properties for various functions, it would be better to
>> >> try to find functions that have a fairly obvious set of properties
>> >> that can be tested.
>> >>
>> >> Aaron Meurer
>> >>
>> >> >
>> >> > On Wed, 5 Mar 2025 at 00:16, Aaron Meurer <[email protected]> wrote:
>> >> >>
>> >> >> Pretty much any function in SymPy that can have mathematical
>> >> >> properties written about it could potentially benefit from property
>> >> >> testing. However, a big challenge with this project is the input data
>> >> >> generation (the strategies in hypothesis terminology). Generating
>> >> >> arbitrary SymPy expressions is a difficult problem. There was some
>> >> >> initial work on this at https://github.com/sympy/sympy/pull/17190. But
>> >> >> the problem is that just generating expressions itself can be buggy.
>> >> >> Consider the expression I posted about in another mailing list thread.
>> >> >> It takes 8 seconds just to construct, essentially because the
>> >> >> expression constructor itself is buggy.
>> >> >> https://groups.google.com/g/sympy/c/XSJuvibPOro/m/Q3TTETm7AwAJ
>> >> >>
>> >> >> So for now, it's better to actually focus on those functions that take
>> >> >> relatively simple inputs. The simplest possible input is an integer.
>> >> >> For instance, several functions in the ntheory module basically just
>> >> >> take an integer as input. The next simplest is polynomials. The
>> >> >> initial work that has been done on hypothesis testing has been in
>> >> >> these modules, but the work hasn't gone very far and there is still
>> >> >> more that can be done there. So I would suggest starting where there
>> >> >> are existing hypothesis tests and expanding the tests in those parts
>> >> >> of SymPy. We'll want to expand beyond that, but building strategies is
>> >> >> one of the harder parts of this project.
>> >> >>
>> >> >> By the way, if you didn't notice on the idea page, this issue has a
>> >> >> lot more details on hypothesis testing in SymPy
>> >> >> https://github.com/sympy/sympy/issues/20914.
>> >> >>
>> >> >> Aaron Meurer
>> >> >>
>> >> >> On Tue, Mar 4, 2025 at 2:19 AM Pradyot Ranjan <[email protected]> wrote:
>> >> >> >
>> >> >> > What are the components that can benefit most out of hypothesis
>> >> >> > testing? I can try to implement them before I start writing a
>> >> >> > proposal if that's okay.
>> >> >> >
>> >> >> > On Tue, 4 Mar, 2025, 4:29 am Pradyot Ranjan, <[email protected]> wrote:
>> >> >> >>
>> >> >> >> Last year I worked as a GSoC student for PyBaMM. We had a
>> >> >> >> stretch goal regarding the implementation of hypothesis testing,
>> >> >> >> which can be tracked here:
>> >> >> >> - https://github.com/pybamm-team/PyBaMM/issues/4703
>> >> >> >> I also reviewed some PRs regarding this:
>> >> >> >> - https://github.com/pybamm-team/PyBaMM/pull/4724
>> >> >> >>
>> >> >> >> Other than this I also worked as an LFX mentee last year where I
>> >> >> >> implemented Fuzz testing (which is similar to Hypothesis's
>> >> >> >> property-based testing in some ways).
>> >> >> >>
>> >> >> >>
>> >> >> >> On Tue, 4 Mar, 2025, 3:23 am Aaron Meurer, <[email protected]> wrote:
>> >> >> >>>
>> >> >> >>> Yes, that project is still very relevant. If you search the codebase
>> >> >> >>> for hypothesis you'll see that it is currently only used in a few
>> >> >> >>> tests, but we want that to increase by a lot.
>> >> >> >>>
>> >> >> >>> What sort of experience do you have with hypothesis?
>> >> >> >>>
>> >> >> >>> Aaron Meurer
>> >> >> >>>
>> >> >> >>> On Mon, Mar 3, 2025 at 1:53 PM Pradyot Ranjan <[email protected]> wrote:
>> >> >> >>> >
>> >> >> >>> > Hi,
>> >> >> >>> > Just wanted to know if this project is still relevant
>> >> >> >>> > regarding GSoC? If it is, who is the mentor?
>> >> >> >>> > I have some experience with hypothesis testing and would love
>> >> >> >>> > to work here.
>> >> >> >>> >
>> >> >> >>> > Thanks,
>> >> >> >>> > Pradyot Ranjan
>> >> >> >>> >
>> >> >> >>> > --
>> >> >> >>> > You received this message because you are subscribed to the
>> >> >> >>> > Google Groups "sympy" group.
>> >> >> >>> > To unsubscribe from this group and stop receiving emails from
>> >> >> >>> > it, send an email to [email protected].
>> >> >> >>> > To view this discussion visit
>> >> >> >>> > https://groups.google.com/d/msgid/sympy/afa7d863-666f-475f-ae4c-1ccb8a5d3752n%40googlegroups.com.
