Here is scipy's discussion about a policy: https://discuss.scientific-python.org/t/a-policy-on-generative-ai-assisted-contributions/1702/18

Jason
moorepants.info
+01 530-601-9791


On Sun, Oct 26, 2025 at 7:15 AM Jason Moore <[email protected]> wrote:

> Hi Oscar,
>
> Thanks for raising this. I agree, this problem will grow and it is
> not good. I think we should have a policy about LLM generated
> contributions. It would be nice if a SYMPEP was drafted for one.
>
> Having a standard way to reject spam PRs would be helpful. If we could
> close a PR and add a label that triggers sympybot to leave a comment saying
> "This PR does not meet SymPy's quality standards for AI generated code and
> comments, see policy <link>", that would be helpful. It still requires
> manual steps from reviewers.
>
> I also share the general concern expressed by some in the scipy ecosystem
> here:
>
> https://github.com/scientific-python/summit-2025/issues/35#issuecomment-3038587497
>
> which is that LLMs universally violate copyright licenses of open source
> code. If this is true, then PRs with LLM generated code are polluting
> SymPy's codebase with copyright violations.
>
> Jason
> moorepants.info
> +01 530-601-9791
>
>
> On Sun, Oct 26, 2025 at 12:46 AM Oscar Benjamin <
> [email protected]> wrote:
>
>> Hi all,
>>
>> I am increasingly seeing pull requests in the SymPy repo that were
>> written by AI e.g. something like Claude Code or ChatGPT etc. I don't
>> think that any of these PRs are written by actual AI bots but rather
>> that they are "written" by contributors who are using AI tooling.
>>
>> There are two separate categories:
>>
>> - Some contributors are making reasonable changes to the code and then
>> using LLMs to write things like the PR description or comments on
>> issues.
>> - Some contributors are basically just vibe coding by having an LLM
>> write all the code for them and then opening PRs usually with very
>> obvious problems.
>>
>> In the first case some people use LLMs to write things like PR
>> descriptions because English is not their first language.
>> I can understand this and I think it is definitely possible to do this
>> with LLMs in a way that is fine but it needs to amount to using them like
>> Google Translate rather than asking them to write the text. The
>> problems are that:
>>
>> - LLM summaries for something like a PR are too verbose and include
>> lots of irrelevant information making it harder to see what the actual
>> point is.
>> - LLMs often include information that is just false such as "fixes
>> issue #12345" when the issue is not fixed.
>>
>> I think some people are doing this in a way that is not good and I
>> would prefer for them to just write in broken English or use Google
>> Translate or something but I don't see this as a major problem.
>>
>> For the vibe coding case I think that there is a real problem. Many
>> SymPy contributors are novices at programming and are nowhere near
>> experienced enough to be able to turn vibe coding into outputs that
>> can be included in the codebase. This means that there are just spammy
>> PRs with false claims about what they do like "fixes X", "10x faster"
>> etc where the code has not even been lightly tested and clearly does
>> not work or possibly does not even do anything.
>>
>> I think what has happened is that the combination of user-friendly
>> editors with easy git/GitHub integration and LLM agent plugins has
>> brought us to the point where there are pretty much no technical
>> barriers preventing someone from opening up gibberish spam PRs while
>> having no real idea what they are doing.
>>
>> Really this is just inexperienced people using the tools badly which
>> is not new. Low quality spammy PRs are not new either. There are some
>> significant differences though:
>>
>> - I think that the number of low quality PRs is going to explode. It
>> was already bad last year in the run up to GSOC (January to March
>> time) and I think it will be much worse this year.
>> - I don't think that it is reasonable to give meaningful feedback on
>> PRs where this happens because the contributor has not spent any time
>> studying the code that they are changing and any feedback is just
>> going to be fed into an LLM.
>>
>> I'm not sure what we can do about this so for now I am regularly
>> closing low quality PRs without much feedback but some contributors
>> will just go on to open up new PRs. The "anyone can submit a PR" model
>> has been under threat for some time but I worry that the whole idea is
>> going to become unsustainable.
>>
>> In the context of the Russia-Ukraine war I have often seen references
>> to the "cost-exchange problem". This refers to the fact that while
>> both sides have a lot of anti-air defence capability they can still be
>> overrun by cheap drones because million dollar interceptor missiles
>> are just too expensive to be used against any large number of incoming
>> thousand dollar drones. The solution there would be to have some kind
>> of cheap interceptor like an automatic AA gun that can take out many
>> cheap drones efficiently even if much less effective against fancier
>> targets like enemy planes.
>>
>> The first time I heard about ChatGPT was when I got an email from
>> StackOverflow saying that any use of ChatGPT was banned. Looking into
>> it the reason given was that it was just too easy to generate
>> superficially reasonable text that was low quality spam and then too
>> much effort for real humans to filter that spam out manually. In other
>> words bad/incorrect answers were nothing new but large numbers of
>> inexperienced people using ChatGPT had ruined the cost-exchange ratio
>> of filtering them out.
>>
>> I think in the case of SymPy pull requests there is an analogous
>> "effort-exchange problem".
>> The effort PR reviewers put in to help with
>> PRs is not reasonable if the author of the PR is not putting in a lot
>> more effort themselves because there are many times more people trying
>> to author PRs than review them. I don't think that it can be
>> sustainable in the face of this spam to review PRs in the same way as
>> if they had been written by humans who are at least trying to
>> understand what they are doing (and therefore learning from feedback).
>> Even just closing PRs and not giving any feedback needs to become more
>> efficient somehow.
>>
>> We need some sort of clear guidance or policy on the use of AI that
>> sets clear explanations like "you still need to understand the code".
>> I think we will also need to ban people for spam if they are doing
>> things like opening AI-generated PRs without even testing the code.
>> The hype that is spun by AI companies probably has many novice
>> programmers believing that it actually is reasonable to behave like
>> this but it really is not and that needs to be clearly stated
>> somewhere. I don't think any of this is malicious but I think that it
>> has the potential to become very harmful to open source projects.
>>
>> The situation right now is not so bad but if you project forwards a
>> bit to when the repo gets a lot busier after Christmas I think this is
>> going to be a big problem and I think it will only get worse in future
>> years as well.
>>
>> It is very unfortunate that right now AI is being used in all the
>> wrong places. It can do a student's homework because it knows the
>> answers to all the standard homework problems but it can't do the more
>> complicated more realistic things and then students haven't learned
>> anything from doing their homework.
>> In the context of SymPy it would
>> be so much more useful to have AI doing other things like reviewing
>> the code, finding bugs, etc rather than helping novices to get a PR
>> merged without actually investing the time to learn anything from the
>> process.
>>
>> --
>> Oscar
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "sympy" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion visit
>> https://groups.google.com/d/msgid/sympy/CAHVvXxQ1ntG0EWBGihrXErLhGuABHH7Kt5RmGJvp9bHcqaC5%3DQ%40mail.gmail.com
>> .
>>
>

--
You received this message because you are subscribed to the Google Groups "sympy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion visit https://groups.google.com/d/msgid/sympy/CAP7f1AgXnvn%3Dp9dS4D%3DvWu1GMUtAYBmFaA%2Bo_SEy-wH1VNpSaw%40mail.gmail.com.
