Thank you, Arne, for sharing your experience.

I just promised Ludo' elsewhere in the thread that I'll shut up, so I'll keep it short: answer your questions, add some clarifications, and thank you.

On 4/7/26 12:23, Dr. Arne Babenhauserheide wrote:
Hugo Buddelmeijer <[email protected]> writes:

LLMs are much more reliable than guix refresh, but you do need a human
in the loop.

What do you mean by reliable here? Do you mean that they fix a problem
that may crop up, that they find the most recent version, or that they
fix a known broken package?

Here I used an LLM to fix broken packages (randomly chosen). I inferred from this that LLMs could also fix problems that come up when refreshing, but I haven't used them for that.

(I primarily refresh packages I personally care about, so I want to be sure they will be merged.)

I’m annoyed by broken packages, but I think that the approach for that
would rather be to have a more distributed test-suite that ensures that
after an update all possibly affected packages¹ still build and
successfully run their tests.

That is one of the reasons I'm fixing random packages (by hand, I mean): the closer we are to 100%, the more reluctant people might be to have their commit break things. I hope.

Luckily we have `guix time-machine`.  I love this project!

In a project I maintain, we now have two clear rules:

Thank you for sharing.

Does that correspond with your numbers or do you think they are lying?

GPTBot alone did 109,552 accesses to my website in March, so I think
they are telling the truth in a very misleading way.

Thanks, those statistics look pretty bad.

So yes, I think they do what they say, and that does make them an
unscrupulous crawler. Like most of the other LLM crawlers (GPTBot is 10%
of the crawler traffic).

:-(  I'll think about it.  Maybe there are better companies.

And
the result is often better, certainly in collaboration with a human.
My first P.R.s were full of problems that would not exist if I had
employed an LLM.

I went through the same. But there’s a difference: with you the
reviewers' time is well spent: they help a new contributor grow.

That is true. And it is valuable both ways: mistakes show the reviewer how the new contributor thinks and collaborates, which is perhaps good to practice with trivial changes.

With an LLM their time goes into the void. The LLM doesn’t learn to work
with this project. And at the next update it may have completely
different idiosyncrasies that reviewers would have to learn.

Yes. I meant to always keep a human in the loop. That human would then be responsible for not repeating the same mistake again.

Consider this scenario where two people collaborate voluntarily.  They
use LLMs (and their own human intellect) to fix packages and review
each other's code.  Would that be okay with you?

If they use the LLMs for review, then not. Because then there would
still be code in Guix that wasn’t reviewed by a human.

Humans would read the code, yes. I rewrote that sentence several times to make it more to the point, and then I broke it.

(Initially I had one human + LLM writing code, and the other human reviewing without the LLM. But that didn't convey the intended meaning, as it could be interpreted as the first human not reviewing the generated code at all, which I also think would be bad, because they would still be responsible for the code.)

and if no one understood that
prototype to begin with, that’s a trainwreck in the making.

Yeah... There is code from me (long before LLMs) in production that falls into that category.

That kinda highlights a problem that I sidestepped. I have enough experience with creating terrible code that I think I can handle including an LLM in my workflow. But inexperienced me with an LLM would be, eh, dangerous[ly fun].

So that is something to think about: how would current-me guide past-me if there were LLMs involved?

Hugo


P.S. How many Esperanto speakers are there in this project? I've got my grandparents' lesson material; I believe they even met at an Esperanto course. Maybe I should pick it up again.

