jman <[email protected]> writes:

> By reading this list, I can infer that you are looking at a purely technical 
> evaluation, ...

Yes and no.
I anticipated this kind of reply from someone in this thread, and my
rough plan is to add some kind of Q/A section after the technical
guidelines that will address common ethical questions people often
ask.

>> Let me know if any of the above smells disaster.
>
> ... which brings me to this point. Today **any and every** discussion
> about LLMs not only revolves about the technical aspects but also
> about their second-order side-effects and how these tools affect the
> world we live in:
> - how they affect free/open-source projects: assaulted by
>   "contributors" pushing code not reviewed nor tested, with made up
>   claims. Sometimes maintainers not even speaking to a human because
>   comments are piped to an LLM

I do not see this as a problem specific to LLMs. If someone is pushing
changes carelessly, that is not acceptable, with or without LLMs.
I understand that some people may be overconfident with LLMs, but that
is simply a sign of limited experience. Once you work with LLMs long
enough, it becomes very clear that blindly trusting the generated code
is a very, very poor idea. Not even because LLMs write bad code, but
because they must operate on incomplete context, often making
arbitrary decisions about code design in the absence of 100% complete
instructions.

Do note that we do not grant write access to Org mode blindly. A prior
patch history proving competence is always a requirement, and I can
revoke that access for abusers (we have had none so far).

> - how they affect service hosting: scraping content and putting servers on 
> their knees

That has little to do with LLM tech; it has everything to do with
abusive companies. Please do note that illegal scrapers are, well,
illegal. Recent EU legislation requires LLM companies to detail their
training data and to prove that they honor the opt-out rights expressed
in robots.txt.
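
For reference, the opt-out usually takes the form of robots.txt rules
naming the crawlers' published user agents. A minimal sketch (GPTBot
and CCBot are real published crawler names; whether a given scraper
honors them is up to the scraper):

    User-agent: GPTBot
    Disallow: /

    User-agent: CCBot
    Disallow: /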

I know of at least one LLM fully trained while respecting licenses and
crawler opt-outs - https://huggingface.co/collections/swiss-ai/apertus-llm
AFAIK, this LLM is consistent with the draft GNU criteria for libre
LLMs - the data is free, the training code is fully free, and the
weights are free.

We can, and probably should, encourage contributors to use
freedom-respecting LLMs, but demanding it would be, IMHO,
overkill. Just as demanding that only free software be used to produce
patches would currently be overkill - we allow contributors to prepare
patches in proprietary editors/OSes without any questions.

> - how they affect developers' mindset: people "unlearning" how to
>   think about writing software and relying on proprietary,
>   pay-per-token services to write FOSS code

Here, I see the proposed policy as a plus - we require people to review
LLM-generated code, thus encouraging the habit of reviewing other code
they produce as well.

IMHO, rejecting LLM usage is not going to be productive.
LLMs can be genuinely useful in skilled hands.
Educating people on how to use LLMs without "unlearning" is better.

> - how they affect the environment: climate impact, etc.

Here, I am not sure.
Inference requirements are much lower than LLM training requirements.
You can run distilled versions of modern LLMs locally, on a CPU.
I expect further optimizations coming to local LLMs.
With this trend, I am not even sure that using an LLM will be worse
than, say, spending energy to compile your own program instead of
using precompiled proprietary software.

That said, I do get that the energy impact will likely be real given
the sheer number of users (mostly non-programmers). But that should be
solved by reducing the environmental impact of the data centers. I do
not buy that we are going to get enough traction for a full-scale
worldwide anti-LLM movement -- the only likely way to stop what is
happening right now. I count more on technological optimizations:
algorithmic, to reduce compute; engineering, to reduce power usage per
computation; and more engineering to optimize power usage in data
centers.

> - how they affect labor: people tagging datasets for training

Could you elaborate?
I did not see this particular argument so far.

> The bottom line is that I have serious concerns about LLMs state as
> of _today_ and I believe any discussion should not shy away from a
> comprehensive view and evaluation.

Yes. Here are some other ethical considerations I saw:

1. LLMs in their current form promote SaaS services, with companies
   already racing for lock-in effects on users (e.g. the memory feature)

   Here, we should encourage local LLMs or, at least, deployable LLMs
   run on rented hardware. Such services already exist.

   Also, libre LLMs not created by companies (a company LLM may
   potentially contain ads or even worse biases baked right into the
   training)

2. Excessive reliance on LLMs can reduce creativity, because LLMs
   currently struggle to create truly innovative solutions, even
   though they are pretty good at synthesizing existing approaches

   I am not sure I believe in this concern though. I think similar
   fears were present in the past, when the internet was not yet
   everywhere. They settled over time.

3. There is a risk that LLMs will damage CC BY code, especially when
   attribution matters to the code authors (e.g. researchers who need
   to report on their output)

   This is one concern that may not yet be addressed. Although I
   expect it to be resolved in the near future by legislation, as it
   is a copyright question currently being considered in the courts.

-- 
Ihor Radchenko // yantar92,
Org mode maintainer,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>
