jman <[email protected]> writes: > By reading this list, I can infer that you are looking at a purely technical > evaluation, ...
Yes and no. I anticipated this kind of reply from someone in this
thread, and my rough plan is to add some kind of Q&A section after the
technical guidelines that will address common ethical questions people
often ask.

>> Let me know if any of the above smells disaster.
>
> ... which brings me to this point. Today **any and every** discussion
> about LLMs not only revolves about the technical aspects but also
> about their second-order side-effects and how these tools affect the
> world we live in:
>
> - how they affect free/open-source projects: assaulted by
>   "contributors" pushing code not reviewed nor tested with made up
>   claims. Sometimes maintainers not even speaking to a human because
>   comments are piped to an LLM

I do not see this as a problem with LLMs. If someone is pushing changes
carelessly, that is not acceptable, with or without LLMs.

I get that some people may be overconfident with LLMs, but that is
simply a sign of limited experience. Once you work with LLMs long
enough, it becomes very clear that blindly trusting the generated code
is a very, very poor idea. Not even because LLMs write bad code, but
because they must operate on incomplete context, often making arbitrary
decisions about the code design in the absence of 100% complete
instructions.

Do note that we do not give write access to people in Org mode blindly.
Proof of competence via prior patch history is always a requirement. I
can revoke that access for abusers (we have had none so far).

> - how they affect service hosting: scraping content and putting
>   servers on their knees

That has little to do with LLM tech; it has to do with abusive
companies. Please do note that illegal scrapers are, well, illegal. The
recent EU legislation demands that LLM companies detail their training
data, proving that they obey the opt-out rights in robots.txt.
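For reference, such an opt-out is just a robots.txt rule targeting the
crawlers' published user-agent tokens. A minimal sketch (the tokens
below are the ones the respective operators have announced; verify the
current names in their documentation before relying on this):

```
# Opt out of known LLM training crawlers
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```

Whether a given company honors these rules is, of course, exactly what
the legislation is meant to make verifiable.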
I know about at least one LLM fully trained respecting licenses and
crawler opt-outs:
https://huggingface.co/collections/swiss-ai/apertus-llm

AFAIK, this LLM is consistent with the draft GNU criteria for libre
LLMs - the data is free, the training code is fully free, and the
weights are free.

We can, and probably should, encourage contributors to use
freedom-respecting LLMs, but demanding it would be, IMHO, overkill.
Just like demanding the use of only free software to produce patches is
currently overkill - we allow contributors to prepare patches in
proprietary editors/OSes without any questions.

> - how they affect developers' mindset: people "unlearning" how to
>   think about writing software and relying on proprietary,
>   pay-per-token services to write FOSS code

Here, I see the proposed policy as a plus - we demand that people
review LLM code, thus encouraging the habit of doing so in other code
they produce.

IMHO, rejecting LLM usage is not going to be productive. LLMs can
really be very useful in skilled hands. Educating people on how to use
LLMs without "unlearning" is better.

> - how they affect the environment: climate impact, etc.

Here, I am not sure. Inference requirements are much lower compared to
LLM training. You can run distilled versions of modern LLMs on a CPU,
locally. I expect further optimizations coming to local LLMs. With such
a trend, I am not even sure that using an LLM will be worse than, say,
spending energy to compile your own program instead of using compiled
proprietary software.

That said, I do get that the energy impact will likely be real given
the sheer number of users (mostly non-programmers). But that should be
solved by reducing the environmental impact of the data centers. I do
not buy that we are going to get enough traction for a full-scale
worldwide anti-LLM movement -- the only likely way to stop what is
happening right now.
I count more on technological optimizations: algorithmic, to reduce
compute; engineering, to reduce power usage per computer; and more
engineering, to optimize power usage in data centers.

> - how they affect labor: people tagging datasets for training

Could you elaborate? I have not seen this particular argument so far.

> The bottom line is that I have serious concerns about LLMs state as
> of _today_ and I believe any discussion should not shy away from a
> comprehensive view and evaluation.

Yes. Here are some other ethical considerations I have seen:

1. LLMs in their current form promote SaaS services, with companies
   already racing for lock-in effects on users (e.g. the memory
   feature).

   Here, we should encourage local LLMs or, at least, deployable LLMs
   with hardware rental. Such services already exist. Also, libre LLMs
   not created by companies (a company LLM may potentially contain ads
   or even worse biases baked right into the training).

2. Excessive reliance on LLMs can reduce creativity, because LLMs
   currently lack the ability to create truly innovative solutions,
   even though they are pretty good at synthesizing existing
   approaches.

   I am not sure I believe in this concern, though. I think similar
   fears were present in the past, when the internet was not yet
   everywhere. It settled itself over time.

3. There is a risk that LLMs will damage CC BY code, especially when
   attributions are important to the code authors (e.g. researchers who
   need to report on their output).

   This is one concern that may not yet be addressed, although I expect
   it to be resolved in the near future by legislation, as it is a
   copyright question currently being considered in the courts.

--
Ihor Radchenko // yantar92,
Org mode maintainer,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>
