Re: Accepting AI generated contributions

2025-08-16 Thread Josh McKenzie
> P.S. "Does don't-use-our-output-to-train-a-competitor language disqualify a > model/vendor" also seems to me to be plainly a question for Legal. Yeah; I was noodling on this meta question last week. If there are restrictions on output, but those restrictions don't apply in the circumstances you

Re: Accepting AI generated contributions

2025-08-15 Thread Jonathan Ellis
To the degree that we can draw conclusions from this three-month-old thread, I submit that the top one is: ASF policy is not optimally clear. (Personally I think the *spirit* of the policy is clear, as embodied in the TLDR at the bottom, but the text itself is not and since it's a legal document t

Re: Accepting AI generated contributions

2025-08-14 Thread Ariel Weisberg
Hi, I want to dig a little deeper into the actual ToS and make a distinction between the terms placing a burden on the output of the model and placing a burden on access/usage. Here are the Claude consumer ToS that seem relevant: ``` You may not access or use, or help another person to access o

Re: Accepting AI generated contributions

2025-08-14 Thread Ariel Weisberg
Hi, It's not up to us to interpret, right? It's been interpreted by Apache Legal and if we are confused we can check, but this is one instance where they aren't being ambiguous or delegating to us to make a decision. I can't see how we can follow legal's guidance and accept output from models or

Re: Accepting AI generated contributions

2025-08-01 Thread Łukasz Dywicki
Effectively each online service will restrict competition from using their resources, as they are paying for infrastructure and energy to make profit, not to make competition stronger. Given the rumors or evidence (I'm not clear on that) that DeepSeek was trained on some sort of OpenAI resource

Re: Accepting AI generated contributions

2025-08-01 Thread Josh McKenzie
So I'll go ahead and preface this email - I'm not trying to open Pandora's Box or re-litigate settled things from the thread. *But...* > • The terms and conditions of the generative AI tool do not place any > restrictions on use of the output that would be inconsistent with the Open > S

Re: Accepting AI generated contributions

2025-08-01 Thread David Capwell
> On Aug 1, 2025, at 6:38 AM, Josh McKenzie wrote: > >> Kimi K2 has similar wording as OpenAI so I assume they are banned as well? > > What about the terms is incompatible with the ASF? Looks like you're good to go with whatever you generate? Go to the "Service Misuse" section • B

Re: Accepting AI generated contributions

2025-08-01 Thread Josh McKenzie
> Kimi K2 has similar wording as OpenAI so I assume they are banned as well? What about the terms is incompatible with the ASF? Looks like you're good to go with whatever you generate? > The ownership of the content generated based on Kimi is maintained by you, > and you are responsible for its

Re: Accepting AI generated contributions

2025-08-01 Thread David Capwell
Not really, this thread has made me really see that we need to know the tool/model provider so we can confirm the ToS allows contributions. OpenAI is not allowed and we know most popular ones are, but what about new ones? Kimi K2 has similar wording as OpenAI so I assume they are banned as well? ht

Re: Accepting AI generated contributions

2025-07-31 Thread Yifan Cai
Does "*optionally* disclose the LLM used in whatever way you prefer and *definitely* no OpenAI" meet everyone's expectations? - Yifan On Thu, Jul 31, 2025 at 1:56 PM Josh McKenzie wrote: > Do we have a consensus on this topic or is there still further discussion > to be had? > > On Thu, Jul 24,

Re: Accepting AI generated contributions

2025-07-31 Thread Josh McKenzie
Do we have a consensus on this topic or is there still further discussion to be had? On Thu, Jul 24, 2025, at 8:26 AM, David Capwell wrote: > Given the above, code generated in whole or in part using AI can be > contributed if the contributor ensures that: The terms and conditions of the > gene

Re: Accepting AI generated contributions

2025-07-24 Thread David Capwell
Given the above, code generated in whole or in part using AI can be contributed if the contributor ensures that: The terms and conditions of the generative AI tool do not place any restrictions on use of the output that would be inconsistent with the Open Source Definition. At least one of the fol

Re: Accepting AI generated contributions

2025-07-23 Thread Jon Haddad
+1 to Patrick's proposal. On Wed, Jul 23, 2025 at 12:37 PM Patrick McFadin wrote: > I just did some review on all the case law around copyright and AI code. > So far, every claim has been dismissed. There are some other cases like > NYTimes which have more merit and are proceeding. > > Which lea

Re: Accepting AI generated contributions

2025-07-23 Thread Patrick McFadin
I just did some review on all the case law around copyright and AI code. So far, every claim has been dismissed. There are some other cases like NYTimes which have more merit and are proceeding. Which leads me to the opinion that this is feeling like a premature optimization. Somebody creating a P

Re: Accepting AI generated contributions

2025-07-23 Thread Łukasz Dywicki
Hello, The world moved forward, this is a fact. At the same time, most people pushing their stuff to GitHub, or other repository hosting solutions, rarely populate license information, and do not provide explicit patent rights. I agree that forbidding specific tools sounds ridiculous, however

Re: Accepting AI generated contributions

2025-07-23 Thread Yifan Cai
According to the thread, the disclosure is for legal purposes. For example, the patch is not produced by OpenAI's service. I think having the discussion to clarify the AI usage in the projects is meaningful. I guess many are hesitating because of the lack of clarity in the area. > I don’t believe or agr

Re: Accepting AI generated contributions

2025-07-23 Thread Patrick McFadin
This is starting to get ridiculous. Disclosure statements on exactly how a problem was solved? What’s next? Time cards? It’s time to accept the world as it is. AI is in the coding toolbox now just like IDEs, linters and code formatters. Some may not like using them, some may love using them. What

Re: Accepting AI generated contributions

2025-07-23 Thread David Capwell
> David is disclosing it in the maillist and the GH page. Should the disclosure > be persisted in the commit? Someone asked me to update the ML, but I don’t believe or agree with us assuming we should do this for every PR; personally storing this in the PR description is fine to me as you are

Re: Accepting AI generated contributions

2025-07-23 Thread Bernardo Botella
That’s a great point. I’d say we can use the co-authored part of our commit messages to disclose the actual AI that was used? > On Jul 23, 2025, at 10:57 AM, Yifan Cai wrote: > > Curious, what are the good ways to disclose the information? > > > All of which comes back to: if people disclo

Re: Accepting AI generated contributions

2025-07-23 Thread Yifan Cai
Curious, what are the good ways to disclose the information? > All of which comes back to: if people disclose if they used AI, what models, and whether they used the code or text the model wrote verbatim or used it as a scaffolding and then heavily modified everything I think we'll be in a pretty

Re: Accepting AI generated contributions

2025-07-23 Thread David Capwell
Sent out this patch that was written 100% by Claude: https://github.com/apache/cassandra/pull/4266 Claude's license doesn’t have issues with the current ASF policy as far as I can tell. If you look at the patch it’s very clear there isn’t any copyrighted material (it's gluing together C* classes

Re: Accepting AI generated contributions

2025-06-25 Thread David Capwell
+1 to what Josh said. > On Jun 25, 2025, at 1:18 PM, Josh McKenzie wrote: > > Did some more digging. Apparently the way a lot of headline-grabbers have been making models reproduce code verbatim is to prompt them with dozens of verbatim tokens of copyrighted code as

Re: Accepting AI generated contributions

2025-06-25 Thread Josh McKenzie
Did some more digging. Apparently the way a lot of headline-grabbers have been making models reproduce code verbatim is to prompt them with dozens of verbatim tokens of copyrighted code as input where completion is then very heavily weighted to regurgitate the initial implementation. Which makes

Re: Accepting AI generated contributions

2025-06-25 Thread Ariel Weisberg
On Mon, Jun 16, 2025 at 4:21 PM Patrick McFadin wrote: >>>> I'm on with the allow list (1) or option 2. 3 just isn't realistic anymore. >>>> >>>> Patrick >>>> >>>> On Mon, Jun 16, 2

Re: Accepting AI generated contributions

2025-06-25 Thread David Capwell
> 2. Models that do not do output filtering to restrict the reproduction of > training data unless the tool can ensure the output is license compatible? > > 2 would basically prohibit locally run models. I am not for this for the reasons listed above. There isn’t a difference between this and

Re: Accepting AI generated contributions

2025-06-24 Thread David Capwell
>> On Mon, Jun 16, 2025 at 3:09 PM Caleb Rackliffe wrote: >> I haven't participated much here, but my vote would be basically #1, i.e. an "allow list" with a clear procedure for expansion.

Re: Accepting AI generated contributions

2025-06-24 Thread Josh McKenzie
>>>> >>>> We could, but if the allow list is binding then it's still an allow list >>>> with some guidance on how to expand the allow list. >>>> >>>> If it isn't binding then it's guidance so still option 2 rea

Re: Accepting AI generated contributions

2025-06-24 Thread David Capwell
>>> If it isn't binding then it's guidance so still option 2 really. >>> >>> I think the key distinction to find some early consensus on is whether we do a binding allow list or guidance, and then we can iron out the guidance, but I think that will be less controversial t

Re: Accepting AI generated contributions

2025-06-16 Thread Patrick McFadin
it's guidance so still option 2 really. >> >> I think the key distinction to find some early consensus on is whether we do a binding allow list or guidance, and then we can iron out the guidance, but I think that will be less controversial to work out. >> >> O

Re: Accepting AI generated contributions

2025-06-16 Thread Caleb Rackliffe
but I think that will be less controversial to work out. > > Or option 3 which is not accepting AI generated contributions. I think there are some with healthy skepticism of AI generated code, but so far I haven't met anyone who wants to forbid it entirely. > > Ariel

Re: Accepting AI generated contributions

2025-06-16 Thread Josh McKenzie
Couldn't our official stance be a combination of 1 and 2? i.e. "Here's an allow list. If you're using something not on that allow list, here's some basic guidance and maybe let us know how you tried to mitigate some of this risk so we can update our allow list w/some nuance". On Mon, Jun 16, 20

Re: Accepting AI generated contributions

2025-06-16 Thread Ariel Weisberg
allow list or guidance, and then we can iron out the guidance, but I think that will be less controversial to work out. Or option 3 which is not accepting AI generated contributions. I think there are some with healthy skepticism of AI generated code, but so far I haven't met anyone who wants t

Re: Accepting AI generated contributions

2025-06-16 Thread Ariel Weisberg
Hi, On Wed, Jun 11, 2025, at 3:48 PM, Jeremiah Jordan wrote: > Where are you getting this from? From the OpenAI terms of use: > https://openai.com/policies/terms-of-use/ Direct from the ASF legal mailing list discussion I linked to in my original email calling this out https://lists.apache.org

Re: Accepting AI generated contributions

2025-06-11 Thread Jeremiah Jordan
> > I respectfully mean that contributors, reviewers, and committers can't > feasibly understand and enforce the ASF guidelines. > If this is true, then the ASF is in a lot of trouble and you should bring it up with the ASF board. How many people are aware that if you get code from OpenAI direct

Re: Accepting AI generated contributions

2025-06-11 Thread Ariel Weisberg
Hi, I am not saying you said it, but I respectfully mean that contributors, reviewers, and committers can't feasibly understand and enforce the ASF guidelines. We would be another link in a chain of people abdicating responsibility starting with LLM vendors serving up models that reproduce cop

Re: Accepting AI generated contributions

2025-06-03 Thread scott
So the general concern we're talking about is identifying/avoiding cases in which a community member has contributed code generated with the support of a model that has reproduced training data verbatim, posing copyright risk to the Apache Cassandra project. Work in this area seems early, but t

Re: Accepting AI generated contributions

2025-06-02 Thread Ariel Weisberg
Hi, As PMC members/committers we aren't supposed to abdicate this to legal or to contributors. Despite the fact that we aren't equipped to solve this problem we are supposed to be making sure that code contributed is non-infringing. This is a quotation from Roman Shaposhnik from this legal thre

Re: Accepting AI generated contributions

2025-06-02 Thread Jeremiah Jordan
I don’t think I said we should abdicate responsibility? I said the key point is that contributors and, more importantly, reviewers and committers understand the ASF guidelines and hold all code to those standards. Any suspect code should be blocked during review. As Roman says in your quote, this i

Re: Accepting AI generated contributions

2025-06-02 Thread Jeremiah Jordan
> Ultimately it's the contributor's (and committer's) job to ensure that their contributions meet the bar for acceptance To me this is the key point. Given how pervasive this stuff is becoming, I don’t think it’s feasible to make some list of tools and enforce it. Even without getting into extra

Re: Accepting AI generated contributions

2025-06-02 Thread David Capwell
> To clarify are you saying that we should not accept AI generated code until > it has been looked at by a human I think AI code would normally be the same process as normal code; the author and reviewers all reviewed the code; I am not against AI code in this context. > then written again wit

Re: Accepting AI generated contributions

2025-06-02 Thread Ariel Weisberg
Hi, To clarify, are you saying that we should not accept AI generated code until it has been looked at by a human and then written again with different "wording" to ensure that it doesn't directly copy anything? Or do you mean something else about the quality of "vibe coding" and how we shouldn

Re: Accepting AI generated contributions

2025-06-02 Thread David Capwell
> fine tuning encourage not reproducing things verbatim > I think not producing copyrighted output from your training data is a > technically feasible achievement for these vendors so I have a moderate level > of trust they will succeed at it if they say they do it. Some team members and I discu

Accepting AI generated contributions

2025-05-30 Thread Ariel Weisberg
Hi all, It looks like we haven't discussed this much and haven't settled on a policy for what kinds of AI generated contributions we accept and what vetting is required for them. https://www.apache.org/legal/generative-tooling.html#:~:text=Given%20the%20above,code%20scanning%20results. ``` Giv