> P.S. "Does don't-use-our-output-to-train-a-competitor language disqualify a
> model/vendor" also seems to me to be plainly a question for Legal.
Yeah; I was noodling on this meta question last week. If there are restrictions
on output, but those restrictions don't apply in the circumstances you
To the degree that we can draw conclusions from this three-month-old
thread, I submit that the top one is: ASF policy is not optimally clear.
(Personally I think the *spirit* of the policy is clear, as embodied in the
TLDR at the bottom, but the text itself is not, and since it's a legal
document t
Hi,
I want to dig a little deeper into the actual ToS and make a distinction
between the terms placing a burden on the output of the model and placing a
burden on access/usage.
Here are the Claude consumer ToS that seem relevant:
```
You may not access or use, or help another person to access or use,
```
Hi,
It's not up to us to interpret, right? It's been interpreted by Apache Legal,
and if we are confused we can check, but this is one instance where they aren't
being ambiguous or delegating the decision to us.
I can't see how we can follow Legal's guidance and accept output from models or
Effectively, each online service will restrict competitors from using
its resources, as they are paying for infrastructure and energy to
make a profit, not to make the competition stronger. Given the rumors or
evidence (I'm not clear which) that DeepSeek was trained on some sort
of OpenAI resource
So I'll go ahead and preface this email - I'm not trying to open Pandora's Box
or re-litigate settled things from the thread. *But...*
> • The terms and conditions of the generative AI tool do not place any
> restrictions on use of the output that would be inconsistent with the Open
> Source Definition.
> On Aug 1, 2025, at 6:38 AM, Josh McKenzie wrote:
>
> Kimi K2 has similar wording to OpenAI so I assume they are banned as well?
>
> What about the terms is incompatible with the ASF? Looks like you're good to
> go with whatever you generate?
Go to the "Service Misuse" section.
• B
> Kimi K2 has similar wording to OpenAI so I assume they are banned as well?
What about the terms is incompatible with the ASF? Looks like you're good to go
with whatever you generate?
> The ownership of the content generated based on Kimi is maintained by you,
> and you are responsible for its
Not really. This thread has made me see that we need to know the tool/model
provider so we can confirm the ToS allows contributions. OpenAI is not allowed,
and we know most popular ones are, but what about new ones? Kimi K2 has similar
wording to OpenAI so I assume they are banned as well? ht
Does "*optionally* disclose the LLM used in whatever way you prefer and
*definitely* no OpenAI" meet everyone's expectations?
- Yifan
On Thu, Jul 31, 2025 at 1:56 PM Josh McKenzie wrote:
> Do we have a consensus on this topic or is there still further discussion
> to be had?
>
> On Thu, Jul 24,
Do we have a consensus on this topic or is there still further discussion to be
had?
On Thu, Jul 24, 2025, at 8:26 AM, David Capwell wrote:
> Given the above, code generated in whole or in part using AI can be
> contributed if the contributor ensures that: The terms and conditions of the
> generative AI tool do not place any restrictions on use of the output that
> would be inconsistent with the Open Source Definition.
Given the above, code generated in whole or in part using AI can be contributed if the contributor ensures that:
The terms and conditions of the generative AI tool do not place any restrictions on use of the output that would be inconsistent with the Open Source Definition.
At least one of the following
+1 to Patrick's proposal.
On Wed, Jul 23, 2025 at 12:37 PM Patrick McFadin wrote:
> I just did some review on all the case law around copyright and AI code.
> So far, every claim has been dismissed. There are some other cases like
> NYTimes which have more merit and are proceeding.
>
> Which lea
I just did some review on all the case law around copyright and AI code. So
far, every claim has been dismissed. There are some other cases like
NYTimes which have more merit and are proceeding.
Which leads me to the opinion that this is feeling like a premature
optimization. Somebody creating a P
Hello,
The world has moved forward; this is a fact. At the same time, most
people pushing their code to GitHub, or other repository hosting
solutions, rarely include license information and do not provide
explicit patent rights.
I agree that forbidding specific tools sounds ridiculous, however
According to the thread, the disclosure is for legal purposes: for example,
to show that the patch was not produced by OpenAI's service. I think having the
discussion to clarify AI usage in the projects is meaningful. I guess
many are hesitating because of the lack of clarity in this area.
> I don’t believe or agree with us assuming we should do this for every PR
This is starting to get ridiculous. Disclosure statements on exactly how a
problem was solved? What’s next? Time cards?
It’s time to accept the world as it is. AI is in the coding toolbox now
just like IDEs, linters and code formatters. Some may not like using them,
some may love using them. What
> David is disclosing it in the maillist and the GH page. Should the disclosure
> be persisted in the commit?
Someone asked me to update the ML, but I don’t believe or agree with us
assuming we should do this for every PR; personally storing this in the PR
description is fine to me as you are
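As a purely hypothetical sketch, a PR-description disclosure along these lines
would capture what the thread is asking for (the section title and wording are
illustrative, not an agreed convention):
```
## AI disclosure
Parts of this patch were generated with <tool/model, e.g. Claude Sonnet>.
All generated output was reviewed, tested, and adapted by the author before
submission.
```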
That’s a great point. I’d say we can use the co-authored part of our commit
messages to disclose the actual AI that was used?
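As a minimal sketch, that could look like the following, using Git's standard
Co-authored-by commit trailer (the ticket number and trailer identity are
placeholders, not an established project convention):
```
CASSANDRA-XXXXX: <one-line summary of the change>

<commit message body>

Co-authored-by: Claude <noreply@anthropic.com>
```
Since trailers are machine-readable, usage could later be surveyed with
something like `git log --grep='^Co-authored-by: Claude'`.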
> On Jul 23, 2025, at 10:57 AM, Yifan Cai wrote:
>
> Curious, what are the good ways to disclose the information?
>
> > All of which comes back to: if people disclo
Curious, what are the good ways to disclose the information?
> All of which comes back to: if people disclose if they used AI, what
> models, and whether they used the code or text the model wrote verbatim or
> used it as scaffolding and then heavily modified everything, I think we'll
> be in a pretty
Sent out this patch that was written 100% by Claude:
https://github.com/apache/cassandra/pull/4266
Claude's license doesn’t have issues with the current ASF policy as far as I can
tell. If you look at the patch it’s very clear there isn’t any copyrighted
material (it's gluing together C* classes
+1 to what Josh said
Sent from my iPhone
> On Jun 25, 2025, at 1:18 PM, Josh McKenzie wrote:
>
>
> Did some more digging. Apparently the way a lot of headline-grabbers have
> been making models reproduce code verbatim is to prompt them with dozens of
> verbatim tokens of copyrighted code as
Did some more digging. Apparently the way a lot of headline-grabbers have been
making models reproduce code verbatim is to prompt them with dozens of verbatim
tokens of copyrighted code as input where completion is then very heavily
weighted to regurgitate the initial implementation. Which makes
On Mon, Jun 16, 2025 at 4:21 PM Patrick McFadin wrote:
>>>> I'm on with the allow list (1) or option 2. 3 just isn't realistic
>>>> anymore.
>>>>
>>>> Patrick
>>>>
>>>>
>>>>
>>>> On Mon, Jun 16, 2
> 2. Models that do not do output filtering to restrict the reproduction of
> training data unless the tool can ensure the output is license compatible?
>
> 2 would basically prohibit locally run models.
I am not for this for the reasons listed above. There isn’t a difference
between this and
>>
>> On Mon, Jun 16, 2025 at 3:09 PM Caleb Rackliffe wrote:
>> I haven't participated much here, but my vote would be basically #1, i.e. an
>> "allow list" with a clear procedure for expansion.
>>
>>>>
>>>> We could, but if the allow list is binding then it's still an allow list
>>>> with some guidance on how to expand the allow list.
>>>>
>>>> If it isn't binding then it's guidance so still option 2 really.
If it isn't binding then it's guidance so still option 2 really.
>>>
>>> I think the key distinction is to find some early consensus on if we do a
>>> binding allow list or guidance, and then we can iron out the guidance, but
>>> I think that will be less controversial to work out.
If it isn't binding then it's guidance so still option 2 really.
>>
>> I think the key distinction is to find some early consensus on if we do a
>> binding allow list or guidance, and then we can iron out the guidance, but
>> I think that will be less controversial to work out.
>>
>> O
> and then we can iron out the guidance, but
> I think that will be less controversial to work out.
>
> Or option 3 which is not accepting AI generated contributions. I think
> there are some with healthy skepticism of AI generated code, but so far I
> haven't met anyone who wants to forbid it entirely.
>
> Ariel
&
Couldn't our official stance be a combination of 1 and 2? i.e. "Here's an allow
list. If you're using something not on that allow list, here's some basic
guidance and maybe let us know how you tried to mitigate some of this risk so
we can update our allow list w/some nuance".
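Purely as a hypothetical sketch (entries illustrative, not decisions), such an
allow list could be a small table in the project docs:
```
Tool / model         Status       Notes
Claude (Anthropic)   Allowed      ToS restricts access/usage, not output
OpenAI services      Not allowed  Per the ASF Legal thread referenced here
<anything else>      Pending      Use the guidance below to propose additions
```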
On Mon, Jun 16, 20
I think the key distinction is to find some early consensus on if we do a binding
allow list or guidance, and then we can iron out the guidance, but I think that
will be less controversial to work out.
Or option 3 which is not accepting AI generated contributions. I think there
are some with healthy skepticism of AI generated code, but so far I haven't met
anyone who wants to forbid it entirely.
Hi,
On Wed, Jun 11, 2025, at 3:48 PM, Jeremiah Jordan wrote:
> Where are you getting this from? From the OpenAI terms of use:
> https://openai.com/policies/terms-of-use/
Direct from the ASF legal mailing list discussion I linked to in my original
email calling this out
https://lists.apache.org
>
> I respectfully mean that contributors, reviewers, and committers can't
> feasibly understand and enforce the ASF guidelines.
>
If this is true, then the ASF is in a lot of trouble and you should bring
it up with the ASF board.
How many people are aware that if you get code from OpenAI direct
Hi,
I am not saying you said it, but I respectfully mean that contributors,
reviewers, and committers can't feasibly understand and enforce the ASF
guidelines. We would be another link in a chain of people abdicating
responsibility starting with LLM vendors serving up models that reproduce
copyrighted
So the general concern we're talking about is identifying/avoiding cases in
which a community member has contributed code generated with the support of a
model that has reproduced training data verbatim, posing copyright risk to the
Apache Cassandra project.
Work in this area seems early, but t
Hi,
As PMC members/committers we aren't supposed to abdicate this to legal or to
contributors. Despite the fact that we aren't equipped to solve this problem we
are supposed to be making sure that code contributed is non-infringing.
This is a quotation from Roman Shaposhnik from this legal thread
I don’t think I said we should abdicate responsibility? I said the key
point is that contributors, and more importantly reviewers and committers
understand the ASF guidelines and hold all code to those standards. Any
suspect code should be blocked during review. As Roman says in your quote,
this i
> Ultimately it's the contributor's (and committer's) job to ensure that
> their contributions meet the bar for acceptance
To me this is the key point. Given how pervasive this stuff is becoming, I
don’t think it’s feasible to make some list of tools and enforce it. Even
without getting into extra
> To clarify are you saying that we should not accept AI generated code until
> it has been looked at by a human
I think AI code would normally go through the same process as normal code: the
author and reviewers all review the code. I am not against AI code in this
context.
> then written again with different "wording"
Hi,
To clarify, are you saying that we should not accept AI generated code until it
has been looked at by a human and then written again with different "wording"
to ensure that it doesn't directly copy anything?
Or do you mean something else about the quality of "vibe coding" and how we
shouldn
> fine tuning encourage not reproducing things verbatim
> I think not producing copyrighted output from your training data is a
> technically feasible achievement for these vendors so I have a moderate level
> of trust they will succeed at it if they say they do it.
Some team members and I discussed
Hi all,
It looks like we haven't discussed this much and haven't settled on a policy
for what kinds of AI generated contributions we accept and what vetting is
required for them.
https://www.apache.org/legal/generative-tooling.html#:~:text=Given%20the%20above,code%20scanning%20results.
```
Given the above, code generated in whole or in part using AI can be contributed
if the contributor ensures that:
```