Thanks for bringing this issue to our attention. I agree that the "Open Source AI Definition" is NOT DFSG-compliant.

"The source code" must include everything needed to rebuild the software so that it *works the same* as the original. An AI system doesn't "work the same" -- i.e., give the same output from the same input -- without the training data, so the training data is clearly part of the source.

Requiring that users must "build a substantially equivalent" part of the source on their own, as stated in the "Data Information" paragraph, is obviously at odds with the DFSG. That's like not releasing the source code at all and claiming that it's still free software because "a skilled person" could rewrite it. That's obvious bullshit.

Furthermore, clause (1) of that paragraph explicitly states that there might be parts of the training dataset that are "unshareable". I see that as a direct contradiction with the expectation that one can build an equivalent dataset. And clause (3) states that parts of the dataset might be obtained "for fee". That too is in direct contradiction with the requirement that the software can be *freely* redistributed (or is that a typo and they really meant "for free"?!?).

While Debian is of course not required to adhere to the OSAID, it's going to have an impact on the free software ecosystem. I believe that it's Debian's responsibility, as a very respected player in that ecosystem, to issue a public statement saying that the OSAID in its current form is unacceptable. (That goes beyond whether we should allow OSAID-compliant software into Debian, so I think it's outside the mandate of ftpmasters, although of course their opinion would be welcome.)

I'm not sure whether a GR is needed for that or it's enough to reach consensus by an informal procedure. (Starting a GR process might be desirable because it sets a hard deadline and forces a decision -- or it might be undesirable for exactly the same reasons?)

Gerardo