Hi,

Thank you, Jarek, for taking care of this matter!
> Should we react and block new users from interacting with Airflow repo if we see it happening again?

Maintainers' time is not an infinite resource, so "yes!" from me (also for Iceberg).

Best

On Wed, 22 Jan 2025 at 15:40, Russell Spitzer <russell.spit...@gmail.com> wrote:

> This is pretty disturbing and I hope that any users out there see that using automated tools to submit issues is just adding noise to the project which makes it very hard for real issues to be addressed.
>
> On Wed, Jan 22, 2025 at 6:58 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>
>> - Iceberg dev to not flood them :) (in bcc:)
>>
>> It looks like the flood had been somehow flood-gated - no similar report for the last 4 hours or so.
>>
>> I also started to receive confirmation from Github that they are looking at the reports, so likely we do not have to take any action now, but I think we can turn it into deciding about "future" reactions when something like this happens, so that we can potentially react quickly.
>>
>> What do others think? Should we react and block new users from interacting with Airflow repo if we see it happening again? Maybe temporarily - for a day or two initially - after reporting some initial reports? Does it sound reasonable?
>>
>> J.
>>
>> On Wed, Jan 22, 2025 at 11:35 AM Pavankumar Gopidesu <gopidesupa...@gmail.com> wrote:
>>
>>> +1 from me.
>>>
>>> It looks like it started yesterday, I feel we may get many of these tickets when new users start testing those AI agents.
>>>
>>> Regards,
>>> Pavan Kumar
>>>
>>> On Wed, Jan 22, 2025, 10:27 Jarek Potiuk <ja...@potiuk.com> wrote:
>>>
>>> > We continue getting new issues - and more of them are by "new users" - created just an hour or so ago.
>>> >
>>> > Apparently Github has a way to temporarily limit interactions with the repo for new users - see this screenshot:
>>> >
>>> > https://ibb.co/WWsr7RB
>>> >
>>> > And I think I'd be for enabling it - we will need an INFRA ticket for that, because that's not currently configurable via .asf.yaml - and maybe if Iceberg would like to do it as well, we can create a single ticket for that.
>>> >
>>> > There is a new framework coming to enable faster implementation and testing of .asf.yaml features (this was discussed at the latest roundtable) - and we can contribute a feature to add it in .asf.yaml soon, but temporarily we might want to ask INFRA to help.
>>> >
>>> > WDYT? If I hear a few voices for +1 and no strong opposition I will open a JIRA ticket (and would love to hear what Iceberg friends of ours think as well :)
>>> >
>>> > J.
>>> >
>>> > On Wed, Jan 22, 2025 at 10:36 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>>> >
>>> > > Yeah, just closed this one. The pattern where those are coming at the same time as two unrelated issues to both iceberg and airflow is very... strange
>>> > >
>>> > > On Wed, Jan 22, 2025 at 10:35 AM Elad Kalif <elad...@apache.org> wrote:
>>> > >
>>> > >> Another one who also opened issues in Airflow and Iceberg
>>> > >> https://github.com/apache/iceberg/issues/12034
>>> > >> https://github.com/apache/airflow/issues/45920
>>> > >>
>>> > >> Same "mistake" with the # Title.
>>> > >> All of these seem to come with accounts opened months ago, with some minor traffic to their own forks so they would appear legit to Github
>>> > >>
>>> > >> On Wed, Jan 22, 2025 at 11:23 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>>> > >>
>>> > >> > Yeah.
>>> > >> > Again - my guess is that those are "Agentic AI" trials, where someone is deploying fake "agent" accounts acting as "people in the repo would". That's a bit terrifying if this is not contained.
>>> > >> >
>>> > >> > On Wed, Jan 22, 2025 at 9:52 AM Fokko Driesprong <fo...@apache.org> wrote:
>>> > >> >
>>> > >> > > That's quite a few! I also noticed that they sometimes self-close the issue (eg here <https://github.com/apache/iceberg/issues/12032>). Closed after 1 minute, but still flooding my mailbox :D
>>> > >> > >
>>> > >> > > > So you might have more such issues now than you think.
>>> > >> > >
>>> > >> > > Yes, that's probably the case, still going through my mailbox.
>>> > >> > >
>>> > >> > > On Wed, 22 Jan 2025 at 09:49, Jarek Potiuk <ja...@potiuk.com> wrote:
>>> > >> > >
>>> > >> > > > Example case:
>>> > >> > > >
>>> > >> > > > * https://github.com/apache/airflow/issues/45904 - airflow
>>> > >> > > > * https://github.com/apache/iceberg/issues/12034 - iceberg
>>> > >> > > >
>>> > >> > > > Both issues are generic and useless and bring 0 value except noise.
>>> > >> > > >
>>> > >> > > > Interesting thing is that many of those users, if you look at their history - created a similar number of issues in iceberg and airflow about the same time. So you might have more such issues now than you think.
>>> > >> > > >
>>> > >> > > > J.
>>> > >> > > >
>>> > >> > > > On Wed, Jan 22, 2025 at 9:41 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>>> > >> > > >
>>> > >> > > >> I have not counted all of them - there are quite a bit too many - and other people closed some of them as well.
>>> > >> > > >> I did a very rudimentary check and applied "AI Spam" label to some of the issues:
>>> > >> > > >>
>>> > >> > > >> https://github.com/apache/airflow/issues?q=is%3Aissue%20state%3Aclosed%20AI%20label%3A%22AI%20Spam%22
>>> > >> > > >>
>>> > >> > > >> -> so we have had at least 25 such issues in the last 12 hours.
>>> > >> > > >>
>>> > >> > > >> > we also want to make sure that we don't accidentally close issues that don't come from a bot, but just a newcomer to the project.
>>> > >> > > >>
>>> > >> > > >> Those reports and patterns look very, very human-like - they are reported infrequently (per user), the description and text seem legitimate, but they are wordy and just reading and understanding that those are completely useless takes a lot of time. This is part of the problem, that it takes a lot of energy and time to determine if those are valid or not - and with such a rate, it's not sustainable just to analyze whether they are good or bad.
>>> > >> > > >>
>>> > >> > > >> J.
>>> > >> > > >>
>>> > >> > > >> On Wed, Jan 22, 2025 at 9:23 AM Fokko Driesprong <fo...@apache.org> wrote:
>>> > >> > > >>
>>> > >> > > >>> Hey Jarek,
>>> > >> > > >>>
>>> > >> > > >>> Thanks for bringing this to our attention. When you talk about flooding, how many are we talking about? I see some suspicious issues (eg, here <https://github.com/apache/iceberg/issues/12039>), but not many.
>>> > >> > > >>> I hope this will come to a halt soon because it is all additional work, and we also want to make sure that we don't accidentally close issues that don't come from a bot, but just a newcomer to the project.
>>> > >> > > >>>
>>> > >> > > >>> Kind regards,
>>> > >> > > >>> Fokko
>>> > >> > > >>>
>>> > >> > > >>> On Wed, 22 Jan 2025 at 09:00, Jarek Potiuk <ja...@potiuk.com> wrote:
>>> > >> > > >>>
>>> > >> > > >>> > Hey Iceberg community, And Airflow community too.
>>> > >> > > >>> >
>>> > >> > > >>> > As of yesterday Airflow repo is literally flooded with a number of issues that look almost good, except they are clearly AI generated and make no sense or repeat content from other issues. We noticed that the users who create a lot of the "spam AI" issues that are created in Airflow are also creating similar issues for Iceberg.
>>> > >> > > >>> >
>>> > >> > > >>> > We got to the point that we are closing and reporting such issues to GitHub and we are blocking all such users without spending too much time on it with messages similar to this:
>>> > >> > > >>> >
>>> > >> > > >>> > ```
>>> > >> > > >>> > This looks totally AI-generated. useless issue report that brings no value and makes no sense. We are generally blocking users that sends a lot of spam AI reports generated by bots..
>>> > >> > > >>> > as of yesterday so we will report your account and block it unless:
>>> > >> > > >>> >
>>> > >> > > >>> > a) you explain how you generated reports
>>> > >> > > >>> > b) prove you are human
>>> > >> > > >>> > c) explain why you created the issue
>>> > >> > > >>> > ```
>>> > >> > > >>> >
>>> > >> > > >>> > My guess is that some company released and is testing an "agentic AI" that is "github-targeted" - where people can run the AI agents on their behalf. It does not look like regular bot-spam.
>>> > >> > > >>> > I think we should all generally crowd-source reporting it to Github - and hopefully they will find a way to battle those without involving maintainers.
>>> > >> > > >>> >
>>> > >> > > >>> > I hope it will not last too long.
>>> > >> > > >>> >
>>> > >> > > >>> > J.
>>> > >> > > >>> >
>>> > >> > > >>> > ---------- Forwarded message ---------
>>> > >> > > >>> > From: Jarek Potiuk <ja...@potiuk.com>
>>> > >> > > >>> > Date: Wed, Jan 22, 2025 at 8:12 AM
>>> > >> > > >>> > Subject: Re: Very strange (AI generated) issues
>>> > >> > > >>> > To: <d...@airflow.apache.org>
>>> > >> > > >>> >
>>> > >> > > >>> > You can also report it directly from the issue (... at the top and "report content")
>>> > >> > > >>> >
>>> > >> > > >>> > On Wed, Jan 22, 2025 at 7:46 AM Amogh Desai <amoghdesai....@gmail.com> wrote:
>>> > >> > > >>> >
>>> > >> > > >>> >> Elad, I just managed to report this user.
>>> > >> > > >>> >>
>>> > >> > > >>> >> This is how it's done:
>>> > >> > > >>> >>
>>> > >> > > >>> >> https://docs.github.com/en/communities/maintaining-your-safety-on-github/reporting-abuse-or-spam#reporting-a-user
>>> > >> > > >>> >>
>>> > >> > > >>> >> Thanks & Regards,
>>> > >> > > >>> >> Amogh Desai
>>> > >> > > >>> >>
>>> > >> > > >>> >> On Wed, Jan 22, 2025 at 12:05 PM Elad Kalif <elad...@apache.org> wrote:
>>> > >> > > >>> >>
>>> > >> > > >>> >> > There are several reports from this user
>>> > >> > > >>> >> >
>>> > >> > > >>> >> > https://github.com/atharv9017
>>> > >> > > >>> >> >
>>> > >> > > >>> >> > I didn't find a way to report the user account to github.
>>> > >> > > >>> >> >
>>> > >> > > >>> >> > On Wed, 22 Jan 2025 at 06:41, Pavankumar Gopidesu <gopidesupa...@gmail.com> wrote:
>>> > >> > > >>> >> >
>>> > >> > > >>> >> > > Yes, still issues are coming.
>>> > >> > > >>> >> > >
>>> > >> > > >>> >> > > Regards,
>>> > >> > > >>> >> > > Pavan
>>> > >> > > >>> >> > >
>>> > >> > > >>> >> > > On Wed, Jan 22, 2025 at 4:35 AM Amogh Desai <amoghdesai....@gmail.com> wrote:
>>> > >> > > >>> >> > >
>>> > >> > > >>> >> > > > I saw a couple of such SPAM issues too.
>>> > >> > > >>> >> > > >
>>> > >> > > >>> >> > > > I also recall some SPAM comments on pull requests as well, so if any contributor sees any such SPAM message, please report it on Slack so that we can delete it and report it.
>>> > >> > > >>> >> > > >
>>> > >> > > >>> >> > > > Thanks & Regards,
>>> > >> > > >>> >> > > > Amogh Desai
>>> > >> > > >>> >> > > >
>>> > >> > > >>> >> > > > On Wed, Jan 22, 2025 at 8:45 AM Zhe You Liu <zhu424....@gmail.com> wrote:
>>> > >> > > >>> >> > > >
>>> > >> > > >>> >> > > > > I came across another strange issue: https://github.com/apache/airflow/issues/45837. It appears to be a copy-paste of https://github.com/apache/airflow/issues/45661 with just the issue title changed.
>>> > >> > > >>> >> > > > >
>>> > >> > > >>> >> > > > > On Wed, Jan 22, 2025 at 6:50 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>>> > >> > > >>> >> > > > >
>>> > >> > > >>> >> > > > > > I even got to this stage:
>>> > >> > > >>> >> > > > > >
>>> > >> > > >>> >> > > > > > > We've received a few new tickets from your account recently. If you'd like to add additional information you can add a comment to an existing ticket, or wait a few minutes before opening a new ticket.
>>> > >> > > >>> >> > > > > >
>>> > >> > > >>> >> > > > > > On Tue, Jan 21, 2025 at 11:49 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>>> > >> > > >>> >> > > > > >
>>> > >> > > >>> >> > > > > > > There are few more that I still saw after sending it. There is something going on bypassing GitHub filters.
>>> > >> > > >>> >> > > > > > > I hope they will manage to do something about it
>>> > >> > > >>> >> > > > > > >
>>> > >> > > >>> >> > > > > > > Last one is https://github.com/apache/airflow/issues/45867
>>> > >> > > >>> >> > > > > > >
>>> > >> > > >>> >> > > > > > > On Tue, Jan 21, 2025 at 11:46 PM Vikram Koka <vik...@astronomer.io.invalid> wrote:
>>> > >> > > >>> >> > > > > > >
>>> > >> > > >>> >> > > > > > >> Agreed.
>>> > >> > > >>> >> > > > > > >>
>>> > >> > > >>> >> > > > > > >> Thanks for flagging these Jarek!
>>> > >> > > >>> >> > > > > > >>
>>> > >> > > >>> >> > > > > > >> On Tue, Jan 21, 2025 at 2:34 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>>> > >> > > >>> >> > > > > > >>
>>> > >> > > >>> >> > > > > > >> > Seems that we have a flood of AI generated feature requests for Airflow. The issues look somewhat legitimate, with somewhat related content, but they are wordy and make no sense when you read them.
>>> > >> > > >>> >> > > > > > >> > Some examples:
>>> > >> > > >>> >> > > > > > >> >
>>> > >> > > >>> >> > > > > > >> > * https://github.com/apache/airflow/issues/45858
>>> > >> > > >>> >> > > > > > >> > * https://github.com/apache/airflow/issues/45856
>>> > >> > > >>> >> > > > > > >> > * https://github.com/apache/airflow/issues/45854
>>> > >> > > >>> >> > > > > > >> >
>>> > >> > > >>> >> > > > > > >> > All of them done by accounts with short history in GH and not much activity before
>>> > >> > > >>> >> > > > > > >> >
>>> > >> > > >>> >> > > > > > >> > There were quite a few more.
>>> > >> > > >>> >> > > > > > >> >
>>> > >> > > >>> >> > > > > > >> > I suggest we close such issues AND report authors to GitHub - hopefully we can help to battle the AI-generated traffic flood.
>>> > >> > > >>> >> > > > > > >> >
>>> > >> > > >>> >> > > > > > >> > J.