Re: [PATCH 04/13] libdwfl [4/13]: add dwfl_perf_sample_preferred_regs_mask

2025-04-22 Thread Aaron Merey
On Tue, Apr 22, 2025 at 10:24 AM Serhei Makarov  wrote:
>
> On Tue, Apr 22, 2025, at 10:17 AM, Aaron Merey wrote:
> >> One question this raises re: the Dwfl_Process_Tracker structure and where 
> >> its implementation should be located. In the patches, the Dwfl struct 
> >> implementation includes a pointer to a Dwfl_Process_Tracker. I’m not sure 
> >> if elfutils currently has a ‘lower’ level library refer to symbols from a 
> >> library that uses it. Would the circular dependency cause any problems?
> >
> > dwfl_st or dwflst prefixes work for me. I think I slightly prefer
> > dwfl_st. As for where to define Dwfl_Process_Tracker let's try to keep
> > it to the new dwfl_stacktraceP.h and if possible use forward
> > declarations to avoid circular dependencies. If it's necessary to
> > include more in libdwflP.h that should be ok since it's not publicly
> > exposed.
> Ok, I'll test and see if a forward decl of Dwflst_Process_Tracker in 
> libdwflP.h works as intended.
>
> Starting to code this and seeing how the API/code is looking,
> I would really argue for dwflst over dwfl_st.
> The reason is that it becomes confusing to the casual reader
> whether a symbol dwfl_st_FOO is a symbol st_FOO inside libdwfl
> or a symbol FOO inside libdwfl_st. But dwflst_FOO has no such
> ambiguity.
>
> The public header would be called libdwfl_stacktrace.h,
> for immediate clarity re: what is being included.

Fair point, dwflst works for me.

Aaron



Re: scraperbot protection - Patchwork and Bunsen behind Anubis

2025-04-22 Thread Jonathan Wakely
On Tue, 22 Apr 2025 at 13:36, Guinevere Larsen via Gcc  wrote:
>
> On 4/21/25 12:59 PM, Mark Wielaard wrote:
> > Hi hackers,
> >
> > TLDR; When using https://patchwork.sourceware.org or Bunsen
> > https://builder.sourceware.org/testruns/ you might now have to enable
> > javascript. This should not impact any scripts, just browsers (or bots
> > pretending to be browsers). If it does cause trouble, please let us
> > know. If this works out we might also "protect" bugzilla, gitweb,
> > cgit, and the wikis this way.
> >
> > We don't like to have to do this, but as some of you might have noticed,
> > Sourceware has been fighting the new AI scraperbots since the start of the
> > year. We are not alone in this.
> >
> > https://lwn.net/Articles/1008897/
> > https://arstechnica.com/ai/2025/03/devs-say-ai-crawlers-dominate-traffic-forcing-blocks-on-entire-countries/
> >
> > We have tried to isolate services more and block various ip-blocks
> > that were abusing the servers. But that has helped only so much.
> > Unfortunately the scraper bots are using lots of ip addresses
> > (probably by installing "free" VPN services that use normal user
> > connections as exit point) and pretending to be common
> > browsers/agents.  We seem to have to make access to some services
> > depend on solving a javascript challenge.
>
> Jan Wildeboer, on the fediverse, has a pretty interesting lead on how AI
> scrapers might be doing this:
> https://social.wildeboer.net/@jwildeboer/114360486804175788 (this is the
> last post in the thread because it was hard to actually follow the
> thread given the number of replies, please go all the way up and read
> all 8 posts).
>
> Essentially, there's a library developer that pays developers to just
> "include this library and a few more lines in your TOS". This library
> then allows the app to sell the end-user's bandwidth to clients of the
> library developer, allowing them to make requests. This is how big
> companies are managing to have so many IP addresses, so many of those
> being residential IP addresses, and it also means that by blocking those
> IP addresses we will be - necessarily - blocking real user traffic to
> our platforms.

It seems to me that blocking real users *who are running these shady
apps* is perfectly reasonable.

They might not realise it, but those users are part of the problem. If
we block them, maybe they'll be incentivised to stop using the shady
apps. And if users stop using those apps, maybe those app developers
will stop bundling the libraries that piggyback on users' bandwidth.

>
> I'm happy to see that the sourceware is moving to a more comprehensive
> solution, and if this is successful, I'd suggest that we also try to do
> that to the forgejo instance, and remove the IPs blocked because of this
> scraping.

For now, maybe. This thread already explained how to get around Anubis
by changing the UserAgent string - how long will it be until these
peer-to-business network libraries figure that out?


Re: scraperbot protection - Patchwork and Bunsen behind Anubis

2025-04-22 Thread Guinevere Larsen

On 4/22/25 10:06 AM, Jonathan Wakely wrote:

On Tue, 22 Apr 2025 at 13:36, Guinevere Larsen via Gcc  wrote:

On 4/21/25 12:59 PM, Mark Wielaard wrote:

Hi hackers,

TLDR; When using https://patchwork.sourceware.org or Bunsen
https://builder.sourceware.org/testruns/ you might now have to enable
javascript. This should not impact any scripts, just browsers (or bots
pretending to be browsers). If it does cause trouble, please let us
know. If this works out we might also "protect" bugzilla, gitweb,
cgit, and the wikis this way.

We don't like to have to do this, but as some of you might have noticed,
Sourceware has been fighting the new AI scraperbots since the start of the
year. We are not alone in this.

https://lwn.net/Articles/1008897/
https://arstechnica.com/ai/2025/03/devs-say-ai-crawlers-dominate-traffic-forcing-blocks-on-entire-countries/

We have tried to isolate services more and block various ip-blocks
that were abusing the servers. But that has helped only so much.
Unfortunately the scraper bots are using lots of ip addresses
(probably by installing "free" VPN services that use normal user
connections as exit point) and pretending to be common
browsers/agents.  We seem to have to make access to some services
depend on solving a javascript challenge.

Jan Wildeboer, on the fediverse, has a pretty interesting lead on how AI
scrapers might be doing this:
https://social.wildeboer.net/@jwildeboer/114360486804175788 (this is the
last post in the thread because it was hard to actually follow the
thread given the number of replies, please go all the way up and read
all 8 posts).

Essentially, there's a library developer that pays developers to just
"include this library and a few more lines in your TOS". This library
then allows the app to sell the end-user's bandwidth to clients of the
library developer, allowing them to make requests. This is how big
companies are managing to have so many IP addresses, so many of those
being residential IP addresses, and it also means that by blocking those
IP addresses we will be - necessarily - blocking real user traffic to
our platforms.

It seems to me that blocking real users *who are running these shady
apps* is perfectly reasonable.

They might not realise it, but those users are part of the problem. If
we block them, maybe they'll be incentivised to stop using the shady
apps. And if users stop using those apps, maybe those app developers
will stop bundling the libraries that piggyback on users' bandwidth.


If an IP mapped perfectly to one user, maybe. But I can't control what 
other users of the same ISP in the same area as me are doing, and we're 
sharing an IP. And worse, if I still lived with my family, no way would 
I be able to veto what my parents are using their phone for, so because 
they have a shady app I wouldn't be able to access systems? That doesn't 
seem fair at all.


Not to mention the fact that "read and understand the entirety of the 
TOS of every single app" assumes a pretty decent amount of free time for 
users that they may not have, and we wouldn't want to make open source 
even more hostile for people who are overwhelmed or overworked already. 
Of course people should, but having that as a requirement excludes 
people like... well, myself to be quite honest.





I'm happy to see that the sourceware is moving to a more comprehensive
solution, and if this is successful, I'd suggest that we also try to do
that to the forgejo instance, and remove the IPs blocked because of this
scraping.

For now, maybe. This thread already explained how to get around Anubis
by changing the UserAgent string - how long will it be until these
peer-to-business network libraries figure that out?


hopefully longer than the bubble lasts

--
Cheers,
Guinevere Larsen
She/Her/Hers



Re: [PATCH 04/13] libdwfl [4/13]: add dwfl_perf_sample_preferred_regs_mask

2025-04-22 Thread Serhei Makarov



On Tue, Apr 22, 2025, at 10:17 AM, Aaron Merey wrote:
>> One question this raises re: the Dwfl_Process_Tracker structure and where 
>> its implementation should be located. In the patches, the Dwfl struct 
>> implementation includes a pointer to a Dwfl_Process_Tracker. I’m not sure if 
>> elfutils currently has a ‘lower’ level library refer to symbols from a 
>> library that uses it. Would the circular dependency cause any problems?
>
> dwfl_st or dwflst prefixes work for me. I think I slightly prefer
> dwfl_st. As for where to define Dwfl_Process_Tracker let's try to keep
> it to the new dwfl_stacktraceP.h and if possible use forward
> declarations to avoid circular dependencies. If it's necessary to
> include more in libdwflP.h that should be ok since it's not publicly
> exposed.
Ok, I'll test and see if a forward decl of Dwflst_Process_Tracker in libdwflP.h 
works as intended.

Starting to code this and seeing how the API/code is looking,
I would really argue for dwflst over dwfl_st.
The reason is that it becomes confusing to the casual reader
whether a symbol dwfl_st_FOO is a symbol st_FOO inside libdwfl
or a symbol FOO inside libdwfl_st. But dwflst_FOO has no such
ambiguity.
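
A tiny illustration, with FOO standing in for any exported function
(these declarations are placeholders, not real symbols in either library):

  #include <elfutils/libdwfl.h>  /* for the Dwfl type */

  int dwfl_st_FOO (Dwfl *dwfl);  /* st_FOO inside libdwfl, or FOO inside libdwfl_st? */
  int dwflst_FOO (Dwfl *dwfl);   /* unambiguously a symbol of the new library */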

The public header would be called libdwfl_stacktrace.h,
for immediate clarity re: what is being included.

-- 
All the best,
Serhei
http://serhei.io


Re: [PATCH 04/13] libdwfl [4/13]: add dwfl_perf_sample_preferred_regs_mask

2025-04-22 Thread Aaron Merey
On Tue, Apr 22, 2025 at 9:53 AM Serhei Makarov  wrote:
> On Tue, Apr 22, 2025, at 9:45 AM, Aaron Merey wrote:
> >
> > Let's move the process_tracker interface as well for additional
> > flexibility to modify if needed. As for a name, I like
> > libdwfl_stacktrace. It clearly communicates the purpose of the library
> > and it's open to the possibility of supporting non-perf samples. It
> > also avoids namespace collision (the name libstacktrace is already
> > used by other projects).
> Agreed re: stability impacts.
>
> Is keeping a dwfl_ prefix for the apis acceptable? Inventing a new one might 
> lead to silly and verbose function names unless we come up with an 
> abbreviation like dwflst_
>
> One question this raises re: the Dwfl_Process_Tracker structure and where its 
> implementation should be located. In the patches, the Dwfl struct 
> implementation includes a pointer to a Dwfl_Process_Tracker. I’m not sure if 
> elfutils currently has a ‘lower’ level library refer to symbols from a 
> library that uses it. Would the circular dependency cause any problems?

dwfl_st or dwflst prefixes work for me. I think I slightly prefer
dwfl_st. As for where to define Dwfl_Process_Tracker let's try to keep
it to the new dwfl_stacktraceP.h and if possible use forward
declarations to avoid circular dependencies. If it's necessary to
include more in libdwflP.h that should be ok since it's not publicly
exposed.
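
To sketch the forward-declaration idea (the names and the exact field are
illustrative only, not taken from the patches):

  /* In libdwflP.h a forward declaration is enough for the pointer field,
     so libdwfl itself never needs the tracker's full definition (that
     would stay in the new library's private header).  */
  typedef struct Dwfl_Process_Tracker Dwfl_Process_Tracker;

  struct Dwfl
  {
    /* ... existing members unchanged ... */
    Dwfl_Process_Tracker *tracker;  /* may be NULL if no tracker is attached */
  };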

Aaron



Re: [PATCH 04/13] libdwfl [4/13]: add dwfl_perf_sample_preferred_regs_mask

2025-04-22 Thread Serhei Makarov



On Mon, Apr 21, 2025, at 12:29 AM, Aaron Merey wrote:
>
> I know we're close to the next release and I do want this work to be
> included.  My proposal is to move the current API out of libdwfl and
> into a new library, as-is but clearly marked as experimental and
> subject to possible API/ABI breakage.  This fits with the
> "experimental" label that eu-stacktrace still carries. The new
> functionality introduced in this series makes it into the release and
> we retain flexibility to iterate on the design without affecting
> libdwfl itself.
Do you want this to be done for the processtracker interface as well, or only 
the ~two functions that mention perf fields in the API?

I can do either option. Can you let me know what name you want for the library? 
eg libdwfl_perf libstacktrace libdwfl_stacktrace …

-- 
All the best,
Serhei
http://serhei.io


Re: scraperbot protection - Patchwork and Bunsen behind Anubis

2025-04-22 Thread Guinevere Larsen

On 4/21/25 12:59 PM, Mark Wielaard wrote:

Hi hackers,

TLDR; When using https://patchwork.sourceware.org or Bunsen
https://builder.sourceware.org/testruns/ you might now have to enable
javascript. This should not impact any scripts, just browsers (or bots
pretending to be browsers). If it does cause trouble, please let us
know. If this works out we might also "protect" bugzilla, gitweb,
cgit, and the wikis this way.

We don't like to have to do this, but as some of you might have noticed,
Sourceware has been fighting the new AI scraperbots since the start of the
year. We are not alone in this.

https://lwn.net/Articles/1008897/
https://arstechnica.com/ai/2025/03/devs-say-ai-crawlers-dominate-traffic-forcing-blocks-on-entire-countries/

We have tried to isolate services more and block various ip-blocks
that were abusing the servers. But that has helped only so much.
Unfortunately the scraper bots are using lots of ip addresses
(probably by installing "free" VPN services that use normal user
connections as exit point) and pretending to be common
browsers/agents.  We seem to have to make access to some services
depend on solving a javascript challenge.


Jan Wildeboer, on the fediverse, has a pretty interesting lead on how AI 
scrapers might be doing this: 
https://social.wildeboer.net/@jwildeboer/114360486804175788 (this is the 
last post in the thread because it was hard to actually follow the 
thread given the number of replies, please go all the way up and read 
all 8 posts).


Essentially, there's a library developer that pays developers to just 
"include this library and a few more lines in your TOS". This library 
then allows the app to sell the end-user's bandwidth to clients of the 
library developer, allowing them to make requests. This is how big 
companies are managing to have so many IP addresses, so many of those 
being residential IP addresses, and it also means that by blocking those 
IP addresses we will be - necessarily - blocking real user traffic to 
our platforms.


I'm happy to see that the sourceware is moving to a more comprehensive 
solution, and if this is successful, I'd suggest that we also try to do 
that to the forgejo instance, and remove the IPs blocked because of this 
scraping.




So we have installed Anubis https://anubis.techaro.lol/ in front of
patchwork and bunsen. This means that if you are using a browser that
identifies as Mozilla or Opera in their User-Agent you will get a
brief page showing the happy anime girl that requires javascript to
solve a challenge and get a cookie to get through. Scripts and search
engines should get through without. Also removing Mozilla and/or Opera
from your User-Agent will get you through without javascript.

We want to thank Xe Iaso, who has helped us set this up and worked 
with us over the Easter weekend solving some of our problems/typos.
Please check them out if you want to become one of their patrons as a thank-you.
https://xeiaso.net/notes/2025/anubis-works/
https://xeiaso.net/patrons/

Cheers,

Mark



--
Cheers,
Guinevere Larsen
She/Her/Hers



Re: [PATCH 04/13] libdwfl [4/13]: add dwfl_perf_sample_preferred_regs_mask

2025-04-22 Thread Aaron Merey
Hi Serhei,

On Tue, Apr 22, 2025 at 9:27 AM Serhei Makarov  wrote:
>
> On Mon, Apr 21, 2025, at 12:29 AM, Aaron Merey wrote:
> >
> > I know we're close to the next release and I do want this work to be
> > included.  My proposal is to move the current API out of libdwfl and
> > into a new library, as-is but clearly marked as experimental and
> > subject to possible API/ABI breakage.  This fits with the
> > "experimental" label that eu-stacktrace still carries. The new
> > functionality introduced in this series makes it into the release and
> > we retain flexibility to iterate on the design without affecting
> > libdwfl itself.
> Do you want this to be done for the processtracker interface as well, or only 
> the ~two functions that mention perf fields in the API?
>
> I can do either option. Can you let me know what name you want for the 
> library? eg libdwfl_perf libstacktrace libdwfl_stacktrace …

Let's move the process_tracker interface as well for additional
flexibility to modify if needed. As for a name, I like
libdwfl_stacktrace. It clearly communicates the purpose of the library
and it's open to the possibility of supporting non-perf samples. It
also avoids namespace collision (the name libstacktrace is already
used by other projects).

Aaron



Re: scraperbot protection - Patchwork and Bunsen behind Anubis

2025-04-22 Thread Jonathan Wakely
On Tue, 22 Apr 2025, 14:17 Guinevere Larsen,  wrote:
>
> On 4/22/25 10:06 AM, Jonathan Wakely wrote:
> > On Tue, 22 Apr 2025 at 13:36, Guinevere Larsen via Gcc  
> > wrote:
> >> On 4/21/25 12:59 PM, Mark Wielaard wrote:
> >>> Hi hackers,
> >>>
> >>> TLDR; When using https://patchwork.sourceware.org or Bunsen
> >>> https://builder.sourceware.org/testruns/ you might now have to enable
> >>> javascript. This should not impact any scripts, just browsers (or bots
> >>> pretending to be browsers). If it does cause trouble, please let us
> >>> know. If this works out we might also "protect" bugzilla, gitweb,
> >>> cgit, and the wikis this way.
> >>>
> >>> We don't like to have to do this, but as some of you might have noticed,
> >>> Sourceware has been fighting the new AI scraperbots since the start of the
> >>> year. We are not alone in this.
> >>>
> >>> https://lwn.net/Articles/1008897/
> >>> https://arstechnica.com/ai/2025/03/devs-say-ai-crawlers-dominate-traffic-forcing-blocks-on-entire-countries/
> >>>
> >>> We have tried to isolate services more and block various ip-blocks
> >>> that were abusing the servers. But that has helped only so much.
> >>> Unfortunately the scraper bots are using lots of ip addresses
> >>> (probably by installing "free" VPN services that use normal user
> >>> connections as exit point) and pretending to be common
> >>> browsers/agents.  We seem to have to make access to some services
> >>> depend on solving a javascript challenge.
> >> Jan Wildeboer, on the fediverse, has a pretty interesting lead on how AI
> >> scrapers might be doing this:
> >> https://social.wildeboer.net/@jwildeboer/114360486804175788 (this is the
> >> last post in the thread because it was hard to actually follow the
> >> thread given the number of replies, please go all the way up and read
> >> all 8 posts).
> >>
> >> Essentially, there's a library developer that pays developers to just
> >> "include this library and a few more lines in your TOS". This library
> >> then allows the app to sell the end-user's bandwidth to clients of the
> >> library developer, allowing them to make requests. This is how big
> >> companies are managing to have so many IP addresses, so many of those
> >> being residential IP addresses, and it also means that by blocking those
> >> IP addresses we will be - necessarily - blocking real user traffic to
> >> our platforms.
> > It seems to me that blocking real users *who are running these shady
> > apps* is perfectly reasonable.
> >
> > They might not realise it, but those users are part of the problem. If
> > we block them, maybe they'll be incentivised to stop using the shady
> > apps. And if users stop using those apps, maybe those app developers
> > will stop bundling the libraries that piggyback on users' bandwidth.
>
> If an IP mapped perfectly to one user, maybe. But I can't control what
> other users of the same ISP in the same area as me are doing, and we're
> sharing an IP. And worse, if I still lived with my family, no way would
> I be able to veto what my parents are using their phone for, so because
> they have a shady app I wouldn't be able to access systems?

Yeah, maybe.

> that doesn't seem fair at all

What's fair about those users of shady apps being able to prevent all
of us from accessing the systems we need to use?

The apps and ISPs and cloud providers that allow this traffic should
be excluded from the public net, and let market forces push people to
choose better services. Why reward the app developers who are
profiting this way, when we could discourage it?

If users are currently oblivious to the problems caused by their bad
choice of apps, they won't stop using the malware that creates a
botnet (just a semi-legal form because nobody had to get hacked to
install the malware). App stores and ISPs should certainly be
protecting users from being exploited this way, but that's not
happening.


> Not to mention the fact that "read and understand the entirety of the
> TOS of every single app" assumes a pretty decent amount of free time for
> users that they may not have, and we wouldn't want to make open source
> even more hostile for people who are overwhelmed or overworked already.

I don't think we should have to tolerate bad actors (in this case,
that means the malware apps, but by extension the users of those
malware apps).

> Of course people should, but having that as a requirement excludes
> people like... well, myself to be quite honest.
>
> >
> >> I'm happy to see that the sourceware is moving to a more comprehensive
> >> solution, and if this is successful, I'd suggest that we also try to do
> >> that to the forgejo instance, and remove the IPs blocked because of this
> >> scraping.
> > For now, maybe. This thread already explained how to get around Anubis
> > by changing the UserAgent string - how long will it be until these
> > peer-to-business network libraries figure that out?
> >
> hopefully longer than the bubble lasts


I expect them to figure it out 

Re: [PATCH 04/13] libdwfl [4/13]: add dwfl_perf_sample_preferred_regs_mask

2025-04-22 Thread Serhei Makarov



On Tue, Apr 22, 2025, at 9:45 AM, Aaron Merey wrote:
> Hi Serhei,
>
> On Tue, Apr 22, 2025 at 9:27 AM Serhei Makarov  wrote:
>>
>> On Mon, Apr 21, 2025, at 12:29 AM, Aaron Merey wrote:
>> >
>> > I know we're close to the next release and I do want this work to be
>> > included.  My proposal is to move the current API out of libdwfl and
>> > into a new library, as-is but clearly marked as experimental and
>> > subject to possible API/ABI breakage.  This fits with the
>> > "experimental" label that eu-stacktrace still carries. The new
>> > functionality introduced in this series makes it into the release and
>> > we retain flexibility to iterate on the design without affecting
>> > libdwfl itself.
>> Do you want this to be done for the processtracker interface as well, or 
>> only the ~two functions that mention perf fields in the API?
>>
>> I can do either option. Can you let me know what name you want for the 
>> library? eg libdwfl_perf libstacktrace libdwfl_stacktrace …
>
> Let's move the process_tracker interface as well for additional
> flexibility to modify if needed. As for a name, I like
> libdwfl_stacktrace. It clearly communicates the purpose of the library
> and it's open to the possibility of supporting non-perf samples. It
> also avoids namespace collision (the name libstacktrace is already
> used by other projects).
Agreed re: stability impacts.

Is keeping a dwfl_ prefix for the apis acceptable? Inventing a new one might 
lead to silly and verbose function names unless we come up with an abbreviation 
like dwflst_ 

One question this raises re: the Dwfl_Process_Tracker structure and where its 
implementation should be located. In the patches, the Dwfl struct 
implementation includes a pointer to a Dwfl_Process_Tracker. I’m not sure if 
elfutils currently has a ‘lower’ level library refer to symbols from a library 
that uses it. Would the circular dependency cause any problems?

-- 
All the best,
Serhei
http://serhei.io


Re: scraperbot protection - Patchwork and Bunsen behind Anubis

2025-04-22 Thread Chris Packham
Hi Mark,

On Tue, 22 Apr 2025, 4:00 am Mark Wielaard,  wrote:

> Hi hackers,
>
> TLDR; When using https://patchwork.sourceware.org or Bunsen
> https://builder.sourceware.org/testruns/ you might now have to enable
> javascript. This should not impact any scripts, just browsers (or bots
> pretending to be browsers). If it does cause trouble, please let us
> know. If this works out we might also "protect" bugzilla, gitweb,
> cgit, and the wikis this way.
>
> We don't like to have to do this, but as some of you might have noticed,
> Sourceware has been fighting the new AI scraperbots since the start of the
> year. We are not alone in this.
>
> https://lwn.net/Articles/1008897/
>
> https://arstechnica.com/ai/2025/03/devs-say-ai-crawlers-dominate-traffic-forcing-blocks-on-entire-countries/
>
> We have tried to isolate services more and block various ip-blocks
> that were abusing the servers. But that has helped only so much.
> Unfortunately the scraper bots are using lots of ip addresses
> (probably by installing "free" VPN services that use normal user
> connections as exit point) and pretending to be common
> browsers/agents.  We seem to have to make access to some services
> depend on solving a javascript challenge.
>
> So we have installed Anubis https://anubis.techaro.lol/ in front of
> patchwork and bunsen. This means that if you are using a browser that
> identifies as Mozilla or Opera in their User-Agent you will get a
> brief page showing the happy anime girl that requires javascript to
> solve a challenge and get a cookie to get through. Scripts and search
> engines should get through without. Also removing Mozilla and/or Opera
> from your User-Agent will get you through without javascript.
>
> We want to thank Xe Iaso, who has helped us set this up and worked
> with us over the Easter weekend solving some of our problems/typos.
> Please check them out if you want to become one of their patrons as a thank-you.
> https://xeiaso.net/notes/2025/anubis-works/
> https://xeiaso.net/patrons/


Ah that might explain a few things. We've seen sporadic failures in the
crosstool-ng CI builds (run via a github action) where a download of the
newlib snapshot failed (but worked fine when I tried the download manually).

The good news is that this finally prompted me to look at why we were
downloading something that should have been cached. I've fixed that now so
whatever extra load our builds were contributing should stop soon.

We might still get caught up in the bot detection when a package hosted on
sourceware.org is updated. I'm not sure if there is anything we can do
about that. I totally understand why this is necessary (AI scraper bots
have taken the crosstool-ng website down twice).


Re: scraperbot protection - Patchwork and Bunsen behind Anubis

2025-04-22 Thread Aurelien Jarno
On 2025-04-22 14:06, Jonathan Wakely wrote:
> On Tue, 22 Apr 2025 at 13:36, Guinevere Larsen via Gcc  
> wrote:
> >
> > On 4/21/25 12:59 PM, Mark Wielaard wrote:
> > > Hi hackers,
> > >
> > > TLDR; When using https://patchwork.sourceware.org or Bunsen
> > > https://builder.sourceware.org/testruns/ you might now have to enable
> > > javascript. This should not impact any scripts, just browsers (or bots
> > > pretending to be browsers). If it does cause trouble, please let us
> > > know. If this works out we might also "protect" bugzilla, gitweb,
> > > cgit, and the wikis this way.
> > >
> > > We don't like to have to do this, but as some of you might have noticed,
> > > Sourceware has been fighting the new AI scraperbots since the start of the
> > > year. We are not alone in this.
> > >
> > > https://lwn.net/Articles/1008897/
> > > https://arstechnica.com/ai/2025/03/devs-say-ai-crawlers-dominate-traffic-forcing-blocks-on-entire-countries/
> > >
> > > We have tried to isolate services more and block various ip-blocks
> > > that were abusing the servers. But that has helped only so much.
> > > Unfortunately the scraper bots are using lots of ip addresses
> > > (probably by installing "free" VPN services that use normal user
> > > connections as exit point) and pretending to be common
> > > browsers/agents.  We seem to have to make access to some services
> > > depend on solving a javascript challenge.
> >
> > Jan Wildeboer, on the fediverse, has a pretty interesting lead on how AI
> > scrapers might be doing this:
> > https://social.wildeboer.net/@jwildeboer/114360486804175788 (this is the
> > last post in the thread because it was hard to actually follow the
> > thread given the number of replies, please go all the way up and read
> > all 8 posts).
> >
> > Essentially, there's a library developer that pays developers to just
> > "include this library and a few more lines in your TOS". This library
> > then allows the app to sell the end-user's bandwidth to clients of the
> > library developer, allowing them to make requests. This is how big
> > companies are managing to have so many IP addresses, so many of those
> > being residential IP addresses, and it also means that by blocking those
> > IP addresses we will be - necessarily - blocking real user traffic to
> > our platforms.
> 
> It seems to me that blocking real users *who are running these shady
> apps* is perfectly reasonable.

How do you detect them? From my experience at other hosting places, 
those IPs just make a few requests per hour or per day, with a standard
User-Agent. As such it's difficult to differentiate them from normal 
users.

The problem is that you suddenly have hundreds of thousands of requests 
per hour from just a slightly lower number of IPs. And in the middle 
you also have legit users using IPs from the same net block.

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://aurel32.net


Sourceware infrastructure updates for Q1 2025

2025-04-22 Thread Mark Wielaard
Sourceware infrastructure community updates for Q1 2025

Sourceware has provided the infrastructure for core toolchain
and developer tools projects for more than 25 years.
https://sourceware.org/sourceware-25-roadmap.html

Over the last couple of years, Sourceware has transformed from a
purely volunteer effort into a professional organization with an
eight-person Project Leadership Committee, monthly open office hours,
multiple hardware services partners, expanded services, the Software
Freedom Conservancy as fiscal sponsor and a more diverse funding model
that allows us to enter into contracts with paid contractors or staff
when appropriate.

Every quarter we provide a summary of news about Sourceware, the core
toolchain and developer tools infrastructure, covering the previous 3
months.

- Sourceware Survey 2025
- Cyber Security update and secure project policy checklist
- AI/LLM scraperbots attacks and Anubis
- New RISC-V CI builders
- Q3 server moves
- Signed-commit census report
- Sourceware Organization, Contact and Open Office hours

= Sourceware Survey 2025

  The survey ran from Friday, 14 March to Monday, 31 March. In the end
  we got 103 (!) responses with a nice mix of developers, users and
  maintainers from various hosted projects.

  Full results can be found at https://sourceware.org/survey-2025

  Thanks to everybody who responded; this will help guide the PLC in
  allocating resources.

= Cyber Security update and secure project policy checklist

  Thanks to all the input during some of the Sourceware Open Office
  hours earlier this year, feedback given at Fosdem and discussions
  with the Software Freedom Conservancy, we have updated the Sourceware
  Cyber Security FAQ (really an explainer) to reflect the current
  state of the US Improving the Nation's Cybersecurity Executive Order
  and the EU Cyber Resilience Act.

  We also added a section with Recommendations for Sourceware hosted
  projects.

  https://sourceware.org/cyber-security-faq.html

  For Sourceware hosted projects that want to have a documented
  verifiable cybersecurity policy we now have a policy checklist your
  project can follow. Most are common sense things most projects
  already do.
  https://sourceware.org/cyber-security-faq.html#policy-checklist

  Also check out the Sourceware infrastructure security vision and
  sourceware security posture:
  https://sourceware.org/sourceware-security-vision.html
  https://sourceware.org/sourceware-wiki/sourceware_security_posture/

= AI/LLM scraperbots attacks and Anubis

  As some of you might have noticed, Sourceware has been fighting the
  new AI/LLM scraperbots since the start of the year. We are not alone in
  this.

  https://lwn.net/Articles/1008897/
  
https://arstechnica.com/ai/2025/03/devs-say-ai-crawlers-dominate-traffic-forcing-blocks-on-entire-countries/

  We have tried to isolate services more and block various ip-blocks
  that were abusing the servers. But that has helped only so much.
  Unfortunately the scraper bots are using lots of ip addresses
  (probably by installing "free" VPN services that use normal user
  connections as exit point) and pretending to be common
  browsers/agents.  We seem to have to make access to some services
  depend on solving a javascript challenge.

  So when using https://patchwork.sourceware.org or Bunsen
  https://builder.sourceware.org/testruns/ you might now have to
  enable javascript.  This should not impact any scripts, just
  browsers (or bots pretending to be browsers).  If it does cause
  trouble, please let us know.  If this works out we might also
  "protect" bugzilla, gitweb, cgit, and the wikis this way.

  Thanks to Xe Iaso, who has helped us set this up.
  Please check them out if you want to become one of their patrons as a thank-you.
  https://xeiaso.net/notes/2025/anubis-works/
  https://xeiaso.net/patrons/

= New RISC-V CI builders

  Thanks to RISC-V International we got 3 new buildbot CI workers.
  One HiFive Premier P550
  https://www.sifive.com/boards/hifive-premier-p550
  and two Banana Pi BPI-F3
  https://wiki.banana-pi.org/Banana_Pi_BPI-F3

  They have been used for testing the Valgrind risc-v backend that
  will be introduced with Valgrind 3.25.0 later this month.

  The P550 now runs a gdb and full testsuite build. One bpi-f3 runs
  glibc and the full testsuite. The other bpi-f3 runs a gcc bootstrap
  and full testsuite. The bpi-f3 has an 8-core SpacemiT K1 supporting
  rvv 1.0.

  Unfortunately we had to shut down the Pioneer box, which was faster
  than the above machines, but just overheated too often and then
  needed manual intervention.

= Q3 server moves

  Somewhere in Q3 the Red Hat community cage, which hosts two of our
  servers, will move to another data center
  https://www.osci.io/tenants/

  We don't know the precise date yet. Please contact us ASAP if there
  is a specific date where your project really cannot tolerate any
  down time. The data centers are not too far apart and we hope any
  downtime will be 

Re: [PATCH 04/13] libdwfl [4/13]: add dwfl_perf_sample_preferred_regs_mask

2025-04-22 Thread Serhei Makarov



On Tue, Apr 22, 2025, at 10:17 AM, Aaron Merey wrote:
> dwfl_st or dwflst prefixes work for me. I think I slightly prefer
> dwfl_st. As for where to define Dwfl_Process_Tracker let's try to keep
> it to the new dwfl_stacktraceP.h and if possible use forward
> declarations to avoid circular dependencies. If it's necessary to
> include more in libdwflP.h that should be ok since it's not publicly
> exposed.
Reworked 6/12 patches so far, but running into a linking problem:

Making all in libdw
make  all-am
  CCLD libdw.so
/usr/bin/ld: ../libdwfl/libdwfl_pic.a(dwfl_module_getdwarf.os): in function 
`open_elf':
/home/serhei/Documents/elfutils/libdwfl/dwfl_module_getdwarf.c:86:(.text+0xa13):
 undefined reference to `dwflst_tracker_cache_elf'

This seems to be an issue specific to libdwfl_pic.a rather than libdwfl.a.
The public function dwflst_tracker_cache_elf is called from 
dwfl_module_getdwarf.c.
Calling a public function across the library boundary should be no problem,
but we run into this -- circular dependency?

I'm not knowledgeable enough about how the Automake stuff works to know the
immediate resolution; I'm concerned this delays testing the reworked patches.
Let's see if I can figure this out in the next 30min.

-- 
All the best,
Serhei
http://serhei.io


Re: [PATCH 04/13] libdwfl [4/13]: add dwfl_perf_sample_preferred_regs_mask

2025-04-22 Thread Frank Ch. Eigler
Hi -

>   CCLD libdw.so
> /usr/bin/ld: ../libdwfl/libdwfl_pic.a(dwfl_module_getdwarf.os): in function 
> `open_elf':
> Calling a public function across the library boundary should be no problem,
> but we run into this -- circular dependency?

It might just require the libdw.so LDFLAGS/LDADD to include the
new ../path/to/libdwflst.a static library.

- FChE



Re: [PATCH 04/13] libdwfl [4/13]: add dwfl_perf_sample_preferred_regs_mask

2025-04-22 Thread Serhei Makarov



On Tue, Apr 22, 2025, at 6:26 PM, Frank Ch. Eigler wrote:
> Hi -
>
>>   CCLD libdw.so
>> /usr/bin/ld: ../libdwfl/libdwfl_pic.a(dwfl_module_getdwarf.os): in function 
>> `open_elf':
>> Calling a public function across the library boundary should be no problem,
>> but we run into this -- circular dependency?
>
> It might just require the libdw.so LDFLAGS/LDADD to include the
> new ../path/to/libdwflst.a static library.
Yep, found the entries in libdw/Makefile.am I missed on my first pass-through. 
Thanks!
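
For reference, the missing piece was of this general shape (a sketch from
memory rather than the literal diff, so the variable and archive names here
are assumptions):

  # libdw/Makefile.am: the new library's convenience archive has to be
  # linked into libdw.so alongside libdwfl_pic.a and the other *_pic.a
  # archives, otherwise its symbols stay unresolved at CCLD libdw.so time.
  libdw_so_LIBS = ... ../libdwfl/libdwfl_pic.a \
                  ../libdwfl_stacktrace/libdwfl_stacktrace_pic.a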

-- 
All the best,
Serhei
http://serhei.io