This is a really helpful delineation of the issues. Thank you, Maruan, for
this and for all of your support with the server.

Shall I open a ticket on LEGAL's Jira?

On Wed, Jan 15, 2025 at 3:55 AM sahy...@fileaffairs.de <
sahy...@fileaffairs.de> wrote:

> Hi Tim,
>
> IMHO there are several parts to it.
>
> a) serving content which might look like other corporations' sites can be
> interpreted as phishing
> b) scraping and storing copyrighted content
> c) scraping and storing content containing personal data
>
> a) is being dealt with in the current form. As long as we don't
> publicly serve the files we are fine. We could also allow password
> protected https access if that has a benefit over ssh.
> b) scraping copyrighted information is typically OK (there are legal
> cases where this has been decided) although there might be cases where
> we need to remove individual files
> c) scraping and storing personal data is mostly not OK with GDPR and
> other acts without permission. This becomes very difficult to handle.
> E.g. if one uploaded a file to a bug tracker one could argue that if
> that file contained personal data by uploading one gave permission to
> use it within the context of the bug tracking and the dev process
> behind it. That doesn't include permission to load the file from that
> system and use it in a different context.
>
> I think until c) is sorted out we cannot allow access in a wider context,
> and we may even need to reconsider whether we can use it at all, even
> though it is very beneficial.
>
> Maybe we can have a chat with legal about that.
>
> BR
> Maruan
>
>
>
>
> Am Dienstag, dem 14.01.2025 um 08:17 -0500 schrieb Tim Allison:
> > Hi Stefan,
> >
> >   I'm sorry for this sudden change. I'm hoping that we can find a way
> > to
> > make this all work again, but there are complexities. Part of the
> > challenge
> > is that the liability is spread across several organizations and
> > individuals; part of the challenge is everything to do with the
> > varying
> > global legal/privacy requirements around crawled data. And there are
> > other
> > challenges.
> >
> >   These corpora have been critical to numerous parsing projects at
> > the ASF
> > and to devs and projects outside of ASF.   I've heard from a few
> > others
> > offline who are also affected by this.
> >
> >
> > All,
> >   What are our priorities? How can we move forward? Some options that
> > I see:
> >
> > 0) nuclear option: shut down the server entirely
> > 1) continue as we have it now -- no http/s access
> > 2) host reports/metadata only via https
> > 3) host "packaged" corpora in zips (password protected?) via https
> > 4) password protect https access to the corpora
> > 5) not a viable option: turn everything back on
> > 6) not a viable option: turn everything back on with a strict
> > robots.txt
> > policy
> >
> >   Any other options? What are our preferences?
> >
> >           Best,
> >
> >                 Tim
> >
> > On Sat, Jan 11, 2025 at 9:01 AM stefan6419846
> > <stefan6419...@gmail.com>
> > wrote:
> >
> > > We at pypdf (https://github.com/py-pdf/pypdf) have been hit by the
> > > unexpected shutdown of the service and were glad to at least find
> > > this
> > > indirect announcement. Nevertheless, it seems like we have to find
> > > a
> > > suitable alternative for the previously used govdocs1 PDF files
> > > from
> > > your server, as the official govdocs1 sources do not expose the
> > > single
> > > PDF files directly.
> > >
> > > Thanks for hosting these files in the past.
> > >
> > > Best regards,
> > > Stefan
> > >
> > > On 2025/01/09 01:36:59 Tim Allison wrote:
> > > > All,
> > > >  We've gotten a handful of takedown requests recently. I had
> > > > initially
> > > > envisioned public sharing of files as a key component of our
> > > > server. We can
> > > > still use the files and offer read access to fellow file
> > > > researchers. I'm
> > > > not sure I want to deal with further takedown requests.
> > > >  As an intermediate step, we could ask robots not to crawl the
> > > > data, but
> > > > that's not reliable.
> > > >  So, in lieu of that, with heavy heart, I ask if it is time to
> > > > close off
> > > > public access?
> > > >   WDYT?
> > > >
> > > >           Best,
> > > >
> > > >                     Tim
> > > >
> > >
>
>
