For example, when scraping a website we might also scrape personal data.
Under GDPR, storing that data is not permitted without explicit
consent.

On Thursday, 09.01.2025 at 08:18 +0100,
sahy...@fileaffairs.de wrote:
> On Thursday, 09.01.2025 at 08:07 +0100, Tilman Hausherr wrote:
> > How about a simple password protection?
> 
> Is HTTPS access needed, or would SSH do?
> 
> > This would be helpful during regression tests.
> 
> We have to restrict access. It might still be necessary to delete
> some of the content where it is not public or not within the
> intended use. That is, grabbing a website and storing its content
> still poses some risks.
> 
> BR
> Maruan 
> 
> 
> > Tilman 
> > 
> > 
> > 
> > -- Original Message --
> > From: Andreas Lehmkühler <andr...@lehmi.de.invalid>
> > Subject: Re: Turning off public access to the regression corpora?
> > Date: 09.01.2025, 07:53
> > To: corpora-dev@tika.apache.org
> > 
> > Hi,
> > 
> > I agree with Maruan. :-(
> > 
> > Just out of curiosity, the original source of those files is some
> > public web server, isn't it?
> > 
> > Andreas
> > 
> > On 09.01.25 at 05:27, Maruan Sahyoun wrote:
> > > Hi,
> > > 
> > > This is unfortunate, but as it poses a risk of legal action
> > > against the ASF and also against me as the host of the site,
> > > I think we should stop that.
> > > 
> > > BR
> > > Maruan
> > > 
> > > > On 09.01.2025 at 02:37, Tim Allison
> > > > <talli...@apache.org> wrote:
> > > > 
> > > > All,
> > > > We've gotten a handful of takedown requests recently. I had
> > > > initially
> > > > envisioned public sharing of files as a key component of our
> > > > server. We can
> > > > still use the files and offer read access to fellow file
> > > > researchers. I'm
> > > > not sure I want to deal with further takedown requests.
> > > > As an intermediate step, we could ask robots not to crawl the
> > > > data, but
> > > > that's not reliable.
> > > > > So, in lieu of that, with a heavy heart, I ask if it is time to
> > > > close off
> > > > public access?
> > > >   WDYT?
> > > > 
> > > >           Best,
> > > > 
> > > >                     Tim
> > 
> > 
> 
