On Fri, Dec 25, 2020 at 11:17 AM Daniel Shahaf <d...@daniel.shahaf.name> wrote:
>...
> > I'll figure out a way to have the mboxes downloadable. If I understand
> > Google's documentation of robots.txt they don't care about robots.txt if a
> > specific URL is linked from somewhere indexable, they will index it anyway.
> > Maybe just make one big tarball of everything?
>
> One big tarball would be wasteful to consume (would have to download
> everything) and to produce (would need to, basically, «cp everything.tgz
> tmp.tgz; tar -zcf - $new >> tmp.tgz; mv tmp.tgz everything.tgz», and you can
> see that's O(#everything) rather than O(appended stuff)). Would rather avoid
> it if possible.
>
> Not sure what to do about robots. I suppose we could set <link
> rel="canonical"> in the HTTP headers when serving the rfc822 files (example
> in <https://en.wikipedia.org/wiki/Canonical_link_element#HTTP>)?

I thought robots.txt can exclude subdirs. So just cut off (say)
svn-haxx.apache.org/mbox/

I'm not too worried about Google crawling the mboxes, as they'll likely do
it just once and never again (by keeping the etag and/or mtime).

>...
> > I couldn't figure out puppet, the links was 404 for me. I've created a
> > request in Jira and I hope someone will take a look:
> > https://issues.apache.org/jira/browse/INFRA-21230
>
> I think the github repository is restricted to Apache committers only, so
> you'll need to enter your github username on id.apache.org in order to get
> access to that URL. If you don't have a github account, there ought to be
> a mirror of the repository on *.apache.org somewhere (at least, if Infra's
> following the same policy PMCs do).

Correct: committers only. And only after linking accounts via
https://gitbox.apache.org/setup/ as Nathan noted (and we forgot to mention
to DSahlberg).

If you do not have a GitHub account, or do not want one (say, because you
don't want to accept their T&Cs), then you can use the repository via
gitbox.apache.org (ask on Slack for the link; I prefer not to post it here).

Cheers,
-g
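
P.S. A minimal sketch of the robots.txt subdirectory exclusion suggested
above, checked with Python's standard urllib.robotparser. The rule set, the
is_allowed helper, and the file name 2020-12.mbox are assumptions for
illustration only, not the site's actual configuration. It shows only what a
well-behaved crawler would decline to fetch; as noted in the quoted text, a
URL linked from elsewhere may still end up indexed.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents: exclude everything under /mbox/
# for all user agents, leave the rest of the site crawlable.
ROBOTS_TXT = """\
User-agent: *
Disallow: /mbox/
"""


def is_allowed(url, agent="*"):
    """Return True if `agent` may fetch `url` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # The mbox file name below is made up for the example.
    for url in (
        "https://svn-haxx.apache.org/index.html",
        "https://svn-haxx.apache.org/mbox/2020-12.mbox",
    ):
        print(url, is_allowed(url))
```

The Disallow rule is a path-prefix match, so a single line covers every file
under that directory.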