Daniel Sahlberg wrote on Mon, 21 Dec 2020 08:55 +0100:
> On Fri, 27 Nov 2020 at 19:26, Daniel Shahaf <d...@daniel.shahaf.name> wrote:
>
> > Sounds good. Nathan, Daniel Sahlberg -- could you work with Infra on
> > getting the data over to ASF hardware?
> >
>
> I have been given access to svn-qavm and uploaded a tarball of the website
> (including mboxes). I'm a bit reluctant to unpack it since it takes almost
> 7GB, and there is only 14 GB disk space remaining. Is it ok to unpack or
> should we ask Infra for more disk space?
>

I vote to ask for more disk space, especially considering that some
percentage is reserved for uid=0's use.

> > Note that svn-org@ doesn't have an equivalent @s.a.o list, and that, as
> > mentioned upthread, the post-migration (from tigris.org to apache.org)
> > mboxes may be in a different order than the official ones, and shouldn't
> > be "deduplicated".
> >
>
> The mboxes will be preserved but I don't plan to make them available for
> download (since they are not available from lists.a.o or mail-archives.a.o).
>

Please do make them available for download. Being able to download the
raw data is useful for both backup and perusal purposes, and I doubt the
bandwidth requirements would be a problem. (Might want a robots.txt
entry, though?)

Regarding the behaviour of the existing archives, see
<https://mail-archives.apache.org/mod_mbox/subversion-dev/202012.mbox>
(which used to also be available via https://subversion.apache.org/mail/,
but nowadays that just redirects to a landing page ☹). I don't know
whether lists.a.o has equivalent functionality, but then again,
lists.a.o has had vendor lock-in baked into it from day one, so a lack
of a "download raw rfc822 data" feature might simply be another form of
that. The mod_mbox product is owned by dev@httpd.

> > You indicate a desire to maintain URLs. Do you have some ideas on that?
> >
> > Each individual message .shtml file contains the message-id in
> > a comment. We can extract the comments and build a redirector around
> > them. (By the way, this is basically the same exercise that Infra must
> > have solved back when Sebb received that CSV file from the lists.a.o
> > vendor, so there may be an opportunity for code reuse.) Of course, the
> > full rsync likely has the same info available less scrapily.
> >
> > Or, as mentioned above, the .shtml files could just be preserved
> > statically (plus or minus an appropriate message in the list of years on
> > the /${listname}/ page).
> > In fact, I'm having trouble coming up with
> > a reason _not_ to serve a static snapshot of the pages, even if we do
> > build a redirector.
> >
>
> No redirector as of now, only the static [s]html pages.
>

<glass type="half-full">Yay!</glass>

> I will need some help from root to:

Not me, I'm afraid; ENOTIME.

> 1. Install a web server. nginx? (just kidding)

Apache HTTP Server would probably be a better choice since more dev@svn
and Infra people are familiar with it, but it's a fair question to ask.
(Cf. INFRA-7524)

> 2. Setup httpd.conf
> 3. Configure a DocumentRoot where I can put the files. Doesn't seem right
> to store them in /home

Hmm. These things should all be done via puppet. I'm not sure what's
best practice nowadays regarding writing puppet PRs and testing them,
though.

Cheers,

Daniel