Daniel Sahlberg wrote on Mon, 21 Dec 2020 08:55 +0100:
> On Fri, 27 Nov 2020 at 19:26, Daniel Shahaf <d...@daniel.shahaf.name> wrote:
> 
> > Sounds good.  Nathan, Daniel Sahlberg — could you work with Infra on
> > getting the data over to ASF hardware?
> >  
> 
> I have been given access to svn-qavm and uploaded a tarball of the website
> (including mboxes). I'm a bit reluctant to unpack it since it takes almost
> 7 GB, and there is only 14 GB of disk space remaining. Is it OK to unpack or
> should we ask Infra for more disk space?
> 

I vote to ask for more disk space, especially considering that some
percentage is reserved for uid=0's use.
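
For what it's worth, the reserved share is easy to check before asking:
statvfs reports both the total free blocks and the free blocks available
to unprivileged users.  A minimal Python sketch, assuming a POSIX system;
the path "/", the 20% headroom, and the helper names are placeholders of
mine, not measurements or tooling from svn-qavm:

```python
import os

GIB = 1024 ** 3


def free_space(path="/"):
    """Return (free_for_root, free_for_users) in bytes for the filesystem holding path."""
    st = os.statvfs(path)
    free_for_root = st.f_bfree * st.f_frsize    # includes blocks reserved for uid 0
    free_for_users = st.f_bavail * st.f_frsize  # what a non-root account can actually use
    return free_for_root, free_for_users


def fits(needed_bytes, path="/", headroom=0.2):
    """True if needed_bytes fits while leaving a `headroom` fraction of user-available space unused."""
    _, free_for_users = free_space(path)
    return needed_bytes <= free_for_users * (1 - headroom)


if __name__ == "__main__":
    root_free, user_free = free_space("/")
    print(f"free (root):  {root_free / GIB:.1f} GiB")
    print(f"free (users): {user_free / GIB:.1f} GiB")
    # Hypothetical figure: roughly the unpacked size mentioned in this thread.
    print("7 GiB fits with 20% headroom:", fits(7 * GIB))
```

The gap between the two numbers is the part reserved for uid 0, so the
second one is the figure to compare against the size of the unpacked
tarball.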

> > Note that svn-org@ doesn't have an equivalent @s.a.o list, and that, as
> > mentioned upthread, the post-migration (from tigris.org to apache.org)
> > mboxes may be in a different order than the official ones, and shouldn't
> > be "deduplicated".
> >  
> 
> The mboxes will be preserved but I don't plan to make them available for
> download (since they are not available from lists.a.o or mail-archives.a.o).
> 

Please do make them available for download.  Being able to download the
raw data is useful for both backup and perusal purposes, and I doubt
the bandwidth requirements would be a problem.  (Might want
a robots.txt entry, though?)

Regarding the behaviour of the existing archives, see
<https://mail-archives.apache.org/mod_mbox/subversion-dev/202012.mbox>
(which used to also be available via
https://subversion.apache.org/mail/, but nowadays that just redirects
to a landing page ☹).  I don't know whether lists.a.o has equivalent
functionality, but then again, lists.a.o has had vendor lock-in baked
into it from day one, so a lack of a "download raw rfc822 data" feature
might simply be another form of that.

The mod_mbox product is owned by dev@httpd.

> > You indicate a desire to maintain URLs. Do you have some ideas on that?
> >
> > Each individual message .shtml file contains the message-id in
> > a comment.  We can extract the comments and build a redirector around
> > them.  (By the way, this is basically the same exercise that Infra must
> > have solved back when Sebb received that CSV file from the lists.a.o
> > vendor, so there may be an opportunity for code reuse.)  Of course, the
> > full rsync likely has the same info available less scrapily.
> >
> > Or, as mentioned above, the .shtml files could just be preserved
> > statically (plus or minus an appropriate message in the list of years on
> > the /${listname}/ page).  In fact, I'm having trouble coming up with
> > a reason _not_ to serve a static snapshot of the pages, even if we do
> > build a redirector.
> >  
> 
> No redirector as of now, only the static [s]html pages.
> 

<glass type="half-full">Yay!</glass>
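
Should a redirector be wanted later, the extraction step mentioned above
could be sketched roughly as follows.  This is only an illustration in
Python: I am ASSUMING the comment looks like
"<!-- message-id: <foo@example.org> -->", which needs checking against
the real .shtml files, and the function names are made up for the
example.

```python
import re
from pathlib import Path

# ASSUMPTION: the comment has the form <!-- message-id: <foo@example.org> -->.
# Adjust the pattern once the real archive pages have been inspected.
MSGID_COMMENT = re.compile(
    r"<!--\s*message-id:\s*<([^<>\s]+)>\s*-->", re.IGNORECASE
)


def extract_message_id(html_text):
    """Return the message-id found in the first matching comment, or None."""
    match = MSGID_COMMENT.search(html_text)
    return match.group(1) if match else None


def build_map(root):
    """Map each .shtml path (relative to root) to its message-id.

    Pages without a recognisable comment are skipped, not guessed at.
    """
    root = Path(root)
    mapping = {}
    for page in sorted(root.rglob("*.shtml")):
        text = page.read_text(encoding="utf-8", errors="replace")
        msgid = extract_message_id(text)
        if msgid is not None:
            mapping[page.relative_to(root).as_posix()] = msgid
    return mapping
```

The resulting path-to-message-id table is all a redirector would need as
input; how to serve it (and where to redirect to) is left open here.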

> I will need some help from root to:

Not me, I'm afraid; ENOTIME.

> 1. Install a web server. nginx? (just kidding)

Apache HTTP Server would probably be a better choice since more dev@svn
and Infra people are familiar with it, but it's a fair question to ask.
(Cf. INFRA-7524)

> 2. Set up httpd.conf
> 3. Configure a DocumentRoot where I can put the files. Doesn't seem right
> to store them in /home

Hmm.  These things should all be done via puppet.  I'm not sure what's
best practice nowadays regarding writing puppet PRs and testing them,
though.

Cheers,

Daniel
