On Thu, 26 Dec 2019 16:13:33 +0000
"goleo ." <goleo...@gmail.com> wrote:

> I was wondering how much space distfiles on "ftp" take, so because
> I couldn't see that in my web browser clearly, I downloaded the page
> https://ftp.openbsd.org/pub/OpenBSD/distfiles/ as distfiles.txt

With wget, you can download the HTML of a web page, and also recurse
into links within it. 

$ wget -r -l 0 -A '*.html' --no-parent -O everything.html \
    https://ftp.openbsd.org/pub/OpenBSD/distfiles/

This command recurses to unlimited depth (-l 0) without ascending into
the parent directory (--no-parent), keeps only .html files (-A), from
which more links can be extracted, and writes everything into a single
"everything.html" file (-O).

After a few minutes of running, with only ~1.7 MiB of HTML downloaded,
it started trying to recurse into a lot of nonexistent directories, so
I cut it short there. The figure may therefore be incomplete.

$ grep -E '[0-9]$' everything.html | sed 's|.* \([0-9]*\)$|\1|' |
    awk '{sum+=$1} END{print sum / 1024 / 1024}'
65629
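
To show what each stage of that pipeline does, here is a minimal sketch
run on a made-up listing. The file names, dates, and sizes are
hypothetical, and the line layout is only my assumption about how the
index page looks; the real page may differ.

```shell
# Hypothetical listing lines (made-up values): link, date, time, size.
listing='<a href="a.tar.gz">a.tar.gz</a>  01-Jan-2019 00:00  1048576
<a href="b.tar.gz">b.tar.gz</a>  02-Jan-2019 00:00  2097152
<a href="sub/">sub/</a>  03-Jan-2019 00:00  -
</pre><hr>'

# grep keeps only lines that end in a digit (drops directories and markup),
# sed strips everything up to the last space, leaving the trailing number,
# awk sums those numbers and divides by 1024 twice.
total=$(printf '%s\n' "$listing" |
    grep -E '[0-9]$' |
    sed 's|.* \([0-9]*\)$|\1|' |
    awk '{sum+=$1} END{print sum / 1024 / 1024}')

echo "$total"   # prints 3
```

The directory line ends in "-" and the markup line ends in ">", so
neither contributes to the sum.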


The sum of all file sizes, which are listed in kibibytes, divided by
1024^2 to turn it into gibibytes, comes to 65629 gibibytes, or about
64 tebibytes.
This number seems a little absurd, and I'm not sure whether I made a
mistake. It does not seem completely implausible either, however: the
tree does have files dating all the way back to 1990.
https://ftp.openbsd.org/pub/OpenBSD/distfiles/ja-fonts/
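
The unit arithmetic can be checked with one more division by 1024. This
is only a sketch: the first line follows the reading used above (sizes
listed in kibibytes), while the second is a what-if, assuming the
listing reported plain bytes instead, which I have not verified.

```shell
figure=65629   # value printed by the awk pipeline

# One further division by 1024 moves up one binary prefix.
next=$(awk -v f="$figure" 'BEGIN{printf "%.1f", f / 1024}')

echo "sizes in KiB:   $figure GiB = $next TiB"
# Assumption, not confirmed: the listing is in bytes rather than KiB.
echo "sizes in bytes: $figure MiB = $next GiB"
```

Either way the number after the division is 64.1; only the unit
attached to it depends on what the listing actually reports.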
