On Tue, 26 Mar 2019 at 08:25, Bertrand Delacretaz <bdelacre...@apache.org> wrote:
>
> On Mon, Mar 25, 2019 at 10:16 AM Bertrand Delacretaz
> <bdelacre...@apache.org> wrote:
> > ...I have saved the contents of https://wiki.apache.org/incubator/ at
> > https://svn.apache.org/repos/private/pmc/incubator/wiki-archive-march-2019/
> > ...
>
> FWIW, as someone was asking how that was done, I just used
>
>   wget -r -l5 -np https://wiki.apache.org/incubator/
>
> and then semi-manually removed the help pages based on their names,
> which are in several languages.

Unfortunately that does not seem to capture all the pages, for example
the following is missing:

https://wiki.apache.org/incubator/WookieProposal

Something must have gone wrong with the download.

I did another download using the following command:

  wget -r -np -l1 --reject-regex '(.*)\?(.*)' http://wiki.apache.org/incubator/TitleIndex

The TitleIndex should have links to every page, so there's no need to
follow links further.
Also the regex stops it from asking for raw pages etc.

Also used the following .wgetrc

  header = Accept-Language: en-us,en;q=0.5
  header = Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
  header = Connection: keep-alive
  referer = /
  robots = off
  random_wait = on
  wait = 1

OK to add the missing pages to the archive?

S.
> -Bertrand
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> For additional commands, e-mail: general-h...@incubator.apache.org
>
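
For anyone wondering what the --reject-regex pattern above actually filters, here is a rough Python sketch. It is only an approximation: wget applies the pattern to the complete URL using POSIX regex syntax by default, while this uses Python's re module, which happens to behave the same for this simple pattern. The helper name is_rejected and the "?action=raw" sample URL are illustrative assumptions, not taken from the actual download.

```python
import re

# Same pattern as given to wget's --reject-regex in the command above.
# It matches any URL that contains a literal "?" (i.e. a query string).
REJECT = re.compile(r"(.*)\?(.*)")


def is_rejected(url: str) -> bool:
    """Hypothetical helper: True if the URL would be skipped by the pattern."""
    return REJECT.search(url) is not None


if __name__ == "__main__":
    # Illustrative URLs only; the second one is an assumed "raw page" link.
    samples = [
        "http://wiki.apache.org/incubator/WookieProposal",
        "http://wiki.apache.org/incubator/WookieProposal?action=raw",
    ]
    for url in samples:
        print("rejected" if is_rejected(url) else "kept", url)
```

Plain page URLs pass through, while anything with a query string is dropped, which is why combining this with -l1 from the TitleIndex should fetch one copy of each page and nothing else.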