WARC files have their origins at the Internet Archive. They are essentially a persistent hash table of the form

    key   --- <URL the way it is in the Wild-Wild-Web>
    value --- <the file>

http://www.digitalpreservation.gov/formats/fdd/fdd000236.shtml

The issue with the current Debian sites seems to be that tools like grab-site

https://github.com/ludios/grab-site

create files like (~30MiB)

http://temporary.softf1.com/2017/bugs/www.debian.org-devel-2016-12-28-ec5f8b13-00000.warc.gz

that fail to be viewed with a tool like warc-proxy

https://github.com/alard/warc-proxy

With the exception of large files

https://github.com/alard/warc-proxy/issues/5

warc-proxy actually works fine. The WARC creation and viewing tools that I use can be downloaded from (~9MiB)

http://archive.softf1.com/2016/software/2016_12_xx_WARC_tools.tar.xz

However, some sites, including the Debian web sites, fail to be "WARC-able". It would be nice if this were fixed, especially given that one never knows when something becomes censored. Please keep in mind that there is no limit to the absurdity of censorship. Some day photos of pigeons might be banned, because maybe some religious sect or political party finds them offensive or otherwise endangering their ability to keep the dumb ones working as slaves for them, paying taxes, etc.

warc-proxy works fine with files that have a size of ~200MiB, meaning that the aforementioned

http://temporary.softf1.com/2017/bugs/www.debian.org-devel-2016-12-28-ec5f8b13-00000.warc.gz

is not "too big".

Regards,
martin.v...@softf1.com
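
To illustrate the "persistent hash table" view of a WARC file described above, here is a minimal Python sketch that reads the records of a .warc.gz file and builds a URL-to-content table. It is not how grab-site or warc-proxy work internally. The function names read_warc_records and index_by_url are made up for this example, and the parser assumes simple, well-formed records (a "WARC/" version line, "Name: value" headers without continuation lines, and a Content-Length header). Something like this might help narrow down which record in a failing file is the problematic one.

```python
import gzip


def read_warc_records(stream):
    """Yield (headers, body) for each record in an uncompressed WARC byte stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of input
        if not line.strip():
            continue  # blank separator lines between records
        if not line.startswith(b"WARC/"):
            raise ValueError("expected a WARC version line, got %r" % line[:40])
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # end of the record's header block
            name, _, value = line.decode("utf-8", "replace").partition(":")
            headers[name.strip().lower()] = value.strip()
        # The record body is exactly Content-Length bytes long.
        body = stream.read(int(headers["content-length"]))
        yield headers, body


def index_by_url(stream):
    """Map the target URL of each 'response' record to its raw body.

    The body of a response record is the captured HTTP response,
    that is, the HTTP status line and headers followed by the payload.
    """
    table = {}
    for headers, body in read_warc_records(stream):
        if headers.get("warc-type") == "response":
            # Some writers wrap the URI in angle brackets; strip them if present.
            url = headers["warc-target-uri"].strip("<>")
            table[url] = body
    return table
```

Usage, assuming the file has been downloaded locally (gzip.open reads the concatenated gzip members of a .warc.gz file as one stream):

    with gzip.open("www.debian.org-devel-2016-12-28-ec5f8b13-00000.warc.gz", "rb") as stream:
        table = index_by_url(stream)
    print(len(table), "URLs")

If the parser raises on some record, the last successfully read URL gives a hint about where the file deviates from what this sketch expects.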