>>>>> On Tue, 25 Sep 2007 09:17:23 -0500,
>>>>> hadley wickham (hw) wrote:
>> > but I can understand your desire to do that.  Perhaps just taking
>> > a static snapshot using something like wget, and hosting that on
>> > the R-project website would be a good compromise.
>>
>> Hmm, wouldn't it be easier if the hosting institution made a tgz
>> file?  wget over HTTP is rather bad at resolving links etc.

> Really?  I've always found it to be rather excellent.

Sorry, my statement was rather ambiguous: wget itself is excellent, but
mirroring via HTTP is terrible (administering CRAN for a couple of
years gives you more experience in that arena than you ever wanted to
have).  By "links" I meant symbolic links on the server filesystem.
HTTP does not distinguish between a symbolic link and a real
file/directory, so for every symbolic link you get a full copy of its
target (and there is nothing wget can do about that, AFAIK).

> The reason I suggest it is that unless you have some way to generate
> a static copy of the site, you'll need to ensure that the R-project
> supports any dynamic content.  For example, the useR! 2008 site uses
> some (fairly vanilla) PHP for including the header and footer.

I don't care how the tgz file I get is created, but it is probably
better if the local authors create (and check) it rather than me.  So
no problem if the tarball is created using wget ... I would just
rather not do it myself.

>> we could include a note on the top page that this is only a snapshot
>> copy and have a link to the original site (in case something changes
>> there).

> That's reasonable, although it would be even better to have it on
> every page.

Again, if the authors create a tarball, they can put the note wherever
they like.  I thought of adding a link "local copy from 200x-yy-zz" to
the list of conferences at www.R-project.org, next to the links to the
original sites.

>> > The one problem is setting up a redirect so that existing links
>> > and Google searches aren't broken.  This would need to be put in
>> > place at least 6 months before the old website closed.
>>
>> Yes, very good point, I didn't think about that.  But the R site is
>> searched very often, so material there appears rather quickly in
>> Google searches.  As for bookmarks: I don't want to remove the old
>> site, just have an archive copy at a central location.

> In that case, should it be labelled no-index, as it's just a cache of
> material that should be available elsewhere?  We need some
> machine-readable way of indicating where the canonical resource is.
> It's always frustrated me a little that when googling for R
> documentation, you find hundreds of copies of the same page hosted at
> different sites.

Well, two copies are not as bad as hundreds.  But material might get
found faster on the www.R-project.org site, because that ranks
surprisingly high in many Google searches.

Best,
Fritz
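
A minimal sketch of the two approaches discussed above, driven from R
for illustration only; the host, directory, and file names are made up:

    ## Mirroring over HTTP with wget (called via system()): every
    ## symbolic link on the server comes back as a full copy of its
    ## target, because HTTP cannot tell links and real files apart.
    system("wget --mirror --convert-links --no-parent http://conference.example.org/useR-2008/")

    ## Making the tarball directly on the server filesystem instead
    ## (utils::tar here, or the system tar) avoids the HTTP round trip,
    ## so symbolic links need not be expanded into copies, and the
    ## local authors can check the result before handing it over.
    tar("useR-2008-snapshot.tgz", files = "useR-2008", compression = "gzip")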