Hi All, Just a thought…
I've had some success writing website testing utilities using a mashup of Crawl4J <https://github.com/yasserg/crawler4j> and HtmlUnit <http://htmlunit.sourceforge.net/>. It's not too hard to hack something together that crawls a wiki and accumulates dead links (a rough sketch is appended at the end of this mail).

Cheers,
Jason

-----Original Message-----
From: Juan Pablo Santos Rodríguez [mailto:juanpablo.san...@gmail.com]
Sent: Wednesday, 30 December 2015 4:15 AM
To: user@jspwiki.apache.org
Subject: Re: A way to find dead links for external pages

Hi,

as an example, not intended for production use, we bundle a PingWeblogsComFilter [#1], which pings weblogs.com on each page save (a much older, similar approach is described at [#2]). A plugin performing similar functionality could easily be written and placed on a protected wiki page, or, better, made to perform the ping only for a given set of users / groups.

As for protecting against the changing URLs of external sites, you could define some interwiki links [#3].

HTH,
juan pablo

[#1]: http://jspwiki.apache.org/apidocs/2.10.1/org/apache/wiki/filters/PingWeblogsComFilter.html
[#2]: http://www.ecyrd.com/JSPWiki/wiki/WeblogsPing
[#3]: https://jspwiki-wiki.apache.org/Wiki.jsp?page=InterWiki

On Tue, Dec 29, 2015 at 2:27 PM, Adrien Beau <adrienb...@gmail.com> wrote:
> On Mon, Dec 28, 2015 at 7:09 PM, Harry Metske <harry.met...@gmail.com> wrote:
> >
> > We considered it a security risk and did not implement it.
>
> Having a server go blindly into user-specified URLs is indeed a huge
> security risk. Users could easily create a denial of service (by listing
> hundreds of URLs) against either the target or the JSPWiki server itself.
> They could also use the feature to exploit vulnerable URLs while
> disguising themselves as the JSPWiki server.
>
> However, I believe safer, more limited approaches could be used that
> would still provide value to site administrators (from least to most
> dangerous, and from least to most valuable to the administrator):
>
> - Collate all host names mentioned in wiki pages; run one DNS query per
>   host name (with rate limits); note which host names no longer resolve;
>   report the pages that contain links to those hosts
> - Similar idea, but additionally run one HEAD HTTP request against the
>   root (/) of each host name, not just a name resolution
> - Similar idea, but going up to the path component of each URL;
>   canonicalize it, apply a size limit, and strip queries and fragments;
>   this should still be rather safe
>
> (Note that these are only ideas. I am not volunteering to implement
> them.)
>
> --
> Adrien
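P.S. For anyone curious, here is a minimal sketch of what I mean, assuming the crawler4j 4.x API and using plain HEAD requests instead of HtmlUnit to keep it short. The wiki host, seed page and storage folder are made-up placeholders, not a real setup:

import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import edu.uci.ics.crawler4j.crawler.CrawlConfig;
import edu.uci.ics.crawler4j.crawler.CrawlController;
import edu.uci.ics.crawler4j.crawler.Page;
import edu.uci.ics.crawler4j.crawler.WebCrawler;
import edu.uci.ics.crawler4j.fetcher.PageFetcher;
import edu.uci.ics.crawler4j.parser.HtmlParseData;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtConfig;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtServer;
import edu.uci.ics.crawler4j.url.WebURL;

public class DeadLinkCrawler extends WebCrawler {

    // Placeholder wiki base URL: only pages under this prefix are crawled.
    private static final String WIKI_HOST = "https://wiki.example.org/";

    // Dead external URL -> short note on where/why it failed.
    private static final Map<String, String> DEAD_LINKS = new ConcurrentHashMap<>();

    @Override
    public boolean shouldVisit(Page referringPage, WebURL url) {
        // Crawl the wiki itself only; external links are probed, never crawled.
        return url.getURL().startsWith(WIKI_HOST);
    }

    @Override
    public void visit(Page page) {
        if (!(page.getParseData() instanceof HtmlParseData)) {
            return;
        }
        HtmlParseData html = (HtmlParseData) page.getParseData();
        for (WebURL link : html.getOutgoingUrls()) {
            String target = link.getURL();
            if (!target.startsWith(WIKI_HOST) && !DEAD_LINKS.containsKey(target)) {
                int status = headStatus(target);
                if (status < 0 || status >= 400) {
                    DEAD_LINKS.put(target, "status " + status + ", linked from " + page.getWebURL());
                }
            }
        }
    }

    // One HEAD request per external link; -1 means unresolvable host, timeout, etc.
    private static int headStatus(String target) {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(target).openConnection();
            conn.setRequestMethod("HEAD");
            conn.setConnectTimeout(5000);
            conn.setReadTimeout(5000);
            return conn.getResponseCode();
        } catch (Exception e) {
            return -1;
        }
    }

    public static void main(String[] args) throws Exception {
        CrawlConfig config = new CrawlConfig();
        config.setCrawlStorageFolder("/tmp/deadlink-crawl");   // placeholder
        config.setPolitenessDelay(500);                        // be gentle with the wiki
        PageFetcher fetcher = new PageFetcher(config);
        RobotstxtServer robots = new RobotstxtServer(new RobotstxtConfig(), fetcher);
        CrawlController controller = new CrawlController(config, fetcher, robots);
        controller.addSeed(WIKI_HOST + "Wiki.jsp?page=Main");  // placeholder seed page
        controller.start(DeadLinkCrawler.class, 2);            // blocks until the crawl finishes
        DEAD_LINKS.forEach((url, info) -> System.out.println(url + "  (" + info + ")"));
    }
}

Restricting shouldVisit() to the wiki host keeps the crawler on the wiki itself, so each external target only ever receives a single HEAD request.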
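To expand a bit on the interwiki suggestion quoted above: JSPWiki resolves interwiki references through jspwiki.properties, so if an external site changes its URL scheme you only have to fix it in one place. Roughly like this (the name and target URL below are purely illustrative):

# jspwiki.properties: %s is replaced by whatever follows the colon in the link
jspwiki.interWikiRef.WikiPedia = http://en.wikipedia.org/wiki/%s

A page can then link with [WikiPedia:JSPWiki] instead of spelling out the full external URL.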
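And since the first of Adrien's ideas above (one DNS query per host name) is the least risky, here is a rough sketch of it in plain Java. The regex and the hard-coded page map are only illustrative stand-ins for pulling page text out of JSPWiki, and the sleep is a very crude rate limit:

import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DeadHostReport {

    // Very rough host extraction; good enough for a sketch.
    private static final Pattern HOST = Pattern.compile("https?://([^/\\s\"'\\]]+)");

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical input: page name -> raw wiki markup.
        Map<String, String> pages = new HashMap<>();
        pages.put("Main", "See [http://www.example.com/docs] and [http://no-such-host.invalid/x]");

        // Collate host names and remember which pages mention them.
        Map<String, Set<String>> hostToPages = new TreeMap<>();
        for (Map.Entry<String, String> page : pages.entrySet()) {
            Matcher m = HOST.matcher(page.getValue());
            while (m.find()) {
                String host = m.group(1).split(":")[0];  // drop any :port
                hostToPages.computeIfAbsent(host, h -> new TreeSet<>()).add(page.getKey());
            }
        }

        // One DNS query per distinct host, with a crude delay between lookups.
        for (Map.Entry<String, Set<String>> entry : hostToPages.entrySet()) {
            try {
                InetAddress.getByName(entry.getKey());
            } catch (UnknownHostException e) {
                System.out.println("Host no longer resolves: " + entry.getKey()
                        + " (linked from " + entry.getValue() + ")");
            }
            Thread.sleep(200);
        }
    }
}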