Hi All, Just a thought…
I've had some success writing website testing utilities using a mashup of Crawl4J <https://github.com/yasserg/crawler4j> and HtmlUnit <http://htmlunit.sourceforge.net/>. It's not too hard to hack something together that crawls a wiki and accumulates dead links (a rough sketch is appended at the end of this mail).

Cheers,
Jason

-----Original Message-----
From: Juan Pablo Santos Rodríguez [mailto:juanpablo.san...@gmail.com]
Sent: Wednesday, 30 December 2015 4:15 AM
To: user@jspwiki.apache.org
Subject: Re: A way to find dead links for external pages

Hi,

as an example, not intended for production use, we bundle a PingWeblogsComFilter [#1], which pings weblogs.com on each page save (a much older, similar approach is described at [#2]). A plugin performing similar functionality could easily be written and placed on a protected wiki page, or, better, made to perform the ping only for a given set of users / groups.

As for protecting against the changing URLs of external sites, you could define some interwiki links [#3].

HTH,
juan pablo

[#1]: http://jspwiki.apache.org/apidocs/2.10.1/org/apache/wiki/filters/PingWeblogsComFilter.html
[#2]: http://www.ecyrd.com/JSPWiki/wiki/WeblogsPing
[#3]: https://jspwiki-wiki.apache.org/Wiki.jsp?page=InterWiki

On Tue, Dec 29, 2015 at 2:27 PM, Adrien Beau <adrienb...@gmail.com> wrote:
> On Mon, Dec 28, 2015 at 7:09 PM, Harry Metske <harry.met...@gmail.com> wrote:
> >
> > We considered it a security risk and did not implement it.
>
> Having a server go blindly into user-specified URLs is indeed a huge
> security risk. Users could easily create a denial of service (by listing
> hundreds of URLs) against either the target or the JSPWiki server itself.
> They could also use the feature to exploit vulnerable URLs while
> disguising themselves as the JSPWiki server.
>
> However, I believe safer, more limited approaches could be used that
> would still provide value to site administrators (from least to most
> dangerous, and from least to most valuable to the administrator):
>
> - Collate all host names mentioned in wiki pages; run one DNS query per
>   host name (with rate limits); note which host names no longer resolve;
>   report the pages that contain links to those hosts
> - Similar idea, but additionally run one HEAD HTTP request against the
>   root (/) of each host name, not just a name resolution
> - Similar idea, but going up to the path component of each URL;
>   canonicalize it, apply a size limit, and strip queries and fragments;
>   this should still be rather safe
>
> (Note that these are only ideas. I am not volunteering to implement
> them.)
>
> --
> Adrien
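P.S. For anyone curious, here is a minimal sketch of what I mean, assuming the crawler4j 4.x API and using plain HEAD requests instead of HtmlUnit to keep it short. The wiki host, seed page and storage folder are made-up placeholders, not a real setup:

import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import edu.uci.ics.crawler4j.crawler.CrawlConfig;
import edu.uci.ics.crawler4j.crawler.CrawlController;
import edu.uci.ics.crawler4j.crawler.Page;
import edu.uci.ics.crawler4j.crawler.WebCrawler;
import edu.uci.ics.crawler4j.fetcher.PageFetcher;
import edu.uci.ics.crawler4j.parser.HtmlParseData;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtConfig;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtServer;
import edu.uci.ics.crawler4j.url.WebURL;

public class DeadLinkCrawler extends WebCrawler {

    // Placeholder wiki base URL: only pages under this prefix are crawled.
    private static final String WIKI_HOST = "https://wiki.example.org/";

    // Dead external URL -> short note on where/why it failed.
    private static final Map<String, String> DEAD_LINKS = new ConcurrentHashMap<>();

    @Override
    public boolean shouldVisit(Page referringPage, WebURL url) {
        // Crawl the wiki itself only; external links are probed, never crawled.
        return url.getURL().startsWith(WIKI_HOST);
    }

    @Override
    public void visit(Page page) {
        if (!(page.getParseData() instanceof HtmlParseData)) {
            return;
        }
        HtmlParseData html = (HtmlParseData) page.getParseData();
        for (WebURL link : html.getOutgoingUrls()) {
            String target = link.getURL();
            if (!target.startsWith(WIKI_HOST) && !DEAD_LINKS.containsKey(target)) {
                int status = headStatus(target);
                if (status < 0 || status >= 400) {
                    DEAD_LINKS.put(target, "status " + status + ", linked from " + page.getWebURL());
                }
            }
        }
    }

    // One HEAD request per external link; -1 means unresolvable host, timeout, etc.
    private static int headStatus(String target) {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(target).openConnection();
            conn.setRequestMethod("HEAD");
            conn.setConnectTimeout(5000);
            conn.setReadTimeout(5000);
            return conn.getResponseCode();
        } catch (Exception e) {
            return -1;
        }
    }

    public static void main(String[] args) throws Exception {
        CrawlConfig config = new CrawlConfig();
        config.setCrawlStorageFolder("/tmp/deadlink-crawl");   // placeholder
        config.setPolitenessDelay(500);                        // be gentle with the wiki
        PageFetcher fetcher = new PageFetcher(config);
        RobotstxtServer robots = new RobotstxtServer(new RobotstxtConfig(), fetcher);
        CrawlController controller = new CrawlController(config, fetcher, robots);
        controller.addSeed(WIKI_HOST + "Wiki.jsp?page=Main");  // placeholder seed page
        controller.start(DeadLinkCrawler.class, 2);            // blocks until the crawl finishes
        DEAD_LINKS.forEach((url, info) -> System.out.println(url + "  (" + info + ")"));
    }
}

Restricting shouldVisit() to the wiki host keeps the crawler on the wiki itself, so each external target only ever receives a single HEAD request.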
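To expand a bit on the interwiki suggestion quoted above: JSPWiki resolves interwiki references through jspwiki.properties, so if an external site changes its URL scheme you only have to fix it in one place. Roughly like this (the name and target URL below are purely illustrative):

# jspwiki.properties: %s is replaced by whatever follows the colon in the link
jspwiki.interWikiRef.WikiPedia = http://en.wikipedia.org/wiki/%s

A page can then link with [WikiPedia:JSPWiki] instead of spelling out the full external URL.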
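And since the first of Adrien's ideas above (one DNS query per host name) is the least risky, here is a rough sketch of it in plain Java. The regex and the hard-coded page map are only illustrative stand-ins for pulling page text out of JSPWiki, and the sleep is a very crude rate limit:

import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DeadHostReport {

    // Very rough host extraction; good enough for a sketch.
    private static final Pattern HOST = Pattern.compile("https?://([^/\\s\"'\\]]+)");

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical input: page name -> raw wiki markup.
        Map<String, String> pages = new HashMap<>();
        pages.put("Main", "See [http://www.example.com/docs] and [http://no-such-host.invalid/x]");

        // Collate host names and remember which pages mention them.
        Map<String, Set<String>> hostToPages = new TreeMap<>();
        for (Map.Entry<String, String> page : pages.entrySet()) {
            Matcher m = HOST.matcher(page.getValue());
            while (m.find()) {
                String host = m.group(1).split(":")[0];  // drop any :port
                hostToPages.computeIfAbsent(host, h -> new TreeSet<>()).add(page.getKey());
            }
        }

        // One DNS query per distinct host, with a crude delay between lookups.
        for (Map.Entry<String, Set<String>> entry : hostToPages.entrySet()) {
            try {
                InetAddress.getByName(entry.getKey());
            } catch (UnknownHostException e) {
                System.out.println("Host no longer resolves: " + entry.getKey()
                        + " (linked from " + entry.getValue() + ")");
            }
            Thread.sleep(200);
        }
    }
}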