At 04:06 AM 3/14/2013, tamouse mailing lists wrote:

>If the files are delivered via the web, by php or some other means, even if
>located outside webroot, they'd still be scrapeable.

Bots, however, being "mechanical" (i.e., hard-wired or programmed), behave
differently than humans do, and that difference can be exploited in a script.
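
For example (just an illustration, not the actual check my script uses): a
simple scraper typically requests a file directly, without first loading the
page that links to it, so it never picks up the session cookie that a human
visitor's browser would carry:

<?php
// Illustration only. The page that links to the files sets a flag
// when a human loads it:
//   session_start();
//   $_SESSION['saw_index'] = true;
// The delivery script then refuses requests that never loaded that page:
session_start();

if (empty($_SESSION['saw_index'])) {
    header('HTTP/1.0 403 Forbidden');
    exit;
}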

Part of the rationale for putting the files outside the webroot is that they
have no URLs, which eliminates one vulnerability: you can't scrape the URL of
a file that has no URL. Late last night I figured out why I was having
trouble accessing those external files from my script, and now I'm working
out the parsing details that let one script serve multiple external files. My
approach probably won't defeat all bad bots, but it will likely defeat most
of them. You can't make code bulletproof, but you can wrap it in Kevlar.
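
In rough outline, a delivery script of this sort looks something like the
following (the file names, directory, and "file" parameter here are all
placeholders, not my actual setup):

<?php
// delivery.php -- rough sketch only. The real files live outside the
// webroot, so they have no URLs of their own; this script is the only
// way to reach them.

$basedir = '/home/example/private/'; // a directory outside the webroot

// Map request keys to real files rather than trusting a raw path, so
// "../" tricks can't reach anything outside the whitelist.
$files = array(
    'sample1' => 'sample1.pdf',
    'sample2' => 'sample2.pdf',
);

$key = isset($_GET['file']) ? $_GET['file'] : '';

if (!isset($files[$key])) {
    header('HTTP/1.0 404 Not Found');
    exit;
}

$path = $basedir . $files[$key];
header('Content-Type: application/pdf');
header('Content-Length: ' . filesize($path));
readfile($path);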

Dale H. Cook, Member, NEHGS and MA Society of Mayflower Descendants;
Plymouth Co. MA Coordinator for the USGenWeb Project
Administrator of http://plymouthcolony.net 

