Hi,

I'm having trouble using file() to get the text of web pages for an
in-house URL directory and search engine I'm trying to build. I'm using
something along the lines of the code below to take a user-submitted
URL, fetch the text of the page at that address, and store it in a
database for later searching. Most URLs work fine, but some very long
ones with long query strings cause file() to fail with a warning. For
example, the URL in the code below produces the warning shown after it,
while almost any other URL works fine. Any ideas?

Thanks,

Andy


//The code

$_POST['url'] = 'http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&safe=off&threadm=v0401171cb70ca1615b5a%40%5B192.34.169.24%5D&rnum=1&prev=/groups%3Fq%3D%252Bphp%2B%252Breturn%2B%252Bkey%2B%252Bform%26hl%3Den%26lr%3D%26ie%3DUTF-8%26oe%3DUTF-8%26safe%3Doff%26selm%3Dv0401171cb70ca1615b5a%2540%255B192.34.169.24%255D%26rnum%3D1/';

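// Strip a leading "http://" (case-insensitive) and any trailing slash
$_POST['url'] = preg_replace('#^http://#i', '', $_POST['url']);

$_POST['url'] = preg_replace('#/$#', '', $_POST['url']);

// Rebuild the address with the scheme and a trailing slash
$full_url = 'http://' . $_POST['url'] . '/';

if (!$body = file($full_url)) {

   //echo an error message

} else {

   // file() returns an array of lines, so join them before printing
   echo implode('', $body);

}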
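For comparison, a minimal sketch (assuming PHP 7 or later) of normalizing the submitted URL with parse_url() instead of stripping and re-adding text with regular expressions. The function name normalize_url is hypothetical. It adds a scheme only when one is missing and appends a trailing slash only to a bare host, so a query string such as "?rnum=1" is never turned into "?rnum=1/".

```php
<?php
// Hypothetical helper (not from the original post): normalize a submitted
// URL without touching its path, query string, or fragment.
function normalize_url(string $url): string
{
    $url = trim($url);

    // Add a scheme only when none is present (anchored, case-insensitive).
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = 'http://' . $url;
    }

    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'])) {
        throw new InvalidArgumentException("Not a usable URL: $url");
    }

    // Append "/" only for a bare host such as "http://example.com".
    if (!isset($parts['path']) && !isset($parts['query']) && !isset($parts['fragment'])) {
        $url .= '/';
    }

    return $url;
}

echo normalize_url('example.com'), "\n";                         // http://example.com/
echo normalize_url('http://example.com/search?q=php&n=1'), "\n"; // http://example.com/search?q=php&n=1
```

With the URL from this post, which already has a /groups path, the function returns it unchanged. The original strip-and-rebuild code also happens to return the same string for this particular URL, so the normalization alone does not account for the warning.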


Causes:

Warning: file("http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&safe=off&threadm=v0401171cb70ca1615b5a%40%5B192.34.169.24%5D&rnum=1&prev=/groups%3Fq%3D%252Bphp%2B%252Breturn%2B%252Bkey%2B%252Bform%26hl%3Den%26lr%3D%26ie%3DUTF-8%26oe%3DUTF-8%26safe%3Doff%26selm%3Dv0401171cb70ca1615b5a%2540%255B192.34.169.24%255D%26rnum%3D1/") - No error in c:\program files\nusphere\apache\htdocs\newslogic_dev\url_directory\test.php on line 11
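The "No error" text says nothing about why the request failed. A minimal sketch, assuming PHP 7.1 or later with allow_url_fopen enabled, of a fetch that reports the HTTP status the server actually returned instead of a bare warning. The function names fetch_page and last_http_status are hypothetical, and the User-Agent string is an arbitrary placeholder; whether the server is rejecting this particular request is not established by the warning, so treat this as a way to find out rather than a diagnosis.

```php
<?php
// Hypothetical helper: return the numeric code from the LAST status line in
// $http_response_header (followed redirects add earlier status lines).
function last_http_status(array $headers): ?int
{
    $status = null;
    foreach ($headers as $line) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
        }
    }
    return $status;
}

// Hypothetical fetch that surfaces the server's answer.
function fetch_page(string $url): string
{
    $context = stream_context_create([
        'http' => [
            'user_agent'    => 'Mozilla/5.0 (compatible; url-directory-sketch)', // placeholder
            'ignore_errors' => true, // return the body even for 4xx/5xx responses
            'timeout'       => 15,   // seconds
        ],
    ]);

    $body = @file_get_contents($url, false, $context);
    if ($body === false) {
        $err = error_get_last();
        throw new RuntimeException('Fetch failed: ' . ($err['message'] ?? 'unknown error'));
    }

    // The HTTP wrapper fills $http_response_header in this scope.
    $status = last_http_status($http_response_header ?? []);
    if ($status === null || $status >= 400) {
        throw new RuntimeException('Server answered HTTP ' . ($status ?? '?') . " for $url");
    }
    return $body;
}
```

Calling fetch_page($full_url) inside a try/catch and printing the exception message shows either the transport error or the HTTP status, which narrows down whether the problem is the URL itself or the server's response to it.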
