On Sun, 11 May 1997, J.H.M. Dassen wrote:

> How can I get a list of the URLs of the objects that squid has currently
> cached?
awk '{print $6}' </var/spool/squid/log

The 'log' file format depends on the squid version. This is for squid
1.1.x - if you're still using the old squid 1.0.x you'll have to look
at the file to figure out which field to print with awk.

> Having such a list would allow me to use 'wget' to refresh the cache; this
> would be useful for my laptop system, which is not always on the net.

#! /bin/sh
proxy=some.host
port=3128
http_proxy=http://$proxy:$port/
ftp_proxy=http://$proxy:$port/
gopher_proxy=http://$proxy:$port/
# export them, otherwise wget won't see them in its environment
export http_proxy ftp_proxy gopher_proxy

awk '{print $6}' </var/spool/squid/log | \
	wget -q -nh -i /dev/stdin -O /dev/null

This is untested but it should work. If wget doesn't like working with
/dev/stdin then you'll have to redirect the output of awk to a
temporary file (e.g. "tmpfile=/tmp/wget.$$") and use that instead (see
the P.S. below for that variant).

The -q is for "quiet", the -nh is to disable DNS lookups of hostnames
(let squid do that as required). The "-O /dev/null" should make wget
just dump everything it fetches into the bit-bucket.

If you wanted to exclude certain URLs then you could insert a
'grep -v <regexp> | \' line in between the awk and the wget. e.g.

exclude="foo.com\|bar.org\|ftp://\|gopher://"
awk '{print $6}' </var/spool/squid/log | \
	grep -v "$exclude" | \
	wget -q -nh -i /dev/stdin -O /dev/null

excludes all ftp & gopher URLs, as well as everything from the domains
foo.com and bar.org.

I also have a sample perl script posted by Duane Wessels (squid author)
on the squid-users list for converting the log file into pathnames
(this only works if you have a single cache_dir):

#!/usr/bin/perl
$L1 = 16;	# Level 1 directories
$L2 = 256;	# Level 2 directories

while (<>) {
	$f = hex($_);	# file number: the first (hex) field of the log line
	$path = sprintf("%02X/%02X/%08X", $f % $L1, ($f / $L1) % $L2, $f);
	print "$path\n";
}

(modified slightly from Duane's original to suit my purposes)

Converts log lines like:

  00006075 3373d9ac fffffffe 33054581 1667 http://foo.com/path/file.html

into lines like:

  05/07/00006075

which are pathnames relative to the cache_dir (/var/spool/squid by
default on debian systems).

You can use this to extract information about URLs from the cache -
the first few lines (usually approx 6 or 8) of each cached file
contain "header" information about the URL for squid's use (see the
P.P.P.S. below for a sketch that walks the whole cache this way). e.g.

$ head -6 /var/spool/squid/00/00/00007001
HTTP/1.0 200 OK
Server: Netscape-Commerce/1.12
Date: Tuesday, 29-Apr-97 11:45:24 GMT
Last-modified: Friday, 28-Mar-97 01:11:23 GMT
Content-length: 656
Content-type: image/gif

"head -6" is inadequate - sometimes there are more than 6 headers,
though I don't think there are ever fewer than 6. Printing everything
up to the first blank line is more robust (see the P.P.S. below).
Unfortunately, the 'header' program which comes with deliver doesn't
work on these files (probably because the "HTTP/1.0 ....." first line
doesn't have a : in it).

have fun!

craig

--
craig sanders		networking consultant
Available for casual or contract temporary autonomous zone
system administration tasks.
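P.S. here's the temporary-file variant mentioned above, for wgets that
can't read a URL list from /dev/stdin. Also untested, and it assumes
the same proxy variables are set and exported as in the script above:

#! /bin/sh
tmpfile=/tmp/wget.$$
# dump the URL list (field 6 of the squid 1.1.x log) to a temp file
awk '{print $6}' </var/spool/squid/log >$tmpfile
wget -q -nh -i $tmpfile -O /dev/null
rm -f $tmpfile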
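P.P.S. a more robust replacement for "head -6" is to print everything
up to the first blank line. This is just a sketch - it assumes squid
stores the header block terminated by a blank line, like a raw HTTP
reply, so check a couple of your own cache files first. The \r? allows
for a trailing carriage return; if your awk doesn't understand \r then
a plain /^$/ will do for LF-only files:

# print header lines, stop at the first blank (or CR-only) line
awk '/^\r?$/ { exit } { print }' /var/spool/squid/00/00/00007001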
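P.P.P.S. gluing the pieces together: if you save the perl script above
as, say, 'log2path' (the name is just an example) and make it
executable, something like this untested sketch should dump the stored
headers of every object in the cache:

#! /bin/sh
cd /var/spool/squid
# log2path turns each log line into a pathname relative to cache_dir
./log2path <log | while read f; do
	echo "== $f"
	# same blank-line trick as in the P.P.S. above
	awk '/^\r?$/ { exit } { print }' "$f"
done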