Re: [squid-users] Inconsistent accessing of the cache, craigslist.org images, wacky stuff.

Eliezer Croitoru Thu, 29 Oct 2015 21:45:07 -0700

Hey,

I was convinced that there was an option to disable the host forgery test, which would make more sense if you use BIND and intercept all DNS traffic into it.


About your idea for an upstream cache.

It's a pretty nice idea. I am pretty sure that the host forgery test can be disabled in the case where you are using an upstream cache_peer. If it is not in the code yet, it should be reported as a bug (some can argue it is a wanted feature). The idea by itself is not crazier than what I have done at:
http://wiki.squid-cache.org/ConfigExamples/DynamicContent/Coordinator

The idea of pre-fetching is old, and I had intended to write an ICAP service that would do something like that, but with the full original request headers. The issue with pre-fetching a file is that you will be required to download the file at least twice, and the first request might not be saved into the cache as it should. If you plan to implement pre-fetching, consider using an ICAP service that knows the full request headers, so that it can mimic the exact same request. If you are interested in the ICAP idea, take a look at the ICAP service I wrote (in Golang) at:

https://github.com/elico/squidblocker-icap-server

You can see that in the filterByUrl function the content of the req.Request object can be dumped and re-used for the pre-fetch.
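
To make the "mimic the exact same request" point concrete, here is a minimal sketch in Python (not the Go code from the ICAP service above). It assumes the client's original headers are already available as a dict; the function names and the proxy address 127.0.0.1:3128 are made up for the illustration. It copies the end-to-end headers, drops the per-connection ones, and reads the body through the proxy so that the proxy has a chance to store the object.

```python
import urllib.request

# Per-connection headers that should not be replayed on a new connection.
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "proxy-connection", "te", "trailer", "transfer-encoding", "upgrade",
}


def build_prefetch_request(url, client_headers):
    """Build a GET for url that carries the client's end-to-end headers."""
    req = urllib.request.Request(url, method="GET")
    for name, value in client_headers.items():
        if name.lower() not in HOP_BY_HOP:
            req.add_header(name, value)
    return req


def prefetch(url, client_headers, proxy="http://127.0.0.1:3128"):
    """Fetch url through the proxy and throw the body away.

    The proxy address is an assumption; point it at your own cache.
    """
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy})
    )
    req = build_prefetch_request(url, client_headers)
    with opener.open(req, timeout=60) as resp:
        # Read in chunks so a large object is never held in memory.
        while resp.read(65536):
            pass
        return resp.status
```

Whether the second, real client request then becomes a cache hit still depends on the response headers (Vary, Cache-Control and so on), so treat this only as a starting point.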


Eliezer

On 30/10/2015 05:09, Jester Purtteman wrote:


We've got a couple thoughts going at once here, so let me condense it a bit.  
First, yes, this is coming in over a satellite and that is part of the bugger.  
Nothing like 560 ms to bring a connection to a halt.  Part of my plan is 
exactly as you say, optimize the links by setting huge tcp_windows and all the 
rest so that I can get full bandwidth.  The other part of the story (and I 
could just be misunderstanding this too) is that it appears that if I have say, 
3 or 4 clients connect for a file over the course of the period of the 
download, if any one of them (or maybe just the last one, again, insufficient 
testing so I don't know the exact course of events here) ends up requesting 
an IP different than what is looked up, it appeared to drop the file.

>I think a worse problem is if the DNS TTL is shorter than a client connection's
>TCP connected time.
>Then requests arriving after the DNS TTL expired would no longer match the
>initial dst-IP.

That is what I think I was seeing:  if by that you mean, clients A, B, and C 
all request a large file (few hundred MB), it downloads but takes more than 300 
seconds (which has become a pretty common TTL, when did that happen?), and then 
D requests it too, but the DNS updates while it's coming in and suddenly gets 
flagged as a host forgery and is no longer cacheable.  I could be wrong, so I 
need to experiment, but I think that’s what I am seeing.

My crazy solution is, I have a server on a fast connection on which I set up a 
cache there with a pretty big minimum and maximum file size (say, 10 MB minimum 
object size, 8,000MB maximum) and set it up as a parent cache to the cache out 
at the slow end of the universe, which is a transparent proxy.  The transparent 
proxy then uses the parent proxy to request the files, and when the files 
happen to be very big, I set up the connection to do a pre-cache (because a 100 
MB file is a piece of cake for a 100 mbps connection) and it stores it, because 
the time to download was trivial compared to the DNS TTL.  I set the cache up 
on the slow end to cache more aggressively, but the point is that once the 
cache down south has the file, the cache up north is requesting the file from a 
system much more optimized to pull big files over, and that improves the odds 
that the DNS has not updated before the transfer completes.

I'm not convinced my idea is valid, so I'll have to ponder it a bit, but I'm 
going to give it a shot and let you know if it makes a difference.  Bottom line 
is, it is a pretty nasty work around, and there is probably a better solution 
if someone that knows C out there worth beans is into it.  I don't think there 
are ANY answers that don't involve setting up your own DNS, but after 
configuring BIND in about 7 minutes last night, I am thinking that’s not a big 
issue.  The obvious answers I can think of are (1) to maintain a short table of 
IPs associated with a specific domain request until all transfers referring 
back to it have passed and rewrite the DNS resolution calls to refer to that 
table or (2) tag the requested IP and resolved IP.
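
As a rough illustration of option (1), here is a minimal Python sketch of such a table. It is not Squid code; the class and method names are invented for the example. It remembers every IP seen for a host while at least one transfer for that host is in flight, and forgets them when the last transfer finishes.

```python
from collections import defaultdict


class PinnedDnsTable:
    """Remember the IPs seen for a host while transfers are in flight."""

    def __init__(self):
        self._ips = defaultdict(set)     # host -> IPs seen so far
        self._active = defaultdict(int)  # host -> transfers in flight

    def start_transfer(self, host, resolved_ips):
        # Merge the fresh answer with earlier ones instead of replacing them.
        self._ips[host].update(resolved_ips)
        self._active[host] += 1

    def finish_transfer(self, host):
        self._active[host] -= 1
        if self._active[host] <= 0:
            # Last transfer is done: drop the pinned addresses.
            self._ips.pop(host, None)
            self._active.pop(host, None)

    def is_known_destination(self, host, dst_ip):
        return dst_ip in self._ips.get(host, set())
```

A destination check against this table would still accept client D's older IP as long as the download started by A, B, or C is running, even if the DNS answer has rotated in the meantime.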

The last line of C I wrote was in the 90s, but I'll dig in and see if I can 
find the right place to start making a mess:).

In any event, you and Eliezer have helped me get farther since Tuesday night 
than I had since August. Thank you both!



_______________________________________________
squid-users mailing list
squid-users@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-users
