What about a post-optimization of the cache? I mean:
1. When ATS receives a huge object, it stores the URL with a rounded timestamp and a flag "checked: true/false" in an RDBMS (e.g. PostgreSQL), with a unique constraint on the URL and timestamp fields.
2. A batch process periodically gets the unchecked URLs (last_check_time < timestamp, checked = false) from the DB, requests them from ATS (which has them cached), calculates the SHA of each response, and then performs two queries against a NoSQL store: insert "key: URL, value: SHA" into table "A" (always), and insert "key: SHA, value: URL" into table "B" (if the key does not exist; otherwise update the expire timeout for that key and delete the ATS cache entry for the new URL). Finally it sets checked = true.
3. When ATS receives a request from a client (not the batch process), it looks up the URL in table "A". If a SHA is returned, it looks up the URL for that SHA in table "B" and returns the cached data for that URL; otherwise it forwards the request to the origin.
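To make the three steps concrete, here is a rough sketch in Python. Everything in it is a stand-in I made up for illustration: plain dicts replace the ATS cache, the RDBMS table and the two NoSQL tables, and the function names (record_download, run_batch, serve) are hypothetical, not ATS APIs. I used SHA-256 as the hash, skipped the rounded timestamp and the expire-timeout update, and let URLs that have not been checked yet fall back to the normal cache lookup.

```python
import hashlib

# Hypothetical in-memory stand-ins; a real setup would use ATS, an RDBMS and a NoSQL store.
ats_cache = {}   # URL -> cached body (stands in for the ATS cache)
pending = {}     # URL -> checked flag (stands in for the RDBMS table)
table_a = {}     # URL -> SHA
table_b = {}     # SHA -> URL whose cache entry is kept


def record_download(url, body):
    """Step 1: ATS cached a huge object; queue its URL for the batch check."""
    ats_cache[url] = body
    pending.setdefault(url, False)


def run_batch():
    """Step 2: hash every unchecked URL and drop duplicate cache entries."""
    for url, checked in list(pending.items()):
        if checked:
            continue
        body = ats_cache.get(url)
        if body is None:
            continue  # already evicted; leave it unchecked in this sketch
        sha = hashlib.sha256(body).hexdigest()
        table_a[url] = sha                        # always recorded
        kept_url = table_b.setdefault(sha, url)   # insert only if the SHA is new
        if kept_url != url:
            del ats_cache[url]                    # same content is cached under kept_url
        pending[url] = True


def serve(url, fetch_from_origin):
    """Step 3: resolve a client request through tables A and B."""
    sha = table_a.get(url)
    kept_url = table_b.get(sha) if sha is not None else None
    if kept_url is not None and kept_url in ats_cache:
        return ats_cache[kept_url]
    if url in ats_cache:                          # cached but not deduplicated yet
        return ats_cache[url]
    return fetch_from_origin(url)
```

With two URLs that return the same bytes, run_batch() keeps only the first cache entry, and serve() answers the second URL from it.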
Obviously you should estimate whether something like that is worth the effort. Do you really have that much traffic/cache?