Again, this is probably more of a question for the user@ list, sorry. I want to confirm that I understand refetching correctly.
When the crawler goes to refetch a page, it adds the If-Modified-Since header and, if an ETag exists, the If-None-Match header. If the host respects those headers, it will return a 200 with new content if something has changed; otherwise it will return a non-200 (typically a 304 Not Modified). If the host doesn't respect those headers and returns a 200 with exactly the same bytes as were originally fetched, that content is returned and written to a bolt.

In short, if we're writing to WARCs and we refetch a page that returns a 200 with the same content we originally fetched, will we end up with two copies of the same content?

Thank you!

Best,
Tim
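
P.S. To make the scenario concrete, here is a rough sketch in Python of the three outcomes I have in mind. It is not the crawler's actual code: the function names, the outcome labels, and the use of a SHA-1 digest to compare payloads are all assumptions of mine for illustration only.

```python
import hashlib


def conditional_headers(last_modified=None, etag=None):
    """Build the validators sent on a refetch (hypothetical helper)."""
    headers = {}
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    if etag:
        # Only sent when an ETag was stored from the earlier fetch.
        headers["If-None-Match"] = etag
    return headers


def classify_refetch(status, body, previous_sha1):
    """Label a refetch result; the labels are made up for this sketch."""
    if status == 304:
        # Host honoured the validators: nothing new to store.
        return "not_modified"
    if status != 200:
        return "other"
    # A 200 always carries a body, so compare it with what we stored.
    if hashlib.sha1(body).hexdigest() == previous_sha1:
        # Host ignored the validators and resent identical bytes.
        return "duplicate_200"
    return "changed"


original = b"<html>same page</html>"
stored_digest = hashlib.sha1(original).hexdigest()

print(conditional_headers("Wed, 01 Jan 2020 00:00:00 GMT", '"abc123"'))
print(classify_refetch(304, b"", stored_digest))
print(classify_refetch(200, original, stored_digest))
print(classify_refetch(200, b"<html>new page</html>", stored_digest))
```

The "duplicate_200" case is the one I am asking about: the host ignores the validators, answers 200 with identical bytes, and (if I understand correctly) that payload gets written out a second time.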