Thank you Sudheer. This is really helpful. That at least gives some
justification for the current behavior: try to communicate the lifetime of
the object to something in the future so the upstream clients or proxies
don't think they need to revalidate their cached objects. It is
insufficient, unfortunately, since the mechanism doesn't remove any max-age
directives which would override the Expires header. Take for example these
following responses, the first being what the origin Proxy Verifier replied
with and with which the cache was populated:

 [0]: Wrote 134 bytes in an HTTP/1 response to request with key 1 with
response status 200:
- "content-length": "32"
- "cache-control": "max-age=2"
- "date": "Thu, 25 Mar 2021 19:02:17 GMT"

Note the Date is for yesterday. It is thus immediately stale since the
max-age is for 2 seconds.

This second is the response received by the client for this same object and
was served out of ATS's cache. Proxy Verifier was configured to reply with
a 503, thus ATS served this stale resource out of cache because of negative
revalidating since max_stale_age is 7 days (the default value), so ATS
thinks it's fresh enough:

   [0]: Received an HTTP/1 200 response for key 2 with headers:
- "content-length": "32"
- "cache-control": "max-age=2"
- "date": "Thu, 25 Mar 2021 19:02:17 GMT"
- "Expires": "Fri, 26 Mar 2021 19:44:34 GMT"
- "Connection": "keep-alive"
- "Server": "ATS/10.0.0"

You see the problem: the Expires is set to 30 minutes in the future (the
default value of negative_revalidating_lifetime). But the Date: header is
yesterday and the max-age directive is still present and says 2 seconds.
Since the RFC specifies preference of max-age directives over the Expires
header, upstream caches will still consider this stale. s-maxage would have
this problem as well. An Age header field can also be problematic for
different reasons.

I say this only to clarify in more detail the limitations of what is
currently done with the Expires header. None of this invalidates your
observations. I'm more comfortable with and prefer option 1 now: simply
document the current behavior. We can at least explain its rationale while
acknowledging briefly its limitations. It has been like this for years and
no one has expressed problems it has caused. I'll put up a documentation PR.

Thanks again for your thoughts.

Brian


On Thu, Mar 25, 2021 at 11:27 AM Sudheer Vinukonda
<sudheervinuko...@yahoo.com.invalid> wrote:

>  Hi Brian,
> Thank you for the detailed and excellent write up on the current behavior!
> Just to play a bit of devil's advocate, I'm wondering whether the
> `negative_revalidating_lifetime` is being interpreted incorrectly. I do
> admit it's confusing given the name and the documentation (which can both
> clearly use some improvement), but, one potential interpretation of the
> current behavior could be:
> Use `negative_revalidating_lifetime` as a delta extension to the object's
> life following a request that resulted in a Origin error (connection or
> 5xx). IOW, when configured, this timeout is used to not ping the Origin
> again and assume the object is fresh in cache for that duration. Arguably,
> this may be a way to minimize the load on the Origin that may be
> potentially experiencing systemic issues.
> The reason to still cap the overall object's life to
> `proxy.config.http.cache.max_stale_age` also seems like a
> reasonable/desirable behavior to me. This is a catchall upper bound on how
> long one would consider the object in their cache is still valid beyond
> which it is not usable anymore - meaning, at that point returning an error
> to the client is better than serving that stale copy and causing other side
> effects.
> Tldr; I think if we adjusted the documentation to explain this somehow,
> then the current behavior doesn't sound all that embarrassing to me.
> PS: Just to be clear, I'm not saying we should not change the current
> behavior in favor of a better alternative, but only trying to
> speculate/explain the possible reasoning behind the behavior.
> Thanks.
> Sudheer
>     On Wednesday, March 24, 2021, 02:13:42 PM PDT, Brian Neradt <
> brian.ner...@gmail.com> wrote:
>
>  dev@trafficserver.apache.org:
>
> Context
> This design review concerns
> proxy.config.http.negative_revalidating_lifetime
> <
> https://docs.trafficserver.apache.org/en/latest/admin-guide/files/records.config.en.html#proxy-config-http-negative-revalidating-lifetime
> >
> and
> is the result of investigating the feature for the following issue:
>
> https://github.com/apache/trafficserver/issues/7425
>
> Feature Overview
> The ATS negative revalidation feature implements the option to serve stale
> cached responses in the circumstance that the server is not reachable for
> revalidation. This can happen due to connectivity issues against the
> origin or the origin's service being down resulting in 5xx responses.
>
> As a clarification, do not conflate negative revalidation with the negative
> caching feature. While they sound alike, the two features are orthogonal to
> each other. The ATS negative caching feature allows for caching of negative
> HTTP responses (such as 404 "Not Found"). By contrast, as summarized above,
> negative revalidation allows for serving some cached response that is stale
> when the server is not reachable for revalidation.
>
> negative_revalidating_lifetime issues
> That's the negative revalidation feature: serve stale content when the
> server is not accessible. This generally works as expected. The problem
> lies in determining whether the stale resource is fresh enough via the
> negative_revalidating_lifetime configuration for this feature. The
> intention of this lifetime configuration was to provide a way for the user
> to specify how old a stale resource can be when negative revalidation takes
> place before it will no longer be served from the cache. Looking at the
> documentation and the code for it, the *expected behavior* seemed to have
> been the following:
>
>
>   1. A client requests an item. The server responds with, say, a 200 OK
>   and the ATS proxy caches the response.
>   2. At some point in the future, a client requests this item again.
>   3. ATS inspects the item and determines that it is stale.
>   4. ATS attempts to validate its cached resource against the origin.
>   5. Either a connection cannot be made to the origin or it receives a 5xx
>   in response to its validation request.
>   6. If negative revalidation is enabled, ATS compares the age of the
>   stale resource against negative_revalidating_lifetime. It will then do
> one
>   of two things:
>       1. If the age of the cached resource is younger than
>       negative_revalidating_lifetime, consider it "fresh enough" and
> reply to the
>       origin with the cached response.
>       2. If the age is older than negative_revalidating_lifetime, reply
>       with a 5xx response.
>
> In reality step 6 does not happen. The stale resource is never re-evaluated
> as fresh or still stale against negative_revalidating_lifetime. Rather, the
> determining factor of whether the resource is served is
> proxy.config.http.cache.max_stale_age
> <
> https://docs.trafficserver.apache.org/en/latest/admin-guide/files/records.config.en.html#proxy-config-http-cache-max-stale-age
> >.
> That is, if its age is less than max_stale_age, it is used; otherwise it is
> not. In addition, an incorrectly calculated Expires header field is added
> which is set to negative_revalidating_lifetime seconds from the current
> time rather than negative_revalidating_lifetime seconds from the resources
> calculated generation time.
>
> These negative_revalidating_lifetime issues may very well be day zero bugs.
> Looking at git, it goes back to at least 2009 in a commit
> <
> https://github.com/apache/trafficserver/commit/a1651345d11f228dfb392cc79169760e80d25f06
> >
> titled "Initial commit" (presumably when we transitioned source code
> repositories). It has not worked for at least a very long time and people
> have been relying, knowingly or not, upon max_stale_age to limit how stale
> resources can be before they are used for negative revalidation.
>
> This design review evaluates the options for how to handle
> negative_revalidating_lifetime.
>
> Options
> Option 1: Document the Current Behavior
> The first option is to leave the feature as is and document how
> negative_revalidating_lifetime currently behaves.
>
> Advantages:
>
>   - This requires the least amount of work as it only involves a
>   documentation change.
>   - This will effectively avoid user confusion about
>   negative_revalidating_lifetime so that no further issues will be filed
> and
>   time wasted when it does not behave as expected.
>
> Disadvantages
>
>   - negative_revalidating_lifetime, as it currently behaves, would be hard
>   to explain without embarrassing the project. We would have to say
> something
>   like:
>
> The value of negative_revalidating_lifetime is used to calculate an Expires
> > header with a field value of negative_revalidating_lifetime seconds in
> the
> > future from the time the cached resource is served. This header is added
> in
> > the cached response. No other behavior is influenced by this
> configuration.
> > The age limit enforced for such stale resources is configured
> > via proxy.config.http.cache.max_stale_age.
>
>
> This likely raises more questions than it answers. In reality, it would
> probably be most helpful to steer users away from this feature and toward
> max_stale_age.
>
>
> Option 2: Fix the Issues
> Alternatively, we can fix the issues surrounding
> negative_revalidating_lifetime. Looking at the code, it seems that the
> Expires header, which is currently incorrectly calculated, really isn't
> needed. Rather we can make the negative_revalidating_lifetime conditionally
> a part of the HttpTransact::is_stale_cache_response_returnable()
> <
> https://github.com/apache/trafficserver/blob/f639621e238f778ba6f18aaa5de47772d6504e55/proxy/http/HttpTransact.cc#L5943
> >
> function.
>
> In addition to max_stale_age, which is currently taken into account there,
> we can also compare against negative_revalidating_lifetime to see whether
> the resource can be served (but only when negative revalidation is being
> used...the function is used in other contexts).
>
> Advantages:
>
>   - This addresses both the known issues with
>   negative_revalidating_lifetime in that the Expires header is no longer
>   added and the negative_revalidating_lifetime is effectively used in
>   determining whether the item is fresh enough for the response.
>   - The fix doesn't look all that hard to implement.
>
> Disadvantages:
>
>   - Implementing the feature adds various complexities to ATS. It
>   complicates the documentation, it complicates the code, and will require
>   tests to verify it works correctly. None of this complexity is large if
> the
>   feature is worth it, and I'm happy to do it. But I have my doubts that it
>   is worth it. Keep in mind that negative_revalidating_lifetime has never
>   been a working feature in ATS and this has caused problems for no one. If
>   its absence had caused problems, we would know about it because people
>   would complain. In fact, the only reported issue with the feature is the
>   confusion it caused by existing in the document in the first place. Also,
>   keep in mind that this feature would only be useful, if it is at all, to
>   *extend* the lifetime of these cached resources past max_stale_age. Both
>   of those values would be used and we would want to use the max of each.
>   Currently max_stale_age defaults to 7 days. How many users will want
>   max_stale_age to be one value for general max-age computations, but use a
>   different, larger, value for negative revalidation? I'm guessing not
> many.
>   (Alternatively, I suppose we could use only
> negative_revalidating_lifetime
>   instead of using both it and max_stale_age in these situations, but that
>   would potentially be a breaking change for current configurations in
> which
>   negative_revalidation would now only happen for the default 30 minutes of
>   negative_revalidating_lifetime rather than the default 7 days of
>   max_stale_age.)
>
>
> Option 3: Remove negative_revalidating_lifetime
> We can remove negative_revalidating_lifetime from the product. This would
> involve:
>
>   - Removing the documentation of negative_revalidating_lifetime
>   <
> https://docs.trafficserver.apache.org/en/latest/admin-guide/files/records.config.en.html#proxy-config-http-negative-revalidating-lifetime
> >
>   from our records.config documentation. We would also want to explain the
>   role that max_stale_age plays in limiting the usage of stale resources in
>   negative revalidating circumstances.
>   - Removing the addition of the faulty Expires header from HttpTransact.cc
>   <
> https://github.com/apache/trafficserver/blob/f639621e238f778ba6f18aaa5de47772d6504e55/proxy/http/HttpTransact.cc#L4364
> >
>   - Keep the parsing of negative_revalidating_lifetime, if it is used by a
>   user, but add a Warning that it is defunct, deprecated, and will be
> removed
>   in a future release.
>
> Advantages:
>
>   - This removes the confusion associated with having both a
>   negative_revalidating_lifetime feature used for these negative
> revalidating
>   resources and the max_stale_age feature.
>   - It removes the bugs associated with negative_revalidating_lifetime
>   because the feature is removed.
>
> Disadvantages:
>
>   - If someone would like a negative_revalidating_lifetime feature in the
>   future, it will not exist and will need to be implemented.
>
> Suggestion
> I recommend Option 3: Remove negative_revalidating_lifetime. It's safe to
> remove because the feature does not currently work and seemingly has never
> worked; thus nothing is being taken away from ATS. On the other hand, the
> buggy behavior and confusing documentation is alleviated by it being
> removed. The downside to this option is that we won't have the added
> negative_revalidating_lifetime feature in the event that someone wants it
> in the future. But, again, that leaves us no worse off than we are today
> since our current implementation doesn't effectively implement it.
>
> Let me know if you have other thoughts or concerns.
>
> Thanks,
> Brian
>
> --
> "Come to Me, all who are weary and heavy-laden, and I will
> give you rest. Take My yoke upon you and learn from Me, for
> I am gentle and humble in heart, and you will find rest for
> your souls. For My yoke is easy and My burden is light."
>
>     ~ Matthew 11:28-30
>



-- 
"Come to Me, all who are weary and heavy-laden, and I will
give you rest. Take My yoke upon you and learn from Me, for
I am gentle and humble in heart, and you will find rest for
your souls. For My yoke is easy and My burden is light."

    ~ Matthew 11:28-30

Reply via email to