Er, sorry that didn’t sound right. Agree with you that not updating max-age 
seems more like a bug if we are to go with the hypothesis.

> On Mar 26, 2021, at 3:02 PM, Sudheer Vinukonda <sudheervinuko...@yahoo.com> 
> wrote:
> 
> 
> Thanks Brian! 
> 
> Hmm, it seems to me that the traces you shared actually further confirm the 
> hypothesis - ie ATS serves a locally cached stale object because of negative 
> revalidation (and Origin connection error/5xx), yet we do *not* want the 
> clients receiving that copy (whether it's an actual browser/device or a CDN) 
> to cache it. This makes sense because the problem is between ATS and its 
> Origin, but, we want the clients to continue trying to refresh their copy of 
> the object? 
> 
> 
> 
> On Friday, March 26, 2021, 01:01:15 PM PDT, Brian Neradt 
> <brian.ner...@gmail.com> wrote:
> 
> 
> Thank you Sudheer. This is really helpful. That at least gives some
> justification for the current behavior: try to communicate the lifetime of
> the object to something in the future so the upstream clients or proxies
> don't think they need to revalidate their cached objects. It is
> insufficient, unfortunately, since the mechanism doesn't remove any max-age
> directives which would override the Expires header. Take for example these
> following responses, the first being what the origin Proxy Verifier replied
> with and with which the cache was populated:
> 
> [0]: Wrote 134 bytes in an HTTP/1 response to request with key 1 with
> response status 200:
> - "content-length": "32"
> - "cache-control": "max-age=2"
> - "date": "Thu, 25 Mar 2021 19:02:17 GMT"
> 
> Note the Date is for yesterday. It is thus immediately stale since the
> max-age is for 2 seconds.
> 
> This second is the response received by the client for this same object and
> was served out of ATS's cache. Proxy Verifier was configured to reply with
> a 503, thus ATS served this stale resource out of cache because of negative
> revalidating since max_stale_age is 7 days (the default value), so ATS
> thinks it's fresh enough:
> 
>   [0]: Received an HTTP/1 200 response for key 2 with headers:
> - "content-length": "32"
> - "cache-control": "max-age=2"
> - "date": "Thu, 25 Mar 2021 19:02:17 GMT"
> - "Expires": "Fri, 26 Mar 2021 19:44:34 GMT"
> - "Connection": "keep-alive"
> - "Server": "ATS/10.0.0"
> 
> You see the problem: the Expires is set to 30 minutes in the future (the
> default value of negative_revalidating_lifetime). But the Date: header is
> yesterday and the max-age directive is still present and says 2 seconds.
> Since the RFC specifies preference of max-age directives over the Expires
> header, upstream caches will still consider this stale. s-maxage would have
> this problem as well. An Age header field can also be problematic for
> different reasons.
> 
> I say this only to clarify in more detail the limitations of what is
> currently done with the Expires header. None of this invalidates your
> observations. I'm more comfortable with and prefer option 1 now: simply
> document the current behavior. We can at least explain its rationale while
> acknowledging briefly its limitations. It has been like this for years and
> no one has expressed problems it has caused. I'll put up a documentation PR.
> 
> Thanks again for your thoughts.
> 
> Brian
> 
> 
> On Thu, Mar 25, 2021 at 11:27 AM Sudheer Vinukonda
> <sudheervinuko...@yahoo.com.invalid> wrote:
> 
> >  Hi Brian,
> > Thank you for the detailed and excellent write up on the current behavior!
> > Just to play a bit of devil's advocate, I'm wondering whether the
> > `negative_revalidating_lifetime` is being interpreted incorrectly. I do
> > admit it's confusing given the name and the documentation (which can both
> > clearly use some improvement), but, one potential interpretation of the
> > current behavior could be:
> > Use `negative_revalidating_lifetime` as a delta extension to the object's
> > life following a request that resulted in a Origin error (connection or
> > 5xx). IOW, when configured, this timeout is used to not ping the Origin
> > again and assume the object is fresh in cache for that duration. Arguably,
> > this may be a way to minimize the load on the Origin that may be
> > potentially experiencing systemic issues.
> > The reason to still cap the overall object's life to
> > `proxy.config.http.cache.max_stale_age` also seems like a
> > reasonable/desirable behavior to me. This is a catchall upper bound on how
> > long one would consider the object in their cache is still valid beyond
> > which it is not usable anymore - meaning, at that point returning an error
> > to the client is better than serving that stale copy and causing other side
> > effects.
> > Tldr; I think if we adjusted the documentation to explain this somehow,
> > then the current behavior doesn't sound all that embarrassing to me.
> > PS: Just to be clear, I'm not saying we should not change the current
> > behavior in favor of a better alternative, but only trying to
> > speculate/explain the possible reasoning behind the behavior.
> > Thanks.
> > Sudheer
> >    On Wednesday, March 24, 2021, 02:13:42 PM PDT, Brian Neradt <
> > brian.ner...@gmail.com> wrote:
> >
> >  dev@trafficserver.apache.org:
> >
> > Context
> > This design review concerns
> > proxy.config.http.negative_revalidating_lifetime
> > <
> > https://docs.trafficserver.apache.org/en/latest/admin-guide/files/records.config.en.html#proxy-config-http-negative-revalidating-lifetime
> > >
> > and
> > is the result of investigating the feature for the following issue:
> >
> > https://github.com/apache/trafficserver/issues/7425
> >
> > Feature Overview
> > The ATS negative revalidation feature implements the option to serve stale
> > cached responses in the circumstance that the server is not reachable for
> > revalidation. This can happen due to connectivity issues against the
> > origin or the origin's service being down resulting in 5xx responses.
> >
> > As a clarification, do not conflate negative revalidation with the negative
> > caching feature. While they sound alike, the two features are orthogonal to
> > each other. The ATS negative caching feature allows for caching of negative
> > HTTP responses (such as 404 "Not Found"). By contrast, as summarized above,
> > negative revalidation allows for serving some cached response that is stale
> > when the server is not reachable for revalidation.
> >
> > negative_revalidating_lifetime issues
> > That's the negative revalidation feature: serve stale content when the
> > server is not accessible. This generally works as expected. The problem
> > lies in determining whether the stale resource is fresh enough via the
> > negative_revalidating_lifetime configuration for this feature. The
> > intention of this lifetime configuration was to provide a way for the user
> > to specify how old a stale resource can be when negative revalidation takes
> > place before it will no longer be served from the cache. Looking at the
> > documentation and the code for it, the *expected behavior* seemed to have
> > been the following:
> >
> >
> >  1. A client requests an item. The server responds with, say, a 200 OK
> >  and the ATS proxy caches the response.
> >  2. At some point in the future, a client requests this item again.
> >  3. ATS inspects the item and determines that it is stale.
> >  4. ATS attempts to validate its cached resource against the origin.
> >  5. Either a connection cannot be made to the origin or it receives a 5xx
> >  in response to its validation request.
> >  6. If negative revalidation is enabled, ATS compares the age of the
> >  stale resource against negative_revalidating_lifetime. It will then do
> > one
> >  of two things:
> >      1. If the age of the cached resource is younger than
> >      negative_revalidating_lifetime, consider it "fresh enough" and
> > reply to the
> >      origin with the cached response.
> >      2. If the age is older than negative_revalidating_lifetime, reply
> >      with a 5xx response.
> >
> > In reality step 6 does not happen. The stale resource is never re-evaluated
> > as fresh or still stale against negative_revalidating_lifetime. Rather, the
> > determining factor of whether the resource is served is
> > proxy.config.http.cache.max_stale_age
> > <
> > https://docs.trafficserver.apache.org/en/latest/admin-guide/files/records.config.en.html#proxy-config-http-cache-max-stale-age
> > >.
> > That is, if its age is less than max_stale_age, it is used; otherwise it is
> > not. In addition, an incorrectly calculated Expires header field is added
> > which is set to negative_revalidating_lifetime seconds from the current
> > time rather than negative_revalidating_lifetime seconds from the resources
> > calculated generation time.
> >
> > These negative_revalidating_lifetime issues may very well be day zero bugs.
> > Looking at git, it goes back to at least 2009 in a commit
> > <
> > https://github.com/apache/trafficserver/commit/a1651345d11f228dfb392cc79169760e80d25f06
> > >
> > titled "Initial commit" (presumably when we transitioned source code
> > repositories). It has not worked for at least a very long time and people
> > have been relying, knowingly or not, upon max_stale_age to limit how stale
> > resources can be before they are used for negative revalidation.
> >
> > This design review evaluates the options for how to handle
> > negative_revalidating_lifetime.
> >
> > Options
> > Option 1: Document the Current Behavior
> > The first option is to leave the feature as is and document how
> > negative_revalidating_lifetime currently behaves.
> >
> > Advantages:
> >
> >  - This requires the least amount of work as it only involves a
> >  documentation change.
> >  - This will effectively avoid user confusion about
> >  negative_revalidating_lifetime so that no further issues will be filed
> > and
> >  time wasted when it does not behave as expected.
> >
> > Disadvantages
> >
> >  - negative_revalidating_lifetime, as it currently behaves, would be hard
> >  to explain without embarrassing the project. We would have to say
> > something
> >  like:
> >
> > The value of negative_revalidating_lifetime is used to calculate an Expires
> > > header with a field value of negative_revalidating_lifetime seconds in
> > the
> > > future from the time the cached resource is served. This header is added
> > in
> > > the cached response. No other behavior is influenced by this
> > configuration.
> > > The age limit enforced for such stale resources is configured
> > > via proxy.config.http.cache.max_stale_age.
> >
> >
> > This likely raises more questions than it answers. In reality, it would
> > probably be most helpful to steer users away from this feature and toward
> > max_stale_age.
> >
> >
> > Option 2: Fix the Issues
> > Alternatively, we can fix the issues surrounding
> > negative_revalidating_lifetime. Looking at the code, it seems that the
> > Expires header, which is currently incorrectly calculated, really isn't
> > needed. Rather we can make the negative_revalidating_lifetime conditionally
> > a part of the HttpTransact::is_stale_cache_response_returnable()
> > <
> > https://github.com/apache/trafficserver/blob/f639621e238f778ba6f18aaa5de47772d6504e55/proxy/http/HttpTransact.cc#L5943
> > >
> > function.
> >
> > In addition to max_stale_age, which is currently taken into account there,
> > we can also compare against negative_revalidating_lifetime to see whether
> > the resource can be served (but only when negative revalidation is being
> > used...the function is used in other contexts).
> >
> > Advantages:
> >
> >  - This addresses both the known issues with
> >  negative_revalidating_lifetime in that the Expires header is no longer
> >  added and the negative_revalidating_lifetime is effectively used in
> >  determining whether the item is fresh enough for the response.
> >  - The fix doesn't look all that hard to implement.
> >
> > Disadvantages:
> >
> >  - Implementing the feature adds various complexities to ATS. It
> >  complicates the documentation, it complicates the code, and will require
> >  tests to verify it works correctly. None of this complexity is large if
> > the
> >  feature is worth it, and I'm happy to do it. But I have my doubts that it
> >  is worth it. Keep in mind that negative_revalidating_lifetime has never
> >  been a working feature in ATS and this has caused problems for no one. If
> >  its absence had caused problems, we would know about it because people
> >  would complain. In fact, the only reported issue with the feature is the
> >  confusion it caused by existing in the document in the first place. Also,
> >  keep in mind that this feature would only be useful, if it is at all, to
> >  *extend* the lifetime of these cached resources past max_stale_age. Both
> >  of those values would be used and we would want to use the max of each.
> >  Currently max_stale_age defaults to 7 days. How many users will want
> >  max_stale_age to be one value for general max-age computations, but use a
> >  different, larger, value for negative revalidation? I'm guessing not
> > many.
> >  (Alternatively, I suppose we could use only
> > negative_revalidating_lifetime
> >  instead of using both it and max_stale_age in these situations, but that
> >  would potentially be a breaking change for current configurations in
> > which
> >  negative_revalidation would now only happen for the default 30 minutes of
> >  negative_revalidating_lifetime rather than the default 7 days of
> >  max_stale_age.)
> >
> >
> > Option 3: Remove negative_revalidating_lifetime
> > We can remove negative_revalidating_lifetime from the product. This would
> > involve:
> >
> >  - Removing the documentation of negative_revalidating_lifetime
> >  <
> > https://docs.trafficserver.apache.org/en/latest/admin-guide/files/records.config.en.html#proxy-config-http-negative-revalidating-lifetime
> > >
> >  from our records.config documentation. We would also want to explain the
> >  role that max_stale_age plays in limiting the usage of stale resources in
> >  negative revalidating circumstances.
> >  - Removing the addition of the faulty Expires header from HttpTransact.cc
> >  <
> > https://github.com/apache/trafficserver/blob/f639621e238f778ba6f18aaa5de47772d6504e55/proxy/http/HttpTransact.cc#L4364
> > >
> >  - Keep the parsing of negative_revalidating_lifetime, if it is used by a
> >  user, but add a Warning that it is defunct, deprecated, and will be
> > removed
> >  in a future release.
> >
> > Advantages:
> >
> >  - This removes the confusion associated with having both a
> >  negative_revalidating_lifetime feature used for these negative
> > revalidating
> >  resources and the max_stale_age feature.
> >  - It removes the bugs associated with negative_revalidating_lifetime
> >  because the feature is removed.
> >
> > Disadvantages:
> >
> >  - If someone would like a negative_revalidating_lifetime feature in the
> >  future, it will not exist and will need to be implemented.
> >
> > Suggestion
> > I recommend Option 3: Remove negative_revalidating_lifetime. It's safe to
> > remove because the feature does not currently work and seemingly has never
> > worked; thus nothing is being taken away from ATS. On the other hand, the
> > buggy behavior and confusing documentation is alleviated by it being
> > removed. The downside to this option is that we won't have the added
> > negative_revalidating_lifetime feature in the event that someone wants it
> > in the future. But, again, that leaves us no worse off than we are today
> > since our current implementation doesn't effectively implement it.
> >
> > Let me know if you have other thoughts or concerns.
> >
> > Thanks,
> > Brian
> >
> > --
> > "Come to Me, all who are weary and heavy-laden, and I will
> > give you rest. Take My yoke upon you and learn from Me, for
> > I am gentle and humble in heart, and you will find rest for
> > your souls. For My yoke is easy and My burden is light."
> >
> >    ~ Matthew 11:28-30
> 
> >
> 
> 
> 
> -- 
> "Come to Me, all who are weary and heavy-laden, and I will
> give you rest. Take My yoke upon you and learn from Me, for
> I am gentle and humble in heart, and you will find rest for
> your souls. For My yoke is easy and My burden is light."
> 
>     ~ Matthew 11:28-30

Reply via email to