Hi!

Mark H Weaver <m...@netris.org> skribis:
> Tobias Geerinckx-Rice <m...@tobias.gr> writes:
>
> [...]
>
>> Are you sure? I was under the impression¹ that this is exactly what
>> ‘proxy_cache_lock on;’ prevents. I'm no nginx guru, obviously, so
>> please — anyone! — correct me if I'm misguided.
>
> I agree that "proxy_cache_lock on" should prevent multiple concurrent
> requests for the same URL, but unfortunately its behavior is quite
> undesirable, and arguably worse than leaving it off in our case. See:
>
>   https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_lock
>
> Specifically:
>
>   Other requests of the same cache element will either wait for a
>   response to appear in the cache or the cache lock for this element
>   to be released, up to the time set by the proxy_cache_lock_timeout
>   directive.
>
> In our problem case, it takes more than an hour for Hydra to finish
> sending a response for the 'texlive-texmf' nar. During that time, the
> nar will be slowly sent to the first client while it's being packed
> and bzipped on-demand.
>
> IIUC, with "proxy_cache_lock on", we have two choices of how other
> client requests will be treated:
>
> (1) If we increase "proxy_cache_lock_timeout" to a huge value, then
>     there will be *no* data sent to the other clients until the first
>     client has received the entire nar, which means they wait over an
>     hour before receiving the first byte. I guess this will result in
>     timeouts on the client side.
>
> (2) If "proxy_cache_lock_timeout" is *not* huge, then all other
>     clients will get failure responses until the first client has
>     received the entire nar.
>
> Either way, this would cause users to see the same download failures
> (requiring user work-arounds like --fallback) that this fix is
> intended to prevent for 'texlive-texmf', but instead of happening only
> for that one nar, it will now happen for *all* large nars.

My understanding is that ‘proxy_cache_lock’ lets us avoid spawning
several concurrent compression threads for the same item, while also
avoiding starvation: ‘proxy_cache_lock_timeout’ should ensure that
nobody ends up waiting until the whole nar-compression process is done.

IOW, it should help reduce the load in most cases, while introducing
small delays in some cases (if you’re downloading a nar that’s already
being downloaded).

> IMO, the best solution is to *never* generate nars on Hydra in
> response to client requests, but rather to have the build slaves pack
> and compress the nars, copy them to Hydra, and then serve them as
> static files using nginx.

The problem is that we want nars to be signed by the master node. Or,
if we don’t require that, we need a PKI that allows us to express the
fact that hydra.gnu.org delegates to the build machines.

> A far inferior solution, but possibly acceptable and closer to the
> current approach, would be to arrange for all concurrent responses for
> the same nar to be sent incrementally from a single nar-packing
> process. More concretely, while packing and sending a nar response to
> the first client, the data would also be written to a file.
> Subsequent requests for the same nar would be serviced using the
> equivalent of:
>
>   tail --bytes=+0 --follow FILENAME
>
> This way, no one would have to wait an hour to receive the first byte.

Yes. I would think that NGINX does something like that for its caching,
but I don’t know exactly when/how.
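To make the ‘proxy_cache_lock’ point above concrete, here is a minimal
sketch of the kind of configuration being discussed. It is not our
actual configuration: the cache path, zone name, sizes, backend port,
and timeout value below are made up for illustration.

  # Inside the ‘http’ block: a hypothetical cache area for nars.
  proxy_cache_path /var/cache/nginx/nars levels=1:2
                   keys_zone=nars:16m max_size=50g inactive=2d;

  server {
      listen 80;

      location /nar/ {
          proxy_pass http://127.0.0.1:3000;   # assumed Hydra backend
          proxy_cache nars;

          # Forward only one request per cache element to Hydra at a
          # time; other requests for the same nar wait for the cached
          # response instead of triggering more compression work.
          proxy_cache_lock on;

          # Per the nginx documentation, when this timeout expires the
          # waiting requests are passed to the backend (uncached)
          # rather than failed, so they are delayed, not rejected.
          proxy_cache_lock_timeout 10s;
      }
  }

Whether a timeout of a few seconds makes sense for multi-gigabyte nars
that take an hour to pack is exactly the open question here.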
Other solutions I’ve thought about:

  1. Produce narinfos and nars periodically rather than on-demand and
     serve them as static files.

     pros: better HTTP latency and bandwidth
     pros: allows us to add a Content-Length for nars
     cons: doesn’t reduce load on hydra.gnu.org
     cons: introduces arbitrary delays in delivering nars
     cons: difficult/expensive to know what new store items are
           available

  2. Produce a narinfo and corresponding nar the first time they are
     requested. So, the first time we receive “GET foo.narinfo”, return
     404 and spawn a thread to compute foo.narinfo and foo.nar. Return
     200 only when both are ready.

     The precomputed nar{,info}s would be kept in a cache and we could
     make sure a narinfo and its nar have the same lifetime, which
     addresses one of the problems we have.

     pros: better HTTP latency and bandwidth
     pros: allows us to add a Content-Length for nars
     pros: helps keep narinfo/nar lifetime in sync
     cons: doesn’t reduce load on hydra.gnu.org
     cons: exposes inconsistency between the store contents and the
           HTTP response (you may get 404 even if the thing is actually
           in store), but maybe that’s not a problem

Thoughts?

Ludo’.