Hi!

Mark H Weaver <m...@netris.org> skribis:
> Tobias Geerinckx-Rice <m...@tobias.gr> writes:
>
> [...]
>
>> Are you sure? I was under the impression¹ that this is exactly what
>> ‘proxy_cache_lock on;’ prevents. I'm no nginx guru, obviously, so
>> please — anyone! — correct me if I'm misguided.
>
> I agree that "proxy_cache_lock on" should prevent multiple concurrent
> requests for the same URL, but unfortunately its behavior is quite
> undesirable, and arguably worse than leaving it off in our case. See:
>
>   https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_lock
>
> Specifically:
>
>   Other requests of the same cache element will either wait for a
>   response to appear in the cache or the cache lock for this element
>   to be released, up to the time set by the proxy_cache_lock_timeout
>   directive.
>
> In our problem case, it takes more than an hour for Hydra to finish
> sending a response for the 'texlive-texmf' nar. During that time, the
> nar will be slowly sent to the first client while it's being packed
> and bzipped on-demand.
>
> IIUC, with "proxy_cache_lock on", we have two choices of how other
> client requests will be treated:
>
> (1) If we increase "proxy_cache_lock_timeout" to a huge value, then
>     there will be *no* data sent to the other clients until the first
>     client has received the entire nar, which means they wait over an
>     hour before receiving the first byte. I guess this will result in
>     timeouts on the client side.
>
> (2) If "proxy_cache_lock_timeout" is *not* huge, then all other
>     clients will get failure responses until the first client has
>     received the entire nar.
>
> Either way, this would cause users to see the same download failures
> (requiring user work-arounds like --fallback) that this fix is
> intended to prevent for 'texlive-texmf', but instead of happening only
> for that one nar, it will now happen for *all* large nars.

My understanding is that ‘proxy_cache_lock’ lets us avoid spawning
several concurrent compression threads for the same item, while also
avoiding starvation: ‘proxy_cache_lock_timeout’ should ensure that
nobody ends up waiting until the whole nar-compression process is done.

IOW, it should help reduce the load in most cases, while introducing
small delays in some cases (if you’re downloading a nar that’s already
being downloaded).

> IMO, the best solution is to *never* generate nars on Hydra in
> response to client requests, but rather to have the build slaves pack
> and compress the nars, copy them to Hydra, and then serve them as
> static files using nginx.

The problem is that we want nars to be signed by the master node. Or,
if we don’t require that, we need a PKI that allows us to express the
fact that hydra.gnu.org delegates to the build machines.

> A far inferior solution, but possibly acceptable and closer to the
> current approach, would be to arrange for all concurrent responses for
> the same nar to be sent incrementally from a single nar-packing
> process. More concretely, while packing and sending a nar response to
> the first client, the data would also be written to a file.
> Subsequent requests for the same nar would be serviced using the
> equivalent of:
>
>   tail --bytes=+0 --follow FILENAME
>
> This way, no one would have to wait an hour to receive the first byte.

Yes. I would think that NGINX does something like that for its caching,
but I don’t know exactly when/how.
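To make the ‘proxy_cache_lock’ point above concrete, here is a minimal
sketch of the kind of configuration being discussed. It is not our
actual configuration: the cache path, zone name, sizes, backend port,
and timeout value below are made up for illustration.

  # Inside the ‘http’ block: a hypothetical cache area for nars.
  proxy_cache_path /var/cache/nginx/nars levels=1:2
                   keys_zone=nars:16m max_size=50g inactive=2d;

  server {
      listen 80;

      location /nar/ {
          proxy_pass http://127.0.0.1:3000;   # assumed Hydra backend
          proxy_cache nars;

          # Forward only one request per cache element to Hydra at a
          # time; other requests for the same nar wait for the cached
          # response instead of triggering more compression work.
          proxy_cache_lock on;

          # Per the nginx documentation, when this timeout expires the
          # waiting requests are passed to the backend (uncached)
          # rather than failed, so they are delayed, not rejected.
          proxy_cache_lock_timeout 10s;
      }
  }

Whether a timeout of a few seconds makes sense for multi-gigabyte nars
that take an hour to pack is exactly the open question here.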
Other solutions I’ve thought about:

  1. Produce narinfos and nars periodically rather than on-demand and
     serve them as static files.

     pros: better HTTP latency and bandwidth
     pros: allows us to add a Content-Length for nars
     cons: doesn’t reduce load on hydra.gnu.org
     cons: introduces arbitrary delays in delivering nars
     cons: difficult/expensive to know what new store items are
           available

  2. Produce a narinfo and corresponding nar the first time they are
     requested. So, the first time we receive “GET foo.narinfo”, return
     404 and spawn a thread to compute foo.narinfo and foo.nar. Return
     200 only when both are ready.

     The precomputed nar{,info}s would be kept in a cache and we could
     make sure a narinfo and its nar have the same lifetime, which
     addresses one of the problems we have.

     pros: better HTTP latency and bandwidth
     pros: allows us to add a Content-Length for nars
     pros: helps keep narinfo/nar lifetime in sync
     cons: doesn’t reduce load on hydra.gnu.org
     cons: exposes inconsistency between the store contents and the
           HTTP response (you may get 404 even if the thing is actually
           in store), but maybe that’s not a problem

Thoughts?

Ludo’.