On 02/27, Jonathan Nieder wrote:
> Hi,
>
> Brandon Williams wrote:
>
> > Teach remote-curl the 'stateless-connect' command which is used to
> > establish a stateless connection with servers which support protocol
> > version 2. This allows remote-curl to act as a proxy, allowing the git
> > client to communicate natively with a remote end, simply using
> > remote-curl as a pass through to convert requests to http.
>
> Cool! I better look at the spec for that first.
>
> *looks at the previous patch*
>
> Oh, there is no documented spec. :/ I'll muddle through this instead,
> then.
I'll make sure to add one :)
>
> [...]
> > --- a/remote-curl.c
> > +++ b/remote-curl.c
> > @@ -188,7 +188,10 @@ static struct ref *parse_git_refs(struct discovery
> > *heads, int for_push)
> > heads->version = discover_version(&reader);
> > switch (heads->version) {
> > case protocol_v2:
> > - die("support for protocol v2 not implemented yet");
> > + /*
> > + * Do nothing. Client should run 'stateless-connect' and
> > + * request the refs themselves.
> > + */
> > break;
>
> This is the 'list' command, right? Since we expect the client to run
> 'stateless-connect' instead, can we make it error out?
Yes and no. Remote-curl will run this when trying to establish a
stateless-connection. If the response is v2 then this is a capability
list and not refs. So the capabilities will be dumped to the client and
they will be able to request the refs themselves at a later point. The
comment here is just misleading, so i'll make sure to fix it.
>
> [...]
> > @@ -1082,6 +1085,184 @@ static void parse_push(struct strbuf *buf)
> > free(specs);
> > }
> >
> > +struct proxy_state {
> > + char *service_name;
> > + char *service_url;
> > + struct curl_slist *headers;
> > + struct strbuf request_buffer;
> > + int in;
> > + int out;
> > + struct packet_reader reader;
> > + size_t pos;
> > + int seen_flush;
> > +};
>
> Can this have a comment describing what it is/does? It's not obvious
> to me at first glance.
>
> It doesn't have to go in a lot of detail since this is an internal
> implementation detail, but something saying e.g. that this represents
> a connection to an HTTP server (that's an example; I'm not saying
> that's what it represents :)) would help.
Always making new code have higher standards than the existing code ;)
Haha, I'll add a simple comment explaining it.
>
> > +
> > +static void proxy_state_init(struct proxy_state *p, const char
> > *service_name,
> > + enum protocol_version version)
> [...]
> > +static void proxy_state_clear(struct proxy_state *p)
>
> Looks sensible.
>
> [...]
> > +static size_t proxy_in(char *buffer, size_t eltsize,
> > + size_t nmemb, void *userdata)
>
> Can this have a comment describing the interface? (E.g. does it read
> a single pkt_line? How is the caller expected to use it? Does it
> satisfy the interface of some callback?)
I'll add a comment that its used as a READFUNCTION callback for curl and
that it tries to copy over a packet-line at a time.
>
> libcurl's example https://curl.haxx.se/libcurl/c/ftpupload.html just
> calls this read_callback. Such a name plus a pointer to
> CURLOPT_READFUNCTION should do the trick; bonus points if the comment
> says what our implementation of the callback does.
>
> Is this about having peek ability?
No its just that Curl only requests a set about of data at a time so you
need to be able to buffer the data that can't be read yet.
>
> > + struct proxy_state *p = userdata;
> > + size_t avail = p->request_buffer.len - p->pos;
> > +
> > + if (!avail) {
> > + if (p->seen_flush) {
> > + p->seen_flush = 0;
> > + return 0;
> > + }
> > +
> > + strbuf_reset(&p->request_buffer);
> > + switch (packet_reader_read(&p->reader)) {
> > + case PACKET_READ_EOF:
> > + die("unexpected EOF when reading from parent process");
> > + case PACKET_READ_NORMAL:
> > + packet_buf_write_len(&p->request_buffer, p->reader.line,
> > + p->reader.pktlen);
> > + break;
> > + case PACKET_READ_DELIM:
> > + packet_buf_delim(&p->request_buffer);
> > + break;
> > + case PACKET_READ_FLUSH:
> > + packet_buf_flush(&p->request_buffer);
> > + p->seen_flush = 1;
> > + break;
> > + }
> > + p->pos = 0;
> > + avail = p->request_buffer.len;
> > + }
> > +
> > + if (max < avail)
> > + avail = max;
> > + memcpy(buffer, p->request_buffer.buf + p->pos, avail);
> > + p->pos += avail;
> > + return avail;
>
> This is a number of bytes, but CURLOPT_READFUNCTION expects a number
> of items, fread-style. That is:
>
> if (avail < eltsize)
> ... handle somehow, maybe by reading in more? ...
>
> avail_memb = avail / eltsize;
> memcpy(buffer,
> p->request_buffer.buf + p->pos,
> st_mult(avail_memb, eltsize));
> p->pos += st_mult(avail_memb, eltsize);
> return avail_memb;
>
> But https://curl.haxx.se/libcurl/c/CURLOPT_READFUNCTION.html says
>
> Your function must then return the actual number of bytes that
> it stored in that memory area.
>
> Does this mean eltsize is always 1? This is super confusing...
>
> ... ok, a quick grep for fread_func in libcurl reveals that eltsize is
> indeed always 1. Can we add an assertion so we notice if that
> changes?
>
> if (eltsize != 1)
> BUG("curl read callback called with size = %zu != 1", eltsize);
> max = nmemb;
Yeah i can go ahead and do this. Just note that the v1 path uses logic
identical to this so it would be a problem there.
>
> [...]
> > +static size_t proxy_out(char *buffer, size_t eltsize,
> > + size_t nmemb, void *userdata)
> > +{
> > + size_t size = eltsize * nmemb;
> > + struct proxy_state *p = userdata;
> > +
> > + write_or_die(p->out, buffer, size);
> > + return size;
> > +}
>
> Nice. Same questions about st_mult or just asserting on eltsize apply
> here, too.
>
> [...]
> > +static int proxy_post(struct proxy_state *p)
>
> What does this function do? Can it get a brief comment?
Will do.
>
> > +{
> > + struct active_request_slot *slot;
> > + int err;
> > +
> > + slot = get_active_slot();
> > +
> > + curl_easy_setopt(slot->curl, CURLOPT_NOBODY, 0);
> > + curl_easy_setopt(slot->curl, CURLOPT_POST, 1);
> > + curl_easy_setopt(slot->curl, CURLOPT_URL, p->service_url);
> > + curl_easy_setopt(slot->curl, CURLOPT_HTTPHEADER, p->headers);
> > +
> > + /* Setup function to read request from client */
> > + curl_easy_setopt(slot->curl, CURLOPT_READFUNCTION, proxy_in);
> > + curl_easy_setopt(slot->curl, CURLOPT_READDATA, p);
> > +
> > + /* Setup function to write server response to client */
> > + curl_easy_setopt(slot->curl, CURLOPT_WRITEFUNCTION, proxy_out);
> > + curl_easy_setopt(slot->curl, CURLOPT_WRITEDATA, p);
> > +
> > + err = run_slot(slot, NULL);
> > +
> > + if (err != HTTP_OK)
> > + err = -1;
> > +
> > + return err;
>
> HTTP_OK is 0 but kind of obscures that. How about something like the
> following?
>
> if (run_slot(slot, NULL))
> return -1;
> return 0;
>
> or
>
> if (run_slot(slot, NULL) != HTTP_OK)
> return -1;
> return 0;
>
> That way it's clearer that this always returns 0 or -1.
Sounds good.
>
> [...]
> > +static int stateless_connect(const char *service_name)
> > +{
> > + struct discovery *discover;
> > + struct proxy_state p;
> > +
> > + /*
> > + * Run the info/refs request and see if the server supports protocol
> > + * v2. If and only if the server supports v2 can we successfully
> > + * establish a stateless connection, otherwise we need to tell the
> > + * client to fallback to using other transport helper functions to
> > + * complete their request.
> > + */
> > + discover = discover_refs(service_name, 0);
> > + if (discover->version != protocol_v2) {
> > + printf("fallback\n");
> > + fflush(stdout);
> > + return -1;
>
> Interesting. I wonder if we can make remote-curl less smart and drive
> this more from the caller.
>
> E.g. if the caller could do a single stateless request, they could do:
>
> option git-protocol version=2
> stateless-request GET info/refs?service=git-upload-pack
> [pkt-lines, ending with a flush-pkt]
>
> The git-protocol option in this hypothetical example is the value to
> be passed in the Git-Protocol header.
>
> Then based on the response, the caller could decide to keep using
> stateless-request for further requests or fall back to "fetch".
>
> That way, if we implement some protocol v3, the remote helper would
> not have to be changed at all to handle it: the caller would instead
> make the new v3-format request and remote-curl would be able to oblige
> without knowing why they're doing it.
>
> What do you think?
I do see the draw for wanting this. I think a change like this requires
a lot more refactoring, simply because with the current setup the
fetch/ls-refs logic doesn't care that its talking through a
remote-helper where if we went down that route it would need to be aware
of that.
--
Brandon Williams