Re: [PATCH v3 34/35] remote-curl: implement stateless-connect command

Brandon Williams Wed, 28 Feb 2018 12:22:32 -0800

On 02/27, Jonathan Nieder wrote:
> Hi,
> 
> Brandon Williams wrote:
> 
> > Teach remote-curl the 'stateless-connect' command which is used to
> > establish a stateless connection with servers which support protocol
> > version 2.  This allows remote-curl to act as a proxy, allowing the git
> > client to communicate natively with a remote end, simply using
> > remote-curl as a pass through to convert requests to http.
> 
> Cool!  I better look at the spec for that first.
> 
> *looks at the previous patch*
> 
> Oh, there is no documented spec. :/  I'll muddle through this instead,
> then.


I'll make sure to add one :)

> 
> [...]
> > --- a/remote-curl.c
> > +++ b/remote-curl.c
> > @@ -188,7 +188,10 @@ static struct ref *parse_git_refs(struct discovery 
> > *heads, int for_push)
> >     heads->version = discover_version(&reader);
> >     switch (heads->version) {
> >     case protocol_v2:
> > -           die("support for protocol v2 not implemented yet");
> > +           /*
> > +            * Do nothing.  Client should run 'stateless-connect' and
> > +            * request the refs themselves.
> > +            */
> >             break;
> 
> This is the 'list' command, right?  Since we expect the client to run
> 'stateless-connect' instead, can we make it error out?

Yes and no.  Remote-curl will run this when trying to establish a
stateless-connection.  If the response is v2 then this is a capability
list and not refs.  So the capabilities will be dumped to the client and
they will be able to request the refs themselves at a later point.  The
comment here is just misleading, so i'll make sure to fix it.

> 
> [...]
> > @@ -1082,6 +1085,184 @@ static void parse_push(struct strbuf *buf)
> >     free(specs);
> >  }
> >  
> > +struct proxy_state {
> > +   char *service_name;
> > +   char *service_url;
> > +   struct curl_slist *headers;
> > +   struct strbuf request_buffer;
> > +   int in;
> > +   int out;
> > +   struct packet_reader reader;
> > +   size_t pos;
> > +   int seen_flush;
> > +};
> 
> Can this have a comment describing what it is/does?  It's not obvious
> to me at first glance.
> 
> It doesn't have to go in a lot of detail since this is an internal
> implementation detail, but something saying e.g. that this represents
> a connection to an HTTP server (that's an example; I'm not saying
> that's what it represents :)) would help.

Always making new code have higher standards than the existing code ;)
Haha, I'll add a simple comment explaining it.

> 
> > +
> > +static void proxy_state_init(struct proxy_state *p, const char 
> > *service_name,
> > +                        enum protocol_version version)
> [...]
> > +static void proxy_state_clear(struct proxy_state *p)
> 
> Looks sensible.
> 
> [...]
> > +static size_t proxy_in(char *buffer, size_t eltsize,
> > +                  size_t nmemb, void *userdata)
> 
> Can this have a comment describing the interface?  (E.g. does it read
> a single pkt_line?  How is the caller expected to use it?  Does it
> satisfy the interface of some callback?)

I'll add a comment that its used as a READFUNCTION callback for curl and
that it tries to copy over a packet-line at a time.

> 
> libcurl's example https://curl.haxx.se/libcurl/c/ftpupload.html just
> calls this read_callback.  Such a name plus a pointer to
> CURLOPT_READFUNCTION should do the trick; bonus points if the comment 
> says what our implementation of the callback does.
> 
> Is this about having peek ability?

No its just that Curl only requests a set about of data at a time so you
need to be able to buffer the data that can't be read yet.

> 
> > +   struct proxy_state *p = userdata;
> > +   size_t avail = p->request_buffer.len - p->pos;
> > +
> > +   if (!avail) {
> > +           if (p->seen_flush) {
> > +                   p->seen_flush = 0;
> > +                   return 0;
> > +           }
> > +
> > +           strbuf_reset(&p->request_buffer);
> > +           switch (packet_reader_read(&p->reader)) {
> > +           case PACKET_READ_EOF:
> > +                   die("unexpected EOF when reading from parent process");
> > +           case PACKET_READ_NORMAL:
> > +                   packet_buf_write_len(&p->request_buffer, p->reader.line,
> > +                                        p->reader.pktlen);
> > +                   break;
> > +           case PACKET_READ_DELIM:
> > +                   packet_buf_delim(&p->request_buffer);
> > +                   break;
> > +           case PACKET_READ_FLUSH:
> > +                   packet_buf_flush(&p->request_buffer);
> > +                   p->seen_flush = 1;
> > +                   break;
> > +           }
> > +           p->pos = 0;
> > +           avail = p->request_buffer.len;
> > +   }
> > +
> > +   if (max < avail)
> > +           avail = max;
> > +   memcpy(buffer, p->request_buffer.buf + p->pos, avail);
> > +   p->pos += avail;
> > +   return avail;
> 
> This is a number of bytes, but CURLOPT_READFUNCTION expects a number
> of items, fread-style.  That is:
> 
>       if (avail < eltsize)
>               ... handle somehow, maybe by reading in more? ...
> 
>       avail_memb = avail / eltsize;
>       memcpy(buffer,
>              p->request_buffer.buf + p->pos,
>              st_mult(avail_memb, eltsize));
>       p->pos += st_mult(avail_memb, eltsize);
>       return avail_memb;
> 
> But https://curl.haxx.se/libcurl/c/CURLOPT_READFUNCTION.html says
> 
>       Your function must then return the actual number of bytes that
>       it stored in that memory area.
> 
> Does this mean eltsize is always 1?  This is super confusing...
> 
> ... ok, a quick grep for fread_func in libcurl reveals that eltsize is
> indeed always 1.  Can we add an assertion so we notice if that
> changes?
> 
>       if (eltsize != 1)
>               BUG("curl read callback called with size = %zu != 1", eltsize);
>       max = nmemb;

Yeah i can go ahead and do this.  Just note that the v1 path uses logic
identical to this so it would be a problem there.

> 
> [...]
> > +static size_t proxy_out(char *buffer, size_t eltsize,
> > +                   size_t nmemb, void *userdata)
> > +{
> > +   size_t size = eltsize * nmemb;
> > +   struct proxy_state *p = userdata;
> > +
> > +   write_or_die(p->out, buffer, size);
> > +   return size;
> > +}
> 
> Nice.  Same questions about st_mult or just asserting on eltsize apply
> here, too.
> 
> [...]
> > +static int proxy_post(struct proxy_state *p)
> 
> What does this function do?  Can it get a brief comment?

Will do.

> 
> > +{
> > +   struct active_request_slot *slot;
> > +   int err;
> > +
> > +   slot = get_active_slot();
> > +
> > +   curl_easy_setopt(slot->curl, CURLOPT_NOBODY, 0);
> > +   curl_easy_setopt(slot->curl, CURLOPT_POST, 1);
> > +   curl_easy_setopt(slot->curl, CURLOPT_URL, p->service_url);
> > +   curl_easy_setopt(slot->curl, CURLOPT_HTTPHEADER, p->headers);
> > +
> > +   /* Setup function to read request from client */
> > +   curl_easy_setopt(slot->curl, CURLOPT_READFUNCTION, proxy_in);
> > +   curl_easy_setopt(slot->curl, CURLOPT_READDATA, p);
> > +
> > +   /* Setup function to write server response to client */
> > +   curl_easy_setopt(slot->curl, CURLOPT_WRITEFUNCTION, proxy_out);
> > +   curl_easy_setopt(slot->curl, CURLOPT_WRITEDATA, p);
> > +
> > +   err = run_slot(slot, NULL);
> > +
> > +   if (err != HTTP_OK)
> > +           err = -1;
> > +
> > +   return err;
> 
> HTTP_OK is 0 but kind of obscures that.  How about something like the
> following?
> 
>       if (run_slot(slot, NULL))
>               return -1;
>       return 0;
> 
> or
> 
>       if (run_slot(slot, NULL) != HTTP_OK)
>               return -1;
>       return 0;
> 
> That way it's clearer that this always returns 0 or -1.

Sounds good.

> 
> [...]
> > +static int stateless_connect(const char *service_name)
> > +{
> > +   struct discovery *discover;
> > +   struct proxy_state p;
> > +
> > +   /*
> > +    * Run the info/refs request and see if the server supports protocol
> > +    * v2.  If and only if the server supports v2 can we successfully
> > +    * establish a stateless connection, otherwise we need to tell the
> > +    * client to fallback to using other transport helper functions to
> > +    * complete their request.
> > +    */
> > +   discover = discover_refs(service_name, 0);
> > +   if (discover->version != protocol_v2) {
> > +           printf("fallback\n");
> > +           fflush(stdout);
> > +           return -1;
> 
> Interesting.  I wonder if we can make remote-curl less smart and drive
> this more from the caller.
> 
> E.g. if the caller could do a single stateless request, they could do:
> 
>       option git-protocol version=2
>       stateless-request GET info/refs?service=git-upload-pack
>       [pkt-lines, ending with a flush-pkt]
> 
> The git-protocol option in this hypothetical example is the value to
> be passed in the Git-Protocol header.
> 
> Then based on the response, the caller could decide to keep using
> stateless-request for further requests or fall back to "fetch".
> 
> That way, if we implement some protocol v3, the remote helper would
> not have to be changed at all to handle it: the caller would instead
> make the new v3-format request and remote-curl would be able to oblige
> without knowing why they're doing it.
> 
> What do you think?

I do see the draw for wanting this.  I think a change like this requires
a lot more refactoring, simply because with the current setup the
fetch/ls-refs logic doesn't care that its talking through a
remote-helper where if we went down that route it would need to be aware
of that.

-- 
Brandon Williams

Re: [PATCH v3 34/35] remote-curl: implement stateless-connect command

Reply via email to