Junio C Hamano <[email protected]> writes:
> I have a feeling that it is a bit too premature to specify the
> details at such a low level as "capabilities are announced by
> prefixing four-byte 'c', 'a', 'p', ':' in front" and "a multi-record
> group has its element count at the beginning (or an end marker at
> the end, for that matter)", and it may be a better idea to outline
> all the necessary elements at a bit higher level first; that would
> avoid the need for useless exchanges like the one we are having right now.
>
> .... If you keep the
> discussion at a level like "fetch first asks capabilities it wants
> upload-pack-2 to enable, optionally gives the current shallow
> boundaries when the capability says the other side supports it, and
> then starts showing what it has" while we are trying to achieve
> consensus on what kind of protocol elements we would need, and what
> information each element would carry, the discussion will help us
> reach a shared understanding of exactly what to write down in EBNF
> form faster, I would imagine.
And I see we went silent after this, so let's try to stir the pot
again to see if it simmers.
This is a follow-up on $gmane/264553, which is a continuation of
$gmane/264000, but instead of assigning two required readings, I'll
start by reproducing the two and then add a few more things the
current protocol lacks that I would want to see in the updated
protocol.
The current protocol has the following problems that limit us:
- It is not easy to make it resumable, because we recompute every
time. This is especially problematic for the initial fetch aka
"clone" as we will be talking about a large transfer. Redirection
to a bundle hosted on a CDN might be something we could do
transparently.
- The protocol extension has a fairly low length limit.
- Because the protocol exchange starts by the server side
advertising all its refs, even when the fetcher is interested in
a single ref, the initial overhead is nontrivial, especially when
you are doing a small incremental update. The worst case is an
auto-builder that polls every five minutes and pays this cost even
when there are no new commits to be fetched.
- Because we recompute every time, taking into account what the
fetcher has, in addition to what the fetcher obtained from us
earlier, in order to reduce the transferred bytes, the payload for
incremental updates becomes tailor-made for each fetch and cannot
be easily reused.
- The semantics of the side-bands are unclear.
- Is band #2 meant only for progress output (I think the current
protocol handlers assume that and unconditionally squelch it
under --quiet)? Or do we instead want dedicated "progress" and
"error message" sidebands?
- Is band #2 meant for human consumption, or do we expect the
other end to interpret and act on it? If the former, would it
make sense to send locale information from the client side and
ask the server side to produce its output with _("message")?
- The semantics of packet_flush() are suboptimal, and this
shortcoming seeps through to the protocol mapped to the
smart-HTTP transport.
Originally, packet_flush() was meant as "Here is an end of one
logical section of what I am going to speak.", hinting that it
might be a good idea for the underlying implementation to hold
the packets up to that point in-core and then write(2) them all
out (i.e. "flush") to the file descriptor only when we handle
packet_flush(). It never meant "Now I am finished speaking for
now and it is your turn to speak."
But because HTTP is inherently a ping-pong protocol where the
requestor at one point stops talking and lets the responder
speak, the code to map our protocol to the smart HTTP transport
treated the packet_flush() boundary as meaning "Now I am done
talking; it is my turn to listen."
We probably need two kinds of packet_flush(). When a requestor
needs to say two or more logical groups of things before telling
the other side "Now I am done talking; it is your turn.", we need
some marker (i.e. the original meaning of packet_flush()) at the
end of these logical groups. And in order to be able to say "Now
I am done saying everything I need to say at this point for you
to respond to me. It is your turn.", we need another kind of
marker.
- The fetch-pack direction does the common-parent discovery but the
send-pack (push) direction does not. This is OK for the normal
fast-forward push, in which case we will see a known commit at
the tip of the branch we are pushing into, but it makes forced
pushes inefficient: the tip being overwritten may be unknown to us,
so we cannot use it to exclude objects the other side already has.
- The existing common-parent discovery done on the fetch-pack side
enumerates commits contiguously, traversing the history toward the
past. We might want to step back exponentially (or in Fibonacci
steps) to quickly find an ancient common commit and then bisect the
history from there (or it might turn out not to be worth it).
- We may want to revamp builtin/receive-pack.c::report(), which
reports the final result of a push back to the pusher, so that it
tells the exact object names that sit at the updated tips of refs,
not just the refnames. That would allow the server side to accept a
push of commit X to a branch, do some "magic" on X (e.g. rebase it on
top of the current tip, merge it with the current tip, or let a hook
rewrite the commit in any way it deems appropriate) and put the
resulting commit Y at the tip of the branch. Without such a revamp,
it is currently not possible to sensibly allow the server side to
rewrite what got pushed.
- If we were to start allowing the receiver side to rewrite pushed
commits, the updated send-pack protocol must be able to send the
new objects created by that "magic" back to the pusher. The
current protocol does not allow the receive-pack to send packdata
back to send-pack.
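To make the advertisement-overhead point concrete, here is a minimal C sketch of what a server could do if, hypothetically, the fetcher spoke first and named the ref prefixes it cares about. The "prefixes" request and select_refs() are assumptions for illustration only; nothing in the current protocol lets the fetcher send such a list before the advertisement.

```c
#include <stddef.h>
#include <string.h>

/* HYPOTHETICAL: assumes the fetcher speaks first and sends a list of
 * ref prefixes. The server then advertises only the refs that match
 * one of them, so the size of the advertisement follows what the
 * fetcher asked for rather than the total number of refs.
 * Copies matching ref names into out (room for nrefs entries) and
 * returns how many matched. */
static size_t select_refs(const char *const *refs, size_t nrefs,
                          const char *const *prefixes, size_t nprefixes,
                          const char **out)
{
    size_t count = 0;

    for (size_t i = 0; i < nrefs; i++) {
        for (size_t j = 0; j < nprefixes; j++) {
            if (strncmp(refs[i], prefixes[j], strlen(prefixes[j])) == 0) {
                out[count++] = refs[i];
                break; /* advertise each ref at most once */
            }
        }
    }
    return count;
}
```

An auto-builder interested only in refs/heads/main would then get one advertised ref per poll instead of the whole list.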
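For the side-band question, a sketch of how a receiver might route one side-band packet, assuming the existing numbering (band 1 carries pack data, band 2 progress, band 3 a fatal error). sideband_route() and enum sb_action are hypothetical names, not Git's demultiplexer. It shows the problem with band #2: under --quiet the only available choice is to drop the whole band, because nothing in the packet says whether the text is progress, something the user must see, or something meant for the program.

```c
#include <stddef.h>

enum sb_action {
    SB_INVALID = -1,
    SB_DROP = 0,     /* band 2 squelched under --quiet */
    SB_PACK = 1,     /* band 1: packfile bytes */
    SB_PROGRESS = 2, /* band 2: assumed to be human-readable progress */
    SB_ERROR = 3     /* band 3: fatal error message */
};

/* Route one side-band packet (pkt-line length prefix already stripped).
 * On success *payload/*payload_len point past the band byte; otherwise
 * they are cleared. */
static enum sb_action sideband_route(const unsigned char *pkt, size_t len,
                                     int quiet,
                                     const unsigned char **payload,
                                     size_t *payload_len)
{
    *payload = NULL;
    *payload_len = 0;
    if (len < 1)
        return SB_INVALID;
    switch (pkt[0]) {
    case 1:
    case 3:
        break;
    case 2:
        if (quiet)
            return SB_DROP; /* cannot tell progress from anything else */
        break;
    default:
        return SB_INVALID;
    }
    *payload = pkt + 1;
    *payload_len = len - 1;
    return (enum sb_action)pkt[0];
}
```

Dedicated "progress" and "error message" bands would let the quiet case drop only the former.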
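The "two kinds of packet_flush()" idea can be sketched with pkt-line framing (four hex digits of total length, then the payload). Here "0000" keeps its original "end of one logical group" meaning, and "0001" is a HYPOTHETICAL "I am done; it is your turn" marker chosen for illustration. All function names are made up for this sketch.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define PKT_MAX 65520 /* largest pkt-line, 4-byte header included */

enum pkt_kind { PKT_DATA, PKT_GROUP_END, PKT_TURN_END, PKT_BAD };

/* Write one data packet. out needs room for len + 5 bytes (snprintf
 * adds a NUL that the payload then overwrites). Returns bytes written,
 * or -1 if the payload is too large. */
static int pkt_write(char *out, const char *payload, size_t len)
{
    if (len > PKT_MAX - 4)
        return -1;
    snprintf(out, 5, "%04x", (unsigned)(len + 4));
    memcpy(out + 4, payload, len);
    return (int)(len + 4);
}

/* "0000": end of one logical group; the speaker may say more. */
static int pkt_group_end(char *out)
{
    memcpy(out, "0000", 4);
    return 4;
}

/* "0001": HYPOTHETICAL end-of-turn marker. */
static int pkt_turn_end(char *out)
{
    memcpy(out, "0001", 4);
    return 4;
}

/* Classify the 4-byte header at hdr; for data packets, *payload_len
 * receives the payload size (0 otherwise). */
static enum pkt_kind pkt_classify(const char *hdr, size_t *payload_len)
{
    unsigned v = 0;

    *payload_len = 0;
    for (int i = 0; i < 4; i++) {
        char c = hdr[i];
        v <<= 4;
        if (c >= '0' && c <= '9')
            v |= (unsigned)(c - '0');
        else if (c >= 'a' && c <= 'f')
            v |= (unsigned)(c - 'a' + 10);
        else if (c >= 'A' && c <= 'F')
            v |= (unsigned)(c - 'A' + 10);
        else
            return PKT_BAD;
    }
    if (v == 0)
        return PKT_GROUP_END;
    if (v == 1)
        return PKT_TURN_END;
    if (v < 4 || v > PKT_MAX)
        return PKT_BAD;
    *payload_len = v - 4;
    return PKT_DATA;
}
```

The choice of "0001" rests on one observation: a length of 1 to 3 can never describe a real packet, because the four header bytes count toward the length, so such a marker cannot collide with data. A requestor could then send two groups followed by a single end-of-turn marker, and an HTTP mapping would cut the request only at the latter.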
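The "go exponential, then bisect" idea for common-parent discovery can be sketched as a galloping search, under a strong simplifying assumption: the history is a single line of commits, so whether the other side has commit i is monotone in i. Real histories have merges, and each probe costs a slot in a "have" batch or a round trip, which is part of why it might not be worth it. find_newest_common() and the oracle are hypothetical.

```c
#include <stddef.h>

/* HYPOTHETICAL oracle: "does the other side already have commit idx?"
 * Assumes a linear history where idx 0 is our tip and larger indices
 * are older, so once the answer is true it stays true. */
typedef int (*have_fn)(size_t idx, void *ctx);

/* Probe idx 0, 1, 3, 7, 15, ... until a common commit is found, then
 * bisect the gap left behind. Returns the index of the newest common
 * commit, or n if none of the n commits is common. *probes receives
 * the number of questions asked. */
static size_t find_newest_common(size_t n, have_fn known, void *ctx,
                                 size_t *probes)
{
    size_t lo = 0; /* every index below lo is known NOT to be common */
    size_t hi = n; /* hi is common, or n meaning "none found" */
    size_t i = 0, step = 1;

    *probes = 0;
    while (i < n) { /* exponential phase */
        (*probes)++;
        if (known(i, ctx)) {
            hi = i;
            break;
        }
        lo = i + 1;
        i += step;
        step *= 2;
    }
    while (lo < hi) { /* bisection phase over [lo, hi) */
        size_t mid = lo + (hi - lo) / 2;
        (*probes)++;
        if (known(mid, ctx))
            hi = mid;
        else
            lo = mid + 1;
    }
    return hi;
}

/* Toy oracle: commits at index >= *(size_t *)ctx are common. */
static int oracle_threshold(size_t idx, void *ctx)
{
    return idx >= *(const size_t *)ctx;
}
```

With the newest common commit 1000 commits back, contiguous enumeration asks about 1001 commits while this asks 20 questions (11 while doubling, 9 while bisecting). A Fibonacci schedule would only change how step grows.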
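For the report() revamp, a sketch of a per-ref status line that also carries the object name that ended up at the tip. The "ok <refname> <object name>" format, format_ref_report() and tip_was_rewritten() are hypothetical; the point of the pusher-side helper is that only with the object name can the pusher notice that the X it sent became Y on the server.

```c
#include <stdio.h>
#include <string.h>

/* HYPOTHETICAL status line "ok <refname> <oid>\n", where <oid> is the
 * object that actually ended up at the tip (Y), which may differ from
 * what was pushed (X). Framed as a pkt-line. Returns bytes written
 * (without the NUL), or -1 if it does not fit. */
static int format_ref_report(char *out, size_t outsz,
                             const char *refname, const char *final_oid)
{
    char body[512];
    int blen = snprintf(body, sizeof(body), "ok %s %s\n",
                        refname, final_oid);

    if (blen < 0 || (size_t)blen >= sizeof(body) || (size_t)blen + 5 > outsz)
        return -1;
    snprintf(out, outsz, "%04x%s", (unsigned)(blen + 4), body);
    return blen + 4;
}

/* Pusher side: given the payload of such a line (length prefix
 * stripped), return 1 if the reported tip differs from what we pushed,
 * 0 if it is the same, or -1 if it is not an "ok" line that carries an
 * object name. */
static int tip_was_rewritten(const char *line, const char *pushed_oid)
{
    const char *sp;
    size_t n;

    if (strncmp(line, "ok ", 3) != 0)
        return -1;
    sp = strrchr(line, ' ');
    if (sp == line + 2) /* only the space after "ok": no object name */
        return -1;
    n = strcspn(sp + 1, "\n");
    return !(n == strlen(pushed_oid) && strncmp(sp + 1, pushed_oid, n) == 0);
}
```

A pusher seeing 1 here would then need the new objects for Y, which is the packdata-back-to-send-pack gap mentioned above.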
I'd like to see a new protocol that lets us overcome the above
limitations (did I miss others? I am sure people can help here)
sometime this year.