Ev2, RA, Commit Process (was: Editor v2 - suggestions and queries)

Greg Stein Tue, 31 Jan 2012 20:08:58 -0800

I've included an email from back in November as the basis for the
"next steps" for Ev2 tweaks. As I mentioned yesterday, Hyrum and I got
a chance to talk at length about Ev2 issues and design. Some extra
conversations with Philip and Julian also helped out here.

These next steps feel Good, but have some big implications around
client/server interaction. I'd like to hear any thoughts and concerns.

See below:

On Sat, Nov 5, 2011 at 19:56, Greg Stein <gst...@gmail.com> wrote:
> On Fri, Nov 4, 2011 at 11:16, Julian Foad <julian.f...@wandisco.com> wrote:
>...
>> Huh?  We're now talking about a single call that sets the target and
>> the properties together.  If we take this approach, I suggest naming
>> the three calls 'set_symlink' (like 'add_symlink'), 'set_file' (like
>> 'add_file'), and 'set_dir_props' (which is not quite like
>> 'add_directory').
>
> It was never intended to be set_dir_props(). You could set properties
> on any node. That should stick around, cuz it kind of sucks for the
> caller to have to know the node type just to set some properties.
>
> But I do like where you're going with this. Rather than set_$kind,
> let's go with alter_file(), alter_symlink(), and alter_directory(). I
> chose "alter" rather than "change" since change_directory might throw
> people off with some implied stateful semantics. Each of the alter_*
> functions would provide for changing the properties, symlink target,
> file contents, etc. Maybe we eliminate alter_directory() since that
> would be exactly the same as set_props() [though maybe that becomes
> alter_props?].

I plan to eliminate set_props() and its COMPLETE parameter. Instead,
there will be three APIs for changing a node, and they will complete
before returning. Thus, no more dual calls where a receiver may need
to retain state to accomplish the entire change. The receiver can
perform the change to the node, then throw out any temporary state.
The node won't be touched again.

These entry points will be:

alter_directory(RELPATH, REVISION, PROPS)
alter_file(RELPATH, REVISION, PROPS, CHECKSUM, CONTENTS)
alter_symlink(RELPATH, REVISION, PROPS, TARGET)

Again, we use "alter" rather than "change" to avoid confusing
change_directory with the standard POSIX chdir() call.

set_props, set_text, and set_target all go away.
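The following minimal C sketch shows what the one-shot semantics mean for a receiver that implements the three alter_* calls. Everything in it is a hypothetical simplification, not the actual svn_editor_t API: the editor_t vtable, prop_t, the logging receiver, and passing CONTENTS as a plain string. Each call carries the node's complete change (properties plus checksum/contents or symlink target), so the receiver keeps no per-node state between calls.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified types for illustration; not the real
   svn_editor_t API. Properties are name/value pairs and file contents
   are a plain string. */
typedef struct { const char *name; const char *value; } prop_t;

typedef struct {
  void *baton; /* receiver-private data */
  int (*alter_directory)(void *baton, const char *relpath, long revision,
                         const prop_t *props, size_t nprops);
  int (*alter_file)(void *baton, const char *relpath, long revision,
                    const prop_t *props, size_t nprops,
                    const char *checksum, const char *contents);
  int (*alter_symlink)(void *baton, const char *relpath, long revision,
                       const prop_t *props, size_t nprops,
                       const char *target);
} editor_t;

/* Toy receiver: each call carries the node's complete new state, so the
   change is handled immediately and nothing about the node is kept. */
typedef struct { int altered; char last[128]; } log_receiver_t;

static int log_dir(void *baton, const char *relpath, long rev,
                   const prop_t *props, size_t nprops) {
  log_receiver_t *r = baton;
  (void)props;
  assert(relpath != NULL);
  snprintf(r->last, sizeof r->last, "dir %s@%ld props=%zu",
           relpath, rev, nprops);
  r->altered++;
  return 0;
}

static int log_file(void *baton, const char *relpath, long rev,
                    const prop_t *props, size_t nprops,
                    const char *checksum, const char *contents) {
  log_receiver_t *r = baton;
  (void)props;
  assert(relpath != NULL && checksum != NULL && contents != NULL);
  snprintf(r->last, sizeof r->last, "file %s@%ld props=%zu len=%zu",
           relpath, rev, nprops, strlen(contents));
  r->altered++;
  return 0;
}

static int log_symlink(void *baton, const char *relpath, long rev,
                       const prop_t *props, size_t nprops,
                       const char *target) {
  log_receiver_t *r = baton;
  (void)props;
  assert(relpath != NULL && target != NULL);
  snprintf(r->last, sizeof r->last, "link %s@%ld props=%zu -> %s",
           relpath, rev, nprops, target);
  r->altered++;
  return 0;
}

static editor_t make_log_editor(log_receiver_t *r) {
  editor_t e = { r, log_dir, log_file, log_symlink };
  return e;
}
```

No call depends on a later one, so a receiver never holds a half-built node while it waits for a second call.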

One more detail about add_file() and alter_file() is noted below.

>...
>> I wanted to clarify three separate things here.
>>
>>  (1) Partial read is allowed.  Good.
>>
>>  (2) It's a 'pull-mode' interface.  Fine.
>>
>>  (3) The editor is not allowed to return early and defer the reading
>> of this stream until it's ready.  I wonder if we might want to let the
>> editor keep several streams open and read from them as and when its
>> transmit buffer allows, especially if it wants to be able to send two
>> or more file streams in parallel.  These are just shallow thoughts at
>> the moment.
>
> Actually... damn. One of the reasons for separating set_props() and
> set_text() was to allow for the delayed delivery of contents. Same
> thing for add_file() vs. set_text(). We adjusted add_file() to
> take contents, but that may have been a mistake.
>
> At commit time, we want to delay the delivery of the content streams.

Actually, this will be fine. Hyrum and I figured out a different
(better?) approach. This allows the contents to always be provided at
add_file() and alter_file() time, rather than allowing a delayed
delivery. As noted above, two-step interfaces and delayed calls can
make it difficult for a receiver -- they need to retain some state
about the node and link up the two calls to complete the
addition/change.
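The quoted discussion describes the CONTENTS argument as a pull-mode stream that allows partial reads. The minimal C sketch below shows what the receiving side of alter_file() might do with such a stream. It assumes a hypothetical content_stream_t with a single read callback, plus an in-memory source that deliberately returns short reads. The receiver drains the stream before returning and handles partial reads by looping.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical pull-mode stream: read() fills up to *len bytes and sets
   *len to the number actually produced; *len == 0 means end of stream. */
typedef struct {
  void *baton;
  int (*read)(void *baton, char *buf, size_t *len);
} content_stream_t;

/* In-memory source that hands out at most 3 bytes per read, to show
   that short (partial) reads are legal. */
typedef struct { const char *data; size_t pos, size; } mem_src_t;

static int mem_read(void *baton, char *buf, size_t *len) {
  mem_src_t *s = baton;
  size_t n = s->size - s->pos;
  if (n > *len) n = *len;
  if (n > 3) n = 3;
  memcpy(buf, s->data + s->pos, n);
  s->pos += n;
  *len = n;
  return 0;
}

/* Receiver side of a one-shot alter_file(): pull the whole stream into
   OUT before returning, so nothing about this node is retained later.
   Returns 0 on success, -1 if OUT (CAP bytes) is too small. */
static int receive_contents(content_stream_t *stream, char *out, size_t cap,
                            size_t *total) {
  char chunk[16];
  assert(stream != NULL && out != NULL && total != NULL);
  *total = 0;
  for (;;) {
    size_t len = sizeof chunk;
    int err = stream->read(stream->baton, chunk, &len);
    if (err) return err;
    if (len == 0) return 0;             /* end of stream */
    if (*total + len > cap) return -1;  /* caller's buffer too small */
    memcpy(out + *total, chunk, len);
    *total += len;
  }
}
```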

>...
> Well... see above, ref: delayed content delivery. The API as I
> originally designed provided for a delayed delivery. I forgot about
> that aspect when I acceded to combining add_file/set_text.

Again, we will leave it as-is, and alter_file() will also contain the contents.

When Hyrum and I went through this, we found two occasions when file
contents are delayed:

1) at commit time, we note all the changes that will be made, expect a
"fast-fail" from the server for out-of-date items, and then we start
delivery of the bulk/file content

2) at update time, the changes to a file may be noted in the update
report skeleton and applied via the editor, and then a separate GET is
run to fetch the contents, which are set when they arrive
(via a delayed apply_textdelta). (Note: for Neon, with its
mother-report approach, the content is typically present at the time
of the file's metadata changes.)

For problem #2, we will simply make the RA update process (as an Ev2
driver) manage the delayed state, rather than impose the burden upon
all Ev2 receivers.
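The minimal C sketch below shows the driver-side bookkeeping for problem #2. The pending table, the fixed sizes, and the string that stands in for the receiver's alter_file() call are all hypothetical. The RA driver holds the skeleton's file change until the separate GET completes, and only then issues one complete alter_file().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical driver bookkeeping for files whose metadata arrived in the
   update report skeleton but whose contents need a separate GET. */
#define MAX_PENDING 8

typedef struct {
  char relpath[64];
  long revision;
  char props[64];  /* simplified: one serialized property string */
  int in_use;
} pending_file_t;

typedef struct {
  pending_file_t pending[MAX_PENDING];
  int alter_file_calls;  /* stands in for calls to the Ev2 receiver */
  char last_call[192];
} update_driver_t;

/* Skeleton says the file changed: remember it, but do NOT call the
   receiver yet. Returns -1 if the table is full. */
static int note_file_change(update_driver_t *d, const char *relpath,
                            long revision, const char *props) {
  assert(d != NULL && relpath != NULL && props != NULL);
  for (int i = 0; i < MAX_PENDING; i++) {
    pending_file_t *p = &d->pending[i];
    if (!p->in_use) {
      snprintf(p->relpath, sizeof p->relpath, "%s", relpath);
      snprintf(p->props, sizeof p->props, "%s", props);
      p->revision = revision;
      p->in_use = 1;
      return 0;
    }
  }
  return -1;
}

/* The GET for RELPATH completed: issue one complete alter_file() (here,
   just recorded as a string) and drop the buffered state. */
static int contents_arrived(update_driver_t *d, const char *relpath,
                            const char *contents) {
  for (int i = 0; i < MAX_PENDING; i++) {
    pending_file_t *p = &d->pending[i];
    if (p->in_use && strcmp(p->relpath, relpath) == 0) {
      snprintf(d->last_call, sizeof d->last_call,
               "alter_file(%s, %ld, %s, %s)", p->relpath, p->revision,
               p->props, contents);
      d->alter_file_calls++;
      p->in_use = 0;
      return 0;
    }
  }
  return -1;  /* no pending change for this path */
}
```

The delayed state lives in one place, the RA driver, so Ev2 receivers never see a file change split across calls.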

For problem #1, it gets trickier. Hyrum noted that the delayed content
delivery exists *only* so that we can get the fast-fail on the commit
process, and then suggested: why don't we simply tell the server the
entire commit plan, get the response, and *then* start sending all the
changes to the server?

Thus: we propose to turn the commit process into two parts, and
corresponding RA interfaces:

Step 1: The commit process tells RA something like "here are all the
relpaths/revisions that I plan to $operation". Note that it isn't
really "all paths" since recursive operations like a copy-destination
path, or a deletion, don't need to list all the child nodes. This
"plan" is sent to the server, which starts a txn, and examines whether
any of the operations are being applied to out-of-date nodes. The
server can allow the commit operation to proceed, or respond with an
error (possibly, multiple errors!).
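The minimal C sketch below shows the kind of check the server could run in Step 1. It makes two hypothetical assumptions: the plan is a list of relpath/base-revision/operation entries, and the server can look up the revision in which each node was last changed. It reports every out-of-date entry rather than stopping at the first, which allows for the multiple errors mentioned above. Real out-of-date rules for directories, copies and replacements are more involved than this.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical plan entry: the intended operation on a node and the
   revision the client's copy of that node is based on. */
typedef enum { OP_ADD, OP_ALTER, OP_DELETE } plan_op_t;

typedef struct {
  const char *relpath;
  long base_revision;
  plan_op_t op;
} plan_entry_t;

/* Stand-in for server knowledge: the revision in which each node was
   last changed in HEAD. */
typedef struct { const char *relpath; long last_changed; } node_rev_t;

static long lookup_last_changed(const node_rev_t *nodes, size_t n,
                                const char *relpath) {
  for (size_t i = 0; i < n; i++)
    if (strcmp(nodes[i].relpath, relpath) == 0) return nodes[i].last_changed;
  return -1;  /* node does not exist in HEAD */
}

/* Check the whole plan up front. Every out-of-date entry is recorded in
   ERRORS (up to MAX_ERRORS), and the total count is returned; 0 means the
   commit may proceed to Step 2. */
static size_t check_plan(const plan_entry_t *plan, size_t nplan,
                         const node_rev_t *head, size_t nhead,
                         const char **errors, size_t max_errors) {
  size_t nerr = 0;
  assert(plan != NULL || nplan == 0);
  for (size_t i = 0; i < nplan; i++) {
    if (plan[i].op == OP_ADD) continue;  /* new node: nothing to be stale */
    long changed = lookup_last_changed(head, nhead, plan[i].relpath);
    if (changed < 0 || changed > plan[i].base_revision) {
      if (nerr < max_errors) errors[nerr] = plan[i].relpath;
      nerr++;
    }
  }
  return nerr;
}
```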

Step 2: The commit process then drives an Ev2 commit editor to send
all the changes to the server. This is a blend of metadata changes and
content delivery (no delayed content, as before).
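The ordering of the two steps can be sketched in C as below. commit_session_t and its callbacks are hypothetical stand-ins for the new RA interfaces. The fake session exists only so the flow can be exercised. No file content leaves the client unless the server has accepted the whole plan.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical RA hooks for the proposed two-step commit. */
typedef struct {
  void *baton;
  /* Step 1: send the plan; returns the number of errors (0 = proceed). */
  size_t (*send_plan)(void *baton, const char *const *relpaths, size_t n);
  /* Step 2: one complete change per node, contents included. */
  int (*alter_file)(void *baton, const char *relpath, const char *contents);
  int (*complete)(void *baton);
  int (*abort_txn)(void *baton);
} commit_session_t;

/* Commit process: Step 2 only starts once the plan is accepted. */
static int run_commit(commit_session_t *s, const char *const *relpaths,
                      const char *const *contents, size_t n) {
  assert(s != NULL);
  if (s->send_plan(s->baton, relpaths, n) != 0) {
    s->abort_txn(s->baton);  /* fast-fail: no content was sent */
    return -1;
  }
  for (size_t i = 0; i < n; i++) {
    int err = s->alter_file(s->baton, relpaths[i], contents[i]);
    if (err) { s->abort_txn(s->baton); return err; }
  }
  return s->complete(s->baton);
}

/* Fake session for demonstration: rejects any plan that touches the
   path in STALE, and counts what it receives. */
typedef struct { const char *stale; int files_sent, completed, aborted; } fake_t;

static size_t fake_plan(void *b, const char *const *p, size_t n) {
  fake_t *f = b;
  size_t errs = 0;
  for (size_t i = 0; i < n; i++)
    if (f->stale && strcmp(p[i], f->stale) == 0) errs++;
  return errs;
}
static int fake_alter(void *b, const char *relpath, const char *contents) {
  (void)relpath; (void)contents;
  ((fake_t *)b)->files_sent++;
  return 0;
}
static int fake_complete(void *b) { ((fake_t *)b)->completed = 1; return 0; }
static int fake_abort(void *b) { ((fake_t *)b)->aborted = 1; return 0; }
```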

The "plan" is some new XML report-like document, posted to the "me"
resource on the server to create the FS txn and perform the check. I'm
not sure what the schema looks like, what kinds of data items are
needed, nor what the RA API looks like. This "plan" is probably an
opaque object constructed by the commit process. It would be nice to
have this in libsvn_ra, and the internals available to all RA layers.
This plan object may be able to replace the "commit info" stuff that
we have in the client today (preferable).
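The schema is still open, so the C sketch below only illustrates serializing an opaque plan into a report-like XML body. The element and attribute names (commit-plan, rev, path) and plan_item_t are invented for this example and are not a proposed format. Relpaths are XML-escaped because they can contain characters such as '&'.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical plan item; OP is used as the element name. */
typedef struct { const char *op; const char *relpath; long rev; } plan_item_t;

/* Append S to BUF with XML special characters escaped.
   Returns 0 on success, -1 if BUF (CAP bytes) would overflow. */
static int append_escaped(char *buf, size_t cap, size_t *len, const char *s) {
  for (; *s; s++) {
    const char *rep = NULL;
    switch (*s) {
      case '&': rep = "&amp;"; break;
      case '<': rep = "&lt;"; break;
      case '>': rep = "&gt;"; break;
      case '"': rep = "&quot;"; break;
      case '\'': rep = "&apos;"; break;
    }
    char one[2] = { *s, '\0' };
    const char *piece = rep ? rep : one;
    size_t n = strlen(piece);
    if (*len + n >= cap) return -1;
    memcpy(buf + *len, piece, n + 1);  /* copy including terminator */
    *len += n;
  }
  return 0;
}

/* Serialize the plan into BUF. Returns 0 on success, -1 on overflow. */
static int serialize_plan(const plan_item_t *items, size_t n,
                          char *buf, size_t cap) {
  size_t len;
  int w;
  assert(buf != NULL && cap > 0);
  w = snprintf(buf, cap, "<commit-plan>");
  if (w < 0 || (size_t)w >= cap) return -1;
  len = (size_t)w;
  for (size_t i = 0; i < n; i++) {
    w = snprintf(buf + len, cap - len, "<%s rev=\"%ld\" path=\"",
                 items[i].op, items[i].rev);
    if (w < 0 || (size_t)w >= cap - len) return -1;
    len += (size_t)w;
    if (append_escaped(buf, cap, &len, items[i].relpath) != 0) return -1;
    w = snprintf(buf + len, cap - len, "\"/>");
    if (w < 0 || (size_t)w >= cap - len) return -1;
    len += (size_t)w;
  }
  w = snprintf(buf + len, cap - len, "</commit-plan>");
  return (w < 0 || (size_t)w >= cap - len) ? -1 : 0;
}
```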

For backwards compatibility, the RA API still needs to provide for the
old commit process. That *may* be mappable to the new server "plan"
protocol, but I'm not sure. The RA layer may need to retain too much
information in memory (specifically: properties), until the
apply_textdelta calls arrive with the content (the first one signaling
the end of Step 1 and the beginning of Step 2). It can probably
use the Ev1 interface calls to construct a plan, and then use the new
"send-plan" machinery at the transition stage. This compatibility code
may be able to live in libsvn_ra and be implemented in terms of the
new RA APIs (plan + Ev2). Thus, all RA layers may be able to get rid
of their Ev1 code and just implement plan+Ev2.
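The compatibility idea can be sketched as a small state machine in C. ev1_shim_t, its phases and its counters are hypothetical. The functions only mirror the Ev1 callback names change_file_prop, apply_textdelta and close_edit; they are not the real svn_delta_editor_t signatures. Property changes are held in memory while the plan is built. The first apply_textdelta triggers the plan send, and close_edit flushes nodes that had only property changes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical shim state for mapping Ev1 calls onto plan + Ev2. */
enum shim_phase { PHASE_PLANNING, PHASE_SENDING };
#define MAX_NODES 16

typedef struct {
  enum shim_phase phase;
  const char *planned[MAX_NODES]; /* relpaths collected for the plan */
  const char *props[MAX_NODES];   /* property changes held in memory */
  size_t n;
  int plans_sent;    /* calls to the (hypothetical) send-plan machinery */
  int changes_sent;  /* complete per-node changes delivered in Step 2 */
} ev1_shim_t;

static void shim_send_plan(ev1_shim_t *s) {
  s->plans_sent++;
  s->phase = PHASE_SENDING;
}

/* change_file_prop() analogue: during Step 1, record the node and keep
   its property change in memory. Rejected once Step 2 has begun. */
static int shim_change_file_prop(ev1_shim_t *s, const char *relpath,
                                 const char *prop) {
  assert(s != NULL && relpath != NULL);
  if (s->phase != PHASE_PLANNING || s->n == MAX_NODES) return -1;
  s->planned[s->n] = relpath;
  s->props[s->n] = prop;
  s->n++;
  return 0;
}

/* apply_textdelta() analogue: the first call ends Step 1. The buffered
   props plus this content make one complete change; here it is only
   counted, and the buffered entry is released. */
static int shim_apply_textdelta(ev1_shim_t *s, const char *relpath,
                                const char *text) {
  (void)text;
  if (s->phase == PHASE_PLANNING) shim_send_plan(s);
  for (size_t i = 0; i < s->n; i++) {
    if (s->planned[i] && strcmp(s->planned[i], relpath) == 0) {
      s->planned[i] = NULL;
      s->props[i] = NULL;
      s->changes_sent++;
      return 0;
    }
  }
  return -1;  /* file was not part of the plan */
}

/* close_edit() analogue: send the plan if no content ever arrived, then
   deliver nodes that only had property changes. */
static int shim_close_edit(ev1_shim_t *s) {
  if (s->phase == PHASE_PLANNING) shim_send_plan(s);
  for (size_t i = 0; i < s->n; i++) {
    if (s->planned[i]) {
      s->planned[i] = NULL;
      s->props[i] = NULL;
      s->changes_sent++;
    }
  }
  return 0;
}
```

Between the first property change and the first apply_textdelta, the shim holds every buffered property in memory. This is the memory concern raised above.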

A new client talking to an old server would simply make Step 1 (the
plan) perform all the working resource checkouts, which is how an old
server performs the out-of-date checks. When Step 2 is run, the
changes to the working resources are performed.
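For the old-server case, the C sketch below performs Step 1 as a series of checkouts. old_server_t and its last-changed table are a hypothetical model of an old server refusing a checkout for an out-of-date node. The sketch stops at the first refusal, assumes every planned node already exists, and ignores adds.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical plan item and server model. */
typedef struct { const char *relpath; long base_rev; } plan_item_t;
typedef struct { const char *relpath; long last_changed; } server_node_t;

typedef struct {
  const server_node_t *nodes;  /* what the old server has in HEAD */
  size_t nnodes;
  int checkouts;               /* working-resource checkouts requested */
} old_server_t;

/* Model of an old server's checkout: refuse to create a working resource
   for a node that changed after the client's base revision. */
static int old_checkout(old_server_t *srv, const plan_item_t *item) {
  srv->checkouts++;
  for (size_t i = 0; i < srv->nnodes; i++)
    if (strcmp(srv->nodes[i].relpath, item->relpath) == 0)
      return srv->nodes[i].last_changed > item->base_rev ? -1 : 0;
  return -1;  /* no such node */
}

/* Step 1 against an old server: one checkout per planned node, stopping
   at the first failure. Returns 0 if Step 2 may modify the working
   resources. */
static int execute_plan_old_server(old_server_t *srv,
                                   const plan_item_t *plan, size_t n) {
  assert(srv != NULL && (plan != NULL || n == 0));
  for (size_t i = 0; i < n; i++)
    if (old_checkout(srv, &plan[i]) != 0)
      return -1;
  return 0;
}
```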

.... okay. That's my brain dump for now.

Thoughts?

Cheers,
-g
