Hi Greg. Thanks for the discussion. I'm at least getting to grips with the basics of op_depth functioning.
Greg Stein wrote: > On Thu, Sep 23, 2010 at 09:19, Julian Foad <julian.f...@wandisco.com> wrote: > > On Wed, 2010-09-22, Greg Stein wrote: > >... > >> > That flag would just mean "There is a row for the same path with a > >> > smaller op_depth and with a non-negative kind of presence", right? So > >> > whether we actually store that flag is a matter of impl/efficiency, not > >> > of logical design. Have I understood? > >> > >> No. It means that this (relpath/op_depth) layer was created by a > >> deletion ("svn rm" or "svn mv"). > > > > OK, that's fine. What I'm trying to ascertain next is whether this > > information is already stored in the table without adding this extra > > column. Let me try again, differently: > > Sure, you could derive the column by examining layers underneath. The > semantics of add/copy/move is "not allowed if something exists", so > anything that was there has been deleted first. Thanks for confirming my understanding of that. > The underlying question is "what is the query pattern? do we ask 'was > something deleted by this operation?' very often?". If the answer is > yes, then keep the flag. If we don't ask that question often, then run > the extra query looking for prior nodes. > > I believe the answer is "often". A simple 'svn status' will need to > distinguish between 'A' and 'R', and that is done by checking > prior_deleted. > > And no... this amount of denormalization will not hurt us. OK. I know that "svn status" speed is a big deal. > >... > >> op_depth > 0 can have the following presence values: added, deleted, > >> copied-here, moved-here, excluded, incomplete. prior_deleted may be 0 > >> or 1. moved_to is a discriminator between deleted and moved-away. > > > > Great, this is good. > > > > Let's go into a bit more detail on some of the op_depth > 0 values. > > > > 'incomplete': > > > > Let's see where 'incomplete' can be used at op_depth > 0. Primarily > > during a repo->WC or WC->WC copy, constructing the 'copied-here' dirs. > > Maybe also during a (WC->WC) move, constructing the 'moved-here' dirs, > > so that the whole subtree need not be moved in a single database > > operation. If so, I'm wondering whether we need to distinguish > > copied-here-incomplete from moved-here-incomplete. Or maybe we can > > always rely on the caller remembering what it's doing and setting the > > correct state afterwards. I'm not sure. If we start out assuming it > > refers to a copy-here, and later find that we do need to distinguish > > these, then I think we could do so at that time by introducing another > > value. > > I don't think we need to discriminate *how* or *why* the incomplete > nodes were created... just that they are. And if find we need to ask > that question, then we can examine the parent node, or we can add new > presence values. OK. > > 'excluded': > > > > I think we need to support 'excluded' at op_depth > 0 on a copied-here > > node (only for a child, not the root of the copy), and also on a > > moved-here node (child). We should distinguish these. How? > > "should"? Why? Again, looking at the parent immediately tells us what > is going on. But I don't even see where/how/why we need to know this > information. Maybe we'll never need to know, in practice. But I'm thinking from a "clean design" rather than a "how frequently are we likely to ask" point of view. For sanity there should be one way of asking questions such as "is this part of a copy", not two ways. > >... > > 'moved-here': > > > > I believe the idea of the 'move' in WC-NG is that it represents an > > atomic move. We should consider how a move is to be represented, in > > full. > > > > A 'moved-here' row's 'moved_to' column points to the corresponding > > 'moved-away' row. The 'source' of the move is the row that is shadowed > > by the 'moved-away' row. > > Huh? Oops - I got the 'moved_to' direction mixed up. Hence the bit I wrote below about looking up via the source. > A 'moved-here' node has repository source information given in the > repos_* columns. > > A 'moved-away' node has presence='deleted' and moved_to set to the > destination local_relpath. Yes and yes. > At the moment, we do not record the local_relpath of the source of a > copy/move. (it may be null for a repo->WC copy; I believe we would not > allow a repo->WC move) Hmm. If the 'moved-*' presence values are indeed intended to support atomic moves (in the future), we may need to re-think how they are to do so. Storing the local_relpath (and op_depth I guess!) of the source is probably going to be a necessary part of the puzzle. > > The destination of the move must always refer to the same repo node-rev > > as the source did. Therefore its repo node-rev columns and stored > > (pristine) content columns (file checksum, link target, dir depth) could > > be either stored as duplicates of the source row values, or stored as > > null and always looked up via the source row when needed. > > Duplicates. No need to make querying harder than it should be. > > > One of the particular desired behaviours of a move is that 'update' will > > still update it. > > So you say. I don't know that that is true. > > After a WC-WC copy, we don't update the copied nodes (afaik), so I > don't see that this immediately follows. We should discuss support for 'moves' separately. > >... > >> On Wed, Sep 22, 2010 at 18:27, Greg Stein <gst...@gmail.com> wrote: > >> >... > >> > The children don't have to be touched. They just hang out in their > >> > deleted state with the same op_depth. We *never* want to modify a rows > >> > op_depth. That is part of its primary key(!). > >> > >> Let me clarify this. There is one situation where we (effectively) > >> modify the op_depth. > >> > >> Consider: > >> > >> 1. checkout. add rows <A, 0>, <A/B, 0>, <A/B/C, 0>, <A/B/gamma, 0>: > >> presence=normal. > >> 2. delete gamma. add row <A/B/gamma, 3>: presence=deleted. > >> 3. delete B. add rows <A/B, 2>, <A/B/C, 2>: presence=deleted. modify > >> <A/B/gamma, 3>: op_depth=2 > >> > >> We subsume the child deletion into the parent deletion. We could just > >> as well revert/forget the child deletion and add rows for all children > >> at op_depth, or we can just rewrite the op_depth. Net is the same. > >> > >> > >> Outside of this deletion, I don't think we want to be modifying > >> op_depth at all. > > > > Except for reverting, where you can get the reverse of your last point. > > For example, follow your last step "3." with "4. revert B. delete row > > <A/B, 2>. modify rows <A/B/C, 2>, <A/B/gamma, 2>: op_depth = 3. > > No. Step 3 subsumed the delete from step 2. All knowledge of it is > lost. A revert will delete the three rows at op_depth=2 and expose the > original checked-out rows. I meant a non-recursive revert, which just reverts 'A/B', and leaves its children deleted. That's when it has to re-write the children as if they had been deleted individually. - Julian