Re: add NODES.prior_deleted boolean column

Julian Foad Thu, 23 Sep 2010 10:32:44 -0700

Hi Greg.

Thanks for the discussion.  I'm at least getting to grips with the
basics of op_depth functioning.


Greg Stein wrote:
> On Thu, Sep 23, 2010 at 09:19, Julian Foad <julian.f...@wandisco.com> wrote:
> > On Wed, 2010-09-22, Greg Stein wrote:
> >...
> >> > That flag would just mean "There is a row for the same path with a
> >> > smaller op_depth and with a non-negative kind of presence", right?  So
> >> > whether we actually store that flag is a matter of impl/efficiency, not
> >> > of logical design.  Have I understood?
> >>
> >> No. It means that this (relpath/op_depth) layer was created by a
> >> deletion ("svn rm" or "svn mv").
> >
> > OK, that's fine.  What I'm trying to ascertain next is whether this
> > information is already stored in the table without adding this extra
> > column.  Let me try again, differently:
> 
> Sure, you could derive the column by examining layers underneath. The
> semantics of add/copy/move is "not allowed if something exists", so
> anything that was there has been deleted first.

Thanks for confirming my understanding of that.

> The underlying question is "what is the query pattern? do we ask 'was
> something deleted by this operation?' very often?". If the answer is
> yes, then keep the flag. If we don't ask that question often, then run
> the extra query looking for prior nodes.
> 
> I believe the answer is "often". A simple 'svn status' will need to
> distinguish between 'A' and 'R', and that is done by checking
> prior_deleted.
> 
> And no... this amount of denormalization will not hurt us.

OK.  I know that "svn status" speed is a big deal.

> >...
> >> op_depth > 0 can have the following presence values: added, deleted,
> >> copied-here, moved-here, excluded, incomplete. prior_deleted may be 0
> >> or 1. moved_to is a discriminator between deleted and moved-away.
> >
> > Great, this is good.
> >
> > Let's go into a bit more detail on some of the op_depth > 0 values.
> >
> > 'incomplete':
> >
> > Let's see where 'incomplete' can be used at op_depth > 0.  Primarily
> > during a repo->WC or WC->WC copy, constructing the 'copied-here' dirs.
> > Maybe also during a (WC->WC) move, constructing the 'moved-here' dirs,
> > so that the whole subtree need not be moved in a single database
> > operation.  If so, I'm wondering whether we need to distinguish
> > copied-here-incomplete from moved-here-incomplete.  Or maybe we can
> > always rely on the caller remembering what it's doing and setting the
> > correct state afterwards.  I'm not sure.  If we start out assuming it
> > refers to a copy-here, and later find that we do need to distinguish
> > these, then I think we could do so at that time by introducing another
> > value.
> 
> I don't think we need to discriminate *how* or *why* the incomplete
> nodes were created... just that they are. And if find we need to ask
> that question, then we can examine the parent node, or we can add new
> presence values.

OK.

> > 'excluded':
> >
> > I think we need to support 'excluded' at op_depth > 0 on a copied-here
> > node (only for a child, not the root of the copy), and also on a
> > moved-here node (child).  We should distinguish these.  How?
> 
> "should"? Why? Again, looking at the parent immediately tells us what
> is going on. But I don't even see where/how/why we need to know this
> information.

Maybe we'll never need to know, in practice.  But I'm thinking from a
"clean design" rather than a "how frequently are we likely to ask" point
of view.  For sanity there should be one way of asking questions such as
"is this part of a copy", not two ways.

> >...
> > 'moved-here':
> >
> > I believe the idea of the 'move' in WC-NG is that it represents an
> > atomic move.  We should consider how a move is to be represented, in
> > full.
> >
> > A 'moved-here' row's 'moved_to' column points to the corresponding
> > 'moved-away' row.  The 'source' of the move is the row that is shadowed
> > by the 'moved-away' row.
> 
> Huh?

Oops - I got the 'moved_to' direction mixed up.  Hence the bit I wrote
below about looking up via the source.

> A 'moved-here' node has repository source information given in the
> repos_* columns.
> 
> A 'moved-away' node has presence='deleted' and moved_to set to the
> destination local_relpath.

Yes and yes.

> At the moment, we do not record the local_relpath of the source of a
> copy/move. (it may be null for a repo->WC copy; I believe we would not
> allow a repo->WC move)

Hmm.  If the 'moved-*' presence values are indeed intended to support
atomic moves (in the future), we may need to re-think how they are to do
so.  Storing the local_relpath (and op_depth I guess!) of the source is
probably going to be a necessary part of the puzzle.

> > The destination of the move must always refer to the same repo node-rev
> > as the source did.  Therefore its repo node-rev columns and stored
> > (pristine) content columns (file checksum, link target, dir depth) could
> > be either stored as duplicates of the source row values, or stored as
> > null and always looked up via the source row when needed.
> 
> Duplicates. No need to make querying harder than it should be.
> 
> > One of the particular desired behaviours of a move is that 'update' will
> > still update it.
> 
> So you say. I don't know that that is true.
> 
> After a WC-WC copy, we don't update the copied nodes (afaik), so I
> don't see that this immediately follows.

We should discuss support for 'moves' separately.

> >...
> >> On Wed, Sep 22, 2010 at 18:27, Greg Stein <gst...@gmail.com> wrote:
> >> >...
> >> > The children don't have to be touched. They just hang out in their
> >> > deleted state with the same op_depth. We *never* want to modify a rows
> >> > op_depth. That is part of its primary key(!).
> >>
> >> Let me clarify this. There is one situation where we (effectively)
> >> modify the op_depth.
> >>
> >> Consider:
> >>
> >> 1. checkout. add rows <A, 0>, <A/B, 0>, <A/B/C, 0>, <A/B/gamma, 0>:
> >> presence=normal.
> >> 2. delete gamma. add row <A/B/gamma, 3>: presence=deleted.
> >> 3. delete B. add rows <A/B, 2>, <A/B/C, 2>: presence=deleted. modify
> >> <A/B/gamma, 3>: op_depth=2
> >>
> >> We subsume the child deletion into the parent deletion. We could just
> >> as well revert/forget the child deletion and add rows for all children
> >> at op_depth, or we can just rewrite the op_depth. Net is the same.
> >>
> >>
> >> Outside of this deletion, I don't think we want to be modifying
> >> op_depth at all.
> >
> > Except for reverting, where you can get the reverse of your last point.
> > For example, follow your last step "3." with "4. revert B. delete row
> > <A/B, 2>. modify rows <A/B/C, 2>, <A/B/gamma, 2>: op_depth = 3.
> 
> No. Step 3 subsumed the delete from step 2. All knowledge of it is
> lost. A revert will delete the three rows at op_depth=2 and expose the
> original checked-out rows.

I meant a non-recursive revert, which just reverts 'A/B', and leaves its
children deleted.  That's when it has to re-write the children as if
they had been deleted individually.

- Julian

Re: add NODES.prior_deleted boolean column

Reply via email to