Daniel Näslund wrote:
> Hi!
>
> First, I've been accepted as a GSoC student for the summer of 2010. I'm
> really excited and look forward to a summer of coding.
>
> I'm supposed to implement the git unidiff format for 'svn diff' and
> 'svn patch' and I'll start with the diff side. The git unidiff format
> can represent tree changes but unfortunately the diff code in its
> current state makes it hard to detect those tree changes.
>
> What to do?
> ------------
> 1) Just allow wc-wc diffs to create diffs with the git format. Use the
>    available wc functions to retrieve info on tree changes.
> 2) Allow diffs involving the repos too by creating special ra functions
>    for retrieving the missing information. Something like
>    svn_ra_get_copyfrom_info().
> 3) Start revamping the diff code to not use an editor but instead return
>    text-modified and props-modified nodes as detected by the server.
>    [1]. In the mail, Greg makes a case for not using an editor in the
>    diff code since nothing is modified. As I've understood it, an editor
>    is used for almost all repos communication. I see the complexity
>    involved in using an editor and I understand that sharing the same
>    code for merge and diff has drawbacks. But I'm not seeing how we will
>    decrease complexity by not using an editor. We'll still have to
>    detect all those tree changes and we'll have to create additional
>    code for doing it. If we would just have to check for
>    text-modified and props-modified things would have been different.

I think it makes sense to reuse the same API.

The following section is slightly off-topic to this thread. I'm leaping
into the lands of editor-v2 theory.

<brain-dump subject="editor-v2">

For example, let's compare merge and diff.

  svn merge -rA:B ^/branch
and
  svn diff -rA:B ^/branch

Both want to get the exact same information from the repos. But merge
wants to apply that to the working copy, and diff wants to print it out.
(As of this thread, diff is also interested in changes on the tree
level.)

Furthermore, if, say, a node of the working copy is BASEd on rA, svn
update also wants to get the same information from the repos as diff
wants; simplified:

  At revision A.
  svn update -rB

But the API must be suitable for reuse. I don't remember in detail, but a
long time ago I took a closer look at the diff and merge code, and it had
a fair amount of code that had grown around the shortcomings of the diff
editor. Not good.

My romantic view is that with editor v2, we can have/mold an API that is
easy to adapt to the different tasks of update, merge, diff, switch...
With explicit, "atomic" replaces and moves in the API, we can shed the
grown code for the benefit of a firm API definition. The worst specimens
of madness in diff and merge that I've stumbled over would be gone with
ev2. As we ripple through the code, replacing the old editor with a more
concise new one, things become a lot easier, and code becomes shorter.

Wishful thinking, of course. But I think it is possible and desirable to
think of editor v2 as a hammer and see nails in everything that involves
getting the tree/text difference between two subtrees/revisions.

I see generalized drivers; one that can generate editor calls by
comparing a WC state to a repos state, one that can compare two repos
states, one that can compare two WC states, ... And then the different
subcommands "simply" implement their callbacks of choice and take care to
ask the right driver on the right revisions.

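The driver/receiver split described above can be sketched roughly as
follows. This is only an illustration in Python, not Subversion code: the
`Receiver` callbacks, `drive()`, and the path-to-text `Tree` model are all
hypothetical names, and a real editor would also have to deal with
directories, properties, copies and moves.

```python
from typing import Dict, List, Protocol

Tree = Dict[str, str]  # hypothetical model: path -> full file text


class Receiver(Protocol):
    """Hypothetical callback set; not the real Subversion editor API."""

    def add(self, path: str, text: str) -> None: ...
    def delete(self, path: str) -> None: ...
    def change(self, path: str, text: str) -> None: ...


def drive(old: Tree, new: Tree, receiver: Receiver) -> None:
    """Generic driver: emit the events that turn `old` into `new`."""
    for path in sorted(old.keys() - new.keys()):
        receiver.delete(path)
    for path in sorted(new.keys() - old.keys()):
        receiver.add(path, new[path])
    for path in sorted(old.keys() & new.keys()):
        if old[path] != new[path]:
            receiver.change(path, new[path])


class SummaryReceiver:
    """Diff-like receiver: only records what changed."""

    def __init__(self) -> None:
        self.lines: List[str] = []

    def add(self, path: str, text: str) -> None:
        self.lines.append(f"A {path}")

    def delete(self, path: str) -> None:
        self.lines.append(f"D {path}")

    def change(self, path: str, text: str) -> None:
        self.lines.append(f"M {path}")


class ApplyReceiver:
    """Update/merge-like receiver: applies the events to a target tree."""

    def __init__(self, target: Tree) -> None:
        self.target = target

    def add(self, path: str, text: str) -> None:
        self.target[path] = text

    def delete(self, path: str) -> None:
        self.target.pop(path, None)

    def change(self, path: str, text: str) -> None:
        self.target[path] = text


# Two made-up tree states standing in for revisions A and B.
rev_a: Tree = {"trunk/a.c": "int a;\n", "trunk/old.c": "x\n"}
rev_b: Tree = {"trunk/a.c": "int a = 1;\n", "trunk/new.c": "y\n"}

summary = SummaryReceiver()
drive(rev_a, rev_b, summary)          # "diff": report the changes

working_copy = dict(rev_a)
drive(rev_a, rev_b, ApplyReceiver(working_copy))  # "update": apply them
```

The same driver feeds both a diff-like and an update-like receiver; a
driver that compares other kinds of state could be swapped in without
touching the receivers.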
E.g., the diff callbacks, once implemented, could provide arbitrary diffs
simply by asking a different generalised driver type to generate callback
calls (I'm thinking specifically of wc-actual-against-any-url diffs).

There's also a problem with my views. Editor v2, as intended in its
design, gives the callback receivers only full texts, never text delta
data. The idea is that all delta-ing to get to B is hidden behind the
API. In case of an update, the driver fires all events necessary to get
from tree A to tree B, and provides full texts of B. The driver can
choose any way it wants to get to the full texts of B. Fair enough.

But for text diffs, that means that I receive all events that are
necessary to get from tree A to tree B (good) -- and then receive full
texts of B from the driver, after which I have to fetch full texts of A
from <blackbox> and work out the text-diff from those (bad).

Are you following? Doesn't sound so romantic anymore. The editor does not
provide the difference, but the result of applying the difference? I
still haven't entirely wrapped my head around the generalised case of
that.

An advantage is that the API can choose to get at the full texts of B any
way it likes, e.g. from the pristine store via a pristine checksum match,
or taking a shortcut via some other revision.

The disadvantage, illustrated in an example: say I want diff to tell me
the difference between my working-copy BASE and the repos' HEAD (== what
update would apply to the WC), and I have thousands of huge text files,
each of which has a change in only a single line (say, the first). The
driver would construct each huge text file completely with the first line
adapted, then the diff callback implementations would read each original
huge text file from BASE and compare the two. But, all the time, the
repos knew that only the first line was changed. There was no need to
pass these huge amounts of data through the API functions (locally).

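To make the cost in that example concrete, here is a small Python sketch
under stated assumptions: `receive_full_text()` and `fetch_base` are
hypothetical stand-ins for a diff callback and the BASE text lookup, not
editor v2 API, and one in-memory text stands in for one of the huge
files.

```python
import difflib

# Hypothetical stand-ins: a large BASE text and a HEAD text that differs
# from it only in the first line.
body = "".join(f"line {i}\n" for i in range(1, 10_000))
base_text = "line 0\n" + body
head_text = "LINE 0\n" + body


def receive_full_text(fetch_base, new_text):
    """Hypothetical diff callback: it is handed only the full text of B,
    so it has to fetch the full text of A and recompute the difference."""
    old_text = fetch_base()
    diff = list(difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile="BASE", tofile="HEAD", n=0))
    # Characters that had to pass through the callback to get this diff.
    return diff, len(old_text) + len(new_text)


diff, chars_handled = receive_full_text(lambda: base_text, head_text)

# Skip the two "---"/"+++" header lines, keep only removed/added lines.
changed = [line for line in diff[2:] if line.startswith(("+", "-"))]
print(len(changed), "changed lines;", chars_handled, "characters handled")
```

The text-diff itself is two short lines, yet both full texts had to be
handled to rediscover it; a driver that already knew the delta had no way
to pass it along.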
So, in a nutshell, editor v2, as it is outlined today, isn't always that
suitable for communicating text-diffs -- if the driver already knows the
text-diff, it has no way of telling the receiver about it. The driver
must provide the full text result and the receiver must work out the
text-diff from that.

Maybe that's how it was intended to be: one editor type for getting to a
given revision in full (delta_editor_t for update, switch) and one for
getting the differences between two revisions/paths (diff_editor for
diff, merge). In my head, they are still both very much related.

I have unfinished business with this topic... and a bunch of homework
left to do before I can start making any real sense to Greg.

</brain-dump>

> 4) Wait by the roadside for editor-v2 to be finished. It is supposed to
>    automatically detect tree changes.

And that's the problem: you have further plans for diff, which, like
other things before, want more info than the diff-editor can offer
(notably diff and merge don't use the delta_editor at all but implement
their own diff_editor). All previous attempts (e.g. detecting replaces)
deflected at an early stage, grew some nasty compromise code and went on
without going into depth. Understandable, given the size of the
considerations, but Bad.

I think if you want to take on (an) editor v2 for diff and merge, that's
the "Best" way to start. But it'll be a lot of work, including
theoretical.

If you want to get anything done soonish, I think it would indeed be best
to start playing around with a wc-wc diff, maybe structuring the code in
anticipation of a move towards editor vN-markM. Avoiding implementation
of a uniform API across subcommands for sending tree-/text-differences
is, again, probably Bad, but understandable.

(Note that currently in wc-wc, only diff between @base and the actual
working copy is implemented.)

> Has anyone given any more thoughts to how the diff code could be
> improved?

I'm buried up to the neck in them...
:/ Haven't had the time [1] to actually start implementing them.

[1] read: guts

~Neels