On Thu, Aug 26, 2010 at 12:57:47PM +0300, anatoly techtonik wrote:
> Hello,
>
> Don't you think it is time to design an extensible changeset format
> for exchanging information about changesets between systems?
>
> Right now I am struggling to extract full information from uncommitted
> Subversion changeset for uploading it for review (in Rietveld
> project). Rietveld code review tool was initially designed to work
> with Subversion, but so far it is still impossible to get complete
> diff of changes from SVN that reviewer can apply to its working copy
> and commit after review. The problem to get complete diff is twofold:
>
> 1. Subversion data for uncommitted changeset is scattered and it is
> hard to say if it is ever complete.
> 2. "svn diff" format is too limited.
>
> For the first part I can give an example of problem I am trying to
> solve currently - 'Rietveld code review data is missing files that
> were created as a result of "svn copy" or "svn move" operation'. If a
> text file is added with "svn add" - its contents will appear in "svn
> diff" output, but text files created as a result of "svn move" or "svn
> copy" operation will not.
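
To find the files this affects, a script has to ask Subversion which working-copy items were added with history (copies, and the add half of each move). A minimal sketch, assuming that `svn status --xml` marks such items with `copied="true"` on their `wc-status` element (the plain-text output shows them as "A  +"). The function name `copied_paths` and the sample data are hypothetical:

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def copied_paths(status_xml: str | bytes) -> list[str]:
    """Return the paths that `svn status --xml` reports as added with history.

    Assumption: every <entry> has a <wc-status> child, and items shown as
    "A  +" or "R  +" in plain-text output carry item="added"/"replaced"
    together with copied="true".
    """
    root = ET.fromstring(status_xml)
    paths = []
    # iter() also reaches entries nested inside <changelist> elements.
    for entry in root.iter("entry"):
        status = entry.find("wc-status")
        if status is None:
            continue
        if status.get("copied") == "true" and status.get("item") in ("added", "replaced"):
            paths.append(entry.get("path"))
    return paths


# Hypothetical sample: a file moved from old_name.py to new_name.py
# (copy + delete) and a brand-new file added with plain "svn add".
SAMPLE = """<status>
<target path=".">
<entry path="new_name.py">
<wc-status item="added" props="none" copied="true" revision="-1"/>
</entry>
<entry path="old_name.py">
<wc-status item="deleted" props="none" revision="42"/>
</entry>
<entry path="fresh.txt">
<wc-status item="added" props="none" revision="-1"/>
</entry>
</target>
</status>"""

print(copied_paths(SAMPLE))  # ['new_name.py']
```

In practice the XML would come from running `svn status --xml` in the working copy (for example via `subprocess.run`). This only finds the copied paths; it does not recover where they were copied from.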

In trunk, svn diff has a --show-copies-as-adds option, which causes
copied and moved files to be displayed even if they weren't modified
after being copied or moved. This will be released in 1.7.

> To get this missing information one needs to
> run "svn status", check for the presence of copied or moved files
> (marked with "A +"), check these files are not binary, manually
> reconstruct change chunk for them and append missing data to the
> output of "svn diff". But even after that reviewer still won't be able
> to exactly reproduce changeset, because "svn diff" format will not
> contain information about source of copied or moved file. And here
> comes the second part.

svn diff does show deleted files, so it also shows the delete half of
each move. With the new --git option of svn diff, you get headers that
tell you where something was copied from.

> "svn diff" format doesn't record enough information to reproduce
> committed changeset. For example, it doesn't have data about source of
> copied and moved files. This is believed to be solved by "git diff"
> format, but it won't be a panacea either, because Subversion
> changesets also contain information about properties, mime types etc.

svn diff and svn patch in trunk can show and apply property diffs,
respectively. This will be released in 1.7.

> It is also impossible to include binary files (if needed) or original
> author info (can be useful for contibulyzer), or any other information
> that a given VCS (Subversion in this case) needs to completely
> reconstruct its own changeset.

Support for binary data is on the todo list for svn diff / svn patch.
Nothing has been implemented yet.

Showing author information is interesting, though in the general case
where a diff spans multiple revisions it may not be very useful. But
note also that in Subversion trunk, svn log has a --diff option which
shows the committed diff beneath the log message (which includes author
and date information).
This will also be released in Subversion 1.7.

> For code reviews, ideally, code review system such as Rietveld should
> grab the changeset, parse it and extract relevant information for
> reviewer (skipping or filtering non-interesting parts and giving
> warning about unknown parts). It should also save original or filtered
> changeset file to be imported and committed if review is successful.
>
> That's why extensible changeset format is required. It will not only
> be useful for sending changesets for review, but also for
> synchronizing changes with other VCSes. With new changeset format
> mirroring tool could automatically analyze incoming data to find
> Subversion related attributes to save them into repository directly
> and automatically save all other attributes to properties.

You realise that it's often impossible to represent data generated by
one version control tool in another version control tool?

If that were an easy problem, the company I work for would be out of
business because nobody would need our help. We're often migrating
data between version control systems, and there is always compromise
involved.

Some things, like add/delete, and maybe even copy (unless you count
older systems like CVS), are virtually universal. But renames are
already represented very differently in virtually every tool.
Directories are another example: some tools version them, some don't.
And most metadata, like the EOL style and character set of files,
commit author information, the list of files touched by a changeset,
etc., is represented in very different and sometimes incompatible ways,
and sometimes not at all.

There is no single data format that can really solve this problem.
Version control tools differ. In general, you cannot magically mirror
every aspect of a change made in one tool to another tool.

I'm not saying that a common changeset exchange data format would be
useless.
It would certainly help if all tools had a unified way of exporting and
importing changesets. But it will always be limited to handling the
lowest common denominator, which often isn't enough.

svn diff --git is the best we've got so far. It's not perfect, but it's
a good step forward.

> I see this format as an XML format that resembles Atom feed, with
> logical order of events (i.e. file removed after it was copied etc.).
> Subversion already uses XML formats internally,

Subversion uses virtually no XML internally. It can produce some XML
for presentation, but data isn't stored as XML inside of Subversion.

> so I logically assume
> that folks here possess required experience and may even have some
> ready pieces to work out an initial draft of such format.

We've added the --git option to svn diff, which produces output
compatible with Mercurial and git for some common operations (add,
delete, copy). That's a common denominator, and the format is nice
because it is readable.

svn diff also has an --xml option which makes it produce XML output.
Currently that only works in --summarize mode, and only for
repository-to-repository diffs. You cannot use it to show changes in a
working copy. I guess if there really is a need we could extend the XML
output. But I think the --git diff format is nicer, because it contains
more information and is already usable by at least two other tools.
Maybe more tools will start to support it, now that Subversion also
supports it.

I hope the new features I've listed above will help you solve the
problems you're trying to solve. If you have further ideas about how
they can be improved, please share them.

Thanks,
Stefan