I'm not a MIME expert, but isn't there a thing where you can add key/value pairs after a MIME type which customize its meaning? Maybe that could be used for this?
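
For what it's worth, MIME does allow this: a media type may be followed by semicolon-separated name=value parameters, as in "text/plain; charset=utf-8". Below is a minimal Python sketch of reading such parameters with the standard library. The "merge" parameter name is hypothetical, invented for this example; it is not a registered parameter, and nothing in this thread says Subversion would interpret it.

```python
from email.message import Message


def parse_mime_type(value):
    """Split a MIME type string into (media_type, {parameter: value})."""
    msg = Message()
    msg["Content-Type"] = value
    # get_params() returns [(media_type, ''), (name, value), ...];
    # the first entry is the media type itself, so skip it.
    params = dict(msg.get_params()[1:])
    return msg.get_content_type(), params


# "merge=line" is a made-up parameter used only for illustration.
mime, params = parse_mime_type("text/xml; charset=utf-8; merge=line")
print(mime, params)
```

Whether such a parameter would be a good carrier for merge hints is a separate question; the sketch only shows that the syntax exists and is easy to parse.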
--dave

On Tue, Nov 17, 2009 at 7:05 PM, Julian Foad <julianf...@btopenworld.com> wrote:
> I need to spend some time replying, late at night though it is.
>
> Let me try to explain why I think a "how to merge" property should not
> be the primary indicator of how Subversion should merge each file.
>
>
> Principle
> =========
>
> I have read that, in the realm of data handling, there is a principle
> that it is a bad idea to tag data with annotations that say what kind of
> actions can or should be performed on it. That kind of coupling is
> unscalable. Instead, it is better to tag data with an indication of what
> meaning and/or what syntax the data has, and then let tools decide what
> to do, based on that information.
>
> We already have one data-type indicator: svn:mime-type. Now, MIME type
> is far from a complete data type specifier. It is insufficient for our
> needs, in theory. However, in practice, it is nearly sufficient. (See
> Problem 2 below for an exception.)
>
> We also have another data-type indicator: the file name. A file name is
> also an incomplete source of metadata, and some file names ("README" or
> "CHANGES") give no indication at all of the format, but it is useful in
> many cases ("*.py", "*.c").
>
>
> Problem 1 (limited recognition of MIME types)
> =========
>
> Subversion mis-categorizes a lot of MIME types as "binary" (and
> therefore will not merge or diff or blame them) which really are
> line-based text formats.
>
> The list of such MIME types is continually evolving so it is not
> possible for Subversion to have a built-in complete list. However, it is
> easy for new releases of Subversion to have an updated list.
>
> It is not much harder for Subversion to have a configurable list of
> which MIME types (or MIME type patterns) should be considered mergeable.
> (The configuration could be extensible: it could say line-wise-mergeable
> or not mergeable or XML-mergeable or ...)
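
To make the "configurable list of MIME type patterns" idea concrete, here is a rough Python sketch of such a lookup. Everything in it is an assumption for illustration: the rule table, the mode names ("line-wise", "xml", "none") and the function name are hypothetical and do not correspond to any existing Subversion configuration.

```python
from fnmatch import fnmatchcase

# Hypothetical built-in rules; the first matching pattern wins.
DEFAULT_RULES = [
    ("text/*", "line-wise"),
    ("image/x-xbitmap", "line-wise"),
    ("image/x-xpixmap", "line-wise"),
    ("application/json", "line-wise"),
    ("application/xml", "xml"),
    ("*/*+xml", "xml"),
    ("*", "none"),
]


def merge_mode_for(mime_type, user_rules=()):
    """Return the merge mode for a MIME type, consulting user rules first."""
    if mime_type is None:
        # Files with no svn:mime-type are treated as text today.
        return "line-wise"
    # Ignore any parameters (e.g. "; charset=utf-8") and letter case.
    bare_type = mime_type.split(";", 1)[0].strip().lower()
    for pattern, mode in list(user_rules) + DEFAULT_RULES:
        if fnmatchcase(bare_type, pattern):
            return mode
    return "none"


print(merge_mode_for("image/svg+xml"))
print(merge_mode_for("application/xml", [("application/xml", "none")]))
```

Putting user rules ahead of the defaults lets a client override a built-in classification without Subversion needing a complete, up-to-date list.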
>
>
> Problem 2 (mergeable and non-mergeable XML files)
> =========
>
> A user has some XML files on which a line-based merge is useful, and
> some XML files on which that is not useful, and wishes to label both
> kinds with svn:mime-type=text/xml. Let us suppose one format has each
> XML tag on a separate line, and the other has them all run together and
> line breaks inserted at arbitrary places in the sequence. It may be
> possible to find a different MIME type for one of the file types, but
> that may well not be possible, e.g. if they are both proprietary or
> arbitrary XML formats.
>
> This problem is not limited to XML files. Consider two "plain text"
> files, MIME type text/plain, with different kinds of text in them. One
> has line-based content, such as a shopping list, and changes usually
> leave many lines unchanged. The other contains the text of a newspaper
> article with line breaks at roughly every 70 characters in a stream of
> words, so two similar versions of it may have very few whole lines in
> common.
>
> I believe this is a real but relatively uncommon requirement. It is a
> genuine example of the MIME type being insufficient to determine
> (line-wise) mergeability. There are many file formats that can be
> regarded as being (line-wise) mergeable or non-mergeable depending on
> some aspect of their content that cannot be reflected in the MIME type.
> It is uncommon in the sense that most Subversion users' needs can be
> satisfied by distinguishing mergeability based on the MIME type, or
> better the MIME type and file name taken together, of their files.
>
> To solve this problem when it exists, there does indeed need to be
> further metadata about the content type of the file. (Alternatively it
> could be metadata that says how to merge the file, but see "Principle"
> above.)
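
The shopping-list versus newspaper-article contrast can be illustrated with a small Python sketch. It uses difflib from the standard library as a stand-in for a line-based diff (this is not Subversion's diff implementation), and the sample texts and the 30-column wrap width are invented for the example.

```python
import difflib
import textwrap


def shared_line_ratio(old_text, new_text):
    """Similarity of two texts as seen by a line-based comparison (0.0 to 1.0)."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    return difflib.SequenceMatcher(None, old_lines, new_lines).ratio()


# Line-based content: one item added, the other lines are untouched.
list_v1 = "milk\neggs\nbread\ntea\n"
list_v2 = "milk\neggs\nbutter\nbread\ntea\n"

# Flowing prose: one word added, then the paragraph is re-wrapped.
article = ("The council voted on Tuesday to approve the new harbour bridge "
           "after months of debate over its cost and design.")
article_v1 = textwrap.fill(article, width=30)
article_v2 = textwrap.fill(article.replace("The council", "The city council"),
                           width=30)

print(shared_line_ratio(list_v1, list_v2))
print(shared_line_ratio(article_v1, article_v2))
```

In this particular example the single inserted word shifts every line break, so the two wrapped versions of the article share no whole line, while the two shopping lists still share four of their lines. Both files would carry the same text/plain MIME type.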
>
>
> Solution 0 (merge-mode)
> ==========
>
> So we could add a property to each file which says whether the file is
> to be considered line-wise mergeable by Subversion, and say that this
> property will be the primary source of this information. What are the
> pros and cons of this?
>
> Pro: The user can force a line-wise merge on one file and no merge
> attempt on another file even when MIME type and file name are
> insufficient distinguishers.
>
> Pro: The user can forget about providing MIME type at all, and just set
> this property to one of the pre-defined two types of merging (line-based
> or none), if that is all the user cares about.
>
> Con: This property associates the file with one simple kind of merging;
> but the best merge tool available on the client may not be that simple
> kind. If we want to use a better merge tool, say an XML-aware merge
> tool, this property actually gets in the way: it tells us to use a
> simple line-based merge on this file. It would have been better if the
> property had said, "this file contains line-based XML, so you might want
> to use an XML-aware merge rather than a simple line-based merge if you
> can". In other words, we really want to tell the client what the content
> type is, and let the client choose the best merge tool for that content
> type.
>
> Con: This property conveys redundant information. In almost all cases,
> the MIME type and/or file name are sufficient information. It is wrong
> to pretend that MIME type and file name are not good sources of
> how-to-merge information, and to leave their currently weak and
> deficient interpretation as just a deprecated backward-compatibility
> fallback.
>
> Con: Not extensible to diff, blame, etc. An indication that the file is
> line-wise-mergeable is not really a good indication of whether the file
> can be line-wise diffed or blamed.
>
>
> Proposal
> ========
>
> This is the full, long-term proposal. We can choose a subset of this to
> do initially.
>
> (1) Make svn merge/diff/blame take into account the file name as well as
> the svn:mime-type in deciding whether to operate in a "line-wise" mode
> or not operate at all.
>
> (2) Update the built-in MIME type and filename patterns.
>
> * Update the built-in selection based on svn:mime-type to recognise a
> list of MIME types that is reasonably up-to-date right now (even though
> it will be out of date by the time the released software is in use).
>
> * Update the built-in selection based on file names to recognize a
> reasonable list of file name patterns.
>
> (3) Provide a client-side config for extending and overriding the rules
> that map MIME type and file name to a merge/diff/blame mode. This mode
> should be specifiable in the config, not just "line-wise" or "none" but
> any other named mode. Provide config options for specifying the merge
> tool, diff tool and blame tool per mode. Tools should be specifiable as
> none, built-in or external.
>
> (4) Add an optional property for selecting a particular merge mode (and
> diff mode and blame mode) for the cases where (1) and (2) are
> insufficient or inconvenient.
>
>
> Regards,
> - Julian
>
>
> Mike Samuel wrote:
>> Proposal:
>> ========
>> (1) Add documentation on the svn:merge-mode property that lists the
>> allowed values as ("simple" and "none")
>> (2) Add example autoprops rules to the documentation that set
>> svn:merge-mode to "simple" for the following file types
>>     application/ecmascript
>>     application/json
>>     application/xml
>>     image/svg+xml
>> (3) Change the text quoted from the SVN manual under Background to
>> read as below.
>> (4) Update the implementation to agree.
>>
>> Subversion treats the following files as [[mergeable]]:
>>
>> * Files with no svn:mime-type [[and no svn:merge-mode]]
>> * Files with a svn:mime-type starting "text/"
>> * Files with a svn:mime-type equal to "image/x-xbitmap"
>> * Files with a svn:mime-type equal to "image/x-xpixmap"
>> * [[Files with a svn:merge-mode that is equal to "simple"]]
>>
>> All other files are treated as [[unmergeable]], meaning that
>> Subversion will:
>>
>> * Not attempt to automatically merge received changes with
>>   local changes during svn update or svn merge
>> * Not show the differences as part of svn diff
>> * Not show line-by-line attribution for svn blame
>>
>> In all other respects, Subversion treats [[mergeable]] files the
>> same as [[unmergeable]] files, e.g. if you set the svn:keywords or
>> svn:eol-style properties, Subversion will perform keyword
>> substitution or newline conversion on [[unmergeable]] files.
>>
>>
>> Goal:
>> ====
>> To update the scheme by which svn {update,diff,merge,blame} decides
>> whether a file is mergeable, so as to allow merging of files with an
>> svn:mime-type outside the hard-coded list currently used.
>>
>> This determination should be independent of the platform svn
>> is running on, so independent of the set of supported character sets.
>>
>> This scheme should not complicate future extensions to the merge
>> system which might wish to use a different merge policy, e.g. for XML
>> than for source code files.
>>
>> This scheme should work with autoprops, and other mechanisms repository
>> administrators use to manage files. Specifically, some kinds of XML can
>> be meaningfully merged, and others cannot.
>>
>> This scheme should work within existing limitations, such as the inability
>> to merge UTF-16 and UTF-32.
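
The proposed mergeability list quoted earlier in this message (the text with the [[...]] changes) can be read as a small predicate. The Python sketch below follows that list literally; the function name and signature are hypothetical, and this is not Subversion's implementation. One point the quoted list leaves open is whether svn:merge-mode "none" should override a "text/" MIME type. Read literally it does not, and the sketch keeps that behaviour.

```python
def is_mergeable(mime_type=None, merge_mode=None):
    """Literal reading of the proposed list; None means 'property not set'."""
    # Files with svn:merge-mode equal to "simple".
    if merge_mode == "simple":
        return True
    # Files with no svn:mime-type and no svn:merge-mode.
    if mime_type is None:
        return merge_mode is None
    # Files whose svn:mime-type starts with "text/", plus the two image types.
    # Assumption: as written, the list does not let "none" override these.
    return (mime_type.startswith("text/")
            or mime_type in ("image/x-xbitmap", "image/x-xpixmap"))


print(is_mergeable("application/json"))
print(is_mergeable("application/json", "simple"))
```

With no svn:merge-mode set, the predicate reduces to the current behaviour described under Background, so application/json stays unmergeable until the property is added.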
>>
>>
>> Background:
>> ==========
>> The current behavior is described at
>> http://subversion.tigris.org/faq.html#binary-files
>>
>> Subversion treats the following files as text:
>>
>> * Files with no svn:mime-type
>> * Files with a svn:mime-type starting "text/"
>> * Files with a svn:mime-type equal to "image/x-xbitmap"
>> * Files with a svn:mime-type equal to "image/x-xpixmap"
>>
>> All other files are treated as binary, meaning that Subversion will:
>>
>> * Not attempt to automatically merge received changes with
>>   local changes during svn update or svn merge
>> * Not show the differences as part of svn diff
>> * Not show line-by-line attribution for svn blame
>>
>> In all other respects, Subversion treats binary files the same as
>> text files, e.g. if you set the svn:keywords or svn:eol-style
>> properties, Subversion will perform keyword substitution or newline
>> conversion on binary files.
>>
>> Common source code mime-types are misclassified, and that problem is
>> likely to grow because of current IANA policy.
>> Mime-types are handed out by the IANA, which only assigns text/*
>> mime-types for file-types that are meant to be human readable. Source
>> code is explicitly not considered human readable. This is why many
>> source code and data mime-types are in the application/* group or
>> other non text/* groups: application/json, application/ecmascript,
>> application/xml, image/svg+xml.
>> RFC 4288 ( ftp://ftp.rfc-editor.org/in-notes/rfc4288.txt ) says this:
>>     Expected uses for the "application" media type
>>     include but are not limited to file transfer, spreadsheets,
>>     presentations, scheduling data, and languages for "active"
>>     (computational) material.
>>
>> ------------------------------------------------------
>> http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2419155
>
> ------------------------------------------------------
> http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2419275
>
--
glas...@davidglasser.net | langtonlabs.org | flickr.com/photos/glasser/