When a file with a trailing newline (ASCII 0x0a) has made its way into the repository, there are various areas of the system that exhibit failure modes. Failure modes I observed in one particular case are discussed below.
I intend to file separate issues for each failure mode to keep track of them, rather than filing one giant issue for them all. Though perhaps a meta-issue that depends on the other issues would be useful. In the case observed there was a file, added in a revision rX, which had a trailing newline in its name. The file was added along with its parent directory (which was also new in rX) and alongside several other files in the same directory which did not have trailing newlines in their names. The server was running Subversion 1.7.8, but might have been running an older release at the time when rX was committed. All commands listed below were run with Subversion 1.7.8. Immediately visible problems were: svnadmin verify resulted in an error for the revision: 'svnadmin: E160013: File not found: ...') svnsync failed to sync the revision ('svnsync: E160013: File not found: ...') However, the directory at rX could be listed (with svn ls) and checked out successfully. During checkout, RA layers were behaving inconsistently. Checkout via ra_svn and ra_local resulted in the expected filename with \n (0x0a) as the last byte both in wc.db (local_relpath NODES column) and on disk. However, ra_neon (didn't try serf) checked the file out with a trailing space (0x20) instead of a trailing newline. The filename was stored with a trailing 0x20 in wc.db as well as on disk. Examining the FSFS revision file [*] of rX showed that the filename was stored with a newline as part of its parent directory's representation. [*] For those who didn't know, the FSFS revision file format is documented here: https://svn.apache.org/repos/asf/subversion/trunk/subversion/libsvn_fs_fs/structure Let's call the parent directory 'A', and the file in question 'alpha\n'. A file named 'alpha\n' added in r1 might appear like this in its parent directory's representation (i.e. the rep of 'A'): ============ PLAIN K 6 alpha V 17 file 2-1.0.r1/122 ============ However, the cpath: field within the file's own representation did not contain the trailing newline. Also, the file was listed in the changed-paths data section without the trailing newline. Note that the FSFS format docs say "The changed-path data is represented as a series of changed-path items, each consisting of two lines." -- i.e. a trailing newline is considered to be part of revision file data, rather than the filename. Furthermore, several children of the newly-added-in rX 'A' directory were missing from the changed-paths list in the revision file, for an unkown reason. More precisely, children listed after 'alpha\n' in the rep of A were missing from the changed paths list. This was not detected by 'svnadmin verify' (which, in my opinion, is a bug) but was detected by 'svnsync' when a later revision rY referred to one of the missing children in copyfrom information. It turned out that the slave server was missing any children of A not listed in the changed-paths list in the rX revision file on the master server. What kind of consistency checks could prevent this from happening remains to be determined. Perhaps 'svnadmin verify' could cross-check the changed paths list with node-rev IDs which appear in the revision's directory listings? The revision rX could be fixed by removing the trailing newline from the filename in the parent directory's representation, and adjusting the rep's length and checksum information accordingly. About two dozen other revisions had to be adjusted as well because they listed the file 'alpha\n' as part of the 'A' directory rep (which is of course written out in its entirety whenever a revision changes some child of 'A'). Also, the changed-paths list of rX had to be extended to add the missing newly added children of A. Other revisions had to be checked for similar omissions in the changed-paths list. How the filename entered the repository is not known with absolute certainty. The current theory, based on discussion with the revision's author, is that the file was checked in using Eclipse with SVNKit 1.6 and was not refused by the Subversion server at the time. Attempting the reproduce this problem with a stock Subversion client in a naive way fails because libsvn_client refuses to add such files to version control in the first place: subversion/svn/add-cmd.c:86: (apr_err=160005) subversion/svn/util.c:621: (apr_err=160005) subversion/libsvn_client/add.c:1031: (apr_err=160005) subversion/libsvn_client/add.c:937: (apr_err=160005) subversion/libsvn_client/add.c:320: (apr_err=160005) subversion/libsvn_wc/adm_ops.c:1004: (apr_err=160005) subversion/libsvn_wc/adm_ops.c:679: (apr_err=160005) subversion/libsvn_subr/path.c:1231: (apr_err=160005) svn: E160005: Invalid control character '0x0a' in path '/tmp/wc/alpha\012' This error does not seem to be triggered by our test suite, which is bad because it would have allowed the SVNKit developers to prevent this problem from happening (they use our test suite to test their Java-based implementation of Subversion). There is a tab_test() in commit_tests.py, but that test only tries a tab, not a newline character. We should add some C tests as well to verify API behaviour at the client layer and at the repos layer. Given the ripple effects of this problem in FSFS revision files I think we should ensure that the Subversion server blocks such filenames from entering the repository (any repository, FSFS and BDB). It seems FSFS format changes would be required to support filenames with trailing newlines properly, an effort which isn't worth the gain in my opinion. And given the ripple effects seen in areas such as repository verification, svnsync, and ra_neon, I don't think we can afford to call this a supported use case until all components of the system have been fixed to handle filenames with trailing newlines properly. Comments and opinions are very welcome. I intend to make sure that this problem cannot happen with future Subversion releases, and welcome any input and help I can get.