Stefan Sperling wrote: > C. Michael Pilato wrote: >> On 03/27/2013 11:03 AM, Stefan Sperling wrote: >> > That's fine. The fix I've committed and proposed for backport applies >> > the large hammer and blocks any control characters. If there's a case >> > to be made for relaxing this check I'm happy to do that. >> >> Sorry, but I'm -1 on that change. How did this expand from "trailing >> newlines" to "any control character"? > > An IRC discussion with Ben: > http://colabti.org/irclogger/irclogger_log/svn-dev?date=2013-03-26#l479 > Who said: > "I'm not inclined to spend a bunch of time thinking through all the > implications of the control characters that we already don't allow > in the client lib." > > I agree there. I also didn't want to test all control characters > to see if FSFS can handle them. It seems safer to reject them outright. > >> Look, the FS allows things that the repos layer does not. For example, it >> doesn't care about property encoding or formats, where the repository layer >> does. *That's by design.* > > Sure, but then we need an implementation that conforms to that design. > We clearly don't have one, else it wouldn't choke on newlines. > Fixing the implementation is going to be hard and we aren't even > sure if any user of the API really cares. > >> It's perfectly okay for us to tell folks that if >> they modify their FS with something other than Subversion and end up with >> data that Subversion doesn't like, that's their own problem. > > But (at least) with newlines, they can end up with data that the filesystem > implementation itself doesn't like. So the API becomes entirely useless > to them if they want to store filenames which contain newlines. > > What about: > > - We make the FS layer check for newlines (because we know FSFS can't cope) > > - We make the repos layer check for control characters (because the > Subversion clients does, too, and we had problems with third-party > stuff like git-svn committing stuff which svn clients couldn't deal > with in the past) > > That's some additional work, but I'm willing to do it.
My thoughts: The key issue is we need to define 'valid path' for the FS layer, and follow through with documenting it, implementing run-time checks, and testing it. If the domain of valid FS paths is greater than what Subversion's higher layers use, then we need unit testing at the FS layer to ensure we're supporting what we claim. If however, we decide to use the same definition as in the layer above, then we don't need to do so much unit testing of this layer, because testing through the layer above should mostly suffice. So there is a compromise between the theoretical independence which allows FS to have a different definition of valid filenames, and the practical issue that if we actually do make it different from the repos layer's definition then it's more work to maintain. If we now decide that LF isn't supported, I will ask, what about CR? That seems the next-most-likely to cause problems. I can almost guarantee that CR will break the repos layer (I'm thinking about passing a list of paths to a hook script, on Windows), but it might be acceptable at the FS layer. How are we going to decide whether LF is the only control character that we need to forbid? Partly by analysis / thought / knowledge, and partly by testing, I hope. The alternative is we decide that the benefit to the Subversion community of allowing other control characters in the FS layer is too small compared with the risks and effort, and so define that it shall no longer support any control characters. As for "trailing newline", that's one specific problem that occurred in practice. A test for that case is a *regression test*. But it's plain that a newline anywhere in a filename is going to cause problems in FSFS, so the new rule will (at least) forbid newlines anywhere in a path. Along with implementing a check for the new behaviour, we should be adding a new *unit test* to verify the new behaviour, whatever that behaviour shall be. Since that unit test will clearly cover the "trailing newline" case, we don't really need the regression test as well. We would do well to take more care in distinguishing unit tests from regression tests. Something like the proposal above, to reject LF only at the FS (or FSFS) layer, and all control characters in libsvn_repos, sounds good to me. We should write unit tests to ensure that the FS layer works properly with all other control characters. - Julian