Stefan Sperling wrote:

> C. Michael Pilato wrote:
>>  On 03/27/2013 11:03 AM, Stefan Sperling wrote:
>>  > That's fine. The fix I've committed and proposed for backport applies
>>  > the large hammer and blocks any control characters. If there's a case
>>  > to be made for relaxing this check I'm happy to do that.
>> 
>>  Sorry, but I'm -1 on that change.  How did this expand from "trailing
>>  newlines" to "any control character"?
> 
> An IRC discussion with Ben:
> http://colabti.org/irclogger/irclogger_log/svn-dev?date=2013-03-26#l479
> Who said:
>   "I'm not inclined to spend a bunch of time thinking through all the
>    implications of the control characters that we already don't allow
>    in the client lib."
> 
> I agree there. I also didn't want to test all control characters
> to see if FSFS can handle them. It seems safer to reject them outright.
> 
>>  Look, the FS allows things that the repos layer does not.  For example, it
>>  doesn't care about property encoding or formats, where the repository layer
>>  does.  *That's by design.*
> 
> Sure, but then we need an implementation that conforms to that design.
> We clearly don't have one, else it wouldn't choke on newlines.
> Fixing the implementation is going to be hard and we aren't even
> sure if any user of the API really cares.
> 
>>  It's perfectly okay for us to tell folks that if
>>  they modify their FS with something other than Subversion and end up with
>>  data that Subversion doesn't like, that's their own problem.
> 
> But (at least) with newlines, they can end up with data that the filesystem
> implementation itself doesn't like. So the API becomes entirely useless
> to them if they want to store filenames which contain newlines.
> 
> What about:
> 
> - We make the FS layer check for newlines (because we know FSFS can't cope)
> 
> - We make the repos layer check for control characters (because the
>    Subversion clients does, too, and we had problems with third-party
>    stuff like git-svn committing stuff which svn clients couldn't deal
>    with in the past)
> 
> That's some additional work, but I'm willing to do it.

My thoughts:

The key issue is we need to define 'valid path' for the FS layer, and follow 
through with documenting it, implementing run-time checks, and testing it.  If 
the domain of valid FS paths is greater than what Subversion's higher layers 
use, then 
we need unit testing at the FS layer to ensure we're 
supporting what we claim.  If however, we decide to use the same 
definition as in the layer above, then we don't need to do so much
 unit testing of this layer, because testing through the layer above 
should mostly suffice.  So there is a compromise between the theoretical
 independence which allows FS to have a different definition of valid 
filenames, and the practical issue that if we actually do make it 
different from the repos layer's definition then it's more work to 
maintain.

If we now decide that LF isn't supported, I will 
ask, what about CR?  That seems the next-most-likely to cause problems. 
 I can almost guarantee that CR will break the repos layer (I'm thinking
 about passing a list of paths to a hook script, on Windows), but it 
might be acceptable at the FS layer.  How are we going to decide whether LF is 
the only control character that we need to forbid?  Partly by 
analysis / thought / knowledge, and partly by testing, I hope.  The alternative 
is we decide that the benefit to the Subversion community of allowing other 
control characters in the FS layer is too small compared with the risks and 
effort, and so define that it shall no longer support any control characters.

As for "trailing newline", that's one specific problem that occurred in 
practice.  A test for that 
case is a *regression test*.  But it's plain that a newline anywhere in a
 filename is going to cause problems in FSFS, so the new rule will (at least) 
forbid 
newlines anywhere in a path.  Along with implementing a check for the new 
behaviour, we should be adding a new *unit 
test* to verify the new behaviour, whatever that behaviour shall be.  Since 
that unit test will clearly 
cover the "trailing newline" case, we don't really need the regression 
test as well.  We would do well to take more care in distinguishing unit
 tests from regression tests.

Something like the proposal above, to reject LF only at the FS (or FSFS)
 layer, and all control characters in libsvn_repos, sounds good to me.  
We should write unit tests to ensure that the FS layer works 
properly with all other control characters.

- Julian

Reply via email to