C. Michael Pilato wrote on Tue, Jul 19, 2011 at 09:45:09 -0400:
> On 07/19/2011 03:49 AM, Ivan Zhakov wrote:
> > On Mon, Jul 18, 2011 at 23:50, Daniel Shahaf <d...@daniel.shahaf.name> 
> > wrote:
> >> Ivan notes on IRC that this fixes issues with non-ASCII lock tokens.
> >> (At least, he reports errors in 'svn ls -v' over http://; but for me
> >> 'ls' and 'info' work fine over svn://.)
> >>
> >> Are lock tokens even permitted to be non-ASCII?
> >> (both backends generate ASCII-only lock tokens (in the same manner))
> >>
> >> If they are, are they required to be in UTF-8?  Or can they be, say,
> >> arbitrary octet strings?
> >>
> > I didn't find explicit specification for lock token encoding, but it
> > handled everywhere as normal utf-8 string.
> > 
> > That's why I decided to implement minimum patch just to decide lock
> > token returned from hook output to utf-8 string.
> > 
> > But I'm fine to require lock token to be ASCII only string.
> 
> Having this as an open question is unacceptable, so we need to make a
> decision about it.  To my knowledge, ASCII is the only thing used for lock
> tokens in the code we ship.  There's no telling what encoding could be in

For the record, our code generates tokens in the form
"opaquelocktoken:$UUID".

> use if folks are taking advantage of the pre-lock-hook token dictation
> feature -- Apache will give them a "C" locale environment, but sure anyone
> who has advanced enough configuration to require taking advantage of this
> feature is also setting the locale from inside those hooks.
> 
> FWIW, I think UTF-8 is a fine choice, especially if we've been internally
> handling this string as UTF-8 already.  But let's document the fact
> somewhere, please (preferably in the pre-lock-hook template comments).
> 

Looking again, svn_fs.h:2059 documents that lock tokens are URI's.
(and, consequently, are printable ASCII)

We also need them to be XML-safe for their use in ra_dav LOCK responses.

So, I'm thinking that:

* We should validate the supplied lock tokens, not for being well-formed
  UTF-8 as in Ivan's change, but for being ASCII-only.

* We should move the validation to the FS layer.

Thoughts?


> -- 
> C. Michael Pilato <cmpil...@collab.net>
> CollabNet   <>   www.collab.net   <>   Distributed Development On Demand
> 


Reply via email to