On Tue, Jul 19, 2011 at 21:46, Daniel Shahaf <d...@daniel.shahaf.name> wrote: > C. Michael Pilato wrote on Tue, Jul 19, 2011 at 09:45:09 -0400: >> On 07/19/2011 03:49 AM, Ivan Zhakov wrote: >> > On Mon, Jul 18, 2011 at 23:50, Daniel Shahaf <d...@daniel.shahaf.name> >> > wrote: >> >> Ivan notes on IRC that this fixes issues with non-ASCII lock tokens. >> >> (At least, he reports errors in 'svn ls -v' over http://; but for me >> >> 'ls' and 'info' work fine over svn://.) >> >> >> >> Are lock tokens even permitted to be non-ASCII? >> >> (both backends generate ASCII-only lock tokens (in the same manner)) >> >> >> >> If they are, are they required to be in UTF-8? Or can they be, say, >> >> arbitrary octet strings? >> >> >> > I didn't find explicit specification for lock token encoding, but it >> > handled everywhere as normal utf-8 string. >> > >> > That's why I decided to implement minimum patch just to decide lock >> > token returned from hook output to utf-8 string. >> > >> > But I'm fine to require lock token to be ASCII only string. >> >> Having this as an open question is unacceptable, so we need to make a >> decision about it. To my knowledge, ASCII is the only thing used for lock >> tokens in the code we ship. There's no telling what encoding could be in > > For the record, our code generates tokens in the form > "opaquelocktoken:$UUID". > >> use if folks are taking advantage of the pre-lock-hook token dictation >> feature -- Apache will give them a "C" locale environment, but sure anyone >> who has advanced enough configuration to require taking advantage of this >> feature is also setting the locale from inside those hooks. >> >> FWIW, I think UTF-8 is a fine choice, especially if we've been internally >> handling this string as UTF-8 already. But let's document the fact >> somewhere, please (preferably in the pre-lock-hook template comments). >> > > Looking again, svn_fs.h:2059 documents that lock tokens are URI's. > (and, consequently, are printable ASCII) > > We also need them to be XML-safe for their use in ra_dav LOCK responses. > > So, I'm thinking that: > > * We should validate the supplied lock tokens, not for being well-formed > UTF-8 as in Ivan's change, but for being ASCII-only. > > * We should move the validation to the FS layer. > > Thoughts? > I'm fine with this approach, but I think we should leave explicit validation of lock tokens in pre-lock hook handling for better diagnostic.
-- Ivan Zhakov