Re: Let's discuss about unicode compositions for filenames!

2012-02-17 Thread Hiroaki Nakamura
2012/2/17 Vincent Lefevre : > On 2012-02-17 13:54:35 +0900, Hiroaki Nakamura wrote: >> Actually, whether filename is in NFC or NFD depends on the way of >> inputting filenames. >> If you type all characters, it is in NFC. > > No, or actually, perhaps this depends on the user configuration > (e.g. k

Re: Let's discuss about unicode compositions for filenames!

2012-02-17 Thread Vincent Lefevre
On 2012-02-17 13:54:35 +0900, Hiroaki Nakamura wrote: > Actually, whether filename is in NFC or NFD depends on the way of > inputting filenames. > If you type all characters, it is in NFC. No, or actually, perhaps this depends on the user configuration (e.g. keyboard configuration / input method).
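The following illustration is not taken from the thread. It shows what "in NFC or in NFD" means for a filename: the same visible name can arrive as different code point sequences, depending on the input method or the filesystem that produced it. This is a minimal sketch using Python's standard `unicodedata` module, and the sample name is an arbitrary assumption.

```python
import unicodedata

name = "caf\u00e9"  # "café" entered with a precomposed é (U+00E9)

nfc = unicodedata.normalize("NFC", name)
nfd = unicodedata.normalize("NFD", name)

# Identical on screen, but different code points and different UTF-8 bytes.
print([hex(ord(c)) for c in nfc])  # ['0x63', '0x61', '0x66', '0xe9']
print([hex(ord(c)) for c in nfd])  # ['0x63', '0x61', '0x66', '0x65', '0x301']
print(nfc.encode("utf-8"))  # b'caf\xc3\xa9'
print(nfd.encode("utf-8"))  # b'cafe\xcc\x81'
print(nfc == nfd)  # False: a byte-wise comparison treats them as different names
```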

Re: Let's discuss about unicode compositions for filenames!

2012-02-16 Thread Hiroaki Nakamura
2012/2/17 Vincent Lefevre : > On 2012-01-30 21:29:41 +0100, Stefan Sperling wrote: >> On Mon, Jan 30, 2012 at 09:09:22PM +0100, Branko Čibej wrote: >> > Are you seriously proposing that we /support/ such broken, hackish >> > nonsense? How do you expect users to tell the difference between file >> >

Re: Let's discuss about unicode compositions for filenames!

2012-02-16 Thread Vincent Lefevre
On 2012-01-30 21:29:41 +0100, Stefan Sperling wrote: > On Mon, Jan 30, 2012 at 09:09:22PM +0100, Branko Čibej wrote: > > Are you seriously proposing that we /support/ such broken, hackish > > nonsense? How do you expect users to tell the difference between file > > names that look identical on the

Re: Let's discuss about unicode compositions for filenames!

2012-02-12 Thread Thomas Åkesson
On 12 feb 2012, at 16:59, Stefan Sperling wrote: > On Sun, Feb 12, 2012 at 04:47:45PM +0100, Thomas Åkesson wrote: >> Would it make sense to formalize the different approaches into a >> couple of RFCs attempting to summarize the respective implications of >> each approach? I could try to write on

Re: Let's discuss about unicode compositions for filenames!

2012-02-12 Thread Stefan Sperling
On Sun, Feb 12, 2012 at 04:47:45PM +0100, Thomas Åkesson wrote: > Would it make sense to formalize the different approaches into a > couple of RFCs attempting to summarize the respective implications of > each approach? I could try to write one up for the "Non-normalizing > approach". Detailed de

Re: Let's discuss about unicode compositions for filenames!

2012-02-12 Thread Thomas Åkesson
On 11 feb 2012, at 13:10, Hiroaki Nakamura wrote: > Hi, > > 2012/2/9 Thomas Åkesson : >> Hi, >> I have been interested in this issue for a couple of years and I remember it >> was discussed briefly at Subconf in Germany a couple of years ago. >> >> Branching the thread here because I'd like to

Re: Let's discuss about unicode compositions for filenames!

2012-02-11 Thread Hiroaki Nakamura
2012/2/11 Branko Čibej : > On 11.02.2012 13:05, Hiroaki Nakamura wrote: >> 2012/2/9 Markus Schaber : >>> From: Stefan Sperling [mailto:s...@elego.de] >>> On Thu, Feb 09, 2012 at 12:20:14AM +0900, Hiroaki Nakamura wrote: > [Upgrade options / backwards compatibility for proposed unicode > nor

Re: Let's discuss about unicode compositions for filenames!

2012-02-11 Thread Branko Čibej
On 11.02.2012 13:05, Hiroaki Nakamura wrote: > 2012/2/9 Markus Schaber : >> Hi, >> >> From: Stefan Sperling [mailto:s...@elego.de] >> >> On Thu, Feb 09, 2012 at 12:20:14AM +0900, Hiroaki Nakamura wrote: [Upgrade options / backwards compatibility for proposed unicode normalization fix] >>>

Re: Let's discuss about unicode compositions for filenames!

2012-02-11 Thread Hiroaki Nakamura
Hi, 2012/2/9 Thomas Åkesson : > Hi, > I have been interested in this issue for a couple of years and I remember it > was discussed briefly at Subconf in Germany a couple of years ago. > > Branching the thread here because I'd like to propose a different approach > than Hiroaki. This proposition

Re: Let's discuss about unicode compositions for filenames!

2012-02-11 Thread Hiroaki Nakamura
2012/2/9 Markus Schaber : > Hi, > > From: Stefan Sperling [mailto:s...@elego.de] > > On Thu, Feb 09, 2012 at 12:20:14AM +0900, Hiroaki Nakamura wrote: >> > [Upgrade options / backwards compatibility for proposed unicode >> > normalization fix] > >> - Need to re-checkout existing working copies of t

Re: Let's discuss about unicode compositions for filenames!

2012-02-08 Thread Thomas Åkesson
Hi, I have been interested in this issue for a couple of years and I remember it was discussed briefly at Subconf in Germany a couple of years ago. Branching the thread here because I'd like to propose a different approach than Hiroaki. This proposition is not very different from the note "uni

Re: Let's discuss about unicode compositions for filenames!

2012-02-08 Thread Daniel Shahaf
Hiroaki Nakamura wrote on Thu, Feb 09, 2012 at 07:16:57 +0900: > 2012/2/9 Stefan Sperling : > >  - What happens if NFC/NFD is enabled in repository config, but the > >   repository contains non-normalised paths (i.e. did not go through > >   a dump/load cycle to normalise all paths)? > > I think w
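Stefan asks what happens when normalization is enabled for a repository that still holds non-normalized paths. One way to gauge the impact is to audit which stored paths are not already in the target form. The sketch below assumes NFC as the target and uses a hypothetical list of paths. It does not read a real repository. `unicodedata.is_normalized()` requires Python 3.8 or later.

```python
import unicodedata


def non_nfc_paths(paths):
    """Return the paths that are not already in NFC (Python 3.8+)."""
    return [p for p in paths if not unicodedata.is_normalized("NFC", p)]


# Hypothetical paths as they might be stored in an existing repository.
repo_paths = ["trunk/caf\u00e9.txt", "trunk/cafe\u0301.txt", "trunk/README"]

for path in non_nfc_paths(repo_paths):
    print(ascii(path))  # 'trunk/cafe\u0301.txt'
```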

Re: Let's discuss about unicode compositions for filenames!

2012-02-08 Thread Hiroaki Nakamura
Hi, thanks for your review. 2012/2/9 Stefan Sperling : > Open questions: Here I try to answer these. Of course, I welcome everyone to answer. > >  - How can the client retrieve the configuration from the server? >   This is related to server-dictated configuration, see >   http://wiki.apache.org

Re: Let's discuss about unicode compositions for filenames!

2012-02-08 Thread Stefan Sperling
On Thu, Feb 09, 2012 at 12:20:14AM +0900, Hiroaki Nakamura wrote: > 2012/1/30 Stefan Sperling : > > I think the following caveats would be acceptable if they help > > with fixing the issue: > > > >  - An upgrade path which optionally requires people to check all > >   working copies out again, when

Re: Let's discuss about unicode compositions for filenames!

2012-02-08 Thread Hiroaki Nakamura
2012/1/30 Stefan Sperling : > I think the following caveats would be acceptable if they help > with fixing the issue: > >  - An upgrade path which optionally requires people to check all >   working copies out again, when either the server or the client is upgraded. >   Note again, this must be _op

Re: Let's discuss about unicode compositions for filenames!

2012-02-07 Thread Branko Čibej
On 07.02.2012 15:00, Stefan Sperling wrote: > On Tue, Feb 07, 2012 at 02:43:19PM +0100, Branko Čibej wrote: >> The client-side mapping table is a more general solution, if a >> lot harder to implement. >> >> But it brings additional benefits in that we could use it to, e.g., >> transliterate charac
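As a rough illustration of the client-side mapping table Branko mentions, the sketch below records each repository name exactly as received and indexes it by its NFC form. Whatever equivalent form the local filesystem reports can then be mapped back to the repository name. The class and method names are hypothetical, and the sketch only handles normalization, not the transliteration of characters that some filesystems reject.

```python
import unicodedata


class NameMap:
    """Hypothetical client-side table from on-disk names to repository names."""

    def __init__(self):
        self._by_key = {}

    @staticmethod
    def _key(name):
        # Canonically equivalent names share the same NFC key.
        return unicodedata.normalize("NFC", name)

    def add(self, repo_name):
        """Record a repository name, refusing a different but equivalent name."""
        key = self._key(repo_name)
        existing = self._by_key.get(key)
        if existing is not None and existing != repo_name:
            raise ValueError(f"{repo_name!r} clashes with {existing!r}")
        self._by_key[key] = repo_name

    def repo_name_for(self, disk_name):
        """Return the stored repository name for a name read from disk, or None."""
        return self._by_key.get(self._key(disk_name))


table = NameMap()
table.add("f\u00f6\u00f6")  # committed in precomposed form
print(table.repo_name_for("fo\u0308o\u0308") == "f\u00f6\u00f6")  # True
```

The table has to reject a second name that is equivalent but not identical. Otherwise two repository entries would compete for one on-disk file.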

Re: Let's discuss about unicode compositions for filenames!

2012-02-07 Thread Stefan Sperling
On Tue, Feb 07, 2012 at 02:43:19PM +0100, Branko Čibej wrote: > The client-side mapping table is a more general solution, if a > lot harder to implement. > > But it brings additional benefits in that we could use it to, e.g., > transliterate characters that are allowed by some file systems, but no

Re: Let's discuss about unicode compositions for filenames!

2012-02-07 Thread Branko Čibej
On 07.02.2012 14:30, Hiroaki Nakamura wrote: > 2012/2/7 Branko Čibej : >> On 06.02.2012 22:26, Hiroaki Nakamura wrote: >>> The Unicode Standard says canonical equivalent sequences should be >>> interpreted the same way. >>> * 1.1 Canonical and Compatibility Equivalence >>> http://unicode.org/repo

Re: Let's discuss about unicode compositions for filenames!

2012-02-07 Thread Hiroaki Nakamura
2012/2/7 Branko Čibej : > On 06.02.2012 22:26, Hiroaki Nakamura wrote: >> The Unicode Standard says canonical equivalent sequences should be >> interpreted the same way. >> * 1.1 Canonical and Compatibility Equivalence >>   http://unicode.org/reports/tr15/#Canonical_Equivalence >> * 2.12 Equivalent

Re: Let's discuss about unicode compositions for filenames!

2012-02-06 Thread Stefan Sperling
On Tue, Feb 07, 2012 at 06:26:54AM +0900, Hiroaki Nakamura wrote: > 2012/2/6 Stefan Sperling : > >  2) Do something else that effects repositories, too, and provide > >    a clean upgrade path for everyone (servers and clients). > >    AFAIK nobody has made a suggestion as to what could be done her

Re: Let's discuss about unicode compositions for filenames!

2012-02-06 Thread Branko Čibej
On 06.02.2012 22:26, Hiroaki Nakamura wrote: > The Unicode Standard says canonical equivalent sequences should be > interpreted the same way. > * 1.1 Canonical and Compatibility Equivalence > http://unicode.org/reports/tr15/#Canonical_Equivalence > * 2.12 Equivalent Sequences and Normalization >
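UAX #15 distinguishes canonical from compatibility equivalence. The short sketch below is an illustration, not part of the thread. Canonical equivalence (NFC/NFD) identifies precomposed and decomposed accents. Compatibility equivalence (NFKC/NFKD) goes further and folds, for example, the "ﬁ" ligature into "fi". The filename discussion here concerns the canonical forms.

```python
import unicodedata


def canonically_equivalent(a, b):
    # Canonically equivalent strings have identical canonical decompositions.
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def compatibility_equivalent(a, b):
    # Weaker relation: NFKD also folds ligatures, width variants and similar.
    return unicodedata.normalize("NFKD", a) == unicodedata.normalize("NFKD", b)


print(canonically_equivalent("\u00e9", "e\u0301"))   # True
print(canonically_equivalent("\ufb01le", "file"))    # False: U+FB01 is the "ﬁ" ligature
print(compatibility_equivalent("\ufb01le", "file"))  # True
```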

Re: Let's discuss about unicode compositions for filenames!

2012-02-06 Thread Hiroaki Nakamura
2012/2/6 Stefan Sperling : > On Mon, Feb 06, 2012 at 02:28:40PM +0100, Branko Čibej wrote: >> On 06.02.2012 14:10, Hiroaki Nakamura wrote: >> > Hi, all. >> > >> > It seems there is no further discussion. >> > >> > I think the conclusion for the short term solution is: >> > We convert unnormalized p

Re: Let's discuss about unicode compositions for filenames!

2012-02-06 Thread Stefan Sperling
On Mon, Feb 06, 2012 at 02:28:40PM +0100, Branko Čibej wrote: > On 06.02.2012 14:10, Hiroaki Nakamura wrote: > > Hi, all. > > > > It seems there is no further discussion. > > > > I think the conclusion for the short term solution is: > > We convert unnormalized paths to NFC normalized paths on clie

Re: Let's discuss about unicode compositions for filenames!

2012-02-06 Thread Branko Čibej
On 06.02.2012 14:10, Hiroaki Nakamura wrote: > Hi, all. > > It seems there is no further discussion. > > I think the conclusion for the short term solution is: > We convert unnormalized paths to NFC normalized paths on clients only, > that is, svn_path_cstring_to_utf8. > > It is the same approach a

Re: Let's discuss about unicode compositions for filenames!

2012-02-06 Thread Hiroaki Nakamura
Hi, all. It seems there is no further discussion. I think the conclusion for the short term solution is: We convert unnormalized paths to NFC normalized paths on clients only, that is, svn_path_cstring_to_utf8. It is the same approach as utf8precompose_macosx_2.patch in http://subversion.tigris.
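The short-term proposal is to normalize to NFC on the client only, at the point where native paths are converted to UTF-8. In Subversion itself that point is the C function `svn_path_cstring_to_utf8`. The Python function below is a hypothetical analogue that shows the idea. It does not reproduce that function's signature or error handling.

```python
import unicodedata


def path_to_utf8_nfc(native_path: bytes, native_encoding: str = "utf-8") -> bytes:
    """Hypothetical analogue of the proposed client-side conversion.

    Decode a path from the local (native) encoding, normalize it to NFC,
    and return UTF-8 bytes to send to the repository.
    """
    text = native_path.decode(native_encoding)
    return unicodedata.normalize("NFC", text).encode("utf-8")


# A decomposed name, as Mac OS X reports it, becomes precomposed.
print(path_to_utf8_nfc(b"cafe\xcc\x81"))  # b'caf\xc3\xa9'
```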

Re: Let's discuss about unicode compositions for filenames!

2012-02-04 Thread Hiroaki Nakamura
2012/2/3 Julian Foad : > You may well be correct that NFC is never longer than NFD, but that's not the > question.  The question is whether NFC may be longer than the current paths > (which are not normalized to normalization form C or to form D).  And the > answer is yes it may be longer.  See

Re: Let's discuss about unicode compositions for filenames!

2012-02-03 Thread Julian Foad
Hiroaki Nakamura wrote: >>> It would be nice if we could normalize paths in the repository without >>> having to perform a dump/reload cycle, but I don't know how that >>> would work in FSFS. >> >> It won't.  Changing the encoding increases the length (in bytes) of the >> string (in the dire
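Julian's point is that converting existing paths, which are in no particular form, to NFC can change their byte length in either direction. The examples below are chosen for illustration and do not come from the thread. NFC shortens a decomposed accent and the OHM SIGN singleton. It lengthens U+0958 DEVANAGARI LETTER QA, because that character is excluded from recomposition and NFC therefore leaves it decomposed.

```python
import unicodedata


def utf8_len(s):
    return len(s.encode("utf-8"))


samples = {
    "e + COMBINING ACUTE ACCENT": "e\u0301",
    "OHM SIGN": "\u2126",  # canonical singleton: NFC gives GREEK CAPITAL LETTER OMEGA
    "DEVANAGARI LETTER QA": "\u0958",  # composition exclusion: NFC stays decomposed
}

for label, s in samples.items():
    nfc = unicodedata.normalize("NFC", s)
    print(f"{label}: {utf8_len(s)} -> {utf8_len(nfc)} bytes")
# e + COMBINING ACUTE ACCENT: 3 -> 2 bytes
# OHM SIGN: 3 -> 2 bytes
# DEVANAGARI LETTER QA: 3 -> 6 bytes
```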

Re: Let's discuss about unicode compositions for filenames!

2012-02-02 Thread Erik Huelsmann
On Thu, Feb 2, 2012 at 10:59 PM, Hiroaki Nakamura wrote: > 2012/2/3 Peter Samuelson : >> >>> On 02.02.2012 20:22, Peter Samuelson wrote: >>> > By proposing a client-only solution, I hope to avoid _all_ those >>> > questions. >> >> [Branko Cibej] >>> Can't see how that works, unless you either make

Re: Let's discuss about unicode compositions for filenames!

2012-02-02 Thread Hiroaki Nakamura
2012/2/3 Peter Samuelson : > >> On 02.02.2012 20:22, Peter Samuelson wrote: >> > By proposing a client-only solution, I hope to avoid _all_ those >> > questions. > > [Branko Cibej] >> Can't see how that works, unless you either make the client-side >> solution optional, create a mapping table, or m

Re: Let's discuss about unicode compositions for filenames!

2012-02-02 Thread Hiroaki Nakamura
2012/2/3 Peter Samuelson : > > [Hiroaki Nakamura] >> Existing repositories, I think it would be better to convert them too using >> svndump/svnload. And we change svnload to convert filenames to NFC. >> However in reality we cannot force users to convert every existing >> repository. > > Also note

Re: Let's discuss about unicode compositions for filenames!

2012-02-02 Thread Branko Čibej
On 02.02.2012 21:28, Hiroaki Nakamura wrote: > 2012/2/3 Branko Čibej : >> On 02.02.2012 20:59, Hiroaki Nakamura wrote: >>> So we need to change servers too. When servers read filenames from >>> repositories, they first convert to NFC and then process commands. >> That won't work. You have to do the

Re: Let's discuss about unicode compositions for filenames!

2012-02-02 Thread Daniel Shahaf
Hiroaki Nakamura wrote on Fri, Feb 03, 2012 at 05:33:02 +0900: > 2012/2/3 Daniel Shahaf : > > Branko Čibej wrote on Thu, Feb 02, 2012 at 21:03:47 +0100: > >> On 02.02.2012 20:22, Peter Samuelson wrote: > >> > [Hiroaki Nakamura] > >> >> In option (2), we do n12n on all clients on all platforms, and

Re: Let's discuss about unicode compositions for filenames!

2012-02-02 Thread Peter Samuelson
> On 02.02.2012 20:22, Peter Samuelson wrote: > > By proposing a client-only solution, I hope to avoid _all_ those > > questions. [Branko Cibej] > Can't see how that works, unless you either make the client-side > solution optional, create a mapping table, or make name lookup on the > server agno

Re: Let's discuss about unicode compositions for filenames!

2012-02-02 Thread Hiroaki Nakamura
2012/2/3 Daniel Shahaf : > Branko Čibej wrote on Thu, Feb 02, 2012 at 21:03:47 +0100: >> On 02.02.2012 20:22, Peter Samuelson wrote: >> > [Hiroaki Nakamura] >> >> In option (2), we do n12n on all clients on all platforms, and we >> >> include web_dav_svn in "clients". So we convert all input paths

Re: Let's discuss about unicode compositions for filenames!

2012-02-02 Thread Peter Samuelson
[Hiroaki Nakamura] > Existing repositories, I think it would be better to convert them too using > svndump/svnload. And we change svnload to convert filenames to NFC. > However in reality we cannot force users to convert every existing repository. Also note that if you convert a repository (via d
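Hiroaki suggests changing the load step so that filenames are converted to NFC during a dump/load cycle. The sketch below shows the core of that idea on a single node's headers, assuming the dump stream has already been parsed into header dictionaries. The helper is hypothetical, not an existing `svnadmin` feature. It leaves property and text content untouched, so `Content-length` and checksums are unaffected. It also does not handle paths inside properties such as `svn:mergeinfo`, and it does not detect two paths that become identical after normalization.

```python
import unicodedata

# Dump-stream node headers whose values are repository paths.
PATH_HEADERS = ("Node-path", "Node-copyfrom-path")


def normalize_node_headers(headers: dict) -> dict:
    """Return a copy of one node's parsed headers with path values in NFC."""
    out = dict(headers)
    for name in PATH_HEADERS:
        if name in out:
            out[name] = unicodedata.normalize("NFC", out[name])
    return out


node = {"Node-path": "trunk/cafe\u0301.txt", "Node-kind": "file", "Node-action": "add"}
print(normalize_node_headers(node)["Node-path"] == "trunk/caf\u00e9.txt")  # True
```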

Re: Let's discuss about unicode compositions for filenames!

2012-02-02 Thread Hiroaki Nakamura
2012/2/3 Branko Čibej : > On 02.02.2012 20:59, Hiroaki Nakamura wrote: >> So we need to change servers too. When servers read filenames from >> repositories, they first convert to NFC and then process commands. > > That won't work. You have to do the initial lookup in a > normalization-agnostic way

Re: Let's discuss about unicode compositions for filenames!

2012-02-02 Thread Daniel Shahaf
Branko Čibej wrote on Thu, Feb 02, 2012 at 21:03:47 +0100: > On 02.02.2012 20:22, Peter Samuelson wrote: > > [Hiroaki Nakamura] > >> In option (2), we do n12n on all clients on all platforms, and we > >> include web_dav_svn in "clients". So we convert all input paths to > >> the "server encoding",

Re: Let's discuss about unicode compositions for filenames!

2012-02-02 Thread Branko Čibej
On 02.02.2012 20:59, Hiroaki Nakamura wrote: > So we need to change servers too. When servers read filenames from > repositories, they first convert to NFC and then process commands. That won't work. You have to do the initial lookup in a normalization-agnostic way, and neither BDB nor FSFS makes
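Branko objects that normalizing names after reading them is not enough, because the initial lookup must already be normalization-agnostic. The sketch below makes that concrete with a plain dictionary standing in for a directory's entries. This is an assumption for illustration, not how BDB or FSFS store directories. An exact-key lookup misses an equivalent name, so the fallback has to normalize every stored name, which is a linear scan unless an extra index is kept.

```python
import unicodedata


def lookup_agnostic(entries: dict, name: str):
    """Find an entry whose name is canonically equivalent to `name`.

    Returns (stored_name, node) or None; raises LookupError if several
    stored names are equivalent to `name`.
    """
    if name in entries:  # fast path: exact match
        return name, entries[name]
    key = unicodedata.normalize("NFC", name)
    matches = [n for n in entries if unicodedata.normalize("NFC", n) == key]
    if len(matches) > 1:
        raise LookupError(f"ambiguous name {name!r}: {matches!r}")
    if matches:
        return matches[0], entries[matches[0]]
    return None


entries = {"f\u00f6\u00f6": "node-1"}  # stored in precomposed form
print("fo\u0308o\u0308" in entries)  # False: exact-key lookup fails
print(lookup_agnostic(entries, "fo\u0308o\u0308") == ("f\u00f6\u00f6", "node-1"))  # True
```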

Re: Let's discuss about unicode compositions for filenames!

2012-02-02 Thread Branko Čibej
On 02.02.2012 20:22, Peter Samuelson wrote: > [Hiroaki Nakamura] >> In option (2), we do n12n on all clients on all platforms, and we >> include web_dav_svn in "clients". So we convert all input paths to >> the "server encoding", which is NFC. > Indeed. But the very concept of a "server encoding"

Re: Let's discuss about unicode compositions for filenames!

2012-02-02 Thread Hiroaki Nakamura
2012/2/3 Peter Samuelson : > > [Hiroaki Nakamura] >> In option (2), we do n12n on all clients on all platforms, and we >> include web_dav_svn in "clients". So we convert all input paths to >> the "server encoding", which is NFC. > > Indeed.  But the very concept of a "server encoding" means we are

Re: Let's discuss about unicode compositions for filenames!

2012-02-02 Thread Peter Samuelson
[Hiroaki Nakamura] > In option (2), we do n12n on all clients on all platforms, and we > include web_dav_svn in "clients". So we convert all input paths to > the "server encoding", which is NFC. Indeed. But the very concept of a "server encoding" means we are involving the server side. Which in

Re: Let's discuss about unicode compositions for filenames!

2012-01-31 Thread Peter Samuelson
[reordering the conversation flow slightly] [Peter Samuelson] > > That's the implementation I would like to see, to be honest. Start > > with the observation that we can treat Mac OS X NFD paths as a > > client character encoding. Now observe that it is lossy. But > > ... almost all non-Unic

Re: Let's discuss about unicode compositions for filenames!

2012-01-30 Thread Branko Čibej
On 31.01.2012 02:47, Bert Huijben wrote: > Last time we discussed this in depth (a few years ago), Windows didn't > perform the normalization you describe here. > Was this added later? (Any documentation pointers?) Ouch, you're right ... Windows API doesn't normalize the paths. -- Brane

RE: Let's discuss about unicode compositions for filenames!

2012-01-30 Thread Bert Huijben
> -----Original Message----- > From: Branko Čibej [mailto:br...@xbc.nu] > Sent: Monday, 30 January 2012 16:11 > To: dev@subversion.apache.org > Subject: Re: Let's discuss about unicode compositions for filenames! > > On 31.01.2012 00:14, Peter Samuelson wrote: >

Re: Let's discuss about unicode compositions for filenames!

2012-01-30 Thread Branko Čibej
On 31.01.2012 00:14, Peter Samuelson wrote: > [Stefan Sperling] >> It is indeed harder because we are passing paths verbatim to sqlite. >> I doubt having more than one form of a given path in wc.db is fun... > That's the implementation I would like to see, to be honest. Start > with the observatio

Re: Let's discuss about unicode compositions for filenames!

2012-01-30 Thread Peter Samuelson
[Stefan Sperling] > It is indeed harder because we are passing paths verbatim to sqlite. > I doubt having more than one form of a given path in wc.db is fun... That's the implementation I would like to see, to be honest. Start with the observation that we can treat Mac OS X NFD paths as a client
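Peter's idea is to treat Mac OS X's decomposed names as a client character encoding, and he notes that this encoding is lossy. The sketch below models the round trip with plain NFD on the way to disk and NFC on the way back. This is an assumption: HFS+ actually uses a variant of NFD. A name committed in NFC survives the round trip, but a name committed in decomposed form on another platform does not, and both names map to the same on-disk name.

```python
import unicodedata


def to_disk(repo_name):
    # Model the Mac OS X filesystem as an "encoding" that stores names decomposed.
    return unicodedata.normalize("NFD", repo_name)


def from_disk(disk_name):
    # Convert back for the repository, choosing NFC.
    return unicodedata.normalize("NFC", disk_name)


for repo_name in ["f\u00f6\u00f6", "fo\u0308o\u0308"]:
    round_trip = from_disk(to_disk(repo_name))
    print(ascii(repo_name), "round-trips:", round_trip == repo_name)
# 'f\xf6\xf6' round-trips: True
# 'fo\u0308o\u0308' round-trips: False
```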

Re: Let's discuss about unicode compositions for filenames!

2012-01-30 Thread Stefan Sperling
On Mon, Jan 30, 2012 at 09:34:03PM +0100, Branko Čibej wrote: > Sure, if you want to turn on such normalization, you pretty much have to > dump and reload the repository as well as upgrading all working copies > (again). Either that, or use form-independent comparison on the server, > which isn't s

Re: Let's discuss about unicode compositions for filenames!

2012-01-30 Thread Branko Čibej
On 30.01.2012 21:29, Stefan Sperling wrote: > On Mon, Jan 30, 2012 at 09:09:22PM +0100, Branko Čibej wrote: >> Are you seriously proposing that we /support/ such broken, hackish >> nonsense? How do you expect users to tell the difference between file >> names that look identical on the character le

Re: Let's discuss about unicode compositions for filenames!

2012-01-30 Thread Stefan Sperling
On Mon, Jan 30, 2012 at 09:09:22PM +0100, Branko Čibej wrote: > Are you seriously proposing that we /support/ such broken, hackish > nonsense? How do you expect users to tell the difference between file > names that look identical on the character level, but are not on the > code point level? > > S

Re: Let's discuss about unicode compositions for filenames!

2012-01-30 Thread Johan Corveleyn
On Mon, Jan 30, 2012 at 9:09 PM, Branko Čibej wrote: > On 30.01.2012 21:00, Johan Corveleyn wrote: >> On Mon, Jan 30, 2012 at 8:10 PM, Stefan Sperling wrote: >>> On Tue, Jan 31, 2012 at 01:42:21AM +0900, Hiroaki Nakamura wrote: 2012/1/30 Stefan Sperling : >> [ ... ] >> >>> And mixing various

Re: Let's discuss about unicode compositions for filenames!

2012-01-30 Thread Branko Čibej
On 30.01.2012 21:00, Johan Corveleyn wrote: > On Mon, Jan 30, 2012 at 8:10 PM, Stefan Sperling wrote: >> On Tue, Jan 31, 2012 at 01:42:21AM +0900, Hiroaki Nakamura wrote: >>> 2012/1/30 Stefan Sperling : > [ ... ] > >> And mixing various unicode forms works fine today if the filesystem >> used by t

Re: Let's discuss about unicode compositions for filenames!

2012-01-30 Thread Johan Corveleyn
On Mon, Jan 30, 2012 at 8:10 PM, Stefan Sperling wrote: > On Tue, Jan 31, 2012 at 01:42:21AM +0900, Hiroaki Nakamura wrote: >> 2012/1/30 Stefan Sperling : [ ... ] > And mixing various unicode forms works fine today if the filesystem > used by the client supports this. The use case Neels contrive

Re: Let's discuss about unicode compositions for filenames!

2012-01-30 Thread Stefan Sperling
On Tue, Jan 31, 2012 at 01:42:21AM +0900, Hiroaki Nakamura wrote: > 2012/1/30 Stefan Sperling : > > My friend is not willing to upgrade to a new client version yet, which > > is fine because all 1.x releases of Subversion clients are supposed > > to be compatible with all 1.y releases of Subversion

Re: Let's discuss about unicode compositions for filenames!

2012-01-30 Thread Neels J Hofmeyr
On 01/30/2012 02:00 PM, Markus Schaber wrote: > Maybe the best solution to this issue is a client-only solution, in a similar > way the case sensitivity problem is tackled. Spinning the client-only thought a bit: Imagine a repos with a un*x user adding a file called "föö". Now an OSX user checks
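Neels' scenario can be simulated without a real working copy. A Unix user commits "föö" in precomposed form, and after checkout on Mac OS X the filesystem reports the decomposed form. A status check that compares names byte for byte then sees one missing file and one unversioned file, while comparing NFC forms sees no change. Both functions below are hypothetical simplifications of what a client's status code would do.

```python
import unicodedata


def naive_status(versioned, on_disk):
    """Compare names byte for byte, as a client without normalization would."""
    missing = sorted(set(versioned) - set(on_disk))
    unversioned = sorted(set(on_disk) - set(versioned))
    return missing, unversioned


def normalized_status(versioned, on_disk):
    """Compare names after normalizing both sides to NFC."""

    def nfc_set(names):
        return {unicodedata.normalize("NFC", n) for n in names}

    v, d = nfc_set(versioned), nfc_set(on_disk)
    return sorted(v - d), sorted(d - v)


versioned = ["f\u00f6\u00f6"]  # added on Unix in precomposed form
on_disk = ["fo\u0308o\u0308"]  # what Mac OS X reports after checkout

print(naive_status(versioned, on_disk))       # one "missing", one "unversioned"
print(normalized_status(versioned, on_disk))  # ([], [])
```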

Re: Let's discuss about unicode compositions for filenames!

2012-01-30 Thread Julian Foad
Let me just note some of the main similarities and differences between this issue of Unicode compositions and the issue of case-sensitivity in file names. Differences:   * NFC and NFD look the same when displayed, and most users haven't heard of them and don't expect that a computer might treat

Re: Let's discuss about unicode compositions for filenames!

2012-01-30 Thread Peter Samuelson
[Stefan Sperling] > > We could also open the parent directory, read all the filenames > > within it, normalise them all, and then search the resulting > > list. This works, except if a name exists twice, once in NFC form > > and once in NFD form. We'd somehow have to solve the name collision > >
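The approach Stefan describes, and its failure mode, can be sketched on a directory listing. Every name is normalized and then compared. When both an NFC and an NFD spelling of the same name are present, the search returns more than one hit, and that is the collision that would need resolving. The function names are hypothetical. In a client the listing might come from `os.listdir()` on the parent directory, but here it is a literal list.

```python
import unicodedata
from collections import defaultdict


def find_in_listing(names, wanted):
    """Return every name in a directory listing that is equivalent to `wanted`."""
    key = unicodedata.normalize("NFC", wanted)
    return [n for n in names if unicodedata.normalize("NFC", n) == key]


def collisions(names):
    """Group names by their NFC form and keep only groups with 2+ members."""
    groups = defaultdict(list)
    for n in names:
        groups[unicodedata.normalize("NFC", n)].append(n)
    return {key: found for key, found in groups.items() if len(found) > 1}


listing = ["caf\u00e9", "cafe\u0301", "README"]
print(len(find_in_listing(listing, "caf\u00e9")))  # 2: the collision case
print(list(collisions(listing)) == ["caf\u00e9"])  # True
```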

Re: Let's discuss about unicode compositions for filenames!

2012-01-30 Thread Branko Čibej
On 30.01.2012 13:30, Stefan Sperling wrote: > On Sun, Jan 29, 2012 at 07:38:44PM +0900, Hiroaki Nakamura wrote: >> Hi folks! >> >> I read the note about unicode compositions for filenames >> http://svn.apache.org/repos/asf/subversion/trunk/notes/unicode-composition-for-filenames >> and would like t

Re: Let's discuss about unicode compositions for filenames!

2012-01-30 Thread Stefan Sperling
On Sun, Jan 29, 2012 at 07:38:44PM +0900, Hiroaki Nakamura wrote: > Hi folks! > > I read the note about unicode compositions for filenames > http://svn.apache.org/repos/asf/subversion/trunk/notes/unicode-composition-for-filenames > and would like to drive the discussion. Hi, I am very happy to h