Joel Rees said:
> Now, what would you do with this?
>
> ジョエル
>
> Why not decompose it to the following?
>
> ジョエル
Because it is not what Unicode normalization is.

> I know what the Unicode rules say, but my boss says, if I'm going to
> play with file names, he wants it done his way.

And now you suggest that the idea of enforcing a local filename policy is
a bad idea because the local filename policy might not be sane. OK.

First, let's decouple the NFD suggestion from local policy. Again, no
problems with NFD here. I don't really see any sense in a local policy
that demands this conversion, but if your boss needs it, it is not my
business. I don't get why you mention it, though: it is a completely
unrelated problem.

> You have to keep rules about making file names for internal use
> separate from rules about storing filenames received, or the internal
> system loses its meaning.

And now, do you speak of normalization or of local policy?

At any rate, any incoming file has a name, which is encoded somehow. It
may be encoded in utf-16le, for example. Now, either you store a filename
that you can't read without using iconv or a similar tool, or you convert
the name to your locale. If your locale happens to use utf-8, you still
have to convert one byte sequence to another. The conversion I proposed
would convert destructively, but it maintains Unicode equivalence, so
aside from a subtle technical detail (the choice of canonical form) the
set of glyphs that make up the filename would remain exactly the same.
This is not even a policy, just a consistent representation.

-- 
Dmitrij D. Czarkoff
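
As a rough illustration of the conversion discussed above, here is a minimal Python sketch. It decodes an incoming name from utf-16le, applies a canonical normalization form, and re-encodes the result as utf-8. The function name `to_local_name` and the default of NFC are assumptions made for this example only; the thread does not prescribe either. The sample name is written with escapes so that the precomposed and decomposed spellings stay distinguishable.

```python
import unicodedata


def to_local_name(raw: bytes, source_encoding: str = "utf-16-le",
                  form: str = "NFC") -> bytes:
    """Decode an incoming filename, normalize it, re-encode it as UTF-8.

    Hypothetical helper: the name and the NFC default are assumptions.
    """
    text = raw.decode(source_encoding)
    # Canonical normalization changes code points, not the glyphs shown.
    normalized = unicodedata.normalize(form, text)
    return normalized.encode("utf-8")


# The same katakana name spelled two canonically equivalent ways:
precomposed = "\u30b8\u30e7\u30a8\u30eb"       # U+30B8 is a single code point
decomposed = "\u30b7\u3099\u30e7\u30a8\u30eb"  # U+30B7 + combining U+3099

name_a = to_local_name(precomposed.encode("utf-16-le"))
name_b = to_local_name(decomposed.encode("utf-16-le"))
name_nfd = to_local_name(precomposed.encode("utf-16-le"), form="NFD")
```

Under these assumptions both incoming spellings end up as the same utf-8 byte sequence on disk, and switching `form` to "NFD" only changes which of the two equivalent sequences is stored.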