Thanks Mark and Matthias

I think it is clear the problem is not related to variant forms: if I replace 
[e-acute] with any other non-ASCII character, such as a Kanji character or an 
emoji, I get the same “can’t open that file” error. And the weird decoding of 
[e-acute] to [E-grave] would be explained if textDecode is failing in LC Server.

So if I understand Mark correctly, while one can create UTF-8 encoded filenames 
directly in a terminal session, LC Server consults the environment variables it 
inherits from Apache to decide how to encode/decode a filename before opening 
the file, rather than using those of a terminal shell. Presumably this has 
something to do with the engine being a server app having to respect the 
server environment.
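
To make that failure mode concrete, here is a minimal sketch in Python (not 
LiveCode, and not the engine's actual code). It assumes that the unset "C" 
locale behaves like plain ASCII, and it uses a hypothetical helper, 
encode_filename, to show why the same filename encodes fine under UTF-8 but 
cannot be encoded at all under ASCII.

```python
def encode_filename(name, codeset):
    """Encode a filename to bytes with the given codeset; None if impossible."""
    try:
        return name.encode(codeset)
    except UnicodeEncodeError:
        # The codeset has no representation for at least one character.
        return None


# "caf\u00e9.txt" is "cafe.txt" with a precomposed e-acute.
name = "caf\u00e9.txt"

# What a UTF-8 locale (e.g. en_US.UTF-8) would allow.
print(encode_filename(name, "utf-8"))

# What a bare "C" locale would allow, assuming it is treated as ASCII.
print(encode_filename(name, "ascii"))
```

If the engine does something analogous, any filename containing a non-ASCII 
character would be rejected before the filesystem is ever consulted, which 
matches what I am seeing.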

On Dreamhost, as far as I can determine, the LANG and LC_ALL variables are 
*not* set (though WordPress is running and it adds support for a swathe of 
languages, so surely has support for non-ASCII filenames?). The site is on 
shared hosting, so I do not have permission to change the Apache conf files. I 
tried adding the SetEnv commands to the .htaccess file but that didn’t work. I 
could well be doing it wrong, though; I am fumbling around in the dark here.
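
One way to check whether the SetEnv lines are actually reaching CGI processes 
would be a tiny diagnostic script placed next to the .htaccess file. The sketch 
below is in Python and assumes the host allows Python CGI scripts, which I have 
not verified; the file name env-check.cgi and the function locale_report are 
made up for illustration. It only reports what the process environment 
contains and makes no attempt to interpret it.

```python
#!/usr/bin/env python3
"""Hypothetical env-check.cgi: show the locale variables a CGI process sees."""
import os


def locale_report(environ):
    """Return one NAME=value line for each locale variable of interest."""
    lines = []
    for name in ("LANG", "LC_ALL"):
        lines.append(f"{name}={environ.get(name, '(not set)')}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Minimal CGI response: header, blank line, then the report.
    print("Content-Type: text/plain; charset=utf-8")
    print()
    print(locale_report(os.environ))
```

If both lines come back as "(not set)" when the script is requested through 
the web server, the SetEnv directives are not taking effect for CGIs, whatever 
the reason turns out to be.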

Unless there is some way to fix the configuration, it would seem that not only 
will opening files fail, but the detailed files (the long files) command will 
also fail if non-ASCII characters are encountered, since it uses textEncode. I 
presume that shell commands could be used as a workaround for accessing the 
filesystem, as long as LC doesn’t do an internal textEncode as it passes the 
variables to the shell!

However, it would also seem to mean one cannot use textDecode/textEncode at 
all, not just for filenames but also for content, and that could be a bummer. I 
haven’t encountered this so far because up to this point I have encoded content 
before uploading binary files to the server, but I can envision situations 
where I would want to encode or decode server-side.

I’m puzzled that this problem hasn’t been raised before. Surely the vast 
majority of web hosting providers use Linux servers, and the Dreamhost 
configuration for shared hosting is most likely standard. So has no-one in 
Europe (or Asia) using LC Server wanted to create native-language filenames? 
I think LC Server is a magnificent tool, but perhaps it is not as widely used 
as it deserves! Or they all found the fix and haven’t told us.

> So, when you run lc-server from a terminal session directly, it's almost 
> certainly the case that the LC_ALL and LANG environment variables are 
> set to en_US.UTF-8 (or some other language code DOT UTF-8 - it is the 
> UTF-8 which is the important bit).
> 
> On Linux, a C API nl_langinfo() is used to fetch the encoding to use 
> when talking to the system APIs (e.g. filesystem APIs) - this (I 
> believe) derives its information from LANG/LC_ALL.
> 
> If the latter *are not set* then it will likely default to the 'C' 
> locale which has no interpretation of any non-ascii chars, and thus 
> attempts to encode/decode utf-8 encoded filenames will fail.
> 
> My theory is that these variables are not set in the configuration for 
> running CGIs in Apache (or whatever web server is being used in this 
> instance).
> 
> Digging around it looks like Apache (at least) has a `SetEnv` directive 
> which would allow these environment variables to be set, e.g.
> 
>   SetEnv LC_ALL en_US.UTF-8
>   SetEnv LANG en_US.UTF-8
> 
> Although I'm not 100% sure where such things go, perhaps someone more 
> conversant with apache config could chime in to suggest.
Neville Smythe




