On 2023-08-15 08:42, Neville Smythe via use-livecode wrote:
So if I understand Mark correctly, while one can create utf-8 encoded filenames directly in a terminal session, LC Server internally accesses Apache environment variables to encode/decode the filename before opening a file rather than directly using the shell. Presumably this has something to do with the engine being a server app having to respect the server environment.

So what is actually happening here is that there is a notion of a 'SysString' in the engine. A 'SysString' is a string represented as a sequence of bytes in whatever encoding the host platform understands in its APIs. The engine converts its internal string representation to a sys string whenever it accesses a system API - e.g. for opening files.
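
To make the 'sequence of bytes in whatever encoding' point concrete, here is a small shell sketch. It is not the engine's code, just an illustration that the same character turns into different bytes depending on the target encoding. It assumes the 'iconv' and 'od' command line tools are installed; the helper name 'bytes_in' is made up for the example.

```shell
#!/bin/sh
# Hypothetical helper: print the hex bytes of e-acute in a given encoding.
# The input is written as UTF-8 (bytes 0xC3 0xA9) using printf octal escapes.
bytes_in() {
    printf '\303\251' | iconv -f UTF-8 -t "$1" | od -An -tx1 | tr -d ' \n'
}

utf8_bytes=$(bytes_in UTF-8)
latin1_bytes=$(bytes_in ISO-8859-1)

echo "UTF-8:      $utf8_bytes"
echo "ISO-8859-1: $latin1_bytes"
```

The two lines should differ (two bytes for UTF-8, one for ISO-8859-1), which is why the engine has to know the host's encoding before it can hand a filename to a system API.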

In the case of Linux, what encoding such 'sys strings' need to use depends on the environment - the encoding *could* be anything, and thus the engine uses the UNIX 'iconv' library to convert from its internal representation to the encoded bytes needed. I think this is what is causing the file APIs to fail - with no locale set, the target is the default 'C' locale, and iconv refuses to convert a string containing non-ASCII characters to it because it can't (there is no mapping from, say, e-acute to ASCII).
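
The kind of failure I mean can be reproduced from a shell with the 'iconv' command line tool. This is only a sketch of the behaviour, assuming glibc's iconv (other implementations may substitute a placeholder character instead of failing); it is not how the engine itself calls the library.

```shell
#!/bin/sh
# Try to convert "cafe" with a UTF-8 e-acute (bytes 0xC3 0xA9) to plain ASCII.
# There is no ASCII mapping for e-acute, so the conversion is expected to fail.
if printf 'caf\303\251' | iconv -f UTF-8 -t ASCII >/dev/null 2>&1; then
    ascii_result="converted"
else
    ascii_result="failed"
fi

# The same bytes go through untouched when the target is UTF-8.
if printf 'caf\303\251' | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
    utf8_result="converted"
else
    utf8_result="failed"
fi

echo "to ASCII: $ascii_result"
echo "to UTF-8: $utf8_result"
```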

I should point out that textEncode/Decode do not use system APIs - the conversions between UTF* forms and 'native' are all built into the engine - so that part is fine - it's the low-level connection between commands like 'open file' and calling the UNIX open API which is throwing an error on file name conversion.

On Dreamhost, as far as I can determine, the LANG and LC_ALL variables are *not* set (though WordPress is running and it adds support for a swathe of languages, so surely has support for non-ascii filenames?). The site is on shared hosting, so I do not have permissions to change the Apache conf files. I tried adding the SetEnv commands in the .htaccess file but that didn’t work, although I could well be doing it wrong; I am fumbling around in the dark here.

The only thing I've found so far is SetEnv, which does look like it can only be configured in the host config for a domain, which is slightly irksome. However, there is a way to launch the CGI engine with any vars needed.

I'm not sure how Dreamhost sets things up - indeed it might be worth asking their support if there is a way to configure environment variables which are passed through to CGI executables.

If there isn't then it can be done with a launcher script:

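```
#!/bin/sh
# Set a UTF-8 locale so the engine knows how to encode filenames.
export LC_ALL="en_US.UTF-8"
export LANG="en_US.UTF-8"
# Replace this shell with the engine, passing along any arguments.
exec livecode-server "$@"
```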

This would be a text file which has been made executable, and it needs to be configured as the program which is run when a LiveCode Server script is requested ('livecode-server' in the above needs to be replaced with the location of the livecode-server executable in the hosting setup).
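
One way to convince yourself that a launcher of this shape really does hand the variables on is to try it with a stand-in for the engine. The sketch below swaps livecode-server for /usr/bin/env (which just prints its environment) and starts from an empty environment, roughly as a bare CGI might. The temp directory, the file name 'launcher.sh' and the assumption that /usr/bin/env exists on the host are all mine.

```shell
#!/bin/sh
# Write a stand-in launcher into a temp dir; /usr/bin/env replaces livecode-server.
workdir=$(mktemp -d)
launcher="$workdir/launcher.sh"
cat > "$launcher" <<'EOF'
#!/bin/sh
export LC_ALL="en_US.UTF-8"
export LANG="en_US.UTF-8"
exec /usr/bin/env "$@"
EOF
chmod +x "$launcher"

# Run the launcher with an empty environment and capture what the child sees.
child_env=$(env -i "$launcher")
echo "$child_env"

rm -rf "$workdir"
```

If LC_ALL and LANG show up in the output, the same pattern should pass them to the real engine once the last line of the launcher points at it.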

I know others here use (or have used) Dreamhost in the past - so they might know more about how the above could be configured (although, again, Dreamhost support can probably help).
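
Before contacting support it might be worth checking what a CGI process on the host actually receives. A minimal sketch of a diagnostic CGI written as a shell script follows - it assumes the host will run an executable shell script as a CGI, and the function name 'report_locale_env' is just something I made up for the example.

```shell
#!/bin/sh
# Hypothetical diagnostic CGI: report the locale variables this process received.
report_locale_env() {
    for name in LANG LC_ALL LC_CTYPE; do
        # ${VAR+set} expands to "set" only when VAR exists, even if it is empty.
        eval "is_set=\${$name+set}"
        if [ "$is_set" = "set" ]; then
            eval "value=\$$name"
            printf '%s=%s\n' "$name" "$value"
        else
            printf '%s is not set\n' "$name"
        fi
    done
}

# Minimal CGI response: a header, a blank line, then the report.
printf 'Content-Type: text/plain\r\n\r\n'
report_locale_env
```

Requesting that script through the web server, rather than running it in a terminal session, shows the environment Apache passes on, which is the one the engine sees.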


Unless there is some way to fix the configuration, it would seem that not only will opening files fail but the detailed files (the long files) command will also fail if non-ascii characters are encountered since it uses textEncode. I presume that using shell commands could be used as a workaround for accessing the filesystem, as long as LC doesn’t do an internal textEncode as it passes the variables to the shell!
However it also means one cannot use textDecode/Encode at all, not just for the filenames but also content; and that could be a bummer. I haven’t encountered this so far because to this point I have encoded content before uploading binary files to the server, but I can envision situations where I would want to encode or decode server-side.

The problem isn't with textEncode/Decode - they work fine as mentioned above - it's just the engine doesn't have the necessary information (due to lack of env vars) to know how to interpret/create the filenames the system APIs need.

I’m puzzled that this problem hasn’t been raised before. Surely the vast majority of website host providers use Linux servers, and the Dreamhost configuration for shared hosting is most likely standard. So has no-one in Europe (or Asia..) using LC Server wanted to create native-language filenames? I think LC Server is a magnificent tool, but perhaps it is not as widely used as it deserves! Or: they all found the fix and haven’t told us.

This is almost certainly a server setup/config thing - I guess Apache runs CGIs in the most 'raw' environment possible by default.

The observation about WordPress is interesting - certainly before PHP was 'unicodified', the encoding of filenames was up to the script - i.e. you had to encode/decode filenames appropriately yourself and I guess utf-8 was just assumed. With PHP7 I believe it handles unicode transparently a bit like LC does, so I'll see if I can see what PHP7+ uses to determine the system encoding. Indeed, it might do no harm at all to just assume UTF-8 encoding for Linux in the engine if the locale vars are not set (which appears to be the case here) which would resolve the problem transparently.
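
As a rough illustration of that fallback idea (a shell sketch only, not what the engine does today), the logic would be something like: look at the locale variables in the usual POSIX order, and only if none of them is set assume UTF-8. The function name 'pick_filename_encoding' is hypothetical, and the non-fallback branch assumes the 'locale' utility is installed.

```shell
#!/bin/sh
# Hypothetical helper: decide which encoding to use for filenames.
pick_filename_encoding() {
    # POSIX precedence for the character type category: LC_ALL, LC_CTYPE, LANG.
    locale_name="${LC_ALL:-${LC_CTYPE:-${LANG:-}}}"
    if [ -z "$locale_name" ]; then
        # Nothing set at all: assume UTF-8 (the fallback suggested above).
        echo "UTF-8"
    else
        # Otherwise ask the system which character map the locale implies.
        locale charmap
    fi
}

echo "filename encoding: $(pick_filename_encoding)"
```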

Warmest Regards,

Mark.

--
Mark Waddingham ~ m...@livecode.com ~ http://www.livecode.com/
LiveCode: Build Amazing Things
