On 2023-08-15 08:42, Neville Smythe via use-livecode wrote:
So if I understand Mark correctly, while one can create utf-8 encoded
filenames directly in a terminal
session, LC Server internally accesses Apache environment variables to
encode/decode the filename
before opening a file rather than directly using the shell. Presumably
this has something to do with
the engine being a server app having to respect the server environment.
So what is actually happening here is that there is a notion of a
'SysString' in the engine. A 'SysString' is a string represented as a
sequence of bytes in whatever encoding the host platform understands in
its APIs. The engine converts its internal string representation to a
sys string whenever it accesses a system API - e.g. for opening files.
In the case of Linux, the encoding such 'sys strings' need to use
depends on the environment - the encoding *could* be anything, and thus
the engine uses the UNIX 'iconv' library to convert from its internal
representation to the encoded bytes needed. I think this is what is
causing the failure of the file APIs - iconv refuses to convert a
string with non-ASCII characters to the encoding of the 'default' 'C'
locale because it can't (there is no mapping from, say, e-acute to
ASCII).
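To see the kind of refusal I mean, here is a minimal sketch using the
iconv command-line tool. This is an assumption-laden stand-in: the
engine calls the iconv library rather than the tool, and I am using
UTF-8 in place of the engine's internal representation. The function
name can_convert_to_ascii is made up for the illustration.

```shell
#!/bin/sh
# Illustration only: the iconv command-line tool stands in for the iconv
# library, and UTF-8 stands in for the engine's internal representation.

# Succeeds (exit status 0) only if the UTF-8 text in $1 maps to plain ASCII.
can_convert_to_ascii() {
    printf '%s' "$1" | iconv -f UTF-8 -t ASCII >/dev/null 2>&1
}

plain_name='cafe.txt'
# Same name, but ending in e-acute (UTF-8 bytes 0xC3 0xA9) before the suffix.
accented_name=$(printf 'caf\303\251.txt')

for name in "$plain_name" "$accented_name"; do
    if can_convert_to_ascii "$name"; then
        echo "convertible to ASCII"
    else
        echo "not convertible to ASCII"
    fi
done
```

The second name is the case that trips up 'open file': the conversion
step fails before the UNIX open call is ever made.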
I should point out that textEncode/Decode do not use system APIs - the
conversions between UTF* forms and 'native' are all built into the
engine - so that part is fine. It's the low-level connection between
commands like 'open file' and the call to the UNIX open API which is
throwing an error on file name conversion.
On Dreamhost, as far as I can determine, the LANG and LC_ALL variables
are *not* set (though WordPress is running and it adds support for a
swathe of languages, so surely has support for non-ASCII filenames?)
The site is on shared hosting, so I do not have permissions to change
the Apache conf files. I tried adding the SetEnv commands in the
.htaccess file but that didn’t work, although I could well be doing it
wrong; I am fumbling around in the dark here.
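One way to check what a CGI program actually receives is a small
diagnostic script. The following is only a sketch: it assumes the host
will run a plain sh script as a CGI, and the function name
report_locale_vars is made up for the illustration.

```shell
#!/bin/sh
# Hypothetical diagnostic CGI script: lists the locale variables it can see.
# Assumes the host is willing to execute a plain sh script as a CGI.

report_locale_vars() {
    for var in LANG LC_ALL LC_CTYPE; do
        # ${VAR+set} expands to "set" only when VAR exists, even if empty.
        eval "is_set=\${$var+set}"
        if [ -n "$is_set" ]; then
            eval "value=\$$var"
            printf '%s=%s\n' "$var" "$value"
        else
            printf '%s is not set\n' "$var"
        fi
    done
}

# CGI response: header, blank line, then the plain-text body.
printf 'Content-Type: text/plain\r\n\r\n'
report_locale_vars
```

Requesting the script from a browser before and after a change to
.htaccess shows whether the SetEnv lines reach CGI programs at all.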
The only thing I've found so far is SetEnv, which does look like it can
only be configured in the host config for a domain, which is slightly
irksome. However, there is a way to launch the CGI engine with any vars
needed.
I'm not sure how Dreamhost sets things up - indeed it might be worth
asking their support if there is a way to configure environment
variables which are passed through to CGI executables.
If there isn't then it can be done with a launcher script:
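```
#!/bin/sh
# Set the locale before the engine starts so filenames are treated as UTF-8.
export LC_ALL="en_US.UTF8"
export LANG="en_US.UTF8"
# Replace livecode-server with the full path to the executable on the host.
# "$@" passes on any arguments the web server supplies.
exec livecode-server "$@"
```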
This would be a text file which has been made executable, and it needs
to be configured as the executable which is launched when a LiveCode
Server script is run (livecode-server in the above needs to be replaced
with the location of the livecode-server executable in the hosting
setup).
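Before wiring a launcher into the hosting setup, it can be
sanity-checked from a terminal. The sketch below is an assumption on my
part rather than anything Dreamhost-specific: it writes a throwaway
copy of the launcher that runs /usr/bin/env in place of
livecode-server, then starts it with an emptied environment (roughly
imitating a bare CGI environment) to show which variables come out the
other side. The name make_test_launcher is made up.

```shell
#!/bin/sh
# Sketch: check that a launcher of this shape exports the locale variables.
# /usr/bin/env is used in place of livecode-server purely for the test.

make_test_launcher() {
    launcher=$1
    cat > "$launcher" <<'EOF'
#!/bin/sh
export LC_ALL="en_US.UTF8"
export LANG="en_US.UTF8"
exec /usr/bin/env
EOF
    # The launcher must be executable, just like the real one.
    chmod +x "$launcher"
}

tmpdir=$(mktemp -d)
make_test_launcher "$tmpdir/launcher.sh"

# env -i starts the launcher with an empty environment.
env -i "$tmpdir/launcher.sh" | grep -E '^(LC_ALL|LANG)='

rm -rf "$tmpdir"
```

If both variables are listed, the same pattern with the real
livecode-server path should hand them on to the engine.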
I know others here use (or have used) Dreamhost in the past - so they
might know more about how the above could be configured (although,
again, Dreamhost support can probably help).
Unless there is some way to fix the configuration, it would seem that
not only will opening files
fail but the detailed files (the long files) command will also fail if
non-ascii characters are
encountered since it uses textEncode. I presume that using shell
commands could be used as a workaround
for accessing the filesystem, as long as LC doesn’t do an internal
textEncode as it passes the
variables to the shell!
However it also means one cannot use textDecode/Encode at all, not just
for the filenames but also
content; and that could be a bummer. I haven’t encountered this so far
because to this point I have
encoded content before uploading binary files to the server, but I can
envision situations where I
would want to encode or decode server-side.
The problem isn't with textEncode/Decode - they work fine as mentioned
above - it's just that the engine doesn't have the necessary information
(due to the lack of env vars) to know how to interpret/create the
filenames the system APIs need.
I’m puzzled that this problem hasn’t been raised before. Surely the
vast majority of website host
providers use Linux servers, and the Dreamhost configuration for shared
hosting is most likely
standard. So has no-one in Europe (or Asia..) using LC Server wanted to
create native-language
filenames? I think LC Server is a magnificent tool, but perhaps it is
not as widely used as it
deserves! Or: they all found the fix and haven’t told us.
This is almost certainly a server setup/config thing - I guess Apache
(by default) runs CGIs in the most 'raw' environment possible.
The observation about WordPress is interesting - certainly before PHP
was 'unicodified', the encoding of filenames was up to the script -
i.e. you had to encode/decode filenames appropriately yourself, and I
guess UTF-8 was just assumed. With PHP7 I believe it handles Unicode
transparently, a bit like LC does, so I'll see if I can see what PHP7+
uses to determine the system encoding. Indeed, it might do no harm at
all to just assume UTF-8 encoding for Linux in the engine if the locale
vars are not set (which appears to be the case here), which would
resolve the problem transparently.
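To make the fallback idea concrete, here is a rough sketch in shell.
It is not engine code, and the function name effective_codeset is made
up. It follows the usual POSIX precedence (LC_ALL, then LC_CTYPE, then
LANG), pulls the codeset out of a name like en_US.UTF-8, and assumes
UTF-8 only when none of the variables is set. Treating C/POSIX as
plain ASCII is a simplification.

```shell
#!/bin/sh
# Sketch of the proposed fallback, not engine code.

effective_codeset() {
    # POSIX precedence: LC_ALL, then LC_CTYPE, then LANG.
    # An empty value is treated the same as an unset one.
    locale_name=${LC_ALL:-${LC_CTYPE:-${LANG:-}}}
    case $locale_name in
        '')
            # Nothing set at all: the proposed default.
            echo 'UTF-8' ;;
        C|POSIX)
            # Simplification: treat the C/POSIX locale as plain ASCII.
            echo 'ASCII' ;;
        *.*)
            # language_TERRITORY.codeset[@modifier]: keep the codeset only.
            codeset=${locale_name#*.}
            echo "${codeset%%@*}" ;;
        *)
            # A locale name with no explicit codeset part.
            echo 'unknown' ;;
    esac
}

echo "Effective codeset: $(effective_codeset)"
```

With no locale variables in the environment, as appears to be the case
for CGIs on this host, the function reports UTF-8 instead of falling
through to the 'C' locale.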
Warmest Regards,
Mark.
--
Mark Waddingham ~ m...@livecode.com ~ http://www.livecode.com/
LiveCode: Build Amazing Things