Hello Jürgen,
thank you for the insight.
Best Regards
Hans-Peter
On 03.11.20 at 11:29, Dr. Jürgen Sauermann wrote:
Hi Hans-Peter,
see below.
Best Regards,
Jürgen
On 11/2/20 8:02 PM, Hans-Peter Sorge wrote:
Hello Jürgen,
after some study of UTF-8 I have 2 questions:
Two bytes in the range x8000 - xFFFF will be changed to x108000 -
x10FFFF. Correct?
No. The conversion works byte by byte. So if you have an invalid
sequence of bytes,
say ... 0x80 0x90 ... then the 0x80 (which is an invalid UTF start
byte) is translated
to 0x100080 and the process restarts at 0x90 (which is again invalid)...
So you should get ... 0x100080 0x100090 ...
Could it happen that the last byte in a file is an invalid UTF-8
char, leading to x1080__ - x10FF__? What would __ be then?
That can of course happen and in that case the UTF start character of
the sequence (!) and not the offending
character is mapped to some 0x1000XX and the process is repeated after
the start character. This way when an error
is detected the conversion resynchronises itself at the earliest
possible time.
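The byte-by-byte resynchronisation described above can be sketched as follows. This is only an illustration in Python, not the actual GNU APL code (which lives in the C++ sources); the names decode_lenient and PUA_B_BASE are made up for this sketch, and it assumes that every offending byte B is mapped to 0x100000 + B, as in the 0x80 -> 0x100080 example. Checks for overlong encodings and surrogates are left out.

```python
PUA_B_BASE = 0x100000  # assumed offset: start of Supplementary Private Use Area-B


def _sequence_length(start):
    """Expected UTF-8 sequence length for a start byte, or 0 if it cannot start one."""
    if start < 0x80:
        return 1
    if 0xC0 <= start < 0xE0:
        return 2
    if 0xE0 <= start < 0xF0:
        return 3
    if 0xF0 <= start < 0xF8:
        return 4
    return 0  # continuation byte (0x80..0xBF) or 0xF8..0xFF


def decode_lenient(data):
    """Decode bytes to a list of codepoints, mapping offending bytes into PUA-B."""
    out = []
    i = 0
    while i < len(data):
        start = data[i]
        n = _sequence_length(start)
        tail = data[i + 1:i + n]
        bad = (n == 0 or len(tail) != n - 1
               or any((b & 0xC0) != 0x80 for b in tail))
        if bad:
            # Map only the start byte and resume at the very next byte,
            # so the decoder resynchronises as early as possible.
            out.append(PUA_B_BASE + start)
            i += 1
            continue
        if n == 1:
            cp = start
        else:
            cp = start & (0xFF >> (n + 1))  # payload bits of the start byte
            for b in tail:
                cp = (cp << 6) | (b & 0x3F)
        out.append(cp)
        i += n
    return out
```

With this sketch, the input 0x41 0x80 0x90 0x42 decodes to 0x41 0x100080 0x100090 0x42, and a sequence that is cut off at the end of the input (for example 0xE2 0x8D) yields 0x1000E2 0x10008D.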
Best Regards
Hans-Peter
On 02.11.20 at 14:51, Dr. Jürgen Sauermann wrote:
Hi,
I have done some rework of the UTF8-to-Unicode conversion.
It now maps incorrect characters in a UTF8 encoding to corresponding
characters in the "Supplementary Private Use Area-B" (so that the
offending character becomes available at APL level and can be
recovered by subtracting 0x100000 from the codepoint) rather than
raising an error.
*SVN 1352*.
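As an illustration of the recovery step, here is a small Python sketch (not GNU APL code). It assumes the offset is 0x100000, the start of Supplementary Private Use Area-B, which matches the 0x80 -> 0x100080 example in this thread; recover_bytes and PUA_B_BASE are hypothetical names. Note that a genuine U+100080..U+1000FF character in the original text cannot be told apart from a mapped byte.

```python
PUA_B_BASE = 0x100000  # assumed offset: start of Supplementary Private Use Area-B


def recover_bytes(codepoints):
    """Rebuild the original byte string from leniently decoded codepoints."""
    out = bytearray()
    for cp in codepoints:
        if PUA_B_BASE + 0x80 <= cp <= PUA_B_BASE + 0xFF:
            # A mapped offending byte: subtract the offset to get it back.
            out.append(cp - PUA_B_BASE)
        else:
            # An ordinary character: re-encode it as UTF-8.
            out.extend(chr(cp).encode("utf-8"))
    return bytes(out)
```

For example, the codepoints 0x48 0x10008B 0x45 give back the bytes 0x48 0x8B 0x45.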
Best Regards,
Jürgen
On 11/2/20 10:05 AM, Hans-Peter Sorge wrote:
Hi Jürgen,
I agree. A *cat BIN_FILE* in a terminal session is of artistic
value only.
Best Regards
Hans-Peter
On 01.11.20 at 20:48, Dr. Jürgen Sauermann wrote:
Hi Hans-Peter,
the result of an ⍎'ed command is the output of that command, normally
one (nested) APL string for every line of command output.
This requires that the command output can be represented as APL
strings. This is the case for "normal" text output which must then
be either normal ASCII or else UTF8-encoded.
In theory one could have used raw bytes instead of UTF8 encoded
APL characters, but in most cases (and especially for interactive use
cases) the current solution is more convenient since the result can
be displayed directly (at least for text output).
Best Regards,
Jürgen
On 10/31/20 4:10 PM, Hans-Peter Sorge wrote:
EIJHHH - never thought about it - COOOL.
⍝IBM APL:
⍎ ')HOST ls'
VALUE ERROR
)HOST ls
^
⍎')HOST ls'
^
⍝ GNU-APL:
l ← ⍎ ')HOST ls -1'
And it works:-)) Makes life much easier.
f ← ⍎ ')HOST cat filename'
⍝ returns the file as nested vector
⍝ No intermediate file required.
⍝ Does not like binary data:
⍴⍎')HOST cat /OTH/APL/trunk/src/apl-Symbol.o'
Bad UTF8 string: 0x48 0x8B 0x45 0xD8 0x48 0x83 0xC0 0x30 0xEB
0x05 0xB8 at UCS_string.cc:120 .
doc/apl.info (incorrectly) reads:
snip
Like system commands, user-defined commands can only be executed
in immediate
execution mode and *not* from user-defined functions or *from ⍎.*
/snip
A last thought:
How to connect apl-command_line to host-stdin? Like
&⍞ ← 'test test test'
⍎ ')HOST &0 > data_entry_from_apl'
OK - Just a weekend. Asking too much:-)
Best Regards
Hans-Peter
On 31.10.20 at 14:39, Dr. Jürgen Sauermann wrote:
Hi,
On 10/30/20 6:14 PM, Kacper Gutowski wrote:
On Fri, Oct 30, 2020 at 02:34:35PM +0100, Dr. Jürgen Sauermann
wrote:
There is also ⎕FIO[26] which reads an entire file, but I am
not sure how it works with popen()ed streams.
It doesn't at all because it takes a path which additionally
needs to be a regular file because it's mmaped rather than read.
For the record, something like ⍎')HOST ...' might sometimes be
practical.
-k
That is actually a cool idea: run your pipe or program with
)HOST, forward the
output into some /tmp/xxx and read back /tmp/xxx. It also gives
you some more
control over using stdout, stderr, or both from the executed
program.
Jürgen