Re: ⎕FIO Buffer limit is 5000 Bytes

2020-11-03 Thread Dr. Jürgen Sauermann

  
  
Hi Hans-Peter,
  
  see below.
  
  Best Regards,
  Jürgen
  

On 11/2/20 8:02 PM, Hans-Peter Sorge wrote:


  
  Hello Jürgen,
  
  As far as my study of UTF-8 got me, I have 2 questions:
  
  Two bytes in range x8000 - x will be changed to x108000 -
  x10.  Correct?
  

No. The conversion works byte by byte. So if you have an invalid
sequence of bytes, say ... 0x80 0x90 ..., then the 0x80 (which is an
invalid UTF-8 start byte) is translated to 0x100080 and the process
restarts at 0x90 (which is again invalid)...

So you should get ... 0x100080 0x100090 ...

  Could it happen that the last byte in a file is an invalid UTF-8
  char, leading to x1080__ - x10FF__? What then would be __?


That can of course happen, and in that case the UTF-8 start character
of the sequence (!), not the offending character, is mapped to some
0x1000XX, and the process is repeated after the start character. This
way, when an error is detected, the conversion resynchronises itself
at the earliest possible time.
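The byte-by-byte rule described above can be modelled in a few lines. This is a minimal Python sketch for illustration only, not the UCS_string.cc code in GNU APL; the names `decode_lenient` and `seq_len` are hypothetical, and it assumes only the structure of a sequence is checked (start byte plus the right number of 10xxxxxx continuation bytes). Overlong forms, surrogates and out-of-range values are not rejected here, and the thread does not say whether GNU APL rejects them.

```python
PUA_B_BASE = 0x100000  # first code point of Supplementary Private Use Area-B


def seq_len(b: int) -> int:
    """Length of the UTF-8 sequence announced by start byte b, or 0 if b cannot start one."""
    if b < 0x80:
        return 1   # plain ASCII
    if b < 0xC0:
        return 0   # stray continuation byte 10xxxxxx
    if b < 0xE0:
        return 2   # 110xxxxx
    if b < 0xF0:
        return 3   # 1110xxxx
    if b < 0xF8:
        return 4   # 11110xxx
    return 0       # 11111xxx never starts a sequence


def decode_lenient(data: bytes) -> list:
    """Decode UTF-8 byte by byte, mapping bad start bytes to U+100000 + byte."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        n = seq_len(b)
        tail = data[i + 1:i + n]
        broken = (
            n == 0
            or len(tail) < n - 1                          # truncated, e.g. at end of file
            or any(not 0x80 <= t <= 0xBF for t in tail)   # a continuation byte is missing
        )
        if broken:
            # Map the start byte (not the offending byte) and resynchronise
            # at the very next byte.
            out.append(PUA_B_BASE + b)
            i += 1
            continue
        cp = b & (0x7F >> n) if n > 1 else b   # payload bits of the start byte
        for t in tail:
            cp = (cp << 6) | (t & 0x3F)        # append 6 payload bits per continuation byte
        out.append(cp)
        i += n
    return out
```

For example, `decode_lenient(b"\x80\x90")` yields `[0x100080, 0x100090]`, and a file ending in the truncated sequence `0xE2 0x82` yields `0x1000E2` followed by `0x100082`.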
 
  Best Regards
  Hans-Peter 
  
  
  
  On 02.11.20 at 14:51, Dr. Jürgen Sauermann wrote:
  
  

Hi,
  
  I have done some rework of the UTF-8-to-Unicode conversion.
  It now maps incorrect characters in a UTF-8 encoding to
  corresponding characters in the "Supplementary Private Use Area-B"
  (so that the offending character becomes available at APL level and
  can be recovered by subtracting 0x100000 from the codepoint) rather
  than raising an error.
  
  SVN 1352.
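Going the other way, the raw bytes can be restored from the decoded code points (in APL, ⎕UCS yields them). Since the thread's own example maps byte 0x80 to 0x100080, the original byte is the code point minus 0x100000. A minimal Python sketch under that assumption; `recover_bytes` is a hypothetical name, and only U+100080..U+1000FF are treated as placeholders because bytes below 0x80 are always valid UTF-8.

```python
PUA_B_BASE = 0x100000  # invalid byte b was stored as code point PUA_B_BASE + b


def recover_bytes(codepoints) -> bytes:
    """Turn decoded code points back into the original raw bytes."""
    out = bytearray()
    for cp in codepoints:
        if PUA_B_BASE + 0x80 <= cp <= PUA_B_BASE + 0xFF:
            out.append(cp - PUA_B_BASE)      # placeholder: restore the invalid byte
        else:
            out += chr(cp).encode("utf-8")   # ordinary character: re-encode it
    return bytes(out)
```

One caveat of this design: input that genuinely contains characters from U+100080..U+1000FF cannot be told apart from placeholders, which is acceptable because that range is private-use and rare in real text.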
  
  Best Regards,
  Jürgen
  
  

On 11/2/20 10:05 AM, Hans-Peter Sorge wrote:


  
  Hi Jürgen,
  
  I agree. A cat BIN_FILE in a terminal session is of
  artistic value only.
  
  Best Regards
  Hans-Peter
  
On 01.11.20 at 20:48, Dr. Jürgen Sauermann wrote:
  
  

Hi Hans-Peter,
  
  The result of an ⍎'ed command is the output of that command,
  normally one (nested) APL string for every line of command output.

  This requires that the command output can be represented as APL
  strings. This is the case for "normal" text output, which must then
  be either plain ASCII or else UTF-8-encoded.

  In theory one could have used raw bytes instead of UTF-8-encoded
  APL characters, but in most cases (and especially for interactive
  use cases) the current solution is more convenient, since the result
  can be displayed directly (at least for text output).
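To make this behaviour concrete, here is a rough Python analogue of ⍎')HOST command': a hypothetical helper `host_lines` that runs the command through a POSIX shell, returns one string per output line, and decodes strictly as UTF-8, so binary output fails much like the "Bad UTF8 string" error quoted in this thread. It is a sketch of the idea, not how GNU APL implements )HOST.

```python
import subprocess


def host_lines(command: str) -> list:
    """Run a shell command and return its stdout as one str per output line.

    Decoding is strict UTF-8, so binary output raises UnicodeDecodeError.
    """
    result = subprocess.run(command, shell=True, stdout=subprocess.PIPE, check=False)
    data = result.stdout
    if not data:
        return []                 # no output at all: no lines
    if data.endswith(b"\n"):
        data = data[:-1]          # the final newline does not start a new line
    return [line.decode("utf-8") for line in data.split(b"\n")]
```

Returning decoded strings is what makes the result directly printable; the price is that non-text output has no faithful representation unless invalid bytes are mapped somewhere, as the later rework does.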
  
  Best Regards,
  Jürgen
  
  

On 10/31/20 4:10 PM, Hans-Peter Sorge wrote:


  
  EIJHHH - never thought about it - COOOL.
  
  
  ⍝IBM APL:
   ⍎ ')HOST ls'
  VALUE ERROR
    )HOST ls
      ^
    ⍎')HOST ls'
   ^
  ⍝ GNU-APL:
        l ← ⍎ ')HOST ls -1'
  And it works :-)) Makes life much easier.

        f ← ⍎ ')HOST cat filename'
  ⍝ returns the file as nested vector
  ⍝ No intermediate file required.
  
  ⍝ Does not like binary data:
    ⍴⍎')HOST cat /OTH/APL/trunk/src/apl-Symbol.o'  
  Bad UTF8 string: 0x48 0x8B 0x45 0xD8 0x48 0x83 0xC0 0x30 0xEB 0x05 0xB8 at UCS_string.cc:120 . 
  
  
  
  doc/apl.info (incorrectly) reads:
  snip
  Like system commands, user-define commands can only be
  executed in immediate
  execution mode and not from user-defined functions
  or from ⍎.
  /snip
  
  
  A last thought:
  How could one connect the APL command line to the host's stdin? Something like:
  &⍞ ← 'test test test'
  ⍎ ')HOST &0  > data_entry_from_apl'
  
  OK - Just a weekend. Asking too much:-)
  
  Best Regards
  Hans-Peter
   

Re: ⎕FIO Buffer limit is 5000 Bytes

2020-11-03 Thread Hans-Peter Sorge

Hello Jürgen,

thank you for the insight.

Best Regards
Hans-Peter

On 03.11.20 at 11:29, Dr. Jürgen Sauermann wrote:

On 31.10.20 at 14:39, Dr. Jürgen Sauermann wrote:

Hi,

On 10/30/20 6:14 PM, Kacper Gutowski wrote:
On Fri, Oct 30, 2020 at 02:34:35PM +0100, Dr. Jürgen Sauermann wrote:
There is also ⎕FIO[26] which reads an entire file, but I am 
not sure how it works with popen()ed streams.


It doesn't work at all, because it takes a path, which additionally
needs to be a regular file because it is mmap()ed rather than read.
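The regular-file restriction follows from mapping: a pipe or device has no fixed size that could be mapped in one go. A minimal Python sketch of a whole-file read in that spirit, assuming POSIX semantics; `read_whole_file` is a hypothetical name and this is not the ⎕FIO[26] implementation.

```python
import mmap
import os
import stat


def read_whole_file(path: str) -> bytes:
    """Read an entire file by mmap()ing it; only regular files are accepted."""
    st = os.stat(path)
    if not stat.S_ISREG(st.st_mode):
        # Pipes, FIFOs and devices have no fixed size that could be mapped.
        raise ValueError(f"{path!r} is not a regular file")
    if st.st_size == 0:
        return b""  # mmap() refuses zero-length mappings
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        return m[:]  # copy the mapped pages into a bytes object
```

A popen()ed stream is exactly the kind of input this rejects, which is why the output has to land in a real file first.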


For the record, something like ⍎')HOST ...' might sometimes be 
practical.


-k

That is actually a cool idea: run your pipe or program with )HOST,
redirect its output into some /tmp/xxx, and read /tmp/xxx back. It
also gives you more control over using stdout, stderr, or both from
the executed program.


Jürgen

Fwd: Unexpected error message when allocating oversized array

2020-11-03 Thread Elias Mårtenson
Instead of the expected WS FULL, I got the following error when trying to
sum a large array:

      +/ ,9 (⌊n÷9) ⍴ ⍳n←200
Value_P::Value_P(ShapeItem len, const char * loc) failed at Value_P.icc:107
(caller: Bif_F12_INDEX_OF.cc:44)
 what: std::bad_alloc
 initial sbrk(): 0xeef000
 current sbrk(): 0xeef000
 alloc_size: 0x6fc23ac000 (480000000000)
 used memory:0x14548 (83272)
WS FULL+
  +/,9(⌊n÷9)⍴⍳n←200
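To put the numbers in the report in perspective, the hexadecimal values can be decoded directly. A trivial Python check, with the values copied from the output above:

```python
alloc_size = 0x6FC23AC000   # "alloc_size" from the report
used_memory = 0x14548       # "used memory" from the report

print(alloc_size)                      # 480000000000 (bytes)
print(round(alloc_size / 2**30, 2))    # 447.03 (GiB)
print(used_memory)                     # 83272
```

A single request for roughly 447 GiB cannot succeed on ordinary hardware, so a plain WS FULL is the outcome one would expect here.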

Regards,
Elias