Re: [Bug-apl] ⎕RE merged

2017-10-10 Thread Juergen Sauermann

  
  
Hi Elias,
  
  thanks, fixed in SVN 1013.
  
  /// Jürgen
  

On 10/09/2017 04:41 AM, Elias Mårtenson
  wrote:


  Thank you.


There are some errors when compiling on my Arch system:



g++ -DHAVE_CONFIG_H -I. -I..    -Wall -I sql -Wold-style-cast -Werror -I/usr/include -I/usr/include  -rdynamic -g -O2 -MT apl-Quad_RE.o -MD -MP -MF .deps/apl-Quad_RE.Tpo -c -o apl-Quad_RE.o `test -f 'Quad_RE.cc' || echo './'`Quad_RE.cc
Quad_RE.cc: In static member function ‘static Value_P Quad_RE::partition_result(const Regexp&, const Quad_RE::Flags&, const UCS_string&)’:
Quad_RE.cc:211:42: error: comparison between signed and unsigned integer expressions [-Werror=sign-compare]
    for (ShapeItem match_id = 1; B_offset < len; match_id += match_id_inc)
                                         ~^
cc1plus: all warnings being treated as errors
make[3]: *** [Makefile:2725: apl-Quad_RE.o] Error 1
make[3]: Leaving directory '/home/emartenson/src/apl/src'
make[2]: *** [Makefile:: all-recursive] Error 1
make[2]: Leaving directory '/home/emartenson/src/apl/src'
make[1]: *** [Makefile:514: all-recursive] Error 1
make[1]: Leaving directory '/home/emartenson/src/apl'
make: *** [Makefile:401: all] Error 2



Regards,
Elias
  
  
On 9 October 2017 at 00:47, Juergen Sauermann wrote:

Hi,

I have merged Elias' ⎕RE implementation into GNU APL.
Thanks, Elias, for contributing it. See 'info apl' for a description
and src/testcases/Quad_RE.tc for examples of how to use ⎕RE.

SVN 1012.

Enjoy,
/// Jürgen

   
  


  


  




Re: [Bug-apl] ⎕RE merged

2017-10-10 Thread Juergen Sauermann

  
  
Hi Elias,
  
  thanks, fixed in SVN 1013.
  
  /// Jürgen


On 10/09/2017 05:12 AM, Elias Mårtenson
  wrote:


I found another bug. ↓ is used to indicate that string indexes are
requested, but the error message when multiple output types are
requested is wrong:

      "foo" ⎕RE["⊂↓"] "bar"
DOMAIN ERROR+
      'foo' ⎕RE['⊂↓']'bar'
        ^             ^
      )more
Multiple ⎕RE output flags: '⊂↓'. Output flags are: ⊂⍳/

Note the ⍳ in the error message instead of ↓.


Regards,
Elias
  
  
On 9 October 2017 at 10:45, Elias
  Mårtenson 
  wrote:
  
I fixed the problem by adding a static_cast(len),
  but I found another issue: The testcases file is missing.
  
  
  Regards,
  Elias
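
For readers unfamiliar with this class of warning, here is a minimal, self-contained C++ sketch of the kind of cast described above. It is not the actual change made in Quad_RE.cc: the function name `steps_below` and its parameters are hypothetical, and `ShapeItem` is only assumed here to be a signed 64-bit integer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Assumption for illustration only: a signed 64-bit index type.
using ShapeItem = std::int64_t;

// Hypothetical helper: counts how many steps of size `step` start below
// `len`. The unsigned length is converted once, so the loop condition
// compares two signed values and does not trigger -Wsign-compare.
ShapeItem steps_below(std::size_t len, ShapeItem step)
{
    assert(step > 0);
    // The cast is only safe if len fits into the signed type.
    assert(len <= static_cast<std::size_t>(std::numeric_limits<ShapeItem>::max()));
    const ShapeItem signed_len = static_cast<ShapeItem>(len);

    ShapeItem count = 0;
    for (ShapeItem offset = 0; offset < signed_len; offset += step)
        ++count;
    return count;
}
```

Converting the unsigned value once, outside the loop condition, keeps the range check in one place; casting the signed side to unsigned instead would silently misbehave for negative offsets.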


  


Re: [Bug-apl] ⎕RE merged

2017-10-10 Thread Juergen Sauermann

  
  
Hi Elias,
  
  thanks, fixed in SVN 1013.
  
  /// Jürgen


On 10/09/2017 10:11 AM, Elias Mårtenson
  wrote:


  One more bug:


The call to pcre2_compile_32 should be changed from:

     code = pcre2_compile_32(pattern_ucs, pattern.size(),
                             PCRE2_NO_UTF_CHECK | flags, &error_code,
                             &error_offset, 0);

To:

     code = pcre2_compile_32(pattern_ucs, pattern.size(),
                             PCRE2_UTF | PCRE2_UCP | flags, &error_code,
                             &error_offset, 0);

Without PCRE2_UTF, proper Unicode semantics will not be applied (such as
properly handling case matching for non-ASCII characters).

PCRE2_UCP is a little less obvious. I think it would make sense to enable
it, since we care more about correctness than performance. Here's what the
documentation has to say about it:

  “This option changes the way PCRE2 processes \B, \b, \D, \d, \S, \s, \W,
  \w, and some of the POSIX character classes. By default, only ASCII
  characters are recognized, but if PCRE2_UCP is set, Unicode properties
  are used instead to classify characters. More details are given in the
  section on generic character types in the pcre2pattern page. If you set
  PCRE2_UCP, matching one of the items it affects takes much longer.”

Finally, I don't think it makes sense to use PCRE2_NO_UTF_CHECK, since at
best it's a no-op (since we're using UTF-32) and at worst it can cause a
crash when trying to match an invalid string. That's not worth what little
performance benefit there is to gain from it.


Regards,
Elias
  
  

Re: [Bug-apl] ⎕RE merged

2017-10-10 Thread Juergen Sauermann

  
  
Hi Jay,
  
  thanks, done.
  
  Normally the doc subdir (e.g. in the Savannah SVN repository)
  contains the latest version of this file,
  and I sometimes (read: usually) forget to also commit it to the
  GNU web repository.
  
/// Jürgen


On 10/09/2017 11:02 AM, Jay Foad wrote:


  Could you please update https://www.gnu.org/software/apl/apl.html
? Or will it update automatically in due course?


Thanks,
Jay.

  On 8 October 2017 at 17:47, Juergen
Sauermann 
wrote:

   Hi,
  
  I have merged Elias' ⎕RE implementation into
  GNU APL.
  Thanks, Elias, for contributing it. See 'info apl'
  for a description
  and src/testcases/Quad_RE.tc for
  examples of how to use ⎕RE.
  
  SVN 1012.
  
  Enjoy,
  /// Jürgen
  
 

  
  

  


  




Re: [Bug-apl] ⎕RE merged

2017-10-10 Thread Juergen Sauermann

  
  
Hi Elias,
  
  thanks, fixed in SVN 1013.
  
  /// Jürgen
  
  

On 10/09/2017 11:46 AM, Elias Mårtenson
  wrote:


  One more issue. The last snippet in the info manual
for regexp (great work, and thanks for doing it, by the way)
looks really weird, probably because the content is too wide.


Regards,
Elias
  
  
On 9 October 2017 at 17:02, Jay Foad 
  wrote:
  
Could you please update https://www.gnu.org/software/apl/apl.html
  ? Or will it update automatically in due course?
  
  
  Thanks,
  Jay.
  

  
On 8 October 2017 at 17:47,
  Juergen Sauermann 
  wrote:
  
 Hi,

I have merged Elias' ⎕RE
implementation into GNU APL.
Thanks, Elias, for contributing it. See 'info
  apl' for a description
and src/testcases/Quad_RE.tc
for examples of how to use ⎕RE.

SVN 1012.

Enjoy,
/// Jürgen

   
  


  

  

  


  


  




Re: [Bug-apl] Monadic form of ↓

2017-10-10 Thread Juergen Sauermann

  
  
Hi Elias,
  
  I believe ↓ for 1↓ is too trivial to be useful.

  Unoccupied variants of APL primitives (like monadic ↓ or monadic =) are
  a very scarce resource that we should not use for trivial things.

/// Jürgen


On 10/09/2017 11:06 AM, Elias Mårtenson
  wrote:


  I was thinking about the usefulness of a monadic ↓
in terms of the new regexp feature. In the current version, when
using subexpressions, the return value is always 1+the number of
subexpressions, where the first one is always the full matched
string. Monadic ↓ would be a neat way of dropping that part.


In any case, my point is that monadic ↓ should do something
  useful. I guess split is one such useful thing.


In GNU APL, I'd use ⊂⍤1 to achieve Split. Is that the most
  efficient way?


Regards,
Elias
  
  
On 9 October 2017 at 16:58, Jay Foad wrote:

On 9 October 2017 at 04:56, Elias Mårtenson wrote:

> Currently, monadic ↑ acts as if it was called dyadically with 1 as its
> left argument,

That's not quite true:

      ⍴⍴1↑'ABC'
1
      ⍴⍴↑'ABC'
0

> while monadic ↓ raises a VALENCE ERROR. In almost every single case
> where I have used ↓, it has been in the form 1↓X. Is there a reason why
> the monadic form is not allowed?

FYI in Dyalog APL monadic ↓ is Split:

      ↓3 3⍴⎕A
┌───┬───┬───┐
│ABC│DEF│GHI│
└───┴───┴───┘

I believe this came from STSC's NARS.

Jay.

  

  


  


  




Re: [Bug-apl] Regex support

2017-10-10 Thread Juergen Sauermann

  
  
Hi Peter,
  
  the current syntax is A ⎕RE [X] B, where A is the matching RE, B is the
  subject (the string being matched), and X is the matching flags.

  I never liked it when programs lumped these strings together into
  a single string (or argument).

  What hasn't been addressed yet is substitution as opposed to matching.
  I tend to believe that APL2 selective specification of some kind would be
  an elegant solution, but the details have not yet been worked out.
  
  Best Regards,
  /// Jürgen
  

On 09/29/2017 11:41 AM, Hans-Peter
  Sorge wrote:


  Hi Jürgen,

The construct  regex ⎕Regex string  looks OK to me.

However having the following regex patterns

match:   'regexm' ['modifier'] ⎕Regex string  and
substitute:  'regexs' 'regexr'  ['modifier'] ⎕Regex string

the patterns
'regexm' 'modifier' ⎕Regex string and
'regexs' 'regexr'   ⎕Regex string
are contradictory.

Either
'm' 'regexm' ['modifier']  ⎕Regex string and
's' 'regexs' 'regexr'  ['modifier'] ⎕Regex string

or
'regexm' '' ⎕Regex string  and
'regexs' 'regexr'  '' ⎕Regex string
would solve this syntactical problem.  But typing is a bit tedious.


So I would rather go with regex =^= 'm/.../mod' and  's///mod'

which makes expressions like
(⊂'s///mod') ⎕Regex ¨ string string string
easier to read.

(⊂'m//mod') ⎕Regex ¨ string string string
should return 1 for match and 0 for non match to be used in a subsequent
scan.

.. (⊂'m//mod') ⎕Regexi ¨ string string string
could return the indexes as vector of vectors using selective
specification:  (matching_index  non_matching_index) ← ...

... (⊂'m//mod') ⎕Regexc ¨ string string string
should return the content as vector of vectors using selective
specification:
(matching_content  non_matching_content) ← ...

and further:
dates ← '2017-01-02' '2017-01-03'
(⊂'s/([0-9]+)-([0-9]+)-([0-9]+)/\1 \2 \3/') ⎕Regex ¨ dates
results in
('2017' '01' '02') ('2017' '01' '03')

and
dates ← ⊃ '2017-01-02' '2017-01-03'
's/([0-9]+)-([0-9]+)-([0-9]+)/\1 \2 \3/' ⎕Regex dates
results in
'2017' '01' '02'
'2017' '01' '03'


Maybe I prefer ⎕Regex['i'] over ⎕Regexi ->>  ⎕Regex['option' 'option']
to handle various transform alternatives from regex results to APL.

FWIW

Hans-Peter Sorge


On 22.09.2017 at 23:55, Peter Teeson wrote:

  
Hi Jürgen:
Thanks for your usual gracious reply. I understand the points you present.

Perhaps my perspective is too narrow? The way I see it the key “module” is the interpreter of the language.
IMHO display of the results, means to enter and store data of various types, providing an environment where the interpreter executes
are really separate, but necessary, components.

You mentioned that rationals need to be explicitly configured. Personally I would prefer that approach rather than encrusting the interpreter.
Each capability added to the interpreter just complicates it - of course not for you as the author but for us lesser mortals.

As you may recall I am on a Macintosh. One project I pick up and work on from time to time is to try and
extract only the interpreter and then use the Mac OS facilities for the rest. Of course that is only of use to other Mac users (if at all).
Separating the interpreter from the rest allows for different “models” - OS’s.

What we have right now is a monolithic code base which becomes more fragile with each added feature, version of GCC, or HW box
 - desirable as that might be.

I suppose what I am suggesting is that perhaps it’s time to take a fresh look at the project architecture and ask ourselves if we can improve.

FWIW

respect….

Peter



  On Sep 22, 2017, at 11:48 AM, Juergen Sauermann  wrote:
Hi Peter,

I mostly agree with your concerns. As you may have noticed, I already regretted some of the things that I implemented earlier
in GNU APL. On the other hand, you also see on the GNU APL mailing list the proposals of other GNU APL users to implement
certain things. I haven't really found a way out of this dilemma.

My current thinking is this:

1. If a feature affects the APL language itself then it is probably a bad thing to do. Examples for this are, IMHO, changing the scoping
of variables, lexical binding and stuff like that. As useful as these may be in other languages, my feeling is that they would turn GNU
   APL into something else which is no longer APL. For example, I am a big fan of the powerful matching capabilities in Erlang but I
   believe as useful as they may be, they simply do not belong into GNU APL (or any APL for that matter). Those who really need that (as
   opposed to only believing it would improve GNU APL) might be better off with one of the successors of APL.

2. Some areas, most notably FILE I/O have traditionally not been part of the APL language itself, but are unfortunately needed in the
real world. I am equally concerned about a prol

Re: [Bug-apl] Monadic form of ↓

2017-10-10 Thread Louis de Forcrand
Since the subject has been brought up, how about using it as the analog of 
first (monadic take), but instead unboxing the last element of an array in 
ravel order?

I don’t think this can generally be done on an array X in a more concise way 
than 
first reverse ravel X
or
(shape X) pick X
which I suppose are both slower than a primitive could be.
This might be considered trivial as well though.

Just a suggestion!
Louis

> On 10 Oct 2017, at 18:46, Juergen Sauermann  
> wrote:
> 
> Hi Elias,
> 
> I believe ↓ for 1↓ is too trivial to be useful.
> 
> Unoccupied variants of APL primitives (like monadic ↓ or monadic =) are
> a very scarce resource that we should not use for trivial things.
> 
> /// Jürgen
> 
> 
>> On 10/09/2017 11:06 AM, Elias Mårtenson wrote:
>> I was thinking about the usefulness of a monadic ↓ in terms of the new 
>> regexp feature. In the current version, when using subexpressions, the 
>> return value is always 1+the number of subexpressions, where the first one 
>> is always the full matched string. Monadic ↓ would be a neat way of dropping 
>> that part.
>> 
>> In any case, my point is that monadic ↓ should do something useful. I guess 
>> split is one such useful thing.
>> 
>> In GNU APL, I'd use ⊂⍤1 to achieve Split. Is that the most efficient way?
>> 
>> Regards,
>> Elias
>> 
>>> On 9 October 2017 at 16:58, Jay Foad  wrote:
>>> On 9 October 2017 at 04:56, Elias Mårtenson  wrote:
>>>> Currently, monadic ↑ acts as if it was called dyadically with 1 as its
>>>> left argument,
>>> 
>>> That's not quite true:
>>> 
>>>   ⍴⍴1↑'ABC'
>>> 1
>>>   ⍴⍴↑'ABC'
>>> 0
>>> 
>>>> while monadic ↓ raises a VALENCE ERROR. In almost every single case where
>>>> I have used ↓, it has been in the form 1↓X. Is there a reason why the
>>>> monadic form is not allowed?
>>> 
>>> FYI in Dyalog APL monadic ↓ is Split:
>>> 
>>>   ↓3 3⍴⎕A
>>> ┌───┬───┬───┐
>>> │ABC│DEF│GHI│
>>> └───┴───┴───┘
>>> 
>>> I believe this came from STSC's NARS.
>>> 
>>> Jay.
>> 
> 


[Bug-apl] Suggestion for Quad-RE

2017-10-10 Thread Christian Robert

Sometimes we only want to know whether a string matches or not.

I suggest a new flag ['m'] (as in "match") that will return ...

  for a string:  a scalar 0 or 1 for "not matching" or "matching"
  for an array of strings: a vector of 0/1, one per string, with the same meaning.


Let's say:

  z←⎕fio[49] '/var/log/messages'  ⍝ beware: this file is inaccessible by default unless you are "root" on Linux
                                  ⍝ or you run: chmod a+r /var/log/messages  # as root

which may return 50,000 lines or even 2 million, averaging roughly 120
characters each.


I would hope to be able to use a flag such as ['m']:

 'Started|Stopped' ⎕RE['m'] z

which would return an array of 0/1 values telling which lines match the
pattern, so that I can retain only the matching ones for further fine
tuning (via the dyadic operator "/").

It would be a LOT faster than letting ⎕RE return the whole pcre2 result
INTO the physical GNU APL memory engine, creating a lot of integer arrays
for no real purpose, i.e. as seen from the application.

comments welcome,

my usual 2 cents,
Xtian.
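
To make the proposed semantics concrete, here is a small C++ sketch of "one 0/1 per line, then compress". It only illustrates the idea and says nothing about how ⎕RE is or would be implemented: it uses std::regex rather than pcre2, and the function names `match_mask` and `compress` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// One 0/1 entry per line: 1 if the pattern matches somewhere in the line.
// No match details are stored, which is the point of the proposed flag.
std::vector<int> match_mask(const std::regex& pattern,
                            const std::vector<std::string>& lines)
{
    std::vector<int> mask;
    mask.reserve(lines.size());
    for (const std::string& line : lines)
        mask.push_back(std::regex_search(line, pattern) ? 1 : 0);
    return mask;
}

// Rough counterpart of APL's  mask / lines : keep the lines whose mask
// entry is non-zero.
std::vector<std::string> compress(const std::vector<int>& mask,
                                  const std::vector<std::string>& lines)
{
    assert(mask.size() == lines.size());
    std::vector<std::string> kept;
    for (std::size_t i = 0; i < lines.size(); ++i)
        if (mask[i])
            kept.push_back(lines[i]);
    return kept;
}
```

Calling `match_mask` with the pattern `Started|Stopped` and then `compress` on the result corresponds to the two-step use described above: build the boolean vector first, then keep only the matching lines for further processing.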



Re: [Bug-apl] Suggestion for Quad-RE

2017-10-10 Thread Elias Mårtenson
I think you have a point. It would be very useful to be able to have ⎕RE
filter the results for you.

In experimenting with your specific case, I came across another use-case
that might warrant another flag: One that does not return the full match,
but only the parenthesised subexpressions (this used to be the default in
my initial draft version). Now I have to use 1↓ to remove this.

Here is my somewhat realistic test case that takes the log file, and
extracts the date and the name of the service that was started or stopped:

      file ← ⎕FIO[49] "/some/file/name"

      x ← "^([a-zA-Z]{3} [0-9]+ [0-9]{2}:[0-9]{2}:[0-9]{2}).*: (Started|Stopped) (.*)$" ⎕RE file
      ⍴ x
┏→━━━━┓
┃69339┃
┗━━━━━┛
      result ← ⊃ 1↓¨ ({⍬≢⍵}¨x) / x
      ⍴ result
┏→━━━━━┓
┃7269 3┃
┗━━━━━━┛

This is a lot more complicated than it needs to be. The two new flags
mentioned would completely remove the last line and replace it with a
simple pair of ⎕RE["XY"] flags.

Regards,
Elias
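
As a rough illustration of what the two flags together would do (filter out non-matching lines, and return only the parenthesised subexpressions without the full match), here is a C++ sketch. It uses std::regex instead of pcre2 and a hypothetical function name, so it shows the intended result shape only, not how ⎕RE works internally.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// For every line that matches, return only the parenthesised sub-matches
// (the full match at index 0 is skipped); non-matching lines are dropped.
std::vector<std::vector<std::string>>
submatches_of_matching_lines(const std::regex& pattern,
                             const std::vector<std::string>& lines)
{
    std::vector<std::vector<std::string>> rows;
    for (const std::string& line : lines)
    {
        std::smatch m;
        if (!std::regex_search(line, m, pattern))
            continue;                       // filter: skip non-matching lines
        assert(m.size() >= 1);              // m[0] is always the full match
        std::vector<std::string> row;
        for (std::size_t i = 1; i < m.size(); ++i)
            row.push_back(m[i].str());      // keep sub-expressions only
        rows.push_back(row);
    }
    return rows;
}
```

With the three-group pattern from the session above, each returned row has three entries (timestamp, Started or Stopped, service name), which mirrors the N-by-3 shape of `result`.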

On 11 October 2017 at 11:12, Christian Robert 
wrote:

> Sometimes we only want to know if it match or not.
>
> I suggest a new flag ['m']  (as match) that will return ...
>
>   for a string:  either 0 or 1 as a scalar for "not matching" or "matching"
>   for an array of strings: a vector of 0/1 for each string saying like
> above.
>
>
> lets say:
>
>   z←⎕fio[49] '/var/log/messages'  // beware that this file is
> inaccessible by default unless being "root" on linux
>   // or you chmod a+r
> /var/log/messages  # as root
>
> who may return 50,000 lines or even 2 millions, on an average of say ~120
> characters each.
>
>
> I would hope to be able to use a flag as ['m']:
>
>  'Started|Stopped' ⎕RE['m'] z
>
> who will return an array of (0/1) telling which lines match or not the
> pattern, so I can
> only retain those matching for further fine tuning (via diadic operator
> "/").
>
> It will be a LOT faster than letting ⎕RE returning the whole result of
> pcre2 INTO the physical Gnu-APL memory engine
> creating a lot of integers arrays for no real purpose, ie: seen from the
> application.
>
> comments welcome,
>
> my usual 2 cents,
> Xtian.
>
>