Re: [Bug-apl] ⎕RE merged

2017-10-09 Thread Elias Mårtenson
One more bug:

The call to pcre2_compile_32 should be changed from:

   code = pcre2_compile_32(pattern_ucs, pattern.size(),
   PCRE2_NO_UTF_CHECK | flags, &error_code,
   &error_offset, 0);

To:

   code = pcre2_compile_32(pattern_ucs, pattern.size(),
   *PCRE2_UTF | **PCRE2_UCP* | flags, &error_code,
   &error_offset, 0);

Without *PCRE2_UTF*, proper Unicode semantics will not be applied (such as
properly handling case matching for non-ASCII characters).

*PCRE2_UCP*, is a little less obvious. I think it would make sense to
enable it, since we care more for correctness than performance. Here's what
the documentation has to say about it:

*“This option changes the way PCRE2 processes \B, \b, \D, \d, \S, \s, \W,
\w, and some of the POSIX character classes. By default, only ASCII
characters are recognized, but if PCRE2_UCP is set, Unicode properties are
used instead to classify characters. More details are given in the section
on generic character types in the pcre2pattern page. If you set PCRE2_UCP,
matching one of the items it affects takes much longer.”*


Finally, I don't think it makes sense to use *PCRE2_NO_UTF_CHECK* since at
best it's a no-op (since we're using UTF-32) and at worst it can cause a
crash when trying to match an invalid string. That's not worth what little
performance benefit there is to gain from it.

Regards,
Elias

On 9 October 2017 at 11:12, Elias Mårtenson  wrote:

> I found another bug. ↓ is used to indicate that string indexes are
> requested, but the error message when multiple output types are requested
> is wrong:
>
> *  "foo" ⎕RE["⊂↓"] "bar"*
> DOMAIN ERROR+
>   'foo' ⎕RE['⊂↓']'bar'
>   ^ ^
> *  )more*
> Multiple ⎕RE output flags: '⊂↓'. Output flags are: ⊂⍳/
>
> Note the ⍳ in the error message instead of ↓.
>
> Regards,
> Elias
>
> On 9 October 2017 at 10:45, Elias Mårtenson  wrote:
>
>> I fixed the problem by adding a static_cast(len), but I
>> found another issue: The testcases file is missing.
>>
>> Regards,
>> Elias
>>
>> On 9 October 2017 at 10:41, Elias Mårtenson  wrote:
>>
>>> Thank you.
>>>
>>> There are some errors when compiling on my Arch system:
>>>
>>> g++ -DHAVE_CONFIG_H -I. -I..-Wall -I sql -Wold-style-cast -Werror
>>> -I/usr/include -I/usr/include  -rdynamic -g -O2 -MT apl-Quad_RE.o -MD -MP
>>> -MF .deps/apl-Quad_RE.Tpo -c -o apl-Quad_RE.o `test -f 'Quad_RE.cc' || echo
>>> './'`Quad_RE.cc
>>> Quad_RE.cc: In static member function ‘static Value_P
>>> Quad_RE::partition_result(const Regexp&, const Quad_RE::Flags&, const
>>> UCS_string&)’:
>>> Quad_RE.cc:211:42: error: comparison between signed and unsigned integer
>>> expressions [-Werror=sign-compare]
>>> for (ShapeItem match_id = 1; B_offset < len; match_id +=
>>> match_id_inc)
>>>  ~^
>>> cc1plus: all warnings being treated as errors
>>> make[3]: *** [Makefile:2725: apl-Quad_RE.o] Error 1
>>> make[3]: Leaving directory '/home/emartenson/src/apl/src'
>>> make[2]: *** [Makefile:: all-recursive] Error 1
>>> make[2]: Leaving directory '/home/emartenson/src/apl/src'
>>> make[1]: *** [Makefile:514: all-recursive] Error 1
>>> make[1]: Leaving directory '/home/emartenson/src/apl'
>>> make: *** [Makefile:401: all] Error 2
>>>
>>> Regards,
>>> Elias
>>>
>>> On 9 October 2017 at 00:47, Juergen Sauermann <
>>> juergen.sauerm...@t-online.de> wrote:
>>>
 Hi,

 I have merged Elias' *⎕RE* implementation into GNU APL.
 Thanks, Elias, for contributing it. See *'info apl**'* for a
 description
 and *src/testcases/Q**uad_RE.tc* for examples of how to use *⎕RE*.

 *SVN 1012*.

 Enjoy,
 /// Jürgen


>>>
>>
>


Re: [Bug-apl] Monadic form of ↓

2017-10-09 Thread Jay Foad
On 9 October 2017 at 04:56, Elias Mårtenson  wrote:

> Currently, monadic ↑ acts as if it was called dyadically with 1 as its
> left argument,
>

That's not quite true:

  ⍴⍴1↑'ABC'
1
  ⍴⍴↑'ABC'
0

while monadic ↓ raises a VALENCE ERROR. In almost every single case where I
> have used ↓, it has been in the form 1↓X. Is there a reason why the monadic
> form is not allowed?
>

FYI in Dyalog APL monadic ↓ is Split:

  ↓3 3⍴⎕A
┌───┬───┬───┐
│ABC│DEF│GHI│
└───┴───┴───┘

I believe this came from STSC's NARS.

Jay.


Re: [Bug-apl] ⎕RE merged

2017-10-09 Thread Jay Foad
Could you please update https://www.gnu.org/software/apl/apl.html ? Or will
it update automatically in due course?

Thanks,
Jay.

On 8 October 2017 at 17:47, Juergen Sauermann  wrote:

> Hi,
>
> I have merged Elias' *⎕RE* implementation into GNU APL.
> Thanks, Elias, for contributing it. See *'info apl**'* for a description
> and *src/testcases/Q**uad_RE.tc* for examples of how to use *⎕RE*.
>
> *SVN 1012*.
>
> Enjoy,
> /// Jürgen
>
>


Re: [Bug-apl] Monadic form of ↓

2017-10-09 Thread Elias Mårtenson
I was thinking about the usefulness of a monadic ↓ in terms of the new
regexp feature. In the current version, when using subexpressions, the
return value is always 1+the number of subexpressions, where the first one
is always the full matched string. Monadic ↓ would be a neat way of
dropping that part.

In any case, my point is that monadic ↓ should do something useful. I guess
split is one such useful thing.

In GNU APL, I'd use ⊂⍤1 to achieve Split. Is that the most efficient way?

Regards,
Elias

On 9 October 2017 at 16:58, Jay Foad  wrote:

> On 9 October 2017 at 04:56, Elias Mårtenson  wrote:
>
>> Currently, monadic ↑ acts as if it was called dyadically with 1 as its
>> left argument,
>>
>
> That's not quite true:
>
>   ⍴⍴1↑'ABC'
> 1
>   ⍴⍴↑'ABC'
> 0
>
> while monadic ↓ raises a VALENCE ERROR. In almost every single case where
>> I have used ↓, it has been in the form 1↓X. Is there a reason why the
>> monadic form is not allowed?
>>
>
> FYI in Dyalog APL monadic ↓ is Split:
>
>   ↓3 3⍴⎕A
> ┌───┬───┬───┐
> │ABC│DEF│GHI│
> └───┴───┴───┘
>
> I believe this came from STSC's NARS.
>
> Jay.
>


Re: [Bug-apl] Monadic form of ↓

2017-10-09 Thread Jay Foad
On 9 October 2017 at 10:06, Elias Mårtenson  wrote:

> In GNU APL, I'd use ⊂⍤1 to achieve Split. Is that the most efficient way?
>
>
Either that or ⊂[2] (or in general ⊂[n] where n is the rank of the
argument).


Re: [Bug-apl] Monadic form of ↓

2017-10-09 Thread Elias Mårtenson
On 9 October 2017 at 17:32, Jay Foad  wrote:

> On 9 October 2017 at 10:06, Elias Mårtenson  wrote:
>
>> In GNU APL, I'd use ⊂⍤1 to achieve Split. Is that the most efficient way?
>>
>>
> Either that or ⊂[2] (or in general ⊂[n] where n is the rank of the
> argument).
>

Thank you. I learned something new today. I didn't realise that ⊂ accepted
an axis argument.

Regards,
Elias


Re: [Bug-apl] ⎕RE merged

2017-10-09 Thread Elias Mårtenson
One more issue. The last snippet in the info manual for regexp (great work,
and thanks for doing it, by the way) looks really weird, probably because
the content is too wide.

Regards,
Elias

On 9 October 2017 at 17:02, Jay Foad  wrote:

> Could you please update https://www.gnu.org/software/apl/apl.html ? Or
> will it update automatically in due course?
>
> Thanks,
> Jay.
>
> On 8 October 2017 at 17:47, Juergen Sauermann <
> juergen.sauerm...@t-online.de> wrote:
>
>> Hi,
>>
>> I have merged Elias' *⎕RE* implementation into GNU APL.
>> Thanks, Elias, for contributing it. See *'info apl**'* for a description
>> and *src/testcases/Q**uad_RE.tc* for examples of how to use *⎕RE*.
>>
>> *SVN 1012*.
>>
>> Enjoy,
>> /// Jürgen
>>
>>
>