Re: [Bug-apl] ⎕RE merged
One more bug: The call to pcre2_compile_32 should be changed from:

    code = pcre2_compile_32(pattern_ucs, pattern.size(),
                            PCRE2_NO_UTF_CHECK | flags,
                            &error_code, &error_offset, 0);

To:

    code = pcre2_compile_32(pattern_ucs, pattern.size(),
                            PCRE2_UTF | PCRE2_UCP | flags,
                            &error_code, &error_offset, 0);

Without *PCRE2_UTF*, proper Unicode semantics will not be applied (such as
properly handling case matching for non-ASCII characters).

*PCRE2_UCP* is a little less obvious. I think it would make sense to enable
it, since we care more for correctness than performance. Here's what the
documentation has to say about it:

    “This option changes the way PCRE2 processes \B, \b, \D, \d, \S, \s, \W,
    \w, and some of the POSIX character classes. By default, only ASCII
    characters are recognized, but if PCRE2_UCP is set, Unicode properties
    are used instead to classify characters. More details are given in the
    section on generic character types in the pcre2pattern page. If you set
    PCRE2_UCP, matching one of the items it affects takes much longer.”

Finally, I don't think it makes sense to use *PCRE2_NO_UTF_CHECK* since at
best it's a no-op (since we're using UTF-32) and at worst it can cause a crash
when trying to match an invalid string. That's not worth what little
performance benefit there is to gain from it.

Regards,
Elias

On 9 October 2017 at 11:12, Elias Mårtenson wrote:

> I found another bug. ↓ is used to indicate that string indexes are
> requested, but the error message when multiple output types are requested
> is wrong:
>
>       "foo" ⎕RE["⊂↓"] "bar"
> DOMAIN ERROR+
>       'foo' ⎕RE['⊂↓']'bar'
>       ^ ^
>       )more
> Multiple ⎕RE output flags: '⊂↓'. Output flags are: ⊂⍳/
>
> Note the ⍳ in the error message instead of ↓.
>
> Regards,
> Elias
>
> On 9 October 2017 at 10:45, Elias Mårtenson wrote:
>
>> I fixed the problem by adding a static_cast(len), but I
>> found another issue: The testcases file is missing.
>>
>> Regards,
>> Elias
>>
>> On 9 October 2017 at 10:41, Elias Mårtenson wrote:
>>
>>> Thank you.
>>>
>>> There are some errors when compiling on my Arch system:
>>>
>>> g++ -DHAVE_CONFIG_H -I. -I..-Wall -I sql -Wold-style-cast -Werror
>>> -I/usr/include -I/usr/include -rdynamic -g -O2 -MT apl-Quad_RE.o -MD -MP
>>> -MF .deps/apl-Quad_RE.Tpo -c -o apl-Quad_RE.o `test -f 'Quad_RE.cc' || echo
>>> './'`Quad_RE.cc
>>> Quad_RE.cc: In static member function ‘static Value_P
>>> Quad_RE::partition_result(const Regexp&, const Quad_RE::Flags&, const
>>> UCS_string&)’:
>>> Quad_RE.cc:211:42: error: comparison between signed and unsigned integer
>>> expressions [-Werror=sign-compare]
>>> for (ShapeItem match_id = 1; B_offset < len; match_id +=
>>> match_id_inc)
>>> ~^
>>> cc1plus: all warnings being treated as errors
>>> make[3]: *** [Makefile:2725: apl-Quad_RE.o] Error 1
>>> make[3]: Leaving directory '/home/emartenson/src/apl/src'
>>> make[2]: *** [Makefile:: all-recursive] Error 1
>>> make[2]: Leaving directory '/home/emartenson/src/apl/src'
>>> make[1]: *** [Makefile:514: all-recursive] Error 1
>>> make[1]: Leaving directory '/home/emartenson/src/apl'
>>> make: *** [Makefile:401: all] Error 2
>>>
>>> Regards,
>>> Elias
>>>
>>> On 9 October 2017 at 00:47, Juergen Sauermann <
>>> juergen.sauerm...@t-online.de> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have merged Elias' *⎕RE* implementation into GNU APL.
>>>> Thanks, Elias, for contributing it. See *'info apl'* for a description
>>>> and *src/testcases/Quad_RE.tc* for examples of how to use *⎕RE*.
>>>>
>>>> *SVN 1012*.
>>>>
>>>> Enjoy,
>>>> /// Jürgen
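The -Werror=sign-compare failure quoted in this message comes from comparing a signed value with an unsigned length in a loop condition. The following is only a minimal sketch of the static_cast approach, not the code in Quad_RE.cc: ShapeItem is assumed here to be a signed 64-bit integer, the length is assumed to be the unsigned one, and count_steps is a hypothetical helper invented for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Assumption: a signed stand-in for GNU APL's ShapeItem type.
using ShapeItem = std::int64_t;

// Hypothetical helper: counts how many offsets 0, step, 2*step, ... lie
// below the length of `text`. `step` must be positive.
ShapeItem count_steps(const std::u32string& text, ShapeItem step)
{
    const std::size_t len = text.size();   // unsigned length

    ShapeItem count = 0;
    // Converting the unsigned length explicitly keeps both operands of `<`
    // signed, which avoids -Wsign-compare; static_cast also satisfies
    // -Wold-style-cast, unlike a C-style cast.
    for (ShapeItem offset = 0; offset < static_cast<ShapeItem>(len); offset += step)
        ++count;
    return count;
}
```

The cast is safe only as long as the length fits in the signed type, which seems a reasonable assumption for string lengths here.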
Re: [Bug-apl] Monadic form of ↓
On 9 October 2017 at 04:56, Elias Mårtenson wrote:

> Currently, monadic ↑ acts as if it was called dyadically with 1 as its
> left argument,

That's not quite true:

      ⍴⍴1↑'ABC'
1
      ⍴⍴↑'ABC'
0

> while monadic ↓ raises a VALENCE ERROR. In almost every single case where I
> have used ↓, it has been in the form 1↓X. Is there a reason why the monadic
> form is not allowed?

FYI in Dyalog APL monadic ↓ is Split:

      ↓3 3⍴⎕A
┌───┬───┬───┐
│ABC│DEF│GHI│
└───┴───┴───┘

I believe this came from STSC's NARS.

Jay.
Re: [Bug-apl] ⎕RE merged
Could you please update https://www.gnu.org/software/apl/apl.html ? Or will it
update automatically in due course?

Thanks,
Jay.

On 8 October 2017 at 17:47, Juergen Sauermann wrote:

> Hi,
>
> I have merged Elias' *⎕RE* implementation into GNU APL.
> Thanks, Elias, for contributing it. See *'info apl'* for a description
> and *src/testcases/Quad_RE.tc* for examples of how to use *⎕RE*.
>
> *SVN 1012*.
>
> Enjoy,
> /// Jürgen
Re: [Bug-apl] Monadic form of ↓
I was thinking about the usefulness of a monadic ↓ in terms of the new regexp
feature. In the current version, when using subexpressions, the result always
has 1 + (the number of subexpressions) elements, and the first one is always
the full matched string. Monadic ↓ would be a neat way of dropping that part.

In any case, my point is that monadic ↓ should do something useful. I guess
Split is one such useful thing.

In GNU APL, I'd use ⊂⍤1 to achieve Split. Is that the most efficient way?

Regards,
Elias

On 9 October 2017 at 16:58, Jay Foad wrote:

> On 9 October 2017 at 04:56, Elias Mårtenson wrote:
>
>> Currently, monadic ↑ acts as if it was called dyadically with 1 as its
>> left argument,
>
> That's not quite true:
>
>       ⍴⍴1↑'ABC'
> 1
>       ⍴⍴↑'ABC'
> 0
>
>> while monadic ↓ raises a VALENCE ERROR. In almost every single case where
>> I have used ↓, it has been in the form 1↓X. Is there a reason why the
>> monadic form is not allowed?
>
> FYI in Dyalog APL monadic ↓ is Split:
>
>       ↓3 3⍴⎕A
> ┌───┬───┬───┐
> │ABC│DEF│GHI│
> └───┴───┴───┘
>
> I believe this came from STSC's NARS.
>
> Jay.
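The "full match first, subexpressions after" layout described in this message is common to many regexp APIs, and dropping the first element corresponds to 1↓ on the result. As a rough illustration only, here is a sketch using the C++ standard library's std::regex rather than the PCRE2 code behind ⎕RE; capture_groups is a hypothetical name, not a GNU APL function.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns only the subexpression matches of the first
// match of `pattern` in `text`, i.e. the match result with its first
// element (the full matched string) dropped.
std::vector<std::string> capture_groups(const std::string& text,
                                        const std::string& pattern)
{
    const std::regex re(pattern);
    std::smatch m;
    std::vector<std::string> groups;

    if (std::regex_search(text, m, re)) {
        // m[0] is the whole match; m[1] .. m[m.size() - 1] are the
        // subexpressions, so m.size() is 1 + the number of subexpressions.
        for (std::size_t i = 1; i < m.size(); ++i)
            groups.push_back(m[i].str());
    }
    return groups;
}
```

For the pattern `(\w+)=(\w+)` applied to `key=value`, the match object holds three items and the helper keeps only the last two.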
Re: [Bug-apl] Monadic form of ↓
On 9 October 2017 at 10:06, Elias Mårtenson wrote:

> In GNU APL, I'd use ⊂⍤1 to achieve Split. Is that the most efficient way?

Either that or ⊂[2] (or in general ⊂[n] where n is the rank of the argument).
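For readers less familiar with APL, Split on a matrix (⊂⍤1 or ⊂[2]) turns a rank-2 array into a vector whose items are the rows. A minimal C++ sketch of that idea follows, assuming a character matrix stored in row-major order as a flat string; split_rows is a hypothetical name and this says nothing about how GNU APL implements ⊂ internally.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: splits a row-major character matrix, given as its
// ravel plus the number of columns, into one string per row.
std::vector<std::string> split_rows(const std::string& ravel, std::size_t cols)
{
    if (cols == 0 || ravel.size() % cols != 0)
        throw std::invalid_argument("ravel length must be a multiple of cols");

    std::vector<std::string> rows;
    // Each row occupies `cols` consecutive characters of the ravel.
    for (std::size_t i = 0; i < ravel.size(); i += cols)
        rows.push_back(ravel.substr(i, cols));
    return rows;
}
```

With the ravel "ABCDEFGHI" and 3 columns this yields the rows ABC, DEF and GHI, matching the ↓3 3⍴⎕A example earlier in the thread.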
Re: [Bug-apl] Monadic form of ↓
On 9 October 2017 at 17:32, Jay Foad wrote:

> On 9 October 2017 at 10:06, Elias Mårtenson wrote:
>
>> In GNU APL, I'd use ⊂⍤1 to achieve Split. Is that the most efficient way?
>
> Either that or ⊂[2] (or in general ⊂[n] where n is the rank of the
> argument).

Thank you. I learned something new today. I didn't realise that ⊂ accepted an
axis argument.

Regards,
Elias
Re: [Bug-apl] ⎕RE merged
One more issue. The last snippet in the info manual for regexp (great work,
and thanks for doing it, by the way) looks really weird, probably because the
content is too wide.

Regards,
Elias

On 9 October 2017 at 17:02, Jay Foad wrote:

> Could you please update https://www.gnu.org/software/apl/apl.html ? Or
> will it update automatically in due course?
>
> Thanks,
> Jay.
>
> On 8 October 2017 at 17:47, Juergen Sauermann <
> juergen.sauerm...@t-online.de> wrote:
>
>> Hi,
>>
>> I have merged Elias' *⎕RE* implementation into GNU APL.
>> Thanks, Elias, for contributing it. See *'info apl'* for a description
>> and *src/testcases/Quad_RE.tc* for examples of how to use *⎕RE*.
>>
>> *SVN 1012*.
>>
>> Enjoy,
>> /// Jürgen