Re: [Bug-apl] ⎕RE merged
Hi Elias,

thanks, fixed in SVN 1013.

/// Jürgen

On 10/09/2017 04:41 AM, Elias Mårtenson wrote:

Thank you. There are some errors when compiling on my Arch system:

g++ -DHAVE_CONFIG_H -I. -I.. -Wall -I sql -Wold-style-cast -Werror -I/usr/include -I/usr/include -rdynamic -g -O2 -MT apl-Quad_RE.o -MD -MP -MF .deps/apl-Quad_RE.Tpo -c -o apl-Quad_RE.o `test -f 'Quad_RE.cc' || echo './'`Quad_RE.cc
Quad_RE.cc: In static member function ‘static Value_P Quad_RE::partition_result(const Regexp&, const Quad_RE::Flags&, const UCS_string&)’:
Quad_RE.cc:211:42: error: comparison between signed and unsigned integer expressions [-Werror=sign-compare]
   for (ShapeItem match_id = 1; B_offset < len; match_id += match_id_inc)
   ~^
cc1plus: all warnings being treated as errors
make[3]: *** [Makefile:2725: apl-Quad_RE.o] Error 1
make[3]: Leaving directory '/home/emartenson/src/apl/src'
make[2]: *** [Makefile:: all-recursive] Error 1
make[2]: Leaving directory '/home/emartenson/src/apl/src'
make[1]: *** [Makefile:514: all-recursive] Error 1
make[1]: Leaving directory '/home/emartenson/src/apl'
make: *** [Makefile:401: all] Error 2

Regards,
Elias

On 9 October 2017 at 00:47, Juergen Sauermann wrote:

Hi,

I have merged Elias' ⎕RE implementation into GNU APL. Thanks, Elias, for contributing it.

See 'info apl' for a description and src/testcases/Quad_RE.tc for examples of how to use ⎕RE.

SVN 1012.

Enjoy,
/// Jürgen
Re: [Bug-apl] ⎕RE merged
Hi Elias,

thanks, fixed in SVN 1013.

/// Jürgen

On 10/09/2017 05:12 AM, Elias Mårtenson wrote:

I found another bug. ↓ is used to indicate that string indexes are requested, but the error message when multiple output types are requested is wrong:

      "foo" ⎕RE["⊂↓"] "bar"
DOMAIN ERROR+
      'foo' ⎕RE['⊂↓']'bar'
      ^ ^
      )more
Multiple ⎕RE output flags: '⊂↓'. Output flags are: ⊂⍳/

Note the ⍳ in the error message instead of ↓.

Regards,
Elias

On 9 October 2017 at 10:45, Elias Mårtenson wrote:

I fixed the problem by adding a static_cast(len), but I found another issue: The testcases file is missing.

Regards,
Elias

On 9 October 2017 at 10:41, Elias Mårtenson wrote:

Thank you. There are some errors when compiling on my Arch system:

g++ -DHAVE_CONFIG_H -I. -I.. -Wall -I sql -Wold-style-cast -Werror -I/usr/include -I/usr/include -rdynamic -g -O2 -MT apl-Quad_RE.o -MD -MP -MF .deps/apl-Quad_RE.Tpo -c -o apl-Quad_RE.o `test -f 'Quad_RE.cc' || echo './'`Quad_RE.cc
Quad_RE.cc: In static member function ‘static Value_P Quad_RE::partition_result(const Regexp&, const Quad_RE::Flags&, const UCS_string&)’:
Quad_RE.cc:211:42: error: comparison between signed and unsigned integer expressions [-Werror=sign-compare]
   for (ShapeItem match_id = 1; B_offset < len; match_id += match_id_inc)
   ~^
cc1plus: all warnings being treated as errors
make[3]: *** [Makefile:2725: apl-Quad_RE.o] Error 1
make[3]: Leaving directory '/home/emartenson/src/apl/src'
make[2]: *** [Makefile:: all-recursive] Error 1
make[2]: Leaving directory '/home/emartenson/src/apl/src'
make[1]: *** [Makefile:514: all-recursive] Error 1
make[1]: Leaving directory '/home/emartenson/src/apl'
make: *** [Makefile:401: all] Error 2

Regards,
Elias

On 9 October 2017 at 00:47, Juergen Sauermann wrote:

Hi,

I have merged Elias' ⎕RE implementation into GNU APL. Thanks, Elias, for contributing it.

See 'info apl' for a description and src/testcases/Quad_RE.tc for examples of how to use ⎕RE.

SVN 1012.

Enjoy,
/// Jürgen
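
[Editor's sketch] The -Wsign-compare failure quoted in this message, and the static_cast workaround Elias mentions, can be illustrated with a small stand-alone function. This is only a sketch under assumptions: the thread does not show the real types of B_offset and len in Quad_RE.cc, nor the cast's target type, so the function name count_steps and the use of int64_t and std::size_t below are hypothetical stand-ins.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the loop in Quad_RE::partition_result();
// the actual types used in Quad_RE.cc are not shown in the thread.
// `step` is assumed to be positive.
int64_t count_steps(const std::u32string& subject, int64_t step)
{
    const std::size_t len = subject.size();   // unsigned length
    int64_t steps = 0;

    // Writing `offset < len` here would compare signed with unsigned and
    // fail under -Wall -Werror. Casting the length keeps both operands
    // signed, which is the kind of fix described in the message.
    for (int64_t offset = 0; offset < static_cast<int64_t>(len); offset += step)
        ++steps;

    return steps;
}
```

Casting the unsigned length to the signed type (rather than the signed offset to unsigned) keeps the comparison safe if the offset could ever become negative, at the cost of assuming the length fits in the signed type.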
Re: [Bug-apl] ⎕RE merged
Hi Elias,

thanks, fixed in SVN 1013.

/// Jürgen

On 10/09/2017 10:11 AM, Elias Mårtenson wrote:

One more bug: The call to pcre2_compile_32 should be changed from:

code = pcre2_compile_32(pattern_ucs, pattern.size(), PCRE2_NO_UTF_CHECK | flags, &error_code, &error_offset, 0);

To:

code = pcre2_compile_32(pattern_ucs, pattern.size(), PCRE2_UTF | PCRE2_UCP | flags, &error_code, &error_offset, 0);

Without PCRE2_UTF, proper Unicode semantics will not be applied (such as properly handling case matching for non-ASCII characters).

PCRE2_UCP is a little less obvious. I think it would make sense to enable it, since we care more for correctness than performance. Here's what the documentation has to say about it:

“This option changes the way PCRE2 processes \B, \b, \D, \d, \S, \s, \W, \w, and some of the POSIX character classes. By default, only ASCII characters are recognized, but if PCRE2_UCP is set, Unicode properties are used instead to classify characters. More details are given in the section on generic character types in the pcre2pattern page. If you set PCRE2_UCP, matching one of the items it affects takes much longer.”

Finally, I don't think it makes sense to use PCRE2_NO_UTF_CHECK since at best it's a no-op (since we're using UTF-32) and at worst it can cause a crash when trying to match an invalid string. That's not worth what little performance benefit there is to gain from it.

Regards,
Elias

On 9 October 2017 at 11:12, Elias Mårtenson wrote:

I found another bug. ↓ is used to indicate that string indexes are requested, but the error message when multiple output types are requested is wrong:

      "foo" ⎕RE["⊂↓"] "bar"
DOMAIN ERROR+
      'foo' ⎕RE['⊂↓']'bar'
      ^ ^
      )more
Multiple ⎕RE output flags: '⊂↓'. Output flags are: ⊂⍳/

Note the ⍳ in the error message instead of ↓.

Regards,
Elias

On 9 October 2017 at 10:45, Elias Mårtenson wrote:

I fixed the problem by adding a static_cast(len), but I found another issue: The testcases file is missing.
Re: [Bug-apl] ⎕RE merged
Hi Jay,

thanks, done. Normally the doc subdir (e.g. in the savannah SVN repository) contains the latest version of this file, and I sometimes (read: usually) forget to also commit it to the GNU web repository.

/// Jürgen

On 10/09/2017 11:02 AM, Jay Foad wrote:

Could you please update https://www.gnu.org/software/apl/apl.html ? Or will it update automatically in due course?

Thanks,
Jay.

On 8 October 2017 at 17:47, Juergen Sauermann wrote:

Hi,

I have merged Elias' ⎕RE implementation into GNU APL. Thanks, Elias, for contributing it.

See 'info apl' for a description and src/testcases/Quad_RE.tc for examples of how to use ⎕RE.

SVN 1012.

Enjoy,
/// Jürgen
Re: [Bug-apl] ⎕RE merged
Hi Elias,

thanks, fixed in SVN 1013.

/// Jürgen

On 10/09/2017 11:46 AM, Elias Mårtenson wrote:

One more issue. The last snippet in the info manual for regexp (great work, and thanks for doing it, by the way) looks really weird, probably because the content is too wide.

Regards,
Elias

On 9 October 2017 at 17:02, Jay Foad wrote:

Could you please update https://www.gnu.org/software/apl/apl.html ? Or will it update automatically in due course?

Thanks,
Jay.

On 8 October 2017 at 17:47, Juergen Sauermann wrote:

Hi,

I have merged Elias' ⎕RE implementation into GNU APL. Thanks, Elias, for contributing it.

See 'info apl' for a description and src/testcases/Quad_RE.tc for examples of how to use ⎕RE.

SVN 1012.

Enjoy,
/// Jürgen
Re: [Bug-apl] Monadic form of ↓
Hi Elias,

I believe ↓ for 1↓ is too trivial to be useful.

Unoccupied variants of APL primitives (like monadic ↓ or monadic =) are a very scarce resource that we should not use for trivial things.

/// Jürgen

On 10/09/2017 11:06 AM, Elias Mårtenson wrote:

I was thinking about the usefulness of a monadic ↓ in terms of the new regexp feature. In the current version, when using subexpressions, the return value is always 1+the number of subexpressions, where the first one is always the full matched string. Monadic ↓ would be a neat way of dropping that part.

In any case, my point is that monadic ↓ should do something useful. I guess split is one such useful thing.

In GNU APL, I'd use ⊂⍤1 to achieve Split. Is that the most efficient way?

Regards,
Elias

On 9 October 2017 at 16:58, Jay Foad wrote:

On 9 October 2017 at 04:56, Elias Mårtenson wrote:

> Currently, monadic ↑ acts as if it was called dyadically with 1 as its left argument,

That's not quite true:

      ⍴⍴1↑'ABC'
1
      ⍴⍴↑'ABC'
0

> while monadic ↓ raises a VALENCE ERROR. In almost every single case where I have used ↓, it has been in the form 1↓X. Is there a reason why the monadic form is not allowed?

FYI in Dyalog APL monadic ↓ is Split:

      ↓3 3⍴⎕A
┌───┬───┬───┐
│ABC│DEF│GHI│
└───┴───┴───┘

I believe this came from STSC's NARS.

Jay.
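
[Editor's sketch] For readers who do not have Dyalog at hand, the Split behaviour Jay shows for monadic ↓ (and which Elias obtains in GNU APL with ⊂⍤1) amounts to cutting a matrix into a vector of its rows. A rough C++ analogue follows; the name split_rows and the row-major string representation are assumptions made for illustration and are not part of any APL implementation.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: treat `ravel` as a row-major character matrix with
// `cols` columns and return one string per complete row, mirroring what
// Split does to the 3x3 matrix in the example above.
std::vector<std::string> split_rows(const std::string& ravel, std::size_t cols)
{
    std::vector<std::string> rows;
    if (cols == 0)
        return rows;                       // no sensible rows to produce

    // Any trailing partial row is ignored in this sketch.
    for (std::size_t start = 0; start + cols <= ravel.size(); start += cols)
        rows.push_back(ravel.substr(start, cols));

    return rows;
}
```

Called with the nine letters ABCDEFGHI and three columns, this yields the three rows shown in Jay's boxed output.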
Re: [Bug-apl] Regex support
Hi Peter,

the current syntax is

A ⎕RE [X] B

where A is the matching RE, B is the subject (the string being matched) and X is matching flags. I never liked it when programs lumped these strings together into a single string (or argument).

What hasn't been addressed yet is substitution as opposed to matching. I tend to believe that APL2 selective specification of some kind would be an elegant solution, but details have not yet been worked out.

Best Regards,
/// Jürgen

On 09/29/2017 11:41 AM, Hans-Peter Sorge wrote:

Hi Jürgen,

The construct
regex ⎕Regex string
looks OK to me.

However having the following regex patterns
match: 'regexm' ['modifier'] ⎕Regex string
and
substitute: 'regexs' 'regexr' ['modifier'] ⎕Regex string
the patterns
'regexm' 'modifier' ⎕Regex string
and
'regexs' 'regexr' ⎕Regex string
are contradictory.

Either
'm' 'regexm' ['modifier'] ⎕Regex string
and
's' 'regexs' 'regexr' ['modifier'] ⎕Regex string
or
'regexm' '' ⎕Regex string
and
'regexs' 'regexr' '' ⎕Regex string
would solve this syntactical problem. But typing is a bit tedious.

So I would rather go with regex =^= 'm/.../mod' and 's///mod'
which makes expressions like
(⊂'s///mod') ⎕Regex ¨ string string string
easier to read.

(⊂'m//mod') ⎕Regex ¨ string string string
should return 1 for match and 0 for non match to be used in a subsequent scan.
..
(⊂'m//mod') ⎕Regexi ¨ string string string
could return the indexes as vector of vectors using selective specification:
(matching_index non_matching_index) ← ... ...

(⊂'m//mod') ⎕Regexc ¨ string string string
should return the content as vector of vectors using selective specification:
(matching_content non_matching_content) ← ...
and further:

dates ← '2017-01-02' '2017-01-03'
(⊂'s/([0-9]+)-([0-9]+)-([0-9]+)/\1 \2 \3/') ⎕Regex ¨ dates
results in
('2017' '01' '02') ('2017' '01' '03')

and

dates ← ⊃ '2017-01-02' '2017-01-03'
's/([0-9]+)-([0-9]+)-([0-9]+)/\1 \2 \3/' ⎕Regex dates
results in
'2017' '01' '02'
'2017' '01' '03'

Maybe I prefer ⎕Regex['i'] over ⎕Regexi ->> ⎕Regex['option' 'option'] to handle various transform alternatives from regex results to apl.

FWIW
Hans-Peter Sorge

On 22.09.2017 at 23:55, Peter Teeson wrote:

Hi Jürgen:

Thanks for your usual gracious reply. I understand the points you present. Perhaps my perspective is too narrow?

The way I see it the key “module” is the interpreter of the language. IMHO display of the results, means to enter and store data of various types, providing an environment where the interpreter executes are really separate, but necessary, components.

You mentioned that rationals need to be explicitly configured. Personally I would prefer that approach rather than encrusting the interpreter. Each capability added to the interpreter just complicates it - of course not for you as the author but for us lesser mortals.

As you may recall I am on a Macintosh. One project I pick up and work on from time to time is to try and extract only the interpreter and then use the Mac OS facilities for the rest. Of course that is only of use to other Mac users (if at all).

Separating the interpreter from the rest allows for different “models” - OS’s. What we have right now is a monolithic code base which becomes more fragile with each added feature, version of GCC, or HW box - desirable as that might be.

I suppose what I am suggesting is that perhaps it’s time to take a fresh look at the project architecture and ask ourselves if we can improve.

FWIW

respect….

Peter

On Sep 22, 2017, at 11:48 AM, Juergen Sauermann wrote:

Hi Peter,

I mostly agree with your concerns. As you may have noticed, I already regretted some of the things that I implemented earlier in GNU APL.
On the other hand, you also see on the GNU APL mailing list the proposals of other GNU APL users to implement certain things. I haven't really found a way out of this dilemma.

My current thinking is this:

1. If a feature affects the APL language itself then it is probably a bad thing to do. Examples for this are, IMHO, changing the scoping of variables, lexical binding and stuff like that. As useful as these may be in other languages, my feeling is that they would turn GNU APL into something else which is no longer APL. For example, I am a big fan of the powerful matching capabilities in Erlang but I believe that, as useful as they may be, they simply do not belong in GNU APL (or any APL for that matter). Those who really need that (as opposed to only believing it would improve GNU APL) might be better off with one of the successors of APL.

2. Some areas, most notably FILE I/O have traditionally not been part of the APL language itself, but are unfortunately needed in the real world. I am equally concerned about a prol
Re: [Bug-apl] Monadic form of ↓
Since the subject has been brought up, how about using it as the analog of first (monadic take), but instead unboxing the last element of an array in ravel order? I don’t think this can generally be done on an array X in a more concise way than

first reverse ravel X

or

(shape X) pick X

which I suppose are both slower than a primitive could be.

This might be considered trivial as well though. Just a suggestion!

Louis

> On 10 Oct 2017, at 18:46, Juergen Sauermann wrote:
>
> Hi Elias,
>
> I believe ↓ for 1↓ is too trivial to be useful.
>
> Unoccupied variants of APL primitives (like monadic ↓ or monadic =) are
> a very scarce resource that we should not use for trivial things.
>
> /// Jürgen
>
>
>> On 10/09/2017 11:06 AM, Elias Mårtenson wrote:
>> I was thinking about the usefulness of a monadic ↓ in terms of the new
>> regexp feature. In the current version, when using subexpressions, the
>> return value is always 1+the number of subexpressions, where the first one
>> is always the full matched string. Monadic ↓ would be a neat way of dropping
>> that part.
>>
>> In any case, my point is that monadic ↓ should do something useful. I guess
>> split is one such useful thing.
>>
>> In GNU APL, I'd use ⊂⍤1 to achieve Split. Is that the most efficient way?
>>
>> Regards,
>> Elias
>>
>>> On 9 October 2017 at 16:58, Jay Foad wrote:
>>> On 9 October 2017 at 04:56, Elias Mårtenson wrote:
>>>> Currently, monadic ↑ acts as if it was called dyadically with 1 as its left argument,
>>>
>>> That's not quite true:
>>>
>>>       ⍴⍴1↑'ABC'
>>> 1
>>>       ⍴⍴↑'ABC'
>>> 0
>>>
>>>> while monadic ↓ raises a VALENCE ERROR. In almost every single case where I have used ↓, it has been in the form 1↓X. Is there a reason why the monadic form is not allowed?
>>>
>>> FYI in Dyalog APL monadic ↓ is Split:
>>>
>>>       ↓3 3⍴⎕A
>>> ┌───┬───┬───┐
>>> │ABC│DEF│GHI│
>>> └───┴───┴───┘
>>>
>>> I believe this came from STSC's NARS.
>>>
>>> Jay.
[Bug-apl] Suggestion for Quad-RE
Sometimes we only want to know whether it matches or not.

I suggest a new flag ['m'] (as in match) that will return ...

for a string: either 0 or 1 as a scalar, for "not matching" or "matching"
for an array of strings: a vector of 0/1, one per string, with the same meaning.

Let's say:

z←⎕fio[49] '/var/log/messages' // beware that this file is inaccessible by default unless being "root" on linux
                               // or you chmod a+r /var/log/messages # as root

which may return 50,000 lines or even 2 million, averaging say ~120 characters each.

I would hope to be able to use a flag such as ['m']:

'Started|Stopped' ⎕RE['m'] z

which will return an array of (0/1) telling which lines match the pattern and which do not, so I can retain only the matching ones for further fine tuning (via the dyadic operator "/").

It will be a LOT faster than letting ⎕RE return the whole result of pcre2 INTO the physical Gnu-APL memory engine, creating a lot of integer arrays for no real purpose, i.e. as seen from the application.

comments welcome,

my usual 2 cents,
Xtian.
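
[Editor's sketch] The proposed ['m'] flag boils down to mapping each line to 0 or 1 without building any per-match result arrays. The following C++ sketch shows that idea using std::regex rather than pcre2, purely so that it is self-contained; the function name match_mask is hypothetical and nothing here reflects how GNU APL implements or would implement the flag.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical analogue of  pattern ⎕RE['m'] lines : one 0/1 entry per
// input line, 1 if the pattern occurs anywhere in that line.
std::vector<int> match_mask(const std::string& pattern,
                            const std::vector<std::string>& lines)
{
    const std::regex re(pattern);          // compile once, reuse for every line
    std::vector<int> mask;
    mask.reserve(lines.size());

    for (const std::string& line : lines)
        mask.push_back(std::regex_search(line, re) ? 1 : 0);

    return mask;
}
```

The resulting mask plays the role of the left argument to "/" in the message above: the caller keeps only the lines whose entry is 1 and runs the more expensive extraction on those.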
Re: [Bug-apl] Suggestion for Quad-RE
I think you have a point. It would be very useful to be able to have ⎕RE filter the results for you.

In experimenting with your specific case, I came across another use-case that might warrant another flag: One that does not return the full match, but only the parenthesised subexpressions (this used to be the default in my initial draft version). Now I have to use 1↓ to remove this.

Here is my somewhat realistic test case that takes the log file, and extracts the date and the name of the service that was started or stopped:

      file ← ⎕FIO[49] "/some/file/name"
      x ← "^([a-zA-Z]{3} [0-9]+ [0-9]{2}:[0-9]{2}:[0-9]{2}).*: (Started|Stopped) (.*)$" ⎕RE file
      ⍴ x
┏→━━━━┓
┃69339┃
┗━━━━━┛
      result ← ⊃ 1↓¨ ({⍬≢⍵}¨x) / x
      ⍴ result
┏→━━━━━┓
┃7269 3┃
┗━━━━━━┛

This is a lot more complicated than it needs to be. The two new flags mentioned would completely remove the last line and replace it with a simple pair of ⎕RE["XY"] flags.

Regards,
Elias

On 11 October 2017 at 11:12, Christian Robert wrote:

> Sometimes we only want to know whether it matches or not.
>
> I suggest a new flag ['m'] (as in match) that will return ...
>
> for a string: either 0 or 1 as a scalar, for "not matching" or "matching"
> for an array of strings: a vector of 0/1, one per string, with the same meaning.
>
>
> Let's say:
>
> z←⎕fio[49] '/var/log/messages' // beware that this file is inaccessible by default unless being "root" on linux
>                                // or you chmod a+r /var/log/messages # as root
>
> which may return 50,000 lines or even 2 million, averaging say ~120
> characters each.
>
>
> I would hope to be able to use a flag such as ['m']:
>
> 'Started|Stopped' ⎕RE['m'] z
>
> which will return an array of (0/1) telling which lines match the pattern
> and which do not, so I can retain only the matching ones for further fine
> tuning (via the dyadic operator "/").
>
> It will be a LOT faster than letting ⎕RE return the whole result of
> pcre2 INTO the physical Gnu-APL memory engine, creating a lot of integer
> arrays for no real purpose, i.e. as seen from the application.
>
> comments welcome,
>
> my usual 2 cents,
> Xtian.
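
[Editor's sketch] The two steps Elias performs by hand in his log-file example (filter out non-matching lines, then drop the full match and keep only the parenthesised subexpressions) can be sketched in C++ as below. This uses std::regex instead of pcre2 so that it stands alone; the function name extract_groups is hypothetical, and the sketch says nothing about how the proposed ⎕RE flags would actually be implemented.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: for every line that matches `pattern`, return a row
// holding capture groups 1..n only. Group 0 (the full match) is dropped,
// which corresponds to the 1↓¨ step in the APL example.
std::vector<std::vector<std::string>>
extract_groups(const std::string& pattern, const std::vector<std::string>& lines)
{
    const std::regex re(pattern);
    std::vector<std::vector<std::string>> rows;

    for (const std::string& line : lines) {
        std::smatch m;
        if (!std::regex_search(line, m, re))
            continue;                               // filter: skip non-matching lines

        std::vector<std::string> groups;
        for (std::size_t i = 1; i < m.size(); ++i)  // start at 1 to skip the full match
            groups.push_back(m[i].str());
        rows.push_back(groups);
    }
    return rows;
}
```

With the pattern from the message, each surviving row would hold three strings (timestamp, Started or Stopped, service description), which is the shape of the 7269 3 result shown above.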