Re: [Bug-apl] Regex support

Elias Mårtenson Mon, 02 Oct 2017 01:27:58 -0700

Some progress:

The behaviour I described earlier still works, but it can now also work on
N-dimensional arrays of strings, compiling the regex only once and then
applying it to all the cells.
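A rough illustration of the compile-once idea, using C++'s `std::regex` rather than the actual ⎕RE code (which is not shown here). The helper name `search_cells` is hypothetical. The per-cell operation, a plain `std::regex_search`, is also an assumption, since the earlier behaviour is not described in this message. The array is assumed to be stored flattened in row-major order.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: apply ONE compiled regex to every cell of an
// N-dimensional array of strings, stored flattened in row-major order.
// The result has one entry per cell, so the caller can reshape it to the
// original shape afterwards.
std::vector<bool> search_cells(const std::string &pattern,
                               const std::vector<std::string> &cells)
{
    const std::regex re(pattern);   // compiled exactly once
    std::vector<bool> result;
    result.reserve(cells.size());
    for (const std::string &cell : cells)
        result.push_back(std::regex_search(cell, re));  // reused for each cell
    return result;
}
```

The point of the design is that the (comparatively expensive) regex construction happens outside the loop, so its cost does not grow with the number of cells.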


In addition to this, I have now also added a flag "B" (meaning "bitmap")
that returns a bitmap of all matches, which can be used in conjunction
with ⊂ to split strings by regex.

Here's an example:

      " +" ⎕RE["B"] "this is   a     test"
┏→━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃0 0 0 0 1 0 0 2 2 2 0 3 3 3 3 3 0 0 0 0┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
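The numbering in the "B" result can be reproduced outside APL with C++'s `std::regex`. This is only a sketch based on the semantics visible in the examples: matches are numbered from 1, and characters outside any match are 0. `match_bitmap` is a hypothetical name, not part of GNU APL.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical re-creation of the "B" result: every character covered by the
// k-th match (counting from 1) gets the value k; characters outside any match
// get 0. Adjacent matches therefore get different numbers.
std::vector<int> match_bitmap(const std::string &pattern,
                              const std::string &text)
{
    const std::regex re(pattern);
    std::vector<int> bitmap(text.size(), 0);
    int match_no = 0;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end;
         it != end; ++it)
    {
        ++match_no;
        const std::size_t start = static_cast<std::size_t>(it->position(0));
        const std::size_t len   = static_cast<std::size_t>(it->length(0));
        for (std::size_t i = start; i < start + len; ++i)
            bitmap[i] = match_no;   // mark the characters of this match
    }
    return bitmap;
}
```

With `" +"` and `"this is   a     test"` this yields the vector shown above. For the two four-space cases discussed further down, it gives `1 2 3 4` for `" "` and `1 1 1 1` for `" +"`.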

This matches any sequence of spaces, and we can easily use ⊂ to split the
string:

      {⍵ ⊂⍨ 0=" +" ⎕RE["B"] ⍵} "this is   a     test"
┏→━━━━━━━━━━━━━━━━━━━━━┓
┃"this" "is" "a" "test"┃
┗∊━━━━━━━━━━━━━━━━━━━━━┛
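The split works because `0=` turns the match numbers into a boolean mask of the non-matching characters, and ⊂ keeps each run of 1s as one item. Below is a small C++ model of that partition step. It assumes APL2 partition semantics (which GNU APL is assumed to follow here): positions where the left argument is 0 are dropped, and a new item starts wherever the left argument increases. `apl_partition` is a hypothetical name.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Sketch of APL2-style partition (L ⊂ R) for a character vector R:
//  - positions where L is 0 are dropped;
//  - a new item starts wherever L is greater than the element before it
//    (the element before the first one counts as 0).
// Assumes keys are non-negative and keys.size() == text.size().
std::vector<std::string> apl_partition(const std::vector<int> &keys,
                                       const std::string &text)
{
    std::vector<std::string> items;
    int previous = 0;
    for (std::size_t i = 0; i < text.size(); ++i)
    {
        if (keys[i] > previous)
            items.emplace_back();       // start a new item
        if (keys[i] != 0)
            items.back() += text[i];    // extend the current item
        previous = keys[i];
    }
    return items;
}
```

With the mask `1 1 1 1 0 1 1 0 0 0 1 0 0 0 0 0 1 1 1 1` (that is, `0=` applied to the bitmap above), this returns `"this" "is" "a" "test"`.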

However, I'm not sure if the values returned from the function are ideal.
The idea of the increasing numbers is to be able to differentiate between
the result of:

      " " ⎕RE["B"] "    "
┏→━━━━━━┓
┃1 2 3 4┃
┗━━━━━━━┛

vs:

      " +" ⎕RE["B"] "    "
┏→━━━━━━┓
┃1 1 1 1┃
┗━━━━━━━┛
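One way to see what the increasing numbers buy: from the numbered result, a consumer can still recover each match's position and length, even when matches are adjacent. A plain 0/1 mask would merge adjacent matches into one run. The helper below, `match_extents`, is hypothetical and not part of GNU APL. It sketches that recovery from a numbered bitmap.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Recover (start, length) of every match from a numbered match bitmap.
// Consecutive matches carry different numbers, so two adjacent matches are
// still reported separately; a boolean mask (0≠bitmap) would merge them.
std::vector<std::pair<std::size_t, std::size_t>>
match_extents(const std::vector<int> &bitmap)
{
    std::vector<std::pair<std::size_t, std::size_t>> extents;
    for (std::size_t i = 0; i < bitmap.size(); ++i)
    {
        if (bitmap[i] == 0)
            continue;                       // not part of any match
        if (i > 0 && bitmap[i] == bitmap[i - 1])
            ++extents.back().second;        // same match continues
        else
            extents.emplace_back(i, 1);     // a new match starts here
    }
    return extents;
}
```

For the two results above, it reports four one-character matches and one four-character match, respectively.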

Should it be left like this, or should it be done in some other way?

Regards,
Elias

On 25 September 2017 at 20:10, Juergen Sauermann <
juergen.sauerm...@t-online.de> wrote:

> Hi Elias,
>
> making a quad function an operator is simple if the function argument(s)
> is/are primitive functions, and a little more complicated if not.
>
> First of all you have to implement (read: overload) some of the eval_XXX()
> functions that have function arguments. For monadic operators these
> eval_XXX() functions are:
>
>    virtual Token eval_ALB(Value_P A, Token & LO, Value_P B)
>    virtual Token eval_ALXB(Value_P A, Token & LO, Value_P X, Value_P B)
>    virtual Token eval_LB(Token & LO, Value_P B)
>    virtual Token eval_LXB(Token & LO, Value_P X, Value_P B)
>
> where L (or LO) stands for the left function argument. For dyadic
> operators they are:
>
>    virtual Token eval_ALRB(Value_P A, Token & LO, Token & RO, Value_P B)
>    virtual Token eval_ALRXB(Value_P A, Token & LO, Token & RO, Value_P X, Value_P B)
>    virtual Token eval_LRB(Token & LO, Token & RO, Value_P B)
>    virtual Token eval_LRXB(Token & LO, Token & RO, Value_P X, Value_P B)
>
> where L (or LO) and R (or RO) stand for the left and right function
> arguments, A and B are the value arguments, and X is the axis.
>
> Not all of them need to be implemented, only those whose signatures are
> supported by the operator (mainly in terms of whether it allows an axis
> argument X or a left value argument A).
>
> If an operator supports defined functions (as opposed to primitive
> functions), then it will typically implement the operator itself as a
> macro, which means that the implementation is written in APL rather than
> in C++ (similar to "magic functions" in NARS). This is needed because
> primitive functions are atomic (they either succeed or fail, but cannot
> be continued after a failure), while defined functions (and operators)
> can continue at the point of interruption after the values that caused
> the fault have been fixed.
>
> Some of the built-in operators in GNU APL have both a primitive
> implementation (which is used when the function arguments are primitive)
> and a macro-based implementation (used when they are not). This is for
> performance reasons: the ability to take defined functions as arguments
> should not slow down the cases where the function arguments are
> primitive.
>
> The macro definitions are contained in Macro.def.
>
> Please note that in GNU APL functions cannot return functions, which may
> or may not be a problem in your case, depending on whether the function
> argument(s) of the ⎕-operator is/are primitive or not. In standard APL
> you cannot assign a function to a name. The usual work-around is to
> return a string and ⍎ it.
>
> My gut feeling is that if you need function arguments for implementing
> regular expressions, then something has gone in the wrong direction
> somewhere else.
>
> Best Regards,
> /// Jürgen
>
>
>
> On 09/25/2017 05:18 AM, Elias Mårtenson wrote:
>
>> Dyalog's implementation is much more expressive than what I had proposed.
>>
>> There are technical reasons why we have no hope of replicating their
>> functionality (in particular, GNU APL does not have support for namespaces).
>>
>> Their function takes arguments and returns a function: a reusable
>> matcher, which is useful since you'd only compile the regexp once.
>> Jürgen, how can I make a quad-function behave like below? It seems to be
>> similar in behaviour to ⍤ and ⍣.
>>
>>       ('.at' ⎕R '\u0') 'The cat sat on the mat'
>> The CAT SAT on the MAT
>>
>> It can also accept a function, in which case the function is called for
>> each match, to return a replacement string. Can you explain how to make a
>> quad-function an operator?
>>
>>       ('\w+' ⎕R {⌽⍵.Match}) 'The cat sat on the mat'
>> ehT tac tas no eht tam
>>
>> As you can see, they leverage namespaces in order to pass a lot of
>> different fields to the replace-function. If we want to do something
>> similar, ⍵ would probably have to be the match string, and we'll have to
>> live without the remaining fields.
>>
>> Regards,
>> Elias
>>
>>
>> On 23 September 2017 at 00:08, Juergen Sauermann <
>> juergen.sauerm...@t-online.de <mailto:juergen.sauerm...@t-online.de>>
>> wrote:
>>
>>     Hi,
>>
>>     I have not looked into Dyalog's implementation myself, but if they
>>     have it then we should aim at being as compatible as makes sense.
>>     No problem if some of their capabilities are not supported (please
>>     avoid going over the top in the GNU APL implementation).
>>
>>     Unfortunately ⎕R is already occupied in GNU APL (inherited from
>>     IBM APL2), so some other name(s) are needed.
>>
>>     Before implementing too much in advance, it would be good to present
>>     the intended syntax and semantics on bug-apl and solicit opinions.
>>
>>     /// Jürgen
>>
>>
>>     On 09/22/2017 04:59 PM, Elias Mårtenson wrote:
>>
>>>     I did not know this. I took a look at Dyalog's API and it's not
>>>     possible to implement it fully, as it relies on their object-oriented
>>>     features. However, the basic functionality wouldn't be hard to
>>>     replicate, if that is something that is desired.
>>>
>>>     Jürgen, what is your opinion on this?
>>>
>>>     On 22 September 2017 at 20:21, Jay Foad <jay.f...@gmail.com
>>>     <mailto:jay.f...@gmail.com>> wrote:
>>>
>>>         FYI Dyalog has operators ⎕S (search) and ⎕R (replace) which
>>>         are implemented with PCRE:
>>>
>>>         ('[Aa]..'⎕S'&')'Dyalog APL'
>>>         ┌───┬───┐
>>>         │alo│APL│
>>>         └───┴───┘
>>>         ('red' 'green'⎕R'green' 'blue')'red orange yellow green blue'
>>>         green orange yellow blue blue
>>>
>>>         http://help.dyalog.com/16.0/Content/Language/System%20Functions/r.htm
>>>
>>>         Jay.
>>>
>>>
>>>
>>
>>
>
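Jürgen's advice to implement only the eval_XXX() variants an operator actually supports can be made concrete with a self-contained C++ mock. Only the method names and parameter lists are taken from his list. `Token`, `Value`, `Value_P`, `QuadOperatorBase` and `HypotheticalRegexOperator` are simplified stand-ins invented for this sketch, not GNU APL's real classes. The behaviour of the base class, which rejects any variant that is not overridden, is also an assumption of the sketch.

```cpp
#include <memory>
#include <string>

// Simplified stand-ins (hypothetical, NOT GNU APL's real types).
struct Value { std::string text; };
using Value_P = std::shared_ptr<Value>;
struct Token { bool ok; std::string what; };

// Hypothetical base class: every eval_XXX() variant of a monadic operator
// exists, but by default rejects the call (assumed behaviour).
class QuadOperatorBase
{
public:
    virtual ~QuadOperatorBase() = default;
    virtual Token eval_ALB(Value_P, Token &, Value_P)           { return reject(); }
    virtual Token eval_ALXB(Value_P, Token &, Value_P, Value_P) { return reject(); }
    virtual Token eval_LB(Token &, Value_P)                     { return reject(); }
    virtual Token eval_LXB(Token &, Value_P, Value_P)           { return reject(); }
protected:
    static Token reject() { return Token{false, "signature not supported"}; }
};

// An operator that takes a left function argument LO and an optional left
// value A, but no axis X: only the two axis-free variants are overridden.
class HypotheticalRegexOperator : public QuadOperatorBase
{
public:
    Token eval_LB(Token &LO, Value_P B) override
    { return Token{true, LO.what + " applied to " + B->text}; }

    Token eval_ALB(Value_P A, Token &LO, Value_P B) override
    { return Token{true, A->text + " " + LO.what + " " + B->text}; }
};
```

In this mock, a call with an axis argument falls through to the base-class rejection, so nothing extra has to be written for the unsupported signatures.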
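Elias's proposal above, in which the replacement function receives only the match string as ⍵, corresponds roughly to the following C++ sketch using `std::regex`. `replace_each` is a hypothetical name. Dyalog's ⎕R supports far more (a namespace of match fields, multiple patterns, options), and none of that is modelled here.

```cpp
#include <functional>
#include <regex>
#include <string>

// Replace every match of `pattern` in `text` by whatever `fn` returns when
// given the matched substring (the only "field" passed, as proposed).
std::string replace_each(const std::string &pattern, const std::string &text,
                         const std::function<std::string(const std::string &)> &fn)
{
    const std::regex re(pattern);
    std::string out;
    auto last = text.cbegin();              // end of the previous match
    for (std::sregex_iterator it(text.cbegin(), text.cend(), re), end;
         it != end; ++it)
    {
        out.append(last, (*it)[0].first);   // unmatched text, copied as-is
        out += fn(it->str());               // replacement from the callback
        last = (*it)[0].second;
    }
    out.append(last, text.cend());          // tail after the last match
    return out;
}
```

With `fn` reversing its argument and the pattern `\w+`, this reproduces the `ehT tac tas no eht tam` output of the Dyalog example above. With an upper-casing `fn` and `.at`, it gives `The CAT SAT on the MAT`.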
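Jay's second example shows that Dyalog applies a list of patterns in a single pass. If `'red'`→`'green'` were applied first and `'green'`→`'blue'` afterwards, the first word would end up as `blue`, not `green`. Below is a C++ sketch of single-pass replacement. It assumes the patterns contain no capturing groups of their own, because each pattern is wrapped in one group so that the matching alternative can be identified. When two patterns could match at the same position, the earlier one wins (ECMAScript alternation order). `replace_multi` is hypothetical and ignores ⎕R's special replacement syntax.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Replace several patterns in one left-to-right pass: text produced by a
// replacement is never rescanned. Assumes patterns.size() ==
// replacements.size() and that no pattern has capturing groups of its own.
std::string replace_multi(const std::vector<std::string> &patterns,
                          const std::vector<std::string> &replacements,
                          const std::string &text)
{
    std::string alternation;                 // builds "(p1)|(p2)|..."
    for (std::size_t i = 0; i < patterns.size(); ++i)
        alternation += (i ? "|(" : "(") + patterns[i] + ")";
    const std::regex re(alternation);

    std::string out;
    auto last = text.cbegin();
    for (std::sregex_iterator it(text.cbegin(), text.cend(), re), end;
         it != end; ++it)
    {
        out.append(last, (*it)[0].first);    // unmatched text, copied as-is
        for (std::size_t g = 1; g < it->size(); ++g)
            if ((*it)[g].matched)            // which alternative fired?
            {
                out += replacements[g - 1];
                break;
            }
        last = (*it)[0].second;
    }
    out.append(last, text.cend());
    return out;
}
```

Called with the patterns `red`, `green`, the replacements `green`, `blue`, and the text `red orange yellow green blue`, it returns `green orange yellow blue blue`, matching Jay's output.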
