Re: [Bug-apl] Regex support

Elias Mårtenson Mon, 02 Oct 2017 01:49:16 -0700

In playing around with this, I realise that the "B" mode is quite useful.
So much so, in fact, that I'm wondering if it's warranted to have a
dedicated quad-function for this specific behaviour.


Here's an example of extracting sequences of 4 lowercase letters:

      {⍵ ⊂⍨ "[a-z]{4}" ⎕RE['B'] ⍵} 'abcdef45abchello9'
┏→━━━━━━━━━━━━━━━━━━━┓
┃"abcd" "abch" "ello"┃
┗∊━━━━━━━━━━━━━━━━━━━┛
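
To pin down the semantics, here is a rough C++ model of what the "B" flag
computes and of how ⊂ consumes the result. It is only a sketch:
match_bitmap and partition are made-up names rather than GNU APL code, and
std::regex is used purely for convenience, so nothing here describes how
⎕RE is actually implemented. Skipping empty matches is my own assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper, not GNU APL code: characters covered by the 1st, 2nd,
// ... match of `pattern` get the values 1, 2, ...; all others stay 0.
std::vector<int> match_bitmap(const std::string& pattern, const std::string& text) {
    std::vector<int> bitmap(text.size(), 0);
    const std::regex re(pattern);
    int match_no = 0;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it) {
        if (it->length() == 0) continue;  // assumption: empty matches are ignored
        ++match_no;
        const std::size_t start = static_cast<std::size_t>(it->position());
        const std::size_t len = static_cast<std::size_t>(it->length());
        for (std::size_t i = start; i < start + len; ++i) bitmap[i] = match_no;
    }
    return bitmap;
}

// Rough model of APL2-style partition (dyadic ⊂) on a character vector:
// characters whose mask is 0 are dropped, and a new piece starts whenever
// the mask value increases.
std::vector<std::string> partition(const std::vector<int>& mask, const std::string& text) {
    assert(mask.size() == text.size());
    std::vector<std::string> pieces;
    int prev = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        assert(mask[i] >= 0);  // negative mask values are not modelled
        if (mask[i] > prev) pieces.emplace_back();
        if (mask[i] != 0) pieces.back().push_back(text[i]);
        prev = mask[i];
    }
    return pieces;
}
```

With these definitions, partition(match_bitmap("[a-z]{4}", s), s) yields
the three pieces shown above for s = "abcdef45abchello9", and the " "
versus " +" cases from the quoted message below give 1 2 3 4 and 1 1 1 1.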

Regards,
Elias

On 2 October 2017 at 16:27, Elias Mårtenson <loke...@gmail.com> wrote:

> Some progress:
>
> The behaviour I described earlier still works, but now has the ability to
> work on N-dimensional arrays of strings, compiling the regex only once and
> then applying it to all the cells.
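
A minimal C++ sketch of the compile-once idea, under loud assumptions:
StringArray is a made-up stand-in for an APL array of strings (shape plus
cells in ravel order), std::regex stands in for whatever library ⎕RE
uses, and returning the first match of each cell is only a placeholder for
the real per-cell result.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Made-up stand-in for an APL array of strings: the shape plus all cells in
// ravel (row-major) order. Not a GNU APL type.
struct StringArray {
    std::vector<std::size_t> shape;
    std::vector<std::string> cells;
};

// Apply one pattern to every cell. Placeholder per-cell result: the first
// match in the cell, or an empty string when there is none.
StringArray first_match_each(const std::string& pattern, const StringArray& in) {
    std::size_t expected = 1;
    for (std::size_t extent : in.shape) expected *= extent;
    assert(expected == in.cells.size());  // the shape must describe the cells

    const std::regex re(pattern);  // compiled exactly once, outside the loop
    StringArray out{in.shape, {}};
    out.cells.reserve(in.cells.size());
    for (const std::string& cell : in.cells) {
        std::smatch m;
        out.cells.push_back(std::regex_search(cell, m, re) ? m.str() : std::string());
    }
    return out;
}
```

The result keeps the shape of the argument, so a 2 by 2 array of strings
comes back as a 2 by 2 array of per-cell results.
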
>
> In addition to this, I have now also added a flag "B" (meaning "bitmap")
> that creates a bitmap of all matches and can be used in conjunction with ⊂
> to split strings by regex.
>
> Here's an example:
>
>       " +" ⎕RE["B"] "this is   a     test"
> ┏→━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
> ┃0 0 0 0 1 0 0 2 2 2 0 3 3 3 3 3 0 0 0 0┃
> ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
>
> This matches any sequence of spaces, and we can easily use ⊂ to split the
> string:
>
>       {⍵ ⊂⍨ 0=" +" ⎕RE["B"] ⍵} "this is   a     test"
> ┏→━━━━━━━━━━━━━━━━━━━━━┓
> ┃"this" "is" "a" "test"┃
> ┗∊━━━━━━━━━━━━━━━━━━━━━┛
>
> However, I'm not sure if the values returned from the function are ideal.
> The idea of the increasing numbers is to be able to differentiate between
> the result of:
>
>       " " ⎕RE["B"] "    "
> ┏→━━━━━━┓
> ┃1 2 3 4┃
> ┗━━━━━━━┛
>
> vs:
>
>       " +" ⎕RE["B"] "    "
> ┏→━━━━━━┓
> ┃1 1 1 1┃
> ┗━━━━━━━┛
>
> Should it be left like this, or should it be done in some other way?
>
> Regards,
> Elias
>
> On 25 September 2017 at 20:10, Juergen Sauermann <
> juergen.sauerm...@t-online.de> wrote:
>
>> Hi Elias,
>>
>> making a quad function an operator is simple if the function argument(s)
>> is/are primitive functions
>> and a little more complicated if not.
>>
>> First of all you have to implement (read: overload) some of the
>> eval_XXX() functions that have function
>> arguments. For monadic operators these eval_XXX() functions are:
>>
>>    virtual Token eval_ALB(Value_P A, Token & LO, Value_P B)
>>    virtual Token eval_ALXB(Value_P A, Token & LO, Value_P X, Value_P B)
>>    virtual Token eval_LB(Token & LO, Value_P B)
>>    virtual Token eval_LXB(Token & LO, Value_P X, Value_P B)
>>
>> where L resp. LO stands for the left function argument. For dyadic
>> operators they are:
>>
>>    virtual Token eval_ALRB(Value_P A, Token & LO, Token & RO, Value_P B)
>>    virtual Token eval_ALRXB(Value_P A, Token & LO, Token & RO, Value_P X, Value_P B)
>>    virtual Token eval_LRB(Token & LO, Token & RO, Value_P B)
>>    virtual Token eval_LRXB(Token & LO, Token & RO, Value_P X, Value_P B)
>>
>> where L resp. LO and R resp. RO stand for the left and right function
>> argument(s), A and B
>> are the value arguments, and X the axis.
>>
>> Not all of them need to be implemented, only those whose function
>> signatures are supported by the operator (mainly in terms of allowing
>> an axis argument X or a left value argument A).
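
To illustrate overriding only the supported signatures, here is a small
self-contained C++ sketch. Token and Value_P below are dummy stand-ins,
not the real GNU APL classes; OperatorBase, Quad_DEMO and the choice to
throw on an unsupported signature are assumptions made for illustration.
Only the eval_XXX names and parameter lists come from the list above.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Dummy stand-ins so the sketch compiles on its own. The real Token and
// Value_P are GNU APL classes and look nothing like this.
struct Token { std::string text; };
using Value_P = int;

// Assumed base class: every signature is rejected unless a subclass
// overrides it.
struct OperatorBase {
    virtual ~OperatorBase() = default;
    virtual Token eval_LB(Token&, Value_P) { return unsupported(); }
    virtual Token eval_ALB(Value_P, Token&, Value_P) { return unsupported(); }
    virtual Token eval_LXB(Token&, Value_P, Value_P) { return unsupported(); }
    virtual Token eval_ALXB(Value_P, Token&, Value_P, Value_P) { return unsupported(); }

protected:
    static Token unsupported() { throw std::runtime_error("signature not supported"); }
};

// Hypothetical monadic operator that allows a left value argument A but no
// axis X, so it overrides eval_LB and eval_ALB and nothing else.
struct Quad_DEMO : OperatorBase {
    Token eval_LB(Token& LO, Value_P B) override {
        return Token{"(" + LO.text + " op) " + std::to_string(B)};
    }
    Token eval_ALB(Value_P A, Token& LO, Value_P B) override {
        return Token{std::to_string(A) + " (" + LO.text + " op) " + std::to_string(B)};
    }
};
```

Calling eval_LXB or eval_ALXB on Quad_DEMO falls through to the base class
and is rejected, which is the sketch's stand-in for an operator that does
not accept an axis.
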
>>
>> If an operator supports defined functions (as opposed to primitive
>> functions) then it will typically
>> implement the operator itself as a macro, which means that the
>> implementation is written in APL
>> rather than in C++ (similar to "magic functions" in NARS). This is needed
>> because primitive functions
>> are atomic (they either succeed or fail, but cannot be continued after a
>> failure) while defined functions
>> (and operators) can continue at the point of interruption after having
>> fixed the values that have caused
>> the fault.
>>
>> Some of the built-in operators in GNU APL have both a primitive
>> implementation (which is used when the function arguments are
>> primitive) and a macro-based implementation (which is used when they
>> are not). This is for performance reasons, so that the ability to take
>> defined functions as arguments does not slow down the cases where the
>> function arguments are primitive.
>>
>> The macro definitions are contained in Macro.def.
>>
>> Please note that in GNU APL functions cannot return functions, which may
>> or may not be a problem
>> in your case, depending on whether the function argument(s) of the
>> ⎕-operator is/are primitive or not.
>> In standard APL you cannot assign a function to a name. The usual
>> work-around is to return a string and ⍎ it.
>>
>> My gut feeling is that if you need function arguments for implementing
>> regular expressions then
>> something has gone in the wrong direction somewhere else.
>>
>> Best Regards,
>> /// Jürgen
>>
>>
>>
>> On 09/25/2017 05:18 AM, Elias Mårtenson wrote:
>>
>>> Dyalog's implementation is much more expressive than what I had proposed.
>>>
>>> There are technical reasons why we have no hope of replicating their
>>> functionality (in particular, GNU APL does not have support for namespaces).
>>>
>>> Their function takes arguments and returns a function, which is a
>>> matcher function that can be reused, which is useful since you'd only
>>> compile the regexp once. Jürgen, how can I make a quad-function behave like
>>> below? It seems to be similar in behaviour to ⍤ and ⍣.
>>>
>>>       ('.at' ⎕R '\u0') 'The cat sat on the mat'
>>> The CAT SAT on the MAT
>>>
>>> It can also accept a function, in which case the function is called for
>>> each match, to return a replacement string. Can you explain how to make a
>>> quad-function an operator?
>>>
>>>       ('\w+' ⎕R {⌽⍵.Match}) 'The cat sat on the mat'
>>> ehT tac tas no eht tam
>>>
>>> As you can see, they leverage namespaces in order to pass a lot of
>>> different fields to the replace-function. If we want to do something
>>> similar, ⍵ would probably have to be the match string, and we'll have to
>>> live without the remaining fields.
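
Calling a function for each match, reduced to the match string only,
could look roughly like this in C++. replace_each and to_upper are
hypothetical helpers, std::regex is a stand-in, and this models neither
Dyalog's ⎕R nor any existing GNU APL function.

```cpp
#include <cassert>
#include <cctype>
#include <functional>
#include <regex>
#include <string>

// Hypothetical helper: copy `text`, replacing each match of `pattern` with
// whatever `fn` returns for the matched string.
std::string replace_each(const std::string& pattern, const std::string& text,
                         const std::function<std::string(const std::string&)>& fn) {
    const std::regex re(pattern);
    std::string out;
    std::string::const_iterator tail = text.begin();
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it) {
        out.append(tail, (*it)[0].first);  // unmatched text before this match
        out += fn(it->str());              // replacement chosen by the callback
        tail = (*it)[0].second;
    }
    out.append(tail, text.end());  // unmatched text after the last match
    return out;
}

// Upper-case a string one byte at a time (ASCII only in practice).
std::string to_upper(const std::string& s) {
    std::string r = s;
    for (char& c : r) c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    return r;
}
```

Passing to_upper with the pattern ".at" corresponds to the first Dyalog
example above, and a callback that reverses its argument with "\\w+"
corresponds to the second.
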
>>>
>>> Regards,
>>> Elias
>>>
>>>
>>> On 23 September 2017 at 00:08, Juergen Sauermann <juergen.sauerm...@t-online.de> wrote:
>>>
>>>     Hi,
>>>
>>>     I have not looked into Dyalog's implementation myself, but if they
>>>     have it then we should aim at being as compatible as makes sense.
>>>     No problem if some of their capabilities are not supported (please
>>>     avoid
>>>     going over the top in the GNU APL implementation)
>>>
>>>     Unfortunately ⎕R is already occupied in GNU APL (inherited from
>>>     IBM APL2),
>>>     so some other name(s) are needed.
>>>
>>>     Before implementing too much in advance, it would be good to
>>>     present the
>>>     intended syntax and semantics on bug-apl and solicit opinions.
>>>
>>>     /// Jürgen
>>>
>>>
>>>     On 09/22/2017 04:59 PM, Elias Mårtenson wrote:
>>>
>>>>     I did not know this. I took a look at Dyalog's API and it's not
>>>>     possible to implement it fully, as it relies on their object
>>>>     oriented features. However, the basic functionality wouldn't be
>>>>     hard to replicate, if that is something that is desired.
>>>>
>>>>     Jürgen, what is your opinion on this?
>>>>
>>>>     On 22 September 2017 at 20:21, Jay Foad <jay.f...@gmail.com> wrote:
>>>>
>>>>         FYI Dyalog has operators ⎕S (search) and ⎕R (replace) which
>>>>         are implemented with PCRE:
>>>>
>>>>         ('[Aa]..'⎕S'&')'Dyalog APL'
>>>>         ┌───┬───┐
>>>>         │alo│APL│
>>>>         └───┴───┘
>>>>         ('red' 'green'⎕R'green' 'blue')'red orange yellow green blue'
>>>>         green orange yellow blue blue
>>>>
>>>>         http://help.dyalog.com/16.0/Content/Language/System%20Functions/r.htm
>>>>
>>>>         Jay.
>>>>
>>>>
>>>>
>>>
>>>
>>
>
