On Mon, May 01, 2023 at 09:45:46PM +0200, Francesco Ariis wrote:
> A workaround is to `cp /etc/urlview/system.urlview ~/.urlview` and
> then replace REGEXP with
>     REGEXP (((http|https|ftp|gopher)|mailto):(//)?[^ <>"\t]*|(www|ftp)[0-9]?\.[-a-z0-9.]+)[^ .,;\t\n\r<">\):]?[^, <>"\t]*[^ .,;\t\n\r<">:]
> (i.e. erasing that last “\)”).
This is quite detrimental to the very common case of a URL that's
entirely parenthesised, or one that ends a parenthetical; compare:
  $ tail -n3 text
  Debian#1035358: https://en.wikipedia.org/wiki/Close_Combat_(series)
              vs (https://en.wikipedia.org/wiki/Debian)
              vs  https://en.wikipedia.org/wiki/(You_Gotta)_Fight_for_Your_Right_(To_Party!)
  $ grep -Eio '((http|https|ftp|gopher|gemini|mailto):(//)?[^ <>"]*|(www|ftp)[0-9]?\.[-a-z0-9.]+)[^ .,;<">\):]?[^, <>"]*[^ .,;<">:\)]' text | tail -n3
  https://en.wikipedia.org/wiki/Close_Combat_(series
  https://en.wikipedia.org/wiki/Debian
  https://en.wikipedia.org/wiki/(You_Gotta)_Fight_for_Your_Right_(To_Party!
  $ grep -Eio '((http|https|ftp|gopher|gemini|mailto):(//)?[^ <>"]*|(www|ftp)[0-9]?\.[-a-z0-9.]+)[^ .,;<">\):]?[^, <>"]*[^ .,;<">:]' text | tail -n3
  https://en.wikipedia.org/wiki/Close_Combat_(series)
  https://en.wikipedia.org/wiki/Debian)
  https://en.wikipedia.org/wiki/(You_Gotta)_Fight_for_Your_Right_(To_Party!)
so this trivial solution fixes the (IME rare) case of a URL ending with a ')'
by breaking the much more common one.

It is quite likely something /can/ be cooked here,
but I haven't managed to in a good few minutes of fiddling.
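
One direction that might work, sketched here rather than proposed as the fix:
match with the permissive pattern (the one that keeps a trailing ')'), then
drop trailing ')' characters only while the candidate holds more ')' than '('.
The helper names trim_unbalanced_parens and first_url are made up for this
sketch, C++17 is assumed for std::string_view, and the post-processing step is
an assumption of the sketch: it is a code change, not something a REGEXP line
alone can express.

```cpp
#include <regex.h>

#include <cstddef>
#include <string>
#include <string_view>

// Permissive pattern from the second grep above: the final bracket
// expression does not exclude ')', so a trailing ')' stays in the match.
static const char *const permissive =
    R"RX(((http|https|ftp|gopher|gemini|mailto):(//)?[^ <>"]*|(www|ftp)[0-9]?\.[-a-z0-9.]+)[^ .,;<">\):]?[^, <>"]*[^ .,;<">:])RX";

// Hypothetical helper: drop trailing ')' for as long as the candidate
// contains more ')' than '('.
std::string_view trim_unbalanced_parens(std::string_view url) {
        std::ptrdiff_t surplus = 0;  // count of ')' minus count of '('
        for(char c : url)
                surplus += (c == ')') - (c == '(');
        while(surplus > 0 && !url.empty() && url.back() == ')') {
                url.remove_suffix(1);
                --surplus;
        }
        return url;
}

// Hypothetical helper: the first URL found in `line`, or "" if none.
std::string first_url(const char *line) {
        regex_t rgx;
        if(regcomp(&rgx, permissive, REG_EXTENDED | REG_ICASE))
                return {};

        regmatch_t m;
        std::string out;
        if(!regexec(&rgx, line, 1, &m, 0))
                out = trim_unbalanced_parens({line + m.rm_so, static_cast<std::size_t>(m.rm_eo - m.rm_so)});
        regfree(&rgx);
        return out;
}
```

On the three lines of the test file this keeps Close_Combat_(series) and
(You_Gotta)_Fight_for_Your_Right_(To_Party!) whole and turns Debian) into
Debian. It is still a heuristic: a URL that genuinely ends in an unbalanced
')' would lose it.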

Attaching my test driver.

Best,
наб
static auto regex =
    // R"DUPA(((http|https|ftp|gopher|gemini|mailto):(//)?[^ <>"]*|(www|ftp)[0-9]?\.[-a-z0-9.]+)[^ .,;<">\):]?(([^, <>"]*)|(\([^, <>"]*\)))*[^ .,;<">:\)])DUPA";
    R"DUPA(((http|https|ftp|gopher|gemini|mailto):(//)?[^ <>"]*|(www|ftp)[0-9]?\.[-a-z0-9.]+)[^ .,;<">\):]?[^, <>"]*[^ .,;<">:\)])DUPA";

#include <cstdio>
#include <cstdlib>
#include <initializer_list>
#include <regex.h>


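int main() {
        regex_t rgx;
        if(regcomp(&rgx, regex, REG_EXTENDED | REG_ICASE))
                abort();

        for(auto l : {"Debian#1035358: https://en.wikipedia.org/wiki/Close_Combat_(series)",  //
                      "            vs (https://en.wikipedia.org/wiki/Debian)",                //
                      "            vs  https://en.wikipedia.org/wiki/(You_Gotta)_Fight_for_Your_Right_(To_Party!)"}) {
                regmatch_t matches[20];
                if(regexec(&rgx, l, 20, matches, 0))
                        abort();

                puts(l);
                // Group 0 is the whole match, 1..re_nsub are the subexpressions.
                for(size_t i = 0; i <= rgx.re_nsub; ++i) {
                        // A group that took no part in the match has rm_so == -1;
                        // print it as empty instead of forming the pointer l - 1.
                        if(matches[i].rm_so < 0) {
                                std::printf("%zu: \"\"\n", i);
                                continue;
                        }
                        std::printf("%zu: \"%.*s\"\n", i, (int)(matches[i].rm_eo - matches[i].rm_so), l + matches[i].rm_so);
                }
                puts("");
        }

        regfree(&rgx);
}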
