https://bugs.documentfoundation.org/show_bug.cgi?id=140708

            Bug ID: 140708
           Summary: The REGEX function accepts all (ismx) but one (w)
                    flags and only directly in the regular expression and
                    does not allow all matches to be found at once
           Product: LibreOffice
           Version: 7.0.4.2 release
          Hardware: All
                OS: All
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: medium
         Component: Calc
          Assignee: [email protected]
          Reporter: [email protected]

Description:
Current syntax:
 REGEX(Text;Expression[;[Replacement][;Flags|Occurrence]])
 Flag settings: "g" only (means "Global")
Desired syntax:
 REGEX(Text;Expression[;[Replacement][;Flags][;Occurrence]])
 Flag settings: "g" + "ismxw"

Flag Settings - Description
i - Ignore case (case-insensitive matching)
s - Make . match newline characters too (single-line, "dot all")
m - Make ^ and $ match at the start and end of each line (multi-line)
x - Allow whitespace and comments in the regex
w - Make {\w, \W, \b, \B} follow Unicode rules
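
For readers unfamiliar with inline flags, here is a minimal sketch of the
i, s, m and x settings. It uses Python's re module, not the ICU engine behind
Calc, so it only illustrates the analogous semantics; the sample strings and
variable names are made up for illustration. Python's re has no equivalent of
the "w" flag, so it is not shown.

```python
import re

# i: case-insensitive matching
ignore_case = re.search(r"(?i)fox", "The FOX").group()

# s: "." also matches a newline
dot_all_hit = re.search(r"(?s)a.b", "a\nb") is not None
dot_default_hit = re.search(r"a.b", "a\nb") is not None

# m: ^ and $ match at the start and end of every line
lines = re.findall(r"(?m)^\w+$", "one\ntwo\nthree")

# x: whitespace and "#" comments inside the pattern are ignored
number = re.search(r"""(?x)
    \d+   # integer part
    \.    # decimal point
    \d+   # fractional part
""", "32.3 feet").group()

print(ignore_case, dot_all_hit, dot_default_hit, lines, number)
```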

Steps to Reproduce:
See "Actual Results".

Actual Results:
1. Only one occurrence is extracted: either the first or the one given by
Occurrence. If the Replacement parameter is omitted, the "g" flag is ignored.
2. The flags i, s, m and x all take effect when inserted directly into the
regular expression, as "(?ismx)…" or "(?ismx:…)" (when the corresponding
option is enabled). The one exception is "w".
3. The "w" flag. For example:
=REGEX("The quick (""brown"") fox can’t jump 32.3 feet, right?";"(?w)\b\w+\b";;5)
returns "jump", not "can’t". Why?
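
One possible reading of this result, not confirmed anywhere in this report: if
"w" changes only where \b matches, while \w still excludes the apostrophe and
the period, then "can’t" and "32.3" contain no \w-only run delimited by word
boundaries and are skipped. The sketch below emulates that reading in Python.
Python's re has no "w" flag, so UAX #29 words are roughly approximated by a
hand-written pattern; that pattern and all variable names are assumptions made
for illustration.

```python
import re

text = 'The quick ("brown") fox can’t jump 32.3 feet, right?'

# Default rules: the apostrophe and the period are not word characters,
# so "can’t" and "32.3" are each split in two.
default_words = re.findall(r"\b\w+\b", text)

# Rough stand-in for UAX #29 words: runs of word characters that may be
# joined by a single apostrophe or period.
uax29_like_words = re.findall(r"\w+(?:['’.]\w+)*", text)

# Assumed effect of (?w)\b\w+\b: only words made up entirely of \w
# characters can match between two word boundaries.
assumed_w_matches = [w for w in uax29_like_words if re.fullmatch(r"\w+", w)]

print(default_words[4])      # fifth match under default rules
print(uax29_like_words[4])   # fifth word the report expects
print(assumed_w_matches[4])  # fifth match under the assumed "w" behaviour
```

Under these assumptions the fifth entries are "can", "can’t" and "jump"
respectively, which would be consistent with the value reported above.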


Expected Results:
1. When the "g" flag is set, all occurrences should be returned, including
when no Replacement is given. The "Flags" and "Occurrence" parameters should
be separate.
2. The Flags parameter should accept "g" together with "ismxw".
3. Word boundaries should be recognized, as in the example above, according to
the specification
(https://www.unicode.org/reports/tr29/tr29-33.html#Word_Boundaries).
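
The requested behaviour can be sketched as follows. This is a hypothetical
emulation in Python, not LibreOffice code: the function name regex_fn, its
keyword arguments and the flag table are assumptions made for illustration.
The replacement string uses Python syntax (\1 rather than $1), and "w" is
rejected because Python's re module has no equivalent.

```python
import re

# Hypothetical mapping from the proposed flag letters to Python's re flags.
_FLAG_MAP = {"i": re.IGNORECASE, "s": re.DOTALL, "m": re.MULTILINE, "x": re.VERBOSE}


def regex_fn(text, expression, replacement=None, flags="", occurrence=None):
    """Emulate REGEX(Text;Expression[;[Replacement][;Flags][;Occurrence]])."""
    re_flags = 0
    for letter in flags:
        if letter == "g":
            continue  # handled separately below
        if letter not in _FLAG_MAP:
            raise ValueError(f"unsupported flag: {letter!r}")
        re_flags |= _FLAG_MAP[letter]

    pattern = re.compile(expression, re_flags)
    matches = list(pattern.finditer(text))
    want_all = "g" in flags and occurrence is None
    index = (occurrence or 1) - 1  # Occurrence is 1-based
    in_range = 0 <= index < len(matches)

    if replacement is None:
        # Extraction: "g" returns every match instead of being ignored.
        if want_all:
            return [m.group() for m in matches]
        return matches[index].group() if in_range else None

    # Replacement: "g" replaces every match, otherwise only one occurrence.
    if want_all:
        return pattern.sub(replacement, text)
    if not in_range:
        return text
    m = matches[index]
    return text[:m.start()] + m.expand(replacement) + text[m.end():]


print(regex_fn("a1 B2 c3", r"[a-z]\d", flags="gi"))
print(regex_fn("a1 B2 c3", r"[a-z]\d", flags="i", occurrence=2))
```

With Flags and Occurrence as separate arguments, "gi" and an occurrence number
no longer compete for the same parameter slot.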


Reproducible: Always


User Profile Reset: No



Additional Info:
The purpose of the "w" flag remains unclear. For example, words containing an
accented letter are recognized even with the "w" flag disabled (?-w), while
words like those in the example above are not recognized at all.

-- 
You are receiving this mail because:
You are the assignee for the bug.
_______________________________________________
Libreoffice-bugs mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/libreoffice-bugs
