https://bugs.kde.org/show_bug.cgi?id=433924
Bug ID: 433924
Summary: regular expressions unicode support
Product: kate
Version: 20.12.2
Platform: Other
OS: Microsoft Windows
Status: REPORTED
Severity: normal
Priority: NOR
Component: search
Assignee: [email protected]
Reporter: [email protected]
Target Milestone: ---
SUMMARY
Regular expressions in Kate's search are effectively ASCII-only, so they work
poorly for non-Western languages (or indeed any language that uses non-ASCII
letters).
For example, the /\w/ regex only matches ASCII letters, digits, and underscore,
not all Unicode letter characters as it does in most modern programming
languages and editors. /\d/ likewise only covers ASCII digits, not digits in
other scripts.
This makes it exceedingly difficult to write a regex for languages with
non-Latin script, or even languages like French that use a good number of
non-ASCII characters.
POSIX character classes are implemented, but they also only support ASCII
characters. It would be great if Unicode classes/properties were supported.
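
To illustrate the difference being asked for, here is a minimal sketch in Python (not Kate's own code). Python's `re.ASCII` flag stands in for the current ASCII-only behaviour, and the default Unicode mode for the requested one. The sample strings and variable names are made up for illustration. If Kate's search is built on Qt's QRegularExpression (an assumption on my part), its `UseUnicodePropertiesOption` pattern option looks like the analogous switch.

```python
import re

# Hypothetical sample: the same year in ASCII digits and in Arabic-Indic digits.
sample = "2021 \u0662\u0660\u0662\u0661"

# ASCII-only \d: digits from other scripts are ignored.
ascii_digits = re.findall(r"\d+", sample, flags=re.ASCII)

# Unicode-aware \d (Python's default for str patterns): both runs are found.
unicode_digits = re.findall(r"\d+", sample)

# An explicit [a-zA-Z] range is no substitute: accented letters split the words.
french = "cr\u00e8me br\u00fbl\u00e9e"
ascii_range = re.findall(r"[a-zA-Z]+", french)

print(ascii_digits)
print(unicode_digits)
print(ascii_range)
```

With ASCII-only matching, the second number is missed entirely and the French words are broken into fragments at every accented letter.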
STEPS TO REPRODUCE
1. Write any text that contains non-ASCII letters, e.g., "café القهوة"
2. Try to match words using /\w+/
(This is obviously a very simplistic example, but imagine writing any regex
without being able to rely on \w, \d, or [a-zA-Z].)
OBSERVED RESULT
Only the letters `caf` are matched.
EXPECTED RESULT
`café` and `القهوة` should be matched
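
The observed and expected results can be reproduced outside Kate with a short Python sketch. This is only an analogy, assuming that Python's `re.ASCII` flag approximates what Kate currently does and that Python's default Unicode mode approximates the requested behaviour. The variable names are illustrative.

```python
import re
import unicodedata

# Sample text from the steps above, normalized to NFC so that the accented
# letter is a single precomposed code point rather than "e" plus a combining mark.
text = unicodedata.normalize("NFC", "café القهوة")

# ASCII-only \w: corresponds to the observed result.
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)

# Unicode-aware \w (Python's default for str patterns): the expected result.
unicode_words = re.findall(r"\w+", text)

print(ascii_words)
print(unicode_words)
```

In ASCII mode the match stops before the accented letter and the Arabic word is skipped. In Unicode mode both words are matched in full.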
SOFTWARE/OS VERSIONS
Windows:
macOS:
Linux/KDE Plasma:
(available in About System)
KDE Plasma Version:
KDE Frameworks Version:
Qt Version:
ADDITIONAL INFORMATION