https://bugs.kde.org/show_bug.cgi?id=433924

Bug ID: 433924
Summary: regular expressions unicode support
Product: kate
Version: 20.12.2
Platform: Other
OS: Microsoft Windows
Status: REPORTED
Severity: normal
Priority: NOR
Component: search
Assignee: kwrite-bugs-n...@kde.org
Reporter: peter.verkinde...@gmail.com
Target Milestone: ---

SUMMARY
Regular expressions in Kate search do not work well for non-Western languages, or indeed for any language that uses non-ASCII letters. For example, the /\w/ regex only matches ASCII letters (plus underscore), not all Unicode letter characters as it does in most modern programming languages and editors. /\d/ likewise only covers ASCII digits, not digits in other scripts.

This makes it exceedingly difficult to write a regex for languages with a non-Latin script, or even for languages like French that use a good number of non-ASCII characters.

POSIX character classes are implemented but also only support ASCII characters. It would be great if Unicode classes/properties were supported.

STEPS TO REPRODUCE
1. Write any text that contains non-ASCII letters, e.g., "café القهوة"
2. Try to match words using /\w+/ (obviously a very simplistic example, but imagine writing any regex without being able to use \w, \d, or [a-zA-Z])

OBSERVED RESULT
Only the letters `caf` are matched.

EXPECTED RESULT
`café` and `القهوة` should be matched.

SOFTWARE/OS VERSIONS
Windows:
macOS:
Linux/KDE Plasma:
(available in About System)
KDE Plasma Version:
KDE Frameworks Version:
Qt Version:

ADDITIONAL INFORMATION
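
The difference between the observed and expected results can be illustrated outside Kate. The sketch below uses Python's standard `re` module purely as a stand-in: it is not Kate's regex engine, and the variable names are made up for illustration. The `re.ASCII` flag restricts \w and \d to ASCII, which mirrors the behaviour described in this report, while Python's default mode for text patterns is Unicode-aware and gives the expected result. The sample text is written with escapes so that `é` is the single precomposed code point U+00E9.

```python
import re

# Sample text from the report: "café القهوة" (é is precomposed U+00E9).
text = "caf\u00e9 \u0627\u0644\u0642\u0647\u0648\u0629"

# ASCII-only \w: stands in for the behaviour observed in Kate.
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)

# Unicode-aware \w (Python 3 default for str patterns): the expected behaviour.
unicode_words = re.findall(r"\w+", text)

# The same contrast for \d, using ASCII "42" and Arabic-Indic "٤٢".
digits = "42 \u0664\u0662"
ascii_digits = re.findall(r"\d+", digits, flags=re.ASCII)
unicode_digits = re.findall(r"\d+", digits)

print(ascii_words)     # only "caf" is found
print(unicode_words)   # both words are found
print(ascii_digits)    # only "42" is found
print(unicode_digits)  # both digit runs are found
```

In ASCII mode the match stops at `é` and skips the Arabic word entirely, which is the same outcome as the OBSERVED RESULT above.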