https://bugs.kde.org/show_bug.cgi?id=433924

            Bug ID: 433924
           Summary: regular expressions unicode support
           Product: kate
           Version: 20.12.2
          Platform: Other
                OS: Microsoft Windows
            Status: REPORTED
          Severity: normal
          Priority: NOR
         Component: search
          Assignee: kwrite-bugs-n...@kde.org
          Reporter: peter.verkinde...@gmail.com
  Target Milestone: ---

SUMMARY

Regular expressions in Kate's search are not Unicode-aware, which makes them
hard to use for non-Western languages (or indeed any language that uses
non-ASCII letters).

For example, /\w/ only matches ASCII letters and digits (plus underscore), not
all Unicode letter characters as it does in most modern programming languages
and editors. /\d/ likewise only covers ASCII digits, not digits in other
scripts. This makes it exceedingly difficult to write a regex for languages
with a non-Latin script, or even for languages like French that use a good
number of non-ASCII characters.

POSIX character classes are implemented, but they also only match ASCII
characters. It would be great if Unicode classes/properties were supported.
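
To illustrate what Unicode property support would select, here is a rough
sketch in Python (not Kate's implementation). It uses the standard
`unicodedata` module to emulate a "letter" property class via the Unicode
general category. The helper names `has_property` and `letter_runs` are
hypothetical and only serve this example.

```python
import unicodedata


def has_property(ch: str, prefix: str) -> bool:
    """Hypothetical helper: True if the general category of `ch`
    starts with `prefix`, e.g. "L" (any letter) or "Nd" (decimal digit)."""
    return unicodedata.category(ch).startswith(prefix)


def letter_runs(text: str) -> list[str]:
    """Hypothetical helper: collect maximal runs of characters whose
    general category is a letter category (Lu, Ll, Lt, Lm, Lo)."""
    runs, current = [], []
    for ch in text:
        if has_property(ch, "L"):
            current.append(ch)
        elif current:
            runs.append("".join(current))
            current = []
    if current:
        runs.append("".join(current))
    return runs


# Same text as in the steps to reproduce, written with escapes so the
# exact code points are unambiguous.
sample = "caf\u00e9  \u0627\u0644\u0642\u0647\u0648\u0629"
words = letter_runs(sample)
```

Unlike /\w/, a pure letter class excludes digits and the underscore, which is
why property classes are useful in addition to a Unicode-aware /\w/.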



STEPS TO REPRODUCE
1. Write any text that contains non-ASCII letters, e.g., "café  القهوة"
2. Try to match words by searching for /\w+/ in regular expression mode

(obviously very simplistic example, but imagine writing any regex without being
able to use \w, \d, or [a-zA-Z])

OBSERVED RESULT

Only the letters `caf` are matched.

EXPECTED RESULT

`café` and `القهوة` should be matched
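
The difference between the observed and the expected result can be reproduced
outside Kate. The following is a minimal sketch in Python, whose `re` module
is Unicode-aware by default for `str` patterns and can be restricted to ASCII
with the `re.ASCII` flag. It only illustrates the two behaviours and says
nothing about how Kate's search is implemented.

```python
import re

# The text from the steps to reproduce, written with escapes so the
# exact code points are unambiguous.
text = "caf\u00e9  \u0627\u0644\u0642\u0647\u0648\u0629"

# ASCII-only matching: corresponds to the observed result.
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)

# Unicode-aware matching: corresponds to the expected result.
unicode_words = re.findall(r"\w+", text)

# The same distinction applies to \d, shown here with Arabic-Indic digits.
digits = "2021 \u0662\u0660\u0662\u0661"
ascii_digits = re.findall(r"\d+", digits, flags=re.ASCII)
unicode_digits = re.findall(r"\d+", digits)

print(ascii_words, unicode_words)
print(ascii_digits, unicode_digits)
```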

SOFTWARE/OS VERSIONS
Windows: 
macOS: 
Linux/KDE Plasma: 
(available in About System)
KDE Plasma Version: 
KDE Frameworks Version: 
Qt Version: 

ADDITIONAL INFORMATION
