Re: Searching for combined characters

Dominique Pellé Fri, 17 Aug 2012 20:15:07 -0700

On Fri, Aug 17, 2012 at 9:48 PM, Steffen Daode Nurpmeso
<[email protected]> wrote:
> Hello,
> well - this is my first post here after using vim(1) for so long
> (it's something in between 10 and 12 years, i've forgotten), so
> let me thank the vim(1) developers first -- YYAAAAAAAAAAAAAAAAA!!!
>
> On the unicode@unicode list there was a thread on combining characters,
> (Why no combining-character form for U+00F8?), and it turns out that
> vim(1) isn't capable of performing a normalized search either!?
> E.g., given a file
>
>   |é
>   |é
>   |e
>
> which is (except empty lines stripped)
>
>   |00000000  0a c3 a9 0a 65 cc 81 0a  65 0a 0a                 |....e...e..|
>   |0000000b
>
> then with 'Vi IMproved 7.3 (2010 Aug 15, compiled Jan  7 2011 14:27:00)',
> old but never failed, pretty stripped, but very Unicode friendly,
>
>   /\%xe9\|e\%u0301\|e
>
> finds the first and the last, and
>
>   /[=\%xE9=]
>
> finds the second and the third, which is wrong.
> Searching for \%u0301 will find the second, but \.\%u0301 won't.
> \Ze will also find the second and the third.
>
> Should i update?  Or what is the state of Unicode normalization
> support for searching and replacement?  Will it be implemented?
> Am i missing something?
> Thank you and ciao,
>
> --steffen



Maybe you're interested in this patch:

---
Patch 7.3.259
Problem:    Equivalence classes only work for latin characters.
Solution:   Add the Unicode equivalence characters. (Dominique Pelle)
Files:      runtime/doc/pattern.txt, src/regexp.c, src/testdir/test44.in,
            src/testdir/test44.ok
---

In your example, all 3 lines match with Vim-7.3.633 when I do:

/[[=e=]]

See :help \[==\]
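
For anyone who wants to see why the three lines differ in the first place, here is a small sketch in Python. It is only an illustration and not how Vim implements [[=e=]]. It decodes the bytes from the hex dump above and compares the lines after Unicode normalization. The helper name base_letters is made up for this example, and the assumption that the file is UTF-8 comes from the byte values in the dump.

```python
import unicodedata

# Bytes taken from the hex dump in the quoted message, assumed to be UTF-8,
# with the leading and trailing newlines left out.
raw = bytes.fromhex("c3a90a65cc810a65")
lines = raw.decode("utf-8").split("\n")


def base_letters(s):
    """Hypothetical helper: decompose to NFD, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


for line in lines:
    code_points = [f"U+{ord(c):04X}" for c in line]
    nfc = [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFC", line)]
    print(code_points, "NFC:", nfc, "base:", base_letters(line))
```

The first line is the precomposed U+00E9 and the second is "e" followed by the combining U+0301, so a plain code point comparison treats them as different even though they are canonically equivalent. After NFC (or NFD) they compare equal, and with the combining marks stripped all three lines reduce to a plain "e".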

-- Dominique

