These properties come from the Unicode definitions in the file Scripts.txt, 
not from the Go language. It is the same in Perl: \p{Katakana} does not 
match U+30FB or U+30FC, but \p{InKatakana} does. The same goes for U+30A0.
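
To see the difference from Go, here is a minimal sketch that checks the code 
points mentioned above against the script table and against a \p{Katakana} 
regexp class. The helper names are mine. As far as I know Go's standard 
library has no block tables (nothing like Perl's \p{InKatakana}), so the 
block check is a hand-written range for U+30A0..U+30FF, which is an 
assumption of this sketch and not a library feature.

```go
package main

import (
	"fmt"
	"regexp"
	"unicode"
)

// katakanaClass matches exactly one rune with the script property Katakana.
var katakanaClass = regexp.MustCompile(`^\p{Katakana}$`)

// inKatakanaScript reports whether r has the Unicode script property
// Katakana, i.e. whether it is listed as Katakana in Scripts.txt.
func inKatakanaScript(r rune) bool {
	return unicode.Is(unicode.Katakana, r)
}

// inKatakanaBlock reports whether r lies in the Katakana block
// U+30A0..U+30FF. This is a hand-written stand-in for Perl's
// \p{InKatakana}, not something the standard library provides.
func inKatakanaBlock(r rune) bool {
	return r >= 0x30A0 && r <= 0x30FF
}

// matchesKatakanaClass reports whether the regexp class \p{Katakana}
// matches the single rune r.
func matchesKatakanaClass(r rune) bool {
	return katakanaClass.MatchString(string(r))
}

func main() {
	// U+30A0 double hyphen, U+30A2 letter A, U+30FB middle dot,
	// U+30FC prolonged sound mark.
	for _, r := range []rune{0x30A0, 0x30A2, 0x30FB, 0x30FC} {
		fmt.Printf("U+%04X script=%t block=%t regexp=%t\n",
			r, inKatakanaScript(r), inKatakanaBlock(r), matchesKatakanaClass(r))
	}
}
```

Only U+30A2 should report true for the script and regexp checks, while all 
four fall inside the block range.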

Here is the relevant portion of Scripts.txt:

30A1..30FA    ; Katakana # Lo  [90] KATAKANA LETTER SMALL A..KATAKANA LETTER VO
30FD..30FE    ; Katakana # Lm   [2] KATAKANA ITERATION MARK..KATAKANA VOICED ITERATION MARK
30FF          ; Katakana # Lo       KATAKANA DIGRAPH KOTO
31F0..31FF    ; Katakana # Lo  [16] KATAKANA LETTER SMALL KU..KATAKANA LETTER SMALL RO
32D0..32FE    ; Katakana # So  [47] CIRCLED KATAKANA A..CIRCLED KATAKANA WO
3300..3357    ; Katakana # So  [88] SQUARE APAATO..SQUARE WATTO
FF66..FF6F    ; Katakana # Lo  [10] HALFWIDTH KATAKANA LETTER WO..HALFWIDTH KATAKANA LETTER SMALL TU
FF71..FF9D    ; Katakana # Lo  [45] HALFWIDTH KATAKANA LETTER A..HALFWIDTH KATAKANA LETTER N
1B000         ; Katakana # Lo       KATAKANA LETTER ARCHAIC E
1B164..1B167  ; Katakana # Lo   [4] KATAKANA LETTER SMALL WI..KATAKANA LETTER SMALL N


I imagine the reason for this is that U+30FC, KATAKANA-HIRAGANA PROLONGED 
SOUND MARK, isn't specifically a katakana symbol: it can be used with 
either katakana or hiragana (らーめん etc.). U+30FB, although it's called 
KATAKANA MIDDLE DOT, is a punctuation mark and so is not a katakana symbol 
either.

But the Unicode definitions are not easy to work with for people handling 
Japanese text. Generally speaking, if you want to match a Japanese word you 
want to include U+30FC but exclude U+30FB, which is why I made something 
like this:

https://metacpan.org/pod/Lingua::JA::Moji#InKana
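
A rough Go counterpart of that idea could look like the sketch below. It is 
not a port of the Perl module, and the names kanaWord and kanaWords are made 
up for illustration. It assumes a "word" is simply a run of Hiragana-script 
or Katakana-script characters plus U+30FC, which leaves U+30FB out because 
it belongs to neither script.

```go
package main

import (
	"fmt"
	"regexp"
)

// kanaWord is a hypothetical pattern: one or more runes that are in the
// Hiragana script, in the Katakana script, or are U+30FC (the prolonged
// sound mark). U+30FB (the middle dot) is in none of these, so it is
// not matched.
var kanaWord = regexp.MustCompile(`[\p{Hiragana}\p{Katakana}\x{30FC}]+`)

// kanaWords returns every maximal run of kana in s, in order.
func kanaWords(s string) []string {
	return kanaWord.FindAllString(s, -1)
}

func main() {
	// The middle dot between the two words acts as a separator.
	fmt.Println(kanaWords("らーめん・ラーメン"))
}
```

Note that the Katakana script also covers the circled and squared forms 
listed in Scripts.txt (32D0..32FE, 3300..3357), so a stricter matcher might 
want explicit ranges instead of \p{Katakana}.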

