Re: [racket] pregexp to detect Japanese characters

Geoffrey S. Knauth Tue, 16 Sep 2014 07:18:07 -0700

NEVER MIND.  I figured it out:

; Detects Japanese characters.
; contains-japanese-characters? : string -> boolean
(define (contains-japanese-characters? s)
  ; regexp-match? returns #t or #f, matching the contract above
  ; (plain regexp-match would return a list of matches instead).
  (or (regexp-match? #rx"[\u3041-\u3096]" s)  ; Hiragana
      (regexp-match? #rx"[\u30A0-\u30FF]" s)  ; Katakana (Full Width)
      (regexp-match? #rx"[\u3400-\u4DB5\u4E00-\u9FCB\uF900-\uFA6A]" s)  ; Kanji
      (regexp-match? #rx"[\u2E80-\u2FD5]" s)  ; Kanji Radicals
      (regexp-match? #rx"[\uFF5F-\uFF9F]" s)  ; Katakana and Punctuation (Half Width)
      (regexp-match? #rx"[\u3000-\u303F]" s)  ; Japanese Symbols and Punctuation
      (regexp-match? #rx"[\u31F0-\u31FF\u3220-\u3243\u3280-\u337F]" s)  ; Misc. Japanese Symbols/Chars
      (regexp-match? #rx"[\uFF01-\uFF5E]" s)))  ; Alphanumeric and Punctuation (Full Width)
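
For anyone finding this later, here is a minimal sketch of the same check done with a single pattern. The \uXXXX escapes are ordinary Racket string escapes, so the regexp only ever sees the literal characters. The names japanese-char-rx and has-japanese-char? are made up for this example, and the ranges are copied unchanged from the function above. I have not checked them against the current Unicode charts.

```racket
#lang racket/base

;; Hypothetical single-pattern variant: all ranges from the function above
;; merged into one character class. The string reader turns each \uXXXX
;; escape into a character before `regexp` compiles the pattern.
(define japanese-char-rx
  (regexp
   (string-append
    "["
    "\u3041-\u3096"                            ; Hiragana
    "\u30A0-\u30FF"                            ; Katakana (Full Width)
    "\u3400-\u4DB5\u4E00-\u9FCB\uF900-\uFA6A"  ; Kanji
    "\u2E80-\u2FD5"                            ; Kanji Radicals
    "\uFF5F-\uFF9F"                            ; Katakana and Punctuation (Half Width)
    "\u3000-\u303F"                            ; Japanese Symbols and Punctuation
    "\u31F0-\u31FF\u3220-\u3243\u3280-\u337F"  ; Misc. Japanese Symbols/Chars
    "\uFF01-\uFF5E"                            ; Alphanumeric and Punctuation (Full Width)
    "]")))

;; regexp-match? returns #t or #f rather than a list of matches.
(define (has-japanese-char? s)
  (regexp-match? japanese-char-rx s))

(has-japanese-char? "hello")       ; => #f  (plain ASCII)
(has-japanese-char? "こんにちは")   ; => #t  (Hiragana)
(has-japanese-char? "abc カ def")  ; => #t  (one Katakana character is enough)
```

One pattern means one pass over the string instead of up to eight. The separate patterns in the original are easier to read and to adjust range by range.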


On Sep 16, 2014, at 09:22 , Geoffrey S. Knauth <ge...@knauth.org> wrote:

> I'm writing a function to detect Japanese characters in a string. I found 
> this page:
>  
> http://www.localizingjapan.com/blog/2012/01/20/regular-expressions-for-japanese-text/
>  
> So, for example, the example Perl regexp [\x{3041}-\x{3096}] would detect 
> Hiragana characters (as would \p{Hiragana}).  How do I express such a Unicode 
> range with Racket regexps?
>  
> I looked at the docs below and it wasn't obvious to me how to do it.  In 
> other languages there might be, for example, a \xnnnn or \uxxxx construct.
>  
> http://docs.racket-lang.org/reference/regexp.html#%28elem._%28rxex._30%29%29
>  
> --
> Geoffrey S. Knauth | http://knauth.org/gsk
>  
> ____________________
>  Racket Users list:
>  http://lists.racket-lang.org/users

