[julia-users] question on utf8 strings

William Macready Mon, 12 Jan 2015 07:58:26 -0800

I've been using Julia for about a month now, and I'm really enjoying the 
language. My thanks to all who've contributed to its development!


I'm developing a parser for first-order logic, and wanted to use the logic 
symbols available in Unicode. I've come across behaviour that I don't 
understand.

In the REPL I define the string
s = "¬(a<b)"
with the Unicode negation symbol (obtained from \neg<tab>) as the first 
character.

As I expected, s[1] returns '¬', but s[2] raises the error
ERROR: invalid UTF-8 character index
  in next at ./utf8.jl:68
  in getindex at string.jl:57
Then s[3] returns '(', which I would have expected at index 2. Similarly, 
length(s) returns 6, but s[6] returns 'b' rather than the final ')'.

Regular expression search also seems to be off by one after the negation 
symbol. For example, with m = match(r"a", s), m.offset is 4 rather than 3.
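
For reference, here is a short script that should reproduce the observations 
above. It is only a sketch: the variable names are illustrative, and the 
values in the comments are the ones reported in this message.

```julia
# Reproduction sketch of the behaviour described above (not a diagnosis).
s = "¬(a<b)"             # '¬' entered in the REPL via \neg<tab>

first_char = s[1]        # '¬', as expected
third_char = s[3]        # '(', which I expected at index 2
sixth_char = s[6]        # 'b', even though length(s) is 6
n = length(s)            # 6

# s[2] raises an error instead of returning a character.
second_failed = try
    s[2]
    false
catch
    true
end

# The regex match offset is also shifted by one after the negation symbol.
m = match(r"a", s)
a_offset = m.offset      # 4, where I expected 3
```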

Is this a known issue, or am I doing something incorrectly?

I'm using: 
Julia Version 0.3.5
Commit a05f87b* (2015-01-08 22:33 UTC)
Platform Info:
  System: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz
  WORD_SIZE: 64
  BLAS: libblas.so.3
  LAPACK: liblapack.so.3
  LIBM: libopenlibm
  LLVM: libLLVM-3.3
