I have now released regex-tdfa version 1.1.1 (with GNU anchors! bug
fixes!)
And more interestingly I have release a new package: regex-tdfa.utf8
http://hackage.haskell.org/cgi-bin/hackage-scripts/package/regex-tdfa-utf8
This uses the utf8 decoding in the utf8-string package and a new
"Utf8" new
>From hackage, I suspect you use use
>http://hackage.haskell.org/cgi-bin/hackage-scripts/package/utf8-string
for the utf8 translation?
To get this working with regex-tdfa I need only one very small thing.
You have to wrap the Lazy ByteString in a newtype so the instance can
be different.
In fa
With [Char] and (Seq Char) the text is full unicode.
With ByteString and ByteString.Lazy you are really using
ByteString.Char8 and ByteString.Lazy.Char8
Here is a test (I saved the source file in utf8):
import Text.Regex.TDFA
text = "☮☯♲☢☣☠☃"
regex = "(☢|☣)"
search :: [[String]]
search = text =