On September 8, 2002 13:12, Chris Little wrote:
> FWIW, we need to upgrade our regexp engine. The current one (from GNU)
> has a couple of problems that I was aware of. First it is GPL--this is
> the last GPL component in the library. If it were replaced with something
> else, we could license Sword under non-GPL licenses to other entities
> (e.g. Bible societies that don't want to deal with GPL's restrictions) or
> put it out publicly under a license that we write that better meets our
> needs than the GPL. Second (and probably more immediately important) it
> doesn't handle UTF-8.

Wouldn't it make more sense to use UTF-16 than UTF-8 in regular expressions? At least with UTF-16, in most cases, one 16-bit code unit == one symbol, so regular expressions would be more manageable (e.g. what does a dot mean in a regular expression when it is matched against symbols that can be represented in 1, 2, 3 or 4 chars, i.e. bytes?).

Does ICU have regular expression support? I know the regular expression support in Java 1.4 is very nice and uses UTF-16, but alas we can't really use that in Sword unless we come up with a CNNI (C non-native interface :-).

Joel
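
P.S. To make the dot question concrete, here is a rough sketch in Python (not Sword code, and not a proposal for which engine to use). It compares how many storage units a few sample characters need in UTF-8 and UTF-16, and whether a lone `.` covers the whole character when the engine works on raw UTF-8 bytes versus on decoded characters. The helper names (`utf8_len`, `utf16_units`, `dot_matches_whole`) and the sample characters are my own, chosen only for illustration.

```python
import re


def utf8_len(ch: str) -> int:
    """Number of bytes needed to encode one character in UTF-8."""
    return len(ch.encode("utf-8"))


def utf16_units(ch: str) -> int:
    """Number of 16-bit code units needed to encode one character in UTF-16."""
    return len(ch.encode("utf-16-le")) // 2


def dot_matches_whole(ch: str, bytewise: bool) -> bool:
    """Return True if the pattern '.' matches the entire character."""
    if bytewise:
        # A byte-oriented engine sees the UTF-8 bytes, so '.' consumes one byte.
        return re.fullmatch(rb".", ch.encode("utf-8")) is not None
    # A character-aware engine sees code points, so '.' consumes one character.
    return re.fullmatch(r".", ch) is not None


if __name__ == "__main__":
    # Latin a, e-acute, Hebrew alef, euro sign, and one character outside the BMP.
    samples = ["a", "\u00e9", "\u05d0", "\u20ac", "\U00010400"]
    for ch in samples:
        print(
            f"U+{ord(ch):04X}",
            "utf8 bytes:", utf8_len(ch),
            "utf16 units:", utf16_units(ch),
            "bytewise dot:", dot_matches_whole(ch, bytewise=True),
            "char dot:", dot_matches_whole(ch, bytewise=False),
        )
```

With a byte-oriented engine, the dot only covers the whole symbol for plain ASCII. UTF-16 avoids this for everything in the Basic Multilingual Plane, but characters outside it still take two code units (a surrogate pair), which is the "in most cases" caveat.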