I'd like to make a slight change to the is_digit, is_wordchar, and other is_* ops. Currently, calling these ops at the offset immediately following the last codepoint results in a "get_byte past the end of the buffer (1 of 1)" error. It would be nicer if they simply returned false (0) at this one position. (Going any further than that could still generate the error message.) This would also be consistent with several of the other string ops, which don't return errors just because the offset is at the end of the string.
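To make the proposed behavior concrete, here is a rough model of it in Python rather than PIR. The function names, the use of IndexError, and the handling of negative offsets are all assumptions made for illustration, not the actual op implementation, and Python's str.isdigit is only a stand-in for whatever character-class test the ops really use.

```python
def is_digit(s: str, offset: int) -> bool:
    """Hypothetical model of the proposed is_digit behavior."""
    if offset < 0 or offset > len(s):
        # Assumption: any offset other than the one just past the end
        # is still reported as an error, as it is today.
        raise IndexError(
            f"offset {offset} is outside the string (length {len(s)})"
        )
    if offset == len(s):
        # Proposed change: the offset following the last codepoint
        # simply answers "not a digit" instead of raising.
        return False
    return s[offset].isdigit()


def scan_digits(s: str, offset: int) -> int:
    """Return the offset of the first non-digit at or after `offset`."""
    # No separate end-of-string check is needed, because is_digit
    # returns False at offset == len(s).
    while is_digit(s, offset):
        offset += 1
    return offset
```

With this model, scan_digits("123abc", 0) returns 3 and scan_digits("2024", 0) returns 4, while is_digit("42", 3) still raises, since that offset is more than one position past the end.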
As it is now, I have to put in extra checks for the end of the string when all I want to do is run along a string of digits. Of course, if we had an op to find the offset of the first non-digit/non-word/non-whitespace/etc. codepoint, then I could use that. :-) I'll be glad to write the patch for is_digit and friends if it's appropriate. Otherwise I'll just continue to work around the current behavior with the extra end-of-string checks.

Pm