I set up an RFC page for this on wiki.php.net. Here it is: http://wiki.php.net/rfc/altmbstring
Moriyoshi

2009/7/26 Moriyoshi Koizumi <m...@mozo.jp>:
> Hi there,
>
> I have almost finished an alternative implementation of mbstring that uses
> ICU instead of the exotic libmbfl, in the hope of replacing the current one
> for 5.4 (and possibly 6.0).
>
> Although there are admittedly some known incompatibilities that need
> extra libraries to resolve, besides a number of missing functions
> that were intentionally removed for simplicity's sake, the frequently used
> functions are fully usable and more compliant with the standard (e.g.
> case-insensitive matches).
>
> Any comments are appreciated.
>
> The source is ready in the following location:
>
> http://github.com/moriyoshi/mbstring-ng/
>
>
> Implemented functions:
>
> - mb_convert_encoding()
> - mb_detect_encoding()
> - mb_ereg()
> - mb_ereg_replace()
> - mb_internal_encoding()
> - mb_list_encodings()
> - mb_output_handler()
> - mb_parse_str()
> - mb_preferred_mime_name()
> - mb_regex_set_options()
> - mb_split()
> - mb_strcut()
> - mb_strimwidth()
> - mb_stripos()
> - mb_stristr()
> - mb_strlen()
> - mb_strpos()
> - mb_strripos()
> - mb_strrpos()
> - mb_strstr()
> - mb_strtolower()
> - mb_strtotitle()
> - mb_strtoupper()
> - mb_strwidth()
> - mb_substr()
> - mb_substr_count()
>
> Removed functions and the reasons behind their removal:
>
> - mb_check_encoding()
> Not as useful as advertised, period. First of all, validating a string
> against an encoding is just the same as filtering it through the
> converter with the same value supplied for the input and output
> encodings. Thus, just use mb_convert_encoding().
>
> - mb_convert_case()
> Use mb_strtoupper(), mb_strtolower() and mb_strtotitle().
>
> - mb_convert_kana()
> This can't be standard-compliant. In addition, part of the
> functionality is already covered by the Normalizer of the intl extension,
> so we need to carefully reconsider what is actually needed here.
>
> - mb_convert_variables()
> This can be implemented as a script.
>
> - mb_decode_mimeheader(), mb_encode_mimeheader()
> Not standard-compliant.
>
> - mb_decode_numericentity()
> Removed in favor of html_entity_decode().
>
> - mb_encode_numericentity()
> Removed in favor of htmlentities() and htmlspecialchars().
>
> - mb_encoding_aliases()
> Just unnecessary.
>
> - mb_ereg_match()
> Use mb_ereg().
>
> - mb_ereg_search(), mb_ereg_search_getpos(), mb_ereg_search_getregs(),
> mb_ereg_search_init(), mb_ereg_search_pos(), mb_ereg_search_regs() and
> mb_ereg_search_setpos()
> I have rarely heard of a script that actively uses these functions. They
> involve an internal state that is not visible to users, which most
> likely causes confusion when it is carried across function calls.
> They need to be reimplemented as a class.
>
> - mb_eregi()
> Use mb_regex_set_options() and mb_ereg().
>
> - mb_eregi_replace()
> I wonder why this function was added in the first place, because giving
> the 'i' option to mb_ereg_replace() works in the same way.
>
> - mb_detect_order(), mb_get_info(), mb_http_input(), mb_http_output(),
> mb_language() and mb_substitute_character()
> ini_set() and ini_get() are your friends, I guess...
>
> - mb_regex_encoding()
> It is really confusing that the current mbstring allows two different
> encoding defaults, one applied to the regex functions and one to the rest.
> Those settings are unified in the alternative version, so this is
> no longer necessary.
>
> - mb_send_mail()
> The behavior of this function relies on the pseudo-locale setting
> called "mbstring.language", which supports just a limited set of
> possible locales. As not everyone can benefit from the function and
> most significant applications implement their own mail functions, I
> suppose this is no longer wanted.
>
> - mb_strrchr()
> Use mb_strrpos().
>
> - mb_strrichr()
> Use mb_strripos().
>
>
> Known limitations and incompatibilities:
>
> - mb_detect_encoding() doesn't work well anymore due to the
> inaccuracy of ICU's encoding detection facility.
>
> - The request encoding translator now takes advantage of the SAPI filter;
> therefore the name parts of the query components are no longer
> converted.
>
> - The group reference placeholders for mb_ereg_replace() are now
> $0, $1, $2... instead of \0, \1, \2. This can be avoided if we
> don't use uregex_replaceAll() and implement our own.
>
> - ILP64 :-p
>
>
> Regards,
> Moriyoshi
>
>

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
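
For illustration only, here is a minimal sketch of two of the replacements suggested in the quoted list: emulating mb_check_encoding() with a same-encoding round trip through mb_convert_encoding(), and emulating mb_strrchr() with mb_strrpos(). The helper names is_valid_encoding_sketch() and strrchr_sketch() are hypothetical and not part of either implementation. The round-trip check assumes the converter alters invalid byte sequences (substituting or dropping them), as the stock mbstring does; whether the ICU-based version behaves the same way is an assumption.

```php
<?php
// Hypothetical helper: emulates mb_check_encoding() by converting the
// string from $encoding to the same $encoding and comparing the result.
// Assumes invalid byte sequences are substituted or dropped by the
// converter, so any change means the input was not valid.
function is_valid_encoding_sketch($str, $encoding)
{
    return mb_convert_encoding($str, $encoding, $encoding) === $str;
}

// Hypothetical helper: emulates mb_strrchr() using mb_strrpos() and
// mb_substr(). Returns the part of $haystack that starts at the last
// occurrence of $needle, or false if $needle does not occur.
function strrchr_sketch($haystack, $needle, $encoding)
{
    $pos = mb_strrpos($haystack, $needle, 0, $encoding);
    if ($pos === false) {
        return false;
    }
    $len = mb_strlen($haystack, $encoding);
    return mb_substr($haystack, $pos, $len - $pos, $encoding);
}

var_dump(is_valid_encoding_sketch("caf\xC3\xA9", 'UTF-8')); // well-formed UTF-8
var_dump(is_valid_encoding_sketch("caf\xE9", 'UTF-8'));     // truncated sequence
var_dump(strrchr_sketch("a/b/c", "/", 'UTF-8'));
```

Both helpers take the encoding explicitly rather than relying on mb_internal_encoding(), so they do not depend on any ini setting.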