I set up an RFC page for this on wiki.php.net. Here it is:
http://wiki.php.net/rfc/altmbstring

Moriyoshi

2009/7/26 Moriyoshi Koizumi <m...@mozo.jp>:
> Hi there,
>
> I have almost finished an alternative implementation of mbstring that
> uses ICU instead of the exotic libmbfl, in the hope of replacing the
> current one in 5.4 (and possibly 6.0).
>
> Admittedly, there are some known incompatibilities that would need
> extra libraries to resolve, and a number of functions have been
> intentionally removed for simplicity's sake. Still, the frequently
> used functions are fully usable, and they are more standards-compliant
> (e.g. case-insensitive matching).
>
> Any comments are appreciated.
>
> The source is available at the following location:
>
> http://github.com/moriyoshi/mbstring-ng/
>
>
> Implemented functions:
>
> - mb_convert_encoding()
> - mb_detect_encoding()
> - mb_ereg()
> - mb_ereg_replace()
> - mb_internal_encoding()
> - mb_list_encodings()
> - mb_output_handler()
> - mb_parse_str()
> - mb_preferred_mime_name()
> - mb_regex_set_options()
> - mb_split()
> - mb_strcut()
> - mb_strimwidth()
> - mb_stripos()
> - mb_stristr()
> - mb_strlen()
> - mb_strpos()
> - mb_strripos()
> - mb_strrpos()
> - mb_strstr()
> - mb_strtolower()
> - mb_strtotitle()
> - mb_strtoupper()
> - mb_strwidth()
> - mb_substr()
> - mb_substr_count()
>
> Removed functions and the reasons for removing them:
>
> - mb_check_encoding()
>  Not as useful as advertised, period. Validating a string against an
>  encoding is the same as filtering it through a converter whose input
>  and output encodings are identical, so just use mb_convert_encoding().
>
> - mb_convert_case()
>  Use mb_strtoupper(), mb_strtolower() and mb_strtotitle()
>
> - mb_convert_kana()
>  This cannot be made standards-compliant. In addition, part of its
>  functionality is already covered by the Normalizer class of the intl
>  extension, so we need to reconsider carefully what is actually needed
>  here.
>
> - mb_convert_variables()
>  This can be implemented as a script.
>
> - mb_decode_mimeheader(), mb_encode_mimeheader()
>  Not standards-compliant.
>
> - mb_decode_numericentity()
>  Removed in favor of html_entity_decode().
>
> - mb_encode_numericentity()
>  Removed in favor of htmlentities() and htmlspecialchars().
>
> - mb_encoding_aliases()
>  Just unnecessary.
>
> - mb_ereg_match()
>  Use mb_ereg().
>
> - mb_ereg_search(), mb_ereg_search_getpos(), mb_ereg_search_getregs(),
>  mb_ereg_search_init(), mb_ereg_search_pos(), mb_ereg_search_regs() and
>  mb_ereg_search_setpos()
>  I have rarely heard of a script that actively uses these functions.
>  They rely on internal state that is invisible to users, which most
>  likely causes confusion when that state carries over between function
>  calls. They need to be reimplemented as a class.
>
> - mb_eregi()
>  Use mb_regex_set_options() and mb_ereg().
>
> - mb_eregi_replace()
>  I wonder why this function was added in the first place, since
>  passing the 'i' option to mb_ereg_replace() has the same effect.
>
> - mb_detect_order(), mb_get_info(), mb_http_input(), mb_http_output(),
>  mb_language() and mb_substitute_character()
>  ini_set() and ini_get() are your friends, I guess...
>
> - mb_regex_encoding()
>  It is really confusing that the current mbstring keeps two separate
>  default encodings, one for the regex functions and one for everything
>  else. The alternative version unifies these settings, so this function
>  is no longer necessary.
>
> - mb_send_mail()
>  The behavior of this function depends on a pseudo-locale setting
>  called "mbstring.language", which supports only a limited set of
>  locales. Since not everyone can benefit from the function, and most
>  major applications implement their own mail functions, I suppose it
>  is no longer wanted.
>
> - mb_strrchr()
>  Use mb_strrpos().
>
> - mb_strrichr()
>  Use mb_strripos().
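To illustrate the mb_check_encoding() point: a strict round trip from an encoding back to the same encoding rejects malformed input, so it doubles as a validator. Below is a minimal Python sketch of that idea, not mbstring code; is_valid() is a hypothetical helper, and Python's codecs stand in for the ICU converter.

```python
def is_valid(data: bytes, encoding: str) -> bool:
    """Validate `data` by converting it from `encoding` to `encoding`.

    A strict decode rejects malformed byte sequences; re-encoding and
    comparing also rejects input that does not map back byte-for-byte.
    """
    try:
        return data.decode(encoding, errors="strict").encode(encoding) == data
    except UnicodeError:  # covers both UnicodeDecodeError and UnicodeEncodeError
        return False
```

Here is_valid(b"\xc3\x28", "utf-8") is False, because 0x28 is not a valid UTF-8 continuation byte.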
>
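As for mb_convert_variables() being implementable as a script: all it takes is a recursive walk that converts every string leaf. A rough Python sketch follows, with convert_variables() as a hypothetical name and bytes.decode()/str.encode() standing in for mb_convert_encoding(). The sketch takes a single explicit source encoding and does no detection.

```python
from typing import Any


def convert_variables(value: Any, to_encoding: str, from_encoding: str) -> Any:
    """Recursively convert every byte-string leaf between encodings.

    Lists, tuples and dict values are walked; dict keys and non-string
    leaves (ints, None, ...) are returned unchanged.
    """
    if isinstance(value, bytes):
        return value.decode(from_encoding).encode(to_encoding)
    if isinstance(value, list):
        return [convert_variables(v, to_encoding, from_encoding) for v in value]
    if isinstance(value, tuple):
        return tuple(convert_variables(v, to_encoding, from_encoding) for v in value)
    if isinstance(value, dict):
        return {k: convert_variables(v, to_encoding, from_encoding)
                for k, v in value.items()}
    return value
```

For example, convert_variables({"a": b"caf\xe9"}, "utf-8", "latin-1") re-encodes the Latin-1 "café" as UTF-8 and leaves the key alone.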
>
> Known limitations and incompatibilities:
>
> - mb_detect_encoding() no longer works well, due to the inaccuracy of
>  ICU's encoding detection facility.
>
> - The request encoding translator now relies on the SAPI filter, so the
>  name parts of query components are no longer converted.
>
> - The group reference placeholders for mb_ereg_replace() are now
>  $0, $1, $2, ... instead of \0, \1, \2, ....  This could be avoided by
>  implementing our own replacement routine instead of using
>  uregex_replaceAll().
>
> - ILP64 :-p
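Until there is our own replacement routine, existing \N-style replacement strings could be rewritten mechanically into the $N form. A minimal Python sketch, assuming single-digit group references, that any other backslash in the old syntax is literal, and that a literal '\' or '$' must be escaped in ICU replacement text; backslash_refs_to_dollar() is a hypothetical name.

```python
def backslash_refs_to_dollar(replacement: str) -> str:
    """Rewrite \\0..\\9 group references to the $0..$9 form.

    Assumes a backslash followed by a digit is a group reference and any
    other character is literal. Literal '\\' and '$' are escaped with a
    backslash so the result is safe as ICU replacement text.
    """
    out = []
    i = 0
    while i < len(replacement):
        ch = replacement[i]
        nxt = replacement[i + 1] if i + 1 < len(replacement) else ""
        if ch == "\\" and nxt != "" and nxt in "0123456789":
            out.append("$" + nxt)  # \1 -> $1
            i += 2
            continue
        if ch in "\\$":
            out.append("\\" + ch)  # escape characters special to ICU
        else:
            out.append(ch)
        i += 1
    return "".join(out)
```

So a replacement written as <\1> becomes <$1>, and "cost: $5" becomes "cost: \$5".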
>
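To make the request-translator incompatibility concrete: with the filter-based approach only the value of each name=value pair gets converted. The Python sketch below mimics that observable behavior for a urlencoded query string; translate_query() is hypothetical and says nothing about how the SAPI filter itself is written.

```python
from typing import List, Tuple
from urllib.parse import unquote_to_bytes


def translate_query(raw: str, source_encoding: str) -> List[Tuple[bytes, str]]:
    """Split a urlencoded query string, converting values but not names."""
    pairs = []
    for part in raw.split("&"):
        if not part:
            continue  # skip empty segments such as a trailing '&'
        name, _, value = part.partition("=")
        # '+' encodes a space in form data; unquote_to_bytes() leaves it alone.
        name_bytes = unquote_to_bytes(name.replace("+", " "))
        value_bytes = unquote_to_bytes(value.replace("+", " "))
        # The value is decoded from the request encoding; the name stays as
        # raw bytes, mirroring the "names are no longer converted" limitation.
        pairs.append((name_bytes, value_bytes.decode(source_encoding)))
    return pairs
```

With a Shift_JIS request, translate_query("%82%A0=%82%A0", "shift_jis") yields the raw bytes b"\x82\xa0" as the name but the decoded character "あ" as the value.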
>
> Regards,
> Moriyoshi

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php