Hey Joe, I think there are a few issues with the proposal, although I like the general idea. I've had the tab with the RFC open since October... but never looked at it until now :-/. So, a few comments:
- UString as a name. I think I am going to prefer "Text" as a class name. Unicode (and intl/icu) have lots of operators acting on items containing unicode strings. But they are really pieces of text. For example sentences, word break iterators, etc. UString *feels* clunky, and not "standard". If it's going to be part of PHP core, then we should pick a "core" name. (I might prefer String, but that's going to cause a whole lot of issues obviously). - "Needs More Methods" I had a look at the API that that links to, and I miss operators like iterators. Over words, sentences, characters, etc. Basically the functionality of http://docs.php.net/manual/en/class.intlbreakiterator.php, http://docs.php.net/manual/en/class.intlrulebasedbreakiterator.php and http://docs.php.net/manual/en/class.intlcodepointbreakiterator.php I realize intl already immplements, this, but it's really beneficial to have for a "Text" class - especially for replacing functionality where people now look over a string - with a character index. - "Not a full String API Replacement" I would certainly expect more from it than just the UnicodeString API. Perhaps not for a first iteration, but certainly for subsequent versions. Things like transliterations, and specifically iterators would be high on my list. - "Patch" toUpper/toLower, there is a missing one for toTitle - In the code's README: "Note: UString is interchangable with zend strings for method parameters and can be cast for output/conversion to zend strings" How does that work? And what would it convert to? - How are "characters" counted? Is a character a Code Point, or is a character a base character + combining diacritics. In the first form, A + ° is considered as characters, in the second option, just one. For wordwrap, splice, substring, it is really important that only the *full sequence* is considered as a character. And hence, a character really should be the full sequence. The text in "charAt" seems to contradict that, and that is a mistake. In the original PHP 6 we didn't do that due to perormance reasons, but that point is moot now as only people who opt into using "Text" will suffer from this. - "trim" What is a leading or trailing space? Is it just U+0020, or other Unicode defined space characters as well? ( , U+00A0 comes to mind here) - What is "UG(defaultpad)," about? - For the code: - there is some interesting, non standard whitespaceing going on: - { goes on next line after a func decl - sometimes 4 spaces in stead of a tab are used for indentation, - Why is there no __toString() ? - How can other extensions, not really making use of "Text", use there strings (as UTF8 strings f.e.) cheers, Derick On Sat, 28 Feb 2015, Joe Watkins wrote: > Morning internals, > > This is just a quick note to announce my intention to ready this RFC > for voting next week. > > I know I'm a little late maybe, I was real sick most of last week, so > couldn't do anything useful. > > A couple of us intend to fix outstanding issues on github and those > raised here, tidy the RFC and open the vote for 7. > > I would ask anyone interested to scan through this thread and announce > concerns that are not mentioned asap. > > Cheers > Joe > > On Fri, Oct 24, 2014 at 3:01 PM, Chris Wright <daveran...@php.net> wrote: > > > On 24 October 2014 07:03, Joe Watkins <pthre...@pthreads.org> wrote: > > > >> On Thu, 2014-10-23 at 12:54 -0700, Stas Malyshev wrote: > >> > Hi! > >> > > >> > > P.S. u() is a bad name, will break lots of code, i.e. > >> > > >> > Maybe __u()? It's a bit ugly but you're not allowed to use __ so it's > >> safe. > >> > > >> > >> /me cringes ... > >> > >> I wonder how much of a problem it really is, usually when we say some > >> function name is a problem is because of hundreds and hundreds of > >> results on github. > >> > >> If it's a huge problem then we should rename it, if we have to dig > >> around for a single project that's incompatible, or even a handful, then > >> it's not really a problem. > >> > >> Cheers > >> Joe > > > > > > I can see this being something relatively common. While I personally would > > never do it, there are a few reasons I can think of that people *might* do > > it: > > > > - Wrapper for creating <u> HTML output > > - urlencode() shortcut > > - (obviously) various unicode-related things > > > > Searching on codesearch [1] revealed (amongst a few other hits on the > > first page) another interesting use of it in the hhvm test suite [2]. It's > > difficult to search for this because all the available public search > > engines that I know of do fuzzy matching. > > > > Sorry. This sucks, because every other option we have for this is sucks. > > > > On the bright side, anything chosen could always be aliased at the top of > > the file: > > > > use function __u as u; > > > > This also sucks, but it sucks a little bit less because the collisions are > > avoided - or at least, avoided in such a way that the onus is on the user - > > and one can still have the sane name. > > > > First-class support at the syntax level (presumably $foo = u"unicode > > string" since we already have $foo = b"binary string") would IMO be better > > and (hopefully?) a long-term goal, but I am aware that it is - and probably > > should be - outside the scope of the current proposal. > > > > [1] https://searchcode.com/?q=function+u+lang%3Aphp > > [2] > > https://github.com/facebook/hhvm/blob/master/hphp/test/slow/ext_icu/uspoof.php#L13 > > > -- http://derickrethans.nl | http://xdebug.org Like Xdebug? Consider a donation: http://xdebug.org/donate.php twitter: @derickr and @xdebug Posted with an email client that doesn't mangle email: alpine
-- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php