On Tue, Apr 26, 2016 at 2:06 AM, Yasuo Ohgaki <yohg...@ohgaki.net> wrote:
> Things might have been changed, but as you've mentioned encoding
> detection is unstable and ICU is poor compared to mbstring's detection
> at least for Japanese encodings.
>
For me, the difference is that I expect further work to be done on improving ICU, while I lack that confidence for mbstring. If the API is in place early on, the library can improve underneath it to the point it becomes more trustworthy later, but still be usable on older versions of PHP (linked against newer libicu).
Maybe, I dunno. I lack the motivation to push this feature forward atm, merely because it's not trust-worthy now.

> Developers should not rely on encoding detector, but they should validate
> encoding.
>
I think everyone agrees on that. :)

> Problem is there are cases that developers cannot determine used encoding...
> If we are going to have this API, it would be better to validate string with
> detected encoding by default and disable encoding validation optionally.
> There are cases that developers have to deal with broken string data
> on occasion.
>
What do you have in mind? Full-on pre-request input filtering? 'cause that's never worked right (we tried really hard to make PHP6 do that and it failed badly)

Or do you mean something like wrapping the ucsdet API in a coercer function that only returned the original string if it detected at high confidence and then validated against that detection? 'cause honestly, that should also be left to the application IMO.

-Sara

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
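
For illustration only, the "coercer" idea mentioned above (return the original string only when detection is confident and the bytes validate against the detected encoding) might look roughly like the sketch below. It is written in Python rather than against any PHP or ICU binding, and everything in it is an assumption: the `detect` callable is a hypothetical stand-in for a ucsdet-style detector returning an encoding name plus a 0-100 confidence, and `coerce`, `fake_detect`, and the threshold of 80 are made-up names and values, not a proposed API.

```python
from typing import Callable, Optional, Tuple

# Hypothetical detector signature: bytes -> (encoding name, confidence 0-100).
# It stands in for a ucsdet-style call; it is not a real binding.
Detector = Callable[[bytes], Tuple[str, int]]


def coerce(data: bytes, detect: Detector, min_confidence: int = 80) -> Optional[bytes]:
    """Return data unchanged only if detection is confident and the bytes validate."""
    encoding, confidence = detect(data)
    if confidence < min_confidence:
        return None  # detection too uncertain to trust
    try:
        # Strict decoding doubles as validation against the detected encoding.
        data.decode(encoding, errors="strict")
    except (UnicodeDecodeError, LookupError):
        return None  # bytes are malformed for that encoding, or it is unknown
    return data


def fake_detect(data: bytes) -> Tuple[str, int]:
    # Toy stand-in so the sketch runs: always claims UTF-8 at a fixed confidence.
    return ("utf-8", 90)
```

A caller would treat a `None` result as "could not be trusted" and decide what to do next, which is the part that arguably belongs in the application rather than in the extension.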