I'll implement optional (and not default) support of IDN in filter_var(). Does anyone known if it's better to use libIDN (LGPL) or ICU (custom license deviated from the X license) from a license point of view?
2014-09-19 16:18 GMT+02:00 Chris Wright <[email protected]>: > On 19 September 2014 14:48, Kévin Dunglas <[email protected]> wrote: > > Support of IDN in streams is a must have. > > But there is a lot of other use cases for URL with IDN validation. The > most > > common is probably form validation (test if an user submitted URL has a > > valid format and can be used to create an HTML link...). > > > > I'm ok making IDN validation optional and not used by default until PHP > > natively support IDN in other features such as streams. > > But IDN are used more and more in the wild, and from a user point of > view it > > is disappointing that a valid URL, working in browsers and even > displayed by > > Google Search is not considered as a valid URL by a PHP-based website > using > > filter_var() without a specific flag. > > > > Even some TLD are using non-ASCII characters, exemple: http://旅游气象.中国 > <http://xn--zfv73l7xbp87c.xn--fiqs8s> > > (popular Chinese weather site). > > > > About the library, I've not preference between libidn and icu. If the > > licence is libidn fit better with the PHP one, libidn is probably the > better > > choice. Having a PHP specific implementation of STRINGPREP and Punnycode > > sounds not like a good idea (reinventing the wheel, more code to > maintain). > > > > Chris, is there a chance to have your work on streams merged in PHP 7? > > It's very hacky and PoC at the moment. I've got a bunch of > time-consuming personal things going on right now, but within the next > couple of weeks I will try and polish it up into something > serviceable, maintainable and tested/less likely to explode with > edge-cases and then I'll put it up for discussion. > > I'm also fine if someone else wants to have a crack in the meantime, I > can push my work so far to github early next week when I get access to > the machine. > > I'd certainly like the functionality to be in 7 if it's viable from a > licensing and dependency PoV - I had been holding off bringing it up > to see what happened with the more general unicode support discussion > (which I somewhat lost track of and seems to have died out) as there > was talk of introducing a hard dependency on ICU-or-similar at one > point, which would have made this a no-brainer. > > > What do you thing about the following planning: > > - 5.7 (if exists): add IDN support in filter disabled by default. Use > libidn > > if selected to be used for streams too. > > - 7 (if IDN support for streams is completed): validate IDN by default > (what > > the user expect), add a flag to disable IDN validation. Of course we'll > > update the doc explaining the new behavior. > > > > 2014-09-19 12:28 GMT+02:00 Chris Wright <[email protected]>: > >> > >> On 19 September 2014 10:58, Pierre Joye <[email protected]> wrote: > >> > Hi, > >> > > >> > On Sep 19, 2014 4:03 PM, "Chris Wright" <[email protected]> wrote: > >> >> > >> >> Kévin > >> >> > >> >> On 18 September 2014 21:26, Kévin Dunglas <[email protected]> wrote: > >> >> > Hello, > >> >> > > >> >> > I'm working on enhancing the FILTER_VALIDATE_URL filter ( > >> >> > https://github.com/php/php-src/pull/826). > >> >> > The current implementation does not support validation of > >> >> > internationalized > >> >> > domain names (i.e: http://www.académie-française.fr/ > <http://www.xn--acadmie-franaise-npb1a.fr/> > >> >> > <http://www.xn--acadmie-franaise-npb1a.fr/>). > >> >> > > >> >> > Support of IDN validation can be easily added using ICU's > >> >> > uidna_toASCII() > >> >> > function. > >> >> > > >> >> > Is it acceptable to add a dependency to ICU for ext/filter? > >> >> > Another option is to add a HAVE_ICU constant in main/php_config.h > and > >> >> > to > >> >> > validate IDN only if ICU is present. > >> >> > > >> >> > What strategy is preferred? > >> >> > >> >> I've done some work around this area previously, and all I will say > >> >> is: be careful with what you do with this from a userland PoV. > >> >> > >> >> PHP does not natively support IDN in stream open routines or SSL > >> >> verification routines. It will never support these things without at > >> >> least one of: > >> >> - a core dependency on ICU, libidn or similar > >> >> - moving streams into an extension so a dependency can be introduced > >> >> there (probably not sanely possible) > >> >> - an in-house NAMEPREP implementation (this is the hard part of IDN, > >> >> punycode itself is pretty trivial to implement once you have a > >> >> canonical set of codepoints) > >> >> > >> >> These things can be implemented with *a lot* of boilerplate in > >> >> userland when you have ext/intl, but it's not pretty. libcurl *can* > >> >> support IDN if it was built against libidn, I'm not sure if this is > >> >> currently the case in common distributions or not. Since one almost > >> >> never just validates a URL string, it's usually a precursor to > >> >> attempting to open it, this could lead to some pretty hefty wtfs. > >> >> > >> >> In short, while I'm generally for ext/filter being able to handle > IDN, > >> >> I *do not* believe it should do it implicitly, it should require an > >> >> explicit flag, because it will break *a lot* of code if IDN is > >> >> suddenly treated as valid where it previously wasn't. > >> > > >> > I am really not sure about that especially the enabling by default > part. > >> > > >> > The doc is pretty clear about what this filter supports and allowing > idn > >> > may > >> > break a lot of codes out there. > >> > > >> > From an implementation point of view we may not need ICU to support > IDN. > >> > Windows does not use it and there are license friendly decoder > >> > implementations too. > >> > >> If we can agree on adding a core dependency on <some IDN support lib>, > >> I already have an experimental local branch that adds full IDN support > >> to streams. It's based on libidn but it would be easy enough to swap > >> it out for something else that provides the same functionality. > >> > >> In my (biased) opinion, streams are a far more important element of > >> IDN support. Filter validation is just polish/a nicety on top. > > > > > > > > > > -- > > Kévin Dunglas > > > > http://dunglas.fr > -- Kévin Dunglas Consultant et développeur freelance http://dunglas.fr Tél. : 06 60 91 20 20
