Re: [PHP-DEV] Unicode and XML

2008-06-01 Thread Edward Z. Yang
Edward Z. Yang wrote: > My proposal is to introduce a new filter (for the filter extension) > which performs codepoint sanitization appropriate for HTML/XML contexts > (alternatively, this could be an option on the FILTER_DEFAULT filter, > which would be for Unicode strings, I assume). This filter
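
The kind of codepoint sanitization proposed above could look roughly like the following C sketch. The snippet is truncated, so the exact set of code points the filter would reject is not visible here. This sketch assumes the XML 1.0 "Char" production as the rule and U+FFFD as the replacement. The names is_xml10_char and sanitize_xml_codepoints are hypothetical and are not part of the filter extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* XML 1.0 "Char" production:
 *   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
 * Assumption: this is the rule the proposed filter would enforce. */
static int is_xml10_char(uint32_t cp)
{
    return cp == 0x9 || cp == 0xA || cp == 0xD
        || (cp >= 0x20 && cp <= 0xD7FF)
        || (cp >= 0xE000 && cp <= 0xFFFD)
        || (cp >= 0x10000 && cp <= 0x10FFFF);
}

/* Hypothetical helper: replace every disallowed code point in place with
 * U+FFFD (REPLACEMENT CHARACTER) and return how many were replaced. */
static size_t sanitize_xml_codepoints(uint32_t *cps, size_t len)
{
    size_t replaced = 0;
    size_t i;

    assert(cps != NULL || len == 0);
    for (i = 0; i < len; i++) {
        if (!is_xml10_char(cps[i])) {
            cps[i] = 0xFFFD;
            replaced++;
        }
    }
    return replaced;
}
```

Replacing rather than dropping keeps the string length stable and leaves a visible marker where data was altered. A real filter might instead strip the code point or reject the input, which is a policy choice the thread leaves open.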

Re: [PHP-DEV] Unicode and XML

2008-05-29 Thread Edward Z. Yang
Chris Stockton wrote: > I think that internal string handling should be very respectful of the > specification, as you said. Perhaps for code points that are not valid for a > separate specification, protocol, etc., the conversion should be done in the > functions dealing with those formats. Like if extensio

Re: [PHP-DEV] Unicode and XML

2008-05-29 Thread Chris Stockton
I think that internal string handling should be very respectful of the specification, as you said. Perhaps for code points that are not valid for a separate specification, protocol, etc., the conversion should be done in the functions dealing with those formats. Like if extension family xmlfoo does not like

[PHP-DEV] Unicode and XML

2008-05-28 Thread Edward Z. Yang
In PHP 6, incoming user data will automatically be in (unicode) form. (That is, assuming that the JIT functionality for converting gets implemented). One of the implementation details I'd like to consider involves non-XML and/or non-SGML codepoints inside markup. As per the Unicode specification,

Re: [PHP-DEV] unicode and xml extensions

2006-07-24 Thread Rob Richards
IMO, this would probably be the easiest and best way to handle the conversions. Rob Andrei Zmievski wrote: Maybe. An alternate way would be to add a modifier to 's' that makes it accept a converter to use for conversion. if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "s>", &str, &str_len, U

Re: [PHP-DEV] unicode and xml extensions

2006-07-22 Thread Andrei Zmievski
Maybe. An alternate way would be to add a modifier to 's' that makes it accept a converter to use for conversion. if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "s>", &str, &str_len, UG(utf8_conv)) == FAILURE) { return; } This does mean that the caller will have to instantiate t
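
To make concrete what a UTF-8 converter passed this way would have to do before the string reaches libxml, here is a standalone C sketch. It assumes the Unicode string is held as UTF-16 code units, which the snippet does not state. It does not use the Zend API or ICU, and utf16_to_utf8 is a hypothetical helper name, not the converter behind UG(utf8_conv).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode src_len UTF-16 code units as UTF-8 into dst.
 * Returns the number of bytes written, or (size_t)-1 on an unpaired
 * surrogate or when dst_cap is too small. No terminating NUL is written. */
static size_t utf16_to_utf8(const uint16_t *src, size_t src_len,
                            unsigned char *dst, size_t dst_cap)
{
    size_t out = 0, i, need;

    assert(src != NULL && dst != NULL);
    for (i = 0; i < src_len; i++) {
        uint32_t cp = src[i];

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: must be followed by a low surrogate. */
            if (i + 1 >= src_len || src[i + 1] < 0xDC00 || src[i + 1] > 0xDFFF)
                return (size_t)-1;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (src[i + 1] - 0xDC00);
            i++;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return (size_t)-1; /* Low surrogate without a preceding high one. */
        }

        need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (out + need > dst_cap)
            return (size_t)-1;

        if (need == 1) {
            dst[out++] = (unsigned char)cp;
        } else if (need == 2) {
            dst[out++] = (unsigned char)(0xC0 | (cp >> 6));
            dst[out++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else if (need == 3) {
            dst[out++] = (unsigned char)(0xE0 | (cp >> 12));
            dst[out++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            dst[out++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else {
            dst[out++] = (unsigned char)(0xF0 | (cp >> 18));
            dst[out++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            dst[out++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            dst[out++] = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Doing this once at the parameter-parsing boundary, as suggested in the message, would spare each XML function from repeating the conversion by hand.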

Re: [PHP-DEV] unicode and xml extensions

2006-07-22 Thread Marcus Boerger
Hello Andrei, don't we have a char left for UTF-8 (maybe 8)? It would be a case that we will have to use very often, and checking for a string in braces will take some time. Best regards, Marcus Friday, July 21, 2006, 9:39:32 PM, you wrote: > Awesome. > I am planning to add "s(encoding)" supp

Re: [PHP-DEV] unicode and xml extensions

2006-07-22 Thread Andrei Zmievski
I probably won't get to it this weekend. Might have it done during OSCON next week, so it's up to you. -Andrei On Jul 22, 2006, at 6:30 AM, Rob Richards wrote: Andrei Zmievski wrote: Awesome. I am planning to add "s(encoding)" support to parameter parsing, by the way, so getting strings

Re: [PHP-DEV] unicode and xml extensions

2006-07-22 Thread Rob Richards
Andrei Zmievski wrote: Awesome. I am planning to add "s(encoding)" support to parameter parsing, by the way, so getting strings in UTF-8 encoding will be a bit easier. Would probably need to change the relevant portions of your commits. Any idea when this should be ready, or should I just go a

Re: [PHP-DEV] unicode and xml extensions

2006-07-21 Thread Andrei Zmievski
Awesome. I am planning to add "s(encoding)" support to parameter parsing, by the way, so getting strings in UTF-8 encoding will be a bit easier. Would probably need to change the relevant portions of your commits. -Andrei On Jul 21, 2006, at 5:45 PM, Rob Richards wrote: Almost done with

Re: [PHP-DEV] unicode and xml extensions

2006-07-21 Thread Rob Richards
Almost done with DOM (3 more files to go), so hopefully by Monday. This one will need a lot of testing though. Rob Andrei Zmievski wrote: Great! I'll put a slide about this into my talk for OSCON. What're your plans for the rest of the XML extensions? -Andrei

Re: [PHP-DEV] unicode and xml extensions

2006-07-21 Thread Andrei Zmievski
Great! I'll put a slide about this into my talk for OSCON. What're your plans for the rest of the XML extensions? -Andrei On Jul 20, 2006, at 6:15 PM, Rob Richards wrote: Andrei Zmievski wrote: Hey Rob, Looks good. Have you tested the filesystem (filename) related functions with non-ASCI

Re: [PHP-DEV] unicode and xml extensions

2006-07-20 Thread Rob Richards
Andrei Zmievski wrote: Hey Rob, Looks good. Have you tested the filesystem (filename) related functions with non-ASCII filenames? Try making a file called "informaçon.xml" for example, set unicode.filesystem_encoding=utf-8 (or whatever encoding your filesystem uses) and see if you can read it

Re: [PHP-DEV] unicode and xml extensions

2006-07-20 Thread Andrei Zmievski
Hey Rob, Looks good. Have you tested the filesystem (filename) related functions with non-ASCII filenames? Try making a file called "informaçon.xml" for example, set unicode.filesystem_encoding=utf-8 (or whatever encoding your filesystem uses) and see if you can read it. -Andrei On Jul 19,

Re: [PHP-DEV] unicode and xml extensions

2006-07-19 Thread Rob Richards
Andrei Zmievski wrote: Rob, I have not tested the patch, but it looks good to me on cursory overview. I assume it passes your tests? The only comment I have is regarding the usage of 't' and 'T' specifiers. Since you always have to pass binary UTF-8 strings to libxml, we should always use 's'

Re: [PHP-DEV] unicode and xml extensions

2006-07-18 Thread Andrei Zmievski
Rob, I have not tested the patch, but it looks good to me on cursory overview. I assume it passes your tests? The only comment I have is regarding the usage of 't' and 'T' specifiers. Since you always have to pass binary UTF-8 strings to libxml, we should always use 's' specifier and let PHP d

Re: [PHP-DEV] unicode and xml extensions

2006-07-17 Thread Rob Richards
Had some feedback about a problem with the attached file, so here's also a link to the diff. http://www.ctindustries.net/patches/xmlunicode.diff.txt Rob -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php

[PHP-DEV] unicode and xml extensions

2006-07-17 Thread Rob Richards
Attached is a patch for my initial cut for unicode and XML (made against the /ext directory). I started with XMLReader since it was the smallest. The code can probably be optimized a bit, but I want to make sure this is how it should be because the changes made here will be the changes needed f