On Sun, Mar 17, 2019 at 2:52 PM C. Scott Ananian <canan...@wikimedia.org> wrote:
> On Sun, Mar 17, 2019, 9:34 AM Benjamin Eberlei <kont...@beberlei.de>
> wrote:
>
>> It is still a draft but Thomas and I have started working on an RFC and
>> code to update ext/dom to cover the latest standard release:
>> https://wiki.php.net/rfc/dom_living_standard_api - we plan on proposing
>> that soon, maybe you have some feedback.
>
> Updating the DOM extension would be something the Wikimedia Foundation
> would very much like to see happen. It's more complicated than just adding
> some new methods, though: there are significant spec-compliance issues
> with the current code and performance problems too. We've been porting
> code from JS to PHP which (in the JS version) used a good spec-compliant
> DOM implementation, and have been keeping a list of all the crazy bugs and
> workarounds that have been necessary.
>
> Start from the basic fact that the modern DOM requires Node#nodeName to be
> uppercase for HTML elements, and the current code uses all lowercase. It's
> hard to see how that could be addressed without breaking backward compat.
>
> Here are our notes/discussions/etc:
> https://phabricator.wikimedia.org/T215000

That is a really good resource of things we should look at :-)

But it is not fully true that this JavaScript library is DOM spec
compliant. It provides extra features that https://dom.spec.whatwg.org/
doesn't have, such as the "body", "title" and "head" properties on
DOMDocument, or the innerHtml/outerHtml attributes on elements. I didn't
find a spec where these are defined; the HTML spec doesn't mention them
either. The DOM spec also doesn't define HtmlElement; that comes from the
HTML spec.

A way forward that avoids BC breaks, such as uppercase nodeName or
getAttribute returning NULL instead of an empty string as newer
implementations do, could be to allow users to specify which
implementation they want the DOMDocument to follow.

> https://mediawiki.org/wiki/Parsoid/PHP/Help_wanted
> (and there's more where that came from)
> --scott
>
> PS. My personal feeling at this time is that it would be better to put the
> core libxml abstractions in an extension, to allow fast xpath and perhaps
> parse/serialize, but that the actual DOM should be built as a php library
> on top of that, in order to allow rapid changes (the WHATWG is pretty
> actively making additions/changes to the web spec these days) which are
> decoupled from the PHP release cycle.

That would still require libxml to be an extension, which would only
happen in a newer PHP version, so it is not going to help anyone unless
they require that version. I don't think the effort is worth it, though,
compared to just working on the existing ext/dom.
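
To make the idea of letting users choose the implementation a bit more concrete, here is a rough sketch of how such a switch could behave for the two BC-sensitive cases mentioned in this thread (nodeName casing and getAttribute on a missing attribute). It is written in TypeScript only to keep it short and self-contained. The names `DomMode` and `SketchElement` are made up for illustration; nothing here is part of ext/dom or of the draft RFC.

```typescript
// Hypothetical sketch: none of these names exist in ext/dom or in the RFC.
type DomMode = "legacy" | "whatwg";

class SketchElement {
  private localName: string;
  private mode: DomMode;
  private attrs: Map<string, string> = new Map();

  // Assumes localName is passed in lowercase, as an HTML parser would produce it.
  constructor(localName: string, mode: DomMode) {
    this.localName = localName;
    this.mode = mode;
  }

  setAttribute(name: string, value: string): void {
    this.attrs.set(name, value);
  }

  // Legacy mode keeps the lowercase name; WHATWG mode uppercases it,
  // which is what the thread says the modern DOM requires for HTML elements.
  get nodeName(): string {
    return this.mode === "whatwg"
      ? this.localName.toUpperCase()
      : this.localName;
  }

  // Legacy mode returns "" for a missing attribute; WHATWG mode returns null.
  getAttribute(name: string): string | null {
    const value = this.attrs.get(name);
    if (value !== undefined) {
      return value;
    }
    return this.mode === "whatwg" ? null : "";
  }
}

// The same element under the two modes.
const legacy = new SketchElement("div", "legacy");
const modern = new SketchElement("div", "whatwg");
legacy.setAttribute("id", "main");
modern.setAttribute("id", "main");
```

With this shape, existing code keeps seeing lowercase names and empty strings by default, and only callers that opt in to the "whatwg" mode get the spec behaviour. Whether the switch should live on the document, on the parser, or somewhere else is left open here.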