Hi internals!

While browsing through bugsnet I encountered this SimpleXML issue with 252 
votes: https://bugs.php.net/bug.php?id=54632
TLDR: when you have an XML document (modified a bit from the example in the 
bug tracker):

<?xml version="1.0" encoding="UTF-8" ?>
<a><b id="foo">foo</b>bar</a>

and you load it into SimpleXML, the result of calling 
json_encode($the_simplexml_object) on it is:
{"b":{"@attributes":{"id":"foo"}}}
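
For anyone who wants to check this locally, here is a minimal reproduction 
sketch. It assumes the simplexml and json extensions are loaded; the variable 
names are only illustrative. I left out the expected output on purpose, since 
the point is to compare what json_encode() and var_dump() print on your build.

```php
<?php
// The example document from above.
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8" ?>
<a><b id="foo">foo</b>bar</a>
XML;

$sxe = simplexml_load_string($xml);

// JSON representation of the SimpleXMLElement.
$json = json_encode($sxe);
echo $json, "\n";

// For comparison: the structure var_dump() reports for the same object.
var_dump($sxe);
```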

There are two strange things here:
- Where is a?
- Where is the text for b (and a)?

What's going on here is that json_encode() gives the JSON representation of 
what var_dump() gives you.
This behaviour is perceived as a bug, given the number of votes and the comment 
section.

It's possible to change the JSON encoding without affecting var_dump() or the 
way you access SimpleXML objects.
One comment suggests the following JSON representation for the above XML:
{"a":{"b":{"@attributes":{"id":"foo"},"@text":"foo"},"@text":"bar"}}

This seems reasonable. Let's take a look at how multiple tags are handled right 
now and how that would work for text nodes.
SimpleXML currently handles multiple tags with the same name by placing them in 
an array:
Given: <?xml version="1.0" encoding="UTF-8" ?><a><b id="foo"/><x/><y/><x/></a>
You'll get: {"b":{"@attributes":{"id":"foo"}},"x":[{},{}],"y":{}}

We could do the same for text nodes. Given: <?xml version="1.0" 
encoding="UTF-8" ?><a><b id="foo"/>foo<x/>bar<y/>baz<x/></a>
This could give: {"a":{"b":{"@attributes":{"id":"foo"}},"x":[{},{}],"y":{},
"@text":["foo","bar","baz"]}}
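
To make the idea concrete, here is a rough userland sketch of the "@text" 
representation discussed above. It is not how SimpleXML or json_encode() work 
internally and not a proposed patch: proposed_shape() and proposed_node() are 
hypothetical helper names, the sketch walks the tree through the DOM extension 
(so it assumes ext-dom is available), and it skips whitespace-only text nodes, 
which is my own assumption.

```php
<?php
// Hypothetical helper: wraps the root element under its own tag name.
function proposed_shape(SimpleXMLElement $element): array
{
    $node = dom_import_simplexml($element);
    return [$node->nodeName => proposed_node($node)];
}

// Hypothetical helper: converts one element to the "@attributes"/"@text" shape.
function proposed_node(DOMElement $node): stdClass
{
    // stdClass so that an empty element encodes as {} rather than [].
    $result = new stdClass();

    $attributes = [];
    foreach ($node->attributes as $attribute) {
        $attributes[$attribute->nodeName] = $attribute->nodeValue;
    }
    if ($attributes !== []) {
        $result->{'@attributes'} = $attributes;
    }

    $children = [];
    $texts = [];
    foreach ($node->childNodes as $child) {
        if ($child instanceof DOMElement) {
            // Group child elements by tag name.
            $children[$child->nodeName][] = proposed_node($child);
        } elseif ($child instanceof DOMText && trim($child->nodeValue) !== '') {
            // Assumption: whitespace-only text nodes are ignored.
            $texts[] = $child->nodeValue;
        }
    }

    // A single occurrence stays a plain value, repeated ones become a list.
    foreach ($children as $name => $nodes) {
        $result->{$name} = count($nodes) === 1 ? $nodes[0] : $nodes;
    }
    if ($texts !== []) {
        $result->{'@text'} = count($texts) === 1 ? $texts[0] : $texts;
    }

    return $result;
}

$sxe = simplexml_load_string('<a><b id="foo"/>foo<x/>bar<y/>baz<x/></a>');
echo json_encode(proposed_shape($sxe)), "\n";
// prints {"a":{"b":{"@attributes":{"id":"foo"}},"x":[{},{}],"y":{},"@text":["foo","bar","baz"]}}
```

Note that the sketch groups elements by name and text separately, so the 
relative order of tags and text is not preserved in the output.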

However, this would still not allow reconstructing the document from the JSON, 
as the ordering between tags and text is lost (just as the ordering between 
different tags is lost now).

I'm not sure what the community specifically wants here.
Are there opinions on how this should behave?

Kind regards
Niels

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php
