On Tuesday, March 18, 2003, at 09:55 AM, Austin Hastings wrote:
> To me, this says that there's no real commitment to "doing XML". What there is seems to be a recognition that XML format is regular and comprehensible to others, so writing "XML-like" files becomes popular.
Yep. Which makes things even worse. And this is pretty important stuff.
We do a *lot* of XML parsing here (Cognitivity, that is) and even more "XML-like" parsing. And even with Perl, it's a royal pain. There are P5 XML modules out there which tie into C-based XML libraries... those are quite fast, but fail badly if the XML isn't 100% well-formed, and are largely not extensible for "XML-like" situations. You'd have to rip one up and rewrite it, in C, for every iteration of "-like", which we cannot credibly do.
A native Perl 5 parser can be rigged up fairly easily, but it's *numbingly* slow compared to the C-based modules. I mean, 20-50 times slower, by my guess. The speed issue when importing XML-like data (which we do *very frequently*) is a constant sticking point for us and our clients. Damian's Parse::RecDescent has been a godsend, implementation-wise -- but it of course suffers the same nasty speed issues.
This is a big, big issue, and one that P6 needs to address well, because this is how many businesses will judge it. What I'm hoping, obviously, is that the new P6 regexes -- which will be *perfect* for writing and maintaining our umpteen quite-similar parsing rulesets -- will be fast enough to at least be in the same order of magnitude as a middling C solution. They don't have to be as fast as C, obviously, but they can't be 20x worse.
Why does this matter so much? Because it's a barn door. Even though it's so much easier to write XML-like parsers in Perl than, well, anything else, the speed issue will at some point dictate moving to a non-Perl parsing solution. At which point, the issue becomes how much of the rest of the related system to move into that other solution as well, since it is much cheaper to maintain expertise in one toolset than two. So within a company, it can lead to greater use of Perl -- or abandonment of Perl -- depending on success in this one key area. (I have seen this in action at a number of companies.)
It is therefore critically important that P6 allow easy, fast parsing for XML-like things, not necessarily just XML proper, because that's the way the business winds have been blowing. And P6 needs to support that out of the box. Seriously, it's that important.
MikeL