Uff... that was long but very useful, thanks Terry. It's a good contribution
both for our developer team here, to whom I just forwarded your email, and
for the PHP-General archives.
Cheers,
Maxim Maletsky
-----Original Message-----
From: Terrence Chay [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, January 17, 2001 9:41 AM
To: PHP User Group
Subject: [PHP] Re: XML, what is that supposed to do?
on 1/16/00 9:38 AM, Brandon Orther at [EMAIL PROTECTED] wrote:
> I have seen a lot of people ask for XML support for PHP. I was wondering
> what it does that makes it good for PHP.
That's a hard one to answer--I'll try anyway. I'm sorry if it sounds a
bit simple-minded but that's the sort of person I am ;-).
XML stands for eXtensible Markup Language. It's an attempt by both large
corporations (notably Microsoft, Oracle, and IBM) and the standards
organizations (notably the W3C) to create a "lingua franca" for the web.
Understanding it first requires a slightly different perspective on what we
mean by "the internet" and "the web".
First, when most people say "the internet" now, they pretty much mean
"the web" and e-mail. When people say "the web" they mean HTTP and HTML
(with a little SSL thrown in for e-commerce). HTTP is the transport protocol
(how it is delivered) and HTML is the markup language (the message). XML
attempts to replace and supersede HTML without saying anything about HTTP
(though one can assume that most of the delivery will be done via HTTP, much
to the chagrin of many security administrators who depend on firewalls).
XML is a markup language like HTML. Unlike HTML, the markup language is
extensible (basically think of it as saying you can define your own tags and
attributes). This means you can make descriptive tags such as

    <book type="paperback">
        <AUTHOR>Joe Blogs</AUTHOR>
        <TITLE>SATs - How to be beaten by the system</TITLE>
        <SUBJECT>Test preparation</SUBJECT>
    </book>

This looks a lot like HTML but isn't. Interestingly, the tags are
descriptive of the content, which beats the hell out of UN/EDIFACT if you've
ever had to do any work for big business. The other difference is that the
rules are more rigid than HTML's: all tags must close, all attribute values
must be quoted, all reserved characters must be escaped properly, and tag and
attribute names are case sensitive. The text itself is Unicode: every parser
must accept both UTF-8 and UTF-16, and a document with no encoding
declaration or byte order mark is read as UTF-8 (Note: the default used by
PHP seems to be UTF-8, so it is safest to name that charset in the XML
declaration line).
So basically what you have when you are done is a text based
hierarchical data structure that's extensible and machine readable. That's
all XML is.
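
To make the stricter rules concrete, here is a minimal sketch that feeds
the book record and a few HTML-style mistakes to an XML parser. It is written
in Python with its standard library rather than PHP, purely for illustration;
the sample strings and the helper names (is_well_formed, check_all) are made
up for this example.

```python
import xml.etree.ElementTree as ET

# The book record from the text, written on one line.
GOOD = (
    '<book type="paperback">'
    '<AUTHOR>Joe Blogs</AUTHOR>'
    '<TITLE>SATs - How to be beaten by the system</TITLE>'
    '<SUBJECT>Test preparation</SUBJECT>'
    '</book>'
)

# Four habits that HTML browsers tolerate and XML parsers reject.
BAD = {
    "unclosed tag": '<book><AUTHOR>Joe Blogs</book>',
    "unquoted attribute": '<book type=paperback></book>',
    "mismatched case": '<book><AUTHOR>Joe Blogs</author></book>',
    "unescaped ampersand": '<book><TITLE>Q & A</TITLE></book>',
}


def is_well_formed(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def check_all():
    """Map each sample's label to whether it parsed."""
    results = {"good": is_well_formed(GOOD)}
    for label, sample in BAD.items():
        results[label] = is_well_formed(sample)
    return results


if __name__ == "__main__":
    for label, ok in check_all().items():
        print(label, "->", "well-formed" if ok else "rejected")
```

Only the first sample should be accepted; a parser is required to stop at
the first error rather than guess what you meant, which is the opposite of
how browsers treat HTML.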
Now for the things you can do with it. For one, I can use it to serialize
objects in PHP very easily, since an object stored in an XML representation
is just a string to be saved. The WDDX module does this in a standardized
way.
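
As a rough sketch of the serialization idea, the following turns a flat
record into an XML string and back. It is Python rather than PHP, it is NOT
the WDDX format, and the function names (to_xml, from_xml) and the
one-element-per-field layout are assumptions chosen just for this example.

```python
import xml.etree.ElementTree as ET


def to_xml(name, fields):
    """Serialize a flat dict of strings into an XML string.

    Assumes every key is a legal XML element name.
    """
    root = ET.Element(name)
    for key, value in fields.items():
        child = ET.SubElement(root, key)
        child.text = value
    # encoding="unicode" returns a str rather than bytes.
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Rebuild (name, dict) from a string produced by to_xml."""
    root = ET.fromstring(text)
    return root.tag, {child.tag: child.text or "" for child in root}


if __name__ == "__main__":
    record = {"AUTHOR": "Joe Blogs", "TITLE": "Q & A"}
    saved = to_xml("book", record)
    print(saved)
    print(from_xml(saved))
```

Note that the serializer escapes the ampersand for you, so the saved string
is safe to write to a file or a database column and read back later.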
A note about standards. Since XML is extensible, each vocabulary of tags
needs to be specified so that I can communicate with you and we understand
each other. XML is really more like a markup language FORMAT than a language
(or seen another way, it's a standard but not a specification). There are
various specifications and attempts at specifications out there, and they are
usually referred to as DTDs (Document Type Definitions) or schemas. It used
to be that you wrote these as DTDs, whose syntax is inherited from SGML and
is not itself XML, but then some people figured that if XML is so extensible
you should be able to specify your schema in a schema language which is
itself XML. This is known, not surprisingly, as XML Schema. Which represents
another thing you can do with XML: use XML to specify XML data formats.
A useful one for web programmers right now is that you can use XML to
turn XML into other XML formats. This is done through XSLT (eXtensible
Stylesheet Language Transformations), which is available in PHP through a
module built on a processor called Sablotron (Side Note: I couldn't compile
Sablotron 0.50 into PHP yet; it fails during the linking step in Apache and
claims that it can't find some library that is in Expat). Sablotron (like
many XSLT processors) is a little more flexible than that, in that you can
also use it to transform XML into HTML and text. This warrants a bit larger
description...
Basically XSL works by taking an input XML file (we'll call this the
"data store") and using another XML file written using the XSL specification
(we'll call this the "rules file") to create another file in a different XML
format (we'll call this the "presentation file"). Obviously when the
presentation file is in XML, we can chain another rules file to it to make
another presentation file and so on. XSLT processors such as Sablotron allow
us to do just that. Why is this powerful? The best way to see it is through
examples:
(1) Our company builds a search engine that goes out and does a
real-time comparison of 25 separate travel websites. Since every search does
this, we offload the work to a business rules server that runs the
comparison and returns the results. Because we add sites and features
almost at will, this messaging standard had to be extensible. The webserver
has to communicate with this business rules server and understand it. A
stylesheet can ensure that the message that gets sent to the web server is
always in line with what the webserver can understand even if we upgrade our
features on the business rules server.
(2) Furthermore, we have some nasty internal business rules embedded in
our XML data store on the business rules server. An XSLT filter allows us to
remove these internal business rules before delivery. This makes our
business objects resellable to third parties as an application service
without compromising our internal ones and without requiring much coding. We
can use the same XML data store to store private and public information.
(3) The webserver itself needs to parse and deliver the data. That data
may vary on our site vs. a cobranded site. With XSLT you can transform XML
on the fly to XHTML (HTML reformulated as XML) and tack on your presentation
layer (nice little font tags and setting the color and whatnot). Use a
different XSLT stylesheet for a different browser or cobrand, yet the same
data store for all of them. This is called "separating your presentation
from your data". Microsoft calls this 3-tiering, n-tiering, DNA, NetDocs, and
now dotNet. (Well, some of the latter ones are a bit more than just
3-tiering, but the basic idea is intact).
Well I hope you get the idea. I'm sure you can think of other things
such as...
(4) Oracle, Microsoft and others now allow you to query their databases
in XML. So do companies (in my field) such as Apollo and Sabre. These are
hardly compatible, nor do they in any way represent something that is
comfortable to manipulate. An XSLT layer as a data abstraction layer allows
you to transform someone else's standard into an internal one you can
manipulate in a known way.
(5) I mentioned XML is machine readable. Imagine: web pages are not very
machine readable. Need I say more?
I could go on, but I'm not an imaginative fellow.
Let's see. There's also PDF. PDF isn't in XML but there is something
called XSL-FO (formatting objects) which some PDF generators (perhaps the
two that PHP has modules for?) understand. So writing XSL-FO for output for
a PDF generator is "doing XML" also.
Then there is the fact that it is hierarchical data which sometimes,
despite all these neat tools, needs to be parsed and understood. If we had a
standardized API for manipulating it, then all the knowledge gained in
learning the API for, say, Visual Basic on a Windows box could be transferred
to doing it in C++ on AIX, or perhaps PHP? Yes, there are two such standards.
One is an event-driven one (it reads a tag and calls a callback function)
known as "SAX" (Simple API for XML), implemented in PHP on top of Expat
(--with-xml in PHP, which is compiled by default in PHP4). The other reads
the whole thing into a hierarchical (treelike) object structure called the
DOM (Document Object Model), which is implemented in libxml (--with-domxml or
somesuch in PHP).
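
The difference between the two styles is easier to see side by side. The
sketch below uses Python's standard xml.sax and xml.dom.minidom modules as
stand-ins (not the PHP extensions named above); the TagLogger class, the
helper functions and the shortened sample document are invented for the
example.

```python
import xml.dom.minidom
import xml.sax

DOC = (
    '<book type="paperback">'
    '<AUTHOR>Joe Blogs</AUTHOR><TITLE>SATs</TITLE>'
    '</book>'
)


class TagLogger(xml.sax.ContentHandler):
    """SAX style: the parser calls these methods as it meets each tag."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))


def sax_events(text):
    """Stream through the document, recording tag events in order."""
    handler = TagLogger()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.events


def dom_lookup(text):
    """DOM style: load the whole tree, then navigate it at will."""
    dom = xml.dom.minidom.parseString(text)
    root = dom.documentElement
    author = root.getElementsByTagName("AUTHOR")[0]
    return root.getAttribute("type"), author.firstChild.data


if __name__ == "__main__":
    print(sax_events(DOC))
    print(dom_lookup(DOC))
```

With the event style you never hold more than the current tag, so you keep
your own state in the handler; with the tree style you pay for the whole
document in memory but can jump straight to the node you want.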
I personally prefer the DOM version of looking at things (even though
it's a bit slower and chews more memory). Unfortunately the dom-xml in PHP
doesn't much resemble anyone else's DOM (at least not Oracle's or
Microsoft's), it's a bit buggy (for instance, you can't remove a node, nor
can you seem to modify the text in a node) and it chews a whole slew of
memory (much more than I'd expect, and that amount has almost doubled since
they incorporated XPath support with PHP 4.0.4+).
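
For reference, this is roughly what the two operations mentioned above
(removing a node and changing a node's text) look like in an implementation
that follows the W3C DOM method names. It is a sketch using Python's
xml.dom.minidom, not PHP's dom-xml; the edit_book function and the
replacement title are made up for illustration.

```python
import xml.dom.minidom

DOC = (
    '<book type="paperback">'
    '<AUTHOR>Joe Blogs</AUTHOR>'
    '<TITLE>SATs</TITLE>'
    '<SUBJECT>Test preparation</SUBJECT>'
    '</book>'
)


def edit_book(text, new_title):
    """Drop the SUBJECT node and rewrite the TITLE text."""
    dom = xml.dom.minidom.parseString(text)
    root = dom.documentElement

    # Remove a node: ask its parent to detach it.
    subject = root.getElementsByTagName("SUBJECT")[0]
    root.removeChild(subject)

    # Modify text: the text lives in a child text node of TITLE.
    title = root.getElementsByTagName("TITLE")[0]
    title.firstChild.data = new_title

    return root.toxml()


if __name__ == "__main__":
    print(edit_book(DOC, "New title"))
```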
Now a final reason why PHP and XML should go hand in hand. (Because if
you haven't figured out by now, I'm really big on PHP and on XML). A study I
read estimates that by the end of this year over 50% of the Fortune 500 will
be using XML in some "test bed" situation and by 2004 80% of all business
communication on the internet will use XML.
I hope this gives you some motivation to pick up XML and use PHP to do
so. Because the more of us that are doing so, the more developers there will
be working (or pushing others to work) on improving XML support in PHP. With
more robust cool tools like the PHP developers have already given me (and
hopefully will continue to do so), I won't feel so bad about studying
condensed matter physics and neuroscience for the last 9 years instead of
majoring in something real like computer science and learning how to code in
Java and C++.
Take care,
terry chay
--
terry chay, Director of Engineering, <http://www.QIXO.com/>
QIXO /kick.so/ - Integrating Many Travel Web Sites Into One
W: 1.408.394-8102 F:1.408.516.9090 M: 1.408.314.0717
E-Mail: <mailto:[EMAIL PROTECTED]> ICQ: 16069322
PGP Fingerprint: 6DCF 1634 547C 935D 4912 2A44 A4A2 79AB DFFF F110
--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
To contact the list administrators, e-mail: [EMAIL PROTECTED]