The bug here is in your expectations. Order of attributes is not significant in XML, and no XML application promises to preserve a specific order.
__
"... Three things see no end: A loop with exit code done wrong,
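A semaphore untested, And the change that comes along. ..."
-- "Threes" Rev 1.1 - Duane Elms / Leslie Fish
(http://www.ovff.org/pegasus/songs/threes-rev-11.html)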
That's correct behavior. To do XPaths against namespaced nodes, you must use prefixes and provide a namespace context... or (horribly ugly solution) write the XPaths with wildcards and test namespace URI and localname in predicates.
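A minimal sketch of the prefix-plus-namespace-context approach, using the JDK's built-in JAXP XPath API. The class name, the prefix `p`, and the namespace URI `urn:example:ns` are made up for illustration; the prefix bound in the NamespaceContext need not match whatever prefix (or default namespace) the document itself uses.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class NamespacedXPath {
    // Binds one prefix of our choosing to a namespace URI, for use inside XPath expressions.
    static NamespaceContext bind(String prefix, String uri) {
        return new NamespaceContext() {
            @Override public String getNamespaceURI(String p) {
                if (p == null) throw new IllegalArgumentException("null prefix");
                if (p.equals(prefix)) return uri;
                if (p.equals(XMLConstants.XML_NS_PREFIX)) return XMLConstants.XML_NS_URI;
                return XMLConstants.NULL_NS_URI;
            }
            @Override public String getPrefix(String u) {
                return uri.equals(u) ? prefix : null;
            }
            @Override public Iterator<String> getPrefixes(String u) {
                return uri.equals(u) ? Collections.singletonList(prefix).iterator()
                                     : Collections.<String>emptyIterator();
            }
        };
    }

    static String evaluate(String xml, String expr, NamespaceContext ctx) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // off by default in JAXP; XPath needs it here
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(ctx);
            return xpath.evaluate(expr, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With `<root xmlns="urn:example:ns"><item>hello</item></root>`, the expression `/p:root/p:item` finds the text, while the unprefixed `/root/item` finds nothing, because in XPath 1.0 an unprefixed name only matches elements in no namespace. The wildcard-and-predicate form (`/*[local-name()='root' and namespace-uri()='urn:example:ns']/*[local-name()='item']`) also works, just uglier.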
There are very few parsers, for any language, which can "report all errors" reliably. Hitting a parse error often makes determining whether later code is correct very difficult. It's possible to write parsers which attempt to recover and continue, but there are generally performance costs in doing
Create a subset of the schema-for-schemas which implements your restrictions, and validate the schema document(s) against that?
>
> Value
>
>
> My DOM tree will look like:
>
> + ELEMENT: Root
> + #TEXT:
> + ELEMENT: FirstElement
> + #TEXT: Value
> + #TEXT:
That is correct. Xerces, and the DOM specification, make no assumptions regarding whether your particular application considers the whitespace to be meaningful or n
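To illustrate, here is a sketch (hypothetical class and method names, JDK's built-in parser) showing that the parser keeps those whitespace text nodes, plus one way an application that has decided such whitespace is insignificant might strip it after parsing. The stripping is an application policy, not something the parser or the DOM spec does for you.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class WhitespaceTextNodes {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Application policy: remove text nodes consisting only of whitespace, recursively.
    static void stripWhitespaceOnlyText(Node parent) {
        Node child = parent.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // remember before a possible removal
            if (child.getNodeType() == Node.TEXT_NODE && child.getNodeValue().trim().isEmpty()) {
                parent.removeChild(child);
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                stripWhitespaceOnlyText(child);
            }
            child = next;
        }
    }
}
```

For the Root/FirstElement document above, the Root element starts with three children (text, element, text) and is left with only FirstElement after stripping.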
>I think XPath is implemented by Xalan, not Xerces. You can also look into XPath
>API's such as Jaxen and commons-JXPath.
Just wondering: Does Xerces currently implement the proposed DOM XPath API (http://www.w3.org/TR/DOM-Level-3-XPath/)? And if so, does it do so by invoking Xalan?
I remember
>SAX Exception Content is not allowed in prolog.
This means you have something before the root element which is not permitted to appear there according to the XML spec. Make sure that NOTHING comes before the XML declaration except the optional byte-order mark, and that NOTHING comes between the
[Fatal Error] output.xml:1:40: Content is not allowed in prolog.
You have something other than the Byte Order Mark, the XML Declaration, Processing Instructions, or whitespace before the document's root element. Fix the file so it's well-formed XML.
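A quick way to check a file is a sketch like the following, assuming the JDK's built-in JAXP parser; the class name is hypothetical. It installs a SAX DefaultHandler as the error handler so fatal errors are thrown without also being printed as "[Fatal Error]" lines, and it reports where the parser gave up. Note that even whitespace before the XML declaration is an error.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

final class PrologCheck {
    // Returns null if the text is well-formed XML, otherwise where and why parsing failed.
    static String problem(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            db.setErrorHandler(new DefaultHandler()); // rethrows fatal errors, prints nothing
            db.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ", column " + e.getColumnNumber()
                    + ": " + e.getMessage();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```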
Just to get the stupid question out of the way: Have you tried using the -Xmx (or equivalent) option on your JVM to increase the amount of heap memory it's allowed to use before giving up?
>
> 'wan-gw' >
>
> coming out of the serialization it looks like
>
> wan-gw">
Uhm. An XSLT transformation, even an identity transformation, really shouldn't be preserving entity declarations at all; they aren't part of the XPath/XSLT data model, so they should be expanded on t
> > I removed the "encoding", but am still getting the same result. (The
> > source file is plain old ASCII but also using several of the characters
> > in the range 128-255. I'm not getting any problem with them.)
>
> Why don't you try the encoding appropriate to the characters you use?
Ol
At a quick guess, that sounds more like a problem in how you're building
the InputSource than in the parser itself...
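For example, if the source really is ISO-8859-1, either say so in the XML declaration or hand the parser a byte stream with an explicit encoding on the InputSource. A sketch, assuming the JDK's built-in parser (class and method names are hypothetical). InputSource.setEncoding only matters for byte streams; if you hand the parser a Reader, the characters have already been decoded and the parser disregards any encoding declaration.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class InputSourceEncoding {
    // Parses raw bytes; encodingOverride may be null to let the parser detect the encoding.
    // Returns the root element's text, or null if the bytes could not be decoded or parsed.
    static String rootText(byte[] bytes, String encodingOverride) {
        InputSource src = new InputSource(new ByteArrayInputStream(bytes));
        if (encodingOverride != null) {
            src.setEncoding(encodingOverride); // applies to byte streams only
        }
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            db.setErrorHandler(new DefaultHandler()); // fail quietly on fatal errors
            return db.parse(src).getDocumentElement().getTextContent();
        } catch (Exception e) {
            return null; // e.g. Latin-1 bytes read as (malformed) UTF-8
        }
    }
}
```

Without a declaration or an override, the parser falls back to UTF-8, and a lone Latin-1 byte such as 0xE9 is not valid UTF-8.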
For what it's worth, recent discussion in comp.text.xml suggests that
XMLSpy's validation is somewhat buggy.
Probably the usual problem:
http://xerces.apache.org/xerces2-j/faq-general.html#faq-4
How does this differ from all the (many!) other attempts to come up with a
less verbose mapping of XML content? When that's been tried in the past, it
has generally turned out to be not much more efficient than just processing
compressed XML, sometimes less so... except in those cases where the
au
The first problem to fix is that this isn't a legal namespace name.
Namespace names must be absolute URI references -- which means they have to
start with a scheme, and then should follow that scheme's syntax.
They also use URI character escaping conventions when necessary; I haven't
checked the s
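A rough sanity check for whether a candidate namespace name is an absolute URI can be sketched with java.net.URI. This is an approximation, not a conformance test: java.net.URI implements RFC 2396, which is close to, but not identical with, the current URI syntax, and the class name here is hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class NamespaceNameCheck {
    // True only if the string parses as a URI and has a scheme (i.e. is absolute).
    static boolean looksLikeAbsoluteUri(String name) {
        try {
            return new URI(name).isAbsolute();
        } catch (URISyntaxException e) {
            return false; // e.g. characters that should have been %-escaped
        }
    }
}
```

So `http://www.ncr.com/ocz` and `urn:example:ns` pass, while `www.ncr.com/ocz` (no scheme) and anything with an unescaped space do not.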
On Friday, 01/19/2007 at 05:58 GMT, Mark Goodhand <[EMAIL PROTECTED]> wrote:
> Where is the "absolute URI reference" requirement? Is it an XML
> Namespaces constraint or a Schema constraint?
XML Namespaces errata, after an extended and very painful debate over how
namespaces should be tested fo
> Could you please provide me some info about the impact of using xerces
> regarding to DST (Daylight Saving Time)?
Could you clarify your question? Are you asking whether Xerces is going to
have trouble with the new definition of DST that takes effect this year?
> Joe-user expects an ID that exists in the source document to continue
> to exist in the destination document, especially when the two documents
> use the same schema.
Correction: You expect it. Not everyone does. Not everyone wants it. That's
a clear argument for the DOM not doing it unless told t
Posted my understanding of this into that Jira entry. If anyone has doubts,
checking with the DOM WG rather than taking my word for it would be
perfectly reasonable.
I didn't close the issue because I'm no longer part of the core Xerces
development team, so it shouldn't be my call. The Xerces DOM
Are you sure this isn't just a change in the error message? The document
contains an unexpected end-tag (because the begin-tag is missing, but
there's no way the lexer can determine that). The current message tells you
what end-tag was expected rather than which one was found, but it's
correct.
>The example imports org.w3c.dom.DOMImplementationRegistry.
That API name changed between the working draft and the final DOM Level 3
spec. The current name is indeed
org.w3c.dom.bootstrap.DOMImplementationRegistry, and the example should be
updated.
See http://www.w3.org/TR/DOM-Level-3-Core/java
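A minimal sketch of bootstrapping through the final package name, using the registry and Load & Save interfaces that ship with the JDK; the class and method names are hypothetical. The cast to DOMImplementationLS assumes the implementation returned for the "LS" feature also implements that interface, which holds for the Xerces-derived implementation in the JDK.

```java
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

final class RegistryBootstrap {
    // Finds an implementation supporting Load & Save, creates a document, serializes it.
    static String emptyDocumentAsString(String rootName) {
        try {
            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            DOMImplementation impl = registry.getDOMImplementation("LS");
            if (impl == null) {
                throw new IllegalStateException("no DOM implementation with LS support found");
            }
            Document doc = impl.createDocument(null, rootName, null);
            LSSerializer serializer = ((DOMImplementationLS) impl).createLSSerializer();
            return serializer.writeToString(doc);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e); // registry could not load a source class
        }
    }
}
```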
The standard solution is to structure the data properly, so you have an
element which contains all the information about a person, rather than
relying on sequence of children to imply grouping.
If you must rely on sequence, basically you're writing a simple FSM that
accumulates data, acting on that accum
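A sketch of such a state machine as a SAX handler, assuming a hypothetical flat layout where each name element is followed by the matching age element. The element names, the class name, and the String[] record type are all made up for illustration. Text is accumulated across characters() calls, since SAX may deliver one text node in several chunks.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical input: <people><name>..</name><age>..</age><name>..</name>...</people>
final class FlatSequenceGrouper extends DefaultHandler {
    private final List<String[]> people = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private String pendingName; // state: saw a <name>, still waiting for its <age>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0); // start collecting this element's text afresh
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called several times per text node
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("name".equals(qName)) {
            pendingName = text.toString();
        } else if ("age".equals(qName) && pendingName != null) {
            people.add(new String[] {pendingName, text.toString()}); // record complete
            pendingName = null;
        }
    }

    static List<String[]> group(String xml) {
        try {
            FlatSequenceGrouper handler = new FlatSequenceGrouper();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.people;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```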
> if there have been already some standard solutions from SAX,
No off-the-shelf libraries that I know of at the SAX level, since it's
usually easy to hand-code.
There are also data binding tools, which focus on parsing XML into
application-specific data structures. Those do generally produce str
>We would like to maintain the HTML text in the XML file.
Don't think of it as text, then; think of it (and build it as) XML document
structure. Note that you will have to use XHTML rather than HTML (or some
approximation of XHTML), or your file will almost certainly be ill-formed
and hence not u
If end tags are missing, the data simply isn't XML and you shouldn't expect
XML tools to handle it. Part of the point of moving from SGML to XML was
precisely to drive folks toward writing well-formed documents rather than
trying to guess past their errors.
If it's HTML (which is based on SGML),
Are you sure you're building a namespace-aware DOM?
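A quick test, sketched with the JDK's JAXP factory (the class name is hypothetical): DocumentBuilderFactory is not namespace-aware unless you ask, and a DOM built without it reports null from getNamespaceURI() even for elements that are clearly in a namespace.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class NamespaceAwareCheck {
    static String rootNamespace(String xml, boolean namespaceAware) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(namespaceAware); // JAXP default is false
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getNamespaceURI();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```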
Per the W3C spec, the DOM API does not promise to support multithread
access. Doing so would impose performance/implementation limitations on all
applications, even single-threaded ones, which was considered undesirable.
Also, in most cases what you really need is threadsafety at a much higher
lev
Right. I wouldn't expect nodelists to be threadsafe, but do we reuse
nodelists, or would independent calls to retrieve the same nodelist return
separate (non-entangled) objects?
I would expect the latter, since there could be several nodelist accesses
in progress at once even for a single thread.
http://www.w3.org/DOM/faq.html#SAXandDOM
When proposing changes to Nodelist, make sure you've considered its "live
view" semantics, where changes to the model are immediately visible in the
list. That behavior seriously complicates implementing this interface, and
it's required for a correct DOM implementation. The Xerces DOM has
experim
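To make the "live view" point concrete, a short sketch against the JDK's built-in DOM (hypothetical class name): the same NodeList object returned by getElementsByTagName reflects an element appended after the list was obtained.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class LiveNodeListDemo {
    // Returns {length before, length after} for one NodeList object across a tree change.
    static int[] lengthsAcrossAppend() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("root");
            doc.appendChild(root);
            root.appendChild(doc.createElement("item"));

            NodeList items = doc.getElementsByTagName("item"); // a live view, not a snapshot
            int before = items.getLength();
            root.appendChild(doc.createElement("item"));      // modify the tree afterwards
            int after = items.getLength();                     // same object sees the change
            return new int[] {before, after};
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```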
The DOM is not promised to be threadsafe.
The Xerces DOMs are not designed to be threadsafe; they are designed to be
reasonably fast and compact.
The fact that Node read access happens to be safe in one of our DOM
implementations does not require that Nodelist also be made safe -- and in
fact it
>Would you be so kind as to provide me a rough estimate of the man hours
>that expended in developing the XML Parser
Probably not possible, but it's a significant number of man-years.
Xerces started off as an early prototype of IBM's XML4J parser, which went
through several complete redesigns and
As far as XML is concerned, there is absolutely no semantic difference
between a character and its Numeric Character Reference form. XML parsers
generally discard this distinction; XML serializers generally write out the
character unless the encoding can't represent it (forcing the numeric
form).
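A sketch showing both halves of that, using the JDK's parser and identity Transformer (class and method names are hypothetical; the euro sign is just a convenient character outside ISO-8859-1). The reference and the literal character parse to the same text, and the serializer falls back to a numeric reference only when the output encoding can't represent the character.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class CharRefRoundTrip {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Identity-serializes the document in the given encoding and decodes the bytes back.
    static String serialize(Document doc, String encoding) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.ENCODING, encoding);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return new String(out.toByteArray(), encoding);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```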
As you said: Your original document is being parsed into Xalan's internal
DTM data model, which is MUCH more compact than the more common
Java-object-based DOM implementation. When you import it into a DOM, it's
likely to get larger; depending on the structure of the document and the
details of th
The DOM API does not promise threadsafety. If that's an issue for you,
perform application-level locking on access to the DOM.
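One way to do that locking is to route every DOM access through a single object, as in this sketch; GuardedDom and its methods are hypothetical names. It uses a plain mutex rather than a read/write lock, on the assumption that even read operations may update internal caches in some DOM implementations (NodeList bookkeeping, for instance), and each update() call holds the lock for a whole group of related changes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;

// Hypothetical wrapper: all access to the Document goes through one lock.
final class GuardedDom {
    private final Document doc;
    private final Object lock = new Object();

    GuardedDom(Document doc) {
        this.doc = doc;
    }

    // Reads take the lock too: the DOM makes no promise that reads are safe.
    <T> T read(Function<Document, T> query) {
        synchronized (lock) {
            return query.apply(doc);
        }
    }

    // One call = one "transaction": a group of related changes made atomically.
    void update(Consumer<Document> changes) {
        synchronized (lock) {
            changes.accept(doc);
        }
    }

    static GuardedDom newEmpty(String rootName) {
        try {
            Document d = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            d.appendChild(d.createElement(rootName));
            return new GuardedDom(d);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Usage demo: several threads append children concurrently; returns the final child count.
    static int appendConcurrently(int threads, int perThread) {
        GuardedDom guarded = newEmpty("root");
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    guarded.update(d -> d.getDocumentElement().appendChild(d.createElement("item")));
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return guarded.read(d -> d.getDocumentElement().getChildNodes().getLength());
    }
}
```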
XML Schema is an XML language. You should be able to use standard DOM or
SAX programming techniques to build a Schema document.
(I don't know whether the Xerces Schema-specific data model can be
serialized back to XML syntax. If it can, that might be another approach --
but would be nonportable.)
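A sketch of the DOM approach, assuming the JDK's JAXP DOM and validation APIs; class and element names are hypothetical. The explicit xmlns:xs attribute is there because the QName value "xs:string" has to be resolved through a namespace declaration, and createElementNS does not add one for you. Compiling the result with SchemaFactory is a cheap way to confirm the generated document is an acceptable schema.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;

final class SchemaViaDom {
    static final String XS = XMLConstants.W3C_XML_SCHEMA_NS_URI;

    // Builds <xs:schema><xs:element name="..." type="xs:string"/></xs:schema> with plain DOM calls.
    static Document buildSchema(String elementName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element schema = doc.createElementNS(XS, "xs:schema");
            // Declare the xs prefix explicitly; the "xs:string" value below depends on it.
            schema.setAttributeNS(XMLConstants.XMLNS_ATTRIBUTE_NS_URI, "xmlns:xs", XS);
            doc.appendChild(schema);
            Element decl = doc.createElementNS(XS, "xs:element");
            decl.setAttributeNS(null, "name", elementName);
            decl.setAttributeNS(null, "type", "xs:string");
            schema.appendChild(decl);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Fails loudly if the generated document is not an acceptable schema.
    static Schema compile(Document schemaDoc) {
        try {
            return SchemaFactory.newInstance(XS).newSchema(new DOMSource(schemaDoc));
        } catch (SAXException e) {
            throw new IllegalStateException("schema rejected", e);
        }
    }

    static boolean isValid(Schema schema, String instanceXml) {
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(instanceXml)));
            return true;
        } catch (SAXException e) {
            return false; // validation error
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```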
For what it's worth, 0xF7 is one of the characters which both the XML 1.0
and 1.1 recommendations suggest should be avoided by document authors: "...
They are either control characters or permanently undefined Unicode
characters" (http://www.w3.org/TR/REC-xml/#charsets)
>I don't see 0xF7 in that list.
You're right. Apparently I've still got a bit of dyslexia; I managed to
misread #x7F as #xF7. Sorry about the confusion.
("Caution: To avoid damage to reputation, engage brain before putting
fingers in gear.")
>I believe, this is a critical shortcoming in DOM (i.e., not able to
>create/modify internal DTD subset in standard DOM). Is there some
>place, where I can ask for this functionality in DOM?
DOM Level 3 considered adding DTD/schema support, but in the end the
validation module was all that surviv
7F is a legal XML 1.0 character and XML 1.0 should accept it. And, yes, I
believe that in UTF8 (are you SURE you're reading the file as UTF8 rather
than some other encoding?) it should be a legitimate single byte.
However, the XML 1.0 spec's section 2.2 says "Document authors are
encouraged to av
In XML, a stand-alone & character must be escaped to keep it from being
interpreted as introducing an entity reference or numeric character
reference. See http://www.w3.org/TR/xml/#syntax
Fix your input document.
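If whoever generates the document is writing the text out by hand, a minimal escaping helper for element content might look like this (hypothetical names). Attribute values would additionally need the quote character escaped; escaping > is only strictly required after "]]" but is harmless elsewhere.

```java
final class XmlTextEscaper {
    // Escapes characters that would otherwise be read as markup in element content.
    static String escapeText(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break; // would start an entity/char reference
                case '<': out.append("&lt;"); break;  // would start a tag
                case '>': out.append("&gt;"); break;  // guards against a literal "]]>"
                default: out.append(c);
            }
        }
        return out.toString();
    }
}
```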
If you're working with XML tools, these entity references and numeric
character references WILL be expanded.
If that isn't what you intended, your document should have escaped the &
character.
&#13; is the carriage return character. Some systems use the &#13;&#10;
sequence to break lines (MS systems among others); some just use &#10;
(Unix systems, among others), and there are a few rare cases that use
something else. XML parsers are able to tolerate any of these on input and
will convert them all into &#10;.
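A sketch of that normalization as seen through the JDK's parser (hypothetical class name). A carriage return written as the character reference &#13; is not subject to line-end handling, so it survives into the parsed text.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

final class LineEndNormalization {
    // Parses the text and returns the root element's character content.
    static String rootText(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Literal CR LF and lone CR in `<a>x\r\ny\rz</a>` both come back as a single line feed, while `<a>x&#13;y</a>` keeps its carriage return.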
>Hmm. I can see the point about properties affecting everything running
>under a single JVM in an app server but it still seems like there ought
>to be a convenient way to do this from the command line where the JVM
>instance is of course only going to apply to the one process being run.
If you're
The samples illustrate the use of the APIs by performing simple (often
trivial) tasks. Their main purpose is additional documentation. They are
not necessarily intended to do anything actually useful, though they may
contain code worth reusing.
They are not intended to be a full tutorial in the u
If you need to know entity-reference boundaries, you can get that
information by asking the parser to generate a DOM which has Entity
Reference Nodes and then looking at the children of the attribute nodes.
(I'm not sure offhand whether there's a SAX equivalent.)
If you want to suppress the oth
> > If you need to know entity-reference boundaries, you can get that
> > information by asking the parser to generate a DOM which has Entity
> > Reference Nodes and then looking at the children of the attribute
> > nodes.
>
> Xerces-J doesn't support that. See the rationale here [1] from Andy
Cl
Whitespace will always be normalized in XML attribute values. If you don't
want that happening, put the text in a child element instead.
http://www.w3.org/TR/REC-xml/#AVNormalize
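A sketch of the rule using the JDK's parser (hypothetical class name). Literal tabs and newlines in the attribute become spaces, while whitespace written as character references is kept. The attributes here have no DTD declaration, so they are treated as CDATA and no further collapsing of spaces happens.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

final class AttributeNormalization {
    // Returns the named attribute of the root element, after the parser has normalized it.
    static String attr(String xml, String name) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement().getAttribute(name);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```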
As far as XML is concerned, numeric character references are identical to
the characters they represent. The XML APIs shouldn't make any
distinction.
For implementation reasons, you *may* find that SAX delivers these as
separate characters() events. But that is not guaranteed.
No XML applicati
> &amp; might be treated as being the same as &#38;, but these are both
> distinct from ordinary text
As far as XML is concerned, neither is "distinct from ordinary text" --
they're just representations of the & character.
For comparison, consider &#65;. XML doesn't distinguish between this and a
simple
> processing instruction outside the root element (at the very end of the
> document)
By XML's grammar rules, nothing meaningful may follow the root element.
That includes PIs. Any tool which is processing that PI is actually
behaving incorrectly.
Fix your document design?
>That is not true. The definition of document in the XML 1.1 spec is:
> ( prolog element Misc* )
Hmmm. You're right; my error. That's true even in 1.0.
Tim Bray, in his Annotated XML Specification, said:
"The fact that you're allowed some trailing junk after the root element, I
decid
The order of _elements_ is guaranteed in any XML API.
The order of _attributes_ is not.
SequenceType and ItemType checked in. The class Javadocs spell out my
understanding of how we're dividing these; critiques welcome.
I believe the interpreter currently runs but compiler doesn't, due to my
having made SequenceType no longer a stream at the FIL level. (On the
other hand, I may ma
>
That's a Numeric Character Reference. XML correctly converted it to the
referenced character.
If you want to represent hex data, don't use numeric character references;
instead, use some other notation and convert it in your application.
> XInclude support in Xerces is only available as a parsing feature.
> If there is some processing you need to do prior to XInclude you
> need to serialize the document (after you've done your
> preprocessing) and then feed it back to the parser.
Alternatively, do the processing and then implem
> Thanks for your response, but I'm not looking for column and row
> numbers in the XML document. I might have been a little ambiguous
> in using the word "location". What I need is the reference to the
> Node object in the DOM tree that caused the validation error. The
> perfect solution wo
As far as XPath/XSLT is concerned, there is no such thing as "an empty
text node" -- if it's empty, it's absent. So the Xalan serializer will
almost certainly treat empty text nodes as not existing. Any properly
written XML application should treat an empty start-tag/end-tag pair and
the equivalent empty-element tag as IDENTICAL.
If you really need to draw
The other kluge-around would be a postprocessor that converted the XML
into the equivalent SGML form.
But if the problem really is that the next stage is an SGML tool, I'd try
HTML mode serialization and see if you can get away with it.
> If you need support for other kinds of XPointers then you may want to
> take a look on the net at some of the other XInclude processors that
> are available. e.g. perhaps XOM [4] supports what you're looking for.
Another quick idea: It's possible to implement an XInclude subset as an
XSLT styl
>
> TV Cap
> HDTV
> x264
> 720p
> Action/Adv
> Drama
> English
>
>
> The "report:attributes" node is returning as having 0 children, as
> this debug output shows:
Hard to believe, since this so
Carriage return is ASCII 13, so &#13; or &#xD; will represent that
character.
However, be sure you understand XML's rules for whitespace normalization
in attribute values. Depending on what you're trying to do, you may want
to replace that attribute with a child element... or replace the offending
The purpose of an XML parser is to read correct XML. Get whoever's
generating that file to produce XML that expresses their intent correctly,
or throw in a filtering stage that corrects their error. Personally, I
would apply a clue-by-four to the author of whatever's generating that
document r
Actually, &#13; and &#xD; are technically "numeric character references",
not entity references. Check the spec, but as I read the attribute-value
normalization rules, a character reference appends the referenced character
unchanged, so a carriage return written this way should survive
normalization where a literal one would not.
In the DOM, namespace declaration attributes are displayed as real
attributes -- but are in fact optional in many cases. See current version
of the DOM spec for discussions of Namespace Well-Formedness, Namespace
Normalization, and normalization during serialization.
In the XPath data model, na
My solution would be to tell the parser to read from an in-memory stream
acting as a FIFO buffer, and run it in its own thread; then push data into
that stream from the communications thread as it becomes available.
Of course the hard thing is going to be carrying this handshaking through
to t
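A minimal sketch of that FIFO arrangement using java.io's piped streams; the class and method names are hypothetical, and the "communications thread" is simulated by writing string chunks from the calling thread. Closing the writing end is what lets the parser see end-of-input and finish.

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicReference;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

final class PipedParse {
    // The parser runs in its own thread, reading from the pipe; the caller plays the
    // communications thread, pushing chunks in as they "arrive". Returns the root element name.
    static String parseInChunks(String... chunks) {
        AtomicReference<Object> outcome = new AtomicReference<>();
        try {
            PipedOutputStream producer = new PipedOutputStream();
            PipedInputStream consumer = new PipedInputStream(producer);
            Thread parser = new Thread(() -> {
                try {
                    Document doc = DocumentBuilderFactory.newInstance()
                            .newDocumentBuilder().parse(consumer); // blocks until data arrives
                    outcome.set(doc.getDocumentElement().getNodeName());
                } catch (Exception e) {
                    outcome.set(e);
                }
            });
            parser.start();
            for (String chunk : chunks) {
                producer.write(chunk.getBytes(StandardCharsets.UTF_8));
                producer.flush();
            }
            producer.close(); // end of input: lets the parser finish
            parser.join();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        Object r = outcome.get();
        if (r instanceof Exception) {
            throw new IllegalStateException((Exception) r);
        }
        return (String) r;
    }
}
```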
> UTF-8 and UTF-16 are character encodings [1], representing the
> characters defined by Unicode as sequences of bytes. These encodings
> have a representation for every character in Unicode. Like any of
> the other encodings they're decoded into Java chars on input so it's
> all the same to the
Xerces supports SAX.
Since prefixes are considered "syntactic sugar", this is arguably legal;
in a properly namespace-sensitive application it should have the right
result. But not everything is fully namespace-aware (sigh) and I agree
that this appears to be a typo.
> Thanks for the explanation although I am a bit confused as to why we
> would bother to have an interface when we require a specific
> implementation.
Sorry, but this really is the way the W3C intends it to work.
The interface standardizes most usages of the DOM. But for efficiency of
implemen
Those are all namespace declarations. If you register the right SAX
handler, you should have no trouble seeing them.
Are you sure you're reading the file you think you're reading? Try
printing out the contents rather than parsing them.
(This isn't something Xerces should be doing for XML, so I'm betting that
you either are reading the wrong thing or are reading it from a server
which is wrapping it as an HTML
Closest thing I can think of is the W3C's "tidy" tool, which repairs some
of the common/obvious errors.
> is that if I use the Internet Explorer to open this xml, it does not
> render anything except the hardcoded text that I have in the XSLT,
> when I have the xmlns="www.ncr.com/ocz" attribute in the root node
> of the xml. As soon as I remove this attibute, it works fine and the
> xslt gets appli
Parse without XInclude processing, walk the tree to find the XIncludes,
fetch the referenced documents and attach them to the include element?
To guarantee UTF-8 output (assuming the processor is writing directly out
to the file rather than producing a SAX or DOM output which other code
then writes out), specify the encoding in the stylesheet's <xsl:output>
directive.
Though I'd be sorta surprised if UTF-8 isn't the default...
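A sketch of that, using an identity stylesheet whose xsl:output element requests UTF-8 and the JDK's built-in XSLT processor; the class name is hypothetical. Writing to a byte stream (rather than a Writer) is what lets the processor's encoding choice actually apply to the output.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XslOutputEncoding {
    // Identity stylesheet; the xsl:output element is where the encoding is requested.
    static final String IDENTITY_UTF8 =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"xml\" encoding=\"UTF-8\"/>"
      + "<xsl:template match=\"@*|node()\">"
      + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static byte[] transform(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(IDENTITY_UTF8)));
            ByteArrayOutputStream out = new ByteArrayOutputStream(); // bytes, not chars
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toByteArray();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```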
> There is no stylesheet, I'm not using any XSLT file. It is simply SAX
> reading the XML file and writing to standard output.
Sorry; I'm used to thinking in terms of Xalan rather than Xerces and gave
the wrong answer.
Can you confirm whether the problem is occurring in the parser or on the
Yeah, that will do it. If you want to fix it at this level, you need to
set the output stream to use UTF8 encoding rather than the JVM's default
for that platform.
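A minimal sketch of that fix, assuming the handler currently prints through System.out or another default-encoding writer (Utf8Output is a hypothetical name). Wrapping the byte stream in an OutputStreamWriter with an explicit charset sidesteps the platform default.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

final class Utf8Output {
    // Wraps any byte stream (System.out, a file...) so text is always encoded as UTF-8.
    static PrintWriter utf8Writer(OutputStream out) {
        return new PrintWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8), true);
    }

    // Small helper to observe the bytes actually produced.
    static byte[] encodeViaWriter(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        PrintWriter w = utf8Writer(bytes);
        w.print(text);
        w.flush(); // autoflush only triggers on println, so flush explicitly
        return bytes.toByteArray();
    }
}
```

In the SAX handler, print through `Utf8Output.utf8Writer(System.out)` instead of `System.out` directly, and flush it when the document ends.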
Or, better, escape the individual troublesome character by expressing it
as &lt;.
XML also considers > and & to be reserved characters; they should be
expressed as &gt; and &amp;.
<![CDATA[ ... ]]> sections, which provide a block-escaping mechanism, are
sometimes useful for hand-generated XML; less so for ma
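When the XML is built through an API rather than by hand, the serializer should do this escaping for you. A sketch with the JDK's DOM and identity Transformer (hypothetical class and method names): the text node holds the raw characters, and the serialized output has the escaped forms.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class SerializerEscaping {
    static String serializeTextElement(String name, String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element e = doc.createElement(name);
            e.appendChild(doc.createTextNode(text)); // raw characters, no manual escaping
            doc.appendChild(e);
            Transformer t = TransformerFactory.newInstance().newTransformer(); // identity
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }
}
```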
Xerces-J has an option to control whether EntityReference nodes are
generated in the DOM output. See
http://xerces.apache.org/xerces-j/features.html
... specifically, the create-entity-ref-nodes feature.
I haven't checked whether Xerces-C offers the same choice, but I would be
a bit surp
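For comparison, with the JAXP factory in the JDK the corresponding switch is DocumentBuilderFactory.setExpandEntityReferences; as far as I know the JDK's Xerces-based builder maps it onto that create-entity-ref-nodes feature. A sketch with a hypothetical class name and a made-up internal entity:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class EntityRefNodes {
    // Returns the node type of the root element's first child.
    static short firstChildType(String xml, boolean expand) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setExpandEntityReferences(expand); // false keeps EntityReference nodes in the tree
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getFirstChild().getNodeType();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```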
> I can reproduce a problem parsing certain XML 1.1 files that contain lots of
> character entities (escaped control chars like "").
> At some point in the file the parser calls my characters() method with
> garbage text.
In XML 1.0, most control characters were simply illegal. Did we ever
updat
See http://xml.apache.org/xalan-j/getstarted.html. The instructions
haven't been updated since Java 1.4, but as far as I know everything
should still work roughly the same way. (You'll generally get more/better
answers if you ask Xalan questions on the Xalan list rather than the
Xerces list.)
First obvious question: You say you have the document in a Map; I presume
you mean a Java Map indexed by the filename. Xerces and Xalan won't look
there unless you have plugged in a user-written Resolver which recognizes
the URI you are requesting and retrieves the document from the Map object
I presume you don't need to be reminded that the prefix used in the
instance document may be different from the prefix used in the schema
document, and that in fact either or both of these may have several
prefixes bound to the same namespace simultaneously and/or sequentially...
The node value of a Document node is null. [1]
The text content of a Document node is also null; the text of the document
is the text content of its document element, which is the concatenation of
the textContent value of every child node, excluding COMMENT_NODE and
PROCESSING_INSTRUCTION_NODE nodes. [2]
[1] http://www.w3.org/TR/DOM-Level-3-Core/core.html#ID-1950641247
[2] http://www
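A quick sketch that checks those rules against the JDK's DOM (hypothetical class and method names): the Document node itself reports null, and comments and processing instructions are skipped when the document element's text content is assembled.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class DocumentTextContent {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // doc.getTextContent() is null for a Document; ask the document element instead.
    static String documentText(Document doc) {
        return doc.getDocumentElement().getTextContent();
    }
}
```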
As a reminder: The reasons the DOM was specified by the W3C as not being
threadsafe are twofold.
1) Performance would be significantly impacted by excess locking. (Since
we're talking about Xalan-J, notice that Java users have almost completely
given up use of the inherently threadsafe hash and
> > Since we're talking about Xalan-J
> We're talking about Xerces-J. :-)
Thanks. "To avoid damage, engage mind before putting fingers in gear."
I think I should be talking about a cup of coffee right now.
(T-t-t-alking 'bout code generation... no, that's also Xalan.)
If you are validating against a DTD, and IF the enclosing element does not
have mixed content, look at the SAX/DOM definitions of "ignorable
whitespace" and how to handle it. (The term is unfortunate; it's better
described as "whitespace in element-only content".)
If you are not validating th
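A sketch of the DTD case with JAXP (hypothetical class name and a made-up DTD). In JAXP, setIgnoringElementContentWhitespace only takes effect when the parser is validating, since it relies on the content model.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class ElementContentWhitespace {
    static int rootChildCount(String xml, boolean ignoreWhitespace) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setValidating(true); // content model must be known; JAXP ties this to validation
            dbf.setIgnoringElementContentWhitespace(ignoreWhitespace);
            DocumentBuilder db = dbf.newDocumentBuilder();
            db.setErrorHandler(new DefaultHandler()); // validity errors ignored, fatal ones thrown
            Document doc = db.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```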
Interesting, Mike; didn't know that. Makes a certain amount of sense,
since it's based on the definition of the containing element rather than
what it actually contains.
(I've rarely counted on it; I get too many documents thrown at me without
DTDs, or am processing in a context where I want to
In 99% of the use cases, locking the individual DOM objects/operations
would be the wrong level of granularity -- what you really need to prevent
unexpected results is transaction locking for a group of related changes.
That really does have to be done at the application level.
Locking every in
Happens automagically when the external entity is expanded.
__
"You build world of steel and stone
I build worlds of words alone
Skilled tradespeople, long years taught:
You shape matter; I shape thought."
(http://www.songworm.com/lyrics/songworm-parody/Shapeso
There are a number of good tutorials on XML programming in Java at
http://www.ibm.com/xml, along with a lot of articles on more specific
techniques. For basic "how do I get started" questions, I'd recommend
reading some of those.
XML itself is a strict tree structure, with the only "links" bein
Speaking as someone who was on the DOM Working Group at the time: Michael
is entirely correct.
The DOM Level 1 non-namespace-aware nodes and methods should be considered
DEPRECATED. They are simply not interoperable with DOM Level 2 code. The
only justification for continuing to use them is if
> in your XML-Documet change //$xrd*($v*2.0) to a valid URI
Websearching for "URI RFC" will find the formal specification for URIs,
including the grammar that defines what is and isn't legal; namespace
names must meet the syntactic constraints of URI References. You may have
to escape some char
I sit corrected... though if you want the document to be interoperable
with other XML tools, you should hew close to the standard.
BTW, I was just reminded that XML Namespaces 1.1 declared that namespace
names should be IRIs, not just URIs. Of course how many tools tracked
that change is an op