Hi,

I had a lot of trouble with entity references in document fragments. I
found a solution at last, but I would like to know whether there is a better
approach.
My scenario is like this: there is a main document which defines some
internal entities but does not use them. Say:
<!DOCTYPE article [
<!ENTITY foo "foo expanded" >
]>
<article />

There is a separate file with an XML document fragment. I would like to
parse it as a fragment in the context of the main document. It makes use of
the entity which is defined in the main document. Say:
<?xml version='1.0' standalone='no'?>
<para>Example only: &foo;</para>

My first approach was to use DOM Level 3 LSParser.parseWithContext. Well, it
is not supported by Xerces up to now. So I had to do something like "parse in
context" on my own. I set up a DOM Document from the main document using
Xerces as LSParser. Then I tried to parse the fragment and to generate DOM
nodes which are to be appended to the main document.

I tried a SAX parser for the fragment file. No luck, because it complains
about the undeclared entity. SAX knows nothing about the context.
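
To make the SAX failure concrete, here is a minimal sketch. It assumes the
JDK's default non-validating SAX parser; the class and method names are made
up for illustration. It parses the fragment on its own and returns the
parser's error message instead of letting the exception escape.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, for illustration only.
class SaxFragmentCheck {

    /** Returns the parser's error message, or null if the input parsed cleanly. */
    static String parseStandalone(String fragmentXml) throws Exception {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(fragmentXml)), new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            // A reference to an entity that is declared nowhere in the parsed
            // input is a fatal well-formedness error.
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        String fragment = "<?xml version='1.0' standalone='no'?>"
            + "<para>Example only: &foo;</para>";
        System.out.println(parseStandalone(fragment));
    }
}
```

The parser only sees the fragment, so the declaration of foo in the main
document cannot help here.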

Next I tried a StAX XMLStreamReader for parsing the fragment file. The
difference from SAX is the ability to set
javax.xml.stream.isReplacingEntityReferences=false. Then, when getting an
EntityReference event, I generated an appropriate EntityReference from the
main Document and appended it as a child node. E.g. (pseudo code for
clarification):
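// entityName is the name reported by the StAX ENTITY_REFERENCE event
EntityReference er = mainDocument.createEntityReference(entityName);
// append below the element currently being built, not to the Document itself
currentElement.appendChild(er);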
This works without any error, but not as expected. Serializing the
main Document shows the EntityReference empty (no value).
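
The StAX approach described above could look roughly like this. It is only a
sketch: it assumes the JDK's built-in StAX and DOM implementations, the class
and method names are invented, and namespaces, comments and processing
instructions are ignored to keep it short.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, for illustration only.
class FragmentImporter {

    static Document parseMain(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    /** Rebuilds the fragment below parent, keeping entity references unexpanded. */
    static void appendFragment(Element parent, String fragmentXml) throws Exception {
        Document doc = parent.getOwnerDocument();
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Report entity references as events instead of expanding them.
        factory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, Boolean.FALSE);
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(fragmentXml));
        Node current = parent;
        while (reader.hasNext()) {
            switch (reader.next()) {
                case XMLStreamConstants.START_ELEMENT: {
                    Element e = doc.createElement(reader.getLocalName());
                    for (int i = 0; i < reader.getAttributeCount(); i++) {
                        e.setAttribute(reader.getAttributeLocalName(i), reader.getAttributeValue(i));
                    }
                    current.appendChild(e);
                    current = e;
                    break;
                }
                case XMLStreamConstants.END_ELEMENT:
                    current = current.getParentNode();
                    break;
                case XMLStreamConstants.CHARACTERS:
                case XMLStreamConstants.CDATA:
                    current.appendChild(doc.createTextNode(reader.getText()));
                    break;
                case XMLStreamConstants.ENTITY_REFERENCE:
                    // For this event getLocalName() is the entity name, e.g. "foo".
                    current.appendChild(doc.createEntityReference(reader.getLocalName()));
                    break;
                default:
                    break;
            }
        }
        reader.close();
    }

    public static void main(String[] args) throws Exception {
        Document main = parseMain("<!DOCTYPE article [<!ENTITY foo \"foo expanded\">]><article/>");
        appendFragment(main.getDocumentElement(),
            "<?xml version='1.0' standalone='no'?><para>Example only: &foo;</para>");
        Node para = main.getDocumentElement().getFirstChild();
        for (Node n = para.getFirstChild(); n != null; n = n.getNextSibling()) {
            System.out.println(n.getNodeType() + " " + n.getNodeName());
        }
    }
}
```

The sketch only shows how the nodes get built. Whether the EntityReference
carries a replacement value afterwards is exactly the problem described here.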

Debugging the code, I ended up with the information that the Entity "foo"
has a null value in the DocType of the main Document, because it is not used
there. Fortunately, I found the DOM Level 3 normalizeDocument method, whose
documentation says: "This method acts as if the document was going through a
save and load cycle, putting the document in a "normal" form. As a
consequence, this method updates the replacement tree of EntityReference
nodes ...". However, it was of no use. After calling it, the foo
EntityReference still shows up without any value in the normalized and then
serialized Document.
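
For anyone who wants to look at the same thing, here is a small sketch of how
the Entity node can be inspected through the DocType. It assumes the JDK's
default DOM parser; the class and helper names are invented. I deliberately
give no expected output, since what the Entity node contains is the point in
question.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.w3c.dom.Entity;
import org.xml.sax.InputSource;

// Hypothetical helper, for illustration only.
class EntityInspector {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    /** Returns the declared Entity node, or null if the name is not declared. */
    static Entity findEntity(Document doc, String name) {
        DocumentType doctype = doc.getDoctype();
        return doctype == null ? null : (Entity) doctype.getEntities().getNamedItem(name);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<!DOCTYPE article [<!ENTITY foo \"foo expanded\">]><article/>");
        doc.normalizeDocument(); // the DOM Level 3 call quoted above
        Entity foo = findEntity(doc, "foo");
        System.out.println("declared: " + (foo != null));
        if (foo != null) {
            // The replacement tree, if any, hangs below the Entity node.
            System.out.println("has child nodes: " + foo.hasChildNodes());
            System.out.println("text content: [" + foo.getTextContent() + "]");
        }
    }
}
```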

The only solution that I found is an ugly hack:
- Add a new Element to the main Document, with a name that is hopefully
unique, immediately after parsing.
- Iterate over all the Entities that are defined in the main Document and add
an EntityReference for each of them to the newly created Element.
- Serialize the Document to a byte stream. Setting the DOMConfiguration
parameter "entities" to "true" keeps EntityReference nodes in the document,
which means the reference will be serialized as "&foo;".
- Parse the content of the byte stream. This gives us a new Document which is
essentially the same as the main Document. The difference is that there is an
extra Element that has a reference to each Entity, so that all Entities are
in use now.
- Remove the extra Element.
From that moment on, the StAX parser functionality (described above) works
well. But this is a lot of work for a problem that sounds very simple. Is
there a simpler solution which I haven't seen yet?
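
In code, the hack looks roughly like the following sketch. It assumes the
JDK's default DOM implementation with DOM Level 3 Load and Save support; the
class name, the method name and the holder element name are made up, and for
brevity the re-parse uses a DocumentBuilder rather than an LSParser. Unparsed
(NDATA) entities are not handled.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

// Hypothetical helper, for illustration only.
class EntityRoundTrip {

    // Hypothetical, "hopefully unique" element name.
    static final String HOLDER = "entity-holder-tmp";

    static Document touchAllEntities(String mainXml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(mainXml)));

        // Steps 1 and 2: extra element holding one reference per declared entity.
        Element holder = doc.createElement(HOLDER);
        NamedNodeMap entities = doc.getDoctype().getEntities();
        for (int i = 0; i < entities.getLength(); i++) {
            holder.appendChild(doc.createEntityReference(entities.item(i).getNodeName()));
        }
        doc.getDocumentElement().appendChild(holder);

        // Step 3: serialize, asking the serializer to keep EntityReference nodes.
        DOMImplementationLS ls =
            (DOMImplementationLS) doc.getImplementation().getFeature("LS", "3.0");
        LSSerializer serializer = ls.createLSSerializer();
        serializer.getDomConfig().setParameter("entities", Boolean.TRUE);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        LSOutput out = ls.createLSOutput();
        out.setByteStream(bytes);
        out.setEncoding("UTF-8");
        serializer.write(doc, out);

        // Step 4: parse the bytes again to get a fresh Document.
        Document reparsed = builder.parse(new ByteArrayInputStream(bytes.toByteArray()));

        // Step 5: remove the extra element again.
        Element root = reparsed.getDocumentElement();
        root.removeChild(root.getElementsByTagName(HOLDER).item(0));
        return reparsed;
    }

    public static void main(String[] args) throws Exception {
        Document doc = touchAllEntities(
            "<!DOCTYPE article [<!ENTITY foo \"foo expanded\">]><article/>");
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

The Document returned here takes the place of the original main Document for
the StAX step.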

Also, I wonder whether the upcoming parseWithContext support in Apache Xerces
will help me in this situation. Since the entity foo *IS* defined in my
example, I would expect that adding an EntityReference within a fragment
that is parsed in context will work as expected, whether or not the entity
has been used in the context.

Thank you,
Frank Steimke


