Hello,
I've got an XML document of the following form:
<?xml version="1.0"?>
<Entrezgene>
...
</Entrezgene>
<Entrezgene>
...
</Entrezgene>
When I feed this to the SAX parser it throws a SAXParser exception:
"SystemID: C:\data\entrez_gene\DATA\ASN_BINARY\Mammalia\Homo_sapiens\Homo_sapiens_small.xml
Location: 3443:2
Description: The markup in the document following the root element must
be well-formed."
The XML Schema defines that there can only be a single
<Entrezgene></Entrezgene> at the root of the document. So I can see why
it might be throwing me this exception.
I tried to remedy this by enclosing the group of these elements in a
single <Entrezgene-Set> ... </Entrezgene-Set> element, which according to
the schema allows zero or more Entrezgene elements. However, the
validators/editors I've used to validate such a document against the
schema fail to find any Entrezgene-Set declaration in it. That confuses
me, because the declaration is in there, and I'm wondering whether the
cause is the way the schema uses a lot of includes and references.
Will the Schema object handle schemas defined in this way?
Indeed, I still receive the above error even after introducing filters
to mimic the Entrezgene-Set in my code.
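
For context, here is a minimal sketch of how a schema split across several files can be loaded with the standard javax.xml.validation API. It is not my exact code: the class and method names (SchemaLoading, loadSchema, validatingFactory) are placeholders. The point it illustrates is that building the Schema from a File gives it a system ID, so relative xs:include locations can be resolved against the schema's own directory.

```java
import java.io.File;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper class; the names are illustrative only.
class SchemaLoading {

    // Compile the top-level .xsd. Passing a File (rather than a bare
    // InputStream) supplies a system ID, which relative include
    // locations are resolved against.
    static Schema loadSchema(File topLevelXsd) throws SAXException {
        SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        return factory.newSchema(topLevelXsd);
    }

    // Build a SAX parser factory whose parsers validate against the schema.
    static SAXParserFactory validatingFactory(Schema schema) {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // schema validation needs namespace awareness
        factory.setSchema(schema);
        return factory;
    }
}
```

Note that this only covers validation; the exception quoted above is a well-formedness error, which the parser raises before any schema is consulted.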
The alternative is to cut the file into <Entrezgene> ... </Entrezgene>
blocks and stream each block to the parser as its own XML document.
Does anyone know how I can do this? Would this require XML filters?
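
To make the chunking idea concrete, here is a rough sketch of what I have in mind. It assumes that every <Entrezgene> start tag and </Entrezgene> end tag sits alone on its own line (I have not verified this for the whole file) and that a single block fits in memory. The names EntrezgeneChunker, ChunkConsumer, split and parseAll are made up for illustration. The splitting is done on the raw text before the SAX parser sees it, since the parser rejects the multi-root file before any filter could receive events.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper; assumes the start and end tags are alone on their lines.
class EntrezgeneChunker {

    // Receives each complete <Entrezgene> block as a standalone XML string.
    interface ChunkConsumer {
        void accept(String xml) throws Exception;
    }

    // Reads line by line, collects the text between a <Entrezgene> line and
    // the next </Entrezgene> line, and hands each block to the consumer.
    // Anything outside a block (such as the XML declaration) is skipped.
    // Returns the number of blocks found.
    static int split(Reader in, ChunkConsumer consumer) throws Exception {
        BufferedReader reader = new BufferedReader(in);
        StringBuilder chunk = null; // null while outside a block
        int count = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            String trimmed = line.trim();
            if (chunk == null) {
                if (trimmed.equals("<Entrezgene>")) {
                    chunk = new StringBuilder().append(line).append('\n');
                }
            } else {
                chunk.append(line).append('\n');
                if (trimmed.equals("</Entrezgene>")) {
                    consumer.accept(chunk.toString());
                    chunk = null;
                    count++;
                }
            }
        }
        return count;
    }

    // Parses every block as its own document, using a fresh parser per block.
    static int parseAll(Reader in, DefaultHandler handler) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        return split(in, xml ->
                factory.newSAXParser().parse(
                        new InputSource(new StringReader(xml)), handler));
    }
}
```

The handler would be called once per block with its own startDocument/endDocument pair, so any per-gene state would have to be reset in startDocument.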
Thank you,
John Ling