Re: xsi:schemaLocation in XML files

Chris Bray Tue, 09 Oct 2007 04:10:27 -0700

It looks like xjParse does what I need by specifying -S a.xsd -S b.xsd etc. on the command line, so I do still have a "no code" solution :)

The Xerces plugin for jEdit also has the facility to import catalog files. Would I be right in assuming I can write a catalog file and use it both in jEdit and with xjParse?


Now I'm really confused with catalogs though!

If my catalog has a

<uri name="a.xsd" uri="file:///c:/svn/project/trunk/source/schemas/a.xsd"/>


does it also need a

<uri name="../a.xsd" uri="file:///c:/svn/project/trunk/source/schemas/a.xsd"/>

for when the files included by b.xsd also include a.xsd? The relative path is correct, but I don't know whether it needs its own entry in the catalog.


Michael Glavassevich wrote:

Chris,

If you're trying to avoid writing code to make this work you may want to
consider using a more schema centric command-line program like xjparse [1]
or jaxp.SourceValidator [2] instead of dom.Counter. With either of those
you can specify a list of schema documents to use for validation.
Additionally xjparse provides an option for specifying an XML Catalog [3]
for resolving the schema locations.
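
As a rough illustration of validating against an explicit list of schema documents instead of relying on xsi:schemaLocation, here is a minimal sketch using the standard javax.xml.validation API (JAXP 1.3, JDK 5 or later). It is not the source of xjparse or jaxp.SourceValidator. The class name, the urn:example namespaces and the two tiny in-memory schemas are invented for the example and stand in for a.xsd and b.xsd.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class SchemaListValidation {
    // Hypothetical stand-in for a.xsd: a root element that must contain
    // one strictly validated element from some other namespace.
    static final String SCHEMA_A =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
        + " targetNamespace='urn:example:a' elementFormDefault='qualified'>"
        + "<xs:element name='root'><xs:complexType><xs:sequence>"
        + "<xs:any namespace='##other' processContents='strict'/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Hypothetical stand-in for b.xsd: a single integer-valued element.
    static final String SCHEMA_B =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
        + " targetNamespace='urn:example:b'>"
        + "<xs:element name='count' type='xs:int'/></xs:schema>";

    // Compiles all given schema documents into one Schema and validates the
    // instance against it; the instance needs no schemaLocation hints at all.
    static boolean isValid(String instanceXml, String... schemaTexts) {
        Source[] schemas = new Source[schemaTexts.length];
        for (int i = 0; i < schemaTexts.length; i++) {
            schemas[i] = new StreamSource(new StringReader(schemaTexts[i]));
        }
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(schemas);
            schema.newValidator()
                  .validate(new StreamSource(new StringReader(instanceXml)));
            return true;
        } catch (SAXException e) {
            return false; // schema compilation or validation error
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a small instance document that uses both namespaces.
    static String instance(String countText) {
        return "<a:root xmlns:a='urn:example:a' xmlns:b='urn:example:b'>"
            + "<b:count>" + countText + "</b:count></a:root>";
    }

    public static void main(String[] args) {
        System.out.println(isValid(instance("3"), SCHEMA_A, SCHEMA_B));
        System.out.println(isValid(instance("three"), SCHEMA_A, SCHEMA_B));
    }
}
```

With real files, the StreamSource objects would be built from java.io.File instances instead of strings, so that relative imports inside the schemas have a base URI to resolve against.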

Thanks.

[1] http://nwalsh.com/java/xjparse/
[2] http://xerces.apache.org/xerces2-j/samples-jaxp.html#SourceValidator
[3] http://www.oasis-open.org/committees/download.php/14809/xml-catalogs.html

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: [EMAIL PROTECTED]

[EMAIL PROTECTED] wrote on 10/08/2007 08:20:18 PM:

I think there's a better way which I'll sketch (because my project
uses a version of Xerces that is from before the DOM Level 3
interfaces were included, so does something similar using older
stuff).

A standard XML parser may be associated with an EntityResolver, which
supports a method taking a URI and returning an InputSource from which
the content may be read.  Similarly, when a reference to a schema
namespace is found in a document (instance or schema) being read by a
validating parser, some kind of resolver will be called, if one has
been attached to the parser, to find the definition of the schema for
that namespace.  The namespace URI is the argument to the relevant
method.  This resolver (I believe it is called LSResourceResolver in
the DOM Level 3 Load and Save spec) is an interface, and your
implementation may do whatever it wants.  Thus, you could create the
resolver with some root location in the file system as argument, or you
could use ClassLoader.getSystemResourceAsStream(), or you could put the
schemas in a database and retrieve their text from there.  Your resolver could
consult any schema locations it accumulated during its lifetime if you
had a way to capture these, and wouldn't have to use them literally,
but could interpret them as it wished.

I suggest you consult the Xerces docs about how to install a resolver
for schemas.
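
A minimal sketch of the kind of resolver described above, assuming the DOM Level 3 LSResourceResolver interface and a resolver created with a root directory as its argument. The class name, the register method and the urn:example:nsb namespace are hypothetical. The sketch only maps a namespace URI to a file under that root and ignores whatever location hint the document supplied.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSResourceResolver;

final class NamespaceSchemaResolver implements LSResourceResolver {
    private final Path schemaRoot;
    private final Map<String, String> fileByNamespace = new HashMap<>();
    private final DOMImplementationLS loadSave;

    NamespaceSchemaResolver(Path schemaRoot) {
        this.schemaRoot = schemaRoot;
        try {
            // Standard DOM bootstrap idiom for obtaining an LSInput factory.
            this.loadSave = (DOMImplementationLS)
                DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("No DOM Load and Save implementation", e);
        }
    }

    // Hypothetical helper: associates a namespace with a file name under the root.
    NamespaceSchemaResolver register(String namespaceUri, String fileName) {
        fileByNamespace.put(namespaceUri, fileName);
        return this;
    }

    @Override
    public LSInput resolveResource(String type, String namespaceURI,
                                   String publicId, String systemId, String baseURI) {
        String fileName = fileByNamespace.get(namespaceURI);
        if (fileName == null) {
            return null; // unknown namespace: let the parser use its default lookup
        }
        LSInput input = loadSave.createLSInput();
        input.setPublicId(publicId);
        input.setBaseURI(baseURI);
        // The location hint (systemId) is deliberately ignored here.
        input.setSystemId(schemaRoot.resolve(fileName).toUri().toString());
        return input;
    }

    public static void main(String[] args) {
        NamespaceSchemaResolver resolver =
            new NamespaceSchemaResolver(Paths.get("schemas"))
                .register("urn:example:nsb", "b.xsd");
        LSInput found = resolver.resolveResource(
            "http://www.w3.org/2001/XMLSchema", "urn:example:nsb",
            null, "../../schemas/b.xsd", null);
        System.out.println(found.getSystemId());
    }
}
```

In JAXP such an object can be passed to SchemaFactory.setResourceResolver or Validator.setResourceResolver. Exactly when the parser calls it, and with which arguments, should be checked against the Xerces documentation.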

Jeff

On 10/8/07, Chris Bray <[EMAIL PROTECTED]> wrote:

Michael, I'm using Xerces-J 2.9.1, I even upgraded from 2.9.0 today to
test any changes!

Jeff, can you bear with me here? I think I understand you...

Jeff Greif wrote:

Maybe an example will be clearer.

The instance document is, relative to some subtree of the file
system, in

instances/articles/doc1.xml

There is a set of schemas that apply in

schemas/{a,b,c,d}.xsd

Suppose a.xsd imports b.xsd, and in addition, doc1.xml refers to
components from nsa, the namespace of a, and nsb, the namespace of b.

So there are schema locations of the form {nsa, ../../schemas/a.xsd
nsb ../../schemas/b.xsd, ... }

Now when the reference from doc1 -> nsb is found, the schema locations
can be used to find b.xsd.

I'm with you up to here: because the schema locations were defined in
doc1.xml, they are relative to doc1.xml and therefore point to the
correct xsd files.

 > If the reference from a.xsd -> nsb is found, the schema locations
will not work, because the location is incorrect relative to the
location of a.xsd.

My reference from a.xsd -> nsb is in the form
        <xsd:import namespace="nsb" schemaLocation="./b.xsd" />
This path to b.xsd is correct with respect to the a.xsd it is defined
in (although incorrect with respect to doc1.xml).

However, this schema location hint is second in the queue behind the one
specified in doc1.xml. When Xerces tries to use the one specified in
doc1.xml here, it fails with File Not Found (because, relative to
a.xsd, the schema location from doc1.xml is not valid), reports the
error and stops parsing, so the schema location specified here is never
used.

Other parsers continue looking at the hints in schema location and find
the correct one specified on the <xsd:import> line. Is there any way of
telling Xerces to try all hints matching that namespace (in the same way
XMLSpy, Microsoft .NET's System.Xml and Saxonica seem to do) rather than
stop on the first "not found"?

 > You couldn't solve the problem by changing the schema locations to
look like {nsa, ../../schemas/a.xsd nsb ./b.xsd, ... } because the
doc1 -> nsb reference would fail.  However, in the first case, if the
parser is caching grammars, and the reference from doc1 -> nsb has
already been processed, the a.xsd -> nsb reference might not be a
validation error -- the schema locations are only a hint to the parser,
and if it has located and parsed the right grammar already, it can use
it.

So changing the schemaLocation works in my case because, in processing
a.xsd, the parser finds b.xsd (via the schemaLocation relative to a.xsd)
and caches it, which means it can use the cached copy for doc1.xml.

These are the problems with using relative URLs for the schema
locations, except in certain special cases.  For example, if the
instance doc is

instances/doc1.xml

and the schemas are in

schemas/{a,b,c,...}.xsd

Then these schema locations:  {nsa ../schemas/a.xsd nsb
../schemas/b.xsd ...} will work successfully, but only because the
paths work whether the reference is from the instance doc or a schema
doc.
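
The effect described above can be reproduced with plain URI resolution. The following is a small sketch using java.net.URI; the /work/proj directory and the class name are hypothetical and chosen only to mirror the two layouts in the example. It resolves the same relative hint against the instance document and against a.xsd.

```java
import java.net.URI;

final class RelativeSchemaLocation {
    // Resolves a relative schema location against the URI of the document
    // that is treated as the base, and returns the resulting path.
    static String resolve(String baseDocument, String schemaLocation) {
        return URI.create(baseDocument).resolve(schemaLocation).getPath();
    }

    public static void main(String[] args) {
        // Nested layout: instances/articles/doc1.xml and schemas/*.xsd
        String nestedHint = "../../schemas/b.xsd";
        // Relative to the instance document: /work/proj/schemas/b.xsd
        System.out.println(resolve(
            "file:///work/proj/instances/articles/doc1.xml", nestedHint));
        // Relative to a.xsd the same hint escapes the project: /work/schemas/b.xsd
        System.out.println(resolve(
            "file:///work/proj/schemas/a.xsd", nestedHint));

        // Flat layout: instances/doc1.xml and schemas/*.xsd
        String flatHint = "../schemas/b.xsd";
        // Both bases give /work/proj/schemas/b.xsd, which is why this case works.
        System.out.println(resolve(
            "file:///work/proj/instances/doc1.xml", flatHint));
        System.out.println(resolve(
            "file:///work/proj/schemas/a.xsd", flatHint));
    }
}
```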

Ideally I'd like to specify a "try all schema locations before error" or
"do not stop on file not found error" property, since there will *always*
be one that works when used relative to the current location. Is there a
way of doing this?

I'm guessing there is no "schema locations per file" property to turn
off the global cache of schema locations and switch to a per-file cache,
thus forcing Xerces to use the hint found at the current location?

Maybe the easiest way to solve my problem is to re-jig my document
locations so that the same relative path can be used to locate each of
the schemas? Not ideal, mind, since I've spent a long time developing the
inter-schema links to ensure they can always be linked together, and I'd
like to use that investment in some way. I can't help but think that
moving the files so the relative paths fit for both scenarios is more of
a by-product than something implemented by design.

I'm under some commercial pressure here to switch to the method that
works with the systems that the customers use (XMLSpy et al.), but I'd
really like the same examples to work in Xerces-J. We've been extolling
the virtues of XML and XML Schema as the "common language" to unify our
industry's data exchange, and it'd look bad to have to change the
examples we are producing to make them work in different parsers!

Once again, that ended up a lot longer than I expected and I hope it
makes sense, thanks for your time and patience.
Chris

Jeff



On 10/8/07, Chris Bray <[EMAIL PROTECTED]> wrote:

Jeff.

My comments inline.

Chris

Jeff Greif wrote:

When a relative URL is used for the location of an imported schema, it
is supposed to be relative to the URL of the importing document.  So
if your instance document directly references the namespaces of one or
more schemas for validation, those URLs are interpreted relative to
the location of the instance document.  Probably some of the schemas
So my instance document _should_ have relative paths to the individual
schemas in its schemaLocation?
Does the fact that Xerces is "changing" the base path to that of the
first specified schema for each subsequent schema constitute a bug?
Should I log this somewhere more formal?

contain <xsd:import> elements; those would require URLs relative to
the schema importing them.

Each of those schemas then further includes others using <xsd:import>
and <xsd:include> (for example core.xsd actually includes about 30 or 40
smaller schemas from ./Core/schemaname.xsd) and this works as I'd
expected it to.

Some of the schemas might be referenced both in the instance document
and in imports from other schemas referenced in the instance document.
I'm not sure there's a specification of where they must be found if
relative URLs are used.  This may depend on the ordering of processing
of those references by the parser/validator.

When that is the case I am 100% sure that both the instance document and
the "sub schemas" refer to the exact same document, so it shouldn't
matter which of the references Xerces is using, it will resolve to the
same schema anyway.

There is a section in the XML Schema 1.0 spec addressing this issue.

Jeff



On 10/8/07, Chris Bray <[EMAIL PROTECTED]> wrote:

Prashant,

Changing the working dir of the JVM doesn't seem to make any difference.
Using dom.Counter from the Xerces-J samples, the parser still seems to
change the working dir first to wherever the xml file is located, then
to wherever the first xsd file specified is located, and needs all
subsequent locations to be relative to that.

Absolute paths work fine, but I'm trying to include these files bundled
in with a set of schemas as examples of how to use the format, hence I
don't know where my users will unzip the archives to (C:\Users\username,
c:\projects\projectname\, /usr/local/projects, /home etc.), so I can't
set absolute paths in my distributed files.

I was hoping to not need to actually write my own parsing program, just
use the output from dom.Counter and a schemaLocation hint (which fits my
needs perfectly) since I'm not really a Java developer.

I saw that jEdit page, but I'd rather make my schemas validate against a
standard Xerces installation than modify my jEdit installation to make
them work. I feel this would be more useful for my users.

Chris


Prashant Reddy wrote:

I think the relative paths you have specified in the schemaLocation will
be resolved against the "working dir". The working dir is usually the
directory at the cmd prompt when you launched the JVM.

Have you tried giving absolute paths to the XSD files?

A more portable solution to finding schema files locally is to use
EntityResolver[1].

If you are using JAXP 1.3 / JDK 1.5+, see:
https://jaxp.dev.java.net/article/jaxp-1_3-article.html


[1]: http://java.sun.com/j2se/1.5.0/docs/api/org/xml/sax/EntityResolver.html
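
For illustration, a minimal sketch of an EntityResolver that redirects any .xsd system identifier to a file of the same name in one known directory. The class name and the "schemas" directory are hypothetical. Whether and when a given parser consults the resolver for schema documents is parser-specific and should be checked against its documentation.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

final class LocalSchemaResolver implements EntityResolver {
    private final Path schemaDir;

    LocalSchemaResolver(Path schemaDir) {
        this.schemaDir = schemaDir;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        if (systemId == null || !systemId.endsWith(".xsd")) {
            return null; // not a schema: fall back to the parser's default
        }
        // Keep only the file name and look it up in the known directory,
        // so the directory part of the original hint no longer matters.
        String fileName = systemId.substring(systemId.lastIndexOf('/') + 1);
        Path local = schemaDir.resolve(fileName);
        return new InputSource(local.toUri().toString());
    }

    public static void main(String[] args) {
        LocalSchemaResolver resolver = new LocalSchemaResolver(Paths.get("schemas"));
        InputSource source =
            resolver.resolveEntity(null, "file:///somewhere/else/geotechnical.xsd");
        System.out.println(source.getSystemId());
    }
}
```

A resolver like this is installed with XMLReader.setEntityResolver or DocumentBuilder.setEntityResolver. It only works if schema file names are unique across the bundle.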

Hope this helps.
-Prashant


On Mon, 2007-10-08 at 13:17 +0100, Chris Bray wrote:

All.

Please go easy on me as I'm a newbie here. If this is a really obvious
problem I'm really sorry!
I've been using Xerces to validate XML for a while now, and I've found a
troublesome scenario.

In the top of my xml files I have a line specifying the location of the
external schemas required for this xml file like so:

    xsi:schemaLocation="http://www.diggsml.org/0.9.2 ../Schemas/diggs/core.xsd
        http://www.diggsml.org/0.9.2/geotechnical ../Schemas/diggs/geotechnical.xsd"

In this case specifying two namespaces and their associated schema files
(files exist and paths are correct).

However, this doesn't work using Xerces. I am required to change my
schemaLocation attribute so that the first path points to its xsd, then
subsequent entries are relative to that first xsd, not to the current
file, like so:

    xsi:schemaLocation="http://www.diggsml.org/0.9.2 ../Schemas/diggs/core.xsd
        http://www.diggsml.org/0.9.2/geotechnical ../geotechnical.xsd"

Is there any way I can change this to work like the first example? Other
parsers (XMLSpy and Stylus Studio in particular) require the first
syntax, with all paths relative to the current doc, which I believe to be
the correct behaviour. I don't know how to build Xerces-J from source to
fix(?) this myself, but I'd be willing to try if anyone can help me get
it building.

Since my customers are all using XMLSpy etc., I'm having to produce my
example files in the earlier syntax, stopping me from using Xerces to
validate them.

As the biggest advocate of Free/OpenSource software in our group (jEdit
with Xerces plugin in particular) I really don't want to have to change
to use XMLSpy or Stylus Studio but this is quite awkward for me!

That ended up being a longer mail than I'd expected! I hope you can
help. If there's any more information you need (or a small set of sample
files) let me know.


Chris Bray
Software Engineer (DIGGS Project)
Keynetix Ltd.

